I'm implementing muP for the OLMo model and am running into an issue with the coordinate check.


The increasing l1 is for the network output. Following the docs, I set both the readout init and the query init to zero. I also made sure that the initialization is applied after set_base_shapes is called.
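
In case it helps to put a number on the trend, here is a minimal sketch that fits the slope of log(l1) against log(width) at a fixed training step. It assumes the l1 statistic is the mean absolute value of the layer's output coordinates. The helper name `loglog_slope` and the sample values are hypothetical and for illustration only; they are not taken from my runs.

```python
import math


def loglog_slope(l1_by_width):
    """Least-squares slope of log(l1) versus log(width).

    l1_by_width maps a model width to the l1 statistic (mean absolute
    coordinate value) recorded for one layer at one training step.
    """
    xs = [math.log(width) for width in l1_by_width]
    ys = [math.log(value) for value in l1_by_width.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical values, chosen only to show the two regimes.
flat = {256: 0.5, 512: 0.5, 1024: 0.5, 2048: 0.5}     # width-independent
growing = {256: 0.5, 512: 1.0, 1024: 2.0, 2048: 4.0}  # doubles with width

print(loglog_slope(flat))     # close to 0: l1 does not depend on width
print(loglog_slope(growing))  # close to 1: l1 grows linearly with width
```

A slope near 0 is what a passing coordinate check should show for every layer. If the output l1 is growing with width, a clearly positive slope makes it easy to compare against the other layers and see where the growth first appears.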
What other things can I check to debug the issue?