Increasing coord check for the network output #71

@AkshitaB

I'm implementing muP for the OLMo model and am running into an issue with the coordinate check.

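[Figure: sp_trsfmr_adamw_coord (coordinate check plot, SP, Transformer, AdamW)]
[Figure: μp_trsfmr_adamw_coord (coordinate check plot, μP, Transformer, AdamW)]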

The increasing l1 curve is the one for the network output. Following the docs, I set the readout init and the query init to zero. I have also made sure that the initialization is applied after set_base_shapes is called.
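To make "increasing" concrete, here is a minimal sketch of how the trend could be quantified, assuming it refers to growth with width at a fixed training step. It fits the slope of log(l1) against log(width) for each module and flags the ones that grow. The helper names (`width_slope`, `flag_growing`), the `tol` threshold, and the record layout are hypothetical and not part of the mup package. The numbers are made up for illustration and are not measurements from OLMo.

```python
import math

def width_slope(widths, l1_values):
    """Least-squares slope of log(l1) against log(width).

    A slope near 0 means the activation size is stable across widths;
    a clearly positive slope means it grows with width.
    """
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in l1_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def flag_growing(records, tol=0.1):
    """records: {module_name: {width: l1}} taken at one training step.

    Returns the module names whose fitted slope exceeds tol (a hypothetical
    threshold). Non-positive l1 values are skipped, since a zero-initialized
    readout gives an output of exactly 0 before the first update.
    """
    flagged = []
    for name, by_width in records.items():
        pairs = [(w, v) for w, v in sorted(by_width.items()) if v > 0]
        if len(pairs) < 2:
            continue  # not enough points to fit a slope
        widths, values = zip(*pairs)
        if width_slope(widths, values) > tol:
            flagged.append(name)
    return flagged

# Made-up numbers for illustration only.
example = {
    "attn_out": {256: 0.50, 512: 0.51, 1024: 0.49},
    "readout": {256: 0.10, 512: 0.20, 1024: 0.40},
}
print(flag_growing(example))
```

Running this per step on the coordinate check records would show whether the output is the only module drifting with width or whether an earlier layer already grows and the output just inherits it.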

What other things can I check to debug the issue?
