Not getting perf improvements from muP at ~1.5B scale #76

Hey guys, first of all thanks for the awesome work!

I've implemented muP in the llm.c project (see [here](https://github.com/karpathy/llm.c/pull/650/)). The coord checks look flat / correct (I went up to 15 steps and they're still flat!), but I am not getting any performance improvement from using muP.
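As a rough illustration of what "flat" means in a coord check: the mean absolute activation of a layer, recorded at each training step, should stay roughly constant as width grows. The sketch below is not the llm.c or `mup` implementation. The function names, the `records` layout, and the `tol=0.1` threshold are all hypothetical and chosen only for illustration. It fits a log-log slope of activation size against width and calls the check flat when that slope is near zero at every step.

```python
import math


def loglog_slope(widths, values):
    """Least-squares slope of log(value) against log(width)."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def coord_check_is_flat(records, tol=0.1):
    """Return True if activation size barely changes with width at every step.

    records: {step: {width: mean_abs_activation}}  (hypothetical layout)
    tol:     assumed threshold on |slope|; pick one that suits your setup
    """
    for by_width in records.values():
        widths = sorted(by_width)
        slope = loglog_slope(widths, [by_width[w] for w in widths])
        if abs(slope) > tol:
            return False
    return True


# Made-up numbers: activations that stay ~1.0 across widths count as flat,
# while activations growing like sqrt(width) do not.
flat = {0: {256: 1.00, 512: 1.02, 1024: 0.99}}
blowup = {0: {w: w ** 0.5 for w in (256, 512, 1024)}}
print(coord_check_is_flat(flat), coord_check_is_flat(blowup))
```

A slope near 0.5 or 1.0 would suggest some layer's activations still scale with width, which is the kind of thing the coord check is meant to catch.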

Could this be due to the smaller scale? We're testing it on 1.5B LLMs. Should we expect different behavior at ~7B?

I wrote up a mini document on what I've done to support muP in llm.c [here](https://github.com/karpathy/llm.c/pull/650/files#diff-1af578c962426039e2974b91b115b2002468e8aa7cfe6cadba58238fe02434ad) under `mup.md`.

Am I missing something here?
