Implement NSCache to cache MLX ModelContext #90
base: main
Conversation
@mattt could you please take a look at this? I tested the PR with https://github.com/mattt/chat-ui-swift and I'm seeing an issue with conversation history handling. The model appears to be responding to the previous (n-1) prompt rather than the latest one, which suggests the chat state may be getting out of sync somewhere. The cache itself seems to be working fine, though.
@noorbhatia Oh, nice. Thank you for opening this PR in response to the issue you filed earlier. I'm wrapping up work on a Swift implementation of Xet. Once I cut that initial release, I'll take a look at this next.
Hi @noorbhatia. Thanks for your patience. I just pushed ee7e4dc, which extends this approach with an actor-coordinated cache that coalesces concurrent model loads per key. That way, we avoid duplicate work while still benefiting from the same eviction behavior of NSCache. How does that look to you?
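For readers following along, the coalescing idea can be sketched like this. This is a hypothetical illustration, not the code from ee7e4dc: `ModelContext` here is a stand-in struct for the MLX type, and `ModelCache` tracks in-flight load tasks per key so concurrent callers await one shared load instead of each loading the model themselves.

```swift
import Foundation

// Stand-in for the MLX model context type (assumption for illustration).
struct ModelContext {
    let id: String
}

// Actor-coordinated cache that coalesces concurrent loads per key.
actor ModelCache {
    // In-flight load tasks keyed by model ID: concurrent requests for the
    // same key await a single shared task instead of duplicating work.
    private var inFlight: [String: Task<ModelContext, Error>] = [:]
    private var cached: [String: ModelContext] = [:]

    func context(
        for id: String,
        loader: @escaping @Sendable (String) async throws -> ModelContext
    ) async throws -> ModelContext {
        // Fast path: already loaded.
        if let context = cached[id] { return context }
        // Coalesce: join an in-flight load for the same key.
        if let task = inFlight[id] { return try await task.value }

        let task = Task { try await loader(id) }
        inFlight[id] = task
        defer { inFlight[id] = nil }

        let context = try await task.value
        cached[id] = context
        return context
    }
}
```

A real implementation would back the `cached` storage with NSCache so entries can be evicted under memory pressure; a plain dictionary is used here only to keep the sketch self-contained.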
FWIW, this implementation uses classes and locks. I tried a pure actor approach, but it ran into Swift 6's strict concurrency checking. To make this work cleanly, MLX would need to make changes on its side. (Alternatively, it may just be that I'm not smart enough to figure out how to make this work.)
Force-pushed from 430945e to ee7e4dc
Thanks so much @mattt!! This is an elegant and sophisticated solution. I really like the actor-based coalescing approach; it cleanly handles deduplication while keeping state safely isolated.
Thanks for the clear explanation. |
Implement simple caching for MLXLanguage
A possible solution for #89
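For context on why NSCache (rather than a plain dictionary) suits this use case: NSCache evicts its contents automatically under memory pressure, which matters when each cached value is a heavyweight model context. A minimal, self-contained illustration (the `Box` type and key names are hypothetical, not from this PR):

```swift
import Foundation

// NSCache stores reference types, so wrap the value in a class.
final class Box {
    let value: Int
    init(_ value: Int) { self.value = value }
}

let cache = NSCache<NSString, Box>()
// Optional: cap the number of entries NSCache will keep.
cache.countLimit = 10
cache.setObject(Box(42), forKey: "model-a")

// Lookups return nil if the entry was never stored or has been evicted,
// so callers must always be prepared to reload.
let hit = cache.object(forKey: "model-a")
let miss = cache.object(forKey: "model-b")
```

Because any lookup can miss after eviction, the cache is a pure optimization: correctness never depends on an entry still being present.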