AI OS in a Process
Mayfly is a coroutine-first, modular scheduling engine. Originally the internal kernel for Tiffany, it is now evolving independently as a lightweight, pluggable microkernel for AI agents. It provides a scalable, extensible execution environment for ephemeral, concurrent jobs across coroutine, async, threaded, and multiprocess engines, and aims to be the core of agent runtimes, simulation systems, and intelligent orchestration.
Mayfly isn't just a job runner: it's a scheduler, runtime, and reasoning substrate for concurrent agent cognition.
Mayfly originated as the internal kernel of the Tiffany cognitive agent system and was extracted as a standalone engine to be reused across Microscaler's agent stack. Designed by Charles Sibbald and the Microscaler core team, it embodies an ethos of resilience, composability, and minimalism.
Its job model was shaped by decades of microkernel, actor, and AI research, and is designed to scale from embedded AI to cloud-scale distributed cognition.
Mayfly is a runtime to think with.
Thanks go to David Beazley for inspiration on coroutine scheduling, and to the Erlang/OTP team for their pioneering work on lightweight processes and distributed systems.
Mayfly will be the AI OS in a process: orchestrating tasks, reasoning steps, perception cycles, and distributed cognition with the same elegance as a real-time microkernel.
Inspired by both Erlang and modern distributed systems, Mayfly enables:
- Thousands of lightweight jobs across coroutine, thread, or process engines
- Structured lifecycle control of tasks (spawn, yield, sleep, wait, cancel)
- Local reasoning and global coordination
- Simulation of time, planning, and agentic flow
- Federated agent behavior across mesh environments
Mayfly can operate in two modes:

- **Embedded**: used within Tiffany or another embedding runtime
  - Scheduler is instantiated and controlled programmatically
  - No external I/O or listening ports
- **Daemon**: Mayfly runs as a long-lived process (e.g., container, microVM, system service)
  - Accepts jobs via IPC, gRPC, or A2A messages
  - Exposes `/metrics`, `/__health`, and remote scheduling endpoints
  - Ideal for:
    - Distributed execution clusters
    - Mesh-based agent platforms
    - Fault-tolerant job infrastructure
The mayfly-daemon crate encapsulates this mode and acts as a node within a larger mesh or scheduler topology. It is optional but provides flexibility for infrastructure scenarios.
As Mayfly evolves, we intentionally constrain its scope to avoid bloat and role confusion. Mayfly must remain a focused execution kernel, not a domain-specific orchestrator or reasoning engine.
- No LLMs
- No decision trees
- No goal resolution or planning logic
Cognition belongs in Tiffany or an upstream agent orchestration layer.
- No pipelines or behaviors
- No rules or scheduling policies based on domain semantics
- It only knows how to run tasks, not why
- No understanding of tasks like "summarize", "plan", or "analyze"
- All jobs must conform to the abstract, pluggable `SchedulableJob` trait (see the sketch after this list)
- It does not provide pubsub, event queues, or general messaging
- It relies on external systems (gRPC, QUIC, A2A) for ingress/egress
- No tokens, sessions, ACLs, or roles
- All security concerns are handled by edge components
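As a rough sketch of that contract, inferred from the quick-start example further down (the exact trait definition, including `JobError` and any additional methods or bounds, may differ):

```rust
/// Placeholder error type; assumed for the sketch, not Mayfly's actual definition.
pub struct JobError;

/// Sketch of the job contract, inferred from the quick-start example below.
pub trait SchedulableJob: Send + 'static {
    /// Run the job to completion. The scheduler drives *how* this executes
    /// (engine, queue, retries) but never interprets *what* it means.
    fn run(&self) -> Result<(), JobError>;
}
```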
What Mayfly does provide:

- Execution of jobs on local or remote engines
- Observability of those jobs
- Lifecycle control (spawn, yield, sleep, wait, cancel, retry)
- Minimal, agentic coordination (task routing, offloading, stealing)
Think of Mayfly like the kernel in an OS. It runs things, manages time and memory, but never makes decisions about applications, users, or goals.
- Green-threaded coroutine task execution using `may`
- Unified `SystemCall` interface (sleep, yield, spawn, wait, etc.)
- `WaitMap`: arbitrary-key blocking and waking
- `ReadyQueue`: fair, prioritized task queue
- Clock abstraction: monotonic and wall time
- Local task spawning, lifecycle, retry logic
- Task IDs, states, backoff, and metrics
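As a sketch of how those primitives might surface in code (the variant names follow the list above, but the enum shape, `WaitKey`, and payload types are assumptions, not Mayfly's actual API):

```rust
use std::time::Duration;

type TaskId = u64;  // assumed opaque handle; the real TaskId lives in task.rs
type WaitKey = u64; // assumed arbitrary key used by the WaitMap

/// Hypothetical shape of the unified SystemCall interface.
enum SystemCall {
    Sleep(Duration), // park the task until the clock abstraction fires
    Yield,           // give up the quantum and re-enter the ReadyQueue
    Spawn,           // enqueue a child task (payload elided for brevity)
    WaitOn(WaitKey), // block in the WaitMap until this key is signalled
    Wake(WaitKey),   // wake every task currently blocked on this key
    Cancel(TaskId),  // request cooperative cancellation of a task
}
```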
- Modular execution engines: Coroutine / Async / Threaded / Process
- Supervisor trees and structured task graphs
- Simulated time kernel for predictive planning
- Metrics + tracing for observability
- Pluggable job protocols (A2A, JSON-RPC, LLM-Chat)
- WASM task engine + sandboxing
- Distributed runner mesh with auto-routing and resilience
- Work stealing + offloading for dynamic load redistribution
- Daemon runtime for remote execution, IPC/gRPC interfaces, and cluster behavior
- Coroutine engine refactor
- Public crate split from Tiffany
- Async + threaded execution engines
- Tracing + Prometheus metrics
- A2A Protocol job interface
- Simulated time + replayable plans
- Distributed runner mesh
- Fault detection + migration
- Work stealing + offloading
- Job deduplication + supervision
Mayfly nodes can operate cooperatively in a decentralized mesh:
- Each runner manages its own task queues and engines
- Jobs can be routed to other nodes based on:
  - Hashing (job ID, queue, agent key)
  - Load metrics
  - Engine capability (GPU, WASM, etc.)
- Supports:
  - Gossip-based runner discovery
  - Job forwarding and task migration
  - Node failure detection and recovery
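A minimal sketch of the hashing-based routing above (`NodeId` and `route_job` are illustrative names; real routing would also weigh load metrics and engine capability):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type NodeId = usize; // illustrative peer handle

/// Pick a peer deterministically by hashing a job key (job ID, queue,
/// or agent key). Load and capability checks are omitted here.
fn route_job(job_key: &str, peers: &[NodeId]) -> Option<NodeId> {
    if peers.is_empty() {
        return None;
    }
    let mut h = DefaultHasher::new();
    job_key.hash(&mut h);
    Some(peers[(h.finish() as usize) % peers.len()])
}
```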
To maintain balance across the mesh:
- Work stealing: underutilized nodes pull tasks from busy peers
- Work offloading: overloaded nodes push tasks to idle peers
Key principles:
- Steal from the tail (oldest or lowest priority)
- Offload when local thresholds are exceeded (CPU, queue depth, memory)
- Track origin + lineage for observability and traceability
- Serialize tasks into transferable job envelopes
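An illustrative reading of those principles (the struct fields and threshold values below are placeholders, not Mayfly defaults):

```rust
use std::collections::VecDeque;

/// Placeholder local load gauges.
struct NodeLoad {
    cpu: f32,
    queue_depth: usize,
}

/// Offload when local thresholds are exceeded (values are illustrative).
fn should_offload(load: &NodeLoad) -> bool {
    load.cpu > 0.85 || load.queue_depth > 10_000
}

/// Steal from the tail: a thief takes the victim's oldest (or lowest
/// priority) task, leaving the head untouched for the queue's owner.
fn steal_from_tail<T>(victim: &mut VecDeque<T>) -> Option<T> {
    victim.pop_back()
}
```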
Planned metrics:
- `mayfly_jobs_forwarded_total`
- `mayfly_jobs_stolen_total`
- `mayfly_job_origin{node_id="..."}`
This enables:
- Horizontal scaling
- Hot code mobility
- Workload recovery from partial failures
Mayfly integrates with the A2A Protocol to support agent-native communication:
- Incoming jobs can be submitted as A2A `task.submit` or `goal.propose` messages
- Internal Mayfly jobs can emit `a2a.result`, `a2a.intent`, or `a2a.plan.execute`
- Runner mesh communication can optionally use signed A2A messages for:
  - Job forwarding
  - Agent-to-agent delegation
  - Result propagation
This enables:
- Full interop with LangChain, AutoGen, and other A2A-compatible agents
- Human-inspectable agent message logs
- Verifiable, typed intent exchange between distributed Mayfly runtimes
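A sketch of what a transferable job envelope carrying these messages might look like, using `serde` (the struct and field names are assumptions for illustration, not the normative A2A schema):

```rust
use serde::{Deserialize, Serialize};

/// Illustrative A2A job envelope; kinds follow the message types listed
/// above, but the actual wire format is defined by the A2A Protocol.
#[derive(Serialize, Deserialize)]
struct A2aEnvelope {
    kind: String,               // e.g. "task.submit", "goal.propose", "a2a.result"
    origin: String,             // sending node or agent identifier
    payload: serde_json::Value, // opaque body, typed by `kind`
    signature: Option<String>,  // optional signature for signed mesh traffic
}
```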
Core modules:

- `task.rs` – task lifecycle, `TaskId`, `TaskState`, `TaskWaker`
- `syscall.rs` – abstract system calls (`Sleep`, `WaitOn`, `Spawn`, etc.)
- `scheduler.rs` – queue execution loop + syscall dispatcher
- `wait_map.rs` – dynamic signal blocking/waking map
- `ready_queue.rs` – priority scheduler queue
- `pal/` – platform abstraction layer
- `clock.rs` – time simulation + wall time
- `engine/` – coroutine, async, threaded, process (extensible)
- `mesh.rs` (planned) – distributed runner coordination + routing
- `a2a.rs` (planned) – message parsing, validation, schema translation
A minimal usage example:

```rust
use mayfly::prelude::*;

struct MyJob;

impl SchedulableJob for MyJob {
    fn run(&self) -> Result<(), JobError> {
        println!("Running cognitive task");
        Ok(())
    }
}

fn main() {
    let sched = Scheduler::builder()
        .register_queue("default")
        .build();

    sched.submit("default", MyJob);
    sched.run_blocking();
}
```

| Engine | Description | Use Case |
|---|---|---|
| Coroutine | `may`-based green threads | Reasoning, planning, simulation |
| Async | Rust futures + event loop | Streaming, network ops |
| Threaded | Native threads (Rayon, etc.) | Parallel CPU tasks |
| Process | Subprocesses, WASM, plugins | Isolation, security, plugins |
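The engine layer is pluggable. As a purely illustrative sketch (the `Engine` enum and the selection rule below are assumptions, not Mayfly's current API), a dispatcher might route jobs to engines based on coarse job traits:

```rust
/// Hypothetical engine selector; variants mirror the table above.
enum Engine {
    Coroutine,
    Async,
    Threaded,
    Process,
}

/// Illustrative routing rule from coarse job traits to an engine.
fn pick_engine(needs_isolation: bool, cpu_bound: bool, io_heavy: bool) -> Engine {
    if needs_isolation {
        Engine::Process // sandbox untrusted, WASM, or plugin work
    } else if cpu_bound {
        Engine::Threaded // saturate cores with native threads
    } else if io_heavy {
        Engine::Async // futures + event loop for streaming and network ops
    } else {
        Engine::Coroutine // default: cheap green threads for reasoning
    }
}
```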
Observability is exposed through Prometheus-style metrics:

- `mayfly_tasks_total{status="completed"}`
- `mayfly_task_latency_seconds`
- `mayfly_queue_depth{queue="default"}`
- `mayfly_syscall_counts{type="wait"}`
- `mayfly_jobs_forwarded_total`
- `mayfly_jobs_stolen_total`
- `mayfly_job_origin{node_id="..."}`
- Planned: OpenTelemetry + log correlation
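As a sketch, emitting one of the counters above with the `prometheus` crate might look like this (Mayfly's actual metric wiring may differ):

```rust
use prometheus::{register_int_counter_vec, IntCounterVec};

fn main() -> Result<(), prometheus::Error> {
    // Register a labelled counter in the default registry.
    let tasks: IntCounterVec = register_int_counter_vec!(
        "mayfly_tasks_total",
        "Tasks processed, by terminal status",
        &["status"]
    )?;

    // Increment the "completed" series after a task finishes.
    tasks.with_label_values(&["completed"]).inc();
    Ok(())
}
```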
- Property-based task and queue testing
- Load tests across thousands of tasks
- Time simulation + determinism
- Distributed mesh convergence simulation
- Fault injection for node failure recovery
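In the spirit of the property-based testing above, here is a sketch using the `proptest` crate against a stand-in FIFO queue (the real suite would target Mayfly's `ReadyQueue` and task types):

```rust
use proptest::prelude::*;
use std::collections::VecDeque;

proptest! {
    /// Pushing then draining a FIFO queue must preserve submission order.
    #[test]
    fn fifo_preserves_order(items in proptest::collection::vec(any::<u32>(), 0..256)) {
        let mut q: VecDeque<u32> = VecDeque::new();
        for &i in &items {
            q.push_back(i);
        }
        let drained: Vec<u32> = std::iter::from_fn(|| q.pop_front()).collect();
        prop_assert_eq!(drained, items);
    }
}
```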
```
mayfly/
├── src/
│   ├── lib.rs
│   ├── task.rs
│   ├── syscall.rs
│   ├── scheduler.rs
│   ├── ready_queue.rs
│   ├── wait_map.rs
│   ├── clock.rs
│   ├── pal/
│   ├── engine/
│   ├── mesh.rs (planned)
│   └── a2a.rs (planned)
```
The tinkerbell binary exposes a few flags for customizing the runtime:
| Flag | Description |
|---|---|
| `--concurrency <N>` | Number of worker threads to spawn |
| `--quantum <ms>` | Scheduling quantum in milliseconds |
| `-v, --verbose` | Increase logging verbosity (use `-vv` for trace) |
| `--dump-state` | Emit a scheduler snapshot on shutdown |
MIT OR Apache 2.0
