refactor: decompose lexer into functional submodules #264

LunaStev · 2026-01-05T08:34:27Z

Following the modularization of the parser and codegen, this PR breaks down the monolithic lexer.rs into specialized submodules within front/lexer/src/lexer/. This reorganization separates low-level source navigation from high-level token dispatch and literal parsing, making the lexer significantly easier to maintain and extend.

Key Changes

1. Lexer Modularization

The lexer logic has been split into the following functional components:

core.rs: Definitions for the Lexer and Token structures, serving as the foundational types.
cursor.rs: Implements low-level source navigation methods such as advance(), peek(), peek_next(), and match_next().
scan.rs: The primary entry point for tokenization, containing the main next_token() dispatch logic and character-level matching.
ident.rs: Logic for scanning identifiers and mapping them to language keywords.
literals.rs: Specialized scanning for string and character literals, including escape sequence handling.
trivia.rs: Logic for skipping non-token "trivia" such as whitespace and various comment styles.
common.rs: Internal shared imports and utilities used across the lexer submodules.
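
For readers unfamiliar with the split, here is a minimal sketch of the kind of navigation helpers that cursor.rs groups together. The `Cursor` struct, its fields, and the `Option<char>` return types are assumptions made for illustration; the actual methods live on the PR's `Lexer` type and may differ in signature.

```rust
/// Hypothetical stand-in for the cursor portion of the lexer.
/// Field names and return types are assumptions, not the PR's actual code.
pub struct Cursor {
    chars: Vec<char>,
    pos: usize,
}

impl Cursor {
    pub fn new(source: &str) -> Self {
        Cursor { chars: source.chars().collect(), pos: 0 }
    }

    /// True once every character has been consumed.
    pub fn is_at_end(&self) -> bool {
        self.pos >= self.chars.len()
    }

    /// Consume and return the current character, or `None` at end of input.
    pub fn advance(&mut self) -> Option<char> {
        let c = self.chars.get(self.pos).copied();
        if c.is_some() {
            self.pos += 1;
        }
        c
    }

    /// Look at the current character without consuming it.
    pub fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Look one character past the current one without consuming anything.
    pub fn peek_next(&self) -> Option<char> {
        self.chars.get(self.pos + 1).copied()
    }

    /// Consume the current character only if it equals `expected`.
    pub fn match_next(&mut self, expected: char) -> bool {
        if self.peek() == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }
}
```

With helpers like these, a dispatcher such as `next_token()` can recognize a two-character operator by calling `advance()` for the first character and `match_next()` for the second, without touching the position index directly.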

2. Integration & API Cleanup

Exposed Structure: Updated front/lexer/src/lib.rs and mod.rs to correctly export the new modular structure while maintaining a clean public API.
External Updates: Adjusted imports in the front/parser crate to align with the new lexer paths, specifically ensuring TokenType and Token are correctly referenced.

3. Behavioral Consistency

This is a pure structural refactor. The tokenization logic, keyword recognition, and literal parsing remain behaviorally identical to the previous implementation.
The Lexer public interface remains stable to prevent breaking changes in the compiler runner.

Impact

Maintainability: Concerns are now clearly separated. For example, changing how upcoming characters are peeked only requires touching cursor.rs, while adding new keywords only involves ident.rs.
Readability: Individual files are now focused and significantly smaller, reducing the overhead for new contributors.
Architecture: Completes the project-wide goal of modularizing the frontend crates.
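
As an illustration of the maintainability point, keyword recognition confined to one module might look like the sketch below. The `TokenType` variants, the keyword spellings, and the `keyword_or_ident` function are placeholders chosen for this example, not the names used in ident.rs.

```rust
/// Hypothetical token kinds; the real `TokenType` in the lexer crate
/// has its own variants.
#[derive(Debug, PartialEq, Clone)]
pub enum TokenType {
    Fun,
    Var,
    If,
    Identifier(String),
}

/// Map a scanned word to a keyword token, falling back to an identifier.
/// Supporting a new keyword means adding one arm here and nowhere else.
pub fn keyword_or_ident(text: &str) -> TokenType {
    match text {
        "fun" => TokenType::Fun, // placeholder keyword spellings
        "var" => TokenType::Var,
        "if" => TokenType::If,
        other => TokenType::Identifier(other.to_string()),
    }
}
```

Under this layout, the scanner only needs to collect the characters of a word and hand the resulting string to the lookup, so the dispatch code stays unchanged when the keyword set grows.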

Break down the lexer implementation into logical components to improve code organization and readability.

Changes:

- **New Module Structure**:
  - `core.rs`: `Lexer` and `Token` struct definitions and entry points.
  - `cursor.rs`: Low-level source navigation (`advance`, `peek`, `match_next`).
  - `scan.rs`: Main token dispatch logic (`next_token`).
  - `ident.rs`: Identifier scanning and keyword mapping.
  - `literals.rs`: String and character literal parsing.
  - `trivia.rs`: Whitespace and comment skipping.
  - `common.rs`: Internal shared imports.
- **Integration**:
  - Updated `front/lexer/src/lib.rs` and `mod.rs` to expose the new structure.
  - Updated imports in `front/parser` to align with the refactored lexer API (explicit `use lexer::token::TokenType` where necessary).

This modularization separates concerns, making the lexer easier to maintain and extend.

Signed-off-by: LunaStev <luna@lunastev.org>

LunaStev merged commit a8216e6 into wavefnd:master Jan 5, 2026
2 checks passed
