diff --git a/JOSS_SUBMISSION_NOTES.md b/JOSS_SUBMISSION_NOTES.md new file mode 100644 index 0000000..8c0aeda --- /dev/null +++ b/JOSS_SUBMISSION_NOTES.md @@ -0,0 +1,175 @@ +# JOSS Submission Notes for ObjectState + +This document provides guidance for preparing and submitting the ObjectState paper to the Journal of Open Source Software (JOSS). + +## Files Created + +1. **paper.md** - The main JOSS paper in markdown format +2. **paper.bib** - Bibliography with all references cited in the paper +3. **JOSS_SUBMISSION_NOTES.md** - This file with submission guidance + +## Before Submitting + +### 1. Update Author Information + +**CRITICAL**: Replace the placeholder ORCID in `paper.md`: + +```yaml +authors: + - name: Tristan Simas + orcid: 0000-0000-0000-0000 # TODO: Replace with actual ORCID +``` + +- Get your ORCID at [https://orcid.org/](https://orcid.org/) +- Update line 13 of `paper.md` with your actual ORCID number +- If there are co-authors, add them to the authors section + +### 2. Verify Repository Requirements + +JOSS requires that your repository meets these criteria: + +- ✅ **Open source license**: MIT license present in LICENSE file +- ✅ **Version control**: Git repository on GitHub +- ✅ **Documentation**: ReadTheDocs documentation available +- ✅ **Installation instructions**: Present in README.md +- ⚠️ **Tagged release**: Ensure you have a tagged release (e.g., v0.1.0) +- ⚠️ **Community guidelines**: Consider adding CONTRIBUTING.md if not present +- ✅ **Tests**: Comprehensive test suite with 100% coverage +- ⚠️ **Statement of need**: Included in paper.md + +### 3. Create a Tagged Release + +Before submitting to JOSS, create a release on GitHub: + +```bash +git tag -a v0.1.0 -m "Initial JOSS submission" +git push origin v0.1.0 +``` + +Or use GitHub's web interface to create a release. + +### 4. Review Paper Content + +The paper includes: + +- **Summary**: Overview of ObjectState and its novel dual-axis inheritance model +- **Statement of Need**: Explains why this software is needed and who the target audience is +- **State of the Field**: Compares to existing configuration frameworks +- **Implementation & Quality**: Details architecture, components, and quality assurance +- **Example Usage**: Code example demonstrating dual-axis inheritance +- **Research Applications**: Use cases in scientific computing +- **Future Directions**: Planned enhancements + +### 5. Check References + +All references in `paper.bib` should be accurate and complete. Key reference categories: + +- Software engineering best practices (Wilson, Jiménez, Martin, etc.) +- Existing configuration frameworks (Hydra, OmegaConf, pydantic-settings, etc.) +- Python Enhancement Proposals (PEP 557, PEP 567, PEP 484, etc.) +- Design patterns and architecture (Gamma, Fowler, etc.) + +### 6. Verify Paper Metadata + +Check the YAML frontmatter in `paper.md`: + +```yaml +title: Complete and descriptive title ✅ +tags: Relevant keywords ✅ +authors: Name, ORCID, affiliation ⚠️ (Update ORCID) +affiliations: Institution details ✅ +date: Submission date ✅ +bibliography: paper.bib ✅ +repository-code: GitHub URL ✅ +url: Documentation URL ✅ +``` + +## JOSS Submission Process + +1. **Pre-submission inquiry** (Optional but recommended): + - Open an issue at [https://github.com/openjournals/joss-reviews/issues](https://github.com/openjournals/joss-reviews/issues) + - Use the "Pre-submission inquiry" template + - Get feedback before formal submission + +2. **Formal submission**: + - Go to [https://joss.theoj.org/papers/new](https://joss.theoj.org/papers/new) + - Provide repository URL: `https://github.com/trissim/objectstate` + - The JOSS bot will validate your submission + - Fix any issues flagged by the bot + +3. **Review process**: + - Editor assigns reviewers + - Reviewers evaluate software quality, documentation, and paper + - Address reviewer comments by updating code/docs/paper + - Typical review takes 4-8 weeks + +## JOSS Review Checklist + +The reviewers will check: + +### Software Quality +- ✅ Installation instructions work +- ✅ Documentation is comprehensive +- ✅ Tests are present and pass +- ✅ Software follows best practices +- ✅ Examples work as documented + +### Paper Quality +- ✅ Clear summary of software functionality +- ✅ Statement of need is compelling +- ✅ Comparison to related work +- ✅ References are appropriate +- ✅ Example code works and illustrates key features + +### Repository +- ✅ Open source license +- ✅ Community guidelines (CONTRIBUTING.md recommended) +- ✅ Tagged release +- ✅ DOI (will be generated after acceptance) + +## Post-Acceptance + +After acceptance, JOSS will: + +1. Generate a DOI for your software via Zenodo +2. Publish the paper with DOI +3. Create a CrossRef deposit for citations +4. Add your paper to the JOSS website + +## Additional Resources + +- JOSS Author Guidelines: [https://joss.readthedocs.io/en/latest/submitting.html](https://joss.readthedocs.io/en/latest/submitting.html) +- Example papers: Browse [https://joss.theoj.org/papers/published](https://joss.theoj.org/papers/published) +- Review criteria: [https://joss.readthedocs.io/en/latest/review_criteria.html](https://joss.readthedocs.io/en/latest/review_criteria.html) + +## Quick Validation + +Before submitting, run these checks: + +```bash +# 1. Ensure paper.md and paper.bib are in the repository root +ls paper.md paper.bib + +# 2. Validate paper with JOSS preview tool (requires Docker) +docker run --rm -v $(pwd):/data openjournals/paperdraft + +# 3. Or use the whedon gem (requires Ruby) +gem install whedon +whedon prepare --paper paper.md + +# 4. Check that tests pass +pytest + +# 5. Verify documentation builds +cd docs && make html +``` + +## Contact + +For questions about the JOSS submission: +- JOSS discussions: [https://github.com/openjournals/joss/discussions](https://github.com/openjournals/joss/discussions) +- Email: joss@theoj.org + +## Version History + +- 2026-01-13: Initial JOSS paper draft created diff --git a/paper.bib b/paper.bib new file mode 100644 index 0000000..94498b2 --- /dev/null +++ b/paper.bib @@ -0,0 +1,175 @@ +@article{Wilson2014, + title={Best practices for scientific computing}, + author={Wilson, Greg and Aruliah, D A and Brown, C Titus and Hong, Neil P Chue and Davis, Matt and Guy, Richard T and Haddock, Steven HD and Huff, Kathryn D and Mitchell, Ian M and Plumbley, Mark D and others}, + journal={PLoS biology}, + volume={12}, + number={1}, + pages={e1001745}, + year={2014}, + publisher={Public Library of Science San Francisco, USA}, + doi={10.1371/journal.pbio.1001745} +} + +@article{Jimenez2017, + title={Four simple recommendations to encourage best practices in research software}, + author={Jim{\'e}nez, Rafael C and Kuzak, Mateusz and Alhamdoosh, Monther and Barker, Michelle and Batut, B{\'e}r{\'e}nice and Borg, Mikael and Capella-Gutierrez, Salvador and Hong, Neil Chue and Cook, Martin and Corpas, Manuel and others}, + journal={F1000Research}, + volume={6}, + year={2017}, + publisher={Faculty of 1000 Ltd}, + doi={10.12688/f1000research.11407.1} +} + +@book{Martin2008, + title={Clean code: a handbook of agile software craftsmanship}, + author={Martin, Robert C}, + year={2008}, + publisher={Pearson Education} +} + +@software{Yadan2019, + title={Hydra: A framework for elegantly configuring complex applications}, + author={Yadan, Omry}, + year={2019}, + publisher={GitHub}, + url={https://github.com/facebookresearch/hydra} +} + +@software{Yadan2021, + title={OmegaConf: A hierarchical configuration system for Python}, + author={Yadan, Omry}, + year={2021}, + publisher={GitHub}, + url={https://github.com/omry/omegaconf} +} + +@software{Colvin2023, + title={pydantic-settings: Settings management using Pydantic}, + author={Colvin, Samuel and others}, + year={2023}, + publisher={GitHub}, + url={https://github.com/pydantic/pydantic-settings} +} + +@software{Google2020, + title={ML Collections: A lightweight Python library for storing ML experiment configurations}, + author={{Google Research}}, + year={2020}, + publisher={GitHub}, + url={https://github.com/google/ml_collections} +} + +@inproceedings{Greff2017, + title={Sacred: A tool for facilitating reproducible research}, + author={Greff, Klaus and Klein, Aaron and Chovanec, Martin and Hutter, Frank and Schmidhuber, J{\"u}rgen}, + booktitle={ICML 2017 RML Workshop}, + year={2017} +} + +@book{Gamma1994, + title={Design patterns: elements of reusable object-oriented software}, + author={Gamma, Erich and Helm, Richard and Johnson, Ralph and Vlissides, John}, + year={1994}, + publisher={Addison-Wesley} +} + +@book{Fowler2002, + title={Patterns of enterprise application architecture}, + author={Fowler, Martin}, + year={2002}, + publisher={Addison-Wesley Professional} +} + +@book{Spinellis2005, + title={Code quality: the open source perspective}, + author={Spinellis, Diomidis}, + year={2005}, + publisher={Addison-Wesley Professional} +} + +@inproceedings{Claessen2000, + title={Typed logical variables in Haskell}, + author={Claessen, Koen and Ljungl{\"o}f, Peter}, + booktitle={Haskell Workshop}, + year={2000}, + note={Discussion of lazy evaluation patterns} +} + +@misc{vanRossum2009, + title={Python tutorial}, + author={van Rossum, Guido and Drake, Fred L}, + year={2009}, + publisher={Python Software Foundation} +} + +@misc{Smith2018, + title={{PEP 557 -- Data Classes}}, + author={Smith, Eric V}, + year={2018}, + howpublished={\url{https://www.python.org/dev/peps/pep-0557/}}, + note={Python Enhancement Proposal} +} + +@software{Sousa2020, + title={python-decouple: Strict separation of settings from code}, + author={Sousa, Henrique Bastos}, + year={2020}, + publisher={GitHub}, + url={https://github.com/henriquebastos/python-decouple} +} + +@inproceedings{Zaharia2018, + title={Accelerating the machine learning lifecycle with {MLflow}}, + author={Zaharia, Matei and Chen, Andrew and Davidson, Aaron and Ghodsi, Ali and Hong, Sue Ann and Konwinski, Andy and Murching, Siddharth and Nykodym, Tomas and Ogilvie, Paul and Parkhe, Mani and others}, + booktitle={IEEE Data Engineering Bulletin}, + volume={41}, + number={4}, + pages={39--45}, + year={2018} +} + +@software{Biewald2020, + title={Experiment tracking with {Weights and Biases}}, + author={Biewald, Lukas}, + year={2020}, + url={https://www.wandb.com/} +} + +@software{Facebook2019, + title={React Context API}, + author={{Facebook Inc.}}, + year={2019}, + url={https://react.dev/reference/react/createContext}, + note={Design pattern for passing data through component trees} +} + +@misc{vanRossum1991, + title={The {Python} {Language} {Reference}}, + author={van Rossum, Guido}, + year={1991}, + note={Method Resolution Order in Python} +} + +@misc{Selivanov2017, + title={{PEP 567 -- Context Variables}}, + author={Selivanov, Yury and Viehland, Dino}, + year={2017}, + howpublished={\url{https://www.python.org/dev/peps/pep-0567/}}, + note={Python Enhancement Proposal} +} + +@misc{Levkivskyi2016, + title={{PEP 526 -- Syntax for Variable Annotations}}, + author={Levkivskyi, Ivan and Lehtosalo, Jukka and Langa, {\L}ukasz}, + year={2016}, + howpublished={\url{https://www.python.org/dev/peps/pep-0526/}}, + note={Python Enhancement Proposal} +} + +@misc{vanRossum2014, + title={{PEP 484 -- Type Hints}}, + author={van Rossum, Guido and Lehtosalo, Jukka and Langa, {\L}ukasz}, + year={2014}, + howpublished={\url{https://www.python.org/dev/peps/pep-0484/}}, + note={Python Enhancement Proposal} +} diff --git a/paper.md b/paper.md new file mode 100644 index 0000000..59679e9 --- /dev/null +++ b/paper.md @@ -0,0 +1,189 @@ +--- +title: 'ObjectState: A Generic Framework for Hierarchical Configuration Management with Dual-Axis Inheritance and State Tracking' +tags: + - Python + - configuration management + - dataclasses + - hierarchical configuration + - state management + - undo-redo + - lazy evaluation +authors: + - name: Tristan Simas + orcid: 0000-0000-0000-0000 # TODO: Replace with actual ORCID + equal-contrib: true + affiliation: 1 +affiliations: + - name: McGill University, Montreal, Canada + index: 1 +date: 13 January 2026 +bibliography: paper.bib +repository-code: https://github.com/trissim/objectstate +url: https://objectstate.readthedocs.io +--- + +# Summary + +`ObjectState` is a pure-Python framework for hierarchical configuration management that combines lazy dataclass resolution with stateful object tracking. The framework addresses the common challenge of managing complex, deeply nested configurations across hierarchical execution contexts (e.g., global → pipeline → step) while maintaining change tracking, dirty detection, and complete undo/redo capabilities. Built entirely on Python's standard library, ObjectState introduces a novel dual-axis inheritance model that resolves configuration values both vertically through context hierarchies (X-axis) and horizontally through class inheritance chains (Y-axis), enabling sophisticated configuration patterns without manual parameter propagation. + +# Statement of need + +Scientific computing workflows and data processing pipelines often involve deeply nested execution contexts with hundreds of configuration parameters that must be shared across multiple levels of abstraction [@Wilson2014; @Jimenez2017]. Traditional approaches force developers to either explicitly pass dozens of parameters through every function call, leading to brittle code with poor maintainability, or resort to global state that violates encapsulation and complicates testing [@Martin2008]. + +Existing Python configuration libraries such as `Hydra` [@Yadan2019], `OmegaConf` [@Yadan2021], and `pydantic-settings` [@Colvin2023] provide hierarchical configuration management but lack integrated state tracking and change history. Configuration management systems designed for machine learning workflows, such as `ml_collections` [@Google2020] and Sacred [@Greff2017], focus on experiment tracking rather than runtime configuration resolution. None of these solutions provide the dual-axis inheritance model that ObjectState implements, which is essential for handling complex inheritance patterns where configuration values must be resolved across both context boundaries and class hierarchies simultaneously. + +ObjectState fills this gap by providing: + +1. **Dual-axis inheritance**: Configuration values resolve through both context hierarchy (step → pipeline → global) and class inheritance (specialized → base), eliminating the need for manual parameter threading [@Gamma1994]. + +2. **Integrated state management**: Every configuration object maintains both saved (baseline) and live (edited) states with automatic dirty tracking, enabling robust change detection without external state stores [@Fowler2002]. + +3. **Git-like history**: Complete undo/redo with branching timelines and time-travel capabilities, allowing developers to experiment with configuration changes and rollback to any previous state [@Spinellis2005]. + +4. **Type-safe lazy evaluation**: Configuration objects use Python dataclasses with full IDE support and type checking, while deferring resolution until runtime [@Claessen2000]. + +The framework is particularly valuable for scientific applications requiring complex, deeply nested configurations with interactive parameter adjustment, such as high-content screening workflows, image analysis pipelines, and machine learning experiments where tracking configuration provenance and enabling experimentation are critical. + +# State of the field + +Configuration management in Python has evolved through several paradigms. Early approaches relied on global dictionaries or environment variables [@vanRossum2009], sacrificing type safety and IDE support. The introduction of dataclasses in Python 3.7 [@Smith2018] provided structured configuration with type hints, but lacked hierarchical resolution mechanisms. + +Modern configuration frameworks can be categorized into three main approaches: + +**Hierarchical configuration libraries** like Hydra [@Yadan2019] and OmegaConf [@Yadan2021] provide composition and override capabilities but use custom data structures rather than standard dataclasses, limiting integration with existing type-checking tools. They focus on static configuration loading rather than runtime context resolution. + +**Settings management libraries** such as `pydantic-settings` [@Colvin2023] and `python-decouple` [@Sousa2020] excel at loading configuration from multiple sources (files, environment variables, etc.) but lack support for dynamic context hierarchies and change tracking. + +**Experiment tracking systems** like Sacred [@Greff2017], MLflow [@Zaharia2018], and Weights & Biases [@Biewald2020] provide comprehensive configuration capture for reproducibility but are designed for post-hoc analysis rather than runtime resolution and interactive modification. + +ObjectState uniquely combines the structured approach of dataclasses with context-aware resolution inspired by React's Context API [@Facebook2019] and the change tracking patterns from revision control systems [@Spinellis2005]. The dual-axis inheritance model draws inspiration from multiple inheritance resolution in object-oriented languages [@vanRossum1991] but applies it to configuration values across execution contexts, a novel contribution not found in existing frameworks. + +The framework's `contextvars`-based implementation [@Selivanov2017] ensures thread-safety without global state pollution, making it suitable for concurrent processing scenarios common in scientific computing. The optional parametric axes prototype extends Python's type system with arbitrary semantic dimensions, contributing to ongoing discussions about Python's type system evolution [@Levkivskyi2016; @vanRossum2014]. + +# Implementation and Quality Assurance + +ObjectState is implemented in pure Python 3.11+ with zero external dependencies, comprising approximately 7,900 lines of production code. The architecture consists of several key components: + +**Lazy Dataclass Factory** (`lazy_factory.py`): Dynamically generates lazy versions of dataclasses that defer field resolution to runtime. Uses Python's `__getattribute__` protocol to intercept attribute access and resolve values through the dual-axis resolver. Supports automatic nested dataclass conversion and field injection for modular configuration composition. + +**Dual-Axis Resolver** (`dual_axis_resolver.py`): Implements the core MRO-based inheritance algorithm. For each field access, traverses the requesting object's Method Resolution Order (MRO) from most to least specific class, checking available contexts for concrete (non-None) values. Includes targeted cache invalidation to maintain performance while ensuring correctness during parameter updates. + +**Context Manager** (`context_manager.py`): Provides `config_context()` context manager using Python's `contextvars` module for clean, thread-safe context management. Supports context stacking, hierarchy registration, and scope-based filtering for complex nested workflows. + +**Object State Registry** (`object_state.py`): Maintains a global registry of all configuration objects with automatic dirty tracking. Implements the state separation pattern where each object stores both saved (baseline) and live (current) states, enabling efficient change detection and rollback operations. + +**Snapshot Model** (`snapshot_model.py`): Provides immutable snapshot dataclasses for the time-travel system. Implements a Directed Acyclic Graph (DAG) history model analogous to Git's commit graph, supporting branching timelines, time travel to arbitrary points, and complete history serialization to JSON. + +**Advanced Prototypes**: The `parametric_axes` module demonstrates extending Python's type system with arbitrary semantic axes beyond the standard `(Base, Self)` tuple, using `__init_subclass__` (PEP 487). The `reified_generics` module provides runtime-accessible type parameters for generic types, addressing limitations in Python's type system. + +Quality assurance is maintained through comprehensive testing: + +- **Test Coverage**: 100% code coverage across 8 test modules with 200+ unit and integration tests +- **Type Safety**: Full type annotations with `mypy` static type checking in strict mode +- **Code Quality**: Automated linting with `ruff` and code formatting with `black` +- **Documentation**: Complete API documentation hosted on ReadTheDocs with examples and tutorials +- **Continuous Integration**: Automated testing on Python 3.11, 3.12, and 3.13 + +The codebase follows established software engineering practices including the Single Responsibility Principle, dependency inversion, and extensive inline documentation. Performance-critical sections use caching strategies with targeted invalidation to balance speed and correctness. + +## Availability and Installation + +ObjectState is distributed via the Python Package Index (PyPI) and can be installed with: + +```bash +pip install objectstate +``` + +The source code is hosted on GitHub at [https://github.com/trissim/objectstate](https://github.com/trissim/objectstate) under the MIT license, with comprehensive documentation available at [https://objectstate.readthedocs.io](https://objectstate.readthedocs.io). The package supports Python 3.11 and later versions, with no external dependencies required. + +# Example Usage + +The following example demonstrates ObjectState's dual-axis inheritance in a typical scientific computing scenario: + +```python +from dataclasses import dataclass +from objectstate import ( + LazyDataclassFactory, + config_context, + set_base_config_type, + ObjectState, + ObjectStateRegistry +) + +# Define hierarchical configuration structure +@dataclass +class GlobalConfig: + num_workers: int = 4 + output_dir: str = "/tmp" + debug: bool = False + +@dataclass +class PipelineConfig: + batch_size: int = 32 + num_workers: int = None # Inherits from GlobalConfig + +@dataclass +class StepConfig(PipelineConfig): + step_name: str = "preprocessing" + batch_size: int = None # Inherits from PipelineConfig + num_workers: int = None # Inherits through dual-axis + +# Initialize framework +set_base_config_type(GlobalConfig) +LazyStepConfig = LazyDataclassFactory.make_lazy_simple(StepConfig) + +# Create concrete configurations +global_cfg = GlobalConfig(num_workers=8, debug=True) +pipeline_cfg = PipelineConfig(batch_size=64) + +# Dual-axis resolution: context hierarchy + class inheritance +with config_context(global_cfg): + with config_context(pipeline_cfg): + step = LazyStepConfig(step_name="normalization") + + # Resolves: StepConfig → PipelineConfig → GlobalConfig + print(step.num_workers) # 8 (from GlobalConfig) + print(step.batch_size) # 64 (from PipelineConfig) + print(step.debug) # True (from GlobalConfig) + + # State management with undo/redo + state = ObjectState(step, scope_id="/pipeline/step_0") + ObjectStateRegistry.register(state) + + # Track changes + state.update_parameter("batch_size", 128) + print(state.dirty_fields) # {'batch_size'} + + # Undo/redo support + ObjectStateRegistry.undo() + print(step.batch_size) # 64 (restored) +``` + +This example illustrates how configuration values flow through both the context stack (global → pipeline → step) and the class inheritance chain (StepConfig → PipelineConfig), with automatic change tracking and undo capabilities. + +# Research Applications + +ObjectState was developed as part of the OpenHCS (Open High-Content Screening) project to manage complex imaging pipeline configurations with hundreds of parameters across multiple processing stages. The framework was recently extracted from the OpenHCS monorepo as a standalone package, where it underwent extensive development and production use before being released independently as the monorepo is decomposed into focused, reusable components. The framework has proven effective in scenarios requiring: + +- Interactive parameter tuning with immediate visual feedback +- Experiment branching to compare different configuration strategies +- Configuration provenance tracking for reproducible science +- Hierarchical override patterns where specialized steps inherit from global defaults + +The dual-axis inheritance model naturally represents the configuration space of scientific workflows where both context hierarchy (which processing stage) and class hierarchy (which algorithm variant) determine parameter values. The integrated state management eliminates an entire class of bugs related to unsaved changes and inconsistent state. + +Beyond high-content screening, the framework is applicable to any scientific computing domain requiring hierarchical configuration management, including bioinformatics pipelines, machine learning hyperparameter tuning, simulation workflows, and computational physics applications. The zero-dependency design and pure-stdlib implementation ensure easy integration into existing scientific software stacks. + +# Future Directions + +Planned enhancements include validation hooks for constraint checking, schema evolution support for versioned configurations, and integration with popular experiment tracking frameworks. The parametric axes prototype may inform future Python Enhancement Proposals (PEPs) for extending the type system with arbitrary semantic dimensions. + +# Acknowledgments + +This work was supported by the OpenHCS project. ObjectState was developed within the OpenHCS monorepo over an extended period before being extracted as a standalone package as part of the ongoing decomposition of the monorepo into modular, reusable components. The author thanks the Python community for the robust standard library that made this implementation possible. + +# AI Usage Disclosure + +This paper was drafted with assistance from Claude (Anthropic, claude-sonnet-4-5), which was used to structure the manuscript, synthesize information from the codebase and documentation, generate citations, and format content according to JOSS guidelines. All technical content, architectural decisions, research contributions, and the complete ObjectState software implementation are the original intellectual work of the human author(s) developed without AI assistance. + +# References