TestWeaver is an advanced regression test generation tool that integrates Large Language Models (LLMs) with lightweight program analysis. Its goal is to generate high-quality test cases that enhance code coverage while addressing common challenges such as redundant test generation and the coverage plateau.
Unlike traditional test generators, TestWeaver builds a test suite incrementally by reasoning about program execution. It begins with seed tests and iteratively refines them through feedback-driven guidance informed by execution analysis, backward slicing, and "closest" test-case retrieval (a sketch of this loop follows the feature list).
- Execution-aware feedback: Uses real execution traces to guide the LLM toward covering uncovered lines.
- Backward slicing: Focuses the LLM on only the relevant code for each target line, reducing hallucinations.
- Closest test retrieval: Identifies test cases that nearly reach the uncovered line to serve as contextual guidance.
- Support for multiple LLM providers: Works with OpenAI, Anthropic, or AWS Bedrock.
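The pieces above can be combined into a simple feedback loop. The sketch below is illustrative only; all four callables (`run_with_coverage`, `backward_slice`, `closest_test`, `ask_llm`) are hypothetical placeholders, not TestWeaver's actual API:

```python
def generate_tests(module, seed_tests, run_with_coverage, backward_slice,
                   closest_test, ask_llm, max_rounds=10):
    """Illustrative sketch of the feedback loop; every callable is a placeholder."""
    suite = list(seed_tests)
    for _ in range(max_rounds):
        # Execution-aware feedback: run the suite and see what is still uncovered.
        covered, uncovered = run_with_coverage(module, suite)
        if not uncovered:
            break  # full coverage reached
        target = min(uncovered)
        # Backward slicing: show the LLM only the code relevant to the target line.
        slice_src = backward_slice(module, target)
        # Closest-test retrieval: a test that nearly reaches the target line.
        near = closest_test(suite, target)
        prompt = (f"Relevant code:\n{slice_src}\n\n"
                  f"A test that nearly reaches line {target}:\n{near}\n\n"
                  f"Write a pytest test that executes line {target}.")
        suite.append(ask_llm(prompt))
    return suite
```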
```bash
pip install -r requirements.txt
```

You need to set up access to an LLM provider before running TestWeaver:
echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
echo "OPENAI_BASE_URL=https://api.openai.com/v1" >> .envWe conduct our evaluation using the CodaMosa (CM) suite, a dataset derived from 35 open-source Python projects.
We conduct our evaluation using the CodaMosa (CM) suite, a dataset derived from 35 open-source Python projects. To download the dataset, run:

```bash
git clone https://github.com/plasma-umass/codamosa.git
```

You can run TestWeaver on a specific subproject within a larger repository.
This will launch an experimental run.
The results will be saved under the output/cm/... directory.
```bash
cd scripts/
export PYTHONPATH=$(pwd)
export sample_id=21  # The ID of your chosen CodaMosa module; e.g., 21 corresponds to the 'tqdm' module
python testweaver.py --test-index $sample_id
```

To evaluate the impact of each component, you can run TestWeaver in ablation-study mode. This command executes five experimental configurations (sketched after the list):
- With slicing
- Without slicing
- Without execution-in-line
- Without closest-test retrieval
- Full TestWeaver pipeline
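One way to picture these five configurations is as toggle combinations over the pipeline's components. The encoding below is hypothetical, written only for illustration; see `ablate.py` for the actual definitions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AblationConfig:
    """Feature toggles for one experimental run (field names are illustrative)."""
    slicing: bool = True
    execution_in_line: bool = True
    closest_test_retrieval: bool = True

# Hypothetical encoding of the five configurations listed above;
# see scripts/ablate.py for the real definitions.
CONFIGS = {
    "with_slicing": AblationConfig(slicing=True),
    "without_slicing": AblationConfig(slicing=False),
    "without_execution_in_line": AblationConfig(execution_in_line=False),
    "without_closest_test": AblationConfig(closest_test_retrieval=False),
    "full_pipeline": AblationConfig(),
}
```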
The results will be saved under the output/cm/... directory.
```bash
cd scripts/
export PYTHONPATH=$(pwd)
export sample_id=21  # The ID of your chosen CodaMosa module; e.g., 21 corresponds to the 'tqdm' module
python ablate.py --test-index $sample_id
```

- TestWeaver builds tests incrementally by reasoning about what code remains uncovered.
- It uses slicing and closest-test retrieval to make LLM prompts more focused and effective.
- Generated tests are saved as `.py` files and can be executed with `pytest` (see the example below).
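As an example, a generated test for the tqdm module might look like the file below. This is hypothetical output written for illustration; actual generated tests will differ:

```python
# test_tqdm_generated.py -- hypothetical example of a generated test.
from tqdm import tqdm

def test_iteration_count_is_preserved():
    bar = tqdm(range(5), disable=True)  # disable=True keeps the test output quiet
    assert sum(1 for _ in bar) == 5

def test_manual_updates_accumulate():
    bar = tqdm(total=3, disable=True)
    bar.update(2)
    bar.update(1)
    assert bar.n == 3
    bar.close()
```

Run it with `pytest test_tqdm_generated.py`.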
Run the CoverUp baseline with the DeepSeek model.
Prerequisites: Docker, Python 3.10+
Steps:
- Ensure the `.env` file is configured (same as for TestWeaver):

```bash
echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
echo "OPENAI_BASE_URL=https://llm-prof-tien.thaiminhpv.id.vn/" >> .env
```

- Load the Docker image:
```bash
docker load -i scripts/baselines/coverup/docker/coverup-runner.tar
```

- Run the CoverUp baseline:
```bash
cd scripts/baselines/coverup
python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm
```

Optional: run on a specific package or file:

```bash
python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm --package tqdm
python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm --only tqdm/_tqdm.py
```

Output: `scripts/baselines/coverup/output/cm.deepseek-v3/<package>/final.json`
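To inspect results across packages, you can walk the output directory. A minimal sketch; it assumes `final.json` is plain JSON but makes no assumption about its fields:

```python
import json
from pathlib import Path

# Print the per-package results produced by the CoverUp baseline.
out_dir = Path("scripts/baselines/coverup/output/cm.deepseek-v3")
for result in sorted(out_dir.glob("*/final.json")):
    with result.open() as f:
        data = json.load(f)
    print(result.parent.name, data)  # field names depend on CoverUp's output format
```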
Run the CodaMosa baseline with the DeepSeek model.
Prerequisites: Docker, Python 3.10+
Steps:
- Ensure the `.env` file is configured (same as for TestWeaver):

```bash
echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
echo "OPENAI_BASE_URL=https://llm-prof-tien.thaiminhpv.id.vn/" >> .env
```

- Load the Docker images:
```bash
cd scripts/baselines/codamosa/replication
docker load < docker-images/benchmarks-docker.tar.gz
docker load < docker-images/codamosa-docker.tar.gz
```

- Start the benchmark container (if not already started):

```bash
./scripts/start_benchmark_container.sh
```

- Run the CodaMosa baseline:

```bash
python3 run_codamosa_deepseek.py
```

Output: `scripts/baselines/codamosa/replication/deepseek-coda/<module>-<run>/statistics.csv`
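Each run writes one `statistics.csv`; to aggregate across modules and runs, you can collect them as below. The sketch deliberately avoids assuming specific column names (CodaMosa builds on Pynguin, whose statistics format can vary by version, so inspect the header first):

```python
import csv
from pathlib import Path

# Collect every per-run statistics.csv into a single list of rows.
base = Path("scripts/baselines/codamosa/replication/deepseek-coda")
rows = []
for stats in sorted(base.glob("*/statistics.csv")):
    with stats.open(newline="") as f:
        for row in csv.DictReader(f):
            row["run_dir"] = stats.parent.name  # remember which <module>-<run> it came from
            rows.append(row)

print(f"{len(rows)} rows collected from {base}")
if rows:
    print("columns:", list(rows[0].keys()))  # inspect before assuming column names
```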