FSoft-AI4Code/TestWeaver

TestWeaver

Overview

TestWeaver is an advanced regression test generation tool that integrates Large Language Models (LLMs) with lightweight program analysis. Its goal is to generate high-quality test cases that enhance code coverage while addressing common challenges such as redundant test generation and the coverage plateau.

Unlike traditional test generators, TestWeaver incrementally builds a test suite by reasoning about program execution. It begins with seed tests and iteratively refines them through feedback-driven guidance informed by execution analysis, slicing, and "closest" test case retrieval.
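The incremental loop described above can be sketched roughly as follows. This is a hypothetical illustration only: the function and variable names are invented for this sketch and are not TestWeaver's actual API.

```python
# Hypothetical sketch of TestWeaver's incremental loop; names are
# illustrative, not the tool's real API.
def generate_suite(seed_tests, uncovered_lines, max_rounds=5):
    """Grow a test suite one uncovered target line at a time."""
    suite = list(seed_tests)
    pending = list(uncovered_lines)
    for _ in range(max_rounds):
        if not pending:
            break  # coverage goal reached
        target = pending.pop()  # pick an uncovered line to attack next
        # In the real tool, the LLM would be prompted here with a backward
        # slice of `target` plus the closest existing test; this sketch
        # just records which line each new test was generated for.
        suite.append(f"test_for_line_{target}")
    return suite
```

The point of the loop structure is that each round is conditioned on what the previous rounds failed to cover, rather than generating all tests in one shot.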


Key Features

  • Execution-aware feedback: Uses real execution traces to guide the LLM toward covering uncovered lines.
  • Backward slicing: Focuses the LLM on only the relevant code for each target line, reducing hallucinations.
  • Closest test retrieval: Identifies test cases that nearly reach the uncovered line to serve as contextual guidance.
  • Support for multiple LLM providers: Works with OpenAI, Anthropic, or AWS Bedrock.
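Of these features, backward slicing is the most involved. A toy version, assuming a simple data-dependence-only slice over top-level assignments (TestWeaver's real slicer is certainly more sophisticated), could look like:

```python
import ast

def backward_slice(source, target_line):
    """Very rough backward slice: keep the assignments that (transitively)
    feed the names read on `target_line`. Illustrative only -- control
    dependences and aliasing are ignored."""
    tree = ast.parse(source)
    # Names read on the target line.
    wanted = set()
    for node in ast.walk(tree):
        if getattr(node, "lineno", None) == target_line and isinstance(node, ast.Name):
            wanted.add(node.id)
    kept = {target_line}
    changed = True
    while changed:  # iterate to a fixed point over data dependences
        changed = False
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign) and node.lineno not in kept:
                targets = {t.id for t in node.targets if isinstance(t, ast.Name)}
                if targets & wanted:
                    kept.add(node.lineno)
                    for n in ast.walk(node.value):
                        if isinstance(n, ast.Name):
                            wanted.add(n.id)
                    changed = True
    return sorted(kept)
```

For example, slicing `print(b)` on line 4 of `a = 1; b = a + 1; c = 99` keeps lines 1, 2, and 4 but drops the irrelevant `c = 99`, which is exactly the kind of pruning that keeps the LLM's prompt focused.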


🔧 Setup

Install dependencies

pip install -r requirements.txt

🔐 Configure API Key

You need to set up access to an LLM provider before running TestWeaver.

echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
echo "OPENAI_BASE_URL=https://api.openai.com/v1" >> .env
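If you want to check that the `.env` file is picked up from Python, a minimal loader looks like the following (the `python-dotenv` package provides the same thing via `load_dotenv()`; this standalone sketch just avoids the extra dependency):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=VALUE per line, '#' comments skipped.
    setdefault means variables already in the environment win."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")  # split at first '='
                os.environ.setdefault(key, value)
```

After calling `load_env()`, `os.environ["OPENAI_API_KEY"]` should hold the key you wrote above.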

📂 Prepare Dataset for Evaluation

We conduct our evaluation using the CodaMosa (CM) suite, a dataset derived from 35 open-source Python projects.

To download the dataset, run:

git clone https://github.com/plasma-umass/codamosa.git

🧪 Running TestWeaver on a Specific Subproject

You can run TestWeaver on a specific subproject within a larger repository; this launches a single experimental run.

The results will be saved under the output/cm/... directory.

cd scripts/
export PYTHONPATH=$(pwd)
export sample_id=21  # The id of your chosen CodaMosa module; e.g. 21 corresponds to the 'tqdm' module
python testweaver.py --test-index $sample_id

🧪 Running TestWeaver Ablation Study

To evaluate the impact of different components, you can run TestWeaver in an ablation study mode. This command will execute five experimental configurations:

  1. With slicing
  2. Without slicing
  3. Without execution-in-line
  4. Without closest-test retrieval
  5. Full TestWeaver pipeline

The results will be saved under the output/cm/... directory.

cd scripts/
export PYTHONPATH=$(pwd)
export sample_id=21  # The id of your chosen CodaMosa module; e.g. 21 corresponds to the 'tqdm' module
python ablate.py --test-index $sample_id

📌 Notes

  • TestWeaver builds tests incrementally by reasoning about what code remains uncovered.
  • It uses slicing and closest-test retrieval to make LLM prompts more focused and effective.
  • Generated tests are saved as .py files and can be executed with pytest.
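The closest-test retrieval mentioned above can be illustrated with a toy distance over execution traces. This is purely a sketch under an assumed representation (a test's trace as a list of executed line numbers); TestWeaver's actual retrieval is richer:

```python
def closest_test(traces, target_line):
    """Pick the test whose execution trace comes nearest the uncovered
    target line. Illustrative distance: smallest gap in line numbers
    between any executed line and the target."""
    best, best_gap = None, float("inf")
    for name, executed_lines in traces.items():
        gap = min(abs(target_line - ln) for ln in executed_lines)
        if gap < best_gap:
            best, best_gap = name, gap
    return best
```

The retrieved test then serves as in-context guidance: a test that almost reaches the target line usually needs only a small mutation to cross the remaining branch.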

Baselines:

CoverUp Baseline

Run the CoverUp baseline with the DeepSeek model.

Prerequisites: Docker, Python 3.10+

Steps:

  1. Ensure the .env file is configured (same as for TestWeaver):

     echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
     echo "OPENAI_BASE_URL=https://llm-prof-tien.thaiminhpv.id.vn/" >> .env

  2. Load the Docker image:

     docker load -i scripts/baselines/coverup/docker/coverup-runner.tar

  3. Run the CoverUp baseline:

     cd scripts/baselines/coverup
     python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm

Optional: Run on specific package or file:

python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm --package tqdm
python3 scripts/eval_coverup.py --config deepseek-v3 --suite cm --only tqdm/_tqdm.py

Output: scripts/baselines/coverup/output/cm.deepseek-v3/<package>/final.json
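To inspect a run's results programmatically, you can load `final.json` with the standard library. The schema of `final.json` is not documented here, so this sketch deliberately assumes nothing beyond it being JSON and only lists the top-level keys:

```python
import json

def top_level_keys(path):
    """Return the sorted top-level keys of a JSON result file, or None
    if the file's root is not an object. No particular schema assumed."""
    with open(path) as fh:
        data = json.load(fh)
    return sorted(data) if isinstance(data, dict) else None
```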

CodaMosa Baseline

Run the CodaMosa baseline with the DeepSeek model.

Prerequisites: Docker, Python 3.10+

Steps:

  1. Ensure the .env file is configured (same as for TestWeaver):

     echo "OPENAI_API_KEY=sk-your-actual-api-key-here" > .env
     echo "OPENAI_BASE_URL=https://llm-prof-tien.thaiminhpv.id.vn/" >> .env

  2. Load the Docker images:

     cd scripts/baselines/codamosa/replication
     docker load < docker-images/benchmarks-docker.tar.gz
     docker load < docker-images/codamosa-docker.tar.gz

  3. Start the benchmark container (if not already started):

     ./scripts/start_benchmark_container.sh

  4. Run the CodaMosa baseline:

     python3 run_codamosa_deepseek.py

Output: scripts/baselines/codamosa/replication/deepseek-coda/<module>-<run>/statistics.csv
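The per-run `statistics.csv` files can be aggregated with the standard `csv` module. The column names are whatever the CodaMosa run emitted; this sketch assumes none of them:

```python
import csv

def read_stats(path):
    """Load a statistics.csv into a list of row dicts keyed by the
    header row. No particular columns are assumed."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```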
