PlotMyData is an agentic data analysis and visualization system. It follows your prompts to drive an R session.
You can start with example datasets, upload your own data, or download data from a URL. If you want to ask about the data or transform it before plotting, just say what you want to do.
- Multiple data sources: Use built-in R datasets or user-provided data (currently CSV files are supported)
- Interactive analysis: The system uses an R session so variables persist across invocations
- Instant visualization: Plots are shown in the chat interface and are downloadable as PNG files
- Help tools
  - Provides access to help pages for packages and topics
- Data agent
  - Knows about R datasets and can access uploaded files or URLs
  - Data files are automatically summarized for the LLM
  - This lets you describe a plot without knowing the exact variable names
- Run agent
  - Runs R code generated by the LLM
  - If you want to run specific code, just send it in a message
  - LLM chooses invisible or visible results depending on requirements
- Plot agent
  - Tools are provided for making plots with base R graphics (default) and ggplot2
  - To use ggplot2, just mention "ggplot" or "ggplot2" in your message
- Install agent
  - Installs CRAN packages to add capabilities to the running application
  - Can be called by other agents or requested by the user
  - User confirmation is required for installing any packages
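To illustrate what an automatic data summary for the LLM might contain, here is a minimal Python sketch. It is not the project's implementation (the application itself works through an R session); the function `summarize_csv`, its `max_levels` parameter, and the sample rows are all hypothetical. Only the column names `radius_mean` and `radius_worst` come from the example prompt in this README.

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_csv(text, max_levels=5):
    """Build a short plain-text summary of CSV content for an LLM prompt."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return "0 rows"
    lines = [f"{len(rows)} rows, {len(rows[0])} columns"]
    for name in rows[0]:
        # Ignore empty or missing cells when inferring the column type
        values = [row[name] for row in rows if row[name] not in ("", None)]
        if values and all(_is_number(v) for v in values):
            nums = [float(v) for v in values]
            lines.append(f"{name}: numeric, min={min(nums):g}, max={max(nums):g}")
        else:
            levels = sorted(set(values))[:max_levels]
            lines.append(f"{name}: text, values={levels}")
    return "\n".join(lines)


# Made-up sample rows (hypothetical values, for illustration only)
sample = """radius_mean,radius_worst,diagnosis
17.99,25.38,M
13.54,15.11,B
"""
summary = summarize_csv(sample)
print(summary)
```

With a summary like this in the prompt, a request such as "plot the worst radius against the mean radius" can be matched to the actual column names without the user typing them.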
The application can be run with or without a container.
Containerless
- Install R and run

      install.packages(c("ellmer", "mcptools", "readr", "ggplot2", "tidyverse"))

- Install Python with packages listed in requirements.txt
- Put your OpenAI API key in a file named secret.openai-api-key
- Execute run_web.sh to start an R session and launch the ADK web UI
Containerized
First, build the project.
This creates a plotmydata Docker Compose project and a plotmydata-app image.

    docker compose build

Now run the project.
This uses your OpenAI API key (sk-proj-...) from secret.openai-api-key.

    docker compose up

Changing the model
If you want to change the remote LLM from the default (gpt-4o), change it in the startup script (run_web.sh or entrypoint.sh).
To use a local LLM, install Docker Model Runner then run this command.

    docker compose -f compose.yaml -f model-runner.yaml up

See model-runner.yaml to change the local LLM used.
Plot data
- Plot radius_worst (y) vs radius_mean (x) from https://github.com/jedick/plotmydata/raw/refs/heads/main/evals/data/breast-cancer.csv. Add a blue 1:1 line and title "Breast Cancer Wisconsin (Diagnostic)".
Interactive analysis
- Save 100 random numbers from a normal distribution in x
- Run y = x^2
- Plot a histogram of y
Most recent eval run: 74% accuracy on 50 cases with GPT-4o.
Evals history
Accuracy = fraction of correct plots. Plot correctness is judged by a human.
| Eval set | Size | Agent version | Accuracy | Notes |
|---|---|---|---|---|
| 04 | 50 | 1c3f5bd | 0.74 | More base graphics and add Install agent: corrr, scatterplot3d, nlme, parcoord, kde, and custom plots |
| 03 | 40 | 24fb91f | 0.75 | Model: gpt-4o |
| 03 | 40 | b8e5f8c | 0.38 | Add agent for loading and summarizing data |
| 03 | 40 | 30c22a1 | 0.50 | Handle uploaded CSV files |
| 02 | 37 | e9180aa | 0.49 | More base graphics: hist, image, lines, matplot, mosaicplot, pairs, rug, spineplot, plot.window |
| 01 | 27 | e9180aa | 0.52 | Add help tools to get R documentation |
| 01 | 27 | bb4eead | 0.41 | Mainly base graphics: barplot, boxplot, cdplot, coplot, contour, dotchart, filled.contour, grid (Model: gpt-4o-mini) |
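As a worked example of the accuracy definition above, 0.74 on 50 cases corresponds to 37 plots judged correct. The following Python sketch shows one way to compute that fraction from an eval CSV. The column name `correct`, the choice to skip unjudged rows, and the sample rows are assumptions made for illustration; the actual layout of the eval files may differ.

```python
import csv
import io


def accuracy(csv_text, column="correct"):
    """Fraction of judged evals marked True; unjudged (blank) rows are skipped."""
    judged = [
        row[column].strip().lower()
        for row in csv.DictReader(io.StringIO(csv_text))
        if row[column].strip() != ""
    ]
    if not judged:
        raise ValueError("no judged evals")
    return sum(v == "true" for v in judged) / len(judged)


# Made-up example: 3 of 4 judged plots are correct; the last row is unjudged.
example = """query,correct
Plot a histogram of y,True
Make a boxplot,True
Draw a contour plot,False
Plot x vs y,True
Add a 1:1 line,
"""
print(accuracy(example))  # 3 / 4
```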
Evals info
The repo tracks both evaluation sets and prompt sets.
For example, the evals/01 directory contains all results for the first evaluation set using different prompt sets.
The file name uses the short commit hash for the prompt set used for evaluation.
Each eval consists of a query and reference code and image. Because of their size, reference and generated images are not stored in this repo.
To run evals, copy the latest eval CSV file to evals/evals.csv.
Then use e.g. run_eval.sh 1 to run the first eval.
This script: 1) saves the tool calls, generated code, and current date to the CSV file and 2) saves the generated image to the evals/generated directory.
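The first step of the script (recording results in the CSV file) could look roughly like the Python sketch below. This is an illustration under stated assumptions, not the contents of run_eval.sh: the function `record_result`, the column names `tool_calls`, `generated_code`, and `date`, and the tool name `make_plot` are all hypothetical.

```python
import csv
import datetime
import io


def record_result(csv_text, eval_number, tool_calls, code):
    """Return CSV text with one eval row updated (eval_number is 1-based)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames)
    rows = list(reader)
    row = rows[eval_number - 1]
    # Hypothetical column names; adjust to the real eval CSV layout
    row["tool_calls"] = "; ".join(tool_calls)
    row["generated_code"] = code
    row["date"] = datetime.date.today().isoformat()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up single-row eval file with empty result columns
before = "query,tool_calls,generated_code,date\nPlot a histogram of y,,,\n"
after = record_result(before, 1, ["make_plot"], "hist(y)")
print(after)
```

Keeping the query and the generated code in the same row makes it straightforward to review each eval later alongside its images.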
After running evals, change to the evals directory and run streamlit run view.py to edit the eval CSV file.
This app allows:
- Choosing an eval to edit
- Viewing the reference and generated images side-by-side
- Indicating whether the generated plot is correct (True or False)
- Editing other eval data (e.g. query, file name for data upload, reference code, notes)
- Adding new evals
- An Agent Development Kit client is connected to an MCP server from the mcptools R package
- The startup scripts launch a persistent R session with some preloaded packages and helper functions
- Data files are saved in a temporary directory using ADK's artifacts and callbacks
  - This is how the R session can access the files
Container notes:
- The Docker image is based on rocker/r-ver and adds R packages and a Python installation
- Docker Compose is used for port mapping, secrets, and watching file changes with Docker Watch
- The code in this repo is licensed under the MIT License
- Some examples used in evals are taken from R and are licensed under GPL-2|GPL-3
- breast-cancer.csv (from UCI Machine Learning Repository via Kaggle) is licensed under CC BY 4.0



