Use AI agents to access, transform, and plot your data. With a live demo and growing evaluation set.

jedick/plotmydata

PlotMyData

PlotMyData is an agentic data analysis and visualization system. It follows your prompts to drive an R session.

You can start with example datasets, upload your own data, or download data from a URL. If you want to ask about the data or transform it before plotting, just say what you want to do.

Animation of using PlotMyData to plot, download, upload, and explore data

Features

  • Multiple data sources: Use built-in R datasets or user-provided data (currently CSV files are supported)
  • Interactive analysis: The system uses an R session so variables persist across invocations
  • Instant visualization: Plots are shown in the chat interface and are downloadable as PNG files
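
The persistence described under "Interactive analysis" can be illustrated with a toy Python analogy. This is only a sketch of the idea of one shared namespace that outlives each invocation; PlotMyData itself drives an R session, and the PersistentSession class here is hypothetical, not part of the project.

```python
class PersistentSession:
    """Toy stand-in for a long-lived interpreter session (hypothetical)."""

    def __init__(self):
        # One namespace shared by every call to run()
        self.namespace = {}

    def run(self, code):
        # Execute code against the shared namespace so assignments persist
        exec(code, self.namespace)


session = PersistentSession()
session.run("x = [1, 2, 3]")            # first invocation defines x
session.run("y = [v ** 2 for v in x]")  # second invocation can still see x
print(session.namespace["y"])
```

Because both calls share one namespace, the second invocation can build on the first, which is the same property that lets a later prompt refer to a variable created by an earlier one.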

Agents and tools refined through many usage trials

  • Help tools
    • Provide access to help pages for packages and topics
  • Data agent
    • Knows about R datasets and can access uploaded files or URLs
    • Data files are automatically summarized for the LLM
    • This lets you describe a plot without knowing the exact variable names
  • Run agent
    • Runs R code generated by the LLM
    • If you want to run specific code, just send it in a message
    • The LLM chooses whether results are returned invisibly or visibly, depending on what the request requires
  • Plot agent
    • Tools are provided for making plots with base R graphics (default) and ggplot2
    • To use ggplot2, just mention "ggplot" or "ggplot2" in your message
  • Install agent
    • Installs CRAN packages to add capabilities to the running application
    • Can be called by other agents or requested by the user
    • User confirmation is required for installing any packages

Running the application

The application can be run with or without a container.

Containerless

  • Install R and run install.packages(c("ellmer", "mcptools", "readr", "ggplot2", "tidyverse"))
  • Install Python with packages listed in requirements.txt
  • Put your OpenAI API key in a file named secret.openai-api-key
  • Execute run_web.sh to start an R session and launch the ADK web UI

Containerized

First, build the project. This creates a plotmydata Docker Compose project and a plotmydata-app image.

docker compose build

Now run the project. This uses your OpenAI API key (sk-proj-...) from secret.openai-api-key.

docker compose up

Changing the model

To change the remote LLM from the default (gpt-4o), edit the startup script (run_web.sh or entrypoint.sh).

To use a local LLM, install Docker Model Runner, then run this command.

docker compose -f compose.yaml -f model-runner.yaml up

See model-runner.yaml to change the local LLM used.

Examples

Plot data

Plot of breast cancer data

Plot functions

  • Plot a Sierpiński Triangle

Chat session with AI agent to plot a Sierpiński Triangle

Interactive analysis

  • Save 100 random numbers from a normal distribution in x
  • Run y = x^2
  • Plot a histogram of y

Histogram of squared normal random numbers

Evaluations

Most recent eval run: 74% accuracy on 50 cases with GPT-4o.

Evals history

Accuracy = fraction of correct plots. Plot correctness is judged by a human.
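
As a minimal sketch of this definition, the function below computes accuracy from a list of human judgments. The list of True/False values is made up for illustration: 37 correct plots out of 50 is simply the count that would give 0.74, and it is not taken from the repo's data.

```python
def accuracy(judgments):
    """Return the fraction of plots judged correct (True)."""
    if not judgments:
        raise ValueError("no judgments to score")
    # True counts as 1 and False as 0, so sum() counts the correct plots
    return sum(judgments) / len(judgments)


# Illustrative judgments only: 37 correct and 13 incorrect plots
judgments = [True] * 37 + [False] * 13
print(round(accuracy(judgments), 2))  # 37 / 50 = 0.74
```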

| Eval set | Size | Agent version | Accuracy | Notes |
|----------|------|---------------|----------|-------|
| 04 | 50 | 1c3f5bd | 0.74 | More base graphics and add Install agent: corrr, scatterplot3d, nlme, parcoord, kde, and custom plots |
| 03 | 40 | 24fb91f | 0.75 | Model: gpt-4o |
| 03 | 40 | b8e5f8c | 0.38 | Add agent for loading and summarizing data |
| 03 | 40 | 30c22a1 | 0.50 | Handle uploaded CSV files |
| 02 | 37 | e9180aa | 0.49 | More base graphics: hist, image, lines, matplot, mosaicplot, pairs, rug, spineplot, plot.window |
| 01 | 27 | e9180aa | 0.52 | Add help tools to get R documentation |
| 01 | 27 | bb4eead | 0.41 | Mainly base graphics: barplot, boxplot, cdplot, coplot, contour, dotchart, filled.contour, grid (Model: gpt-4o-mini) |

Evals info

The repo tracks both evaluation sets and prompt sets. For example, the evals/01 directory contains all results for the first evaluation set using different prompt sets. The file name uses the short commit hash for the prompt set used for evaluation.

Each eval consists of a query, reference code, and a reference image. Because of their size, reference and generated images are not stored in this repo.

To run evals, copy the latest eval CSV file to evals/evals.csv. Then run, for example, run_eval.sh 1 to run the first eval. This script (1) saves the tool calls, generated code, and current date to the CSV file and (2) saves the generated image to the evals/generated directory.
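
The CSV update in step 1 could look roughly like the sketch below. It is not the contents of run_eval.sh: the column names (query, tool_calls, generated_code, date), the make_plot tool name, and the record_result helper are all assumptions made for illustration.

```python
import csv
import datetime
import tempfile
from pathlib import Path


def record_result(csv_path, eval_number, tool_calls, generated_code):
    """Store one run's outputs in row `eval_number` (1-based) of the eval CSV."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames)
        rows = list(reader)

    row = rows[eval_number - 1]
    row["tool_calls"] = tool_calls                   # hypothetical column name
    row["generated_code"] = generated_code           # hypothetical column name
    row["date"] = datetime.date.today().isoformat()  # hypothetical column name

    # Rewrite the whole file so the other columns are preserved unchanged
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Usage with a throwaway CSV so the sketch runs anywhere
demo_path = Path(tempfile.mkdtemp()) / "evals.csv"
demo_path.write_text("query,tool_calls,generated_code,date\nPlot a histogram of y,,,\n")
record_result(demo_path, 1, "make_plot", "hist(y)")
with open(demo_path, newline="") as f:
    updated = list(csv.DictReader(f))
print(updated[0]["generated_code"])
```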

After running evals, change to the evals directory and run streamlit run view.py to edit the eval CSV file. This app allows:

  • Choosing an eval to edit
  • Viewing the reference and generated images side-by-side
  • Indicating whether the generated plot is correct (True or False)
  • Editing other eval data (e.g. query, file name for data upload, reference code, notes)
  • Adding new evals

Architecture

  • An Agent Development Kit client is connected to an MCP server from the mcptools R package
  • The startup scripts launch a persistent R session with some preloaded packages and helper functions
  • Data files are saved in a temporary directory using ADK's artifacts and callbacks
    • This is how the R session can access the files
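
The file handoff in the last bullet can be sketched as follows. This uses only the Python standard library and does not reproduce ADK's artifact or callback API; the save_upload helper, the directory prefix, and the example file are hypothetical.

```python
import tempfile
from pathlib import Path


def save_upload(file_name, data, directory=None):
    """Write uploaded bytes to a temp directory and return the file path."""
    target_dir = Path(directory or tempfile.mkdtemp(prefix="plotmydata-"))
    # Keep only the base name so an upload cannot escape the directory
    path = target_dir / Path(file_name).name
    path.write_bytes(data)
    return path


csv_path = save_upload("example.csv", b"x,y\n1,2\n3,4\n")
# The path could then be embedded in generated R code (read.csv is base R)
r_code = f'df <- read.csv("{csv_path.as_posix()}")'
print(r_code)
```

Writing to an ordinary file path is what makes the handoff work across processes: the Python client and the R session share no memory, but both can read the same file on disk.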

Container notes:

Licenses

  • The code in this repo is licensed under the MIT License
  • Some examples used in evals are taken from R and are licensed under GPL-2|GPL-3
  • breast-cancer.csv (from UCI Machine Learning Repository via Kaggle) is licensed under CC BY 4.0
