# NL-SQL

A Streamlit-based application that converts natural language questions into SQL queries using Google's Gemini AI. The system supports multiple databases with database-specific prompts and memory management.
## Table of Contents

- Features
- Project Structure
- Prerequisites
- Installation
- Configuration
- Usage
- File Descriptions
- How It Works
- Adding New Databases
## Features

- **Natural Language Queries**: Ask questions in plain English and get SQL queries generated automatically
- **AI-Powered**: Uses Google Gemini AI for intelligent SQL generation and result summarization
- **Query Explanation**: Get plain English explanations of generated SQL queries
- **Multi-Database Support**: Switch between multiple databases seamlessly
- **Database-Specific Prompts**: Custom AI prompts tailored to each database
- **Database-Specific Memory**: Separate conversation history for each database
- **SQL Validation**: Multi-layer SQL validation including safety checks, semantic validation, and execution verification
- **Conversation Memory**: Maintains context from previous interactions for more accurate query generation
- **Interactive UI**: Clean Streamlit interface with real-time results and data visualization
- **Database Management**: Upload SQLite databases or create new ones from CSV files
- **Export Results**: Download query results as CSV files
## Project Structure

```
NL-SQL/
├── main.py                  # Main Streamlit application
├── create_db.py             # Database creation utility
├── custom_db.py             # Custom database upload/creation handler
├── databse_manager.py       # Database operations manager
├── gemini_class.py          # Gemini AI integration
├── explain_query.py         # SQL query explainer
├── memory_management.py     # Conversation memory handler
├── prompt_manager.py        # Database-specific prompt loader
├── sql_validation.py        # SQL query validation
├── pyproject.toml           # Project dependencies
├── README.md                # This file
├── db/                      # SQLite database directory
│   ├── soil_pollution.db    # Soil pollution database
│   └── air_pollution.db     # Air pollution database
├── inputs/                  # Input CSV files
│   ├── global_air_pollution_dataset.csv
│   └── soil_pollution_diseases.csv
├── memory/                  # Database-specific memory files
│   ├── soil_pollution_memory.json
│   └── air_pollution_memory.json
└── prompts/                 # Database-specific prompt templates
    ├── __init__.py
    ├── default_prompt.py    # Default/fallback prompts
    ├── soil_pollution_prompt.py
    └── air_pollution_prompt.py
```
## Prerequisites

- Python 3.12 or higher
- A Google Gemini API key
## Installation

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd NL-SQL
   ```

2. **Install dependencies**

   Using pip:

   ```bash
   pip install -r requirements.txt
   ```

   Or using `uv` (recommended):

   ```bash
   uv sync
   ```

3. **Create the databases**

   ```bash
   python create_db.py
   ```

4. **Set up environment variables**

   Create a `.env` file in the project root:

   ```
   GEMINI_API_KEY=your_gemini_api_key_here
   ```

5. **Get a Gemini API key**

   - Visit Google AI Studio
   - Create a new API key
   - Copy the key into your `.env` file
## Usage

1. **Start the application**

   ```bash
   streamlit run main.py
   ```

2. **Open your browser** and navigate to `http://localhost:8501`

3. **Select a database** from the dropdown menu

4. **Ask questions** in natural language, for example:

   - "Show me the top 10 records"
   - "What is the average value by category?"
   - "List all unique countries in the dataset"

5. **View results**: The system displays:

   - The generated SQL query
   - Query results in a table
   - An AI-generated natural language summary

6. **Explain Query**: Click the "Explain Query" button to get a plain English breakdown of what the SQL query does

7. **Manage databases** via the sidebar:

   - Upload existing SQLite databases
   - Create new databases from CSV files
## File Descriptions

### main.py

The main entry point for the Streamlit application. It:
- Sets up the web interface with custom styling
- Initializes session state for database, memory, and AI assistant
- Handles database switching and syncs memory/prompts accordingly
- Handles user input and displays query results
- Manages query history and provides CSV export functionality
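The switch-and-sync behavior described above can be sketched as a small helper; `sync_on_database_switch` is a hypothetical name, and Streamlit's `st.session_state` is modeled here as a plain dict so the logic stands on its own. The derived paths follow the `memory/<db>_memory.json` and `prompts/<db>_prompt.py` conventions used elsewhere in this README.

```python
def sync_on_database_switch(state: dict, new_db: str) -> dict:
    """Sketch of main.py's database-switch logic (session state modeled as a dict)."""
    if state.get("current_db") != new_db:
        state["current_db"] = new_db
        state["memory_file"] = f"memory/{new_db}_memory.json"   # per-database memory
        state["prompt_module"] = f"prompts/{new_db}_prompt.py"  # per-database prompts
        state["history"] = []                                   # reset visible history
    return state
```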
### create_db.py

A utility script that creates SQLite databases from CSV files:

- Creates the `db/` directory if it doesn't exist
- Reads CSV data and creates tables with appropriate columns
- Imports all rows from the CSV file
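The CSV-to-SQLite flow can be sketched with the standard library alone. The function and table names here are illustrative, and the column type defaults to `TEXT`; the real `create_db.py` may infer types differently.

```python
import csv
import os
import sqlite3

def create_db_from_csv(csv_path: str, db_path: str, table: str) -> int:
    """Create a SQLite table from a CSV file; returns the number of rows inserted."""
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)  # ensure db/ exists
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first CSV row becomes the column names
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        rows = list(reader)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        conn.commit()
        conn.close()
    return len(rows)
```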
### custom_db.py

Handles custom database operations through the `CustomDatabase` class:

- `upload_database()`: Uploads existing SQLite database files
- `create_database()`: Creates new databases from uploaded CSV files
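A minimal sketch of the upload path, assuming the uploaded file arrives as raw bytes (as it would from Streamlit's file uploader). `save_uploaded_database` is a hypothetical helper; it validates the SQLite magic header before writing into `db/`.

```python
import os

# Every valid SQLite file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"

def save_uploaded_database(file_bytes: bytes, filename: str, db_dir: str = "db") -> str:
    """Validate and save an uploaded SQLite file; returns the saved path."""
    if not file_bytes.startswith(SQLITE_MAGIC):
        raise ValueError("Not a valid SQLite database file")
    os.makedirs(db_dir, exist_ok=True)
    path = os.path.join(db_dir, filename)
    with open(path, "wb") as f:
        f.write(file_bytes)
    return path
```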
### databse_manager.py

Manages all SQLite database operations through the `DatabaseManager` class:

- `execute_query()`: Executes SQL queries and returns results as dictionaries
- `get_schema()`: Retrieves database schema information for AI context
- `get_available_databases()`: Lists all available database files
- `switch_database()`: Switches to a different database
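The dict-returning query and schema-introspection pieces can be sketched with `sqlite3` alone; the class name and method bodies here are illustrative, not the project's actual implementation.

```python
import sqlite3

class DatabaseManagerSketch:
    """Illustrative subset of the DatabaseManager described above."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.row_factory = sqlite3.Row  # rows become dict-like by column name

    def execute_query(self, sql: str) -> list[dict]:
        """Run a query and return rows as plain dictionaries."""
        return [dict(r) for r in self.conn.execute(sql).fetchall()]

    def get_schema(self) -> dict[str, list[str]]:
        """Map each table name to its column names via sqlite_master and PRAGMA."""
        tables = [r["name"] for r in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        return {t: [c["name"] for c in self.conn.execute(f'PRAGMA table_info("{t}")')]
                for t in tables}
```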
### gemini_class.py

Handles Google Gemini AI integration through the `GeminiAssistant` class:

- `set_database()`: Loads the appropriate prompts for the selected database
- `build_sql_prompt()`: Constructs prompts with schema context and guidelines
- `generate_sql()`: Converts natural language to SQL queries
- `generate_summary()`: Creates human-readable summaries of query results
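The prompt-building step is plain string templating, sketched below with an assumed template; the actual model call (via the `google-generativeai` SDK's `GenerativeModel.generate_content`) is omitted here, and the template text is illustrative rather than the project's real prompt.

```python
# Assumed template shape; the project's real prompts live under prompts/.
SQL_TEMPLATE = """You are an expert SQLite analyst.
Schema:
{schema}
{context}
User Question: {user_question}
Generate only the SQL query:"""

def build_sql_prompt(schema: str, context: str, user_question: str) -> str:
    """Fill the template with schema, recent conversation context, and the question."""
    return SQL_TEMPLATE.format(schema=schema, context=context,
                               user_question=user_question)
```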
### explain_query.py

Provides SQL query explanations through the `QueryExplainer` class:

- `explain_query()`: Generates plain English explanations of SQL queries using Gemini AI
- Breaks down each part of the query (SELECT, FROM, WHERE, JOIN, GROUP BY, etc.)
- Uses bullet points for clear, easy-to-understand explanations
### prompt_manager.py

Manages database-specific prompts through the `PromptManager` class:

- `load_prompts_for_db()`: Loads prompts specific to a database
- `get_sql_prompt()`: Returns the SQL generation prompt template
- `get_summary_prompt()`: Returns the summary generation prompt template
- Falls back to default prompts if no specific prompt file exists
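The load-with-fallback behavior can be sketched with `importlib`; the default templates here are placeholder strings, and the function name mirrors the method above without claiming to be its real body.

```python
import importlib

# Placeholder fallbacks; the real defaults live in prompts/default_prompt.py.
DEFAULT_PROMPTS = {"SQL_PROMPT": "generic SQL prompt...",
                   "SUMMARY_PROMPT": "generic summary prompt..."}

def load_prompts_for_db(db_name: str) -> dict:
    """Try prompts.<db_name>_prompt; fall back to defaults if it doesn't exist."""
    try:
        mod = importlib.import_module(f"prompts.{db_name}_prompt")
        return {"SQL_PROMPT": mod.SQL_PROMPT, "SUMMARY_PROMPT": mod.SUMMARY_PROMPT}
    except ModuleNotFoundError:
        return dict(DEFAULT_PROMPTS)
```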
### memory_management.py

Manages conversation context through the `MemoryManager` class:

- `add()`: Stores new interactions (question, SQL, result, summary)
- `get_recent_context()`: Retrieves recent interactions for AI context
- `switch_memory_file()`: Switches to the memory file for a different database
- `clear()`: Clears all stored memory for the current database
- Each database has its own memory file in the `memory/` directory
### sql_validation.py

Provides SQL security and validation through the `SQLValidator` class:

- `safety_check()`: Blocks DDL/DML operations (DROP, DELETE, INSERT, etc.)
- `semantic_check()`: Validates tables and columns against the database schema
- `execution_check()`: Tests query execution in a safe environment
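A simplified safety check can be sketched with keyword filtering; note that the project lists `sqlglot` as a dependency and presumably uses proper AST parsing, which is more robust than the regex approach shown here.

```python
import re

# Keywords that indicate DDL/DML or other disallowed operations.
FORBIDDEN = {"DROP", "DELETE", "INSERT", "UPDATE", "CREATE", "ALTER",
             "ATTACH", "PRAGMA"}

def safety_check(sql: str) -> bool:
    """Allow a single SELECT/WITH statement; reject anything else."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # multiple statements -> reject outright
        return False
    if not re.match(r"(?i)^\s*(SELECT|WITH)\b", stripped):
        return False
    tokens = re.findall(r"[A-Za-z_]+", stripped.upper())
    return not any(t in FORBIDDEN for t in tokens)
```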
### prompts/

Contains database-specific prompt templates:

- `default_prompt.py`: Generic prompts used as a fallback
- `soil_pollution_prompt.py`: Prompts tailored for soil pollution data
- `air_pollution_prompt.py`: Prompts tailored for air quality data
- Each file exports `SQL_PROMPT` and `SUMMARY_PROMPT` templates
### pyproject.toml

Project configuration and dependencies:

- `google-generativeai`: Google Gemini AI SDK
- `python-dotenv`: Environment variable management
- `sqlalchemy`: SQL toolkit for Python
- `sqlglot`: SQL parser for validation
- `streamlit`: Web application framework
## How It Works

1. **User Input**: The user enters a natural language question in the Streamlit interface

2. **Context Building**: The system retrieves:
   - The database schema (tables, columns, data types)
   - Recent conversation history for context

3. **SQL Generation**: Gemini AI generates an SQLite-compatible query based on:
   - The user's question
   - The database schema
   - Previous interactions

4. **Validation**: The generated SQL passes through multiple validation layers:
   - Safety check (blocks harmful operations)
   - Semantic check (validates that tables and columns exist)
   - Execution check (ensures the query runs successfully)

5. **Execution**: The validated query is executed against the SQLite database

6. **Summary**: Gemini AI generates a natural language summary of the results

7. **Query Explanation (optional)**: The user can click "Explain Query" to get a detailed breakdown of the SQL in plain English

8. **Memory Storage**: The interaction is saved for future context
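The steps above could be orchestrated roughly as follows; `run_pipeline` and its injected callables are hypothetical stand-ins for the classes described in this README, shown only to make the flow concrete.

```python
import sqlite3

def run_pipeline(question, conn, generate_sql, summarize, validators):
    """Sketch of the NL->SQL->result->summary flow with injected AI callables."""
    sql = generate_sql(question)                  # step 3: NL -> SQL (Gemini)
    for check in validators:                      # step 4: layered validation
        if not check(sql):
            raise ValueError(f"Validation failed: {check.__name__}")
    rows = conn.execute(sql).fetchall()           # step 5: execution
    return sql, rows, summarize(question, rows)   # step 6: summary (Gemini)
```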
## Security Features

- **Read-Only Queries**: Only SELECT statements are allowed
- **DDL/DML Blocking**: DROP, DELETE, INSERT, UPDATE, CREATE, and ALTER operations are blocked
- **Schema Validation**: Queries are validated against the actual database schema
- **SQL Injection Prevention**: Uses parameterized queries and AST parsing
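Read-only access can also be enforced at the engine level with SQLite's URI connection mode, so writes fail even if a harmful query slips past the validators. This is a defense-in-depth suggestion, not necessarily what the project does; `open_read_only` is a hypothetical helper.

```python
import sqlite3

def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database in read-only mode; any write raises OperationalError."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```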
## Adding New Databases

**Option A: Upload via UI**

- Go to the sidebar → Database Management
- Select "Upload SQLite Database"
- Upload your `.db`, `.sqlite`, or `.sqlite3` file

**Option B: Create from CSV**

- Go to the sidebar → Database Management
- Select "Create Database from CSV"
- Upload your CSV file and specify database/table names

**Option C: Manual placement**

- Place your SQLite database file in the `db/` directory
- Restart the application
### Adding Database-Specific Prompts

To add database-specific prompts for better AI responses:

1. Create a new file in `prompts/` named `<database_name>_prompt.py`
   - For `my_data.db`, create `prompts/my_data_prompt.py`

2. Add two template variables:

   ```python
   SQL_PROMPT = """Your custom SQL generation prompt here...
   {column_descriptions}
   {context}
   User Question: {user_question}
   Generate only the SQL query:"""

   SUMMARY_PROMPT = """Your custom summary prompt here...
   User asked: "{user_question}"
   {context}
   Current result: {data_preview}
   Summary:"""
   ```

3. The system will automatically use these prompts when the database is selected
## License

This project is open source. See the LICENSE file for details.