DocExtract extracts structured data from documents (images, PDFs) using Google's Gemini multimodal AI. Users can define custom extraction schemas and validation rules.
- Backend: Python, FastAPI
- AI Model: Google Gemini 2.0 Flash (via google-generativeai)
- Frontend: HTML5, Vanilla JavaScript, TailwindCSS
- Persistence: JSON-based storage for configuration
- Validation: Regex, Fuzzy Matching (TheFuzz), and LLM-based validation
Prerequisites:
- Python 3.8+
- Google Cloud API key with access to the Gemini API
Installation:
cd backend
pip install -r requirements.txt
Configuration:
- Create a .env file in the backend directory.
- Add your API key:
GOOGLE_API_KEY=your_api_key_here
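
How the backend reads this key is not shown here. As a minimal sketch, assuming the value from .env ends up in the process environment (for example via a loader such as python-dotenv, which is an assumption about this project), a startup check could look like the following. The helper name `get_api_key` is hypothetical.

```python
import os


def get_api_key(env=None):
    """Return GOOGLE_API_KEY from a mapping (defaults to os.environ).

    Hypothetical helper: fails early with a clear message instead of
    letting the first Gemini request fail later.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    # Treat the placeholder from the example .env line as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to backend/.env")
    return key
```

Passing the mapping as a parameter keeps the check easy to exercise without touching the real environment.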
Running the Application:
uvicorn main:app --reload
- Open your browser and navigate to http://127.0.0.1:8000.
Using the App:
- Go to Configuration to create a new "Document Type" (e.g., Invoice, ID Card).
- Define fields to extract (e.g., "Total Amount", "Name") and add descriptions to help the AI.
- Add validation rules (Regex, etc.) to ensure data quality.
- Go to Dashboard, select your document type, and upload a file to extract data.
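
To make the steps above concrete, here is a rough sketch of what a saved document type and its validation might look like. The JSON layout, the rule keys (`type`, `pattern`, `expected`, `min_ratio`), and the function names are all hypothetical and do not describe the app's actual storage format. The standard library's `difflib.SequenceMatcher` stands in for TheFuzz so the sketch runs without third-party packages.

```python
import json
import re
from difflib import SequenceMatcher

# Hypothetical document type, shaped like something the Configuration
# page might persist as JSON.
INVOICE_TYPE = {
    "name": "Invoice",
    "fields": [
        {
            "name": "Total Amount",
            "description": "Grand total including tax, e.g. 1234.56",
            "rules": [{"type": "regex", "pattern": r"\d+(\.\d{2})?"}],
        },
        {
            "name": "Name",
            "description": "Customer name as printed on the invoice",
            "rules": [
                {"type": "fuzzy", "expected": "Acme Corporation", "min_ratio": 0.8}
            ],
        },
    ],
}


def check_rule(rule, value):
    """Apply one validation rule to an extracted string value."""
    if rule["type"] == "regex":
        # The whole value must match the pattern, not just a prefix.
        return re.fullmatch(rule["pattern"], value) is not None
    if rule["type"] == "fuzzy":
        # Case-insensitive similarity in [0, 1]; stand-in for TheFuzz.
        ratio = SequenceMatcher(
            None, rule["expected"].lower(), value.lower()
        ).ratio()
        return ratio >= rule["min_ratio"]
    raise ValueError(f"unknown rule type: {rule['type']}")


def validate(doc_type, extracted):
    """Return {field name: bool}; a missing field counts as a failure."""
    results = {}
    for field in doc_type["fields"]:
        value = extracted.get(field["name"])
        results[field["name"]] = value is not None and all(
            check_rule(rule, value) for rule in field["rules"]
        )
    return results


# Round-trip through JSON to mimic loading a stored configuration.
restored = json.loads(json.dumps(INVOICE_TYPE))
report = validate(
    restored, {"Total Amount": "1234.56", "Name": "ACME Corporatoin"}
)
```

In this sketch the misspelled name still passes because the fuzzy rule tolerates small differences, while a value such as "12,50" would fail the regex rule for "Total Amount".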
This application was developed (quickly) by Antigravity AI, with a little help from me.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




