
A tool that uses Gemini's multimodality and structured outputs to extract specific fields from documents (PDF or image), including validation of the extracted fields.


echiner/document-data-extractor


Document Data Extractor

DocExtract extracts structured data from documents (images, PDFs) using Google's Gemini multimodal model. Users define their own extraction schemas and validation rules.

Technical Stack

  • Backend: Python, FastAPI
  • AI Model: Google Gemini 2.0 Flash (via google-generativeai)
  • Frontend: HTML5, Vanilla JavaScript, TailwindCSS
  • Persistence: JSON-based storage for configuration
  • Validation: Regex, Fuzzy Matching (TheFuzz), and LLM-based validation
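
As a rough illustration of the first two validation styles listed above, the sketch below checks one extracted value against a regex and fuzzy-matches another against a list of allowed values. It uses only the standard library: `difflib.SequenceMatcher` stands in for TheFuzz, and the function names, the 0.8 threshold, and the sample values are assumptions, not taken from the project's code.

```python
import re
from difflib import SequenceMatcher
from typing import Sequence


def validate_regex(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


def validate_fuzzy(value: str, allowed: Sequence[str], threshold: float = 0.8) -> bool:
    """Return True if value is similar enough to any allowed string.

    Similarity is difflib's 0..1 ratio (a stand-in for TheFuzz's 0..100 scores).
    """
    candidate = value.strip().lower()
    return any(
        SequenceMatcher(None, candidate, option.strip().lower()).ratio() >= threshold
        for option in allowed
    )


# Hypothetical extracted values, for illustration only
amount_ok = validate_regex("1234.50", r"\d+\.\d{2}")
vendor_ok = validate_fuzzy("Acme Corpp", ["Acme Corp", "Globex"])
print(amount_ok, vendor_ok)
```

Fuzzy matching is useful for fields where OCR-style slips are likely (vendor names, for example), while a regex suits fields with a fixed shape such as amounts or dates.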

Usage

  1. Prerequisites:

    • Python 3.8+
    • Google Cloud API Key with access to Gemini API.
  2. Installation:

    cd backend
    pip install -r requirements.txt
  3. Configuration:

    • Create a .env file in the backend directory.
    • Add your API key: GOOGLE_API_KEY=your_api_key_here
  4. Running the Application:

    uvicorn main:app --reload
    • Open your browser and navigate to http://127.0.0.1:8000.
  5. Using the App:

    • Go to Configuration to create a new "Document Type" (e.g., Invoice, ID Card).
    • Define fields to extract (e.g., "Total Amount", "Name") and add descriptions to help the AI.
    • Add validation rules (Regex, etc.) to ensure data quality.
    • Go to Dashboard, select your document type, and upload a file to extract data.
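
The steps above amount to building a small configuration record per document type. A minimal sketch of what such a record might look like, and how it could round-trip through JSON storage, is shown below. The key names (`name`, `fields`, `description`, `validations`), the rule format, and the file name are illustrative assumptions, not the project's actual schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical document type, mirroring the "Invoice" example above.
invoice_type = {
    "name": "Invoice",
    "fields": [
        {
            "name": "Total Amount",
            "description": "Grand total including tax, as printed on the invoice.",
            "validations": [{"type": "regex", "pattern": r"\d+\.\d{2}"}],
        },
        {
            "name": "Name",
            "description": "Customer name as it appears in the billing block.",
            "validations": [],
        },
    ],
}


def save_document_type(doc_type: dict, path: Path) -> None:
    """Write one document type to a JSON file."""
    path.write_text(json.dumps(doc_type, indent=2), encoding="utf-8")


def load_document_type(path: Path) -> dict:
    """Read a document type back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# Round-trip through a temporary directory (the file name is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "invoice.json"
    save_document_type(invoice_type, config_path)
    loaded = load_document_type(config_path)

print([field["name"] for field in loaded["fields"]])
```

Field descriptions are worth writing carefully, since they are the part of the record meant to help the model locate the right value in the document.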

Credits

This application was (quickly) developed by Antigravity AI, with a little help from me.

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
