Document Data Extractor

DocExtract extracts structured data from documents (images and PDFs) using Google's Gemini multimodal AI. Users can define custom extraction schemas and validation rules.

Technical Stack

Backend: Python, FastAPI
AI Model: Google Gemini 2.0 Flash (via google-generativeai)
Frontend: HTML5, Vanilla JavaScript, TailwindCSS
Persistence: JSON-based storage for configuration
Validation: Regex, Fuzzy Matching (TheFuzz), and LLM-based validation
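As a rough illustration of what JSON-based configuration storage can look like, the sketch below saves and reloads a document type definition. The file name `doc_types.json`, the field layout, and the helper names `load_doc_types` and `save_doc_type` are assumptions made for this example, not the project's actual schema or code.

```python
import json
from pathlib import Path

# Hypothetical location of the config file; the real project may use another path.
CONFIG_PATH = Path("doc_types.json")


def load_doc_types(path: Path = CONFIG_PATH) -> dict:
    """Return all stored document types, or an empty dict if no file exists yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_doc_type(name: str, fields: list, path: Path = CONFIG_PATH) -> None:
    """Add or overwrite one document type and write the whole config back to disk."""
    doc_types = load_doc_types(path)
    doc_types[name] = {"fields": fields}
    path.write_text(json.dumps(doc_types, indent=2), encoding="utf-8")
```

Keeping everything in one small JSON file avoids a database dependency, at the cost of rewriting the whole file on each change, which is usually acceptable for a handful of document types.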

Usage

Prerequisites:
- Python 3.8+
- Google Cloud API key with access to the Gemini API

Installation:

cd backend
pip install -r requirements.txt

Configuration:
- Create a .env file in the backend directory.
- Add your API key: GOOGLE_API_KEY=your_api_key_here
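How the backend reads this file is not described here. The sketch below shows one way to pick up `GOOGLE_API_KEY` using only the standard library; the helper names `load_env_file` and `get_api_key` are hypothetical, and the project itself may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return key
```

Failing early with a clear error when the key is missing tends to be easier to debug than a later authentication failure from the Gemini API.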

Running the Application:
```
uvicorn main:app --reload
```
- Open your browser and navigate to http://127.0.0.1:8000.

Using the App:
- Go to Configuration to create a new "Document Type" (e.g., Invoice, ID Card).
- Define fields to extract (e.g., "Total Amount", "Name") and add descriptions to help the AI.
- Add validation rules (Regex, etc.) to ensure data quality.
- Go to Dashboard, select your document type, and upload a file to extract data.
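To make the validation step more concrete, here is a minimal sketch of regex and fuzzy checks applied to extracted values. The rule layout in `RULES` and the function names are assumptions for illustration. The fuzzy check uses the standard library's `difflib.SequenceMatcher` as a stand-in for TheFuzz so the example runs without extra dependencies, and LLM-based validation is left out.

```python
import re
from difflib import SequenceMatcher


def validate_regex(value: str, pattern: str) -> bool:
    """True when the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


def validate_fuzzy(value: str, allowed: list, threshold: float = 0.8) -> bool:
    """True when the value is close enough to any allowed string (0.0 to 1.0 scale)."""
    return any(
        SequenceMatcher(None, value.lower(), candidate.lower()).ratio() >= threshold
        for candidate in allowed
    )


# Hypothetical rule layout: one rule dict per field name.
RULES = {
    "Total Amount": {"type": "regex", "pattern": r"\d+\.\d{2}"},
    "Currency": {"type": "fuzzy", "allowed": ["Euro", "US Dollar"]},
}


def validate(extracted: dict, rules: dict = RULES) -> dict:
    """Return {field: passed} for every field that has a rule."""
    results = {}
    for field, rule in rules.items():
        value = str(extracted.get(field, ""))
        if rule["type"] == "regex":
            results[field] = validate_regex(value, rule["pattern"])
        else:
            results[field] = validate_fuzzy(value, rule["allowed"])
    return results
```

A call such as `validate({"Total Amount": "123.45", "Currency": "Euros"})` checks each field against its rule; a missing field is treated as an empty string and fails its rule.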

Screenshots

Credits

This application was (quickly) developed by Antigravity AI, with a little help from me.

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
