Document Extractor is a simple, interface-based tool with Pydantic and prompt support, designed to extract structured data from documents such as PDFs and images.

Features

Document Processing

Extract structured data from documents like PDFs and images, turning them into usable information.

Custom Prompts

Allows you to define specific extraction prompts to target particular information within documents.

Schema Definition

Provides the option to define Pydantic schemas for producing structured output, ensuring consistency in data extraction.
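
As a rough illustration of what such a schema might look like, here is a minimal sketch for a hypothetical invoice document. The `Invoice` and `LineItem` models, their fields, and the sample data are assumptions made for this example; they are not part of Document Extractor itself.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    """One row of a hypothetical invoice."""
    description: str
    quantity: int
    unit_price: float


class Invoice(BaseModel):
    """Hypothetical target schema for the extracted output."""
    invoice_number: str
    vendor: Optional[str] = None  # optional: may be absent from the document
    items: List[LineItem]


# Illustrative model output, already parsed from JSON into a dict.
raw = {
    "invoice_number": "INV-001",
    "items": [{"description": "Widget", "quantity": 2, "unit_price": 4.5}],
}

# Validation happens on construction; nested dicts become LineItem objects.
invoice = Invoice(**raw)
total = sum(item.quantity * item.unit_price for item in invoice.items)

# Output that is missing a required field is rejected instead of passing silently.
try:
    Invoice(**{"items": []})
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
```

Marking a field as `Optional` with a default lets extraction succeed when the document simply does not contain that value, while required fields still fail loudly.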

Multiple API Providers

Supports integration with multiple API providers including OpenAI and Azure OpenAI, allowing flexible extraction capabilities.

Template Management

Save and manage extraction templates locally, allowing for reuse and consistency in document processing.
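
One way local template storage could work is sketched below, assuming each template is kept as a single JSON file in a directory. The function names, the file layout, and the `prompt` / `schema_source` keys are hypothetical and chosen only for illustration; the tool's actual storage format may differ.

```python
import json
from pathlib import Path


def save_template(directory: Path, name: str, prompt: str, schema_source: str) -> Path:
    """Write one template as <directory>/<name>.json and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    payload = {"name": name, "prompt": prompt, "schema_source": schema_source}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_template(directory: Path, name: str) -> dict:
    """Read a previously saved template back into a dict."""
    return json.loads((directory / f"{name}.json").read_text(encoding="utf-8"))


def list_templates(directory: Path) -> list:
    """Return the names of all saved templates, sorted alphabetically."""
    return sorted(p.stem for p in directory.glob("*.json"))
```

Keeping one file per template makes templates easy to copy, version, or share by hand.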

User-friendly Interface

Offers a simple web interface for easily uploading documents and configuring extraction settings.

Multi-page Support

Handles multi-page PDF documents by automatically combining pages during processing.

Flexible Configuration

Enables adjustment of model parameters such as temperature and token limits to tune extraction behavior.
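
The settings mentioned above could be grouped and validated as in the following sketch. `ModelSettings`, its defaults, and the `max_tokens` field name are assumptions for illustration; the 0.0 to 2.0 temperature range follows the OpenAI chat API, and other providers may accept a different range.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for per-request model parameters."""
    temperature: float = 0.0  # low values give more repeatable output
    max_tokens: int = 1024    # upper bound on the length of the response

    def __post_init__(self) -> None:
        # Reject values the provider would refuse, before any request is sent.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


settings = ModelSettings(temperature=0.2, max_tokens=2048)

# A plain dict that could be passed as keyword arguments to a client call.
request_kwargs = asdict(settings)
```

For structured extraction, a low temperature is a common starting point because it tends to make repeated runs on the same document more consistent.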

Usage Metrics

Provides insights into app usage through simple card views, helping users understand their interaction with the tool.

Download JSON

Allows you to download extracted data as JSON objects, making it easy to use or analyze further.
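
As a minimal sketch of how extracted data might be prepared for download, the helper below serializes a dict to UTF-8 encoded JSON bytes. The function name `to_json_download` and the sample data are hypothetical; how the web interface actually serves the file is not shown here.

```python
import json


def to_json_download(data: dict) -> bytes:
    """Serialize extracted data to pretty-printed, UTF-8 encoded JSON bytes."""
    # ensure_ascii=False keeps non-ASCII text (names, addresses) readable in the file.
    return json.dumps(data, indent=2, ensure_ascii=False).encode("utf-8")


payload = to_json_download({"invoice_number": "INV-001", "vendor": "Café Müller"})
```

The resulting bytes can be written to a `.json` file or handed to whatever download mechanism the interface provides.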