ParseHawk

Apache-2.0 local-first document AI for turning PDFs, scans, images, text files and Markdown into structured JSON without sending sensitive documents to third-party AI APIs.

Owner

Totoy GmbH

Company

Contact information

Francis Rafal

https://github.com/parsehawk/parsehawk

ParseHawk is a local-first document AI solution for extracting structured data from unstructured documents. It processes PDFs, scans, images, text files and Markdown and returns validated JSON through a REST API, CLI and Web UI.

Needs addressed:
Public administrations and regulated organizations often receive forms, reports, referrals, invoices, legacy PDFs and scanned documents that need to be transformed into structured data. These documents may contain sensitive personal, medical, financial or administrative data and should not be sent to third-party AI APIs.

Features:
- Extract structured JSON from PDFs, scans, images, text files and Markdown
- Define custom extraction schemas
- Run locally by default on macOS (Apple Silicon) and Linux (NVIDIA GPUs)
- Access through a REST API, CLI or Web UI
- Apache-2.0 open-source licence
- Local model serving using open-source infrastructure

Intended audience:
Developers, public-sector IT teams, municipalities, healthcare IT providers, DMS/ECM vendors and regulated organizations that need document extraction while retaining control over sensitive data.

Reuse:
ParseHawk can be installed from its GitHub repository and run locally. Users can define schemas and instructions for the document types they need to process, then integrate the REST API into existing document, case-management or data workflows.
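
As a rough illustration of that integration step, the sketch below defines a small extraction schema and builds an HTTP request for a locally running instance. It is not taken from the ParseHawk documentation: the URL `http://localhost:8000/extract`, the payload keys (`document`, `schema`, `instructions`) and the invoice fields are all hypothetical placeholders, so the repository should be consulted for the actual endpoint and request format.

```python
import json
import urllib.request

# Hypothetical extraction schema for an invoice; field names are illustrative only.
INVOICE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}


def build_extraction_request(endpoint_url, document_text, schema, instructions=""):
    """Build (but do not send) a JSON POST request for a local extraction service.

    The payload keys are assumptions, not ParseHawk's documented API.
    """
    payload = {
        "document": document_text,
        "schema": schema,
        "instructions": instructions,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extraction_request(
    "http://localhost:8000/extract",  # hypothetical local URL, not a documented endpoint
    "Invoice 2024-001, total 119.00 EUR",
    INVOICE_SCHEMA,
    instructions="Extract the invoice number and the total amount.",
)
```

Passing `request` to `urllib.request.urlopen` would send it. Because the target is a local address, the document text would stay on the machine running the service.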

Standards and interoperability:
ParseHawk returns JSON and validates the output against schemas written in JSON Schema Draft 2020-12. It exposes an HTTP API suitable for integration into existing digital public-service workflows.
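
To make the idea of schema-validated output concrete, here is a minimal sketch of how a consuming system might re-check a returned JSON object before storing it. It uses only the Python standard library and covers a very small subset of JSON Schema (`required` and property `type`); it is not ParseHawk's own validation code, and the `case_id` and `page_count` fields are invented for the example. A complete validator, such as `Draft202012Validator` from the `jsonschema` package, would normally be preferred.

```python
import json

# Maps a subset of JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(data, schema):
    """Return a list of problems; checks only `required` and property `type`."""
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        type_name = rule.get("type")
        expected = _TYPES.get(type_name)
        if name not in data or expected is None:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"field {name}: expected {type_name}")
    return problems


# Hypothetical schema and outputs, for illustration only.
CASE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "case_id": {"type": "string"},
        "page_count": {"type": "integer"},
    },
    "required": ["case_id", "page_count"],
}

good = json.loads('{"case_id": "A-17", "page_count": 3}')
bad = json.loads('{"page_count": "three"}')

good_problems = check_against_schema(good, CASE_SCHEMA)
bad_problems = check_against_schema(bad, CASE_SCHEMA)
```

Here `good_problems` is an empty list, while `bad_problems` reports the missing `case_id` and the wrongly typed `page_count`.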

Policy contribution:
ParseHawk supports European digital sovereignty, open-source reuse and privacy-conscious AI adoption by enabling local document extraction without dependence on external AI APIs.

Detailed information

Published on

30/06/2026

Last update

30/06/2026

Language availability

English

Status

Completed
