Source Code: src/gaia/agents/emr/ · src/gaia/agents/emr/dashboard/

How It Works
The EMR agent combines three AI models in a sophisticated pipeline that runs entirely on your local hardware:

- Vision Language Model (VLM) - The Qwen3-VL-4B model “sees” the intake form image and extracts text using a carefully crafted prompt that guides it to identify specific fields (name, DOB, allergies, medications, etc.). Unlike traditional OCR, the VLM understands context: it knows that “DOB” means date of birth and can handle handwritten entries, checkboxes, and varied form layouts.
- LLM Validation & Querying - The Qwen3-Coder-30B model (a Mixture-of-Experts architecture that activates only 3B parameters per inference) validates extracted data, handles natural language queries, and generates SQL to search the patient database. When you ask “Which patients have penicillin allergies?”, the LLM translates this to proper SQL.
- Embedding Model - The nomic-embed model creates vector embeddings for semantic similarity search, enabling fuzzy matching when looking up returning patients or finding related records.
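Lemonade Server speaks an OpenAI-compatible chat API, so the VLM step can be pictured as a single multimodal request. The sketch below builds such a request; the prompt text, field list, and helper name are illustrative assumptions, not GAIA's actual implementation.

```python
import base64

# Hedged sketch of the VLM extraction request. The prompt wording and
# payload helper are assumptions; only the model id comes from the docs.
EXTRACTION_PROMPT = (
    "Extract these fields from the intake form as JSON: "
    "name, dob, allergies, medications. Use null for any missing field."
)

def build_vlm_request(image_bytes: bytes,
                      model: str = "Qwen3-VL-4B-Instruct-GGUF") -> dict:
    """Bundle the form image and extraction prompt into one chat message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": EXTRACTION_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

req = build_vlm_request(b"\xff\xd8fake-jpeg-bytes")
print(req["model"])  # Qwen3-VL-4B-Instruct-GGUF
```

Because the request asks for JSON output, downstream validation (the LLM step) can parse and check the extracted fields mechanically.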
Why Local Matters: Running on-device with AMD Ryzen AI means sub-second inference latency, no per-request API costs, and complete data sovereignty. A typical intake form processes in 10-15 seconds on AMD Ryzen AI MAX+ hardware.
Key Features
- Automatic file watching - Monitors a directory for new intake forms
- Drag-and-drop upload - Drop files directly into the Watch Folder panel
- VLM-powered extraction - Uses Qwen3-VL-4B-Instruct for OCR and data extraction
- Local database storage - SQLite with full patient record schema
- New/returning detection - Identifies returning patients and flags changes
- Critical alerts - Automatic alerts for allergies and missing fields
- Web dashboard - Real-time monitoring with SSE updates
- Cumulative efficiency metrics - Track total time saved across all processed forms
Required Models
The EMR agent uses three models, downloaded automatically on first run via gaia-emr init:
| Model | Size | Purpose |
|---|---|---|
| Qwen3-Coder-30B-A3B-Instruct-GGUF | 18.6 GB | LLM for chat queries and patient search |
| Qwen3-VL-4B-Instruct-GGUF | 3.3 GB | Vision language model for form extraction |
| nomic-embed-text-v2-moe-GGUF | 522 MB | Embedding model for similarity search |
Disk Space: Ensure you have at least 25 GB of free disk space for model downloads.
Prerequisites
Setup instructions are provided for Windows, Linux, and developer installs on each platform. The steps below cover the standard Windows setup, which installs amd-gaia from PyPI and is recommended for most users.
Step 1: Create Project Directory
Open PowerShell and run:

Step 2: Create Virtual Environment
uv will automatically download Python 3.12 if not already installed.
Step 3: Activate the Environment
You should see (.venv) in your terminal prompt when activated.

Step 4: Install GAIA with EMR Dependencies
Step 5: Verify Installation
The api extra provides FastAPI and uvicorn for the web dashboard. The rag extra provides PyMuPDF for PDF processing.

Quick Start
First time here? Complete the Setup guide first to install Lemonade Server and uv.
Step 1: Initialize (First Time Only)
Download and load all required models before first use. The init command:
- Checks that the Lemonade server is running and the context size is configured
- Downloads and loads all required models:
- VLM: Qwen3-VL-4B-Instruct-GGUF (form extraction)
- LLM: Qwen3-Coder-30B-A3B-Instruct-GGUF (chat/query processing)
- Embedding: nomic-embed-text-v2-moe-GGUF (similarity search)
- Verifies all models are loaded and ready
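The final verification step can be pictured as a set comparison, assuming Lemonade's model listing follows the OpenAI-compatible {"data": [{"id": ...}]} shape; the real init logic may differ.

```python
# Hedged sketch: check that the three required models appear in the
# server's model listing. The listing shape is an assumption
# (OpenAI-compatible {"data": [{"id": ...}]}); gaia-emr init's actual
# checks may differ.
REQUIRED = {
    "Qwen3-Coder-30B-A3B-Instruct-GGUF",
    "Qwen3-VL-4B-Instruct-GGUF",
    "nomic-embed-text-v2-moe-GGUF",
}

def missing_models(listing: dict) -> set:
    """Return the required model ids absent from a /models-style listing."""
    loaded = {m["id"] for m in listing.get("data", [])}
    return REQUIRED - loaded

listing = {"data": [{"id": "Qwen3-VL-4B-Instruct-GGUF"}]}
print(sorted(missing_models(listing)))
```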
Partial Success: If the LLM fails to download but the VLM succeeds, form extraction will still work; chat queries and natural-language patient search require the LLM. Run gaia-emr init again to retry failed downloads.

Step 2: Launch
- CLI App
- Desktop App
Download sample forms
Download sample intake forms from GitHub and save them to a local directory (e.g., ./intake-forms/).

Dev Install: Sample forms are already included in the repository at data/img/intake-forms/.

Start watching for forms
Example Output
Query patients
Once processing completes, you can query the database using natural language. The agent uses tool calling to translate your questions into SQL queries and retrieve results from the SQLite database. Type quit or press Ctrl+C to stop.

CLI Commands Reference
| Command | Description |
|---|---|
| init | Download required models |
| watch | Watch folder and process forms |
| dashboard | Launch web dashboard |
| query | One-shot database query |
| reset | Delete database and start fresh |
| -h | Full command reference |
Supported Intake Form Formats
The agent accepts scanned or photographed intake forms in these formats:

| Extension | Processing |
|---|---|
| .png, .jpg, .jpeg | Direct image processing |
| .pdf | Converted to image via PyMuPDF |
| .tiff, .bmp | Direct image processing |
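The routing rule in the table can be sketched as a small dispatch function; the function name is hypothetical, not part of GAIA's API.

```python
from pathlib import Path

# Illustrative sketch of the format-routing rule from the table above.
DIRECT = {".png", ".jpg", ".jpeg", ".tiff", ".bmp"}

def processing_route(path: str) -> str:
    """Decide how an intake form file should be processed."""
    ext = Path(path).suffix.lower()
    if ext in DIRECT:
        return "direct-image"
    if ext == ".pdf":
        return "pdf-to-image"  # converted via PyMuPDF
    raise ValueError(f"unsupported intake form format: {ext}")

print(processing_route("scan_001.PDF"))  # pdf-to-image
```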
Under the Hood
Image Preprocessing Pipeline
Before reaching the VLM, intake forms go through an optimization pipeline:
- EXIF Orientation - Auto-rotates images based on camera metadata (critical for phone photos)
- Resolution Scaling - Resizes to max 1024px while preserving aspect ratio (balances quality vs. token count)
- JPEG Compression - Reduces file size with quality=85 for faster transmission to the model
- Token Estimation - Calculates expected image tokens to verify they fit within context window
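Steps 2 and 4 above can be expressed as pure functions. A minimal sketch; the 28-pixels-per-token figure is an assumption about Qwen-VL-style patch merging, not a number taken from GAIA.

```python
# Sketch of resolution scaling and token estimation. The 28-px-per-token
# patch size is an assumption, not a GAIA constant.
def scaled_size(width: int, height: int, max_side: int = 1024) -> tuple:
    """Shrink so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)

def estimate_image_tokens(width: int, height: int, px_per_token: int = 28) -> int:
    """Rough count: one token per px_per_token x px_per_token patch."""
    cols = -(-width // px_per_token)   # ceiling division
    rows = -(-height // px_per_token)
    return cols * rows

w, h = scaled_size(3024, 4032)  # typical phone-camera resolution
print((w, h), estimate_image_tokens(w, h))  # (768, 1024) 1036
```

A few thousand pixels per side collapses to roughly a thousand tokens after scaling, which is why the downscale step keeps forms comfortably inside the context window.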
Returning Patient Detection
The agent uses a multi-signal approach to identify returning patients:
- Exact Match - Name + DOB combination lookup in SQLite
- Fuzzy Match - Levenshtein distance for misspellings (“Jon Smith” → “John Smith”)
- Embedding Similarity - Vector search using nomic-embed for semantic matching
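The first two signals can be sketched in a few lines; the edit-distance threshold and helper names are illustrative, and the embedding signal is omitted for brevity.

```python
# Hedged sketch of signals 1 and 2: exact name+DOB lookup, then a
# Levenshtein edit-distance fuzzy match. Threshold is illustrative.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def find_returning(name: str, dob: str, patients: list, max_edits: int = 2):
    """Return (record, match_kind) for a possibly returning patient."""
    for p in patients:
        if p["dob"] != dob:
            continue
        if p["name"] == name:                            # signal 1: exact
            return p, "exact"
        if levenshtein(p["name"].lower(), name.lower()) <= max_edits:
            return p, "fuzzy"                            # signal 2: misspelling
    return None, "new"

patients = [{"name": "John Smith", "dob": "1984-03-09"}]
print(find_returning("Jon Smith", "1984-03-09", patients))
```

Gating fuzzy matching on an exact DOB keeps the edit-distance check cheap and avoids merging distinct patients who merely have similar names.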
Real-Time Dashboard Architecture
The dashboard uses Server-Sent Events (SSE) for real-time updates without polling:
- FastAPI backend streams processing events as they occur
- React frontend subscribes to the /api/events endpoint
- Sub-100ms latency from file detection to UI update
- No WebSocket complexity: SSE is simpler and works through proxies
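Part of SSE's simplicity is that the wire format is plain text: an event: line, a data: line, and a blank line per message. A minimal sketch of the framing; the event name and fields are illustrative, not the dashboard's actual schema.

```python
import json

# Hedged sketch of SSE framing. Event names/fields are illustrative.
def sse_frame(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Event as it appears on the wire."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("form_processed", {"patient": "John Smith", "alerts": 1})
print(frame, end="")
```

On the backend, a generator yielding such frames can be wrapped in a streaming response; in the browser, EventSource parses them back into events automatically.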
Tool Calling for Database Queries
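The tool-calling pattern this section describes can be sketched as follows; the tool, table, and column names are hypothetical, and the real schema and tool set live in the GAIA source.

```python
import sqlite3

# Hedged sketch of the validated-tool pattern: the LLM chooses a tool and
# arguments, but only this function touches SQL, and it uses parameter
# binding rather than string-built queries. Schema is illustrative.
def search_by_allergy(conn: sqlite3.Connection, allergy: str) -> list:
    """Tool the LLM can call for 'which patients have X allergies?'."""
    rows = conn.execute(
        "SELECT name FROM patients WHERE allergies LIKE ?",
        (f"%{allergy}%",),  # bound parameter, never interpolated SQL
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, allergies TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [("John Smith", "penicillin"), ("Jane Doe", "none")])
print(search_by_allergy(conn, "penicillin"))  # ['John Smith']
```

Because user text only ever reaches the database as a bound parameter, a malicious or malformed question cannot alter the query's structure.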
When you ask a natural language question, the LLM uses function calling (tool use) to interact with the database. This pattern ensures the LLM never writes SQL directly; it calls predefined, validated tools that safely construct queries.

Troubleshooting
PyMuPDF Required
PDF processing requires PyMuPDF, which is provided by the rag extra:
pip install "amd-gaia[rag]"
JSON Parse Failed
Database Locked
Slow First Run
First run downloads the models (LLM, VLM, embeddings), which may take several minutes. Subsequent runs start immediately.

Model Download Failed
Run gaia-emr init again to resume the download. If the error persists:
- Close any applications that may be using the model files
- Delete partial downloads in Lemonade’s model cache directory
- Run gaia-emr init again
Context Size Too Small
- Right-click Lemonade tray icon → Settings
- Set Context Size to 32768
- Restart the model
Corrupted Model / Init Fails
If gaia-emr init fails repeatedly or the agent won’t start due to model errors:
Solution: Delete the corrupted model from Lemonade’s cache and restart:
- Close Lemonade Server (right-click tray icon → Exit)
- Navigate to the model cache directory:
  - Windows: %LOCALAPPDATA%\AMD\LemonadeModels\
  - Linux: ~/.local/share/lemonade/models/
- Delete the corrupted model folder (e.g., Qwen3-VL-4B-Instruct-GGUF/)
- Restart Lemonade Server
- Run gaia-emr init again to re-download
Learn More
Part 1: Getting Started
Build this agent from scratch and understand the core components
Part 2: Dashboard & API
Deep dive into the web dashboard and REST API endpoints
Part 3: Architecture
Database schema, processing pipeline, and system design