LLMClient Technical Specification

Source Code: src/gaia/llm/llm_client.py
Component: LLMClient
Module: gaia.llm.llm_client
Import: from gaia.llm import LLMClient

Overview
LLMClient provides a unified interface for generating text from multiple LLM backends (local Lemonade server, Claude API, OpenAI API). It handles connection management, retry logic, streaming responses, and performance monitoring, with automatic endpoint selection and base URL normalization.

Key Features:
- Multi-backend support (local, Claude, OpenAI)
- Automatic retry with exponential backoff
- Streaming and non-streaming generation
- Performance statistics tracking
- Generation halting/interruption
- Context manager for resource cleanup
Requirements
Functional Requirements
- Multi-Backend Support
  - Local LLM via Lemonade server (default)
  - Anthropic Claude API
  - OpenAI ChatGPT API
  - Automatic base URL normalization
- Generation Interface
  - generate() - Generate text with prompt
  - Streaming and non-streaming modes
  - System prompt support
  - Temperature and other parameters
  - Messages array support for chat
- Connection Management
  - Configurable timeouts (connect, read, write, pool)
  - Connection pooling
  - Retry logic with exponential backoff
  - Connection error handling
- Performance Monitoring
  - get_performance_stats() - Token counts, timing
  - is_generating() - Check generation status
  - halt_generation() - Stop current generation
- Error Handling
  - Network error detection and retry
  - Timeout handling
  - API endpoint validation
  - Clear error messages with fix suggestions
Non-Functional Requirements
- Performance
  - Fast connection establishment (15s timeout)
  - Streaming with 120s read timeout
  - Efficient token counting
  - Minimal overhead
- Reliability
  - Automatic retry on transient failures
  - Exponential backoff (base: 1s, max: 60s)
  - Configurable max retries (default: 3)
  - Connection pool management
- Usability
  - Simple initialization
  - Sensible defaults
  - Clear documentation
  - Helpful error messages
API Specification
File Location
src/gaia/llm/llm_client.py
Public Interface
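The public surface below is a condensed sketch reconstructed from the requirements above; parameter names and defaults are assumptions and should be confirmed against src/gaia/llm/llm_client.py.

```python
from typing import Iterator, Optional, Union

class LLMClient:
    """Unified client for local Lemonade, Claude, and OpenAI backends."""

    def __init__(
        self,
        base_url: Optional[str] = None,      # normalized automatically
        model: Optional[str] = None,
        max_retries: int = 3,                # exponential backoff on failure
    ) -> None: ...

    def generate(
        self,
        prompt: Optional[str] = None,
        messages: Optional[list] = None,     # chat-style message history
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        stream: bool = False,
    ) -> Union[str, Iterator[str]]: ...

    def get_performance_stats(self) -> dict: ...  # token counts, timing
    def is_generating(self) -> bool: ...          # generation in progress?
    def halt_generation(self) -> bool: ...        # stop current generation

    # Context manager support for resource cleanup
    def __enter__(self) -> "LLMClient": ...
    def __exit__(self, *exc) -> None: ...
```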
Implementation Details
Connection Configuration
Local LLM (Lemonade Server):
Base URL Normalization
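A minimal sketch of the normalization rule described under Third-Party LLM Integration below; the actual helper in llm_client.py may differ in name and detail.

```python
def normalize_base_url(base_url: str) -> str:
    """Append /api/v1 unless the URL already ends in a versioned API path."""
    url = base_url.rstrip("/")
    if url.endswith("/v1") or url.endswith("/api/v1"):
        return url
    return url + "/api/v1"

# http://localhost:8080     -> http://localhost:8080/api/v1
# http://localhost:8080/v1  -> http://localhost:8080/v1 (left as-is)
```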
Retry Logic
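Illustrative only: exponential backoff with a 1s base, 60s cap, and 3 retries, matching the reliability requirements above. The use of httpx is an assumption about the HTTP layer.

```python
import time

import httpx  # assumed HTTP client; the real implementation may differ

def post_with_retry(client: httpx.Client, url: str, payload: dict,
                    max_retries: int = 3, base_delay: float = 1.0,
                    max_delay: float = 60.0) -> httpx.Response:
    """Retry transient network failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            response = client.post(url, json=payload)
            response.raise_for_status()
            return response
        except httpx.TransportError:
            if attempt == max_retries:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay)  # 1s, 2s, 4s, ... capped at 60s
```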
Endpoint Selection
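A sketch of the selection rule implied by the third-party integration section: chat-style message arrays are routed to the chat completions endpoint, pre-formatted prompts to the completions endpoint.

```python
from typing import Optional

def select_endpoint(prompt: Optional[str], messages: Optional[list]) -> str:
    """Pick the OpenAI-compatible endpoint based on the request shape."""
    if messages is not None:
        return "/chat/completions"  # structured conversation with history
    if prompt is not None:
        return "/completions"       # single pre-formatted prompt
    raise ValueError("Either prompt or messages must be provided")
```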
Error Handling
Testing Requirements
Unit Tests
File: tests/llm/test_llm_client.py
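A representative test sketch, assuming pytest; the actual suite covers 15+ cases including retry, streaming, and URL normalization.

```python
import pytest
from gaia.llm import LLMClient

def test_client_constructs_with_defaults():
    """Default construction targets the local Lemonade server."""
    client = LLMClient()
    assert client is not None

@pytest.mark.integration  # requires a live Lemonade server
def test_generate_returns_text():
    with LLMClient() as client:
        text = client.generate("Reply with one word.")
        assert isinstance(text, str) and text
```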
Integration Tests
Dependencies
Required Packages
Import Dependencies
Usage Examples
Example 1: Basic Local LLM
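A minimal sketch; the default constructor is assumed to target the local Lemonade server.

```python
from gaia.llm import LLMClient

# Defaults to the local Lemonade server
with LLMClient() as client:
    response = client.generate("Explain what an NPU is in one paragraph.")
    print(response)
```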
Example 2: Streaming Responses
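A sketch assuming a stream keyword on generate() that yields text chunks as they arrive.

```python
from gaia.llm import LLMClient

with LLMClient() as client:
    # stream=True yields chunks incrementally instead of one final string
    for chunk in client.generate("Write a haiku about silicon.", stream=True):
        print(chunk, end="", flush=True)
    print()
```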
Example 3: Using Claude API
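How the Claude backend is selected (constructor flag, model name, or environment variable) should be confirmed in llm_client.py; the use_claude flag below is hypothetical, and ANTHROPIC_API_KEY is the standard Anthropic credential variable.

```python
import os
from gaia.llm import LLMClient

# Credential for the Anthropic API (exact mechanism may differ in GAIA)
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-...")

client = LLMClient(use_claude=True)  # hypothetical backend selector
print(client.generate("Summarize the GAIA project in two sentences."))
```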
Example 4: Chat with Message History
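A sketch of the messages-array form of generate(); the keyword name is an assumption.

```python
from gaia.llm import LLMClient

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is quantization?"},
    {"role": "assistant", "content": "Reducing the numeric precision of model weights."},
    {"role": "user", "content": "Why does it help on NPUs?"},
]

with LLMClient() as client:
    # A messages array routes the request to the chat completions endpoint
    print(client.generate(messages=messages))
```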
Example 5: Halting Generation
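A sketch combining the documented is_generating() and halt_generation() calls with a streaming request running on a worker thread.

```python
import threading
import time

from gaia.llm import LLMClient

client = LLMClient()

def worker():
    for chunk in client.generate("Write a very long story.", stream=True):
        print(chunk, end="", flush=True)

thread = threading.Thread(target=worker)
thread.start()
time.sleep(2)                 # let some tokens stream
if client.is_generating():    # check status before interrupting
    client.halt_generation()  # stop the in-flight generation
thread.join()
```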
Example 6: Custom Retry Configuration
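A sketch of overriding the retry defaults (3 retries, 1s base backoff, 60s cap); the keyword name is an assumption.

```python
from gaia.llm import LLMClient

# Retry more aggressively on a flaky network (default is 3 retries)
client = LLMClient(max_retries=5)
print(client.generate("Hello"))
```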
Example 7: Remote Lemonade Server
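A sketch of pointing the client at a Lemonade server on another machine; base_url is an assumed keyword, and /api/v1 is appended automatically when no API version path is present.

```python
from gaia.llm import LLMClient

# http://192.168.1.50:8000 is normalized to http://192.168.1.50:8000/api/v1
client = LLMClient(base_url="http://192.168.1.50:8000")
print(client.generate("Hello from a remote client."))
```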
Third-Party LLM Integration
GAIA supports third-party LLM service providers through its OpenAI-compatible API interface. Any service implementing the OpenAI API specification can be used with GAIA.
Required API Endpoints
Your LLM service must implement at least one of these OpenAI-compatible endpoints:

- Completions Endpoint (default): POST /v1/completions - used for pre-formatted prompts
- Chat Completions Endpoint: POST /v1/chat/completions - used for structured conversations with message history

Completions Endpoint
- Request
- Response
- Streaming
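For reference, a minimal completions request and response access following the OpenAI specification (field names are the standard OpenAI ones; your service may add others):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "my-model",      # whatever your service expects
        "prompt": "Say hello.",
        "max_tokens": 64,
        "stream": False,
    },
    timeout=30,
)
print(resp.json()["choices"][0]["text"])  # completions return choices[].text
```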
Chat Completions Endpoint
- Request
- Response
- Streaming
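The equivalent chat completions request; responses nest the generated text under choices[].message.content per the OpenAI specification:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "my-model",
        "messages": [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Say hello."},
        ],
        "stream": False,
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```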
Configuration
- Environment Variable
- Direct Initialization
Linux
Windows (PowerShell)
Windows (CMD)
URL Normalization: GAIA automatically appends /api/v1 if not present:
- http://localhost:8080 → http://localhost:8080/api/v1
- If your service uses /v1 instead, provide the full path: http://localhost:8080/v1
Example Integration
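A sketch of wiring GAIA to a third-party OpenAI-compatible server exposing /v1 paths (for example a local server on port 8080); the base_url keyword is an assumption.

```python
from gaia.llm import LLMClient

# The service exposes /v1/... endpoints, so pass the /v1 path explicitly to
# keep GAIA from appending /api/v1.
client = LLMClient(base_url="http://localhost:8080/v1")
print(client.generate("Ping"))
```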
Compatibility Checklist
Required Features
- ✅ OpenAI-compatible endpoints (/v1/completions or /v1/chat/completions)
- ✅ JSON request/response format matching OpenAI specification
- ✅ HTTP POST method for generation requests
- ✅ Non-streaming responses (complete response as JSON)
Optional Features
- ⚠️ Streaming responses (Server-Sent Events format)
- ⚠️ Error handling (proper HTTP status codes: 200, 400, 404, 500)
- ⚠️ Model listing (GET /v1/models endpoint)
- ⚠️ Token counting (usage statistics in responses)
GAIA-Specific Features
The following features are specific to Lemonade Server and will not work with third-party services:
- get_performance_stats() - Returns an empty dict {}
- is_generating() - Returns False
- halt_generation() - Returns False
Troubleshooting
Connection Errors
Problem: ConnectionError: LLM Server Connection Error
Solutions:
- Verify the service is running and reachable
- Check firewall settings
- Ensure the base URL format is correct
- Test with an explicit endpoint, as in the sketch below
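A quick reachability check followed by an explicit-endpoint retry; the base_url keyword is an assumption.

```python
import requests
from gaia.llm import LLMClient

# Any HTTP response (even a 404) proves the connection path works.
print(requests.get("http://localhost:8080", timeout=5).status_code)

# Then retry through GAIA with an explicit base URL.
with LLMClient(base_url="http://localhost:8080/v1") as client:
    print(client.generate("ping"))
```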
404 Endpoint Errors
Problem: 404 endpoint not found
Solutions:
- Check whether the service uses /v1/completions (OpenAI standard)
- Verify the API path structure: /v1 vs /api/v1
- Consult the service documentation for the correct endpoint paths
- Use an explicit endpoint override, or pass the full /v1 base URL
Model Not Found
Problem: Model errors or "model not loaded"
Solutions:
- Specify the model explicitly
- List available models (if the service supports it)
- Ensure the model is loaded in your service before connecting, as in the sketch below
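A sketch that lists the service's models (if it implements GET /v1/models) and then passes one explicitly; the model keyword is an assumption.

```python
import requests
from gaia.llm import LLMClient

# List available models, if the service implements GET /v1/models
listing = requests.get("http://localhost:8080/v1/models", timeout=5).json()
print([m["id"] for m in listing.get("data", [])])

# Pass the model name explicitly
client = LLMClient(base_url="http://localhost:8080/v1", model="my-model")
print(client.generate("Hello"))
```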
Streaming Issues
Problem: Streaming responses are not working
Solutions:
- Verify the service supports Server-Sent Events (SSE)
- Check the Content-Type header: text/event-stream
- Test non-streaming generation first
- Enable debug logging, as in the sketch below
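A sketch for isolating streaming problems: confirm the non-streaming path first and turn on debug logging to inspect the raw requests.

```python
import logging

from gaia.llm import LLMClient

logging.basicConfig(level=logging.DEBUG)  # show request/response details

client = LLMClient(base_url="http://localhost:8080/v1")

# If this works but stream=True does not, the service likely lacks SSE support.
print(client.generate("ping", stream=False))
```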
Documentation Updates Required
SDK.md
Add to LLM Section:
Acceptance Criteria
- LLMClient implemented in src/gaia/llm/llm_client.py
- All methods implemented with docstrings
- Supports local Lemonade, Claude, OpenAI backends
- Retry logic with exponential backoff works
- Streaming generation works
- Performance stats retrieval works
- Generation halting works
- Base URL normalization works
- All unit tests pass (15+ tests)
- Integration tests pass with live server
- Error messages are helpful
- Can import: from gaia.llm import LLMClient
- Documented in SDK.md
- Example code works