What You’ll Learn:

- How the tool system works under the hood
- How LLMs “see” and choose which tools to use
- How to write tools that LLMs understand correctly
- Best practices for parameters, return values, and error handling
- Common pitfalls and how to avoid them
Here’s the key insight: The LLM never sees your Python code. It only sees a “contract” describing the tool. When you write this:
```python
@tool
def get_weather(city: str, units: str = "fahrenheit") -> dict:
    """Get current weather for a city.

    Args:
        city: Name of the city (e.g., "Seattle", "Tokyo")
        units: Temperature units - "fahrenheit" or "celsius"

    Use this tool when the user asks about weather conditions,
    temperature, or if they need an umbrella.

    Returns:
        Dictionary with temperature, conditions, humidity
    """
    # Your implementation here (LLM never sees this!)
    response = weather_api.get(city, units)
    return {
        "temperature": response.temp,
        "conditions": response.description,
        "humidity": response.humidity
    }
```
The LLM sees this contract:
```json
{
  "name": "get_weather",
  "description": "Get current weather for a city.\n\nUse this tool when the user asks about weather conditions, temperature, or if they need an umbrella.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "Name of the city (e.g., \"Seattle\", \"Tokyo\")"
      },
      "units": {
        "type": "string",
        "description": "Temperature units - \"fahrenheit\" or \"celsius\"",
        "default": "fahrenheit"
      }
    },
    "required": ["city"]
  }
}
```
Key Insight: Everything the LLM knows about your tool comes from the function name, type hints, and docstring. Your implementation is invisible to it.
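To make the contract concrete, here is a minimal, hypothetical sketch of how such a contract could be derived from a function's signature and docstring using only the standard library. The real `@tool` decorator in your framework almost certainly does more (it parses `Args:` sections, handles complex types, and so on); this only illustrates the principle that the name, type hints, and docstring are the entire interface:

```python
import inspect

def build_contract(func) -> dict:
    """Derive a minimal tool contract from a function's signature and docstring."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    sig = inspect.signature(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        # Unknown annotations fall back to "string" in this simplified sketch
        properties[name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)                      # no default → required
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def get_weather(city: str, units: str = "fahrenheit") -> dict:
    """Get current weather for a city."""
    ...

contract = build_contract(get_weather)
# contract["parameters"]["required"] is ["city"]; "units" carries its default
```

Notice that the function body never enters the contract at all.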
```python
from gaia.agents.base.tools import tool

@tool
def calculate(expression: str) -> float:
    """Calculate a mathematical expression.

    Args:
        expression: Math expression like "2 + 2" or "sqrt(16)"

    Use this tool when the user asks to calculate, compute, or do math.
    """
    import math
    # Safe evaluation with only math functions
    return eval(expression, {"__builtins__": {}}, vars(math))
```
What makes this effective:
Clear function name (calculate) matches what it does
Type hint (str) tells LLM what to pass
Return type (float) sets expectations
Docstring explains when to use it (“when the user asks to calculate”)
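You can sanity-check the tool body directly in a plain Python session. Dropping the decorator does not change the Python behavior, since the decorator only affects the LLM-facing contract:

```python
import math

def calculate(expression: str) -> float:
    """Calculate a mathematical expression."""
    # Restricted eval: no builtins, only math functions/constants in scope
    return eval(expression, {"__builtins__": {}}, vars(math))

print(calculate("2 + 2"))     # 4
print(calculate("sqrt(16)"))  # 4.0
```

Note that `eval` with a stripped `__builtins__` dict is a mitigation, not a sandbox; for production use a real expression parser is safer.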
```python
@tool
def search_files(
    query: str,
    file_type: str = "all",
    max_results: int = 10
) -> dict:
    """Search for files matching a query.

    Args:
        query: Search term to look for in file names and contents
        file_type: Filter by type - "all", "py", "js", "md", etc.
        max_results: Maximum number of results to return (default: 10)

    Use this tool when the user wants to find or locate files.
    """
    # Implementation...
    results = perform_search(query, file_type, max_results)
    return {
        "status": "success",
        "count": len(results),
        "results": results
    }
```
How defaults work with LLMs:
```text
User: "Find Python files about authentication"

LLM reasons:
- "Find files" → use search_files
- "authentication" → query="authentication"
- "Python files" → file_type="py"
- max_results not mentioned → use default 10

Tool call: search_files(query="authentication", file_type="py")
```
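Mechanically, the LLM simply omits `max_results` from its tool-call arguments, and the runtime lets Python's own defaults fill the gap. A hypothetical dispatcher (not the actual gaia implementation) might bind arguments like this:

```python
import inspect

def search_files(query: str, file_type: str = "all", max_results: int = 10) -> dict:
    # Stub implementation for illustration only
    return {"query": query, "file_type": file_type, "max_results": max_results}

def dispatch(func, llm_args: dict) -> dict:
    """Bind the LLM's JSON arguments, letting Python fill unspecified defaults."""
    bound = inspect.signature(func).bind(**llm_args)
    bound.apply_defaults()  # fills max_results=10 when the LLM omitted it
    return func(*bound.args, **bound.kwargs)

result = dispatch(search_files, {"query": "authentication", "file_type": "py"})
# result["max_results"] is 10, the function default
```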
For structured data, use type hints to guide the LLM:
```python
from typing import List, Dict, Optional

@tool
def create_task(
    title: str,
    description: str,
    assignees: List[str],
    priority: str = "medium",
    labels: Optional[List[str]] = None
) -> dict:
    """Create a new task in the project management system.

    Args:
        title: Short title for the task
        description: Detailed description of what needs to be done
        assignees: List of usernames to assign (e.g., ["alice", "bob"])
        priority: Priority level - "low", "medium", or "high"
        labels: Optional list of labels (e.g., ["bug", "frontend"])

    Use this tool when the user wants to create, add, or make a new task.
    """
    # Implementation...
    task = task_system.create(
        title=title,
        description=description,
        assignees=assignees,
        priority=priority,
        labels=labels or []
    )
    return {
        "status": "success",
        "task_id": task.id,
        "url": task.url
    }
```
The LLM understands:
List[str] → needs to pass a list of strings
Optional[List[str]] → can be omitted or set to null
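These container hints end up in the contract as structured schema fragments. As an illustrative sketch (the exact schema shape and `nullable` convention vary by framework), a converter for this small subset of hints could use `typing.get_origin`/`get_args`:

```python
from typing import List, Optional, Union, get_origin, get_args

def hint_to_schema(hint) -> dict:
    """Translate a small subset of typing hints into JSON-Schema-like fragments."""
    if hint is str:
        return {"type": "string"}
    origin = get_origin(hint)
    if origin is list:
        (item,) = get_args(hint)
        return {"type": "array", "items": hint_to_schema(item)}
    if origin is Union:  # Optional[X] is Union[X, None]
        args = [a for a in get_args(hint) if a is not type(None)]
        schema = hint_to_schema(args[0])
        schema["nullable"] = True  # one common convention; real schemas vary
        return schema
    raise TypeError(f"Unsupported hint: {hint}")

hint_to_schema(List[str])            # array of strings
hint_to_schema(Optional[List[str]])  # same array schema, marked nullable
```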
```python
@tool
def get_user(user_id: int) -> dict:
    """Get user information by ID."""
    try:
        user = database.get_user(user_id)
        return {"status": "success", "user": {...}}
    except UserNotFoundError:
        return {
            "status": "error",
            "error": f"No user found with ID {user_id}",
            "suggestion": "Check the user ID or try searching by name"
        }
```
LLM receives: `{"status": "error", "error": "No user found...", "suggestion": "..."}`
LLM responds: “I couldn’t find a user with ID 999. Would you like to search by name instead?”
Your docstring teaches the LLM when and how to use your tool. Compare:
Vague Docstring

```python
@tool
def search(q: str) -> str:
    """Search for stuff."""
    ...
```

Problems: LLM doesn’t know what kind of search, q parameter name is unclear, no guidance on when to use it, might conflict with other search tools.

Clear Docstring

```python
@tool
def search_codebase(query: str) -> dict:
    """Search for code patterns in the project's source files.

    Args:
        query: Code pattern to search for (e.g., "def process_",
               "import requests", "TODO")

    Use this tool when the user wants to:
    - Find functions, classes, or variables by name
    - Locate imports or dependencies
    - Search for code patterns or comments

    Do NOT use this tool for:
    - Web searches (use search_web instead)
    - File name searches (use find_files instead)
    - Documentation searches (use search_docs instead)

    Returns:
        Dictionary with matching files and line numbers
    """
    ...
```
Benefits: Clear purpose (code search, not web search), example queries help LLM understand format, explicit “when to use” guidance, explicit “when NOT to use” prevents misuse.
```python
@tool
def tool_name(param: type) -> return_type:
    """One-line summary of what the tool does.       ← Required

    Longer description with more details if needed.  ← Optional
    Can span multiple lines.

    Args:                                            ← Highly recommended
        param: Description of what this parameter    ← Include examples!
            expects (e.g., "file path like /src/main.py")

    Use this tool when the user wants to:            ← Critical for selection
    - First trigger phrase
    - Second trigger phrase
    - Third trigger phrase

    Do NOT use when:                                 ← Prevents confusion
    - Situation where another tool is better

    Returns:                                         ← Helps LLM use results
        Description of return format
    """
```
```python
@tool
def read_file(path: str) -> dict:
    """Read contents of a file.

    Args:
        path: Path to the file to read
    """
    try:
        with open(path, 'r') as f:
            content = f.read()
        return {
            "status": "success",
            "path": path,
            "content": content,
            "size_bytes": len(content)
        }
    except FileNotFoundError:
        return {
            "status": "error",
            "error": f"File not found: {path}",
            "suggestion": "Check the path spelling or use search_files to find it"
        }
    except PermissionError:
        return {
            "status": "error",
            "error": f"Permission denied: {path}",
            "suggestion": "This file may be protected. Try a different file."
        }
    except UnicodeDecodeError:
        return {
            "status": "error",
            "error": f"Cannot read binary file: {path}",
            "suggestion": "This appears to be a binary file, not text"
        }
    except Exception as e:
        return {
            "status": "error",
            "error": f"Unexpected error: {str(e)}",
            "suggestion": "Try a different file or check the path"
        }
```
Now the LLM can respond helpfully:
```text
Tool result: {"status": "error", "error": "File not found: /src/main.py", ...}
LLM: "I couldn't find /src/main.py. Let me search for it..."
     → Calls search_files("main.py")
```
Pitfall 1: LLM picks the wrong tool

Symptom: User asks to search code, but the LLM calls search_web.
Cause: Tool descriptions are too similar or vague.
Solution: Add explicit differentiation:
```python
@tool
def search_codebase(query: str) -> dict:
    """Search for code in PROJECT FILES.

    Use ONLY for searching source code, not web content.
    For web searches, use search_web instead.
    """

@tool
def search_web(query: str) -> dict:
    """Search the INTERNET for information.

    Use for current events, external documentation, general knowledge.
    For project code, use search_codebase instead.
    """
```
Pitfall 2: LLM passes wrong parameter type
Symptom: Tool expects an integer, receives a string like “5”.
Cause: Missing or unclear type hints.
Solution: Always use type hints and validate:
```python
@tool
def get_item(item_id: int) -> dict:  # Type hint tells LLM to pass int
    """Get item by ID.

    Args:
        item_id: Numeric ID of the item (e.g., 123, 456)
    """
    # Defensive validation
    if not isinstance(item_id, int):
        try:
            item_id = int(item_id)
        except (ValueError, TypeError):
            return {
                "status": "error",
                "error": f"Invalid ID format: {item_id}",
                "suggestion": "Please provide a numeric ID"
            }
    # ... rest of implementation
```
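The coercion pattern works the same outside any framework. Here it is as a self-contained helper (a sketch; `coerce_id` is an illustrative name, not a gaia API), which you can exercise directly:

```python
def coerce_id(item_id) -> dict:
    """Coerce an ID to int, returning an error dict the LLM can act on."""
    if not isinstance(item_id, int):
        try:
            item_id = int(item_id)  # "5" → 5
        except (ValueError, TypeError):
            return {
                "status": "error",
                "error": f"Invalid ID format: {item_id}",
                "suggestion": "Please provide a numeric ID"
            }
    return {"status": "success", "item_id": item_id}

coerce_id("5")    # succeeds after coercion
coerce_id("abc")  # returns a structured error instead of raising
```

Returning an error dict rather than raising keeps the failure inside the tool-result channel, where the LLM can read the suggestion and retry.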
Pitfall 3: Tool returns too much data
Symptom: Agent becomes slow or gives inconsistent answers.
Cause: Tool returns massive amounts of data that overflow context.
Solution: Limit and summarize output:
```python
@tool
def search_logs(query: str) -> dict:
    """Search application logs."""
    matches = log_search(query)

    # Always limit results
    MAX_RESULTS = 20
    MAX_LINE_LENGTH = 200

    truncated_matches = []
    for match in matches[:MAX_RESULTS]:
        truncated_matches.append({
            "file": match.file,
            "line_num": match.line,
            "content": match.content[:MAX_LINE_LENGTH]
        })

    result = {
        "status": "success",
        "total_matches": len(matches),
        "showing": len(truncated_matches),
        "matches": truncated_matches
    }

    if len(matches) > MAX_RESULTS:
        result["note"] = f"Showing first {MAX_RESULTS} of {len(matches)}. Refine query for specific results."

    return result
```
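The truncation logic is easy to verify in isolation. This simplified sketch treats each match as a plain string rather than a log record, but applies the same two caps (result count and per-entry length):

```python
def truncate_results(matches, max_results=20, max_len=200):
    """Cap both the number of results and the length of each entry."""
    shown = [m[:max_len] for m in matches[:max_results]]
    result = {
        "status": "success",
        "total_matches": len(matches),  # full count, so nothing is hidden silently
        "showing": len(shown),
        "matches": shown,
    }
    if len(matches) > max_results:
        result["note"] = f"Showing first {max_results} of {len(matches)}. Refine the query."
    return result

truncate_results([f"log line {i}" for i in range(50)])
# 50 total, 20 shown, plus a note telling the LLM how to narrow the search
```

Reporting `total_matches` alongside the truncated list lets the LLM tell the user “there were 50 matches, here are the first 20” instead of silently dropping data.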
Pitfall 4: Tool not being discovered
Symptom: LLM says “I don’t have a tool for that” when you do.
Cause: Tool is defined but not registered with the agent.
Solution: Make sure the tool is inside _register_tools():
```python
class MyAgent(Agent):
    def _register_tools(self):
        # ✅ Tool is registered
        @tool
        def my_tool():
            ...

# ❌ Tool outside agent - won't be registered!
@tool
def orphan_tool():
    ...
```
Pitfall 5: Tool with side effects runs unexpectedly
Symptom: Emails sent, files deleted, or data modified when the user was just asking a question.
Cause: Destructive tools need safeguards.
Solution: Add confirmation or dry-run modes:
```python
@tool
def delete_file(path: str, confirm: bool = False) -> dict:
    """Delete a file from the filesystem.

    Args:
        path: Path to the file to delete
        confirm: Must be True to actually delete.
            Default False returns what WOULD be deleted.

    Use this tool when the user explicitly asks to delete a file.
    Always show what will be deleted before confirming.
    """
    import os

    if not os.path.exists(path):
        return {"status": "error", "error": f"File not found: {path}"}

    file_info = {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "type": "file" if os.path.isfile(path) else "directory"
    }

    if not confirm:
        return {
            "status": "preview",
            "message": "This would delete:",
            "file": file_info,
            "note": "Call again with confirm=True to delete"
        }

    os.remove(path)
    return {
        "status": "success",
        "deleted": file_info
    }
```
Now the LLM will preview first:
```text
User: "Delete the old config file"
LLM: "I'll delete /config/old_settings.json (2.3KB). Should I proceed?"
     → First calls delete_file(path="...", confirm=False)
User: "Yes"
LLM: "Done, I've deleted the file."
     → Then calls delete_file(path="...", confirm=True)
```
Create a tool that:

1. Accepts a natural language query about users
2. Translates it to a database operation
3. Returns structured results
4. Handles errors gracefully

Requirements: Clear docstring with usage examples, type hints on all parameters, graceful error handling, reasonable result limits.
Hints
Use a dict to simulate database records. Include examples in the docstring to guide the LLM. Return both data and metadata (count, any filtering applied).
Solution
```python
from gaia.agents.base.tools import tool
from typing import Optional, List

# Simulated database
USERS_DB = [
    {"id": 1, "name": "Alice", "role": "admin", "department": "Engineering"},
    {"id": 2, "name": "Bob", "role": "developer", "department": "Engineering"},
    {"id": 3, "name": "Carol", "role": "developer", "department": "Design"},
    {"id": 4, "name": "Dave", "role": "manager", "department": "Sales"},
    {"id": 5, "name": "Eve", "role": "developer", "department": "Engineering"},
]

@tool
def query_users(
    name: Optional[str] = None,
    role: Optional[str] = None,
    department: Optional[str] = None,
    limit: int = 10
) -> dict:
    """Query the user database with filters.

    Args:
        name: Filter by name (partial match, case-insensitive)
            Example: "ali" matches "Alice"
        role: Filter by exact role
            Options: "admin", "developer", "manager"
        department: Filter by exact department
            Options: "Engineering", "Design", "Sales"
        limit: Maximum results to return (default: 10)

    Use this tool when the user wants to:
    - Find users by name
    - List users with a specific role
    - See who works in a department
    - Get user counts or lists

    Examples:
    - "Find all developers" → role="developer"
    - "Who works in Engineering?" → department="Engineering"
    - "Find Alice" → name="alice"

    Returns:
        Dictionary with matching users and query metadata
    """
    try:
        # Start with all users
        results = USERS_DB.copy()
        filters_applied = []

        # Apply name filter (partial, case-insensitive)
        if name:
            results = [u for u in results if name.lower() in u["name"].lower()]
            filters_applied.append(f"name contains '{name}'")

        # Apply role filter (exact match)
        if role:
            valid_roles = ["admin", "developer", "manager"]
            if role.lower() not in valid_roles:
                return {
                    "status": "error",
                    "error": f"Invalid role: {role}",
                    "valid_roles": valid_roles,
                    "suggestion": f"Use one of: {', '.join(valid_roles)}"
                }
            results = [u for u in results if u["role"].lower() == role.lower()]
            filters_applied.append(f"role = '{role}'")

        # Apply department filter (exact match)
        if department:
            valid_depts = ["Engineering", "Design", "Sales"]
            if department not in valid_depts:
                return {
                    "status": "error",
                    "error": f"Invalid department: {department}",
                    "valid_departments": valid_depts,
                    "suggestion": f"Use one of: {', '.join(valid_depts)}"
                }
            results = [u for u in results if u["department"] == department]
            filters_applied.append(f"department = '{department}'")

        # Apply limit
        total_matches = len(results)
        results = results[:limit]

        # Build response
        response = {
            "status": "success",
            "total_matches": total_matches,
            "showing": len(results),
            "users": results,
            "filters_applied": filters_applied if filters_applied else ["none"]
        }

        if total_matches > limit:
            response["note"] = f"Showing {limit} of {total_matches}. Increase limit or add filters."
        if total_matches == 0:
            response["suggestion"] = "No matches found. Try different filters."

        return response

    except Exception as e:
        return {
            "status": "error",
            "error": f"Query failed: {str(e)}",
            "suggestion": "Check your filter values and try again"
        }

# Example usage in an agent:
class UserQueryAgent(Agent):
    def _register_tools(self):
        # Register the tool we defined above
        self.tools["query_users"] = query_users
```
Why this solution works:
Clear docstring with examples: LLM knows exactly how to map user requests to parameters
Optional parameters: LLM can use any combination of filters