The Mechanics of Function Calling, APIs, and Making AI Actually Do Things
Reading time: 14 minutes | Difficulty: Intermediate
In Parts 1 and 2, we learned what agents are and which frameworks to use. But there's still a crucial piece missing: how do agents actually DO things?
When an AI agent books a flight, searches the web, or runs code — what's actually happening under the hood?
The answer is tool use (also called "function calling"). And understanding it is the key to building agents that actually work.
Here's something that surprises almost everyone:
The LLM doesn't execute tools. Your code does.
When Claude or GPT-4 "searches the web" or "runs code," the model isn't actually performing those actions. It's suggesting which tool to use and what arguments to pass. Your application receives that suggestion, validates it, and executes the actual function.
This distinction matters for security, reliability, and understanding what's really possible.
Let's walk through exactly what happens when you ask an agent to check the weather:
Before anything happens, you tell the LLM what tools exist:
{ "name": "get_weather", "description": "Get current weather for a location", "parameters": { "location": { "type": "string", "description": "City name, e.g., 'Tokyo, Japan'" } } }
This is like giving someone a menu — they can only order what's on it.
User: "What's the weather in Tokyo?"
Your app sends this to the LLM along with the list of available tools.
The LLM doesn't answer directly. Instead, it returns:
{ "tool_call": { "name": "get_weather", "arguments": { "location": "Tokyo, Japan" } } }
Notice: this is just a suggestion. The LLM is saying "I think you should call get_weather with this argument."
This is the critical part. Your code receives the suggestion and decides whether to:
# YOUR CODE runs this, not the LLM if tool_call.name == "get_weather": result = weather_api.get(tool_call.arguments["location"])
You send the results back to the LLM:
{ "tool_result": { "temperature": "18°C", "condition": "Cloudy", "humidity": "65%" } }
The LLM then formulates a natural language response: "The weather in Tokyo is currently 18°C and cloudy with 65% humidity."
The fact that your app executes tools, not the LLM has huge implications:
You control exactly what actions are allowed. The LLM can suggest deleting all your files, but your code decides whether to actually do it.
You can check arguments before executing. Is that email address valid? Is that file path safe?
Every tool call passes through your code. You can log everything, rate-limit, and review.
If a tool fails, your code can retry, fall back, or ask the user for help — instead of the LLM hallucinating a result.
Modern agents can connect to an enormous range of tools. Here's the landscape:
| Tool Type | What It Does | Example |
|---|---|---|
| Web Search | Query search engines | "Find latest AI news" |
| Web Scraping | Extract data from websites | "Get product prices from Amazon" |
| RAG Retrieval | Search knowledge bases | "Find our company policy on X" |
| API Queries | Get structured data | "Get population of France" |
| Tool Type | What It Does | Example |
|---|---|---|
| Code Execution | Run Python/JS in sandbox | "Calculate compound interest" |
| Code Interpreter | Analyze data, create charts | "Visualize this CSV" |
| Shell Commands | System operations | "List files in directory" |
| Git Operations | Manage repositories | "Create a pull request" |
| Tool Type | What It Does | Example |
|---|---|---|
| SQL Databases | Query relational data | "Get sales by region" |
| Vector Stores | Semantic similarity search | "Find similar documents" |
| File Systems | Read, write, organize | "Save report to folder" |
| Tool Type | What It Does | Example |
|---|---|---|
| Send, read, organize | "Send meeting invite" | |
| Slack/Teams | Post messages | "Alert team of issue" |
| Calendar | Schedule, check availability | "Book meeting room" |
| Tool Type | What It Does | Example |
|---|---|---|
| Image Generation | DALL-E, Midjourney | "Create logo concept" |
| Image Analysis | Vision, OCR | "Read text from receipt" |
| Document Generation | PDF, Word, slides | "Create quarterly report" |
The key insight: If a service has an API, an agent can use it. The only limits are what tools you choose to enable.
Giving AI the ability to take actions creates real risks. Here's what you need to know:
Prompt Injection — Malicious instructions hidden in data the agent reads
Tool Misuse — Agent uses legitimate tools in harmful ways
Scope Creep — Agent exceeds intended authority
Sandboxing: Run code execution in isolated containers (Docker, gVisor, Firecracker)
Least Privilege: Only give tools the minimum permissions needed
Human-in-the-Loop: Require approval for high-impact actions
Agent: "I'm about to delete 500 files. Confirm? [Y/N]"
Rate Limiting: Prevent runaway costs and actions
if tool_calls_this_minute > 10: raise RateLimitError("Too many tool calls")
Input Validation: Never trust data blindly
# BAD: Agent can execute any command os.system(agent_suggestion) # GOOD: Only allow whitelisted commands if agent_suggestion in ALLOWED_COMMANDS: execute(agent_suggestion)
One recent breakthrough deserves special mention: Structured Outputs.
The problem: LLMs sometimes return malformed JSON, breaking your code.
The solution: Constrained decoding that guarantees valid output.
# With OpenAI's strict mode response = client.chat.completions.create( model="gpt-4o", response_format={ "type": "json_schema", "json_schema": my_schema, "strict": True # Guarantees valid output } )
With strict: True, the model literally cannot produce invalid JSON. The output is guaranteed to match your schema.
This eliminates an entire class of bugs and makes tool calling much more reliable.
Remember from Part 2 how MCP (Model Context Protocol) is becoming the standard? Here's why it matters for tools:
Before MCP: Every framework had its own tool format. Build a tool for LangChain, rebuild it for CrewAI, rebuild again for Claude.
After MCP: Build once, use everywhere. Like USB for AI tools.
# MCP tool definition (works with any MCP-compatible framework) @mcp.tool() def search_database(query: str) -> list[dict]: """Search the company database for relevant records.""" return db.search(query)
Major players adopting MCP: Anthropic (creator), OpenAI, Google, Microsoft, AWS.
LLMs suggest tools, your code executes them — This separation is crucial for security and control
The 5-step dance: Define tools → Send request → LLM suggests → Your app executes → Return results
Agents can connect to anything with an API — Web, databases, email, code execution, image generation...
Security requires defense in depth — Sandboxing, least privilege, human approval, rate limiting, validation
Structured Outputs guarantee valid JSON — Eliminates parsing errors, makes tool calling reliable
MCP is standardizing tool connectivity — Build once, use with any framework
In Part 4, we'll look at real-world applications across industries (with actual stats), the 2025 landscape, and how to stay current as this field evolves rapidly.
Series Navigation:
Last updated: December 2025