OpenAI Function Calling: o3 & o4-mini Cheat Sheet

AI
Anjali
September 10, 2025
No Comments

If you’ve ever watched a smart agent “go rogue,” you know the pain: random tool calls, missing files, and mystery failures. The o3/o4-mini models fix a lot of that—if you give them a clean contract and a clear plan. This cheat sheet shows how I set up OpenAI function calling for calm, predictable behavior (and fewer 3 a.m. incidents).

How do I use OpenAI function calling with o3/o4-mini? Define self-contained functions with crystal-clear descriptions and JSON schemas, set strict: true, and specify tool order and fallbacks in your developer message. Persist API “reasoning items” between calls via the Responses-style flow to maintain context and boost reliability.

Source: Linkedin User

What’s special about o3 & o4-mini for tool use?

Built for reasoning & tool use. These models “take a beat” before acting, which reduces sloppy calls and improves multi-step tasks.
Structured outputs + strict mode. Enforce schema-correct function inputs/outputs by enabling strict behavior. Less glue code, fewer surprises.
Cross-call continuity. When your app chains tools, pass back the API’s “reasoning items” so the model keeps context over multiple calls. Think of it as short-term memory for your agent.

The Cheat Sheet (what actually works now)

1) Prompting is evolving → use developer messages

Forget one mega system prompt. In your developer message, lay out:

Tool policy (what each tool is for, when to use it)
Sequence logic (order of tools; where failure fallbacks kick in)
Non-goals (what not to do: “Never write to disk before ensure_dir succeeds.”)

Reasoning models follow high-level guidance well—so make your rules crisp and testable.

2) Tool order matters

Say the quiet part out loud:

“If a directory may not exist → call ensure_dir → then write_file → then index_file.”
Disallow chaotic paths: “Never call write_file until ensure_dir returns ok: true.”

Order avoids weird cascades like writing into nowhere.

3) Descriptions = your contract

Each function should be self-contained with:

A one-line purpose
When to use it (decision rule)
Inputs with units/formats (e.g., “path must be absolute”)
Output shape (what success/failure looks like)

Place rules up front. Keep it actionable, not poetic.

4) Guard against hallucinations

Turn on strict mode (strict: true) so the model adheres to the JSON schema instead of “best effort.”
Add a rule: “Do not mention tools that were not called in this turn.”
On unexpected output, force a retry with the same schema, not a looser one.

5) Use a Responses-style flow to keep continuity

When the model calls several tools in a row, persist the API’s returned reasoning items (not private chain-of-thought) and pass them into the next call. This improves multi-step accuracy without re-explaining context every time.

6) Mix hosted + custom tools—but label the lanes

Spell out exactly when to use your hosted Search/Code tools vs. your custom business endpoints. Clear lane markers → cleaner behavior.

Minimal, durable tool spec (copy-paste starter)

{

“tools”: [

{

“type”: “function”,

“name”: “ensure_dir”,

“description”: “Create a directory if it does not exist. Use before any write.”,

“parameters”: {

“type”: “object”,

“properties”: {

“path”: { “type”: “string”, “description”: “Absolute directory path” }

“required”: [“path”],

“additionalProperties”: false

“strict”: true

{

“type”: “function”,

“name”: “write_file”,

“description”: “Write small text files. Use only after ensure_dir ok:true.”,

“parameters”: {

“type”: “object”,

“properties”: {

“path”: { “type”: “string”, “description”: “Absolute file path” },

“content”: { “type”: “string” }

“required”: [“path”, “content”],

“additionalProperties”: false

“strict”: true

}

“developer_instructions”: [

“If target directory may not exist: call ensure_dir -> write_file.”,

“Never call write_file before ensure_dir returns ok:true.”,

“Return only actually executed function calls; do not reference future tools.”

]

}

Enable strict: true for each function or via Structured Outputs so inputs always match your schema.

Common pitfalls (and quick fixes)

Vague tool descriptions → Fix: Add “when to use” rules and input formats.
Parallel chaos → Fix: Force an explicit order; gate the next tool on prior success.
Best-effort JSON → Fix: Turn on strict mode and reject invalid payloads.
Context amnesia across calls → Fix: Persist and resend reasoning items in a Responses-style loop.
Over-prompting → Fix: Short, declarative developer guidance > flowery instructions.

Fast setup checklist

Define functions with self-contained descriptions + JSON schemas.
Add developer rules for ordering, fallbacks, and non-goals.
Turn on strict mode.
Implement a Responses-style loop that carries forward reasoning items.
Mix hosted + custom tools with clear lane rules.

Conclusion

o3 and o4-mini can feel uncannily dependable—when you hand them a tight contract. Give them strict schemas, clear tool order, and short-term memory. You’ll see fewer misfires and more “it just works” moments. Explore more AI tools on TheAISurf to power up your stack.

FAQs

What’s the single most important setting?

If you only do one thing, enable strict: true so function calls must match your schema. It shuts down a big slice of “creative” payloads before they hit your code.

How do I stop the model from naming tools it didn’t use?

State it in the developer rules (“Return only executed calls”) and validate on your side. Reject replies that reference tools without a corresponding call.

Do I need long prompts with o4-mini?

No. These reasoning models respond well to high-level, crisp rules. Keep messages short, define your lanes, and let the model do the planning.