You asked ChatGPT for a function. The code looked clean. You ran it.
TypeError. You fixed that. Wrong return shape. Fixed that. An import for a library that does not exist. By the time the code runs, you have rewritten more of it than ChatGPT wrote.
The most common "ChatGPT code" complaint comes down to one thing. The prompt left blanks for the model to guess at. Every blank became a bug.
The four ways the code ends up broken
Invented APIs
import requests
response = requests.fetch_json("https://api.example.com")
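# fetch_json is not a real method; the real pattern with requests is requests.get(url).json()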
requests.fetch_json() does not exist. The model invented it because "fetch_json" sounded like a real method. This is the most common reason "AI code doesn't run."
Wrong version syntax
The model writes Python 3.10 pattern matching for a project on 3.8. It writes React class components for a Next.js 16 app. The syntax is correct, just for a different version.
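The same trap shows up in any stack. A small TypeScript illustration (a hypothetical snippet, not from any real project): the satisfies operator only exists in TypeScript 4.9 and later, so code that is perfectly valid on 4.9 will not even parse on 4.8.
// Valid on TypeScript 4.9+; a 4.8 compiler rejects the satisfies keyword outright
const retryConfig = { attempts: 3, baseDelayMs: 500 } satisfies Record<string, number>;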
Missing edge cases
You asked for "a function that fetches a user." The model wrote happy-path code. No 404 handling, no auth failure, no rate limit, no timeout. The function works once on your dev machine and dies in production.
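For reference, this is roughly the happy-path sketch that prompt tends to produce (the endpoint is a placeholder, not a real API):
// No status check, no auth handling, no retry, no timeout
async function fetchUser(id: string): Promise<unknown> {
  const res = await fetch(`https://api.example.com/users/${id}`);
  return res.json();
}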
Wrong types
The function returns { data: { user } } when your codebase expects { user }. Or it returns a Promise instead of an awaited value. The code runs. It just does not compose with anything else.
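A minimal sketch of that mismatch (the User type and values are invented for illustration):
type User = { id: string; name: string };

// The shape the model picked: the user sits one level deeper than the caller expects
async function getUser(id: string): Promise<{ data: { user: User } }> {
  return { data: { user: { id, name: "Ada" } } };
}

async function demo() {
  const result = await getUser("42");
  // The caller wants result.user, but the value lives at result.data.user.
  console.log("user" in result);      // false
  console.log(result.data.user.name); // "Ada"
}
Pasting the exact output type into your prompt is what prevents this; the model cannot guess a shape you have already pinned down.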
Why this happens
ChatGPT is a text predictor. When your prompt says "write a function to fetch a user," the model predicts plausible code from a million similar examples. It does not know:
- What version of your language you are on
- What error handling style your codebase uses
- What types your callers expect
- Which libraries you have installed
So it guesses. The guesses look confident, which is what makes the bugs subtle.
A template that closes the blanks
Use this structure when you ask for code:
TASK
[1-2 sentences on what the function should do]
LANGUAGE / VERSION
[e.g. "TypeScript 5.4, target ES2022, strict mode on"]
FRAMEWORK / RUNTIME
[e.g. "Next.js 16 App Router, runs in a server action"]
INPUT TYPE
[paste the exact type definition or a sample input]
OUTPUT TYPE
[paste the exact type definition or a sample output]
ALLOWED DEPENDENCIES
[list libraries already in package.json. Say: "do NOT import
anything not on this list"]
ERROR HANDLING
[e.g. "throw on auth failure, return null on 404, retry on 429
with exponential backoff up to 3 attempts"]
EDGE CASES TO HANDLE
- [empty input]
- [auth failure]
- [rate limit]
- [network timeout]
EXAMPLE CALL + EXPECTED RESULT
[one concrete example so the model knows the shape]
CONSTRAINTS
- Use only methods that exist in the listed libraries.
- If you would have to invent a method, STOP and tell me what's
missing instead.
- Match the codebase's existing style.
This feels like a lot. It is. It is also faster than debugging the hallucinated version.
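Filled in, the ERROR HANDLING and EDGE CASES sections above pin the output down to something like this sketch. The endpoint and the User type are placeholders; compare it with the happy-path version earlier.
type User = { id: string; name: string; email: string };

async function fetchUser(id: string): Promise<User | null> {
  if (!id) throw new Error("id is required");                      // empty input

  for (let attempt = 1; attempt <= 3; attempt++) {
    const res = await fetch(`https://api.example.com/users/${encodeURIComponent(id)}`, {
      signal: AbortSignal.timeout(5_000),                          // network timeout (Node 18+ / modern browsers)
    });

    if (res.status === 404) return null;                           // return null on 404
    if (res.status === 401 || res.status === 403) {
      throw new Error(`auth failure (${res.status})`);             // throw on auth failure
    }
    if (res.status === 429) {
      if (attempt === 3) break;                                    // give up after 3 attempts
      await new Promise((r) => setTimeout(r, 500 * 2 ** attempt)); // exponential backoff: 1s, then 2s
      continue;
    }
    if (!res.ok) throw new Error(`unexpected status ${res.status}`);

    return (await res.json()) as User;                             // matches OUTPUT TYPE, not { data: { user } }
  }
  throw new Error("rate limited after 3 attempts");
}
Nothing in it required inventing a method: fetch, AbortSignal.timeout, and setTimeout are platform built-ins, which is exactly what the ALLOWED DEPENDENCIES and CONSTRAINTS sections exist to enforce.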
The one line that does the most work
Embed this near the top of your prompt:
"If you would have to invent a method, STOP and tell me what's missing instead."
By default the model invents rather than admits gaps. That single sentence flips the behavior. Instead of requests.fetch_json(), the model says "I'd need a method to do X. Does your codebase have one?"
In our testing, this one line cuts hallucinated-API bugs by more than half.
How to audit the code the model gives you
Even with the template, audit the output:
- Open package.json. Does every imported library actually exist?
- For each external method call, check the real library docs, not the model's memory of them.
- Does the return shape match what the caller expects?
- What happens on auth failure, 404, rate limit, timeout, malformed response?
- Is every Promise either awaited or returned, never left dangling?
If any of those fail, the prompt was under-specified. Add the missing constraint and re-run.
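The Promise check is the one that slips past review most often. A self-contained illustration (saveUser is a stand-in, not a real API):
async function saveUser(name: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 10)); // stand-in for a real write
  console.log(`saved ${name}`);
}

async function handler() {
  saveUser("Ada");           // dangled: nothing waits for it, and a rejection goes unhandled
  await saveUser("Grace");   // awaited: errors surface inside handler
  return saveUser("Linus");  // returned: the caller decides where to await and catch
}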
Which model is best for code
For complex code tasks like multi-file refactors and real debugging, Claude Sonnet 4.6, Claude Opus 4.7, and Gemini 2.5 Pro all outperform GPT-5 and 5.4. ChatGPT is the fastest. The thinking models are the most accurate.
The template above works on any of them.
A faster way to check
Paste a code-generation prompt into FixMyPrompt. The rubric flags missing language version, missing input or output types, missing allowed-dependencies list, missing error handling, and missing "stop and ask instead of inventing" instructions. The rewrite assembles the template using whatever context you provided.
Three free reports per day. No signup.