Field Notes
How to Handle Malformed JSON from LLM Responses in TypeScript
Quick Summary
tl;dr: If you’re getting malformed JSON from LLM API responses — markdown code fences, trailing commas, smart quotes, JavaScript-style objects — I published a small TypeScript package that fixes all of it. It’s called ai-json-safe-parse, it’s on npm and GitHub, zero dependencies, works everywhere.
LLMs return broken JSON constantly
I’m going to assume you already know this if you clicked on this post, but if you’ve been building anything that calls an LLM API and asks for JSON back, you’ve dealt with this.
You tell the model “respond with JSON”. You even put it in the system prompt. You might even be using structured outputs or tool use. And it works 95% of the time. But the other 5% of the time you get something like this:
Sure! Here's the analysis:
```json
{
  "sentiment": "positive",
  "score": 0.95,
  "keywords": ["great", "excellent",],
}
```
Let me know if you need anything else!
Three things wrong with that: it’s wrapped in markdown code fences, there are trailing commas, and there’s a bunch of prose around it. JSON.parse will throw on all of it.
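To make that concrete, here is a minimal reproduction. The raw string mirrors the sample output above, and `JSON.parse` rejects it immediately:

```typescript
// The raw model output from above: prose, markdown fences, and a trailing
// comma. JSON.parse bails at the very first non-JSON character.
const raw = [
  "Sure! Here's the analysis:",
  '```json',
  '{ "sentiment": "positive", "score": 0.95, "keywords": ["great", "excellent",], }',
  '```',
  'Let me know if you need anything else!',
].join('\n')

try {
  JSON.parse(raw)
} catch (err) {
  console.log((err as Error).name) // "SyntaxError"
}
```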
And those are the easy cases. In production I’ve seen models return smart quotes (“ and ” instead of "), em dashes in place of hyphens, unquoted property names like it’s JavaScript, single-quoted strings, // comments inline, and zero-width unicode characters sprinkled into keys for no apparent reason.
If you’re running this at any kind of scale, you need to handle it.
Why I built this
I’m the CTO at LeadTruffle — we build AI-powered lead capture tools for home service businesses. We process a lot of LLM-generated JSON every day across our lead qualification, SMS automation, and voice AI pipelines. The JSON comes from OpenAI, Claude, Gemini, and a few other providers depending on the task.
We had basically the same JSON repair code copy-pasted across like four different services. Each one was slightly different and handled slightly different edge cases. After the third time I found myself debugging a production error that one version handled but another didn’t, I pulled it all into a single package.
I figured other people were writing the same code, so we open-sourced it. Thank you to LeadTruffle for being cool with that.
The package
It’s called ai-json-safe-parse.
Zero dependencies. ~2KB gzipped. Works in Node.js, browsers, Cloudflare Workers, Deno — basically anywhere JavaScript runs. Full TypeScript generics. Dual ESM and CommonJS.
```shell
npm install ai-json-safe-parse
```
Usage
The main function is aiJsonParse. You give it the raw string from an LLM, and it returns a discriminated union — either { success: true, data: T } or { success: false, error: string }. It never throws.
```typescript
import { aiJsonParse } from 'ai-json-safe-parse'

const llmOutput = `Here is the analysis:
\`\`\`json
{
  "sentiment": "positive",
  "confidence": 0.92,
}
\`\`\`
`

const result = aiJsonParse<{
  sentiment: string
  confidence: number
}>(llmOutput)

if (result.success) {
  console.log(result.data.sentiment) // "positive"
  console.log(result.data.confidence) // 0.92
}
```
There are two other exports for different ergonomics:
```typescript
// Returns T | null (or a fallback value you provide)
const data = aiJsonSafeParse<MyType>(text)

// Throws on failure
const strict = aiJsonStrictParse<MyType>(text)
```
I mostly use aiJsonParse in backend services where I want to handle errors explicitly, and aiJsonSafeParse with a fallback in places where I have a sensible default.
What it actually fixes
The parser runs through a pipeline of recovery strategies. It tries the safest things first and only moves to more aggressive fixes if the simpler ones fail.
| Strategy | What it handles |
|---|---|
| Direct `JSON.parse` | Valid JSON (fast path, no overhead) |
| Markdown extraction | `` ```json … ``` ``, `` ```jsonc ``, `` ```javascript ``, bare code fences |
| Unicode normalization | Smart quotes, em/en dashes, UTF-8 BOM, zero-width characters, non-breaking spaces |
| Bracket matching | JSON embedded in surrounding prose |
| Comment stripping | `//` line comments and `/* block comments */` (string-aware, won’t break URLs) |
| Trailing comma removal | `{"a": 1,}` and `[1, 2,]` |
| Single quote conversion | `{'key': 'value'}` → `{"key": "value"}` |
| Unquoted key quoting | `{key: "value"}` → `{"key": "value"}` |
| Regex key-value extraction | Last resort for severely mangled JSON |
The first four are “safe” mode — they extract but don’t modify the JSON syntax. The rest are “aggressive” mode and actually repair the content. You can control this:
```typescript
// Only extract, don't repair
const safeResult = aiJsonParse(text, { mode: 'safe' })

// Extract and repair (default)
const result = aiJsonParse(text, { mode: 'aggressive' })
```
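To give a feel for what an "aggressive" stage does, here is a simplified, illustrative sketch of trailing-comma removal. This is not the library's actual implementation, just the core idea: a single pass that tracks quote state, so a comma inside a string value is never touched.

```typescript
// Illustrative sketch only: string-aware trailing comma removal.
// A comma is dropped when the next non-whitespace character closes
// an object or array; commas inside quoted strings are left alone.
function stripTrailingCommas(input: string): string {
  let out = ""
  let inString = false
  for (let i = 0; i < input.length; i++) {
    const ch = input[i]
    if (ch === '"' && input[i - 1] !== "\\") inString = !inString
    if (!inString && ch === ",") {
      // Look ahead past whitespace; skip the comma if a closer follows.
      let j = i + 1
      while (j < input.length && /\s/.test(input[j])) j++
      if (input[j] === "}" || input[j] === "]") continue
    }
    out += ch
  }
  return out
}

console.log(stripTrailingCommas('{"a": 1,}')) // {"a": 1}
console.log(stripTrailingCommas('[1, 2,]'))   // [1, 2]
```

The real pipeline layers several of these passes, but each one follows the same principle: never modify anything that sits inside a string literal.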
“Just use structured outputs”
Yeah, you should when you can. But there are real reasons you can’t always rely on them:
Not every model supports them. If you’re using multiple providers, or routing between models, or using an open source model on Together AI or similar, you can’t assume structured outputs are available everywhere.
Streaming responses. If you’re streaming and the model has already decided to wrap the response in markdown, structured output mode doesn’t always help.
Chained LLM calls. If you feed one model’s JSON output as context into another model, the second model might quote it, reformat it, wrap it in markdown, or “helpfully” add comments. I’ve seen this happen a lot.
Existing prompts. We have a lot of prompts in production that predate structured output support. Rewriting and regression-testing all of them to avoid a parse issue that a 2KB library solves is not a good use of time.
Defense in depth. Even with structured outputs, I’d rather have a defensive parser than a hard crash. The cost of this library is essentially nothing. The cost of a 500 error in production from a malformed response is not nothing.
Weird stuff we’ve actually seen in production
Some of the more interesting edge cases that motivated different parts of the recovery pipeline:
Smart quotes from GPT-4. For a while GPT-4 would replace straight double quotes (") with curly “ and ” if the user input or prompt contained text that was copy-pasted from Word or a rich text email. This was maddening to debug because the quotes look right in most editors.
Em dashes replacing hyphens. LLMs love em dashes. “10—20” instead of “10-20” in a value. Looks fine to a human, breaks any downstream consumer expecting a number range.
UTF-8 BOM at the start of output. \uFEFF — the byte order mark. Invisible in every editor, completely breaks JSON.parse. I’ve only seen this from one provider and I still don’t know why it happened.
Zero-width spaces inside property names. \u200B inserted between characters in JSON keys. I genuinely have no explanation for this one. But it happened more than once, so we handle it.
Full JavaScript object syntax. {name: "Alice", age: 30} — no quotes on the keys. This is valid JS but invalid JSON. Models do this fairly often, especially smaller models or when they’re being “casual” in their output.
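Most of the unicode gremlins above can be scrubbed with a handful of replacements before the text ever reaches `JSON.parse`. Here is an illustrative sketch of that kind of normalization pass (again, not the library's actual code):

```typescript
// Illustrative sketch: normalize unicode gremlins before parsing.
function normalizeUnicode(text: string): string {
  return text
    .replace(/^\uFEFF/, "")               // strip a leading UTF-8 BOM
    .replace(/[\u200B\u200C\u200D]/g, "") // drop zero-width characters
    .replace(/[\u201C\u201D]/g, '"')      // smart double quotes -> "
    .replace(/[\u2018\u2019]/g, "'")      // smart single quotes -> '
    .replace(/\u00A0/g, " ")              // non-breaking space -> space
}

// BOM, smart quotes, and a zero-width space inside a value:
const cleaned = normalizeUnicode('\uFEFF{“key”: “val\u200Bue”}')
console.log(JSON.parse(cleaned).key) // "value"
```

Em dashes are deliberately left out of this sketch: replacing them globally would corrupt legitimate prose inside string values, which is exactly why the real pipeline needs to be more careful than a few regexes.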
Links
- npm: ai-json-safe-parse
- GitHub: a-r-d/ai-json-safe-parse
MIT licensed. If you hit a malformed LLM response that this doesn’t handle, open an issue with the raw text. That’s the single most useful contribution you can make. PRs are welcome too.