← Tin's Posts · May 26, 2026 · 2 min read
Real Fancy Fuzzer
I recently read a post asking whether security engineers are more pro-AI than the rest of us. The argument: security people think probabilistically. Most code is probably (already) crummy. Nothing is deterministic. The job is to drive risk down by X%. Apply the same lens to AI output and stop expecting guarantees you never had.
It's not wrong. But I didn't find it an interesting question.
I've found myself thinking about it a different way: as a skeptic. LLMs aren't uniquely intelligent at breaking security. They're not discovering zero-days through some emergent understanding of attack surfaces. What they are is a fuzzer with almost unlimited scope, and that's actually the more interesting line of thought.
A traditional fuzzer generates inputs, lots of them, with variation, and watches what breaks. The intelligence isn't in the fuzzer. It's in the verifier. Did it crash? Did it return data it shouldn't have? The fuzzer covers space. The harness catches signal.
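To make the fuzzer/verifier split concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `parse_record` is a made-up target with a planted bug, and `mutate`, `verify`, and `fuzz` are illustrative names, not any real fuzzing library's API. The point is only where the judgment lives.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical target: a length-prefixed record parser with a planted bug."""
    if not data:
        raise ValueError("empty input")  # a clean, expected rejection
    length = data[0]
    out = bytearray()
    for i in range(length):
        # Planted bug: trusts the length byte, so a lying prefix reads past the end.
        out.append(data[1 + i])
    return bytes(out)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """The 'dumb' half: overwrite, insert, or delete one byte at random."""
    buf = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def verify(target, data: bytes):
    """The harness: decides what counts as signal.

    A ValueError is treated as the target rejecting bad input properly.
    Any other exception is reported as a finding.
    """
    try:
        target(data)
    except ValueError:
        return None
    except Exception as exc:
        return type(exc).__name__
    return None


def fuzz(target, seed: bytes, iterations: int = 2000, rng_seed: int = 0):
    """Cover space with mutations and keep only what the verifier flags."""
    rng = random.Random(rng_seed)
    findings = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        signal = verify(target, candidate)
        if signal is not None:
            findings.append((candidate, signal))
    return findings


if __name__ == "__main__":
    results = fuzz(parse_record, b"\x03abc")
    print(f"{len(results)} inputs flagged by the verifier")
```

Note that `mutate` knows nothing about the target. All the judgment sits in `verify`, which is the part you would have to rewrite for a different class of bug.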
An LLM running against an API surface is structurally the same. Probe this endpoint with unusual inputs. Try to access a resource without the right token. Ask for user 2's data while authenticated as user 1. The LLM has ingested practically every IDOR (insecure direct object reference) writeup, every auth bypass, every security postmortem ever published. It knows the patterns.
That's not intelligence. That's scope.
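The probes described above are simple enough to sketch. This is a toy, assuming an in-memory stand-in for an API rather than a real HTTP service: `ToyApi`, its endpoints, its tokens, and the planted ownership bug in `get_invoice` are all invented for illustration. What an LLM would add is the list of things to try, not the check itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: dict


class ToyApi:
    """Hypothetical in-memory API with two users and two endpoints."""

    def __init__(self):
        self._tokens = {"token-user-1": 1, "token-user-2": 2}
        self._profiles = {
            1: {"owner": 1, "email": "one@example.test"},
            2: {"owner": 2, "email": "two@example.test"},
        }
        self._invoices = {
            1: {"owner": 1, "total": 10},
            2: {"owner": 2, "total": 40},
        }

    def get_profile(self, token: str, user_id: int) -> Response:
        caller = self._tokens.get(token)
        if caller is None:
            return Response(401, {})
        if caller != user_id:
            return Response(403, {})  # ownership check present
        return Response(200, self._profiles[user_id])

    def get_invoice(self, token: str, user_id: int) -> Response:
        caller = self._tokens.get(token)
        if caller is None:
            return Response(401, {})
        # Planted bug: the caller is authenticated but ownership is never checked.
        return Response(200, self._invoices[user_id])


def probe(api, endpoints, token: str, other_id: int):
    """Run two playbook checks per endpoint and return (endpoint, issue) pairs."""
    findings = []
    for name in endpoints:
        handler = getattr(api, name)
        # Check 1: does the endpoint answer without a valid token?
        if handler("not-a-real-token", other_id).status == 200:
            findings.append((name, "no valid token required"))
        # Check 2: does a valid token for one user read another user's record?
        response = handler(token, other_id)
        if response.status == 200 and response.body.get("owner") == other_id:
            findings.append((name, "cross-user read"))
    return findings


if __name__ == "__main__":
    api = ToyApi()
    for endpoint, issue in probe(api, ["get_profile", "get_invoice"], "token-user-1", 2):
        print(endpoint, "->", issue)
```

The loop is the scope; the two `if` statements are the verifier. Adding a pattern from the playbook means adding another check, not another insight.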
Then there's Anthropic and Mythos. Around a couple thousand dollars a scan, per repository. A hundred million committed to run it. Not pocket change.
A human pen tester has limited time, limited imagination on any given day, and genuinely can't hold the entire surface area of a large application in their head at once. An LLM can hold all of it, for as long as you keep paying.
The economic question is: does the LLM's cost per vulnerability beat a human's? My guess: an LLM scan is thorough in a way humans aren't, and almost certainly cheaper for "does this have the obvious holes." Novel architectural vulnerabilities still need a human who understands the domain. The LLM knows the playbook. It doesn't write new chapters.
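For what it's worth, the arithmetic behind that question is short. A minimal sketch, assuming you count only confirmed findings and fold human triage time into the cost; the `Engagement` type and its fields are my own framing, and any numbers you plug in are yours to supply, since I don't have real ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Engagement:
    """One security effort, LLM scan or human pen test (hypothetical model)."""

    label: str
    run_cost: float          # what the scan, or the tester's time, cost
    triage_cost: float       # human time spent confirming or rejecting reports
    confirmed_findings: int  # reports that survived triage

    def cost_per_finding(self) -> Optional[float]:
        """Total spend divided by confirmed findings; None if nothing was found."""
        if self.confirmed_findings == 0:
            return None
        return (self.run_cost + self.triage_cost) / self.confirmed_findings


def cheaper(a: Engagement, b: Engagement) -> Optional[Engagement]:
    """Return the engagement with the lower cost per finding, or None on a tie.

    An engagement with zero confirmed findings is treated as infinitely expensive.
    """
    cost_a = a.cost_per_finding()
    cost_b = b.cost_per_finding()
    cost_a = float("inf") if cost_a is None else cost_a
    cost_b = float("inf") if cost_b is None else cost_b
    if cost_a == cost_b:
        return None
    return a if cost_a < cost_b else b
```

This treats every finding as equal, which is exactly the weakness: one novel architectural bug and one missing ownership check both count as 1. A severity weight on `confirmed_findings` would be the obvious next step.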
There's also the question of the mix. An LLM amplifies a skilled attacker in ways a fuzzer can't: pattern recognition, exhaustive scope, and enough reasoning to know what to probe next. I don't think we'll see massive exploits crafted by AI alone, not on mainstream software. But a skilled group with a budget? We're already seeing those scenarios play out.
The interesting constraint isn't intelligence. It's economics.
Whether it scales depends on whether finding each bug was worth what you spent to find it. Anthropic apparently thinks so, for some definition of "worth it." The rest of us are still figuring out the denominator.