The biggest problem with how AI agents search today is embarrassingly simple: they do everything one step at a time. Ask for five related results, and your agent fires five separate API calls, reads each one, then repeats. It’s slow, expensive, and the model’s context window fills up with noise before it can do any real thinking.
Perplexity’s answer is to let the model write code instead of making calls.
What “Search as Code” actually means
Rather than calling a fixed search API, the model writes a Python program that talks directly to Perplexity’s search stack. The program runs in a secure sandbox. Results get filtered, deduplicated, and ranked inside that sandbox, not inside the model’s context. By the time anything reaches the model, it’s clean and ready to work with.
This matters for three specific reasons.
Parallelism. If your agent needs five variant queries, serial tool-calling means waiting five times longer than necessary. With fanout queries running concurrently inside the generated program, you get all five results in roughly the same time as one. At ten queries, serial calling takes roughly ten times as long as it needs to.
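To make the serial-versus-fanout gap concrete, here is a minimal sketch using Python's asyncio. The fake_search function is a hypothetical stand-in that sleeps to simulate network latency. It is not Perplexity's API, and the timings only illustrate the shape of the difference.

```python
import asyncio
import time


async def fake_search(query: str, latency: float = 0.05) -> list[str]:
    """Hypothetical stand-in for one search call; sleeps to mimic latency."""
    await asyncio.sleep(latency)
    return [f"{query} :: result {i}" for i in range(3)]


async def serial(queries: list[str]) -> list[list[str]]:
    """One call at a time: total time grows with the number of queries."""
    results = []
    for query in queries:
        results.append(await fake_search(query))
    return results


async def fanout(queries: list[str]) -> list[list[str]]:
    """All calls in flight at once: total time is roughly one call's latency."""
    return await asyncio.gather(*(fake_search(q) for q in queries))


def timed(coro_fn, queries):
    """Run an async function to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start


QUERIES = [f"variant {i}" for i in range(5)]
serial_results, serial_secs = timed(serial, QUERIES)
fanout_results, fanout_secs = timed(fanout, QUERIES)
print(f"serial: {serial_secs:.2f}s, fanout: {fanout_secs:.2f}s")
```

Both versions return the same results in the same order. Only the waiting changes, which is why the gap widens as the query count grows.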
Cost. Running a full pipeline for a simple question is wasteful. The model now writes only the operations a query actually needs. Perplexity claims this cuts token costs by up to 85% compared to a non-codegen baseline. That’s not a rounding difference. At scale, it changes the economics of building search-heavy agents entirely.
Context cleanliness. In traditional tool-calling, intermediate results pile up in the model’s context window. You end up with half your context budget eaten by noise. Search as Code processes and discards intermediate data inside the sandbox. The model sees a compact Python program and the final output, nothing else.
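A rough way to picture the difference is to count what ends up in the model's context under each approach. The sketch below is an illustration only: fake_search, summarize, and PROGRAM_SOURCE are hypothetical names, the payload sizes are made up, and character counts stand in for tokens.

```python
def fake_search(query: str) -> str:
    """Hypothetical search call that returns a bulky raw payload."""
    return f"[{query}] " + "raw page text " * 100


def summarize(raw_results: list[str], chars_each: int = 40) -> str:
    """Stand-in for in-sandbox filtering: keep a short prefix of each result."""
    return "\n".join(result[:chars_each] for result in raw_results)


def tool_calling_context(queries: list[str]) -> list[str]:
    """Serial tool calling: every call and every raw result lands in context."""
    context = []
    for query in queries:
        context.append(f"search({query!r})")
        context.append(fake_search(query))
    return context


# Hypothetical source of the program the model would write.
PROGRAM_SOURCE = "results = [search(q) for q in queries]; print(summarize(results))"


def codegen_context(queries: list[str]) -> list[str]:
    """Codegen: raw results stay in the sandbox; only program and output return."""
    raw = [fake_search(query) for query in queries]  # never shown to the model
    return [PROGRAM_SOURCE, summarize(raw)]


def size(context: list[str]) -> int:
    """Total characters in the context, a crude proxy for token count."""
    return sum(len(item) for item in context)


QUERIES = [f"variant {i}" for i in range(5)]
print("tool calling:", size(tool_calling_context(QUERIES)), "chars")
print("codegen:     ", size(codegen_context(QUERIES)), "chars")
```

The exact ratio here means nothing, since the payloads are invented. The point is structural: in the codegen version the raw results never become context entries at all.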
The six building blocks
The Agentic Search SDK gives the model six composable operations: retrieval, fanout, ranking, filtering, deduplication, and rendering. Each one does exactly one thing. They chain together, and the parallelizable ones (fanout, retrieval) run concurrently. Filtering and deduplication use deterministic code, so results are consistent across runs rather than drifting based on model mood.
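The six operations can be sketched as small composable functions. Everything below is hypothetical: the function names, the Hit type, and the in-memory INDEX are assumptions made for illustration, not the Agentic Search SDK's actual interface, which the article does not show.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    score: float


# Hypothetical in-memory index standing in for a real search backend.
INDEX = {
    "cve openssl": [
        Hit("https://example.com/a", "OpenSSL advisory", 0.9),
        Hit("https://example.com/b", "Unrelated blog post", 0.2),
    ],
    "openssl vulnerability": [
        Hit("https://example.com/a", "OpenSSL advisory", 0.8),
        Hit("https://example.com/c", "Patch notes", 0.7),
    ],
}


def retrieve(query: str) -> list[Hit]:
    """Retrieval: fetch hits for a single query."""
    return INDEX.get(query, [])


def fanout(queries: list[str]) -> list[Hit]:
    """Fanout: run retrieval for every query concurrently, in query order."""
    with ThreadPoolExecutor() as pool:
        batches = pool.map(retrieve, queries)
        return [hit for batch in batches for hit in batch]


def filter_hits(hits: list[Hit], min_score: float) -> list[Hit]:
    """Filtering: drop hits below a fixed score threshold."""
    return [hit for hit in hits if hit.score >= min_score]


def dedupe(hits: list[Hit]) -> list[Hit]:
    """Deduplication: keep the highest-scoring hit per URL."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.url not in best or hit.score > best[hit.url].score:
            best[hit.url] = hit
    return list(best.values())


def rank(hits: list[Hit]) -> list[Hit]:
    """Ranking: highest score first, URL as a deterministic tie-breaker."""
    return sorted(hits, key=lambda hit: (-hit.score, hit.url))


def render(hits: list[Hit]) -> str:
    """Rendering: compact text that is the only thing the model would see."""
    return "\n".join(f"{hit.score:.2f} {hit.title} <{hit.url}>" for hit in hits)


def pipeline(queries: list[str], min_score: float = 0.5) -> str:
    return render(rank(dedupe(filter_hits(fanout(queries), min_score))))


print(pipeline(["cve openssl", "openssl vulnerability"]))
```

Because filtering, deduplication, and ranking are plain deterministic code, running the pipeline twice on the same input gives the same output.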
Does the performance hold up?
The benchmark numbers are strong. Perplexity led four of five evaluation suites against OpenAI, Anthropic, Exa, and Parallel. On a cybersecurity task involving 200+ CVE advisory reports, Search as Code hit 100% accuracy. Competitors landed below 25%. That’s not a close race.
The 85% token cost reduction is the figure most worth scrutinizing. It’s measured against a non-codegen baseline, which is the relevant comparison, but it is an “up to” figure that comes from Perplexity itself. Real-world savings will vary based on query complexity and how aggressively you use fanout.
Who should care right now
Developers using the Perplexity Agent API can enable this today. It’s also the default in Computer mode. Research and Search modes are reportedly coming later this year.
If you’re building an agent that does lots of parallel lookups, information synthesis, or tasks like security scanning or competitive research, the cost and speed improvements are real enough to evaluate seriously.
The broader point
Search was the obvious first target, but the underlying pattern applies anywhere an agent loops through identical tool calls. The same logic holds for database queries, API orchestration, and multi-step data pipelines. Give the model composable primitives and a sandbox to execute code, then get out of the way. Perplexity didn’t just fix search; they demonstrated an architecture that other systems could follow.
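As one illustration of the pattern outside search, here is a sketch applied to database lookups, using Python's built-in sqlite3 module. The orders table, its rows, and the helper names are all invented for the example. Instead of returning each per-customer lookup to the model, a single generated program loops, aggregates, and returns only the final ranking.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a small in-memory table of made-up orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ada", 10.0), ("ada", 5.5), ("bob", 7.0), ("cy", 1.25)],
    )
    return conn


def total_for(conn: sqlite3.Connection, customer: str) -> float:
    """One 'tool call': total order amount for a single customer."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]


def top_spenders(conn: sqlite3.Connection, customers: list[str], k: int = 2):
    """Loop over the identical calls in code and return only the top k."""
    totals = {customer: total_for(conn, customer) for customer in customers}
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


conn = build_db()
print(top_spenders(conn, ["ada", "bob", "cy"]))
```

The intermediate totals never need to reach the model. It would see only the short ranked list, the same way the search version returns only rendered results.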
How quickly competitors close that gap depends on how much sandbox and SDK infrastructure they are avoiding by sticking with serial calling today. Right now, Perplexity is the only platform doing this natively.
Bottom line: The codegen approach to search is smarter than the serial approach. The benchmarks back it up. The cost reduction alone makes it worth testing if you’re already using the Perplexity API.
Quick verdict: A genuinely clever architecture shift. If you build AI agents or just care about where search is headed, this is worth paying attention to.