AIBrew

If you have exams coming up, we feel sorry for you. But we also want to remind you to check in on your friend who has outsourced his thinking to ChatGPT.

In today's newsletter, we'll get into:

  • Warp's big bet on building open source with GPT-5.5
  • How Braintrust turns customer requests into code with Codex
  • What happens when companies become too AI-pilled?

NEWS
Niv Bavarsky

The terminal is no longer just a place you open when Stack Overflow fails you.

Warp, the Rust-based terminal company, is betting big that the future of development looks like coordinating multiple coding agents across local machines, cloud infrastructure, and open-source workflows—all orchestrated by GPT-5.5 and other OpenAI models.

The startup announced this week that it's leaning into agent-driven development, where GPT-5.5 acts as a kind of middleman between your terminal, your IDE, and distributed coding tasks. Instead of you running commands, the model decides what runs where: locally for speed, in the cloud for heavy compute, or against open-source repositories for modular components. It's less "write code faster" and more "let the model decide your infrastructure."

The pitch is efficiency—fewer context switches, fewer typos, fewer "wait, where does this actually live?" moments. (Note: It also means fewer excuses for bugs, which is a problem when GPT-5.5 hallucinates SQL queries.)

Warp's move signals something bigger: the terminal, once a place for humans to bark orders at machines, is becoming a negotiation space between you, AI, and your entire development stack. Whether that's liberation or creeping automation dependency depends partly on how well the model understands your infrastructure—and partly on whether you're comfortable letting it try.

TOUR DE HEADLINES
Source: Puck News

🤖 Boston Children's uses AI to unlock new diagnoses.

Boston Children's Hospital deployed OpenAI technology to sift through patient records and genetic data, surfacing connections that human radiologists and geneticists might miss. The result: 40+ rare disease diagnoses that would've otherwise languished in the "we don't know what's wrong" pile. It's less about replacing doctors and more about giving them a research assistant that never sleeps, never forgets a symptom, and doesn't charge overtime. The hospital reports reduced diagnostic burden and faster time-to-treatment, a classic "AI as a second pair of eyes" story that actually works because the stakes are measurable (lives saved, not productivity points).

🤖 Strengthening societal resilience with Rosalind Biodefense.

OpenAI expanded access to GPT-Rosalind, a specialized model trained on biodefense, public health, and pandemic-preparedness datasets, to vetted developers and U.S. government partners. Think of it as ChatGPT with a biosecurity clearance. The company is positioning frontier AI as critical infrastructure for the next outbreak or bioterror scenario—and it's doing it with more caution than usual (hence "vetted access"). The subtext: powerful models can be weaponized, so OpenAI's playing gatekeeper. Whether that scales to global pandemics or just makes sense for political cover remains an open question.

🤖 Local LLM advocates predict 'agentic price collapse' vs cloud subscriptions.

/r/LocalLLaMA is flooded with "I'm switching off Claude/OpenAI" posts as developers argue that a $6k RTX 5090 setup costs less over two years than $200+ monthly API bills. (Strictly, the two-year math works only once bills pass $250 a month; at a flat $200, break-even takes 30 months.) The community is buzzing about an "agentic price collapse": the moment when open-source models like Llama 4 Scout and Gemma 4 match flagship performance and make subscription pricing look ridiculous by comparison. The calculus is simple: a one-time hardware cost beats open-ended rental fees. The catch? You're running inference yourself, managing your own infrastructure, and potentially dealing with model quality that lags commercial endpoints by a few months. It's a recurring pattern: expensive, closed AI → open-source exodus once the math works.
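For readers who want to check the hardware-versus-subscription math themselves, here is a minimal Python sketch. It uses only the two figures quoted above ($6,000 of hardware, $200 a month in API bills) and deliberately assumes away electricity, resale value, and the cost of your own time. The function names are ours, purely for illustration.

```python
import math


def break_even_months(hardware_cost: float, monthly_bill: float) -> int:
    """Whole months of API spend needed to equal a one-time hardware cost."""
    if monthly_bill <= 0:
        raise ValueError("monthly_bill must be positive")
    return math.ceil(hardware_cost / monthly_bill)


def min_monthly_bill(hardware_cost: float, horizon_months: int) -> float:
    """Monthly API bill at which hardware pays for itself within the horizon."""
    if horizon_months <= 0:
        raise ValueError("horizon_months must be positive")
    return hardware_cost / horizon_months


# Figures quoted in the item above; everything else is ignored (assumption).
HARDWARE_COST = 6_000
MONTHLY_BILL = 200

print(break_even_months(HARDWARE_COST, MONTHLY_BILL))  # 30
print(min_monthly_bill(HARDWARE_COST, 24))              # 250.0
```

Swap in your own bill and hardware quote. The break-even point moves quickly once monthly spend climbs past a few hundred dollars.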

NEWS
Generated by the crew

Braintrust, a decentralized platform for engineering talent, deployed GPT-5.5 with Codex to transform how it processes feature requests from clients. Instead of engineers manually parsing tickets and building prototypes, the model drafts initial code, runs it against test suites, and flags edge cases—shrinking the feedback loop from hours to minutes.

The workflow looks like this: customer submits a request → GPT-5.5 reads the spec → Codex generates a first pass → tests run automatically → engineers review, tweak, ship. It's not "AI writes production code unattended" (that way lies bugs); it's "AI handles the boilerplate, engineers handle the judgment."
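To make that request-to-review loop concrete, here is a rough Python sketch of its shape. It is not Braintrust's code and calls no real OpenAI or Codex API: `fake_generate` and `fake_run_tests` are hypothetical stand-ins for the model and the test suite. The point it illustrates is the last step, where nothing ships without an engineer's sign-off.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    request: str
    code: str
    tests_passed: bool = False
    notes: list[str] = field(default_factory=list)


def process_request(
    request: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str], bool],
) -> Draft:
    """Draft code for a request and record whether the tests pass."""
    draft = Draft(request=request, code=generate(request))
    draft.tests_passed = run_tests(draft.code)
    if not draft.tests_passed:
        draft.notes.append("tests failed: needs rework before review")
    return draft


def ship(draft: Draft, approved_by_engineer: bool) -> bool:
    """A draft ships only if tests pass AND a human approved it."""
    return draft.tests_passed and approved_by_engineer


# Hypothetical stand-in for the model: returns a canned first pass.
def fake_generate(request: str) -> str:
    return "def add(a, b):\n    return a + b\n"


# Hypothetical stand-in for the test suite: runs the draft, checks one case.
def fake_run_tests(code: str) -> bool:
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; sandbox real model output
    return namespace["add"](2, 3) == 5


draft = process_request("add two numbers", fake_generate, fake_run_tests)
print(ship(draft, approved_by_engineer=False))  # False
print(ship(draft, approved_by_engineer=True))   # True
```

Passing tests alone never flips `ship` to true. The human approval flag is the quality gate.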

For Braintrust, the upside is throughput—more experiments per engineer, faster iteration. For clients, it means features land faster. The unspoken trade-off: engineers now spend more time reviewing AI code than writing it from scratch, which shifts the skillset from "can you code?" to "can you spot what a model got wrong?" (Note: Those are different skills.)

The broader pattern: every platform offering on-demand engineering is racing to inject AI into the request-to-delivery pipeline. Braintrust's bet is that GPT-5.5 + Codex cuts enough friction that human engineers become a quality gate, not a bottleneck, which works only if the model's error rate stays low enough that review doesn't become a full-time job in itself.

NEWS
Getty Images

Box founder Aaron Levie called it "AI psychosis": the moment when executives, bedazzled by model benchmarks and cost-cutting fantasies, greenlight mass layoffs to replace roles they don't actually understand with AI systems they haven't tested at scale.

ClickUp cut 22% of its workforce recently, betting on AI agents to fill the gap. Tech layoffs in 2026 are already close to matching the total for all of 2025. The pattern is predictable: CEO reads a TechCrunch headline about GPT-5.5's coding prowess → CEO decides the entire support team can be replaced → CEO discovers, six months later, that what AI excels at on a benchmark doesn't map to what actual customers need when they're angry at 2 a.m.

Levie's point cuts deeper: the people making these decisions—mostly non-technical executives—are the least equipped to assess what a job actually requires versus what an AI demo *claims* to do. A customer success manager who's been with your company for five years knows why customers churn. An LLM knows patterns in chat logs. These are not the same thing.

The fallout: reduced service quality, customer frustration, and (the delicious part) rehiring the people you just fired when the agents fail. It's the AI equivalent of "move fast and break things," except the things that break include your brand and your team's ability to ship anything that doesn't suck.

That's all for this week. Stay sharp, keep your API bills under $200/month if you can, and remember: the terminal can't learn your infrastructure if you don't teach it first.