AIBrew
If you have exams coming up, we feel sorry for you. But we also want to remind you to check in on your friend who has outsourced his thinking to ChatGPT.
In today's newsletter, we'll get into:
- Warp's big bet on building open source with GPT-5.5
- How Braintrust turns customer requests into code with Codex
- What happens when companies become too AI-pilled?
The terminal is no longer just a place you open when Stack Overflow fails you.
Warp, the Rust-based terminal company, is betting big that the future of development looks like coordinating multiple coding agents across local machines, cloud infrastructure, and open-source workflows—all orchestrated by GPT-5.5 and other OpenAI models.
The startup announced this week that it's leaning into agent-driven development, where GPT-5.5 acts as a kind of middleman between your terminal, your IDE, and distributed coding tasks. Instead of you running commands, the model decides what runs where: locally for speed, in the cloud for heavy compute, or tapping open-source repositories for modular components. It's less "write code faster" and more "let the model decide your infrastructure."
The pitch is efficiency—fewer context switches, fewer typos, fewer "wait, where does this actually live?" moments. (Note: It also means fewer excuses for bugs, which is a problem when GPT-5.5 hallucinates SQL queries.)
Warp's move signals something bigger: the terminal, once a place for humans to bark orders at machines, is becoming a negotiation space between you, AI, and your entire development stack. Whether that's liberation or creeping automation dependency depends partly on how well the model understands your infrastructure—and partly on whether you're comfortable letting it try.
🤖 Boston Children's uses AI to unlock new diagnoses.
Boston Children's Hospital deployed OpenAI technology to sift through patient records and genetic data, surfacing connections that human radiologists and geneticists might miss. The result: 40+ rare disease diagnoses that would otherwise have languished in the "we don't know what's wrong" pile. It's less about replacing doctors and more about giving them a research assistant that never sleeps, never forgets a symptom, and doesn't charge overtime. The hospital reports reduced diagnostic burden and faster time-to-treatment. It's a classic "AI as a second pair of eyes" story, and it works because the stakes are measurable (lives saved, not productivity points).
🤖 Strengthening societal resilience with Rosalind Biodefense.
OpenAI expanded access to GPT-Rosalind, a specialized model trained on biodefense, public health, and pandemic-preparedness datasets, to vetted developers and U.S. government partners. Think of it as ChatGPT with a biosecurity clearance. The company is positioning frontier AI as critical infrastructure for the next outbreak or bioterror scenario—and it's doing it with more caution than usual (hence "vetted access"). The subtext: powerful models can be weaponized, so OpenAI's playing gatekeeper. Whether that scales to global pandemics or just makes sense for political cover remains an open question.
🤖 Local LLM advocates predict 'agentic price collapse' vs cloud subscriptions.
/r/LocalLLaMA is flooded with "I'm switching off Claude/OpenAI" posts as developers argue that a $6k RTX 5090 setup costs less over two years than $200+ monthly API bills. (At exactly $200 a month, two years of bills come to $4,800, so the hardware only wins once spend passes about $250 a month or the horizon stretches to 30 months, and that's before electricity.) The community is buzzing about an "agentic price collapse": the moment when open-source models like Llama 4 Scout and Gemma 4 matched flagship performance and made subscription models look ridiculous by comparison. The calculus is simple: a one-time hardware cost beats open-ended rental fees. The catch? You're running inference yourself, managing your own infrastructure, and potentially dealing with model quality that lags commercial endpoints by a few months. It's a recurring pattern: expensive, closed AI → open-source exodus once the math works.
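If you want to check the buy-versus-rent math for your own setup, here is a minimal sketch. The $6,000 hardware price and $200 monthly bill come from the story above; the function names and the $50 monthly running cost (electricity, upkeep) are our own assumptions for illustration, not figures from the community posts.

```python
import math
from typing import Optional


def cumulative_api_cost(monthly_bill: float, months: int) -> float:
    """Total spend on a subscription or API plan over a number of months."""
    return monthly_bill * months


def break_even_months(
    hardware_cost: float,
    monthly_bill: float,
    monthly_running_cost: float = 0.0,  # assumed electricity/upkeep, hypothetical
) -> Optional[int]:
    """First whole month at which owning hardware is no more expensive than renting.

    Returns None if running the hardware costs as much as the bill it replaces.
    """
    monthly_saving = monthly_bill - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Figures from the story: a $6k rig versus a $200/month bill.
print(cumulative_api_cost(200, 24))   # 4800 -> still under the $6k rig at two years
print(break_even_months(6000, 200))   # 30
print(break_even_months(6000, 250))   # 24
# Same rig with an assumed $50/month running cost.
print(break_even_months(6000, 200, monthly_running_cost=50))  # 40
```

The sketch ignores resale value, hardware failure, and your own time spent on maintenance, all of which move the break-even point in one direction or the other.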
Braintrust, a decentralized platform for engineering talent, deployed GPT-5.5 with Codex to transform how it processes feature requests from clients. Instead of engineers manually parsing tickets and building prototypes, the model drafts initial code, runs it against test suites, and flags edge cases—shrinking the feedback loop from hours to minutes.
The workflow looks like this: customer submits a request → GPT-5.5 reads the spec → Codex generates a first pass → tests run automatically → engineers review, tweak, ship. It's not "AI writes production code unattended" (that way lies bugs); it's "AI handles the boilerplate, engineers handle the judgment."
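For the curious, the shape of that pipeline can be sketched in a few lines. This is an illustration under loud assumptions, not Braintrust's implementation: `ReviewItem`, `process_request`, and the three `fake_*` stand-ins are hypothetical names, and the stand-ins replace the real model and test-suite calls, which the announcement does not describe.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReviewItem:
    """What lands in an engineer's review queue (hypothetical structure)."""
    request: str
    code: str
    tests_passed: bool
    flags: List[str] = field(default_factory=list)


def process_request(
    request: str,
    read_spec: Callable[[str], str],
    generate_code: Callable[[str], str],
    run_tests: Callable[[str], Tuple[bool, List[str]]],
) -> ReviewItem:
    """Request -> spec -> first-pass code -> automated tests -> human review."""
    spec = read_spec(request)
    code = generate_code(spec)
    passed, flags = run_tests(code)
    # Nothing ships from here: every result is handed to an engineer.
    return ReviewItem(request, code, passed, flags)


# Stand-ins for the model and test-suite steps (assumptions, not real APIs).
def fake_read_spec(request: str) -> str:
    return request.strip().lower()


def fake_generate_code(spec: str) -> str:
    return f"def feature():\n    return {spec!r}\n"


def fake_run_tests(code: str) -> Tuple[bool, List[str]]:
    flags: List[str] = []
    try:
        compile(code, "<draft>", "exec")  # the only "test" here is a syntax check
    except SyntaxError as exc:
        flags.append(f"syntax error: {exc.msg}")
    return (not flags, flags)


item = process_request(
    "  Export invoices as CSV ", fake_read_spec, fake_generate_code, fake_run_tests
)
print(item.tests_passed, item.flags)
```

The design choice worth noticing is the return type: the pipeline produces a review item, never a deploy, which is the "engineers handle the judgment" half of the arrangement.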
For Braintrust, the upside is throughput—more experiments per engineer, faster iteration. For clients, it means features land faster. The unspoken trade-off: engineers now spend more time reviewing AI code than writing it from scratch, which shifts the skillset from "can you code?" to "can you spot what a model got wrong?" (Note: Those are different skills.)
The broader pattern: every platform offering on-demand engineering is racing to inject AI into the request-to-delivery pipeline. Braintrust's bet is that GPT-5.5 + Codex cuts enough friction that human engineers become a quality gate, not a bottleneck. That works only if the model's error rate stays low enough that review doesn't become a full-time job in itself.
Box founder Aaron Levie called it "AI psychosis": the moment when executives, bedazzled by model benchmarks and cost-cutting fantasies, greenlight mass layoffs to replace roles they don't actually understand with AI systems they haven't tested at scale.
ClickUp recently cut 22% of its workforce, betting on AI agents to fill the gap. Tech layoffs in 2026 are already close to matching the total for all of 2025. The pattern is predictable: CEO reads a TechCrunch headline about GPT-5.5's coding prowess → CEO decides the entire support team can be replaced → CEO discovers, six months later, that what AI excels at on a benchmark doesn't map to what actual customers need when they're angry at 2 a.m.
Levie's point cuts deeper: the people making these decisions—mostly non-technical executives—are the least equipped to assess what a job actually requires versus what an AI demo *claims* to do. A customer success manager who's been with your company for five years knows why customers churn. An LLM knows patterns in chat logs. These are not the same thing.
The fallout: reduced service quality, customer frustration, and (the delicious part) rehiring the people you just fired when the agents fail. It's the AI equivalent of "move fast and break things," except the things that break include your brand and your team's ability to ship anything that doesn't suck.
- GitHub Copilot's new token-based billing is sparking outrage among developers who claim the pricing model penalizes verbose coding and rewards shortcuts over clarity.
- A viral CLAUDE.md workflow has hit 220k GitHub stars by codifying Andrej Karpathy's "vibe coding" strategies for managing AI agents with surgical changes and goal-driven execution.
- Elon Musk announced Grok V9-Medium, featuring 1.5 trillion parameters and advanced coding reasoning, with a pledge to open-source V8-Small by year's end.
- The Vatican issued its first formal AI encyclical, "Magnifica Humanitas," addressing AI's impact on labor, dignity, and human relationships.
- Developers now refuse to work without AI coding assistants, but researchers warn that speed-optimized code isn't always correct code.
- Spotify and UMG launched an AI remix tool that's been roundly criticized as "slop" by the music community and accused of training listeners to accept AI-generated mediocrity.
- The heated GPT-5.5 vs. Claude Opus 4.7 benchmark war shows GPT-5.5 dominating agentic tasks while Claude still edges out OpenAI's model on nuanced reasoning.
- Meta is developing an AI pendant, continuing its hardware strategy to embed AI assistants into everyday wearables.
- Finland's leading newsroom deployed an AI press-release scanner that fabricated a headline about Russian drones entering airspace, exposing the risks of fully automated editorial workflows.
That's all for this week. Stay sharp, keep your API bills under $200/month if you can, and remember: the terminal can't learn your infrastructure if you don't teach it first.