Large Language Models

Should we invent “Vibe Founding”?

I find it astonishing to see so many new AI companies, like Moonshot AI (Kimi K2) and Zhipu AI (GLM), building models that actually compete with Not-so-OpenAI. Even Grok is in the game, somehow.

While Apple can’t ship a working AI assistant 🫣.

In theory (and practice), Apple has unlimited money, the best talent, a billion devices, and the ecosystem that comes with that. The startups, on the other hand, just have smart people and funding. But then, Apple has those too, only more of them.

It’s the size. It matters. Apple’s size has stopped being an advantage, obviously. A startup can organize everything around AI while Apple has to add AI to a company that’s organized around hardware, retail, services, and a dozen other things. It’s like they’re renovating while people are living in the house.

It’s way easier to start something new than to change something that already exists, provided you have access to capital and talent. Musk figured this out (apparently). He’s basically invented Vibe Founding: you have enough money and personal brand that talented (if naive) engineers will join you, and you can spawn a new company, train absurdly large models, and compete with established players in a couple years. The fact that xAI burns through resources and ignores environmental regulations doesn’t seem to matter.

Vibe founding has the same problem as vibe coding, though. Vibe coding gives you a prototype fast but leaves technical debt everywhere. Vibe founding gives you a company fast, but the debt is ecological and social. Musk’s platforms now amplify (his) misinformation and dangerous ideology at scale. What we saw in the recent incidents is probably just the start.

The Shifting Landscape of AI: The Rise of China


Generated using Draw Things.

On September 30, 2025, something unremarkable happened: a Chinese AI company called Z.ai released GLM-4.6, a new large language model. The release came one day after Anthropic shipped Claude Sonnet 4.5, roughly seven weeks after OpenAI launched GPT-5. Three frontier models in two months. Just another week in AI.

Except it wasn’t.

What made GLM-4.6 significant wasn’t its performance, though it achieves near-parity with Claude Sonnet 4 (a 48.6% win rate). It wasn’t even that it trails Claude Sonnet 4.5 in coding, currently the benchmark task that matters most. What mattered was the price tag.

GLM-4.6 costs $0.50 per million input tokens and $1.75 per million output tokens. Claude Sonnet 4.5? $3 input, $15 output. That’s 6-8.5 times cheaper for roughly comparable performance. Put another way: developers paying $200/month for Claude Max can get similar coding assistance through GLM for $3-15/month.
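A quick sanity check on that gap, as a minimal sketch using only the per-million-token rates quoted above:

```python
# Per-million-token API rates quoted above, in USD.
glm_46 = {"input": 0.50, "output": 1.75}
sonnet_45 = {"input": 3.00, "output": 15.00}

for kind in ("input", "output"):
    ratio = sonnet_45[kind] / glm_46[kind]
    print(f"{kind}: Sonnet 4.5 costs {ratio:.1f}x more than GLM-4.6")

# input: Sonnet 4.5 costs 6.0x more than GLM-4.6
# output: Sonnet 4.5 costs 8.6x more than GLM-4.6
```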

This is the kind of price disruption that doesn’t just change markets. It restructures them.

Taming Dragons

Generated using Draw Things.

If you’ve worked with Large Language Models (LLMs), you’ve probably experienced a peculiar kind of cognitive dissonance. On one hand, it feels like magic. The possibilities seem endless. You can generate human-like text, answer questions, even write code. It’s as if we’ve unlocked a new superpower.

But on the other hand, it’s not fully predictable, let alone reliable. It’s as if we’ve discovered a new species of bird that could give everyone the power of flight, except that species happens to be a dragon. A dragon that occasionally and unpredictably breathes fire, destroying things and breaking the trust you so badly want to place in it.

This makes working with LLMs a highly non-trivial engineering challenge. It’s not just about implementation; it’s about taming a powerful but volatile force. So how do we do it? Here are some thoughts:
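One of them, as a taste: never consume raw model output directly. Validate it against the structure you expect and retry on failure. A minimal sketch, where `call_llm` is a hypothetical stand-in for whatever client you actually use:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your actual LLM client call."""
    raise NotImplementedError

def ask_for_json(prompt: str, required_keys: set[str], max_tries: int = 3) -> dict:
    """Ask the model for JSON; retry until it parses and has the keys we need."""
    for _ in range(max_tries):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
            if isinstance(data, dict) and required_keys <= data.keys():
                return data
        except json.JSONDecodeError:
            pass  # the dragon breathed fire; ask again
        prompt += "\nRespond with valid JSON containing the requested keys only."
    raise ValueError(f"no valid response after {max_tries} attempts")
```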

The Great Reversal

Comparing small AI models on performance vs cost. Credits: Artificial Analysis.

Something interesting is happening in AI. After years of “bigger is better,” we’re seeing a shift towards smaller, more efficient models. Mistral just released NeMo and OpenAI unveiled GPT-4o mini. Google’s in on it too, with Gemini Flash. What’s going on?

It’s simple: we’ve hit diminishing returns with giant models.

Training massive AI models is expensive. Really expensive. We’re talking millions of dollars and enough energy to power a small town. For a while, this seemed worth it. Bigger models meant better performance, and tech giants were happy to foot the bill in the AI arms race.

But here’s the thing: throwing more parameters at the problem only gets you so far. It turns out that data quality matters way more than sheer model size. And high-quality data? That’s getting scarce.
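To put one number on “only gets you so far”: the Chinchilla scaling analysis (Hoffmann et al., 2022) fit pretraining loss as L(N, D) ≈ E + A/N^α + B/D^β, where N is parameter count and D is training tokens. A sketch using the paper’s published constants, as an illustration rather than a claim about any particular model:

```python
# Chinchilla scaling law fit (Hoffmann et al., 2022):
#   loss(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the paper's published fits.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Each 10x jump in parameters, at a fixed 1.4T-token dataset,
# buys a smaller loss reduction than the last one:
for n in (7e9, 70e9, 700e9):
    print(f"N = {n:.0e} params -> loss ~ {loss(n, 1.4e12):.3f}")
```

Both terms decay as power laws, so each jump in parameter count shaves off a smaller slice of loss while the data term stays put. Past a point, the remaining gains live in the D term, which is exactly where the scarcity bites.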