Taming Dragons
If you’ve worked with Large Language Models (LLMs), you’ve probably experienced a peculiar kind of cognitive dissonance. On one hand, it feels like magic. The possibilities seem endless. You can generate human-like text, answer questions, even write code. It’s as if we’ve unlocked a new superpower.
But on the other hand, it’s not fully predictable, let alone reliable. It’s as if we’ve discovered a new winged species that could give everyone the gift of flight, except that species happens to be a dragon. A dragon that occasionally and unpredictably breathes fire, destroying things and breaking the trust you so badly want to place in it.
This makes working with LLMs a highly non-trivial engineering challenge. It’s not just about implementation; it’s about taming a powerful but volatile force. So how do we do it? Here are some thoughts:
1. Think small and focused, not big and general
Instead of relying on huge, “do-everything” models, use multiple smaller, cost-efficient models. Each model should handle a specific type of task. It’s the Unix philosophy applied to AI: do one thing and do it well.
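In practice, the heart of such a system can be as simple as a registry that maps each task to its dedicated model. A minimal sketch (the task names and model identifiers here are made up for illustration):

```python
# A minimal sketch: a registry of small, task-specific models instead of one
# general model. Task names and model identifiers are made up for illustration.
TASK_MODELS = {
    "summarize": "small-summarizer-v2",  # fine-tuned for summarization only
    "classify": "intent-classifier-v1",  # fine-tuned for a fixed label set
    "extract": "entity-extractor-v1",    # fine-tuned for entity extraction
}

def model_for(task: str) -> str:
    """Resolve a task to its dedicated model, failing loudly on unknown tasks."""
    if task not in TASK_MODELS:
        raise ValueError(f"No model registered for task: {task!r}")
    return TASK_MODELS[task]
```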
2. Fine-tune relentlessly
Each model should be fine-tuned for its specific task. This not only improves performance but also helps constrain the model’s outputs to a more predictable range.
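What fine-tuning data looks like in practice: narrow, repetitive, task-specific examples. Here’s a sketch that writes training pairs as JSONL; the "messages" schema follows OpenAI’s chat fine-tuning format, so adapt it to whatever stack you actually use:

```python
import json

# Sketch: task-specific training pairs written as JSONL. The "messages" schema
# follows OpenAI's chat fine-tuning format; adapt it to your own stack.
examples = [
    {"messages": [
        {"role": "system", "content": "Classify the intent as one of: refund, shipping, other."},
        {"role": "user", "content": "Where is my package?"},
        {"role": "assistant", "content": "shipping"},
    ]},
    # ... hundreds more narrow, repetitive, task-specific examples
]

with open("intent_train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```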
3. Separate concerns
This approach follows a solid principle in programming: separation of concerns. By breaking down the system into smaller, more focused parts, you make it easier to understand, test, and maintain.
4. Make it testable
One of the biggest challenges with general AI assistants is testing. How do you test something that’s supposed to do everything? By narrowing the scope of each model, you make your system more testable. You’re no longer asking, “Can this AI do anything?” but rather, “Can this AI do this specific thing correctly?”
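And once the scope is that narrow, ordinary testing tools apply. A sketch using pytest, where `classify_intent` stands in for a hypothetical wrapper around the fine-tuned classifier from earlier:

```python
import pytest

# Hypothetical wrapper around the fine-tuned intent classifier from above.
from myapp.models import classify_intent

CASES = [
    ("Where is my package?", "shipping"),
    ("I want my money back", "refund"),
    ("What's the weather like?", "other"),
]

@pytest.mark.parametrize("text,expected", CASES)
def test_intent_classifier(text, expected):
    # A narrow contract ("return one of three labels") is something you can
    # actually assert on, unlike open-ended generation.
    assert classify_intent(text) == expected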
5. Build an agentic architecture
Use a system that routes user requests to the relevant model. Let models validate each other’s output. This narrows the scope of each model and adds layers of checks and balances.
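A rough sketch of what that routing-plus-validation loop might look like, reusing the `model_for` registry from earlier; `call_model` is a hypothetical stub for whatever inference API you use:

```python
def call_model(model: str, prompt: str) -> str:
    """Hypothetical stub; wire this to your actual inference API."""
    ...

def handle(request: str) -> str:
    # A cheap router model picks the task; model_for is the registry above.
    task = call_model("router-v1", f"Route this request: {request}").strip()
    answer = call_model(model_for(task), request)

    # A second model checks the first one's work before anything is returned.
    verdict = call_model(
        "validator-v1",
        f"Request: {request}\nAnswer: {answer}\nReply OK or REJECT.",
    )
    if verdict.strip() != "OK":
        return "Sorry, I couldn't produce a reliable answer for that."
    return answer
```

Each model in this loop stays small and testable on its own, and the validator gives you a place to reject bad output before it ever reaches the user.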
6. Use LLMs as a translation layer
Remember, LLMs work best when they’re used to translate between human language and some other system or API. Avoid using them for general, open-ended tasks. They’re a powerful tool, but they should be just one part of a larger, more structured system.
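Concretely, that means having the model emit structured output and validating it strictly before anything real happens. A sketch, using the same hypothetical `call_model` stub as above:

```python
import json

# Sketch: the LLM only translates free text into a structured call; strict
# validation happens before the real system is ever touched.
ALLOWED_ACTIONS = {"create_ticket", "check_status"}

def to_api_call(user_text: str) -> dict:
    raw = call_model(
        "nl-to-api-v1",
        "Translate into JSON with keys 'action' and 'args': " + user_text,
    )
    payload = json.loads(raw)  # malformed JSON fails here, not downstream
    if payload.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"Disallowed action: {payload.get('action')!r}")
    return payload  # only now is it safe to hand to the real API
```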
7. Run locally if possible
This solves a lot of data privacy issues by design. If you can’t run locally, be extremely clear about how user data is used (or preferably, not used). If you can provide guarantees, even better. This is tricky because collecting user interactions is crucial for improving your product, but transparency and user trust are paramount.
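For example, if you run a local Ollama server with a model already pulled, local inference is one HTTP call and no prompt ever leaves the machine. A sketch, assuming Ollama’s default port and the `requests` library:

```python
import requests

# Sketch: local inference through an Ollama server's REST API, so prompts
# never leave the machine. Assumes Ollama runs on its default port and the
# model has been pulled beforehand (e.g. `ollama pull llama3`).
def generate_locally(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```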
8. Focus on UX
I’ve always believed that UX is more crucial to ML systems than the ML itself. This is even more true with LLMs. Limit interactions to buttons or specific commands where possible. Avoid open-ended text inputs. Set up guardrails throughout the user journey, from inputs to outputs.
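A sketch of what that constraint can look like in code: a fixed set of commands instead of a free-text box, plus a crude check on the output. Command names are illustrative, and `call_model` is the same hypothetical stub as above:

```python
# Sketch: a fixed set of commands instead of a free-text box, plus a crude
# output check. Command names are illustrative; call_model is the same
# hypothetical stub as above.
COMMANDS = {
    "summarize_doc": "Summarize the current document.",
    "explain_error": "Explain the selected error message.",
}

MAX_OUTPUT_CHARS = 2000

def run_command(command: str, context: str) -> str:
    if command not in COMMANDS:
        raise ValueError(f"Unknown command: {command!r}")  # no open-ended input
    output = call_model("small-task-model", COMMANDS[command] + "\n\n" + context)
    if len(output) > MAX_OUTPUT_CHARS:
        # Crude guardrail; real systems want content checks too.
        output = output[:MAX_OUTPUT_CHARS] + "…"
    return output
```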
Remember, no one can truly tame a dragon. What you can do is muzzle it, build fences, and invest in firefighting infrastructure. In other words, constrain the LLM, limit its potential for harm, and have robust error handling and fallback systems in place.
Working with LLMs is exciting, but it’s also a huge responsibility. We’re dealing with systems that can generate human-like text at scale. The potential for both benefit and harm is enormous. As builders, it’s our job to harness the power while mitigating the risks.
This might all sound daunting, and it is. But it’s also an incredible opportunity. We’re at the forefront of a technology that could reshape how humans interact with computers and information. It’s our job to shape that interaction responsibly.
The companies and developers who figure out how to reliably harness LLMs — how to truly “tame the AI dragon” — will be the ones who define the next era of computing.