Using LLMs for writing software - so-called “vibe-coding”, but I prefer “grow-coding” - is taking the world of software development by storm. Love it or hate it, using LLM-based tooling for writing code is everywhere right now.
In his Welcome to Gas Town post, Steve Yegge posits a set of developmental stages many engineers go through while using AI. At Stage 1 you’re really not using AI at all. Maybe auto-complete in the IDE, but not really leaning on it. By Stage 5 you’re living in the LLM UI, running in YOLO mode - getting work done using stored prompts, churning out code and performing reviews as fast as you can.
This scaling continues up to Stage 8 - Building your own Orchestration layer. At this point, you’ve hit the limits of how many parallel agents you can juggle, but you just know you can automate more of the boring stuff away. And hey - it’s easy! You spend some time setting up shell scripts or a wrapper and let it manage a small army of software clones.
Of course, there’s a cost. While manually managing more than 2-3 agents feels both like you’re drowning (“I can’t juggle this many tasks!”) and like you’re slowing things down (“All I do is hit enter for the agent.”), once you switch to fully autonomous yolo-mode… you begin to really, really hope that you don’t run afoul of the Lethal Trifecta.
Bicycle or Motorbike?
I like to ride motorcycles. In particular, I love riding street-fighter bikes - they look sort of like bullet bikes but give the rider a more upright, comfortable riding position. They’re still fast, but meant to be more comfortable for longer rides than the tight, tucked position offered by a bullet bike.
After over 20 years in the saddle, I know that accidents are inevitable and while I can do things to reduce the likelihood and impact of a collision - there’s just a certain level of risk I have to accept when I go for a ride. Some of that risk comes from not having a literal steel shell strapped around me (fun fact, motorcyclists refer to cars as ‘cages’ online!), but a far more important factor in the severity of a collision is my velocity.
In my university years, I rode a bicycle everywhere. (I’m too lazy for that now. Especially because I live in Seattle, city of hills.) The highway that ran past the school and down into town had a speed limit of 45 mph. If I really cranked hard on the high gears, I could typically keep pace with traffic thanks to the gravity-boosted descent. So, my top speed was roughly 45 mph on that hill. But I mostly wasn’t on that hill. Mostly, I was tooling around on the much flatter terrain of the valley basin, where I topped out between 10-20 mph.
By comparison, my motorcycle can hit 90+ mph effortlessly - and still have plenty of headroom in the 850’s engine. I don’t really like riding that fast, except in short bursts to get clear of clumped traffic. (Attn traffic officers: I am, of course, lying! I never exceed the posted speed limit and always drive exactly as described in the state driver’s handbook.) Still, I’ll occasionally experience a burst of anxiety as I become uncomfortably aware that I’m sitting on a 60 mile-an-hour chair. Riding is exhilarating, but it can also be frightening in a way I’ve never experienced in a car, regardless of speed.
Either two-wheeled conveyance gets me where I want to go faster than walking, but there are two very different levels of risk here.
On the more manual bicycle, my slower velocity means there’s both more time to react to a problem and less impact if something goes wrong. Sure, there’s not much difference if I get t-boned by a semi whether I’m on a bicycle or motorbike, but generally speaking, if something goes wrong, it will probably go wrong worse on a motorbike.
YOLO should surely mean ‘be careful’
This brings us back to grow-coding. An AI agent that can run code and make shell calls is one mistake away from doing Bad Things. Having a human in the loop is like riding a bicycle. It’s notably faster than going on foot, but you’re still putting in a lot of effort. Thus arrives the temptation to switch your agent of choice to full-auto, yolo-mode.
Now you’re riding a motorcycle. Velocity increases - and so does your risk exposure.
rm -rf is only the beginning
I’ve seen well-intentioned (i.e. not even trying to be malicious) LLM agents erase their project while trying to clean up temp files. I’ve seen them decide to force-add .git/ to a git commit - and then decide to rm -rf .git/ to try to clean up the resulting mess. These sorts of bone-headed mistakes are getting rarer, but rarer is still not enough to truly protect yourself from the Lethal Trifecta.
Back to the Beginning
So, what does this have to do with Stage 8 and orchestration?
Well, it seems I’ve gotten Stage 8 grow-coding.
I downloaded Gas Town and pointed it at a long-idle side project of mine - a fully peer-to-peer social network based on just the parts of Facebook I liked (which is to say, async commenting on my friends’ text/image posts and pretty much nothing else). Back when I gave it up, I’d based it upon the IPFS/IPNS project and, well, it kinda-sorta-not-really worked. Basically, it would work for a few messages, but once it encountered a particular data condition, the peer download logic would get confused and it would never update again. To fix it would have required a deep rewrite.
…So I asked Gas Town to do it.
It took a few hours over the course of a weekend, but by the end of it… the damn thing just works. Gas Town replaced the IPFS underpinning with the more modern Iroh P2P system, fixed the architectural bug in my messaging protocol and well… Huh. Now I have a functional social network app with three people on it, all of whom are me.
I liked the Gas Town experience a lot - that level of automation, of just pushing requirements in and getting a functional app out - it was amazing.
But it was also terrifying. It felt like a 60-mph-chair, without a helmet. My whole life is somewhere in the depths of my computer. And while I was running Gas Town, it was all exposed, waiting for a single mistake or malefactor to steal, delete, or corrupt it all.
Afterward, I decided I needed two things from any agent orchestration system:
- isolation of autonomous agents from the host system
- deterministic control of the overall system behavior
To the first point, I am relying upon containerization. Whatever I place in the container is still potentially vulnerable to Lethal Trifecta-style misbehavior, but the scope of the damage is (largely) limited to whatever I have put inside the container. In my case, that still includes e.g. the credentials I’ve given my agents, not to mention access to a project’s code itself - but that’s already a much smaller surface area than my entire machine.
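To make the idea concrete, here is a minimal sketch of that kind of isolation. It is not how cloche actually does it. It assumes Docker as the container runtime, and the image name, agent command and environment variable name are all hypothetical placeholders. The only point it illustrates is that the container sees a single project directory and only the credentials that are explicitly named.

```python
import shlex
from pathlib import Path


def container_command(project_dir, image, agent_cmd, env_names=()):
    """Build a `docker run` argv that exposes only one project directory.

    Everything passed in (image, agent_cmd, env_names) is a placeholder
    chosen by the caller; nothing here is specific to any real agent.
    """
    project = Path(project_dir).resolve()
    argv = [
        "docker", "run", "--rm",          # throw the container away afterwards
        "--cap-drop", "ALL",              # drop Linux capabilities the agent does not need
        "-v", f"{project}:/workspace",    # the only host path visible inside the container
        "-w", "/workspace",
    ]
    for name in env_names:
        # `-e NAME` with no value forwards the host's value for that one variable.
        argv += ["-e", name]
    argv.append(image)
    argv += list(agent_cmd)
    return argv


if __name__ == "__main__":
    # Hypothetical image, command and variable names, for illustration only.
    cmd = container_command(".", "my-agent-image", ["my-agent", "--yolo"], ["AGENT_API_KEY"])
    print(shlex.join(cmd))
```

Anything mounted or forwarded this way is still fair game for a misbehaving agent, so the list of mounts and variables is the thing to keep short.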
To the second, I am building a workflow DSL. The workflow itself acts as a domain-agnostic glue between shell commands and prompt files. It ensures that deterministic behaviors like “run the unit tests” a) are guaranteed to run and b) don’t waste precious tokens rediscovering ‘cargo test’ in every damn turn.
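The shape of that idea can be sketched in a few lines of Python. This is not cloche’s DSL or its implementation: `ShellStep`, `PromptStep`, `run_workflow` and the `run_agent` callback are all hypothetical names made up for illustration. The sketch only shows the split between steps the workflow runs itself, every time, and steps it hands to an agent.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ShellStep:
    name: str
    argv: list[str]   # a fixed command, e.g. ["cargo", "test"]


@dataclass
class PromptStep:
    name: str
    prompt: str       # text handed to the agent


Step = Union[ShellStep, PromptStep]


def run_workflow(steps: list[Step], run_agent: Callable[[str], bool]) -> list[tuple[str, bool]]:
    """Run steps in order and stop at the first failure.

    Returns (step name, succeeded) for every step that actually ran.
    `run_agent` is a caller-supplied stand-in for whatever drives the LLM.
    """
    results = []
    for step in steps:
        if isinstance(step, ShellStep):
            # Deterministic: the workflow runs the command itself, no tokens spent.
            try:
                ok = subprocess.run(step.argv).returncode == 0
            except OSError:
                ok = False  # command missing or not executable counts as a failure
        else:
            ok = run_agent(step.prompt)
        results.append((step.name, ok))
        if not ok:
            break
    return results
```

A workflow such as `[PromptStep("implement", "..."), ShellStep("unit-tests", ["cargo", "test"])]` would then run the tests after every agent turn without the agent ever having to remember, or rediscover, how.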
Cloche
Therefore, I present cloche, my personal agent workflow/orchestration system. I’ve been running some experiments with it and while there’s still plenty I intend to add/change - I find it already quite useable.
Like any good experiment, cloche does many things totally wrong, but I think I’m getting some important things right as well.
Over the next few posts, I’ll be breaking down some of the things I’ve learned building cloche, some of the things I’ve done already and some of the upcoming features that I hope will keep me building with cloche for the next decade.