The Scarce Thing
This week I'm in Switzerland, at a retreat on the future of software development, in a room full of people who have thought about that future longer and harder than I have. I haven't come to out-theorize them. I've come with a sample of one — a single project, built in retirement with an agent as co-author — and a few months' worth of posts arguing about what it might mean. This is the map of those arguments: the overall shape in one place, with each strand linked to where I make the case in full. If you only have room for the conclusion, it's at the bottom. Everything between here and there is how I got to it.
And I'm bringing them to explore, not to defend. It's an unconference — people propose what they most want to dig into and gather around it — so these are the questions I'm hoping to put on the board. I expect to be made welcome; it's that kind of room, and I'm a friendly enough stranger to it. Whether the ideas are made as welcome is the part I can't predict — and, honestly, the reason I'm going. A sample of one is exactly the kind of thing a room like this is built to stress-test, and I would rather find the cracks here than never find them at all.
Start with the observation the rest depends on, because if it's wrong, the rest is decoration. Over a few months I've been building Roundhouse, a compiler that reads an ordinary Rails application — untyped Ruby — and emits standalone, statically typed projects in Rust, Crystal, TypeScript, Go, and more, each compiling clean and passing its tests. I want to be precise about my role in that, because it's the whole point: I did not write the compiler. I can't write a compiler; I've never written one, and I can't read the Rust it generates. What I wrote was the thing the compiler has to satisfy — a fixture, a set of tests, a gate that fetches the same URL from Rails and from each target and checks that the responses are byte-for-byte identical. The agent wrote whatever passed it. The Oracle Is the Asset is the long version of what follows: when the agent holds the implementation depth you lack, the implementation stops being the thing you author. The thing you author is the oracle — the definition of what "correct" means — and the implementation is generated to satisfy it.
That project is one data point, and I've tried throughout to treat it as one. From a Sample of One is me holding it next to the findings of the retreat that preceded this one, looking for where my view and the room's diverge, and assuming the difference is a difference of regime — one retired person with an agent is not a team of twenty inside a company — until proven otherwise. Hold that humility over everything below.
The cost collapsed
Here is what makes the data point worth a post rather than a shrug. Whole-program type inference over a real framework, with a conformance oracle that mechanically decides whether an implementation is correct, is the kind of artifact that used to require an institution: a funded compiler team, a vendor, a research lab, or a standards body with member companies and a multi-year process. I spent years in that last world, and the executable conformance suite was always the expensive half of standardization — the part that often never got fully built. What changed is not that the agent writes code faster; everyone in this room has metabolized that already. What changed is that the cost of authoring the artifacts that used to require institutions has collapsed to a person and a few months. Conformance vs Comprehension is the full argument, and it's the one I'd most want this room to sit with: cheaper construction doesn't mean less software, it means we author orders of magnitude more of the things — oracles, compilers, domain standards — that were previously too expensive to build at all.
Rigor moved from comprehension to conformance
It changed how I work, not just what I can afford to build. The retreat's largest question was where engineering rigor goes once the agent writes the code. My answer is the uncomfortable one: for three months, on this project, conformance didn't augment my comprehension of the code — it replaced it. I never read the generated code, not once, and couldn't have judged it if I had. That sounds reckless until you see that the understanding didn't vanish, it moved. I understood the problem completely — which behaviors mattered, what correct output looked like byte for byte — and I authored the oracle, and the code is that oracle's output. Reading it to reassure myself would be checking the machine's work against the input I handed the machine. Conformance vs Comprehension draws the distinction I think our field keeps getting backwards — between the conformance test that does the overwhelming majority of the work and carries almost none of the cognitive load, and the written spec that is only ever the court of last appeal. We lavish our attention on the appellate court and starve the trial court that handles every actual case.
The working relationship inverts
If the agent holds the depth and you hold the problem, the management relationship inverts — and it turns out the inversion was described long ago. The Drucker Inversion is the structural version: Drucker noticed in 1959 that knowledge workers know more than their managers and have to be managed by objective rather than by method; the agent is that worker, and the principal directs by outcome because the outcome — the oracle — is the only place the principal's control and the agent's freedom meet. What Do You Recommend? is the same idea from the practitioner's end: the single embarrassingly small change that turned my own results from modest to transformative was to stop issuing choices and start asking for judgment, because on the specifics in front of us the agent usually knew more than I did. And because building the wrong thing got cheap, an exploratory style that used to be too expensive to indulge becomes rational — Bring Me a Rock is about how a management dysfunction inverts into a defensible method once iteration costs minutes instead of days and no one absorbs a cost they didn't choose. Each of those three is honest about the same unresolved seam: the agent contributes like a peer and has the stake of a tool, and I don't yet know how much of "peer" survives that.
The ladder started moving again
Step back from the one project and the motion looks older and bigger than any of it. For fifteen years the place where software gets defined has been climbing — language to framework, and then it kept going — and each rung was reached by building a tool, an analyzer or a compiler, that read the whole program and made that rung's ceremony disappear. The ladder stalled not because we ran out of ideas about where abstraction should go next, but because building those tools required institutions. The Rungs is the whole climb: the framework becomes a whole-app compiler (Rails was already typed — its conventions had been carrying a type system for twenty years), the compiler goes full-stack, the targets stop being exercises for the student, and finally — the rung this AI-native moment actually forces — the compiler turns around and answers questions about everything it made invisible. That last rung is real today, not a prediction: Live Types for Rails is the same inference that emits nine languages, pointed at an editor and an agent instead of a code generator, answering what is the type here, can this be nil, what won't survive ejection with no annotations and no running app. The payoffs underneath are real too — The Compilers Were Ready on why shape-stable output lets every existing compiler do its best work, and The Ruby JRuby Was Built to Run on the order of magnitude that buys. But the durable thing was never the compiler, or the speed. It was the oracle every rung answers to.
The scarce thing
Which is where all the strands meet, and where I'll leave it for the room. If the language is a substrate you don't think about, the framework's idioms are invisible, the types are inferred, the boundaries are compiled away, the targets are chosen for you, and the agent generates the implementation against a compiler that certifies it — then what is left for a person to author? The answer the whole climb converges on is: the definition of correct. The oracle. The statement of what counts as done, against which everything below is generated and replaced at will. That is the one input the agent still doesn't supply, because the one thing no layer beneath you can infer is what you were trying to build in the first place.
So the scarce input is about to change. When construction was scarce, we organized everything around who could build. When hundreds of people can each author a conformance oracle over a weekend — and I think that's exactly where this goes — construction stops being the bottleneck and judgment becomes it: judgment about which behaviors are even worth pinning down, which things are worth making an oracle of. I don't know how we get good at that, or who decides, or whether peer is the right word for the participant helping us do it. I've been honest elsewhere about how little one project can settle, and the numbers come without conclusions on purpose. But I'm fairly sure the question has stopped being who can build the thing code answers to, and become whether we still know which things are worth answering to. That's the conversation I came to Switzerland for.
I'll post an update when I'm back — once I've found out how much of this survives contact with a room that knows more than I do.
Roundhouse is open source: dual-licensed MIT / Apache-2.0. Issues and discussion welcome.