From a Sample of One

2026-05-24T17:28:56Z

Thoughtworks recently published The future of software engineering — retreat findings and strategic insights. It is the synthesis of a multi-day, Chatham-House-Rule retreat among senior engineering practitioners from major technology companies. The people in the room have depth of experience I do not, and the document itself is honest about uncertainty: it explicitly says it produced "a map of the fault lines" rather than a roadmap, and the closing quote is "the retreat didn't produce a roadmap. It produced a shared understanding that the map is being redrawn and that the people best positioned to draw it are the ones willing to admit how much they don't yet know." That framing reads as an invitation to add to the map, not as a statement of conclusions.

I would like to take up that invitation.

My vantage point is small. One project — Roundhouse — built over a few weeks by one retired developer with Claude as a co-author. No employer, no quarterly cadence, no team to coordinate with, no organizational governance to navigate. A sample size of one project, in a regime the retreat probably didn't have many examples of in the room. What follows are three places where my experience suggests something different from the report, framed as questions rather than disagreements. I am extrapolating from a small sample, and the difference between what I see and what the retreat sees may be entirely a difference of regime.

On the role of product management

The report says, in the section on the future of roles:

Nobody at the retreat could define what product managers will do in an AI-driven world.

From where I sit, the workflow I have been operating under for the past few weeks has a fairly specific shape, and I have been calling it the Drucker Inversion. The principal — me — does not have the implementation depth in the technical domain (compiler construction, codegen across multiple target languages). The agent does. My job is to frame the problem, evaluate proposals against the framing, and direct by objective rather than by method. The four structural moves I named in that post — constrain by outcome rather than method; decompose into units small enough to fail visibly; publish diagnostically rather than declaratively; keep artifacts inspectable at the principal's level of expertise — map almost line-for-line onto what I would describe as the technical-product-manager skill stack. Problem framing. Outcome specification. Scope judgment. Decomposition. Taste. Stakeholder narration. Choosing the abstraction layer at which decisions get made.

That suggests a different reading of the retreat's open question. The role convergence the report notes (PM, developer, designer all blurring) may be less "we don't know what PMs will do" and more "the workflow promotes PM-shaped work to the load-bearing role, and the developers thriving in it are the ones who already had PM instincts." If that reframe holds, the open question shifts. It is not "what will product managers do." It is "how do developers who lacked PM instincts develop them, and what does the professional pathway for that growth look like?"

I cannot generalize this from one project. But I can say that the four structural moves above are not optional decoration — they are what makes the inversion sustainable, and they are not engineering moves. They are management moves applied to the opposite end of the principal-agent relationship.

On expressiveness versus safety

The report says, in the section on programming languages for agents:

Languages that favor expressiveness over safety make both agent generation and human review harder.

Roundhouse happens to be a case where I can put data next to this. It reads a non-statically-typed Rails application and emits standalone projects in up to seven typed targets — Rust, Crystal, TypeScript, Go, Python, Elixir, plus a Ruby round-trip. The same concept exists in both forms. I can count.

The Article model in the standard Rails blog scaffold is eight lines of Ruby, blank lines included:

class Article < ApplicationRecord
  has_many :comments, dependent: :destroy

  broadcasts_to ->(_article) { "articles" }, inserts_by: :prepend

  validates :title, presence: true
  validates :body, presence: true, length: { minimum: 10 }
end

The same concept, emitted to Crystal, is 218 lines. Emitted to Rust it spans three files totalling 469 lines. The single line has_many :comments, dependent: :destroy becomes a sixteen-line typed comments method with explicit SQL plus a before_destroy cascade. The single line broadcasts_to becomes three separate after_*_commit methods with fully-materialized broadcast calls. The validations become an explicit validate method with branches per rule. The ratio sits somewhere between twenty and sixty times by line count, depending on the target.
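The line-count ratio can be reproduced with a few lines of Ruby. This is a minimal sketch: the source is counted as it is printed above (blank lines included), the emitted totals are the 218 and 469 figures quoted in the text, and the names `ARTICLE_SOURCE`, `EMITTED_LINES`, and `expansion_ratios` are hypothetical helpers for illustration, not part of Roundhouse.

```ruby
# The Article model as printed above, held as text so its lines can be counted.
ARTICLE_SOURCE = <<~'RUBY'
  class Article < ApplicationRecord
    has_many :comments, dependent: :destroy

    broadcasts_to ->(_article) { "articles" }, inserts_by: :prepend

    validates :title, presence: true
    validates :body, presence: true, length: { minimum: 10 }
  end
RUBY

# Emitted line totals quoted in the text (not measured here).
EMITTED_LINES = { "Crystal" => 218, "Rust" => 469 }.freeze

# Ratio of emitted lines to source lines, per target.
def expansion_ratios(source, emitted)
  source_lines = source.lines.count
  emitted.transform_values { |count| count.to_f / source_lines }
end

expansion_ratios(ARTICLE_SOURCE, EMITTED_LINES).each do |target, ratio|
  puts format("%-8s %.1fx", target, ratio)
end
```

Counting only non-blank source lines would push both ratios higher, so the figure depends on the counting convention as well as the target.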

That difference matters for two readers the retreat cares about. For a finite-context LLM reasoning about intent across a multi-model application, the dense form fits and the expanded form does not: if token counts roughly track line counts, the token budget per model concept differs by the same twenty to sixty times. For a principal verifying that the model matches requirements, the dense form describes intent (this resource has these associations, broadcasts these changes, enforces these validations) and the expanded form describes mechanism (these SQL strings, these escape calls, these nil-coalesce branches). The first is verifiable in seconds. The second is not.

There is a nuance the report's framing collapses. The ML language tradition — OCaml, F#, Scala, Haskell — has been demonstrating for forty years that type inference dissolves the apparent tradeoff between expressiveness and safety. The programmer writes expression-shaped code, the compiler infers types, and the safety property holds without the reader paying an annotation tax. Whole-program inference, which both Spinel and Roundhouse perform, extends this further: even the boundaries that local inference cannot close — cross-module, cross-file, polymorphic containers — close without annotation.

The question I would put back to the retreat: is the right axis really "expressive versus safe," or is it "annotated versus inferred"? And does the artifact the agent generates need to be the same artifact the human edits? Roundhouse's existence is partial evidence that the answer to the second question can be no. The principal edits the dense, typed-DSL form (Rails was already typed; has_many is itself a type declaration). The compiler reads that as ground truth and emits the annotation-heavy form for the target. Neither audience pays the other's cost.

This conclusion is downstream of an unusual project shape — a transpiler whose job is to produce the typed form from the expressive form. It may not generalize. But the existence proof of one source compiling to five typed targets, all of which compile clean and pass their tests, is at least suggestive that the framing might be worth re-examining.

On sprint cadence and the iteration unit

The report says, in the section on agile evolution:

Some teams are compressing sprint cadences to one week.

Two earlier posts of mine sit next to this. Nine Days describes a Rails blog application transpiled to six languages, with downloadable binaries and 126+ tests passing, in nine days from first commit. Even on a conservative reading, that is at minimum six sprint-equivalents, and almost certainly more if you count discarded prototypes that never landed. Choose Your Own Adventure names the asymmetry directly: a per-resource specialization PR took 24 minutes of commit time, after several hours of decision time. The handoff document that planned it estimated three to five days.

If I do the ceremony arithmetic from there, something interesting happens. Sprint overhead (planning, retro, demo) is roughly constant per sprint. At two-week cadence it is around five percent of cycle time. At one-week cadence it is around ten percent. At one-day cadence it is around fifty percent. Compressing the cadence does not just speed up the same practice — at some point the overhead structure stops paying for itself. And at team size one, the purpose of most of the ceremony (cross-human coordination, status visibility, planning alignment) drops to zero. You do not compress it. You delete it.
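A minimal sketch of that arithmetic, assuming a fixed four hours of ceremony per sprint and an eight-hour working day. Both figures are assumptions chosen for illustration because they reproduce the percentages above; `ceremony_share` is a hypothetical helper name.

```ruby
CEREMONY_HOURS = 4.0 # assumed: planning + retro + demo, constant per sprint
HOURS_PER_DAY  = 8.0 # assumed working day

# Fraction of a sprint spent on ceremony, for a sprint of the given length.
def ceremony_share(working_days, ceremony_hours: CEREMONY_HOURS, hours_per_day: HOURS_PER_DAY)
  ceremony_hours / (working_days * hours_per_day)
end

{ "two-week" => 10, "one-week" => 5, "one-day" => 1 }.each do |cadence, days|
  puts format("%-8s cadence: %.0f%% ceremony", cadence, ceremony_share(days) * 100)
end
```

Because the numerator is constant, halving the sprint length doubles the share. The curve is a hyperbola, not a line, which is why the overhead looks harmless at two weeks and dominant at one day.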

A second thing happens at this scale. The unit being iterated over changes. Extreme Programming's smallest unit of value was a feature. The unit I am iterating over looks more like an architectural bet: should the IR specialize on the closed axis or the open one? Does Prisma's split generalize to a multi-target transpiler? Is whole-program inference enough to close the typed-target boundary, or do we need per-resource specialization first? Each of those is a question that gets probed, verified, kept or discarded. The Prisma-axis decision in Borrowed from Prisma was exactly this shape: hours of thinking, 24 minutes of commits, a verified result, an architectural commitment.

That regime makes some classical observations relevant again. Brooks's "plan to throw one away" — which he himself recanted in 1995 on the grounds that incremental delivery meant you would inevitably ship the throwaway — returns to applicability when implementation cost drops below decision cost. The next iteration really is 24 minutes away. Exploration budgets become tractable: a bet with seventy percent probability of success becomes rational because the cost of being wrong is a session, not a quarter, and three independent seventy-percent bets in series have ninety-seven percent probability that at least one lands.
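The ninety-seven percent figure follows from the complement rule: the chance that at least one of n independent bets lands is one minus the chance that all of them fail. A minimal sketch, with `at_least_one_success` as a hypothetical helper name and independence between bets taken as an assumption:

```ruby
# Probability that at least one of `attempts` independent bets succeeds,
# when each succeeds with probability `p`.
def at_least_one_success(p, attempts)
  1.0 - (1.0 - p)**attempts
end

(1..3).each do |n|
  puts format("%d bet(s) at 70%%: %.1f%% chance at least one lands", n, at_least_one_success(0.7, n) * 100)
end
```

The independence assumption carries the result. If successive bets fail for a shared reason, such as a wrong framing of the problem, the real figure is lower.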

The enabler — and this matters — is a verifiable acceptance criterion you can run cheaply. Roundhouse has a byte-identical cross-target compare gate: any approach the agent takes is acceptable if the lowered output produces byte-identical HTTP responses across CRuby, TypeScript, Crystal, Rust, and Spinel against the same Rails source. Without that oracle, "throw it away" has nothing to compare to and the exploration regime cannot work. The forcing function is part of the practice, not a precondition you can skip. Projects that lack a cheap verification surface cannot access this mode.
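To make the shape of such a gate concrete, here is a minimal sketch under stated assumptions. It is not Roundhouse's actual implementation: the method name `divergent_targets`, the target labels, and the idea of collecting response bodies into a hash are all hypothetical. The only property it illustrates is that the check is a byte-level comparison against a reference, so it either passes or names the targets that differ.

```ruby
# Given response bodies keyed by target name, return the targets whose
# bytes differ from the reference target's response.
# String#b yields a binary (ASCII-8BIT) copy, so the comparison is by
# bytes and ignores any encoding labels on the strings.
def divergent_targets(responses, reference:)
  expected = responses.fetch(reference).b
  responses.reject { |_target, body| body.b == expected }.keys
end

# Hypothetical captured responses for one request against the same source.
responses = {
  "cruby"   => "<h1>Articles</h1>\n",
  "rust"    => "<h1>Articles</h1>\n",
  "crystal" => "<h1>Articles</h1>" # missing trailing newline
}

failures = divergent_targets(responses, reference: "cruby")
puts failures.empty? ? "gate passed" : "gate failed for: #{failures.join(', ')}"
```

A check like this is cheap because it needs no judgment about which differences matter: any difference fails. That strictness is what lets a discarded approach be compared against its replacement without a human in the loop.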

The question I would put to the retreat: is sprint compression a continuous trend, or is there a phase change near team size one where the iteration unit itself shifts from features to architectural bets, and where the right enabling investment is the forcing function rather than the cadence? Extreme Programming was a paradigm shift relative to waterfall. I sense that another shift of comparable magnitude may be possible, and that its slogan is something closer to iterate on bets, not on features.

On orthogonal slices of the same elephant

The retreat's findings describe enterprise teams of many humans. My observations come from a project of one human plus one LLM. These may not be the same regime, and several of the report's findings — the speed mismatch where agents burn through backlogs and hit cross-team walls; decision fatigue at middle management; governance as the real threat to agile — describe the cross-organizational boundary I never run into, because I do not have one. The report is correct about that boundary. I am simply outside it.

So I am not arguing that the retreat's synthesis is wrong. I am suggesting that some of its findings may describe the N-humans regime well and the N-equals-one-plus-LLM regime less well, and that the two regimes might have qualitatively different answers to the same questions. The PM role question, the language-design question, and the sprint-cadence question all look different from this end of the telescope. None of that displaces what the retreat saw. It might add to the map.

I am extrapolating from a sample of one project, and the retreat is synthesizing across many practitioners. Neither sample is privileged. None of us has the answers; we are all making educated guesses. The point of writing this is to put one more vantage point next to theirs, in the spirit the report explicitly invites — that the people best positioned to draw the new map are the ones willing to admit how much they do not yet know.

I will add my admission to that pile, and the three observations above are mine.

Roundhouse is open source: dual-licensed MIT / Apache-2.0. Issues and discussion welcome.