Capybara Without the Browser Tax
Transpiling Ruby to JavaScript is usually discussed in terms of deployment targets — browser, edge, mobile, desktop — platforms Rails can't reach. But transpilation has a second payoff that's easy to overlook: it makes system tests practical. When your entire application runs as JavaScript, your entire application can run inside a test runner, no browser required.
The Problem
DHH recently declared that system tests have failed. After a decade of Rails system tests that were "slow, brittle, and full of false negatives," he concluded they cost more time to keep reliable than they saved in caught bugs. HEY is cutting its 300-test system suite and leaning on manual testing instead.
He's right about the diagnosis. Every problem he describes traces to the same stack: Capybara + Selenium + a real browser + HTTP round-trips. The test has to launch Chrome, navigate via WebDriver, wait for JavaScript timing, and hope nothing flakes out along the way. Each of those layers adds latency and nondeterminism.
But the conclusion — that system tests themselves are the problem — doesn't follow. The concept is sound. The tooling failed.
Vitest + jsdom
Vitest is the standard test runner in the JavaScript ecosystem — think minitest for JS. When configured with jsdom — a pure-JavaScript implementation of the browser DOM — it runs your full application in Node.js: routes, controllers, models, views, fixtures, DOM updates, event handlers, all in a single process. No browser launches, no network stack, no WebDriver protocol.
The tradeoff is that jsdom doesn't do layout or painting — you can't test "is this element visible?" or "did the CSS animation fire?" But for testing application logic, navigation flows, and CRUD operations, it's functionally equivalent to a browser and completely deterministic.
Same Source, Same DSL
Here's a system test from the ballroom demo app:
```ruby
class StudiosSystemTest < ApplicationSystemTestCase
  test "create, edit, and delete a studio" do
    visit root_url
    click_on "Studios"
    assert_text "Studios"

    click_on "New studio"
    fill_in "Name", with: "Galaxy Dance"
    click_on "Create Studio"
    assert_text "Galaxy Dance was successfully created."

    click_on "Edit this studio"
    fill_in "Name", with: "Galaxy Ballroom"
    click_on "Update Studio"
    assert_text "Galaxy Ballroom was successfully updated."

    click_on "Edit this studio"
    accept_confirm do
      click_on "Remove this studio"
    end
    assert_text "Galaxy Ballroom was successfully removed."
  end
end
```
This is a standard Rails system test. visit, click_on, fill_in, assert_text, accept_confirm — the same Capybara DSL Rails developers already know. Nothing to learn.
But this test exercises more than CRUD. Each click_on triggers Stimulus controllers handling form submission, Turbo Stream responses updating the DOM, and flash notice rendering. These are the JavaScript interactions where edge cases hide — and where DHH admits he doesn't have "great automated answers." juntos test transpiles this file to vitest/jsdom and runs it in 75ms. The same file can also be run with juntos e2e against a real browser via Playwright. Same source, two outputs — and at 75ms per test, cheap enough to cover these interactions thoroughly.
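What makes one file portable across runners is that the Capybara DSL is nothing more than method dispatch — the test body doesn't know which backend receives the calls. A toy sketch of that idea (a hypothetical in-memory driver for illustration, not juntos's actual code):

```ruby
# Illustrative only: a minimal "driver" that records the actions a test
# performs. Any object responding to the same methods could swap in —
# a jsdom-backed driver, a Playwright-backed one, or this recorder.
class RecordingDriver
  attr_reader :log

  def initialize
    @log = []
  end

  def visit(url)
    @log << [:visit, url]
  end

  def click_on(text)
    @log << [:click_on, text]
  end

  def fill_in(field, with:)
    @log << [:fill_in, field, with]
  end
end

driver = RecordingDriver.new
driver.visit "/studios"
driver.fill_in "Name", with: "Galaxy Dance"
driver.click_on "Create Studio"
p driver.log
# [[:visit, "/studios"], [:fill_in, "Name", "Galaxy Dance"], [:click_on, "Create Studio"]]
```

The test file stays identical; only the object answering visit and click_on changes per runner.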
Measured Results
| Runner | Per-test cost |
|---|---|
| Vitest/jsdom | ~75ms |
| Playwright | ~250ms |
| Rails/Selenium | ~425ms |
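Scaled to a 300-test suite like HEY's, those per-test costs diverge sharply. A back-of-envelope calculation (serial execution, ignoring startup overhead):

```ruby
# Suite totals at the measured per-test costs, for 300 tests run serially.
PER_TEST_MS = {
  "vitest/jsdom"   => 75,
  "playwright"     => 250,
  "rails/selenium" => 425
}

PER_TEST_MS.each do |runner, ms|
  total_seconds = 300 * ms / 1000.0
  puts format("%-15s %6.1f s", runner, total_seconds)
end
# vitest/jsdom      22.5 s
# playwright        75.0 s
# rails/selenium   127.5 s
```

Under half a minute versus over two minutes — the difference between a suite you run on every commit and one you defer to CI.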
Speed is only one factor. Selenium's 425ms per test is spent on network hops and browser rendering — exactly the layers where timing races hide, and where "brittle and full of false negatives" comes from. Vitest/jsdom eliminates the browser entirely. No timing races. No flakiness. Deterministic by construction.
When You Need a Real Browser
Selenium's problems aren't inherent to browser testing. They're specific to the WebDriver architecture: an HTTP-based protocol translating commands across process boundaries, requiring explicit waits, suffering from stale element references, and breaking across browser version updates. Playwright was built as a direct response — it speaks the Chrome DevTools Protocol (CDP) natively, auto-waits by default, isolates browser contexts, and ships a trace viewer for debugging. Most of DHH's "brittle" complaints simply don't apply.
Playwright also enables visual regression testing — pixel-level screenshot comparison across runs. This partially automates the "does it look right?" question that DHH says only humans can answer. CSS regressions, layout shifts, broken styling — things jsdom can never detect and humans easily miss on repeat testing — get caught automatically.
Since the same Ruby source file produces both vitest and Playwright outputs, a defined? Playwright guard lets you add visual assertions that only run in the real browser:
```ruby
test "visual regression" do
  visit messages_url
  expect(page).to_have_screenshot if defined? Playwright
end
```
In vitest/jsdom, defined? Playwright is false — the assertion is skipped. In Playwright, it's true — the screenshot is captured and compared. In Rails, it's nil — same behavior as vitest. One source file, three tiers, no conditionals leaking into the wrong runner.
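On the Rails side the guard is plain Ruby semantics: defined? evaluates to nil when a constant can't be resolved, and to a truthy string once it can. A minimal demonstration, runnable in any Ruby:

```ruby
# Before the constant exists, defined? yields nil — falsy, so a guarded
# assertion is skipped.
puts defined?(Playwright).inspect   # prints: nil

# Once the constant is defined (as it would be in a Playwright run),
# defined? yields the truthy string "constant".
module Playwright; end
puts defined?(Playwright).inspect   # prints: "constant"
```

No require, no feature flag — the mere presence or absence of the constant selects the behavior.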
Three Tiers
One source file, three levels of confidence:
| Tier | How | Flakiness | When |
|---|---|---|---|
| vitest/jsdom | In-process, no browser | Zero | Every commit |
| Playwright | Real browser, CDP | Low | Pre-deploy |
| Human | Eyes and judgment | N/A | Feature dev |
Manual Testing Is Irreplaceable but Insufficient
DHH's point about "does it feel right" is real — no automated test catches a jarring animation or a confusing flow. Manual testing is irreplaceable for that.
But it doesn't scale. HEY's 300 system tests at 30 seconds each would take 2.5 hours of human attention. It doesn't compound — the same effort every time. And it isn't reliable under time pressure — edge cases get skipped, happy paths get tested, error paths don't.
When the automated tier is fast and deterministic, humans are freed to do what they're actually good at — judging whether the app feels right, not catching Stimulus controller regressions.
Conclusion
The lesson of the last decade isn't that system tests are a bad idea. It's that Selenium made them so painful that teams concluded the concept itself was flawed. Strip away the WebDriver overhead, the timing races, and the false negatives, and system tests do exactly what they were always supposed to: catch regressions cheaply and reliably. System tests didn't fail. The tooling did.
Juntos is open source: github.com/ruby2js/ruby2js