AppOven Case Study

Asked to build a faster testing tool, the AI discovered that what developers actually needed was a kinder one

PatronPath scored higher for empathy than for performance. That was the breakthrough nobody asked for.

Same data. Different humanity.
Gen 5 introduced the "Dual Lens" constraint: every screen must be usable by both a tired parent at 11pm and an ESL (English as a second language) speaker. Here's what that looks like.
Gen 1: Built for power users
Mutant Kill Ratio: 34.2%
Line Coverage (src/): 78%
Survived Mutants: 142 / 209
Equiv. Mutant Candidates: 23
Timeout Mutants (>5s): 17
Re-run Suite | Export CSV | Configure | Diff View
Gen 5: Built for tired humans
3 tests need your attention
These tests might not be catching real bugs. We highlighted the easiest fix first.
Show me the easiest one
Your code health looks good overall
Most of your code is well-protected. Nothing urgent tonight.
One file changed since yesterday
cart.js has new logic. We'll check it automatically tomorrow morning.
SafeLabel: feelings, not numbers.
Percentage bars anchor your brain to a number and can trigger anxiety. SafeLabel uses organic color blobs that convey health without the stress.
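One way to picture the SafeLabel idea in code: take a health score and return only a mood and a color, never the number itself. This is a minimal sketch, not the product's implementation. The `SafeLabel` class shape, the `safe_label` function, the cut-offs, the wording of each mood, and the hex colors are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeLabel:
    mood: str   # plain-language description shown to the reader
    color: str  # blob fill color; hex values here are placeholders


# Hypothetical cut-offs and wording; the case study does not publish real ones.
_BANDS = [
    (0.75, SafeLabel("well protected", "#7fc8a9")),
    (0.40, SafeLabel("worth a look soon", "#f2d388")),
    (0.00, SafeLabel("needs attention", "#e8a0a0")),
]


def safe_label(health: float) -> SafeLabel:
    """Map a 0..1 health score to a label without exposing the number."""
    if not 0.0 <= health <= 1.0:
        raise ValueError("health must be between 0 and 1")
    for floor, label in _BANDS:
        if health >= floor:
            return label
    return _BANDS[-1][1]  # not reached for valid input; kept as a safe default
```

The caller only ever sees a mood and a color, so there is no percentage for the reader to fixate on.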
Design principles the AI taught itself.

One Action Per Card

Every card has exactly one thing you can do. No decision fatigue. No hunting for the right button at 11pm.
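The one-action rule can be enforced by the data model rather than by review. The sketch below is an assumption about how such a card might be represented, not the product's code: `Card` and `render` are hypothetical names, and the type has a single action slot so a second button cannot be expressed at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Card:
    title: str
    body: str
    # Label of the single button. None models a purely informational
    # card, which is an assumption made for this sketch.
    action: Optional[str] = None


def render(card: Card) -> str:
    """Render a card as plain text with at most one button line."""
    lines = [card.title, card.body]
    if card.action is not None:
        lines.append(f"[ {card.action} ]")
    return "\n".join(lines)
```

Using the first Gen 5 card as input, `render(Card("3 tests need your attention", "These tests might not be catching real bugs. We highlighted the easiest fix first.", "Show me the easiest one"))` yields the title, the body, and a single button line.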

Zero Idioms

"Mutant survived" means nothing to a human. "This test might not catch a real bug" means everything.
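A rule like this can be checked mechanically before copy ships. The sketch below is illustrative only: the single translation pair is the one quoted above, the jargon list is taken from the Gen 1 panel labels, and the function names `plain_copy` and `jargon_in` are hypothetical.

```python
import re
from typing import List

# The only pair given in the case study; a real glossary would hold more.
PLAIN = {"mutant survived": "This test might not catch a real bug"}

# Terms seen on the Gen 1 panel, treated here as jargon to flag (an assumption).
JARGON = ("mutant", "kill ratio", "line coverage", "timeout")


def plain_copy(term: str) -> str:
    """Return the plain-language version of a known jargon phrase."""
    key = term.strip().lower()
    if key not in PLAIN:
        raise KeyError(f"no plain-language copy for {term!r}")
    return PLAIN[key]


def jargon_in(copy: str) -> List[str]:
    """List the jargon terms that appear in a piece of UI copy."""
    lowered = copy.lower()
    return [t for t in JARGON if re.search(rf"\b{re.escape(t)}", lowered)]
```

Copy passes the check when `jargon_in` returns an empty list; anything it flags either gets an entry in `PLAIN` or gets rewritten.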

Exhaustion-Proof

Every interaction works when your brain is at 40%. Large targets, high contrast, no scrolling to find what matters.

Plain English Only

The Dual Lens: if a tired parent or an ESL speaker would pause, the copy is wrong. Rewrite until it flows.

The moment empathy won.
Gen 3 reframed mutation testing as a story. Fitness dipped at Gen 4 as the engine explored radical simplification, then recovered stronger.
[Chart: fitness by generation, with Gen 3 highlighted as the "narrative reframe" breakthrough]