What Are Flaky Tests? The Silent Killer of CI Pipelines
They pass. They fail. Nothing changed. And your team just lost another hour.
A flaky test is a test that produces different results — pass or fail — when run against the same code. No one touched the source. No dependency changed. Yet the test failed, your build went red, and someone on your team had to stop what they were doing to investigate.
Thirty minutes later, they re-run the pipeline. It passes. They shrug, merge the PR, and move on — but the damage is already done: time wasted, context lost, and a little more trust eroded in your test suite.
Why do tests become flaky?
Flaky tests aren’t random. They have root causes, but those causes are often subtle enough that they don’t surface on every run. The most common culprits:
Timing & race conditions
Tests that depend on specific timing — setTimeout, polling intervals, animations — fail when the runner is a few milliseconds slower than expected.
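The usual fix is to stop guessing how long the work takes. A minimal sketch in Python (the `wait_for` helper is illustrative, not from any particular test framework): poll for the condition with a deadline instead of a fixed sleep.

```python
import time

# Robust pattern: poll for the condition with a deadline instead of
# sleeping a fixed amount and hoping the work finished in time.
def wait_for(predicate, timeout=2.0, interval=0.02):
    """Retry `predicate` until it returns a truthy value or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)
```

A slow runner just polls a few more times; only a genuinely stuck condition fails the test.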
Shared state
Tests that read from or write to shared databases, files, or global variables. Run them in a different order and they break.
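A toy illustration of the order dependency (the function names are invented for the example): the flaky version leans on module-level state, the robust version makes each test build its own.

```python
# Flaky pattern: a module-level dict survives between tests, so the
# result depends on which tests happened to run before this one.
shared_cache = {}

def register_user_flaky(name):
    shared_cache[name] = True
    return len(shared_cache)   # inherits whatever earlier tests left behind

# Robust pattern: each test constructs its own state and passes it in.
def register_user(cache, name):
    cache[name] = True
    return len(cache)
```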
External dependencies
API calls to third-party services, DNS lookups, network requests that time out intermittently under load.
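The standard defense is to keep the network out of the test entirely. A sketch, assuming a hypothetical client with a `fetch(path)` method: inject the client so tests can substitute a deterministic stub.

```python
# Flaky pattern: the test calls a real third-party API and inherits every
# network timeout. Robust pattern: inject the client so tests can stub it.
def get_exchange_rate(client, currency):
    """`client` is any object with a `fetch(path)` method (illustrative API)."""
    return client.fetch("/rates/" + currency)["rate"]

class StubClient:
    """Deterministic stand-in for the real HTTP client."""
    def fetch(self, path):
        return {"rate": 1.08}
```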
Environment differences
Your test passes locally on macOS but fails on the Linux CI runner due to filesystem case sensitivity, timezone differences, or resource limits.
Date & time sensitivity
Tests that compare against "now" or assume a specific day of the week. They fail at midnight, on weekends, or across timezone boundaries.
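A minimal sketch of the fix, assuming a hypothetical `is_weekend` check: accept the clock as a parameter so tests pin a known date instead of depending on when the suite runs.

```python
from datetime import datetime, timezone

# Flaky pattern: the result depends on the day the suite happens to run.
def is_weekend_flaky():
    return datetime.now().weekday() >= 5   # green on Sat/Sun, red otherwise

# Robust pattern: accept the clock as a parameter so tests pin a fixed date.
def is_weekend(now=None):
    if now is None:
        now = datetime.now(timezone.utc)
    return now.weekday() >= 5
```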
Resource contention
Parallel test runners competing for ports, file locks, or database connections. Works fine sequentially, breaks under concurrency.
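For port contention specifically, the fix is to stop hard-coding ports. A sketch in Python: bind port 0 and let the OS hand each worker a free ephemeral port.

```python
import socket

# Robust pattern: bind port 0 and let the OS assign a free ephemeral port,
# instead of hard-coding a port that parallel workers will fight over.
def open_test_server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))     # port 0 means "any free port"
    sock.listen(1)
    port = sock.getsockname()[1]
    return sock, port
```

Two workers calling this concurrently get two different ports instead of one "Address already in use" failure.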
The real cost is invisible
Most teams underestimate flaky tests because the cost is diffuse. It’s not one big outage — it’s a thousand small interruptions.
- Re-runs burn CI minutes. Every retry is compute you’re paying for twice. At scale, this adds up to thousands of dollars per month.
- Developer time is the hidden multiplier. An engineer investigating a false failure for 20 minutes costs more than the CI compute. Multiply that by every flaky test, every day.
- Trust erodes slowly, then all at once. Once developers stop trusting the test suite, they start ignoring real failures. That’s when bugs ship to production.
- Merge velocity drops. PRs sit open longer because the build is “probably just flaky.” Reviews stack up. Shipping slows down.
Industry data point
Google’s internal research found that roughly 1.5% of all test runs across their monorepo were flaky. At Google’s scale, that translated to millions of wasted compute hours per year. Your team is smaller, but the proportional cost can be just as painful.
How do you know if you have a flaky test problem?
If any of these sound familiar, you already do:
- Developers routinely re-run CI without changing code
- Your team has a Slack message template for “just re-run it”
- Certain tests are known to be unreliable but no one has time to fix them
- CI costs have been creeping up and nobody knows exactly why
- Engineers merge PRs even when CI is red, saying “it’s a known flake”
What high-performing teams do differently
The best engineering teams don’t just fix flaky tests — they build systems to catch and manage them before they metastasize. Here’s the playbook:
1. Detect automatically
Don’t wait for developers to report flaky tests in Slack. Analyze CI run history programmatically. A test that fails on one commit but passes on a retry — with no code diff — is flaky. Flag it immediately.
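That rule can be sketched in a few lines. The record shape here — `(commit_sha, test_name, passed)` — is an assumption for the example; real CI APIs differ, but the detection logic is the same: if one commit produced both outcomes for a test, the test is flaky.

```python
from collections import defaultdict

# Assumed record shape: (commit_sha, test_name, passed).
def find_flaky_tests(runs):
    """A test that both passed and failed on the same commit is flaky:
    same code, different result."""
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (commit, test), seen in outcomes.items()
                   if seen == {True, False}})
```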
2. Quantify the damage
Knowing a test is flaky isn’t enough. You need to know how much it’s costing you — in CI minutes, in re-runs, in dollars. That’s what turns a “we should fix this” into a “we need to fix this now.”
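The arithmetic is simple enough to sketch. Every number in this model is an assumption to replace with your own rates; it exists only to show the shape of the calculation.

```python
# Illustrative cost model; all rates here are assumptions, not billing data.
def monthly_flake_cost(retries_per_day, pipeline_minutes,
                       ci_rate=0.008,           # $/CI-minute (assumed)
                       dev_minutes_per_retry=20,
                       dev_rate=1.5):           # $/engineer-minute (assumed)
    ci = retries_per_day * pipeline_minutes * ci_rate
    dev = retries_per_day * dev_minutes_per_retry * dev_rate
    return round((ci + dev) * 30, 2)            # ~30 days per month
```

Note how quickly the developer-time term dominates the raw compute term: the hidden multiplier from the cost section, in code.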
3. Assign ownership
Flaky tests without owners don’t get fixed. Assign each flaky test to a person with an SLA. Track resolution like you track incidents.
4. Quarantine strategically
While a fix is in progress, quarantine the test so it stops blocking other developers. But quarantine with an expiration date — otherwise it becomes a graveyard.
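A sketch of an expiring quarantine (the list and test names are hypothetical): the skip decision checks the expiry date, so a stale entry starts failing CI again instead of rotting forever.

```python
from datetime import date

# Hypothetical quarantine list: test name -> expiry date.
QUARANTINE = {"test_payment_retry": date(2024, 3, 1)}

def should_skip(test_name, today, quarantine=QUARANTINE):
    """Skip only until expiry; after that the test blocks CI again,
    so the quarantine can't quietly become a graveyard."""
    expiry = quarantine.get(test_name)
    return expiry is not None and today <= expiry
```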
5. Measure improvement over time
Track flaky test count and cost week over week. If the trend isn’t going down, your process isn’t working.
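One simple way to make "the trend isn't going down" concrete, sketched with an assumed four-week comparison window:

```python
# Illustrative trend check over weekly flaky-test counts, oldest first.
def trend_is_improving(weekly_counts, window=4):
    if len(weekly_counts) < 2 * window:
        return None                    # not enough history to call a trend
    recent = sum(weekly_counts[-window:]) / window
    previous = sum(weekly_counts[-2 * window:-window]) / window
    return recent < previous
```

Averaging windows rather than comparing single weeks smooths out one noisy sprint.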
This is exactly what Kleore does.
Kleore connects to your GitHub repos, analyzes your CI history, and shows you every flaky test — ranked by cost. Assign owners, quarantine tests, track your burn-down. Two-minute setup, no config changes.
Further reading
- How Much Do Flaky Tests Actually Cost? — The dollar math behind CI waste.
- How to Fix Flaky Tests in GitHub Actions — Practical patterns for the most common root causes.
- Flaky Test Cost Calculator — See what flaky tests cost your specific team.