What Are Flaky Tests? The Silent Killer of CI Pipelines
They pass. They fail. Nothing changed. And your team just lost another hour.
A flaky test is a test that produces different results — pass or fail — when run against the same code. No one touched the source. No dependency changed. Yet the test failed, your build went red, and someone on your team had to stop what they were doing to investigate.
Thirty minutes later, they re-run the pipeline. It passes. They shrug, merge the PR, and move on — but the damage is already done: time wasted, context lost, and a little more trust eroded in your test suite.
Why do tests become flaky?
Flaky tests aren’t random. They have root causes, but those causes are often subtle enough that they don’t surface on every run. The most common culprits:
Timing & race conditions
Tests that depend on specific timing — setTimeout, polling intervals, animations — fail when the runner is a few milliseconds slower than expected.
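The usual fix is to stop guessing how long the work takes. A minimal sketch in Python (the `wait_for` helper is illustrative, not from any particular test framework): poll for the condition with a deadline instead of a fixed sleep.

```python
import time

# Robust pattern: poll for the condition with a deadline instead of
# sleeping a fixed amount and hoping the work finished in time.
def wait_for(predicate, timeout=2.0, interval=0.02):
    """Retry `predicate` until it returns a truthy value or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)
```

A slow runner just polls a few more times; only a genuinely stuck condition fails the test.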
Shared state
Tests that read from or write to shared databases, files, or global variables. Run them in a different order and they break.
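A toy illustration of the order dependency (the function names are invented for the example): the flaky version leans on module-level state, the robust version makes each test build its own.

```python
# Flaky pattern: a module-level dict survives between tests, so the
# result depends on which tests happened to run before this one.
shared_cache = {}

def register_user_flaky(name):
    shared_cache[name] = True
    return len(shared_cache)   # inherits whatever earlier tests left behind

# Robust pattern: each test constructs its own state and passes it in.
def register_user(cache, name):
    cache[name] = True
    return len(cache)
```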
External dependencies
API calls to third-party services, DNS lookups, network requests that time out intermittently under load.
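The standard defense is to keep the network out of the test entirely. A sketch, assuming a hypothetical client with a `fetch(path)` method: inject the client so tests can substitute a deterministic stub.

```python
# Flaky pattern: the test calls a real third-party API and inherits every
# network timeout. Robust pattern: inject the client so tests can stub it.
def get_exchange_rate(client, currency):
    """`client` is any object with a `fetch(path)` method (illustrative API)."""
    return client.fetch("/rates/" + currency)["rate"]

class StubClient:
    """Deterministic stand-in for the real HTTP client."""
    def fetch(self, path):
        return {"rate": 1.08}
```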
Environment differences
Your test passes locally on macOS but fails on the Linux CI runner due to filesystem case sensitivity, timezone differences, or resource limits.
Date & time sensitivity
Tests that compare against "now" or assume a specific day of the week. They fail at midnight, on weekends, or across timezone boundaries.
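A minimal sketch of the fix, assuming a hypothetical `is_weekend` check: accept the clock as a parameter so tests pin a known date instead of depending on when the suite runs.

```python
from datetime import datetime, timezone

# Flaky pattern: the result depends on the day the suite happens to run.
def is_weekend_flaky():
    return datetime.now().weekday() >= 5   # green on Sat/Sun, red otherwise

# Robust pattern: accept the clock as a parameter so tests pin a fixed date.
def is_weekend(now=None):
    if now is None:
        now = datetime.now(timezone.utc)
    return now.weekday() >= 5
```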
Resource contention
Parallel test runners competing for ports, file locks, or database connections. Works fine sequentially, breaks under concurrency.
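For port contention specifically, the fix is to stop hard-coding ports. A sketch in Python: bind port 0 and let the OS hand each worker a free ephemeral port.

```python
import socket

# Robust pattern: bind port 0 and let the OS assign a free ephemeral port,
# instead of hard-coding a port that parallel workers will fight over.
def open_test_server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))     # port 0 means "any free port"
    sock.listen(1)
    port = sock.getsockname()[1]
    return sock, port
```

Two workers calling this concurrently get two different ports instead of one "Address already in use" failure.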
The real cost is invisible
Most teams underestimate flaky tests because the cost is diffuse. It’s not one big outage — it’s a thousand small interruptions.
- Re-runs burn CI minutes. Every retry is compute you’re paying for twice. At scale, this adds up to thousands of dollars per month.
- Developer time is the hidden multiplier. An engineer investigating a false failure for 20 minutes costs more than the CI compute. Multiply that by every flaky test, every day.
- Trust erodes slowly, then all at once. Once developers stop trusting the test suite, they start ignoring real failures. That’s when bugs ship to production.
- Merge velocity drops. PRs sit open longer because the build is “probably just flaky.” Reviews stack up. Shipping slows down.
Industry data point
Google’s internal research found that roughly 1.5% of all test runs across their monorepo were flaky. At Google’s scale, that translated to millions of wasted compute hours per year. Your team is smaller, but the proportional cost can be just as painful.
How do you know if you have a flaky test problem?
If any of these sound familiar, you already do:
- Developers routinely re-run CI without changing code
- Your team has a Slack message template for “just re-run it”
- Certain tests are known to be unreliable but no one has time to fix them
- CI costs have been creeping up and nobody knows exactly why
- Engineers merge PRs even when CI is red, saying “it’s a known flake”
What high-performing teams do differently
The best engineering teams don’t just fix flaky tests — they build systems to catch and manage them before they metastasize. Here’s the playbook:
1. Detect automatically
Don’t wait for developers to report flaky tests in Slack. Analyze CI run history programmatically. A test that fails on one commit but passes on a retry — with no code diff — is flaky. Flag it immediately.
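That rule can be sketched in a few lines. The record shape here — `(commit_sha, test_name, passed)` — is an assumption for the example; real CI APIs differ, but the detection logic is the same: if one commit produced both outcomes for a test, the test is flaky.

```python
from collections import defaultdict

# Assumed record shape: (commit_sha, test_name, passed).
def find_flaky_tests(runs):
    """A test that both passed and failed on the same commit is flaky:
    same code, different result."""
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    return sorted({test for (commit, test), seen in outcomes.items()
                   if seen == {True, False}})
```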
2. Quantify the damage
Knowing a test is flaky isn’t enough. You need to know how much it’s costing you — in CI minutes, in re-runs, in dollars. That’s what turns a “we should fix this” into a “we need to fix this now.”
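The arithmetic is simple enough to sketch. Every number in this model is an assumption to replace with your own rates; it exists only to show the shape of the calculation.

```python
# Illustrative cost model; all rates here are assumptions, not billing data.
def monthly_flake_cost(retries_per_day, pipeline_minutes,
                       ci_rate=0.008,           # $/CI-minute (assumed)
                       dev_minutes_per_retry=20,
                       dev_rate=1.5):           # $/engineer-minute (assumed)
    ci = retries_per_day * pipeline_minutes * ci_rate
    dev = retries_per_day * dev_minutes_per_retry * dev_rate
    return round((ci + dev) * 30, 2)            # ~30 days per month
```

Note how quickly the developer-time term dominates the raw compute term: the hidden multiplier from the cost section, in code.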
3. Assign ownership
Flaky tests without owners don’t get fixed. Assign each flaky test to a person with an SLA. Track resolution like you track incidents.
4. Quarantine strategically
While a fix is in progress, quarantine the test so it stops blocking other developers. But quarantine with an expiration date — otherwise it becomes a graveyard.
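A sketch of an expiring quarantine (the list and test names are hypothetical): the skip decision checks the expiry date, so a stale entry starts failing CI again instead of rotting forever.

```python
from datetime import date

# Hypothetical quarantine list: test name -> expiry date.
QUARANTINE = {"test_payment_retry": date(2024, 3, 1)}

def should_skip(test_name, today, quarantine=QUARANTINE):
    """Skip only until expiry; after that the test blocks CI again,
    so the quarantine can't quietly become a graveyard."""
    expiry = quarantine.get(test_name)
    return expiry is not None and today <= expiry
```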
5. Measure improvement over time
Track flaky test count and cost week over week. If the trend isn’t going down, your process isn’t working.
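One simple way to make "the trend isn't going down" concrete, sketched with an assumed four-week comparison window:

```python
# Illustrative trend check over weekly flaky-test counts, oldest first.
def trend_is_improving(weekly_counts, window=4):
    if len(weekly_counts) < 2 * window:
        return None                    # not enough history to call a trend
    recent = sum(weekly_counts[-window:]) / window
    previous = sum(weekly_counts[-2 * window:-window]) / window
    return recent < previous
```

Averaging windows rather than comparing single weeks smooths out one noisy sprint.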
This is exactly what Kleore does.
Kleore connects to your GitHub repos, analyzes your CI history, and shows you every flaky test — ranked by cost. Assign owners, quarantine tests, track your burn-down. Two-minute setup, no config changes.
Further reading
- How Much Do Flaky Tests Actually Cost? — The dollar math behind CI waste.
- How to Fix Flaky Tests in GitHub Actions — Practical patterns for the most common root causes.
- Flaky Test Cost Calculator — See what flaky tests cost your specific team.