
How to Find and Fix Flaky Tests in pytest

Database state, network calls, import side effects — the most common causes of Python test flakiness and how to eliminate each one.


pytest is the gold standard for Python testing. Its fixture system, plugin ecosystem, and clean syntax make it a joy to write tests with. But those same powerful features — especially fixtures with broad scopes and plugin interactions — can introduce subtle flakiness that only shows up in CI.

This guide covers the most common patterns behind flaky pytest tests and gives you concrete fixes with real code. Whether you’re dealing with database state leaks, time-dependent assertions, or mysterious import side effects, you’ll find the solution here.

Want to skip the guesswork?

Instead of hunting through CI logs manually, Kleore analyzes your CI history and ranks every flaky test by failure rate and cost — so you fix the worst ones first.

Why pytest tests become flaky

Python’s dynamic nature and pytest’s powerful fixture system create unique flakiness vectors that don’t exist in more constrained testing frameworks. Here are the five most common root causes:

  1. Database state leaking between tests — Tests share a database and don’t properly isolate transactions. Test A creates a record, Test B doesn’t expect it to exist.
  2. File system conflicts — Tests write to the same files or directories. Parallel execution causes race conditions on file reads/writes.
  3. Network calls to real services — Tests make HTTP requests to external APIs that are slow, rate-limited, or occasionally down.
  4. Import side effects — Python modules that execute code at import time (database connections, config loading, signal handlers) create hidden coupling between tests.
  5. Test ordering dependencies — Test B only passes when Test A runs first because A sets up state that B implicitly relies on.

How to identify flaky pytest tests

pytest’s plugin ecosystem includes several tools specifically designed to flush out non-deterministic tests.

pytest-randomly: Shuffle test order

pytest-randomly is the most effective way to find tests with hidden ordering dependencies: it shuffles the order of test modules, classes, and functions on every run. A test that fails only under certain orderings is depending on state set up by another test — you’ve found a flake.

Install and use pytest-randomly
pip install pytest-randomly

# Run with randomized order (enabled by default after install)
pytest

# Reproduce a specific failure with the same seed
pytest --randomly-seed=12345

# Disable randomization temporarily
pytest -p no:randomly

pytest-repeat: Stress-test suspected flakes

Run a specific test many times to confirm it’s non-deterministic.

Repeat a test to confirm flakiness
pip install pytest-repeat

# Run a test 100 times — if it fails once, it's flaky
pytest --count=100 tests/test_checkout.py::test_apply_discount

# Stop on first failure
pytest --count=100 -x tests/test_checkout.py::test_apply_discount

pytest-rerunfailures: Detect and retry

This plugin automatically reruns failed tests. Tests that pass on rerun are flaky by definition. Use this for detection, not as a permanent solution.

Detect flaky tests with reruns
pip install pytest-rerunfailures

# Rerun failed tests up to 3 times
pytest --reruns 3

# Add a delay between reruns (useful for timing-dependent flakes)
pytest --reruns 3 --reruns-delay 2

# Mark specific tests as expected to flake (in the test file):
import pytest

@pytest.mark.flaky(reruns=3, reruns_delay=1)
def test_webhook_delivery():
    ...

Common patterns and fixes

Pattern 1: Database state leaking between tests

Symptom: Tests pass individually but fail when run together. Failures involve unexpected records in the database or unique constraint violations.

Root cause: Tests create database records that persist across test boundaries. One test’s setup data becomes another test’s pollution.

Fix — transaction rollback with autouse fixture
# conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

@pytest.fixture(autouse=True)
def db_session(request):
    """Wrap every test in a transaction that rolls back."""
    engine = create_engine(TEST_DATABASE_URL)
    connection = engine.connect()
    transaction = connection.begin()
    session = sessionmaker(bind=connection)()

    yield session

    session.close()
    transaction.rollback()
    connection.close()


# For Django projects, pytest-django's db fixture already
# wraps each test in a transaction and rolls it back:
@pytest.fixture(autouse=True)
def enable_db_access(db):
    """Request Django's db fixture for every test."""
    pass

The autouse=True parameter ensures every test gets isolation automatically, without needing to request the fixture explicitly. This prevents new tests from accidentally skipping isolation.
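The begin/rollback pattern can be seen in miniature with the stdlib’s sqlite3 (a sketch, not the SQLAlchemy fixture above — schema and test bodies are hypothetical): both “tests” insert the same unique email, and rollback isolation keeps them from colliding.

```python
import sqlite3

def run_isolated(conn, test_body):
    # Begin a transaction, run the test body, then roll back
    # so no rows leak into the next test.
    conn.execute("BEGIN")
    try:
        test_body(conn)
    finally:
        conn.rollback()

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

def creates_alice(c):
    c.execute("INSERT INTO users VALUES ('alice@example.com')")
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

# Without rollback, the second run would hit a UNIQUE violation.
run_isolated(conn, creates_alice)
run_isolated(conn, creates_alice)
```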

Pattern 2: Time-dependent tests

Symptom: Tests that check expiration, scheduling, or duration fail at certain times of day or run slower in CI than expected.

Root cause: Tests use datetime.now() or time.time() directly, and their assertions depend on the current time.

Fix — freeze time with freezegun
pip install freezegun
Using freezegun in tests
from freezegun import freeze_time
from datetime import datetime, timedelta
from myapp.auth import create_token, is_token_expired

@freeze_time("2025-06-15 12:00:00")
def test_token_expiration():
    token = create_token(expires_in=timedelta(hours=1))

    # Still within the hour — not expired
    assert not is_token_expired(token)

@freeze_time("2025-06-15 14:00:00")
def test_token_is_expired():
    # Create a token that expired an hour ago
    with freeze_time("2025-06-15 12:00:00"):
        token = create_token(expires_in=timedelta(hours=1))

    # Now it's 2pm — token expired at 1pm
    assert is_token_expired(token)

# As a fixture for broader use:
import pytest

@pytest.fixture
def frozen_time():
    with freeze_time("2025-01-01 00:00:00") as frozen:
        yield frozen
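If you’d rather not add a dependency, injecting the clock achieves the same determinism. A sketch with hypothetical make_token/token_expired helpers; the injectable now_fn parameter is the key move:

```python
from datetime import datetime, timedelta

def make_token(expires_in, now_fn=datetime.now):
    # now_fn is injectable so tests control the clock.
    return {"expires_at": now_fn() + expires_in}

def token_expired(token, now_fn=datetime.now):
    return now_fn() >= token["expires_at"]

def test_token_expiry_with_injected_clock():
    noon = lambda: datetime(2025, 6, 15, 12, 0)
    two_pm = lambda: datetime(2025, 6, 15, 14, 0)

    token = make_token(timedelta(hours=1), now_fn=noon)
    assert not token_expired(token, now_fn=noon)  # 12:00 < 13:00
    assert token_expired(token, now_fn=two_pm)    # 14:00 >= 13:00
```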

Pattern 3: Network calls to real services

Symptom: Tests fail with ConnectionError, Timeout, or 429 Too Many Requests. Failures happen in bursts when the external service has issues.

Root cause: Tests make real HTTP requests to APIs you don’t control.

Fix — mock HTTP with responses library
pip install responses
Mocking HTTP calls
import pytest
import responses
import requests
from myapp.payment import charge_customer

@responses.activate
def test_successful_charge():
    responses.add(
        responses.POST,
        "https://api.stripe.com/v1/charges",
        json={"id": "ch_test_123", "status": "succeeded"},
        status=200,
    )

    result = charge_customer(amount=2000, token="tok_visa")
    assert result.status == "succeeded"

@responses.activate
def test_payment_gateway_timeout():
    responses.add(
        responses.POST,
        "https://api.stripe.com/v1/charges",
        body=requests.exceptions.Timeout(),
    )

    with pytest.raises(PaymentError, match="timeout"):
        charge_customer(amount=2000, token="tok_visa")


# For httpx-based clients, the pytest-httpx plugin provides
# an httpx_mock fixture:
# pip install pytest-httpx
import pytest

@pytest.fixture
def mock_httpx(httpx_mock):
    httpx_mock.add_response(
        url="https://api.example.com/data",
        json={"results": []},
    )
    return httpx_mock
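The stdlib’s unittest.mock covers the same ground when you don’t want another dependency: make the HTTP callable a parameter and replace it with a Mock in tests. A sketch with a hypothetical fetch_status function:

```python
from unittest.mock import Mock

def fetch_status(http_get):
    # The HTTP function is injected, so tests never touch the network.
    resp = http_get("https://api.example.com/status")
    return resp.json()["status"]

def test_fetch_status_mocked():
    fake_get = Mock()
    fake_get.return_value.json.return_value = {"status": "ok"}

    assert fetch_status(fake_get) == "ok"
    fake_get.assert_called_once_with("https://api.example.com/status")
```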

Pattern 4: File system conflicts

Symptom: Tests fail with FileNotFoundError, PermissionError, or produce corrupted output. Especially common with parallel test execution via pytest-xdist.

Root cause: Multiple tests read/write the same file paths concurrently.

Fix — use pytest's tmp_path fixture
def test_export_csv(tmp_path):
    """tmp_path gives each test a unique temporary directory."""
    output_file = tmp_path / "export.csv"

    export_data(output_path=output_file)

    content = output_file.read_text()
    assert "header1,header2" in content
    assert len(content.splitlines()) == 101  # header + 100 rows

    # tmp_path is automatically cleaned up after the test


def test_config_loading(tmp_path):
    """Create isolated config files per test."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text("""
database:
  host: localhost
  port: 5432
""")

    config = load_config(str(config_file))
    assert config["database"]["host"] == "localhost"


# For fixtures that need a persistent temp directory across a test class:
@pytest.fixture(scope="class")
def shared_tmp(tmp_path_factory):
    return tmp_path_factory.mktemp("shared")
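Outside pytest — a one-off script, or a unittest-based suite — the stdlib’s tempfile gives the same per-run isolation. A sketch with a hypothetical export function:

```python
import pathlib
import tempfile

def export_greeting(path):
    # Stand-in for a real export function.
    path.write_text("header1,header2\n")

with tempfile.TemporaryDirectory() as d:
    out = pathlib.Path(d) / "export.csv"
    export_greeting(out)
    assert "header1,header2" in out.read_text()
# The directory and everything in it is deleted on exit.
```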

Pattern 5: Import side effects

Symptom: Tests fail with errors about database connections already being open, signal handlers being registered twice, or global config having unexpected values.

Root cause: Python modules execute code at import time. If a module opens a database connection, registers a signal handler, or modifies global state when imported, that side effect persists for the entire test session.

Fix — mock at module level or use importlib
# If the module connects to a database on import:
# myapp/db.py
# connection = psycopg2.connect(DATABASE_URL)  # Runs at import time!

# Option 1: Mock before import
import sys
from unittest.mock import MagicMock

# Prevent the real module from connecting
sys.modules["psycopg2"] = MagicMock()

from myapp.db import get_users  # Now uses mocked connection

# Option 2: Use importlib for fresh imports
import importlib

def test_with_fresh_module():
    import myapp.db
    importlib.reload(myapp.db)  # Re-executes module code
    # ... test with fresh state

# Option 3 (best): Refactor to lazy initialization
# myapp/db.py
_connection = None

def get_connection():
    global _connection
    if _connection is None:
        _connection = psycopg2.connect(DATABASE_URL)
    return _connection

Quarantining flaky pytest tests

The pytest-quarantine plugin lets you mark tests as known-flaky so they don’t block your CI pipeline while you work on fixes.

pytest-quarantine setup
pip install pytest-quarantine

# Save failing test IDs from a run to a quarantine list
pytest --save-quarantine=quarantine.txt

# Run tests, treating quarantined tests as expected failures
pytest --quarantine=quarantine.txt

For a more automated approach, Kleore detects flaky tests automatically from your CI history — no manual tagging needed. It tracks every test that has passed and failed on the same commit, ranks them by impact, and gives you a prioritized fix list with cost estimates.

CI configuration tips for pytest

Beyond fixing individual tests, your CI configuration can reduce flakiness across the board.

pyproject.toml — hardened pytest config
[tool.pytest.ini_options]
# Randomize test order to catch hidden dependencies;
# --strict-markers rejects typos in marker names
addopts = "-p randomly --randomly-seed=last --strict-markers"

markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: marks integration tests",
    "flaky: marks known flaky tests",
]

# Timeout per test (requires pytest-timeout)
timeout = 30

# Fail on warnings to catch deprecation issues early
filterwarnings = [
    "error",
    "ignore::DeprecationWarning:third_party_lib.*",
]
.github/workflows/test.yml — pytest CI config
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      PYTHONDONTWRITEBYTECODE: "1"
      PYTHONHASHSEED: "0"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version-file: ".python-version"
          cache: "pip"
      - run: pip install -r requirements-test.txt
      - run: pytest -x --tb=short -q
        # -x: stop on first failure
        # --tb=short: concise tracebacks
        # -q: quiet output

  # For parallel execution with pytest-xdist:
  test-parallel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version-file: ".python-version"
          cache: "pip"
      - run: pip install -r requirements-test.txt
      - run: pytest --forked -n auto
        # --forked: each test in its own subprocess (requires pytest-forked; Unix only)
        # -n auto: use all available CPUs (requires pytest-xdist)

Setting PYTHONDONTWRITEBYTECODE=1 prevents .pyc file conflicts in parallel runs. PYTHONHASHSEED=0 makes string hashing deterministic, so set iteration order — and anything else that depends on hash values — is reproducible across runs, eliminating a whole class of order-dependent flakes.
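The kind of flake a fixed hash seed eliminates looks like this: asserting on the iteration order of a set of strings, which varies with the hash seed from run to run. A sketch (names are hypothetical); the robust version compares contents, not order:

```python
def test_report_lists_all_names():
    names = {"alice", "bob", "carol"}
    report = ",".join(names)  # iteration order depends on PYTHONHASHSEED

    # Flaky:  assert report == "alice,bob,carol"
    # Robust: compare as sets, ignoring order
    assert set(report.split(",")) == names

test_report_lists_all_names()
```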

Stop guessing which pytest tests are flaky.

Kleore scans your GitHub Actions history and gives you a ranked list of every flaky test — with failure rates, cost estimates, and fix priority. Free to start.
