Getting started

This page runs pytest-benchmem end to end: write a benchmark, execute it, and read both metrics (timing and peak memory) back. The fixture and memray ship with the core install; the plots and CLI need pytest-benchmem[plot]. Memory measurement is Linux/macOS only.

Setup

A scratch directory holds the suite and the JSON files the cells produce. Prepending this interpreter's directory to PATH makes the !pytest / !benchmem cells resolve to this kernel's environment.

In [1]:

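import os
import sys
import tempfile
from pathlib import Path

# Keep colored terminal output even though the shell cells aren't attached to a TTY
os.environ["FORCE_COLOR"] = "1"
# Put this kernel's bin directory first so `!pytest` runs in the same environment
os.environ["PATH"] = f"{Path(sys.executable).parent}{os.pathsep}{os.environ['PATH']}"
# Scratch directory for the generated test suite and benchmark JSON
_tmp = Path(tempfile.mkdtemp(prefix="pytest-benchmem-"))
print(f"tempdir: {_tmp}")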

tempdir: /tmp/pytest-benchmem-wq4oslku

A memory benchmark: the `benchmark_memory` fixture

benchmark_memory wraps pytest-benchmark's benchmark fixture, so timing works as usual. It then runs the action once more under memray.Tracker, in a separate, untimed pass, and stores the result in extra_info.benchmem. Parametrize arguments become the analysis dims that the plots scale by, with no extra configuration.

Here's a tiny suite that benchmarks sorted over a range of input sizes:

In [2]:

suite = _tmp / "test_sortbench.py"
suite.write_text("""
import pytest

@pytest.mark.parametrize("n", [10_000, 50_000, 200_000, 500_000])
def test_sort(benchmark_memory, n):
    data = list(range(n, 0, -1))
    benchmark_memory(sorted, data)
""")
print(suite.read_text())

import pytest

@pytest.mark.parametrize("n", [10_000, 50_000, 200_000, 500_000])
def test_sort(benchmark_memory, n):
    data = list(range(n, 0, -1))
    benchmark_memory(sorted, data)

Your benchmark must be safe to re-run. Memory is measured in a separate invocation, after pytest-benchmark has already called your function many times, so a side-effectful call (one that mutates a fixture, fills a cache, or drains an iterator) silently records its warmed state, not a cold one. Benchmark a pure call, or use the pedantic form with a setup that rebuilds fresh state each round.
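To see why re-run safety matters, here is a stdlib-only sketch of the two-pass pattern. It is not pytest-benchmem's code: time_then_measure and build_table are hypothetical names, and tracemalloc stands in for memray.Tracker. The optional setup callback plays roughly the role of the setup hook on pytest-benchmark's benchmark.pedantic (which runs before each round); here it also runs before the traced call, so that pass starts cold.

```python
import time
import tracemalloc

# Hypothetical side-effectful target: fills a module-level cache on first call only.
_cache = {}


def build_table(n):
    if n not in _cache:
        _cache[n] = list(range(n))
    return _cache[n]


def time_then_measure(fn, *args, rounds=5, setup=None):
    """Hypothetical two-pass harness (not pytest-benchmem's implementation).

    Timed rounds run first with no tracer attached; one extra, untimed call
    then runs under tracemalloc (standing in for memray.Tracker).
    If given, `setup` runs before every call to rebuild fresh state.
    """
    timings = []
    for _ in range(rounds):
        if setup is not None:
            setup()
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)

    if setup is not None:
        setup()
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return min(timings), peak


# No setup: the timed rounds warm the cache, so the traced pass allocates almost nothing.
_, warm_peak = time_then_measure(build_table, 100_000)
# setup clears the cache, so the traced pass pays for building the table again.
_, cold_peak = time_then_measure(build_table, 100_000, setup=_cache.clear)
print(f"no setup: {warm_peak:,} B    setup=_cache.clear: {cold_peak:,} B")
```

The gap between the two numbers is the "warmed state" problem in miniature: the same call reports very different peaks depending on what earlier calls left behind.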

Run it: one command, both metrics

A normal pytest invocation. --benchmark-json writes the same file pytest-benchmark always writes; the only difference is that each entry now also carries extra_info.benchmem.

In [3]:

baseline = _tmp / "baseline.json"
!pytest {suite} --benchmark-only --benchmark-json={baseline} --benchmark-columns=min,median -q -p no:cacheprovider

.                                                                     [100%]

Wrote benchmark data in: <_io.BufferedWriter name='/tmp/pytest-benchmem-wq4oslku/baseline.json'>

benchmark: 4 tests                                                              
                                                                                
  Name (time in us)                  Min               Median   │   peak (MiB)  
 ────────────────────────────────────────────────────────────────────────────── 
  test_sort[10000]         49.1430 (1.0)        50.2785 (1.0)   │         0.08  
  test_sort[50000]       257.9280 (5.25)      263.4380 (5.24)   │         0.38  
  test_sort[200000]   1,048.9610 (21.35)   1,066.8865 (21.22)   │         1.53  
  test_sort[500000]   2,769.9070 (56.36)   2,946.4200 (58.60)   │         3.81  
                                                                                
memory (right of │): a separate, untimed pass, not the timed rounds  •  also    
available via --benchmark-memory-columns: allocated, allocs                     
4 passed in 4.34s

pytest-benchmem appends a peak column to pytest-benchmark's own table; no flag is needed. Add allocated / allocs with --benchmark-memory-columns, or split memory into its own table with --benchmark-memory-table=split.

Already have a benchmark suite? Add --benchmark-memory and every benchmark(...) call records memory too, with no test changes. Reach for the benchmark_memory fixture when you want memory only on specific tests, or when you need the pedantic form.
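Because the output is ordinary pytest-benchmark JSON, you can also peek at it with the standard library. A minimal sketch: it assumes only pytest-benchmark's usual layout (a top-level benchmarks list whose entries carry fullname and extra_info) and treats the benchmem payload as opaque, since its exact keys aren't spelled out on this page. benchmem_by_test and load_benchmem are hypothetical helpers, not part of pytest-benchmem.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def benchmem_by_test(doc: dict[str, Any]) -> dict[str, Any]:
    """Map each benchmark's fullname to its extra_info.benchmem payload.

    Entries recorded without memory (no benchmem key) map to None.
    """
    return {
        bench["fullname"]: (bench.get("extra_info") or {}).get("benchmem")
        for bench in doc.get("benchmarks", [])
    }


def load_benchmem(path: str | Path) -> dict[str, Any]:
    """Read a --benchmark-json file and index its benchmem payloads by test."""
    return benchmem_by_test(json.loads(Path(path).read_text()))
```

In this notebook, load_benchmem(baseline) would return one entry per test_sort[...] case.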

Read both metrics back

pytest-benchmem reads that file one metric at a time: from_pytest_benchmark pulls timing (from stats), and memory_from_pytest_benchmark pulls peak memory (from extra_info.benchmem). Dims default to the parametrize params, alongside node.module and node.func, so each sample knows its n.

In [4]:

from pytest_benchmem import from_pytest_benchmark, memory_from_pytest_benchmark

_, time_samples, tunit = from_pytest_benchmark(baseline)
_, mem_samples, munit = memory_from_pytest_benchmark(baseline)

print(f"timing ({tunit}):")
for s in time_samples:
    print(f"  {s.id.split('::')[-1]:<18} {s.value:.3e}  dims={dict(s.dims)}")
print(f"\nmemory ({munit}):")
for s in mem_samples:
    print(f"  {s.id.split('::')[-1]:<18} {s.value:>10.0f}  dims={dict(s.dims)}")

timing (s):
  test_sort[10000]   4.914e-05  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 10000}
  test_sort[50000]   2.579e-04  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 50000}
  test_sort[200000]  1.049e-03  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 200000}
  test_sort[500000]  2.770e-03  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 500000}

memory (B):
  test_sort[10000]        80000  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 10000}
  test_sort[50000]       400000  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 50000}
  test_sort[200000]     1600000  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 200000}
  test_sort[500000]     4000000  dims={'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 500000}
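The dims above look like the pytest node ID split into module and function, plus the parametrize values. Here is a minimal sketch of that mapping, assuming it comes from pytest-benchmark's fullname and params fields. default_dims is a hypothetical helper, not pytest-benchmem's code, and it makes no special effort for test classes or nested paths.

```python
def default_dims(bench: dict) -> dict:
    """Hypothetical reconstruction of the default dims for one benchmark entry."""
    # fullname is the pytest node ID, e.g. "test_sortbench.py::test_sort[10000]"
    module, _, rest = bench["fullname"].partition("::")
    func = rest.split("[", 1)[0]  # drop the "[10000]" parametrize suffix
    dims = {"node.module": module, "node.func": func}
    # parametrize arguments live under "params" (null or absent when there are none)
    dims.update(bench.get("params") or {})
    return dims


print(default_dims({"fullname": "test_sortbench.py::test_sort[10000]", "params": {"n": 10000}}))
# → {'node.module': 'test_sortbench.py', 'node.func': 'test_sort', 'n': 10000}
```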

load_long_df stacks one or more runs into the tidy (long-format) frame that every plot pivots: one row per (run, id) for the chosen metric, with the run named in the snapshot column, and one column per dim:

In [5]:

from pytest_benchmem import load_long_df

df, unit = load_long_df([baseline], metric="peak")
print(f"unit: {unit}")
df

unit: B

Out[5]:

	snapshot	id	value	node.module	node.func	n
0	baseline	test_sortbench.py::test_sort[10000]	80000.0	test_sortbench.py	test_sort	10000
1	baseline	test_sortbench.py::test_sort[50000]	400000.0	test_sortbench.py	test_sort	50000
2	baseline	test_sortbench.py::test_sort[200000]	1600000.0	test_sortbench.py	test_sort	200000
3	baseline	test_sortbench.py::test_sort[500000]	4000000.0	test_sortbench.py	test_sort	500000
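Because every row carries n, a scaling check is a few lines of arithmetic. This sketch copies the numbers printed above rather than calling pytest-benchmem; the timing values are the ones from_pytest_benchmark returned, which match the Min column. It fits a least-squares slope on log-log axes and divides peak by n.

```python
import math

# Values copied from the outputs above: n, timing (s), and peak memory (B).
ns = [10_000, 50_000, 200_000, 500_000]
time_s = [4.914e-05, 2.579e-04, 1.049e-03, 2.770e-03]
peak_b = [80_000, 400_000, 1_600_000, 4_000_000]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x): the empirical scaling exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


print(f"time slope:   {loglog_slope(ns, time_s):.2f}")
print(f"memory slope: {loglog_slope(ns, peak_b):.2f}")
print("bytes per element:", [p / n for p, n in zip(peak_b, ns)])
```

Both slopes land near 1, i.e. roughly linear scaling. The constant 8 bytes per element is consistent with one pointer per slot in the new list that sorted returns, assuming a 64-bit CPython build. Timing being close to linear is plausibly because the input is strictly descending, a run that CPython's sort detects and reverses in linear time.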

Quick one-off: `measure_peak`

Outside pytest, in a REPL or notebook, measure_peak is the bare engine: hand it a zero-argument callable and get the peak in bytes. With repeats > 1 it takes the minimum, since peak memory is noisy; measure_memory returns the full MemoryResult (peak, spread, allocation count).

In [6]:

from pytest_benchmem import human_bytes, measure_peak

human_bytes(measure_peak(lambda: [0] * 5_000_000))

Out[6]:

'38.1 MiB'
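Where memray isn't available (for example on Windows), a rough stand-in for the same idea can be built on the standard library's tracemalloc. This is a sketch, not pytest-benchmem's implementation: peak_bytes_min and to_binary_units are hypothetical names, tracemalloc only sees allocations made through Python's own allocators (memray also tracks native ones), and the min-over-repeats rule mirrors the repeats behaviour described above.

```python
import tracemalloc


def peak_bytes_min(fn, repeats=1):
    """Peak traced bytes while calling fn(); with repeats > 1, keep the minimum."""
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()
        try:
            fn()
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()  # also discards traces, so the next run starts from zero
        peaks.append(peak)
    return min(peaks)


def to_binary_units(n_bytes):
    """Format a byte count with 1024-based units and one decimal place."""
    value = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024 or unit == "GiB":
            return f"{value:.1f} {unit}"
        value /= 1024


print(to_binary_units(peak_bytes_min(lambda: [0] * 5_000_000, repeats=3)))
```

On 64-bit CPython this should land near the 38.1 MiB above, since the list's pointer array alone is 5,000,000 × 8 bytes.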