Metrics: peak, allocated, allocations

One memray pass yields more than a single number. Every measured benchmark stores three memory metrics, and compare, plot, and the readers accept any of them. peak is the default, but allocated and allocations often catch what peak hides.

Metric	Blob key	What it is	Reach for it when
`peak`	`peak_bytes`	high-water mark of live bytes (the most held at once)	headline footprint; "how big did it get"
`allocated`	`total_bytes`	sum of every allocation over the run	churn / temporary spikes `peak` smooths over
`allocations`	`allocations`	count of allocation calls	a near-deterministic, low-noise tripwire

All three are memray's allocator demand — what your code requested, in-process and byte-exact, so they see native (numpy / C-extension) allocations, not just Python objects.
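To make the three definitions concrete, here is a minimal sketch that replays a hand-written allocation trace and derives each metric from it. The event format and the `summarize` helper are hypothetical and exist only for illustration; memray records real allocator calls, and this is not a description of how it works internally.

```python
def summarize(events):
    """Replay (op, block_id, size) events and derive the three metrics."""
    live = {}          # block_id -> size, for blocks not yet freed
    live_bytes = 0     # bytes currently held
    peak = 0           # high-water mark of live_bytes
    allocated = 0      # running sum of every allocation
    allocations = 0    # number of allocation calls
    for op, block_id, size in events:
        if op == "alloc":
            live[block_id] = size
            live_bytes += size
            allocated += size
            allocations += 1
            peak = max(peak, live_bytes)
        elif op == "free":
            live_bytes -= live.pop(block_id)
    return {"peak": peak, "allocated": allocated, "allocations": allocations}


# Hypothetical trace: three 1000-byte temporaries, each freed before the next.
trace = []
for i in range(3):
    trace.append(("alloc", i, 1000))
    trace.append(("free", i, 0))

result = summarize(trace)
print(result)  # {'peak': 1000, 'allocated': 3000, 'allocations': 3}
```

Only one block is ever live, so peak stays at 1000 bytes while allocated grows with every temporary. That is the pattern peak alone cannot show.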

Distribution across repeats

With --benchmark-memory-repeats=N (suite-wide) or @pytest.mark.benchmem(repeats=N) (per test), every repeat is kept as a flat series in the blob. The headline peak is the minimum (the cleanest floor); ask for any other stat over the series with --stat:

benchmem compare base.json head.json --metric peak --stat stddev   # how noisy is peak?
benchmem compare v1.json v2.json --metric allocated --stat mean

--stat takes min / max / mean / median / stddev and applies to any metric. Peak is the noisy one (GC timing, page cache); stddev tells you how much.
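As a rough illustration of what those five statistics mean over a repeat series, the sketch below computes them with Python's standard library. The series values are made up, and using the sample standard deviation is an assumption; the tool may compute stddev differently.

```python
import statistics

# Hypothetical peak_bytes series from five repeats (made-up numbers).
series = [1_221_536, 1_230_000, 1_221_536, 1_250_000, 1_224_000]

stats = {
    "min": min(series),
    "max": max(series),
    "mean": statistics.mean(series),
    "median": statistics.median(series),
    # Sample standard deviation; an assumption about the estimator.
    "stddev": statistics.stdev(series),
}

for name, value in stats.items():
    print(f"{name:<7} {value:,.0f}")
```

A stddev that is small relative to the minimum suggests the headline figure is stable across repeats; a large one suggests adding repeats before trusting a comparison.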

The terminal table shows the spread too: with repeats > 1, every shown metric expands into min / mean / max columns (peak·min, peak·mean, peak·max) — always, so the columns don't shift between runs; a single pass stays one column. The table shows peak only by default; add the rest with --benchmark-memory-columns=peak,allocated,allocs and pick the spread stats with --benchmark-memory-stats=min,stddev.

Setup

In [1]:

import os
import sys
import tempfile
from pathlib import Path

os.environ["FORCE_COLOR"] = "1"
os.environ["PATH"] = f"{Path(sys.executable).parent}{os.pathsep}{os.environ['PATH']}"
_tmp = Path(tempfile.mkdtemp(prefix="pytest-benchmem-"))

Three readings of one run

A workload that allocates a lot of temporary memory but holds little at its peak — the place peak and allocated diverge most:

In [2]:

suite = _tmp / "test_churn.py"
suite.write_text("""
def test_churn(benchmark_memory):
    def work():
        total = 0
        for _ in range(200):
            total += sum([i * i for i in range(20_000)])
        return total
    benchmark_memory(work)
""")
run = _tmp / "churn.json"
!pytest {suite} --benchmark-only --benchmark-json={run} --benchmark-columns=min,median -q -p no:cacheprovider

.                                                                        [100%]

Wrote benchmark data in: <_io.BufferedWriter name='/tmp/pytest-benchmem-gzgmv4qt/churn.json'>

benchmark: 1 tests                                          
                                                            
  Name (time in ms)        Min     Median   │   peak (MiB)  
 ────────────────────────────────────────────────────────── 
  test_churn          203.2353   204.2025   │         1.16  
                                                            
memory (right of │): a separate, untimed pass, not the timed
rounds  •  also available via --benchmark-memory-columns:   
allocated, allocs                                           
1 passed in 3.87s

The same run read three ways — peak stays small (one list lives at a time) while allocated is far larger (every list summed) and allocations counts the calls:

In [3]:

from pytest_benchmem import human_bytes, load_long_df

for metric in ("peak", "allocated", "allocations"):
    df, unit = load_long_df([run], metric=metric)
    v = df["value"].iloc[0]
    shown = human_bytes(v) if unit == "B" else f"{v:.0f}"
    print(f"{metric:<12} {shown}")

peak         1.16 MiB
allocated    294 MiB
allocations  9001

The raw blob, for reference:

In [4]:

import json

json.loads(run.read_text())["benchmarks"][0]["extra_info"]["benchmem"]

Out[4]:

{'peak_bytes': [1221536], 'allocations': [9001], 'total_bytes': [308357376]}
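The blob values can also be combined into derived figures. The sketch below copies the numbers from the output above and computes two ratios; `churn_ratio` and `avg_alloc` are illustrative names, not fields or helpers the plugin provides.

```python
# Values copied from the blob above; each key holds one entry per repeat.
blob = {"peak_bytes": [1221536], "allocations": [9001], "total_bytes": [308357376]}

# This run used a single pass, so each series has exactly one entry.
peak = blob["peak_bytes"][0]
allocated = blob["total_bytes"][0]
allocations = blob["allocations"][0]

churn_ratio = allocated / peak       # bytes requested per byte held at the peak
avg_alloc = allocated / allocations  # mean size of one allocation, in bytes

print(f"churn ratio  {churn_ratio:.0f}x")
print(f"avg alloc    {avg_alloc:.0f} B")
```

A churn ratio in the hundreds means the workload requested hundreds of times more memory than it ever held at once, which is the gap between the peak and allocated readings above.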

Picking one for a gate

For CI gating, allocations is often the best tripwire — it's near-deterministic, so a change there is almost always a real change in behaviour, not measurement noise. peak answers the capacity question; allocated catches churn regressions a peak gate would miss. You can gate on several at once — see Compare & plot and the reference.
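The idea of gating on several metrics with different tolerances can be sketched in plain Python. This is not the plugin's gating API: `check_gate`, the limit values, and the head numbers are all hypothetical, and only the base numbers come from the run above.

```python
def check_gate(base, head, limits):
    """Return failure messages for metrics that grew past their limit.

    base and head map metric name -> value; limits maps metric name ->
    allowed fractional growth (0.05 means +5%).
    """
    failures = []
    for metric, limit in limits.items():
        growth = (head[metric] - base[metric]) / base[metric]
        if growth > limit:
            failures.append(f"{metric}: +{growth:.1%} exceeds +{limit:.0%}")
    return failures


# Base values from the run above; head values are made up for illustration.
base = {"peak": 1_221_536, "allocated": 308_357_376, "allocations": 9001}
head = {"peak": 1_225_000, "allocated": 340_000_000, "allocations": 9901}

# Hypothetical limits: loose on noisy peak, tight on allocations.
limits = {"peak": 0.10, "allocated": 0.05, "allocations": 0.01}

failures = check_gate(base, head, limits)
for line in failures:
    print(line)
```

Here peak barely moves and passes, while allocated and allocations both trip their limits. That is the kind of churn regression a peak-only gate would let through.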