fastDigest is a Rust-powered Python extension module that provides a lightning-fast implementation of the t-digest data structure and algorithm, offering a lightweight suite of online statistics for streaming and distributed data.
- Online statistics: Compute highly accurate estimates of quantiles, the CDF, and many derived quantities such as the (trimmed) mean.
- Updating: Update a t-digest incrementally with streaming data or batches of large datasets.
- Merging: Merge many t-digests into one, enabling parallel compute operations such as map-reduce.
- Serialization: Use the to_dict/from_dict methods or the pickle module for serialization.
- Easy API: The fastDigest API is designed to be intuitive and to keep high overlap with popular libraries.
- Blazing fast: Thanks to its Rust backbone, this module is hundreds of times faster than other Python implementations.
Compiled wheels are available on PyPI. Simply install via pip:
pip install fastdigest
To build and install fastDigest from source, you will need Rust and maturin.
- Install the Rust toolchain → see https://rustup.rs
- Install maturin via pip:
pip install maturin
- Build and install the package:
maturin build --release
pip install target/wheels/fastdigest-0.8.3-<platform-tag>.whl
The following examples give you a quick start. See the API reference for the full documentation.
Simply call TDigest(), or use TDigest.from_values to create a digest directly from any sequence of numeric values:
from fastdigest import TDigest
digest = TDigest()
digest = TDigest.from_values([1.42, 2.71, 3.14])
Estimate the value at the rank q using quantile(q):
digest = TDigest.from_values(range(1001))
print("99th percentile:", digest.quantile(0.99))
Or do the inverse and use cdf to find the rank (cumulative probability) of a given value:
print("cdf(990) =", digest.cdf(990))
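On a small sample, the estimates can be sanity-checked against exact values. The sketch below uses only the standard library; exact_quantile and exact_cdf are hypothetical helper names, not part of fastDigest, and the digest may follow a slightly different interpolation convention, so expect close rather than identical numbers.

```python
from bisect import bisect_right


def exact_cdf(sorted_values, x):
    """Fraction of values that are less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def exact_quantile(sorted_values, q):
    """Linearly interpolated value at rank q, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


values = sorted(range(1001))

# quantile maps a rank to a value, cdf maps the value back to a rank.
p99 = exact_quantile(values, 0.99)
rank = exact_cdf(values, p99)
print(f"exact 99th percentile: {p99:.1f}")
print(f"exact cdf at that value: {rank:.4f}")
```

Comparing these numbers with digest.quantile(0.99) and digest.cdf(990) on the same data gives a quick feel for the estimation error.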
Compute the arithmetic mean, or the trimmed_mean between two quantiles:
data = list(range(101))
data[-1] = 100_000 # inserting an outlier
digest = TDigest.from_values(data)
print(f"        Mean: {digest.mean():.1f}")
print(f"Trimmed mean: {digest.trimmed_mean(0.1, 0.9):.1f}")
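To see why the trimmed mean is robust to the outlier, here is a minimal exact reference using only the standard library. exact_trimmed_mean is a hypothetical helper, and its index-based cut-off rule is an assumption made for this sketch; it is not necessarily the rule fastDigest applies internally, so the digest's estimate may differ slightly.

```python
import math
import statistics


def exact_trimmed_mean(values, q1, q2):
    """Mean of the sorted values between ranks q1 and q2 (simple index cut)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = math.floor(q1 * n)   # first index kept
    hi = math.ceil(q2 * n)    # one past the last index kept
    return statistics.fmean(ordered[lo:hi])


data = list(range(101))
data[-1] = 100_000  # same outlier as in the example above

plain = statistics.fmean(data)
trimmed = exact_trimmed_mean(data, 0.1, 0.9)
print(f"        Mean: {plain:.1f}")    # 1039.1, dominated by the outlier
print(f"Trimmed mean: {trimmed:.1f}")  # 50.0, the outlier is cut away
```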
Use batch_update to merge a sequence of many values at once, or update to add one value at a time:
digest = TDigest()
digest.batch_update([0, 1, 2])
digest.update(3)
Note that the two methods can differ significantly in performance depending on the use case.
Use the + operator to create a new instance from two TDigests, or += to merge in-place:
digest1 = TDigest.from_values(range(20))
digest2 = TDigest.from_values(range(20, 51))
digest3 = TDigest.from_values(range(51, 101))
digest1 += digest2
merged_new = digest1 + digest3
The merge_all function offers an easy way to merge an iterable of many TDigests:
from fastdigest import TDigest, merge_all
digests = [TDigest.from_values(range(i, i+10)) for i in range(0, 100, 10)]
merged = merge_all(digests)
Obtain a dictionary representation by calling to_dict() and load it into a new instance with TDigest.from_dict:
from fastdigest import TDigest
import json
digest = TDigest.from_values(range(101))
td_dict = digest.to_dict()
print(json.dumps(td_dict, indent=2))
restored = TDigest.from_dict(td_dict)
The fastDigest API is designed to be backward compatible with the tdigest Python library. Migrating is as simple as changing your import statement.
Dicts created by tdigest can also natively be used by fastDigest.
Constructing a TDigest and estimating the median of 1,000,000 uniformly distributed random values (average of 10 consecutive runs):
| Library | Time (ms) | Speedup |
|---|---|---|
| tdigest | ~12,800 | - |
| fastdigest | ~32 | 400x faster |
Environment: Python 3.13.2, Fedora 41 (Workstation), AMD Ryzen 5 7600X
If you want to try it yourself, install fastDigest as well as tdigest and run:
python benchmark.py
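The contents of benchmark.py are not reproduced here. For readers who want to time their own workload in the same spirit (average over 10 consecutive runs), a timing loop can be sketched as follows. average_runtime_ms is a hypothetical helper, and the standard-library median is only a stand-in workload so that the snippet runs without extra dependencies.

```python
import random
import statistics
import time


def average_runtime_ms(workload, runs=10):
    """Average wall-clock time of workload() over consecutive runs, in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)


values = [random.uniform(0.0, 1.0) for _ in range(100_000)]

# Stand-in workload. Swap in the call you want to measure, for example
# lambda: TDigest.from_values(values).quantile(0.5)
elapsed = average_runtime_ms(lambda: statistics.median(values), runs=10)
print(f"average over 10 runs: {elapsed:.2f} ms")
```

Absolute timings depend heavily on hardware and Python version, so only ratios measured on the same machine are meaningful.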
fastDigest is licensed under the MIT License. See the LICENSE file for details.
Credit goes to Ted Dunning for inventing the t-digest. Special thanks to Andy Lok and Paul Meng for creating the tdigests and tdigest Rust libraries, respectively, as well as to all PyO3 contributors.