binance-l2-capture · open source

Your order book data is lying to you.

Most homemade captures silently desync from the exchange and never tell you. This one halts on corruption instead of writing a book it can't prove is correct. Self-hosted, bring-your-own-key, Parquet out.

MIT licensed · USDT-M perps · read-only API key · get the notes ↓

BTCUSDT · L2
Bids       Asks
109,412    109,418
109,411    109,419
109,410    109,420
109,409    109,421
109,408    109,422
spread 6.0 · mid 109,415.0 · pu ✓
the problem

Silent corruption is worse than a crash.

A crash you notice. A desynced book you don't — until your backtest finds an edge that evaporates live. On Binance futures, every diff event carries pu, the previous event's final update ID. Miss one event under load and your local book diverges forever. A naive capture and a correct one look identical, right up until the moment they don't.

l2cap — sequence continuity
# a dropped event under load:
event U=88421901 u=88421906 pu == last_u ✓ applied
event U=88421907 u=88421913 pu == last_u ✓ applied
event U=88421920 u=88421925 last_u (88421913) ≠ event.pu (88421919) · GAP
⟂ HALT book discarded · re-fetching snapshot · re-validating seam…
✓ resynced lastUpdateId=88421940 · capture resumed

# a naive capture prints nothing here. it just keeps going — wrong.
what it guarantees

It refuses to record a book it can't prove.

Every invariant halts capture rather than logging garbage and continuing. Loud failure you can fix; silent corruption you ship into a backtest.

// continuity

No silent gaps

Checks pu == last_u on every event. A break triggers an automatic snapshot re-fetch and rebuild — never a patched hole.
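
As a minimal sketch of that check (not the project's actual code), the function below compares each event's pu with the last applied u and raises on a mismatch. The names `SequenceGap`, `check_continuity`, and `apply_all` are hypothetical. The update IDs come from the log above, except the first event's pu, which is an assumed value.

```python
class SequenceGap(Exception):
    """Raised when an event's pu does not match the last applied u."""


def check_continuity(last_u, event):
    """Return the new last_u if the event chains on, otherwise raise."""
    if last_u is not None and event["pu"] != last_u:
        raise SequenceGap(f"last_u={last_u} but event pu={event['pu']}")
    return event["u"]


def apply_all(events, last_u=None):
    """Walk events in order; the first broken link raises SequenceGap."""
    for event in events:
        last_u = check_continuity(last_u, event)
    return last_u


events = [
    {"U": 88421901, "u": 88421906, "pu": 88421900},  # pu assumed for illustration
    {"U": 88421907, "u": 88421913, "pu": 88421906},
    {"U": 88421920, "u": 88421925, "pu": 88421919},  # 88421919 != 88421913: gap
]
```

Calling `apply_all(events[:2])` returns the second event's u, while `apply_all(events)` raises `SequenceGap` on the third event. A real capture would catch that exception, discard the book, and re-fetch a snapshot.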

// integrity

Never a crossed book

Best bid ≥ best ask is corruption by definition. Capture halts instead of storing it.
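
One way such a guard could look, assuming the book is held as two price-to-quantity dicts (the layout and the names `CrossedBook` and `assert_not_crossed` are illustrative, not taken from the project):

```python
from decimal import Decimal


class CrossedBook(Exception):
    """Raised when the best bid is at or above the best ask."""


def assert_not_crossed(bids, asks):
    """bids and asks map price -> quantity. Raise if the book is crossed."""
    if not bids or not asks:
        return  # a one-sided book cannot be crossed
    best_bid, best_ask = max(bids), min(asks)
    if best_bid >= best_ask:
        raise CrossedBook(f"best bid {best_bid} >= best ask {best_ask}")


# Quantities here are placeholders; only the prices matter for the check.
bids = {Decimal("109412"): Decimal("1"), Decimal("109411"): Decimal("1")}
asks = {Decimal("109418"): Decimal("1"), Decimal("109419"): Decimal("1")}
assert_not_crossed(bids, asks)  # passes: 109412 < 109418
```

Using `Decimal` rather than `float` keeps price comparisons exact, which matters when the check is the last line of defence against storing a corrupt book.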

// liveness

No frozen feeds

No update within the staleness window means the stream is dead. It stops rather than timestamp a stale book as live.
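
A staleness check can be sketched as a small watchdog that records when the last event arrived and raises once the silence exceeds a window. The class name, the 5-second window in the usage note, and the injectable clock are all assumptions made for illustration.

```python
import time


class StaleFeed(Exception):
    """Raised when no event has arrived within the staleness window."""


class StalenessWatchdog:
    def __init__(self, max_silence_s, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock  # injectable so the check can be tested without sleeping
        self._last_seen = clock()

    def touch(self):
        """Call on every received event."""
        self._last_seen = self._clock()

    def check(self):
        """Call periodically; raises if the feed has been silent too long."""
        silence = self._clock() - self._last_seen
        if silence > self.max_silence_s:
            raise StaleFeed(f"no update for {silence:.1f}s")
```

For example, `StalenessWatchdog(5.0)` would raise from `check()` if more than five seconds pass without a `touch()`. A monotonic clock is used so that wall-clock adjustments cannot mask or fake a stall.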

// reconnects

Re-sync, don't resume

A reconnect is a gap. The book is discarded and rebuilt from a fresh snapshot — every time the socket drops.
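
Rebuilding means stitching buffered diff events onto the fresh snapshot. The sketch below follows the rule Binance documents for futures local order books: drop events whose u is below the snapshot's lastUpdateId, then require the first remaining event to satisfy U <= lastUpdateId <= u. The function and exception names are hypothetical, and the event IDs in the usage note are invented around the lastUpdateId shown in the log above.

```python
class SeamError(Exception):
    """Raised when buffered events do not line up with the snapshot."""


def events_after_snapshot(last_update_id, buffered):
    """Return the buffered events that can be applied on top of the snapshot."""
    # Events that ended before the snapshot are already reflected in it.
    live = [e for e in buffered if e["u"] >= last_update_id]
    if not live:
        return []  # nothing usable yet; keep buffering
    first = live[0]
    if first["U"] > last_update_id:
        # The first usable event starts after the snapshot: events are missing.
        raise SeamError(
            f"first event U={first['U']} is past lastUpdateId={last_update_id}"
        )
    return live
```

With `last_update_id=88421940`, a buffer containing events spanning 88421930..88421938 and 88421939..88421945 yields only the second event, since the first predates the snapshot. If the earliest usable event started at 88421950 instead, the seam cannot be proven and the function raises, which should trigger another snapshot fetch.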

// time

Clock-skew guard

If your machine drifts too far from exchange time, every timestamp it records is wrong, so capture halts instead of writing them.
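
The comparison itself is simple. In this sketch the exchange time is assumed to have been fetched already (for example from the exchange's server-time endpoint), and the 500 ms default threshold is an illustrative value, not the tool's documented setting.

```python
class ClockSkew(Exception):
    """Raised when local time is too far from exchange time."""


def check_clock_skew(local_ms, server_ms, max_skew_ms=500):
    """Return the signed skew in milliseconds, or raise if it is too large."""
    skew = local_ms - server_ms
    if abs(skew) > max_skew_ms:
        raise ClockSkew(f"local clock is off by {skew} ms")
    return skew
```

A positive result means the local clock runs ahead of the exchange. In practice the measured skew also includes network latency, so a real guard would likely account for round-trip time before deciding to halt.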

// output

Auditable Parquet

Full depth ladder per event, partitioned by symbol and date. Loads straight into pandas, Polars, or DuckDB.

how it works

Running in under 15 minutes.

You run it against your own Binance key, on your own machine. A read-only key is enough.

quickstart
$ git clone https://github.com/Balleing/binance-l2-capture.git
$ cd binance-l2-capture && pip install -e .
$ cp .env.example .env # add your BINANCE_API_KEY
$ l2cap run # data lands in ./data/

data/BTCUSDT/books/ · trades/ · mark_price/ — rotated every 5 min
// subscribe

Notes from BaldQuant

Occasional deep-dives on order-book data, market microstructure, and backtests that don't lie — plus one email when the Pro tier ships. Low volume, no spam.

Double opt-in · unsubscribe anytime · powered by Buttondown