binance-l2-capture · open source

Your order book data is lying to you.

Most homemade captures silently desync from the exchange and never tell you. This one halts on corruption instead of writing a book it can't prove is correct. Self-hosted, bring-your-own-key, Parquet out.


MIT licensed · USDT-M perps · read-only API key

BTCUSDT · L2
bids 109,412 · 109,411 · 109,410 · 109,409 · 109,408
asks 109,418 · 109,419 · 109,420 · 109,421 · 109,422
spread 6.0 · mid 109,415.0 · pu ✓

the problem

Silent corruption is worse than a crash.

A crash you notice. A desynced book you don't — until your backtest finds an edge that evaporates live. On Binance futures, every diff event carries pu, the previous event's final update ID. Miss one event under load and your local book diverges forever. A naive capture and a correct one look identical, right up until the moment they don't.

l2cap — sequence continuity

# a dropped event under load:
event U=88421901 u=88421906 pu == last_u ✓ applied
event U=88421907 u=88421913 pu == last_u ✓ applied
event U=88421920 u=88421925 pu=88421919 ≠ last_u (88421913) ✗ GAP
⟂ HALT book discarded · re-fetching snapshot · re-validating seam…
✓ resynced lastUpdateId=88421940 · capture resumed

# a naive capture prints nothing here. it just keeps going — wrong.
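Below is a minimal Python sketch of the check the log above performs. It assumes Binance's documented USDT-M `depthUpdate` payload: `U`, `u`, `pu`, and `b`/`a` lists of `[price, qty]` strings. `SequenceGap`, `apply_diff` and the dict-of-Decimal book layout are illustrative names, not l2cap's internals.

```python
from decimal import Decimal


class SequenceGap(Exception):
    """An event's pu did not match the last applied u: the stream skipped something."""


def apply_diff(book: dict, last_u: int, event: dict) -> int:
    """Apply one futures depthUpdate event to `book` and return the new last_u.

    `book` is {"bids": {price: qty}, "asks": {price: qty}} keyed by Decimal.
    The continuity check runs before any mutation, so a gap leaves the
    book untouched for the caller to discard.
    """
    if event["pu"] != last_u:
        raise SequenceGap(f"expected pu={last_u}, got pu={event['pu']}")
    for key, side in (("b", "bids"), ("a", "asks")):
        for price_str, qty_str in event[key]:
            price, qty = Decimal(price_str), Decimal(qty_str)
            if qty == 0:
                book[side].pop(price, None)  # quantity 0 deletes the level
            else:
                book[side][price] = qty      # quantities are absolute, not deltas
    return event["u"]
```

Fed the log's events, the second one applies and returns 88421913. The third carries pu=88421919 and raises before touching the book. That is the point where the log shows the book being discarded and a snapshot re-fetched.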

what it guarantees

It refuses to record a book it can't prove.

Any invariant violation halts capture instead of logging garbage and carrying on. Loud failure you can fix; silent corruption you ship into a backtest.

// continuity

No silent gaps

Checks pu == last_u on every event. A break triggers an automatic snapshot re-fetch and rebuild — never a patched hole.
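A rebuild has to join a REST snapshot to the live stream without leaving a hole at the seam. Here is a sketch of that join, following the procedure in Binance's futures local-order-book guide. `find_seam` and `SeamGap` are hypothetical names. The snapshot is assumed to come from the REST depth endpoint, with its `lastUpdateId`.

```python
from typing import Optional


class SeamGap(Exception):
    """Buffered events start after the snapshot, so updates in between are missing."""


def find_seam(last_update_id: int, buffered: list) -> Optional[list]:
    """Return the buffered depth events to replay on top of a REST snapshot.

    Rules from Binance's futures local-order-book procedure:
      1. drop events whose u < lastUpdateId (already inside the snapshot);
      2. the first kept event must satisfy U <= lastUpdateId <= u.
    Returns None if nothing reaches the snapshot yet (keep buffering);
    raises SeamGap if the first kept event starts past it (re-fetch).
    """
    live = [e for e in buffered if e["u"] >= last_update_id]
    if not live:
        return None
    # u >= lastUpdateId is guaranteed by the filter, so only U needs checking.
    if live[0]["U"] > last_update_id:
        raise SeamGap(f"first event U={live[0]['U']} > lastUpdateId={last_update_id}")
    return live
```

Apply the first returned event on top of the snapshot without the pu check, since its pu generally won't equal `lastUpdateId`. Every event after that must then satisfy pu == last_u. `None` means keep buffering; `SeamGap` means fetch a newer snapshot.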

// integrity

Never a crossed book

Best bid ≥ best ask is corruption by definition. It halts instead of storing it.
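A sketch of the crossed-book guard over a dict-of-Decimal book (the layout and names are assumptions). It also derives the spread and mid shown in the ladder at the top of the page. Equality counts as a failure, matching the ≥ above.

```python
from decimal import Decimal
from typing import Dict, Optional, Tuple


class CrossedBook(Exception):
    """Best bid >= best ask: no real exchange state looks like this."""


def top_of_book(bids: Dict[Decimal, Decimal],
                asks: Dict[Decimal, Decimal]) -> Optional[Tuple[Decimal, Decimal, Decimal, Decimal]]:
    """Return (best_bid, best_ask, spread, mid), or raise on a crossed or locked book.

    max()/min() scan every level; a sorted container would avoid that, at the
    cost of an extra dependency.
    """
    if not bids or not asks:
        return None  # a one-sided book can't cross; nothing to compare
    best_bid, best_ask = max(bids), min(asks)
    if best_bid >= best_ask:  # equality (a locked book) fails too
        raise CrossedBook(f"best bid {best_bid} >= best ask {best_ask}")
    return best_bid, best_ask, best_ask - best_bid, (best_bid + best_ask) / 2
```

With the ladder's top levels (bid 109,412, ask 109,418) it returns a spread of 6 and a mid of 109,415.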

// liveness

No frozen feeds

If no update arrives within the staleness window, the stream is treated as dead. It stops instead of timestamping a stale book as live.
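A sketch of a staleness guard. The 5-second window is a hypothetical default, not l2cap's setting. It uses `time.monotonic` so that a wall-clock adjustment can't make a dead feed look alive. The clock is injectable, so the guard can be tested without sleeping.

```python
import time
from typing import Callable


class StaleFeed(Exception):
    """No message arrived within the staleness window."""


class StalenessGuard:
    """Halt if the feed goes quiet for longer than window_s seconds.

    window_s=5.0 is a hypothetical default, not l2cap's setting.
    """

    def __init__(self, window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock  # monotonic by default: immune to wall-clock jumps
        self.last_seen = clock()

    def touch(self) -> None:
        """Record that a message just arrived."""
        self.last_seen = self.clock()

    def check(self) -> None:
        """Raise StaleFeed if the feed has been silent too long."""
        idle = self.clock() - self.last_seen
        if idle > self.window_s:
            raise StaleFeed(f"no update for {idle:.1f}s (window {self.window_s}s)")
```

Call `touch()` on every message and `check()` from a periodic timer. In an asyncio client, wrapping each socket read in `asyncio.wait_for(..., timeout=window)` gives the same halt without a separate timer.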

// reconnects

Re-sync, don't resume

A reconnect is a gap. The book is discarded and rebuilt from a fresh snapshot — every time the socket drops.
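A sketch of the reconnect rule as a small per-symbol state holder. `SymbolBook` and its methods are hypothetical. The snapshot fetch and the seam replay are left to the caller. The point is that a reconnect resets everything rather than resuming from the last known `u`.

```python
from typing import List, Optional


class SymbolBook:
    """Hypothetical per-symbol state illustrating re-sync on every reconnect.

    Network I/O, the snapshot fetch, and the seam replay live in the caller;
    this only tracks whether the local book can be trusted.
    """

    def __init__(self) -> None:
        self.on_reconnect()  # a fresh start is just a reconnect with no history

    def on_reconnect(self) -> None:
        # Updates sent while the socket was down are unrecoverable from the
        # stream, so the old book is thrown away rather than resumed.
        self.bids: dict = {}
        self.asks: dict = {}
        self.last_u: Optional[int] = None  # no trusted sequence position
        self.synced = False
        self.pending: List[dict] = []      # events held until a snapshot lands

    def on_event(self, event: dict) -> bool:
        """Return True if the event may be applied now; otherwise buffer it."""
        if not self.synced:
            self.pending.append(event)
            return False
        return True

    def mark_synced(self, last_u: int) -> None:
        """Call after the caller rebuilt the book from a snapshot plus pending events."""
        self.last_u = last_u
        self.synced = True
        self.pending.clear()
```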

// time

Clock-skew guard

If your machine drifts too far from exchange time, timestamps are wrong — so it halts instead.
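A sketch of a skew check against Binance's USDT-M server time endpoint, `GET /fapi/v1/time`, which returns `{"serverTime": <ms>}`. The offset is a Cristian's-algorithm midpoint estimate: it assumes the server stamped its reply halfway through the round trip. The 500 ms limit and the function names are hypothetical.

```python
import json
import time
import urllib.request

SERVER_TIME_URL = "https://fapi.binance.com/fapi/v1/time"  # USDT-M futures server time


class ClockSkew(Exception):
    """Local clock is too far from exchange time for timestamps to be trusted."""


def estimate_offset_ms(t_send_ms: float, server_ms: float, t_recv_ms: float) -> float:
    """Local-minus-exchange clock offset (positive: local clock runs ahead).

    Assumes serverTime was stamped at the midpoint of the round trip.
    """
    return (t_send_ms + t_recv_ms) / 2 - server_ms


def check_skew(offset_ms: float, max_skew_ms: float = 500.0) -> None:
    # max_skew_ms is a hypothetical threshold, not the tool's actual setting.
    if abs(offset_ms) > max_skew_ms:
        raise ClockSkew(f"local clock off by {offset_ms:+.0f} ms (limit {max_skew_ms:.0f} ms)")


def measure_offset_ms() -> float:
    """One live probe against Binance (requires network access)."""
    t_send = time.time() * 1000
    with urllib.request.urlopen(SERVER_TIME_URL, timeout=5) as resp:
        server_ms = json.load(resp)["serverTime"]
    t_recv = time.time() * 1000
    return estimate_offset_ms(t_send, server_ms, t_recv)
```

`measure_offset_ms()` needs the network, but the pure functions can be checked offline. For example, a request sent at 1000 ms and answered at 1100 ms with serverTime 1040 gives a +10 ms offset.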

// output

Auditable Parquet

Full depth ladder per event, partitioned by symbol and date. Loads straight into pandas, Polars, or DuckDB.
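A sketch of one way to store a full ladder per event. Each price level becomes one long-format row, and the rows are written with pyarrow as a Hive-style dataset partitioned by symbol and date (`symbol=BTCUSDT/date=2025-01-01/`). The column names and that directory layout are assumptions; the tool's own tree (`data/BTCUSDT/books/`) may be organized differently.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Dict, List, Optional


def ladder_rows(symbol: str, event_time_ms: int,
                bids: Dict[Decimal, Decimal], asks: Dict[Decimal, Decimal],
                depth: Optional[int] = None) -> List[dict]:
    """Flatten one book state into long-format rows, one per price level.

    Level 0 is the best price on each side; depth=None keeps every level.
    Column names are hypothetical, not l2cap's actual schema.
    """
    # Partition date in UTC so day boundaries don't depend on the host timezone.
    date = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    rows = []
    # Best-first ordering: bids descending, asks ascending.
    for side, levels, descending in (("bid", bids, True), ("ask", asks, False)):
        for level, price in enumerate(sorted(levels, reverse=descending)[:depth]):
            rows.append({
                "symbol": symbol,
                "date": date,
                "event_time_ms": event_time_ms,
                "side": side,
                "level": level,
                # floats for analysis convenience; keep strings if exact ticks matter
                "price": float(price),
                "qty": float(levels[price]),
            })
    return rows


def write_partitioned(rows: List[dict], root: str) -> None:
    """Add rows to a Hive-style dataset: root/symbol=.../date=.../*.parquet."""
    # Imported lazily so ladder_rows works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)
    pq.write_to_dataset(table, root_path=root, partition_cols=["symbol", "date"])
```

`pandas.read_parquet(root)` reads the directory back and restores `symbol` and `date` as columns from the partition paths.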

how it works

Running in under 15 minutes.

You run it against your own Binance key, on your own machine. A read-only key is enough.

quickstart

$ git clone https://github.com/Balleing/binance-l2-capture.git
$ cd binance-l2-capture && pip install -e .
$ cp .env.example .env # add your BINANCE_API_KEY
$ l2cap run # data lands in ./data/

✓ data/BTCUSDT/books/ · trades/ · mark_price/ — rotated every 5 min

is this legal?

Yes — because the data never touches me.


You run it, your key, your machine. Binance prohibits redistributing its market data, so this project never receives any: every byte flows straight from Binance to your own storage. There is nothing to redistribute because the author is never in the path. (This explains the design; it isn't legal advice.)