Most homemade captures silently desync from the exchange and never tell you. This one halts on corruption instead of writing a book it can't prove is correct. Self-hosted, bring-your-own-key, Parquet out.
MIT licensed · USDT-M perps · read-only API key
A crash you notice. A desynced book you don't, until your backtest finds an edge that evaporates live. On Binance futures, every depth diff event carries pu, the final update ID (u) of the event before it. Miss one event under load and the chain breaks: your local book diverges from the exchange and stays wrong until it is rebuilt from a snapshot. A naive capture and a correct one look identical, right up until the moment they don't.
Every invariant halts capture rather than logging garbage and continuing. Loud failure you can fix; silent corruption you ship into a backtest.
Checks that each event's pu equals the previous event's u (pu == last_u) on every event. A break triggers an automatic snapshot re-fetch and a full rebuild, never a patched hole.
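The continuity check can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: `check_continuity` and `SequenceGap` are hypothetical names, and the events are hand-written dictionaries that carry only the `U`, `u`, and `pu` fields of a Binance futures depth event.

```python
class SequenceGap(Exception):
    """Raised when an event's pu does not match the previous event's u."""


def check_continuity(last_u, event):
    """Return the new last_u, or raise SequenceGap if the chain is broken.

    last_u is None until the first event after a snapshot has been applied.
    """
    if last_u is not None and event["pu"] != last_u:
        raise SequenceGap(f"expected pu={last_u}, got pu={event['pu']}")
    return event["u"]


# Hand-written sample events (illustrative values only).
events = [
    {"U": 101, "u": 105, "pu": 100},
    {"U": 106, "u": 110, "pu": 105},
    {"U": 115, "u": 120, "pu": 112},  # the event ending at u=112 never arrived
]

last_u = None
gap_at = None
for i, ev in enumerate(events):
    try:
        last_u = check_continuity(last_u, ev)
    except SequenceGap:
        # A real capture would discard the book and re-fetch a snapshot here.
        gap_at = i
        break
```

In this run the third event fails the check, so `gap_at` is 2 and `last_u` stays at 110, the last ID that was provably applied.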
A best bid at or above the best ask (best bid ≥ best ask) is a crossed book, which is corruption by definition. Capture halts instead of storing it.
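A crossed-book check might look like the sketch below. It assumes each side of the book is held as a price-to-quantity mapping with `Decimal` keys; the names `check_not_crossed` and `CrossedBook` are hypothetical and chosen for illustration.

```python
from decimal import Decimal


class CrossedBook(Exception):
    """Raised when the best bid is at or above the best ask."""


def check_not_crossed(bids, asks):
    """bids and asks map price -> quantity. Returns (best_bid, best_ask)."""
    if not bids or not asks:
        return None  # a one-sided book cannot be crossed
    best_bid, best_ask = max(bids), min(asks)
    if best_bid >= best_ask:
        raise CrossedBook(f"best bid {best_bid} >= best ask {best_ask}")
    return best_bid, best_ask


# Illustrative book: two levels per side, bid below ask.
healthy = check_not_crossed(
    {Decimal("99.9"): Decimal("2"), Decimal("99.8"): Decimal("5")},
    {Decimal("100.0"): Decimal("1"), Decimal("100.1"): Decimal("4")},
)
```

Using `Decimal` rather than `float` keeps price comparisons exact, which matters when the check is a strict inequality on adjacent ticks.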
No update within the staleness window means the stream is dead. Capture stops rather than timestamping a stale book as live.
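One way to express the staleness check is shown below. The 5-second window is an assumed value for illustration, since the actual window is not stated here, and `check_fresh` and `StaleStream` are hypothetical names.

```python
import time


class StaleStream(Exception):
    """Raised when no event has arrived within the staleness window."""


def check_fresh(last_event_at, now, window_s=5.0):
    """Both times come from the same monotonic clock, in seconds."""
    age = now - last_event_at
    if age > window_s:
        raise StaleStream(f"no update for {age:.1f}s (window {window_s}s)")
    return age


last_event_at = time.monotonic()  # stamped whenever an event arrives
age = check_fresh(last_event_at, time.monotonic())
```

The sketch uses `time.monotonic()` rather than wall-clock time, so an NTP adjustment on the host cannot make a live stream look stale or a dead one look fresh.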
A reconnect is a gap. The book is discarded and rebuilt from a fresh snapshot — every time the socket drops.
If your machine's clock drifts too far from exchange time, every timestamp it records is wrong, so capture halts instead.
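Clock drift can be estimated by comparing a server timestamp against the midpoint of the local send and receive times. The sketch below takes all three values as plain millisecond integers. In practice the server value could come from the `serverTime` field of Binance's futures server-time endpoint. The 250 ms limit and the names `estimate_drift_ms`, `check_clock`, and `ClockDrift` are assumptions made for illustration.

```python
class ClockDrift(Exception):
    """Raised when local time is too far from exchange time."""


def estimate_drift_ms(sent_ms, server_ms, received_ms):
    """Server time minus the midpoint of the local round trip, in ms."""
    midpoint = (sent_ms + received_ms) / 2
    return server_ms - midpoint


def check_clock(sent_ms, server_ms, received_ms, max_drift_ms=250):
    """Return the estimated drift, or raise ClockDrift if it is too large."""
    drift = estimate_drift_ms(sent_ms, server_ms, received_ms)
    if abs(drift) > max_drift_ms:
        raise ClockDrift(f"clock drift {drift:.0f} ms exceeds {max_drift_ms} ms")
    return drift


# Illustrative values: a 20 ms round trip with the server exactly at the midpoint.
drift = check_clock(sent_ms=1000, server_ms=1010, received_ms=1020)
```

Taking the midpoint assumes the request and response legs take roughly equal time, so the estimate is only as good as the round trip is symmetric.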
Full depth ladder per event, partitioned by symbol and date. Loads straight into pandas, Polars, or DuckDB.
You run it against your own Binance key, on your own machine. A read-only key is enough.
You run it, your key, your machine. Binance prohibits redistributing its market data, so the project never hosts or relays any. Every byte flows straight from Binance to your own storage. There is nothing to redistribute, because the author is never in the path. (This explains the design; it isn't legal advice.)