Order Book Reconstruction in Sub-Millisecond
How to build and maintain order books from L2/L3 feeds. Sequence numbers, gap detection, and snapshot recovery.
🎯 What You'll Learn
- Understand L1/L2/L3 market data feeds
- Build an order book from incremental updates
- Handle sequence gaps and snapshot recovery
- Optimize for sub-millisecond reconstruction
Why Sub-Millisecond Matters
Your trading strategy sees a price. By the time you react, the price has moved.
Order book age: 1ms → You might win
Order book age: 10ms → You're trading on stale data
Order book age: 100ms → You're the sucker at the table
```bash
This lesson covers how to build and maintain **real-time order books** from exchange data feeds.
---
## Market Data Levels
| Level | Contains | Updates/sec | Use Case |
|-------|----------|-------------|----------|
| **L1** | Best bid/ask only | 10-100 | Simple trading |
| **L2** | Top N levels (e.g., top 10) | 100-1,000 | Most strategies |
| **L3** | Every order (individual IDs) | 1,000-100,000 | Market making |
Most exchanges provide L2. True L3 (market-by-order) is primarily available on equity venues; most crypto exchanges provide L2 at most.
---
## Order Books are Event-Sourced
You don't receive the current state — you receive a stream of changes. Miss one update, and your book is wrong forever until you re-sync.
Every update has a **sequence number**. Gap detection is mandatory.
---
## Order Book Data Structure
```python
from collections import defaultdict
from sortedcontainers import SortedDict
class OrderBook:
def __init__(self, symbol: str):
self.symbol = symbol
self.bids = SortedDict() # price -> quantity (descending)
self.asks = SortedDict() # price -> quantity (ascending)
self.sequence = 0
def apply_snapshot(self, snapshot: dict):
"""Initialize from full snapshot."""
self.bids.clear()
self.asks.clear()
for price, qty in snapshot['bids']:
self.bids[-float(price)] = float(qty) # Negative for descending
for price, qty in snapshot['asks']:
self.asks[float(price)] = float(qty)
self.sequence = snapshot['sequence']
def apply_update(self, update: dict) -> bool:
"""Apply incremental update. Returns False if gap detected."""
expected_seq = self.sequence + 1
if update['sequence'] != expected_seq:
return False # GAP! Need to re-sync
for price, qty in update.get('bids', []):
price = -float(price)
if float(qty) == 0:
self.bids.pop(price, None)
else:
self.bids[price] = float(qty)
for price, qty in update.get('asks', []):
price = float(price)
if float(qty) == 0:
self.asks.pop(price, None)
else:
self.asks[price] = float(qty)
self.sequence = update['sequence']
return True
def best_bid(self):
return -self.bids.peekitem(0)[0] if self.bids else None
def best_ask(self):
return self.asks.peekitem(0)[0] if self.asks else None
def mid_price(self):
bid, ask = self.best_bid(), self.best_ask()
return (bid + ask) / 2 if bid and ask else None
```diff
---
## Sequence Gap Detection
```python
class SafeOrderBook(OrderBook):
def __init__(self, symbol: str, on_gap_callback):
super().__init__(symbol)
self.on_gap = on_gap_callback
self.buffered_updates = []
self.max_buffer = 1000
def process_message(self, msg: dict):
if msg['type'] == 'snapshot':
self.apply_snapshot(msg)
self._replay_buffered()
elif msg['type'] == 'update':
if not self.apply_update(msg):
# Gap detected!
self._handle_gap(msg)
def _handle_gap(self, msg):
"""Buffer updates and request new snapshot."""
self.buffered_updates.append(msg)
if len(self.buffered_updates) > self.max_buffer:
self.buffered_updates.pop(0)
self.on_gap(self.symbol) # Trigger snapshot request
def _replay_buffered(self):
"""Apply buffered updates after snapshot."""
valid_updates = [u for u in self.buffered_updates
if u['sequence'] > self.sequence]
valid_updates.sort(key=lambda x: x['sequence'])
for update in valid_updates:
if not self.apply_update(update):
break # Still have a gap
self.buffered_updates.clear()
```diff
---
## Performance Optimization
### 1. Memory Layout
```python
# Bad: Python dicts with string keys
book = {"bids": {}, "asks": {}} # Slow, memory-intensive
# Better: NumPy arrays for numerical data
import numpy as np
bids = np.zeros((1000, 2), dtype=np.float64) # [price, qty] pairs
asks = np.zeros((1000, 2), dtype=np.float64)
# Best: Pre-allocated fixed-size arrays with manual management
```sql
## 2. Update Parsing
```python
# Bad: JSON parsing every message
data = json.loads(raw_message)
# Better: Use orjson (3-10x faster than stdlib json)
import orjson
data = orjson.loads(raw_message)
# Best: Binary protocols (protobuf, FlatBuffers)
```text
## 3. Price Level Indexing
```python
# For fixed-tick instruments, use array indexing
# E.g., BTC with $1 ticks: price $50,000 = index 50000
class TickBook:
def __init__(self, min_price, max_price, tick_size):
self.min_price = min_price
self.tick_size = tick_size
n_levels = int((max_price - min_price) / tick_size)
self.bids = np.zeros(n_levels, dtype=np.float64)
self.asks = np.zeros(n_levels, dtype=np.float64)
def _price_to_idx(self, price):
return int((price - self.min_price) / self.tick_size)
def update_bid(self, price, qty):
self.bids[self._price_to_idx(price)] = qty # O(1)
```python
---
## Common Misconceptions
**Myth:** "I can just poll the REST API for order book state."
**Reality:** REST is seconds behind WebSocket. By the time you get the response, it's ancient history. WebSocket with incremental updates is the only viable approach.
**Myth:** "Sequence gaps are rare, I can ignore them."
**Reality:** Networks drop packets. WebSocket connections hiccup. Gaps happen in production. Your code must handle them or your book diverges silently.
**Myth:** "My order book matches the exchange perfectly."
**Reality:** There's always propagation delay. Your book is a local approximation. Build strategies that tolerate this uncertainty.
---
## Exchange Integration Examples
### Binance
```python
# Subscribe to order book updates
{
"method": "SUBSCRIBE",
"params": ["btcusdt@depth@100ms"], # 100ms batching
"id": 1
}
# Snapshot endpoint for recovery
GET /api/v3/depth?symbol=BTCUSDT&limit=1000
```text
## Coinbase Advanced Trade
```python
# Subscribe
{
"type": "subscribe",
"channels": [{"name": "level2", "product_ids": ["BTC-USD"]}]
}
# Initial snapshot comes first, then updates
```diff
---
## Practice Exercises
### Exercise 1: Build a Simple Book
```python
# Process this sequence:
snapshot = {"bids": [["100", "10"]], "asks": [["101", "5"]], "sequence": 1}
update1 = {"bids": [["99", "20"]], "asks": [], "sequence": 2}
update2 = {"bids": [["100", "0"]], "asks": [["101", "8"]], "sequence": 3}
# What's the final state?
```text
## Exercise 2: Gap Detection
```python
# What happens with this sequence?
snapshot = {"sequence": 100}
update = {"sequence": 102} # Gap! 101 is missing — what do you do?
```text
## Exercise 3: Benchmark Performance
```python
# Measure updates per second
import time
start = time.perf_counter()
for _ in range(100000):
book.apply_update(sample_update)
elapsed = time.perf_counter() - start
print(f"{100000/elapsed:.0f} updates/sec")
Key Takeaways
- Event-sourcing model - Books are built from update streams
- Sequence numbers are critical - One missed update corrupts everything
- Buffer + snapshot = recovery - Handle gaps gracefully
- Performance is architecture - Data structures matter more than micro-optimization
What’s Next?
Want to go deeper?
Weekly infrastructure insights for engineers who build trading systems.
Free forever. Unsubscribe anytime.
You're in. Check your inbox.
Questions about this lesson? Working on related infrastructure?
Let's discuss