Order Book Reconstruction in Sub-Millisecond

How to build and maintain order books from L2/L3 feeds. Sequence numbers, gap detection, and snapshot recovery.

Intermediate • 25 min read • Expert Version →

🎯 What You'll Learn

Understand L1/L2/L3 market data feeds
Build an order book from incremental updates
Handle sequence gaps and snapshot recovery
Optimize for sub-millisecond reconstruction

Why Sub-Millisecond Matters

Your trading strategy sees a price. By the time you react, the price has moved.

Order book age: 1ms   → You might win
Order book age: 10ms  → You're trading on stale data
Order book age: 100ms → You're the sucker at the table
```bash

This lesson covers how to build and maintain **real-time order books** from exchange data feeds.

---

## Market Data Levels

| Level | Contains | Updates/sec | Use Case |
|-------|----------|-------------|----------|
| **L1** | Best bid/ask only | 10-100 | Simple trading |
| **L2** | Top N levels (e.g., top 10) | 100-1,000 | Most strategies |
| **L3** | Every order (individual IDs) | 1,000-100,000 | Market making |

Most exchanges provide L2. True L3 (market-by-order) is primarily available on equity venues; most crypto exchanges provide L2 at most.

---

## Order Books are Event-Sourced

You don't receive the current state — you receive a stream of changes. Miss one update, and your book is wrong forever until you re-sync.

Every update has a **sequence number**. Gap detection is mandatory.

---

## Order Book Data Structure

```python
from collections import defaultdict
from sortedcontainers import SortedDict

class OrderBook:
    def __init__(self, symbol: str):
        self.symbol = symbol
        self.bids = SortedDict()  # price -> quantity (descending)
        self.asks = SortedDict()  # price -> quantity (ascending)
        self.sequence = 0

    def apply_snapshot(self, snapshot: dict):
        """Initialize from full snapshot."""
        self.bids.clear()
        self.asks.clear()
        for price, qty in snapshot['bids']:
            self.bids[-float(price)] = float(qty)  # Negative for descending
        for price, qty in snapshot['asks']:
            self.asks[float(price)] = float(qty)
        self.sequence = snapshot['sequence']

    def apply_update(self, update: dict) -> bool:
        """Apply incremental update. Returns False if gap detected."""
        expected_seq = self.sequence + 1
        if update['sequence'] != expected_seq:
            return False  # GAP! Need to re-sync

        for price, qty in update.get('bids', []):
            price = -float(price)
            if float(qty) == 0:
                self.bids.pop(price, None)
            else:
                self.bids[price] = float(qty)

        for price, qty in update.get('asks', []):
            price = float(price)
            if float(qty) == 0:
                self.asks.pop(price, None)
            else:
                self.asks[price] = float(qty)

        self.sequence = update['sequence']
        return True

    def best_bid(self):
        return -self.bids.peekitem(0)[0] if self.bids else None

    def best_ask(self):
        return self.asks.peekitem(0)[0] if self.asks else None

    def mid_price(self):
        bid, ask = self.best_bid(), self.best_ask()
        return (bid + ask) / 2 if bid and ask else None
```diff

---

## Sequence Gap Detection

```python
class SafeOrderBook(OrderBook):
    def __init__(self, symbol: str, on_gap_callback):
        super().__init__(symbol)
        self.on_gap = on_gap_callback
        self.buffered_updates = []
        self.max_buffer = 1000

    def process_message(self, msg: dict):
        if msg['type'] == 'snapshot':
            self.apply_snapshot(msg)
            self._replay_buffered()
        elif msg['type'] == 'update':
            if not self.apply_update(msg):
                # Gap detected!
                self._handle_gap(msg)

    def _handle_gap(self, msg):
        """Buffer updates and request new snapshot."""
        self.buffered_updates.append(msg)
        if len(self.buffered_updates) > self.max_buffer:
            self.buffered_updates.pop(0)
        self.on_gap(self.symbol)  # Trigger snapshot request

    def _replay_buffered(self):
        """Apply buffered updates after snapshot."""
        valid_updates = [u for u in self.buffered_updates
                        if u['sequence'] > self.sequence]
        valid_updates.sort(key=lambda x: x['sequence'])

        for update in valid_updates:
            if not self.apply_update(update):
                break  # Still have a gap

        self.buffered_updates.clear()
```diff

---

## Performance Optimization

### 1. Memory Layout

```python
# Bad: Python dicts with string keys
book = {"bids": {}, "asks": {}}  # Slow, memory-intensive

# Better: NumPy arrays for numerical data
import numpy as np
bids = np.zeros((1000, 2), dtype=np.float64)  # [price, qty] pairs
asks = np.zeros((1000, 2), dtype=np.float64)

# Best: Pre-allocated fixed-size arrays with manual management
```sql

## 2. Update Parsing

```python
# Bad: JSON parsing every message
data = json.loads(raw_message)

# Better: Use orjson (3-10x faster than stdlib json)
import orjson
data = orjson.loads(raw_message)

# Best: Binary protocols (protobuf, FlatBuffers)
```text

## 3. Price Level Indexing

```python
# For fixed-tick instruments, use array indexing
# E.g., BTC with $1 ticks: price $50,000 = index 50000
class TickBook:
    def __init__(self, min_price, max_price, tick_size):
        self.min_price = min_price
        self.tick_size = tick_size
        n_levels = int((max_price - min_price) / tick_size)
        self.bids = np.zeros(n_levels, dtype=np.float64)
        self.asks = np.zeros(n_levels, dtype=np.float64)

    def _price_to_idx(self, price):
        return int((price - self.min_price) / self.tick_size)

    def update_bid(self, price, qty):
        self.bids[self._price_to_idx(price)] = qty  # O(1)
```python

---

## Common Misconceptions

**Myth:** "I can just poll the REST API for order book state."
**Reality:** REST is seconds behind WebSocket. By the time you get the response, it's ancient history. WebSocket with incremental updates is the only viable approach.

**Myth:** "Sequence gaps are rare, I can ignore them."
**Reality:** Networks drop packets. WebSocket connections hiccup. Gaps happen in production. Your code must handle them or your book diverges silently.

**Myth:** "My order book matches the exchange perfectly."
**Reality:** There's always propagation delay. Your book is a local approximation. Build strategies that tolerate this uncertainty.

---

## Exchange Integration Examples

### Binance
```python
# Subscribe to order book updates
{
    "method": "SUBSCRIBE",
    "params": ["btcusdt@depth@100ms"],  # 100ms batching
    "id": 1
}

# Snapshot endpoint for recovery
GET /api/v3/depth?symbol=BTCUSDT&limit=1000
```text

## Coinbase Advanced Trade
```python
# Subscribe
{
    "type": "subscribe",
    "channels": [{"name": "level2", "product_ids": ["BTC-USD"]}]
}

# Initial snapshot comes first, then updates
```diff

---

## Practice Exercises

### Exercise 1: Build a Simple Book
```python
# Process this sequence:
snapshot = {"bids": [["100", "10"]], "asks": [["101", "5"]], "sequence": 1}
update1 = {"bids": [["99", "20"]], "asks": [], "sequence": 2}
update2 = {"bids": [["100", "0"]], "asks": [["101", "8"]], "sequence": 3}

# What's the final state?
```text

## Exercise 2: Gap Detection
```python
# What happens with this sequence?
snapshot = {"sequence": 100}
update = {"sequence": 102}  # Gap! 101 is missing — what do you do?
```text

## Exercise 3: Benchmark Performance
```python
# Measure updates per second
import time
start = time.perf_counter()
for _ in range(100000):
    book.apply_update(sample_update)
elapsed = time.perf_counter() - start
print(f"{100000/elapsed:.0f} updates/sec")

Key Takeaways

Event-sourcing model - Books are built from update streams
Sequence numbers are critical - One missed update corrupts everything
Buffer + snapshot = recovery - Handle gaps gracefully
Performance is architecture - Data structures matter more than micro-optimization

What’s Next?

Market Making Fundamentals

Order Book Reconstruction: Nanoseconds Matter

Want to go deeper?

Weekly infrastructure insights for engineers who build trading systems.

Free forever. Unsubscribe anytime.

You're in. Check your inbox.

Questions about this lesson? Working on related infrastructure?

Let's discuss