Curriculum 10 posts · ~2.1h total

Linux Performance Engineering

Sub-100µs from NIC to strategy decision

The kernel layer beneath every HFT trade: NUMA topology, CPU isolation, kernel bypass networking, huge pages, and the RT scheduling configuration that separates 18µs from 200µs.

What you'll master

Kernel bypass networking (ef_vi, DPDK, AF_XDP)
NUMA-aware memory allocation
CPU pinning + isolcpus/nohz_full
SCHED_FIFO real-time scheduling
eBPF off-CPU profiling

Why this matters

Every institutional desk runs Linux on bare metal. The performance delta between a tuned and an untuned system is not 10%; it is a full order of magnitude. These ten posts document the techniques that moved production latency from 200µs to sub-50µs at Akuna Capital, with measured numbers at every stage.

The Curriculum — 10 modules

Part 1 Jan 2026 17 min

The Anatomy of a Sub-50µs Trade: Tracing a Packet from NIC to Strategy and Back

A packet-level walkthrough of a sub-50µs trade at Akuna Capital: NIC ring buffer, kernel bypass, strategy evaluation, order encoding, and wire transmit.

Part 2 Jan 2026 13 min

NUMA in Production: Why Your Trading Bot Slows Down at 3 AM and How to Diagnose It

Real production NUMA debugging at Akuna Capital: P99 latency doubling overnight, cross-socket penalty measurement, and the numastat/perf c2c workflow.

Part 3 Jan 2026 12 min

CPU Pinning, isolcpus, and nohz_full: Building a Quiet Core for Latency-Critical Code

How to build a genuinely quiet CPU core for HFT using isolcpus, nohz_full, rcu_nocbs, and proper IRQ migration, with the GRUB cmdline that actually works.

Part 4 Jan 2026 13 min

Solarflare ef_vi vs DPDK vs AF_XDP: A Decision Framework for Kernel Bypass in 2026

A production comparison of kernel bypass approaches: Solarflare ef_vi (10-20ns), DPDK (25-50ns), and AF_XDP (50-80ns) with a decision matrix for HFT environments.

Part 5 Jan 2026 11 min

Huge Pages Done Right: Static, Transparent, and Why Most HFT Firms Disable THP

How a THP compaction stall caused a 400µs latency spike mid-session, plus the correct way to configure static huge pages for trading systems in production.

Part 6 Jan 2026 11 min

Interrupt Affinity, MSI-X, and the Multi-Queue NIC: Engineering Determinism into Network IO

How irqbalance moved an RX queue IRQ to the trading core mid-session, what MSI-X actually is, and how to correctly configure per-queue interrupt affinity for HFT.

Part 7 Jan 2026 12 min

Profiling Production Trading Systems with perf, eBPF, and Off-CPU Flame Graphs

Debugging a P99 latency spike at Akuna Capital using perf record, Brendan Gregg's flame graphs, eBPF off-CPU analysis, and the critical difference between on-CPU and off-CPU profiles.

Part 8 Jan 2026 13 min

Lock-Free Queues for Market Data: SPSC, MPMC, and the Pitfalls of False Sharing

Why mutex-protected queues fail at HFT rates, a correct C++ SPSC ring buffer implementation with cache-line alignment, and how false sharing costs 8x in throughput.

Part 9 Jan 2026 13 min

Real-Time Scheduling on Linux: SCHED_FIFO, SCHED_DEADLINE, and Priority Inversion in Trading Engines

SCHED_FIFO for HFT, priority inversion from the Mars Pathfinder to trading latency, priority inheritance mutexes, and the near-miss kernel lockup from a misconfigured RT process.

Part 10 Jan 2026 12 min

Linux Tunable Drift: Why Your Carefully Tuned Box Is Slower After a Kernel Update

How a Spectre mitigation patch silently introduced a 15% latency regression, what resets your tuning without warning, and how to govern a trading server against configuration drift.