The Anatomy of a Sub-50µs Trade: Tracing a Packet from NIC to Strategy and Back
A packet-level walkthrough of a sub-50µs trade at Akuna Capital: NIC ring buffer, kernel bypass, strategy evaluation, order encoding, and wire transmit.
Sub-100µs from NIC to strategy decision
The kernel layer beneath every HFT trade: NUMA topology, CPU isolation, kernel bypass networking, huge pages, and the RT scheduling configuration that separates 18µs from 200µs.
Every institutional desk runs Linux on bare metal. The performance delta between a tuned and an untuned system is not 10%; it is a full order of magnitude. These ten posts document the techniques that moved production latency from 200µs to sub-50µs at Akuna Capital, with measured numbers at every stage.
A packet-level walkthrough of a sub-50µs trade at Akuna Capital: NIC ring buffer, kernel bypass, strategy evaluation, order encoding, and wire transmit.
Real production NUMA debugging at Akuna Capital: P99 latency doubling overnight, cross-socket penalty measurement, and the numastat/perf c2c workflow.
How to build a genuinely quiet CPU core for HFT using isolcpus, nohz_full, rcu_nocbs, and proper IRQ migration, with the GRUB cmdline that actually works.
A production comparison of kernel bypass approaches: Solarflare ef_vi (10-20ns), DPDK (25-50ns), and AF_XDP (50-80ns) with a decision matrix for HFT environments.
How a THP compaction stall caused a 400µs latency spike mid-session, plus the correct way to configure static huge pages for trading systems in production.
How irqbalance moved an RX queue IRQ to the trading core mid-session, what MSI-X actually is, and how to correctly configure per-queue interrupt affinity for HFT.
Debugging a P99 latency spike at Akuna Capital using perf record, Brendan Gregg flame graphs, eBPF off-CPU analysis, and the critical difference between on-CPU and off-CPU profiles.
Why mutex-protected queues fail at HFT rates, a correct C++ SPSC ring buffer implementation with cache-line alignment, and how false sharing costs 8x throughput.
SCHED_FIFO for HFT, priority inversion from the Mars Pathfinder to trading latency, priority inheritance mutexes, and the near-miss kernel lockup from a misconfigured RT process.
How a Spectre mitigation patch silently introduced a 15% latency regression, what resets your tuning without warning, and how to govern a trading server against configuration drift.