The Sub-50µs Cloud Lie
Why cloud vendors' latency claims don't match reality for trading. Real measurements and the hard limits of cloud infrastructure.
## 🎯 What You'll Learn
- Understand why vendor latency claims are misleading
- Learn how to measure real trading latency
- Identify cloud infrastructure limitations
- Know when cloud works and when it doesn't
## The Marketing vs Reality Gap
Cloud vendors claim “sub-millisecond latency.” Your trading system measures 5-50ms. What’s going on?
```text
AWS claims: "Single-digit millisecond latency"
Your measurement: 15ms to Binance
Reality: Both are "correct" - but measuring different things
```

This lesson exposes the gap between marketing claims and trading reality.
---
## What "Latency" Actually Means
Vendors measure **inter-VM latency** within the same datacenter:
```text
EC2 instance-A → EC2 instance-B (same AZ)
AWS claims: ~50-100µs
```
What you actually need:
```text
Your EC2 → Internet → Exchange → Processing → Response
Reality: 5-50ms depending on exchange
```
**Marketing latency ≠ application latency**
---
## Measuring Real Latency
> **The hypervisor adds 5-20µs of jitter to every network operation.** You share physical hardware with other tenants. When they spike, you spike. This variability is invisible in averages but destroys your p99 latency.
Dedicated hardware doesn't have this problem.
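The effect is easy to see in a toy simulation. All distributions below are invented for illustration, not measurements: a small per-operation jitter plus rare steal events barely move the average but inflate p99 dramatically.

```python
# Toy simulation of hypervisor jitter - all distributions are invented, not measured.
import random

random.seed(1)

def vm_op_latency_us() -> float:
    base = 50.0                           # intra-AZ wire + stack time
    jitter = random.uniform(5, 20)        # per-op hypervisor scheduling jitter
    # ~2% of ops land while a noisy neighbor holds the core (CPU steal)
    steal = random.expovariate(1 / 300) if random.random() < 0.02 else 0.0
    return base + jitter + steal

samples = sorted(vm_op_latency_us() for _ in range(100_000))
avg = sum(samples) / len(samples)
p99 = samples[int(len(samples) * 0.99)]
print(f"avg {avg:.0f}µs  p99 {p99:.0f}µs")  # average moves little; p99 balloons
```

Only 2% of operations are stolen, yet p99 ends up several times the mean - which is exactly what a dashboard showing averages will never tell you.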
### Measure VM-to-VM (What AWS Claims)
```bash
# Install sockperf on two EC2 instances
sudo apt install sockperf
# Server side
sockperf server -i 0.0.0.0 -p 12345
# Client side - measure latency
sockperf ping-pong -i <server-ip> -p 12345 --pps=max -t 60
# Typical AWS result: avg 60µs, p99 150µs
```
### Measure to Exchange (What You Actually Get)
```python
import time
import requests

def measure_exchange_latency(url, n=100):
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    print(f"Min: {latencies[0]:.1f}ms")
    print(f"Avg: {sum(latencies)/len(latencies):.1f}ms")
    print(f"P99: {latencies[min(int(n * 0.99), n - 1)]:.1f}ms")
    print(f"Max: {latencies[-1]:.1f}ms")

# Run from EC2
measure_exchange_latency("https://api.binance.com/api/v3/time")
# Typical: Min 15ms, Avg 25ms, P99 80ms
```
---
## Where Cloud Latency Comes From
| Source | Contribution | Fixable? |
|--------|--------------|----------|
| Physical distance | 1-50ms | Move to colo |
| Internet routing | 1-20ms | Pay for direct connect |
| Hypervisor overhead | 5-20µs | Bare metal instance |
| Kernel network stack | 10-50µs | Kernel tuning |
| Your application | Variable | Code optimization |
**90% of your latency is location + network path.** Optimizing code won't fix this.
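The distance term has a hard physical floor: light in fiber covers roughly 200 km per millisecond (c divided by the glass refractive index, ~1.5), so round-trip time can never beat 2 × distance / 200. A quick sanity check - the route distances are rough assumptions for illustration:

```python
# Speed-of-light floor: signals in fiber travel ~200,000 km/s,
# i.e. ~200 km per millisecond.
def min_rtt_ms(distance_km: float) -> float:
    fiber_km_per_ms = 200.0
    return 2 * distance_km / fiber_km_per_ms  # there and back

# Rough great-circle distances (assumptions, not exact fiber paths)
for route, km in [("same metro", 50), ("N. Virginia → Tokyo", 10_900)]:
    print(f"{route}: ≥ {min_rtt_ms(km):.1f} ms RTT")
```

Real fiber paths are longer than great-circle distance, so actual RTTs sit well above this floor - but no optimization can go below it.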
---
## The Noisy Neighbor Problem
Shared infrastructure means shared variability:
```text
Normal operation:
Your latency: 50µs
Neighbor running ML training:
Your latency: 200µs (CPU steal)
Neighbor doing heavy I/O:
Your latency: 500µs (network contention)
```
This variability is **random and unpredictable**. Your p99 suffers.
### Measuring CPU Steal
```bash
# Check if you're losing CPU to other tenants (st is column 17 in vmstat output)
vmstat 1 | awk 'NR>2 {print "steal:", $17"%"}'
# >0% steal means others are taking your CPU time
```
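Steal time can also be read straight from `/proc/stat`: per proc(5), the cpu line's numeric fields are user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice (cumulative jiffies). A sketch parser - the sample line below is made up for illustration:

```python
def steal_percent(cpu_line: str) -> float:
    """Steal time as a share of total jiffies, from a /proc/stat 'cpu' line.

    Field order per proc(5): user nice system idle iowait irq softirq
    steal guest guest_nice.
    """
    fields = [int(x) for x in cpu_line.split()[1:]]
    total = sum(fields[:8])                       # user through steal
    steal = fields[7] if len(fields) > 7 else 0   # kernels may omit trailing fields
    return 100.0 * steal / total if total else 0.0

# Example line (values invented for illustration):
line = "cpu 10000 50 3000 80000 400 0 120 900 0 0"
print(f"steal: {steal_percent(line):.1f}%")
```

Sampling this twice and diffing gives steal over an interval, which is what you want to correlate with your latency spikes.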
---
## AWS Instance Selection
| Instance Type | Latency Profile | Monthly Cost |
|---------------|-----------------|--------------|
| t3.medium | High variability, burst | $30 |
| c6i.2xlarge | Better, still shared | $250 |
| c6i.metal | Bare metal, no hypervisor | $3,000 |
| p4d.24xlarge | Dedicated network | $30,000+ |
**For trading:** Minimum c5n/c6i.xlarge with Enhanced Networking.
---
## Common Misconceptions
**Myth:** "Faster instance types = lower latency."
**Reality:** Instance type affects CPU, not network latency. A t3.micro and p4d.24xlarge have similar network latency to external destinations.
**Myth:** "AWS Direct Connect solves all latency problems."
**Reality:** Direct Connect reduces internet routing variability (~5-10ms savings) but doesn't fix hypervisor jitter or distance.
**Myth:** "My cloud setup is fast enough because average latency is low."
**Reality:** Averages hide tail latency. Your p99 or p99.9 is what matters for trading. One 500ms spike per minute is catastrophic.
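To see why, consider a synthetic minute of traffic at 100 req/s where a single 500ms stall delays the ~50 requests in flight during it. All numbers below are invented for illustration: the mean barely moves, p99 stays modest, but p99.9 lands squarely on the spike.

```python
# Synthetic illustration - numbers are invented, not measured.
import random

random.seed(7)
# One minute of traffic at 100 req/s, baseline ~25 ms
samples = [random.gauss(25.0, 3.0) for _ in range(6000)]
# A single 500 ms stall delays the ~50 requests in flight during it
for i in range(50):
    samples[i] += 500.0

samples.sort()
n = len(samples)
mean = sum(samples) / n
p99 = samples[int(n * 0.99)]
p999 = samples[int(n * 0.999)]
print(f"mean {mean:.1f} ms | p99 {p99:.1f} ms | p99.9 {p999:.1f} ms")
```

A monitoring dashboard showing the mean, or even p99, would report this system as healthy while roughly one request per hundred eats a half-second stall.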
---
## When Cloud Makes Sense
### Cloud is Fine For:
- Swing trading (minutes to days)
- Backtesting and research
- Non-latency-sensitive strategies
- Starting out / proving concepts
### Cloud is Not Fine For:
- Market making
- HFT strategies
- Arbitrage (especially cross-exchange)
- Any strategy where you compete on speed
---
## Honest Latency Budget
If you're serious about cloud trading:
```text
Fixed costs (can't optimize):
Distance to exchange: 10-30ms
Internet routing: 5-15ms
TLS handshake: 5-10ms
Variable costs (can optimize):
Application code: 0.1-10ms
Network stack: 0.01-0.1ms
Realistic total: 25-70ms
Your competitor in colo: 0.1-1ms
```
You're 25-700x slower. Accept it or move to colo.
---
## Practice Exercises
### Exercise 1: Measure Your Reality
```bash
# From your trading server, measure to your exchange
while true; do
curl -w "%{time_total}\n" -o /dev/null -s https://api.exchange.com/time
sleep 1
done | tee latency.log
```
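Once `latency.log` has a few hundred samples, a short helper can turn it into percentiles. This is a sketch that assumes the one-seconds-value-per-line format produced by the loop above:

```python
# Summarize the curl timings collected above.
# latency.log format: one time_total value (seconds) per line.
def summarize(path: str = "latency.log") -> dict:
    with open(path) as f:
        ms = sorted(float(line) * 1000 for line in f if line.strip())
    n = len(ms)
    return {
        "n": n,
        "p50": ms[n // 2],
        "p99": ms[min(int(n * 0.99), n - 1)],
        "max": ms[-1],
    }

# Usage (after collecting samples):
#   stats = summarize("latency.log")
#   print(stats)
```

Pay attention to the gap between p50 and max - that spread, not the median, is what tells you whether the path is stable enough to trade on.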
### Exercise 2: Check for Steal Time
```bash
# Monitor for 1 hour (st is column 17; skip the two header lines)
vmstat 1 3600 | awk 'NR>2 {print $17}' > steal.log
# Any non-zero values?
```
### Exercise 3: Compare Instance Types
```text
If budget allows:
- Spin up c6i.xlarge and c6i.metal
- Run same latency test on both
- Compare p99 latency
```
---
## Key Takeaways
- **Vendor claims measure the wrong thing** - VM-to-VM ≠ to-exchange
- **Hypervisor adds jitter** - shared infrastructure = shared variability
- **Distance dominates** - no amount of tuning fixes 10ms of physics
- **Know your use case** - cloud works for some strategies, not others
## What’s Next?
- Trading Infrastructure First Principles
- The Sub-50µs Cloud Lie - extended version