
# eBPF Profiling Without the Observer Effect

How to measure latency at nanosecond precision without slowing down your application. eBPF, bpftrace, and kernel tracing.

Intermediate · 25 min read

## 🎯 What You'll Learn

- Understand why traditional profilers add latency
- Use bpftrace for nanosecond-precision tracing
- Write custom eBPF programs for application tracing
- Avoid the observer effect in production profiling

## The Profiling Paradox

You want to measure your application’s latency. You attach a profiler. Suddenly, your application is 10-100x slower.

```text
Without profiler: 10µs per operation
With strace:      500µs per operation (50x slower)
With gdb:         Don't even try in production
```

This is the **observer effect**: measuring the system changes the system.

---

## Why Traditional Profilers Hurt

Traditional profilers use `ptrace()` to intercept syscalls:

```text
Application syscall
  → Context switch to profiler
  → Profiler logs data
  → Context switch back
  → Syscall executes
```

That's two extra context switches per ptrace stop — and ptrace stops the process at both syscall entry and exit. Each context switch costs 1-5µs, before the profiler's own processing time is even counted.

---

## How eBPF is Different

**eBPF programs run inside the kernel, not in a separate process.** There's no context switch. The tracing code executes in nanoseconds, not microseconds — and it's JIT-compiled to native machine code.

eBPF adds roughly 100ns of overhead per event. Traditional ptrace-based profilers add around 100µs. That's a 1000x difference.

The eBPF verifier checks every program before loading — it ensures programs terminate, don't access invalid memory, and can't crash the kernel. It's not "unsafe" to run eBPF in production.

---

## strace vs bpftrace

```bash
# The slow way: strace (don't use in production)
strace -tt -T ./your_app 2>&1 | head -20
# Your app is now ~50x slower

# The fast way: bpftrace (production-safe)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_write /pid == 12345/ { @writes = count(); }'
# Your app runs at normal speed
```

## Measuring Syscall Latency with bpftrace

```bash
# Histogram of write() syscall latency
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write /pid == 12345/ { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /@start[tid]/ {
  @latency = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}'
```

Output:
```text
@latency:
[1K, 2K)    5234 |@@@@@@@@@@@@@@@@@@@@    |
[2K, 4K)    8123 |@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)    2341 |@@@@@@@                 |
[8K, 16K)    423 |@                       |
```

---

## bpftrace One-Liners for Trading

### 1. Network latency per packet

```bash
# Time from NIC driver receive to TCP processing, keyed by skb pointer
sudo bpftrace -e '
kprobe:__netif_receive_skb { @start[arg0] = nsecs; }
kprobe:tcp_rcv_established /@start[arg1]/ {
  @latency = hist(nsecs - @start[arg1]);
  delete(@start[arg1]);
}'
```

### 2. Scheduler latency (time waiting to run)

```bash
# How long threads wait for CPU
sudo bpftrace -e '
tracepoint:sched:sched_wakeup { @qtime[args->pid] = nsecs; }
tracepoint:sched:sched_switch /@qtime[args->next_pid]/ {
  @latency = hist(nsecs - @qtime[args->next_pid]);
  delete(@qtime[args->next_pid]);
}'
```

### 3. Lock contention

```bash
# Time spent waiting for a mutex (if pthread_mutex_lock is dynamically
# linked, attach the uprobe to libc rather than the app binary)
sudo bpftrace -e '
uprobe:/path/to/app:pthread_mutex_lock { @start[tid] = nsecs; }
uretprobe:/path/to/app:pthread_mutex_lock /@start[tid]/ {
  @lock_time = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}'
```

---

## Writing Custom eBPF Programs

For complex tracing, write eBPF in C:

```c
// latency_trace.bpf.c
// Generate vmlinux.h first (provides u32/u64 and the tracepoint structs):
//   bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, u64);
} start_time SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int trace_write_entry(struct trace_event_raw_sys_enter *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&start_time, &tid, &ts, BPF_ANY);
    return 0;
}

SEC("tracepoint/syscalls/sys_exit_write")
int trace_write_exit(struct trace_event_raw_sys_exit *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *ts = bpf_map_lookup_elem(&start_time, &tid);
    if (ts) {
        u64 latency = bpf_ktime_get_ns() - *ts;
        // Log latency to perf buffer or map
        bpf_map_delete_elem(&start_time, &tid);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Compile and load:
```bash
clang -O2 -g -target bpf -c latency_trace.bpf.c -o latency_trace.o
sudo bpftool prog load latency_trace.o /sys/fs/bpf/latency_trace
```

---

## Overhead Comparison

| Tool | Overhead | Use Case |
|------|----------|----------|
| **bpftrace** | ~100ns/event | Production tracing |
| **perf** | ~500ns/event | Sampling profiler |
| **strace** | ~50-100µs/event | Development only |
| **gdb** | Stops process | Development only |

---

## Practice Exercises

### Exercise 1: Trace Your Shell
```bash
# Count syscalls by type
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* /pid == '$BASHPID'/ { @[probe] = count(); }'
```

### Exercise 2: Find Slow Writes
```bash
# Writes taking >1ms
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /@start[tid]/ {
  $lat = nsecs - @start[tid];
  if ($lat > 1000000) { printf("pid %d: %d ns\n", pid, $lat); }
  delete(@start[tid]);
}'
```

### Exercise 3: Profile Network Path
```bash
# Time from NIC driver to TCP processing, keyed by skb pointer
sudo bpftrace -e '
kprobe:napi_gro_receive { @recv[arg1] = nsecs; }
kprobe:tcp_rcv_established /@recv[arg1]/ {
  printf("nic->tcp: %d ns\n", nsecs - @recv[arg1]);
  delete(@recv[arg1]);
}'
```

## Key Takeaways

1. **Traditional profilers add 50-100µs per event**: context switches kill latency
2. **eBPF runs in kernel space**: ~100ns overhead, roughly 1000x less
3. **bpftrace for quick wins**: one-liners for common traces
4. **Custom eBPF for production**: full control with C programs
5. **eBPF is safe**: the verifier prevents crashes

## What's Next?

Monitoring Trading Systems

