
# eBPF Profiling Without the Observer Effect

How to measure latency at nanosecond precision without slowing down your application. eBPF, bpftrace, and kernel tracing.

Intermediate · 25 min read

## 🎯 What You'll Learn

- Understand why traditional profilers add latency
- Use bpftrace for nanosecond-precision tracing
- Write custom eBPF programs for application tracing
- Avoid the observer effect in production profiling

## The Profiling Paradox

You want to measure your application’s latency. You attach a profiler. Suddenly, your application is 10-100x slower.

```text
Without profiler: 10µs per operation
With strace:      500µs per operation (50x slower)
With gdb:         Don't even try in production
```

This is the **observer effect**: measuring the system changes the system.

---

## Why Traditional Profilers Hurt

Traditional profilers use `ptrace()` to intercept syscalls:

```text
Application syscall
  → Context switch to profiler
  → Profiler logs data
  → Context switch back
  → Syscall executes
```

That's two extra context switches per ptrace stop — and ptrace stops the process at both syscall entry and exit. Each context switch costs 1-5µs, before the profiler's own processing time is even counted.

---

## How eBPF is Different

**eBPF programs run inside the kernel, not in a separate process.** There's no context switch. The tracing code executes in nanoseconds, not microseconds — and it's JIT-compiled to native machine code.

eBPF adds roughly 100ns of overhead per event. Traditional ptrace-based profilers add around 100µs. That's a 1000x difference.

The eBPF verifier checks every program before loading — it ensures programs terminate, don't access invalid memory, and can't crash the kernel. It's not "unsafe" to run eBPF in production.

---

## strace vs bpftrace

```bash
# The slow way: strace (don't use in production)
strace -tt -T ./your_app 2>&1 | head -20
# Your app is now ~50x slower

# The fast way: bpftrace (production-safe)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_write /pid == 12345/ { @writes = count(); }'
# Your app runs at normal speed
```

## Measuring Syscall Latency with bpftrace

```bash
# Histogram of write() syscall latency
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write /pid == 12345/ { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /@start[tid]/ {
  @latency = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}'
```

Output:
```text
@latency:
[1K, 2K)    5234 |@@@@@@@@@@@@@@@@@@@@    |
[2K, 4K)    8123 |@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)    2341 |@@@@@@@                 |
[8K, 16K)    423 |@                       |
```

---

## bpftrace One-Liners for Trading

### 1. Network latency per packet

```bash
# Time from NIC driver receive to TCP processing, keyed by skb pointer
sudo bpftrace -e '
kprobe:__netif_receive_skb { @start[arg0] = nsecs; }
kprobe:tcp_rcv_established /@start[arg1]/ {
  @latency = hist(nsecs - @start[arg1]);
  delete(@start[arg1]);
}'
```

### 2. Scheduler latency (time waiting to run)

```bash
# How long threads wait for CPU
sudo bpftrace -e '
tracepoint:sched:sched_wakeup { @qtime[args->pid] = nsecs; }
tracepoint:sched:sched_switch /@qtime[args->next_pid]/ {
  @latency = hist(nsecs - @qtime[args->next_pid]);
  delete(@qtime[args->next_pid]);
}'
```

### 3. Lock contention

```bash
# Time spent waiting for a mutex (if pthread_mutex_lock is dynamically
# linked, attach the uprobe to libc rather than the app binary)
sudo bpftrace -e '
uprobe:/path/to/app:pthread_mutex_lock { @start[tid] = nsecs; }
uretprobe:/path/to/app:pthread_mutex_lock /@start[tid]/ {
  @lock_time = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}'
```

---

## Writing Custom eBPF Programs

For complex tracing, write eBPF in C:

```c
// latency_trace.bpf.c
// Generate vmlinux.h first (provides u32/u64 and the tracepoint structs):
//   bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, u64);
} start_time SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int trace_write_entry(struct trace_event_raw_sys_enter *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&start_time, &tid, &ts, BPF_ANY);
    return 0;
}

SEC("tracepoint/syscalls/sys_exit_write")
int trace_write_exit(struct trace_event_raw_sys_exit *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *ts = bpf_map_lookup_elem(&start_time, &tid);
    if (ts) {
        u64 latency = bpf_ktime_get_ns() - *ts;
        // Log latency to perf buffer or map
        bpf_map_delete_elem(&start_time, &tid);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Compile and load:
```bash
clang -O2 -g -target bpf -c latency_trace.bpf.c -o latency_trace.o
sudo bpftool prog load latency_trace.o /sys/fs/bpf/latency_trace
```

---

## Overhead Comparison

| Tool | Overhead | Use Case |
|------|----------|----------|
| **bpftrace** | ~100ns/event | Production tracing |
| **perf** | ~500ns/event | Sampling profiler |
| **strace** | ~50-100µs/event | Development only |
| **gdb** | Stops process | Development only |

---

## Practice Exercises

### Exercise 1: Trace Your Shell
```bash
# Count syscalls by type
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* /pid == '$BASHPID'/ { @[probe] = count(); }'
```

### Exercise 2: Find Slow Writes
```bash
# Writes taking >1ms
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /@start[tid]/ {
  $lat = nsecs - @start[tid];
  if ($lat > 1000000) { printf("pid %d: %d ns\n", pid, $lat); }
  delete(@start[tid]);
}'
```

### Exercise 3: Profile Network Path
```bash
# Time from NIC driver to TCP processing, keyed by skb pointer
sudo bpftrace -e '
kprobe:napi_gro_receive { @recv[arg1] = nsecs; }
kprobe:tcp_rcv_established /@recv[arg1]/ {
  printf("nic->tcp: %d ns\n", nsecs - @recv[arg1]);
  delete(@recv[arg1]);
}'
```

## Key Takeaways

1. **Traditional profilers add 50-100µs per event**: context switches kill latency
2. **eBPF runs in kernel space**: ~100ns overhead, roughly 1000x less
3. **bpftrace for quick wins**: one-liners for common traces
4. **Custom eBPF for production**: full control with C programs
5. **eBPF is safe**: the verifier prevents crashes

## What's Next?

Monitoring Trading Systems

