
How Computers Actually Work: Complete Guide

From electrons to applications. Understand the CPU, memory, and fetch-decode-execute cycle that powers every program you'll ever write.

Beginner · 35 min read

🎯 What You'll Learn

  • Understand the basic components of a computer
  • Learn how the CPU executes instructions
  • Grasp the memory hierarchy and why it matters
  • See how hardware and software connect
  • Build intuition for performance optimization

The Machine Under the Code

Your computer performs billions of operations per second. But what’s actually happening?

Everything — every video, every game, every AI model — is just numbers being shuffled between storage levels while a calculator adds them up. Engineers who really internalize this often write code that’s an order of magnitude faster than code from those who don’t.


The Core Components

Every computer, from your phone to a supercomputer, has the same basic architecture:

  • CPU: the brain
  • Memory (RAM): working space
  • Storage (SSD/HDD): long-term memory

Let’s understand each one:

The CPU: The Calculator

The Central Processing Unit is where computation happens. It can only do a few things:

  • Arithmetic: Add, subtract, multiply, divide
  • Logic: Compare values, AND/OR/NOT operations
  • Move data: Load from memory, store to memory
  • Branch: Jump to different instructions based on conditions

That’s it. Every program you’ve ever used (from Photoshop to video games to operating systems) is just clever combinations of these four operations.
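You can watch this decomposition in miniature with Python’s standard `dis` module. It shows interpreter bytecode rather than real machine instructions, but even a one-line function breaks down into the same load/compute/return pattern:

```python
# Disassemble a tiny function into its bytecode operations.
# (Bytecode is not machine code, but the decomposition is analogous:
# load operands, do arithmetic, return the result.)
import dis

def add_one(x):
    return x + 1

dis.dis(add_one)  # prints LOAD/BINARY/RETURN-style operations
```

The exact opcode names vary between Python versions, but the shape is always the same: fetch the operands, perform one primitive operation, move the result somewhere.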

Memory (RAM): The Desk

Random Access Memory is your computer’s working space. It’s:

  • Fast: 100 nanoseconds to access (compared to 10 milliseconds for disk)
  • Volatile: Contents disappear when power is off
  • Limited: Usually 8-64 GB in modern computers

Think of RAM like a desk. You can only work on what’s on the desk. Everything else is in filing cabinets (storage) and needs to be retrieved.

Storage: The Filing Cabinet

Persistent storage (SSD/HDD) keeps your data safe when power is off:

  • Slow: 1000x slower than RAM
  • Persistent: Data survives power loss
  • Large: 256 GB to several TB

The Fetch-Decode-Execute Cycle

This is the heartbeat of every computer. The CPU repeats this cycle billions of times per second:

1. FETCH: get the instruction from memory
2. DECODE: figure out what to do
3. EXECUTE: do the operation

Step 1: Fetch

The CPU has a special register called the Program Counter (PC) that holds the memory address of the next instruction. The CPU:

  1. Reads the address in PC
  2. Goes to that memory location
  3. Retrieves the instruction stored there
  4. Increments PC to point to the next instruction

Step 2: Decode

The fetched instruction is a number. The CPU decodes it to understand:

  • What operation? (add, subtract, load, etc.)
  • What data? (which registers, which memory addresses)

For example, the instruction 0x01D8 might mean “add register B to register A.”
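Decoding is just bit manipulation. Here is a sketch using a made-up 16-bit format — the field layout below is invented for illustration, not the real meaning of 0x01D8 on any actual ISA:

```python
# Hypothetical 16-bit instruction word (invented format, not a real ISA):
#   bits 15-8: opcode, bits 7-4: first operand, bits 3-0: second operand
instruction = 0x01D8

opcode   = (instruction >> 8) & 0xFF  # high byte selects the operation
operand1 = (instruction >> 4) & 0xF   # next nibble: first register number
operand2 = instruction & 0xF          # low nibble: second register number

OPCODES = {0x01: "ADD"}               # a one-entry decode table for this sketch
print(OPCODES[opcode], operand1, operand2)  # → ADD 13 8
```

Real CPUs do exactly this kind of field extraction in hardware, in a fraction of a nanosecond.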

Step 3: Execute

The CPU performs the operation. If it’s:

  • Arithmetic: The ALU (Arithmetic Logic Unit) computes the result
  • Memory access: Data is loaded from or stored to RAM
  • Branch: The PC is updated to a new address

Putting It Together

Here’s a simple example. The operation “add 5 to the value in memory location 100” takes three instructions:

```
1. FETCH:   PC=0x1000, get instruction at 0x1000
2. DECODE:  Instruction means "load from address 100"
3. EXECUTE: Read value from address 100 into register

4. FETCH:   PC=0x1004, get next instruction
5. DECODE:  Instruction means "add 5 to register"
6. EXECUTE: Add 5 to the register value

7. FETCH:   PC=0x1008, get next instruction
8. DECODE:  Instruction means "store register to address 100"
9. EXECUTE: Write register value back to address 100
```

At 3 GHz, the clock driving this cycle ticks 3 billion times per second. Your "instant" mouse click involves millions of these cycles.
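The trace above can be sketched as a toy fetch-decode-execute loop in Python. This is an illustrative model with a made-up three-instruction ISA, not how real hardware is built:

```python
# A toy fetch-decode-execute loop (made-up mini-ISA, for illustration only).
# Instructions are (opcode, operand) tuples; data memory is a dict.

memory = {100: 7}          # data memory: address 100 starts out holding 7
program = [                # the three-instruction program from the trace above
    ("LOAD", 100),         # load the value at address 100 into the register
    ("ADD", 5),            # add the constant 5 to the register
    ("STORE", 100),        # write the register back to address 100
]

pc = 0                     # program counter: index of the next instruction
register = 0               # a single general-purpose register

while pc < len(program):
    opcode, operand = program[pc]   # FETCH the instruction at PC...
    pc += 1                         # ...and increment PC to point at the next one
    # DECODE and EXECUTE
    if opcode == "LOAD":
        register = memory[operand]
    elif opcode == "ADD":
        register += operand
    elif opcode == "STORE":
        memory[operand] = register

print(memory[100])  # → 12
```

A real CPU does the same loop in hardware, pipelined and overlapped, billions of times per second.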

---

## The Memory Hierarchy

Here's the dirty secret of computing: **the CPU is way faster than memory**.

A modern CPU can execute an operation every ~0.3 nanoseconds, but accessing RAM takes ~100 nanoseconds. That's roughly 300 wasted cycles spent waiting for data!

The solution is **caches**: small, fast memory sitting close to the CPU.

| Level | Size | Latency | Analogy |
|-------|------|---------|---------|
| **Registers** | ~1 KB | 0.3 ns | Your hands |
| **L1 Cache** | 64 KB | 1 ns | Your desk |
| **L2 Cache** | 512 KB | 4 ns | Your office drawer |
| **L3 Cache** | 8 MB | 20 ns | Filing cabinet |
| **RAM** | 16 GB | 100 ns | Library in your building |
| **SSD** | 500 GB | 150,000 ns | Library across town |

### Why This Matters for Performance

```python
# Bad: Random memory access
for i in random_order:
    array[i] += 1      # Cache miss every time!

# Good: Sequential access
for i in range(len(array)):
    array[i] += 1      # Cache loves this
```

The difference? Sequential access can be **100x faster** because of how caches work. The cache loads data in chunks (cache lines). Sequential access uses the whole chunk; random access wastes it.
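You can get a rough feel for this with a timing sketch. In Python, interpreter overhead dampens the effect dramatically compared to C, so don't expect the full 100x — but the access-pattern difference is real:

```python
# Rough timing of sequential vs. shuffled access over the same list.
# (Illustrative sketch: Python's interpreter overhead hides much of the
# cache effect you'd see in C, so the gap here is usually modest.)
import random
import time

n = 1_000_000
array = [0] * n
sequential = list(range(n))
shuffled = sequential[:]
random.shuffle(shuffled)

def touch(indices):
    """Increment array elements in the given index order; return elapsed seconds."""
    start = time.perf_counter()
    for i in indices:
        array[i] += 1
    return time.perf_counter() - start

t_seq = touch(sequential)
t_rand = touch(shuffled)
print(f"sequential: {t_seq:.3f}s  shuffled: {t_rand:.3f}s")
```

In a language like C, where the loop body is a single machine instruction, the same experiment makes the cache-line effect far more dramatic.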

---

## Numbers Every Programmer Should Know

Memorize these. They'll inform every performance decision:

| Operation | Time |
|-----------|------|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD random read | 150,000 ns (150 µs) |
| HDD seek | 10,000,000 ns (10 ms) |
| Network round-trip (same datacenter) | 500,000 ns (500 µs) |
| Network round-trip (cross-country) | 150,000,000 ns (150 ms) |

### Relative Scale

If an L1 cache access takes 1 second:
- L2 cache: 4 seconds
- RAM: 1.5 minutes
- SSD: 1.5 days
- HDD: 4 months
- Cross-country network: 5 years

**This is why caching, locality, and data structure choice matter so much.**
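The scaled figures above follow from simple division — a few lines reproduce them (approximately) from the latency table:

```python
# Rescale the latency table so that one L1 access takes 1 second
# (i.e. 1 ns of real latency becomes 1 s on the human scale).
latencies_ns = {
    "L1 cache": 1,
    "L2 cache": 4,
    "RAM": 100,
    "SSD random read": 150_000,
    "HDD seek": 10_000_000,
    "Cross-country network": 150_000_000,
}

SECONDS_PER_DAY = 86_400
for name, ns in latencies_ns.items():
    scaled_s = ns  # 1 ns -> 1 s, so the scaled value in seconds equals ns
    print(f"{name:>22}: {scaled_s:>12,} s  (~{scaled_s / SECONDS_PER_DAY:,.1f} days)")
```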

---

## From Hardware to Software

How does your Python code become CPU operations?

Your Code (Python/JS/etc.) → Interpreter/Compiler → Machine Code → CPU

**Compiled languages** (C, Rust, Go): Code is translated to machine code ahead of time. Fast execution.

**Interpreted languages** (Python, JavaScript): Code is translated line-by-line at runtime. More flexible, slower.

**JIT compiled** (Java, modern JS): a hybrid. Code is interpreted first, then hot paths are compiled to machine code during execution.

```c
// This C code:
int x = 5;
int y = 10;
int z = x + y;

// Becomes something like:
// MOV R1, 5        ; Put 5 in register 1
// MOV R2, 10       ; Put 10 in register 2
// ADD R3, R1, R2   ; Add R1+R2, store in R3
```

---

## Practice Exercises

### Exercise 1: Latency Intuition (Beginner)

Rank these operations from fastest to slowest:
1. Reading from L1 cache
2. Reading from SSD
3. Reading from RAM
4. Network request to a server in the same city
5. Reading from L3 cache

<details>
<summary>Answer</summary>

1. L1 cache (1 ns)
2. L3 cache (20 ns)
3. RAM (100 ns)
4. SSD (150,000 ns)
5. Network (5,000,000+ ns)
</details>

### Exercise 2: Cache Behavior (Intermediate)

Why is this loop fast:
```python
for i in range(1000):
    total += array[i]
```

And this loop slow:
```python
for i in range(1000):
    total += array[random.randint(0, len(array)-1)]
```

<details>
<summary>Answer</summary>

Sequential access benefits from cache prefetching: the CPU predicts that you’ll need the next bytes and loads them in advance. Random access defeats this prediction, so every access is a cache miss.
</details>

### Exercise 3: CPU Bottleneck Analysis (Advanced)

Your program takes 10 seconds. Profiling shows:

- 8 seconds waiting for disk I/O
- 1 second in CPU computation
- 1 second waiting for network

What’s the bottleneck? How would you optimize?

<details>
<summary>Answer</summary>

Disk I/O is the bottleneck (80% of time). Optimizations:

1. Load data into RAM once, process in memory
2. Use an SSD instead of an HDD
3. Read files sequentially, not randomly
4. Use async I/O to overlap reading with processing

Optimizing CPU computation would save at most 1 second (10%).
</details>


## Knowledge Check

1. What are the three main steps in the fetch-decode-execute cycle?
2. Why do we have cache memory? Why not just use more RAM?
3. L1 cache access takes 1 ns. RAM access takes 100 ns. How many L1 accesses could you do in the time of one RAM access?
4. What’s the difference between compiled and interpreted languages at the hardware level?
5. True or False: A 3 GHz CPU executes 3 billion instructions per second.

<details>
<summary>Answers</summary>

1. Fetch (get the instruction from memory), Decode (understand what it means), Execute (perform the operation).
2. RAM is too slow for the CPU, which would spend most of its time waiting. Cache is faster but more expensive per byte, so we use small amounts close to the CPU.
3. 100 accesses. This is why cache hit rate matters so much.
4. Compiled: machine code is generated once and runs directly on the CPU. Interpreted: code is translated to machine operations at runtime by another program.
5. False. Modern CPUs can execute multiple instructions per cycle (superscalar) and run at varying clock speeds (turbo boost). Actual instructions per second depend on the workload.
</details>


## Summary

| Concept | Key Takeaway |
|---------|--------------|
| CPU | The calculator that executes instructions |
| RAM | Fast, volatile working memory |
| Storage | Slow, persistent long-term storage |
| Fetch-Decode-Execute | The fundamental cycle of computation |
| Memory Hierarchy | Trade-off between speed and capacity |
| Cache | Small, fast memory that hides RAM latency |

The mental model: your computer shuffles numbers between different speed levels of storage while a calculator adds them up billions of times per second.

