# Crypto Exchange Architecture on AWS
How to build crypto exchange infrastructure on AWS: VPC design, security groups, HSM integration, and disaster recovery.
## 🎯 What You'll Learn
- Design a secure VPC for exchange infrastructure
- Implement proper security group rules
- Integrate AWS CloudHSM for key management
- Plan for disaster recovery and failover
## Why AWS for Crypto Exchanges?
Despite higher latency than colocated hardware, many crypto exchanges run on AWS because:
- Fast iteration - Go live in days, not months
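- Security certifications - AWS infrastructure is already audited against SOC 2 and ISO 27001, which shortens your own compliance work (your application still needs its own audit)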
- Global presence - Regions near major crypto markets
- Managed services - Less operational burden
This lesson covers architecture patterns for exchange infrastructure on AWS.
## VPC Design
A proper exchange VPC has multiple layers:
```text
┌─────────────────────────────────────────────────────────────┐
│  VPC (10.0.0.0/16)                                          │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Public Subnet (10.0.1.0/24)                           │  │
│  │ [ALB]  [NAT Gateway]  [Bastion]                       │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Private Subnet - App (10.0.2.0/24)                    │  │
│  │ [API Servers]  [Matching Engine]  [Order Manager]     │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Private Subnet - Data (10.0.3.0/24)                   │  │
│  │ [RDS]  [ElastiCache]  [DocumentDB]                    │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Private Subnet - HSM (10.0.4.0/24)                    │  │
│  │ [CloudHSM]  [Key Management]                          │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
---
The matching engine should never be directly reachable from the internet. All external traffic enters through the ALB in the public subnet and is forwarded to the API servers. The matching engine lives in a private subnet and accepts inbound connections only from the API layer.
Network segmentation is your first line of defense.
---
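Before turning the layout into infrastructure code, it can help to sanity-check the CIDR plan. The following is a minimal sketch using Python's standard `ipaddress` module; the `validate_plan` helper is a hypothetical name introduced for illustration, and the CIDRs are copied from the diagram above.

```python
import ipaddress

# CIDR plan copied from the VPC diagram.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ipaddress.ip_network("10.0.1.0/24"),
    "app": ipaddress.ip_network("10.0.2.0/24"),
    "data": ipaddress.ip_network("10.0.3.0/24"),
    "hsm": ipaddress.ip_network("10.0.4.0/24"),
}


def validate_plan(vpc, subnets):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    names = sorted(subnets)
    # Every subnet must sit inside the VPC range.
    for name in names:
        if not subnets[name].subnet_of(vpc):
            problems.append(f"{name} ({subnets[name]}) is outside {vpc}")
    # No two subnets may share addresses.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if subnets[a].overlaps(subnets[b]):
                problems.append(f"{a} overlaps {b}")
    return problems


problems = validate_plan(VPC_CIDR, SUBNETS)
print(problems or "CIDR plan is consistent")
```

A check like this is cheap to run in CI whenever someone adds a subnet, and it catches overlaps before Terraform does.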
## Terraform VPC
```hcl
# exchange-vpc.tf
# Note: every subnet below sits in us-east-1a to keep the example short.
# Production needs each tier duplicated in at least one more AZ: RDS subnet
# groups require two AZs, and an HA CloudHSM cluster spreads HSMs across AZs.
resource "aws_vpc" "exchange" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "exchange-vpc"
    Environment = "production"
  }
}

# Public subnet for ALB, NAT, Bastion
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.exchange.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

# Private subnet for application servers
resource "aws_subnet" "app" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1a"
}

# Private subnet for databases
resource "aws_subnet" "data" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.3.0/24"
  availability_zone = "us-east-1a"
}

# Private subnet for HSM
resource "aws_subnet" "hsm" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.4.0/24"
  availability_zone = "us-east-1a"
}
```
---
## Security Groups: Least Privilege
```hcl
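# ALB security group - only public entry point
resource "aws_security_group" "alb" {
  name        = "exchange-alb"
  description = "Allow HTTPS from internet"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App server - only from ALB
resource "aws_security_group" "app" {
  name        = "exchange-app"
  description = "API servers"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # Only ALB!
  }

  # Terraform removes the default allow-all egress rule, so outbound access
  # must be declared. A CIDR is used because referencing the matching group
  # here would create a dependency cycle with its ingress rule.
  egress {
    from_port   = 9000
    to_port     = 9000
    protocol    = "tcp"
    cidr_blocks = [aws_subnet.app.cidr_block] # Matching engine's subnet
  }
}

# Matching engine - only from app servers
resource "aws_security_group" "matching" {
  name        = "exchange-matching"
  description = "Matching engine - no internet access"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port       = 9000
    to_port         = 9000
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # Only app servers!
  }

  # No egress to internet
  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.rds.id] # Only DB
  }
}

# Database group referenced above. The lesson does not show it, so this is a
# minimal assumed definition (name and description are placeholders).
resource "aws_security_group" "rds" {
  name        = "exchange-rds"
  description = "Database - private only"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [aws_subnet.app.cidr_block] # CIDR avoids a cycle with matching's egress
  }
}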
```
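One way to reason about these rules is to treat them as a graph of "who may open a connection to whom." The sketch below transcribes the flows from the Terraform above into a plain list and checks two properties: which tiers can connect to the matching engine directly, and how many tiers an attacker starting from the internet would have to compromise in sequence to reach each one. The tier names and helper functions are illustrative assumptions, not part of any AWS API.

```python
from collections import deque

# Allowed flows transcribed from the security group rules:
# (source tier, destination tier, port).
ALLOWED_FLOWS = [
    ("internet", "alb", 443),
    ("alb", "app", 8080),
    ("app", "matching", 9000),
    ("matching", "rds", 5432),
]


def direct_sources(flows, target):
    """Tiers allowed to open a connection straight to `target`."""
    return {src for src, dst, _ in flows if dst == target}


def hops_from(flows, start):
    """Breadth-first search: minimum number of hops from `start` to each tier."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst, _ in flows:
            if src == node and dst not in hops:
                hops[dst] = hops[node] + 1
                queue.append(dst)
    return hops


print(direct_sources(ALLOWED_FLOWS, "matching"))
print(hops_from(ALLOWED_FLOWS, "internet"))
```

If a debugging rule ever adds a flow such as `("internet", "matching", 9000)`, the first check makes the regression obvious.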
---
## CloudHSM for Key Management
Crypto exchanges need HSM for:
- Hot wallet signing keys
- API key encryption
- User password hash peppering (the secret key stays in the HSM; the slow hash itself still runs on the app servers)
```python
# Simplified CloudHSM integration via PKCS#11
# AWS CloudHSM exposes a PKCS#11 shared library; use python-pkcs11 to wrap it
import os

import pkcs11


class SecureWallet:
    def __init__(self):
        # CloudHSM installs its PKCS#11 library at this path
        lib = pkcs11.lib("/opt/cloudhsm/lib/libcloudhsm_pkcs11.so")
        self.token = lib.get_token()

    def sign_withdrawal(self, tx_digest: bytes, key_label: str) -> bytes:
        """Sign a transaction digest using a key that never leaves the HSM."""
        # The plain ECDSA mechanism signs a precomputed hash and does not hash
        # its input, so pass the transaction's digest, not the raw bytes.
        # CloudHSM expects the PIN in the form "<crypto-user>:<password>".
        with self.token.open(user_pin=os.environ["HSM_PIN"]) as session:
            private_key = session.get_key(
                pkcs11.constants.ObjectClass.PRIVATE_KEY,
                label=key_label,
            )
            # Key never leaves the HSM; only the signature (raw r||s) comes back
            return private_key.sign(tx_digest, mechanism=pkcs11.Mechanism.ECDSA)
```
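PKCS#11's plain ECDSA mechanism signs the bytes it is given as a digest rather than hashing them first, and it returns the signature as the raw concatenation of `r` and `s`. The caller therefore has to hash the transaction before calling the HSM and unpack the result afterwards. The sketch below shows both steps with the standard library only. It assumes a Bitcoin-style double SHA-256 digest purely for illustration; other chains use different hashes (Ethereum uses Keccak-256, which is not the same as `hashlib.sha3_256`), and both helper names are hypothetical.

```python
import hashlib
from typing import Tuple


def double_sha256(payload: bytes) -> bytes:
    """Bitcoin-style digest: SHA-256 applied twice, 32 bytes out."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


def split_raw_signature(signature: bytes) -> Tuple[int, int]:
    """Split a raw r||s ECDSA signature into its two integers."""
    if len(signature) == 0 or len(signature) % 2 != 0:
        raise ValueError("raw ECDSA signature must have non-zero, even length")
    half = len(signature) // 2
    r = int.from_bytes(signature[:half], "big")
    s = int.from_bytes(signature[half:], "big")
    return r, s


digest = double_sha256(b"example transaction bytes")
print(len(digest))  # 32 bytes, the size a 256-bit curve expects
```

Most chains expect the signature re-encoded (for example as DER) before broadcast, so the `(r, s)` pair is usually an intermediate step rather than the final output.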
**Cost:** ~$5,000/month for CloudHSM cluster (2 HSMs minimum for HA)
---
## Common Misconceptions
**Myth:** "Security groups are like firewalls: set once and forget."
**Reality:** Security groups should be audited at least monthly. Developers add rules for debugging and forget to remove them. Use AWS Config rules (for example, the managed `restricted-ssh` rule) to detect violations continuously.
**Myth:** "Multi-AZ RDS is enough for disaster recovery."
**Reality:** Multi-AZ protects against AZ failure, not region failure. For a crypto exchange, you need cross-region replication and a DR runbook.
**Myth:** "AWS manages security, so I don't need to."
**Reality:** AWS secures the underlying infrastructure; you secure everything you configure on top of it. Many cloud breaches trace back to misconfiguration, such as public S3 buckets or overly permissive security groups.
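The security group audit from the first myth is easy to script. The sketch below flags ingress rules that are open to the whole internet on anything other than an approved port. It works on plain dictionaries shaped like the `SecurityGroups` list that boto3's `describe_security_groups()` returns, so it runs without AWS credentials; the group IDs in the sample data and the `find_risky_rules` helper are made up for illustration.

```python
def find_risky_rules(security_groups, public_ports=frozenset({443})):
    """Flag ingress rules open to the internet on unexpected ports.

    Returns a list of (group_id, from_port, to_port) tuples.
    """
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
            ) or any(
                r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", [])
            )
            if not open_to_world:
                continue
            from_port = perm.get("FromPort")
            to_port = perm.get("ToPort")
            # Protocol "-1" means all traffic; such rules carry no port range.
            all_traffic = perm.get("IpProtocol") == "-1"
            expected = (
                not all_traffic
                and from_port == to_port
                and from_port in public_ports
            )
            if not expected:
                findings.append((group["GroupId"], from_port, to_port))
    return findings


# Hypothetical sample: the ALB rule is fine, the leftover SSH rule is not.
SAMPLE_GROUPS = [
    {"GroupId": "sg-alb", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-app", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
]

print(find_risky_rules(SAMPLE_GROUPS))
```

In a real audit you would feed this the live API response and alert on any non-empty result.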
---
## Disaster Recovery Strategy
| Tier | RTO | RPO | Strategy | Cost |
|------|-----|-----|----------|------|
| **Backup & Restore** | Hours | Hours | S3 cross-region backups | $ |
| **Pilot Light** | Tens of minutes | Seconds | Standby DB in DR region | $$ |
| **Warm Standby** | Minutes | Seconds | Scaled-down DR region | $$$ |
| **Active-Active** | Near zero | Near zero | Full production in 2 regions | $$$$ |
For exchanges, treat **Warm Standby** as the minimum. Serious operations should run Active-Active.
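The RTO you actually achieve is the sum of several delays: detecting the outage, waiting for DNS caches to expire, promoting the DR database, and scaling up DR compute. The sketch below adds them up for a hypothetical DR setup. Every number is an illustrative assumption, not a measurement, and summing the steps serially gives a conservative upper bound, since some of them can run in parallel.

```python
from dataclasses import dataclass


@dataclass
class FailoverBudget:
    health_check_interval_s: int  # seconds between health checks
    failure_threshold: int        # consecutive failures before "unhealthy"
    dns_ttl_s: int                # how long resolvers may cache the old record
    db_promotion_s: int           # time to promote the DR replica
    app_scale_up_s: int           # time to bring DR compute to full capacity

    def detection_s(self) -> int:
        """Time until the outage is declared."""
        return self.health_check_interval_s * self.failure_threshold

    def rto_s(self) -> int:
        """Worst-case recovery time if every step runs one after another."""
        return (
            self.detection_s()
            + self.dns_ttl_s
            + self.db_promotion_s
            + self.app_scale_up_s
        )


# Illustrative values only.
budget = FailoverBudget(
    health_check_interval_s=10,
    failure_threshold=3,
    dns_ttl_s=60,
    db_promotion_s=300,
    app_scale_up_s=120,
)
print(f"detection: {budget.detection_s()}s, RTO: {budget.rto_s()}s")
```

Writing the budget down this way shows where the time goes; in this made-up example the database promotion dominates, so that is the step to rehearse first.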
---
## High-Availability Architecture
```text
Region: us-east-1                    Region: us-west-2 (DR)
┌─────────────────────┐              ┌─────────────────────┐
│ [ALB] ─── [API]     │              │ [ALB] ─── [API]     │
│   ├── [Match]       │  ─────────▶  │   ├── [Match]       │
│   └── [RDS-Pri]     │ Replication  │   └── [RDS-Read]    │
└─────────────────────┘              └─────────────────────┘
           │                                    │
           └────── Route 53 Health Checks ──────┘
```
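The health-check arrows in the diagram boil down to a small state machine: keep routing to the primary region until it fails a set number of consecutive checks, then route to DR. The toy model below illustrates that logic in plain Python. It is a simplification and not the Route 53 API: the class name and defaults are hypothetical, and it fails back after a single healthy check, which a real setup would usually make stricter.

```python
class FailoverRouter:
    """Toy DNS failover: serve from the primary region until it fails
    `failure_threshold` consecutive health checks, then serve from DR."""

    def __init__(self, primary="us-east-1", secondary="us-west-2",
                 failure_threshold=3):
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record_check(self, healthy: bool) -> str:
        """Feed in one health-check result; return the region to route to."""
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.active_region()

    def active_region(self) -> str:
        if self.consecutive_failures >= self.failure_threshold:
            return self.secondary
        return self.primary


router = FailoverRouter()
for result in [True, False, False, False]:
    print(result, "->", router.record_check(result))
```

Note that switching DNS only moves traffic. The read replica in the DR region still has to be promoted before it can accept writes, which is why the failover steps belong in a runbook.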
---
## Practice Exercises
### Exercise 1: Draw Your VPC
```text
Create a VPC diagram for your exchange:
- How many subnets?
- What goes in each?
- Which can reach the internet?
```
### Exercise 2: Security Group Audit
```text
For each security group, answer:
1. Who/what can initiate connections?
2. To what ports?
3. Why?
If you can't answer "why," the rule shouldn't exist.
```
### Exercise 3: DR Runbook
```text
Write steps for region failover:
1. How do you detect the outage?
2. How do you fail over DNS?
3. How do you promote the DR database?
4. How do you fail back?
```
---
## Key Takeaways
- Network segmentation is fundamental - Public, app, data, HSM subnets
- Security groups = least privilege - Only what’s needed, nothing more
- HSM for critical keys - Signing keys never leave hardware
- Plan for failure - DR strategy before you need it