
Security Incident Response

When things go wrong. Learn how to detect, contain, and recover from security incidents.

Intermediate 35 min read

🎯 What You'll Learn

  • Understand the incident response lifecycle
  • Learn detection and containment strategies
  • Know how to investigate incidents
  • Recover and learn from incidents
  • Build an incident response plan

## When Things Go Wrong

Security incidents happen — breaches, malware, data leaks. What matters is how you respond.

Fast, effective response minimizes damage. Panic and improvisation make it worse.


## The Incident Response Lifecycle

Preparation → Detection → Containment → Eradication → Recovery → Lessons

---

## Phase 1: Preparation

Before incidents happen:

### Build Your Team

| Role | Responsibility |
|------|----------------|
| Incident Commander | Decisions, communication |
| Technical Lead | Investigation, containment |
| Communications | Internal/external messaging |
| Legal | Compliance, notification |

### Create Runbooks

Pre-written playbooks for common scenarios:
- Malware infection
- Data breach
- DDoS attack
- Account compromise

### Prepare Tools

- Log aggregation (ELK, Splunk, or simpler alternatives)
- Forensic tools
- Communication channels (out-of-band — if email is compromised, you need a backup)
- Contact lists

---

## Phase 2: Detection

Recognize that an incident is occurring.

### Detection Sources

| Source | Example |
|--------|---------|
| Monitoring alerts | Unusual login patterns |
| User reports | "My computer is acting weird" |
| External notification | Researcher, customer, attacker |
| Log analysis | Failed auth spikes |
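
As a rough illustration of the "failed auth spikes" row, the sketch below counts failed logins per minute and flags any minute at or above a threshold. The log format (an ISO 8601 timestamp, a space, then a message containing `Failed password`), the function name `find_auth_spikes`, and the default threshold are all assumptions for illustration; adapt them to whatever your log pipeline actually produces.

```python
from collections import Counter
from datetime import datetime


def find_auth_spikes(lines, threshold=5):
    """Return {minute: count} for minutes with at least `threshold` failed logins.

    Assumes each line starts with an ISO 8601 timestamp, then a space,
    then the message (a hypothetical format; adapt to your logs).
    """
    per_minute = Counter()
    for line in lines:
        if "Failed password" not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that don't match the assumed format
        # Truncate to the minute so all events in that minute share a key.
        per_minute[when.replace(second=0, microsecond=0)] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}


# Six failures in 10:00 and one in 10:01; only 10:00 crosses the threshold.
sample = [f"2024-01-01T10:00:{s:02d} sshd: Failed password for root" for s in range(6)]
sample.append("2024-01-01T10:01:00 sshd: Failed password for root")
spikes = find_auth_spikes(sample)
```

A fixed threshold is the simplest possible rule. In practice you would likely compare against a per-service baseline, since a count that is alarming for one host may be normal for another.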

### Triage Questions

1. What is happening?
2. When did it start?
3. What systems are affected?
4. Is it ongoing?
5. What's the potential impact?
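
One way to keep triage honest is to record the answers in a structure that can report which questions are still open. The sketch below is a minimal example; the class name `TriageRecord` and its field names are hypothetical, not part of any standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TriageRecord:
    """Answers to the five triage questions (field names are illustrative)."""
    what: str
    started_at: Optional[datetime]   # None if not yet known
    affected_systems: List[str] = field(default_factory=list)
    ongoing: bool = True             # assume ongoing until proven otherwise
    potential_impact: str = "unknown"

    def open_questions(self):
        """List the triage questions that still lack an answer."""
        missing = []
        if not self.what:
            missing.append("What is happening?")
        if self.started_at is None:
            missing.append("When did it start?")
        if not self.affected_systems:
            missing.append("What systems are affected?")
        if self.potential_impact == "unknown":
            missing.append("What's the potential impact?")
        return missing


record = TriageRecord(what="Unusual logins on mail server", started_at=None)
```

Defaulting `ongoing` to `True` reflects a cautious stance: until someone confirms the activity has stopped, treat the incident as live.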

---

## Phase 3: Containment

Stop the bleeding.

### Short-Term Containment

| Action | Purpose |
|--------|---------|
| Isolate system | Prevent lateral movement |
| Block IPs | Stop ongoing attack |
| Disable accounts | Prevent access |
| Preserve evidence | Don't destroy logs |

```bash
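# Example: isolate a host from the network (emergency use only; document first).
# WARNING: this also cuts your own remote session, so run it from a console.
# -I inserts at the top of the chain so existing ACCEPT rules cannot override it.
iptables -I INPUT -j DROP
iptables -I OUTPUT -j DROP

# Or: move the host to a quarantine VLAN via the network team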
```
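
The "Preserve evidence" row is easier to act on with a concrete habit: hash each collected file at collection time, so later changes are detectable. The sketch below assumes evidence is collected as ordinary files; `hash_evidence`, `custody_entry`, and the entry fields are hypothetical names, and a real chain-of-custody process may require more than this.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def hash_evidence(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large disk images fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, collected_by):
    """Build one evidence-log entry (the field names are illustrative)."""
    return {
        "file": str(Path(path).resolve()),
        "sha256": hash_evidence(path),
        "collected_by": collected_by,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recording the digest alongside who collected the file and when gives investigators a reference point: re-hashing the file later and comparing digests shows whether it has changed since collection.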

### Long-Term Containment

Keep business running while you investigate:
- Temporary workarounds
- Rebuild affected systems in parallel
- Monitor for re-infection

---

## Phase 4: Eradication

Remove the threat completely.

### Find Root Cause

- How did they get in?
- What did they access?
- What did they leave behind (persistence mechanisms)?

### Clean Up

- Remove malware
- Close vulnerabilities
- Reset compromised credentials
- Patch exploited systems

---

## Phase 5: Recovery

Return to normal operations.

### Restore Safely

1. Verify system is clean
2. Restore from known-good backup
3. Monitor closely after restoration
4. Gradual return to production
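
The order of these steps matters: restoring before verifying, or returning to production before monitoring, risks reintroducing the threat. The sketch below shows one way a recovery checklist could enforce that order. The step strings mirror the list above; `RECOVERY_STEPS` and `next_recovery_step` are hypothetical names.

```python
RECOVERY_STEPS = [
    "verify system is clean",
    "restore from known-good backup",
    "monitor closely after restoration",
    "gradual return to production",
]


def next_recovery_step(completed):
    """Return the first step not yet completed, or None when all are done.

    Raises ValueError if a later step was marked done before an earlier
    one, since the order of the steps matters.
    """
    seen_gap = False
    for step in RECOVERY_STEPS:
        done = step in completed
        if done and seen_gap:
            raise ValueError(f"'{step}' completed out of order")
        if not done:
            seen_gap = True
    for step in RECOVERY_STEPS:
        if step not in completed:
            return step
    return None
```

For example, `next_recovery_step({"verify system is clean"})` returns the restore step, while marking only the restore step as done raises an error because verification was skipped.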

### Validation

- Systems functioning correctly
- Security controls in place
- No signs of persistent access

---

## Phase 6: Lessons Learned

Every incident is a learning opportunity.

### Post-Mortem Meeting

Hold it within 1-2 weeks of the incident, while details are still fresh, and cover:
- What happened (timeline)
- What went well
- What could improve
- Action items
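
A timeline is easier to discuss when the gaps between key events are computed rather than estimated. The sketch below derives time-to-detect, time-to-contain, and time-to-restore from a dictionary of timestamps. The event keys and the function name `timeline_metrics` are assumptions for illustration, and the sample dates are made up.

```python
from datetime import datetime


def timeline_metrics(events):
    """Compute elapsed time between key events.

    `events` maps hypothetical keys ("compromised", "detected",
    "contained", "restored") to datetime values.
    """
    def gap(start, end):
        if start in events and end in events:
            return events[end] - events[start]
        return None  # event missing from the timeline

    return {
        "time_to_detect": gap("compromised", "detected"),
        "time_to_contain": gap("detected", "contained"),
        "time_to_restore": gap("contained", "restored"),
    }


# Illustrative timeline: "restored" is not yet known.
events = {
    "compromised": datetime(2024, 3, 1, 9, 0),
    "detected": datetime(2024, 3, 4, 9, 0),
    "contained": datetime(2024, 3, 4, 11, 30),
}
metrics = timeline_metrics(events)
```

Returning `None` for a missing event, rather than guessing, keeps the report from implying a precision the investigation does not yet have.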

### Document Everything

```markdown
# Incident Report: [Title]

## Summary
Brief description of what happened.

## Timeline
- HH:MM - Detection
- HH:MM - Containment began
- HH:MM - Root cause identified
- HH:MM - Systems restored

## Impact
- Systems affected
- Data exposed
- Duration
- Cost

## Root Cause
How the incident occurred.

## Response Actions
What was done to contain and eradicate.

## Lessons Learned
What to improve.

## Action Items
- [ ] Item 1 (Owner, Due date)
- [ ] Item 2 (Owner, Due date)
```

## Practice Exercises

### Exercise 1: Triage (Beginner)

An employee reports: “I can’t access my email and there are weird files on my desktop.”

What are your first 3 questions?

Answer
  1. When did you first notice this?
  2. Did you click any links or open attachments recently?
  3. Are your coworkers experiencing the same issue?

### Exercise 2: Containment (Intermediate)

You’ve confirmed malware on a developer’s laptop that has SSH access to production servers.

What containment actions do you take?

Answer
  1. Isolate the laptop from network
  2. Revoke developer’s SSH keys
  3. Check production access logs for suspicious activity during the infection window
  4. Force password reset on affected accounts
  5. Monitor for unusual production activity
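
Step 3 of this answer, checking access logs during the infection window, can be sketched as a simple filter. The entry shape (a dict with `time`, `user`, and `action` keys), the function name `entries_in_window`, and the sample data are all made up for illustration; real logs would first need parsing into this form.

```python
from datetime import datetime


def entries_in_window(entries, user, start, end):
    """Return log entries for `user` with start <= timestamp <= end.

    Each entry is a dict with "time" (datetime), "user", and "action"
    keys; this shape is an assumption for illustration.
    """
    return [
        entry for entry in entries
        if entry["user"] == user and start <= entry["time"] <= end
    ]


log = [
    {"time": datetime(2024, 3, 1, 8, 0), "user": "dev1", "action": "login"},
    {"time": datetime(2024, 3, 2, 2, 15), "user": "dev1", "action": "scp db.dump"},
    {"time": datetime(2024, 3, 2, 9, 0), "user": "ops2", "action": "login"},
    {"time": datetime(2024, 3, 5, 10, 0), "user": "dev1", "action": "login"},
]
# Keep only dev1's activity between the assumed infection start and containment.
suspects = entries_in_window(
    log, "dev1", datetime(2024, 3, 1, 12, 0), datetime(2024, 3, 4, 0, 0)
)
```

Because the true infection start is often uncertain, it is usually safer to widen the window than to narrow it.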

### Exercise 3: Post-Mortem (Advanced)

Write a brief post-mortem for this scenario:

  • Attacker gained access via phishing
  • Had access for 3 days before detection
  • Exfiltrated customer database

## Knowledge Check

  1. What are the six phases of incident response?

  2. Why preserve evidence during containment?

  3. What’s the difference between short-term and long-term containment?

  4. Why do a post-mortem?

  5. What’s the first thing you should do when you detect an incident?

Answers
  1. Preparation, Detection, Containment, Eradication, Recovery, Lessons Learned.

  2. For investigation and potential legal action. Destroying evidence makes it impossible to understand what happened and may complicate legal proceedings.

  3. Short-term = immediate isolation. Long-term = temporary workarounds while you investigate and clean up properly.

  4. Learn and improve. Understand what happened, what worked, what didn’t, so you’re better prepared next time.

  5. Assess scope and impact before taking containment actions. Don’t panic and don’t act without documenting what you’re doing.


## Summary

| Phase | Goal |
|-------|------|
| Preparation | Be ready before incidents |
| Detection | Recognize incidents quickly |
| Containment | Stop the damage |
| Eradication | Remove the threat |
| Recovery | Return to normal |
| Lessons | Improve for next time |
