Cohort 8 · Workshop Floor · Boulder, CO

Technical Skills Training · Cohort-Based

The questions you're afraid to ask at work? We built a curriculum around them.

Kubernetes.  System Design.  Production Debugging.

40 Modules across 8 cohorts
94% Rated "directly applicable at work"
12 Max learners per cohort

Asked by engineers with 4–9 years of experience.

Am I too senior for a training program?

61% of our current cohort · Engineers with 5+ years of experience

Marcus Webb

Senior Backend Engineer, fintech startup · Denver, CO

Before

Marcus had been the most experienced person in his team for two years. He was shipping features, unblocking juniors, and quietly terrified of the system design interviews at the companies he actually wanted to work at. He'd never owned infrastructure. His mental model of Kubernetes was 'Docker but more complicated.' He almost didn't apply because he assumed the program was for juniors.

Module 11 — Distributed Systems Under Load

After

Three weeks in, Marcus was the one asking the hardest questions in the room. By week six he had designed and presented a multi-region failover architecture for a real client's staging environment. He got the Staff Engineer role at the company he'd been quietly watching for eighteen months.


Priya Nair

Full-Stack Developer, agency → product company · Austin, TX

Before

Priya could build a complete React app from scratch in a weekend. But when she joined a product company and opened the existing codebase for the first time — 340,000 lines, seven years of decisions, three abandoned frameworks half-migrated — she froze. She'd never debugged something she didn't write. She'd never read a flame graph. She'd never had to care about p99 latency.

Module 7 — Reading Code You Didn't Write

After

By her second week, Priya had submitted her first meaningful PR to the production codebase — a fix for a memory leak she found by reading logs and a flame graph, not by guessing. Her manager mentioned it in the sprint retro. She told us she cried a little on the way home, but in a good way.

The real question under every enrollment hesitation.

Will this actually help me at work on Monday?

78% · Learners who shipped something real within 14 days of starting · Based on Cohort 4–7 surveys

Free Sample Content

Three modules. No email required.

The curriculum earns the download before asking for it. Read these three modules in full. If they feel like the most useful 45 minutes you've spent on your career in months, the full syllabus is the next step.

You're staring at a flame graph. It looks like a city on fire. Every bar is a function call, stacked on the thing that called it, width proportional to how much time it consumed. The x-axis is not time — it's alphabetical ordering within each level. This confuses everyone the first time.

The three questions to ask, in order:

  1. What's the widest bar at the top? That's where the most time is going. Everything below it is the call chain that led to it. Start there, not at the bottom.
  2. Does this bar have children? If a wide bar has no children, you've found a leaf: something spending real CPU time doing actual work. That's your candidate.
  3. Is this user code or library code? Library code you usually can't change, though you can change how often you call it. User code you can change. Look for your namespace in the function names.
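# Profile a Node.js process with V8's built-in profiler (writes isolate-*.log)
node --prof app.js
# Summarize the ticks as text (a profile summary, not yet a flame graph)
node --prof-process isolate-*.log > processed.txt
# Render an actual flame graph with clinic.js (recommended for beginners)
clinic flame -- node app.js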
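
The three questions above can also be asked of raw profile data. The sketch below is one possible illustration, not part of the module: it assumes the profile has been exported in the common "folded stack" text format (one line per unique stack, frames joined by semicolons, followed by a sample count), and it assumes a hypothetical namespace prefix `myapp.` for your own code. The sample stacks are made up.

```shell
#!/usr/bin/env bash
# Sketch: rank frames by self time from folded-stack input.
# Assumed input format, one line per unique stack:
#   frame1;frame2;...;leaf <sample-count>
# MY_NAMESPACE is a hypothetical prefix that marks your own code.

MY_NAMESPACE="${MY_NAMESPACE:-myapp.}"

# Print "<samples> <user|lib> <frame>", widest first.
top_leaves() {
  awk -v ns="$MY_NAMESPACE" '
    NF >= 2 {
      count = $NF                   # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)    # drop the trailing count
      n = split(stack, frames, ";")
      leaf = frames[n]              # the frame that was actually on-CPU
      self[leaf] += count           # accumulate self time per frame
    }
    END {
      for (f in self) {
        kind = (index(f, ns) == 1) ? "user" : "lib"
        print self[f], kind, f
      }
    }
  ' | sort -k1,1nr -k3,3
}

# Made-up sample data for illustration only.
sample_stacks() {
  cat <<'EOF'
main;myapp.handleRequest;myapp.parseBody 40
main;myapp.handleRequest;JSON.parse 25
main;myapp.handleRequest;myapp.parseBody 15
main;gc 5
EOF
}

sample_stacks | top_leaves
```

With the sample data, the first line of output is `55 user myapp.parseBody`: the widest frame with no children, and it sits in the assumed user namespace, so it is the first candidate to inspect.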

In the next section of this module, you'll open an actual flame graph from a production Node.js service (a real one, anonymized, with a real performance bug) and walk through the diagnosis step by step.

This is Module 3 of 40. The full program includes 40 modules, 8 live workshop sessions, and a real project shipped to production.

Get the rest

A pod crashes. You get paged. You open your terminal. What do you type first? Most people type kubectl get pods and stare at CrashLoopBackOff like it owes them an explanation. Here's the actual diagnostic sequence that gets you to the root cause in under four minutes.

# Step 1: See the last exit code
kubectl describe pod <pod-name> | grep -A5 "Last State"
# Step 2: Get the logs from the previous container (not the current one)
kubectl logs <pod-name> --previous
# Step 3: Check events for OOMKilled, ImagePullBackOff, etc.
kubectl get events --sort-by=.lastTimestamp | tail -20

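Exit code 137 usually means OOMKilled: the container used more memory than its limit and the kernel killed it. Strictly, 137 is 128 + 9 (SIGKILL), so confirm that the Reason field says OOMKilled. Exit code 1 is an application error, so look at the logs. Exit code 0 means the container exited cleanly: either its main process is not meant to run forever, or a failing liveness probe triggered a restart and the app shut down gracefully, so check the probe configuration.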
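
The exit-code rules of thumb above can be written down as a small helper. This is a minimal sketch, not the module's own tooling: the function name `explain_exit_code` is hypothetical, and the messages are first hypotheses rather than diagnoses. Codes above 128 follow the shell convention of 128 plus the signal number.

```shell
#!/usr/bin/env bash
# Sketch: turn a container exit code into a first hypothesis.
# explain_exit_code is a hypothetical helper name.

explain_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "clean exit: main process finished or shut down gracefully"
  elif [ "$code" -eq 137 ]; then
    echo "SIGKILL (128+9): often OOMKilled, confirm with the Reason field"
  elif [ "$code" -eq 143 ]; then
    echo "SIGTERM (128+15): container was asked to stop"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error: read the previous container's logs"
  fi
}

# To feed it a real value, the last exit code of a pod's first container
# can be read like this (assumes a single-container pod):
#   kubectl get pod <pod-name> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

explain_exit_code 137
```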

Workshop exercise: In Module 14's lab environment, you'll be handed a broken three-service deployment. One pod is CrashLoopBackOff, one is Pending, one is Running but returning 503s. You have 20 minutes to diagnose all three.

This is Module 14 of 40.

Get the rest

The most common system design mistake: jumping to components before you've established scale. You draw a load balancer. The interviewer asks “how many requests per second?” and you realize you don't know. Everything you've drawn is now in question. Start over — but this time, in the right order.

01

Clarify the scale

Ask: daily active users, peak QPS, read/write ratio, data size in 5 years. Write the numbers on the whiteboard. Every architectural decision flows from these.

02

Define the API contract first

What are the three or four endpoints this system needs? Draw them. This forces you to think about what the system actually does before how it does it.

03

Start with a single server

Draw the simplest possible system that works. Then break it — "at 10k users, this single DB becomes the bottleneck." Then fix the break. This is the narrative the interviewer is looking for.
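
The numbers from step 01 can be turned into rough capacity figures with a few lines of arithmetic. The sketch below is a back-of-envelope illustration only: every input value is a made-up assumption standing in for the interviewer's answers, and integer division rounds down.

```shell
#!/usr/bin/env bash
# Back-of-envelope sketch. All inputs are hypothetical placeholders;
# replace them with the numbers you wrote on the whiteboard.

DAU=1000000              # daily active users
REQUESTS_PER_USER=20     # requests per user per day
PEAK_FACTOR=3            # peak traffic as a multiple of the average
READS_PER_WRITE=9        # read/write ratio of 9:1
BYTES_PER_WRITE=1024     # stored bytes per write
YEARS=5

SECONDS_PER_DAY=86400
REQUESTS_PER_DAY=$(( DAU * REQUESTS_PER_USER ))

# Average and peak queries per second.
AVG_QPS=$(( REQUESTS_PER_DAY / SECONDS_PER_DAY ))
PEAK_QPS=$(( AVG_QPS * PEAK_FACTOR ))

# Only writes add stored data: one write per (READS_PER_WRITE + 1) requests.
WRITES_PER_DAY=$(( REQUESTS_PER_DAY / (READS_PER_WRITE + 1) ))
STORAGE_BYTES=$(( WRITES_PER_DAY * BYTES_PER_WRITE * 365 * YEARS ))
STORAGE_GB=$(( STORAGE_BYTES / 1000000000 ))

echo "average QPS: $AVG_QPS"
echo "peak QPS:    $PEAK_QPS"
echo "storage after $YEARS years: ${STORAGE_GB} GB"
```

With these placeholder inputs the estimate comes to 231 average QPS, 693 peak QPS, and about 3737 GB after five years. Numbers of that size are what let you justify starting with the single server in step 03 before you break it on purpose.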

In Module 28, we do six mock system design sessions back-to-back — URL shortener, rate limiter, notification service, search autocomplete, distributed cache, and a real one from a past cohort member's actual interview. You watch three, you do three.

This is Module 27 of 40.

Get the rest

Full Curriculum · 40 Modules

Download the full syllabus.
It reads like a course, not a brochure.

The PDF is 28 pages. It includes the complete module list with learning objectives, the workshop format and schedule, what you'll have shipped by the end, and the specific tools and codebases you'll work in. No testimonials, no stock photography, no marketing copy. Just the curriculum.

40 modules across Kubernetes, system design, production debugging, and distributed systems

8 live workshop sessions with real codebases and real problems

One production-grade project shipped before the cohort ends

Maximum 12 learners — questions get answered, not deferred

Next cohort starts March 17, 2026 · Applications close March 3

Get the full syllabus

Two fields. No spam. Unsubscribe any time.

28-page PDF · Delivered instantly · No credit card

or

Ready to apply for Cohort 8?

View Application →
No spam, ever · Unsubscribe in one click · We read every reply