Open-Source Engineer & Contributor
A collection of deep, implementation-level curricula for engineers who want to contribute seriously to major open-source projects: not just to fix typos, but to build the kind of sustained understanding that leads to committer status.
Each curriculum is designed around how the project is actually developed, tested, reviewed, and maintained by its core contributors. Labs reference real source code, real issue trackers, and real contribution workflows.
Curricula
| Project | Focus | Status |
|---|---|---|
| Apache Tez | DAG execution engine on YARN — used by Hive, Pig, and custom batch pipelines | Active |
| Apache Kafka | Distributed log — producers, consumers, brokers, replication, Streams API | Planned |
| Apache Flink | Streaming and batch — state machines, checkpointing, watermarks, operators | Planned |
| Apache Spark | Unified analytics — scheduler, shuffle, RDD lineage, SQL planning | Planned |
| Apache Hadoop | HDFS, YARN, MapReduce (the storage and resource-management layer that Tez runs on and that many Spark and Flink deployments use) | Planned |
How to Use This Book
Each curriculum is self-contained. Start at the curriculum's Introduction page and work through its levels in order. Levels build on each other, so skipping one means missing foundations that later labs depend on.
What you will need for any curriculum:
- 3+ years of Java (or the project's primary language) on production-grade codebases
- Comfort reading large, unfamiliar codebases without a guide
- Git, a build tool (Maven / Gradle / sbt), and an IDE (IntelliJ recommended)
- Patience: the path from contributor to committer is measured in months to years
Select a curriculum from the table above or from the sidebar to begin.