Open-Source Engineer & Contributor

A collection of deep, implementation-level curricula for engineers who want to contribute seriously to major open-source projects — not just fix typos, but build the kind of sustained understanding that leads to committer status.

Each curriculum is designed around how the project is actually developed, tested, reviewed, and maintained by its core contributors. Labs reference real source code, real issue trackers, and real contribution workflows.

Curricula

Project	Focus	Status
Apache Tez	DAG execution engine on YARN — used by Hive, Pig, and custom batch pipelines	Active
OpenSearch	Distributed search & analytics engine on Apache Lucene — search, log analytics, observability	Active
Apache Kafka	Distributed log — producers, consumers, brokers, replication, Streams API	Planned
Apache Flink	Streaming and batch — state management, checkpointing, watermarks, operators	Planned
Apache Spark	Unified analytics — scheduler, shuffle, RDD lineage, SQL planning	Planned
Apache Hadoop	HDFS, YARN, MapReduce — the storage and resource-management layer that Tez and much of the batch ecosystem above build on	Planned

How to Use This Book

Each curriculum is self-contained. Start at the curriculum's Introduction page and work through its levels sequentially. Levels build on each other — skipping levels skips foundations that later labs depend on.

What you will need for any curriculum:

3+ years of Java (or the project's primary language) on production-grade codebases
Comfort reading large, unfamiliar codebases without a guide
Git, a build tool (Maven / Gradle / sbt), and an IDE (IntelliJ recommended)
Patience: the path from contributor to committer is measured in months to years

Select a curriculum from the table above or from the sidebar to begin.