Open-Source Engineer & Contributor

A collection of deep, implementation-level curricula for engineers who want to contribute seriously to major open-source projects: not just to fix typos, but to build the kind of sustained understanding that leads to committer status.

Each curriculum is designed around how the project is actually developed, tested, reviewed, and maintained by its core contributors. Labs reference real source code, real issue trackers, and real contribution workflows.


Curricula

Project | Focus | Status
Apache Tez | DAG execution engine on YARN (used by Hive, Pig, and custom batch pipelines) | Active
Apache Kafka | Distributed log: producers, consumers, brokers, replication, Streams API | Planned
Apache Flink | Streaming and batch: state machines, checkpointing, watermarks, operators | Planned
Apache Spark | Unified analytics: scheduler, shuffle, RDD lineage, SQL planning | Planned
Apache Hadoop | HDFS, YARN, MapReduce: the foundation layer for everything above | Planned

How to Use This Book

Each curriculum is self-contained. Start at the curriculum's Introduction page and work through its levels sequentially. Levels build on each other, so skipping one skips foundations that later labs depend on.

What you will need for any curriculum:

  • 3+ years of Java (or the project's primary language) on production-grade codebases
  • Comfort reading large, unfamiliar codebases without a guide
  • Git, a build tool (Maven / Gradle / sbt), and an IDE (IntelliJ recommended)
  • Patience: the path from contributor to committer is measured in months to years

Select a curriculum from the table above or from the sidebar to begin.