Issue Roadmap — Twelve Stages from Trivial to Release-Blocking
This roadmap is a deliberately ordered ladder of Apache Tez contributions. Each rung trains a specific skill, builds on one or more earlier rungs (see the Prereq column in the table below), and ends at a concrete review-ready patch. Skipping rungs is the most common reason contributors stall: a shuffle bug fix without state-machine fluency turns into a six-month patch thread, and a release-blocker triage call without compatibility reflexes turns into a reverted commit.
The stages are calibrated to the Tez 0.10.x codebase on disk at ~/tez-src. JIRA
queries assume https://issues.apache.org/jira/projects/TEZ. Patch discussion happens
on dev@tez.apache.org. Where stages reference real modules they use the exact paths
you will see under ~/tez-src:
- tez-api/: public interfaces, descriptors, configuration keys
- tez-common/: IDs, util, log helpers, ATS/timeline shared code
- tez-dag/: AppMaster: DAGImpl, VertexImpl, TaskImpl, schedulers
- tez-runtime-internals/: TezTaskRunner, LogicalIOProcessorRuntimeTask
- tez-runtime-library/: ShuffleManager, Fetcher, IFile, MergeManager
- tez-mapreduce/: MR-shim inputs/outputs/processors
- tez-tests/: MiniTezCluster integration tests
- tez-examples/: OrderedWordCount, SimpleSessionExample, etc.
- tez-plugins/tez-yarn-timeline-history/: ATS history events
- tez-plugins/tez-aux-services/: NM-side ShuffleHandler hook
- docs/: User-facing site under src/site/markdown
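Before relying on these paths, it can help to confirm that your checkout actually contains them. The following is a minimal sketch, assuming a POSIX-like shell; the helper name check_tez_modules is hypothetical, and the module list is simply copied from the paths above.

```shell
# Report which of the modules named in this chapter exist under a checkout.
# check_tez_modules is a hypothetical helper; the default root is ~/tez-src.
check_tez_modules() {
  local root="${1:-$HOME/tez-src}"
  local missing=0
  local module
  for module in tez-api tez-common tez-dag tez-runtime-internals \
                tez-runtime-library tez-mapreduce tez-tests tez-examples \
                tez-plugins/tez-yarn-timeline-history \
                tez-plugins/tez-aux-services docs; do
    if [ -d "$root/$module" ]; then
      echo "ok       $module"
    else
      echo "MISSING  $module"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one module is absent.
  [ "$missing" -eq 0 ]
}
```

Calling check_tez_modules with no argument checks ~/tez-src; pass another directory to check a different clone. A MISSING line usually means the checkout is on a branch whose layout differs from the one this chapter assumes.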
The Twelve Stages
| # | Stage | Target skill | Prereq | Typical patch size | Review depth |
|---|---|---|---|---|---|
| 1 | Docs & tests | Reading the codebase, JIRA workflow, RAT/checkstyle | none | 1–30 lines | 1 reviewer |
| 2 | Build & logging hygiene | pom dep bands, slf4j idioms, LOG.isDebugEnabled() | 1 | 5–80 lines | 1 reviewer |
| 3 | Error message context | Exception chaining, ID propagation, tez-dag CONTEXT rule | 2 | 20–200 lines | 1–2 reviewers |
| 4 | State machine transitions | StateMachineFactory, InvalidStateTransitonException | 3 | 30–250 lines + test | 2 reviewers, dev@ ping |
| 5 | Scheduler bugs | TaskSchedulerManager, YarnTaskSchedulerService, AMRMClient | 4 | 50–500 lines + MiniCluster test | 2 reviewers |
| 6 | Shuffle & runtime | ShuffleManager, Fetcher, MergeManager, IFile | 5 | 80–600 lines + test | 2 reviewers |
| 7 | Hive-on-Tez compatibility | DAGPlan size, edge property contracts, session reuse | 5 or 6 | varies; often a tez-side + HIVE-side ticket | committers in both projects |
| 8 | YARN integration | AMRMToken, log aggregation, NM aux service, kerberos renewal | 5 | 50–400 lines | 2 reviewers, often YARN-side too |
| 9 | Flaky tests | DrainDispatcher, dispatcher-aware waits, port collisions | 4 | 20–150 lines per test | 1–2 reviewers; sometimes "stamped" |
| 10 | Performance regression | git bisect, async-profiler / JFR, JMH micro | 6 or 8 | 30–300 lines + bench evidence | 2 reviewers, dev@ design ping |
| 11 | Backward compatibility | @InterfaceAudience, @InterfaceStability, protobuf evolution | 4 | small code, long dev@ thread | committers + PMC |
| 12 | Release-blocking | RC voting, -1 binding, security CVE pipeline | committer | varies | PMC + release manager |
How to Use This Roadmap
Pick a stage honestly
Find your rung by asking one question: what is the largest patch you have shipped?
- Never landed a Tez patch: start at Stage 1.
- Landed a docs patch but never touched Java in tez-dag: Stage 2.
- Comfortable with tez-common Java but never read a state machine: Stage 3.
- Read VertexImpl.stateMachineFactory once and were confused: Stage 4.
- Read it twice and could draw the state graph: Stage 5+.
- Already a Tez committer: jump straight to Stages 10–12 for sharpening.
Do not jump rungs to chase a "cool" bug. A locality miscount in
YarnTaskSchedulerService looks self-contained and isn't — the patch will land on
state-machine transitions you have never edited.
One stage per PR
Resist the urge to fix two things in one patch. Reviewers reject mixed-concern patches almost reflexively. If you find a logging issue while fixing an error message, file a follow-up JIRA and move on. The roadmap rewards small surface area.
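A rough proxy for "small surface area" is the number of top-level modules a branch touches. The sketch below assumes the topic branch was cut from a base branch (master by default, which is an assumption about your setup) and that it runs from inside the checkout; modules_touched is a hypothetical helper name.

```shell
# List the top-level directories changed on the current branch since it
# diverged from a base branch. Files at the repository root (for example
# pom.xml) are listed under their own name.
# modules_touched is a hypothetical helper, not part of Tez or git.
modules_touched() {
  local base="${1:-master}"
  # "base...HEAD" diffs HEAD against the merge base of base and HEAD.
  git diff --name-only "${base}...HEAD" | cut -d/ -f1 | sort -u
}
```

If the output lists more than one module, that is a prompt to ask whether the patch mixes concerns. It is not proof either way, since a single fix can legitimately span a module and its tests.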
Always start with git log and git blame
Before touching a file, find the last 5 commits that modified it:
cd ~/tez-src
git log --oneline -n 5 -- tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
git blame -L 1200,1260 tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
The blame output tells you which committer cares about that area. CC them on the JIRA.
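To turn the blame output into a short list of names, one option is to count lines per author. This is a sketch that assumes it runs from inside the checkout; blame_owners is a hypothetical helper name. Note that the author field names whoever authored each commit, which may be a contributor and not the committer who reviewed it.

```shell
# Count how many lines in a range were last touched by each author,
# most frequent first. blame_owners is a hypothetical helper name.
blame_owners() {
  local file="$1" start="$2" end="$3"
  # --line-porcelain repeats the full commit header for every line,
  # so each blamed line contributes exactly one "author <name>" record.
  git blame --line-porcelain -L "${start},${end}" -- "$file" \
    | sed -n 's/^author //p' \
    | sort | uniq -c | sort -rn
}
```

For the range used above, the call would be blame_owners tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java 1200 1260.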
Time investment per stage
Calibrated against a working contributor who has the codebase checked out, can build
locally with mvn clean install -DskipTests -Phadoop28, and has filed at least one
JIRA before:
| Stage | First patch | Becoming fluent (5 patches landed) |
|---|---|---|
| 1 | half a day | 1 week |
| 2 | 1 day | 2 weeks |
| 3 | 1–2 days | 1 month |
| 4 | 3–5 days | 2–3 months |
| 5 | 1–2 weeks | 4–6 months |
| 6 | 2–4 weeks | 6 months |
| 7 | weeks per attribution call | a year of cross-project work |
| 8 | 1–3 weeks | 6 months |
| 9 | 1–3 days per flake | ongoing |
| 10 | weeks (perf is bisect-bound) | committer-level skill |
| 11 | weeks (dev@ design cycle) | committer-level skill |
| 12 | PMC-level responsibility | n/a |
Success criterion per stage
Each stage is "complete" for you when:
- Stage 1: one docs and one test patch are committed to master.
- Stage 2: at least two logging or build patches are committed without nits.
- Stage 3: one error-context patch is committed with no reviewer asking "which DAG?"
- Stage 4: one transition fix is committed and has a regression test in TestVertexImpl.
- Stage 5: one scheduler patch is committed with a MiniTezCluster repro test.
- Stage 6: one shuffle-runtime patch is committed with a deterministic repro.
- Stage 7: one cross-project ticket is filed with a written attribution argument.
- Stage 8: one YARN-integration patch is committed with explicit Hadoop-version evidence.
- Stage 9: at least three flaky tests have been de-flaked.
- Stage 10: one perf patch is committed with before/after benchmark numbers.
- Stage 11: one compatibility-sensitive patch is committed with explicit annotations and dev@ sign-off.
- Stage 12: you have helped triage at least one RC vote.
When to ask on dev@
Before writing any code for Stages 4 and above, send a short note to
dev@tez.apache.org:
Subject: [DISCUSS] TEZ-XXXX — proposed approach
I see <symptom> at <file>:<line>. My read is <cause>. I plan to <fix>, with
a regression test in <test>. Would appreciate any context I'm missing before
I post a patch.
Three sentences. No essay. The list will tell you in 24 hours whether you are about to step on someone else's in-flight work.
When the roadmap does not apply
This roadmap is for bug fixes and small features. It is not for:
- New runtime engines or scheduler rewrites: those are Tez Improvement Proposals (TEPs); start a dev@ thread, not a patch.
- Hive query-engine changes that happen to surface in Tez: file on HIVE, not TEZ.
- YARN-side fixes that Tez merely consumes: file on YARN, not TEZ.
Stage 7 teaches the attribution skill that keeps these in the right project.
What to read alongside this roadmap
| Roadmap stage | Companion deep-dive |
|---|---|
| 1–3 | Reading the codebase |
| 4 | State machines, Vertex lifecycle |
| 5 | Scheduler, DAG App Master |
| 6 | Shuffle & sort, Tez runtime |
| 7 | Hive integration |
| 8 | YARN integration |
| 9 | Testing framework |
| 10 | Container reuse, Tez runtime |
| 11 | Compatibility |
| 12 | Release & PMC |
What this roadmap is not
This roadmap is not a tutorial on Apache Tez itself. The deep dives in
../deep-dives/index.md cover the architecture; the
labs in ../level-1/index.md onward cover the
hands-on code reading. The roadmap assumes you can already build Tez from
source, run the unit tests, and stand up a MiniTezCluster end-to-end. If
you cannot, the prerequisite chapter is Level 1, Lab 1.1.
It is also not a generic Apache contribution guide. The Apache "How to Contribute" pages cover the cross-project mechanics (ICLA, JIRA account creation, mailing list etiquette). The roadmap assumes those are done.
Finally, it is not a roadmap for committership. Becoming a Tez committer is a separate path that the PMC manages. The roadmap teaches the skills that, applied consistently over time, make committership a reasonable outcome — but landing patches is necessary, not sufficient.
Reading order
If you read this book front-to-back, you will hit this chapter after the deep dives and before the capstone. That is the intended sequence:
- Read the deep dives to understand the architecture.
- Read this roadmap to understand the contribution ladder.
- Pick a rung and ship a patch.
- Come back to this roadmap when the patch lands, and step up a rung.
- After three or four rungs, attempt the capstone in ../capstone/index.md.
If you are jumping in mid-book, start at the rung that matches your current skill (see "Pick a stage honestly" above) and read the stage's companion deep dive at the same time.
A note on JQL
The JIRA queries in each stage are starting points. The Tez project's
issue labelling has drifted over the years — labels like newbie and
beginner are inconsistently applied. If a filter returns zero results,
broaden it (remove a clause) before assuming the filter is wrong. Each
stage gives at least one fallback grep-based candidate-finding method that
does not depend on labels.
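As an illustration of what a label-free, grep-based search can look like, the sketch below ranks Java files by the number of lines carrying a TODO or FIXME marker. It is not taken from any particular stage: the helper name find_candidates and the choice of markers are assumptions made for this example.

```shell
# Rank Java files under a directory by the number of lines that contain
# a TODO or FIXME marker, busiest files first.
# find_candidates and the marker list are hypothetical choices.
find_candidates() {
  local dir="${1:-.}"
  # -c prints "path:count" per file; drop files with zero matching lines.
  grep -rIc --include='*.java' -E 'TODO|FIXME' "$dir" \
    | grep -v ':0$' \
    | sort -t: -k2,2nr
}
```

For example, find_candidates ~/tez-src/tez-dag/src/main/java | head gives a short list of files to skim. A marker is only a hint; check JIRA for an existing ticket before treating any hit as unclaimed work.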
A second JQL tip: pin a "watched issues" filter for the components you care about. Tez has roughly a dozen components in JIRA; you do not need to watch all of them, but watching the two or three closest to your current rung is how you stay current on landed work.
A note on local clone hygiene
Every stage in this roadmap assumes you have a clean checkout at
~/tez-src. "Clean" means:
- git status shows no untracked files outside .gitignore.
- git branch shows you on master (or a topic branch you remember creating).
- mvn clean install -DskipTests -Phadoop28 completes in under two minutes locally.
A messy checkout produces hard-to-reproduce results: a grep that
catches your own WIP, a git bisect that visits commits whose builds
were already broken by an unrelated local change, a mvn test that
passes locally because of a stale ~/.m2 jar.
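The first two "clean" conditions can be checked mechanically. The following is a minimal sketch with a hypothetical helper name, checkout_is_clean. It is slightly stricter than the definition above, because git status --porcelain also reports modified tracked files, and it does not time the Maven build.

```shell
# Print the current branch and fail if the working tree has modified or
# untracked (non-ignored) files. checkout_is_clean is a hypothetical helper.
checkout_is_clean() {
  local repo="${1:-$HOME/tez-src}"
  local dirty branch
  # --porcelain prints one line per modified or untracked path,
  # and nothing at all for a clean tree.
  dirty="$(git -C "$repo" status --porcelain)" || return 2
  branch="$(git -C "$repo" rev-parse --abbrev-ref HEAD)" || return 2
  echo "branch: $branch"
  if [ -n "$dirty" ]; then
    echo "not clean:"
    echo "$dirty"
    return 1
  fi
  echo "clean"
}
```

Running checkout_is_clean before a grep sweep or a git bisect session is a cheap way to rule out local WIP as the cause of a surprising result.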
Refresh on Mondays:
cd ~/tez-src
git checkout master
git pull --ff-only
git clean -fdx
mvn -q clean install -DskipTests -Phadoop28
The git clean -fdx is aggressive — it removes everything not tracked
by git, including IDE artifacts. Keep an .idea/ (or equivalent) backup
elsewhere if you customise it.
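Because the command is destructive, a dry run first is a reasonable habit. The sketch below uses git clean's -n (dry-run) flag and an -e exclude pattern; the helper name preview_clean and the choice to exclude .idea are assumptions for illustration.

```shell
# Show what "git clean -fdx" would delete without deleting anything.
# -n makes it a dry run; -e keeps paths matching the pattern out of the
# list even though -x otherwise disables the ignore rules.
# preview_clean is a hypothetical helper name.
preview_clean() {
  local repo="${1:-$HOME/tez-src}"
  git -C "$repo" clean -ndx -e .idea
}
```

Each line of output starts with "Would remove". If the list looks right, the same exclude pattern can be passed to the real command (git clean -fdx -e .idea) so that the IDE directory survives the weekly refresh.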
How the stages interlock
Each stage builds vocabulary the next stage uses without re-explaining:
- Stage 1 teaches the patch artifact format. Every later stage assumes it.
- Stage 2 teaches the LOG.isDebugEnabled() pattern. Stage 3 builds on it with the CONTEXT rule.
- Stage 3 teaches you to navigate tez-dag. Stage 4 lives in tez-dag/...impl/.
- Stage 4 teaches the state-machine DSL. Stage 5 reads the same DSL in the scheduler.
- Stage 5 teaches MiniTezCluster. Stage 6 leans on it for every shuffle test.
- Stage 6 teaches the runtime contracts. Stage 7 attributes bugs against those contracts to Hive.
- Stage 8 teaches the YARN boundary. Stage 11 references it when discussing compat across Hadoop versions.
- Stage 9 teaches deterministic testing. Stage 10 uses it as the baseline for benchmark stability.
- Stage 10 teaches measurement. Stage 11 uses measurement as evidence for compat decisions.
- Stage 11 teaches the audience/stability matrix. Stage 12 uses it when triaging blockers.
Skipping a stage means skipping a vocabulary. Reviewers will notice.
Now turn the page to Stage 1.