Issue Roadmap — Twelve Stages from Trivial to Release-Blocking

This roadmap is a deliberately ordered ladder of Apache Tez contributions. Each rung trains a specific skill, depends on the rung below it, and ends at a concrete review-ready patch. Skipping rungs is the most common reason contributors stall: a shuffle bug fix without state-machine fluency turns into a six-month patch thread, and a release-blocker triage call without compatibility reflexes turns into a reverted commit.

The stages are calibrated to the Tez 0.10.x codebase on disk at ~/tez-src. JIRA queries assume https://issues.apache.org/jira/projects/TEZ. Patch discussion happens on dev@tez.apache.org. Where stages reference real modules, they use the exact paths you will see under ~/tez-src:

tez-api/                       public interfaces, descriptors, configuration keys
tez-common/                    IDs, util, log helpers, ATS/timeline shared code
tez-dag/                       AppMaster: DAGImpl, VertexImpl, TaskImpl, schedulers
tez-runtime-internals/         TezTaskRunner, LogicalIOProcessorRuntimeTask
tez-runtime-library/           ShuffleManager, Fetcher, IFile, MergeManager
tez-mapreduce/                 MR-shim inputs/outputs/processors
tez-tests/                     MiniTezCluster integration tests
tez-examples/                  OrderedWordCount, SimpleSessionExample, etc.
tez-plugins/tez-yarn-timeline-history/   ATS history events
tez-plugins/tez-aux-services/  NM-side ShuffleHandler hook
docs/                          User-facing site under src/site/markdown

The Twelve Stages

| # | Stage | Target skill | Prereq | Typical patch size | Review depth |
|---|---|---|---|---|---|
| 1 | Docs & tests | Reading the codebase, JIRA workflow, RAT/checkstyle | none | 1–30 lines | 1 reviewer |
| 2 | Build & logging hygiene | pom dep bands, slf4j idioms, LOG.isDebugEnabled() | 1 | 5–80 lines | 1 reviewer |
| 3 | Error message context | Exception chaining, ID propagation, tez-dag CONTEXT rule | 2 | 20–200 lines | 1–2 reviewers |
| 4 | State machine transitions | StateMachineFactory, InvalidStateTransitonException | 3 | 30–250 lines + test | 2 reviewers, dev@ ping |
| 5 | Scheduler bugs | TaskSchedulerManager, YarnTaskSchedulerService, AMRMClient | 4 | 50–500 lines + MiniCluster test | 2 reviewers |
| 6 | Shuffle & runtime | ShuffleManager, Fetcher, MergeManager, IFile | 5 | 80–600 lines + test | 2 reviewers |
| 7 | Hive-on-Tez compatibility | DAGPlan size, edge property contracts, session reuse | 5 or 6 | varies; often a tez-side + HIVE-side ticket | committers in both projects |
| 8 | YARN integration | AMRMToken, log aggregation, NM aux service, kerberos renewal | 5 | 50–400 lines | 2 reviewers, often YARN-side too |
| 9 | Flaky tests | DrainDispatcher, dispatcher-aware waits, port collisions | 4 | 20–150 lines per test | 1–2 reviewers; sometimes "stamped" |
| 10 | Performance regression | git bisect, async-profiler / JFR, JMH micro | 6 or 8 | 30–300 lines + bench evidence | 2 reviewers, dev@ design ping |
| 11 | Backward compatibility | @InterfaceAudience, @InterfaceStability, protobuf evolution | 4 | small code, long dev@ thread | committers + PMC |
| 12 | Release-blocking | RC voting, -1 binding, security CVE pipeline | committer | varies | PMC + release manager |

How to Use This Roadmap

Pick a stage honestly

Find your rung by the largest patch you have shipped:

  • Never landed a Tez patch: start at Stage 1.
  • Landed a docs patch but never touched Java in tez-dag: Stage 2.
  • Comfortable with tez-common Java but never read a state machine: Stage 3.
  • Read VertexImpl.stateMachineFactory once and were confused: Stage 4.
  • Read it twice and could draw the state graph: Stage 5+.
  • Already a Tez committer: jump straight to Stages 10–12 for sharpening.

Do not jump rungs to chase a "cool" bug. A locality miscount in YarnTaskSchedulerService looks self-contained and isn't — the patch will land on state-machine transitions you have never edited.

One stage per PR

Resist the urge to fix two things in one patch. Reviewers reject mixed-concern patches almost reflexively. If you find a logging issue while fixing an error message, file a follow-up JIRA and move on. The roadmap rewards small surface area.
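One rough way to check a branch before posting it is to list which top-level modules it touches. The sketch below is a hypothetical helper, not a Tez tool, and it assumes your topic branch was cut from the base branch you pass in. Touching several modules is only a hint that concerns may be mixed; a single-module patch can still mix concerns.

```shell
# Hypothetical pre-submit check (not part of Tez): list the top-level
# modules a branch touches relative to a base branch.
# Usage: touched_modules [base-branch]
touched_modules() {
  tm_base="${1:-master}"
  # base...HEAD diffs against the merge base, so commits that exist only
  # on the base branch are ignored.
  git diff --name-only "$tm_base"...HEAD | cut -d/ -f1 | sort -u
}
```

Run it from the repository root on your topic branch. If it prints both tez-dag and tez-runtime-library, for example, consider whether the second module belongs in a follow-up JIRA.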

Always start with git log and git blame

Before touching a file, find the last 5 commits that modified it:

cd ~/tez-src
git log --oneline -n 5 -- tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
git blame -L 1200,1260 tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

The blame output shows who last touched those lines, which is usually the person who cares most about that area. CC them on the JIRA.
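When a file has a long history, it can help to tally authors rather than read blame line by line. The following is a minimal sketch under that assumption. The function name recent_authors and the default of 20 commits are illustrative choices, not a Tez convention.

```shell
# Hypothetical helper (not part of Tez): rank the authors of the most recent
# commits that touched a path, so you know whom to CC on the JIRA.
# Usage: recent_authors <path> [commit-count]
recent_authors() {
  ra_path="$1"
  ra_count="${2:-20}"
  # %an prints the author name of each commit; uniq -c tallies them and
  # sort -rn puts the most frequent author first.
  git log -n "$ra_count" --format='%an' -- "$ra_path" | sort | uniq -c | sort -rn
}
```

From ~/tez-src, calling recent_authors on the VertexImpl.java path used above gives a short ranked list. Treat it as a starting point: the most frequent author may no longer be active on the project.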

Time investment per stage

These estimates are calibrated against a working contributor who has the codebase checked out, can build locally with mvn clean install -DskipTests -Phadoop28, and has filed at least one JIRA before:

| Stage | First patch | Becoming fluent (5 patches landed) |
|---|---|---|
| 1 | half a day | 1 week |
| 2 | 1 day | 2 weeks |
| 3 | 1–2 days | 1 month |
| 4 | 3–5 days | 2–3 months |
| 5 | 1–2 weeks | 4–6 months |
| 6 | 2–4 weeks | 6 months |
| 7 | weeks per attribution call | a year of cross-project work |
| 8 | 1–3 weeks | 6 months |
| 9 | 1–3 days per flake | ongoing |
| 10 | weeks (perf is bisect-bound) | committer-level skill |
| 11 | weeks (dev@ design cycle) | committer-level skill |
| 12 | PMC-level responsibility | n/a |

Success criterion per stage

Each stage is "complete" for you when:

  • Stage 1: one docs and one test patch are committed to master.
  • Stage 2: at least two logging or build patches are committed without nits.
  • Stage 3: one error-context patch is committed with no reviewer asking "which DAG?"
  • Stage 4: one transition fix is committed and has a regression test in TestVertexImpl.
  • Stage 5: one scheduler patch is committed with a MiniTezCluster repro test.
  • Stage 6: one shuffle-runtime patch is committed with a deterministic repro.
  • Stage 7: one cross-project ticket is filed with a written attribution argument.
  • Stage 8: one YARN-integration patch is committed with explicit Hadoop-version evidence.
  • Stage 9: at least three flaky tests have been de-flaked.
  • Stage 10: one perf patch is committed with before/after benchmark numbers.
  • Stage 11: one compatibility-sensitive patch is committed with explicit annotations and dev@ sign-off.
  • Stage 12: you have helped triage at least one RC vote.

When to ask on dev@

Before writing any code for Stages 4 and above, send a short note to dev@tez.apache.org:

Subject: [DISCUSS] TEZ-XXXX — proposed approach

I see <symptom> at <file>:<line>. My read is <cause>. I plan to <fix>, with
a regression test in <test>. Would appreciate any context I'm missing before
I post a patch.

Four sentences. No essay. The list will usually tell you within a day whether you are about to step on someone else's in-flight work.

When the roadmap does not apply

This roadmap is for bug fixes and small features. It is not for:

  • New runtime engines or scheduler rewrites — those are Tez Improvement Proposals (TEPs); start a dev@ thread, not a patch.
  • Hive query-engine changes that happen to surface in Tez — file on HIVE, not TEZ.
  • YARN-side fixes that Tez merely consumes — file on YARN, not TEZ.

Stage 7 teaches the attribution skill that keeps these in the right project.


What to read alongside this roadmap


What this roadmap is not

This roadmap is not a tutorial on Apache Tez itself. The deep dives in ../deep-dives/index.md cover the architecture; the labs in ../level-1/index.md onward cover the hands-on code reading. The roadmap assumes you can already build Tez from source, run the unit tests, and stand up a MiniTezCluster end-to-end. If you cannot, the prerequisite chapter is Level 1, Lab 1.1.

It is also not a generic Apache contribution guide. The Apache "How to Contribute" pages cover the cross-project mechanics (ICLA, JIRA account creation, mailing list etiquette). The roadmap assumes those are done.

Finally, it is not a roadmap for committership. Becoming a Tez committer is a separate path that the PMC manages. The roadmap teaches the skills that, applied consistently over time, make committership a reasonable outcome — but landing patches is necessary, not sufficient.

Reading order

If you read this book front-to-back, you will hit this chapter after the deep dives and before the capstone. That is the intended sequence:

  1. Read the deep dives to understand the architecture.
  2. Read this roadmap to understand the contribution ladder.
  3. Pick a rung and ship a patch.
  4. Come back to this roadmap when the patch lands, and step up a rung.
  5. After three or four rungs, attempt the capstone in ../capstone/index.md.

If you are jumping in mid-book, start at the rung that matches your current skill (see "Pick a stage honestly" above) and read the stage's companion deep dive at the same time.

A note on JQL

The JIRA queries in each stage are starting points. The Tez project's issue labelling has drifted over the years — labels like newbie and beginner are inconsistently applied. If a filter returns zero results, broaden it (remove a clause) before assuming the filter is wrong. Each stage gives at least one fallback grep-based candidate-finding method that does not depend on labels.

A second JQL tip: pin a "watched issues" filter for the components you care about. Tez has roughly a dozen components in JIRA; you do not need to watch all of them, but watching the two or three closest to your current rung is how you stay current on landed work.
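The broadening advice is easier to follow if each clause of a filter is kept separate. The sketch below assumes a hypothetical helper named tez_jql that joins clauses with AND. The clauses shown are examples built from the labels mentioned above, not the filters the individual stages prescribe.

```shell
# Hypothetical helper: join JQL clauses with AND so that broadening a filter
# is just a matter of dropping an argument.
# Usage: tez_jql <clause>...
tez_jql() {
  tj_query="project = TEZ"
  for tj_clause in "$@"; do
    tj_query="$tj_query AND $tj_clause"
  done
  printf '%s\n' "$tj_query"
}

# Narrow first attempt:
#   tez_jql 'resolution = Unresolved' 'labels in (newbie, beginner)'
# Broadened fallback if that returns nothing:
#   tez_jql 'resolution = Unresolved'
```

Paste the printed string into the JIRA advanced search box. Dropping the labels clause is usually the first broadening step, since labels are the part the text above describes as inconsistently applied.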

A note on local clone hygiene

Every stage in this roadmap assumes you have a clean checkout at ~/tez-src. "Clean" means:

  • git status shows no untracked files outside .gitignore.
  • git branch shows you on master (or a topic branch you remember creating).
  • mvn clean install -DskipTests -Phadoop28 completes in under two minutes locally.

A messy checkout produces hard-to-reproduce results: a grep that catches your own WIP, a git bisect that visits commits whose builds were already broken by an unrelated local change, a mvn test that passes locally because of a stale ~/.m2 jar.
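The first two conditions can be checked mechanically. This is a minimal sketch, assuming a hypothetical function name checkout_is_clean. It does not time the Maven build, and it also reports modified tracked files, which is slightly stricter than the untracked-files condition above.

```shell
# Hypothetical helper: succeed only if the working tree has no modified or
# untracked files. Ignored files are not listed by --porcelain by default,
# so anything covered by .gitignore does not count as dirt.
# Usage: checkout_is_clean [repo-dir]
checkout_is_clean() {
  cc_repo="${1:-$HOME/tez-src}"
  cc_dirty=$(git -C "$cc_repo" status --porcelain) || return 2
  if [ -n "$cc_dirty" ]; then
    printf 'dirty checkout in %s:\n%s\n' "$cc_repo" "$cc_dirty" >&2
    return 1
  fi
  # Report the current branch so you can confirm it is the one you expect.
  printf 'clean, on branch %s\n' "$(git -C "$cc_repo" rev-parse --abbrev-ref HEAD)"
}
```

Running it before a git bisect or a grep-based candidate search is a cheap way to rule out your own WIP as the cause of a surprising result.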

Refresh on Mondays:

cd ~/tez-src
git checkout master
git pull --ff-only
git clean -fdx
mvn -q clean install -DskipTests -Phadoop28

The git clean -fdx is aggressive — it removes everything not tracked by git, including IDE artifacts. Keep an .idea/ (or equivalent) backup elsewhere if you customise it.
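Because the command is destructive, a dry run first is cheap insurance. The sketch below wraps git's own -n (dry-run) flag; the function name preview_clean is hypothetical.

```shell
# Hypothetical helper: show what `git clean -fdx` would delete without
# deleting anything. -n is git's dry-run flag; -d includes untracked
# directories and -x includes ignored files.
# Usage: preview_clean [repo-dir]
preview_clean() {
  pc_repo="${1:-$HOME/tez-src}"
  git -C "$pc_repo" clean -ndx
}
```

Each line of output starts with "Would remove". If anything in the list surprises you, move it out of the checkout before running the real git clean -fdx.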

How the stages interlock

Each stage builds vocabulary the next stage uses without re-explaining:

  • Stage 1 teaches the patch artifact format. Every later stage assumes it.
  • Stage 2 teaches the LOG.isDebugEnabled() pattern. Stage 3 builds on it with the CONTEXT rule.
  • Stage 3 teaches you to navigate tez-dag. Stage 4 lives in tez-dag/...impl/.
  • Stage 4 teaches the state-machine DSL. Stage 5 reads the same DSL in the scheduler.
  • Stage 5 teaches MiniTezCluster. Stage 6 leans on it for every shuffle test.
  • Stage 6 teaches the runtime contracts. Stage 7 attributes bugs against those contracts to Hive.
  • Stage 8 teaches the YARN boundary. Stage 11 references it when discussing compat across Hadoop versions.
  • Stage 9 teaches deterministic testing. Stage 10 uses it as the baseline for benchmark stability.
  • Stage 10 teaches measurement. Stage 11 uses measurement as evidence for compat decisions.
  • Stage 11 teaches the audience/stability matrix. Stage 12 uses it when triaging blockers.

Skipping a stage means skipping a vocabulary. Reviewers will notice.

Now turn the page to Stage 1.