Capstone Project
The Capstone is the bridge from "I have read the Tez codebase" to "I have shipped a non-trivial fix that an Apache Tez committer merged into master." Everything in Levels 1–7 was preparation. This is the work.
You will pick one real, open Apache Tez JIRA, reproduce it against a current build, trace the failure through the codebase, identify the root cause, write a minimum-diff patch with deterministic tests, get it through precommit (Yetus / GitHub Actions), respond to review comments, and land the change. Then you write it up so the next person can learn from your investigation.
This chapter is the table of contents. The ten step-chapters that follow are the work itself.
Prerequisites
Do not start the Capstone until you can answer "yes" to every one of these:
- Levels 1–7 complete. You can read `DAGImpl`, `VertexImpl`, `TaskImpl`, `TaskAttemptImpl`, `AsyncDispatcher`, the shuffle path (`ShuffleManager`, `Fetcher`, `MergeManager`), and at least one `VertexManagerPlugin` (`ShuffleVertexManager` or `RootInputVertexManager`) without a guide open.
- You have built Tez from source. `mvn clean install -DskipTests` succeeds on your machine, and `mvn test -pl tez-dag` finishes (some flakes are normal; see Stage 9 of the issue roadmap).
- You have run `MiniTezCluster` locally. `mvn test -pl tez-tests -Dtest=TestOrderedWordCount` goes green.
- You have a working JIRA + Apache ID (or a GitHub account ready to PR).
- You have read the Tez contribution guide: https://tez.apache.org/contribution_guide.html and https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute.
If any of these is "no," stop. Go back. The Capstone is unforgiving of partial preparation — you will spend three weeks confused instead of three weeks shipping.
The 10-Step Flow
flowchart TD
A[Step 1: Issue Selection] --> B[Step 2: Reproduction]
B --> C[Step 3: Execution Path Analysis]
C --> D[Step 4: Root Cause Identification]
D --> E[Step 5: Implementation]
E --> F[Step 6: Testing]
F --> G[Step 7: Validation]
G --> H[Step 8: Patch / PR]
H --> I[Step 9: JIRA + Docs]
I --> J[Step 10: Engineering Write-Up]
G -.fail.-> D
F -.fail.-> E
H -.review.-> E
The dotted arrows are the loops you will actually run. Nobody gets root cause right on the first hypothesis. Nobody passes precommit on the first push. Plan for two or three iterations through Steps 4–8 before you land.
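The flow and its dotted back-edges can be modelled as a small lookup table, which makes it easy to see how many step visits a realistic run involves. This is only an illustrative sketch: the names `FORWARD`, `BACK`, `next_step`, and `walk` are hypothetical, and repeating a step that has no dotted arrow is an assumption of the sketch, not something the flowchart states.

```python
# Forward edges and dotted back-edges copied from the flowchart above.
FORWARD = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10}
BACK = {
    6: 5,  # tests fail -> back to Implementation
    7: 4,  # validation fails -> back to Root Cause Identification
    8: 5,  # review feedback -> back to Implementation
}


def next_step(step, ok):
    """Return the step to work on next, or None once Step 10 is done."""
    if ok:
        return FORWARD.get(step)
    # Assumption: a step with no dotted arrow is simply repeated.
    return BACK.get(step, step)


def walk(outcomes):
    """Replay a sequence of pass/fail outcomes, starting from Step 1."""
    step, visited = 1, [1]
    for ok in outcomes:
        step = next_step(step, ok)
        if step is None:
            break
        visited.append(step)
    return visited
```

For example, `walk([True] * 6 + [False])` passes Steps 1 through 6, fails validation in Step 7, and ends back at Step 4.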
Deliverables
By the time you mark the Capstone done, every one of these artifacts exists:
| # | Artifact | Lives in |
|---|---|---|
| 1 | Failing reproducer test (a JUnit test that fails on master without your patch and passes with it) | tez-tests/ or a module-local src/test/java/... |
| 2 | Root-cause document (200–500 words, with file:line citations) | capstone-work/root-cause.md in your fork |
| 3 | Minimum-diff patch | A branch on your fork of apache/tez |
| 4 | Unit tests using DrainDispatcher / mock dispatcher (if state-machine related) | The relevant src/test/java |
| 5 | Integration test using MiniTezCluster (if end-to-end behavior changed) | tez-tests/src/test/java/org/apache/tez/test/ |
| 6 | Validation report (output of mvn test -pl <module>, checkstyle, spotbugs, RAT) | capstone-work/validation.md |
| 7 | GitHub PR against apache/tez:master (or .patch file attached to JIRA) | https://github.com/apache/tez/pulls |
| 8 | JIRA updated: status = "Patch Available," PR linked, release-notes filled if user-visible | https://issues.apache.org/jira/browse/TEZ-NNNN |
| 9 | Engineering write-up (500–1000 words: problem, investigation, design, alternatives, lessons) | Personal blog, Apache wiki page, or dev@ summary |
Every one. No exceptions. The write-up is not optional — it is how the community (and your future self) learns from your investigation.
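The word-count limits in the table are easy to check mechanically before you call the Capstone done. The sketch below assumes all three documents sit in one `capstone-work/` directory under the names `root-cause.md`, `validation.md`, and `writeup.md`; the function name `check_deliverables` and the whitespace-based word count are illustrative choices, not part of any Tez tooling.

```python
from pathlib import Path

# Word-count ranges from the deliverables table.
# validation.md has no stated range, so only its existence is checked.
EXPECTED = {
    "root-cause.md": (200, 500),
    "validation.md": None,
    "writeup.md": (500, 1000),
}


def check_deliverables(work_dir):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, limits in EXPECTED.items():
        path = Path(work_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        if limits is None:
            continue
        # Crude word count: split on whitespace (an assumption of this sketch).
        words = len(path.read_text(encoding="utf-8").split())
        low, high = limits
        if not low <= words <= high:
            problems.append(f"{name}: {words} words, expected {low}-{high}")
    return problems
```

Calling `check_deliverables("capstone-work")` from the root of your fork returns an empty list only when all three files exist and the two bounded documents are within range.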
100-Point Rubric Summary
The full rubric lives in evaluation-rubric.md. Headline:
| Area | Weight |
|---|---|
| Problem articulation (symptom vs. root cause separation, conditions) | 20 |
| Execution-path mastery (file:line citations, diagram, accuracy) | 20 |
| Implementation quality (minimum diff, conventions, no scope creep) | 20 |
| Testing (unit + integration, deterministic, coverage) | 15 |
| Review responsiveness (addresses comments, iteration cadence) | 10 |
| Documentation (JIRA, code comments, write-up) | 10 |
| Community interaction (mailing-list etiquette, handoff hygiene) | 5 |
Tier thresholds:
- 80+ — credible Tez contributor. You can sustain a steady patch flow.
- 90+ — committer-ready. You are doing work a committer would do without hand-holding.
- 95+ — PMC-track. You are leading work others want to follow.
You will self-grade in Step 10. Be honest. Inflated self-grades are visible from orbit when a committer reads your write-up.
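For the self-grade, the arithmetic can be sketched as below. The weights and thresholds are copied from the tables above; the dictionary keys and the function names `total` and `tier` are hypothetical, and the full rubric in evaluation-rubric.md remains the authority on how points are awarded within each area.

```python
# Maximum points per area, copied from the rubric summary table.
WEIGHTS = {
    "problem_articulation": 20,
    "execution_path": 20,
    "implementation": 20,
    "testing": 15,
    "review_responsiveness": 10,
    "documentation": 10,
    "community": 5,
}

# Tier thresholds, highest first.
TIERS = [
    (95, "PMC-track"),
    (90, "committer-ready"),
    (80, "credible Tez contributor"),
]


def total(scores):
    """Sum per-area scores, rejecting values outside 0..weight."""
    for area, points in scores.items():
        if not 0 <= points <= WEIGHTS[area]:
            raise ValueError(f"{area}: {points} is outside 0..{WEIGHTS[area]}")
    # Areas missing from `scores` count as zero.
    return sum(scores.get(area, 0) for area in WEIGHTS)


def tier(points):
    """Map a total score to the tier label it reaches."""
    for threshold, label in TIERS:
        if points >= threshold:
            return label
    return "below the 80-point threshold"
```

A self-grade of 18, 17, 18, 12, 8, 8, and 4 across the seven areas totals 85, which `tier` maps to "credible Tez contributor".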
Timeline
The Capstone is a 4–6 week effort if you have one focused evening per weekday plus weekend mornings. Less than that and you risk losing context between sessions (which is far more expensive than people expect for state-machine code).
| Week | Steps | Hours |
|---|---|---|
| 1 | 1–2: Pick an issue, build a deterministic reproducer | 10–15 |
| 2 | 3–4: Trace execution, identify root cause | 12–18 |
| 3 | 5–6: Implement fix, write unit + integration tests | 12–18 |
| 4 | 7–8: Validate, prepare patch / PR, push | 8–12 |
| 5 | 8–9: Review iteration (two or three rounds is normal) | 6–10 |
| 6 | 10: Write-up, JIRA cleanup, retrospective | 4–6 |
If you blow past six weeks, that is a signal — not a failure. Either the issue is
larger than it looked (in which case, pause and renegotiate scope in the JIRA), or
you are stuck on a specific step (in which case, ask on dev@tez.apache.org).
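Summing the Hours column gives the total budget the table implies. A small sketch, with the ranges copied from the table and a hypothetical helper name:

```python
# (low, high) hours per week, copied from the timeline table.
WEEKLY_HOURS = [(10, 15), (12, 18), (12, 18), (8, 12), (6, 10), (4, 6)]


def total_hours(weeks):
    """Return the (low, high) total across all weeks."""
    low = sum(lo for lo, _ in weeks)
    high = sum(hi for _, hi in weeks)
    return low, high


print(total_hours(WEEKLY_HOURS))  # (52, 79)
```

So the six-week plan amounts to roughly 52 to 79 hours of focused work.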
Success Indicators
You will know it is working when:
- A committer comments "+1" or "LGTM, will commit shortly" on your PR.
- Your fix appears in `git log apache/master` with `(cherry picked from commit ...)` landing on the next release branch.
- The JIRA you claimed flips to "Resolved / Fixed in X.Y.Z" with your name on it.
- Your write-up gets traffic: search-engine hits, a comment from another contributor, a question on `user@`.
- The next time you pick a JIRA, you reach root cause in days, not weeks.
You will know it is failing when:
- You are still editing files in Step 5 with no failing test in hand from Step 2.
- Your PR description says "I think this might fix it."
- You have not run `mvn test -pl tez-dag` end-to-end in over a week.
- You are arguing in PR comments instead of changing code or asking questions.
If you spot a failure signal, do not push through. Stop, reread the relevant step chapter, and reset.
How to Use This Chapter
Read all ten step-chapters once, end-to-end, before you start Step 1. You need the shape of the whole journey in your head — Step 4 (root cause) makes choices that Step 6 (testing) depends on; Step 8 (patch) assumes you have artifacts from Steps 2 and 7. Skim now, deep-read each as you arrive at it.
Then go to Step 1: Issue Selection. Pick the issue. The clock starts when you comment "Working on this" on the JIRA.
Validation / Self-check
Before starting Step 1, confirm:
- You can produce, from memory, the file path of `DAGAppMaster`, `DAGImpl`, `VertexImpl`, `TaskImpl`, and `AsyncDispatcher`.
- `mvn clean install -DskipTests` completes against your local `~/tez-src/` clone.
- `mvn test -pl tez-tests -Dtest=TestOrderedWordCount` passes.
- You have a `capstone-work/` directory in your fork ready for the `root-cause.md`, `validation.md`, and `writeup.md` deliverables.
- You have skimmed every step-chapter once.
- You have set aside 4–6 calendar weeks with a realistic time budget.
- You have subscribed to `dev@tez.apache.org` (send `subscribe` to `dev-subscribe@tez.apache.org`) and `issues@tez.apache.org`.
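The two Maven checks in this list can be scripted so you rerun them the same way each time. The sketch below is one possible wrapper, not part of Tez: `PREFLIGHT` and `run_preflight` are hypothetical names, the commands are the ones quoted above, and the `runner` parameter exists only so the logic can be exercised without invoking Maven.

```python
import subprocess

# Commands copied from the self-check list; run them from the Tez source root.
PREFLIGHT = [
    ["mvn", "clean", "install", "-DskipTests"],
    ["mvn", "test", "-pl", "tez-tests", "-Dtest=TestOrderedWordCount"],
]


def run_preflight(repo_dir, runner=subprocess.run):
    """Run each command in repo_dir; return the first failing command, or None."""
    for cmd in PREFLIGHT:
        result = runner(cmd, cwd=repo_dir)
        if result.returncode != 0:
            # Stop at the first failure so its output is the last thing on screen.
            return cmd
    return None
```

Calling `run_preflight(os.path.expanduser("~/tez-src"))` (with `import os`) runs both commands against the local clone and returns `None` when both exit with status 0.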