Capstone Project

The Capstone is the bridge from "I have read the Tez codebase" to "I have shipped a non-trivial fix that an Apache Tez committer merged into master." Everything in Levels 1–7 was preparation. This is the work.

You will pick one real, open Apache Tez JIRA, reproduce it against a current build, trace the failure through the codebase, identify the root cause, write a minimum-diff patch with deterministic tests, get it through precommit (Yetus / GitHub Actions), respond to review comments, and land the change. Then you write it up so the next person can learn from your investigation.

This chapter is the table of contents. The ten step-chapters that follow are the work itself.


Prerequisites

Do not start the Capstone until you can answer "yes" to every one of these:

  • Levels 1–7 complete. You can read DAGImpl, VertexImpl, TaskImpl, TaskAttemptImpl, AsyncDispatcher, the shuffle path (ShuffleManager, Fetcher, MergeManager), and at least one VertexManagerPlugin (ShuffleVertexManager or RootInputVertexManager) without a guide open.
  • You have built Tez from source. mvn clean install -DskipTests succeeds on your machine, and mvn test -pl tez-dag finishes (some flakes are normal — see Stage 9 of the issue roadmap).
  • You have run MiniTezCluster locally. mvn test -pl tez-tests -Dtest=TestOrderedWordCount goes green.
  • You have a working JIRA + Apache ID (or a GitHub account ready to open a PR).
  • You have read the Tez contribution guide: https://tez.apache.org/contribution_guide.html and https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute.

If any of these is "no," stop. Go back. The Capstone is unforgiving of partial preparation — you will spend three weeks confused instead of three weeks shipping.


The 10-Step Flow

flowchart TD
    A[Step 1: Issue Selection] --> B[Step 2: Reproduction]
    B --> C[Step 3: Execution Path Analysis]
    C --> D[Step 4: Root Cause Identification]
    D --> E[Step 5: Implementation]
    E --> F[Step 6: Testing]
    F --> G[Step 7: Validation]
    G --> H[Step 8: Patch / PR]
    H --> I[Step 9: JIRA + Docs]
    I --> J[Step 10: Engineering Write-Up]
    G -.fail.-> D
    F -.fail.-> E
    H -.review.-> E

The dotted arrows are the loops you will actually run. Nobody gets root cause right on the first hypothesis. Nobody passes precommit on the first push. Plan for two or three iterations through Steps 4–8 before you land.


Deliverables

By the time you mark the Capstone done, every one of these artifacts exists:

| # | Artifact | Lives in |
|---|----------|----------|
| 1 | Failing reproducer test (a JUnit test that fails on master without your patch and passes with it) | tez-tests/ or a module-local src/test/java/... |
| 2 | Root-cause document (200–500 words, with file:line citations) | capstone-work/root-cause.md in your fork |
| 3 | Minimum-diff patch | A branch on your fork of apache/tez |
| 4 | Unit tests using DrainDispatcher / mock dispatcher (if state-machine related) | The relevant src/test/java |
| 5 | Integration test using MiniTezCluster (if end-to-end behavior changed) | tez-tests/src/test/java/org/apache/tez/test/ |
| 6 | Validation report (output of mvn test -pl <module>, checkstyle, spotbugs, RAT) | capstone-work/validation.md |
| 7 | GitHub PR against apache/tez:master (or .patch file attached to JIRA) | https://github.com/apache/tez/pulls |
| 8 | JIRA updated: status = "Patch Available," PR linked, release-notes filled if user-visible | https://issues.apache.org/jira/browse/TEZ-NNNN |
| 9 | Engineering write-up (500–1000 words: problem, investigation, design, alternatives, lessons) | Personal blog, Apache wiki page, or dev@ summary |

Every one. No exceptions. The write-up is not optional — it is how the community (and your future self) learns from your investigation.
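Deliverables 1 and 4 both depend on tests that behave the same on every run. One common way to get that for event-driven code is to queue events and process them only when the test explicitly drains the queue. The sketch below is a self-contained toy illustrating that idea. It is not Tez's DrainDispatcher and makes no claim about its API; ToyDispatcher, ToyTask, and the START/DONE events are hypothetical names invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

class DrainDispatcherSketch {
    enum State { NEW, RUNNING, SUCCEEDED }

    // Hypothetical toy dispatcher: events sit in a queue until drain() is
    // called, so the test controls exactly when handlers run.
    static class ToyDispatcher {
        private final Queue<String> queue = new ArrayDeque<>();
        private final Consumer<String> handler;

        ToyDispatcher(Consumer<String> handler) { this.handler = handler; }

        void dispatch(String event) { queue.add(event); }

        // Process every queued event on the calling thread, in FIFO order.
        void drain() {
            String event;
            while ((event = queue.poll()) != null) {
                handler.accept(event);
            }
        }
    }

    // Hypothetical two-transition state machine standing in for an *Impl class.
    static class ToyTask {
        private State state = State.NEW;

        void handle(String event) {
            if (state == State.NEW && event.equals("START")) {
                state = State.RUNNING;
            } else if (state == State.RUNNING && event.equals("DONE")) {
                state = State.SUCCEEDED;
            }
            // Any other (state, event) pair is ignored in this toy.
        }

        State getState() { return state; }
    }

    // Dispatch the events, drain, then report the final state.
    static State run(String... events) {
        ToyTask task = new ToyTask();
        ToyDispatcher dispatcher = new ToyDispatcher(task::handle);
        for (String e : events) {
            dispatcher.dispatch(e);
        }
        dispatcher.drain();
        return task.getState();
    }

    public static void main(String[] args) {
        System.out.println(run("START", "DONE"));  // prints SUCCEEDED
        System.out.println(run("DONE", "START"));  // prints RUNNING
    }
}
```

The point of the pattern is that the assertion runs only after the queue is empty, so there is no sleep and no race between the test thread and a dispatcher thread.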


100-Point Rubric Summary

The full rubric lives in evaluation-rubric.md. Headline:

| Area | Weight |
|------|--------|
| Problem articulation (symptom vs. root cause separation, conditions) | 20 |
| Execution-path mastery (file:line citations, diagram, accuracy) | 20 |
| Implementation quality (minimum diff, conventions, no scope creep) | 20 |
| Testing (unit + integration, deterministic, coverage) | 15 |
| Review responsiveness (addresses comments, iteration cadence) | 10 |
| Documentation (JIRA, code comments, write-up) | 10 |
| Community interaction (mailing-list etiquette, handoff hygiene) | 5 |

Tier thresholds:

  • 80+ — credible Tez contributor. You can sustain a steady patch flow.
  • 90+ — committer-ready. You are doing work a committer would do without hand-holding.
  • 95+ — PMC-track. You are leading work others want to follow.

You will self-grade in Step 10. Be honest. Inflated self-grades are visible from orbit when a committer reads your write-up.
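For the self-grade arithmetic, a minimal sketch follows. It assumes the score is a plain sum of per-area points capped at each area's weight, which the headline table suggests but evaluation-rubric.md may refine. The class name, the example points, and the "below threshold" label are hypothetical and not part of the rubric.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RubricScore {
    // Weights copied from the headline table; they sum to 100.
    static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("Problem articulation", 20);
        WEIGHTS.put("Execution-path mastery", 20);
        WEIGHTS.put("Implementation quality", 20);
        WEIGHTS.put("Testing", 15);
        WEIGHTS.put("Review responsiveness", 10);
        WEIGHTS.put("Documentation", 10);
        WEIGHTS.put("Community interaction", 5);
    }

    // Sum the earned points, clamping each area to the range [0, weight].
    static int total(Map<String, Integer> earned) {
        int sum = 0;
        for (Map.Entry<String, Integer> w : WEIGHTS.entrySet()) {
            int points = earned.getOrDefault(w.getKey(), 0);
            sum += Math.max(0, Math.min(points, w.getValue()));
        }
        return sum;
    }

    // Tier thresholds from this section; the last label is an assumption.
    static String tier(int score) {
        if (score >= 95) return "PMC-track";
        if (score >= 90) return "committer-ready";
        if (score >= 80) return "credible contributor";
        return "below threshold";
    }

    public static void main(String[] args) {
        // Hypothetical self-grade, for illustration only.
        Map<String, Integer> earned = new LinkedHashMap<>();
        earned.put("Problem articulation", 16);
        earned.put("Execution-path mastery", 17);
        earned.put("Implementation quality", 18);
        earned.put("Testing", 12);
        earned.put("Review responsiveness", 8);
        earned.put("Documentation", 8);
        earned.put("Community interaction", 4);

        int score = total(earned);
        System.out.println(score + " -> " + tier(score));  // prints 83 -> credible contributor
    }
}
```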


Timeline

The Capstone is a 4–6 week effort if you have one focused evening per weekday plus weekend mornings. Less than that and you risk losing context between sessions (which is far more expensive than people expect for state-machine code).

| Week | Steps | Hours |
|------|-------|-------|
| 1 | 1–2: Pick an issue, build a deterministic reproducer | 10–15 |
| 2 | 3–4: Trace execution, identify root cause | 12–18 |
| 3 | 5–6: Implement fix, write unit + integration tests | 12–18 |
| 4 | 7–8: Validate, prepare patch / PR, push | 8–12 |
| 5 | 8–9: Review iteration (two or three rounds is normal) | 6–10 |
| 6 | 10: Write-up, JIRA cleanup, retrospective | 4–6 |

If you blow past six weeks, that is a signal — not a failure. Either the issue is larger than it looked (in which case, pause and renegotiate scope in the JIRA), or you are stuck on a specific step (in which case, ask on dev@tez.apache.org).
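To turn the weekly ranges into a single budget you can check your calendar against, add up the low and high ends of each week. The sketch below does only that arithmetic; the class and method names are hypothetical.

```java
class TimelineBudget {
    // {low, high} hours for weeks 1 through 6, copied from the timeline table.
    static final int[][] WEEKLY_HOURS = {
        {10, 15}, {12, 18}, {12, 18}, {8, 12}, {6, 10}, {4, 6}
    };

    // Returns {totalLow, totalHigh} across all weeks.
    static int[] totalRange() {
        int low = 0;
        int high = 0;
        for (int[] week : WEEKLY_HOURS) {
            low += week[0];
            high += week[1];
        }
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        int[] range = totalRange();
        System.out.println(range[0] + "-" + range[1] + " hours");  // prints 52-79 hours
    }
}
```

If your realistic availability over six weeks is well under the low end of that range, treat it the same way as blowing past six weeks: renegotiate scope before you start.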


Success Indicators

You will know it is working when:

  1. A committer comments "+1" or "LGTM, will commit shortly" on your PR.
  2. Your fix appears in git log apache/master, and a cherry-pick (cherry picked from commit ...) lands on the next release branch.
  3. The JIRA you claimed flips to "Resolved / Fixed in X.Y.Z" with your name on it.
  4. Your write-up gets traffic — search-engine hits, a comment from another contributor, a question on user@.
  5. The next time you pick a JIRA, you reach root cause in days, not weeks.

You will know it is failing when:

  1. You are still editing files in Step 5 with no failing test in hand from Step 2.
  2. Your PR description says "I think this might fix it."
  3. You have not run mvn test -pl tez-dag end-to-end in over a week.
  4. You are arguing in PR comments instead of changing code or asking questions.

If you spot a failure signal, do not push through. Stop, reread the relevant step chapter, and reset.


How to Use This Chapter

Read all ten step-chapters once, end-to-end, before you start Step 1. You need the shape of the whole journey in your head — Step 4 (root cause) makes choices that Step 6 (testing) depends on; Step 8 (patch) assumes you have artifacts from Steps 2 and 7. Skim now, deep-read each as you arrive at it.

Then go to Step 1: Issue Selection. Pick the issue. The clock starts when you comment "Working on this" on the JIRA.


Validation / Self-check

Before starting Step 1, confirm:

  1. You can produce, from memory, the file paths of DAGAppMaster, DAGImpl, VertexImpl, TaskImpl, and AsyncDispatcher.
  2. mvn clean install -DskipTests completes against your local ~/tez-src/ clone.
  3. mvn test -pl tez-tests -Dtest=TestOrderedWordCount passes.
  4. You have a capstone-work/ directory in your fork ready for the root-cause.md, validation.md, and writeup.md deliverables.
  5. You have skimmed every step-chapter once.
  6. You have set aside 4–6 calendar weeks with realistic time budget.
  7. You have subscribed to dev@tez.apache.org (send an empty email to dev-subscribe@tez.apache.org) and issues@tez.apache.org.
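
As the Capstone progresses, a small check can report which of the three files named in item 4 are still missing from capstone-work/. The sketch below assumes the files sit directly under capstone-work/ at the root of your fork. The class name and the command-line argument are hypothetical. Before Step 1 it is expected to list all three.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CapstoneWorkCheck {
    // File names taken from self-check item 4.
    static final String[] EXPECTED = {"root-cause.md", "validation.md", "writeup.md"};

    // Returns the expected files that do not yet exist under <forkRoot>/capstone-work.
    static List<String> missing(Path forkRoot) {
        Path dir = forkRoot.resolve("capstone-work");
        List<String> gaps = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                gaps.add(name);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the root of your fork; defaults to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> gaps = missing(root);
        System.out.println(gaps.isEmpty() ? "capstone-work/ is complete" : "missing: " + gaps);
    }
}
```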