Step 8: Patch Preparation
You have working code, working tests, and a green validation run. Now you
package the change so it can land. Modern Tez does this via GitHub PR; older
Tez (and still some committers' preference) is .patch files attached to
the JIRA. You should know how to do both.
This step is the easiest to skip past and the easiest to lose a week on if you do it sloppily. Treat the PR title, description, and commit message as seriously as the code.
Modern Tez: GitHub Pull Request
Apache Tez has been on GitHub Issues + PRs (mirrored to JIRA) for several years. The flow:
# 1. Make sure your branch is up to date with master
cd ~/tez-src
git remote -v
# origin git@github.com:<you>/tez.git
# apache https://github.com/apache/tez.git
git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# Resolve any conflicts. Rebuild and re-run your tests after rebase.
# 2. Squash to a single clean commit (or 2-3 if logically separable)
git rebase -i apache/master
# In the editor: pick the first commit, squash the rest. Edit the combined
# commit message to one final version.
# 3. Push to your fork
git push --force-with-lease origin tez-NNNN-<slug>
# 4. Open a PR via https://github.com/apache/tez
# Title: TEZ-NNNN: <short summary, present tense>
# Base: apache/tez:master
--force-with-lease instead of --force: protects against overwriting a
collaborator's commit if someone pushed to your branch between your fetch
and your push.
Commit Message Template
TEZ-NNNN: Fix VertexImpl recovery branch short-circuiting non-recovery path
Reverts the unconditional short-circuit added in TEZ-2877 so that
V_TASK_COMPLETED events on the final task are processed by the standard
transition when no recovery is in progress. The original short-circuit
assumed any non-null recoveryData implies an active replay; this assumption
broke when recoveryData is populated speculatively at vertex initialization
even though no replay will occur.
The fix gates the recovery branch on the new isReplayingRecovery() predicate.
The previous behavior is preserved for actual recovery scenarios.
Tests:
- New unit test TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery
- New integration test in tez-tests verifying end-to-end DAG success
with recoveryData populated.
- Existing TestOrderedWordCount and full tez-dag suite pass.
Note the shape:
- Title line:
TEZ-NNNN: <verb-phrase, present tense, < 72 chars>. - Blank line.
- Body paragraphs: what the change does, why, what assumption broke.
- Tests: explicit list of tests added or affected.
No "should fix" or "I think." Past tense for what you did, present tense for what the code does after the change.
PR Title
TEZ-NNNN: <Short imperative summary>
TEZ-NNNNprefix is mandatory. The bot uses it to link to JIRA.- Imperative mood: "Fix race", "Add config", "Avoid NPE". Not "Fixed race", not "Fixing race".
- < 72 characters total including the prefix.
Bad: Fix bug, Updates to VertexImpl, My fix for TEZ-NNNN.
Good: TEZ-4567: Honor isReplayingRecovery in VertexImpl completion path.
PR Description Template
## JIRA
https://issues.apache.org/jira/browse/TEZ-NNNN
## Problem
<2-4 sentences. Symptom + trigger conditions. Cite the root-cause doc.>
When the last `V_TASK_COMPLETED` event arrives for a vertex with non-null
`recoveryData` outside an actual recovery replay, the event is unconditionally
re-routed through `handleRecovery()` rather than processed by the standard
transition. As a result, `completedTaskCount` is not incremented and the vertex
fails to transition to SUCCEEDED. This affects DAGs whose AM populates
recovery data speculatively at vertex initialization.
## Root cause
See `capstone-work/root-cause.md` (or paste inline if short).
Introduced in TEZ-2877 (commit a1b2c3d4).
## Fix
Gate the recovery short-circuit on a new `isReplayingRecovery()` predicate
that returns true only during active replay. Minimum-diff (one production
line + one new private method).
## Testing
- **New:** `TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery` —
unit test using `DrainDispatcher` that reproduces the failure on master
and passes with this fix. Plus a negative-control test.
- **New:** `TestTezNNNNFixIntegration` — `MiniTezCluster` end-to-end test
that runs a 20-task vertex with speculatively-populated recoveryData and
asserts DAG SUCCEEDED.
- **Existing:** Full `tez-dag` suite (1342 tests) passes. `TestOrderedWordCount`
passes. Validation report in commit message footer.
## Backward compatibility
None affected. The fix changes behavior only for the broken case (no replay
in progress). Recovery scenarios are unchanged.
## Configuration
No new keys.
Adjust sections to your fix. The structure stays the same.
GitHub Actions / Yetus Precommit
When you open the PR, GitHub Actions runs the precommit checks. The full
config lives in .github/workflows/ — read it:
ls ~/tez-src/.github/workflows/
cat ~/tez-src/.github/workflows/build.yml
Common checks (subject to change as the workflow evolves):
| Check | What it runs | Failure means |
|---|---|---|
| Compile | mvn install -DskipTests | Build broken on some module |
| Tests | mvn test for each module | Some test failed (yours or flake) |
| Checkstyle | mvn checkstyle:check | Style violation in changed file |
| Javadoc | mvn javadoc:javadoc | Broken Javadoc reference or missing tag |
| RAT | mvn apache-rat:check | New file missing ASL header |
| License | License snippet check | Same as RAT, or LICENSE/NOTICE drift |
| SpotBugs | mvn spotbugs:check | New static-analysis warning |
Failures appear on the PR as red ✖ marks. Click into the failing job to read the log. Common first-PR failures:
- Javadoc broken: You referenced
{@link Foo#bar}andbardoesn't exist. Either fix the link or remove it. - Checkstyle: A line exceeded 120 chars or an unused import slipped in.
- License: New file missing the header. Add it.
- Test: A flake. Re-run the workflow ("Re-run all failed jobs" in the Actions tab). If it goes green on retry, leave a comment: "Re-ran job — flake, see TEZ-XYZ." If it fails again, your fix probably broke it.
Push fixes as new commits on the same branch. The PR auto-updates. After review approval, you'll squash on merge.
Old-Style: .patch Files on JIRA
Some committers still review .patch attachments. Know the convention.
Generate
git format-patch apache/master -o /tmp/
# Produces /tmp/0001-TEZ-NNNN-Fix-...patch
Or, for one combined diff:
git diff apache/master..HEAD > /tmp/TEZ-NNNN.01.patch
Naming convention
TEZ-<NNNN>.<iteration>.patch. So your first attachment is TEZ-4567.01.patch,
second iteration after review feedback is TEZ-4567.02.patch. Some committers
use TEZ-4567.001.patch (three-digit). Match whatever pattern the most
recent committer used on that issue.
For a branch-specific patch (e.g. against the branch-0.10):
TEZ-4567.branch-0.10.01.patch.
Attach
In JIRA: "Attach files" → upload. Then "More" → "Patch Available" to flip the state. Cancel patch (revert to "In Progress") if you find a problem before review starts.
The JIRA workflow is covered fully in Step 9. The patch-file mechanics live here.
Rebasing on Master Without Losing Review Comments
GitHub's PR view loses inline comment threads when you force-push a rebase that changes the SHAs reviewers commented on. To minimize the damage:
- Don't rebase mid-review unless you have to. Merge-commits from
apache/masterinto your branch are usually acceptable during active review; squash at the end. - When you do rebase, leave a comment: "Force-pushed to rebase on
master(was<old SHA>, now<new SHA>). All review threads should still be visible against the latest commit." - Squash only at the very end, after approval, just before merge.
- If you really break the comment threads, post a summary comment listing "what was at line X became line Y in the new push." Reviewers appreciate it.
To rebase:
git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# If conflicts: edit, git add, git rebase --continue
mvn install -DskipTests -pl <module> -am -q
mvn test -pl <module> -Dtest=<YourTests> -q
git push --force-with-lease origin tez-NNNN-<slug>
Co-Author and Sign-Off
If a committer or another contributor materially helped (suggested the fix direction, found the root cause), credit them:
TEZ-NNNN: <summary>
<body>
Co-authored-by: Alice <alice@example.org>
Tez does not require a Signed-off-by line (it is not a DCO project — it requires an Apache CLA), but committers appreciate when you note influences in the commit message.
What Reviewers Look For First
In rough order:
- PR title and JIRA link — wrong format, instant correction request.
- Description quality — vague description, "please clarify" comment.
- Diff size — > ~200 lines for a "bug fix" gets scrutiny on scope creep.
- Tests present — no tests, immediate request.
- Tests fail on master, pass with fix — confirms the test is actually testing the fix, not just a happy path.
- Production diff is minimum to fix the bug — every extra change has to justify itself.
- Style and convention compliance — checkstyle and tests must be green.
- API hygiene — no public methods added/removed without discussion.
- Backward compatibility — does the fix change observable behavior for non-buggy cases? If yes, was it discussed?
Optimize for the first seven before you push. The last two are usually discussed in JIRA comments before the PR opens.
Validation / Self-check
Before advancing to Step 9:
- PR exists on
apache/tezwith the formatTEZ-NNNN: <summary>. - PR description follows the template; cites the root-cause document.
- Commit message follows the template (title, body, tests footer).
- GitHub Actions precommit is green (every check), or every red has a documented and accepted explanation.
- Branch is rebased on a recent
apache/master(within last 24-48h ideally). - PR was opened with the URL pasted into the JIRA as a comment.
- You can articulate in one paragraph why every line of your diff is necessary, if a reviewer asks.