Step 8: Patch Preparation

You have working code, working tests, and a green validation run. Now you package the change so it can land. Modern Tez does this via GitHub PR; older Tez (and still some committers' preference) is .patch files attached to the JIRA. You should know how to do both.

This step is the easiest to skip past and the easiest to lose a week on if you do it sloppily. Treat the PR title, description, and commit message as seriously as the code.


Modern Tez: GitHub Pull Request

Apache Tez has been on GitHub Issues + PRs (mirrored to JIRA) for several years. The flow:

# 1. Make sure your branch is up to date with master
cd ~/tez-src
git remote -v
# origin    git@github.com:<you>/tez.git
# apache    https://github.com/apache/tez.git

git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# Resolve any conflicts. Rebuild and re-run your tests after rebase.

# 2. Squash to a single clean commit (or 2-3 if logically separable)
git rebase -i apache/master
# In the editor: pick the first commit, squash the rest. Edit the combined
# commit message to one final version.

# 3. Push to your fork
git push --force-with-lease origin tez-NNNN-<slug>

# 4. Open a PR via https://github.com/apache/tez
# Title: TEZ-NNNN: <short summary, present tense>
# Base: apache/tez:master

--force-with-lease instead of --force: protects against overwriting a collaborator's commit if someone pushed to your branch between your fetch and your push.

Commit Message Template

TEZ-NNNN: Fix VertexImpl recovery branch short-circuiting non-recovery path

Reverts the unconditional short-circuit added in TEZ-2877 so that
V_TASK_COMPLETED events on the final task are processed by the standard
transition when no recovery is in progress. The original short-circuit
assumed any non-null recoveryData implies an active replay; this assumption
broke when recoveryData is populated speculatively at vertex initialization
even though no replay will occur.

The fix gates the recovery branch on the new isReplayingRecovery() predicate.
The previous behavior is preserved for actual recovery scenarios.

Tests:
- New unit test TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery
- New integration test in tez-tests verifying end-to-end DAG success
  with recoveryData populated.
- Existing TestOrderedWordCount and full tez-dag suite pass.

Note the shape:

  • Title line: TEZ-NNNN: <verb-phrase, present tense, < 72 chars>.
  • Blank line.
  • Body paragraphs: what the change does, why, what assumption broke.
  • Tests: explicit list of tests added or affected.

No "should fix" or "I think." Past tense for what you did, present tense for what the code does after the change.


PR Title

TEZ-NNNN: <Short imperative summary>
  • TEZ-NNNN prefix is mandatory. The bot uses it to link to JIRA.
  • Imperative mood: "Fix race", "Add config", "Avoid NPE". Not "Fixed race", not "Fixing race".
  • < 72 characters total including the prefix.

Bad: Fix bug, Updates to VertexImpl, My fix for TEZ-NNNN. Good: TEZ-4567: Honor isReplayingRecovery in VertexImpl completion path.


PR Description Template

## JIRA

https://issues.apache.org/jira/browse/TEZ-NNNN

## Problem

<2-4 sentences. Symptom + trigger conditions. Cite the root-cause doc.>

When the last `V_TASK_COMPLETED` event arrives for a vertex with non-null
`recoveryData` outside an actual recovery replay, the event is unconditionally
re-routed through `handleRecovery()` rather than processed by the standard
transition. As a result, `completedTaskCount` is not incremented and the vertex
fails to transition to SUCCEEDED. This affects DAGs whose AM populates
recovery data speculatively at vertex initialization.

## Root cause

See `capstone-work/root-cause.md` (or paste inline if short).

Introduced in TEZ-2877 (commit a1b2c3d4).

## Fix

Gate the recovery short-circuit on a new `isReplayingRecovery()` predicate
that returns true only during active replay. Minimum-diff (one production
line + one new private method).

## Testing

- **New:** `TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery` —
  unit test using `DrainDispatcher` that reproduces the failure on master
  and passes with this fix. Plus a negative-control test.
- **New:** `TestTezNNNNFixIntegration` — `MiniTezCluster` end-to-end test
  that runs a 20-task vertex with speculatively-populated recoveryData and
  asserts DAG SUCCEEDED.
- **Existing:** Full `tez-dag` suite (1342 tests) passes. `TestOrderedWordCount`
  passes. Validation report in commit message footer.

## Backward compatibility

None affected. The fix changes behavior only for the broken case (no replay
in progress). Recovery scenarios are unchanged.

## Configuration

No new keys.

Adjust sections to your fix. The structure stays the same.


GitHub Actions / Yetus Precommit

When you open the PR, GitHub Actions runs the precommit checks. The full config lives in .github/workflows/ — read it:

ls ~/tez-src/.github/workflows/
cat ~/tez-src/.github/workflows/build.yml

Common checks (subject to change as the workflow evolves):

CheckWhat it runsFailure means
Compilemvn install -DskipTestsBuild broken on some module
Testsmvn test for each moduleSome test failed (yours or flake)
Checkstylemvn checkstyle:checkStyle violation in changed file
Javadocmvn javadoc:javadocBroken Javadoc reference or missing tag
RATmvn apache-rat:checkNew file missing ASL header
LicenseLicense snippet checkSame as RAT, or LICENSE/NOTICE drift
SpotBugsmvn spotbugs:checkNew static-analysis warning

Failures appear on the PR as red ✖ marks. Click into the failing job to read the log. Common first-PR failures:

  • Javadoc broken: You referenced {@link Foo#bar} and bar doesn't exist. Either fix the link or remove it.
  • Checkstyle: A line exceeded 120 chars or an unused import slipped in.
  • License: New file missing the header. Add it.
  • Test: A flake. Re-run the workflow ("Re-run all failed jobs" in the Actions tab). If it goes green on retry, leave a comment: "Re-ran job — flake, see TEZ-XYZ." If it fails again, your fix probably broke it.

Push fixes as new commits on the same branch. The PR auto-updates. After review approval, you'll squash on merge.


Old-Style: .patch Files on JIRA

Some committers still review .patch attachments. Know the convention.

Generate

git format-patch apache/master -o /tmp/
# Produces /tmp/0001-TEZ-NNNN-Fix-...patch

Or, for one combined diff:

git diff apache/master..HEAD > /tmp/TEZ-NNNN.01.patch

Naming convention

TEZ-<NNNN>.<iteration>.patch. So your first attachment is TEZ-4567.01.patch, second iteration after review feedback is TEZ-4567.02.patch. Some committers use TEZ-4567.001.patch (three-digit). Match whatever pattern the most recent committer used on that issue.

For a branch-specific patch (e.g. against the branch-0.10):

TEZ-4567.branch-0.10.01.patch.

Attach

In JIRA: "Attach files" → upload. Then "More" → "Patch Available" to flip the state. Cancel patch (revert to "In Progress") if you find a problem before review starts.

The JIRA workflow is covered fully in Step 9. The patch-file mechanics live here.


Rebasing on Master Without Losing Review Comments

GitHub's PR view loses inline comment threads when you force-push a rebase that changes the SHAs reviewers commented on. To minimize the damage:

  1. Don't rebase mid-review unless you have to. Merge-commits from apache/master into your branch are usually acceptable during active review; squash at the end.
  2. When you do rebase, leave a comment: "Force-pushed to rebase on master (was <old SHA>, now <new SHA>). All review threads should still be visible against the latest commit."
  3. Squash only at the very end, after approval, just before merge.
  4. If you really break the comment threads, post a summary comment listing "what was at line X became line Y in the new push." Reviewers appreciate it.

To rebase:

git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# If conflicts: edit, git add, git rebase --continue
mvn install -DskipTests -pl <module> -am -q
mvn test -pl <module> -Dtest=<YourTests> -q
git push --force-with-lease origin tez-NNNN-<slug>

Co-Author and Sign-Off

If a committer or another contributor materially helped (suggested the fix direction, found the root cause), credit them:

TEZ-NNNN: <summary>

<body>

Co-authored-by: Alice <alice@example.org>

Tez does not require a Signed-off-by line (it is not a DCO project — it requires an Apache CLA), but committers appreciate when you note influences in the commit message.


What Reviewers Look For First

In rough order:

  1. PR title and JIRA link — wrong format, instant correction request.
  2. Description quality — vague description, "please clarify" comment.
  3. Diff size — > ~200 lines for a "bug fix" gets scrutiny on scope creep.
  4. Tests present — no tests, immediate request.
  5. Tests fail on master, pass with fix — confirms the test is actually testing the fix, not just a happy path.
  6. Production diff is minimum to fix the bug — every extra change has to justify itself.
  7. Style and convention compliance — checkstyle and tests must be green.
  8. API hygiene — no public methods added/removed without discussion.
  9. Backward compatibility — does the fix change observable behavior for non-buggy cases? If yes, was it discussed?

Optimize for the first seven before you push. The last two are usually discussed in JIRA comments before the PR opens.


Validation / Self-check

Before advancing to Step 9:

  1. PR exists on apache/tez with the format TEZ-NNNN: <summary>.
  2. PR description follows the template; cites the root-cause document.
  3. Commit message follows the template (title, body, tests footer).
  4. GitHub Actions precommit is green (every check), or every red has a documented and accepted explanation.
  5. Branch is rebased on a recent apache/master (within last 24-48h ideally).
  6. PR was opened with the URL pasted into the JIRA as a comment.
  7. You can articulate in one paragraph why every line of your diff is necessary, if a reviewer asks.