Patch Quality

A "patch" in Apache parlance is a unified diff attached to a JIRA (or, more recently, a GitHub PR linked from a JIRA). This chapter tells you what a committer is looking for when they open it for the first time. Internalising these expectations is the difference between a patch that gets committed in two review rounds and one that dies after a "please rebase" comment in month three.

What Committers Look For — In Reading Order

A committer reviewing your patch does, roughly, this:

1. Read JIRA description.        (30 sec)
2. Open the patch, skim the diff stat.   (30 sec)
3. Look at tests.                (2 min)
4. Look at the implementation.   (5 min)
5. Run mvn install / mvn test.   (background)
6. Comment.                      (variable)

Notice tests come before implementation. If the test diff is empty or weak, the implementation is read with suspicion. If the test diff is strong and minimal, the implementation is read with trust.

Rule 1: Minimum Diff

The single rule that most distinguishes a strong patch from a weak one. The diff should contain only the changes that the JIRA describes. Not:

  • A whitespace cleanup of the surrounding method
  • A rename of an unrelated variable you didn't like
  • An import reorder by your IDE
  • A bumped dependency version "while you were here"
  • A reformatted block

Every line you change costs the reviewer attention. Lines that don't serve the JIRA are a tax on the review.

Check before submitting:

cd ~/tez-src
git diff --stat origin/master
git diff origin/master | head -50

If git diff --stat shows changes in files unrelated to the JIRA, revert them:

git checkout origin/master -- path/to/unrelated/file

Rule 2: No Unrelated Changes

The corollary to Rule 1. Even within a touched file, do not bundle unrelated improvements. If you notice a separate bug while fixing your bug:

# don't fix it here. Open a separate JIRA:
echo "Noticed: VertexImpl.java:842 catches Exception too broadly" >> ~/tez-notes/queue.md

File a follow-up JIRA at the end of the week. Two small patches beat one mixed patch every time.

Rule 3: Apache Commit Message Format

The exact format used in git log for committed Tez changes:

TEZ-NNNN: <short imperative summary, under 72 chars>. (<contributor-name> via <committer-name>)

Verify with:

cd ~/tez-src
git log --oneline -20

You will see lines like:

abc1234 TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gunther)
def5678 TEZ-4322: Add MR compat test for vectorized output. (John Smith via gopalv)

When you submit, your commit message has the contributor side only:

TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.

The committer appends (Jane Doe via <committer>) at commit time. Don't pre-fill it.

The summary line rules:

  • Imperative mood: "Fix", "Add", "Remove", "Refactor" — not "Fixed", "Adding".
  • Under 72 characters.
  • Ends with a period.
  • No trailing whitespace.

If the change needs more explanation, leave one blank line and add a body wrapped at 72 columns:

TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.

When a vertex has no Inputs (a root data-source vertex with no
upstream edges), VertexImpl.recover called .iterator() on a null
inputs collection. The fix initialises inputs to an empty list in
the recover path.

Adds TestVertexImpl.testRecoverNoInputs covering the case.

Rule 4: Tests for Behavior Changes

Any behavior change must come with a test. This includes bug fixes — the test should fail before your fix and pass after. Verify:

cd ~/tez-src
# stash your fix
git stash
# run the new test
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# it should fail
git stash pop
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# now it should pass

If your "bug fix" passes with the test added but without the fix applied, your test doesn't actually exercise the bug.

Exceptions where a test is not required:

Change typeTest needed?
Javadoc fixNo
Log message string changeNo
Comment / formatting (rare; should be its own patch)No
Build / Maven config changeUsually no, but justify
Behavior changeYes, always

Rule 5: No Whitespace Churn

Whitespace-only diff lines are noise. IDEs love to insert them — turn off "format on save" for tez-src, or restrict it to lines you edited.

Detect before submitting:

cd ~/tez-src
git diff -w origin/master --stat
git diff origin/master --stat

If the second shows many more changed files than the first, you have whitespace churn. Either clean it up or, if it's pervasive, configure your editor and re-do the change.

Rule 6: Javadoc for @Public API

If you add or modify a method on a class annotated @InterfaceAudience.Public, it needs javadoc. The check:

cd ~/tez-src
grep -l "@InterfaceAudience.Public" tez-api/src/main/java -r | head

For each such class, every public method has Javadoc with at least:

  • One-sentence summary
  • @param for each parameter
  • @return for non-void
  • @throws for any non-RuntimeException declared exception

If your patch adds a new public method without Javadoc, expect the first review comment to ask for it.

Rule 7: @InterfaceAudience and @InterfaceStability Annotations

Every public-ish class in tez-api is annotated. Example from Vertex.java:

@Public
@Evolving
public class Vertex {
    ...
}

The grid:

@Stable@Evolving@Unstable
@PublicCompat guaranteed across minor versionsMay change between minor versions with warningMay change between any release
@LimitedPrivate({"Hive"})Stable for named projectsEvolving for named projectsUnstable, named projects only
@PrivateInternal; do not depend onInternalInternal

When you add a new class to tez-api, you must annotate it. The annotations live in tez-api/src/main/java/org/apache/hadoop/classification/. When in doubt, default to:

@Public
@Unstable

so users see the class but know not to depend on its shape yet.

Rule 8: Pre-Submit Checklist

Before you upload TEZ-NNNN.001.patch, run each of these and have all pass.

cd ~/tez-src

# 1. Full compile, all modules, no tests.
mvn install -DskipTests

# 2. Checkstyle. Tez uses the config in tez-tools/.
mvn checkstyle:check

# 3. Tests in modules you changed.
# For tez-dag, tez-api, etc.:
mvn test -pl tez-dag
mvn test -pl tez-api

# 4. A representative integration test.
mvn test -pl tez-tests -Dtest=TestOrderedWordCount

# 5. Patch applies cleanly to current master.
git fetch origin
git rebase origin/master
git diff origin/master > /tmp/TEZ-NNNN.001.patch
cd /tmp
git -C ~/tez-src apply --check TEZ-NNNN.001.patch

If any step fails, fix and re-run. Submit only when all pass.

Rule 9: Patch Generation

Generate the patch from a clean rebase against origin/master:

cd ~/tez-src
git fetch origin
git rebase origin/master           # resolves conflicts now, not at commit time
git diff origin/master --no-color --unified=5 > TEZ-NNNN.001.patch

The --unified=5 gives reviewers 5 lines of context instead of the default 3. This is a small kindness that makes review materially easier.

Inspect the patch before attaching:

wc -l TEZ-NNNN.001.patch          # how big is it?
head -30 TEZ-NNNN.001.patch       # right files?
grep -c "^+" TEZ-NNNN.001.patch   # added lines
grep -c "^-" TEZ-NNNN.001.patch   # removed lines

A patch of 50–300 lines is comfortable for a single review round. A patch over 1000 lines will sit unreviewed until you split it.

Worked Example — A Minimal Trivial Patch

A real-shape patch for a Javadoc fix on Vertex.java:

diff --git a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
index abcdef1..1234567 100644
--- a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
+++ b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
@@ -180,7 +180,10 @@ public class Vertex {
   }

   /**
-   * Set the parallelism.
+   * Set the parallelism (number of tasks) for this Vertex.
+   *
+   * @param parallelism the number of tasks. Must be > 0 unless
+   *                    {@link #setVertexManagerPlugin} configures a dynamic plugin.
+   * @return this Vertex, for chaining.
    */
   public Vertex setParallelism(int parallelism) {

That's the entire patch — 5 changed lines, +6/-1. No test (Javadoc only). It passes checkstyle:check, mvn install -DskipTests, and the JIRA is TEZ-NNNN: Improve Javadoc on Vertex#setParallelism.

Anti-Patterns

What committers flag immediately:

Anti-patternWhy it's flagged
Reformat of an entire fileHides the real change
// TODO: refactor comment addedShould be a separate JIRA
System.out.println left inUse LOG, never System.out
e.printStackTrace()Use LOG.warn(msg, e)
Catch Exception swallowing everythingCatch specific or rethrow
New configuration key with no @Public annotationWon't be honored as stable
New method with throws ExceptionUse specific exceptions
Test that always passes (no assertion)Useless
Test depending on wall-clock timingFlaky
@Ignore added to silence a failing testFix it or revert

Validation Artifacts

After this chapter you should have:

  1. A ~/tez-notes/precommit.sh script running the seven pre-submit commands above.
  2. One actual patch file TEZ-NNNN.001.patch on disk, even if you haven't uploaded it.
  3. A ~/tez-notes/patch-checklist.md cheatsheet from Rule 8.
  4. Knowledge of the @InterfaceAudience / @InterfaceStability matrix.

The next chapter — Responding to Feedback — covers what happens after you press "Attach".