Patch Quality
A "patch" in Apache parlance is a unified diff attached to a JIRA (or, more recently, a GitHub PR linked from a JIRA). This chapter tells you what a committer is looking for when they open it for the first time. Internalising these expectations is the difference between a patch that gets committed in two review rounds and one that dies after a "please rebase" comment in month three.
What Committers Look For — In Reading Order
A committer reviewing your patch does, roughly, this:
1. Read JIRA description. (30 sec)
2. Open the patch, skim the diff stat. (30 sec)
3. Look at tests. (2 min)
4. Look at the implementation. (5 min)
5. Run mvn install / mvn test. (background)
6. Comment. (variable)
Notice tests come before implementation. If the test diff is empty or weak, the implementation is read with suspicion. If the test diff is strong and minimal, the implementation is read with trust.
Rule 1: Minimum Diff
The single rule that most distinguishes a strong patch from a weak one. The diff should contain only the changes that the JIRA describes. Not:
- A whitespace cleanup of the surrounding method
- A rename of an unrelated variable you didn't like
- An import reorder by your IDE
- A bumped dependency version "while you were here"
- A reformatted block
Every line you change costs the reviewer attention. Lines that don't serve the JIRA are a tax on the review.
Check before submitting:
cd ~/tez-src
git diff --stat origin/master
git diff origin/master | head -50
If git diff --stat shows changes in files unrelated to the JIRA, revert them:
git checkout origin/master -- path/to/unrelated/file
Rule 2: No Unrelated Changes
The corollary to Rule 1. Even within a touched file, do not bundle unrelated improvements. If you notice a separate bug while fixing your bug:
# don't fix it here. Open a separate JIRA:
echo "Noticed: VertexImpl.java:842 catches Exception too broadly" >> ~/tez-notes/queue.md
File a follow-up JIRA at the end of the week. Two small patches beat one mixed patch every time.
Rule 3: Apache Commit Message Format
The exact format used in git log for committed Tez changes:
TEZ-NNNN: <short imperative summary, under 72 chars>. (<contributor-name> via <committer-name>)
Verify with:
cd ~/tez-src
git log --oneline -20
You will see lines like:
abc1234 TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gunther)
def5678 TEZ-4322: Add MR compat test for vectorized output. (John Smith via gopalv)
When you submit, your commit message has the contributor side only:
TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.
The committer appends (Jane Doe via <committer>) at commit time. Don't pre-fill it.
The summary line rules:
- Imperative mood: "Fix", "Add", "Remove", "Refactor" — not "Fixed", "Adding".
- Under 72 characters.
- Ends with a period.
- No trailing whitespace.
If the change needs more explanation, leave one blank line and add a body wrapped at 72 columns:
TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.
When a vertex has no Inputs (a root data-source vertex with no
upstream edges), VertexImpl.recover called .iterator() on a null
inputs collection. The fix initialises inputs to an empty list in
the recover path.
Adds TestVertexImpl.testRecoverNoInputs covering the case.
Rule 4: Tests for Behavior Changes
Any behavior change must come with a test. This includes bug fixes — the test should fail before your fix and pass after. Verify:
cd ~/tez-src
# stash your fix
git stash
# run the new test
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# it should fail
git stash pop
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# now it should pass
If your "bug fix" passes with the test added but without the fix applied, your test doesn't actually exercise the bug.
Exceptions where a test is not required:
| Change type | Test needed? |
|---|---|
| Javadoc fix | No |
| Log message string change | No |
| Comment / formatting (rare; should be its own patch) | No |
| Build / Maven config change | Usually no, but justify |
| Behavior change | Yes, always |
Rule 5: No Whitespace Churn
Whitespace-only diff lines are noise. IDEs love to insert them — turn off "format on
save" for tez-src, or restrict it to lines you edited.
Detect before submitting:
cd ~/tez-src
git diff -w origin/master --stat
git diff origin/master --stat
If the second shows many more changed files than the first, you have whitespace churn. Either clean it up or, if it's pervasive, configure your editor and re-do the change.
Rule 6: Javadoc for @Public API
If you add or modify a method on a class annotated @InterfaceAudience.Public, it needs
javadoc. The check:
cd ~/tez-src
grep -l "@InterfaceAudience.Public" tez-api/src/main/java -r | head
For each such class, every public method has Javadoc with at least:
- One-sentence summary
@paramfor each parameter@returnfor non-void@throwsfor any non-RuntimeExceptiondeclared exception
If your patch adds a new public method without Javadoc, expect the first review comment to ask for it.
Rule 7: @InterfaceAudience and @InterfaceStability Annotations
Every public-ish class in tez-api is annotated. Example from Vertex.java:
@Public
@Evolving
public class Vertex {
...
}
The grid:
@Stable | @Evolving | @Unstable | |
|---|---|---|---|
@Public | Compat guaranteed across minor versions | May change between minor versions with warning | May change between any release |
@LimitedPrivate({"Hive"}) | Stable for named projects | Evolving for named projects | Unstable, named projects only |
@Private | Internal; do not depend on | Internal | Internal |
When you add a new class to tez-api, you must annotate it. The annotations live in
tez-api/src/main/java/org/apache/hadoop/classification/. When in doubt, default to:
@Public
@Unstable
so users see the class but know not to depend on its shape yet.
Rule 8: Pre-Submit Checklist
Before you upload TEZ-NNNN.001.patch, run each of these and have all pass.
cd ~/tez-src
# 1. Full compile, all modules, no tests.
mvn install -DskipTests
# 2. Checkstyle. Tez uses the config in tez-tools/.
mvn checkstyle:check
# 3. Tests in modules you changed.
# For tez-dag, tez-api, etc.:
mvn test -pl tez-dag
mvn test -pl tez-api
# 4. A representative integration test.
mvn test -pl tez-tests -Dtest=TestOrderedWordCount
# 5. Patch applies cleanly to current master.
git fetch origin
git rebase origin/master
git diff origin/master > /tmp/TEZ-NNNN.001.patch
cd /tmp
git -C ~/tez-src apply --check TEZ-NNNN.001.patch
If any step fails, fix and re-run. Submit only when all pass.
Rule 9: Patch Generation
Generate the patch from a clean rebase against origin/master:
cd ~/tez-src
git fetch origin
git rebase origin/master # resolves conflicts now, not at commit time
git diff origin/master --no-color --unified=5 > TEZ-NNNN.001.patch
The --unified=5 gives reviewers 5 lines of context instead of the default 3. This is a
small kindness that makes review materially easier.
Inspect the patch before attaching:
wc -l TEZ-NNNN.001.patch # how big is it?
head -30 TEZ-NNNN.001.patch # right files?
grep -c "^+" TEZ-NNNN.001.patch # added lines
grep -c "^-" TEZ-NNNN.001.patch # removed lines
A patch of 50–300 lines is comfortable for a single review round. A patch over 1000 lines will sit unreviewed until you split it.
Worked Example — A Minimal Trivial Patch
A real-shape patch for a Javadoc fix on Vertex.java:
diff --git a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
index abcdef1..1234567 100644
--- a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
+++ b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
@@ -180,7 +180,10 @@ public class Vertex {
}
/**
- * Set the parallelism.
+ * Set the parallelism (number of tasks) for this Vertex.
+ *
+ * @param parallelism the number of tasks. Must be > 0 unless
+ * {@link #setVertexManagerPlugin} configures a dynamic plugin.
+ * @return this Vertex, for chaining.
*/
public Vertex setParallelism(int parallelism) {
That's the entire patch — 5 changed lines, +6/-1. No test (Javadoc only). It passes
checkstyle:check, mvn install -DskipTests, and the JIRA is TEZ-NNNN: Improve Javadoc on Vertex#setParallelism.
Anti-Patterns
What committers flag immediately:
| Anti-pattern | Why it's flagged |
|---|---|
| Reformat of an entire file | Hides the real change |
// TODO: refactor comment added | Should be a separate JIRA |
System.out.println left in | Use LOG, never System.out |
e.printStackTrace() | Use LOG.warn(msg, e) |
Catch Exception swallowing everything | Catch specific or rethrow |
New configuration key with no @Public annotation | Won't be honored as stable |
New method with throws Exception | Use specific exceptions |
| Test that always passes (no assertion) | Useless |
| Test depending on wall-clock timing | Flaky |
@Ignore added to silence a failing test | Fix it or revert |
Validation Artifacts
After this chapter you should have:
- A
~/tez-notes/precommit.shscript running the seven pre-submit commands above. - One actual patch file
TEZ-NNNN.001.patchon disk, even if you haven't uploaded it. - A
~/tez-notes/patch-checklist.mdcheatsheet from Rule 8. - Knowledge of the
@InterfaceAudience/@InterfaceStabilitymatrix.
The next chapter — Responding to Feedback — covers what happens after you press "Attach".