Lab 2.2: Prepare a Patch Using Apache Practices

Background

In the traditional Apache contribution workflow, a "patch" is a unified diff file attached to a JIRA issue. This lab walks you through the complete workflow: finding a safe change to make, preparing the patch, verifying it, and writing the JIRA description.

This lab uses a real but trivial change as its vehicle: a Javadoc improvement in tez-api. The triviality is deliberate, because the goal is to master the workflow, not to write impressive code.


The Apache Git Patch Workflow

Apache Tez keeps a linear history on its master branch (some other Apache projects, such as Hadoop, call the equivalent branch trunk). The standard contributor workflow:

origin/master  (read-only for non-committers)
      |
      ↓ checkout
local/master
      |
      ↓ branch
local/TEZ-NNNN
      |
      ↓ make changes
      ↓ mvn test (pass)
      ↓ mvn checkstyle:check (pass)
      ↓ git diff origin/master > TEZ-NNNN.001.patch
      |
      → Attach to JIRA

You never push your branch to the Apache repository. Instead, you generate a diff and attach it to the JIRA issue.
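The .001 in TEZ-NNNN.001.patch works as a revision counter: when review feedback leads to a new version, the next upload is TEZ-NNNN.002.patch, and so on. Below is a minimal sketch of a hypothetical bash helper, next_patch_name (not part of Tez or any Apache tooling), that scans a directory for existing revisions of an issue and prints the next free name, assuming the three-digit convention shown in the diagram:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tez): print the next free patch file name
# for an issue, assuming the ISSUE.NNN.patch convention (001, 002, ...).
#   usage: next_patch_name TEZ-1234 [directory]
next_patch_name() {
  local issue="$1" dir="${2:-.}"
  local next=1 f rev
  for f in "$dir/$issue".[0-9][0-9][0-9].patch; do
    [ -e "$f" ] || continue        # the glob matched nothing
    rev="${f##*/}"                 # drop the directory part
    rev="${rev#"$issue".}"         # drop the "TEZ-1234." prefix
    rev="${rev%.patch}"            # keep only the three digits
    rev=$((10#$rev))               # parse as base 10, so "008" is 8
    if [ "$rev" -ge "$next" ]; then
      next=$((rev + 1))
    fi
  done
  printf '%s.%03d.patch\n' "$issue" "$next"
}
```

For example, with TEZ-1234.001.patch and TEZ-1234.002.patch in /tmp, next_patch_name TEZ-1234 /tmp prints TEZ-1234.003.patch.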


Step-by-Step Tasks

Step 1: Set Up Your Working Branch

cd /path/to/tez

# Always start from a clean, up-to-date master
git fetch origin
git checkout master
# --ff-only refuses to create a merge commit if local master has diverged
git merge --ff-only origin/master

# Create a branch named after the JIRA issue you are working on
# Use TEZ-0000 as a placeholder for this lab
git checkout -b TEZ-0000-javadoc-tezvertex

Verify you are on the new branch:

git branch
# * TEZ-0000-javadoc-tezvertex
#   master
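Before making changes, it can help to confirm that the branch really starts from the upstream tip and that no uncommitted edits are lying around, since modified tracked files would end up in the git diff you generate in Step 7. A minimal sketch, assuming bash and a hypothetical helper name, starts_from_upstream; git status --porcelain and git merge-base --is-ancestor are standard git commands:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the working tree has no modified
# tracked files and HEAD already contains the upstream tip.
#   usage: starts_from_upstream [base-ref]   (default: origin/master)
starts_from_upstream() {
  local base="${1:-origin/master}"

  # Modified tracked files would leak into a later `git diff origin/master`.
  # --untracked-files=no skips untracked files, which git diff does not include.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "Working tree has uncommitted changes:" >&2
    git status --short --untracked-files=no >&2
    return 1
  fi

  # Exit status 0 means $base is an ancestor of (or equal to) HEAD.
  if ! git merge-base --is-ancestor "$base" HEAD; then
    echo "HEAD does not contain $base; update master and rebase first." >&2
    return 1
  fi

  echo "OK: clean working tree, and HEAD contains $base"
}
```

Run starts_from_upstream from inside the Tez checkout; a non-zero exit status means you should clean up or rebase before continuing.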

Step 2: Find a Target for Your Change

Open tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java.

Look for public methods that:

  • Have no Javadoc, or
  • Have a @param tag with an empty or placeholder description (such as TODO), or
  • Are non-void but have no @return tag

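A rough automated starting point is the javadoc tool's doclint checks. Because no classpath is given, javadoc may also report errors for classes it cannot resolve, and on some JDK versions it stops before printing the warnings; if that happens, use the manual check below.

# List elements with missing comments, @param tags, or @return tags in tez-api.
# -d sends the generated HTML to /tmp instead of into your working tree.
javadoc -Xdoclint:all -private -d /tmp/tez-javadoc \
  -sourcepath tez-api/src/main/java \
  org.apache.tez.dag.api 2>&1 | grep -E "no comment|no @param|no @return"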

Or manually: open Vertex.java in IntelliJ, look at the addDataSink() method. If it lacks a @param description for dataSink, that is your target.

Step 3: Make the Change

Add or improve the Javadoc for the method you identified. Follow this format, adapting the parameter names and descriptions to the method as it appears in your checkout:

/**
 * Adds an external data sink, described by a {@link DataSinkDescriptor}, to
 * which this vertex writes its output.
 *
 * @param outputName
 *          the name used to identify this sink in the DAG; must be unique
 *          within this vertex
 * @param dataSink
 *          the {@link DataSinkDescriptor} defining the sink type and
 *          configuration
 * @return this {@link Vertex} instance (for method chaining)
 * @throws IllegalStateException if the vertex has already been added to a {@link DAG}
 */
public Vertex addDataSink(String outputName, DataSinkDescriptor dataSink) {
  // existing method body, unchanged
}

Rules for Apache Javadoc style:

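  • First sentence is a brief third-person summary with no subject: "Adds a…", not "This method adds a…"
  • Put each @param description on the line after the tag, indented 10 spaces past the leading * as in the example above; match the indentation already used in the file
  • Use {@link ClassName} for all class references
  • Use {@code value} for code literals and parameter names in prose
  • Each @param name must match the parameter name in the signature, and a @throws tag should name only an exception the method actually throws, so read the method body before writing one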

Step 4: Verify Compilation

mvn compile -pl tez-api -q

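Expected: no output and a zero exit status. The -q flag hides Maven's usual BUILD SUCCESS banner, so any error message means the build failed.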

Step 5: Run Checkstyle

mvn checkstyle:check -pl tez-api

Expected: BUILD SUCCESS. If there are violations, fix them before continuing.

Common Javadoc-specific violations:

  • JavadocStyle: the first sentence of a Javadoc comment does not end with a period
  • JavadocMethod: a @param or @return tag is missing
  • JavadocVariable: a public field is missing Javadoc

Step 6: Run the Relevant Tests

mvn test -pl tez-api -q

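Expected: no output. The -q flag suppresses Maven's BUILD SUCCESS banner, but test failures and build errors are still printed and make the command exit with a non-zero status. Even a pure Javadoc change needs a test run: an editing slip, such as a stray */ that closes the comment early, can break compilation, and some configurations bind checkstyle to a phase that mvn test also runs.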
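Steps 4 to 6 can be chained so that a failure stops the run and tells you which check broke. This is a minimal sketch in bash; run_checks is a hypothetical name, and it assumes you run it from the Tez checkout root with mvn on your PATH. Because every Maven call uses -q, a clean run prints little besides the step markers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run Steps 4-6 for one module, in order, and stop at
# the first failing check. Run from the Tez checkout root.
#   usage: run_checks MODULE   (for example: run_checks tez-api)
run_checks() {
  local module="$1" step
  for step in compile checkstyle:check test; do
    echo "==> mvn -q $step -pl $module"
    if ! mvn -q "$step" -pl "$module"; then
      echo "FAILED: mvn $step -pl $module" >&2
      return 1
    fi
  done
  echo "All checks passed for $module"
}
```

Passing a different module name runs the same three checks for whichever module your patch touches.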

Step 7: Generate the Patch

# Verify what you changed
git diff

# The diff should show only the lines you intentionally changed
# No whitespace changes, no unrelated files

# Generate the patch file
git diff origin/master > /tmp/TEZ-0000.001.patch

# Inspect it
cat /tmp/TEZ-0000.001.patch

The patch file should:

  • Start with diff --git a/tez-api/...
  • Show exactly the lines you added/removed (prefixed with +/-)
  • Contain no changes to files you did not intend to modify

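If the patch is longer than expected, run git status to find unexpected changes. Revert an unintended file with git checkout -- <file>. That command discards every uncommitted change in the file, so for stray edits inside a file you did intend to change, use git checkout -p -- <file> to discard individual hunks instead.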
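Beyond reading the patch, you can check mechanically that it is well formed and still applies to the base it was generated against. The sketch below is a hypothetical bash helper, check_patch, built only from standard git commands: git apply --stat prints a per-file summary, and git apply --check --cached tests the patch against a throwaway index that git read-tree loads from the base (selected through the GIT_INDEX_FILE environment variable), so your working tree and real index stay untouched. It assumes the base ref defaults to origin/master:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a patch file before attaching it to JIRA.
# Run it inside the repository the patch was generated from.
#   usage: check_patch PATCH-FILE [base-ref]   (default: origin/master)
check_patch() {
  local patch="$1" base="${2:-origin/master}"
  local top tmp status=0

  if [ ! -s "$patch" ]; then
    echo "missing or empty patch file: $patch" >&2
    return 1
  fi
  # Absolute path, because the git commands below run from the repo root.
  patch="$(cd "$(dirname "$patch")" && pwd)/$(basename "$patch")"
  top=$(git rev-parse --show-toplevel) || return 1

  # 1. A plain `git diff` patch starts with "diff --git" (no color codes).
  if ! head -n 1 "$patch" | grep -q '^diff --git '; then
    echo "patch does not start with 'diff --git': $patch" >&2
    return 1
  fi

  # 2. Per-file summary of added/removed lines; nothing is applied.
  git -C "$top" apply --stat "$patch"

  # 3. Does it apply to $base? Load $base into a throwaway index and check
  #    against that, leaving your working tree and real index untouched.
  tmp=$(mktemp -d)
  GIT_INDEX_FILE="$tmp/index" git -C "$top" read-tree "$base" &&
    GIT_INDEX_FILE="$tmp/index" git -C "$top" apply --check --cached "$patch" ||
    status=$?
  rm -rf "$tmp"

  if [ "$status" -ne 0 ]; then
    echo "patch does not apply cleanly to $base" >&2
  fi
  return "$status"
}
```

For example: check_patch /tmp/TEZ-0000.001.patch. To catch trailing whitespace as well, git diff --check origin/master exits with a non-zero status if your edits introduce whitespace errors.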

Step 8: Write the JIRA Description

For the JIRA issue you would create for this patch, write:

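Summary line format (JIRA assigns the TEZ-NNNN key when the issue is created, so the Summary field itself holds only the text after "TEZ-0000. "; the combined line is the form used in the commit message):

TEZ-0000. Improve Javadoc for Vertex.addDataSink()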

Description format:

Problem:
The addDataSink() method in Vertex.java has no @param documentation for the
'dataSink' parameter. This makes it harder for new users to understand the
expected input without reading the implementation.

Fix:
Add complete @param, @return, and @throws Javadoc for addDataSink().

Testing:
mvn test -pl tez-api  (all existing tests pass)
mvn checkstyle:check -pl tez-api  (no violations)

Step 9: Review the Patch as a Committer Would

Before attaching a patch, ask yourself:

  1. Does the patch contain only the changes described in the JIRA description?
  2. Does it pass mvn test -pl <module> locally?
  3. Does it pass mvn checkstyle:check -pl <module>?
  4. If you committed locally, is the commit message in the form "TEZ-NNNN. Short description."?
  5. Is there a clear explanation in the JIRA description of what was wrong and what was fixed?

If any answer is "no", fix it before uploading.


Common Mistakes

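Mistake | How to detect | Fix
--------|---------------|----
Patch includes unrelated formatting changes | git diff shows hundreds of lines | git checkout -- <unintended-file> for whole files; git checkout -p -- <file> for stray hunks in a file you meant to change
Patch modifies generated code | Proto-generated files appear in the diff | Revert the generated files; change only the source
Branch was not based on current master | git log origin/master..HEAD lists commits you did not write, or git diff origin/master shows unrelated changes | Rebase your branch onto current master (git rebase origin/master, or git rebase --onto origin/master <old-base> if it was cut from another branch)
Checkstyle violation on a line you did not write | mvn checkstyle:check reports a line outside your change | If the line is in your diff, fix it; if it predates your change, confirm it also fails on a clean checkout and note it in the JIRA instead of widening the patch
Test fails in an unrelated module | Running all tests surfaces a pre-existing failure | Confirm by running on a clean checkout; note the existing failure in the JIRA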
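For the "clean checkout" checks in the table, git worktree gives you a second, pristine checkout of master next to your branch without stashing or switching branches. A minimal sketch of a hypothetical bash helper, on_clean_checkout, that creates a temporary detached worktree at a base ref, runs a command inside it, removes the worktree again, and returns the command's exit status:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in a temporary clean checkout of a ref,
# then delete that checkout. Returns the command's exit status.
#   usage: on_clean_checkout BASE-REF COMMAND [ARGS...]
on_clean_checkout() {
  local base="$1" parent dir status=0
  shift
  parent=$(mktemp -d)
  dir="$parent/clean"

  # A detached worktree shares this repository's objects; no clone needed.
  if ! git worktree add --detach "$dir" "$base" >/dev/null 2>&1; then
    echo "could not create a worktree for $base" >&2
    rm -rf "$parent"
    return 1
  fi

  # Subshell, so the caller's current directory does not change.
  ( cd "$dir" && "$@" ) || status=$?

  git worktree remove --force "$dir" >/dev/null
  rm -rf "$parent"
  return "$status"
}
```

For example, on_clean_checkout origin/master mvn test -pl tez-api runs the module's tests on unmodified master. A fresh worktree has no build output yet, so the first Maven run there starts from scratch.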

JIRA Status Workflow

After attaching your patch:

  1. Move the issue to "Patch Available" (in Apache JIRA this is usually the "Submit Patch" action on the issue)
  2. Add a comment: "Patch attached. Tested with mvn test -pl tez-api and mvn checkstyle:check -pl tez-api, both pass."
  3. Wait for a committer to review; do not ping the mailing list right away
  4. If no response in 2 weeks, it is acceptable to send one polite reminder to dev@tez.apache.org:
    Subject: [REMINDER] TEZ-NNNN patch available for review
    
    Hi dev@,
    
    Friendly reminder that TEZ-NNNN has a patch attached. Any feedback welcome.
    https://issues.apache.org/jira/browse/TEZ-NNNN
    
    Thanks
    

Expected Output

At the end of this lab you have:

  1. A local branch TEZ-0000-javadoc-tezvertex with a Javadoc change
  2. A passing test run: mvn test -pl tez-api
  3. A passing checkstyle run: mvn checkstyle:check -pl tez-api
  4. A patch file at /tmp/TEZ-0000.001.patch with only the intended diff
  5. A written JIRA description (even if not submitted) in the format above

Stretch Goals

  1. Find a real Minor or Trivial open issue in Apache Tez JIRA that has been open for more than 6 months with no patch. Leave a JIRA comment expressing interest.

  2. Attempt the same patch workflow with a real issue:

    • Use git checkout -b TEZ-<real-number>-<short-description> for the branch name
    • Use the real JIRA number in the patch filename: TEZ-NNNN.001.patch
  3. Read three recently committed Tez patches by browsing JIRA issues with status "Resolved" and resolution "Fixed". For each, read the complete comment thread to understand the feedback cycle and how many patch revisions were required.

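  4. Commit your change on the branch (for example, git commit -am "TEZ-0000. Improve Javadoc for Vertex.addDataSink()"), then list only your branch's commits:

    git log origin/master..HEAD --oneline

    This shows the commits that are on your branch but not on origin/master. If it lists commits you did not write, your branch did not start from current master.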
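The git log range above lists only commits, so it stays empty until you commit on your branch. Once you do, a three-dot diff such as git diff origin/master...HEAD compares your branch with the point where it forked from master, so upstream commits you fetch later do not appear (reversed) in your patch. A minimal sketch of a hypothetical bash helper, branch_patch, that refuses to run when the branch has no commits, warns about uncommitted edits, and writes the three-dot diff to a file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a patch containing only the commits on the
# current branch, measured from where it forked off BASE (three-dot diff).
#   usage: branch_patch BASE-REF OUTPUT-FILE
branch_patch() {
  local base="$1" out="$2" count
  count=$(git rev-list --count "$base..HEAD") || return 1
  if [ "$count" -eq 0 ]; then
    echo "no commits in $base..HEAD; commit your change first" >&2
    return 1
  fi

  # Uncommitted edits are not part of a HEAD-based diff; say so loudly.
  if ! git diff --quiet HEAD; then
    echo "warning: uncommitted changes are not included in $out" >&2
  fi

  git log --oneline "$base..HEAD"        # what is going into the patch
  git diff --no-color "$base...HEAD" > "$out"
}
```

For example, branch_patch origin/master /tmp/TEZ-0000.002.patch. This is an alternative to the working-tree diff used in Step 7, not a replacement for it; either way, inspect the result before attaching it.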