Lab 2.3 — Fix It: NullPointerException in TezTaskAttemptID.fromString
Lab type: Fix-It — reproduce → locate → write failing test → patch → verify → format patch
Estimated time: 90–120 min
Tez component: tez-common → org.apache.tez.common.TezTaskAttemptID
Background
TezTaskAttemptID is the primary key that links a task attempt to its vertex, DAG, and
application. Its static fromString method parses a serialised ID like:
attempt_1609459200000_0001_1_00_000000_0
In the Tez codebase the parse path for certain malformed inputs has historically
thrown an unguarded NullPointerException rather than a descriptive
IllegalArgumentException. Null returns from String.split() or Integer.parseInt()
call sites that skip validation are the common culprit.
This lab walks the complete Apache contribution workflow for such a bug:
- Reproduce the crash in a test
- Read the source to understand why it crashes
- Apply the minimal fix
- Verify tests pass and checkstyle is clean
- Produce a
.patchfile ready for JIRA upload
Step 1 — Locate the Source File
cd ~/tez-src # your local Tez clone from Lab 1.1
find . -name "TezTaskAttemptID.java" | head -5
Expected path:
./tez-common/src/main/java/org/apache/tez/common/TezTaskAttemptID.java
Open the file and read the fromString method in full.
Questions to answer before continuing
| # | Question |
|---|---|
| 1 | What does fromString call first — TezDAGID.fromString, TezVertexID.fromString, or does it parse raw tokens? |
| 2 | What happens if the input string is null? Is there an explicit null guard? |
| 3 | What exception type does the method declare in its signature (throws clause)? |
| 4 | Find the split("_") call(s). If the split produces fewer parts than expected, what line would throw? |
| 5 | Is there a sibling method toString()? What is the canonical string format it produces? |
Step 2 — Find the Existing Tests
find . -name "TestTezTaskAttemptID.java" | head -5
Expected path:
./tez-common/src/test/java/org/apache/tez/common/TestTezTaskAttemptID.java
Open it.
Questions
| # | Question |
|---|---|
| 1 | How many fromString test cases already exist? |
| 2 | Is there a test for a null input? |
| 3 | Is there a test for a string with too few underscore-separated parts? |
| 4 | What assertion style does the file use — JUnit 4 @Test(expected=...) or try/catch? |
Step 3 — Reproduce the Bug
Add the following test to TestTezTaskAttemptID.java inside the existing test class.
Do not modify the test — the goal is to make it pass, not work around it.
@Test(expected = IllegalArgumentException.class)
public void testFromStringNullInput() {
TezTaskAttemptID.fromString(null);
}
@Test(expected = IllegalArgumentException.class)
public void testFromStringTooFewParts() {
// Fewer underscore-separated tokens than the format requires
TezTaskAttemptID.fromString("attempt_1609459200000_0001_1");
}
Run the tests:
cd tez-common
mvn test -pl . -Dtest=TestTezTaskAttemptID -q 2>&1 | tail -30
Expected result: Both new tests FAIL (the method throws NullPointerException
or ArrayIndexOutOfBoundsException, not IllegalArgumentException).
Record the exact exception and stack-trace line. You will need this for the JIRA description later.
Step 4 — Apply the Fix
Open TezTaskAttemptID.java and apply a minimal patch to the fromString method.
Rules for a minimal patch
- Add a null-check at the very top of
fromString; throwIllegalArgumentExceptionwith a clear message - Add a length-check on the parsed tokens before subscripting the array
- Do not reformat unrelated lines (this produces noisy diffs that fail checkstyle review)
- Do not change method signatures or visibility
Hint — guard pattern used elsewhere in the same class
Search the file for how other fromString variants guard their input:
grep -n "IllegalArgumentException" TezTaskAttemptID.java
Use the same pattern and message style.
Step 5 — Verify the Fix
# All TezTaskAttemptID tests must pass
mvn test -pl tez-common -Dtest=TestTezTaskAttemptID -q
# Full tez-common test suite (regression guard)
mvn test -pl tez-common -q 2>&1 | tail -20
# Checkstyle must be clean
mvn checkstyle:check -pl tez-common -q 2>&1 | grep -E "ERROR|WARNING|violation" | head -20
All three commands must produce zero errors.
Step 6 — Understand the Checkstyle Rules
cat tez-common/src/main/checkstyle/tez-checkstyle.xml | grep -A2 "LineLength"
cat tez-common/src/main/checkstyle/tez-checkstyle.xml | grep -A2 "Javadoc"
Questions
| # | Question |
|---|---|
| 1 | What is the maximum line length enforced? |
| 2 | Does the project require Javadoc on all public methods, or only some? |
| 3 | What import ordering rule is in effect — alphabetical, grouped, or none? |
Step 7 — Format the Patch File
Apache Tez uses the unified diff format. From the repo root:
cd ~/tez-src
git diff > /tmp/TEZ-XXXX.001.patch
Inspect the patch:
cat /tmp/TEZ-XXXX.001.patch
Checklist before uploading to JIRA
- Patch header shows the correct file path relative to the repo root
-
Only
TezTaskAttemptID.javaandTestTezTaskAttemptID.javaare modified -
No trailing whitespace on any changed line (
grep -P "\s+$" /tmp/TEZ-XXXX.001.patch) -
Patch applies cleanly to a fresh checkout:
git apply --check /tmp/TEZ-XXXX.001.patch -
mvn test -pl tez-commonstill passes aftergit apply
Step 8 — Write the JIRA Description
Draft a JIRA ticket description following the Apache Tez convention:
Summary: TezTaskAttemptID.fromString throws NPE/AIOOBE on malformed input
instead of IllegalArgumentException
Description:
TezTaskAttemptID.fromString does not validate its input before parsing.
Passing null or a string with fewer than N underscore-separated parts
causes an unhandled NullPointerException (null path) or
ArrayIndexOutOfBoundsException (short-string path) instead of
the expected IllegalArgumentException.
Steps to reproduce:
TezTaskAttemptID.fromString(null);
→ NullPointerException at TezTaskAttemptID.java:NN
TezTaskAttemptID.fromString("attempt_1609459200000_0001_1");
→ ArrayIndexOutOfBoundsException at TezTaskAttemptID.java:NN
Fix: add explicit null guard + array-length guard at the top of fromString.
Priority: Minor
Component: tez-common
Replace NN with the actual line numbers from your stack traces in Step 3.
Step 9 — Connect the Concepts
| Concept | Where to find it in the codebase |
|---|---|
TezTaskAttemptID | tez-common/src/main/java/.../TezTaskAttemptID.java |
TezID base class | Same package — TezID.java |
All fromString sibling methods | TezDAGID, TezVertexID, TezTaskID — same package |
| Checkstyle config | tez-common/src/main/checkstyle/tez-checkstyle.xml |
| Example past fix (similar pattern) | Search JIRA for TEZ- + IllegalArgumentException + fromString |
Reflection
- Why should library code throw
IllegalArgumentExceptionrather than letting aNullPointerExceptionpropagate? - What does the Apache contribution guide say about test coverage for bug fixes?
(Hint:CONTRIBUTING.mdor the Apache Tez wiki — every bug fix must include a reproducing test.) - How does the Tez
fromStringguard pattern compare to the one in Hadoop'sTaskAttemptID.forName? - Could this same class of bug exist in
TezDAGID.fromStringorTezVertexID.fromString?
Check both files and note your findings.