Lab 2.4 — Review It: Spot the Flaws in TEZ-FAKE001.001.patch

Lab type: Review-It — read a synthetic patch, find every flaw, explain the impact, propose fixes
Estimated time: 60–90 min
Tez component: tez-dagorg.apache.tez.dag.app.dag.impl.TaskImpl


Context

You are a Tez committer reviewing a patch uploaded to JIRA. The contributor claims the patch fixes a race condition where TaskImpl.getCounters() returns null when called before any task attempt has completed.

Your job is to review the patch before it merges. There are exactly 5 intentional flaws hidden in the diff below. Find them all.


The Synthetic Patch

diff --git a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
index a1b2c3d..e4f5a6b 100644
--- a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
@@ -214,6 +214,8 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
 
+  import org.apache.tez.common.counters.TezCounters;
+
   public synchronized TezCounters getCounters() {
     TezCounters counters = null;
     if (successfulAttempt != null) {
@@ -221,7 +223,7 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
       counters = successfulAttempt.getCounters();
     } else {
       counters = attemptList.stream()
-          .filter(a -> a.getState() == TaskAttemptState.SUCCEEDED)
+          .filter(a -> a.getState() == TaskAttemptState.RUNNING)
           .findFirst()
           .map(TaskAttemptImpl::getCounters)
           .orElse(null);
@@ -231,6 +233,14 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
     return counters;
   }
 
+  /**
+   * Returns the counter for this task, or a new empty TezCounters object
+   * if no counters are available yet.
+   *
+   * @return counters, never null
+   */
+  public synchronized TezCounters getCountersOrEmpty() {
+    TezCounters c = getCounters();
+    return c == null ? new TezCounters() : c;
+  }
+
diff --git a/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java b/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
index b7c8d9e..f0a1b2c 100644
--- a/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
+++ b/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
@@ -891,6 +891,18 @@ public class TestTaskImpl {
 
+  @Test
+  public void testGetCountersBeforeAnyAttempt() {
+    // No attempts started; counters should not be null
+    initTask();
+    TezCounters result = task.getCounters();
+    assertNotNull("getCounters() must not return null", result);
+  }
+
+  @Test
+  public void testGetCountersOrEmptyReturnsSameObjectEachTime() {
+    initTask();
+    TezCounters first  = task.getCountersOrEmpty();
+    TezCounters second = task.getCountersOrEmpty();
+    assertSame("Must return same instance", first, second);
+  }
+

Your Task

For each flaw you find, fill in the table:

#FileLine / hunkFlaw descriptionWhy it mattersSuggested fix
1
2
3
4
5

Guided Questions

Work through these questions one by one. Each one points at a different flaw.

Question 1 — Import placement

Look at where the import statement was added:

+  import org.apache.tez.common.counters.TezCounters;
+
   public synchronized TezCounters getCounters() {
  • Is this a valid location for a Java import declaration?
  • What would happen at compile time if this diff were applied as-is?
  • Where should imports go in a Java file?
  • Lookup: does TaskImpl.java already import TezCounters at the top?
    (grep "import.*TezCounters" tez-dag/src/main/java/.../TaskImpl.java)

What is the flaw?


Question 2 — The filter predicate

The patch changes the fallback stream filter from:

.filter(a -> a.getState() == TaskAttemptState.SUCCEEDED)

to:

.filter(a -> a.getState() == TaskAttemptState.RUNNING)
  • Re-read the JIRA description: the reporter says getCounters() returns null when called before any attempt has completed.
  • Does filtering for RUNNING attempts fix that?
  • What does it mean to read counters from a RUNNING attempt vs a SUCCEEDED one?
  • Are the counters of a still-running attempt considered final/reliable?

What is the flaw? What should the filter be?


Question 3 — The new test testGetCountersBeforeAnyAttempt

Read the test body carefully:

TezCounters result = task.getCounters();
assertNotNull("getCounters() must not return null", result);
  • The test asserts that getCounters() is not null when no attempt has started.
  • But the patch does not change getCounters() to return an empty object — it adds a separate method getCountersOrEmpty() for that.
  • When successfulAttempt is null and attemptList is empty, what does getCounters() actually return?
  • Will this test pass or fail against the patched code?

What is the flaw?


Question 4 — The new test testGetCountersOrEmptyReturnsSameObjectEachTime

assertSame("Must return same instance", first, second);
  • getCountersOrEmpty() is implemented as:
    return c == null ? new TezCounters() : c;
  • Each call creates a new TezCounters() when c is null.
  • Does the assertSame assertion match the implementation?
  • Is assertSame testing a documented contract, or is it over-specifying an implementation detail?
  • What assertion would actually verify the intended contract ("not null")?

What is the flaw?


Question 5 — The JIRA description says the fix is needed, but…

Re-read the patch one final time. The root cause (as stated in the JIRA) is that getCounters() can return null. The correct caller-safe fix for most Tez callers would be to make getCounters() itself never return null (return empty TezCounters as the contract).

Instead the patch adds getCountersOrEmpty() as a new method — but leaves the old getCounters() method returning null.

  • Every existing caller of getCounters() still gets null.
  • The Tez codebase uses getCounters() in aggregation loops that iterate counters: counters.incrAllCounters(taskCounters) — passing null there throws NPE.
  • How many callers of getCounters() exist in tez-dag?
grep -rn "\.getCounters()" tez-dag/src/main/ | grep -v "//.*getCounters" | wc -l
  • Does the patch actually fix the original bug?

What is the flaw?


Answer Key (Read After You've Filled the Table)

Reveal answers
#FlawImpactFix
1import statement placed inside the class body (after the opening {)Compile error — Java imports must precede the class declarationRemove the import; TezCounters is already imported at the top of TaskImpl.java
2Filter changed to RUNNING instead of keeping SUCCEEDED; a running attempt's counters are partial and unstableReturns wrong/partial data; counters values change as the attempt progressesRevert to SUCCEEDED filter; the real fix is to handle the "no succeeded attempt yet" case separately (return null or empty)
3testGetCountersBeforeAnyAttempt asserts assertNotNull on getCounters() which still returns null when no attempt has completedTest will fail on the patched code — the patch doesn't make getCounters() non-nullTest should call getCountersOrEmpty() or the assertion should accept null and document the contract
4assertSame requires the same object reference but getCountersOrEmpty() creates a new TezCounters() each time null is returnedTest fails on every call where no successful attempt existsUse assertNotNull to verify the non-null contract; don't assert reference identity
5The patch adds getCountersOrEmpty() but doesn't fix the root cause — getCounters() still returns null; all existing callers are still brokenDownstream NPEs in counter aggregation loops are not fixedChange getCounters() itself to return new TezCounters() instead of null, or add a null-guard in every caller; document the chosen contract

Reflection

  1. A patch that adds a new method instead of fixing the old one is sometimes called an "additive workaround." When is that acceptable? When is it wrong?

  2. The Apache Tez review process requires that every patch include a test that would have failed before the fix and passes after. Does this patch satisfy that requirement? Why or why not?

  3. If you were the committer, what feedback would you leave on JIRA? Write two or three sentences in the style of a real review comment (constructive, specific, pointing at the line).

  4. Look up a real Tez JIRA review thread (search issues.apache.org/jira for project = TEZ AND labels = patch-available AND resolution = Fixed). Find one comment where a committer asked for a test change. What did they say?