Stage 2 — Build and Logging Hygiene
What this stage teaches
Stage 2 teaches the smallest patches that touch running production code: build
metadata (pom.xml), logging idioms, and dependency hygiene. You learn:
- How Tez's dependency version bands work and which bumps are safe within a minor line.
- The
slf4j-api+log4j(orreload4j) logging stack as wired intez-common, and the four idioms reviewers actively enforce. - How to remove deprecated Guava and Hadoop calls without breaking older Hadoop consumers in the supported compatibility band.
- How to triage log-level mismatches: messages logged at
INFOthat should beDEBUG(and the reverse).
The patches are still small (5–80 lines) and the risk surface is small, but they
go into the AM and the runtime tasks. A LOG.info in ShuffleManager that fires
once per fetch will be seen by every operator running Hive-on-Tez.
JIRA filter to find candidates
project = TEZ
AND resolution = Unresolved
AND (summary ~ "logging" OR summary ~ "deprecated" OR summary ~ "guava"
OR summary ~ "bump" OR summary ~ "upgrade dependency"
OR summary ~ "System.out" OR description ~ "isDebugEnabled")
ORDER BY updated DESC
A second sweep for dependency bumps that the build flags:
project = TEZ AND component in (build) AND status = Open ORDER BY priority DESC
You can also generate candidates by running OWASP / dependency-check:
cd ~/tez-src
mvn -pl tez-common dependency:tree -DoutputType=text | grep -E "guava|jackson|netty"
Any line that flags a Guava 12.x in transitive scope is a Stage 2 candidate, because Tez has been on Guava-shaded internals for years.
Walked example A — System.out.println in production code
Symptom: a grep finds three stray System.out.println calls in
tez-runtime-library. They were left over from a debugging session and now show
up in NodeManager stdout logs, polluting operator dashboards.
Step 1 — Find every offender
cd ~/tez-src
grep -rn "System\.out\.println\|System\.err\.println" \
tez-runtime-library/src/main/java tez-runtime-internals/src/main/java tez-dag/src/main/java \
| grep -v "/test/" | grep -v "examples"
Each hit is a separate JIRA candidate (one stage-2 patch per ticket). Pick one, file the JIRA, claim it.
Step 2 — The diff
Suppose the offender is in tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java:
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java
@@
- System.out.println("Spill " + numSpills + " starting, size=" + buffer.position());
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Spill {} starting, size={}", numSpills, buffer.position());
+ }
Three rules in one diff:
- Replace
System.outwith the class's existing slf4jLOG. If the file does not have one, addprivate static final Logger LOG = LoggerFactory.getLogger(...)at the top. - Use slf4j
{}placeholders, not string concatenation. The placeholder form avoids constructing the message string when the log level is filtered out. - Wrap the call in
LOG.isDebugEnabled()only when the argument list does non-trivial work (atoString()on a large object, a list copy, a.size()on a synchronized collection). Pure references (numbers, already-bound strings) do not need the guard.
The third rule is the one reviewers nitpick most. The placeholder form already
defers toString(), so a guard around a plain LOG.debug("foo {}", x) where x
is an int is unnecessary noise. But this:
LOG.debug("Pending {}", scheduledTasks); // scheduledTasks.toString() is expensive
does benefit from a guard, because scheduledTasks will be toString()-ed
before slf4j forms the message.
Step 3 — Verify the build
mvn -pl tez-runtime-library -am clean install -DskipTests -Phadoop28 -q
mvn -pl tez-runtime-library checkstyle:check -q
There is no easy unit test for "no System.out left behind." The reviewer signal
is a clean grep across the changed file plus a green checkstyle run.
Walked example B — pom.xml dep bump within the compat band
Symptom: jackson-databind 2.12.x has a known CVE; Tez is pinned to 2.12.6 in the
parent POM. The compatibility band for the 0.10.x line allows bumps within the
2.12.* range.
Step 1 — Find the pin
cd ~/tez-src
grep -n "jackson-databind\|jackson.version\|jackson-core" pom.xml
Result, abbreviated:
pom.xml:178: <jackson.version>2.12.6</jackson.version>
Most jackson artifacts in Tez are governed by ${jackson.version} in the parent
POM. That is the only string you change.
Step 2 — The diff
--- a/pom.xml
+++ b/pom.xml
@@
- <jackson.version>2.12.6</jackson.version>
+ <jackson.version>2.12.7.1</jackson.version>
That is the entire patch. The harder part is justifying it.
Step 3 — The JIRA description
Summary: Bump jackson-databind from 2.12.6 to 2.12.7.1
Description:
2.12.6 is affected by CVE-YYYY-NNNN. 2.12.7.1 is the latest patch on the 2.12
line and is API-compatible per the jackson maintainers' compat notes. We do not
bump to 2.13 / 2.14 here to keep Hive-on-Tez compatibility unchanged.
Verification:
mvn clean install -DskipTests -Phadoop28
mvn -pl tez-dag test -Dtest=TestDAGImpl
mvn -pl tez-runtime-library test -Dtest=TestShuffleManager
Step 4 — Why "within the compat band" matters
If you bumped to 2.14, you would break Hive 3.x users who ship 2.13. A 2.12 → 2.12.7.1 bump is a one-line patch. A 2.12 → 2.14 bump is a six-month compatibility argument and lives in Stage 11. Stay on rung.
Walked example C — log-level mismatch
Symptom: a user reports their NodeManager logs are at 100GB/day. Investigation
shows Fetcher is logging every single shuffle fetch at INFO:
LOG.info("Fetcher " + id + " connecting to " + host + ":" + port);
That message fires per attempt per source per fetch. For a 10k-task vertex it is catastrophic.
Diff
--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
@@
- LOG.info("Fetcher " + id + " connecting to " + host + ":" + port);
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Fetcher {} connecting to {}:{}", id, host, port);
+ }
Rules for INFO → DEBUG demotions:
- The message fires more than once per task attempt → almost always
DEBUG. - The message fires once per DAG lifecycle event (DAG start, vertex committed,
task killed by user) → keep at
INFO. - The message fires per exception → keep at
WARNorERRORper the existing level, never demote silently. - Never demote a log without dev@ confirmation if the message references a contract event (state transition, container release). Operators rely on those for postmortems.
The Fetcher example is uncontroversial; a LOG.info on every state transition in
VertexImpl is not — that would be Stage 4.
Pitfalls
- Don't introduce a logger dependency change in a logging patch. If the file
imports
org.apache.commons.logging.Log, do not migrate it to slf4j in this patch. That migration is a separate JIRA and a much larger surface area. - Don't use
Throwable.printStackTrace()even in tests. Reviewers will flag it. UseLOG.error("msg", t)instead. - Don't bump a dep across a major version line in a Stage 2 patch. That is Stage 11.
- Don't
mvn versions:use-latest-releasesand submit the resulting diff. The bump must be justified per artifact with the CVE or the bug being fixed. - Don't remove deprecated Guava calls by adding new Guava calls. The Tez
trajectory is off Guava in public code. Replace
Preconditions.checkNotNullwithObjects.requireNonNull(JDK 7+) — not with a different Guava class. - Don't add a
LOG.debugguard around a string literal.LOG.debug("hello")needs no guard.
Exit criteria — when you're ready for the next stage
Move on when:
- You have shipped at least two logging-cleanup patches and one pom dep bump.
- You can explain, without looking it up, when to add
LOG.isDebugEnabled()and when not to. - You have read
tez-common/src/main/java/org/apache/tez/common/CallableWithNdc.javaand understand the NDC pattern used to attach DAG/Vertex IDs to log messages — that knowledge is the bridge to Stage 3. - Your last patch was reviewed and merged without a "split this into two JIRAs" comment.
Stage 3 layers on top: you keep the same surgical patch style, but now you make the content of error messages tell the operator which DAG and which vertex.
Appendix — finding logging hygiene candidates yourself
The JIRA filter at the top of this stage may return zero results during quiet periods. When that happens, you can manufacture candidates yourself with two grep patterns that have a high signal-to-noise ratio.
Pattern A — unguarded string concatenation in LOG.debug
cd ~/tez-src
grep -rn 'LOG\.debug(.*+' --include="*.java" tez-dag tez-runtime-internals \
tez-runtime-library | grep -v isDebugEnabled | head -30
This finds calls of the form LOG.debug("got " + counter) that allocate the
concatenated string unconditionally. Pick one, wrap in if (LOG.isDebugEnabled()), attach to a JIRA.
Pattern B — LOG.info calls with high call-site frequency
cd ~/tez-src
grep -rn 'LOG\.info' --include="*.java" tez-runtime-library | wc -l
grep -rn 'LOG\.info' --include="*.java" tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle | head
The shuffle path runs per-fetch — any LOG.info there fires hundreds of
thousands of times per DAG. Most are candidates for demotion to DEBUG.
Pattern C — pom files referencing pinned old versions
cd ~/tez-src
grep -rn "<version>" --include="pom.xml" | grep -E "jackson|commons-|guava|netty" \
| grep -v -- "-test" | head -20
Cross-reference against the latest patch release on the package's GitHub releases page. If your match is two patch versions behind and the changelog mentions a security fix, you have a Stage 2 candidate.
The bar for these "self-found" candidates is the same: file a JIRA before
coding, attach a 001 patch, wait for review.