Code Style & Trust
The Tez project enforces a specific code style via checkstyle. The style itself is less interesting than the trust mechanism it embodies: an automated, opinionated style is how a project of dozens of committers and hundreds of contributors keeps its codebase coherent without requiring every reviewer to argue about braces.
This chapter is the practical guide to the style, the tools that enforce it, and the trust ladder a contributor climbs from first patch to commit bit.
Where the Style Lives
Tez's checkstyle configuration:
cat ~/tez-src/tez-tools/src/main/resources/tez/checkstyle.xml
This file is the source of truth. If a reviewer says "your patch fails checkstyle," they mean this file is unhappy.
The file is invoked by the parent pom.xml:
grep -A10 "maven-checkstyle-plugin" ~/tez-src/pom.xml
Verify locally:
cd ~/tez-src
mvn checkstyle:check
On success, Maven reports BUILD SUCCESS and exits 0. On failure, it lists each violation with file and line number and exits non-zero.
The Rules That Matter
The full ruleset is the file above. The rules that catch contributors most often:
| Rule | What it enforces |
|---|---|
| Line length | Usually 120 chars max |
| Indentation | 2 spaces (not 4, not tabs) |
| Imports | No wildcard imports; specific order |
| Brace style | Egyptian ({ on same line) |
| Unused imports | Disallowed |
| Member ordering | Static fields, instance fields, constructors, methods |
| Trailing whitespace | Disallowed |
| Final newline | Required |
| @Override annotations | Required when overriding |
| Javadoc on public methods of @Public classes | Required |
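As an illustration only, here is a small class laid out the way the rules above describe: explicit imports, 2-space indentation, braces on the same line, members ordered static fields, instance fields, constructor, methods, and @Override where a method overrides. The class VertexSummary and its MAX_TASKS cap are hypothetical; the exact limits your branch enforces are whatever checkstyle.xml says.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical value class laid out to match the style table: explicit
 * imports, 2-space indentation, Egyptian braces, and members ordered
 * static fields, instance fields, constructors, methods.
 */
final class VertexSummary {
  private static final int MAX_TASKS = 10_000;

  private final String vertexName;
  private final List<Integer> taskIds = new ArrayList<>();

  /**
   * Creates a summary for the named vertex.
   *
   * @param vertexName name of the vertex; must not be null
   */
  VertexSummary(String vertexName) {
    if (vertexName == null) {
      throw new IllegalArgumentException("vertexName must not be null");
    }
    this.vertexName = vertexName;
  }

  /**
   * Records a task id, rejecting ids outside the hypothetical MAX_TASKS cap.
   *
   * @param taskId zero-based task index
   */
  void addTask(int taskId) {
    if (taskId < 0 || taskId >= MAX_TASKS) {
      throw new IllegalArgumentException("taskId out of range: " + taskId);
    }
    taskIds.add(taskId);
  }

  /** Returns an unmodifiable view of the recorded task ids. */
  List<Integer> getTaskIds() {
    return Collections.unmodifiableList(taskIds);
  }

  @Override
  public String toString() {
    return vertexName + "[" + taskIds.size() + " tasks]";
  }
}
```

Checkstyle would accept this layout; whether the class is well designed is still a reviewer's call.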
The full list is in the file. Notable absences:
- Tez does not enforce a strict naming convention beyond standard Java (camelCase, PascalCase for classes).
- Tez does not enforce method length limits (so committers must catch overly long methods in review).
- Tez does not enforce a strict cyclomatic-complexity limit (again, reviewers must catch this).
So checkstyle is a floor, not a ceiling. Passing it doesn't mean the patch is well-styled in the human sense — it means the obvious mechanical violations are absent.
IDE Setup
Configure your IDE to match. IntelliJ:
1. File → Settings → Editor → Code Style → Java.
2. Set Tab size: 2; Indent: 2; Continuation indent: 4.
3. Use spaces, not tabs.
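4. Wrapping and Braces tab: set Hard wrap at 120.
5. Imports tab: set "Class count to use import with '*'" (and "Names count to use static import with '*'") to 999, so the IDE never collapses imports into wildcards.
6. Editor → General: enable "Ensure every saved file ends with a line break" (and stripping of trailing spaces on save).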
Or import the Hadoop / Tez IntelliJ style file if one is in the repo:
find ~/tez-src -name "*.xml" | xargs grep -l "CodeStyle" 2>/dev/null | head
Eclipse: Window → Preferences → Java → Code Style → Formatter; import an XML formatter profile if one is provided in tez-tools/.
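VS Code with the Java extension: edit .vscode/settings.json per workspace (the formatter URL below assumes an Eclipse formatter profile exists at that path; adjust or drop it if not):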
{
  "java.format.settings.url": "tez-tools/src/main/resources/tez/eclipse-formatter.xml",
  "editor.tabSize": 2,
  "editor.insertSpaces": true,
  "files.insertFinalNewline": true,
  "files.trimTrailingWhitespace": true
}
The goal: at save time, your IDE produces checkstyle-passing code.
Catching Violations Pre-Submit
The pre-submit script (from Patch Quality):
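#!/usr/bin/env bash
set -euo pipefail                # stop at the first failing step
cd ~/tez-src
mvn install -DskipTests          # build all modules once
mvn checkstyle:check             # style violations fail the script
git diff --check                 # whitespace errors and conflict markers in unstaged changes; 'git diff --check HEAD' also covers staged ones
mvn test -pl tez-dag,tez-api     # adjust -pl to the modules your patch touches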
git diff --check is a free win: it catches trailing whitespace and conflict markers before they reach the reviewer.
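To make "mechanical" concrete: the checks below (tabs, trailing whitespace, over-long lines, missing final newline) are the kind that git diff --check and checkstyle catch without human judgment. This is a hypothetical Java sketch for illustration, not a substitute for either tool; MechanicalCheck is an invented name, and the 120-character limit assumes the value in the table above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of purely mechanical whitespace and length checks. */
final class MechanicalCheck {
  private MechanicalCheck() {
  }

  /**
   * Returns one message per violation, using 1-based line numbers.
   *
   * @param content full text of a source file
   * @param maxLineLength longest allowed line, e.g. 120
   */
  static List<String> check(String content, int maxLineLength) {
    List<String> violations = new ArrayList<>();
    if (!content.isEmpty() && !content.endsWith("\n")) {
      violations.add("missing final newline");
    }
    // A limit of -1 keeps trailing empty strings; skip the one after a final '\n'.
    String[] lines = content.split("\n", -1);
    int count = content.endsWith("\n") ? lines.length - 1 : lines.length;
    for (int i = 0; i < count; i++) {
      String line = lines[i];
      int lineNo = i + 1;
      if (line.indexOf('\t') >= 0) {
        violations.add(lineNo + ": tab character");
      }
      if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
        violations.add(lineNo + ": trailing whitespace");
      }
      if (line.length() > maxLineLength) {
        violations.add(lineNo + ": line longer than " + maxLineLength);
      }
    }
    return violations;
  }
}
```

Because none of these checks needs judgment, a violation that reaches review spends a reviewer's attention on something a tool could have flagged.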
The Trust Ladder
Style is the visible surface of a deeper thing: trust. The contributor-to-committer path is a multi-step climb up a trust ladder.
Step 0: Anonymous reader.
Reads the codebase.
Trust: none required.
Step 1: First-time contributor (Javadoc fix).
Patch passes mechanical checks.
Trust to receive: a few minutes of review attention.
Step 2: Multi-patch contributor.
Several patches in over weeks/months.
Trust to receive: a sympathetic reviewer who will guide.
Trust to give: explain your reasoning on JIRA without being asked.
Step 3: Repeat contributor in one area.
Becomes recognised as an expert in that area.
Trust to receive: their +1 (non-binding) carries weight on patches in that area.
Trust to give: stay engaged on follow-up issues.
Step 4: Reviewer.
Provides non-binding +1 on others' patches with insight.
Trust to receive: PMC members notice.
Trust to give: your reviews must be substantive, not drive-by +1s.
Step 5: Committer (the bit).
Granted by PMC vote on private@.
Trust to receive: commit access to apache/tez.
Trust to give: review patches in your areas, mentor newcomers, attend to dev@.
Step 6: PMC member.
Granted later, after sustained committership.
Trust to receive: binding release vote, security-disclosure access.
Trust to give: stewardship duties (legal, brand, community, releases).
Each step takes months of consistent engagement. The ladder is asymmetric: the contribution required to climb each step grows roughly linearly, but the trust granted grows roughly exponentially.
Patterns Committers Want
Beyond mechanical style, certain patterns mark a patch as "from someone who gets it":
Use the existing logging idiom
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);
// Then in method:
LOG.info("Initialized vertex {} with {} tasks", vertexName, numTasks);
Not System.out.println. Not LOG.info("Initialized vertex " + vertexName + ...) either: Java evaluates the concatenation before the call, so the string is built even when INFO is disabled. SLF4J's parameterized form defers formatting until the logger has confirmed the level is enabled.
Use existing helper classes
If tez-common has a TezUtils helper for serialising a config to a byte buffer, use it. Don't write a new helper inline. Search:
grep -rn "class.*Utils" ~/tez-src/tez-common/src/main/java
Match the surrounding file's style for ambiguous things
If the file uses final on every parameter, your additions should too. If the file uses single-letter loop variables (for (int i = 0; ...)), don't suddenly switch to for (int taskIndex = 0; ...). Match the file.
Avoid speculative generality
Don't introduce an interface "in case we need a second implementation later." Don't add a configuration key "in case someone wants to tune this." Both increase the surface area the committer pool must maintain forever.
Cite the JIRA in non-obvious code
// TEZ-4321: handle the case where inputs is null after recovery.
if (inputs == null) {
  inputs = Collections.emptyList();
}
The comment is a permanent breadcrumb back to the design discussion.
Keep try/catch narrow
// Good
try {
  state = readState();
} catch (IOException e) {
  LOG.warn("Failed to read state for {}", id, e);
  return defaultState();
}
// Bad: catches too much
try {
  state = readState();
  process(state);  // <-- different exception domain
  publish(state);  // <-- different exception domain
} catch (Exception e) {  // <-- swallows everything
  LOG.error("Something failed", e);
}
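A runnable version of the Good pattern, assuming a hypothetical StateLoader whose readState() can fail with IOException: only that call sits inside the try, so the fallback applies to I/O failures alone, and an IllegalStateException from process() still reaches the caller instead of being logged and forgotten. Logging is omitted to keep the sketch self-contained.

```java
import java.io.IOException;
import java.util.Locale;

/**
 * Hypothetical component illustrating a narrow try/catch: only the call that
 * can throw IOException is wrapped, so bugs in process() still surface.
 */
final class StateLoader {
  /** Demonstration flag standing in for real I/O that may fail. */
  private final boolean ioFails;

  StateLoader(boolean ioFails) {
    this.ioFails = ioFails;
  }

  String readState() throws IOException {
    if (ioFails) {
      throw new IOException("state file unreadable");
    }
    return "RUNNING";
  }

  String defaultState() {
    return "NEW";
  }

  /** Validates the state; a failure here is a programming error, not I/O. */
  String process(String state) {
    if (state.isEmpty()) {
      throw new IllegalStateException("empty state");
    }
    return state.toLowerCase(Locale.ROOT);
  }

  String load() {
    String state;
    try {
      state = readState();
    } catch (IOException e) {
      // Expected and recoverable: fall back (real code would LOG.warn here).
      state = defaultState();
    }
    // Outside the try: an IllegalStateException from process() propagates.
    return process(state);
  }
}
```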
Don't add @SuppressWarnings without justification
// Bad
@SuppressWarnings("unchecked")
public List<T> getStuff() { ... }
// Good
@SuppressWarnings("unchecked") // safe; we control all writers
public List<T> getStuff() { ... }
A bare @SuppressWarnings is a code smell that says "I didn't want to deal with the real warning."
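Scope matters as much as justification: annotating a single local variable declaration, rather than a whole method or class, keeps the suppression from hiding unrelated warnings added later. A minimal sketch, using a hypothetical BoundedStack: the unchecked generic-array cast is the classic case where the warning is expected and the reason fits in a comment.

```java
/**
 * Hypothetical fixed-capacity stack showing @SuppressWarnings scoped to a
 * single local declaration, with the justification next to it.
 */
final class BoundedStack<T> {
  private final T[] items;
  private int size;

  BoundedStack(int capacity) {
    // Java cannot create a generic array directly. The cast is safe because
    // the array never escapes this class and push() only stores T values.
    @SuppressWarnings("unchecked")
    T[] array = (T[]) new Object[capacity];
    this.items = array;
  }

  void push(T item) {
    if (size == items.length) {
      throw new IllegalStateException("stack full");
    }
    items[size++] = item;
  }

  T pop() {
    if (size == 0) {
      throw new IllegalStateException("stack empty");
    }
    T item = items[--size];
    items[size] = null;  // drop the reference so it can be garbage-collected
    return item;
  }

  int size() {
    return size;
  }
}
```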
Use specific exception types in throws
// Bad
public DAG build() throws Exception { ... }
// Good
public DAG build() throws TezException, IOException { ... }
throws Exception defeats the type system. Reviewers will ask for specifics.
How Trust Is Withdrawn
Trust is built one patch at a time; it can also erode. Things that erode committer trust in a contributor:
| Behavior | Erosion |
|---|---|
| Ghosting a patch mid-review | Significant; reviewer's time wasted |
| Re-attaching the same patch without addressing comments | Significant; wastes another review cycle |
| Arguing without evidence | Moderate; teaches reviewer to expect friction |
| Pinging weekly | Moderate; reviewer learns to deprioritise |
| Submitting a patch that breaks tests | Mild if rare; serious if pattern |
| Committing your own patch without review (as committer) | Serious; loss of community trust |
| Reverting another committer's work without discussion | Very serious; potential PMC issue |
| Public criticism of a committer for their review | Very serious |
The recoverable: explain, apologise, address the underlying issue. Trust returns.
The non-recoverable: code-of-conduct violations. PMC handles these privately.
From First Patch to Commit Bit — The Arc
A realistic 12-month arc for a contributor on the path:
| Month | What happens |
|---|---|
| 1 | First Javadoc fix. Review takes 2 weeks (the reviewer wasn't sure). You learn the patch-generation workflow. |
| 2 | Three small bug fixes. Reviews come faster (the reviewer knows you). You learn checkstyle and run it pre-submit. |
| 3 | Mid-sized refactor. Two review rounds, no friction. You start filing follow-up JIRAs for things you notice. |
| 4-5 | You review someone else's patch with a substantive +1. A PMC member notices on dev@. |
| 6 | First design discussion on a JIRA. You write a one-page design; review goes well and consensus is reached. |
| 7-8 | You're patch author on the implementation. Three review rounds; the final commit feels routine. |
| 9 | You shepherd a new contributor through their first patch. The PMC notices. |
| 10 | You're proposed on private@. The vote passes; you're a committer. |
| 11 | You commit your first patch (someone else's, reviewed by you). You explicitly don't commit your own work unreviewed. |
| 12 | You're routine: you review 2-3 patches a month and file 2-3. The flywheel turns. |
This is one path, not the only path. Some contributors earn commit access by month 6 (through extremely sustained activity); others at month 24 or later (slower but steady). The trust ladder doesn't run on a clock; it tracks a count of contributions plus a sustained pattern of behavior.
Validation Artifacts
After this chapter:
- Your IDE is configured to produce checkstyle-passing code at save time.
- Your pre-submit script runs mvn checkstyle:check and git diff --check.
- A ~/tez-notes/style-patterns.md listing the "patterns committers want" above.
- A clear-eyed estimate of where you are on the trust ladder, and what step is next.
This chapter closes the Release & PMC Reality section. The next major section, Hive-on-Tez Labs, is operational engineering at the Tez/Hive boundary — the most common production context for Tez today.