Code Style & Trust

The Tez project enforces a specific code style via checkstyle. The style itself is less interesting than the trust mechanism it embodies: an automated, opinionated style is how a project of dozens of committers and hundreds of contributors keeps its codebase coherent without requiring every reviewer to argue about braces.

This chapter is the practical guide to the style, the tools that enforce it, and the trust ladder a contributor climbs from first patch to commit bit.

Where the Style Lives

Tez's checkstyle configuration:

cat ~/tez-src/tez-tools/src/main/resources/tez/checkstyle.xml

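This file is the source of truth. If a reviewer says "your patch fails checkstyle," they mean a rule in this file was violated. If the path differs in your checkout, find ~/tez-src -name "checkstyle*.xml" will locate it.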

The parent pom.xml wires this file into the maven-checkstyle-plugin configuration:

grep -A10 "maven-checkstyle-plugin" ~/tez-src/pom.xml

Verify locally:

cd ~/tez-src
mvn checkstyle:check

On success the build reports no violations and ends with BUILD SUCCESS (exit code 0). On failure it lists each violation with its file and line number and exits non-zero.

The Rules That Matter

The full ruleset is the file above. The rules that catch contributors most often:

Rule                                          What it enforces
Line length                                   Usually 120 chars max
Indentation                                   2 spaces (not 4, not tabs)
Imports                                       No wildcard imports; specific order
Brace style                                   Egyptian ({ on same line)
Unused imports                                Disallowed
Member ordering                               Static fields, instance fields, constructors, methods
Trailing whitespace                           Disallowed
Final newline                                 Required
@Override annotations                         Required when overriding
Javadoc on public methods of @Public classes  Required

Notable absences from the ruleset:

  • Tez does not enforce a strict naming convention beyond standard Java (camelCase, PascalCase for classes).
  • Tez does not enforce method length limits (so committers must catch overly long methods in review).
  • Tez does not enforce strict cyclomatic complexity (same).

So checkstyle is a floor, not a ceiling. Passing it doesn't mean the patch is well-styled in the human sense — it means the obvious mechanical violations are absent.
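As an illustration of the rules listed above, here is a minimal sketch of a class laid out the way the table describes: 2-space indentation, braces on the same line, explicit imports, static fields before instance fields before constructors before methods, and @Override where a method is overridden. TaskNameRegistry is a hypothetical class invented for this example, not part of Tez, and the sketch assumes the table reflects the actual ruleset; checkstyle.xml remains the authority.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical example class (not part of Tez) laid out to follow the
 * mechanical rules described in this chapter.
 */
class TaskNameRegistry {

  // Static fields come first.
  private static final int MAX_NAMES = 3;

  // Instance fields follow.
  private final List<String> names = new ArrayList<>();

  // Then constructors.
  TaskNameRegistry(String firstName) {
    names.add(firstName);
  }

  /**
   * Adds a task name unless the registry is already full.
   *
   * @param name the task name to record
   * @return true if the name was stored, false if the registry was full
   */
  boolean add(String name) {
    if (names.size() >= MAX_NAMES) {
      return false;
    }
    names.add(name);
    return true;
  }

  /** Returns a read-only view of the recorded names. */
  List<String> getNames() {
    return Collections.unmodifiableList(names);
  }

  @Override
  public String toString() {
    return "TaskNameRegistry" + names;
  }
}
```

Nothing here is clever, which is the point: a reviewer can skim the layout and spend attention on behaviour instead.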

IDE Setup

Configure your IDE to match. IntelliJ:

1. File → Settings → Editor → Code Style → Java.
2. Set Tab size: 2; Indent: 2; Continuation indent: 4.
3. Use spaces, not tabs.
4. Wrapping: hard wrap at 120.
5. Import → Class count to use import with '*': 999.
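6. Final newline: this setting lives outside Code Style. Under Editor → General, enable the option that ensures every saved file ends with a line break (the exact label varies by IntelliJ version).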

Or import the Hadoop / Tez IntelliJ style file if one is in the repo:

find ~/tez-src -name "*.xml" | xargs grep -l "CodeStyle" 2>/dev/null | head

Eclipse: Window → Preferences → Java → Code Style → Formatter, import an XML if one is provided in tez-tools/.

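VS Code with the Java extension: edit .vscode/settings.json per workspace. The java.format.settings.url entry applies only if the repo actually ships an Eclipse formatter file; point it at the real path or drop the line otherwise: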

{
  "java.format.settings.url": "tez-tools/src/main/resources/tez/eclipse-formatter.xml",
  "editor.tabSize": 2,
  "editor.insertSpaces": true,
  "files.insertFinalNewline": true,
  "files.trimTrailingWhitespace": true
}

The goal: at save time, your IDE produces checkstyle-passing code.

Catching Violations Pre-Submit

The pre-submit script (from Patch Quality):

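#!/usr/bin/env bash
# Pre-submit checks; stop at the first failing step.
set -euo pipefail
cd ~/tez-src
mvn install -DskipTests                # build and install all modules
mvn checkstyle:check                   # mechanical style violations
git diff --check                       # whitespace errors and conflict markers
mvn test -pl tez-dag,tez-api           # unit tests for the listed modules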

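git diff --check is a free win: it catches trailing whitespace and leftover conflict markers before they reach the reviewer. With no arguments it inspects only unstaged changes; git diff --check HEAD also covers staged ones.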

The Trust Ladder

Style is the visible surface of a deeper thing: trust. The contributor-to-committer path is a multi-step climb up a trust ladder.

Step 0: Anonymous reader.
        Reads the codebase.
        Trust: none required.

Step 1: First-time contributor (Javadoc fix).
        Patch passes mechanical checks.
        Trust to receive: a few minutes of review attention.

Step 2: Multi-patch contributor.
        Several patches in over weeks/months.
        Trust to receive: a sympathetic reviewer who will guide.
        Trust to give: explain your reasoning on JIRA without being asked.

Step 3: Repeat contributor in one area.
        Becomes recognised as an expert in that area.
        Trust to receive: your +1 (non-binding) carries weight on patches in that area.
        Trust to give: stay engaged on follow-up issues.

Step 4: Reviewer.
        Provides non-binding +1 on others' patches with insight.
        Trust to receive: PMC members notice.
        Trust to give: your reviews must be substantive, not drive-by +1s.

Step 5: Committer (the bit).
        Granted by PMC vote on private@.
        Trust to receive: commit access to apache/tez.
        Trust to give: review patches in your areas, mentor newcomers, attend to dev@.

Step 6: PMC member.
        Granted later, after sustained committership.
        Trust to receive: binding release vote, security-disclosure access.
        Trust to give: stewardship duties (legal, brand, community, releases).

Each step takes months of consistent engagement. The ladder is asymmetric: the contribution required to climb each step grows roughly linearly, but the trust granted grows roughly exponentially.

Patterns Committers Want

Beyond mechanical style, certain patterns mark a patch as "from someone who gets it":

Use the existing logging idiom

private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);

// Then in method:
LOG.info("Initialized vertex {} with {} tasks", vertexName, numTasks);

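Not System.out.println. Not LOG.info("Initialized vertex " + vertexName + ...) either: concatenation builds the full string before the call, even when INFO is disabled. The parameterized form hands SLF4J the template and the arguments, and the message is formatted only if the level is enabled.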
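The cost difference can be made visible with a small JDK-only sketch. MiniLog below is a hypothetical stand-in written for this example; it is not SLF4J and only imitates the "{}" placeholder behaviour closely enough to count how often an argument's toString() runs when INFO is off. CountingName and LoggingCostDemo are likewise invented names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a logger; not SLF4J, used only for this demo. */
class MiniLog {
  private final boolean infoEnabled;
  private final StringBuilder out = new StringBuilder();

  MiniLog(boolean infoEnabled) {
    this.infoEnabled = infoEnabled;
  }

  /** Replaces each "{}" with the next argument, but only when INFO is enabled. */
  void info(String template, Object... args) {
    if (!infoEnabled) {
      return; // arguments are never converted to strings
    }
    StringBuilder sb = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = template.indexOf("{}", from);
      if (at < 0) {
        break;
      }
      sb.append(template, from, at).append(arg);
      from = at + 2;
    }
    sb.append(template.substring(from));
    out.append(sb).append('\n');
  }

  String output() {
    return out.toString();
  }
}

/** Counts how often toString() runs, to make the formatting cost visible. */
class CountingName {
  static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();
  private final String name;

  CountingName(String name) {
    this.name = name;
  }

  @Override
  public String toString() {
    TO_STRING_CALLS.incrementAndGet();
    return name;
  }
}

class LoggingCostDemo {
  /** Returns {calls with concatenation, calls with parameterized form} when INFO is off. */
  static int[] toStringCallsWithInfoOff() {
    MiniLog log = new MiniLog(false);
    CountingName vertex = new CountingName("v1");

    CountingName.TO_STRING_CALLS.set(0);
    log.info("Initialized vertex " + vertex); // message built before the call
    int concatenated = CountingName.TO_STRING_CALLS.get();

    CountingName.TO_STRING_CALLS.set(0);
    log.info("Initialized vertex {}", vertex); // formatting skipped entirely
    int parameterized = CountingName.TO_STRING_CALLS.get();

    return new int[] {concatenated, parameterized};
  }
}
```

With INFO disabled, the concatenated call still converts the argument once, while the parameterized call does no string work at all. On a hot path with expensive toString() implementations, that difference adds up.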

Use existing helper classes

If the codebase already has a helper, such as a TezUtils method for serialising a config to a byte buffer, use it. Don't write a new helper inline. Search across modules, since helpers are not confined to tez-common:

grep -rn --include="*.java" "class.*Utils" ~/tez-src

Match the surrounding file's style for ambiguous things

If the file uses final on every parameter, your additions should too. If the file uses single-letter loop variables (for (int i = 0; ...)), don't suddenly switch to for (int taskIndex = 0; ...). Match the file.

Avoid speculative generality

Don't introduce an interface "in case we need a second implementation later." Don't add a configuration key "in case someone wants to tune this." Both increase the surface area the committer pool must maintain forever.

Cite the JIRA in non-obvious code

// TEZ-4321: handle the case where inputs is null after recovery.
if (inputs == null) {
  inputs = Collections.emptyList();
}

The comment is a permanent breadcrumb back to the design discussion.

Keep try/catch narrow

// Good
try {
  state = readState();
} catch (IOException e) {
  LOG.warn("Failed to read state for {}", id, e);
  return defaultState();
}

// Bad: catches too much
try {
  state = readState();
  process(state);              // <-- different exception domain
  publish(state);              // <-- different exception domain
} catch (Exception e) {        // <-- swallows everything
  LOG.error("Something failed", e);
}
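The contrast above can be made runnable. The following is a minimal JDK-only sketch; NarrowCatchDemo, readState, and process are hypothetical names chosen for this example, not Tez code. It shows what the broad catch costs: a failure in the processing step is silently turned into the same fallback as a failed read.

```java
import java.io.IOException;
import java.util.Locale;

class NarrowCatchDemo {
  /** Hypothetical reader that fails with an I/O error when asked to. */
  static String readState(boolean failRead) throws IOException {
    if (failRead) {
      throw new IOException("disk unavailable");
    }
    return "RUNNING";
  }

  /** Hypothetical processing step with its own, unrelated failure mode. */
  static String process(String state, boolean failProcess) {
    if (failProcess) {
      throw new IllegalStateException("cannot process " + state);
    }
    return state.toLowerCase(Locale.ROOT);
  }

  /** Narrow: only the read is guarded; processing errors propagate to the caller. */
  static String narrow(boolean failRead, boolean failProcess) {
    String state;
    try {
      state = readState(failRead);
    } catch (IOException e) {
      state = "DEFAULT"; // fall back only for the failure we understand
    }
    return process(state, failProcess);
  }

  /** Broad: every failure collapses into one fallback, hiding the processing bug. */
  static String broad(boolean failRead, boolean failProcess) {
    try {
      return process(readState(failRead), failProcess);
    } catch (Exception e) {
      return "default";
    }
  }
}
```

Calling narrow(false, true) surfaces the IllegalStateException to the caller, whereas broad(false, true) returns "default" as if nothing unusual had happened.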

Don't add @SuppressWarnings without justification

// Bad
@SuppressWarnings("unchecked")
public List<T> getStuff() { ... }

// Good
@SuppressWarnings("unchecked") // safe; we control all writers
public List<T> getStuff() { ... }

A bare @SuppressWarnings is a code smell that says "I didn't want to deal with the real warning."

Use specific exception types in throws

// Bad
public DAG build() throws Exception { ... }

// Good
public DAG build() throws TezException, IOException { ... }

throws Exception defeats the type system. Reviewers will ask for specifics.
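A short sketch of what specific types buy the caller. PlanException is a hypothetical checked exception standing in for TezException so that the block compiles with the JDK alone; PlanBuilder and its methods are also invented for this example.

```java
import java.io.IOException;

/** Hypothetical checked exception standing in for TezException. */
class PlanException extends Exception {
  PlanException(String message) {
    super(message);
  }
}

class PlanBuilder {
  /** Declares exactly what can go wrong, so callers can react per failure type. */
  static String build(String spec) throws PlanException, IOException {
    if (spec == null) {
      throw new IOException("spec could not be read");
    }
    if (spec.isEmpty()) {
      throw new PlanException("spec is empty");
    }
    return "plan:" + spec;
  }

  /** The caller treats each failure differently: one is invalid input, one is retryable. */
  static String buildOrExplain(String spec) {
    try {
      return build(spec);
    } catch (PlanException e) {
      return "invalid: " + e.getMessage();
    } catch (IOException e) {
      return "retry: " + e.getMessage();
    }
  }
}
```

Had build declared throws Exception, the compiler could no longer tell the caller which failures to expect, and the two catch blocks would tend to collapse into one catch (Exception e).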

How Trust Is Withdrawn

Trust is built one patch at a time; it can also erode. Things that erode committer trust in a contributor:

Behavior                                                  Erosion
Ghosting a patch mid-review                               Significant; reviewer's time wasted
Re-attaching the same patch without addressing comments   Significant; wastes another review cycle
Arguing without evidence                                  Moderate; teaches reviewer to expect friction
Pinging weekly                                            Moderate; reviewer learns to deprioritise
Submitting a patch that breaks tests                      Mild if rare; serious if pattern
Committing your own patch without review (as committer)   Serious; loss of community trust
Reverting another committer's work without discussion     Very serious; potential PMC issue
Public criticism of a committer for their review          Very serious

The recoverable: explain, apologise, address the underlying issue. Trust returns.

The non-recoverable: code-of-conduct violations. PMC handles these privately.

From First Patch to Commit Bit — The Arc

A realistic 12-month arc for a contributor on the path:

Month 1   First Javadoc fix.   Review takes 2 weeks (reviewer wasn't sure).
          You learn the patch generation workflow.
Month 2   Three small bug fixes.   Review faster (reviewer knows you).
          You learn checkstyle, run it pre-submit.
Month 3   Mid-sized refactor.   Two review rounds, no friction.
          You start filing follow-up JIRAs from things you notice.
Month 4-5 You review someone else's patch with a substantive +1.
          A PMC member notices on dev@.
Month 6   First design discussion on a JIRA.   You write a one-page design.
          Review goes well; consensus reached.
Month 7-8 You're patch-author on the implementation.   Three review rounds.
          Final commit feels routine.
Month 9   You shepherd a new contributor through their first patch.
          PMC notices.
Month 10  You're proposed on private@.   Vote passes.
          You're a committer.
Month 11  You commit your first patch (someone else's, reviewed by you).
          You explicitly don't commit your own work unreviewed.
Month 12  You're routine.   You review 2-3 patches a month, file 2-3.
          The flywheel.

This is one path, not the only path. Some contributors hit the bit at month 6 (extremely sustained activity); some at month 24+ (slower but steady). The trust ladder doesn't have a clock; it has a contribution count + sustained behavior pattern.

Validation Artifacts

After this chapter:

  1. Your IDE is configured to produce checkstyle-passing code at save time.
  2. Your pre-submit script runs mvn checkstyle:check and git diff --check.
  3. A ~/tez-notes/style-patterns.md listing the "patterns committers want" above.
  4. A clear-eyed estimate of where you are on the trust ladder, and what step is next.

This chapter closes the Release & PMC Reality section. The next major section, Hive-on-Tez Labs, is operational engineering at the Tez/Hive boundary — the most common production context for Tez today.