Level 2: Apache Contributor Onboarding

This level teaches you how the Apache open-source contribution machine works — not in the abstract, but in the specific context of Apache Tez. You will set up your tooling, understand the community structure, learn the patch workflow, and submit your first meaningful change.


Learning Objectives

By the end of Level 2 you must be able to:

  1. Subscribe to dev@tez.apache.org and read two weeks' worth of threads
  2. Navigate Apache Tez JIRA to find and evaluate open issues
  3. Describe the full lifecycle of a patch: from JIRA issue to committed code
  4. Generate a unified diff patch from a Git branch
  5. Run Apache checkstyle and resolve all violations before submitting a patch
  6. Write a JIRA comment that adds technical value
  7. Find any class in the Tez repository in under 30 seconds
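
For objective 7, one possible approach is a small shell helper built on find. This is only a sketch: the function name find_class is hypothetical, and it assumes you run it from the root of a Tez checkout and that the class lives in a file named after it.

```shell
# find_class NAME [ROOT]
# Prints the path of every NAME.java under ROOT (default: current directory),
# skipping Maven build output in target/ directories.
find_class() {
  find "${2:-.}" -type d -name target -prune -o -type f -name "$1.java" -print
}

# Example (from the root of a Tez checkout):
#   find_class VertexImpl
```

Inside a Git checkout, git ls-files '*VertexImpl.java' should give a similar answer by consulting the index instead of walking the directory tree.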

Apache Open-Source Contribution Fundamentals

Apache projects operate differently from GitHub-native open-source projects. The primary communication channels are mailing lists, not GitHub issues or Slack. Patches are attached to JIRA issues, not submitted as GitHub pull requests (some projects accept GitHub PRs as a convenience, but Tez still prefers the JIRA-based workflow).

The Contribution Hierarchy

PMC (Project Management Committee)
  └─ Committers (can commit directly)
       └─ Contributors (submit patches via JIRA)
            └─ Everyone else (can file issues, ask questions)

Becoming a contributor means submitting patches. Becoming a committer means sustained, high-quality contributions over time that earn the trust of existing committers.

The Patch Lifecycle

1. Find or file a JIRA issue
2. Leave a comment: "I'm looking into this"
3. Make changes on a local branch
4. Run: mvn test -pl <module> -am  (must pass)
5. Run: mvn checkstyle:check -pl <module>  (must pass)
6. Generate a patch: git diff origin/master > TEZ-NNNN.001.patch
7. Attach the patch to the JIRA issue
8. Set JIRA status to "Patch Available"
9. Wait for review — a committer will comment or set "Reviewed" or "Not a bug"
10. Address feedback → upload v2 patch → repeat
11. Committer commits the patch (you cannot commit yourself until you are a committer)
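
Steps 4 to 6 can be chained so that a patch file is only written when the tests and checkstyle both pass. The sketch below is illustrative, not an official Tez script: patch_name and make_patch are hypothetical helper names, and the zero-padded version suffix follows the TEZ-NNNN.001.patch naming used later in this level.

```shell
# patch_name ISSUE VERSION
# Prints a versioned patch file name, e.g. version 1 gives ISSUE.001.patch.
# VERSION must be a plain integer without leading zeros.
patch_name() {
  printf '%s.%03d.patch\n' "$1" "$2"
}

# make_patch ISSUE VERSION MODULE
# Runs the tests and checkstyle for MODULE, then writes the diff against
# origin/master to a versioned patch file. Stops at the first failing step.
make_patch() {
  mvn test -pl "$3" -am &&
    mvn checkstyle:check -pl "$3" &&
    git diff origin/master > "$(patch_name "$1" "$2")"
}

# Example:
#   make_patch TEZ-NNNN 1 tez-dag
```

Note that git diff origin/master does not include untracked files, so run git add on any new file before generating the patch.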

Required Reading

# | Resource | What to extract
1 | Apache Tez Contributing | The official contribution guide
2 | Apache JIRA for Tez | Browse recent issues to understand what active work looks like
3 | dev@tez.apache.org archives | Read 2 weeks of mailing list threads at https://lists.apache.org/list.html?dev@tez.apache.org
4 | src/config/checkstyle.xml in the Tez repo | What style rules are enforced
5 | Apache How It Works | Meritocracy, governance, why Apache operates the way it does
6 | Any 3 recently closed Tez patches | Read the JIRA comment thread and observe how committers give feedback

Source Code Areas to Inspect

File | Why
pom.xml (root) | Module structure, dependency management, build profiles
tez-dag/pom.xml | Module-level dependency declarations
src/config/checkstyle.xml | Style rules enforced on every patch
src/config/checkstyle-suppressions.xml | Suppressions: which files are exempt and why
.gitignore | What is excluded from version control
Any recently committed file | Read the commit message format

Apache Tez JIRA Structure

Issue Types You Will Encounter

Type | Description
Bug | A defect in behavior
Improvement | An enhancement to existing functionality
New Feature | Something that does not exist yet
Task | Non-code work (documentation, release, etc.)
Sub-task | Part of a larger issue
Test | Adding or fixing a test

Priority Levels

Priority | Meaning
Blocker | Prevents a release
Critical | Significant data loss or correctness risk
Major | Important but not release-blocking
Minor | Small issue or improvement
Trivial | Typo, cosmetic, minor cleanup

For Level 2 contributors: Only work on Minor and Trivial issues. Do not pick up Major or higher issues until you have at least 3 accepted patches in the project.
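
To find candidate issues that match this rule, a JQL query can be pasted into JIRA's advanced search. The helper below simply prints one such query. It is a sketch under assumptions: the function name starter_jql is hypothetical, the project key TEZ is inferred from the TEZ-NNNN issue ids, and the status name Open may differ from what the Tez workflow actually uses.

```shell
# starter_jql
# Prints a JQL query for open, unassigned Minor/Trivial issues in the TEZ
# project, newest first. The project key and status name are assumptions.
starter_jql() {
  printf '%s\n' 'project = TEZ AND status = Open AND priority in (Minor, Trivial) AND assignee is EMPTY ORDER BY created DESC'
}

# Example: print the query, then paste it into the advanced search box.
#   starter_jql
```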

Component Labels

JIRA issues are labeled by component. The most relevant for early contributors:

Component | What it covers
Tez-DAG | DAG execution, AM, state machines
Tez-Runtime | I/O library, shuffle
Tez-API | Public API (high stability required)
Documentation | Docs, Javadoc, website
Tests | Test additions and fixes

Mailing List Etiquette

How to Subscribe

# Send an empty email to:
dev-subscribe@tez.apache.org
# You will receive a confirmation email — reply to it

What to Read First

Do not post until you have read at least two weeks of threads. Understand:

  • What issues are currently being discussed
  • How committers respond to patches
  • The tone and technical depth expected
  • What questions get quick responses vs. what gets ignored

How to Ask a Question

Good question format:

Subject: [QUESTION] Understanding VertexImpl initialization flow

Hi dev@,

I'm trying to understand the initialization sequence in VertexImpl.
Specifically, I'm looking at the transition from INITIALIZING to INITED
in VertexImpl.java around line 1234.

The code calls rootInputInitializer() before transitioning, but I'm unclear
on what happens if an initializer throws an unchecked exception.

I've read the JIRA issue TEZ-XXXX and the associated commit, but I still
have this question. Can anyone point me to the relevant code path?

Thanks,
[Your name]

What makes this question good:

  • Specific class and approximate line number
  • State machine terminology used correctly
  • References prior research
  • Concrete question, not "how does Tez work?"

What makes a question bad:

  • "How do I contribute?" — this is answered in the contributing guide
  • "Can you explain how shuffle works?" — too broad; you should read the code first
  • Posting before subscribing and reading archives

Apache Checkstyle

Tez enforces checkstyle on every patch. A patch that fails checkstyle will not be committed.

Running Checkstyle

# Check a specific module
mvn checkstyle:check -pl tez-dag

# Check all modules (slow)
mvn checkstyle:check

# Generate an XML report and inspect the violations
mvn checkstyle:checkstyle -pl tez-dag
open tez-dag/target/checkstyle-result.xml   # "open" is macOS; use xdg-open on Linux
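
The XML report is hard to read directly when a module has many violations. As a rough aid, the sketch below counts violations per check. The function name checkstyle_summary is hypothetical, and the code assumes the usual Checkstyle report layout in which each error element carries a source attribute holding the fully qualified name of the check class.

```shell
# checkstyle_summary REPORT
# Counts violations per check in a Checkstyle XML report, most frequent first.
# Assumes each <error> element has a source="fully.qualified.CheckName" attribute.
checkstyle_summary() {
  grep -o 'source="[^"]*"' "$1" |
    sed -e 's/^source="//' -e 's/"$//' -e 's/.*\.//' |
    sort | uniq -c | sort -rn
}

# Example:
#   checkstyle_summary tez-dag/target/checkstyle-result.xml
```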

Common Violations

Violation | Cause | Fix
UnusedImports | Import statement for an unused class | Remove the import
LineLength | Line exceeds 100 characters | Break the line
WhitespaceAround | Missing space around operator | Add space
LeftCurly | { on wrong line | Move to end of previous line
JavadocMethod | Public method missing Javadoc | Add /** ... */ block
FinalClass | Utility class not declared final | Add final modifier
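
Some of these violations can be caught before Maven is even started. As one example, the sketch below lists lines longer than a given limit using awk. The function name long_lines is hypothetical, and the result is only an approximation of the LineLength check: awk counts a tab as one character, which may differ from how Checkstyle measures it.

```shell
# long_lines LIMIT FILE...
# Prints file:line:length for every line longer than LIMIT characters.
long_lines() {
  limit=$1
  shift
  awk -v limit="$limit" \
    'length($0) > limit { print FILENAME ":" FNR ":" length($0) }' "$@"
}

# Example, using the 100-character limit from the table above
# (the path is elided; pass the file you changed):
#   long_lines 100 path/to/Changed.java
```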

JIRA Issue Categories for Level 2 Contributors

In addition to Level 1 categories, you can now attempt:

  • Test improvements — adding tests for uncovered paths you identify from reading the code
  • Logging improvements — adding LOG.debug() statements that would help diagnose issues
  • Checkstyle fixes — especially in modules you have been reading

Discipline: The quality of your first 5 patches determines how quickly you build credibility in the community. A patch with a checkstyle violation, compilation error, or test failure will be rejected immediately. Every patch must be verified locally before upload.


Deliverables

  • Subscribed to dev@tez.apache.org and can describe two active discussions
  • Apache JIRA account created
  • One JIRA issue identified, studied, and commented on (even if not yet working on it)
  • Lab 2.1 completed: module-by-module walkthrough documented
  • Lab 2.2 completed: patch generated, checkstyle passing, JIRA description written
  • Understanding of the difference between a Minor and a Trivial issue

Common Mistakes

Mistake | Consequence | Fix
Opening a GitHub PR instead of attaching a patch to JIRA | PR will likely be ignored or closed | Use JIRA; attach a .patch file
Submitting a patch that changes formatting in unrelated lines | Noise in the diff; committers reject it | Change only the lines you meant to change
Claiming an issue without leaving a JIRA comment | Another contributor may do the same work | Comment "I am investigating this" before starting
Submitting a patch without running tests | Immediate rejection | Test everything locally first
Writing a JIRA comment that just says "fix attached" | Unhelpful; committers will ask for explanation | Explain what was wrong and what the fix does
Using git commit -m "fix" | Unprofessional commit message | Format: TEZ-NNNN. Short description of change.
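
The commit message format in the last row is easy to check mechanically. The sketch below tests a subject line against that format. The function name valid_subject is hypothetical, and the regular expression is one reading of "TEZ-NNNN. Short description of change.", not an official rule.

```shell
# valid_subject SUBJECT
# Succeeds if SUBJECT looks like "TEZ-NNNN. Short description of change.":
# the issue key, a period, one space, then a non-empty description.
valid_subject() {
  printf '%s\n' "$1" | grep -Eq '^TEZ-[0-9]+\. [^ ].*$'
}

# Example: check the subject of the most recent commit.
#   valid_subject "$(git log -1 --format=%s)" || echo "fix the commit message"
```

For the unrelated-formatting mistake, git diff --check origin/master reports whitespace errors introduced by your changes, which can help catch accidental edits.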

How to Verify Success

# Your patch generates cleanly
git diff origin/master > /tmp/TEZ-NNNN.001.patch
head -20 /tmp/TEZ-NNNN.001.patch   # the start of the patch should show only your intended changes

# Checkstyle passes on the module you changed
mvn checkstyle:check -pl <changed-module>

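# Tests pass (with -am, upstream modules contain no test matching -Dtest,
# so tell Surefire not to fail on them)
mvn test -pl <changed-module> -am -Dtest=<RelevantTestClass> -Dsurefire.failIfNoSpecifiedTests=false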
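
Reading only the top of a patch can miss an unintended file further down. A quick way to review the whole patch is to list every file it touches, as in the sketch below. The function name patch_files is hypothetical, and the code assumes a unified diff produced by git diff, where each modified file has a "+++ b/path" header line. Deleted files, whose header is "+++ /dev/null", are not listed.

```shell
# patch_files PATCH
# Prints the path of every file added or modified by a git-style unified diff.
patch_files() {
  sed -n 's|^+++ b/||p' "$1" | sort -u
}

# Example:
#   patch_files /tmp/TEZ-NNNN.001.patch
```

If the list contains a file you did not mean to change, regenerate the patch after reverting that file.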

Patch Profile: Level 2 Graduate

Patch type | Example | Test requirement
Javadoc improvement | Add missing @throws annotation to a method | None
Log statement improvement | Add context to an existing LOG.warn that is unhelpful | Run the affected test class
Checkstyle fix | Fix unused import across multiple files in one module | Run mvn checkstyle:check -pl <module>
Test comment improvement | Add test setup comments explaining what MockAppContext does | Run the test class

You are not yet ready to submit behavioral code changes, new features, or bug fixes in state machines or shuffle. Continue to Level 3.