Level 2: Apache Contributor Onboarding
This level teaches you how the Apache open-source contribution machine works — not in the abstract, but in the specific context of Apache Tez. You will set up your tooling, understand the community structure, learn the patch workflow, and submit your first meaningful change.
Learning Objectives
By the end of Level 2 you must be able to:
- Subscribe to dev@tez.apache.org and read a week's worth of threads
- Navigate Apache Tez JIRA to find and evaluate open issues
- Describe the full lifecycle of a patch: from JIRA issue to committed code
- Generate a unified diff patch from a Git branch
- Run Apache checkstyle and resolve all violations before submitting a patch
- Write a JIRA comment that adds technical value
- Find any class in the Tez repository in under 30 seconds
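One way to practice the last objective from a terminal is a small lookup helper. The sketch below assumes you run it from the root of a Tez checkout. `find_class` is a hypothetical name for illustration, not part of the Tez tooling.

```shell
# find_class NAME [ROOT]: print the path of every NAME.java under ROOT.
# Hypothetical helper; ROOT defaults to the current directory.
# Build output under target/ is skipped so only source files are listed.
find_class() {
  find "${2:-.}" -type f -name "$1.java" ! -path '*/target/*'
}
```

For example, `find_class VertexImpl` lists every source file with that class name. An IDE's "open type" shortcut does the same job if you prefer not to leave the editor.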
Apache Open-Source Contribution Fundamentals
Apache projects operate differently from GitHub-native open-source projects. The primary communication channels are mailing lists, not GitHub issues or Slack. Patches are attached to JIRA issues rather than submitted as GitHub pull requests. Some Apache projects accept GitHub PRs as a convenience, but Tez still prefers the JIRA-based workflow.
The Contribution Hierarchy
PMC (Project Management Committee)
└─ Committers (can commit directly)
   └─ Contributors (submit patches via JIRA)
      └─ Everyone else (can file issues, ask questions)
Becoming a contributor means submitting patches. Becoming a committer means sustained, high-quality contributions over time that earn the trust of existing committers.
The Patch Lifecycle
1. Find or file a JIRA issue
2. Leave a comment: "I'm looking into this"
3. Make changes on a local branch
4. Run: mvn test -pl <module> -am (must pass)
5. Run: mvn checkstyle:check -pl <module> (must pass)
6. Generate a patch: git diff origin/master > TEZ-NNNN.patch
7. Attach the patch to the JIRA issue
8. Set JIRA status to "Patch Available"
9. Wait for review: a committer will comment or set the status to "Reviewed" or "Not a bug"
10. Address feedback → upload v2 patch → repeat
11. Committer commits the patch (you cannot commit yourself until you are a committer)
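Steps 4 to 6 are purely mechanical, so they can be chained so that a patch file is only written when the tests and checkstyle both pass. The following is a minimal sketch, not an official Tez script. `prepare_patch` and the `DRY_RUN` switch are hypothetical names introduced here, and the patch is assumed to be written to the current directory.

```shell
# prepare_patch ISSUE MODULE: run steps 4-6 of the lifecycle for one module.
# Hypothetical helper. With DRY_RUN=1 it only prints the commands it would run.
prepare_patch() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "mvn test -pl $2 -am"
    echo "mvn checkstyle:check -pl $2"
    echo "git diff origin/master > $1.patch"
  else
    # Each step runs only if the previous one succeeded, so a failing
    # test or style check never produces a patch file.
    mvn test -pl "$2" -am &&
      mvn checkstyle:check -pl "$2" &&
      git diff origin/master > "$1.patch"
  fi
}
```

Running `DRY_RUN=1 prepare_patch TEZ-NNNN tez-dag` first shows the exact commands before anything is executed.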
Required Reading
| # | Resource | What to extract |
|---|---|---|
| 1 | Apache Tez Contributing | The official contribution guide |
| 2 | Apache JIRA for Tez | Browse recent issues to understand what active work looks like |
| 3 | dev@tez.apache.org archives | Read 2 weeks of mailing list threads at https://lists.apache.org/list.html?dev@tez.apache.org |
| 4 | src/config/checkstyle.xml in the Tez repo | What style rules are enforced |
| 5 | Apache How It Works | Meritocracy, governance, why Apache operates the way it does |
| 6 | Any 3 recently closed Tez patches | Read the JIRA comment thread — observe how committers give feedback |
Source Code Areas to Inspect
| File | Why |
|---|---|
| pom.xml (root) | Module structure, dependency management, build profiles |
| tez-dag/pom.xml | Module-level dependency declarations |
| src/config/checkstyle.xml | Style rules enforced on every patch |
| src/config/checkstyle-suppressions.xml | Suppressions: which files are exempt and why |
| .gitignore | What is excluded from version control |
| Any recently committed file | Read the commit message format |
Apache Tez JIRA Structure
Issue Types You Will Encounter
| Type | Description |
|---|---|
| Bug | A defect in behavior |
| Improvement | An enhancement to existing functionality |
| New Feature | Something that does not exist yet |
| Task | Non-code work (documentation, release, etc.) |
| Sub-task | Part of a larger issue |
| Test | Adding or fixing a test |
Priority Levels
| Priority | Meaning |
|---|---|
| Blocker | Prevents a release |
| Critical | Significant data loss or correctness risk |
| Major | Important but not release-blocking |
| Minor | Small issue or improvement |
| Trivial | Typo, cosmetic, minor cleanup |
For Level 2 contributors: Only work on Minor and Trivial issues. Do not pick up Major or higher issues until you have at least 3 accepted patches in the project.
Component Labels
JIRA issues are labeled by component. The most relevant for early contributors:
| Component | What it covers |
|---|---|
| Tez-DAG | DAG execution, AM, state machines |
| Tez-Runtime | I/O library, shuffle |
| Tez-API | Public API; high stability required |
| Documentation | Docs, Javadoc, website |
| Tests | Test additions and fixes |
Mailing List Etiquette
How to Subscribe
# Send an empty email to:
dev-subscribe@tez.apache.org
# You will receive a confirmation email — reply to it
What to Read First
Do not post until you have read at least two weeks of threads. Understand:
- What issues are currently being discussed
- How committers respond to patches
- The tone and technical depth expected
- What questions get quick responses vs. what gets ignored
How to Ask a Question
Good question format:
Subject: [QUESTION] Understanding VertexImpl initialization flow
Hi dev@,
I'm trying to understand the initialization sequence in VertexImpl.
Specifically, I'm looking at the transition from INITIALIZING to INITED
in VertexImpl.java around line 1234.
The code calls rootInputInitializer() before transitioning, but I'm unclear
on what happens if an initializer throws an unchecked exception.
I've read the JIRA issue TEZ-XXXX and the associated commit, but I still
have this question. Can anyone point me to the relevant code path?
Thanks,
[Your name]
What makes this question good:
- Specific class and approximate line number
- State machine terminology used correctly
- References prior research
- Concrete question, not "how does Tez work?"
What makes a question bad:
- "How do I contribute?" — this is answered in the contributing guide
- "Can you explain how shuffle works?" — too broad; you should read the code first
- Posting before subscribing and reading archives
Apache Checkstyle
Tez enforces checkstyle on every patch. A patch that fails checkstyle will not be committed.
Running Checkstyle
# Check a specific module
mvn checkstyle:check -pl tez-dag
# Check all modules (slow)
mvn checkstyle:check
# Generate a report, then inspect the violations
mvn checkstyle:checkstyle -pl tez-dag
open tez-dag/target/checkstyle-result.xml   # "open" is macOS; use xdg-open on Linux
Common Violations
| Violation | Cause | Fix |
|---|---|---|
| UnusedImports | Import statement for an unused class | Remove the import |
| LineLength | Line exceeds 100 characters | Break the line |
| WhitespaceAround | Missing space around operator | Add space |
| LeftCurly | { on wrong line | Move to end of previous line |
| JavadocMethod | Public method missing Javadoc | Add /** ... */ block |
| FinalClass | Utility class not declared final | Add final modifier |
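Some of these violations can be spotted before a full Maven run. As a rough sketch, the helper below flags lines longer than the 100-character limit listed in the table. `long_lines` is a hypothetical name, and the check only approximates the LineLength rule (for instance, it counts a tab as one character), so it does not replace `mvn checkstyle:check`.

```shell
# long_lines LIMIT FILE...: print "file:line: length" for each line over LIMIT.
# Hypothetical pre-check; checkstyle remains the authority on what passes.
long_lines() {
  limit="$1"
  shift
  awk -v limit="$limit" \
    'length($0) > limit { printf "%s:%d: %d\n", FILENAME, FNR, length($0) }' "$@"
}
```

A call such as `long_lines 100 <changed-file>.java` prints nothing when every line fits.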
JIRA Issue Categories for Level 2 Contributors
In addition to Level 1 categories, you can now attempt:
- Test improvements: adding tests for uncovered paths you identify from reading the code
- Logging improvements: adding LOG.debug() statements that would help diagnose issues
- Checkstyle fixes: especially in modules you have been reading
Discipline: The quality of your first 5 patches determines how quickly you build credibility in the community. A patch with a checkstyle violation, compilation error, or test failure will be rejected immediately. Every patch must be verified locally before upload.
Deliverables
- Subscribed to dev@tez.apache.org and can describe two active discussions
- Apache JIRA account created
- One JIRA issue identified, studied, and commented on (even if not yet working on it)
- Lab 2.1 completed: module-by-module walkthrough documented
- Lab 2.2 completed: patch generated, checkstyle passing, JIRA description written
- Understanding of the difference between a Minor and a Trivial issue
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Opening a GitHub PR instead of attaching a patch to JIRA | PR will likely be ignored or closed | Use JIRA; attach a .patch file |
| Submitting a patch that changes formatting in unrelated lines | Noise in the diff; committers reject it | Change only the lines you meant to change |
| Claiming an issue without leaving a JIRA comment | Another contributor may do the same work | Comment "I am investigating this" before starting |
| Submitting a patch without running tests | Immediate rejection | Test everything locally first |
| Writing a JIRA comment that just says "fix attached" | Unhelpful; committers will ask for explanation | Explain what was wrong and what the fix does |
| Using git commit -m "fix" | Unprofessional commit message | Format: TEZ-NNNN. Short description of change. |
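The commit message format in the last row is easy to check mechanically. The sketch below is one possible test. `valid_subject` is a hypothetical name, and the pattern is an assumption derived from the format shown in the table rather than an official project rule.

```shell
# valid_subject SUBJECT: succeed if SUBJECT looks like "TEZ-1234. Description".
# Assumed pattern: the issue key, a period, a space, then non-empty text.
valid_subject() {
  printf '%s\n' "$1" | grep -Eq '^TEZ-[0-9]+\. [^ ].*'
}
```

It can be applied to the most recent commit with `valid_subject "$(git log -1 --format=%s)" || echo "rewrite the commit message"`.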
How to Verify Success
# Your patch generates cleanly
git diff origin/master > /tmp/TEZ-NNNN.001.patch
head -20 /tmp/TEZ-NNNN.001.patch # should show only your intended changes
# Checkstyle passes on the module you changed
mvn checkstyle:check -pl <changed-module>
# Tests pass
mvn test -pl <changed-module> -am -Dtest=<RelevantTestClass>
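Looking at the first 20 lines of a patch can miss an unintended file further down. A complementary check, sketched here under the assumption that the patch came from `git diff`, is to list every file the patch touches. `patch_files` is a hypothetical helper name.

```shell
# patch_files PATCH: print each file named in the patch, one per line, sorted.
# Hypothetical helper; it reads the "diff --git a/<path> b/<path>" header
# that git diff writes for every changed file.
patch_files() {
  sed -n 's|^diff --git a/.* b/||p' "$1" | sort -u
}
```

If `patch_files /tmp/TEZ-NNNN.001.patch` prints a file you did not mean to change, fix the branch and regenerate the patch before uploading.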
Patch Profile: Level 2 Graduate
| Patch type | Example | Test requirement |
|---|---|---|
| Javadoc improvement | Add missing @throws annotation to a method | None |
| Log statement improvement | Add context to an existing LOG.warn that is unhelpful | Run the affected test class |
| Checkstyle fix | Fix unused import across multiple files in one module | Run mvn checkstyle:check -pl <module> |
| Test comment improvement | Add test setup comments explaining what MockAppContext does | Run the test class |
You are not ready to submit: behavioral code changes, new features, bug fixes in state machines or shuffle. Continue to Level 3.