Committer Mindset
Becoming a committer is a one-day event. Thinking like one is a multi-year practice. This chapter sketches the practice: the asymmetries, the recurring trade-offs, and the mental model that distinguishes "writes good patches" from "stewards the codebase."
The Long-Lived Code Tax
A contributor writes a patch and leaves. A committer commits a patch and inherits it forever. Every line a committer approves is theirs to debug at 11pm three years later when it breaks in production.
Practical consequence: the committer's "yes" is a much heavier word than the contributor's "this would be nice." Committers reflexively ask:
| Question | Why |
|---|---|
| Who will maintain this in 2 years? | Code without a maintainer becomes everyone's problem |
| Is the complexity proportional to the value? | Complex code is paid for in every future bug |
| Does this make tez-dag harder to onboard into? | Onboarding cost is real |
| What's the failure mode at 10x scale? | Tez runs in production clusters at scale |
| Does this lock us into a design we'll regret? | API and proto changes are forever |
These are not abstractions. Every committer has at least one patch they regret approving. That memory is the source of the "no" muscle.
Reasoning About Compatibility
The compatibility surface is exhaustively documented in Compatibility. The mindset around it:
- Default to backwards-compat. A change that breaks no one is always preferable to one that breaks anyone, even if uglier.
- A deprecation is a promise. If you deprecate a method "to be removed in 0.12," it had better be removable in 0.12 — which means no production user can still be on it by then, which means the deprecation window has to be long enough to drain.
- Wire compat is not negotiable. A DAGPlan change that breaks recovery from an old AM means a cluster can't roll-restart safely. That's a P0 production issue.
- Configuration compatibility is silent until it isn't. Renaming a key without a deprecation alias breaks every cluster that has the old key in tez-site.xml. Reviewers will catch this if they're paying attention; committers must always pay attention.
The mental model: imagine it is 1 AM and you are the SRE on call at a Fortune 500 company that runs Tez via Hive. What does this patch do to your night?
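The configuration point above can be made concrete with a minimal sketch. It assumes a hypothetical `DeprecatedKeySketch` class and made-up key names (`tez.example.old-key`, `tez.example.new-key`); it is a stdlib stand-in for the idea of a deprecation alias, not Hadoop's or Tez's configuration API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical config store used only to illustrate key aliasing. */
public class DeprecatedKeySketch {
    private final Map<String, String> values = new HashMap<>();
    // Maps each deprecated key name to its replacement.
    private final Map<String, String> aliases = new HashMap<>();

    /** Registers oldKey as a deprecated alias of newKey. */
    public void addDeprecation(String oldKey, String newKey) {
        aliases.put(oldKey, newKey);
    }

    /** Stores a value, rewriting a deprecated key to its replacement. */
    public void set(String key, String value) {
        values.put(aliases.getOrDefault(key, key), value);
    }

    /** Reads a value by either the old or the new key name. */
    public String get(String key, String defaultValue) {
        return values.getOrDefault(aliases.getOrDefault(key, key), defaultValue);
    }

    public static void main(String[] args) {
        DeprecatedKeySketch conf = new DeprecatedKeySketch();
        // In this sketch the alias must be registered before values are loaded.
        conf.addDeprecation("tez.example.old-key", "tez.example.new-key");
        // Simulates a tez-site.xml written before the rename.
        conf.set("tez.example.old-key", "42");
        // Code that reads the new name still sees the operator's setting.
        System.out.println(conf.get("tez.example.new-key", "default"));
    }
}
```

Without the `addDeprecation` call, the same read would fall through to the default and the cluster would silently change behavior, which is the failure the bullet describes.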
Reasoning About Performance
Tez runs in the hot path of Hive on terabyte-scale workloads. A 5 ms regression in a per-task code path is real money. The mindset:
- Measure, don't guess. A patch claiming performance benefit needs numbers, not intuition. A patch claiming no performance impact in a hot path still needs a check.
- Hot vs. cold paths. Optimisations matter in tez-runtime-library and the per-task paths of tez-runtime-internals. They matter much less in tez-dag AM startup code that runs once per DAG.
- GC is performance. A patch that allocates an extra object per task adds GC pressure at scale. Reuse buffers; use primitives; bound queues.
- Logging is performance. LOG.debug("..." + obj) allocates the string even when DEBUG is off. Use LOG.debug("... {}", obj) instead.
The committer reading a patch in a hot path keeps these questions ready:
- Does this allocate per-record? Per-batch? Per-DAG?
- Is the allocation reusable / poolable?
- Is the log statement guarded or formatted?
- Has the contributor said how this performs at scale?
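The logging point can be checked rather than taken on faith. The sketch below assumes a hypothetical `TinyLog` class that only mimics the "{}" placeholder style (it is not the real logging API) and a `Tracked` object that counts its own `toString()` calls, so the formatting work done while DEBUG is off becomes observable.

```java
public class LoggingCostSketch {
    /** Counts toString() calls so formatting work can be observed. */
    static final class Tracked {
        static int formats = 0;

        @Override
        public String toString() {
            formats++;
            return "Tracked";
        }
    }

    /** Hypothetical minimal logger; mimics the "{}" placeholder style only. */
    static final class TinyLog {
        private final boolean debugEnabled;

        TinyLog(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        void debug(String message) {
            if (debugEnabled) {
                System.out.println(message);
            }
        }

        void debug(String template, Object arg) {
            if (debugEnabled) {
                // Formatting happens only after the level check passes.
                System.out.println(template.replace("{}", String.valueOf(arg)));
            }
        }
    }

    /** Returns how often toString() ran for n concatenating calls, DEBUG off. */
    static int formatsWithConcat(int n) {
        Tracked.formats = 0;
        TinyLog log = new TinyLog(false);
        Tracked obj = new Tracked();
        for (int i = 0; i < n; i++) {
            log.debug("task state: " + obj); // string is built before the call
        }
        return Tracked.formats;
    }

    /** Returns how often toString() ran for n placeholder calls, DEBUG off. */
    static int formatsWithPlaceholder(int n) {
        Tracked.formats = 0;
        TinyLog log = new TinyLog(false);
        Tracked obj = new Tracked();
        for (int i = 0; i < n; i++) {
            log.debug("task state: {}", obj); // obj is passed through unformatted
        }
        return Tracked.formats;
    }

    public static void main(String[] args) {
        System.out.println("concat: " + formatsWithConcat(1000));
        System.out.println("placeholder: " + formatsWithPlaceholder(1000));
    }
}
```

The concatenating form pays for one `toString()` and one string per call regardless of log level; the placeholder form pays for neither when DEBUG is off.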
Reasoning About Complexity
Complexity has a half-life of bugs. The reviewing committer's complexity check:
| Complexity addition | What it costs |
|---|---|
| A new abstract base class | A new mental model for readers |
| A new configuration key | Documentation, default-tuning, deprecation later |
| A new state in a state machine | Combinatorial new transitions to test |
| A new event type | New event dispatcher cases, new history entries |
| A new public method | Compatibility commitment |
| A new dependency | Licensing review, attack surface, build complexity |
A patch that adds, say, a new configuration key for a corner-case behavior is not trivially "yes" even if the code is correct. The cost of the key — documentation, tuning, eventual deprecation — must justify the value.
The reflexive committer question: "Could this be a default, with no key?" If the answer is yes, skip the key.
Reasoning About Risk
Different code paths carry different risk profiles:
| Path | Risk |
|---|---|
| tez-tools/ | Low. Process tooling; broken doesn't affect runtime. |
| tez-mapreduce/ | Medium. Affects MR-on-Tez users; relatively well-tested. |
| tez-runtime-library/ | High. In the per-task hot path. |
| tez-runtime-internals/ | High. Task runtime; affects every DAG. |
| tez-dag/ AM scheduling | High. AM bugs lose work. |
| tez-dag/ DAG planning | Very high. Errors are bad DAGs. |
| tez-api/ | Very high. Public API; breaking it breaks downstream projects. |
| tez-api/src/main/proto/ | Critical. Wire format; cluster-rolling-restart implications. |
Committers calibrate review depth to risk. A 50-line patch in tez-tools/ may get a quick read and +1. A 50-line patch in tez-api/src/main/proto/ gets word-by-word scrutiny, a [DISCUSS] thread, and possibly a -1 if the protobuf change is anything other than additive.
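One way to make the calibration habitual is to score a patch by the riskiest path it touches. The sketch below is a hypothetical helper, not project tooling: it encodes the table above as path prefixes, picks the longest matching prefix, and takes the maximum tier across a patch. Two assumptions are labeled in the code: tez-dag/ is scored at its higher tier because the path alone cannot separate AM scheduling from DAG planning, and unlisted paths default to MEDIUM. The file names in `main` are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RiskTierSketch {
    /** Declared in ascending order so compareTo() ranks risk. */
    enum Tier { LOW, MEDIUM, HIGH, VERY_HIGH, CRITICAL }

    private static final Map<String, Tier> TIERS = new LinkedHashMap<>();
    static {
        TIERS.put("tez-tools/", Tier.LOW);
        TIERS.put("tez-mapreduce/", Tier.MEDIUM);
        TIERS.put("tez-runtime-library/", Tier.HIGH);
        TIERS.put("tez-runtime-internals/", Tier.HIGH);
        // Assumption: the path cannot tell scheduling from planning,
        // so tez-dag/ takes the more conservative of its two tiers.
        TIERS.put("tez-dag/", Tier.VERY_HIGH);
        TIERS.put("tez-api/", Tier.VERY_HIGH);
        TIERS.put("tez-api/src/main/proto/", Tier.CRITICAL);
    }

    /** Longest matching prefix wins, so the proto directory outranks tez-api/. */
    static Tier tierFor(String path) {
        String best = "";
        Tier tier = Tier.MEDIUM; // Assumption: default for unlisted paths.
        for (Map.Entry<String, Tier> entry : TIERS.entrySet()) {
            String prefix = entry.getKey();
            if (path.startsWith(prefix) && prefix.length() > best.length()) {
                best = prefix;
                tier = entry.getValue();
            }
        }
        return tier;
    }

    /** The riskiest touched path sets the review depth for the whole patch. */
    static Tier tierForPatch(List<String> paths) {
        Tier max = Tier.LOW;
        for (String path : paths) {
            Tier tier = tierFor(path);
            if (tier.compareTo(max) > 0) {
                max = tier;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        List<String> patch = Arrays.asList(
            "tez-tools/example-script.py",
            "tez-api/src/main/proto/Example.proto");
        System.out.println(tierForPatch(patch));
    }
}
```

Taking the maximum rather than an average reflects the text: one touched proto file is enough to warrant word-by-word scrutiny, however small the rest of the patch is.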
The "No" Muscle — When and How
The hardest committer skill is saying no. Not no-by-silence (the default and worst form), but explicit, kind, decisive no. Patterns for when to use it:
| Situation | How to say "no" |
|---|---|
| Patch fixes a real but rare bug at the cost of significant new complexity | "Let's not fix this in code; document the workaround and close as Won't Fix." |
| Patch adds a feature with one user (the contributor) | "Could you maintain this as an out-of-tree plugin? VertexManagerPlugin exists for this." |
| Patch is technically correct but encodes a design that conflicts with planned direction | "We're going a different way on dev@ thread XYZ; let's wait." |
| Patch is correct but vastly over-scoped | "Could you split into 3 JIRAs? Happy to commit them one at a time." |
| Patch is correct but in a part of the codebase being rewritten | "Let's wait for TEZ-NNNN to land first; this conflicts." |
The crucial thing about saying no: do it early, explicitly, and once. Don't ghost the patch. The contributor's time is worth your one paragraph of explanation.
When to Refactor Unsolicited
A patch lands in a part of the codebase the committer has been wanting to refactor. The temptation is to do the refactor in or alongside the commit. Don't, except in narrow cases.
The rules:
- Don't refactor inside the contributor's patch or in the same commit. What gets committed must match what was reviewed.
- File a follow-up JIRA for the refactor. CC the contributor; they often have context.
- Do the refactor in a separate review cycle. Either you do it (review by someone else) or someone else does it (review by you).
- Exception: If the contributor's patch sits in code that is being moved or removed by a patch that is about to be committed, coordinate. Either delay the contributor's patch or rebase the imminent one.
Mentoring Pattern
A committer's leverage is not just commits — it's mentoring. The well-trodden Apache mentoring pattern:
- Notice a thoughtful new contributor. Their first patch was clean; they responded well to feedback; they asked good questions on dev@.
- Suggest a JIRA in your area. Comment on a JIRA: "This would be a good fit for NAME based on their recent work on TEZ-XXXX."
- Shepherd it. Review their patch yourself, fast. Set expectations on iteration count.
- Make them visible. Refer to their work on dev@. Cite them in commits as you would any contributor.
- Eventually propose them. When they hit the rough bar from Meritocracy, propose them on private@.
A committer who has mentored two or three contributors into committership has done more for the project than one who has committed thousands of patches.
Time Allocation
Newly-minted committers underestimate how time-consuming the role is. A rough budget for sustained committership:
| Activity | Weekly time |
|---|---|
| Reviewing patches | 2–4 hours |
| Filing or shepherding your own patches | 2–4 hours |
| dev@ discussion participation | 1–2 hours |
| JIRA triage (closing dups, asking for repros) | 0.5–1 hour |
| Mentoring | 0.5–1 hour |
| Release work (during release windows) | 4–8 hours |
A committer who spends 0.5 hours/week on the project will be reactive at best and become inactive within a year. A committer who spends 4+ hours/week stewards the codebase.
Avoiding Burnout
The committer pool at any Apache project is finite. Burnout is a real failure mode:
| Burnout signal | Self-rescue |
|---|---|
| Reviewing patches feels like a chore | Take a 2-week formal break; tell dev@ |
| You're saying yes to patches you don't believe in | Practice saying no |
| You're the only reviewer for an area | Mentor someone into co-reviewing |
| You're sleeping less because of a release window | Ask the PMC to split the RM duties |
| You haven't filed a JIRA you cared about in months | Stop reviewing for a week; write |
Committership is voluntary. Stepping back is honourable. Emeritus committer status exists at Apache for those who want a graceful exit; you can come back later.
Validation Artifacts
After this chapter:
- A ~/tez-notes/committer-questions.md of the five recurring questions a committer asks of every patch.
- The discipline to score each Tez file path you touch by risk tier.
- The vocabulary to say no, in writing, with no rancour.
- The plan to do mentoring at some point in your committer life.
The next chapter — Release Voting — is the operational manual for the most visible PMC-level work: cutting a release.