Committer Mindset

Becoming a committer is a one-day event. Thinking like one is a multi-year practice. This chapter sketches the practice: the asymmetries, the recurring trade-offs, and the mental model that distinguishes "writes good patches" from "stewards the codebase."

The Long-Lived Code Tax

A contributor writes a patch and leaves. A committer commits a patch and inherits it forever. Every line a committer approves is theirs to debug at 11 PM three years later when it breaks in production.

Practical consequence: the committer's "yes" is a much heavier word than the contributor's "this would be nice." Committers reflexively ask:

Question | Why
Who will maintain this in 2 years? | Code without a maintainer becomes everyone's problem
Is the complexity proportional to the value? | Complex code is paid for in every future bug
Does this make tez-dag harder to onboard into? | Onboarding cost is real
What's the failure mode at 10x scale? | Tez runs in production clusters at scale
Does this lock us into a design we'll regret? | API and proto changes are forever

These are not abstractions. Every committer has at least one patch they regret approving. That memory is the source of the "no" muscle.

Reasoning About Compatibility

The compatibility surface is exhaustively documented in Compatibility. The mindset around it:

  • Default to backwards-compat. A change that breaks no one is always preferable to one that breaks anyone, even if the resulting code is uglier.
  • A deprecation is a promise. If you deprecate a method "to be removed in 0.12," it had better be removable in 0.12. That requires no production user to still depend on it by then, so the deprecation window has to be long enough for usage to drain.
  • Wire compat is not negotiable. A DAGPlan change that breaks recovery from an old AM means a cluster can't roll-restart safely. That's a P0 production issue.
  • Configuration compatibility is silent until it isn't. Renaming a key without a deprecation alias breaks every cluster that has the old key in tez-site.xml. Reviewers will catch this if they're paying attention; committers must always pay attention.
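The deprecation alias in the last bullet can be pictured as a lookup that tries the new key first and falls back to the old one. The following is a minimal sketch, not Tez or Hadoop code: the class name and both key names are hypothetical, and a real patch would presumably use the key-deprecation support in Hadoop's Configuration class rather than a hand-rolled map.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: resolves a renamed config key through a deprecation alias. */
final class AliasedConfig {
    // Hypothetical key names, invented for this sketch.
    static final String NEW_KEY = "tez.example.new-key";
    static final String OLD_KEY = "tez.example.old-key";

    private final Map<String, String> values = new HashMap<>();
    // Maps each current key to the deprecated key it replaced.
    private final Map<String, String> deprecatedAliases = new HashMap<>();

    AliasedConfig() {
        deprecatedAliases.put(NEW_KEY, OLD_KEY);
    }

    void set(String key, String value) {
        values.put(key, value);
    }

    /** Returns the value for key, then for its deprecated alias, then the default. */
    String get(String key, String defaultValue) {
        String value = values.get(key);
        if (value != null) {
            return value;
        }
        String oldKey = deprecatedAliases.get(key);
        if (oldKey != null && values.containsKey(oldKey)) {
            // A real implementation would also log a deprecation warning here.
            return values.get(oldKey);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        AliasedConfig conf = new AliasedConfig();
        conf.set(OLD_KEY, "64"); // what an existing tez-site.xml might still contain
        System.out.println(conf.get(NEW_KEY, "32"));
    }
}
```

Without the alias entry, the lookup would silently return the default, which is the "silent until it isn't" failure the bullet describes.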

The mental model: imagine you are the SRE on call at 1 AM at a Fortune 500 company that runs Tez via Hive. What does this patch do to your night?
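For the wire-compat bullet, one way to make "does this break an old reader?" concrete is to compare the field lists before and after a change. The sketch below uses a deliberately simplified definition of a safe change (every existing field keeps its number, name, and type; new field numbers may be added). The class, the string encoding of fields, and the example fields are all hypothetical and not taken from any Tez .proto file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: flags message-schema changes that are not purely additive. */
final class AdditiveCheck {
    /**
     * Each schema maps a field number to a "type name" description.
     * Returns the problems found; an empty list means the change only adds fields.
     */
    static List<String> nonAdditiveChanges(Map<Integer, String> oldFields,
                                           Map<Integer, String> newFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, String> old : oldFields.entrySet()) {
            String current = newFields.get(old.getKey());
            if (current == null) {
                problems.add("field " + old.getKey() + " removed: " + old.getValue());
            } else if (!current.equals(old.getValue())) {
                problems.add("field " + old.getKey() + " changed: "
                        + old.getValue() + " -> " + current);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical fields, invented for this sketch.
        Map<Integer, String> before = Map.of(1, "string name", 2, "int32 parallelism");
        Map<Integer, String> added =
                Map.of(1, "string name", 2, "int32 parallelism", 3, "string note");
        Map<Integer, String> retyped = Map.of(1, "string name", 2, "int64 parallelism");
        System.out.println(nonAdditiveChanges(before, added));
        System.out.println(nonAdditiveChanges(before, retyped));
    }
}
```

A check like this is only a first filter; it says nothing about semantic changes to an existing field, which still need a human reviewer.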

Reasoning About Performance

Tez runs in the hot path of Hive on terabyte-scale workloads. A 5 ms regression in a per-task code path is paid again by every task of every DAG, and that is real money. The mindset:

  • Measure, don't guess. A patch claiming performance benefit needs numbers, not intuition. A patch claiming no performance impact in a hot path still needs a check.
  • Hot vs. cold paths. Optimisations matter in tez-runtime-library and the per-task paths of tez-runtime-internals. They matter much less in tez-dag AM startup code, which runs once per AM rather than once per task.
  • GC is performance. A patch that allocates an extra object per task adds GC pressure at scale. Reuse buffers; use primitives; bound queues.
  • Logging is performance. LOG.debug("..." + obj) builds the concatenated string even when DEBUG is off. Use LOG.debug("... {}", obj) instead, which defers formatting until the level check has passed.
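The logging bullet can be checked with a small experiment. The sketch below uses a hypothetical stand-in logger (MiniLogger) instead of a real logging library so that it runs on its own; it counts how often toString() is called when DEBUG is off. It illustrates the eager-versus-deferred behaviour only and is not how any particular logging framework is implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a stand-in logger showing eager vs. deferred message formatting. */
final class LazyLoggingDemo {
    /** Counts toString() calls so the formatting cost is observable. */
    static final class Expensive {
        final AtomicInteger toStringCalls = new AtomicInteger();

        @Override
        public String toString() {
            toStringCalls.incrementAndGet();
            return "expensive-state";
        }
    }

    /** Hypothetical minimal logger; real code would use the project's logging API. */
    static final class MiniLogger {
        private final boolean debugEnabled;

        MiniLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        void debug(String message) {
            if (debugEnabled) {
                System.out.println(message);
            }
        }

        void debug(String template, Object arg) {
            if (!debugEnabled) {
                return; // arg.toString() is never called on this path
            }
            System.out.println(template.replace("{}", String.valueOf(arg)));
        }
    }

    /** toString() calls made when logging by concatenation with DEBUG off. */
    static int concatCalls() {
        Expensive obj = new Expensive();
        new MiniLogger(false).debug("state: " + obj);
        return obj.toStringCalls.get();
    }

    /** toString() calls made when logging with a template with DEBUG off. */
    static int templateCalls() {
        Expensive obj = new Expensive();
        new MiniLogger(false).debug("state: {}", obj);
        return obj.toStringCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("concatenation: " + concatCalls() + " toString() call(s)");
        System.out.println("template: " + templateCalls() + " toString() call(s)");
    }
}
```

The concatenated form pays for the string before the logger ever sees it, because the argument is evaluated at the call site; the template form hands over the object and lets the logger decide.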

The committer reading a patch in a hot path keeps these questions ready:

  • Does this allocate per-record? Per-batch? Per-DAG?
  • Is the allocation reusable / poolable?
  • Is the log statement guarded or formatted?
  • Has the contributor said how this performs at scale?
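The first two questions above are easier to ask with a concrete picture in mind. The sketch below does the same work twice, once allocating a scratch buffer per record and once reusing a single buffer per batch, and counts the allocations. Everything in it (class name, methods, the byte-summing workload) is hypothetical and chosen only to make the allocation pattern visible.

```java
/** Illustrative only: the same work done with per-record and with reused buffers. */
final class BufferReuseDemo {
    private int allocations = 0;

    int allocations() {
        return allocations;
    }

    private byte[] allocate(int size) {
        allocations++; // stands in for GC pressure
        return new byte[size];
    }

    /** Copies each record into a fresh scratch buffer: one allocation per record. */
    long sumPerRecord(byte[][] records) {
        long total = 0;
        for (byte[] record : records) {
            byte[] scratch = allocate(record.length);
            System.arraycopy(record, 0, scratch, 0, record.length);
            total += sum(scratch, record.length);
        }
        return total;
    }

    /** Sizes one scratch buffer for the largest record and reuses it: one allocation per batch. */
    long sumReused(byte[][] records) {
        int largest = 0;
        for (byte[] record : records) {
            largest = Math.max(largest, record.length);
        }
        byte[] scratch = allocate(largest);
        long total = 0;
        for (byte[] record : records) {
            System.arraycopy(record, 0, scratch, 0, record.length);
            // Only the first record.length bytes of scratch are valid for this record.
            total += sum(scratch, record.length);
        }
        return total;
    }

    private static long sum(byte[] buffer, int length) {
        long total = 0;
        for (int i = 0; i < length; i++) {
            total += buffer[i];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[][] records = { {1, 2, 3}, {4, 5}, {6} };
        BufferReuseDemo perRecord = new BufferReuseDemo();
        BufferReuseDemo reused = new BufferReuseDemo();
        System.out.println(perRecord.sumPerRecord(records) + " with "
                + perRecord.allocations() + " allocations");
        System.out.println(reused.sumReused(records) + " with "
                + reused.allocations() + " allocations");
    }
}
```

Both methods return the same result; the difference a reviewer is looking for is whether the allocation count grows with the number of records or stays constant per batch.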

Reasoning About Complexity

Complexity keeps producing bugs long after the patch that introduced it has landed. The reviewing committer's complexity check:

Complexity addition | What it costs
A new abstract base class | A new mental model for readers
A new configuration key | Documentation, default-tuning, deprecation later
A new state in a state machine | Combinatorial new transitions to test
A new event type | New event dispatcher cases, new history entries
A new public method | Compatibility commitment
A new dependency | Licensing review, attack surface, build complexity

A patch that adds, say, a new configuration key for a corner-case behavior is not trivially "yes" even if the code is correct. The cost of the key — documentation, tuning, eventual deprecation — must justify the value.

The reflexive committer question: "Could this be a default, with no key?" If the answer is yes, skip the key.

Reasoning About Risk

Different code paths carry different risk profiles:

Path | Risk
tez-tools/ | Low. Process tooling; breakage doesn't affect runtime.
tez-mapreduce/ | Medium. Affects MR-on-Tez users; relatively well-tested.
tez-runtime-library/ | High. In the per-task hot path.
tez-runtime-internals/ | High. Task runtime; affects every DAG.
tez-dag/ AM scheduling | High. AM bugs lose work.
tez-dag/ DAG planning | Very high. Errors are bad DAGs.
tez-api/ | Very high. Public API; breaking it breaks downstream projects.
tez-api/src/main/proto/ | Critical. Wire format; cluster-rolling-restart implications.

Committers calibrate review depth to risk. A 50-line patch in tez-tools/ may get a quick read and +1. A 50-line patch in tez-api/src/main/proto/ gets word-by-word scrutiny, a [DISCUSS] thread, and possibly a -1 if the protobuf change is anything other than additive.
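The risk table can be turned into a rough first-pass classifier for the files a patch touches. This is a minimal sketch, not project tooling. The class and file names are hypothetical. Two choices are assumptions of the sketch: tez-dag/ is mapped to the more cautious of its two tiers, because a path prefix alone cannot tell scheduling code from planning code, and unknown paths are treated as high risk.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: maps changed file paths to the risk tiers in the table above. */
final class RiskTiers {
    enum Tier { LOW, MEDIUM, HIGH, VERY_HIGH, CRITICAL }

    // Insertion order matters: more specific prefixes must come first.
    private static final Map<String, Tier> PREFIXES = new LinkedHashMap<>();

    static {
        PREFIXES.put("tez-api/src/main/proto/", Tier.CRITICAL);
        PREFIXES.put("tez-api/", Tier.VERY_HIGH);
        PREFIXES.put("tez-dag/", Tier.VERY_HIGH); // assumption: take the stricter tez-dag tier
        PREFIXES.put("tez-runtime-internals/", Tier.HIGH);
        PREFIXES.put("tez-runtime-library/", Tier.HIGH);
        PREFIXES.put("tez-mapreduce/", Tier.MEDIUM);
        PREFIXES.put("tez-tools/", Tier.LOW);
    }

    /** Returns the tier of the first matching prefix. */
    static Tier classify(String path) {
        for (Map.Entry<String, Tier> entry : PREFIXES.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return Tier.HIGH; // assumption: unknown paths get a cautious default
    }

    /** A patch is as risky as its riskiest file; an empty patch is LOW. */
    static Tier classifyPatch(List<String> paths) {
        Tier worst = Tier.LOW;
        for (String path : paths) {
            Tier tier = classify(path);
            if (tier.compareTo(worst) > 0) {
                worst = tier;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Hypothetical file names under the module directories from the table.
        System.out.println(classify("tez-tools/example-script.py"));
        System.out.println(classifyPatch(List.of(
                "tez-tools/example-script.py",
                "tez-api/src/main/proto/Example.proto")));
    }
}
```

Taking the maximum across files reflects the calibration described above: a patch that is mostly tooling but touches one proto file still gets the proto-level review.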

The "No" Muscle — When and How

The hardest committer skill is saying no. Not no-by-silence (the default and worst form), but explicit, kind, decisive no. Patterns for when to use it:

Situation | Pattern of "no"
Patch fixes a real but rare bug at the cost of significant new complexity | "Let's not fix this in code; document the workaround and close as Won't Fix."
Patch adds a feature with one user (the contributor) | "Could you maintain this as an out-of-tree plugin? VertexManagerPlugin exists for this."
Patch is technically correct but encodes a design that conflicts with planned direction | "We're going a different way on dev@ thread XYZ; let's wait."
Patch is correct but vastly over-scoped | "Could you split into 3 JIRAs? Happy to commit them one at a time."
Patch is correct but in a part of the codebase being rewritten | "Let's wait for TEZ-NNNN to land first; this conflicts."

The crucial thing about saying no: do it early, explicitly, and once. Don't ghost the patch. The contributor's time is worth your one paragraph of explanation.

When to Refactor Unsolicited

A patch lands in a part of the codebase the committer has been wanting to refactor. The temptation is to do the refactor in or alongside the commit. Don't, except in narrow cases.

The rules:

  • Don't refactor inside the contributor's patch or in the same commit. What gets committed must match what was reviewed.
  • File a follow-up JIRA for the refactor and CC the contributor; they often have context.
  • Do the refactor in a separate review cycle. Either you do it (review by someone else) or someone else does it (review by you).
  • Exception: if the contributor's patch touches code that is about to be moved or removed by another patch close to commit, coordinate. Either delay the contributor's patch or rebase the imminent one.

Mentoring Pattern

A committer's leverage is not just commits — it's mentoring. The well-trodden Apache mentoring pattern:

  1. Notice a thoughtful new contributor. Their first patch was clean; they responded well to feedback; they asked good questions on dev@.
  2. Suggest a JIRA in your area. Comment on a JIRA: "This would be a good fit for NAME based on their recent work on TEZ-XXXX."
  3. Shepherd it. Review their patch yourself, fast. Set expectations on iteration count.
  4. Make them visible. Refer to their work on dev@. Cite them in commits as you would any contributor.
  5. Eventually propose them. When they hit the rough bar from Meritocracy, propose them on private@.

A committer who has mentored two or three contributors into committership has done more for the project than one who has committed thousands of patches.

Time Allocation

Newly minted committers underestimate how time-consuming the role is. A rough budget for sustained committership:

Activity | Weekly time
Reviewing patches | 2–4 hours
Filing or shepherding your own patches | 2–4 hours
dev@ discussion participation | 1–2 hours
JIRA triage (closing dups, asking for repros) | 0.5–1 hour
Mentoring | 0.5–1 hour
Release work (during release windows) | 4–8 hours

A committer who spends 0.5 hours/week on the project will be reactive at best and become inactive within a year. A committer who spends 4+ hours/week stewards the codebase.

Avoiding Burnout

The committer pool at any Apache project is finite. Burnout is a real failure mode:

Burnout signal | Self-rescue
Reviewing patches feels like a chore | Take a 2-week formal break; tell dev@
You're saying yes to patches you don't believe in | Practice saying no
You're the only reviewer for an area | Mentor someone into co-reviewing
You're sleeping less because of a release window | Ask the PMC to split the RM duties
You haven't filed a JIRA you cared about in months | Stop reviewing for a week; write

Committership is voluntary. Stepping back is honourable. Emeritus committer status exists at Apache for those who want a graceful exit; you can come back later.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/committer-questions.md file listing the five recurring questions a committer asks of every patch.
  2. The discipline to score each Tez file path you touch by risk tier.
  3. The vocabulary to say no, in writing, with no rancour.
  4. A plan to mentor at some point in your committer life.

The next chapter — Release Voting — is the operational manual for the most visible PMC-level work: cutting a release.