Evaluation Rubric
A 100-point self-grading rubric for the Capstone. Score yourself honestly after you finish Step 10. The scoring is calibrated against what Tez committers actually look for — not what feels good to read.
The point of this rubric is not the score. It is the diagnostic: a low score on one dimension tells you exactly where to invest the next contribution.
Scoring Dimensions
Seven dimensions, weighted by how much they matter for review outcomes.
| # | Dimension | Points |
|---|---|---|
| 1 | Problem articulation | 20 |
| 2 | Execution-path mastery | 20 |
| 3 | Implementation quality | 20 |
| 4 | Testing | 15 |
| 5 | Review responsiveness | 10 |
| 6 | Documentation | 10 |
| 7 | Community interaction | 5 |
| | Total | 100 |
1. Problem Articulation (20 pts)
Can you state, in one paragraph, what was broken, for whom, under what conditions?
| Score | What it looks like |
|---|---|
| 18-20 | Crisp one-paragraph statement covering symptom, trigger conditions, affected version range, and operational impact. Distinguishes "this is what the user sees" from "this is the underlying mechanism." Could be read aloud at a standup and a peer would correctly grasp the bug. |
| 14-17 | Clear symptom but trigger conditions vague ("happens sometimes under load"). OR trigger clear but conflates symptom with root cause. |
| 10-13 | Reader needs to ask follow-up questions to understand what was broken. Uses jargon without grounding it in user-visible behavior. |
| 5-9 | Mostly restates the JIRA title. No conditions. No version impact. |
| 0-4 | "It was broken and I fixed it." |
Look for: the word "intermittent" appearing only alongside a documented trigger; the symptom (vertex stuck) kept distinct from the cause (event short-circuit).
2. Execution-Path Mastery (20 pts)
Did you actually trace the code, or did you guess?
| Score | What it looks like |
|---|---|
| 18-20 | Step-3 document maps the full path from user submission to bug location with file:line citations at every layer. Includes a diagram (mermaid or text-arrow). Cites the AsyncDispatcher event hop and the specific state-machine transition where the bug fires. Reviewer reading it could open each file at each line and follow the logic without asking questions. |
| 14-17 | Most layers cited but one or two skipped ("then the event reaches VertexImpl"). Diagram present but missing a critical hop. |
| 10-13 | Cites the location of the bug correctly but does not trace how execution reached it. No diagram. |
| 5-9 | Vague references ("the dispatcher handles it") without file:line. |
| 0-4 | No execution-path document, or it is just a paragraph of prose. |
Look for: presence of tez-api/src/main/...-style paths with line numbers that match the resolved commit SHA.
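A first-pass check of those citations can be automated. The sketch below is illustrative only: the function names and the path.java:NNN citation format are assumptions, and it expects the repository to be checked out at the resolved commit SHA. It confirms that each cited file exists and is long enough to contain the cited line, not that the line says what the document claims.

```python
import re
from pathlib import Path

# Assumed citation format: some/path/File.java:123
CITATION = re.compile(r"(?P<path>[\w./-]+\.java):(?P<line>\d+)")


def extract_citations(text):
    """Return (path, line) pairs for every path.java:NNN citation in text."""
    return [(m["path"], int(m["line"])) for m in CITATION.finditer(text)]


def stale_citations(citations, repo_root):
    """Return citations whose file is missing or shorter than the cited line."""
    stale = []
    for path, line in citations:
        target = Path(repo_root) / path
        if not target.is_file():
            stale.append((path, line))
            continue
        with target.open(encoding="utf-8", errors="replace") as handle:
            line_count = sum(1 for _ in handle)
        if line < 1 or line > line_count:
            stale.append((path, line))
    return stale
```

An empty result from stale_citations only means the citations are plausible. Opening each file at each line is still the real test.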
3. Implementation Quality (20 pts)
Diff hygiene, scope discipline, convention compliance.
| Score | What it looks like |
|---|---|
| 18-20 | Minimum-diff fix. Production change measured in tens of lines, not hundreds. Every changed line is justifiable in one sentence. No drive-by refactors, no opportunistic renames. Public API surface unchanged unless required. Naming, slf4j logging style, Preconditions, exception messages all match Tez conventions. Checkstyle, SpotBugs, RAT all green without manual overrides. |
| 14-17 | Mostly minimum-diff but one or two stray changes that don't belong. Conventions mostly followed; minor style nits a reviewer would flag. |
| 10-13 | Fix works but is broader than necessary. Scope creep ("while I was here I cleaned up..."). Conventions inconsistently applied. |
| 5-9 | Significant scope creep. Public API changed unnecessarily. Style violations would block precommit without revision. |
| 0-4 | Diff is so large reviewers would request it be broken up before reviewing. OR breaks public API silently. |
Look for: scope-creep tells, such as git diff origin/master --stat listing files unrelated to the bug.
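The same check can be scripted. This is a minimal sketch, assuming you can name the directories the fix is expected to touch. The function names and the example paths are hypothetical. It uses git diff --name-only rather than --stat because the output is one path per line and easier to parse.

```python
import subprocess


def changed_files(base="origin/master"):
    """List files that differ from the base ref, using git diff --name-only."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(paths, expected_prefixes):
    """Return changed paths that fall outside every expected directory prefix."""
    prefixes = tuple(expected_prefixes)
    return [path for path in paths if not path.startswith(prefixes)]


if __name__ == "__main__":
    # Hypothetical file list; in a real checkout, use changed_files() instead.
    example = [
        "tez-dag/src/main/java/Example.java",
        "tez-dag/src/test/java/TestExample.java",
        "pom.xml",
    ]
    print(out_of_scope(example, ["tez-dag/"]))  # prints ['pom.xml']
```

Anything the script reports is a prompt to justify the change in one sentence or drop it from the diff, not an automatic failure.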
4. Testing (15 pts)
Coverage, determinism, regression value.
| Score | What it looks like |
|---|---|
| 14-15 | New unit test reproduces the bug deterministically on master (DrainDispatcher or equivalent), passes with fix. Negative-control test (similar input where the bug should NOT trigger) included. Branch coverage on the changed lines is high. Integration test with MiniTezCluster confirms the fix in an end-to-end DAG. No Thread.sleep, no wall-clock dependencies, no order-dependent assertions. Test ran 10x in a loop without flake. |
| 11-13 | Unit test present and deterministic but no negative control. OR has an integration test but the unit test is weak. |
| 7-10 | Unit test present but uses Thread.sleep or is otherwise non-deterministic. Coverage of fix path incomplete. |
| 3-6 | Test exists but only checks the happy path; would have passed before the fix. |
| 0-2 | No new tests, or tests that fail on master AND on the fix. |
Look for: presence of dispatcher.await() rather than Thread.sleep; a test name that describes the scenario (testV_TASK_COMPLETED_inRunningWithRecovery) rather than the method (testHandle).
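A quick way to audit a test file for the Thread.sleep tell is a plain text scan. The sketch below is a rough heuristic, not a parser: the function name is hypothetical, it skips only lines that start with //, and it will not notice sleeps hidden behind helper methods or inside block comments.

```python
import re

# Matches a call such as Thread.sleep(500), allowing stray whitespace.
SLEEP_CALL = re.compile(r"\bThread\s*\.\s*sleep\s*\(")


def find_sleep_calls(java_source):
    """Return 1-based line numbers that contain a Thread.sleep( call."""
    return [
        number
        for number, line in enumerate(java_source.splitlines(), start=1)
        if SLEEP_CALL.search(line) and not line.lstrip().startswith("//")
    ]


if __name__ == "__main__":
    sample = """\
handler.handle(event);
Thread.sleep(500);
// Thread.sleep(500) was removed
dispatcher.await();
"""
    print(find_sleep_calls(sample))  # prints [2]
```

Each reported line is a candidate for replacement with a deterministic wait such as dispatcher.await().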
5. Review Responsiveness (10 pts)
How well you ran the review cycle.
| Score | What it looks like |
|---|---|
| 9-10 | Every reviewer comment addressed in code or with a substantive reply. Iteration cadence < 48h on most comments. Disagreements (when they happened) made the technical case without defensiveness. Updated PR description after material changes so the top-of-PR text stays accurate. |
| 7-8 | Addresses comments correctly but slow (multi-day gaps). OR addresses most comments but lets a few stylistic ones slide without acknowledgement. |
| 5-6 | Defensive on at least one comment ("but I think my way is fine"). OR force-pushed without summarizing the diff for reviewers. |
| 2-4 | Required multiple reminders from reviewers. Comments not addressed cleanly. |
| 0-1 | PR went silent for > 2 weeks without explanation, or contributor argued every comment. |
Look for: PR review threads marked "resolved" by the contributor with a substantive commit pushed, not just a reply.
6. Documentation (10 pts)
JIRA fields, code comments, write-up presence.
| Score | What it looks like |
|---|---|
| 9-10 | JIRA has Component, Affects Version, Release Notes (if user-visible), PR link, and relevant cross-links. In-code comments cite TEZ-NNNN where the change is non-obvious. Write-up exists at a public URL. JIRA status correctly walked through In Progress -> Patch Available. |
| 7-8 | JIRA mostly filled but Release Notes missing on a user-visible change. Code comments present but don't cite the JIRA. |
| 5-6 | JIRA workflow followed but fields incomplete. No write-up beyond the PR description. |
| 2-4 | JIRA fields blank or wrong. Comments absent at the surprising lines. |
| 0-1 | No JIRA hygiene at all. |
Look for: the JIRA's "Release Notes" field being populated or an explicit note explaining why it's intentionally blank.
7. Community Interaction (5 pts)
Mailing list etiquette, claiming/handoff hygiene.
| Score | What it looks like |
|---|---|
| 5 | Claimed the JIRA before starting. Posted to dev@ only when meaningful (design question, summary after merge). Used [TEZ-NNNN] subject prefix. Was reachable during review. Thanked reviewers explicitly. If they hit a wall, posted clearly with "stuck on X, considering A/B/C, leaning A because Y." |
| 3-4 | Mostly good etiquette; one minor slip (claimed late, or one off-topic mailing-list post). |
| 1-2 | Did not claim the JIRA before working. OR sent mailing-list traffic that was really just chat ("does anyone know..."). |
| 0 | Worked silently for weeks, then dropped a PR with no JIRA assignment and no context. |
Look for: a JIRA comment by the contributor before the first PR push, along the lines of "Working on this, will have a patch in a few days."
Tier Thresholds
Where you land tells you what to do next.
| Score | Tier | Interpretation |
|---|---|---|
| 95-100 | PMC-ready | This is the quality of work that earns a committer vote, given a track record of several such contributions over months. You are operating at the level of someone the PMC would trust to maintain a module. |
| 90-94 | Committer-ready | You are writing patches at committer quality. With 3-5 such contributions across different modules over 6-12 months and demonstrated review participation on others' patches, a vote is plausible. |
| 80-89 | Strong contributor | A reliable contributor whose patches need minimal review iteration. Keep building the track record; this is the level where committers actively look forward to reviewing your work. |
| 65-79 | Contributor | Solid bug-fix-grade work. Patches land with normal review iteration. Most contributions to most projects live here, and it is honorable work. |
| 50-64 | Learning | Patches eventually land but with significant reviewer guidance. Use the next contribution to focus on the dimension where you scored lowest. |
| < 50 | Foundational gap | The contribution may have merged, but the process skipped enough corners that another reviewer or future maintainer is paying a tax. Restart with a smaller bug and apply the rubric end-to-end. |
The tier is not a personality assessment. It is calibrated to the artifact you produced for this one Capstone. The same person can score 65 on one contribution and 95 on the next.
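The total and the tier lookup are plain arithmetic, sketched here in Python. The dictionary keys and function names are illustrative and not part of any Capstone tooling. The maxima and cut-offs are copied from the tables in this rubric.

```python
# Maximum points per dimension, copied from the Scoring Dimensions table.
MAX_POINTS = {
    "problem_articulation": 20,
    "execution_path_mastery": 20,
    "implementation_quality": 20,
    "testing": 15,
    "review_responsiveness": 10,
    "documentation": 10,
    "community_interaction": 5,
}

# (minimum total, tier name), highest first, from the Tier Thresholds table.
TIERS = [
    (95, "PMC-ready"),
    (90, "Committer-ready"),
    (80, "Strong contributor"),
    (65, "Contributor"),
    (50, "Learning"),
    (0, "Foundational gap"),
]


def total_score(scores):
    """Sum the per-dimension scores after checking each is within range."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("score exactly the seven rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {value} is outside 0-{MAX_POINTS[name]}")
    return sum(scores.values())


def tier_for(total):
    """Return the first tier whose minimum the total reaches."""
    for minimum, name in TIERS:
        if total >= minimum:
            return name
    raise ValueError("total must be non-negative")


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {
        "problem_articulation": 17,
        "execution_path_mastery": 14,
        "implementation_quality": 18,
        "testing": 11,
        "review_responsiveness": 9,
        "documentation": 7,
        "community_interaction": 4,
    }
    total = total_score(example)
    print(total, tier_for(total))  # prints: 80 Strong contributor
```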
How to Self-Grade
Block 30 minutes. Open this rubric. Open your own artifacts side by side (JIRA, PR, code, root-cause doc, write-up, validation report). Score each dimension by reading the band descriptions and picking the one that most honestly matches what you produced.
Two rules:
- No interpolation upward. If you're between 14 and 17 on a dimension and unsure, score 14. The optimist's tax.
- One independent reviewer. Ask a peer (ideally another contributor) to score independently on the same rubric. If your scores differ by more than 10 points on any dimension, talk about it. The difference is where the calibration lives.
Record both scores in capstone-work/self-grade.md along with one sentence per dimension on what would have moved the score up one band. This becomes the input for the next contribution's plan.
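Comparing the two score sheets can be sketched as follows. The function names are hypothetical. The 10-point threshold comes from the rule above, and ranking dimensions by share of their maximum is an assumption, since the rubric does not say how to compare dimensions with different weights.

```python
def dimensions_to_discuss(self_scores, peer_scores, threshold=10):
    """Return dimensions where the two graders differ by more than threshold."""
    if set(self_scores) != set(peer_scores):
        raise ValueError("both graders must score the same dimensions")
    return sorted(
        name
        for name in self_scores
        if abs(self_scores[name] - peer_scores[name]) > threshold
    )


def weakest_dimension(scores, max_points):
    """Return the dimension with the lowest share of its maximum.

    Assumption: "lowest" is compared as score / maximum, because the
    dimensions carry different weights. The rubric does not spell this out.
    """
    return min(scores, key=lambda name: scores[name] / max_points[name])


if __name__ == "__main__":
    # Made-up scores for three of the seven dimensions.
    maxima = {"implementation_quality": 20, "testing": 15, "documentation": 10}
    mine = {"implementation_quality": 18, "testing": 14, "documentation": 9}
    peer = {"implementation_quality": 6, "testing": 7, "documentation": 5}
    print(dimensions_to_discuss(mine, peer))  # prints ['implementation_quality']
    print(weakest_dimension(peer, maxima))    # prints implementation_quality
```

With an absolute threshold of 10, only the 15- and 20-point dimensions can ever exceed it, so the threshold is left as a parameter in case the two graders agree on a tighter one for the smaller dimensions.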
What to Do With a Low Score
| Lowest dimension | Next contribution focus |
|---|---|
| Problem articulation | Pick a smaller, sharper bug. Write the one-paragraph statement before editing the JIRA, and post it for review. |
| Execution-path mastery | Pick a bug in a layer you've never traced (e.g. you've done DAG-level, now do shuffle-level). Force yourself to write the path doc before reading the existing tests. |
| Implementation quality | Pick a bug where the minimum fix is < 10 lines. Practice the discipline of leaving the surrounding code untouched. |
| Testing | Pick a flaky-test JIRA (Stage 9 of the roadmap). The whole bug is about testing discipline. |
| Review responsiveness | Pick a bug in a high-traffic area where you'll get more reviewers. Set a 24-hour SLA for yourself on every comment. |
| Documentation | Pick a bug that requires a Release Notes entry. Write the entry before the fix is done. |
| Community interaction | Reply substantively to three other contributors' patches before opening your next one. |
Validation / Self-check
Before declaring the Capstone done:
- capstone-work/self-grade.md exists with a score per dimension and a total.
- The total is honest, not aspirational: you can defend each dimension's score with citations to your own artifacts.
- At least one independent reviewer has also scored, and disagreements of more than 10 points on any dimension have been discussed.
- The lowest dimension is identified and the next contribution's focus is written down.
- The score is recorded somewhere you'll see again in 3 months (calendar reminder, journal, follow-on JIRA list).
- You understand that the tier label ("Contributor", "Committer-ready") describes this one piece of work, not you.
- You have a candidate next bug picked, with the focus dimension in mind.