Step 1: Issue Selection

Picking the wrong issue is the most expensive mistake in the Capstone. Two weeks of investigation on a JIRA that turns out to be a duplicate, a WONTFIX, or a multi-month rearchitecture is two weeks you do not get back. The goal of this step is not to find a perfect issue. It is to find a tractable issue that exercises the parts of Tez you actually know.

Budget: 1–3 days. If you are past day 4 and still triaging, your standards are too high.


Where the Real Issues Live

Apache Tez tracks issues in JIRA at:

https://issues.apache.org/jira/projects/TEZ

There is no good-first-issue label on Tez (unlike Hadoop). The closest proxies are the newbie label, very small subtasks of larger umbrella issues, and stale unassigned bugs with reproducers attached. You will write your own JQL.

Starter JQL Queries

Run these in JIRA's "Advanced" search box. Open each in a separate tab; do not chase one result before you have seen the whole landscape.

1. Unassigned bugs that are open or in progress, newest first:

project = TEZ AND status in (Open, "In Progress")
  AND assignee is EMPTY
  AND type = Bug
ORDER BY created DESC

2. Bugs with attachments (a reproducer is the gold standard, but the attachment may also be a log or an old patch, so open each one):

project = TEZ AND status = Open
  AND type = Bug
  AND attachments is not EMPTY
ORDER BY updated DESC

3. Newbie-labeled (small surface area):

project = TEZ AND status = Open
  AND (labels = newbie OR labels = beginner OR labels = "low-hanging-fruit")
ORDER BY priority DESC, created DESC

4. Flaky tests (Stage 9 territory, often great Capstone fodder):

project = TEZ AND status = Open
  AND (summary ~ "flaky" OR summary ~ "intermittent" OR description ~ "flaky")
ORDER BY votes DESC

5. Open bugs touching modules you know:

project = TEZ AND status = Open AND type = Bug
  AND (component in ("tez-dag", "tez-runtime-internals", "tez-runtime-library")
       OR summary ~ "VertexImpl"
       OR summary ~ "ShuffleManager"
       OR summary ~ "AsyncDispatcher")
ORDER BY created DESC

Cast a wide net. Pull 20+ candidates into a scratchpad. You will trim aggressively.
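
To open each starter query in its own tab, it can help to generate one link per query. The sketch below is a minimal example, assuming the issue navigator at https://issues.apache.org/jira/issues/ accepts a JQL string in its jql query parameter; the QUERIES dictionary, its keys, and the navigator_url helper are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlencode

# Assumption: the issue navigator reads a JQL string from the `jql` parameter.
NAVIGATOR = "https://issues.apache.org/jira/issues/"

# Two of the starter queries from above; add the others the same way.
QUERIES = {
    "unassigned-bugs": (
        'project = TEZ AND status in (Open, "In Progress") '
        "AND assignee is EMPTY AND type = Bug ORDER BY created DESC"
    ),
    "with-attachments": (
        "project = TEZ AND status = Open AND type = Bug "
        "AND attachments is not EMPTY ORDER BY updated DESC"
    ),
}


def navigator_url(jql: str) -> str:
    """Return a browser URL that opens the issue navigator with `jql` pre-filled."""
    return NAVIGATOR + "?" + urlencode({"jql": jql})


if __name__ == "__main__":
    for name, jql in QUERIES.items():
        print(f"{name}: {navigator_url(jql)}")
```

Paste the printed links into your scratchpad so the whole landscape stays one click away while you triage.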


Triage: Pick 5 Finalists from 20

For each candidate, spend 10–15 minutes — no more — answering this single question: "Could I write a failing test for this today?" If "no" or "I have no idea," drop it. If "probably yes, here's how," keep it.

Concrete triage protocol:

  1. Read the JIRA description and every comment. Watch for "I cannot reproduce" or "this is a duplicate of TEZ-XXXX" buried at the bottom.
  2. Check git log --grep "TEZ-NNNN" in your ~/tez-src/ clone — has it already been partially fixed?
  3. Search the dev@ mailing list archive for the issue number: https://lists.apache.org/list.html?dev@tez.apache.org.
  4. Open the linked files in your editor. Are they in tez-dag, tez-runtime-*, tez-api (familiar territory), or tez-ui, tez-plugins, tez-yarn-timeline-* (less familiar — skip unless you specifically studied them)?
  5. Note the Affects Version/s field. If the issue only affects 0.8.x and master has since been rewritten in that area, the bug may no longer reproduce on master, or a fix may not port cleanly.
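
Step 2 of the protocol has one trap: a plain --grep "TEZ-432" also matches commits for TEZ-4321. The sketch below is one way to guard against that, assuming git is on your PATH and your clone lives at ~/tez-src/ as above; issue_pattern and commits_for_issue are hypothetical helper names, not part of any Tez tooling.

```python
import os
import re
import subprocess


def issue_pattern(issue_key: str) -> str:
    """Build an extended regex matching `issue_key` not followed by another digit."""
    if not re.fullmatch(r"[A-Z]+-\d+", issue_key):
        raise ValueError(f"not a JIRA key: {issue_key!r}")
    # "([^0-9]|$)" stops TEZ-432 from matching TEZ-4321.
    return issue_key + r"([^0-9]|$)"


def commits_for_issue(issue_key: str, repo: str = "~/tez-src") -> list[str]:
    """Return one-line summaries of commits whose message names `issue_key`."""
    result = subprocess.run(
        ["git", "-C", os.path.expanduser(repo), "log", "--oneline",
         "--extended-regexp", "--grep", issue_pattern(issue_key)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

An empty list from commits_for_issue("TEZ-NNNN") suggests nothing has landed under that key; a non-empty one means you should read those commits before scoring the candidate.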

Keep the 5 finalists in a markdown table:

| TEZ-NNNN | Title | Component | Reproducer? | Last activity | My read |
|---|---|---|---|---|---|
| TEZ-4321 | Fetcher hangs on connection reset | tez-runtime-library | none | 2024-11 | Plausible; I know ShuffleManager |
| TEZ-4456 | VertexImpl NPE on V_ROUTE_EVENT after kill | tez-dag | stack trace only | 2025-02 | Race-y; familiar state machine |
| ... | | | | | |

Scoring Rubric

Score each finalist 0–2 on each criterion. The winner is the finalist with the highest total.

| Criterion | 0 | 1 | 2 |
|---|---|---|---|
| Clarity | Description is one sentence and ambiguous | Description names symptom but not conditions | Clear symptom + reproduction conditions in description |
| Scope | Open-ended ("refactor X") | Bounded but spans modules | Bounded to one or two classes |
| Isolation | Requires Hive/Pig running | Needs MiniTezCluster | Can be reproduced in pure unit test |
| Testability | No clear failing assertion possible | Failing assertion possible after MiniTezCluster run | Failing assertion possible in DrainDispatcher test |
| Alignment | Touches code I have never read | Touches one familiar class | Touches 2–3 classes I have studied in Levels 4–6 |
| Community engagement | Last activity > 2 years, no watchers | Some activity in last year | Recently discussed; a committer responded |

Maximum possible: 12. Anything below 7 is risky, and 7–8 is borderline. Prefer a candidate that scores 9 or higher.


Three Worked Examples

These are illustrative archetypes, not literal current JIRAs.

Candidate A: "ShuffleManager retries forever on IOException: Connection reset"

  • Clarity: 2 (description names the exception and the loop).
  • Scope: 2 (one class: ShuffleManager or Fetcher).
  • Isolation: 1 (need a fake Fetcher to inject the exception).
  • Testability: 2 (mock-based unit test with retry counter assertion).
  • Alignment: 2 (you read this in Level 5).
  • Community engagement: 1 (one committer comment, no resolution).
  • Total: 10. Pick this.

Candidate B: "Refactor DAGImpl state machine to use enum-based transitions"

  • Clarity: 1 (vague — "refactor").
  • Scope: 0 (touches DAGImpl, every event handler, every test).
  • Isolation: 0 (no failing behavior to test).
  • Testability: 0 (regression-only testing).
  • Alignment: 1 (you know DAGImpl but this is huge).
  • Community engagement: 0 (no committer +1).
  • Total: 2. Skip. This is a months-long design proposal, not a bug.

Candidate C: "Container reuse logs say assigned then released for same container"

  • Clarity: 2 (you can pull the log lines from the description).
  • Scope: 1 (touches TaskSchedulerManager and possibly YarnTaskSchedulerService).
  • Isolation: 0 (need MiniYARNCluster — slow, flaky, environment-sensitive).
  • Testability: 1 (assertions are on log content + scheduler state).
  • Alignment: 1 (you read TaskSchedulerManager once).
  • Community engagement: 2 (recent discussion).
  • Total: 7. Borderline. Pick only if you have no candidate above 8 and you budget extra time for the YARN harness.
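
The arithmetic for the three archetypes can be checked mechanically. The sketch below is a minimal example, not a required tool: the criterion keys, the total and verdict helpers, and the verdict labels are names introduced here, and the thresholds encode one reading of the rubric (9 or more is a pick, 7–8 is borderline, below 7 is risky).

```python
CRITERIA = ("clarity", "scope", "isolation", "testability", "alignment", "community")


def total(scores: dict[str, int]) -> int:
    """Sum the six rubric scores after checking each one is 0, 1, or 2."""
    if set(scores) != set(CRITERIA):
        raise ValueError("scores must cover exactly the six rubric criteria")
    for name, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value}")
    return sum(scores.values())


def verdict(points: int) -> str:
    """Map a rubric total to a label (thresholds are an assumed reading)."""
    if points >= 9:
        return "pick"
    if points >= 7:
        return "borderline"
    return "risky"


# Scores copied from the three worked examples above.
CANDIDATES = {
    "A": dict(clarity=2, scope=2, isolation=1, testability=2, alignment=2, community=1),
    "B": dict(clarity=1, scope=0, isolation=0, testability=0, alignment=1, community=0),
    "C": dict(clarity=2, scope=1, isolation=0, testability=1, alignment=1, community=2),
}

if __name__ == "__main__":
    for name, scores in CANDIDATES.items():
        points = total(scores)
        print(f"Candidate {name}: {points}/12 ({verdict(points)})")
```

Keeping the scores in a structure like this makes it easy to re-score a finalist when a new JIRA comment changes your read.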

Claiming the Issue

Once you decide, claim it publicly. This is non-negotiable — it prevents wasted work by others, and it commits you.

JIRA comment template

Hi — I'd like to work on this as part of an extended Tez learning project.

My plan:
1. Build a deterministic reproducer (target: <date+1 week>).
2. Root-cause analysis (target: <date+2 weeks>).
3. Patch + tests posted for review (target: <date+4 weeks>).

I'll post weekly updates here. If anyone with context has pointers on
<specific question, e.g. "whether this race was discussed in TEZ-NNNN">,
I'd be grateful. Otherwise I'll start on the reproducer this week.

— <Your Name>

Then assign the JIRA to yourself. You need a JIRA account with the contributor role on the TEZ project; the Tez PMC grants it on request. Comment "please grant contributor role" on any issue and a PMC member will typically action it within a few days.
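
The target dates in the comment template are fixed offsets from the day you claim the issue, so they can be filled in mechanically. A small sketch, assuming the 1, 2, and 4 week offsets from the template; MILESTONES and target_dates are hypothetical names.

```python
from datetime import date, timedelta

# (milestone label, weeks after the claim date), taken from the comment template.
MILESTONES = (
    ("Deterministic reproducer", 1),
    ("Root-cause analysis", 2),
    ("Patch + tests posted for review", 4),
)


def target_dates(start: date) -> list[tuple[str, date]]:
    """Return each milestone paired with its target date, counted from `start`."""
    return [(label, start + timedelta(weeks=weeks)) for label, weeks in MILESTONES]


if __name__ == "__main__":
    for label, when in target_dates(date.today()):
        print(f"{label} (target: {when.isoformat()})")
```

Writing concrete ISO dates into the comment, rather than "in a week or so", makes your weekly updates easy for watchers to check against the plan.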

If you get no response in 5 business days

Post to dev@tez.apache.org:

Subject: [TEZ-NNNN] Working on this — any context before I dive in?

Hi all,

I left a comment on TEZ-NNNN <link> last week saying I plan to work on it. No
objections so far, so I'm starting on a reproducer this week. If anyone has
historical context — especially whether this overlaps with TEZ-XXXX — please
shout. Otherwise I'll update the JIRA as I make progress.

Thanks,
<Your Name>

If still no response after another week, proceed. Silence on a small bug is permission. (Silence on a redesign proposal is not — different beast.)


Red Flags: Issues to Skip

  • Last comment is from a committer saying "we should think about this more." You are not the right person to land a design call.
  • Open for >5 years with multiple abandoned patches. Something is structurally hard. Not Capstone material — pick later.
  • Touches tez-ui (Ember 1.x). The UI is on a separate lifecycle; build and test setup is divergent from the JVM modules you studied.
  • "Upgrade dependency X to version Y." Looks easy, ends up rebuilding the shuffle stack to handle a Guava API change. Skip unless you specifically want this experience.
  • Critical or Blocker priority with no patch. A committer would already be on it. If they are not, the issue may be misclassified or stale-critical.
  • Reproducer requires a specific Hive version + a 1TB TPC-DS run. No.

Validation / Self-check

Before you advance to Step 2, produce:

  1. A markdown table of your 5 finalists with full scoring rubric, saved as capstone-work/issue-shortlist.md.
  2. The TEZ-NNNN number of your chosen issue, posted as a JIRA comment claiming it.
  3. A 1-paragraph statement of why you picked it (which two criteria scored highest and which scored lowest).
  4. A self-assigned target date for Step 2 (deterministic reproducer in hand).
  5. Subscription confirmed to dev@tez.apache.org and the JIRA itself (click the "Start watching" eye icon).
  6. Your fork of apache/tez exists on GitHub with a branch named tez-NNNN-<short-slug> checked out locally.
  7. A note in capstone-work/issue-shortlist.md of any near-miss candidates you may revisit after the Capstone — these are your next contributions.
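
For item 6, the tez-NNNN-<short-slug> branch name can be derived from the issue key and title. The sketch below shows one possible convention, assuming a slug made of the first few lowercase words of the title; branch_name and its max_words parameter are hypothetical.

```python
import re


def branch_name(issue_key: str, title: str, max_words: int = 4) -> str:
    """Build a branch name such as tez-4321-fetcher-hangs from a key and a title."""
    # Keep only lowercase alphanumeric runs so the name is safe as a git ref.
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    return "-".join([issue_key.lower(), *words])


if __name__ == "__main__":
    print(branch_name("TEZ-4321", "Fetcher hangs on connection reset"))
```

Pass the result to git checkout -b in your fork's local clone; a shorter max_words keeps the name readable in review tools.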