Deep Dives: Reading Order

This directory contains 21 deep-dive chapters. They are the reference material behind the Level curriculum. Each chapter is self-contained, but most chapters depend on a handful of earlier ones. Read in the order below the first time through; thereafter use the index as a lookup.

The chapters are grouped by subsystem. For each chapter we list:

  • #: the chapter's position in the reading order.
  • File: the chapter's filename.
  • Summary: what you should walk away knowing, in one line.
  • Consumed by: which Levels and labs depend on it.

Group 1 — The DAG Model and the Client

These four chapters define "what is a Tez job" before any execution machinery is introduced.

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 1 | dag-model.md | DAG/Vertex/Edge as immutable plan; DAGPlan protobuf; validation rules | Level 1 (all labs); Level 2 lab 2.1 |
| 2 | logical-physical.md | How the logical DAG becomes a physical execution plan with concrete parallelism | Level 4 lab 4.2; Level 5 lab 5.1 |
| 3 | tez-client.md | Client-side bring-up: session mode, local resources, AM start, submission RPC | Level 3 lab 3.1; Level 7 lab 7.1 |
| 4 | dag-client.md | Status polling, kill, error reporting; RPC vs ATS backends | Level 3 lab 3.1; Level 8 lab 8.1 |

Start here. Without the DAG model in your head, every later chapter feels like trivia.


Group 2 — AM Lifecycle and Dispatch

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 5 | dag-app-master.md | AM as YARN application; dispatchers, heartbeats, recovery | Level 3 lab 3.2; Level 8 lab 8.2 |
| 6 | state-machines.md | Hadoop StateMachineFactory API; dispatcher invariants; tests | Level 4 labs 4.1, 4.3, 4.4 |
| 7 | event-routing.md | The event hierarchy; "events are the only mutation API" rule | Level 4 (all labs) |

These chapters explain how the AM mutates state. They must precede the per-entity lifecycle chapters that follow.
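Before you open chapters 6 and 7, a toy model can make the "events are the only mutation API" rule concrete. This is a minimal Python sketch and is not Tez code. The classes `Entity`, `Dispatcher`, `TRANSITIONS`, and `InvalidTransition` are hypothetical. The real AM builds its transitions with Hadoop's StateMachineFactory and delivers events through its dispatchers. The sketch shows only the shape of the rule: nothing sets state directly. Callers post events, and a transition table decides the next state.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    NEW = auto()
    RUNNING = auto()
    SUCCEEDED = auto()
    KILLED = auto()


class EventType(Enum):
    START = auto()
    COMPLETE = auto()
    KILL = auto()


# Transition table: (current state, event) -> next state.
# Hypothetical stand-in for a StateMachineFactory definition, not its real API.
TRANSITIONS = {
    (State.NEW, EventType.START): State.RUNNING,
    (State.NEW, EventType.KILL): State.KILLED,
    (State.RUNNING, EventType.COMPLETE): State.SUCCEEDED,
    (State.RUNNING, EventType.KILL): State.KILLED,
}


class InvalidTransition(Exception):
    """Raised when an event arrives in a state that has no transition for it."""


class Entity:
    """Exposes no setters: handle() is the only way its state changes."""

    def __init__(self):
        self._state = State.NEW

    @property
    def state(self):
        return self._state

    def handle(self, event):
        key = (self._state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(f"{event.name} not allowed in {self._state.name}")
        self._state = TRANSITIONS[key]


class Dispatcher:
    """Queues (target, event) pairs and delivers them one at a time, in order."""

    def __init__(self):
        self._queue = deque()

    def post(self, target, event):
        self._queue.append((target, event))

    def drain(self):
        while self._queue:
            target, event = self._queue.popleft()
            target.handle(event)


entity = Entity()
dispatcher = Dispatcher()
dispatcher.post(entity, EventType.START)
dispatcher.post(entity, EventType.COMPLETE)
dispatcher.drain()
print(entity.state.name)  # SUCCEEDED
```

Rejecting an unknown (state, event) pair loudly is deliberate. In a real AM, an illegal transition usually signals a missing event handler or a race, so it is better to surface it than to ignore it.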


Group 3 — Per-Entity Lifecycle

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 8 | vertex-lifecycle.md | VertexImpl state machine: NEW → SUCCEEDED, plus failure/kill paths | Level 4 lab 4.2 |
| 9 | task-lifecycle.md | TaskImpl state machine; speculation; max-failed-attempts | Level 4 lab 4.3 |
| 10 | task-attempt-lifecycle.md | TaskAttemptImpl state machine; container assignment; termination causes | Level 4 lab 4.4; Level 8 lab 8.2 |

Read 8, 9, 10 in this order. Each refers backward to events from chapter 7 and state-machine primitives from chapter 6.
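The ordering rule can be checked mechanically. This is a minimal sketch in Python. `PREREQS` and `violations` are hypothetical helpers, not part of the curriculum tooling. The map encodes only the dependencies this README states: chapters 6 and 7 come before the per-entity chapters, and 8, 9, 10 are read in sequence. A fuller map would come from each chapter's own cross-references.

```python
# Prerequisites taken only from what this README states: chapters 6 and 7
# come before the per-entity chapters, and 8, 9, 10 are read in that order.
PREREQS = {
    8: {6, 7},
    9: {6, 7, 8},
    10: {6, 7, 9},
}


def violations(order, prereqs):
    """Return (chapter, prerequisite) pairs where the prerequisite comes later."""
    position = {chapter: i for i, chapter in enumerate(order)}
    found = []
    for chapter, needs in prereqs.items():
        for need in sorted(needs):
            if position[need] > position[chapter]:
                found.append((chapter, need))
    return found


READING_ORDER = list(range(1, 22))  # chapters 1..21 as numbered in the tables

print(violations(READING_ORDER, PREREQS))  # []
print(violations([1, 2, 3, 4, 5, 8, 6, 7, 9, 10], PREREQS))  # [(8, 6), (8, 7)]
```

An empty result means the numbered order already respects every stated prerequisite. The second call shows how reading chapter 8 too early is flagged.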


Group 4 — Input/Processor/Output

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 11 | ipo-abstractions.md | LogicalInput/LogicalOutput/Processor; lifecycle methods; merged inputs | Level 5 lab 5.1; Level 7 lab 7.1 |
| 12 | tez-runtime.md | TezTaskRunner2, LogicalIOProcessorRuntimeTask, the umbilical | Level 5 lab 5.1 |

These chapters cover code in the tez-runtime-internals and tez-runtime-library modules, which runs inside the task JVM rather than in the AM.


Group 5 — Shuffle, Sort, and Counters

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 13 | shuffle-sort.md | Sorter implementations, IFile, ShuffleManager, Fetcher, MergeManager | Level 5 labs 5.2, 5.3 |
| 14 | counters-diagnostics.md | TezCounters, framework counters, custom counters, ATS publication | Level 8 lab 8.1 |

If you skip 13, do not attempt to debug shuffle issues in production. Reread it in full before opening a fetcher-related JIRA.


Group 6 — Scheduling and Resources

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 15 | scheduler.md | TaskSchedulerManager, YarnTaskSchedulerService, AMRM heartbeats | Level 6 lab 6.2 |
| 16 | container-reuse.md | AMContainerImpl lifecycle; reuse policy; idle timeouts | Level 6 labs 6.1, 6.2 |
| 17 | yarn-integration.md | YARN tokens, AMRM client, AM failover, log aggregation | Level 6 lab 6.2 |

Group 7 — Modes and Integrations

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 18 | local-mode.md | LocalContainerLauncher, debugging without YARN | Level 2 labs |
| 19 | hive-integration.md | Hive TezTask, edge usage, DynamicPartitionPruning, ATS spans | Level 7 (Hive labs h1–h6) |

Group 8 — Failure, Recovery, and Testing

| # | File | Summary | Consumed by |
|---|------|---------|-------------|
| 20 | failure-handling.md | Task retry, vertex rerun, AM restart, recovery records | Level 8 lab 8.2 |
| 21 | testing-framework.md | MiniTezCluster, MockContainerLauncher, DrainDispatcher, fault injection | Level 2 labs; Level 4 labs |

A note on order vs index

The deep-dives are an index: they exist to be looked up later. On your first read, follow the order of the tables above. When you return to fix a bug, jump directly to the most relevant chapter and follow the cross-references inside it.

Every chapter ends with a section titled "Validation: prove you understand this section." Treat it as the gate before declaring the chapter "read."