Open-Source Engineer & Contributor

A collection of deep, implementation-level curricula for engineers who want to contribute seriously to major open-source projects — not just fix typos, but build the kind of sustained understanding that leads to committer status.

Each curriculum is designed around how the project is actually developed, tested, reviewed, and maintained by its core contributors. Labs reference real source code, real issue trackers, and real contribution workflows.


Curricula

| Project | Focus | Status |
|---|---|---|
| Apache Tez | DAG execution engine on YARN; used by Hive, Pig, and custom batch pipelines | Active |
| Apache Kafka | Distributed log: producers, consumers, brokers, replication, Streams API | Planned |
| Apache Flink | Streaming and batch: state machines, checkpointing, watermarks, operators | Planned |
| Apache Spark | Unified analytics: scheduler, shuffle, RDD lineage, SQL planning | Planned |
| Apache Hadoop | HDFS, YARN, MapReduce: the foundation layer for everything above | Planned |

How to Use This Book

Each curriculum is self-contained. Start at the curriculum's Introduction page and work through its levels sequentially. Levels build on each other — skipping levels skips foundations that later labs depend on.

What you will need for any curriculum:

  • 3+ years of Java (or the project's primary language) on production-grade codebases
  • Comfort reading large, unfamiliar codebases without a guide
  • Git, a build tool (Maven / Gradle / sbt), and an IDE (IntelliJ recommended)
  • Patience: the path from contributor to committer is measured in months to years

Select a curriculum from the table above or from the sidebar to begin.

Apache Tez Open-Source Contributor Curriculum

Welcome to the Apache Tez Open-Source Contributor Curriculum — a complete, implementation-heavy roadmap for engineers who want to become serious Apache Tez contributors and eventually operate at the level of a core contributor, committer, or PMC-aware engineer.


What This Curriculum Is

This is not a tutorial. It is a structured engineering apprenticeship built around how Apache Tez is actually developed, tested, reviewed, and maintained by its committers and PMC members.

Every level is tied to real Apache Tez source code, real JIRA issue patterns, real test infrastructure, and real contribution workflows. The labs mirror the work an Apache Tez committer actually does — reading state machine code, tracing DAG execution paths, debugging shuffle failures, reproducing reported issues, and preparing patches for community review.

The curriculum will not hold your hand. It will point you at the right parts of the codebase, give you the right questions to ask, and push you to develop the muscle memory of someone who works at this level habitually.


Who This Is For

This curriculum is designed for strong backend and distributed systems engineers who:

  • Have 3+ years of Java development experience (Maven-based projects)
  • Are familiar with Hadoop, YARN, or MapReduce at a conceptual level
  • Understand distributed systems fundamentals: scheduling, fault tolerance, partitioning, shuffle
  • Want to contribute to Apache open-source at a serious level — not just fix typos

You should be comfortable with:

  • Reading large, unfamiliar Java codebases without a guide
  • git workflows, reading diffs, working with patch-based reviews
  • The Hadoop ecosystem at a high level: YARN, HDFS, MapReduce, Hive
  • Distributed execution concepts: task graphs, data movement, speculative execution

What You Will Be Able to Do

After completing this curriculum, you will be able to:

| Capability | Description |
|---|---|
| Build and test | Build Apache Tez from source, run unit and integration tests, run DAGs locally |
| Navigate the codebase | Find any class, understand its role, trace execution across module boundaries |
| Understand DAG execution | Follow a DAG from client submission through AM scheduling to task completion |
| Debug failures | Diagnose failed task attempts, hung DAGs, shuffle errors, and YARN allocation failures |
| Trace state machines | Read and reason about DAGImpl, VertexImpl, TaskImpl, TaskAttemptImpl state machines |
| Contribute patches | Reproduce issues, fix bugs, write tests, prepare high-quality patches |
| Engage the community | Interact productively on JIRA and mailing lists |
| Understand Hive integration | Trace a SQL query through Hive planning to a Tez DAG execution |
| Think like a committer | Reason about compatibility, test stability, performance, and release impact |

How to Use This Curriculum

Work through the 9 levels sequentially. Do not skip levels. Each level builds directly on the previous one, and the labs depend on the conceptual foundations laid earlier.

| Level | Title | Core Focus |
|---|---|---|
| 1 | Hadoop and Tez Foundation | Build, test, first DAG, Hadoop ecosystem |
| 2 | Apache Contributor Onboarding | Workflow, patches, JIRA, mailing lists |
| 3 | Tez Architecture | DAG model, TezClient, DAGAppMaster, key subsystems |
| 4 | DAG Execution Internals | State machines, vertex/task/attempt lifecycle, events |
| 5 | Testing and Debugging | Test infra, mini-cluster, debugging failed tasks |
| 6 | Hive/Tez Integration | SQL-to-DAG, Hive integration, cross-project bugs |
| 7 | Runtime and Shuffle | TezRuntime, I/O abstractions, shuffle and sort |
| 8 | Real Issue Contribution | JIRA reproduction, root cause analysis, real patches |
| 9 | Advanced Committer / PMC | Performance, backward compatibility, release practices |

Beyond the 9 levels, the curriculum includes five additional sections:

| Section | Purpose |
|---|---|
| Contributor Mindset | How to think, behave, and grow as an Apache contributor |
| Issue Roadmap | Staged progression from beginner-friendly to release-blocking issues |
| Internals Deep Dives | 21 focused deep dives, each with a mini-lab |
| Hive-on-Tez Labs | Cross-project debugging, SQL-to-DAG tracing, integration bugs |
| Release, Review, and PMC Practices | Apache governance, voting, licensing, release management |

The curriculum closes with a Capstone Project — a full contribution cycle from issue reproduction to merged patch and engineering write-up.


Required Tools

Before starting Level 1, ensure you have the following installed and working:

  • Java 8 or Java 11 (OpenJDK recommended; match the Tez branch target)
  • Apache Maven 3.6.3 or newer
  • Git 2.x
  • IntelliJ IDEA (strongly recommended) or Eclipse with M2E
  • Docker (optional; useful for containerized mini-cluster environments)

You will also need:

Note on Java version: Apache Tez's master branch targets Java 8 as the minimum. Some newer branches may require Java 11. Always check the pom.xml at the root of the branch you are working on.


Apache Tez at a Glance

Apache Tez is a general-purpose DAG execution engine built on top of Apache Hadoop YARN. Hive has supported Tez as an execution engine since Hive 0.13, and it is now Hive's primary engine. Other Hadoop ecosystem projects have also run on it, including Pig, Cascading, and (historically) an experimental Spark integration.

Why Tez Exists

MapReduce forces every computation into a Map → Shuffle → Reduce pattern. Complex analytical queries (like multi-join SQL) require chaining many MapReduce jobs, with intermediate results written to HDFS between each stage. This is slow and wasteful.

Tez allows arbitrary directed acyclic graphs (DAGs) of computation where:

  • Vertices represent computation stages
  • Edges represent data movement between stages
  • Container reuse eliminates JVM startup overhead between tasks
  • Data can be pipelined between tasks without HDFS materialization
  • The same container can run multiple task types

This makes Tez significantly faster than MapReduce for multi-stage queries.
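
The vertex-and-edge model can be illustrated without any Tez dependency. The sketch below is a toy model only: `ToyDag` and its methods are hypothetical names, not Tez classes. It shows the one property the list above relies on, namely that a stage becomes runnable once every stage feeding it has run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy DAG model (hypothetical, not a Tez class): vertices are stage names, edges are data flows. */
class ToyDag {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    ToyDag addVertex(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    ToyDag addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    /** Kahn's algorithm: a vertex is emitted once all of its upstream vertices have been emitted. */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((vertex, degree) -> {
            if (degree == 0) {
                ready.add(vertex);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.poll();
            order.add(vertex);
            for (String next : downstream.get(vertex)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("cycle detected: not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        ToyDag dag = new ToyDag()
            .addEdge("Map 1", "Reducer 2")
            .addEdge("Map 3", "Reducer 2")
            .addEdge("Reducer 2", "Reducer 4");
        System.out.println(dag.executionOrder());
    }
}
```

A real Tez DAG schedules tasks within each vertex and can overlap stages, so this ordering is only the coarsest view of what the AM does.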

Key Modules

You will spend the majority of your time in these modules:

| Module | Path | Description |
|---|---|---|
| tez-api | tez-api/ | Public API: DAG, Vertex, Edge, TezClient, DAGClient |
| tez-dag | tez-dag/ | Core execution engine: AM, state machines, scheduling |
| tez-runtime-library | tez-runtime-library/ | Input/Output/Processor implementations, shuffle |
| tez-mapreduce | tez-mapreduce/ | MapReduce compatibility layer (MRInput, MROutput) |
| tez-runtime-internals | tez-runtime-internals/ | Task execution framework, container management |
| tez-tests | tez-tests/ | Integration tests and system-level tests |
| tez-tools | tez-tools/ | Utility tools (job analyzers, swimlane visualization) |
| tez-plugins | tez-plugins/ | Optional plugins (YARN Timeline and protobuf history logging, history parser, auxiliary shuffle service) |

Key Classes (High-Level Preview)

| Class | Module | Role |
|---|---|---|
| TezClient | tez-api | Entry point for DAG submission from a client |
| DAGClient | tez-api | Handle for monitoring a submitted DAG |
| DAG | tez-api | DAG definition: vertices + edges |
| Vertex | tez-api | Vertex definition: processor + parallelism |
| DAGAppMaster | tez-dag | ApplicationMaster; orchestrates DAG execution |
| DAGImpl | tez-dag | State machine: models DAG lifecycle |
| VertexImpl | tez-dag | State machine: models vertex lifecycle |
| TaskImpl | tez-dag | State machine: models task lifecycle |
| TaskAttemptImpl | tez-dag | State machine: models a single task attempt |
| TaskCommunicatorManager | tez-dag | Manages communication between AM and task containers |
| TezTaskRunner2 | tez-runtime-internals | Runs a task inside a container |
| LogicalIOProcessorRuntimeTask | tez-runtime-internals | Wires up I/O processors inside a task |
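
Four of the classes above are described as state machines. As a rough mental model before reading them, the sketch below shows a table-driven state machine in plain Java. The states and events are a deliberately reduced, hypothetical set, and `ToyVertexStateMachine` is not a Tez class. The real implementations have many more states, and the only thing assumed to carry over is the shape: a (state, event) pair either maps to a next state or is rejected.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, heavily simplified lifecycle; the real VertexImpl has far more states and events. */
class ToyVertexStateMachine {
    enum State { NEW, INITED, RUNNING, SUCCEEDED, FAILED }
    enum Event { INIT, START, ALL_TASKS_DONE, TASK_FAILED }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        add(State.NEW, Event.INIT, State.INITED);
        add(State.INITED, Event.START, State.RUNNING);
        add(State.RUNNING, Event.ALL_TASKS_DONE, State.SUCCEEDED);
        add(State.RUNNING, Event.TASK_FAILED, State.FAILED);
    }

    private static void add(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    private State current = State.NEW;

    /** Applies one event; an event with no registered transition is an error, not a no-op. */
    State handle(Event event) {
        Map<Event, State> allowed = TRANSITIONS.get(current);
        State next = (allowed == null) ? null : allowed.get(event);
        if (next == null) {
            throw new IllegalStateException("invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }

    State current() {
        return current;
    }

    public static void main(String[] args) {
        ToyVertexStateMachine vertex = new ToyVertexStateMachine();
        vertex.handle(Event.INIT);
        vertex.handle(Event.START);
        System.out.println(vertex.handle(Event.ALL_TASKS_DONE));
    }
}
```

When you later read the real classes, the useful question is the same one this table answers: which events are legal in which state, and what happens to the ones that are not.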

Apache Tez Community

Apache Tez is a mature project with an active but selective community. The codebase reflects years of careful design decisions, many of which are documented in JIRA issues, design documents, and mailing list threads rather than in code comments.

What the community values:

  • Patches that include tests
  • Issues that include a clear reproduction case
  • Comments that demonstrate you have read the existing code
  • Contributors who engage respectfully and patiently
  • Sustained contribution over time, not one-off patches

The path from contributor to committer is measured in years, not weeks. That is intentional. The Apache meritocracy rewards sustained, high-quality contribution — not volume of patches.

This curriculum will help you build the habits and depth of understanding that make that path realistic.


Begin with Level 1: Hadoop and Tez Foundation.

Overview & Prerequisites

Status: Full content coming in Phase 11.

This section covers the complete prerequisites checklist, environment setup guide, and how to navigate the curriculum effectively.

Topics covered:

  • Detailed environment setup (Java, Maven, Git, IDE configuration)
  • Cloning and verifying the Apache Tez and Hadoop repositories
  • Subscribing to Apache Tez mailing lists
  • Setting up an Apache JIRA account
  • How to navigate each curriculum level
  • How to use the labs

Tez Warm-Up: From Data Engineer to Source Contributor

Before you read a single line of VertexImpl.java, you need to have sat in the seat of the person whose workload Tez is serving. The engineers who built Tez's state machines, container reuse logic, and shuffle pipelines were solving specific, painful problems that showed up in production Hive and Pig workloads every day. If you skip that context and go straight to the source code, you will memorize class names without understanding why the design exists.

This chapter is the missing first mile. You will run Tez from the outside — as a data engineer would — across a series of practical scenarios covering different data shapes, query patterns, and ecosystem integrations. After each scenario, the chapter maps what you observed back to the source code structures that own it. By the end, you will have a mental model that makes every internal class feel like an old acquaintance rather than an alien term.


What Tez Actually Is (Two Sentences)

Apache Tez is a general-purpose DAG execution engine that runs on Apache YARN. It does not execute SQL or process files itself — it provides a runtime that other systems (Hive, Pig, custom applications) compile their work into, and then Tez runs that compiled work as a directed acyclic graph of parallel tasks.

Everything else — SQL parsing, query planning, physical operators, file format codecs — belongs to the caller. Tez sees vertex descriptors, edge properties, and processor classes. That boundary is what you need to hold clearly in mind throughout the curriculum.


Where Tez Sits in the Data Engineering Spectrum

┌─────────────────────────────────────────────────────────────────────────────┐
│                     Data Engineering Tool Spectrum                          │
│                                                                             │
│  Batch ◄───────────────────────────────────────────────► Streaming         │
│                                                                             │
│  MapReduce    Tez      Spark         Flink         Kafka Streams            │
│  (2004)       (2013)   (2014)        (2014)        (2016)                   │
│  pure batch   batch+   micro-batch   true stream   native stream            │
│               pipelined & batch      & batch                                │
│                                                                             │
│  ──────────────────────────────────────────────────────────────────────     │
│  Ingest Layer:  Flume  →  Kafka  →  Flink/Kafka Streams                    │
│  Storage Layer: HDFS / S3 / ORC / Parquet / Iceberg / Delta Lake           │
│  Query Layer:   Hive (Tez), Presto, Trino, Spark SQL, Flink SQL            │
└─────────────────────────────────────────────────────────────────────────────┘

Tez vs. MapReduce

MapReduce forces every computation into map → shuffle → reduce. A five-join SQL query becomes five chained MapReduce jobs with HDFS materializations between each. Tez expresses that same query as one DAG, pipelines intermediate data between vertices without HDFS writes, and reuses JVMs across tasks. Typical improvement: 2–5x on complex queries, 10x+ on workflows that would have been five MR jobs.

Tez vs. Spark

| Dimension | Tez | Spark |
|---|---|---|
| Primary use case | Hive SQL (on YARN/HDFS ecosystems) | General batch + ML + streaming |
| Execution model | YARN-native, container reuse | Driver + executor (YARN or Kubernetes) |
| In-memory caching | No (disk-backed shuffle) | RDD/DataFrame caching (explicit) |
| Streaming | Not native | Structured Streaming (micro-batch) |
| Deployment | YARN only | YARN, Kubernetes, standalone |
| Hive integration | Deep (Hive's primary engine) | Separate (Hive-on-Spark is less common) |
| Community | Apache Tez (focused on Hive use case) | Apache Spark (broad general use) |

When you are on a Hadoop/YARN cluster where Hive is the primary SQL layer, Tez is the right choice. Spark is a better fit for Python/Scala workloads, ML pipelines, or when you need in-memory caching across multiple queries.

Tez vs. Flink

Flink is a streaming-first engine that also handles batch. Tez is a batch-first engine that handles simple pipelines. The key structural difference: Flink maintains persistent operator state across windows and checkpoints; Tez vertices are stateless per-task (state is external: HDFS, HBase). If you are building event-time windowed aggregations or exactly-once stream processing, you want Flink. If you are running nightly ETL on HDFS data via Hive, Tez is the right tool and Flink would be overengineered for the job.

Tez vs. Flume (Ingest)

Flume is not a computation engine — it is a log/event ingestion agent that moves data from sources (web servers, syslog, Kafka) to sinks (HDFS, Kafka, HBase). The typical pipeline is:

Application Logs → Flume Agent → HDFS (ORC/Parquet files) → Hive table → Tez query

Flume and Tez are not competitors; they are peers in the same pipeline. Tez reads the data that Flume (or Kafka, or Sqoop) landed on HDFS. Knowing this boundary matters when you encounter a data quality bug: is it in the ingest (Flume), the storage format (ORC serialization), or the compute layer (Tez/Hive)?


Data Formats in the Tez Ecosystem

Tez itself is format-agnostic. It does not read or write ORC, Parquet, or Iceberg directly. Tez sees InputDescriptor and OutputDescriptor objects — the actual codec lives in the class pointed to by those descriptors. The format lives in the tez-mapreduce compatibility layer (MRInput, MROutput) or in Hive's vectorized readers.

ORC (Optimized Row Columnar)

ORC is Hive's native format. When you INSERT INTO an ORC table and query it via Hive-on-Tez:

  • The input split is an OrcSplit generated by OrcInputFormat in the Hive ORC library.
  • Tez receives that split as a DataSourceDescriptor in the DAGPlan.
  • MRInput wraps the record reader returned by OrcInputFormat.getRecordReader() (Hive's OrcInputFormat implements the older mapred API), feeding vectorized row batches to Hive's MapOperator.
  • The key Tez entry point is MRInputLegacy.createReaderInternal() in tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MRInputLegacy.java.

ORC's predicate pushdown (column pruning, row group skipping) happens before Tez sees the data — entirely inside OrcInputFormat. If a Hive-on-Tez query reads 10 billion rows instead of the 100K it should (wrong predicate pushdown), the bug is in ORC/Hive, not in Tez.

Parquet

Parquet is the other dominant columnar format, more common in cross-ecosystem pipelines (Spark + Hive interop). With Hive-on-Tez reading Parquet:

  • ParquetInputFormat generates ParquetInputSplit objects.
  • Tez receives those as DataSourceDescriptor entries.
  • Vectorization depth varies: ORC vectorization is deeper in Hive; Parquet goes through an additional row-column translation layer.

From a Tez contributor's standpoint, Parquet vs. ORC differences show up mainly in:

  1. Split size calculations affecting vertex parallelism (how many map tasks Tez schedules)
  2. Record skew when one Parquet file is much larger than others

Iceberg

Apache Iceberg is a table format (not a file format). It stores data in Parquet or ORC files but adds a metadata layer for ACID semantics, time travel, and hidden partitioning. Hive + Tez reads Iceberg via IcebergInputFormat (from the Iceberg Hive runtime JAR).

From Tez's view, Iceberg is yet another InputFormat. The novel behavior is:

  • Iceberg's snapshot-based read means splits can come from multiple physical locations.
  • Iceberg's PlanningUtil generates splits that can be much more numerous than traditional partition-based splits — this affects Tez vertex parallelism significantly.
  • Time-travel queries (SELECT ... FOR SYSTEM_TIME AS OF ...) generate a different split list at query compile time, which Hive encodes into the DAGPlan before Tez sees it.

Key insight for contributors: Tez bugs triggered by Iceberg tables are almost always about parallelism (too many small tasks, too few tasks for large snapshots) or about the DataSourceDescriptor encoding. The actual file reading is not Tez's responsibility.


Scenario 1: Classic Batch ETL — Aggregation Over a Large Table

What the data engineer does:

-- Run in Hive CLI connected to a cluster with Hive-on-Tez enabled
SET hive.execution.engine=tez;

CREATE TABLE daily_sales (
  event_date STRING,
  product_id BIGINT,
  region     STRING,
  revenue    DOUBLE
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZSTD");

-- Query: daily revenue by region, last 90 days
SELECT
  event_date,
  region,
  SUM(revenue)     AS total_revenue,
  COUNT(*)         AS transaction_count
FROM daily_sales
WHERE event_date >= '2026-03-01'
GROUP BY event_date, region
ORDER BY event_date, region;

What Tez does under the hood:

  1. Hive compiles the query to MapWork (map-side partial aggregation) and ReduceWork (final aggregation, then the global sort for ORDER BY).
  2. DagUtils.createVertex() in Hive creates one Tez Vertex per work unit: Map 1 and Reducer 2 for the aggregation, plus a further reducer vertex for the ORDER BY.
  3. The edge between Map 1 and Reducer 2 is SCATTER_GATHER (partitioned shuffle by GROUP BY key hash).
  4. ShuffleVertexManager auto-parallelism kicks in: it monitors how much data map tasks produce, then dynamically reduces the reducer count if data is smaller than expected (config: tez.shuffle-vertex-manager.desired-task-input-size).
  5. Map tasks run the map processor with hash-based partial aggregation, then write through OrderedPartitionedKVOutput (partitioned, sorted).
  6. Reducer tasks run the reduce processor, which reads OrderedGroupedKVInput (merged shuffle inputs) and completes the aggregation. The sorted shuffle into the final reducer provides the ORDER BY, and FileSinkOperator writes the result files.
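
The auto-parallelism decision in step 4 comes down to a small piece of arithmetic. The sketch below is a simplified model, not the ShuffleVertexManager implementation: the class and method names are hypothetical, and it ignores details such as deciding before all source tasks finish. It assumes only what the step states: the reducer count is derived from observed map output and a desired per-task input size, and it can shrink but never grow beyond what was configured.

```java
/** Simplified model of auto-parallelism; names are hypothetical, not Tez APIs. */
class ReducerCountSketch {

    /**
     * @param totalMapOutputBytes   bytes produced by the upstream (map) vertex
     * @param desiredTaskInputBytes target input per reducer, in the spirit of
     *                              tez.shuffle-vertex-manager.desired-task-input-size
     * @param configuredReducers    reducer count chosen at compile time (upper bound)
     */
    static int desiredReducers(long totalMapOutputBytes,
                               long desiredTaskInputBytes,
                               int configuredReducers) {
        if (totalMapOutputBytes < 0 || desiredTaskInputBytes <= 0 || configuredReducers <= 0) {
            throw new IllegalArgumentException("sizes and counts must be positive");
        }
        // Ceiling division: enough reducers that none exceeds the desired input size.
        long needed = (totalMapOutputBytes + desiredTaskInputBytes - 1) / desiredTaskInputBytes;
        // Never fewer than one reducer, never more than originally configured.
        return (int) Math.max(1L, Math.min(needed, (long) configuredReducers));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(desiredReducers(0L, 100 * mb, 10));         // map side produced nothing
        System.out.println(desiredReducers(250 * mb, 100 * mb, 10));   // small output
        System.out.println(desiredReducers(5_000 * mb, 100 * mb, 10)); // large output, capped
    }
}
```

The zero-output case is the one to keep in mind for the table that follows: a filter that removes every row still leaves one reducer to run.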

Dataset characteristics and edge behaviors:

| Dataset characteristic | Tez behavior | Source class to read |
|---|---|---|
| 1 small file (< 1 block) | 1 map task, ShuffleVertexManager sets 1 reducer | ShuffleVertexManager.java |
| 1,000 files, uniform size | Parallelism = file count (MR split logic) | MRInputLegacy.java split sizing |
| 1 file, 10 GB, no ORC splits | 1 map task (cannot split non-splittable format) | OrcInputFormat.isSplittable() |
| WHERE predicate on partitioned column | Hive partition pruning, fewer splits passed to Tez | Hive PartitionPruner, not Tez |
| WHERE filters out all rows | 0 output bytes from map, ShuffleVertexManager → 1 reducer | ShuffleVertexManager.onSourceTaskCompleted() |

Bridge to source code:

cd ~/tez-src

# ShuffleVertexManager: the most important vertex manager for map-reduce style DAGs
find . -name "ShuffleVertexManager*.java" -path "*/src/main/*"
# Expected under tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/

# Auto-parallelism: how ShuffleVertexManager decides to reduce the number of reducers
grep -n "computeParallelism\|desiredTaskInputSize\|onSourceTaskCompleted" \
  tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/ShuffleVertexManager*.java | head -30

# The edge between Map 1 and Reducer 2 is SCATTER_GATHER — EdgeProperty documentation
grep -n "SCATTER_GATHER" \
  tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java

Scenario 2: Multi-Table Join — The Real Workload Tez Was Built For

What the data engineer does:

SET hive.execution.engine=tez;
SET hive.auto.convert.join=true;      -- enable map-side (broadcast) joins
SET hive.mapjoin.smalltable.filesize=25000000;  -- 25 MB threshold

SELECT
  o.order_id,
  c.customer_name,
  p.product_name,
  o.quantity * p.unit_price AS line_total
FROM orders          o
JOIN customers       c ON o.customer_id = c.customer_id
JOIN products        p ON o.product_id  = p.product_id
WHERE o.order_date = '2026-05-31'
AND   c.region = 'US-WEST';

What Tez does:

Hive's query planner analyzes table sizes:

  • customers and products are small (< 25 MB) → broadcast join (MapJoin)
  • orders is large (> 25 MB) → the probe (streamed) side; map tasks read it directly and it is not shuffled for the join

The resulting DAG has:

  1. Map 1: reads orders, builds hash tables from the small customers and products tables, and emits matching rows. The small tables arrive via BROADCAST edges (every map task gets the full small table; this is different from ONE_TO_ONE, where task i feeds only task i).
  2. Optionally a Reducer 2 if there's a DISTINCT or ORDER BY.
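
The data movement types are easy to mix up, so here is a small illustrative model of what one destination task receives from one source task under each type. `EdgeRoutingSketch` and its `Movement` enum are hypothetical stand-ins that mirror the names of Tez's data movement types. The routing rules are the commonly documented semantics, stated here as a sketch rather than taken from the edge manager code.

```java
/** Illustrative routing model; not the Tez edge manager implementation. */
class EdgeRoutingSketch {
    enum Movement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    /** Describes what destination task destTask receives from source task sourceTask. */
    static String received(Movement movement, int sourceTask, int destTask) {
        switch (movement) {
            case ONE_TO_ONE:
                // Task i of the producer feeds only task i of the consumer.
                return sourceTask == destTask ? "all of source " + sourceTask : "nothing";
            case BROADCAST:
                // Every consumer task gets the complete output of every producer task.
                return "all of source " + sourceTask;
            case SCATTER_GATHER:
                // Each producer task partitions its output; consumer task d gets partition d.
                return "partition " + destTask + " of source " + sourceTask;
            default:
                throw new AssertionError("unknown movement type: " + movement);
        }
    }

    public static void main(String[] args) {
        for (Movement movement : Movement.values()) {
            System.out.println(movement + ": dest 3 gets " + received(movement, 0, 3) + " from source 0");
        }
    }
}
```

For the map join above, the small-table vertices use the BROADCAST rule: each Map 1 task sees every row of customers and products.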

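VertexGroup is a separate mechanism: it groups several vertices so that a downstream consumer treats their outputs as a single logical input, which Hive uses mainly for UNION ALL. A group is created with DAG.createVertexGroup() rather than DAG.addVertex(). A broadcast join does not need one, because a single vertex can feed several consumers through ordinary outgoing edges.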

Dataset edge cases for joins:

| Scenario | What goes wrong | Where to look |
|---|---|---|
| customers grows from 20 MB to 30 MB | Map join threshold exceeded, query switches to shuffle join; slower | Hive CommonJoinResolver, not Tez |
| orders has extreme key skew (one customer_id has 90% of rows) | One reducer gets 90% of data; task timeout | SkewedJoin hint in Hive; Tez sees it as one overloaded reducer |
| Broadcast table > YARN container heap | OOM in map task | Container memory: tez.task.resource.memory.mb |
| Right side of join returns 0 rows | Map tasks emit 0 output; downstream vertex immediately succeeds | VertexImpl.checkTasksForCompletion() |

Bridge to source code:

# BROADCAST edge: how every map task gets all small-table data
grep -rn "BROADCAST\|BroadcastEdgeManager" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/ | grep -v Test | head -20

# VertexGroup: created through DAG.createVertexGroup()
grep -n "createVertexGroup" \
  tez-api/src/main/java/org/apache/tez/dag/api/DAG.java | head -15

# How the AM-side DAGImpl tracks vertex groups and their grouped inputs
grep -n "vertexGroup\|groupInput" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | head -20

Scenario 3: Direct Tez API — No Hive

Not all Tez workloads go through Hive. Custom data pipelines, internal batch frameworks, and migration tools often build Tez DAGs directly. The canonical example is OrderedWordCount in tez-examples/.

// Simplified from tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java
TezClient tezClient = TezClient.create("OrderedWordCount", tezConf);
tezClient.start();

DAG dag = DAG.create("OrderedWordCount");

// Vertex 1: Tokenize words from input files
Vertex tokenizerVertex = Vertex.create(
    "Tokenizer",
    ProcessorDescriptor.create(TokenProcessor.class.getName()),
    numMapTasks,
    MRHelpers.getResourceForMRMapper(conf));
tokenizerVertex.addDataSource(
    "Input",
    MRInput.createConfigBuilder(conf, TextInputFormat.class, inputPath).build());

// Vertex 2: Sum the counts for each word
Vertex sumVertex = Vertex.create(
    "Sorter",
    ProcessorDescriptor.create(SumProcessor.class.getName()),
    numReduceTasks,
    MRHelpers.getResourceForMRReducer(conf));
sumVertex.addDataSink(
    "Output",
    MROutput.createConfigBuilder(conf, TextOutputFormat.class, outputPath).build());

// Edge: SCATTER_GATHER, sorted by word key.
// createDefaultEdgeProperty() is equivalent to EdgeProperty.create(SCATTER_GATHER, PERSISTED,
// SEQUENTIAL, <OrderedPartitionedKVOutput descriptor>, <OrderedGroupedKVInput descriptor>).
OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
    .newBuilder(Text.class.getName(), IntWritable.class.getName(),
        HashPartitioner.class.getName())
    .setFromConfiguration(conf)
    .build();
dag.addVertex(tokenizerVertex)
   .addVertex(sumVertex)
   .addEdge(Edge.create(tokenizerVertex, sumVertex, edgeConf.createDefaultEdgeProperty()));

DAGClient dagClient = tezClient.submitDAG(dag);
DAGStatus status = dagClient.waitForCompletion();

What this teaches about Tez's structure:

  • Vertex.create(name, processorDescriptor, parallelism, resource) — the four primitives of a vertex: name, code to run, how many copies, how much resource.
  • EdgeProperty.create(movementType, sourceType, schedulingType, outputDesc, inputDesc) — edge properties completely specify how data moves.
  • MRInput/MROutput bridge the gap between legacy Hadoop InputFormat/OutputFormat and Tez's native I/O descriptors.

Bridge to source code:

# Read OrderedWordCount to understand the complete DAG lifecycle from a client
cat tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java

# Follow TezClient.submitDAG() into the AM
grep -n "public.*submitDAG" \
  tez-api/src/main/java/org/apache/tez/client/TezClient.java

# EdgeProperty — the central struct that determines routing
cat tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java

Scenario 4: Pipelined (Concurrent) Edges

Tez edges can also be scheduled concurrently: SchedulingType.CONCURRENT, as opposed to SEQUENTIAL. With a concurrent edge, downstream tasks can start before all upstream tasks complete, so data flows like a stream within the DAG. Check the Javadoc in EdgeProperty.java on the branch you are using, because support for concurrent and ephemeral edges varies between releases.

EdgeProperty pipelinedEdge = EdgeProperty.create(
    DataMovementType.SCATTER_GATHER,
    DataSourceType.EPHEMERAL,              // <-- output exists only while the producer runs
    SchedulingType.CONCURRENT,             // <-- downstream starts immediately
    outputDescriptor,
    inputDescriptor);

This is used by Hive for query pipelining in long-running SELECT ... INSERT chains. The downstream vertex starts consuming partial output from the upstream before it finishes, reducing end-to-end latency for multi-stage queries.

When pipelining causes problems:

| Problem | Symptom | Root class |
|---|---|---|
| Upstream task fails mid-stream | Downstream task has consumed partial data → must be killed and retried with upstream | TaskAttemptImpl.FAILED_TRANSITION |
| Downstream cannot consume fast enough | Back-pressure: upstream pauses on write() | OrderedPartitionedKVOutput.sendingThreadShouldRun |
| Memory overflow in pipelined buffer | OutOfMemoryError in fetcher threads | MergeManager in-memory limit |
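
The back-pressure row can be reproduced in miniature with a bounded buffer between a producer and a consumer. This is a generic Java concurrency sketch, not Tez code: `BackPressureSketch` is a hypothetical name and the buffer stands in for whatever sits between an upstream output and a downstream input. It shows only the mechanism: when the buffer is full, the producer's write blocks until the consumer catches up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Generic producer/consumer sketch of back-pressure; not Tez's shuffle implementation. */
class BackPressureSketch {
    private static final int END_OF_STREAM = -1;

    /** Streams `records` items through a buffer of `bufferCapacity` slots; returns how many were consumed. */
    static int run(int records, int bufferCapacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    buffer.put(i);           // blocks while the buffer is full: back-pressure
                }
                buffer.put(END_OF_STREAM);   // tell the consumer that no more data is coming
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int consumed = 0;
        try {
            // The consumer starts reading before the producer has finished.
            while (buffer.take() != END_OF_STREAM) {
                consumed++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000, 8));
    }
}
```

The first row of the table is the hard part this sketch leaves out: if the producer dies mid-stream, the consumer has already acted on partial data and both sides must be rerun.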

Bridge to source code:

grep -n "EPHEMERAL\|PERSISTED\|CONCURRENT" \
  tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java

grep -n "SchedulingType.CONCURRENT" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -10

Dataset Scenarios for Testing Edge Cases

When you are writing a repro test case or validating a fix, the dataset you choose determines which code paths you exercise. Use these as your starting templates.

Dataset 1: The Empty Partition

// Generate test data where one reduce partition has 0 records
// Triggers ShuffleVertexManager.onSourceTaskCompleted() with 0-byte output
private static final int NUM_PARTITIONS = 10;
private static final int RECORDS_PER_PARTITION = 100;

// Force all records into partitions 0–8, leave partition 9 empty
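// Mask the sign bit first: hashCode() can be negative, and a negative modulo is not a valid partition
int partition = (key.hashCode() & Integer.MAX_VALUE) % (NUM_PARTITIONS - 1);  // never 9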

What this tests: ShuffleVertexManager must handle a vertex where some reducer partitions receive zero input. Before TEZ-3247, this caused reducers to hang waiting for shuffle data that would never arrive.
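
To confirm that a dataset really leaves one partition empty before using it in a repro, the partitioning can be checked offline. The sketch below is standalone Java with hypothetical names (`EmptyPartitionSketch`, `countPerPartition`). It uses the same sign-bit masking as Hadoop's HashPartitioner and assumes keys of the form key_N purely for illustration.

```java
/** Offline check that a key set leaves the last partition empty; names are illustrative. */
class EmptyPartitionSketch {

    /** Non-negative hash partitioning: mask the sign bit, then take the modulo. */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Distributes numKeys synthetic keys over only the first usedPartitions partitions
     * and returns the record count for each of the numPartitions partitions.
     */
    static int[] countPerPartition(int numKeys, int numPartitions, int usedPartitions) {
        if (usedPartitions <= 0 || usedPartitions > numPartitions) {
            throw new IllegalArgumentException("usedPartitions must be in 1..numPartitions");
        }
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numKeys; i++) {
            counts[partitionFor("key_" + i, usedPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(1_000, 10, 9);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p] + " records");
        }
    }
}
```

The last partition always reports zero records here, which is the condition the repro needs.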

# Test class that covers empty-partition behavior
grep -rn "emptyPartition\|zeroInput\|emptyInput" tez-tests/src/test/ | head -10

Dataset 2: Extreme Key Skew

// One key accounts for 95% of records
for (int i = 0; i < 1_000_000; i++) {
    String key = (i < 950_000) ? "hot_key" : "key_" + i;
    writer.write(new Text(key), new IntWritable(1));
}

What this tests: The reducer that receives hot_key gets at least 950,000 records, while the remaining 50,000 records are spread across all reducers. This exposes:

  • Speculative execution decisions in LegacySpeculator
  • Container reuse after the skewed reducer finishes last
  • Per-vertex timing in VertexImpl.checkTasksForCompletion()
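
The size of the skew depends on the reducer count, so it helps to compute the per-reducer distribution before running the job. The sketch below replays the key generation above against plain hash partitioning. `KeySkewSketch` is a hypothetical helper, and the partitioning rule is the sign-masked modulo used by Hadoop's HashPartitioner, which a given job may or may not use.

```java
import java.util.Arrays;

/** Replays the skewed key generation and counts records per reducer; illustrative only. */
class KeySkewSketch {

    static long[] recordsPerReducer(int totalRecords, int hotRecords, int numReducers) {
        long[] counts = new long[numReducers];
        for (int i = 0; i < totalRecords; i++) {
            // Same key rule as the dataset: the first hotRecords records share one key.
            String key = (i < hotRecords) ? "hot_key" : "key_" + i;
            int reducer = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            counts[reducer]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = recordsPerReducer(1_000_000, 950_000, 10);
        long max = Arrays.stream(counts).max().getAsLong();
        double mean = Arrays.stream(counts).average().getAsDouble();
        System.out.println("max records on one reducer: " + max);
        System.out.println("mean records per reducer:   " + mean);
        System.out.println("skew ratio (max / mean):    " + (max / mean));
    }
}
```

A skew ratio far above 1 means one task attempt will still be running long after its siblings finish, which is the situation the speculation and container-reuse code paths listed above have to handle.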

Dataset 3: Zero-Row Input

// Empty input — 0 files, 0 records
// The DAG should complete SUCCEEDED with 0 output, not hang
String inputPath = "/tmp/empty_dir_" + UUID.randomUUID();
fs.mkdirs(new Path(inputPath));  // create directory but put no files in it

What this tests: VertexImpl must handle the case where MRInput generates 0 splits. A vertex with 0 input splits sets its parallelism to 0 and transitions straight to SUCCEEDED without scheduling any tasks. This has historically been a source of NullPointerException bugs when downstream vertices assume at least one upstream task ran.
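
The zero-task case is a classic boundary condition, and the two halves of it can be shown in a few lines. The sketch below is not Tez code: `ZeroTaskVertexSketch` and its methods are hypothetical. It illustrates why "all tasks succeeded" is trivially true for an empty vertex, and what a downstream computation has to guard against when it assumed at least one upstream task.

```java
import java.util.List;

/** Boundary-condition sketch for a vertex with zero tasks; names are illustrative. */
class ZeroTaskVertexSketch {

    /** Completion check: with zero tasks, zero successes already satisfies the condition. */
    static boolean allTasksSucceeded(int numTasks, int numSucceeded) {
        return numSucceeded == numTasks;
    }

    /**
     * Downstream statistic that must tolerate an upstream vertex that never ran a task.
     * Without the empty check this would divide by zero.
     */
    static long averageOutputBytes(List<Long> perTaskOutputBytes) {
        if (perTaskOutputBytes.isEmpty()) {
            return 0L;
        }
        long sum = 0L;
        for (long bytes : perTaskOutputBytes) {
            sum += bytes;
        }
        return sum / perTaskOutputBytes.size();
    }

    public static void main(String[] args) {
        System.out.println(allTasksSucceeded(0, 0));
        System.out.println(averageOutputBytes(List.of()));
        System.out.println(averageOutputBytes(List.of(10L, 30L)));
    }
}
```

When reviewing a fix in this area, look for the equivalent of the empty check on every path that consumes per-task information from an upstream vertex.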

grep -n "setParallelism.*0\|numTasks.*0\|zeroTasks\|numSourceTasks.*0" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -15

Dataset 4: Very Wide Rows (Many Columns)

// 1,000 columns per row — stresses IFile serialization and spill logic
StringBuilder sb = new StringBuilder();
for (int col = 0; col < 1000; col++) {
    sb.append("column_").append(col).append("=").append("value_").append(col).append("\t");
}
writer.write(new Text("key"), new Text(sb.toString()));  // ~20 KB per record

What this tests: PipelinedSorter and DefaultSorter spill thresholds. At roughly 20 KB per record, even a modest sort buffer fills quickly. This exercises the spill path in tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java and exposes off-by-one bugs in the IFile index writer.
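
A quick way to size this dataset is to measure the row rather than estimate it. The sketch below rebuilds the same 1,000-column value in plain Java and divides a sort buffer by the measured size. `WideRowSizeSketch` is a hypothetical helper, the 100 MB buffer in `main` is an assumed figure for illustration, and the calculation ignores the key bytes and any per-record bookkeeping the sorter adds, so treat the result as an upper bound.

```java
import java.nio.charset.StandardCharsets;

/** Measures the wide-row value and estimates records per sort buffer; illustrative only. */
class WideRowSizeSketch {

    /** Builds the same value string as the dataset: column_N=value_N, tab-terminated. */
    static String buildRow(int columns) {
        StringBuilder sb = new StringBuilder();
        for (int col = 0; col < columns; col++) {
            sb.append("column_").append(col).append("=").append("value_").append(col).append("\t");
        }
        return sb.toString();
    }

    static int rowBytes(int columns) {
        return buildRow(columns).getBytes(StandardCharsets.UTF_8).length;
    }

    /** Upper bound on records that fit before a spill, ignoring keys and sorter metadata. */
    static long recordsBeforeSpill(long sortBufferBytes, int recordBytes) {
        if (recordBytes <= 0) {
            throw new IllegalArgumentException("recordBytes must be positive");
        }
        return sortBufferBytes / recordBytes;
    }

    public static void main(String[] args) {
        int bytes = rowBytes(1000);
        long assumedBufferBytes = 100L * 1024L * 1024L; // assumption: a 100 MB sort buffer
        System.out.println("bytes per value: " + bytes);
        System.out.println("records before spill (upper bound): "
            + recordsBeforeSpill(assumedBufferBytes, bytes));
    }
}
```

With only a few thousand records per buffer, a test that writes tens of thousands of rows is enough to force several spills and a final merge.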

Dataset 5: Many Small Files (HDFS Small-File Problem)

# Generate 50,000 tiny files (one short record each): a classic HDFS anti-pattern
mkdir -p /tmp/smallfiles
for i in $(seq 1 50000); do
  echo "record_$i value_$i" > "/tmp/smallfiles/file_$i.txt"
done
hadoop fs -mkdir -p /data/input
hadoop fs -put /tmp/smallfiles /data/input/

What this tests: Split generation produces 50,000 map tasks. This is a realistic workload that stresses:

  • TaskSchedulerManager task queue management
  • Container reuse logic (50,000 containers → reuse is essential for performance)
  • DAGAppMaster AMRM heartbeat frequency under high task count

# Container reuse: held containers and idle-release timeouts live in the AM's scheduler code
grep -rn "heldContainer\|releaseTimeout\|IDLE_TIMEOUT" \
  tez-dag/src/main/java/org/apache/tez/dag/app/rm/ \
  | head -20

Dataset 6: Nested Structs (Complex Types)

-- ORC table with nested complex types
CREATE TABLE events (
  event_id  BIGINT,
  metadata  STRUCT<
    user_id:       BIGINT,
    session_id:    STRING,
    properties:    MAP<STRING, STRING>,
    tags:          ARRAY<STRING>
  >,
  `timestamp` BIGINT  -- backticks needed: TIMESTAMP is a reserved word in HiveQL
) STORED AS ORC;

What this tests: ORC vectorized reader deserialization of STRUCT, MAP, and ARRAY types. These types are deserialized into Hive's OrcStruct/OrcMap/OrcList classes before being passed through MRInput to the MapOperator. If the column count or type tree differs between what the ORC file was written with and what the Hive schema declares, you get schema evolution behavior, which can produce bugs that look like Tez data corruption but are actually ORC schema evolution issues.

Dataset 7: Partitioned Iceberg Table (Snapshot Isolation)

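# Using PyIceberg or Spark to create an Iceberg table with multiple snapshots.
# generate_day_data() and hive_execute() are placeholder helpers you supply:
# the first returns one day of rows as a PyArrow table, the second runs HiveQL.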
from pyiceberg.catalog import load_catalog
catalog = load_catalog("hive_catalog", **{"uri": "thrift://hive-metastore:9083"})
table = catalog.load_table("db.events_iceberg")

# Write 3 snapshots representing 3 days of appends
for day in range(3):
    df = generate_day_data(day)
    table.append(df)

# Now query with time travel — Hive generates a DAGPlan that reads snapshot 1
hive_execute("""
    SELECT COUNT(*) FROM db.events_iceberg
    FOR SYSTEM_TIME AS OF '2026-05-29 00:00:00'
""")

What this tests: Iceberg's IcebergInputFormat generates a split list that differs per snapshot. The DataSourceDescriptor passed to Tez encodes the snapshot ID. If Hive resolves the wrong snapshot, Tez faithfully executes it — the bug is in the DagUtils snapshot resolution in Hive, not in Tez. But the symptom (wrong row count) looks like a Tez data bug.
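The time-travel rule being tested, "read the latest snapshot committed at or before the requested time", can be written down in a few lines. This is a minimal sketch of that rule only. The snapshot IDs and timestamps are made-up values, and `snapshot_as_of` is a hypothetical helper, not part of Iceberg or Hive.

```python
from bisect import bisect_right


def snapshot_as_of(snapshots, as_of_ms):
    """Pick the snapshot visible at as_of_ms.

    snapshots: list of (committed_at_ms, snapshot_id), sorted by commit time.
    Returns the id of the latest snapshot committed at or before as_of_ms.
    """
    commit_times = [committed for committed, _ in snapshots]
    idx = bisect_right(commit_times, as_of_ms)
    if idx == 0:
        raise LookupError("no snapshot exists at or before the requested time")
    return snapshots[idx - 1][1]


if __name__ == "__main__":
    # Three daily appends, as in the dataset above (values are illustrative).
    history = [(1_000, "snap-day0"), (2_000, "snap-day1"), (3_000, "snap-day2")]
    print(snapshot_as_of(history, 2_500))
```

When a row count looks wrong, computing the expected snapshot by hand like this and comparing it with the one named in the query plan separates a resolution bug from an execution bug.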


Running Tez End-to-End: The Local Developer Loop

Before writing source code, every Tez contributor should be able to run this loop in under 10 minutes once the initial build is warm:

# 1. Clone and build
git clone https://github.com/apache/tez.git ~/tez-src
cd ~/tez-src
mvn clean install -DskipTests -Pdist -q   # ~8–12 min cold, 3–4 min warm

# 2. Run the canonical integration test that exercises the full stack
mvn test -pl tez-tests \
    -Dtest=TestOrderedWordCount \
    -DfailIfNoTests=false 2>&1 | tail -30

# 3. Run a single unit test (fast feedback loop — use this constantly)
mvn test -pl tez-dag \
    -Dtest=TestVertexImpl#testVertexSucceededSpeculation \
    -DfailIfNoTests=false 2>&1 | tail -20

# 4. Run OrderedWordCount in local mode (no YARN cluster required)
hadoop jar tez-examples/target/tez-examples-*.jar orderedwordcount \
    -D tez.local.mode=true \
    /path/to/input /tmp/tez-output-$(date +%s)

# 5. Verify output
hadoop fs -cat /tmp/tez-output-*/part-* | sort | head -20

The TestOrderedWordCount test is your baseline health check. If it passes, the full end-to-end stack (TezClient → DAGAppMaster → VertexImpl → shuffle → MRInput/MROutput) is working. If it fails, something fundamental is broken and you need to fix that before touching anything else.


The Bridge: User Scenario → Source Code

Every scenario above maps to a specific source subsystem. Use this table whenever you see a runtime behavior and want to find the code responsible:

| Observed behavior | Source location |
| --- | --- |
| Map task count equals file count | tez-mapreduce/.../MRInputLegacy.createSplitsProto() |
| Reducer count auto-adjusted down | ShuffleVertexManager.computeParallelism() |
| DAG completes even with 0-row input | VertexImpl.scheduleTasks() (0-task vertex path) |
| Broadcast join: small table to all maps | BroadcastEdgeManager + BROADCAST edge |
| Container reused between tasks | AMContainerImpl.assignContainer() + HeldContainer |
| Task retried after failure | TaskAttemptImpl → TaskImpl.handleTaskAttemptFailed() |
| OOM in shuffle fetch | MergeManager.memoryAvailable / Fetcher.copyFromHost() |
| Hung vertex with tasks still RUNNING | VertexImpl.checkTasksForCompletion() not triggered |
| Wrong output record count | Check OrcInputFormat predicate pushdown first, then Tez |
| Slow single reducer (skew) | LegacySpeculator slow-task detection → speculative attempt |
| Pipelined task killed on upstream failure | TaskAttemptImpl.FAILED_TRANSITION cascades |

What to Verify Before Starting Level 1

Run through this checklist once. It takes 30–45 minutes and proves your environment is solid.

# Environment check
java -version    # must be Java 8 or Java 11
mvn -version     # must be 3.6.3+
git --version    # must be 2.x

# Clone and build
git clone https://github.com/apache/tez.git ~/tez-src
cd ~/tez-src
mvn clean install -DskipTests -Pdist 2>&1 | tail -10

# Confirm build artifacts exist
ls tez-dist/target/tez-*.tar.gz  # should exist
ls tez-examples/target/tez-examples-*.jar

# Run the unit test suite in the two most important modules
mvn test -pl tez-dag -DfailIfNoTests=false 2>&1 | grep -E "Tests run:|FAIL|ERROR" | tail -5
mvn test -pl tez-api -DfailIfNoTests=false 2>&1 | grep -E "Tests run:|FAIL|ERROR" | tail -5

# Run the critical end-to-end test
mvn test -pl tez-tests -Dtest=TestOrderedWordCount -DfailIfNoTests=false 2>&1 | tail -10

# All lines should read "Tests run: N, Failures: 0, Errors: 0"

If any of these fail before you have modified a single line of code, stop and fix your environment. Do not proceed into Level 1 with a broken baseline. A broken baseline means every subsequent mvn test will produce false failures that obscure the real work.
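Eyeballing `tail` output gets tedious once you run the baseline often. The sketch below scans captured Maven output for Surefire-style summary lines and reports whether the baseline is clean. It assumes the "Tests run: N, Failures: N, Errors: N" wording quoted in the checklist, and `baseline_is_clean` is a hypothetical helper name.

```python
import re

# Matches the summary wording quoted in the checklist above.
SUMMARY = re.compile(r"Tests run: (\d+), Failures: (\d+), Errors: (\d+)")


def baseline_is_clean(maven_output: str) -> bool:
    """True if tests actually ran and no summary line reports a failure or error."""
    results = [tuple(int(n) for n in m.groups())
               for m in SUMMARY.finditer(maven_output)]
    ran_something = any(run > 0 for run, _, _ in results)
    all_green = all(failures == 0 and errors == 0 for _, failures, errors in results)
    return ran_something and all_green


if __name__ == "__main__":
    import sys
    # Usage: mvn test -pl tez-dag 2>&1 | python check_baseline.py
    sys.exit(0 if baseline_is_clean(sys.stdin.read()) else 1)
```

Requiring at least one executed test matters here because `-DfailIfNoTests=false` lets a mistyped test name finish successfully with zero tests run.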


Continue to Overview & Prerequisites or jump directly to Level 1: Hadoop and Tez Foundation.

16-Week Plan: From Curious Reader to Tez Committer Candidate

This is a 16-week, ~10-hour-per-week plan that maps the curriculum (Levels 1–9 plus a 2-week capstone) onto a calendar. Each week states:

  • Reading — concrete Tez source files. Open them; do not just skim diagrams.
  • Hands-on — what you must build/run on your machine.
  • JIRA practice queries — searches that surface real, beginner-appropriate issues.
  • Labs — the curriculum labs you must complete.
  • Exit checkpoint — concrete deliverables. If you cannot produce them, repeat the week.

The plan assumes you have ~/tez-src checked out, tez-tests/ building with mvn -DskipTests install, and a working Java 8+/Maven 3.6+ environment.


Weeks 1–2: Level 1 — Orientation and First DAG

Week 1 — The DAG model and the client API

Reading

  • tez-api/src/main/java/org/apache/tez/dag/api/DAG.java (entire file; ~600 lines)
  • tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
  • tez-api/src/main/java/org/apache/tez/dag/api/Edge.java
  • tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java
  • tez-api/src/main/proto/DAGApiRecords.proto — focus on DAGPlan, VertexPlan, EdgePlan, EdgeProperty.

Hands-on

  • Build Tez from source: mvn clean install -DskipTests -Phadoop28.
  • Run OrderedWordCount against a local file using MiniTezCluster (see tez-tests/src/test/java/org/apache/tez/test/TestTezJobs.java).
  • Inspect the generated DAGPlan: print it with dag.createDag(...).toString().

JIRA practice queries

project = TEZ AND status in (Open, "In Progress") AND labels = newbie
project = TEZ AND component = tez-api AND fixVersion is empty AND priority in (Trivial, Minor)

Labs

  • Lab 1.1 — Trace a WordCount end-to-end.
  • Lab 1.2 — Modify the DAG: add a second mapper vertex.

Exit checkpoint

  • You can name every required argument to DAG.create(), Vertex.create(), Edge.create(), and EdgeProperty.create().
  • You can diagram the WordCount DAG without looking.
  • You have one JIRA ticket open in a browser tab that you've read end-to-end (description + every comment).

Week 2 — Edges in depth

Reading

  • tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java — all three enums (DataMovementType, DataSourceType, SchedulingType).
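  • tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/*EdgeManager*.java (the five built-in edge managers; class names such as BroadcastEdgeManager end in EdgeManager, so the glob needs a leading wildcard).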
  • tez-api/src/main/java/org/apache/tez/dag/api/InputDescriptor.java, OutputDescriptor.java, ProcessorDescriptor.java.

Hands-on

  • Build the same WordCount with BROADCAST instead of SCATTER_GATHER for the edge. Observe the failure mode and explain it.
  • Write a 3-vertex DAG (A -> B -> C) where A->B is ONE_TO_ONE and B->C is SCATTER_GATHER. Run it; confirm parallelism rules from the source.
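Before running these experiments, it can help to write down what you expect each movement type to do. The sketch below is a toy routing model for the three movement types named above. It is a deliberate simplification that assumes one partition per destination task for SCATTER_GATHER, and the `destinations` function is hypothetical rather than the EdgeManager plugin API.

```python
def destinations(movement, src_index, num_src, num_dest, partition=0):
    """Return the destination task indices that receive one source output."""
    if movement == "ONE_TO_ONE":
        # Task i feeds task i, so both vertices need the same parallelism.
        if num_src != num_dest:
            raise ValueError("ONE_TO_ONE requires equal parallelism")
        return [src_index]
    if movement == "BROADCAST":
        # Every destination task sees the full output of every source task.
        return list(range(num_dest))
    if movement == "SCATTER_GATHER":
        # Each source task writes one partition per destination task;
        # partition p is fetched by destination task p.
        if not 0 <= partition < num_dest:
            raise ValueError("partition out of range")
        return [partition]
    raise ValueError(f"unknown movement type: {movement}")


if __name__ == "__main__":
    print(destinations("ONE_TO_ONE", 2, 4, 4))
    print(destinations("BROADCAST", 0, 4, 3))
    print(destinations("SCATTER_GATHER", 1, 4, 3, partition=2))
```

The broadcast case makes the WordCount failure mode easy to predict: every downstream task receives every word, so counts are repeated once per downstream task instead of being partitioned.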

JIRA practice queries

project = TEZ AND text ~ "EdgeManager" AND resolution = Unresolved
project = TEZ AND text ~ "broadcast" AND status = Resolved ORDER BY created DESC

Labs

  • Lab 1.3 — Edge type matrix experiment.

Exit checkpoint

  • Edge type matrix (movement × scheduling × source) drawn from memory.
  • You can predict, given edge properties, which EdgeManager impl will be picked.
  • One short forum/dev-list email you drafted (do not send) summarizing your reading of an EdgeManager file.

Weeks 3–4: Level 2 — Build, run, and read tests

Week 3 — Tez build system and module layout

Reading

  • pom.xml (root), tez-api/pom.xml, tez-dag/pom.xml.
  • BUILDING.txt.
  • tez-tests/src/test/java/org/apache/tez/test/MiniTezCluster.java — entry-point for nearly every integration test.

Hands-on

  • Run mvn -pl tez-dag test -Dtest=TestVertexImpl#testBasicVertexCompletion.
  • Run mvn -pl tez-tests test -Dtest=TestTezJobs#testWordCount.
  • Profile a build: mvn -DskipTests install -X 2>&1 | grep "Building\|BUILD".

JIRA practice queries

project = TEZ AND component = build AND status = Open
project = TEZ AND text ~ "MiniTezCluster" AND resolution = Unresolved

Labs

  • Lab 2.1 — Build Tez and run all tez-api tests.
  • Lab 2.2 — Add a no-op test to tez-dag and run it via Maven.

Exit checkpoint

  • You can explain why tez-dag depends on tez-api but not vice versa.
  • You know the difference between tez-runtime-internals and tez-runtime-library.
  • You can run a single test via Maven without consulting any docs.

Week 4 — Tests as documentation

Reading

  • tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java (~5000 lines; pick the top 10 test methods).
  • tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestDAGImpl.java.
  • tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java.

Hands-on

  • Pick one test method in TestVertexImpl; rewrite it from scratch in your notebook, then diff against the original.
  • Add an assertion that fails; observe the message; fix it.

JIRA practice queries

project = TEZ AND text ~ "flaky" AND status in (Open, "In Progress")
project = TEZ AND text ~ "TestVertexImpl" AND resolution = Unresolved

Labs

  • Lab 2.3 — Read TestVertexImpl#testKilledTasksHandling and explain every line.

Exit checkpoint

  • You can write a test that constructs a VertexImpl directly (without MiniTezCluster).
  • You understand the DrainDispatcher pattern (see state-machines.md).
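The idea behind the drain pattern is small enough to sketch: events go onto a queue, a single thread handles them in order, and the test blocks until the queue is empty before asserting. The following is a minimal Python analogy of that idea, not the Tez class. `ToyDispatcher` and its method names are invented for this illustration.

```python
import queue
import threading


class ToyDispatcher:
    """Single-threaded event dispatcher with a drain() hook for tests."""

    def __init__(self):
        self._events = queue.Queue()
        self._handlers = {}
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def register(self, event_type, handler):
        self._handlers[event_type] = handler

    def dispatch(self, event_type, payload=None):
        self._events.put((event_type, payload))

    def _run(self):
        # One thread handles every event, so handlers never race each other.
        while True:
            event_type, payload = self._events.get()
            try:
                handler = self._handlers.get(event_type)
                if handler is not None:
                    handler(payload)
            finally:
                self._events.task_done()

    def drain(self):
        """Block until every queued event, including follow-ups, is handled."""
        self._events.join()


if __name__ == "__main__":
    seen = []
    dispatcher = ToyDispatcher()
    dispatcher.register("TASK_DONE", seen.append)
    dispatcher.dispatch("TASK_DONE", "task_0")
    dispatcher.drain()  # without this, the assertion below would be racy
    assert seen == ["task_0"]
```

Because a handler enqueues any follow-up event before its own event is marked done, `drain()` also waits for cascades, which is the property state-machine tests rely on.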

Weeks 5–6: Level 3 — Submission and AM lifecycle

Week 5 — TezClient and submission

Reading

  • tez-api/src/main/java/org/apache/tez/client/TezClient.java.
  • tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java.
  • tez-api/src/main/java/org/apache/tez/client/TezSessionImpl.java.

Hands-on

  • Write a small Java program that uses TezClient directly (no MR shim) to submit a DAG to MiniTezCluster.
  • Use both session and non-session modes; measure the second-DAG latency difference.

JIRA practice queries

project = TEZ AND component = "tez-api" AND text ~ "TezClient" AND status = Open

Labs

  • Lab 3.1 — Build a custom client that submits two DAGs in one session.

Exit checkpoint

  • You can list every method that talks to the AM over RPC (grep for dagAMProtocol in TezClient.java).
  • You can name the three local resources that TezClientUtils uploads.

Week 6 — DAGAppMaster bring-up

Reading

  • tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java — focus on serviceInit, serviceStart, dispatcher registration.
  • tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java.
  • tez-dag/src/main/java/org/apache/tez/dag/app/launcher/ContainerLauncher*.java.

Hands-on

  • Run a DAG against MiniTezCluster with AM logs at DEBUG. Identify the line in DAGAppMaster.java that emits the first "Created DAG" log line.

Labs

  • Lab 3.2 — Map an AM log line to source code (Lab in Level 3).

Exit checkpoint

  • You can list the AsyncDispatcher event-handler registrations in DAGAppMaster in order.
  • You can walk the path from TezClient.submitDAG() to DAGImpl being instantiated inside the AM.

Weeks 7–9: Level 4 — Vertex internals and state machines

Week 7 — State machine library

Reading

  • hadoop-yarn-common StateMachineFactory source (you'll need to fetch Hadoop source separately).
  • tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java — read only the stateMachineFactory block first (~200 lines near the top).

Hands-on

  • Write a toy StateMachineFactory for a Light (OFF, ON, BROKEN) in a scratch project.

Labs

  • Lab 4.1 — State-machine introduction.

Exit checkpoint

  • You can explain SingleArcTransition vs MultipleArcTransition without notes.
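The distinction is easiest to see in a toy. In the sketch below, a single-arc transition always lands in one fixed state, while a multiple-arc transition runs a hook that chooses among a declared set of states. This is a Python analogy written for this page: the class and method names are hypothetical and only loosely mirror the Java library, and the Light uses the OFF, ON, and BROKEN states from the exercise above.

```python
class InvalidTransition(Exception):
    """Raised when no transition is registered for (state, event)."""


class ToyStateMachineFactory:
    def __init__(self, initial):
        self.initial = initial
        self._table = {}

    def add_single_arc(self, pre, post, event, hook=None):
        self._table[(pre, event)] = (post, hook)
        return self

    def add_multiple_arc(self, pre, posts, event, hook):
        self._table[(pre, event)] = (frozenset(posts), hook)
        return self

    def make(self, operand):
        return ToyStateMachine(self._table, self.initial, operand)


class ToyStateMachine:
    def __init__(self, table, initial, operand):
        self._table, self.state, self.operand = table, initial, operand

    def do_transition(self, event):
        if (self.state, event) not in self._table:
            raise InvalidTransition(f"{event} is not valid in state {self.state}")
        post, hook = self._table[(self.state, event)]
        if isinstance(post, frozenset):
            # Multiple arc: the hook decides, but only among declared states.
            chosen = hook(self.operand, event)
            if chosen not in post:
                raise InvalidTransition(f"hook returned undeclared state {chosen}")
            self.state = chosen
        else:
            # Single arc: the target is fixed; the hook is a side effect only.
            if hook is not None:
                hook(self.operand, event)
            self.state = post
        return self.state


def on_surge(light, event):
    light["surges"] += 1
    return "BROKEN" if light["surges"] >= 2 else "ON"


LIGHT = (ToyStateMachineFactory("OFF")
         .add_single_arc("OFF", "ON", "SWITCH")
         .add_single_arc("ON", "OFF", "SWITCH")
         .add_multiple_arc("ON", {"ON", "BROKEN"}, "SURGE", on_surge))
```

Calling `LIGHT.make({"surges": 0})` gives a fresh machine. Sending SWITCH to a BROKEN light raises `InvalidTransition`, the toy counterpart of forgetting to register a transition for an event in a state.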

Week 8 — VertexManager plugins

Reading

  • tez-api/src/main/java/org/apache/tez/dag/api/VertexManagerPlugin.java, VertexManagerPluginContext.java.
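  • tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/ShuffleVertexManager.java (the package is org.apache.tez.dag.library.vertexmanager, but the class ships in the tez-runtime-library module).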

Labs

  • Lab 4.2 — VertexManager deep dive (the depth-bar lab).

Exit checkpoint

  • A working CountingVertexManager with passing unit test, as specified in Lab 4.2.

Week 9 — Task and TaskAttempt

Reading

  • tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java.
  • tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java.

Labs

  • Lab 4.3 — Task lifecycle walk.
  • Lab 4.4 — TaskAttempt termination causes.

Exit checkpoint

  • You can draw the TaskAttempt state machine from memory.
  • You can list every TaskAttemptTerminationCause and what produces it.

Weeks 10–11: Level 5 — Runtime, IPO, and shuffle

Week 10 — Runtime task execution

Reading

  • tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezTaskRunner2.java.
  • tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java.

Labs

  • Lab 5.1 — Trace a task from container start to processor exit.

Exit checkpoint

  • You can list every umbilical call a task makes during its lifetime (grep umbilical in tez-runtime-internals).

Week 11 — Shuffle and merge

Reading

  • tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java.
  • tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/Fetcher.java (the ordered path has its own classes under common/shuffle/orderedgrouped/).
  • tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java.

Labs

  • Lab 5.2 — Spilled output inspection on MiniTezCluster.
  • Lab 5.3 — Force a fetch failure.

Exit checkpoint

  • You can explain IFile framing in two paragraphs.
  • You can name the three sorter implementations and when each is used.
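As a warm-up for the framing checkpoint, the sketch below shows length-prefixed key/value framing with an end-of-stream marker, which is the general shape to look for when you read the IFile writer. It is a simplification under stated assumptions: fixed 4-byte lengths stand in for the variable-length integers used on disk, and checksums and compression are left out entirely. The function names are hypothetical.

```python
import io
import struct

EOF_MARKER = -1  # assumed sentinel: a length pair of (-1, -1) ends the stream


def write_records(records):
    """Frame (key, value) byte pairs as: key_len, value_len, key, value."""
    out = io.BytesIO()
    for key, value in records:
        out.write(struct.pack(">ii", len(key), len(value)))
        out.write(key)
        out.write(value)
    out.write(struct.pack(">ii", EOF_MARKER, EOF_MARKER))
    return out.getvalue()


def read_records(data):
    """Inverse of write_records; stops at the end-of-stream marker."""
    stream = io.BytesIO(data)
    records = []
    while True:
        key_len, value_len = struct.unpack(">ii", stream.read(8))
        if key_len == EOF_MARKER and value_len == EOF_MARKER:
            return records
        records.append((stream.read(key_len), stream.read(value_len)))


if __name__ == "__main__":
    framed = write_records([(b"key", b"value"), (b"k2", b"")])
    print(len(framed), read_records(framed))
```

An off-by-one in either length field shifts every later record, which is why index and framing bugs tend to surface as corruption far from the record that caused them.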

Week 12: Level 6 — Scheduling and container reuse

Reading

  • tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java.
  • tez-dag/src/main/java/org/apache/tez/dag/app/rm/TaskSchedulerManager.java.
  • tez-dag/src/main/java/org/apache/tez/dag/app/rm/container/AMContainerImpl.java.

JIRA practice queries

project = TEZ AND text ~ "container reuse" AND status in (Open, "In Progress")

Labs

  • Lab 6.1 — Disable container reuse; measure latency cost.
  • Lab 6.2 — Read and explain tez.am.container.reuse.* configs.

Exit checkpoint

  • You can list the four conditions under which a container is not reused.

Week 13: Level 7 — MapReduce compatibility and integrations

Reading

  • tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MRInput.java.
  • tez-mapreduce/src/main/java/org/apache/tez/mapreduce/output/MROutput.java.
  • tez-mapreduce/src/main/java/org/apache/tez/mapreduce/processor/map/MapProcessor.java.

Labs

  • Lab 7.1 — Submit a vanilla MR job via Tez (tez.lib.uris mode).

Exit checkpoint

  • You can write a one-page essay on "what MRInput does that a plain LogicalInput does not."

Week 14: Level 8 — Production diagnostics

Reading

  • tez-api/src/main/java/org/apache/tez/common/counters/TezCounters.java.
  • tez-dag/src/main/java/org/apache/tez/dag/history/HistoryEventHandler.java.
  • tez-plugins/tez-yarn-timeline-history/.

Labs

  • Lab 8.1 — Read a real ATS event dump.
  • Lab 8.2 — Trace a failure through the AM log + ATS + counters.

Exit checkpoint

  • You can answer: "Why did vertex X fail?" given only an AM log and ATS dump.

Weeks 15–16: Capstone

Follow capstone/index.md start-to-finish:

  1. Issue selection (week 15, day 1–2).
  2. Reproduction → root cause (week 15, day 3–7).
  3. Implementation + tests (week 16, day 1–4).
  4. Patch submission + write-up (week 16, day 5–7).

Exit checkpoint

  • A real patch attached to a real JIRA, with passing tests and a clear summary.
  • A 1500–3000 word public write-up of the experience.

How to use this plan when you fall behind

  • If you finish a week's reading but cannot pass the exit checkpoint, repeat the week. Do not advance.
  • If a JIRA query returns no results, change the query. The dev community moves; labels and components shift.
  • Skip a Level only if you can pass all exit checkpoints from previous Levels in one sitting.

Milestones: M1 Through M9

Milestones are the "what does mastery look like at this stage" checkpoints. Each milestone has:

  • Expected completion — a calendar guideline.
  • Skills you must demonstrate — 5–8 concrete abilities.
  • Self-check questions — answer them out loud, without notes.
  • 20-point rubric — five criteria, four points each.
  • Pass threshold — minimum total to advance.
  • Move to the next level when — the binary gate.

Pass thresholds are deliberately high. The point is competence, not throughput.


M1 — Orientation (end of Week 2)

You can read the Tez DAG API and explain what every method on DAG, Vertex, and Edge does.

Skills

  1. Write a 3-vertex DAG end-to-end without consulting docs.
  2. Explain the three enums on EdgeProperty and pick the correct one for a given problem.
  3. Name the protobuf message that represents a DAG on the wire.
  4. Predict which built-in EdgeManager implementation will be selected for a given edge.
  5. Locate any class in the tez-api module by name within 30 seconds.

Self-check questions

  • What is the difference between DataSourceDescriptor and a runtime Input?
  • Why is DAG.verify() called before submission?
  • Which class produces the protobuf DAGPlan?
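One way to reason about the verification question is to ask what could go wrong in a graph that nobody checked. The sketch below validates a toy DAG for duplicate vertex names, edges that point at unknown vertices, and cycles. These checks are illustrative assumptions, not a list of what DAG.verify() actually does; read the method itself for that.

```python
def verify_toy_dag(vertices, edges):
    """Validate a toy DAG and return one valid execution order.

    vertices: list of vertex names; edges: list of (source, destination).
    Raises ValueError on duplicate names, unknown endpoints, or cycles.
    """
    if len(set(vertices)) != len(vertices):
        raise ValueError("duplicate vertex name")
    indegree = {name: 0 for name in vertices}
    downstream = {name: [] for name in vertices}
    for src, dst in edges:
        if src not in indegree or dst not in indegree:
            raise ValueError(f"edge {src}->{dst} references an unknown vertex")
        downstream[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly emit vertices with no unmet inputs.
    ready = [name for name in vertices if indegree[name] == 0]
    order = []
    while ready:
        current = ready.pop()
        order.append(current)
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(vertices):
        raise ValueError("cycle detected")
    return order


if __name__ == "__main__":
    print(verify_toy_dag(["tokenizer", "summer", "sorter"],
                         [("tokenizer", "summer"), ("summer", "sorter")]))
```

Catching these on the client is cheap. Catching them after an ApplicationMaster has been launched costs a container allocation and a much harder-to-read failure.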

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| API fluency | Can name classes | Can describe responsibilities | Can write code from memory | Can predict behavior |
| Edge model | Confused | Knows enums | Picks correct edge type | Predicts EdgeManager impl |
| Reading speed | >5 min/file | ~3 min/file | ~1 min/file | Scanning fluently |
| Mental model | Vague | Sketches DAG | Sketches DAG + edge types | Sketches DAG + edge types + plan flow |
| Communication | Cannot explain | Explains with notes | Explains without notes | Teaches another |

Pass threshold: 14/20, with no criterion below 2.

Move to Level 2 when: you can draft a new DAG class in 10 minutes from a verbal problem statement, on a whiteboard.


M2 — Build and Test Literacy (end of Week 4)

You can navigate the codebase, build it, and run any test by name.

Skills

  1. Run a single test in any module via mvn -pl <module> test -Dtest=Class#method.
  2. Add a new test file to tez-dag and have it picked up by Maven.
  3. Read TestVertexImpl and explain at least 10 individual test methods.
  4. Identify the module of a class given just its FQN (e.g., o.a.t.dag.app... → tez-dag).
  5. Build Tez from a clean checkout in under 5 minutes (with cached deps).
  6. Distinguish unit tests from MiniTezCluster-backed integration tests.

Self-check questions

  • Why does tez-dag depend on tez-api and not the reverse?
  • What is DrainDispatcher and why do tests use it?
  • Where do MiniTezCluster tests live and what classpath do they need?

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Build mastery | mvn install works | Can skip tests, profiles | Knows module deps | Diagnoses build failures |
| Test execution | Runs all tests | Runs a class | Runs a method | Runs cross-module |
| Test reading | Skims | Understands assertions | Understands setup | Recreates from scratch |
| Module map | Knows names | Knows top-level deps | Knows transitive deps | Diagnoses cycles |
| Tooling | IDE-only | CLI + IDE | CLI primary | CLI + scripting |

Pass threshold: 14/20.

Move to Level 3 when: you can clone Tez on a fresh laptop, build it, and run a TestVertexImpl method by name within 15 minutes.


M3 — Submission and AM Bring-up (end of Week 6)

You can trace a DAG from TezClient.submitDAG() to DAGImpl.handle(...) inside the AM.

Skills

  1. List the three local resources TezClientUtils uploads.
  2. Explain session vs non-session mode and the AM keep-alive mechanism.
  3. Name every AsyncDispatcher event-handler registered in DAGAppMaster.
  4. Locate the line of code where DAGImpl is constructed inside the AM.
  5. Read AM logs at DEBUG and map lines to source positions.
  6. Run MiniTezCluster in your tests and inspect AM logs.

Self-check questions

  • What RPC does TezClient use to submit a DAG? Which protocol class?
  • How does the AM stay alive between DAGs in a session?
  • What happens if the AM dies during a DAG run with recovery disabled?

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Submission path | Vague | Knows TezClient API | Knows RPC | Knows full byte path |
| AM bring-up | Cannot describe | Names dispatcher | Names handlers | Walks serviceInit |
| Session model | Confused | Knows the flag | Knows keep-alive | Knows timeouts |
| Log reading | Greps blindly | Greps with intent | Maps to code | Predicts log line |
| Recovery | Unknown | Aware | Knows config keys | Knows record format |

Pass threshold: 14/20.

Move to Level 4 when: you can answer "where in the AM does my DAG show up?" with a file:line citation.


M4 — State Machines and VertexManager (end of Week 9)

You can read and modify the vertex/task/attempt state machines.

Skills

  1. Write a small StateMachineFactory-based state machine from scratch.
  2. Add a transition to VertexImpl.stateMachineFactory and update tests in the same patch.
  3. Implement a custom VertexManagerPlugin with a unit test.
  4. Diagnose an InvalidStateTransitonException from a stack trace.
  5. Distinguish SingleArcTransition from MultipleArcTransition.
  6. Explain the dispatcher single-threading invariant.

Self-check questions

  • Why must state-machine code be single-threaded? What breaks if not?
  • What happens if you forget to register a transition for an event in a state?
  • How does ShuffleVertexManager implement slow-start?
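For the slow-start question, it helps to have a concrete model to check the source against. The sketch below assumes slow-start is a linear ramp between a minimum and a maximum fraction of completed source tasks: nothing is scheduled below the minimum, everything at or above the maximum. The 0.25 and 0.75 defaults and the function name are assumptions for illustration; verify both the thresholds and the exact rounding in ShuffleVertexManager itself.

```python
def tasks_to_schedule(completed_src, total_src, total_dest,
                      min_fraction=0.25, max_fraction=0.75):
    """How many destination tasks may be scheduled so far (linear ramp model)."""
    if total_src == 0:
        return total_dest  # nothing to wait for
    done = completed_src / total_src
    if done < min_fraction:
        return 0
    if done >= max_fraction or max_fraction == min_fraction:
        return total_dest
    # Between the two thresholds, release tasks in proportion to progress.
    share = (done - min_fraction) / (max_fraction - min_fraction)
    return int(share * total_dest)


if __name__ == "__main__":
    for completed in (10, 25, 50, 75, 100):
        print(completed, tasks_to_schedule(completed, 100, 40))
```

The point of the ramp is to overlap shuffle fetches with the tail of the upstream vertex without holding containers idle while most source output is still missing.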

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| State machine | Knows it exists | Can read transitions | Can add transition | Can refactor safely |
| Test discipline | None | Adds happy path | Adds happy + sad | Updates per transition |
| VertexManager | Knows interface | Implements minimal | Implements custom | Implements + tests |
| Concurrency | Confused | Knows the rule | Knows why | Can audit a PR |
| Debugging | Reads stack | Maps to source | Reproduces locally | Writes regression test |

Pass threshold: 16/20 — this is the first hard gate.

Move to Level 5 when: you have submitted (or at minimum drafted) a state machine change that compiles, with a passing test.


M5 — Runtime and Shuffle (end of Week 11)

You can read the runtime data path and explain spill, merge, and fetch.

Skills

  1. Walk a single task's lifecycle: container start → processor.run() → output close.
  2. Explain IFile framing and the difference between V1 and V2.
  3. Distinguish DefaultSorter, PipelinedSorter, and unordered output.
  4. Diagnose a fetcher failure from logs.
  5. Read ShuffleManager and explain its scheduling of fetchers.
  6. Explain combiners and where they run in the pipeline.

Self-check questions

  • What umbilical RPCs does a task make during its run?
  • Where is the spill threshold checked?
  • What triggers a FAILED_FETCH event upstream?

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Runtime path | Names classes | Walks happy path | Walks failure paths | Walks edge cases |
| IFile | Knows format | Reads with hexdump | Modifies safely | Diagnoses corruption |
| Sorter | Names them | Knows tradeoffs | Picks for workload | Tunes configs |
| Shuffle | Vague | Knows pull model | Knows scheduling | Knows backoff |
| Combiner | Aware | Knows when run | Implements one | Debugs incorrect output |

Pass threshold: 15/20.

Move to Level 6 when: you can intentionally produce a fetcher failure on MiniTezCluster and explain every log line.


M6 — Scheduling and Container Reuse (end of Week 12)

You understand how Tez decides where tasks run.

Skills

  1. Read YarnTaskSchedulerService and explain its scheduling loop.
  2. List the conditions under which a container is/is not reused.
  3. Explain affinity, locality, and racks.
  4. Tune tez.am.container.reuse.* for a given workload.
  5. Diagnose "stuck" scheduling.

Self-check questions

  • Why does Tez prefer to reuse containers over requesting new ones?
  • What happens if tez.am.container.idle-release-timeout-min.millis is too low?

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Reuse model | Aware | Knows conditions | Knows configs | Tunes for workload |
| Scheduling | Black box | Reads main loop | Reads matching | Reads + modifies |
| Locality | Aware | Knows hints | Knows fallback | Knows rack policy |
| Diagnostics | Guess-and-check | Reads AM logs | Reads + maps to code | Adds counters |
| YARN integration | Aware | Knows AMRM | Knows tokens | Knows failover |

Pass threshold: 14/20.

Move to Level 7 when: you can explain why container reuse is on by default and pick five workloads where you would tune it.


M7 — Integrations (end of Week 13)

You can read and modify the MapReduce shim and explain Hive-on-Tez at a high level.

Skills

  1. Write a DAG that uses MRInput reading from HDFS.
  2. Explain MROutput commit semantics.
  3. Sketch how Hive's TezTask builds a DAG.
  4. Identify which features Hive uses (custom edges, manager plugins, dynamic reconfig).

Self-check questions

  • What does MROutput.commit() do, and what guarantees does it offer?
  • Why does Hive use ROOT_INPUT_INITIALIZER_FAILED heavily in its bug fixes?

Rubric

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| MR shim | Knows existence | Reads MRInput | Reads + uses | Modifies safely |
| Commit | Aware | Knows semantics | Knows failure modes | Knows speculative cleanup |
| Hive lens | Aware | Reads TezTask | Reads + maps | Diagnoses cross-project bug |
| Cross-project | Confused | Knows boundaries | Picks the right list | Files bug correctly |

Pass threshold: 12/16 (only 4 criteria here).

Move to Level 8 when: you can read a Hive query plan and predict its DAG.


M8 — Production Diagnostics (end of Week 14)

You can debug a real Tez job failure given logs and an ATS dump.

Skills

  1. Read a Tez counters dump and find a bottleneck.
  2. Find a VertexImpl failure cause from AM logs in <5 minutes.
  3. Read ATS events and reconstruct a DAG timeline.
  4. Identify a stuck task vs a slow task vs a failed task from counters.
  5. Build a one-pager triage runbook for your team.

Rubric

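| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Counters | Knows existence | Reads | Interprets | Tunes |
| Log triage | Greps | Maps to code | Maps to state | Predicts next event |
| ATS | Aware | Queries | Reads events | Cross-checks vs AM log |
| Runbook | None | Draft | Reviewed | Shipped to team |
| Speed | >30 min | ~15 min | <10 min | <5 min |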

Pass threshold: 16/20.

Move to capstone when: you've helped someone (on chat, dev list, or internally) debug a real Tez issue successfully.


M9 — Capstone (end of Week 16)

You've shipped a patch.

Skills

  1. Selected an appropriate issue.
  2. Reproduced and root-caused.
  3. Implemented a fix with tests.
  4. Submitted a patch in the project's accepted format.
  5. Responded to at least one round of review feedback.

Rubric (20 points)

| Criterion | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Issue selection | Random | Scoped | Justified | Aligned to roadmap |
| Reproduction | None | Manual | Scripted | Added as a test |
| Root cause | Speculative | Localized | Cited | Explained in JIRA |
| Implementation | Compiles | Tests pass | Idiomatic | Minimal & focused |
| Submission | None | Draft | Submitted | Reviewed |

Pass threshold: 16/20, and the patch must compile and pass mvn verify on the affected module.


Global Rubric (committer-readiness)

Use this every quarter, regardless of level, to self-assess.

| Dimension | 1 (Beginner) | 2 (Apprentice) | 3 (Practitioner) | 4 (Committer-ready) |
| --- | --- | --- | --- | --- |
| Code | Reads | Modifies | Designs subsystem | Reviews others' changes |
| Testing | Runs tests | Adds tests | Writes regression suites | Drives test infra |
| Docs | Reads | Edits | Writes user-facing | Owns module-level docs |
| Integration | Single module | Cross-module | Cross-project (Hive) | Drives release decisions |

A committer-track contributor should be at level 3 on all four dimensions and level 4 on at least one. Aim for 3/3/3/3 → 4/3/3/4 by month 12 of focused contribution.

Level 1: Hadoop and Tez Foundation

This level establishes the technical baseline every subsequent level depends on. You will understand where Tez fits in the Hadoop ecosystem, successfully build the project from source, run the test suite, and execute your first Tez DAG in local mode.


Learning Objectives

By the end of Level 1 you must be able to:

  1. Explain where Apache Tez sits in the Hadoop ecosystem and why it exists
  2. Build Apache Tez from source using Maven, with and without tests
  3. Execute unit tests scoped to a single module and interpret the results
  4. Run a simple Tez DAG in local mode without a YARN cluster
  5. Locate any class mentioned in Levels 2–9 without using a search engine
  6. Articulate the difference between a MapReduce job and a Tez DAG at the execution model level
  7. Read TezConfiguration.java and find any configuration key by category

The Hadoop Ecosystem Context

Apache Tez lives inside the Hadoop ecosystem. Before touching a line of Tez code, build an accurate mental model of the stack:

┌─────────────────────────────────────────────────────┐
│         Apache Hive / Apache Pig / Cascading        │  ← Query / scripting layer
├─────────────────────────────────────────────────────┤
│                  Apache Tez                         │  ← DAG execution engine
├─────────────────────────────────────────────────────┤
│                  Apache YARN                        │  ← Cluster resource management
├─────────────────────────────────────────────────────┤
│                  Apache HDFS                        │  ← Distributed storage
└─────────────────────────────────────────────────────┘

YARN (Yet Another Resource Negotiator) manages cluster resources. It runs an ApplicationMaster (AM) per application, allocates containers, and monitors health. Tez's DAGAppMaster IS a YARN ApplicationMaster.

HDFS stores input, output, and sometimes intermediate data. Tez prefers to keep intermediate data on local disk or in memory, but falls back to HDFS for recovery and large-scale shuffles.

Tez submits a DAGAppMaster to YARN, which requests containers for task execution. Tasks read inputs, execute processors, and write outputs — either directly to downstream tasks via shuffle or to HDFS for final output.

MapReduce vs. Tez

| Aspect | MapReduce | Apache Tez |
| --- | --- | --- |
| Execution model | Fixed: Map → Shuffle → Reduce | Arbitrary DAG of vertices |
| Multi-stage queries | Chain of separate MR jobs | Single DAG |
| Inter-stage data | Always written to HDFS | Pipelined or local disk |
| JVM startup | New JVM per task | Container reuse across tasks |
| Vertex types | Two (Map, Reduce) | Unlimited |
| Speculative execution | Yes | Yes (configurable per vertex) |
| Session support | No | Yes (TezClient session mode) |

For a 10-stage Hive aggregation query, MapReduce requires 10 separate MR jobs with HDFS writes between every stage. Tez runs the same query as a single DAG — no HDFS round-trips between stages, containers reused across task waves, and pipeline-style data movement between compatible vertices.


Required Reading

Complete in this order before starting the labs:

| # | Resource | What to extract |
| --- | --- | --- |
| 1 | README.md in the Tez repo root | Build commands, module overview |
| 2 | Tez architecture document | Original design intent, DAG model rationale |
| 3 | YARN Architecture | Container lifecycle, AM responsibilities |
| 4 | tez-api/src/main/java/org/apache/tez/client/TezClient.java | Class-level Javadoc only: understand session vs. non-session |
| 5 | tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | Skim all keys: understand the category groupings |
| 6 | tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java | End-to-end DAG construction and submission |

Note on reading strategy: In a mature Apache codebase, Javadoc is often the best documentation that exists. Class-level Javadoc on public API classes reflects decisions debated and agreed upon by committers. Read it seriously.


Source Code Areas to Inspect

Read these files before and after the labs. You are not modifying anything yet.

tez-api — Public API

| File | Why |
| --- | --- |
| client/TezClient.java | Entry point for all DAG submissions. Read create(), start(), submitDAG(). |
| dag/api/DAG.java | DAG construction API. Note addVertex(), addEdge(), addTaskLocalFiles(). |
| dag/api/Vertex.java | Vertex definition. Understand ProcessorDescriptor, parallelism, and VertexManagerPlugin. |
| dag/api/Edge.java | Edge definition. Understand EdgeProperty and DataMovementType. |
| dag/api/client/DAGClient.java | DAG monitoring. Understand getDAGStatus() and progress tracking. |
| dag/api/TezConfiguration.java | All Tez configuration keys. Every key is documented. |
| dag/api/EdgeProperty.java | Data movement type and scheduling type for edges. Fundamental to DAG design. |

tez-dag — Core Execution Engine

| File | Why |
|------|-----|
| app/DAGAppMaster.java | The YARN ApplicationMaster. First read: just init() and start(), whose bodies live in serviceInit() and serviceStart(). It is 5000+ lines. |
| app/dag/impl/DAGImpl.java | DAG state machine. Read the state/transition enum declarations at the top. |
| app/dag/impl/VertexImpl.java | Most complex class in the project. First read: state enum + handle() only. |
| app/dag/impl/TaskImpl.java | Task state machine. More tractable than VertexImpl. Read fully. |
| app/dag/impl/TaskAttemptImpl.java | TaskAttempt state machine. Read fully. |
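Before opening the state-machine classes listed above, it can help to see the general pattern in miniature. The following is a toy, plain-JDK sketch of a table-driven state machine with a handle() method. The class name, states, and events are hypothetical and chosen only for illustration; they are not the states Tez defines, and Tez builds its transition tables with Hadoop's state-machine utilities rather than a hand-rolled map.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

/** Toy table-driven state machine (illustrative only; not Tez code). */
final class ToyTaskStateMachine {
    // Hypothetical states and events, simplified for the sketch.
    enum State { NEW, SCHEDULED, RUNNING, SUCCEEDED, FAILED }
    enum Event { SCHEDULE, START, COMPLETE, FAIL }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        add(State.NEW, Event.SCHEDULE, State.SCHEDULED);
        add(State.SCHEDULED, Event.START, State.RUNNING);
        add(State.RUNNING, Event.COMPLETE, State.SUCCEEDED);
        add(State.RUNNING, Event.FAIL, State.FAILED);
    }

    private static void add(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    private State current = State.NEW;

    State getState() {
        return current;
    }

    /** Applies one event; rejects events that have no transition from the current state. */
    State handle(Event event) {
        Map<Event, State> row = TRANSITIONS.getOrDefault(current, Collections.emptyMap());
        State next = row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ToyTaskStateMachine sm = new ToyTaskStateMachine();
        sm.handle(Event.SCHEDULE);
        sm.handle(Event.START);
        System.out.println(sm.handle(Event.COMPLETE));
    }
}
```

When reading DAGImpl or TaskImpl, look for the equivalent of the transition table first; the handle() method is usually a thin wrapper around it.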

tez-runtime-library — I/O Implementations

| File | Why |
|------|-----|
| runtime/library/input/OrderedGroupedKVInput.java | Standard sorted shuffle input. Used by most Hive reduce operations. |
| runtime/library/output/OrderedPartitionedKVOutput.java | Standard sorted shuffle output. Paired with the above. |
| runtime/library/input/UnorderedKVInput.java | Broadcast input; data is not sorted. |

tez-examples — Reference Implementations

| File | Why |
|------|-----|
| examples/OrderedWordCount.java | The canonical Tez DAG example. Read this completely. |
| examples/IntersectExample.java | Shows a 3-vertex DAG with a broadcast edge. |

Key Classes Quick Reference

| Class | Module | Package | Role |
|-------|--------|---------|------|
| TezClient | tez-api | org.apache.tez.client | Creates sessions, submits DAGs |
| DAG | tez-api | org.apache.tez.dag.api | Defines the computation graph |
| Vertex | tez-api | org.apache.tez.dag.api | One processing stage |
| Edge | tez-api | org.apache.tez.dag.api | Data connection between vertices |
| EdgeProperty | tez-api | org.apache.tez.dag.api | Data movement + scheduling type |
| ProcessorDescriptor | tez-api | org.apache.tez.dag.api | Which Processor class runs in a vertex |
| TezConfiguration | tez-api | org.apache.tez.dag.api | All Tez configuration keys |
| DAGAppMaster | tez-dag | org.apache.tez.dag.app | YARN ApplicationMaster |
| DAGImpl | tez-dag | org.apache.tez.dag.app.dag.impl | DAG state machine |
| VertexImpl | tez-dag | org.apache.tez.dag.app.dag.impl | Vertex state machine |
| TaskImpl | tez-dag | org.apache.tez.dag.app.dag.impl | Task state machine |
| TaskAttemptImpl | tez-dag | org.apache.tez.dag.app.dag.impl | TaskAttempt state machine |
| TezTaskRunner2 | tez-runtime-internals | org.apache.tez.runtime.task | Runs a task inside a container |
| OrderedWordCount | tez-examples | org.apache.tez.examples | Canonical DAG example |

JIRA Issue Categories for Level 1 Contributors

At this stage, focus exclusively on:

  • Documentation — Javadoc typos, outdated parameter descriptions, missing @param or @return annotations, broken links in comments
  • Test improvements — Adding missing assertions to existing tests, improving test method naming, removing dead code from test classes
  • Checkstyle violations — Unused imports, line length violations, missing final keywords

How to find these:

  1. Go to Apache Tez JIRA
  2. Search: project = TEZ AND labels = "newbie" AND resolution = Unresolved
  3. Also scan: project = TEZ AND component = "Documentation" AND resolution = Unresolved
  4. Look at recently closed "trivial" issues to understand the standard for accepted patches

Warning: Do not pick up a JIRA issue and immediately upload a patch. Read all existing comments. If there is an active discussion or existing assignee, move on. Leave a comment saying you are investigating before you claim an issue.


Deliverables

You must demonstrate all of the following before advancing to Level 2:

  • Successful mvn install -DskipTests output — no build failures
  • At least one unit test class run successfully (e.g., TestDAGImpl)
  • Successful local DAG execution showing DAG completed: SUCCEEDED
  • Ability to locate DAGAppMaster, TezClient, and OrderedGroupedKVInput from memory
  • Written explanation (2–3 sentences) of why a Tez DAG is faster than chained MapReduce
  • Written explanation of the difference between a YARN container and a Tez task

Common Mistakes

| Mistake | Consequence | Fix |
|---------|-------------|-----|
| Building with Java 17 against master | Compile errors or compatibility failures | Use Java 8 or Java 11; check <maven.compiler.source> in root pom.xml |
| Running mvn test on the full repository | Hours-long run including integration tests | Use -pl tez-dag -am to scope to one module |
| Ignoring TezConfiguration.java | Confusion about configuration keys throughout all levels | Skim the entire file; every key is documented |
| Skipping the YARN architecture doc | Confusion about what Tez owns vs. what YARN owns | YARN understanding is required from Level 3 onward |
| Trying to understand all of DAGAppMaster at once | Overwhelm: 5000+ lines | First pass: read only init() and start() (implemented in serviceInit() and serviceStart()) |
| Reading Tez code without running it | Abstract understanding that does not transfer to debugging | Always run the code after reading it |
| Picking a JIRA issue without reading existing comments | Duplicate work; community friction | Read all comments; check assignee; leave a note before claiming |

How to Verify Success

# 1. Full build without tests
cd /path/to/tez
mvn install -DskipTests -q && echo "BUILD OK"

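# 2. Unit test from tez-dag
#    (failIfNoSpecifiedTests=false keeps the upstream modules built by -am
#     from failing because they contain no TestDAGImpl)
mvn test -pl tez-dag -am -Dtest=TestDAGImpl -Dsurefire.failIfNoSpecifiedTests=false -q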

# 3. Local DAG run (from Lab 1.3)
# Expected final output line:
#   DAG: [OrderedWordCount] finished with status: [SUCCEEDED]

Patch Profile: Level 1 Graduate

| Patch type | Example | Test requirement |
|------------|---------|------------------|
| Javadoc fix | Correcting a wrong @param description in TezClient | None (documentation only) |
| Dead import removal | Remove unused import statement flagged by checkstyle | Run mvn checkstyle:check -pl <module> |
| Test assertion improvement | Add assertEquals to an existing test that only checks for no-exception | Run the test class |
| README update | Fix a broken Maven command in the build instructions | Manual verification |

You are not ready to submit: bug fixes in state machines, new features, performance patches, or changes to the shuffle path. Those require Levels 3–7.

Lab 1.1: Build Apache Tez from Source

Background

Apache Tez is a multi-module Maven project. Building from source is the mandatory first step for any contributor — you need the ability to make code changes, rebuild specific modules, and run tests against your local changes. This lab walks through the full build, from cloning to verifying artifacts.

Why This Lab Matters for Contributors

  • You cannot submit a credible patch without first verifying it builds cleanly
  • Knowing which Maven flags control which modules saves hours during development
  • Understanding the build structure helps you scope test runs efficiently
  • Build failures are sometimes real bugs — knowing a clean build baseline lets you detect regressions

Prerequisites

Verify before starting:

java -version    # Must be Java 8 or Java 11
mvn -version     # Must be Maven 3.6.3 or newer
git --version    # Must be 2.x

Disk space: at least 10 GB free. The full build with tests generates large artifacts.

Memory: at least 8 GB RAM. The tez-dag unit tests can spike to 4 GB during parallel runs.


Step-by-Step Tasks

Step 1: Clone the Repository

git clone https://github.com/apache/tez.git
cd tez

The GitHub repository at https://github.com/apache/tez is a mirror of the canonical Apache GitBox repository. For contribution purposes (submitting patches via JIRA), the GitHub mirror is acceptable for development. The patch will be attached to the JIRA issue rather than sent as a GitHub PR — this is Apache's traditional workflow.

Verify the remote:

git remote -v
# origin  https://github.com/apache/tez.git (fetch)
# origin  https://github.com/apache/tez.git (push)

Step 2: Inspect the Branch Structure

git branch -r | grep -v HEAD | sort

You will see branches like:

  • origin/master — development trunk
  • origin/branch-0.10 — stable release branch
  • origin/branch-0.9 — older stable branch

For contributor work, use master unless you are reproducing an issue specific to a release branch. Bug fixes for release branches are typically backported from master.

Check the current Hadoop dependency in pom.xml:

grep -m1 "hadoop.version" pom.xml

This tells you which Hadoop version Tez is built against. The default Hadoop version target controls which APIs are available.

Step 3: Full Build (Skip Tests)

mvn install -DskipTests -q

Expected duration: 5–15 minutes depending on hardware and Maven cache state.

The first run downloads all dependencies. With a warm Maven cache (~/.m2/repository), subsequent builds skip the downloads and are noticeably faster, although Maven still walks every module in the reactor.

What -DskipTests does:
Skips execution of tests but still compiles the test classes. To skip test compilation as well, use -Dmaven.test.skip=true. Use -DskipTests for iterative development when you are not changing test code.

What -q does:
Suppresses INFO-level Maven output. Remove -q if you need to debug build failures.

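When the build completes, Maven reports BUILD SUCCESS. These are INFO lines, so they are hidden by -q: with -q, a successful build prints nothing and exits with status 0. Without -q you will see: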

[INFO] BUILD SUCCESS
[INFO] Total time:  X min Y s

If you see BUILD FAILURE, go to the Troubleshooting section below.

Step 4: Verify Build Artifacts

After a successful build, key JARs exist in each module's target/ directory:

find . -name "tez-dag-*.jar" -not -path "*/test-*" | grep -v sources
# Expected: ./tez-dag/target/tez-dag-<version>.jar

find . -name "tez-api-*.jar" -not -path "*/test-*" | grep -v sources
# Expected: ./tez-api/target/tez-api-<version>.jar

The assembled distribution tarball is built by a separate command:

mvn package -DskipTests -Pdist -q
ls tez-dist/target/*.tar.gz

This produces the full binary distribution used by HDP and other distributions.

Step 5: Build a Single Module

During development you will almost always build a single module to save time:

# Build only tez-dag and its dependencies
mvn install -DskipTests -pl tez-dag -am -q

# Build only tez-api (no dependencies needed — it has none in Tez)
mvn install -DskipTests -pl tez-api -q

-pl specifies the module path. -am (also-make) builds all upstream dependencies first. This is the command you will run hundreds of times during contributor work.

Step 6: Configure IntelliJ IDEA

IntelliJ handles Maven multi-module projects natively.

  1. File → Open → select the tez/ directory (the one containing pom.xml)
  2. IntelliJ detects the Maven project and imports all modules
  3. When prompted, select the JDK that matches the build (Java 8 or Java 11)
  4. Wait for the initial index build to complete (2–5 minutes)

Verify the import worked:

  • Open tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
  • Ctrl+Click on any class reference — it should navigate correctly
  • Open Find Class (Cmd+O / Ctrl+N) and search TestDAGImpl — it should find the test

Enable checkstyle integration:

  1. Install the CheckStyle-IDEA plugin (Settings → Plugins)
  2. Configure it to use src/config/checkstyle.xml in the Tez repo root
  3. This gives you real-time checkstyle feedback as you edit

Implementation Requirements

This lab has no code to implement. Deliverables are:

  1. A successful mvn install -DskipTests run (screenshot or terminal output)
  2. Identification of the Hadoop version Tez is built against
  3. Location of the tez-dag-<version>.jar artifact
  4. A working IntelliJ project that resolves all imports

Troubleshooting Common Build Failures

"Source/Target Java version mismatch"

error: Source option X is no longer supported. Use Y or later.

Cause: Your JAVA_HOME or java in PATH is the wrong version.
Fix:

export JAVA_HOME=$(/usr/libexec/java_home -v 11)   # macOS
export PATH=$JAVA_HOME/bin:$PATH
java -version   # verify
mvn install -DskipTests -q

"Cannot resolve dependency: org.apache.hadoop:..."

Cause: The required Hadoop version is not in Maven Central or your local cache.
Fix: Ensure Maven Central is reachable. If building offline, use an internal repository mirror. On a clean machine with network access this should not occur.

"Killed" or "Out of Memory"

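Cause: The Maven JVM runs out of heap.
Fix (MaxPermSize is omitted because the permanent generation was removed in Java 8 and the flag is ignored or rejected by newer JVMs):

export MAVEN_OPTS="-Xmx4g"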
mvn install -DskipTests -q

"ERROR: Failed to execute goal ... tez-tests"

Cause: The tez-tests module requires specific integration test infrastructure.
Fix: Build only the modules you need:

mvn install -DskipTests -pl tez-api,tez-dag,tez-runtime-library,tez-examples -am -q

Expected Output

Without -q, the build ends with a reactor summary similar to:

[INFO] Reactor Summary:
[INFO] Apache Tez ......................................... SUCCESS [  2.345 s]
[INFO] tez-api ............................................ SUCCESS [ 15.678 s]
[INFO] tez-dag ............................................ SUCCESS [ 45.123 s]
[INFO] tez-runtime-internals .............................. SUCCESS [ 12.456 s]
[INFO] tez-runtime-library ................................ SUCCESS [ 18.789 s]
[INFO] tez-mapreduce ...................................... SUCCESS [  8.012 s]
[INFO] tez-examples ....................................... SUCCESS [  5.234 s]
...
[INFO] BUILD SUCCESS

Stretch Goals

  1. Build against a specific Hadoop version by overriding the hadoop.version property:

    mvn install -DskipTests -Dhadoop.version=3.3.6 -q
    
  2. Inspect the generated effective-pom.xml for tez-dag to see all inherited dependency versions:

    mvn help:effective-pom -pl tez-dag | grep -A3 "dependency>"
    
  3. Identify which modules depend on tez-api by inspecting all pom.xml files:

    grep -r "tez-api" */pom.xml | grep "artifactId"
    

Related JIRA issue types to watch for:

  • Build breakage issues (e.g., dependency version conflicts): you can observe but not fix these at Level 1
  • Java version compatibility issues: important context when reading bug reports

Lab 1.2: Run Unit and Integration Tests

Background

Apache Tez has a well-structured test suite that spans unit tests, module-level integration tests, and full cluster integration tests using MiniTezCluster. Understanding how to run specific tests, read failures, and scope test execution is essential for contributor work — your patch must include a passing test run before upload.

Why This Lab Matters for Contributors

  • You must run tests before submitting any patch
  • Being able to run a single test class in seconds makes iteration fast
  • Understanding test failure output is the first step to debugging
  • Many flaky tests are contributor opportunities once you understand how tests work

How Tez Tests Are Organized

Tez tests fall into three categories:

| Category | Location | Runs with | Scope |
|----------|----------|-----------|-------|
| Unit tests | src/test/java/ in each module | mvn test -pl <module> | Fast, no cluster |
| Module integration tests | tez-tests/src/test/java/ | mvn test -pl tez-tests | Requires MiniTezCluster |
| System tests | Manual / CI scripts | Requires full cluster | Not run locally |

For Level 1–3 work, focus exclusively on unit tests.

Key unit test classes in tez-dag (path: tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/):

| Test Class | What it Tests |
|------------|---------------|
| TestDAGImpl | DAGImpl state machine transitions, initialization, completion |
| TestVertexImpl | VertexImpl state machine; the most complex test class in the project |
| TestTaskImpl | TaskImpl state machine transitions |
| TestTaskAttemptImpl | TaskAttemptImpl state transitions, speculation, failure handling |

Supporting test infrastructure in tez-dag/src/test/java/org/apache/tez/dag/app/:

| Class | Role |
|-------|------|
| MockDAGAppMaster | A reduced AM for unit testing; no YARN connection needed |
| MockAppContext | Mock AppContext that provides state to state machine tests |
| MockHistoryEventHandler | No-op history handler for tests that don't test history |

Step-by-Step Tasks

Step 1: Run All Unit Tests in tez-dag

cd /path/to/tez
# No -q here: the summary lines shown below are INFO-level output
mvn test -pl tez-dag -am 2>&1 | tail -30

Expected duration: 3–8 minutes depending on hardware.

Expected completion:

[INFO] Tests run: NNNN, Failures: 0, Errors: 0, Skipped: NN
[INFO] BUILD SUCCESS

Some tests are marked @Ignore or skipped due to environment constraints — a non-zero Skipped count is normal.

Step 2: Run a Single Test Class

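mvn test -pl tez-dag -am -Dtest=TestDAGImpl -Dsurefire.failIfNoSpecifiedTests=false

-Dsurefire.failIfNoSpecifiedTests=false is needed because -am also runs Surefire in the upstream modules, which contain no TestDAGImpl and would otherwise fail the build. -q is omitted here so that the INFO summary is visible.

Expected output (last few lines):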

[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: X.XXX s
[INFO] BUILD SUCCESS

If a test fails, you will see:

[ERROR] Tests run: 42, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: X.XXX s
[ERROR] testDAGCreation(org.apache.tez.dag.app.dag.impl.TestDAGImpl): expected:<...> but was:<...>

Step 3: Run a Single Test Method

mvn test -pl tez-dag -am -Dtest=TestDAGImpl#testDAGCreation -Dsurefire.failIfNoSpecifiedTests=false -q

This is the command you will use most often: run exactly one test after a code change to verify your fix.

Step 4: Read the Surefire Report

Maven writes detailed test results to:

tez-dag/target/surefire-reports/

For a failing test, read the .txt file for the test class:

cat tez-dag/target/surefire-reports/org.apache.tez.dag.app.dag.impl.TestDAGImpl.txt

This contains the full stack trace, which is often more informative than the Maven console output.
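If you end up scripting around test runs, the summary line in these reports is easy to parse. The sketch below is a plain-JDK helper that extracts the four counts from a line in the "Tests run: ..., Failures: ..." format shown earlier. The class name SurefireSummary is hypothetical, and the regular expression assumes that exact field order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses a Surefire-style summary line (illustrative helper, not part of Tez or Maven). */
final class SurefireSummary {
    private static final Pattern SUMMARY = Pattern.compile(
            "Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+), Skipped: (\\d+)");

    final int run;
    final int failures;
    final int errors;
    final int skipped;

    private SurefireSummary(int run, int failures, int errors, int skipped) {
        this.run = run;
        this.failures = failures;
        this.errors = errors;
        this.skipped = skipped;
    }

    /** Finds the summary anywhere in the line, so prefixes like [INFO] or [ERROR] are tolerated. */
    static SurefireSummary parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a summary line: " + line);
        }
        return new SurefireSummary(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)));
    }

    /** A run passes when nothing failed and nothing errored; skipped tests do not count against it. */
    boolean passed() {
        return failures == 0 && errors == 0;
    }

    public static void main(String[] args) {
        SurefireSummary s = parse(
                "[ERROR] Tests run: 42, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.0 s");
        System.out.println("passed=" + s.passed());
    }
}
```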

Step 5: Run Tests in tez-api

mvn test -pl tez-api -q

tez-api tests are faster and simpler. Key test classes:

| Test Class | What it Tests |
|------------|---------------|
| TestDAG | DAG API construction, validation, serialization |
| TestVertex | Vertex API construction and edge validation |
| TestTezClient | TezClient initialization and session management |
| TestAMControl | AM communication protocol |

Step 6: Run Tests in tez-runtime-library

mvn test -pl tez-runtime-library -am -q

This includes shuffle and I/O tests. Expected duration: 5–10 minutes.

Key test classes:

| Test Class | What it Tests |
|------------|---------------|
| TestOrderedPartitionedKVWriter | Sorted KV output serialization |
| TestFetcher | Shuffle fetch logic |
| TestShuffleScheduler | Fetch scheduling and retry |
| TestTezMerger | Sort-merge implementation |

Step 7: Understand a Test Failure

Intentionally break a test to understand failure output:

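  1. Open tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java
  2. Find the getTotalVertices() method
  3. Change its return statement to return 0; (replace the existing return rather than adding a second one in front of it, because Java rejects the unreachable statement that would follow)
  4. Run mvn test -pl tez-dag -am -Dtest=TestDAGImpl -Dsurefire.failIfNoSpecifiedTests=false -q
  5. Read the failure output in both the console and the surefire report
  6. Revert the change with git checkout tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java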

This exercise makes test failure output familiar before you encounter a real failure.


Debugging Test Failures

Adding Log Output

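Tez uses SLF4J + Log4j. Log4j 1.x reads logger levels from its properties file, not from -D system properties, so add the line log4j.logger.org.apache.tez=DEBUG to the test log4j.properties and point the test JVM at that file:

mvn test -pl tez-dag -am -Dtest=TestDAGImpl \
  -Dsurefire.failIfNoSpecifiedTests=false \
  -Dlog4j.configuration=file:src/test/resources/log4j.properties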

Running Tests with Remote Debug (IntelliJ)

To attach a debugger to a Maven test run:

mvn test -pl tez-dag -am -Dtest=TestDAGImpl \
  -Dsurefire.failIfNoSpecifiedTests=false \
  -Dmaven.surefire.debug="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005"

In IntelliJ: Run → Attach to Process → port 5005. The test JVM pauses until IntelliJ connects.


Testing Checklist

Before submitting any patch:

  • Run mvn test -pl <changed-module> -am — zero failures
  • If adding a new test: mvn test -pl <module> -am -Dtest=<YourNewTest> -Dsurefire.failIfNoSpecifiedTests=false passes
  • Run mvn checkstyle:check -pl <changed-module> — zero violations
  • If the change touches shuffle or I/O: run mvn test -pl tez-runtime-library -am

Expected Output

A clean test run for TestDAGImpl:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.tez.dag.app.dag.impl.TestDAGImpl
[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.345 s
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 42, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] BUILD SUCCESS

Stretch Goals

  1. Find all test classes in tez-dag that test the VertexImpl state machine:

    find tez-dag/src/test -name "*.java" | xargs grep -l "VertexImpl"
    
  2. Count the total number of test methods in TestVertexImpl:

    grep -c "@Test" tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java
    
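  3. Identify which test classes take the longest to run by examining the elapsed times in the surefire reports (the awk step moves the elapsed time to the front of each line so that sort can order on it):

    grep "Time elapsed" tez-dag/target/surefire-reports/*.txt | awk -F'Time elapsed: ' '{print $2 "\t" $1}' | sort -rn | head -10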
    
  4. Find tests that use MockDAGAppMaster to understand the test infrastructure pattern:

    grep -rl "MockDAGAppMaster" tez-dag/src/test/
    

Related JIRA issue types to watch for:

  • Flaky tests (timing-dependent, environment-dependent): a major contributor opportunity
  • Tests that don't assert anything meaningful: test quality improvements
  • Missing test coverage for error paths: discoverable by reading state machine code

Lab 1.3: Run a Simple Tez DAG Locally

Background

Apache Tez supports a local mode that runs the entire DAG execution inside a single JVM without YARN or HDFS. This is the primary environment for rapid development and testing. Understanding how to run a DAG in local mode is essential before attempting cluster testing.

The tez-examples module contains reference DAG implementations. OrderedWordCount is the canonical example: it reads text, counts word occurrences, and sorts by frequency. It demonstrates the complete Tez DAG API: TezClient, DAG, Vertex, Edge, and I/O processors.

Why This Lab Matters for Contributors

  • Local mode is how you verify behavior changes without a cluster
  • All integration test work in tez-tests builds on the same local mode infrastructure
  • Understanding how a real DAG is constructed gives concrete context for reading state machine code
  • Every DAG execution produces log output that teaches you about the AM lifecycle

Understanding Tez Local Mode

Tez local mode is enabled by setting tez.local.mode=true in the TezConfiguration. When this is set:

  • No YARN cluster is contacted
  • No containers are launched — task execution happens in threads within the same JVM
  • LocalClient (org.apache.tez.client) starts the DAGAppMaster in a thread inside the client JVM instead of submitting it to YARN
  • HDFS is replaced by the local filesystem (configurable)

Key configuration for local mode:

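import org.apache.tez.dag.api.TezConfiguration;

TezConfiguration tezConf = new TezConfiguration();
// Run the AM and the tasks inside this JVM instead of on YARN
tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
// Use local filesystem instead of HDFS
tezConf.set("fs.defaultFS", "file:///");
// Let the client reach the in-process AM without the network layer
// (check TezConfiguration in your checkout to confirm this key exists)
tezConf.setBoolean("tez.local.mode.without.network", true);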

Anatomy of OrderedWordCount

Before running the example, read tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java.

The DAG structure:

[Tokenizer Vertex]
      |
      | (SCATTER_GATHER edge — partitioned by hash, sorted)
      v
[SumReducer Vertex]
      |
      | (SCATTER_GATHER edge — partitioned by value for sort)
      v
[Sorter Vertex] → HDFS output

Tokenizer: Reads input text lines, splits into words, emits (word, 1) pairs.
Processor class: TokenProcessor (a nested class of WordCount that OrderedWordCount reuses; confirm via the imports at the top of OrderedWordCount.java)

SumReducer: Receives (word, [1, 1, 1, ...]) groups, sums counts, emits (word, count).
Processor class: SumProcessor (inner class in OrderedWordCount)

Sorter: Receives pairs keyed by count, that is, the SumReducer output with key and value reversed to (count, word), and emits them in sorted order.
Processor class: NoOpSorter, which relies on OrderedGroupedKVInput to do the sort during shuffle

The key insight: Tez uses edge properties and I/O processor configuration to control the sort and partition behavior. The Sorter vertex does not sort — the shuffle/merge into OrderedGroupedKVInput does the sorting.
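The swap-then-sort idea can be modeled without Tez at all. The sketch below is a plain-JDK simulation of the three stages: tokenize, sum, then re-key by count and let a sorted map stand in for the sorted shuffle. The class name is hypothetical, the sample lines are illustrative, and the descending order is an assumption made to mirror the expected output shown later in this lab; it says nothing about which comparator the real example configures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-JDK model of tokenize -> sum -> sort-by-count (no Tez classes involved). */
final class OrderedWordCountModel {

    /** Tokenizer + SumReducer: split each line on whitespace and count occurrences per word. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /**
     * Sorter: re-key each pair as (count, word). The sorted map plays the role of the
     * sorted shuffle, so this "vertex" only iterates and emits; it never calls sort().
     */
    static List<String> orderedByCount(Map<String, Integer> counts) {
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        counts.forEach((word, c) -> byCount.computeIfAbsent(c, k -> new ArrayList<>()).add(word));
        List<String> out = new ArrayList<>();
        byCount.forEach((c, words) -> words.forEach(w -> out.add(w + "\t" + c)));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "the quick brown fox jumps over the lazy dog",
                "the dog barked at the fox",
                "quick brown dog");
        orderedByCount(count(lines)).forEach(System.out::println);
    }
}
```

The point to carry into the real code is the division of labor: the processor re-keys the data, and the ordered edge between vertices does the sorting.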


Step-by-Step Tasks

Step 1: Prepare Sample Input

mkdir -p /tmp/tez-lab/input
cat > /tmp/tez-lab/input/words.txt << 'EOF'
the quick brown fox jumps over the lazy dog
the dog barked at the fox
quick brown dog
EOF

Step 2: Build tez-examples

cd /path/to/tez
mvn package -DskipTests -pl tez-examples -am -q

Locate the examples JAR:

ls tez-examples/target/tez-examples-*.jar | grep -v sources | grep -v tests

Step 3: Run OrderedWordCount in Local Mode

The example is run as a standard Java main class:

# Set classpath to include Tez JARs
TEZ_HOME=/path/to/tez

# A classpath entry ending in /* makes the JVM load every JAR in that directory.
# Patterns such as tez-api-*.jar are not expanded by the JVM, and the shell
# does not glob inside a variable assignment.
CLASSPATH=\
$TEZ_HOME/tez-examples/target/*:\
$TEZ_HOME/tez-api/target/*:\
$TEZ_HOME/tez-dag/target/*:\
$TEZ_HOME/tez-runtime-library/target/*:\
$TEZ_HOME/tez-runtime-internals/target/*:\
$TEZ_HOME/tez-mapreduce/target/*:\
$TEZ_HOME/tez-common/target/*

# Add Hadoop JARs (required for FileSystem, Configuration, etc.)
# If Hadoop is installed:
CLASSPATH=$CLASSPATH:$(hadoop classpath)
# If not, add from Maven local cache manually

# Local mode must be enabled for this run (see the tez.local.mode and
# fs.defaultFS settings in Step 4); otherwise TezClient tries to reach
# a YARN ResourceManager.
java -cp "$CLASSPATH" \
  org.apache.tez.examples.OrderedWordCount \
  /tmp/tez-lab/input \
  /tmp/tez-lab/output \
  1

Tip: The easiest way to handle classpaths during development is to use Maven's exec:java goal or to build a fat JAR using the shade plugin. The tez-dist assembly includes all JARs and the bin/ scripts handle classpath setup.

Step 4: Alternative Ways to Run (simpler)

If you have Hadoop installed and HADOOP_HOME set, use the Tez distribution's shell script:

cd $TEZ_HOME
bin/tez-examples.sh OrderedWordCount \
  /tmp/tez-lab/input \
  /tmp/tez-lab/output \
  1

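Or pass the local-mode settings on the command line. A JVM system property (-D before the class name) does not reach the Hadoop Configuration; the settings have to be given as Hadoop generic options after the class name, which assumes the example is launched through ToolRunner as the Tez examples are:

java -cp "$CLASSPATH" \
     org.apache.tez.examples.OrderedWordCount \
     -Dtez.local.mode=true \
     -Dfs.defaultFS=file:/// \
     /tmp/tez-lab/input \
     /tmp/tez-lab/output \
     1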

Step 5: Verify Output

cat /tmp/tez-lab/output/part-*

Expected output (sorted by frequency descending):

the	4
dog	3
fox	2
quick	2
brown	2
...

Step 6: Read the Execution Log

Examine the log output from the run. Key lines to understand:

INFO  TezClient: Submitting DAG to YARN, queueName=...
INFO  DAGAppMaster: Running DAG: [OrderedWordCount]
INFO  VertexImpl: Vertex: [Tokenizer] initialized
INFO  VertexImpl: Vertex: [Tokenizer] started
INFO  DAGImpl: DAG: [OrderedWordCount] finished with status: [SUCCEEDED]

These lines correspond directly to state machine transitions you will study in Level 4. For each log line, identify the state transition it represents.


Implementation Requirements

Modify OrderedWordCount to add a fourth vertex that filters out words with count < 2:

  1. Add a new Vertex named "Filter" after SumReducer and before Sorter
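  2. Write a minimal FilterProcessor extends SimpleProcessor (the same base class style the example's own processors use; AbstractLogicalIOProcessor is the lower-level alternative):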
    • In run(): iterate the input, skip pairs where the count value < 2, forward the rest
  3. Add an edge SumReducer → Filter and Filter → Sorter
  4. Run the modified DAG and verify that single-occurrence words are removed from output

This exercise teaches you:

  • How to add a vertex to an existing DAG
  • How to write a minimal Processor implementation
  • How edges connect processors

Do not overthink the implementation — the processor body is ~20 lines.


Debugging Checklist

If the DAG fails with DAG status: FAILED:

  1. Read the log for ERROR lines — they contain the failure reason and task attempt ID
  2. Check DAGAppMaster log for VertexImpl: Vertex [...] failed
  3. The error message will include the class and method where the exception occurred
  4. Common causes:
    • Classpath missing a required JAR (NoClassDefFoundError)
    • Output directory already exists (FileAlreadyExistsException)
    • Wrong input path (FileNotFoundException)

Clean output directory before re-running:

rm -rf /tmp/tez-lab/output

Expected Output

A successful run ends with:

INFO  DAGImpl: DAG: [OrderedWordCount] finished with status: [SUCCEEDED]
INFO  TezClient: Shutting down TezSession...

Stretch Goals

  1. Enable INFO-level logging for org.apache.tez.dag.app.dag.impl and observe vertex state transitions in the console output during the DAG run.

  2. Modify the DAG to use UnorderedKVInput/UnorderedKVOutput instead of the ordered pair for the first edge. Observe the difference in output ordering.

  3. Change the parallelism of the Sorter vertex to 2 and observe the output directory structure (2 part files instead of 1).

  4. Add a timer around the TezClient.submitDAG() → DAGClient.waitForCompletion() block and measure execution time for different input sizes.


Related JIRA issue types to watch for:

  • Local mode-specific bugs (different from cluster mode): a contributor opportunity
  • DAG API usability issues: often exposed by example code
  • Local mode configuration issues: often reported by new users

Lab 1.4: Project — Number Pipeline DAG

What You Are Building

A self-contained, runnable Java project that builds and executes a 3-vertex Tez DAG entirely in local mode — no YARN cluster, no HDFS, no Docker required.

Generator (2 tasks)
    │  SCATTER_GATHER shuffle
    ▼
Multiplier (2 tasks)   [value * 2]
    │  SCATTER_GATHER shuffle
    ▼
Sink (1 task)          [sum → counter]

Numbers 0–99 flow through the pipeline. The expected final sum is: sum(0..99) * 2 = 4950 * 2 = 9900.
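The expected value can be checked independently of Tez. The following plain-JDK sketch models only the arithmetic of the three stages; the class and method names are hypothetical and nothing here uses the Tez API.

```java
import java.util.stream.IntStream;

/** Plain-JDK model of the Generator -> Multiplier -> Sink arithmetic (no Tez involved). */
final class NumberPipelineModel {

    /** Sums n * factor for every n in [0, count). */
    static long expectedSum(int count, int factor) {
        return IntStream.range(0, count)             // Generator: 0 .. count-1
                .mapToLong(n -> (long) n * factor)   // Multiplier: value * factor
                .sum();                              // Sink: running total
    }

    public static void main(String[] args) {
        // For the lab's parameters (100 numbers, factor 2) this prints 9900.
        System.out.println(expectedSum(100, 2));
    }
}
```

Keeping a reference computation like this next to the DAG makes it easy to re-derive the expected value when you later change the range or the factor.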

This pipeline intentionally mirrors the structure of Apache Tez's own OrderedWordCount example but with an integer domain so the math is verifiable without a corpus.


Project Location

book/projects/
├── pom.xml                              ← parent; sets Tez + Hadoop versions
└── level-1-number-pipeline/
    ├── pom.xml
    └── src/main/java/org/apache/tez/learning/l1/
        ├── GeneratorProcessor.java      ← no inputs; emits integers
        ├── MultiplierProcessor.java     ← one input, one output; value * 2
        ├── SinkProcessor.java           ← sums values; publishes counter
        ├── FilterProcessor.java         ← exercise stub (incomplete)
        └── NumberPipelineDAG.java       ← main class; configures + submits DAG

Prerequisites

  • Completed Lab 1.1 (Apache Tez built from source with mvn install -DskipTests)
  • Java 8 or Java 11 on $PATH (the same JDK you used to build Tez in Lab 1.1)
  • Maven 3.6.3 or newer on $PATH

Step 1: Set the Tez Version

The parent pom.xml needs to reference the exact version that mvn install installed into your local ~/.m2 repository. Find it:

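# Inside your apache/tez clone:
# (grep -m1 '<version>' pom.xml is unreliable here: the first <version> element
#  in a POM with a <parent> block is usually the parent's version, not Tez's)
mvn help:evaluate -Dexpression=project.version -q -DforceStdout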

Open book/projects/pom.xml and set <tez.version> to match:

<tez.version>0.10.3-SNAPSHOT</tez.version>   <!-- adjust to your build -->

Step 2: Compile

cd /path/to/opensource-engineer-and-contributor/apache-tez/book/projects

# Build only the level-1 module (fast; skips the other modules)
mvn -pl level-1-number-pipeline package -q

You should see no errors. The fat JAR is at:

level-1-number-pipeline/target/level-1-number-pipeline-1.0-SNAPSHOT-jar-with-dependencies.jar

If you see Could not resolve dependency org.apache.tez:tez-api:

  1. Verify that tez.version matches the version in ~/.m2/repository/org/apache/tez/tez-api/
  2. Re-run mvn install -DskipTests in your Tez clone

Step 3: Run

java -jar level-1-number-pipeline/target/level-1-number-pipeline-1.0-SNAPSHOT-jar-with-dependencies.jar

Expected output (log lines abbreviated):

TezClient started (local mode).
Submitting DAG...
[SinkProcessor] task=0  partialSum=9900

=== NumberPipeline Result ===
  Expected : 9900
  Actual   : 9900
  Result   : PASS

Note: You will see a large number of INFO log lines from the Tez framework. This is normal for local mode. The important lines are the ones from [SinkProcessor] and the final === Result === block.


Step 4: Read Every Source File

Before modifying anything, read each Java file carefully.

GeneratorProcessor.java

Key questions:

  1. Which Tez interface does it implement?
  2. Why is output.start() called before getWriter()? What happens if you remove it?
  3. How does the processor know which range of numbers to generate? What Tez API provides this?
  4. The key and value written are both the same integer n. Why? When would you want them to differ?

MultiplierProcessor.java

Key questions:

  1. OrderedGroupedKVInput vs OrderedPartitionedKVOutput — which side is the input and which is the output? Why are they named differently?
  2. Both input.start() and output.start() are called. What does input.start() actually trigger? (Hint: look at OrderedGroupedKVInput.start() in the Tez source.)
  3. FACTOR = 2 is hardcoded. The Javadoc explains how to pass it via UserPayload. What is the size in bytes of an int encoded in a ByteBuffer?

SinkProcessor.java

Key questions:

  1. What is the type of getContext().getCounters()?
  2. findCounter(group, name) — what happens if the counter doesn't exist yet when first called?
  3. There is only one Sink task (parallelism=1). If you changed it to 2, would the counter still be correct? Why?

NumberPipelineDAG.java

Key questions:

  1. What does tez.local.mode=true actually change about task execution?
  2. OrderedPartitionedKVEdgeConfig.newBuilder(keyClass, valueClass, partitionerClass) — what is HashPartitioner doing here, and where does the partition count come from?
  3. dagClient.waitForCompletion() — does this block on the calling thread, or is it async?
  4. EnumSet.of(StatusGetOpts.GET_COUNTERS) — why is this extra call needed? Why aren't counters always included in DAGStatus?

Step 5: Break It and Understand It

Make each change, run the JAR, observe the failure, then revert.

Break 1: Remove output.start()

In GeneratorProcessor.run(), comment out logicalOutput.start().

Expected: NullPointerException or IllegalStateException from the Tez runtime when getWriter() is called on an uninitialized output.

Why this matters: Tez I/O objects are lazily initialized. The start() method triggers buffer allocation, sort buffer setup, and (for inputs) the shuffle fetch. Forgetting start() is a common first patch mistake.
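The lazy-initialization pattern described above can be sketched without any Tez dependency. `LazyOutputSketch` below is a hypothetical stand-in, not the real OrderedPartitionedKVOutput; it only illustrates why calling getWriter() before start() fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Tez output; not the real OrderedPartitionedKVOutput. */
final class LazyOutputSketch {
    private List<String> buffer;   // allocated by start(), null until then

    /** Allocates the buffer; plays the role start() has in the text above. */
    void start() {
        if (buffer == null) {
            buffer = new ArrayList<>();
        }
    }

    /** Fails fast with a descriptive message if start() was skipped. */
    List<String> getWriter() {
        if (buffer == null) {
            throw new IllegalStateException("getWriter() called before start()");
        }
        return buffer;
    }

    public static void main(String[] args) {
        LazyOutputSketch output = new LazyOutputSketch();
        try {
            output.getWriter();
        } catch (IllegalStateException e) {
            System.out.println("Without start(): " + e.getMessage());
        }
        output.start();
        output.getWriter().add("record");
        System.out.println("After start(): " + output.getWriter().size() + " record buffered");
    }
}
```

Whether the real runtime throws NullPointerException or IllegalStateException depends on which field is touched first, which is why the expected result above lists both.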

Break 2: Set the wrong parallelism

Change sink parallelism from 1 to 3, run again.

Observe: does the result change? Is it still 9900? Why or why not?

Expected: the total counter is still 9900, because each Sink task emits a partial sum and the AM aggregates counters across all tasks automatically.
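To see why the total does not depend on the number of Sink tasks, here is a minimal sketch in plain Java. The routing rule (key n, hash-style modulo) is an assumption made for illustration; the point is only that partial sums add up to the same total however the records are split.

```java
import java.util.Arrays;

final class PartialSumSketch {
    /** Splits the doubled values 0, 2, ..., 198 across numTasks and returns each task's partial sum. */
    static long[] partialSums(int numTasks) {
        long[] sums = new long[numTasks];
        for (int n = 0; n < 100; n++) {
            int value = n * 2;
            int task = (n & Integer.MAX_VALUE) % numTasks;  // assumed routing on key n
            sums[task] += value;
        }
        return sums;
    }

    /** Adds the partial sums, as counter aggregation across tasks would. */
    static long total(long[] partials) {
        return Arrays.stream(partials).sum();
    }

    public static void main(String[] args) {
        for (int tasks : new int[] {1, 3}) {
            long[] partials = partialSums(tasks);
            System.out.println(tasks + " task(s): partials=" + Arrays.toString(partials)
                + " total=" + total(partials));
        }
    }
}
```

Both runs report a total of 9900; only the partial sums differ.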

Break 3: Use a constant key in the Generator

Change writer.write(new IntWritable(n), new IntWritable(n)) to writer.write(new IntWritable(0), new IntWritable(n)) (fixed key = 0).

Expected: all values route to the same Multiplier task (the one that owns partition 0). The other Multiplier task gets no work. The result is still 9900 (correct) but the work distribution is skewed. You can verify this by adding a counter in MultiplierProcessor that tracks how many records each task processed.

Why this matters: key-skew (many records with the same key) is one of the most common Tez/MapReduce performance problems. This exercise makes it visible.
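The skew can be reproduced outside Tez with a few lines of Java. This sketch assumes the usual Hadoop-style hash partitioning formula and assumes an int key hashes to its own value; `KeySkewSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

final class KeySkewSketch {
    /** Hadoop-style hash partitioning; an int key is assumed to hash to itself. */
    static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & Integer.MAX_VALUE) % numPartitions;
    }

    /** Counts how many of the records n = 0..99 land in each partition for a given key choice. */
    static int[] recordsPerPartition(IntUnaryOperator keyOf, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int n = 0; n < 100; n++) {
            counts[partitionFor(keyOf.applyAsInt(n), numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("key = n : " + Arrays.toString(recordsPerPartition(n -> n, 2)));
        System.out.println("key = 0 : " + Arrays.toString(recordsPerPartition(n -> 0, 2)));
    }
}
```

With key = n the two partitions receive 50 records each; with the constant key all 100 records land in partition 0.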


Step 6: Add a FilterProcessor (Exercise)

Open FilterProcessor.java. This is the skeleton for your exercise.

Your task: Insert a FilterProcessor between Multiplier and Sink that drops all values not divisible by 4, then verify the new expected sum.

Step 6a: Implement FilterProcessor

  1. Add a private int threshold field.
  2. In initialize(), read the threshold from UserPayload:
    byte[] bytes = getContext().getUserPayload().deepCopyAsArray();
    this.threshold = ByteBuffer.wrap(bytes).getInt();
    
  3. In run(), replace if (true) with if (value.get() % threshold == 0).
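The payload round trip in step 2 uses only java.nio, so it can be tried in isolation. A minimal sketch, with `PayloadRoundTripSketch` as a hypothetical name and no Tez classes involved:

```java
import java.nio.ByteBuffer;

final class PayloadRoundTripSketch {
    /** Encodes an int into a 4-byte big-endian array, as the DAG-building side would. */
    static byte[] encode(int threshold) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(threshold);
        buf.flip();                       // reset position to 0 so the bytes are readable
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    /** Decodes the int the same way the initialize() snippet above does. */
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] payload = encode(4);
        System.out.println("payload length = " + payload.length);
        System.out.println("decoded threshold = " + decode(payload));
    }
}
```

Without the flip() call the buffer's position would sit at the end of the written data and remaining() would be 0, which is the most common way this encoding goes wrong.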

Step 6b: Update NumberPipelineDAG.buildDAG()

Vertex filter = Vertex.create("filter",
    ProcessorDescriptor.create(FilterProcessor.class.getName())
        .setUserPayload(UserPayload.create(
            // The cast is needed on Java 8, where flip() is declared to return Buffer
            (ByteBuffer) ByteBuffer.allocate(4).putInt(4).flip())),  // threshold=4
    2);  // same parallelism as multiplier

// New edge chain: generator → multiplier → filter → sink
.addEdge(Edge.create(generator,  multiplier, edgeConf.createDefaultEdgeProperty()))
.addEdge(Edge.create(multiplier, filter,     edgeConf.createDefaultEdgeProperty()))
.addEdge(Edge.create(filter,     sink,       edgeConf.createDefaultEdgeProperty()));

Step 6c: Calculate the new expected sum

After multiplying by 2, the values are: 0, 2, 4, 6, 8, …, 198. After filtering (keep only values divisible by 4): 0, 4, 8, 12, …, 196. Sum of {0, 4, 8, …, 196} = 4 * sum(0, 1, 2, …, 49) = 4 * (49*50/2) = 4 * 1225 = 4900.

Update NumberPipelineDAG.expectedSum() to return 4900 and verify PASS.
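The arithmetic can be double-checked by brute force before touching the DAG. This sketch assumes the generator emits n = 0..99, as the value ranges above imply.

```java
import java.util.stream.IntStream;

final class ExpectedSumSketch {
    /** Sum of 2n for n in [0, 100), keeping only values divisible by divisor. */
    static int filteredSum(int divisor) {
        return IntStream.range(0, 100)
            .map(n -> n * 2)
            .filter(v -> v % divisor == 0)
            .sum();
    }

    public static void main(String[] args) {
        System.out.println("no filter (divisor 1): " + filteredSum(1));
        System.out.println("divisor 4            : " + filteredSum(4));
    }
}
```

The first line prints 9900 (the original pipeline) and the second prints 4900 (the filtered pipeline).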


Step 7: Connect This to the Tez Source

Every class you used in this project maps to a real Tez module.

| Class | Module | Source path |
|---|---|---|
| AbstractLogicalIOProcessor | tez-api | tez-api/src/main/java/org/apache/tez/runtime/api/AbstractLogicalIOProcessor.java |
| OrderedPartitionedKVOutput | tez-runtime-library | tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java |
| OrderedGroupedKVInput | tez-runtime-library | tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java |
| OrderedPartitionedKVEdgeConfig | tez-runtime-library | tez-runtime-library/src/main/java/org/apache/tez/runtime/library/conf/OrderedPartitionedKVEdgeConfig.java |
| TezClient | tez-api | tez-api/src/main/java/org/apache/tez/client/TezClient.java |
| TezConfiguration | tez-api | tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java |

After running the pipeline successfully, open each source file above. For each one:

  1. Find the method you called
  2. Read its implementation — what does it actually do?
  3. Find the unit test class for that file (usually in src/test/java/ under the same package)

This pipeline uses OrderedPartitionedKVOutput. Search the Tez JIRA for issues in this component to find real bugs and improvements you could work on:

project = TEZ AND component = "runtime-library" AND status in (Open, "Patch Available")
ORDER BY priority DESC

Also search specifically:

project = TEZ AND text ~ "OrderedPartitionedKVOutput" AND status in (Open, "Patch Available")

For each open issue you find, ask yourself:

  1. Do you understand what the bug description is saying?
  2. Can you locate the relevant code in the source?
  3. Is there a failing test, or do you need to write one?

Expected Deliverables

  • Project compiles without errors
  • Running the JAR prints PASS with result 9900
  • You can answer all questions in Step 4 (with file:line references to the source)
  • You have run all three "Break It" experiments and understand each failure
  • FilterProcessor is implemented and the pipeline prints PASS with result 4900
  • You have opened all 6 source files from the "Connect to Source" table
  • You have found at least 2 open JIRA issues in the runtime-library component

Level 2: Apache Contributor Onboarding

This level teaches you how the Apache open-source contribution machine works — not in the abstract, but in the specific context of Apache Tez. You will set up your tooling, understand the community structure, learn the patch workflow, and submit your first meaningful change.


Learning Objectives

By the end of Level 2 you must be able to:

  1. Subscribe to dev@tez.apache.org and read a week's worth of threads
  2. Navigate Apache Tez JIRA to find and evaluate open issues
  3. Describe the full lifecycle of a patch: from JIRA issue to committed code
  4. Generate a unified diff patch from a Git branch
  5. Run Apache checkstyle and resolve all violations before submitting a patch
  6. Write a JIRA comment that adds technical value
  7. Find any class in the Tez repository in under 30 seconds

Apache Open-Source Contribution Fundamentals

Apache projects operate differently from GitHub-native open-source projects. The primary communication channels are mailing lists, not GitHub issues or Slack. Patches are attached to JIRA issues, not submitted as GitHub pull requests (though GitHub PRs may be used as a convenience in some projects — Tez still prefers JIRA-based workflow).

The Contribution Hierarchy

PMC (Project Management Committee)
  └─ Committers (can commit directly)
       └─ Contributors (submit patches via JIRA)
            └─ Everyone else (can file issues, ask questions)

Becoming a contributor means submitting patches. Becoming a committer means sustained, high-quality contributions over time that earn the trust of existing committers.

The Patch Lifecycle

1. Find or file a JIRA issue
2. Leave a comment: "I'm looking into this"
3. Make changes on a local branch
4. Run: mvn test -pl <module> -am  (must pass)
5. Run: mvn checkstyle:check -pl <module>  (must pass)
6. Generate a patch: git diff origin/master > TEZ-NNNN.001.patch
7. Attach the patch to the JIRA issue
8. Set JIRA status to "Patch Available"
9. Wait for review — a committer will comment or set "Reviewed" or "Not a bug"
10. Address feedback → upload v2 patch → repeat
11. Committer commits the patch (you cannot commit yourself until you are a committer)

Required Reading

| # | Resource | What to extract |
|---|---|---|
| 1 | Apache Tez Contributing | The official contribution guide |
| 2 | Apache JIRA for Tez | Browse recent issues to understand what active work looks like |
| 3 | dev@tez.apache.org archives | Read 2 weeks of mailing list threads at https://lists.apache.org/list.html?dev@tez.apache.org |
| 4 | src/config/checkstyle.xml in the Tez repo | What style rules are enforced |
| 5 | Apache How It Works | Meritocracy, governance, why Apache operates the way it does |
| 6 | Any 3 recently closed Tez patches | Read the JIRA comment thread and observe how committers give feedback |

Source Code Areas to Inspect

| File | Why |
|---|---|
| pom.xml (root) | Module structure, dependency management, build profiles |
| tez-dag/pom.xml | Module-level dependency declarations |
| src/config/checkstyle.xml | Style rules enforced on every patch |
| src/config/checkstyle-suppressions.xml | Suppressions: which files are exempt and why |
| .gitignore | What is excluded from version control |
| Any recently committed file | Read the commit message format |

Apache Tez JIRA Structure

Issue Types You Will Encounter

| Type | Description |
|---|---|
| Bug | A defect in behavior |
| Improvement | An enhancement to existing functionality |
| New Feature | Something that does not exist yet |
| Task | Non-code work (documentation, release, etc.) |
| Sub-task | Part of a larger issue |
| Test | Adding or fixing a test |

Priority Levels

| Priority | Meaning |
|---|---|
| Blocker | Prevents a release |
| Critical | Significant data loss or correctness risk |
| Major | Important but not release-blocking |
| Minor | Small issue or improvement |
| Trivial | Typo, cosmetic, minor cleanup |

For Level 2 contributors: Only work on Minor and Trivial issues. Do not pick up Major or higher issues until you have at least 3 accepted patches in the project.

Component Labels

JIRA issues are labeled by component. The most relevant for early contributors:

| Component | What it covers |
|---|---|
| Tez-DAG | DAG execution, AM, state machines |
| Tez-Runtime | I/O library, shuffle |
| Tez-API | Public API (high stability required) |
| Documentation | Docs, Javadoc, website |
| Tests | Test additions and fixes |

Mailing List Etiquette

How to Subscribe

# Send an empty email to:
dev-subscribe@tez.apache.org
# You will receive a confirmation email — reply to it

What to Read First

Do not post until you have read at least two weeks of threads. Understand:

  • What issues are currently being discussed
  • How committers respond to patches
  • The tone and technical depth expected
  • What questions get quick responses vs. what gets ignored

How to Ask a Question

Good question format:

Subject: [QUESTION] Understanding VertexImpl initialization flow

Hi dev@,

I'm trying to understand the initialization sequence in VertexImpl.
Specifically, I'm looking at the transition from INITIALIZING to INITED
in VertexImpl.java around line 1234.

The code calls rootInputInitializer() before transitioning, but I'm unclear
on what happens if an initializer throws an unchecked exception.

I've read the JIRA issue TEZ-XXXX and the associated commit, but I still
have this question. Can anyone point me to the relevant code path?

Thanks,
[Your name]

What makes this question good:

  • Specific class and approximate line number
  • State machine terminology used correctly
  • References prior research
  • Concrete question, not "how does Tez work?"

What makes a question bad:

  • "How do I contribute?" — this is answered in the contributing guide
  • "Can you explain how shuffle works?" — too broad; you should read the code first
  • Posting before subscribing and reading archives

Apache Checkstyle

Tez enforces checkstyle on every patch. A patch that fails checkstyle will not be committed.

Running Checkstyle

# Check a specific module
mvn checkstyle:check -pl tez-dag

# Check all modules (slow)
mvn checkstyle:check

# Check and see violations inline
mvn checkstyle:checkstyle -pl tez-dag
open tez-dag/target/checkstyle-result.xml

Common Violations

| Violation | Cause | Fix |
|---|---|---|
| UnusedImports | Import statement for an unused class | Remove the import |
| LineLength | Line exceeds 100 characters | Break the line |
| WhitespaceAround | Missing space around operator | Add space |
| LeftCurly | { on wrong line | Move to end of previous line |
| JavadocMethod | Public method missing Javadoc | Add /** ... */ block |
| FinalClass | Utility class not declared final | Add final modifier |
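As a point of comparison, here is a small class written to avoid each violation listed above. `RecordMath` is a hypothetical utility invented for this illustration, and whether it passes depends on the exact rules in the project's checkstyle.xml.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical utility class used only to illustrate the rules above.
 */
final class RecordMath {

    private RecordMath() {
        // Utility class: no instances.
    }

    /**
     * Sums the given values.
     *
     * @param values the values to add; must not be null
     * @return the sum of all values, or 0 for an empty list
     */
    public static long sum(List<Integer> values) {
        long total = 0;
        for (int value : values) {
            total = total + value;   // spaces around every operator
        }
        return total;
    }

    /**
     * Prints the sum of a small sample list.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));
    }
}
```

Every import is used, the class is final with a private constructor, each opening brace sits at the end of its line, and every public method carries Javadoc.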

JIRA Issue Categories for Level 2 Contributors

In addition to Level 1 categories, you can now attempt:

  • Test improvements — adding tests for uncovered paths you identify from reading the code
  • Logging improvements — adding LOG.debug() statements that would help diagnose issues
  • Checkstyle fixes — especially in modules you have been reading

Discipline: The quality of your first 5 patches determines how quickly you build credibility in the community. A patch with a checkstyle violation, compilation error, or test failure will be rejected immediately. Every patch must be verified locally before upload.


Deliverables

  • Subscribed to dev@tez.apache.org and can describe two active discussions
  • Apache JIRA account created
  • One JIRA issue identified, studied, and commented on (even if not yet working on it)
  • Lab 2.1 completed: module-by-module walkthrough documented
  • Lab 2.2 completed: patch generated, checkstyle passing, JIRA description written
  • Understanding of the difference between a Minor and a Trivial issue

Common Mistakes

| Mistake | Consequence | Fix |
|---|---|---|
| Opening a GitHub PR instead of attaching a patch to JIRA | PR will likely be ignored or closed | Use JIRA; attach a .patch file |
| Submitting a patch that changes formatting in unrelated lines | Noise in the diff; committers reject it | Change only the lines you meant to change |
| Claiming an issue without leaving a JIRA comment | Another contributor may do the same work | Comment "I am investigating this" before starting |
| Submitting a patch without running tests | Immediate rejection | Test everything locally first |
| Writing a JIRA comment that just says "fix attached" | Unhelpful; committers will ask for explanation | Explain what was wrong and what the fix does |
| Using git commit -m "fix" | Unprofessional commit message | Format: TEZ-NNNN. Short description of change. |

How to Verify Success

# Your patch generates cleanly
git diff origin/master > /tmp/TEZ-NNNN.001.patch
head -20 /tmp/TEZ-NNNN.001.patch   # should show only your intended changes

# Checkstyle passes on the module you changed
mvn checkstyle:check -pl <changed-module>

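# Tests pass (the last flag keeps upstream modules pulled in by -am from failing
# the build because they contain no test matching -Dtest)
mvn test -pl <changed-module> -am -Dtest=<RelevantTestClass> -Dsurefire.failIfNoSpecifiedTests=false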

Patch Profile: Level 2 Graduate

| Patch type | Example | Test requirement |
|---|---|---|
| Javadoc improvement | Add missing @throws annotation to a method | None |
| Log statement improvement | Add context to an existing LOG.warn that is unhelpful | Run the affected test class |
| Checkstyle fix | Fix unused import across multiple files in one module | Run mvn checkstyle:check -pl <module> |
| Test comment improvement | Add test setup comments explaining what MockAppContext does | Run the test class |

You are not ready to submit: behavioral code changes, new features, bug fixes in state machines or shuffle. Continue to Level 3.

Lab 2.1: Navigate the Repository Structure

Background

Before writing a single line of code, a new contributor must be able to navigate the repository with the same fluency as a committer. This lab builds that fluency by walking you through every module, explaining the Maven multi-module structure, and training you to locate any class in under 30 seconds.


Repository Root Layout

apache/tez/
├── pom.xml                     # Root POM — module declarations, dep management
├── tez-api/                    # Public client API
├── tez-common/                 # Utilities shared across modules
├── tez-dag/                    # DAG AppMaster — the core of Tez
├── tez-examples/               # Example DAG implementations
├── tez-ext-service-tests/      # External service integration tests
├── tez-mapreduce/              # MapReduce compatibility layer
├── tez-plugins/                # Optional plugins (ATSv2, etc.)
├── tez-runtime-internals/      # Internal runtime interfaces
├── tez-runtime-library/        # I/O processors, shuffle
├── tez-tests/                  # Integration test suite
├── tez-tools/                  # Performance analysis utilities
├── src/
│   └── config/
│       ├── checkstyle.xml      # Style enforcement rules
│       └── checkstyle-suppressions.xml
└── CHANGES.txt                 # Release changelog

Module-by-Module Walkthrough

tez-api — The Public Contract

Everything in tez-api is part of the public API that application developers use. Changes here must be backward-compatible or explicitly versioned. This is the highest-stability module.

Key packages:

| Package | Contents |
|---|---|
| org.apache.tez.client | TezClient |
| org.apache.tez.dag.api | DAG, Vertex, Edge, TezConfiguration |
| org.apache.tez.dag.api.client | DAGClient, DAGStatus: monitoring and control |
| org.apache.tez.dag.api.event | Events emitted by the AM to task processors |
| org.apache.tez.dag.api.records | Protocol Buffer message classes (generated) |
| org.apache.tez.runtime.api | AbstractLogicalIOProcessor, Input, Output interfaces |

Exercise:

# Count Java source files in tez-api (a rough proxy for the API surface)
find tez-api/src/main/java -name "*.java" | wc -l

# Find all classes in tez-runtime-library that extend AbstractLogicalIOProcessor
grep -rl "extends AbstractLogicalIOProcessor" tez-runtime-library/src/

tez-dag — The Application Master

This is the largest and most complex module. It implements the DAG AppMaster that runs in a YARN container and orchestrates vertex and task execution.

Key packages:

| Package | Contents |
|---|---|
| org.apache.tez.dag.app | DAGAppMaster, the main AM class |
| org.apache.tez.dag.app.dag | DAG, Vertex, Task, TaskAttempt state machine interfaces |
| org.apache.tez.dag.app.dag.impl | DAGImpl, VertexImpl, TaskImpl, TaskAttemptImpl |
| org.apache.tez.dag.app.rm | YARN resource management integration |
| org.apache.tez.dag.app.launcher | Container launch logic |
| org.apache.tez.dag.app.web | AM web UI servlets |
| org.apache.tez.dag.history | Timeline history event handling |

Exercise:

# Count lines in DAGImpl (the most complex class)
wc -l tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java

# Count state machine transitions in VertexImpl
grep "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | wc -l

tez-runtime-library — I/O and Shuffle

The I/O module implements the actual data reading/writing done inside task containers. Shuffle happens here.

Key packages:

| Package | Contents |
|---|---|
| org.apache.tez.runtime.library.input | OrderedGroupedKVInput, UnorderedKVInput, etc. |
| org.apache.tez.runtime.library.output | OrderedPartitionedKVOutput, UnorderedKVOutput, etc. |
| org.apache.tez.runtime.library.common.shuffle | Shuffle fetch infrastructure |
| org.apache.tez.runtime.library.common.sort | External sort implementation |
| org.apache.tez.runtime.library.common.writers | Spilling KV writers |

Exercise:

# Find all Input implementations
find tez-runtime-library/src/main/java -name "*Input*.java" | grep -v test

# Find the shuffle Fetcher
find tez-runtime-library/src/main/java -name "Fetcher.java"
wc -l $(find tez-runtime-library/src/main/java -name "Fetcher.java")

tez-common — Shared Utilities

Contains utilities used by multiple modules that do not fit in tez-api:

  • TezUtils — configuration serialization/deserialization
  • TezTaskID, TezVertexID, TezDAGID — ID types
  • ReflectionUtils — Tez-specific reflection helpers
  • VersionUtils — version compatibility checks

tez-mapreduce — MapReduce Compatibility

Allows MapReduce jobs to run on Tez without code changes. Contains MRInput, MROutput, and the mapper/reducer wrapping infrastructure.

tez-examples — Reference Implementations

Four example DAGs:

| Class | What it demonstrates |
|---|---|
| OrderedWordCount | 3-vertex pipeline, ordered shuffle, sort by value |
| IntersectExample | 2-way join using broadcast edge |
| JoinDataGen | Data generation for the join example |
| FilterLinesByWord | Simple filter with configurable parallelism |

tez-tests — Integration Test Suite

Contains tests that run against MiniTezCluster — a full in-process Tez + YARN + HDFS cluster. These tests are slow (minutes each) but provide end-to-end coverage.

Key test class: TestMiniTezSessionWithLocalMode — runs example DAGs in local mode.


Maven Structure Deep Dive

Root pom.xml

Read the root pom.xml to understand:

  1. Module declarations (<modules> section): the modules that take part in the build; the reactor derives the actual build order from their inter-dependencies
  2. Dependency management (<dependencyManagement>): canonical versions for all deps
  3. Plugin management (<pluginManagement>): canonical plugin configurations
  4. Build profiles: hadoop-2 vs hadoop-3, dist profile for assembly

Exercise:

# What Hadoop version does Tez build against by default?
grep -A2 "hadoop.version" pom.xml | head -5

# What Java version is required?
grep "maven.compiler" pom.xml

# How many external dependencies does the root pom manage?
grep "<artifactId>" pom.xml | wc -l

Module pom.xml Structure

Each module follows the same pattern:

<parent>
  <groupId>org.apache.tez</groupId>
  <artifactId>tez</artifactId>
  <version>0.10.x-SNAPSHOT</version>
</parent>

<artifactId>tez-dag</artifactId>
<name>Tez DAG</name>

<dependencies>
  <!-- Module-specific dependencies -->
</dependencies>

Modules declare their inter-dependencies explicitly. This is how Maven knows the build order.
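Deriving a build order from declared dependencies is a topological sort. The sketch below shows the idea in plain Java; it is not Maven's implementation, and the dependency edges in main are illustrative only (read the real ones from each module's pom.xml).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BuildOrderSketch {
    /** Returns the modules so that every module appears after the modules it depends on. */
    static List<String> buildOrder(Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String module : dependsOn.keySet()) {
            visit(module, dependsOn, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String module, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Dependency cycle at " + module);
        }
        for (String dep : dependsOn.getOrDefault(module, Collections.<String>emptyList())) {
            visit(dep, dependsOn, done, inProgress);
        }
        inProgress.remove(module);
        done.add(module);   // all dependencies are already in 'done'
    }

    public static void main(String[] args) {
        // Illustrative edges only, not copied from the real pom.xml files.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("tez-dag", Arrays.asList("tez-api", "tez-common", "tez-runtime-library"));
        deps.put("tez-runtime-library", Arrays.asList("tez-api", "tez-common"));
        deps.put("tez-common", Arrays.asList("tez-api"));
        deps.put("tez-api", Collections.<String>emptyList());
        System.out.println(buildOrder(deps));
    }
}
```

A cycle makes the sort impossible, which is why a circular dependency between modules is a build error rather than a warning.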

Exercise:

# What modules does tez-dag depend on?
grep -A3 "<dependency>" tez-dag/pom.xml | grep "tez-" | grep "artifactId"

# What does tez-runtime-library depend on?
grep -A3 "<dependency>" tez-runtime-library/pom.xml | grep "tez-" | grep "artifactId"

Finding Classes Quickly

By Name

find . -name "VertexImpl.java"
find . -name "Fetcher.java"
find . -name "TestDAGImpl.java"

By Content

# Find the class that defines TEZ_LOCAL_MODE
grep -rl "TEZ_LOCAL_MODE" --include="*.java" .

# Find all non-test classes that declare a state machine
grep -rl "StateMachineFactory" --include="*.java" . | grep -v test

In IntelliJ

  • Navigate to class: ⌘ O (macOS) — type class name, supports wildcards
  • Navigate to file: ⌘ ⇧ O — type file name
  • Find usages: ⌥ F7 — shows all places a class/method is used
  • Go to implementation: ⌘ ⌥ B — jumps from interface to implementation

After completing this lab, time yourself on each:

| Task | Target time |
|---|---|
| Find DAGImpl.java | < 10 seconds |
| Find TezConfiguration.TEZ_LOCAL_MODE declaration | < 20 seconds |
| Find all tests for VertexImpl | < 30 seconds |
| Identify which module handles shuffle fetch retry | < 60 seconds |
| Find the class that submits a DAG from client to AM | < 60 seconds |

If any take longer, repeat the exercises in this lab.


Expected Output

By end of this lab you should have notes documenting:

  1. The line count of VertexImpl.java and DAGImpl.java
  2. The number of state machine transitions in VertexImpl
  3. The names of all 4 example DAG classes
  4. The Hadoop version Tez builds against
  5. Which module handles shuffle (your own words, not copy-pasted)

Stretch Goals

  1. Generate the full module dependency graph:

    mvn dependency:tree -pl tez-dag -am | grep -E '[+\\]- org\.apache\.tez' | head -30
    
  2. Find all Protocol Buffer definition files (.proto):

    find . -name "*.proto" | sort
    

    For each, identify which module it belongs to and what messages it defines.

  3. Read tez-api/src/main/proto/DAGApiRecords.proto completely. Identify which messages correspond to Java classes you have already read.

Lab 2.2: Prepare a Patch Using Apache Practices

Background

A "patch" in Apache open-source culture means a unified diff file attached to a JIRA issue. This lab walks you through the complete workflow: finding a safe change to make, preparing the patch, verifying it, and writing the JIRA description.

This lab uses a real but trivial change as the vehicle — a Javadoc improvement in tez-api. Trivial changes are intentional: the goal is to master the workflow, not to write impressive code.


The Apache Git Patch Workflow

Apache Tez development keeps a linear history on master (some Apache projects call their main branch trunk; Tez uses master). The standard contributor workflow:

origin/master  (read-only for non-committers)
      |
      ↓ checkout
local/master
      |
      ↓ branch
local/TEZ-NNNN
      |
      ↓ make changes
      ↓ mvn test (pass)
      ↓ mvn checkstyle:check (pass)
      ↓ git diff origin/master > TEZ-NNNN.001.patch
      |
      → Attach to JIRA

You never push your branch to Apache. You generate a diff and attach it.


Step-by-Step Tasks

Step 1: Set Up Your Working Branch

cd /path/to/tez

# Always start from a clean, up-to-date master
git fetch origin
git checkout master
git merge origin/master

# Create a branch named after the JIRA issue you are working on
# Use TEZ-0000 as a placeholder for this lab
git checkout -b TEZ-0000-javadoc-tezvertex

Verify you are on the new branch:

git branch
# * TEZ-0000-javadoc-tezvertex
#   master

Step 2: Find a Target for Your Change

Open tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java.

Look for public methods that:

  • Have no Javadoc, or
  • Have a @param tag whose description is empty or a placeholder such as TODO, or
  • Have a @return tag missing from a non-void method

A useful starting point:

# Find methods with empty or missing Javadoc in tez-api
javadoc -private -sourcepath tez-api/src/main/java \
  org.apache.tez.dag.api 2>&1 | grep "no comment"

Or manually: open Vertex.java in IntelliJ, look at the addDataSink() method. If it lacks a @param description for dataSink, that is your target.

Step 3: Make the Change

Add or improve the Javadoc for the method you identified. Follow this format exactly:

/**
 * Adds a data sink to this vertex. The sink will receive the output
 * of this vertex after all tasks complete.
 *
 * @param outputName
 *          the name used to identify this sink in the DAG; must be unique
 *          within this vertex
 * @param dataSink
 *          the {@link DataSinkDescriptor} defining the sink type and
 *          configuration
 * @return this {@link Vertex} instance (for method chaining)
 * @throws IllegalStateException if the vertex has already been added to a {@link DAG}
 */
public Vertex addDataSink(String outputName, DataSinkDescriptor dataSink) {

Rules for Apache Javadoc style:

  • First sentence is a brief third-person description with no explicit subject ("Adds a…", not "This method adds a…")
  • Multi-line @param descriptions start on the line after the tag and are indented 10 spaces after the asterisk
  • Use {@link ClassName} for all class references
  • Use {@code value} for code literals and parameter names in prose

Step 4: Verify Compilation

mvn compile -pl tez-api -q

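Expected: the command prints nothing and exits with status 0. The -q flag suppresses everything except errors, including the BUILD SUCCESS banner.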

Step 5: Run Checkstyle

mvn checkstyle:check -pl tez-api

Expected: BUILD SUCCESS. If there are violations, fix them before continuing.

Common Javadoc-specific violations:

  • JavadocStyle: Javadoc comment does not end with a period
  • JavadocMethod: @param or @return tag is missing
  • JavadocVariable: public field missing Javadoc

Step 6: Run the Relevant Tests

mvn test -pl tez-api -q

Expected: exit status 0 with no error output (-q hides the BUILD SUCCESS banner). Even a pure Javadoc change requires a test run, because checkstyle runs as part of the test phase in some configurations.

Step 7: Generate the Patch

# Verify what you changed
git diff

# The diff should show only the lines you intentionally changed
# No whitespace changes, no unrelated files

# Generate the patch file
git diff origin/master > /tmp/TEZ-0000.001.patch

# Inspect it
cat /tmp/TEZ-0000.001.patch

The patch file should:

  • Start with diff --git a/tez-api/...
  • Show exactly the lines you added/removed (prefixed with +/-)
  • Contain no changes to files you did not intend to modify

If the patch is longer than expected, run git status to find unexpected changes and use git checkout -- <file> to revert them.
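One way to check the "no unintended files" rule mechanically is to list every file a patch touches. The sketch below parses the diff --git headers that git diff emits; `PatchFilesSketch` is a hypothetical helper, and it deliberately ignores edge cases such as paths that contain " b/" or quoted paths.

```java
import java.util.ArrayList;
import java.util.List;

final class PatchFilesSketch {
    private static final String HEADER = "diff --git a/";

    /** Returns the path of every file touched by a unified diff produced by git diff. */
    static List<String> touchedFiles(String patchText) {
        List<String> files = new ArrayList<>();
        for (String line : patchText.split("\n")) {
            if (line.startsWith(HEADER)) {
                // Header looks like: diff --git a/<path> b/<path>
                int end = line.indexOf(" b/", HEADER.length());
                if (end > 0) {
                    files.add(line.substring(HEADER.length(), end));
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String path = "tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java";
        String patch = String.join("\n",
            "diff --git a/" + path + " b/" + path,
            "--- a/" + path,
            "+++ b/" + path,
            "@@ -1,3 +1,4 @@",
            "+   * @param dataSink the sink descriptor");
        System.out.println(touchedFiles(patch));
    }
}
```

For the Javadoc change in this lab the list should contain exactly one entry, Vertex.java.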

Step 8: Write the JIRA Description

For the JIRA issue you would create for this patch, write:

Summary line format:

TEZ-0000. Improve Javadoc for Vertex.addDataSink()

Description format:

Problem:
The addDataSink() method in Vertex.java has no @param documentation for the
'dataSink' parameter. This makes it harder for new users to understand the
expected input without reading the implementation.

Fix:
Add complete @param, @return, and @throws Javadoc for addDataSink().

Testing:
mvn test -pl tez-api  (all existing tests pass)
mvn checkstyle:check -pl tez-api  (no violations)

Step 9: Review the Patch as a Committer Would

Before attaching a patch, ask yourself:

  1. Does the patch contain only the changes described in the JIRA description?
  2. Does it pass mvn test -pl <module> locally?
  3. Does it pass mvn checkstyle:check -pl <module>?
  4. Is the commit message format correct? (TEZ-NNNN. Short description.)
  5. Is there a clear explanation in the JIRA description of what was wrong and what was fixed?

If any answer is "no", fix it before uploading.
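Item 4 of the checklist can be checked mechanically. The pattern below is one assumed reading of the "TEZ-NNNN. Short description." convention (issue key, a period, a space, then text); `CommitSummaryCheck` is a hypothetical helper, not a project tool.

```java
import java.util.regex.Pattern;

final class CommitSummaryCheck {
    // Assumed reading of the convention: "TEZ-<digits>. <non-empty description>"
    private static final Pattern FORMAT = Pattern.compile("^TEZ-\\d+\\. \\S.*$");

    /** Returns true if the summary line follows the assumed format. */
    static boolean isWellFormed(String summary) {
        return summary != null && FORMAT.matcher(summary).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "TEZ-1234. Improve Javadoc for Vertex.addDataSink()",
            "TEZ-1234 Improve Javadoc",
            "fix"
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```

Only the first sample passes; the second lacks the period after the issue key and the third has no issue key at all.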


Common Mistakes

| Mistake | How to detect | Fix |
|---|---|---|
| Patch includes unrelated formatting changes | git diff shows hundreds of lines | git checkout -- <unintended-file> |
| Patch modifies generated code | Proto-generated files in the diff | Revert generated files; only change source |
| Patch applies only to a non-master branch | git diff origin/master shows no changes | Rebase your branch onto current master |
| Checkstyle violation in unchanged line | mvn checkstyle:check fails in a line you did not write | You must fix it anyway; it is in your patch |
| Test fails on unrelated module | Running all tests surfaces a pre-existing failure | Confirm by running on a clean checkout; note the existing failure in JIRA |

JIRA Status Workflow

After attaching your patch:

  1. Set the JIRA status to "Patch Available"
  2. Add a comment: "Patch attached. Tested with mvn test -pl tez-api and mvn checkstyle:check -pl tez-api, both pass."
  3. Wait for a committer to review — do not ping on the mailing list immediately
  4. If no response in 2 weeks, it is acceptable to send one polite reminder to dev@tez.apache.org:
    Subject: [REMINDER] TEZ-NNNN patch available for review
    
    Hi dev@,
    
    Friendly reminder that TEZ-NNNN has a patch attached. Any feedback welcome.
    https://issues.apache.org/jira/browse/TEZ-NNNN
    
    Thanks
    

Expected Output

At the end of this lab you have:

  1. A local branch TEZ-0000-javadoc-tezvertex with a Javadoc change
  2. A passing test run: mvn test -pl tez-api
  3. A passing checkstyle run: mvn checkstyle:check -pl tez-api
  4. A patch file at /tmp/TEZ-0000.001.patch with only the intended diff
  5. A written JIRA description (even if not submitted) in the format above

Stretch Goals

  1. Find a real Minor or Trivial open issue in Apache Tez JIRA that has been open for more than 6 months with no patch. Leave a JIRA comment expressing interest.

  2. Attempt the same patch workflow with a real issue:

    • Use git checkout -b TEZ-<real-number>-<short-description> for the branch name
    • Use the real JIRA number in the patch filename: TEZ-NNNN.001.patch
  3. Read three recently committed Tez patches by browsing JIRA issues with status "Resolved". For each, read the complete comment thread to understand the feedback cycle and how many patch revisions were required.

  4. Generate a git log view that shows only your branch's commits:

    git log origin/master..HEAD --oneline
    

    This is what a committer sees when reviewing your work.

Lab 2.3 — Fix It: NullPointerException in TezTaskAttemptID.fromString

Lab type: Fix-It — reproduce → locate → write failing test → patch → verify → format patch
Estimated time: 90–120 min
Tez component: tez-common, org.apache.tez.dag.records.TezTaskAttemptID


Background

TezTaskAttemptID is the primary key that links a task attempt to its vertex, DAG, and application. Its static fromString method parses a serialised ID like:

attempt_1609459200000_0001_1_00_000000_0

In the Tez codebase the parse path for certain malformed inputs has historically thrown an unguarded NullPointerException rather than a descriptive IllegalArgumentException. The usual culprits are a missing null check on the input and call sites that index into the result of String.split() or call Integer.parseInt() without first validating the number and format of the tokens.

This lab walks the complete Apache contribution workflow for such a bug:

  1. Reproduce the crash in a test
  2. Read the source to understand why it crashes
  3. Apply the minimal fix
  4. Verify tests pass and checkstyle is clean
  5. Produce a .patch file ready for JIRA upload

Step 1 — Locate the Source File

cd ~/tez-src         # your local Tez clone from Lab 1.1
find . -name "TezTaskAttemptID.java" | head -5

Expected path (the class lives in the org.apache.tez.dag.records package):

./tez-common/src/main/java/org/apache/tez/dag/records/TezTaskAttemptID.java

Open the file and read the fromString method in full.

Questions to answer before continuing

| # | Question |
|---|----------|
| 1 | What does fromString call first: TezDAGID.fromString, TezVertexID.fromString, or does it parse raw tokens? |
| 2 | What happens if the input string is null? Is there an explicit null guard? |
| 3 | What exception type does the method declare in its signature (throws clause)? |
| 4 | Find the split("_") call(s). If the split produces fewer parts than expected, what line would throw? |
| 5 | Is there a sibling method toString()? What is the canonical string format it produces? |

Step 2 — Find the Existing Tests

find . -name "TestTezTaskAttemptID.java" | head -5

Expected path:

./tez-common/src/test/java/org/apache/tez/common/TestTezTaskAttemptID.java

Open it.

Questions

| # | Question |
|---|----------|
| 1 | How many fromString test cases already exist? |
| 2 | Is there a test for a null input? |
| 3 | Is there a test for a string with too few underscore-separated parts? |
| 4 | What assertion style does the file use: JUnit 4 @Test(expected=...) or try/catch? |

Step 3 — Reproduce the Bug

Add the following two tests to TestTezTaskAttemptID.java inside the existing test class. Do not modify the tests once added: the goal is to make them pass, not to work around them.

@Test(expected = IllegalArgumentException.class)
public void testFromStringNullInput() {
    TezTaskAttemptID.fromString(null);
}

@Test(expected = IllegalArgumentException.class)
public void testFromStringTooFewParts() {
    // Fewer underscore-separated tokens than the format requires
    TezTaskAttemptID.fromString("attempt_1609459200000_0001_1");
}

Run the tests:

cd tez-common
mvn test -pl . -Dtest=TestTezTaskAttemptID -q 2>&1 | tail -30

Expected result: Both new tests FAIL (the method throws NullPointerException or ArrayIndexOutOfBoundsException, not IllegalArgumentException).

Record the exact exception and stack-trace line. You will need this for the JIRA description later.


Step 4 — Apply the Fix

Open TezTaskAttemptID.java and apply a minimal patch to the fromString method.

Rules for a minimal patch

  • Add a null-check at the very top of fromString; throw IllegalArgumentException with a clear message
  • Add a length-check on the parsed tokens before subscripting the array
  • Do not reformat unrelated lines (this produces noisy diffs that obscure the real change and are routinely rejected in review)
  • Do not change method signatures or visibility

Hint — guard pattern used elsewhere in the same class

Search the file for how other fromString variants guard their input:

grep -n "IllegalArgumentException" TezTaskAttemptID.java

Use the same pattern and message style.
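If the grep turns up nothing to copy, the general shape of the two guards looks like the sketch below. It is illustrative only: `GuardedIdParser`, the `EXPECTED_PARTS` constant, and the message wording are hypothetical and should be replaced by whatever convention the real class already uses.

```java
/** Hypothetical guarded parser; names and messages are illustrative, not taken from Tez. */
class GuardedIdParser {

    // Assumption: seven underscore-separated tokens, as in the example ID format.
    private static final int EXPECTED_PARTS = 7;

    static int attemptNumber(String id) {
        if (id == null) {
            throw new IllegalArgumentException("Attempt ID string must not be null");
        }
        String[] parts = id.split("_");
        if (parts.length != EXPECTED_PARTS) {
            throw new IllegalArgumentException("Malformed attempt ID '" + id + "': expected "
                + EXPECTED_PARTS + " underscore-separated parts, found " + parts.length);
        }
        try {
            return Integer.parseInt(parts[EXPECTED_PARTS - 1]);
        } catch (NumberFormatException e) {
            // NumberFormatException is already an IllegalArgumentException;
            // rethrowing adds the full ID to the message.
            throw new IllegalArgumentException("Malformed attempt ID '" + id + "'", e);
        }
    }

    /** Returns true if the input is rejected with IllegalArgumentException. */
    static boolean rejects(String id) {
        try {
            attemptNumber(id);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptNumber("attempt_1609459200000_0001_1_00_000000_0"));
        System.out.println(rejects(null));
        System.out.println(rejects("attempt_1609459200000_0001_1"));
    }
}
```

Both guards run before any array subscript, so the only exception type a caller can see for bad input is IllegalArgumentException.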


Step 5 — Verify the Fix

# Run from the repo root (Step 3 left you inside tez-common)
cd ~/tez-src

# All TezTaskAttemptID tests must pass
mvn test -pl tez-common -Dtest=TestTezTaskAttemptID -q

# Full tez-common test suite (regression guard)
mvn test -pl tez-common -q 2>&1 | tail -20

# Checkstyle must be clean
mvn checkstyle:check -pl tez-common -q 2>&1 | grep -E "ERROR|WARNING|violation" | head -20

All three commands must produce zero errors.


Step 6 — Understand the Checkstyle Rules

grep -A2 "LineLength" tez-common/src/main/checkstyle/tez-checkstyle.xml
grep -A2 "Javadoc" tez-common/src/main/checkstyle/tez-checkstyle.xml

Questions

| # | Question |
|---|----------|
| 1 | What is the maximum line length enforced? |
| 2 | Does the project require Javadoc on all public methods, or only some? |
| 3 | What import ordering rule is in effect: alphabetical, grouped, or none? |

Step 7 — Format the Patch File

Apache Tez uses the unified diff format. From the repo root:

cd ~/tez-src
git diff > /tmp/TEZ-XXXX.001.patch

Inspect the patch:

cat /tmp/TEZ-XXXX.001.patch

Checklist before uploading to JIRA

  • Patch header shows the correct file path relative to the repo root
  • Only TezTaskAttemptID.java and TestTezTaskAttemptID.java are modified
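  • No trailing whitespace on any added line (grep -nE '^\+.*[[:space:]]+$' /tmp/TEZ-XXXX.001.patch should print nothing; anchoring on the leading + skips blank context lines, which legitimately consist of a single space)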
  • Patch applies cleanly to a fresh checkout: git apply --check /tmp/TEZ-XXXX.001.patch
  • mvn test -pl tez-common still passes after git apply
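The trailing-whitespace item is easy to get wrong, because a unified diff represents a blank context line as a single space. A minimal sketch of the check, restricted to lines the patch adds, is shown below; `PatchWhitespaceCheck` is a hypothetical helper, not a tool shipped with Tez.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports added lines in a unified diff that end in whitespace. */
class PatchWhitespaceCheck {

    /** Returns the 1-based patch line numbers of added lines with trailing whitespace. */
    static List<Integer> offendingLines(List<String> patchLines) {
        List<Integer> offenders = new ArrayList<>();
        for (int i = 0; i < patchLines.size(); i++) {
            String line = patchLines.get(i);
            // "+++ b/path" is a file header, not an added line. Simplification:
            // an added line whose content itself starts with "++" is skipped too.
            boolean added = line.startsWith("+") && !line.startsWith("+++");
            if (added && line.length() > 1
                    && Character.isWhitespace(line.charAt(line.length() - 1))) {
                offenders.add(i + 1);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> patch = List.of(
            "--- a/Foo.java",
            "+++ b/Foo.java",
            "@@ -1,2 +1,3 @@",
            " ",                   // blank context line: a lone space is normal
            "+    int x = 1;  ",   // trailing spaces on an added line
            "+    int y = 2;");
        System.out.println(offendingLines(patch));
    }
}
```

Only the fifth line of the sample patch is reported; the blank context line is ignored.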

Step 8 — Write the JIRA Description

Draft a JIRA ticket description following the Apache Tez convention:

Summary: TezTaskAttemptID.fromString throws NPE/AIOOBE on malformed input
         instead of IllegalArgumentException

Description:
  TezTaskAttemptID.fromString does not validate its input before parsing.
  Passing null or a string with fewer than N underscore-separated parts
  causes an unhandled NullPointerException (null path) or
  ArrayIndexOutOfBoundsException (short-string path) instead of
  the expected IllegalArgumentException.

  Steps to reproduce:
    TezTaskAttemptID.fromString(null);
    → NullPointerException at TezTaskAttemptID.java:NN

    TezTaskAttemptID.fromString("attempt_1609459200000_0001_1");
    → ArrayIndexOutOfBoundsException at TezTaskAttemptID.java:NN

  Fix: add explicit null guard + array-length guard at the top of fromString.

Priority: Minor
Component: tez-common

Replace NN with the actual line numbers from your stack traces in Step 3.


Step 9 — Connect the Concepts

| Concept | Where to find it in the codebase |
|---------|----------------------------------|
| TezTaskAttemptID | tez-common/src/main/java/.../TezTaskAttemptID.java |
| TezID base class | Same package: TezID.java |
| All fromString sibling methods | TezDAGID, TezVertexID, TezTaskID (same package) |
| Checkstyle config | tez-common/src/main/checkstyle/tez-checkstyle.xml |
| Example past fix (similar pattern) | Search JIRA for TEZ- + IllegalArgumentException + fromString |

Reflection

  1. Why should library code throw IllegalArgumentException rather than letting a NullPointerException propagate?
  2. What does the Apache contribution guide say about test coverage for bug fixes?
    (Hint: CONTRIBUTING.md or the Apache Tez wiki — every bug fix must include a reproducing test.)
  3. How does the Tez fromString guard pattern compare to the one in Hadoop's TaskAttemptID.forName?
  4. Could this same class of bug exist in TezDAGID.fromString or TezVertexID.fromString?
    Check both files and note your findings.

Lab 2.4 — Review It: Spot the Flaws in TEZ-FAKE001.001.patch

Lab type: Review-It — read a synthetic patch, find every flaw, explain the impact, propose fixes
Estimated time: 60–90 min
Tez component: tez-dag, org.apache.tez.dag.app.dag.impl.TaskImpl


Context

You are a Tez committer reviewing a patch uploaded to JIRA. The contributor claims the patch fixes a race condition where TaskImpl.getCounters() returns null when called before any task attempt has completed.

Your job is to review the patch before it merges. There are exactly 5 intentional flaws hidden in the diff below. Find them all.


The Synthetic Patch

diff --git a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
index a1b2c3d..e4f5a6b 100644
--- a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
@@ -214,6 +214,8 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
 
+  import org.apache.tez.common.counters.TezCounters;
+
   public synchronized TezCounters getCounters() {
     TezCounters counters = null;
     if (successfulAttempt != null) {
@@ -221,7 +223,7 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
       counters = successfulAttempt.getCounters();
     } else {
       counters = attemptList.stream()
-          .filter(a -> a.getState() == TaskAttemptState.SUCCEEDED)
+          .filter(a -> a.getState() == TaskAttemptState.RUNNING)
           .findFirst()
           .map(TaskAttemptImpl::getCounters)
           .orElse(null);
@@ -231,6 +233,14 @@ public class TaskImpl implements Task, EventHandler<TaskEvent> {
     return counters;
   }
 
+  /**
+   * Returns the counter for this task, or a new empty TezCounters object
+   * if no counters are available yet.
+   *
+   * @return counters, never null
+   */
+  public synchronized TezCounters getCountersOrEmpty() {
+    TezCounters c = getCounters();
+    return c == null ? new TezCounters() : c;
+  }
+
diff --git a/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java b/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
index b7c8d9e..f0a1b2c 100644
--- a/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
+++ b/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
@@ -891,6 +891,18 @@ public class TestTaskImpl {
 
+  @Test
+  public void testGetCountersBeforeAnyAttempt() {
+    // No attempts started; counters should not be null
+    initTask();
+    TezCounters result = task.getCounters();
+    assertNotNull("getCounters() must not return null", result);
+  }
+
+  @Test
+  public void testGetCountersOrEmptyReturnsSameObjectEachTime() {
+    initTask();
+    TezCounters first  = task.getCountersOrEmpty();
+    TezCounters second = task.getCountersOrEmpty();
+    assertSame("Must return same instance", first, second);
+  }
+

Your Task

For each flaw you find, fill in the table:

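| # | File | Line / hunk | Flaw description | Why it matters | Suggested fix |
|---|------|-------------|------------------|----------------|---------------|
| 1 |      |             |                  |                |               |
| 2 |      |             |                  |                |               |
| 3 |      |             |                  |                |               |
| 4 |      |             |                  |                |               |
| 5 |      |             |                  |                |               |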

Guided Questions

Work through these questions one by one. Each one points at a different flaw.

Question 1 — Import placement

Look at where the import statement was added:

+  import org.apache.tez.common.counters.TezCounters;
+
   public synchronized TezCounters getCounters() {

  • Is this a valid location for a Java import declaration?
  • What would happen at compile time if this diff were applied as-is?
  • Where should imports go in a Java file?
  • Lookup: does TaskImpl.java already import TezCounters at the top?
    (grep "import.*TezCounters" tez-dag/src/main/java/.../TaskImpl.java)

What is the flaw?


Question 2 — The filter predicate

The patch changes the fallback stream filter from:

.filter(a -> a.getState() == TaskAttemptState.SUCCEEDED)

to:

.filter(a -> a.getState() == TaskAttemptState.RUNNING)

  • Re-read the JIRA description: the reporter says getCounters() returns null when called before any attempt has completed.
  • Does filtering for RUNNING attempts fix that?
  • What does it mean to read counters from a RUNNING attempt vs a SUCCEEDED one?
  • Are the counters of a still-running attempt considered final/reliable?

What is the flaw? What should the filter be?


Question 3 — The new test testGetCountersBeforeAnyAttempt

Read the test body carefully:

TezCounters result = task.getCounters();
assertNotNull("getCounters() must not return null", result);

  • The test asserts that getCounters() is not null when no attempt has started.
  • But the patch does not change getCounters() to return an empty object — it adds a separate method getCountersOrEmpty() for that.
  • When successfulAttempt is null and attemptList is empty, what does getCounters() actually return?
  • Will this test pass or fail against the patched code?

What is the flaw?


Question 4 — The new test testGetCountersOrEmptyReturnsSameObjectEachTime

assertSame("Must return same instance", first, second);

  • getCountersOrEmpty() is implemented as:
    return c == null ? new TezCounters() : c;
  • Each call creates a new TezCounters() when c is null.
  • Does the assertSame assertion match the implementation?
  • Is assertSame testing a documented contract, or is it over-specifying an implementation detail?
  • What assertion would actually verify the intended contract ("not null")?

What is the flaw?


Question 5 — The JIRA description says the fix is needed, but…

Re-read the patch one final time. The root cause (as stated in the JIRA) is that getCounters() can return null. The correct caller-safe fix for most Tez callers would be to make getCounters() itself never return null (return empty TezCounters as the contract).

Instead the patch adds getCountersOrEmpty() as a new method — but leaves the old getCounters() method returning null.

  • Every existing caller of getCounters() still gets null.
  • The Tez codebase uses getCounters() in aggregation loops that iterate counters: counters.incrAllCounters(taskCounters) — passing null there throws NPE.
  • How many callers of getCounters() exist in tez-dag?

    grep -rn "\.getCounters()" tez-dag/src/main/ | grep -v "//.*getCounters" | wc -l

  • Does the patch actually fix the original bug?

What is the flaw?


Answer Key (Read After You've Filled the Table)

| # | Flaw | Impact | Fix |
|---|------|--------|-----|
| 1 | import statement placed inside the class body (after the opening {) | Compile error: Java imports must precede the class declaration | Remove the import; TezCounters is already imported at the top of TaskImpl.java |
| 2 | Filter changed to RUNNING instead of keeping SUCCEEDED; a running attempt's counters are partial and unstable | Returns wrong or partial data; counter values change as the attempt progresses | Revert to the SUCCEEDED filter; the real fix is to handle the "no succeeded attempt yet" case separately (return null or empty) |
| 3 | testGetCountersBeforeAnyAttempt calls assertNotNull on getCounters(), which still returns null when no attempt has completed | Test fails on the patched code because the patch does not make getCounters() non-null | Test should call getCountersOrEmpty(), or the assertion should accept null and document the contract |
| 4 | assertSame requires the same object reference, but getCountersOrEmpty() creates a new TezCounters() each time getCounters() returns null | Test fails whenever no successful attempt exists | Use assertNotNull to verify the non-null contract; do not assert reference identity |
| 5 | The patch adds getCountersOrEmpty() but does not fix the root cause: getCounters() still returns null, so all existing callers are still broken | Downstream NPEs in counter aggregation loops are not fixed | Change getCounters() itself to return new TezCounters() instead of null, or add a null guard in every caller; document the chosen contract |
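The first option in the fix for flaw 5, making getCounters() itself never return null, can be sketched with stand-in types. `CountersSketch` and `TaskSketch` below are hypothetical and far simpler than TezCounters and TaskImpl; the sketch shows the contract only, not the real code.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for TezCounters; hypothetical and much simpler than the real class. */
class CountersSketch {
    final Map<String, Long> values = new HashMap<>();
}

/** Stand-in for TaskImpl showing a never-null getCounters() contract. */
class TaskSketch {
    private CountersSketch successfulAttemptCounters; // null until an attempt succeeds

    synchronized void attemptSucceeded(CountersSketch counters) {
        successfulAttemptCounters = counters;
    }

    /** Never returns null: yields a fresh empty object until an attempt has succeeded. */
    synchronized CountersSketch getCounters() {
        return successfulAttemptCounters != null
            ? successfulAttemptCounters
            : new CountersSketch();
    }

    public static void main(String[] args) {
        TaskSketch task = new TaskSketch();
        System.out.println(task.getCounters().values.isEmpty()); // safe before any attempt

        CountersSketch done = new CountersSketch();
        done.values.put("RECORDS_OUT", 42L);                     // illustrative counter name
        task.attemptSucceeded(done);
        System.out.println(task.getCounters().values);
    }
}
```

A test for this contract asserts non-null and emptiness before any attempt succeeds. It makes no claim about reference identity, which is exactly what flaw 4 got wrong.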

Reflection

  1. A patch that adds a new method instead of fixing the old one is sometimes called an "additive workaround." When is that acceptable? When is it wrong?

  2. The Apache Tez review process requires that every patch include a test that would have failed before the fix and passes after. Does this patch satisfy that requirement? Why or why not?

  3. If you were the committer, what feedback would you leave on JIRA? Write two or three sentences in the style of a real review comment (constructive, specific, pointing at the line).

  4. Look up a real Tez JIRA review thread (search issues.apache.org/jira for project = TEZ AND labels = patch-available AND resolution = Fixed). Find one comment where a committer asked for a test change. What did they say?

Level 3: Tez Architecture

This level gives you a working mental model of how all Tez components fit together. After completing it you will be able to trace any execution path — from API call to task output — through the code without getting lost. Architecture knowledge is what separates a contributor who fixes isolated bugs from one who can design improvements.


Learning Objectives

By the end of Level 3 you must be able to:

  1. Draw the Tez component topology from memory (Client → AM → RM → NM → Container)
  2. Trace a DAG.submit() call through four class boundaries to the first vertex start
  3. Explain the role of each of the four state machines and how they interact
  4. Describe what happens on each of the four communication channels between components
  5. Explain the Input-Processor-Output (IPO) model and how it relates to DAG edges
  6. Identify which Protocol Buffer message type carries a given piece of information

Component Topology

┌─────────────────────────────────────────────────────────────────────┐
│  Client JVM                                                         │
│  ┌─────────────┐                                                    │
│  │  TezClient  │──── submitDAG() ────────────────────────────────┐  │
│  └─────────────┘                                                 │  │
└──────────────────────────────────────────────────────────────────┼──┘
                                                                   │ DAGPlan (protobuf)
                                                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│  YARN ResourceManager                                               │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  ApplicationMaster container (DAGAppMaster)                  │   │
│  │                                                              │   │
│  │  ┌───────────┐  ┌────────────┐  ┌──────────┐  ┌─────────┐  │   │
│  │  │  DAGImpl  │→ │ VertexImpl │→ │ TaskImpl │→ │ TaskAttemptImpl│  │
│  │  └───────────┘  └────────────┘  └──────────┘  └─────────┘  │   │
│  │         │              │                                     │   │
│  │         └──── events ──┘                                    │   │
│  │                                                              │   │
│  │  ContainerLauncher ─── launches ──────────────────────────┐ │   │
│  └─────────────────────────────────────────────────────────┬─┘ │   │
└────────────────────────────────────────────────────────────┼───┼───┘
                                                             │   │
                                              container req  │   │ container
                                                             ▼   ▼
┌─────────────────────────────────────────────────────────────────────┐
│  YARN NodeManagers (one per worker node)                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  TezChild (task container JVM)                                │  │
│  │  ┌────────────────────────────────────────────────────────┐  │  │
│  │  │  LogicalIOProcessorRuntimeTask                         │  │  │
│  │  │   Input(s) ─── Processor ─── Output(s)                │  │  │
│  │  └────────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Communication Channels

| Channel | From → To | What travels |
|---------|-----------|--------------|
| Client → AM | TezClient via DAGClientAMProtocol (IPC) | DAGPlan protobuf, GetDAGStatusRequest |
| AM → RM | RMCommunicator → YARN RM | Container requests, heartbeats, AM completion |
| AM → NM | ContainerLauncher → YARN NM | Container launch context, env, classpath, command |
| AM ↔ Container | TaskCommunicatorManager via TezTaskUmbilicalProtocol (IPC) | Task assignment, task status, event routing |

The Four State Machines

Tez execution is modeled as four nested state machines. Each tracks a specific level of granularity and sends events to the others.

DAGImpl State Machine

| State | Description |
|-------|-------------|
| NEW | DAG created, not yet initialized |
| INITED | All vertices initialized, ready to start |
| RUNNING | At least one vertex is running |
| SUCCEEDED | All vertices succeeded |
| FAILED | At least one vertex failed (unrecoverable) |
| KILLED | AM received a kill request |
| ERROR | Internal AM error |

Key transition: on DAG_INIT the DAG moves NEW → INITED, after which DAGImpl posts a V_INIT VertexEvent to each vertex (see the Key Event Types table).

VertexImpl State Machine

The most complex state machine in Tez. It has ~30 states and 80+ transitions.

Core states (simplified):

| State | Description |
|-------|-------------|
| NEW | Vertex created, not yet initialized |
| INITIALIZING | Waiting for inputs and vertex managers to initialize |
| INITED | Ready to schedule tasks |
| RUNNING | At least one task is running |
| COMMITTING | All tasks done, running output committers |
| SUCCEEDED | All tasks succeeded, all outputs committed |
| FAILED | Unrecoverable failure |
| RECOVERING | AM restarted, recovering state from history |

The VertexImpl state machine is defined by the StateMachineFactory at the top of VertexImpl.java. Reading the factory definition gives you the complete transition table.
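To make the idea of a declarative transition table concrete, here is a minimal sketch. It is not Hadoop's StateMachineFactory: `VertexStateMachineSketch`, its reduced state set, and its event names are hypothetical simplifications, and only the (from, to, event) argument order is meant to resemble the real addTransition() calls.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, much-reduced transition table; Tez itself uses Hadoop's StateMachineFactory. */
class VertexStateMachineSketch {
    enum State { NEW, INITIALIZING, INITED, RUNNING, SUCCEEDED }

    // Simplified event names, not the real VertexEventType constants.
    enum Event { INIT, INIT_DONE, START, TASKS_DONE }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    VertexStateMachineSketch() {
        addTransition(State.NEW, State.INITIALIZING, Event.INIT);
        addTransition(State.INITIALIZING, State.INITED, Event.INIT_DONE);
        addTransition(State.INITED, State.RUNNING, Event.START);
        addTransition(State.RUNNING, State.SUCCEEDED, Event.TASKS_DONE);
    }

    private void addTransition(State from, State to, Event on) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if the current state has no transition for it. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        VertexStateMachineSketch sm = new VertexStateMachineSketch();
        for (Event e : Event.values()) {
            System.out.println(e + " -> " + sm.handle(e));
        }
    }
}
```

Because every legal move is a row in the table, the constructor alone tells you the complete behavior. That is why the factory declaration is the right place to start reading VertexImpl.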

TaskImpl State Machine

Each vertex has N tasks (parallelism = N). TaskImpl tracks one task across its attempts.

| State | Description |
|-------|-------------|
| NEW, SCHEDULED | Task created and placed in the scheduler queue |
| RUNNING | At least one attempt is running |
| SUCCEEDED | One attempt succeeded |
| FAILED | All attempts exhausted |
| KILLED | Task explicitly killed (e.g., pre-emption) |

TaskImpl manages the attempt retry logic: if attempt 1 fails, TaskImpl decides whether to launch attempt 2 based on the failure mode and retry count configuration.
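The shape of that decision can be sketched as follows. This is an assumption-laden illustration, not TaskImpl's code: `TaskRetrySketch`, the `maxFailedAttempts` parameter, and the rule that killed attempts do not count toward the limit are all hypothetical.

```java
/** Hypothetical retry policy; names and the counting rule are assumptions, not TaskImpl's code. */
class TaskRetrySketch {
    enum Decision { LAUNCH_NEW_ATTEMPT, FAIL_TASK }

    private final int maxFailedAttempts; // assumed to come from configuration
    private int failedAttempts;

    TaskRetrySketch(int maxFailedAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
    }

    /**
     * Called when an attempt ends unsuccessfully. Attempts that were killed
     * (for example pre-empted) are assumed not to count toward the limit.
     */
    Decision onAttemptEnded(boolean countsTowardLimit) {
        if (countsTowardLimit) {
            failedAttempts++;
        }
        return failedAttempts >= maxFailedAttempts
            ? Decision.FAIL_TASK
            : Decision.LAUNCH_NEW_ATTEMPT;
    }

    public static void main(String[] args) {
        TaskRetrySketch task = new TaskRetrySketch(2);
        System.out.println(task.onAttemptEnded(true));   // first real failure
        System.out.println(task.onAttemptEnded(false));  // pre-empted attempt
        System.out.println(task.onAttemptEnded(true));   // second real failure
    }
}
```

When reading the real transition code, look for where the failure mode is classified and where the configured limit is compared against the running count.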

TaskAttemptImpl State Machine

One actual container execution of a task.

| State | Description |
|-------|-------------|
| NEW | Attempt created, awaiting container assignment |
| ASSIGNED | Container assigned by the scheduler |
| RUNNING | Container launched, task code executing |
| SUCCESS_FINISHING_CONTAINER | Task reported success, container cleanup in progress |
| SUCCEEDED | Attempt completed successfully |
| FAILED | Attempt failed (may or may not trigger task retry) |
| KILLED | Attempt pre-empted or killed by AM |

Event System

State machine transitions are driven by events. The event bus (AsyncDispatcher) routes events from producers to the correct state machine.
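The queue-and-thread pattern behind such an event bus can be sketched in a few lines. `DispatcherSketch` below is a hypothetical simplification, not the real AsyncDispatcher: events are keyed by a plain string and there is exactly one handler per type.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical single-threaded event bus; a simplification, not the real AsyncDispatcher. */
class DispatcherSketch {
    static final class Event {
        final String type;
        final String payload;
        Event(String type, String payload) { this.type = type; this.payload = payload; }
    }

    private static final Event STOP = new Event("STOP", "");

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, Consumer<Event>> handlers = new ConcurrentHashMap<>();
    private final Thread worker = new Thread(this::drain, "dispatcher-sketch");

    void register(String type, Consumer<Event> handler) { handlers.put(type, handler); }

    void start() { worker.start(); }

    /** Returns immediately; the handler runs later on the dispatcher thread. */
    void post(Event event) { queue.add(event); }

    /** Enqueues a stop marker and waits until every earlier event has been handled. */
    void stop() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain() {
        try {
            for (Event e = queue.take(); e != STOP; e = queue.take()) {
                Consumer<Event> handler = handlers.get(e.type);
                if (handler != null) {
                    handler.accept(e);
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        DispatcherSketch dispatcher = new DispatcherSketch();
        dispatcher.register("TA_DONE", e -> System.out.println("handled " + e.payload));
        dispatcher.start();
        dispatcher.post(new Event("TA_DONE", "attempt 0"));
        System.out.println("post() has returned");
        dispatcher.stop();
    }
}
```

The two print statements in main may appear in either order, because post() only enqueues. Code that posts an event must never assume the resulting transition has already happened.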

Key Event Types

| Event Type | Producer | Consumer |
|------------|----------|----------|
| DAGEventType.DAG_INIT | DAGAppMaster | DAGImpl |
| VertexEventType.V_INIT | DAGImpl | VertexImpl |
| VertexEventType.V_START | DAGImpl | VertexImpl |
| TaskEventType.T_SCHEDULE | VertexImpl | TaskImpl |
| TaskAttemptEventType.TA_ASSIGNED | TaskScheduler | TaskAttemptImpl |
| TaskAttemptEventType.TA_DONE | TezTaskUmbilicalProtocol (container callback) | TaskAttemptImpl |
| VertexEventType.V_TASK_COMPLETED | TaskImpl | VertexImpl |
| DAGEventType.DAG_VERTEX_COMPLETED | VertexImpl | DAGImpl |

The event flow for a normal task success:

Container reports TA_DONE
  → TaskAttemptImpl: RUNNING → SUCCEEDED
  → sends T_ATTEMPT_SUCCEEDED to TaskImpl
    → TaskImpl: RUNNING → SUCCEEDED
    → sends V_TASK_COMPLETED to VertexImpl
      → VertexImpl checks: all tasks done?
        → if yes: sends DAG_VERTEX_COMPLETED to DAGImpl
          → DAGImpl checks: all vertices done?
            → if yes: DAG transitions to SUCCEEDED

Every state transition in this chain corresponds to a log line you will see in the AM logs.


Protocol Buffers

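Control-plane data that crosses process boundaries in Tez (plans, IPC requests, events, history records) is serialized with Protocol Buffers (built with a protobuf 3.x toolchain in newer versions).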

| Proto file | Location | Key messages |
|------------|----------|--------------|
| DAGApiRecords.proto | tez-api/src/main/proto/ | DAGPlan, VertexPlan, EdgePlan |
| DAGIo.proto | tez-api/src/main/proto/ | RootInputLeafOutputProto, EntityDescriptorProto |
| HistoryProtos.proto | tez-dag/src/main/proto/ | All timeline/history event types |
| Events.proto | tez-runtime-internals/src/main/proto/ | Task-level events (DataMovementEvent, etc.) |

The DAGPlan message is what TezClient sends to the AM. It contains the complete description of the DAG: vertices, edges, processor descriptors, I/O configurations, and edge properties. It is generated from the DAG API object.

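// Schematic only: the DAGPlan travels inside the request; the response carries just the DAG ID.
// Client side (TezClient):
String dagId = clientAMProtocol.submitDAG(submitDAGRequest).getDagId();
// AM side (IPC handler): the plan is read from the request and later converted to DAGImpl state
DAGPlan dagPlan = submitDAGRequest.getDAGPlan();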

Input-Processor-Output (IPO) Model

Each task runs a single AbstractProcessor. The processor has access to named Input and Output instances, which are determined by the edges in the DAG.

┌──────────────────────────────────────────────────────────────────┐
│  Task container                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  LogicalIOProcessorRuntimeTask                          │    │
│  │                                                         │    │
│  │  Inputs:                    Outputs:                    │    │
│  │  ┌──────────────────┐       ┌──────────────────────┐   │    │
│  │  │ OrderedGrouped   │       │ OrderedPartitioned   │   │    │
│  │  │ KVInput          │──┐ ┌──│ KVOutput             │   │    │
│  │  └──────────────────┘  │ │  └──────────────────────┘   │    │
│  │                        ▼ │                              │    │
│  │               ┌───────────────┐                         │    │
│  │               │  MyProcessor  │                         │    │
│  │               │  extends      │                         │    │
│  │               │  AbstractProcessor                      │    │
│  │               └───────────────┘                         │    │
│  └─────────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────┘
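In code, the model reduces to a processor that is handed its inputs and outputs keyed by name. The sketch below uses hypothetical, heavily simplified interfaces (`Input`, `Output`, `Processor` nested in `IpoSketch`); the real Tez interfaces have lifecycle methods and typed readers and writers that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified IPO shapes; the real Tez interfaces are richer. */
class IpoSketch {
    interface Input { List<String> read(); }
    interface Output { void write(String record); }
    interface Processor { void run(Map<String, Input> inputs, Map<String, Output> outputs); }

    /** Upper-cases every record from the input named "source" into the output named "sink". */
    static final class UpperCaseProcessor implements Processor {
        @Override
        public void run(Map<String, Input> inputs, Map<String, Output> outputs) {
            for (String record : inputs.get("source").read()) {
                outputs.get("sink").write(record.toUpperCase());
            }
        }
    }

    /** Wires one in-memory input and one in-memory output around the processor. */
    static List<String> runOnce(List<String> records) {
        List<String> collected = new ArrayList<>();
        Input source = () -> records;
        Output sink = collected::add;
        new UpperCaseProcessor().run(Map.of("source", source), Map.of("sink", sink));
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(List.of("a", "b")));
    }
}
```

The processor never learns where its records come from or go to. Swapping the edge type changes the Input and Output implementations, not the processor.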

Edge Property Types

The EdgeProperty in the DAG API determines what I/O classes are used between two vertices.

| DataMovementType | Meaning | Default I/O pair |
|------------------|---------|------------------|
| SCATTER_GATHER | Partitioned, sorted shuffle | OrderedPartitionedKVOutput / OrderedGroupedKVInput |
| BROADCAST | All output sent to all downstream tasks | UnorderedKVOutput / UnorderedKVInput |
| ONE_TO_ONE | Task i → Task i, no shuffle | UnorderedKVOutput / UnorderedKVInput |
| CUSTOM | User-defined routing | User-provided EdgeManagerPlugin |

SCATTER_GATHER corresponds to the classic MapReduce shuffle. BROADCAST is used for joins where one side is small enough to replicate to all tasks.

DataMovementEvent

When a task output is ready, it sends a DataMovementEvent through the umbilical to the AM. The AM routes it to the downstream tasks so their input knows which partition to fetch.

This event routing is the mechanism by which OrderedGroupedKVInput discovers where each upstream partition is located — it receives DataMovementEvents from the AM containing the shuffle server address and partition index.
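The routing decision itself depends on the edge's DataMovementType. A minimal sketch under stated assumptions follows: `EdgeRoutingSketch` is hypothetical, it assumes one output partition per destination task for SCATTER_GATHER, and it ignores CUSTOM edges, whose routing is supplied by the user's plugin.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical routing helper; it only illustrates the three built-in movement types. */
class EdgeRoutingSketch {
    enum DataMovementType { SCATTER_GATHER, BROADCAST, ONE_TO_ONE }

    /**
     * Returns the destination task indexes that should be told about output
     * partition {@code partition} produced by source task {@code sourceTask}.
     */
    static List<Integer> destinations(DataMovementType type, int sourceTask,
                                      int partition, int numDestinationTasks) {
        List<Integer> targets = new ArrayList<>();
        switch (type) {
            case SCATTER_GATHER:
                targets.add(partition);      // assumption: partition j is read by task j
                break;
            case BROADCAST:
                for (int t = 0; t < numDestinationTasks; t++) {
                    targets.add(t);          // every downstream task reads the output
                }
                break;
            case ONE_TO_ONE:
                targets.add(sourceTask);     // task i feeds task i
                break;
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(destinations(DataMovementType.SCATTER_GATHER, 3, 1, 4));
        System.out.println(destinations(DataMovementType.BROADCAST, 3, 0, 4));
        System.out.println(destinations(DataMovementType.ONE_TO_ONE, 2, 0, 4));
    }
}
```

In this simplified picture a scatter-gather source task emits one event per partition, and each event reaches exactly one downstream task.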


Required Reading

| # | Resource | What to extract |
|---|----------|-----------------|
| 1 | tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | The StateMachineFactory declaration; read all addTransition() calls |
| 2 | tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | The createDag() method: how DAGPlan becomes state machine objects |
| 3 | tez-api/src/main/proto/DAGApiRecords.proto | DAGPlan and VertexPlan message definitions |
| 4 | tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | The serviceStart() method: component initialization order |
| 5 | tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java | How inputs, processors, and outputs are initialized in a container |

Key Classes Quick Reference

| Class | Module | Role |
|-------|--------|------|
| DAGAppMaster | tez-dag | AM main class; manages all components; starts the event dispatcher |
| DAGImpl | tez-dag | DAG state machine; tracks vertex completion; manages history |
| VertexImpl | tez-dag | Vertex state machine; manages task scheduling; calls VertexManager |
| TaskImpl | tez-dag | Task state machine; manages attempt lifecycle and retry logic |
| TaskAttemptImpl | tez-dag | TaskAttempt state machine; coordinates container assignment |
| AsyncDispatcher | tez-dag (via Hadoop) | Event bus; routes events to state machines asynchronously |
| TezTaskUmbilicalProtocol | tez-runtime-internals | IPC interface between container and AM |
| TezChild | tez-runtime-internals | Container main class; receives task assignment; runs the task |
| LogicalIOProcessorRuntimeTask | tez-runtime-internals | In-container task runner; sets up IPO |
| TezClient | tez-api | Client API; creates TezSession; submits DAGs |

JIRA Categories for Level 3

Having read the architecture, you can now evaluate:

  • Architecture improvement JIRAs — proposals to change how components interact
  • State machine correctness bugs — transitions that lead to wrong states
  • Event routing issues — events that are lost or sent to wrong consumers
  • Container reuse improvements — how tasks are assigned to existing containers

You are still not ready to submit fixes for state machine bugs — those require Level 4. But you can now read these issues intelligently and leave informed comments.


Deliverables

  • Draw the component topology diagram from memory (no looking)
  • Trace TezClient.submitDAG() to VertexImpl V_START event through class names
  • Identify the state machines and their event types from code (not from this page)
  • Explain in your own words what DataMovementEvent does and why it exists
  • Lab 3.1 completed: DAG submission trace documented
  • Lab 3.2 completed: IPO abstraction walkthrough complete

Common Mistakes

| Mistake | Impact | Correct understanding |
|---------|--------|-----------------------|
| Thinking the AM runs tasks directly | Leads to wrong mental model of container lifecycle | Tasks run in separate JVMs (containers); AM only schedules and monitors |
| Confusing VertexImpl with Vertex (API) | Vertex is the builder; VertexImpl is the runtime state machine | They are in different modules (tez-api vs tez-dag) |
| Thinking AsyncDispatcher is synchronous | Events are queued; transitions happen on the dispatcher thread | Never assume a transition is immediate after an event is posted |
| Reading VertexImpl top-to-bottom | The class is 6000+ lines; reading linearly is unproductive | Start with the StateMachineFactory declaration, then follow individual transitions |

Lab 3.1: Trace a DAG Submission End-to-End

Background

A DAG goes from a Java object constructed with the API to running tasks in containers through a sequence of method calls, IPC calls, and event posts that spans six class boundaries and three JVMs. This lab asks you to trace that path precisely — class name, method name, and the data that crosses each boundary.

Being able to reconstruct this trace from code (not from documentation) is the skill. That means reading DAGAppMaster.java, DAGImpl.java, VertexImpl.java, and TezChild.java and following the chain yourself.


The Six Class Boundaries

[1] TezClient.submitDAG(dag)
         │
         │  DAGClientAMProtocol (IPC) — carries: SubmitDAGRequest{DAGPlan}
         ▼
[2] DAGClientHandler.submitDAG(request)   [in DAGAppMaster]
         │
         │  posts: DAGAppMasterEvent(NEW_DAG_SUBMITTED)
         ▼
[3] DAGAppMaster.handle(event)
         │
         │  calls createDag(dagPlan) → new DAGImpl(...)
         │  posts: DAGEvent(DAG_INIT)
         ▼
[4] DAGImpl.handle(DAGEvent{DAG_INIT})
         │
         │  InitTransition: initializes all VertexImpl objects
         │  posts: VertexEvent(V_INIT) for each vertex
         ▼
[5] VertexImpl.handle(VertexEvent{V_INIT})
         │
         │  InitTransition: sets up tasks, calls VertexManager
         │  posts: VertexEvent(V_START) when ready
         │  posts: TaskEvent(T_SCHEDULE) for each task
         ▼
[6] TaskImpl → TaskAttemptImpl → ContainerLauncher → NM
         │
         │  NM starts container JVM: TezChild.main()
         ▼
[Container JVM] TezChild receives task assignment via TezTaskUmbilicalProtocol
         │
         ▼
LogicalIOProcessorRuntimeTask.run()  — Processor.run() called

Step-by-Step Tasks

Step 1: Find the Entry Point in TezClient

Open tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java.

Find the submitDAG(DAG dag) method. Answer:

  1. What is the name of the IPC protocol interface used to communicate with the AM?
  2. What does TezClient do if it does not yet have an AM to talk to (session not started)?
  3. What method on the DAG object serializes it to a DAGPlan protobuf?
  4. What request object wraps the DAGPlan before it is sent over IPC?

# Find the IPC protocol interface
grep -n "Protocol" tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java | head -10

# Find DAGPlan construction
grep -n "DAGPlan\|createDag\|getPlan" tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java

Step 2: Find the AM-side IPC Handler

The AM exposes the DAGClientAMProtocol interface. The implementation is in DAGAppMaster.

# Find the implementation of submitDAG on the AM side
grep -rn "submitDAG" tez-dag/src/main/java/org/apache/tez/dag/app/ | grep -v test

Open the handler class. Answer:

  1. What is the exact class name that implements DAGClientAMProtocol?
  2. What event type does it post to the AsyncDispatcher after receiving the DAGPlan?
  3. Does the submitDAG call on the AM side block until the DAG completes, or does it return immediately?

Step 3: Trace DAGAppMaster Initialization

Open tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java.

Find the serviceStart() method. Read the component initialization order:

  1. List the components initialized in serviceStart() in order
  2. Find where AsyncDispatcher is created and started
  3. Find where the DAGEventDispatcher (the component that routes DAGEvents to DAGImpl) is registered

# Find component initialization
grep -n "addService\|serviceStart\|startService" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -20

Step 4: Read the DAGImpl Init Transition

Open tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java.

Find the StateMachineFactory definition. Locate the transition for DAGEventType.DAG_INIT.

The transition handler class is InitTransition. Find it in the same file.

Answer:

  1. What does InitTransition.transition() do with each vertex in the DAG?
  2. After initializing vertices, what event does DAGImpl post?
  3. Under what condition does the init transition immediately move to RUNNING vs waiting?

# Find the init transition
grep -n "InitTransition\|DAG_INIT" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | head -20

Step 5: Read the VertexImpl Init Transition

Open tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java.

Find the transition from NEW on event V_INIT (the one that moves the vertex into INITIALIZING). The handler is InitTransition (a different class from the one in DAGImpl).

Answer:

  1. What is the VertexManager and when is it invoked during initialization?
  2. How does VertexImpl know how many tasks to create (the parallelism)?
  3. What event does VertexImpl send to DAGImpl when initialization completes?

# Find vertex init transition
grep -n "V_INIT\|InitTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -20

Step 6: Trace the Container Launch

After tasks are scheduled, TaskAttemptImpl requests a container from the TaskScheduler. When a container is assigned, ContainerLauncher builds the launch context.

# Find the container launch command construction
grep -rn "containerLaunchContext\|getContainerLaunchContext\|vargs" \
  tez-dag/src/main/java/org/apache/tez/dag/app/launcher/ | grep -v test | head -10

Answer:

  1. What is the main class of the container JVM? (The class with main() that YARN launches)
  2. What information is passed to TezChild via system properties vs environment variables?
  3. How does TezChild know which task to run when it starts?

Step 7: Read TezChild.main()

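Open tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezChild.java (TezChild runs inside the task container, so it lives in the runtime module, not in tez-dag).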

Find the main() method and the run() loop.

Answer:

  1. What IPC interface does TezChild use to communicate with the AM?
  2. What does TezChild do when it receives a TaskSpec from the AM?
  3. What class is instantiated to actually run the processor?

# Find TezChild (search from the repo root; it is not under tez-dag)
find . -name "TezChild.java" -not -path "*/test/*"
wc -l $(find . -name "TezChild.java" -not -path "*/test/*")

Complete the Trace Table

Fill in this table by reading the code (not from this page or any other documentation):

| Step | Class | Method | Data / Event |
| --- | --- | --- | --- |
| 1 | TezClient | submitDAG() | Sends SubmitDAGRequest{DAGPlan} via IPC |
| 2 | ? | submitDAG() | Posts event ??? |
| 3 | DAGAppMaster | handle() | Creates DAGImpl, posts DAGEvent{DAG_INIT} |
| 4 | DAGImpl | InitTransition.transition() | Posts VertexEvent{V_INIT} for each vertex |
| 5 | VertexImpl | InitTransition.transition() | Posts TaskEvent{T_SCHEDULE} for each task |
| 6 | TaskAttemptImpl | ? | Requests container from RM via TaskScheduler |
| 7 | ContainerLauncher | ? | Launches container JVM with TezChild as main class |
| 8 | TezChild | run() | Receives task spec, starts processor |
| 9 | LogicalIOProcessorRuntimeTask | run() | Calls Processor.run() |

Fill in the ? cells from the actual code. Each cell should contain the real method name.


Expected Output

A completed trace table with all cells filled from code, not from documentation. Each answer should be verifiable by pointing to a specific line in a specific file.

Example format for your notes:

Step 2: DAGClientHandler.submitDAG()
  in: tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
  line: ~1234
  posts: DAGAppMasterEvent(NEW_DAG_SUBMITTED)

Stretch Goals

  1. Find the AsyncDispatcher queue size configuration. What happens if the queue fills up?

    grep -rn "AsyncDispatcher\|dispatcher.queue" \
      tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -10
    
  2. Find where the AM is told to exit when the DAG completes:

    grep -n "stop\|shutdown\|exit" \
      tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | grep -i "succeeded\|complete"
    
  3. Trace what happens to a TA_DONE event from TezChild back to DAGImpl:

    • TezChild calls a method on the umbilical
    • The AM receives it and posts a TaskAttemptEvent
    • TaskAttemptImpl transitions to SUCCEEDED
    • The chain continues up to DAGImpl

    Identify every class and event in this reverse chain.

Lab 3.2: Understand the IPO Abstraction

Background

Every task in Tez runs a Processor that reads from one or more Input objects and writes to one or more Output objects. This Input-Processor-Output (IPO) model is the fundamental abstraction for how data moves through a DAG. Edge properties in the API (EdgeProperty, DataMovementType) determine which I/O classes are instantiated in the container.

This lab walks through the IPO model from the API layer to the runtime, tracing how an ORDERED_PARTITIONED_KV_OUTPUT configuration becomes actual bytes in a shuffle buffer.


The IPO Interface Hierarchy

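tez-runtime-api (in tez-api module):
  AbstractLogicalInput
      └── implements LogicalInput (which extends Input)
  AbstractLogicalOutput
      └── implements LogicalOutput (which extends Output)
  AbstractLogicalIOProcessor
      └── implements LogicalIOProcessor (which extends Processor)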

tez-runtime-library (implementations):
  OrderedPartitionedKVOutput    extends AbstractLogicalOutput
  OrderedGroupedKVInput         extends AbstractLogicalInput
  UnorderedKVOutput             extends AbstractLogicalOutput
  UnorderedKVInput              extends AbstractLogicalInput
  UnorderedPartitionedKVOutput  extends AbstractLogicalOutput
  BroadcastKVInput              extends AbstractLogicalInput (alias for UnorderedKVInput)

The key interface chain:

AbstractLogicalOutput.initialize() → called by LogicalIOProcessorRuntimeTask
AbstractLogicalOutput.start()      → called when the processor is started
AbstractLogicalOutput.getWriter()  → returns KeyValueWriter for the processor to use
AbstractLogicalOutput.commit()     → called after processor.run() completes
AbstractLogicalOutput.close()      → cleanup
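The call order listed above can be modeled with a toy class that only records which method ran when. ToyOutput is a hypothetical stand-in, not a Tez class, and the guard inside getWriter() is an assumption about the kind of check to look for in the real implementations, not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a logical output; it records the order of lifecycle calls.
final class ToyOutput {
    final List<String> calls = new ArrayList<>();
    private boolean started;

    void initialize() { calls.add("initialize"); }

    void start() {
        started = true;
        calls.add("start");
    }

    // Assumed guard: a writer is only handed out after start().
    List<String> getWriter() {
        if (!started) {
            throw new IllegalStateException("getWriter() called before start()");
        }
        calls.add("getWriter");
        return new ArrayList<>();   // stands in for a KeyValueWriter
    }

    void commit() { calls.add("commit"); }

    void close() { calls.add("close"); }

    public static void main(String[] args) {
        ToyOutput output = new ToyOutput();
        output.initialize();              // framework sets the output up
        output.start();                   // output is started before use
        output.getWriter().add("k=v");    // processor writes through the writer
        output.commit();                  // after the processor's run() returns
        output.close();                   // cleanup
        System.out.println(output.calls);
    }
}
```

Keep this picture in mind for Lab 3.3, where removing a start() call is one of the Break It experiments.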

Step-by-Step Tasks

Step 1: Read the AbstractLogicalOutput Interface

Open tez-api/src/main/java/org/apache/tez/runtime/api/AbstractLogicalOutput.java.

Answer:

  1. What is the purpose of the initialize() method? What does it return?
  2. What is the difference between start() and initialize()? Why are they separate?
  3. What method does a Processor call to get a writer to write records?

Step 2: Trace OrderedPartitionedKVOutput.initialize()

Open tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java.

Find the initialize() method.

Answer:

  1. What configuration key controls the buffer size for sorting?
  2. What class is created in initialize() to handle the actual sort-and-spill?
  3. How is the Partitioner class determined at runtime?

# Find sort buffer configuration
grep -n "SORT_MB\|sortmb\|buffer" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java \
  | head -10

# Find the writer/sorter creation
grep -n "new.*Writer\|new.*Sorter\|ExternalSorter" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/output/OrderedPartitionedKVOutput.java

Step 3: Trace the Write Path

When a processor calls writer.write(key, value), the data goes:

KeyValueWriter.write(key, value)
  → ExternalSorter.collect(key, value, partition)
    → SpillThread triggers when buffer is full
      → IFile.Writer writes sorted partition to local disk
        → On close(): merges all spills into final output file
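Before reading the real sorter, it can help to see the collect, spill, and merge shape in miniature. The sketch below is a deliberately simplified model: ToySorter and Rec are hypothetical names, spills stay in memory instead of going to disk, and the final merge is a plain re-sort. None of it reflects the data structures or algorithm Tez actually uses, which the questions below ask you to find.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy sort-and-spill buffer (hypothetical; in-memory only).
final class ToySorter {
    static final class Rec {
        final int partition;
        final String key;
        final String value;

        Rec(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "p" + partition + ":" + key + "=" + value;
        }
    }

    // Records are ordered by partition first, then by key within a partition.
    private static final Comparator<Rec> ORDER =
        Comparator.<Rec>comparingInt(r -> r.partition).thenComparing(r -> r.key);

    private final int capacity;
    private final List<Rec> buffer = new ArrayList<>();
    final List<List<Rec>> spills = new ArrayList<>();

    ToySorter(int capacity) {
        this.capacity = capacity;
    }

    void collect(String key, String value, int partition) {
        buffer.add(new Rec(partition, key, value));
        if (buffer.size() >= capacity) {
            spill();                       // buffer full: sort it and set it aside
        }
    }

    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Rec> run = new ArrayList<>(buffer);
        run.sort(ORDER);
        spills.add(run);
        buffer.clear();
    }

    // On close: flush what is left, then combine all sorted runs into one output.
    List<Rec> close() {
        spill();
        List<Rec> merged = new ArrayList<>();
        for (List<Rec> run : spills) {
            merged.addAll(run);
        }
        merged.sort(ORDER);                // stand-in for a streaming merge
        return merged;
    }

    public static void main(String[] args) {
        ToySorter sorter = new ToySorter(2);
        sorter.collect("pear", "1", 1);
        sorter.collect("apple", "1", 0);   // second record triggers a spill
        sorter.collect("fig", "1", 0);
        System.out.println(sorter.close());
    }
}
```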

Find the ExternalSorter class:

find tez-runtime-library/src/main/java -name "ExternalSorter.java"

Answer:

  1. What data structure holds records before they are spilled?
  2. What algorithm is used to sort records in the buffer?
  3. How is the sort key computed for (K, V) pairs with a custom Comparator?

Step 4: Read OrderedGroupedKVInput.initialize()

Open tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java.

Find initialize().

Answer:

  1. What class handles the shuffle (fetching data from remote nodes)?
  2. How does the input know which upstream tasks it needs to fetch from?
  3. What event type does the input consume to discover shuffle locations?

grep -n "Shuffle\|ShuffleManager\|DataMovementEvent" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/input/OrderedGroupedKVInput.java \
  | head -15

Step 5: Trace the Read Path

When a processor calls keyValueReader.next(), the data flow is:

KeyValueReader.next()
  → MergedKeyValueIterator.next()    [merging multiple sorted partitions]
    → TezRawKeyValueIterator         [from TezMerger]
      → IFile.Reader reads from local merged file

But before the merge can happen, the shuffle must fetch data:

DataMovementEvent arrives (from AM, routed from upstream task)
  → ShuffleManager records: "partition P is at host H:port/path"
  → Fetcher.fetch() downloads the partition file
    → stores locally
  → When all partitions fetched: MergeManager merges them
    → final sorted output available for KeyValueReader
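The merge step at the end of this flow is a k-way merge of already-sorted runs. A minimal sketch of that technique, using a priority queue over the head of each run, is shown here. ToyMerge is a hypothetical class working on in-memory lists of strings; the real merger works on serialized key/value streams and may merge in several passes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Toy k-way merge of sorted runs (hypothetical; in-memory only).
final class ToyMerge {
    static List<String> merge(List<List<String>> sortedRuns) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
            (a, b) -> sortedRuns.get(a[0]).get(a[1])
                .compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int run = 0; run < sortedRuns.size(); run++) {
            if (!sortedRuns.get(run).isEmpty()) {
                heap.add(new int[] {run, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();                 // smallest remaining key
            List<String> run = sortedRuns.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1});   // advance within that run
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("apple", "pear"),
            Arrays.asList("banana", "fig"),
            Arrays.asList("cherry"));
        System.out.println(merge(runs));   // [apple, banana, cherry, fig, pear]
    }
}
```

Because each run is already sorted, the heap never holds more than one entry per run, which is why a merge can stream through inputs far larger than memory.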

Find the Fetcher class:

find tez-runtime-library/src/main/java -name "Fetcher.java"
wc -l $(find tez-runtime-library/src/main/java -name "Fetcher.java")

Answer:

  1. What HTTP endpoint does Fetcher call to retrieve partition data?
  2. What does Fetcher do if the HTTP request fails?
  3. How many simultaneous fetch connections does Fetcher allow by default?

Step 6: Understand DataMovementEvent Routing

The DataMovementEvent is what connects output to input. When a task completes its output:

  1. The ShuffleHandler (shuffle server) registers the output location
  2. The task sends a DataMovementEvent via the TezTaskUmbilicalProtocol to the AM
  3. The AM routes the event to the downstream tasks that need it
  4. The downstream input receives it and knows to fetch from that location
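As a mental model for step 3, consider the simplest scatter-gather rule: partition p of every source task goes to destination task p, and the destination identifies the sender by the source task's index. The sketch below encodes only that rule. ToyRouter and Route are hypothetical names, and the rule is a simplified assumption to compare against the AM-side routing code you find, not a description of it.

```java
// Toy scatter-gather routing rule (hypothetical names, simplified assumption).
final class ToyRouter {
    static final class Route {
        final int destTask;         // which downstream task receives the event
        final int destInputIndex;   // which of its physical inputs the data belongs to

        Route(int destTask, int destInputIndex) {
            this.destTask = destTask;
            this.destInputIndex = destInputIndex;
        }

        @Override
        public String toString() {
            return "destTask=" + destTask + ", inputIndex=" + destInputIndex;
        }
    }

    // Partition p of source task s is consumed by destination task p as its input s.
    static Route route(int sourceTask, int partition, int numDestTasks) {
        if (sourceTask < 0 || partition < 0 || partition >= numDestTasks) {
            throw new IllegalArgumentException("bad source task or partition");
        }
        return new Route(partition, sourceTask);
    }

    public static void main(String[] args) {
        int numDestTasks = 2;
        for (int source = 0; source < 3; source++) {
            for (int partition = 0; partition < numDestTasks; partition++) {
                System.out.println("source " + source + ", partition " + partition
                    + " -> " + route(source, partition, numDestTasks));
            }
        }
    }
}
```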

Find the DataMovementEvent class:

find . -name "DataMovementEvent.java" | grep -v test

Answer:

  1. What fields does DataMovementEvent carry?
  2. Why is the payload (userPayload) a byte array and not a typed field?
  3. How does the AM know which downstream tasks to route the event to?

# Find the AM-side routing logic
grep -rn "DataMovementEvent\|routeEvent" \
  tez-dag/src/main/java/org/apache/tez/dag/app/ --include="*.java" | grep -v test | head -15

Step 7: Edge Properties → I/O Classes

The EdgeProperty object in the API specifies which I/O classes to use. Trace how EdgeProperty becomes actual I/O class instantiation in the container.

Starting point:

# Find EdgeProperty
find tez-api/src/main/java -name "EdgeProperty.java"

Then trace:

# How does VertexImpl use EdgeProperty to configure I/O for a vertex?
grep -n "EdgeProperty\|getInputDescriptor\|getOutputDescriptor" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -15

Answer:

  1. What field in EdgeProperty specifies the Input class for the destination vertex?
  2. What field specifies the Output class for the source vertex?
  3. How is the class name passed to the container so it can instantiate the correct I/O class?

Build the IPO Map

For OrderedWordCount, fill in this table by reading the code:

| Edge | Source vertex | Dest vertex | Output class | Input class | DataMovementType |
| --- | --- | --- | --- | --- | --- |
| Tokenizer → SumReducer | Tokenizer | SumReducer | ? | ? | ? |
| SumReducer → Sorter | SumReducer | Sorter | ? | ? | ? |

Read tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java to fill in the ? cells.

grep -n "EdgeProperty\|KVOutput\|KVInput\|DataMovementType" \
  tez-examples/src/main/java/org/apache/tez/examples/OrderedWordCount.java

Expected Output

By end of this lab you have:

  1. The IPO map table for OrderedWordCount completed
  2. An answer for each step question (from code, not from documentation)
  3. Understanding of what DataMovementEvent carries and why it exists
  4. Knowledge of which configuration key controls sort buffer size

Stretch Goals

  1. Find the shuffle HTTP server that serves partition data to Fetcher:

    find . -name "ShuffleHandler.java" | grep -v test
    

    What HTTP framework does it use? What is the URL pattern for fetching a partition?

  2. Trace what happens when Fetcher receives corrupted data (a checksum mismatch). Does the task fail immediately? Or does it retry from a different source?

  3. Find the EdgeManagerPlugin interface and read its contract:

    find tez-api/src/main/java -name "EdgeManagerPlugin.java"
    

    What three methods must a custom edge manager implement, and what do they do? Why would you use a custom edge manager instead of SCATTER_GATHER?

  4. Look at IntersectExample.java in tez-examples. It uses BROADCAST for one edge. Explain why: what is the semantic meaning of broadcasting in a join operation?

Lab 3.3 — Build It: Multi-Input Union DAG

Lab type: Build It — real Maven project, compilable Java, run + break + fix cycle
Estimated time: 90–120 min
Maven module: book/projects/level-3-multi-input
Main class: org.apache.tez.learning.l3.MultiInputDAG


What You Will Build

EvenNumberSource(1) ──even-edge──┐
                                  ├─▶ MultiInputUnionProcessor(1) ──▶ UnionSinkProcessor(1)
OddNumberSource(1)  ──odd-edge───┘

Two source vertices emit separate streams of integers (even: 0,2,4,…,98 and odd: 1,3,5,…,99). A middle vertex receives both streams through two named input edges, unions them, and forwards everything to a terminal sink. The sink sums all values and publishes the result via a Tez counter.

Expected output: TotalSum=4950 PASS
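The expected total follows directly from the two ranges. A quick check, assuming each source emits 50 values as the ranges above imply (ExpectedSum is a hypothetical helper, not part of the project):

```java
import java.util.stream.IntStream;

// Recomputes the expected counter value from the two source ranges.
final class ExpectedSum {
    // 0, 2, 4, ..., 98
    static long evenSum() {
        return IntStream.range(0, 50).mapToLong(i -> 2L * i).sum();
    }

    // 1, 3, 5, ..., 99
    static long oddSum() {
        return IntStream.range(0, 50).mapToLong(i -> 2L * i + 1).sum();
    }

    public static void main(String[] args) {
        System.out.println("even=" + evenSum());               // even=2450
        System.out.println("odd=" + oddSum());                 // odd=2500
        System.out.println("total=" + (evenSum() + oddSum())); // total=4950
    }
}
```

If the DAG reports a different total, comparing it against 2450 and 2500 tells you at a glance whether one whole input was lost.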

This is the smallest possible Tez program with a multi-input vertex — the same structural pattern used by every Tez join, union, and co-group operation.


Step 1 — Set the Tez Version

Open book/projects/pom.xml and confirm <tez.version> matches your local build:

cd ~/tez-src         # your Tez clone from Lab 1.1
git log --oneline -1 | head -c 60
mvn help:evaluate -Dexpression=project.version -q -DforceStdout 2>/dev/null

If the version printed differs from what is in the POM, update <tez.version> before continuing.


Step 2 — Compile and Run the Unit Tests

cd /path/to/apache-tez/book/projects

# Compile and test the new module only
mvn -pl level-3-multi-input test

You should see:

Tests run: 10, Failures: 0, Errors: 0, Skipped: 0

Read every test in TestMultiInputProcessors.java before moving on.

Questions

| # | Question |
| --- | --- |
| 1 | testEvenAndOddRangesNoOverlapNoGap simulates both sources using a boolean[]. Why is this a more rigorous check than just verifying the counts? |
| 2 | testEdgeNameConstants tests string literals. What real bug would be caught if a developer renamed the constant but not the string in buildDAG()? |
| 3 | testExpectedSum hardcodes 4950L. Could you make this test fail by changing only EvenNumberSource.COUNT? What would change? |

Step 3 — Build the Fat JAR and Run the DAG

mvn -pl level-3-multi-input package -q

java -jar level-3-multi-input/target/level-3-multi-input-1.0-SNAPSHOT-jar-with-dependencies.jar

Expected final line:

[MultiInputDAG] TotalSum=4950  expected=4950  PASS

If you see FAIL, the counter value is wrong — note the actual value before proceeding to the debugging exercises.


Step 4 — Read Every Source File

Work through each file in src/main/java/org/apache/tez/learning/l3/.

EvenNumberSource.java

| # | Question |
| --- | --- |
| 1 | What Tez base class does it extend? What does that class provide? |
| 2 | Why does run() call output.start() before getWriter()? What happens if you skip it? (Break It experiment below) |
| 3 | The output is retrieved by getOutputs().values().iterator().next(). What would break if this vertex had two outputs? |
| 4 | Why are key and value declared once outside the loop rather than inside it? What allocation cost would the inner-loop placement cause? |

MultiInputUnionProcessor.java

| # | Question |
| --- | --- |
| 1 | Inputs are retrieved by string name: inputs.get(EVEN_EDGE). Where are these names assigned? Trace the call to setDestinationEdgeName in MultiInputDAG.java. |
| 2 | Both inputs are started before either reader is obtained. Could you start them one at a time (start even → read even → start odd → read odd)? What would happen? |
| 3 | After draining the even input, the odd input's reader is obtained separately. Is there a scenario where odd records arrive before all even records have been read? How does Tez's buffering handle this? |
| 4 | The processor forwards records unchanged (key=value=integer). What change to run() would be needed to emit only distinct values if both sources could produce duplicates? |

MultiInputDAG.java

| # | Question |
| --- | --- |
| 1 | Both evenEdge and oddEdge use edgeCfg.createDefaultEdgeProperty(). Could you use different edge configs for the two sources? When would that be necessary? |
| 2 | Edge.setDestinationEdgeName(...) names the edge as seen by the destination vertex. Does the source vertex also see this name? Check by reading the Edge API. |
| 3 | The DAG has 4 vertices. Draw the dependency graph. Which vertices can run in parallel? |
| 4 | waitForCompletion(EnumSet.of(StatusGetOpts.GET_COUNTERS)): what does GET_COUNTERS do? What would status.getDAGCounters() return if this option were omitted? |

Step 5 — Break It: Three Experiments

Perform each experiment, observe the failure, then revert before the next one.

Experiment A — Swap the edge names

In MultiInputDAG.buildDAG(), swap EVEN_EDGE and ODD_EDGE:

.setDestinationEdgeName(MultiInputUnionProcessor.ODD_EDGE)   // was EVEN_EDGE
// ...
.setDestinationEdgeName(MultiInputUnionProcessor.EVEN_EDGE)  // was ODD_EDGE

Rebuild and run.

  • Does the DAG succeed or fail?
  • Is the sum still 4950?
  • Why does swapping the names not cause a failure here, but would cause a failure in a join operation where the left and right inputs have different schemas?

Experiment B — Remove one start() call

In MultiInputUnionProcessor.run(), remove evenInput.start().

Rebuild and run.

  • What exception is thrown? On which line?
  • Search the Tez source for the method that throws this exception. What is the guard condition?

Experiment C — Make one source emit duplicates

In EvenNumberSource.run(), change int n = i * 2 to int n = 0 (every write uses key=0).

Rebuild and run.

  • What is the counter value now?
  • Is the DAG PASS or FAIL?
  • What does this reveal about how OrderedGroupedKVInput handles duplicate keys when the value type is IntWritable?

Step 6 — Implement a FilterUnionProcessor

Create a new file in the same package: FilterUnionProcessor.java

Specification:

  • Extends AbstractLogicalIOProcessor
  • Has the same two named inputs as MultiInputUnionProcessor
  • Accepts a threshold via UserPayload (key "threshold", default 50)
  • Reads from both inputs; only forwards values >= threshold
  • Increments counter UnionPipeline/FilteredCount for each record dropped

Wire it into the DAG as a replacement for MultiInputUnionProcessor:

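// Needs java.nio.ByteBuffer and java.nio.charset.StandardCharsets
Vertex filter = Vertex.create(
    "FilterUnion",
    ProcessorDescriptor.create(FilterUnionProcessor.class.getName())
        .setUserPayload(UserPayload.create(
            // Encode explicitly as UTF-8 so the payload does not depend on the platform charset
            ByteBuffer.wrap("threshold=50".getBytes(StandardCharsets.UTF_8)))),
    1);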
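On the processor side, the payload bytes have to be turned back into a number. One possible way to do that is sketched below using only the JDK. ThresholdPayload is a hypothetical helper, and the "key=value" text format with a fallback to 50 is taken from the specification above. How you obtain the ByteBuffer inside the processor is left to your implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Parses a "threshold=<int>" payload; falls back to the default on any problem.
final class ThresholdPayload {
    static final int DEFAULT_THRESHOLD = 50;

    static int parse(ByteBuffer payload) {
        if (payload == null || !payload.hasRemaining()) {
            return DEFAULT_THRESHOLD;
        }
        // duplicate() leaves the caller's buffer position untouched.
        String text = StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
        for (String pair : text.split(",")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("threshold")) {
                try {
                    return Integer.parseInt(kv[1].trim());
                } catch (NumberFormatException e) {
                    return DEFAULT_THRESHOLD;   // malformed value: use the default
                }
            }
        }
        return DEFAULT_THRESHOLD;               // key not present
    }

    public static void main(String[] args) {
        ByteBuffer payload =
            ByteBuffer.wrap("threshold=75".getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(payload));   // 75
        System.out.println(parse(null));      // 50
    }
}
```

Parsing once during initialization and caching the result keeps the per-record path in run() free of string handling.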

Expected result: With threshold=50, values 0–49 are dropped, values 50–99 are forwarded. Sum at sink = 50+51+…+99 = 3725. FilteredCount = 50.
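The two expected numbers can be recomputed for any threshold, which is useful when you test values other than 50. FilterExpectation is a hypothetical helper that assumes the union of both sources is exactly the integers 0 through 99.

```java
import java.util.stream.IntStream;

// Expected sink sum and dropped-record count for a given threshold.
final class FilterExpectation {
    // Sum of the values that pass the filter (value >= threshold).
    static long forwardedSum(int threshold) {
        return IntStream.range(0, 100).filter(v -> v >= threshold).asLongStream().sum();
    }

    // Number of values the filter drops (value < threshold).
    static long filteredCount(int threshold) {
        return IntStream.range(0, 100).filter(v -> v < threshold).count();
    }

    public static void main(String[] args) {
        System.out.println("sum=" + forwardedSum(50));      // sum=3725
        System.out.println("dropped=" + filteredCount(50)); // dropped=50
    }
}
```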


Step 7 — Tez Source Connection Table

For each class below, locate the corresponding source file in your Tez clone and record the path:

| Class used in this project | Tez source file (relative to repo root) |
| --- | --- |
| OrderedPartitionedKVOutput | |
| OrderedGroupedKVInput | |
| OrderedPartitionedKVEdgeConfig | |
| HashPartitioner | |
| Edge.setDestinationEdgeName | |

Step 8 — Connect to Real Tez Data Flows

Open tez-examples/src/main/java/org/apache/tez/examples/JoinDataGen.java or OrderedWordCount.java in the Tez source tree.

  1. Find a DAG in the examples that has more than 2 vertices.
  2. Draw its topology as an ASCII diagram.
  3. Identify which vertex is the "union-like" vertex (if any) that receives edges from multiple sources.
  4. Compare its processor class to MultiInputUnionProcessor: what is similar, what is different?

Step 9 — JIRA Research

Search issues.apache.org/jira for:

project = TEZ AND text ~ "multi-input" AND resolution = Fixed

Find one resolved issue involving multiple inputs to a single vertex.

  • What was the bug?
  • Which class was modified?
  • Was a test added? If so, what does it test?

Level 4: DAG State Machine Internals

This level takes you inside VertexImpl — the most complex class in the Tez codebase. You will read the full state machine, understand every major state and the conditions that drive transitions, learn how VertexManager plugs in to control scheduling, and understand how speculative execution works. After this level you are capable of diagnosing vertex-level failures from AM log output and writing patches to the state machine.


Learning Objectives

By the end of Level 4 you must be able to:

  1. Read a StateMachineFactory definition and produce a transition table from code
  2. Explain the full INITIALIZING → INITED → RUNNING → SUCCEEDED path with all preconditions
  3. Describe what VertexManager does and when it is invoked
  4. Explain the difference between ImmediateStartVertexManager and ShuffleVertexManager
  5. Describe the speculative execution trigger conditions and what it causes
  6. Trace a vertex failure from first task failure to DAGImpl receiving V_COMPLETED
  7. Explain what vertex groups are and why they exist

Reading a StateMachineFactory

Tez uses Hadoop's StateMachineFactory (from hadoop-common). The pattern:

private static final StateMachineFactory<VertexImpl, VertexState, VertexEventType, VertexEvent>
    stateMachineFactory =
        new StateMachineFactory<>(VertexState.NEW)

        // From NEW
        .addTransition(VertexState.NEW,
            VertexState.INITIALIZING,
            VertexEventType.V_INIT,
            new InitTransition())

        // From INITIALIZING
        .addTransition(VertexState.INITIALIZING,
            EnumSet.of(VertexState.INITED, VertexState.FAILED),
            VertexEventType.V_INIT_DONE,
            new InitedTransition())
        ...
        .installTopology();

Reading rules

  1. First argument — the source state (where we are now)
  2. Second argument — the destination state(s). If an EnumSet, the transition handler decides which destination to return.
  3. Third argument — the event type that triggers this transition
  4. Fourth argument — the SingleArcTransition or MultipleArcTransition handler

A SingleArcTransition always goes to the same destination state. Its transition() method returns void.

A MultipleArcTransition can go to different states. Its transition() method returns the next VertexState.

When you see an EnumSet as the second argument, look for a MultipleArcTransition implementation — the logic inside that class decides which state to move to.
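To make the single-arc versus multiple-arc distinction concrete, here is a toy table-driven state machine in plain Java. ToyStateMachine is a hypothetical class and does not use Hadoop's StateMachineFactory. The boolean passed to handle() stands in for whatever the real handler inspects to pick a destination.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Toy table-driven state machine (hypothetical; not Hadoop's StateMachineFactory).
final class ToyStateMachine {
    enum State { NEW, INITIALIZING, INITED, FAILED }
    enum Event { V_INIT, V_INIT_DONE }

    // (source state, event) -> handler that returns the destination state.
    private final Map<State, Map<Event, Function<Boolean, State>>> table =
        new EnumMap<State, Map<Event, Function<Boolean, State>>>(State.class);
    private State current = State.NEW;

    ToyStateMachine() {
        // Single arc: the destination is fixed, whatever the handler observes.
        add(State.NEW, Event.V_INIT, ok -> State.INITIALIZING);
        // Multiple arc: the handler chooses among the allowed destinations.
        add(State.INITIALIZING, Event.V_INIT_DONE, ok -> ok ? State.INITED : State.FAILED);
    }

    private void add(State from, Event on, Function<Boolean, State> handler) {
        table.computeIfAbsent(from,
            s -> new EnumMap<Event, Function<Boolean, State>>(Event.class)).put(on, handler);
    }

    State handle(Event event, boolean initSucceeded) {
        Map<Event, Function<Boolean, State>> arcs = table.get(current);
        if (arcs == null || !arcs.containsKey(event)) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        current = arcs.get(event).apply(initSucceeded);
        return current;
    }

    public static void main(String[] args) {
        ToyStateMachine vertex = new ToyStateMachine();
        System.out.println(vertex.handle(Event.V_INIT, true));       // INITIALIZING
        System.out.println(vertex.handle(Event.V_INIT_DONE, false)); // FAILED
    }
}
```

An event with no entry for the current state is rejected, which mirrors why an unexpected event in a real state machine surfaces as an invalid-transition error in the AM log.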

How to extract the full transition table

# List all addTransition calls in VertexImpl
grep -n "addTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
  | wc -l

# Print them all
grep -n "addTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

The Full Vertex State Machine

NEW → INITIALIZING (event: V_INIT)

Triggered by DAGImpl.InitTransition when the DAG is initializing.

Handler: InitTransition

What happens:

  • VertexImpl sets up inputsWithInitializers — inputs that require RootInputInitializer
  • Registers event handlers for root input initializer completion events
  • If there are no root input initializers, immediately posts V_INIT_DONE (transitions to INITED in the same logical step)

Precondition for INITIALIZING → INITED:

  • All RootInputInitializers have reported completion
  • VertexManager.initialize() has completed

INITED → RUNNING (event: V_START)

Triggered by DAGImpl when all source vertices of this vertex have started or when the vertex has no source edges (it is a root vertex).

Handler: StartTransition

What happens:

  • Calls vertexManager.onVertexStarted()
  • The VertexManager decides when to schedule tasks

Important: the V_START event does not directly schedule tasks. The VertexManager does, by calling back into the framework through VertexManagerPluginContext.scheduleTasks() (scheduleVertexTasks() in older releases).

RUNNING task completion handling

Each task completion (success or failure) generates a V_TASK_COMPLETED event.

The TaskCompletedTransition handler:

  • Increments the succeeded/failed task counter
  • Checks if all tasks are done → if yes, triggers V_COMPLETE_EVENT
  • Checks speculative execution conditions
  • Checks if failure count exceeds tolerable failures threshold

Key configuration: tez.vertex.failures.maxpercent (TezConfiguration.TEZ_VERTEX_FAILURES_MAXPERCENT; confirm the exact key in your checkout), the percentage of failed tasks a vertex tolerates before the entire vertex fails. Default: 0 (any failure fails the vertex). Setting it above 0 enables partial failure tolerance.
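The threshold check can be pictured as a simple percentage comparison. The sketch below is one plausible reading of the description above, not the code in VertexImpl. FailureTolerance and its parameter names are hypothetical.

```java
// Percentage-based failure tolerance check (hypothetical sketch).
final class FailureTolerance {
    // True when the share of failed tasks exceeds the tolerated percentage.
    static boolean shouldFailVertex(int failedTasks, int totalTasks, float maxFailuresPercent) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        // Cross-multiplied form of failed/total * 100 > maxFailuresPercent.
        return failedTasks * 100.0f > maxFailuresPercent * totalTasks;
    }

    public static void main(String[] args) {
        System.out.println(shouldFailVertex(1, 10, 0f));     // true: default tolerates nothing
        System.out.println(shouldFailVertex(10, 100, 10f));  // false: exactly at the limit
        System.out.println(shouldFailVertex(11, 100, 10f));  // true: over the limit
    }
}
```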

RUNNING → COMMITTING (all tasks succeeded)

Before a vertex is marked SUCCEEDED, its output committers run.

Handler: VertexCommitCallback

What happens:

  • OutputCommitter.commitOutput() is called for each output with a committer
  • Commit is atomic: either all outputs commit or the vertex fails
  • The AM must not fail between task completion and output commit (AM recovery handles this)

RUNNING → FAILED

Triggers:

  1. A task exceeds the failure threshold (V_TASK_COMPLETED with failure)
  2. A container dies without a task completion report
  3. VertexManager reports an error
  4. A downstream vertex fails and error propagation is configured

RECOVERING states

When the AM restarts (e.g., due to a node failure), VertexImpl enters RECOVERING states. Recovery replays the history events that the previous AM attempt persisted to its recovery log to reconstruct which tasks completed before the AM died, avoiding re-running already-succeeded tasks.

This is the most complex part of VertexImpl. Recovery bugs are a major category of contributor-fixable issues.


VertexManager

VertexManager is the plugin interface that controls task scheduling within a vertex. It sits between the AM framework and the actual task scheduler.

Interface (simplified)

public abstract class VertexManagerPlugin {
    // Called when vertex is initialized; plugin configures itself
    public abstract void initialize() throws Exception;

    // Called when V_START event fires; plugin decides when to schedule tasks
    public abstract void onVertexStarted(List<TaskAttemptIdentifier> completions)
        throws Exception;

    // Called each time a source vertex task completes
    // Plugin uses this to update scheduling decisions (for slow-start)
    public abstract void onSourceTaskCompleted(TaskAttemptIdentifier completedSrcTaskAttempt)
        throws Exception;

    // Called when a VertexManagerEvent sent by a task arrives
    // (e.g., output-size statistics that feed auto-parallelism)
    public abstract void onVertexManagerEventReceived(VertexManagerEvent vmEvent)
        throws Exception;
}

ImmediateStartVertexManager

The default for root vertices and vertices with no special scheduling requirements.

Behavior:

  • Schedules all tasks immediately when onVertexStarted() is called
  • Does not wait for any source task completion
  • Used by: Tokenizer vertex in OrderedWordCount

ShuffleVertexManager

Used for vertices that receive SCATTER_GATHER input from a source vertex.

Behavior:

  • Implements slow start: waits until a configurable fraction of source tasks have completed before scheduling downstream tasks
  • Configuration keys: tez.shuffle-vertex-manager.min-src-fraction (default 0.25) and tez.shuffle-vertex-manager.max-src-fraction (default 0.75)
  • Implements auto-parallelism: can reduce the downstream vertex's parallelism based on the actual size of shuffle data
  • When auto-parallelism reduces parallelism, it calls context.reconfigureVertex() which posts a V_PARALLELISM_UPDATED event to VertexImpl
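A simplified model of slow start: nothing is scheduled below the minimum fraction, everything is scheduled at or above the maximum, and the share grows linearly in between. The linear interpolation and the rounding are assumptions for illustration. SlowStart and its parameters are hypothetical names, so compare this against the real calculation when you read ShuffleVertexManager.

```java
// Simplified slow-start model (hypothetical; linear interpolation is an assumption).
final class SlowStart {
    static int tasksToSchedule(int completedSrcTasks, int totalSrcTasks, int pendingTasks,
                               float minFraction, float maxFraction) {
        float completed = totalSrcTasks == 0 ? 1.0f : (float) completedSrcTasks / totalSrcTasks;
        if (completed < minFraction) {
            return 0;                                   // too early: schedule nothing
        }
        float range = maxFraction - minFraction;
        float share = range > 0 ? (completed - minFraction) / range : 1.0f;
        share = Math.max(0f, Math.min(1f, share));      // clamp to [0, 1]
        return (int) (share * pendingTasks);            // round down
    }

    public static void main(String[] args) {
        // 10 downstream tasks pending, defaults 0.25 and 0.75
        System.out.println(tasksToSchedule(10, 100, 10, 0.25f, 0.75f));  // 0
        System.out.println(tasksToSchedule(50, 100, 10, 0.25f, 0.75f));  // 5
        System.out.println(tasksToSchedule(75, 100, 10, 0.25f, 0.75f));  // 10
    }
}
```

Setting both fractions to the same value turns the ramp into a single step, which is a handy configuration to try when you are isolating a slow-start bug.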

Why VertexManager Matters for Contributors

Auto-parallelism and slow-start bugs are a major category of Tez issues. The interaction between ShuffleVertexManager and VertexImpl involves:

  • Parallelism changes after task scheduling
  • Race conditions between task completion events and parallelism updates
  • Recovery of vertices that had parallelism changed before AM death

Speculative Execution

Speculative execution launches a duplicate task attempt when the original attempt is slow.

Trigger conditions

VertexImpl checks speculation conditions in TaskCompletedTransition and on a periodic timer:

  1. At least one task has completed (we have a baseline for "normal" task duration)
  2. The running attempt has been running longer than speculative_threshold * median_time
  3. The running attempt's progress is lower than expected for its elapsed time
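The three conditions combine into a single predicate. The sketch below is a hedged illustration: SpeculationCheck is a hypothetical class, the threshold is passed in as a plain multiplier, and "expected progress" is assumed to be elapsed time divided by the median duration, capped at 1. The real speculator may estimate this differently.

```java
// Speculation trigger predicate (hypothetical sketch of the three conditions).
final class SpeculationCheck {
    static boolean shouldSpeculate(int completedTasks, long medianMs, long elapsedMs,
                                   float progress, float threshold) {
        if (completedTasks < 1 || medianMs <= 0) {
            return false;                               // 1. no baseline yet
        }
        if (elapsedMs <= threshold * medianMs) {
            return false;                               // 2. not slow enough
        }
        // 3. assumed model: a normal attempt progresses linearly over the median duration
        float expectedProgress = Math.min(1.0f, (float) elapsedMs / medianMs);
        return progress < expectedProgress;
    }

    public static void main(String[] args) {
        System.out.println(shouldSpeculate(0, 1000, 5000, 0.1f, 1.5f));  // false
        System.out.println(shouldSpeculate(5, 1000, 1000, 0.1f, 1.5f));  // false
        System.out.println(shouldSpeculate(5, 1000, 2000, 0.5f, 1.5f));  // true
    }
}
```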

Configuration:

tez.am.speculation.enabled = true   (default: false)
tez.am.speculation.interval-ms = 5000  (check interval)

What happens

  1. VertexImpl posts a TaskEventType.T_ADD_SPEC_ATTEMPT event to TaskImpl
  2. TaskImpl creates a new TaskAttemptImpl
  3. Both attempts run concurrently
  4. The first to succeed wins; the other is killed
  5. The winning attempt's output is committed; the losing attempt's output is discarded

Interaction with ShuffleVertexManager

If a speculative attempt completes, ShuffleVertexManager receives an onSourceTaskCompleted callback for the winning attempt. It must de-duplicate: the task's output should only be counted once regardless of which attempt succeeded.
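De-duplication here means keying on the task, not the attempt. A minimal sketch, assuming completions are identified by a task index and an attempt number (CompletionTracker is a hypothetical class; the real manager works with TaskAttemptIdentifier objects):

```java
import java.util.HashSet;
import java.util.Set;

// Counts each source task once, no matter which attempt reports success.
final class CompletionTracker {
    private final Set<Integer> completedTasks = new HashSet<>();

    // Returns true only the first time a given task index is reported.
    boolean onSourceTaskCompleted(int taskIndex, int attemptNumber) {
        // attemptNumber is deliberately ignored: attempts of one task are equivalent here
        return completedTasks.add(taskIndex);
    }

    int completedCount() {
        return completedTasks.size();
    }

    public static void main(String[] args) {
        CompletionTracker tracker = new CompletionTracker();
        System.out.println(tracker.onSourceTaskCompleted(7, 0));  // true
        System.out.println(tracker.onSourceTaskCompleted(7, 1));  // false (duplicate task)
        System.out.println(tracker.completedCount());             // 1
    }
}
```

Counting attempts instead of tasks would inflate the completed fraction and start downstream tasks too early, which is the kind of bug to look for in this area.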


Vertex Groups

Vertex groups (VertexGroup in the API) allow multiple vertices to be treated as a single logical vertex for downstream consumption.

Use case: merging the output of multiple Map vertices before a single Reduce vertex, without an intermediate shuffle. This is used in the Hive UnionAll operator implementation.

Key classes:

  • VertexGroup API: tez-api/src/main/java/org/apache/tez/dag/api/VertexGroup.java
  • GroupInputEdge: an edge from a VertexGroup to a regular vertex
  • The downstream vertex sees a single MergedLogicalInput that combines all group members

Key Classes for This Level

| Class | Path | Focus |
| --- | --- | --- |
| VertexImpl | tez-dag/.../dag/impl/VertexImpl.java | The entire state machine; 6000+ lines |
| ShuffleVertexManager | tez-runtime-library/.../vertexmanager/ShuffleVertexManager.java | Slow-start and auto-parallelism |
| ImmediateStartVertexManager | tez-dag/.../dag/impl/ImmediateStartVertexManager.java | Simple baseline |
| VertexManagerPlugin | tez-api/.../VertexManagerPlugin.java | The interface |
| VertexManagerPluginContext | tez-api/.../VertexManagerPluginContext.java | What the plugin can call back into |
| TaskImpl | tez-dag/.../dag/impl/TaskImpl.java | Manages attempt lifecycle |

# Find the VertexManager implementations (they are split across two modules)
find tez-dag/src/main/java tez-runtime-library/src/main/java \
  -name "*VertexManager*.java" | grep -v test

JIRA Categories for Level 4 Contributors

You are now ready to investigate and submit patches for:

  • Vertex failure handling bugs — incorrect state transitions, wrong error messages
  • VertexManager logic bugs — slow-start fraction calculation, auto-parallelism edge cases
  • Recovery bugs — vertices that fail to recover correctly after AM restart
  • Speculation bugs — duplicate completions, wrong trigger conditions
  • Test improvements: TestVertexImpl has hundreds of tests; adding coverage for edge cases

Approach:

  1. Find a TestVertexImpl test that is annotated with @Ignore and read the comment explaining why
  2. If the bug is fixed, the @Ignore can be removed (a trivial but real contribution)
  3. Or find a state machine transition that has no test coverage (grep for the transition, then grep for the handler class name in test files)

Deliverables

  • Extract the complete VertexImpl state transition table (all source states, event types, destination states) from the code
  • Explain ShuffleVertexManager slow-start in your own words, with the relevant config keys
  • Trace a vertex failure through TaskImpl → VertexImpl → DAGImpl using event type names
  • Identify one test annotated with @Ignore in TestVertexImpl and read why it is ignored
  • Lab 4.1 completed: full state machine map documented
  • Lab 4.2 completed: VertexManager walkthrough complete

Common Mistakes

| Mistake | Impact | Correct understanding |
| --- | --- | --- |
| Assuming V_START schedules tasks | Code changes that bypass VertexManager break auto-parallelism | V_START calls VertexManager.onVertexStarted(); the manager schedules |
| Ignoring RECOVERING states | Patches that forget about recovery cause AM restart failures | Every new state or transition must handle the RECOVERING_* path |
| Confusing TaskImpl failure handling with VertexImpl | Retry logic is in TaskImpl; failure threshold is in VertexImpl | Read both classes before touching failure handling code |
| Reading VertexImpl in isolation | Many transitions involve callbacks to DAGImpl | Always trace events both ways: into the state machine AND back out |

Lab 4.1: Read the VertexImpl State Machine

Background

VertexImpl.java is the most complex class in Apache Tez. It is approximately 6,000 lines long and contains the complete state machine for vertex execution, including initialization, scheduling, task completion handling, failure handling, speculative execution, and AM recovery. Reading it systematically — rather than linearly — is the skill this lab builds.

The output of this lab is a complete state transition table that you have produced from the source code, without reference to any external documentation.


How to Read a Large State Machine Class

Do not read VertexImpl.java from top to bottom. Instead:

  1. Start with the StateMachineFactory declaration (search for stateMachineFactory =)
  2. Extract all addTransition calls — this gives you the complete transition table
  3. For each transition, find the handler class — the inner class that implements SingleArcTransition or MultipleArcTransition
  4. Read each handler's transition() method — this is the actual state machine logic
  5. Trace inter-state-machine events — where does the handler post events to other state machines?
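
The table-driven pattern behind these steps can be pictured with a toy model. The sketch below is not Hadoop's StateMachineFactory: the `ToyMachine` class, its enums, and its `addTransition` method are hypothetical stand-ins that only mirror the shape, a table keyed by (source state, event) whose value is a handler returning the destination state.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a table-driven state machine. Every name here is hypothetical;
// the real VertexImpl uses Hadoop's StateMachineFactory with its own types.
class ToyMachine {
    enum State { NEW, INITIALIZING, RUNNING, SUCCEEDED, FAILED }
    enum Event { INIT, START, TASK_DONE, TASK_FAILED }

    // (source state, event) -> handler that picks the destination state
    private final Map<State, Map<Event, Function<ToyMachine, State>>> table =
        new EnumMap<>(State.class);
    private State current = State.NEW;
    int remainingTasks;

    ToyMachine(int tasks) {
        this.remainingTasks = tasks;
        // Single-arc transitions: the destination is fixed.
        addTransition(State.NEW, Event.INIT, m -> State.INITIALIZING);
        addTransition(State.INITIALIZING, Event.START, m -> State.RUNNING);
        // Multiple-arc transition: the handler chooses among destinations.
        addTransition(State.RUNNING, Event.TASK_DONE,
            m -> --m.remainingTasks == 0 ? State.SUCCEEDED : State.RUNNING);
        addTransition(State.RUNNING, Event.TASK_FAILED, m -> State.FAILED);
    }

    private void addTransition(State from, Event on, Function<ToyMachine, State> handler) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, handler);
    }

    // Looks up the handler for (current, event), runs it, and moves to its result.
    State handle(Event event) {
        Map<Event, Function<ToyMachine, State>> arcs = table.get(current);
        if (arcs == null || !arcs.containsKey(event)) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        current = arcs.get(event).apply(this);
        return current;
    }

    public static void main(String[] args) {
        ToyMachine m = new ToyMachine(2);
        m.handle(Event.INIT);
        m.handle(Event.START);
        System.out.println(m.handle(Event.TASK_DONE)); // RUNNING
        System.out.println(m.handle(Event.TASK_DONE)); // SUCCEEDED
    }
}
```

Reading the constructor of this toy is the same activity as reading the factory declaration: every `addTransition` line is one row of the table, and the lambda is where the logic lives.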

Step-by-Step Tasks

Step 1: Find the StateMachineFactory

grep -n "stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -5

Note the line number. The factory declaration starts there and continues for hundreds of lines. Read the entire factory definition — do not skip any transitions.

Step 2: Count All States and Transitions

# List distinct source states referenced in addTransition (append "| wc -l" to count them)
grep "addTransition(VertexState\." \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
  | sed 's/.*addTransition(VertexState\.\([A-Z_]*\).*/\1/' \
  | sort -u

# Count total transitions
grep -c "addTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

Record your numbers. Exact counts vary by Tez version: the VertexState enum has on the order of ten values, so expect about that many distinct source states and many more transitions than states.

Step 3: Build the Transition Table

For each line in the StateMachineFactory, extract:

  • Source state
  • Event type
  • Destination state(s)
  • Handler class name

Begin with the transitions from NEW:

# Find all transitions FROM NEW (destination, event, and handler follow on the next lines)
grep -A4 "addTransition(VertexState\.NEW" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
  | head -20

Then from INITIALIZING:

grep -A4 "addTransition(VertexState\.INITIALIZING" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -40

Build a table with columns: Source State | Event | Destination | Handler.
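
A small parser can produce a first draft of that table for you to verify by hand. The sketch below assumes each transition is written as `addTransition(VertexState.X, VertexState.Y, VertexEventType.Z, new Handler(...))`, possibly with line breaks between tokens. The `TransitionExtractor` class is hypothetical, and the regex covers only this single-destination form; calls that pass an `EnumSet` of destinations are skipped and need manual handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns single-destination addTransition(...) calls into table rows.
class TransitionExtractor {
    // Argument order assumed: source state, destination state, event type, handler.
    // \s* tolerates the line breaks that are common inside a long factory definition.
    private static final Pattern ARC = Pattern.compile(
        "addTransition\\s*\\(\\s*VertexState\\.(\\w+)\\s*,\\s*VertexState\\.(\\w+)\\s*,"
        + "\\s*VertexEventType\\.(\\w+)\\s*,\\s*new\\s+(\\w+)\\s*\\(");

    static List<String> rows(String source) {
        List<String> rows = new ArrayList<>();
        Matcher m = ARC.matcher(source);
        while (m.find()) {
            // Column order: Source State | Event | Destination | Handler
            rows.add(String.format("| %s | %s | %s | %s |",
                m.group(1), m.group(3), m.group(2), m.group(4)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample =
            ".addTransition\n    (VertexState.INITED, VertexState.RUNNING,\n"
            + "        VertexEventType.V_START, new StartTransition())";
        rows(sample).forEach(System.out::println);
        // | INITED | V_START | RUNNING | StartTransition |
    }
}
```

To use it on the real file, read VertexImpl.java into a string and pass it to `rows`. Compare the number of rows with your transition count from Step 2; the difference is the set of multi-destination transitions you still have to add manually.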

Step 4: Trace the Happy Path

The happy path for a vertex with no source edges (a root vertex, e.g., Tokenizer):

NEW
  V_INIT → INITIALIZING (InitTransition)
    V_INIT_DONE → INITED (InitedTransition — if no root input initializers)
  V_START → RUNNING (StartTransition)
    [VertexManager schedules tasks]
    [All tasks complete successfully]
    V_TASK_COMPLETED (final task) → COMMITTING (TaskCompletedTransition)
    V_COMMIT_COMPLETED → SUCCEEDED (CommitCompletedTransition)

For each transition in the happy path, find the handler class and answer:

InitTransition.transition():

grep -n "class InitTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

  • What does InitTransition do when there are no RootInputInitializers?
  • Does it immediately post V_INIT_DONE, or is there an intermediate step?

InitedTransition.transition() (or whatever class handles V_INIT_DONE):

  • When does INITIALIZING go to INITED vs going directly to RUNNING?
  • What is the condition that allows immediate transition to RUNNING?

StartTransition.transition():

  • What method on VertexManager is called here?
  • Does this method block or is it asynchronous?

TaskCompletedTransition.transition():

  • How does it track whether all tasks have completed?
  • What is numSuccessSourceAttemptCompletions?
  • At what point does it decide the vertex can move to COMMITTING?

Step 5: Trace the Failure Path

A task fails. The event chain:

TaskAttemptImpl: RUNNING → FAILED (sends T_ATTEMPT_FAILED to TaskImpl)
  TaskImpl: RUNNING → FAILED (if retry limit exceeded; sends V_TASK_COMPLETED{FAILED})
    VertexImpl: RUNNING → ?

Find the handler for V_TASK_COMPLETED when the task is FAILED:

# TaskCompletedTransition handles both success and failure
grep -n "TaskCompletedTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -10

Answer:

  1. What field tracks the number of failed tasks?
  2. What is the condition that causes the vertex to transition to FAILED?
  3. What event does VertexImpl send to DAGImpl when it fails?
  4. Does DAGImpl fail immediately when a vertex fails, or does it try to continue?

# Find how DAGImpl handles vertex failure
grep -n "DAG_VERTEX_COMPLETED\|vertexFailed\|VERTEX_FAILED" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | head -15

Step 6: Find the RECOVERING States

grep "RECOVERING" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
  | grep "VertexState\." | head -20

Answer:

  1. How many RECOVERING_* states exist?
  2. What event exits the RECOVERING state?
  3. What class handles recovery completion?

Step 7: Find All @Ignored Tests in TestVertexImpl

grep -n "@Ignore" \
  tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java

For each @Ignored test:

  1. Read the comment explaining why it is ignored
  2. Determine if the bug has been fixed (search JIRA for the referenced issue number)
  3. If the fix exists, the test can likely be re-enabled — this is a contributor opportunity

Step 8: Find a Transition with No Test Coverage

Pick three transition handler classes from your transition table. For each, check if TestVertexImpl has a test that exercises that handler:

# Example: does TestVertexImpl test TaskCompletedTransition?
grep -n "TaskCompletedTransition\|taskCompletedTransition" \
  tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java | head -5

# If none found, search for tests that trigger V_TASK_COMPLETED
grep -n "V_TASK_COMPLETED\|VertexEventType.V_TASK_COMPLETED" \
  tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java | head -10

Identify one transition that appears to have insufficient test coverage and document it. This is a potential Test JIRA issue you could file and fix.


Deliverable: Your Transition Table

Produce a table in this format (populate all rows from code):

| Source State      | Event Type          | Destination       | Handler Class            |
|---|---|---|---|
| NEW               | V_INIT              | INITIALIZING      | InitTransition           |
| INITIALIZING      | V_INIT_DONE         | INITED / FAILED   | InitedTransition         |
| INITED            | V_START             | RUNNING           | StartTransition          |
| RUNNING           | V_TASK_COMPLETED    | RUNNING/SUCCEEDED | TaskCompletedTransition  |
| ...               | ...                 | ...               | ...                      |

Your table should have at least 30 rows (covering the main execution paths). Recovery states are optional for this level.


Expected Output

  1. A complete (or near-complete) state transition table for VertexImpl
  2. Answers to all questions in Steps 4–6 with file:line references
  3. List of @Ignored tests with your assessment of whether they could be re-enabled
  4. One transition identified as having insufficient test coverage

Stretch Goals

  1. Produce the same transition table for TaskImpl and TaskAttemptImpl. Compare their complexity (number of states and transitions) to VertexImpl.

  2. Find all places where VertexImpl calls eventHandler.handle() to post an event to another state machine. What are the target state machines and what event types are used?

    grep -n "eventHandler.handle" \
      tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
      | grep -v "VertexEvent" | head -20
    
  3. Find the V_PARALLELISM_UPDATED transition — what does it do, and why is it one of the most bug-prone transitions in the state machine?

Lab 4.2: VertexManager Deep Dive

Background

VertexManager is the hook that makes Tez more than just a DAG scheduler. By plugging in a custom VertexManagerPlugin, applications can implement dynamic parallelism, slow start, skew handling, and custom task scheduling — without modifying the core AM.

This lab walks through the two built-in VertexManager implementations, explains their behaviors via code reading, and ends with a minimal custom VertexManagerPlugin that you write and unit-test.


The VertexManagerPlugin Contract

Full interface: tez-api/src/main/java/org/apache/tez/dag/api/VertexManagerPlugin.java

public abstract class VertexManagerPlugin {
    private final VertexManagerPluginContext context;

    // The AM passes the context to the constructor. Subclasses must declare the
    // same single-argument constructor so Tez can instantiate them reflectively.
    public VertexManagerPlugin(VertexManagerPluginContext context) { ... }

    // Accessor used by subclasses to reach the context
    public final VertexManagerPluginContext getContext() { ... }

    // Lifecycle callbacks (which of these are abstract varies by version):
    public abstract void initialize() throws Exception;
    public abstract void onVertexStarted(List<TaskAttemptIdentifier> completions)
        throws Exception;
    public abstract void onSourceTaskCompleted(
        TaskAttemptIdentifier completedSrcTaskAttempt) throws Exception;
    public abstract void onVertexManagerEventReceived(
        VertexManagerEvent vmEvent) throws Exception;
    // Called when a root input has been initialized:
    public abstract void onRootVertexInitialized(String inputName,
        InputDescriptor inputDescriptor, List<Event> events) throws Exception;
}

VertexManagerPluginContext — what the plugin can call back into

find tez-api/src/main/java -name "VertexManagerPluginContext.java"
cat $(find tez-api/src/main/java -name "VertexManagerPluginContext.java")

Key methods on the context:

| Method | What it does |
|---|---|
| scheduleVertexTasks(List<TaskWithLocation>) | Schedules the given tasks for execution |
| reconfigureVertex(int parallelism, VertexLocationHint, Map<String,EdgeProperty>) | Changes parallelism and/or edge properties at runtime |
| getVertexNumTasks(String vertexName) | Returns the current parallelism of a named vertex |
| getCurrentParallelism() | Returns this vertex's current parallelism |
| getInputVertexEdgeProperties() | Returns the EdgeProperty for each input edge |
| sendEventToProcessor(List<Event>, String, int) | Sends a VertexManagerEvent to a task |

Reading ImmediateStartVertexManager

find tez-dag/src/main/java -name "ImmediateStartVertexManager.java"
cat $(find tez-dag/src/main/java -name "ImmediateStartVertexManager.java")

Answer these questions from the code:

  1. In initialize(): does ImmediateStartVertexManager do anything? If not, why does it exist?
  2. In onVertexStarted(): does it schedule tasks immediately or wait for anything?
  3. What TaskWithLocation does it create for each task? Does it provide any location hints?
  4. Does it implement onSourceTaskCompleted()? If so, what does it do?

Expected finding: ImmediateStartVertexManager is intentionally minimal. Its purpose is to provide a named, testable implementation that schedules all tasks immediately with no location hints. It is the baseline from which ShuffleVertexManager diverges.


Reading ShuffleVertexManager

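# In current releases ShuffleVertexManager lives in tez-runtime-library, not tez-dag,
# so search the whole checkout rather than a single module
find . -name "ShuffleVertexManager.java"
wc -l $(find . -name "ShuffleVertexManager.java")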

Slow Start

Find the slow-start logic in onSourceTaskCompleted().

grep -n "minFraction\|maxFraction\|min-src-fraction\|completedSourceTasks\|pendingTasksToSchedule" \
  $(find . -name "ShuffleVertexManager.java") | head -20

Answer:

  1. What is the variable that tracks how many source tasks have completed?
  2. At what fraction does ShuffleVertexManager start scheduling tasks?
  3. What is the formula: at fraction F between minFraction and maxFraction, what percentage of downstream tasks are scheduled?
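
One plausible reading of the slow-start ramp is a linear interpolation between the two thresholds. The sketch below is an assumption to check against the source, not a copy of it: `SlowStartSketch` and its method names are hypothetical, and rounding down the task count is this sketch's choice.

```java
// Hypothetical model of a slow-start ramp; verify the real formula in ShuffleVertexManager.
class SlowStartSketch {
    /**
     * Fraction of downstream tasks eligible to run once `completed`
     * (a fraction in [0, 1]) of the source tasks have finished.
     */
    static double fractionToSchedule(double completed, double minFraction, double maxFraction) {
        double range = maxFraction - minFraction;
        double fraction;
        if (range > 0) {
            fraction = (completed - minFraction) / range;    // linear ramp between thresholds
        } else {
            fraction = completed >= minFraction ? 1.0 : 0.0; // degenerate case: a single step
        }
        return Math.max(0.0, Math.min(1.0, fraction));       // clamp to [0, 1]
    }

    /** Number of downstream tasks that should have been scheduled so far. */
    static int tasksToSchedule(int totalTasks, double completed, double min, double max) {
        return (int) Math.floor(totalTasks * fractionToSchedule(completed, min, max));
    }

    public static void main(String[] args) {
        // Halfway between min=0.25 and max=0.75, half of 10 tasks are eligible.
        System.out.println(tasksToSchedule(10, 0.5, 0.25, 0.75)); // 5
    }
}
```

When you answer question 3, note where the real code differs from this model, for example in how it rounds or in what it does when the two fractions are equal.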

Auto-Parallelism

Find the auto-parallelism logic:

grep -n "reconfigureVertex\|numBipartiteSourceTasks\|desiredTaskInputSize\|targetParallelism" \
  $(find . -name "ShuffleVertexManager.java") | head -20

Answer:

  1. What configuration key enables auto-parallelism?
  2. What information does ShuffleVertexManager use to compute the optimal parallelism?
  3. When is context.reconfigureVertex() called?
  4. What is the minimum parallelism ShuffleVertexManager will ever set (the floor)?

VertexManagerEvent handling

When auto-parallelism is enabled, each upstream task sends a VertexManagerEvent to the downstream VertexManagerPlugin containing statistics about its output (byte count, record count, partition sizes).

grep -n "VertexManagerEvent\|onVertexManagerEventReceived\|vmEvent" \
  $(find . -name "ShuffleVertexManager.java") | head -15

Answer:

  1. What protobuf message is decoded from the event payload?
  2. What statistic is accumulated across all events?
  3. How does ShuffleVertexManager use the accumulated statistics to decide on new parallelism?
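
As a mental model for the questions above, auto-parallelism can be thought of as dividing the expected input volume by a target per-task size. The sketch is for orientation only: `AutoParallelismSketch` and all parameter names are hypothetical, and both the extrapolation from partial statistics and the "only ever reduce" rule are assumptions that you should confirm or refute in the code.

```java
// Hypothetical model of an auto-parallelism decision; not taken from the Tez source.
class AutoParallelismSketch {
    /**
     * @param observedOutputBytes   sum of output sizes reported so far by source tasks
     * @param reportingSourceTasks  number of source tasks that have reported
     * @param totalSourceTasks      total number of source tasks
     * @param desiredTaskInputBytes target input size per downstream task
     * @param currentParallelism    parallelism currently configured on the vertex
     * @param minTasks              floor below which parallelism is never reduced
     */
    static int desiredParallelism(long observedOutputBytes, int reportingSourceTasks,
            int totalSourceTasks, long desiredTaskInputBytes,
            int currentParallelism, int minTasks) {
        if (reportingSourceTasks <= 0 || desiredTaskInputBytes <= 0) {
            return currentParallelism; // nothing to extrapolate from
        }
        // Extrapolate the total output from the tasks that have reported so far.
        double expectedTotal =
            (double) observedOutputBytes * totalSourceTasks / reportingSourceTasks;
        int wanted = (int) Math.ceil(expectedTotal / desiredTaskInputBytes);
        // Assumption: parallelism is only reduced, and never below the floor.
        return Math.max(minTasks, Math.min(currentParallelism, wanted));
    }

    public static void main(String[] args) {
        // 500 bytes from 5 of 10 tasks extrapolates to 1000 bytes; at 100 bytes per task: 10.
        System.out.println(desiredParallelism(500, 5, 10, 100, 20, 1)); // 10
    }
}
```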

Write a Minimal Custom VertexManager

Create a CountingVertexManager that:

  1. Schedules 50% of tasks immediately when onVertexStarted() is called
  2. Schedules the remaining tasks when all source tasks have completed
  3. Logs the number of scheduled tasks at each scheduling call

This is the core pattern of slow-start, stripped to its minimum.

Implementation skeleton

package org.apache.tez.dag.library.vertexmanager;

import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.VertexManagerPlugin;
import org.apache.tez.dag.api.VertexManagerPluginContext;
import org.apache.tez.dag.api.VertexManagerPluginContext.ScheduleTaskRequest;
// These three live under org.apache.tez.runtime.api in current releases;
// adjust the imports if your version places them elsewhere.
import org.apache.tez.runtime.api.Event;
import org.apache.tez.runtime.api.TaskAttemptIdentifier;
import org.apache.tez.runtime.api.events.VertexManagerEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

public class CountingVertexManager extends VertexManagerPlugin {

    private static final Logger LOG =
        LoggerFactory.getLogger(CountingVertexManager.class);

    private int totalSourceTasks = 0;
    private int completedSourceTasks = 0;
    private boolean secondBatchScheduled = false;
    private int totalTasksToSchedule = 0;

    // Tez instantiates the plugin reflectively and passes in the context.
    public CountingVertexManager(VertexManagerPluginContext context) {
        super(context);
    }

    @Override
    public void initialize() {
        // This vertex's own parallelism, looked up by name
        totalTasksToSchedule =
            getContext().getVertexNumTasks(getContext().getVertexName());
        // Count source tasks across all input vertices
        for (String inputVertex : getContext().getInputVertexEdgeProperties().keySet()) {
            totalSourceTasks += getContext().getVertexNumTasks(inputVertex);
        }
    }

    @Override
    public void onVertexStarted(List<TaskAttemptIdentifier> completions) {
        // Schedule the first 50% (rounded down). Note: completions lists source
        // attempts that finished before this call; a production plugin would
        // count them as well.
        scheduleRange(0, totalTasksToSchedule / 2, "first");
    }

    @Override
    public void onSourceTaskCompleted(TaskAttemptIdentifier completedSrcTaskAttempt) {
        completedSourceTasks++;
        if (!secondBatchScheduled && completedSourceTasks >= totalSourceTasks) {
            // Schedule the remaining tasks exactly once
            secondBatchScheduled = true;
            scheduleRange(totalTasksToSchedule / 2, totalTasksToSchedule, "second");
        }
    }

    @Override
    public void onVertexManagerEventReceived(VertexManagerEvent vmEvent) {
        // No-op: we don't need statistics for this simple implementation
    }

    @Override
    public void onRootVertexInitialized(String inputName,
            InputDescriptor inputDescriptor, List<Event> events) {
        // No-op: this manager does not handle root inputs
    }

    // Schedules task indices [from, to) with no location hints
    private void scheduleRange(int from, int to, String label) {
        List<ScheduleTaskRequest> toSchedule = new ArrayList<>();
        for (int i = from; i < to; i++) {
            toSchedule.add(ScheduleTaskRequest.create(i, null));
        }
        LOG.info("CountingVertexManager: scheduling {} batch of {} tasks",
            label, toSchedule.size());
        getContext().scheduleTasks(toSchedule);
    }
}

Implementation tasks

  1. Identify the correct API method: getContext().scheduleTasks() vs getContext().scheduleVertexTasks() — check which one exists in your version of the API.

  2. Write a unit test using MockVertexManagerPluginContext (if it exists) or a mock:

    • Initialize the manager with parallelism = 10 and 4 source tasks
    • Call onVertexStarted() — verify 5 tasks are scheduled
    • Call onSourceTaskCompleted() 4 times — verify remaining 5 tasks are scheduled on the 4th call
    • Verify secondBatchScheduled is true after

  3. Register the CountingVertexManager in a DAG:

    Vertex reduceVertex = Vertex.create("reducer",
        ProcessorDescriptor.create(MyReducer.class.getName()), 10);
    reduceVertex.setVertexManagerPlugin(
        VertexManagerPluginDescriptor.create(CountingVertexManager.class.getName()));
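
Before wiring up mocks, the batching arithmetic can be checked in isolation. The sketch below re-creates only the index ranges of the two batches in plain Java; `BatchSplit` is a hypothetical helper and not part of Tez.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the two index ranges CountingVertexManager schedules.
class BatchSplit {
    // Indices scheduled at vertex start: the first half, rounded down.
    static List<Integer> firstBatch(int parallelism) {
        return range(0, parallelism / 2);
    }

    // Indices scheduled once every source task has completed: everything else.
    static List<Integer> secondBatch(int parallelism) {
        return range(parallelism / 2, parallelism);
    }

    private static List<Integer> range(int from, int to) {
        List<Integer> out = new ArrayList<>();
        for (int i = from; i < to; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstBatch(7));  // [0, 1, 2]
        System.out.println(secondBatch(7)); // [3, 4, 5, 6]
    }
}
```

Two edge cases are worth turning into unit tests of the real plugin: with odd parallelism the extra task lands in the second batch, and with a parallelism of 1 the first batch is empty. A vertex with no source tasks would also never receive onSourceTaskCompleted(), so in the skeleton as written its second batch would never be scheduled.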
    

Finding the VertexManager Test Utilities

# Find mock contexts and VertexManager tests in any module
find . -path "*/src/test/*" \( -name "*Mock*Vertex*" -o -name "Test*VertexManager*.java" \)

# Find TestShuffleVertexManager
find . -name "TestShuffleVertexManager.java" | grep test

Read TestShuffleVertexManager.java to understand how VertexManager tests are structured. The test creates a mock context, calls lifecycle methods in order, and asserts which tasks were scheduled.


Expected Output

  1. Answers to all questions in the ImmediateStartVertexManager and ShuffleVertexManager sections, with file:line references
  2. A working CountingVertexManager implementation that compiles
  3. A unit test that passes for the two scheduling scenarios

Stretch Goals

  1. Read CartesianProductVertexManager — the most complex VertexManager:

    find . -name "CartesianProductVertexManager.java"
    

    What computation does it coordinate? When is it used?

  2. Find a ShuffleVertexManager-related JIRA (search for "ShuffleVertexManager" in JIRA). Read the issue description and the patch. What invariant was violated?

  3. Implement a NoOpVertexManager that schedules no tasks (for testing DAG failure paths). Use it in a test DAG and verify the vertex fails with FAILED status after the timeout.

Lab 4.3 — Build It: WavingVertexManager

Lab type: Build It — VertexManagerPlugin with full JUnit + Mockito test suite
Estimated time: 120–150 min
Maven module: book/projects/level-4-waving-manager
Key class: org.apache.tez.learning.l4.WavingVertexManager


What You Will Build

A VertexManagerPlugin that schedules tasks in configurable waves:

  • Wave 0: tasks 0 to waveSize-1
  • Wave 1: tasks waveSize to 2×waveSize-1
  • Wave N: starts only when all tasks in wave N-1 have succeeded

Wave size is read from UserPayload as "waveSize=N".
Default: WavingVertexManager.DEFAULT_WAVE_SIZE = 2.
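
A minimal sketch of parsing that payload follows. How the project's WavingVertexManager actually parses it is not shown in this lab text, so `WaveSizeParser` is a hypothetical stand-in, and falling back to the default on malformed or non-positive input is this sketch's own choice.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical parser for a "waveSize=N" payload; not the project's actual code.
class WaveSizeParser {
    static final int DEFAULT_WAVE_SIZE = 2; // mirrors the default named in the lab text

    static int parse(byte[] payload) {
        if (payload == null || payload.length == 0) {
            return DEFAULT_WAVE_SIZE;
        }
        String text = new String(payload, StandardCharsets.UTF_8).trim();
        String prefix = "waveSize=";
        if (!text.startsWith(prefix)) {
            return DEFAULT_WAVE_SIZE;
        }
        try {
            int n = Integer.parseInt(text.substring(prefix.length()).trim());
            return n > 0 ? n : DEFAULT_WAVE_SIZE; // reject zero and negative sizes
        } catch (NumberFormatException e) {
            return DEFAULT_WAVE_SIZE;             // malformed number
        }
    }

    public static void main(String[] args) {
        byte[] payload = "waveSize=4".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(payload)); // 4
    }
}
```

In a real plugin the bytes would come from the UserPayload attached to the plugin descriptor. Whether silently falling back to the default is better than failing fast in initialize() is a design question worth raising in review.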

This is a minimal but complete VertexManagerPlugin — the same architectural pattern used by ImmediateStartVertexManager, ShuffleVertexManager, and the VertexManagerPlugin inside every Hive-on-Tez reduce vertex.


Step 1 — Understand the VertexManagerPlugin Contract

Before reading any code, open the Tez source:

find ~/tez-src -name "VertexManagerPlugin.java" | head -3
find ~/tez-src -name "VertexManagerPluginContext.java" | head -3
find ~/tez-src -name "ImmediateStartVertexManager.java" | head -3

Read all three files completely. Then answer:

| # | Question |
|---|---|
| 1 | What are all the lifecycle callback methods in VertexManagerPlugin? List them. |
| 2 | When does the Tez AM call initialize()? Can you call scheduleVertexTasks() from inside initialize()? |
| 3 | What does VertexManagerPluginContext.scheduleVertexTasks(List<ScheduleTaskRequest>) actually do to the DAG execution engine? |
| 4 | ImmediateStartVertexManager.onVertexStarted() calls scheduleAllTasks(). Does it call scheduleVertexTasks once (all tasks in one list) or once per task? Why does that matter for performance? |
| 5 | What is the purpose of VertexManagerPluginContext.reconfigureVertex()? Does WavingVertexManager use it? |

Step 2 — Compile and Run the Tests

cd /path/to/apache-tez/book/projects
mvn -pl level-4-waving-manager test

Expected:

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0

Step 3 — Read the Source Code

Open WavingVertexManager.java and work through every section.

initialize()

| # | Question |
|---|---|
| 1 | The payload is parsed as "waveSize=N". Where in a real DAG would you set this payload? (Hint: VertexManagerPluginDescriptor.setUserPayload() in DAG.create()) |
| 2 | Why does initialize() store totalTasks from the context rather than accepting it as a constructor argument? |
| 3 | If the user sets waveSize=1000 but there are only 5 tasks, what happens? Is there a bug? |
| 4 | Why are scheduled and waveFinished BitSets rather than List<Integer>? What is the time complexity of BitSet.andNot()? |

onVertexStarted()

| # | Question |
|---|---|
| 1 | The completions map passed to onVertexStarted is ignored. Under what condition would a real plugin need to process it? |
| 2 | Why is scheduleNextWave() called here and not from initialize()? |

onTaskAttemptCompleted()

| # | Question |
|---|---|
| 1 | Failed attempts are silently ignored (if (!successful) return). What should a production plugin do instead? |
| 2 | checkAndScheduleNextWave() clones scheduled to avoid mutating it. What subtle bug would occur without the clone? |
| 3 | Trace through the state machine for 4 tasks, waveSize=2. Draw the state of scheduled and waveFinished after each callback. |

scheduleNextWave()

| # | Question |
|---|---|
| 1 | The while loop has two conditions: nextTaskToSchedule < totalTasks AND count < waveSize. Which terminates the loop for the last wave if the number of tasks is not a multiple of waveSize? |
| 2 | The scheduled.get(idx) guard protects against double-scheduling. In what scenario could idx already be set? (Hint: look at the testTaskNotScheduledTwice test.) |
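
To reason about the two loop conditions without the project open, the loop can be reconstructed from the description in the questions. The sketch below is such a reconstruction, not the project's source: `WaveLoopSketch` is hypothetical, and only the field names mentioned in the questions (`nextTaskToSchedule`, `totalTasks`, `waveSize`, `scheduled`) are taken from the text.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical reconstruction of a wave-scheduling loop, based only on the lab's description.
class WaveLoopSketch {
    final int totalTasks;
    final int waveSize;
    final BitSet scheduled = new BitSet();
    int nextTaskToSchedule = 0;

    WaveLoopSketch(int totalTasks, int waveSize) {
        this.totalTasks = totalTasks;
        this.waveSize = waveSize;
    }

    /** Returns the task indices picked for the next wave (possibly fewer than waveSize). */
    List<Integer> scheduleNextWave() {
        List<Integer> wave = new ArrayList<>();
        int count = 0;
        // First condition stops at the end of the vertex; second stops at a full wave.
        while (nextTaskToSchedule < totalTasks && count < waveSize) {
            int idx = nextTaskToSchedule++;
            if (scheduled.get(idx)) {
                continue; // guard: never hand out the same index twice
            }
            scheduled.set(idx);
            wave.add(idx);
            count++;
        }
        return wave;
    }

    public static void main(String[] args) {
        WaveLoopSketch s = new WaveLoopSketch(5, 2);
        System.out.println(s.scheduleNextWave()); // [0, 1]
        System.out.println(s.scheduleNextWave()); // [2, 3]
        System.out.println(s.scheduleNextWave()); // [4]
    }
}
```

With 5 tasks and a wave size of 2, the last wave holds a single task, which shows which of the two conditions ends the loop in the partial-wave case.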

Step 4 — Read the Test Suite

Open TestWavingVertexManager.java. For each test, before reading the assertions:

  1. Read the test name
  2. Predict what the test will assert
  3. Then read the actual assertions and compare to your prediction

Pay particular attention to how Mockito is used:

| Mockito call | What it does |
|---|---|
| mock(VertexManagerPluginContext.class) | Creates a fake context that records all calls |
| when(ctx.getVertexNumTasks(...)).thenReturn(6) | Stubs a specific return value |
| verify(ctx, times(2)).scheduleVertexTasks(anyList()) | Asserts the method was called exactly twice |
| ArgumentCaptor.forClass(List.class) | Captures the actual argument for deep inspection |

Questions

| # | Question |
|---|---|
| 1 | testThreeWavesForSixTasks is an integration test of the entire scheduling lifecycle. Which individual unit tests cover the sub-cases that this test depends on? |
| 2 | testPartialWave0DoesNotTriggerWave1 verifies the negative case (wave NOT triggered). How does verify(times(1)) prove this? Could you use verifyNoMoreInteractions() instead? |
| 3 | The test class has a @Before setUp() method. What happens if you remove it and inline mockContext = mock(...) into each test instead? |

Step 5 — Break It: Three Experiments

Experiment A — Remove the if (!successful) return guard

Delete the early-return in onTaskAttemptCompleted. Run:

mvn -pl level-4-waving-manager test -Dtest=TestWavingVertexManager#testFailedAttemptDoesNotAdvanceWave

  • Which test fails?
  • What is the actual vs. expected scheduleVertexTasks call count?
  • Why does treating failures as successes cause premature wave advancement?

Experiment B — Remove the BitSet.clone() in checkAndScheduleNextWave

Change:

BitSet scheduledCopy = (BitSet) scheduled.clone();
scheduledCopy.andNot(waveFinished);

to:

scheduled.andNot(waveFinished);

Run the full test suite.

  • Which tests fail?
  • What data corruption does this mutation cause? Trace through testThreeWavesForSixTasks manually.
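
The effect of dropping the clone can be reproduced with java.util.BitSet alone, because `andNot` modifies the set it is called on. In the sketch below, `BitSetCloneDemo` and its two method names are hypothetical; only the BitSet calls are real API.

```java
import java.util.BitSet;

// Demonstrates why the wave-completion check should work on a copy.
class BitSetCloneDemo {
    // Safe: works on a copy, leaving `scheduled` intact.
    static boolean allScheduledFinished(BitSet scheduled, BitSet finished) {
        BitSet outstanding = (BitSet) scheduled.clone();
        outstanding.andNot(finished); // bits that are scheduled but not yet finished
        return outstanding.isEmpty();
    }

    // Unsafe variant from Experiment B: andNot clears bits in `scheduled` itself.
    static boolean allScheduledFinishedMutating(BitSet scheduled, BitSet finished) {
        scheduled.andNot(finished);
        return scheduled.isEmpty();
    }

    public static void main(String[] args) {
        BitSet scheduled = new BitSet();
        scheduled.set(0);
        scheduled.set(1);
        BitSet finished = new BitSet();
        finished.set(0);

        allScheduledFinished(scheduled, finished);
        System.out.println(scheduled); // {0, 1}  (unchanged)

        allScheduledFinishedMutating(scheduled, finished);
        System.out.println(scheduled); // {1}     (task 0 no longer marked as scheduled)
    }
}
```

Once task 0 has been erased from `scheduled`, any guard that relies on `scheduled.get(0)` to prevent double-scheduling no longer fires for it. Use this when tracing testThreeWavesForSixTasks by hand.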

Experiment C — Change count < waveSize to count <= waveSize

In scheduleNextWave(), change the loop condition.

  • How many tasks does wave 0 now schedule?
  • Which test catches this?

Step 6 — Add a New Feature: onVertexManagerEventReceived

The real ShuffleVertexManager uses onVertexManagerEventReceived to receive partition statistics from map tasks. Add support for a simple variant:

Implement the callback. It receives a single event per call, matching the VertexManagerPlugin contract:

@Override
public void onVertexManagerEventReceived(
        VertexManagerEvent vmEvent) throws Exception {
    // If the event's user payload contains "skip=true", mark
    // that task as finished so it does not block wave advancement.
    // TODO: parse the payload for "skip=true"; if present, call
    //       onTaskAttemptCompleted(taskIndex, true) to release the wave
}

Write a test for this method:

@Test
public void testSkipEventReleasesWave() {
    // set up 4 tasks, wave size 2
    // trigger onVertexStarted (wave 0: tasks 0,1)
    // send a VertexManagerEvent for task 0 with payload "skip=true"
    // verify task 0 is treated as done for wave-completion purposes
}

Step 7 — Tez Source Connection Table

| Class used in this project | Tez source file |
|---|---|
| VertexManagerPlugin | |
| VertexManagerPluginContext | |
| ScheduleTaskRequest | |
| ImmediateStartVertexManager | |
| ShuffleVertexManager | |

Step 8 — ShuffleVertexManager Deep Dive

Open ShuffleVertexManager.java in the Tez source:

find ~/tez-src -name "ShuffleVertexManager.java"

  1. Read onVertexStarted(). Does it schedule tasks immediately like WavingVertexManager, or does it wait? What does it wait for?
  2. Find the slowStartFraction field. How does it determine when to start scheduling?
  3. Find where reconfigureVertex() is called. What does it change about the vertex?
  4. How does ShuffleVertexManager prevent double-scheduling? Compare its guard to the scheduled BitSet in WavingVertexManager.
  5. ShuffleVertexManager has ~700 lines. Identify the 5 most important methods (the ones that contain the core scheduling logic) and list them.

Step 9 — JIRA Research: VertexManager Bugs

Search:

project = TEZ AND component = "tez-dag" AND text ~ "VertexManager" AND resolution = Fixed

Find one resolved issue where a VertexManagerPlugin had a scheduling bug.

  • What was the bug? (Race condition? Double scheduling? Wrong wave boundary?)
  • What was the fix?
  • Was a test added? What does it mock?

Lab 4.4 — Fix It: Null Dereference in ShuffleVertexManager on Zero-Partition Source

Lab type: Fix-It — reproduce → locate → write failing test → patch → verify → format patch
Estimated time: 120–150 min
Tez component: tez-runtime-library, org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager (location in current releases; confirm with the find command in Step 1)


Background

ShuffleVertexManager uses partition statistics sent by map tasks to decide when to start reduce tasks (slow-start) and how many reducers to run (auto-parallelism). It processes these statistics via onVertexManagerEventReceived().

A recurring bug category in this path involves source vertices with zero output partitions (all records were filtered, or the vertex ran with zero tasks). The plugin can then receive a VertexManagerEvent whose payload encodes 0 partitions. If the statistics-processing path assumes that at least one partition exists, the result is a NullPointerException or an ArithmeticException (divide by zero) deep in that path.

This lab reproduces the bug pattern in a unit test, locates the exact guard that is missing, applies the fix, and submits a patch.


Step 1 — Locate the Source File

find ~/tez-src -name "ShuffleVertexManager.java" | head -5

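Expected (in current releases the class lives in tez-runtime-library; find prints the path under your checkout directory):

$HOME/tez-src/tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/ShuffleVertexManager.java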

Also locate the test file:

find ~/tez-src -name "TestShuffleVertexManager.java" | head -5

Step 2 — Read the Statistics Path

In ShuffleVertexManager.java, find the method that processes VertexManagerEvent payloads. It will have a call to ShuffleVertexManagerBase.parseStatsHeader() or similar, and will work with numPartitions or partitionCount.

Trace the complete call chain from onVertexManagerEventReceived() to the line that first uses the partition count arithmetically.

Questions

| # | Question |
|---|---|
| 1 | What is the name of the proto-based payload class that encodes partition statistics? |
| 2 | Which method extracts the partition count from the payload? |
| 3 | On what line does the first arithmetic operation involving the partition count occur? |
| 4 | Is there a null-check or zero-check before that line? |
| 5 | What exception would result if partitionCount == 0 at that line? |

Step 3 — Find the Existing Test

find ~/tez-src -name "TestShuffleVertexManager.java"

Open it and search for any test that covers the zero-partition case:

grep -n -i "zero\|0.*partition\|partition.*0" \
  "$(find ~/tez-src -name "TestShuffleVertexManager.java" | head -1)" | head -20

Note: in most Tez versions there is no such test — that is the gap you will fill.


Step 4 — Write the Reproducing Test

Add the following test to TestShuffleVertexManager.java. The exact helper methods depend on the version you have; adapt the setup pattern from the nearest existing test (look for testAutoParallelism or testSlowStart).

@Test(expected = Exception.class)   // replace Exception with the specific type you observe
public void testZeroPartitionSourceDoesNotCrash() throws Exception {
    // TODO: set up a ShuffleVertexManager with auto-parallelism enabled
    // TODO: send a VertexManagerEvent with numPartitions = 0
    // TODO: call onVertexManagerEventReceived with that event
    //       The call should NOT throw — once fixed.
    //       Mark expected = Exception.class so the test initially *passes*
    //       when the bug exists (the code throws), then change to asserting
    //       no throw after the fix is applied.
}

Run:

cd ~/tez-src
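# Use the module that contains TestShuffleVertexManager (tez-runtime-library in current releases)
mvn test -pl tez-runtime-library -Dtest=TestShuffleVertexManager#testZeroPartitionSourceDoesNotCrash -q 2>&1 | tail -30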

Record: which exception is thrown and on which line.


Step 5 — Apply the Fix

In ShuffleVertexManager.java, add a guard at the point identified in Step 2.

Rules

  • Keep the guard minimal: either if (partitionCount == 0) { return; } to skip the event, or if (partitionCount == 0) { partitionCount = 1; } to normalise it. Choose the semantically correct one: which is safer for scheduling?
  • Do not reformat surrounding code
  • Do not change method signatures
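
The failure mode and the "skip" style of guard can be seen in isolation. The sketch below is a toy, not the ShuffleVertexManager code: `PartitionStatsSketch` and its methods are hypothetical, and the average-size computation merely stands in for whatever arithmetic you located in Step 2.

```java
// Toy illustration of a missing zero guard; not taken from the Tez source.
class PartitionStatsSketch {
    // Unguarded: integer division throws ArithmeticException when partitionCount == 0.
    static long averageUnguarded(long totalBytes, int partitionCount) {
        return totalBytes / partitionCount;
    }

    // Guarded: a zero-partition event contributes nothing and is skipped.
    static long averageGuarded(long totalBytes, int partitionCount) {
        if (partitionCount == 0) {
            return 0L;
        }
        return totalBytes / partitionCount;
    }

    public static void main(String[] args) {
        System.out.println(averageGuarded(100L, 0)); // 0
        try {
            averageUnguarded(100L, 0);
        } catch (ArithmeticException e) {
            System.out.println("unguarded path threw: " + e.getMessage());
        }
    }
}
```

Note that normalising the count to 1 instead would report the whole byte total as a single partition's size, which is one reason to think carefully about which of the two guards is semantically correct for scheduling.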

Step 6 — Update the Test

Now that the fix is applied, update the test:

@Test
public void testZeroPartitionSourceDoesNotCrash() throws Exception {
    // Same setup as before
    // This time assert NO exception is thrown
    // Optionally assert that scheduling state is unchanged
}

Run the full test suite of the module you changed (tez-runtime-library in current releases):

mvn test -pl tez-runtime-library -q 2>&1 | tail -20

All tests must pass.


Step 7 — Checkstyle

mvn checkstyle:check -pl tez-runtime-library -q 2>&1 | grep -E "ERROR|WARNING|violation" | head -20

Zero violations required.


Step 8 — Format the Patch

cd ~/tez-src
git diff > /tmp/TEZ-ZEROPART.001.patch
cat /tmp/TEZ-ZEROPART.001.patch

Checklist:

  • Only ShuffleVertexManager.java and TestShuffleVertexManager.java modified
  • No trailing whitespace on added lines: grep -nP '^\+.*\s+$' /tmp/TEZ-ZEROPART.001.patch (should print nothing)
  • Patch applies cleanly: git apply --check /tmp/TEZ-ZEROPART.001.patch
  • All tests pass after git apply

Step 9 — Write the JIRA Description

Summary: ShuffleVertexManager throws [ExceptionType] when source vertex
         has zero output partitions

Description:
  When a source vertex completes with zero output partitions (all records
  filtered or vertex ran zero tasks), ShuffleVertexManager.onVertexManagerEventReceived
  receives a VertexManagerEvent with partitionCount=0.  The statistics
  processing path performs arithmetic on this value without a zero guard,
  causing [ExceptionType] at [ClassName].java:[line].

  Steps to reproduce:
    See attached TestShuffleVertexManager#testZeroPartitionSourceDoesNotCrash.

  Fix:
    Add a zero-partition guard at [method name], line [N].
    Skip or normalise the event when partitionCount == 0.

Priority: Major
Component: tez-dag
Affects Version: 0.10.x

Step 10 — Deeper Understanding

After completing the fix, answer these questions by reading ShuffleVertexManager.java:

# | Question
1 | What are slowStartMinFraction and slowStartMaxFraction used for? At what point in the scheduling lifecycle are they checked?
2 | When does ShuffleVertexManager call reconfigureVertex()? What does it change?
3 | What data structure accumulates partition statistics across multiple VertexManagerEvent calls? Why accumulate rather than process each event independently?
4 | The test class uses mock(VertexManagerPluginContext.class). Compare this to TestWavingVertexManager: what additional interactions does ShuffleVertexManager have with the context that WavingVertexManager does not?
5 | Search for all places in ShuffleVertexManager where a divide-by-zero could theoretically occur. List them.

Level 5: Testing Infrastructure

Apache Tez ships an extensive test suite: a large body of unit tests, a MiniTezCluster integration harness, and a TestOrderedWordCount end-to-end reference. At this level you will move from reading tests to writing them: adding missing coverage to TestVertexImpl, submitting a real DAG against MiniTezCluster, and finding and fixing a flaky test.

Why testing matters for contributors

Every Tez patch must include either (a) a new test that fails without the patch and passes with it, or (b) a clear justification in the JIRA for why a test is not needed. Committers will block patches that regress existing tests or that add unverified logic.

What this level covers

Topic | Where
MiniTezCluster setup/teardown lifecycle | Lab 5.1
TestOrderedWordCount as the canonical integration test template | Lab 5.1
Adding a missing TestVertexImpl transition test | Lab 5.2
Writing a full mini-cluster integration test for your own DAG | Lab 5.3
Identifying, reproducing, and fixing a flaky test | Lab 5.4

Prerequisites

  • Level 4 complete (you understand VertexImpl state machine and VertexManagerPlugin)
  • Tez source checked out and mvn install -DskipTests succeeded

Test categories and Maven commands

Category | What it tests | Command
Unit | Single class in isolation with mocks | mvn test -pl tez-dag -Dtest=TestVertexImpl
Mini-cluster integration | Full AM + YARN + HDFS in-process | mvn test -pl tez-tests -Dtest=TestOrderedWordCount
System | Real cluster (CI only) | Not run locally

Key test classes

Class | Module | What it covers
TestVertexImpl | tez-dag | VertexImpl state machine, transitions, vertex recovery
TestDAGImpl | tez-dag | DAGImpl state machine, DAG-level events
TestTaskImpl | tez-dag | TaskImpl scheduling, speculation, counters
TestTaskAttemptImpl | tez-dag | TaskAttemptImpl state transitions
TestOrderedWordCount | tez-tests | End-to-end DAG submission against MiniTezCluster
TestMiniTezClusterWithTez | tez-tests | Multi-DAG runs, recovery, kill scenarios

Expected outcome

By the end of this level you will have:

  1. Run a DAG against MiniTezCluster inside a JUnit test
  2. Added a missing state-machine transition test to TestVertexImpl
  3. Identified and fixed a flaky test (or documented why it flakes)

Lab 5.1 — Explore MiniTezCluster and TestOrderedWordCount

Lab type: Read & Run
Estimated time: 90 min
Tez module: tez-tests
Key class: org.apache.tez.test.TestOrderedWordCount


Overview

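MiniTezCluster spins up an in-process YARN ResourceManager and one or more NodeManagers. Tests usually pair it with Hadoop's MiniDFSCluster, which provides the in-process HDFS NameNode and DataNode. The Tez ApplicationMaster and its tasks then run as YARN containers launched on the local machine. This lets you submit real DAGs in a JUnit test with no external infrastructure.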

TestOrderedWordCount is the canonical example: it submits a multi-stage word-count DAG (tokenize → partition → sort → count) and asserts correct output.


Step 1 — Locate the Files

find ~/tez-src -name "MiniTezCluster.java" | head -5
find ~/tez-src -name "TestOrderedWordCount.java" | head -5
find ~/tez-src -name "MiniTezClusterWithTez.java" | head -5

Step 2 — Read MiniTezCluster.java

Open MiniTezCluster.java and answer:

# | Question
1 | What superclass does MiniTezCluster extend? What Hadoop class sets up the in-process YARN cluster?
2 | Where is TezConfiguration created and how is it modified to use the in-process services?
3 | What is the purpose of the serviceStart() method? What does it start?
4 | After serviceStop(), can you call serviceStart() again on the same instance? Why or why not?
5 | Where does MiniTezCluster write its temporary data (HDFS files, YARN work dirs)? How would a test clean this up?

Step 3 — Read TestOrderedWordCount.java

Work through the test lifecycle:

3a — @BeforeClass setUpClass()

# | Question
1 | How many NodeManagers does the test cluster start with?
2 | After miniTezCluster.start(), what call copies the Tez auxiliary service config?
3 | Where are test input files created: on HDFS or local FS?
4 | Is a new TezClient created per test or per class?

3b — @Test testOrderedWordCount()

# | Question
1 | Trace the method calls from TezClient.submitDAG() to when the test receives the final DAGStatus.
2 | What does the assertion verify: DAG state, output correctness, or counter values?
3 | If you wanted to assert on a specific counter (e.g. TaskCounter.INPUT_RECORDS_PROCESSED), where in the test would you add that assertion?

3c — @AfterClass tearDownClass()

# | Question
1 | What is the order of shutdown calls? Does the TezClient stop before or after the cluster?
2 | Does the test delete the HDFS working directory? Should it?

Step 4 — Run the Test

cd ~/tez-src
mvn test -pl tez-tests -Dtest=TestOrderedWordCount -q 2>&1 | tail -20

Expected:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

If you see Unable to find class: org.apache.tez.test.TestOrderedWordCount, ensure mvn install -DskipTests completed successfully for all modules.


Step 5 — Measure the Overhead

Time the test:

time mvn test -pl tez-tests -Dtest=TestOrderedWordCount -q 2>&1 | tail -3

Record how long it takes. Then answer:

  1. Is the bottleneck cluster startup, DAG execution, or cluster shutdown? (Hint: add -Dorg.apache.tez.test.MiniTezCluster.log.level=DEBUG and look at the timestamps.)
  2. Why is @BeforeClass used instead of @Before? What is the performance difference?

Step 6 — Find More Integration Tests

find ~/tez-src/tez-tests -name "Test*.java" | xargs grep -l "MiniTezCluster" | head -10

Pick one that is NOT TestOrderedWordCount. Read its @BeforeClass and one @Test method. Answer:

  1. What scenario does this test cover that TestOrderedWordCount does not?
  2. Does it use a separate MiniTezCluster instance, or the same one reused across multiple test classes? How?

Step 7 — Source Connection Table

Class used in this lab | Tez source file (relative to repo root)
MiniTezCluster |
TezClient |
TezConfiguration |
DAGStatus |
MiniDFSCluster (Hadoop helper) |

Step 8 — JIRA Research

Search:

project = TEZ AND component = "tez-tests" AND resolution = Fixed ORDER BY updated DESC

Find a recent test-improvement JIRA.

  1. What was added or fixed?
  2. Does the patch include a new test, an existing test modification, or a flaky-test fix?

Lab 5.2 — Add a Missing TestVertexImpl Transition Test

Lab type: Fix-It (test improvement)
Estimated time: 90 min
Tez module: tez-dag
Key class: org.apache.tez.dag.app.dag.impl.TestVertexImpl


Overview

TestVertexImpl covers the VertexImpl state machine but no test suite is ever complete. In this lab you will:

  1. Read the state machine definition
  2. Identify an untested transition
  3. Write a JUnit test that exercises that transition
  4. Verify that it passes with the correct assertions and fails if you deliberately assert the wrong resulting state

This is the canonical entry point for new Tez contributors — many accepted patches are "add test coverage for transition X".


Step 1 — Locate the State Machine Definition

find ~/tez-src -name "VertexImpl.java" | head -3
grep -n "StateMachineFactory\|addTransition" \
  ~/tez-src/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
  | head -50

The state machine is built with StateMachineFactory<VertexImpl, VertexState, VertexEventType, VertexEvent>. Each addTransition() call defines:

  • current state
  • event type
  • next state
  • transition action
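
To make the four parts of a transition concrete, here is a minimal table-driven state machine in plain Java. It is a teaching sketch and not the StateMachineFactory that VertexImpl uses: the MiniStateMachine class, its State and Event enums, and the transitions registered against it are all invented for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative table-driven state machine; names are hypothetical. */
final class MiniStateMachine {
    enum State { NEW, INITED, RUNNING, FAILED }
    enum Event { INIT, START, FAIL }

    /** One registered transition: the next state plus the action to run. */
    private static final class Transition {
        final State next;
        final Runnable action;

        Transition(State next, Runnable action) {
            this.next = next;
            this.action = action;
        }
    }

    private final Map<State, Map<Event, Transition>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    /** Registers: current state, event type, next state, transition action. */
    MiniStateMachine addTransition(State from, Event on, State to, Runnable action) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class))
             .put(on, new Transition(to, action));
        return this;
    }

    /** Applies an event; an unregistered (state, event) pair is rejected. */
    State handle(Event event) {
        Map<Event, Transition> row = table.get(current);
        Transition t = (row == null) ? null : row.get(event);
        if (t == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        t.action.run();
        current = t.next;
        return current;
    }

    State getState() {
        return current;
    }
}
```

A transition test against this sketch has the same shape as the ones you will read in TestVertexImpl: drive the machine into the from-state, fire one event, then assert both the resulting state and any side effect of the action.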

Step 2 — Read TestVertexImpl.java

wc -l ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java

It is large (~5,000 lines). You do not need to read it all. Instead:

grep -n "public void test" \
  ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java \
  | head -60

List the test method names (drop the head -60 filter to see all of them).


Step 3 — Find an Untested Transition

Compare the transitions in VertexImpl.java to the tests in TestVertexImpl.java.

Strategy:

  1. List all addTransition calls with grep -n "addTransition" VertexImpl.java
  2. For each transition, search TestVertexImpl.java for a test that covers the (fromState, eventType) pair
  3. Find one that is missing
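
The comparison in this strategy is a set difference over (fromState, eventType) pairs. A minimal sketch, assuming you have already turned your grep output into "STATE:EVENT" strings by hand; the TransitionCoverage class and the pairs in main are placeholders, not real VertexImpl transitions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: which defined transitions have no covering test? */
final class TransitionCoverage {

    /** Returns the defined "STATE:EVENT" pairs absent from tested, in definition order. */
    static Set<String> untested(List<String> defined, Set<String> tested) {
        Set<String> missing = new LinkedHashSet<>(defined);
        missing.removeAll(tested);
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder pairs; real ones come from the addTransition calls you listed.
        List<String> defined = Arrays.asList("NEW:INIT", "INITED:START", "INITED:FAIL");
        Set<String> tested = new HashSet<>(Arrays.asList("NEW:INIT", "INITED:START"));
        System.out.println(untested(defined, tested)); // prints [INITED:FAIL]
    }
}
```

Keeping the pairs in definition order makes it easy to jump back to the matching addTransition line once a gap shows up.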

Hint: look at transitions from INITED state. Some transitions from INITED triggered by rare events (e.g. VERTEX_FAILED before a task is scheduled) are often not explicitly tested.


Step 4 — Write the Test

Add a new test to TestVertexImpl.java. Follow the exact style of the surrounding tests:

@Test(timeout = 5000)
public void testVertexFailed_FromInitedState() {
    // TODO: initialize a vertex to INITED state using the existing test helpers
    //       then send a VERTEX_FAILED event
    //       assert the vertex transitions to ERROR or FAILED state
    //       assert any cleanup callbacks were invoked
}

Pattern to follow:

  • Look for an existing test that puts the vertex in the state you need (e.g. testVertexWithInitializer reaches RUNNING; look for a simpler path)
  • Use dispatcher.getEventHandler().handle(new VertexEventXxx(...)) to fire events
  • Use vertex.getState() to assert the resulting state

Step 5 — Run the New Test

cd ~/tez-src
mvn test -pl tez-dag \
  -Dtest=TestVertexImpl#testVertexFailed_FromInitedState -q 2>&1 | tail -20

Step 6 — Run the Full Test Class

mvn test -pl tez-dag -Dtest=TestVertexImpl -q 2>&1 | tail -10

All existing tests must still pass.


Step 7 — Write the Patch and JIRA Description

cd ~/tez-src
git diff > /tmp/TEZ-VERTEXTEST.001.patch
cat /tmp/TEZ-VERTEXTEST.001.patch

Draft JIRA:

Summary: TestVertexImpl is missing coverage for VERTEX_FAILED from INITED state

Description:
  The VertexImpl state machine defines a transition (INITED, VERTEX_FAILED)
  but TestVertexImpl has no test that fires this event path.  This patch adds
  TestVertexImpl#testVertexFailed_FromInitedState to cover the gap.

Priority: Minor
Component: tez-dag

Deeper Understanding

# | Question
1 | What is the difference between VertexState.FAILED and VertexState.ERROR? When does the AM choose each?
2 | TestVertexImpl uses a mock AppContext. What methods on AppContext does VertexImpl call most frequently? (grep for appContext.)
3 | What is DrainDispatcher and why is it used in tests instead of AsyncDispatcher?
4 | Some tests set a Clock mock. Why would a state machine test need to control time?

Lab 5.3 — Build It: Integration Test with MiniTezCluster

Lab type: Build It — Maven module with a real mini-cluster integration test
Estimated time: 150 min
Maven module: book/projects/level-5-integration-test
Key class: org.apache.tez.learning.l5.TestNumberPipelineWithMiniCluster


What You Will Build

A JUnit integration test that:

  1. Starts MiniTezCluster in @BeforeClass
  2. Submits the Level 1 NumberPipelineDAG (reused from level-1-number-pipeline)
  3. Waits for the DAG to complete
  4. Reads back the counter NumberPipeline/TotalSum and asserts it equals 4950 (the sum of 0..99)
  5. Stops the cluster in @AfterClass

This is the same pattern used by TestOrderedWordCount — you are building the exact kind of test that Tez committers write for new DAG features.


Step 1 — Create the Maven Module

book/projects/level-5-integration-test/
  pom.xml
  src/test/java/org/apache/tez/learning/l5/
    TestNumberPipelineWithMiniCluster.java

The module is a test-only module (no src/main/). It depends on:

  • org.apache.tez.learning:level-1-number-pipeline:1.0-SNAPSHOT (your DAG)
  • org.apache.tez:tez-tests (for MiniTezCluster)
  • JUnit 4.13.2
  • org.apache.hadoop:hadoop-minicluster

The pom.xml entries for the Tez, Hadoop, and Level 1 dependencies:

<dependency>
  <groupId>org.apache.tez</groupId>
  <artifactId>tez-tests</artifactId>
  <version>${tez.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.tez.learning</groupId>
  <artifactId>level-1-number-pipeline</artifactId>
  <version>1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

Add level-5-integration-test to the parent pom.xml modules list.


Step 2 — Write TestNumberPipelineWithMiniCluster.java

Skeleton:

package org.apache.tez.learning.l5;

import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.tez.client.TezClient;
import org.apache.tez.common.counters.TezCounters;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.client.DAGClient;
import org.apache.tez.dag.client.DAGStatus;
import org.apache.tez.dag.client.StatusGetOpts;
import org.apache.tez.learning.l1.NumberPipelineDAG;
import org.apache.tez.test.MiniTezCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

public class TestNumberPipelineWithMiniCluster {

    private static MiniTezCluster miniTezCluster;
    private static TezClient tezClient;
    private static TezConfiguration tezConf;

    @BeforeClass
    public static void setUpClass() throws Exception {
        // Start MiniTezCluster with 1 NodeManager, 1 local dir, 1 log dir
        miniTezCluster = new MiniTezCluster(
                TestNumberPipelineWithMiniCluster.class.getName(), 1, 1, 1);
        Configuration conf = new Configuration();
        miniTezCluster.init(conf);
        miniTezCluster.start();

        // Build the client config from the running cluster and force cluster mode
        tezConf = new TezConfiguration(miniTezCluster.getConfig());
        tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, false);

        tezClient = TezClient.create(
                "TestNumberPipelineClient", tezConf);
        tezClient.start();
    }

    @AfterClass
    public static void tearDownClass() throws Exception {
        // Stop the client before the cluster it talks to
        if (tezClient != null) {
            tezClient.stop();
        }
        if (miniTezCluster != null) {
            miniTezCluster.stop();
        }
    }

    @Test(timeout = 120_000)
    public void testNumberPipelineTotalSum() throws Exception {
        // Build the Level 1 DAG against the mini-cluster configuration
        DAG dag = NumberPipelineDAG.buildDAG(tezConf);

        DAGClient dagClient = tezClient.submitDAG(dag);
        // Request counters explicitly: a status fetched without GET_COUNTERS
        // carries no counters, so getDAGCounters() would return null.
        DAGStatus dagStatus = dagClient.waitForCompletionWithStatusUpdates(
                EnumSet.of(StatusGetOpts.GET_COUNTERS));

        assertEquals("DAG must succeed",
                DAGStatus.State.SUCCEEDED, dagStatus.getState());

        TezCounters counters = dagStatus.getDAGCounters();
        assertNotNull("Counters must be present", counters);

        long totalSum = counters
                .getGroup("NumberPipeline")
                .findCounter("TotalSum")
                .getValue();

        assertEquals("TotalSum for 0..99 must equal 4950", 4950L, totalSum);
    }
}

Adapting NumberPipelineDAG: the Level 1 project is designed for local mode (TezConfiguration.TEZ_LOCAL_MODE = true). You will need to either (a) add a static buildDAG(TezConfiguration conf) factory method that accepts an external config, or (b) create a subclass that overrides the DAG construction to accept an injected config. Choose (a).
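
The expected value in the assertion can be cross-checked independently of Tez. This sketch assumes, as the assertion message states, that the pipeline sums the integers 0..99 without transforming them; ExpectedTotalSum is a hypothetical helper and not part of the Level 1 project.

```java
import java.util.stream.LongStream;

/** Hypothetical helper for sanity-checking the expected counter value. */
final class ExpectedTotalSum {

    /** Sum of 0..(n-1) by brute force. */
    static long bySummation(int n) {
        return LongStream.range(0, n).sum();
    }

    /** The same sum via the closed form n(n-1)/2. */
    static long byClosedForm(int n) {
        return (long) n * (n - 1) / 2;
    }
}
```

For n = 100 both methods give 4950. If your Level 1 DAG transforms the values before summing (for example by doubling them), adjust the expected constant accordingly.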


Step 3 — Verify the Build

cd book/projects
mvn -pl level-1-number-pipeline install -DskipTests -q
mvn -pl level-5-integration-test test -q 2>&1 | tail -20

Expected:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Step 4 — Deep Questions

# | Question
1 | Why does the test use dagStatus.getDAGCounters() instead of dagClient.getDAGStatus(EnumSet.of(StatusGetOpts.GET_COUNTERS))? Are they equivalent?
2 | The timeout is 120_000 ms. Why does a simple 100-integer DAG need 2 minutes?
3 | If the DAG fails, dagStatus.getState() returns FAILED and the assertion fires. How would you get the failure reason from dagStatus?
4 | @BeforeClass uses static fields. What happens if two test classes in the same JVM both start MiniTezCluster? How does TestOrderedWordCount handle this?
5 | The counter group is "NumberPipeline" and the counter name is "TotalSum". If you mistype the group name, what does getGroup() return? Does the assertion fail gracefully?

Step 5 — Experiment: Add a Second Assertion

After verifying TotalSum, add an assertion on the number of input records processed (this needs an import of org.apache.tez.common.counters.TaskCounter):

long inputRecords = counters
        .findCounter(TaskCounter.INPUT_RECORDS_PROCESSED)
        .getValue();
// How many input records do you expect?
assertEquals(???, inputRecords);

Think about the DAG topology:

  • Source vertex: 1 task, emits 100 integers
  • Sink vertex: 1 task, reads 100 records

What value do you expect for INPUT_RECORDS_PROCESSED across both vertices?


Step 6 — Tez Source Connection Table

Class used in this lab | Tez source file
MiniTezCluster |
TezClient |
DAGClient |
DAGStatus |
TezCounters |

Lab 5.4 — Fix It: Un-Ignore a Flaky Test in TestVertexImpl

Lab type: Fix-It — flaky test investigation and repair
Estimated time: 90 min
Tez module: tez-dag
Key class: TestVertexImpl


Overview

Large Java projects accumulate @Ignored tests that were disabled because they were "flaky" — meaning they passed sometimes and failed other times. A flaky test is almost always a symptom of a real bug: a race condition, an incorrect assertion, or missing test isolation.

In this lab you will:

  1. Find an @Ignored test in TestVertexImpl
  2. Un-ignore it and run it 10 times to characterize the failure
  3. Identify the root cause
  4. Apply the minimum fix
  5. Verify the test passes reliably

Step 1 — Find the Ignored Tests

grep -n "@Ignore\|@Disabled" \
  ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java

Also search across all tez-dag tests:

grep -rn "@Ignore\|@Disabled" ~/tez-src/tez-dag/src/test/java/ | \
  grep -v "^Binary" | head -30

Step 2 — Pick a Target

Select one ignored test. Prefer tests that have a comment explaining why they were ignored — these are the most educational.

Record:

  1. The test method name
  2. The reason given in the @Ignore annotation or nearby comment
  3. Which state transition or feature it is testing

Step 3 — Un-Ignore and Run

Remove the @Ignore annotation. Run the test 10 times:

for i in $(seq 1 10); do
  mvn test -pl tez-dag -Dtest=TestVertexImpl#yourTestName -q 2>&1 | \
    grep -E "PASS|FAIL|ERROR|Tests run" | tail -1
done

Record the pass/fail pattern. Is it:

  • Always failing (deterministic bug)
  • Randomly failing (race condition or timing sensitivity)
  • Always passing (was it already fixed in this version?)

Step 4 — Diagnose the Failure

Read the test carefully. Common flaky-test patterns in Tez state machine tests:

Pattern | Symptom | Fix
AsyncDispatcher not drained before assertion | Assertion fires before event is processed | Use DrainDispatcher instead
Mock returns null for a method that returns a list | NullPointerException in production code | Stub with Collections.emptyList()
Thread.sleep(N) instead of proper synchronization | Fails on slow CI machines | Replace with waitFor() or DrainDispatcher
Leaked state from another test | First run passes, second fails | Verify @Before / @After cleans up completely

Identify which pattern applies.


Step 5 — Apply the Fix

Apply the minimum fix. Options:

Option A — Replace AsyncDispatcher with DrainDispatcher

// Before (flaky):
AsyncDispatcher dispatcher = new AsyncDispatcher();

// After (deterministic):
DrainDispatcher dispatcher = new DrainDispatcher();
dispatcher.register(VertexEventType.class, vertex);
dispatcher.init(conf);
dispatcher.start();
// ... fire events ...
dispatcher.await(); // blocks until queue is empty

Option B — Add missing stub

when(mockContext.getSomeList()).thenReturn(Collections.emptyList());

Option C — Fix assertion order

Move assertions AFTER the dispatcher.await() call.
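
Options A and C rely on the same idea: the test thread must block until the dispatcher has processed everything queued so far. The sketch below shows that idea with plain java.util.concurrent. DrainableDispatcher is a made-up class for illustration; it is not the DrainDispatcher used by the Tez tests, whose actual implementation you should read for yourself.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Illustrative only: a single-threaded event dispatcher whose queue can be drained. */
final class DrainableDispatcher<E> implements AutoCloseable {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Object lock = new Object();
    private final Thread worker;
    private int pending; // dispatched but not yet fully handled; guarded by lock

    DrainableDispatcher(Consumer<E> handler) {
        worker = new Thread(() -> runLoop(handler), "sketch-dispatcher");
        worker.setDaemon(true);
        worker.start();
    }

    private void runLoop(Consumer<E> handler) {
        try {
            while (true) {
                E event = queue.take();
                try {
                    handler.accept(event);
                } catch (RuntimeException e) {
                    // Sketch simplification: keep draining even if a handler fails.
                } finally {
                    synchronized (lock) {
                        pending--;
                        lock.notifyAll();
                    }
                }
            }
        } catch (InterruptedException e) {
            // close() interrupts the worker thread; exit the loop.
        }
    }

    /** Returns immediately; the event is handled later on the worker thread. */
    void dispatch(E event) {
        synchronized (lock) {
            pending++;
        }
        queue.add(event);
    }

    /** Blocks until every event dispatched so far has been handled. */
    void await() throws InterruptedException {
        synchronized (lock) {
            while (pending > 0) {
                lock.wait();
            }
        }
    }

    @Override
    public void close() {
        worker.interrupt();
    }
}
```

An assertion placed right after dispatch() races with the worker thread and may run before the handler does; the same assertion placed after await() cannot. That ordering is the whole difference between the flaky and the deterministic version of such a test.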


Step 6 — Verify Reliability

Run the test 20 times:

for i in $(seq 1 20); do
  mvn test -pl tez-dag -Dtest=TestVertexImpl#yourTestName -q 2>&1 | \
    grep -E "Tests run" | tail -1
done

All 20 runs must pass.
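
Twenty clean runs are evidence, not proof. Under the idealised assumption that each run fails independently with a fixed probability p, the chance of n consecutive passes is (1 - p)^n, which the following sketch evaluates. FlakyRunMath is a hypothetical helper, and real flakiness is rarely this well behaved (it often depends on machine load), so treat the numbers as a rough guide only.

```java
/** Hypothetical helper: how convincing is a streak of passing runs? */
final class FlakyRunMath {

    /** Probability that n independent runs all pass when each fails with probability p. */
    static double probabilityAllPass(double p, int n) {
        return Math.pow(1.0 - p, n);
    }

    /**
     * Smallest n such that a test with per-run failure probability p
     * would show at least one failure with the given confidence.
     */
    static int runsNeeded(double p, double confidence) {
        if (p <= 0.0 || p >= 1.0 || confidence <= 0.0 || confidence >= 1.0) {
            throw new IllegalArgumentException("p and confidence must be in (0, 1)");
        }
        return (int) Math.ceil(Math.log(1.0 - confidence) / Math.log(1.0 - p));
    }
}
```

For example, a test that still fails 5% of the time passes 20 runs in a row roughly 36% of the time, and under this model it takes 59 runs to expose it with 95% confidence. If the original failure was rare, run the loop more than 20 times, or under CPU load, before claiming the fix.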


Step 7 — Run the Full Suite

mvn test -pl tez-dag -q 2>&1 | tail -10

All existing tests must pass.


Step 8 — Format the Patch and Write the JIRA

cd ~/tez-src
git diff > /tmp/TEZ-FLAKY.001.patch

Draft JIRA:

Summary: TestVertexImpl#[testName] is flaky due to [root cause]

Description:
  TestVertexImpl#[testName] was marked @Ignore with the note "[original reason]".
  Investigation shows the root cause is [description].

  The fix [removes AsyncDispatcher / adds missing stub / fixes assertion order],
  making the test deterministic.

  Ran the test 20 times with the fix applied — all passed.

Priority: Minor
Component: tez-dag

Deeper Understanding

# | Question
1 | What is the difference between AsyncDispatcher and DrainDispatcher? Where is DrainDispatcher defined?
2 | Why is a flaky test arguably worse than no test? What does it do to CI reliability?
3 | Tez's StateMachineFactory is modeled after Hadoop's. Does Hadoop's TestStateMachine use DrainDispatcher or AsyncDispatcher in its tests?
4 | Some Tez flaky tests are caused by System.currentTimeMillis() being called in a tight loop and the assertion depending on a specific elapsed time. How would you make such a test deterministic?

Level 6: Hive/Tez Integration

Hive-on-Tez is the largest consumer of the Tez API. Understanding how Hive translates SQL into a Tez DAG — and what can go wrong — is essential for any contributor who wants to fix real production bugs.

What Hive does with Tez

Every Hive query that runs on Tez goes through this pipeline:

SQL → Hive AST → Operator tree → MapWork/ReduceWork
   → TezWork → Tez DAG (vertices + edges + VertexManagerPlugins)
   → TezClient.submitDAG()

The translation layer lives in the hive-exec module, specifically in TezWork, DagUtils, and TezTask.

Why Tez contributors must understand Hive

  • Most real Tez bugs are first reported from Hive (a slow query, a failing shuffle, a counter discrepancy)
  • ShuffleVertexManager was built specifically for the Hive reduce pattern
  • Hive adds many VertexManagerEvent payloads that Tez must handle correctly
  • Compatibility issues between Hive versions and Tez versions are common release blockers

What this level covers

Topic | Lab
Trace a Hive SQL query to the generated Tez DAG | Lab 6.1
Read DagUtils and understand vertex/edge configuration | Lab 6.1
Debug a failing Hive-on-Tez query (task diagnostics, AM logs) | Lab 6.2
Fix a Hive-Tez compatibility issue via a Tez patch | Lab 6.2

Prerequisites

  • Level 5 complete (you can submit and debug a Tez DAG)
  • Optional but helpful: basic SQL knowledge
  • Optional: Hive source checked out alongside Tez

Key classes

Class | Where | What it does
TezWork | hive-exec | Container for all Tez DAG specifications
DagUtils | hive-exec | Builds Tez DAG from TezWork
TezTask | hive-exec | Executes a TezWork via TezClient
ShuffleVertexManager | tez-dag | Manages reduce-vertex scheduling
OrderedPartitionedKVOutput | tez-runtime-library | Default Hive reduce output

Lab 6.1 — Trace a Hive SQL Query to the Generated Tez DAG

Lab type: Read & Research
Estimated time: 120 min
Key classes: DagUtils, TezWork, TezTask (all in Hive)


Overview

When you run SELECT a, COUNT(*) FROM t GROUP BY a on a Hive-on-Tez cluster, Hive builds a TezWork object (a description of what the DAG should look like) and hands it to DagUtils.createDag(). That method creates the actual Tez DAG, vertices, edges, and VertexManagerPluginDescriptors.

In this lab you will trace this path end-to-end.


Step 1 — Check Out Hive Source (Optional)

To get the Hive source locally (a shallow clone is enough):

git clone https://github.com/apache/hive.git ~/hive-src --depth=1
find ~/hive-src -name "DagUtils.java" | head -3
find ~/hive-src -name "TezWork.java" | head -3
find ~/hive-src -name "TezTask.java" | head -3

If you do not have Hive source, you can read these classes on GitHub:

  • ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
  • ql/src/java/org/apache/hadoop/hive/ql/plan/TezWork.java
  • ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java

Step 2 — Read TezWork.java

TezWork is a directed graph of BaseWork nodes. Answer:

# | Question
1 | What are the two main subclasses of BaseWork that represent map and reduce phases?
2 | How does TezWork represent edges between vertices? What class holds edge configuration?
3 | Where does TezWork store the VertexManagerPluginDescriptor?
4 | A GROUP BY query produces how many BaseWork nodes? Draw the graph.

Step 3 — Read DagUtils.createDag()

This is the core translation method. It iterates over TezWork and calls createVertex() and createEdge().

# | Question
1 | What Tez EdgeProperty.DataMovementType does Hive use for a reduce shuffle? Where is this set?
2 | What VertexManagerPlugin does Hive attach to reduce vertices? Is this set unconditionally or based on a configuration flag?
3 | What is auto-parallelism in this context? How does Hive enable it?
4 | What UserPayload does Hive pass to ShuffleVertexManager? Specifically: what are the values of minFraction and maxFraction?

Step 4 — Read TezTask.execute()

This method submits the DAG and waits for completion.

# | Question
1 | Does TezTask create a new TezClient per query, or reuse one per session?
2 | How does TezTask wait for DAG completion? Which Tez API does it poll?
3 | When a Hive query fails, what information does TezTask extract from the DAGStatus to show the user?
4 | TezTask updates Hive counters from Tez counters. What is the counter group mapping?

Step 5 — Tez Counterpart: ShuffleVertexManager

Open ShuffleVertexManager.java in your Tez source. Cross-reference with what you learned from DagUtils.java:

  1. The minFraction/maxFraction payload you found in Step 3 is parsed by which method in ShuffleVertexManager?
  2. When Hive enables auto-parallelism, what happens inside ShuffleVertexManager that does NOT happen when it is disabled?
  3. Where does ShuffleVertexManager call context.reconfigureVertex()? What does reconfigureVertex do to the number of reducer tasks?

Step 6 — End-to-End Mental Model

Draw (on paper or in a text diagram) the full path for:

SELECT dept, COUNT(*) FROM employees GROUP BY dept

Show:

  • Hive logical plan nodes
  • TezWork graph (label each BaseWork)
  • Tez DAG (label each vertex, edge type, VertexManagerPlugin)
  • Which Tez APIs TezTask calls

Step 7 — JIRA Research: Hive/Tez Compatibility

Search:

project = TEZ AND text ~ "hive" AND resolution = Fixed ORDER BY updated DESC

Find one issue where a Tez change broke Hive or where a Hive bug exposed a Tez issue.

  1. What was the incompatibility?
  2. Was the fix in Tez or Hive (or both)?
  3. Did the patch include a test? If so, where?

Lab 6.2 — Debug a Failed Hive-on-Tez Query

Lab type: Fix-It (diagnostics + root-cause analysis)
Estimated time: 120 min


Overview

A Hive-on-Tez query failure can originate from:

  1. Tez DAG layer — vertex scheduling error, shuffle failure, OOM
  2. Hive operator layer — deserialization error, UDF crash, wrong SerDe
  3. Infra layer — YARN container killed, HDFS quota exceeded, network timeout

In this lab you will work through a systematic diagnostic process and trace a simulated failure back to its Tez-layer root cause.


Scenario

A Hive query:

SELECT k, SUM(v) FROM large_table GROUP BY k;

fails with:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1700000000000_0001_1_01,
diagnostics=[Task failed, taskId=task_1700000000000_0001_1_01_000000,
diagnostics=[TaskAttempt 0 failed, info=[Container container_... exited
with exitCode: -104]]

Exit code -104 means YARN killed the container for exceeding its physical memory limit.
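
When you triage many such failures, it helps to pull the vertex name and exit code out of the diagnostics text mechanically. The sketch below does that with two regular expressions written against the sample message above. TezDiagnosticsParser is a hypothetical helper, the message layout is assumed to match this scenario and may differ between Hive and Tez versions, and the -103 / -104 values are the virtual- and physical-memory kill codes from Hadoop's ContainerExitStatus.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for pulling key fields out of a Hive-on-Tez failure message. */
final class TezDiagnosticsParser {
    private static final Pattern VERTEX = Pattern.compile("vertexName=([^,]+),");
    private static final Pattern EXIT_CODE = Pattern.compile("exitCode:\\s*(-?\\d+)");

    /** Name of the first failed vertex mentioned, e.g. "Reducer 2". */
    static Optional<String> vertexName(String diagnostics) {
        Matcher m = VERTEX.matcher(diagnostics);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** First container exit code mentioned, if any. */
    static Optional<Integer> exitCode(String diagnostics) {
        Matcher m = EXIT_CODE.matcher(diagnostics);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    /** True for YARN's memory-limit kill codes (-103 virtual, -104 physical). */
    static boolean killedForMemory(int exitCode) {
        return exitCode == -103 || exitCode == -104;
    }
}
```

Applied to the message in this scenario, the parser yields the vertex name "Reducer 2" and the exit code -104, which killedForMemory classifies as a memory kill. That is enough to route the failure to the memory-configuration steps later in this lab.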


Step 1 — Identify the Layer

# | Question
1 | Is exit code -104 a Tez error or a YARN error? Where is this code defined?
2 | Which vertex failed: the map or the reduce? How do you know from the diagnostic message?
3 | What Tez API would you call (in Java) to retrieve these diagnostics programmatically?
4 | The error says "TaskAttempt 0 failed". Does this mean no retries happened, or that all retries were exhausted?

Step 2 — Locate the Logs

In a real cluster:

# Get the AM logs
yarn logs -applicationId application_1700000000000_0001 \
  -log_files syslog | grep -A 20 "Reducer 2"

# Get the container logs
yarn logs -applicationId application_1700000000000_0001 \
  -containerId container_... | head -200

Questions:

  1. In the AM logs, what Tez class emits the Task failed message? (Hint: grep for TaskImpl or VertexImpl in the log.)
  2. The container log has a Java OOM or GC log. Where in TaskAttemptImpl does the container exit code get translated to a TaskAttemptEvent?

Step 3 — Identify the Tez Configuration Fix

The reduce vertex ran out of memory. The relevant configuration:

Config key | Default | Description
tez.am.resource.memory.mb | 1024 | AM container memory
tez.task.resource.memory.mb | 1024 | Task container memory
hive.tez.container.size | -1 (inherits from mapred) | Hive override for Tez task memory
hive.auto.convert.join.noconditionaltask.size | 10MB | In-memory join threshold

  1. Which config key should be increased to fix the OOM?
  2. Is this a Tez config or a Hive config? Which system applies it?
  3. Find where tez.task.resource.memory.mb is read in Tez source. In which class and method?

Step 4 — Tez Source Reading: Container Exit Code Handling

Find where Tez handles non-zero container exit codes:

grep -rn "exitCode\|EXIT_CODE\|ContainerExitStatus" \
  ~/tez-src/tez-dag/src/main/java/ | grep -v "test" | head -30

Answer:

  1. What class translates the YARN container exit code into a TaskAttemptEvent?
  2. Is -104 (killed for exceeding physical memory) treated differently from -102 (PREEMPTED) or -100 (ABORTED)?
  3. Does Tez retry a preempted task? What configuration controls the max retries?

Step 5 — Simulate the Fix

In a real system you would increase tez.task.resource.memory.mb and rerun. Since you do not have a Hive cluster, instead:

Find the test in TestTaskAttemptImpl.java that covers container preemption:

grep -n "preempt\|PREEMPT\|exitCode" \
  ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskAttemptImpl.java \
  | head -20

Read the test. Answer:

  1. How does the test simulate a container exit with a non-zero exit code?
  2. What state does TaskAttemptImpl transition to on preemption?
  3. Is there a test for the full retry-until-max-attempts path?

Step 6 — Write a Diagnostic Runbook Entry

Write 5–8 bullet points as a "runbook entry" for this class of failure:

## Hive-on-Tez: Reducer OOM (exit code -104)

**Symptoms:** ...
**Root cause:** ...
**Diagnostic steps:** ...
**Fix:** ...
**Tez classes involved:** ...
**Relevant configuration:** ...

This is the kind of documentation that Tez PMC members write for operators.


JIRA Research

Search for Tez issues related to container OOM or preemption handling:

project = TEZ AND text ~ "preempt OR oom OR out of memory" AND resolution = Fixed

Find one. Read the patch. Was the fix in TaskAttemptImpl, in configuration defaults, or in a different class?

Level 7: Runtime and Shuffle

The Tez shuffle layer is the most performance-critical and most bug-prone part of the runtime. Understanding it is required for diagnosing slow queries, data-skew issues, and shuffle fetch failures.

How shuffle works in Tez

Map task → OrderedPartitionedKVOutput → TezIndexRecord (index + data files)
                                         ↓
                              ShuffleHandler (HTTP server in NM)
                                         ↓
Reduce task → OrderedGroupedKVInput ← shuffle fetcher threads
                                         ↓
                                    merge + sort → processor

Key insight: unlike Hadoop MapReduce's ShuffleConsumerPlugin, Tez's shuffle is split into framework code (tez-runtime-library) and user code (Processor). The processor never sees unsorted records — sorting happens in the runtime layer.
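The division of labor described above can be illustrated with a toy model in plain Java. This is a minimal sketch, not Tez code: `ShuffleSketch`, `partitionFor`, `sortedRun`, and `mergeRuns` are hypothetical names, keys are bare integers, and hash partitioning is assumed only for illustration. It shows why the consumer of the merged stream never sees unsorted records: each map-side run is sorted before it leaves the producer, and the reduce side only merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of partition -> sort -> merge; no Tez classes are involved. */
final class ShuffleSketch {

    /** Hash partitioning: the same key always maps to the same reducer. */
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    /** "Map side": keep only the keys of one partition and sort them. */
    static List<Integer> sortedRun(List<Integer> keys, int partition, int numPartitions) {
        List<Integer> run = new ArrayList<>();
        for (int k : keys) {
            if (partitionFor(k, numPartitions) == partition) {
                run.add(k);
            }
        }
        run.sort(Comparator.naturalOrder());
        return run;
    }

    /** "Reduce side": k-way merge of already-sorted runs fetched from several maps. */
    static List<Integer> mergeRuns(List<List<Integer>> runs) {
        // Each heap entry is {value, runIndex, positionInRun}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> mapA = Arrays.asList(9, 4, 7, 2);
        List<Integer> mapB = Arrays.asList(8, 1, 6, 3);
        // Reducer 0 of 2 fetches its partition from both map outputs.
        List<List<Integer>> runs =
            Arrays.asList(sortedRun(mapA, 0, 2), sortedRun(mapB, 0, 2));
        System.out.println(mergeRuns(runs)); // prints [2, 4, 6, 8]
    }
}
```

When you read the real classes, look for the same two responsibilities: where the output side sorts within a partition, and where the input side merges runs from many producers.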

What this level covers

| Topic | Lab |
|---|---|
| Trace shuffle fetch failure from AM logs to root cause | Lab 7.1 |
| Modify a processor that reads from OrderedGroupedKVInput | Lab 7.2 |

Key classes

| Class | Where | What it does |
|---|---|---|
| OrderedPartitionedKVOutput | tez-runtime-library | Map output: partition + sort + spill |
| OrderedGroupedKVInput | tez-runtime-library | Reduce input: fetch + merge + sort |
| ShuffleFetch / Fetcher | tez-runtime-library | HTTP fetch from ShuffleHandler |
| MergeManager | tez-runtime-library | In-memory and on-disk merge |
| ShuffleHandler | tez-plugins/tez-aux-services | Netty HTTP server serving map output |
| TezIndexRecord | tez-runtime-library | Per-partition offset+length in output file |
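The "per-partition offset+length" idea can be sketched without any Tez dependency. The block below is an assumption-laden toy: `IndexSketch`, `Entry`, `buildIndex`, and `readPartition` are hypothetical names, the "file" is an in-memory byte array, and the field names are illustrative rather than copied from TezIndexRecord. It shows how an index lets a server hand one reducer exactly its slice of a map output file.

```java
import java.util.Arrays;

/** Toy per-partition index; names are illustrative, not taken from Tez. */
final class IndexSketch {

    /** One index entry: where a partition's bytes start and how many there are. */
    static final class Entry {
        final long startOffset;
        final long length;

        Entry(long startOffset, long length) {
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /** Build the index for partitions laid out back to back in one output file. */
    static Entry[] buildIndex(long[] partitionLengths) {
        Entry[] index = new Entry[partitionLengths.length];
        long offset = 0;
        for (int p = 0; p < partitionLengths.length; p++) {
            index[p] = new Entry(offset, partitionLengths[p]);
            offset += partitionLengths[p];
        }
        return index;
    }

    /** What a server does for one fetch: slice exactly one partition out of the file. */
    static byte[] readPartition(byte[] file, Entry[] index, int partition) {
        Entry e = index[partition];
        int from = Math.toIntExact(e.startOffset);
        int to = Math.toIntExact(e.startOffset + e.length);
        return Arrays.copyOfRange(file, from, to);
    }

    public static void main(String[] args) {
        byte[] file = {10, 11, 20, 21, 22, 30};
        Entry[] index = buildIndex(new long[] {2, 3, 1});
        System.out.println(Arrays.toString(readPartition(file, index, 1))); // prints [20, 21, 22]
    }
}
```

In Lab 7.1, compare this with what the real index record stores; the real format may carry more than an offset and one length.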

Lab 7.1 — Debug Shuffle Behavior

Lab type: Read & Research
Estimated time: 120 min
Tez module: tez-runtime-library


Overview

Shuffle failures are the most common source of Tez bug reports. They manifest as FetchFailure events, IOException during map-output reads, or hung reduce tasks. In this lab you will trace the complete shuffle path from log line to source code.


Step 1 — Locate the Core Classes

find ~/tez-src/tez-runtime-library -name "*.java" | xargs grep -l "FetchFailure\|Fetcher\|ShuffleHandler" | head -10
find ~/tez-src -name "ShuffleHandler*.java" | head -10

Step 2 — Read the Shuffle Fetch Path

Open Fetcher.java (in tez-runtime-library) and trace the fetch loop:

| # | Question |
|---|---|
| 1 | What HTTP method does the Fetcher use to request map output? GET or POST? |
| 2 | What is the URL format it sends to ShuffleHandler? What parameters does it include? |
| 3 | If the HTTP response code is 404, what does the Fetcher do? (Fail immediately? Retry? Report back to the InputManager?) |
| 4 | What does the Fetcher do when it detects data corruption (checksum mismatch)? Which class handles checksum verification? |
| 5 | How many concurrent fetcher threads does a reduce task run? What configuration key controls this? |

Step 3 — Read the FetchFailure Event Path

When a fetch fails, an event travels up to the AM:

grep -rn "FetchFailure\|FETCH_FAILURE" ~/tez-src/tez-dag/src/main/java/ | \
  grep -v "test" | grep ".java:" | head -20

Trace: where does the FetchFailure event originate, and what state transition does it trigger in TaskAttemptImpl?

| # | Question |
|---|---|
| 1 | What is the name of the event class that carries the fetch-failure information to the AM? |
| 2 | In TaskAttemptImpl, what state does the task transition to when it receives a fetch failure? |
| 3 | Does a single fetch failure kill the task, or does Tez retry? What configuration controls max fetch retries? |
| 4 | What happens to the source task attempt (the map) when its output cannot be fetched? Is it re-run? |

Step 4 — Read ShuffleHandler

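Open ShuffleHandler.java (find ~/tez-src -name "ShuffleHandler.java" locates it; in recent source trees it sits under tez-plugins/tez-aux-services):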

| # | Question |
|---|---|
| 1 | What Netty class does ShuffleHandler extend? |
| 2 | How does ShuffleHandler authenticate that a requester is authorized to fetch map output? (Hint: look for TOKEN or JobTokenSecretManager.) |
| 3 | Where does ShuffleHandler read the index file? What class represents the index? |
| 4 | If the NM restarts while a reduce is fetching, what happens to in-flight fetch requests? |

Step 5 — Read the Spill Path

Open DefaultSorter.java or PipelinedSorter.java in tez-runtime-library:

  1. At what memory threshold does a spill occur?
  2. How many spill files can accumulate before a merge is triggered?
  3. After a spill, where is the index written?

Step 6 — Common Shuffle Bug Patterns

For each pattern below, identify the relevant Tez class and the configuration that can mitigate it:

| Pattern | Class | Config key |
|---|---|---|
| Slow fetch due to too few fetcher threads | | |
| OOM in reducer due to large in-memory merge buffer | | |
| Fetch failure due to ShuffleHandler authentication timeout | | |
| Data skew: one reducer processes 100× more data than others | | |

Step 7 — JIRA Research

Search:

project = TEZ AND component = "tez-runtime-library" AND resolution = Fixed ORDER BY updated DESC

Find a recently fixed shuffle or sort bug. Read the patch:

  1. What was the bug?
  2. Was it in Fetcher, DefaultSorter, MergeManager, or ShuffleHandler?
  3. Was a test added? What does it mock or simulate?

Lab 7.2 — Modify a Processor: Add Deduplication to UnionSinkProcessor

Lab type: Fix-It / Extend
Estimated time: 90 min
Maven module: book/projects/level-3-multi-input


Overview

UnionSinkProcessor from Level 3 sums all values it receives. In this lab you will extend it to deduplicate records by key before summing — only the first record for each key is counted.

This exercise teaches:

  • How to modify a Processor that uses OrderedGroupedKVInput
  • How counters interact with deduplication logic
  • How to write a unit test for processor logic using mocks

Step 1 — Understand the Current Behavior

The current UnionSinkProcessor (Level 3) receives (Integer key, Integer value) pairs and sums all values. For the test input (0..99 integers), expected sum is 4950.

Open UnionSinkProcessor.java and answer:

  1. How does it iterate over input records?
  2. Where does it write the counter?
  3. What happens if the same key appears twice, for example (key=5, value=5) arriving from both sources? Can that actually happen? Check EvenNumberSource and OddNumberSource before answering.

Step 2 — Add a Deduplicating Variant

Create DeduplicatingUnionSinkProcessor.java in the same package. It should:

  1. Maintain a Set<Integer> of seen keys
  2. For each (key, value) pair from the input: if key is new, add to set and add value to the sum; otherwise skip
  3. Publish the same UnionPipeline/TotalSum counter
  4. Also publish a new counter UnionPipeline/DuplicatesSkipped
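The core rule from the list above can be prototyped in plain Java before wiring it into a Tez Processor. This is a minimal sketch under stated assumptions: `DedupSketch` and `accept` are hypothetical names, the two fields stand in for the TotalSum and DuplicatesSkipped counters, and reading from the Tez input and publishing counters are deliberately left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Plain-Java sketch of the dedup rule; Tez input and counter APIs are left out. */
final class DedupSketch {
    long totalSum;           // stands in for UnionPipeline/TotalSum
    long duplicatesSkipped;  // stands in for UnionPipeline/DuplicatesSkipped
    private final Set<Integer> seen = new HashSet<>();

    /** Count the value only the first time a key is seen. */
    void accept(int key, int value) {
        if (seen.add(key)) {   // add() returns false if the key was already present
            totalSum += value;
        } else {
            duplicatesSkipped++;
        }
    }

    public static void main(String[] args) {
        DedupSketch d = new DedupSketch();
        d.accept(1, 10);
        d.accept(1, 10);
        d.accept(2, 20);
        System.out.println(d.totalSum + " " + d.duplicatesSkipped); // prints 30 1
    }
}
```

Using the return value of `Set.add` avoids a separate `contains` call, so each record costs one hash lookup.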

Step 3 — Write a Unit Test

Create TestDeduplicatingUnionSinkProcessor.java. Use the Mockito pattern from TestMultiInputProcessors:

@Test
public void testDuplicateKeyIsSkippedOnce() {
    // Create a mock input that returns (key=1, value=10) twice
    // and (key=2, value=20) once
    // Expected TotalSum: 10 + 20 = 30
    // Expected DuplicatesSkipped: 1
}

@Test
public void testAllUniqueKeys() {
    // No duplicates: result must equal non-deduplicating sum
}

Step 4 — Run the Tests

cd book/projects
mvn -pl level-3-multi-input test -q 2>&1 | tail -10

Step 5 — Questions

| # | Question |
|---|---|
| 1 | If your deduplication Set grows very large (millions of keys), what would happen to the task JVM heap? |
| 2 | The input is already sorted by key (because OrderedGroupedKVInput sorts). Could you use this property to deduplicate without a Set? Rewrite DeduplicatingUnionSinkProcessor to use O(1) memory. |
| 3 | Your new counter UnionPipeline/DuplicatesSkipped: where in the Tez framework does it get propagated to the AM and eventually to DAGStatus.getDAGCounters()? |
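If you get stuck on question 2, the idea can be checked in isolation first. The sketch below is one possible shape, assuming integer keys that arrive in non-decreasing order; `SortedDedupSketch` and `accept` are hypothetical names and nothing here touches the Tez reader API. It remembers only the previous key, so memory use does not grow with the number of distinct keys.

```java
/** Sketch: deduplicate a key-sorted stream while remembering only the previous key. */
final class SortedDedupSketch {
    long totalSum;
    long duplicatesSkipped;
    private boolean hasPrevious;
    private int previousKey;

    /** Callers must feed keys in non-decreasing order. */
    void accept(int key, int value) {
        if (hasPrevious && key < previousKey) {
            throw new IllegalStateException("input is not sorted by key");
        }
        if (hasPrevious && key == previousKey) {
            duplicatesSkipped++;       // same key as the record before: skip it
        } else {
            totalSum += value;         // first record of a new key
            previousKey = key;
            hasPrevious = true;
        }
    }

    public static void main(String[] args) {
        SortedDedupSketch d = new SortedDedupSketch();
        int[][] records = {{1, 10}, {1, 10}, {2, 20}, {3, 5}, {3, 7}};
        for (int[] r : records) {
            d.accept(r[0], r[1]);
        }
        System.out.println(d.totalSum + " " + d.duplicatesSkipped); // prints 35 2
    }
}
```

The explicit sortedness check is a design choice for the sketch: it turns a silent wrong answer into a loud failure if the ordering assumption is ever violated.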

Level 8: Real Issue Contribution

This level is the transition from learner to contributor. You will pick a real open JIRA issue, reproduce it, write a patch, and go through the Apache contribution process from start to submission.

The Apache contribution loop

1. Pick an issue (JIRA)          → identify something you can fix
2. Understand the context         → read related code, existing tests, comments
3. Reproduce the bug              → write a failing test or reproduce steps
4. Implement the fix              → minimum change that passes all tests
5. Format the patch               → `git diff > TEZ-NNNN.001.patch`
6. Upload to JIRA                 → attach the patch, set status to "Patch Available"
7. Respond to review comments     → iterate, upload TEZ-NNNN.002.patch etc.
8. Patch committed                → a committer votes +1 and commits

Choosing the right issue

Good first contributions:

| Type | Difficulty | Acceptance rate |
|---|---|---|
| Missing test coverage | Low | High |
| Wrong error message | Low | High |
| Javadoc improvement | Low | High |
| Logging improvement | Low | High |
| NPE in edge case | Medium | High |
| Performance regression (small) | Medium | Medium |
| New feature | High | Low (needs design discussion first) |

Rule: Start with "Minor" or "Trivial" priority JIRAs. Do not attempt "Blocker" or "Critical" until you have 3+ committed patches.

What this level covers

| Topic | Lab |
|---|---|
| Find and reproduce a real open JIRA issue | Lab 8.1 |
| Implement a fix, write the test, format the patch | Lab 8.2 |
| Write better error messages for failed DAGs | Lab 8.3 |

Lab 8.1 — Find and Reproduce a Real JIRA Issue

Lab type: Research & Reproduce
Estimated time: 2–4 hours (actual time varies by issue)


Step 1 — Find a Good Candidate

Go to: https://issues.apache.org/jira/projects/TEZ

Filter:

  • Status: Open
  • Priority: Minor or Trivial
  • Component: tez-dag or tez-runtime-library
  • Resolution: Unresolved

Look for issues with:

  • A small reproduction case described in comments
  • No existing "Patch Available" attachment
  • Last comment less than 1 year old

Step 2 — Read Everything

For your chosen issue, read:

  1. The original description
  2. Every comment (some comments contain critical reproduction steps)
  3. Any attached patches (even if they were rejected — understand why)
  4. Related issues in the "is blocked by" / "depends on" links

Answer for your issue:

| # | Question |
|---|---|
| 1 | What is the exact symptom? (Exception? Wrong result? Performance regression?) |
| 2 | Which Tez class is implicated? Which method? |
| 3 | Under what conditions does the bug occur? |
| 4 | Is there a unit test that would catch this if it existed? |

Step 3 — Reproduce the Bug

For a unit-test-reproducible bug:

cd ~/tez-src
# Write a test that fails
mvn test -pl tez-dag -Dtest=TestVertexImpl#testMyReproduction -q 2>&1 | tail -20

For a configuration-dependent bug, write a minimal local-mode DAG that triggers it.

Record:

  • The exact exception and stack trace
  • Which class and line number triggers it
  • Whether it is deterministic or intermittent

Step 4 — Map the Root Cause

Trace from the symptom to the line of code that is wrong:

  1. Start with the exception message
  2. Find the throw site in source code
  3. Walk backwards through the call stack
  4. Identify the single line that is wrong (the real fix site is often 10 lines above the throw site)

Step 5 — Verify Your Understanding

Post a comment on the JIRA (be professional and concise):

I was able to reproduce this issue on Tez trunk (commit <hash>) with the
following minimal test case:
[paste test code or reproduction steps]

The root cause appears to be [one sentence description] at
[ClassName.java:line].  I am working on a patch.

This establishes you as working on the issue and prevents duplicate work.


Questions

  1. How long did it take you to go from "reading the JIRA" to "reproducing the bug"?
  2. Was the root cause where you expected it based on the stack trace, or did you have to trace further?
  3. Is there a comment in the code near the bug site that explains the intended behavior? Was the comment wrong?

Lab 8.2 — Implement the Fix, Write the Test, Format the Patch

Lab type: Fix-It (real JIRA)
Estimated time: 2–6 hours


Step 1 — Implement the Minimum Fix

Rules:

  • Change only what is necessary to fix the bug
  • Do not reformat surrounding code
  • Do not add unrelated improvements
  • Do not add comments unless they explain the fix
  • If the fix requires changes in multiple files, make all changes in one commit

Step 2 — Write the Test

Every Tez patch must include a test that:

  1. Fails on the original code (without the fix)
  2. Passes on the patched code

The test must be in the same test class as existing tests for the modified class.

Test quality checklist:

  • Test name clearly describes what it is testing
  • @Test(timeout = 5000) annotation (prevents hung tests from blocking CI)
  • No Thread.sleep() (use DrainDispatcher.await() or CountDownLatch instead)
  • Assertion messages explain what was expected vs. what was found
  • No hardcoded absolute paths or ports
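The "no Thread.sleep()" item is easiest to see with a small JDK-only example. The sketch below is not a Tez test: `LatchSketch` and `runAndAwait` are hypothetical names, and the worker thread stands in for whatever asynchronous component your test drives. It waits on a CountDownLatch with an upper bound instead of sleeping for a guessed duration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch: wait for asynchronous work with a latch instead of a fixed sleep. */
final class LatchSketch {

    /** Runs work on another thread; returns true if it finished within the timeout. */
    static boolean runAndAwait(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown();   // signal completion even if work throws
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            // Returns as soon as the worker finishes; false means the timeout expired.
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int[] result = new int[1];
        boolean finished = runAndAwait(() -> result[0] = 42, 5000);
        System.out.println(finished + " " + result[0]); // prints true 42
    }
}
```

A sleep-based test is either slow (sleep too long) or flaky (sleep too short). The latch returns as soon as the work is done, and the timeout only matters when something is actually broken.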

Step 3 — Run the Full Test Module

cd ~/tez-src
mvn test -pl tez-dag -q 2>&1 | tail -10

All tests must pass.


Step 4 — Run Checkstyle

mvn checkstyle:check -pl tez-dag -q 2>&1 | grep -E "ERROR|violation" | head -20

Zero violations required.


Step 5 — Format the Patch

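cd ~/tez-src
# git diff HEAD skips untracked files, so `git add` any new test class first
git diff HEAD > /tmp/TEZ-NNNN.001.patch

Verify:

# No trailing whitespace on added lines (prints nothing when clean)
grep -nE '^\+.*[[:space:]]+$' /tmp/TEZ-NNNN.001.patch

# The changes are already applied in this working tree, so check that the
# patch reverses cleanly; a plain --check would fail here
git apply --check --reverse /tmp/TEZ-NNNN.001.patch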
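The trailing-whitespace check can also be expressed as a small program, which makes the rule explicit: only lines the patch adds are inspected, and the `+++` file header is ignored. This is a sketch with hypothetical names (`PatchWhitespaceSketch`, `addedLinesWithTrailingWhitespace`), and it takes the patch as a list of strings rather than reading a file. One known limitation: an added line whose own content starts with `++` would be mistaken for a header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: find added patch lines that end in a space or tab. */
final class PatchWhitespaceSketch {

    /** Returns 1-based line numbers of added lines with trailing whitespace. */
    static List<Integer> addedLinesWithTrailingWhitespace(List<String> patchLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < patchLines.size(); i++) {
            String line = patchLines.get(i);
            // "+..." is an added line; "+++ b/File" is the file header, not content.
            boolean added = line.startsWith("+") && !line.startsWith("+++");
            if (added && (line.endsWith(" ") || line.endsWith("\t"))) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> patch = Arrays.asList(
            "--- a/Foo.java",
            "+++ b/Foo.java",
            "@@ -1,2 +1,3 @@",
            " ",               // empty context line: a lone space, not a violation
            "+int x = 1; ",    // added line with a trailing space
            "+int y = 2;");
        System.out.println(addedLinesWithTrailingWhitespace(patch)); // prints [5]
    }
}
```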

Step 6 — Upload to JIRA

  1. Open your JIRA issue
  2. Click "Attach File" and upload /tmp/TEZ-NNNN.001.patch
  3. Click "Submit Patch" so that the issue status changes to "Patch Available"
  4. Update the description or add a comment:
Attaching patch TEZ-NNNN.001.patch

Changes:
- [ClassName.java]: [one-line description of the fix]
- [TestClassName.java]: [one-line description of the new test]

The test fails on unpatched code and passes with the fix applied.

After You Submit

You will typically receive review feedback within a few days to a few weeks. Common feedback categories:

| Feedback | Meaning | Your response |
|---|---|---|
| "Can you add a test?" | Test is missing | Add test, re-upload |
| "This is too broad" | Change is larger than needed | Narrow scope, re-upload |
| "Style nit: …" | Checkstyle or code style | Fix, re-upload |
| "+1" | Committer approves | Wait for commit, or ask "Is this ready to commit?" |
| "-1" | Hard block | Address all -1 comments before re-uploading |

Lab 8.3 — Improve Error Messages for Failed DAGs

Lab type: Fix-It (error message quality)
Estimated time: 90 min


Overview

Poor error messages are one of the most common complaints from Tez users. "Container exited with a non-zero exit code" tells an operator almost nothing. This lab focuses on finding and improving a diagnostic message in the Tez AM.


Step 1 — Find Weak Error Messages

Search for generic or unhelpful diagnostics:

grep -rn '"Container exited\|"Task failed\|"Vertex failed\|unknown error' \
  ~/tez-src/tez-dag/src/main/java/ | grep -v test | head -20

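Also look for messages built by concatenating a value that may be null. This grep only catches a literal null next to a +; fields that can be null at runtime have to be found by reading the code:

grep -rnE 'diagnostics.*\+.*null|null.*\+.*diagnostics' \
  ~/tez-src/tez-dag/src/main/java/ | head -20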

Step 2 — Pick a Target

Select one diagnostic message that you can improve. Good candidates:

  • A message that says "failed" without explaining why
  • A message that could NPE if a field is null
  • A message that uses a raw integer code without a human-readable explanation

Step 3 — Understand the Context

For your chosen message:

  1. What class emits it?
  2. What state transition triggers it?
  3. What information is available at that point (in the method parameters or fields) that could be added to the message?

Step 4 — Improve the Message

Example improvement:

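// Before (unhelpful):
diagnostics.add("Container " + containerId + " failed");

// After (actionable):
// exitCodeName(...) is a hypothetical helper that maps the integer to a name
// such as KILLED_EXCEEDED_PMEM; YARN's ContainerExitStatus defines the
// integer constants but, as far as we know, no name-lookup method.
diagnostics.add(String.format(
    "Container %s failed with exit code %d (%s). " +
    "Check container logs at: %s",
    containerId,
    exitCode,
    exitCodeName(exitCode),
    logURL));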
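The step that turns a raw exit code into a readable name can be sketched on its own. The block below is an illustration, not Tez or YARN code: `ExitCodeSketch`, `describe`, and `diagnostic` are hypothetical names, and the integer values are copied from YARN's ContainerExitStatus constants as we understand them, so verify them against the Hadoop version you build with. In a real patch you would reference the constants rather than repeat the numbers.

```java
/** Sketch: turn a YARN container exit status into an actionable diagnostic. */
final class ExitCodeSketch {

    /** Human-readable name for a subset of YARN container exit statuses. */
    static String describe(int exitCode) {
        switch (exitCode) {
            case 0:    return "SUCCESS";
            case -100: return "ABORTED";
            case -101: return "DISKS_FAILED";
            case -102: return "PREEMPTED";
            case -103: return "KILLED_EXCEEDED_VMEM";
            case -104: return "KILLED_EXCEEDED_PMEM";
            default:   return "UNRECOGNIZED";
        }
    }

    /** Builds the message; tolerates a missing log URL instead of printing "null". */
    static String diagnostic(String containerId, int exitCode, String logUrl) {
        String logs = (logUrl == null) ? "<log URL unavailable>" : logUrl;
        return String.format(
            "Container %s failed with exit code %d (%s). Check container logs at: %s",
            containerId, exitCode, describe(exitCode), logs);
    }

    public static void main(String[] args) {
        System.out.println(diagnostic("container_01", -104, null));
    }
}
```

Handling the null log URL explicitly addresses the second bullet in Step 2: the message stays readable even when one of its inputs is missing.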

Step 5 — Write a Test for the New Message

The test should verify that:

  1. The improved message appears in TaskAttemptImpl.getDiagnostics() or VertexImpl.getDiagnostics() after the relevant failure event
  2. It contains the expected key fields (exit code, container ID, etc.)

Pattern:

@Test
public void testDiagnosticsContainsExitCode() {
    // ... set up failing task attempt with specific exit code ...
    List<String> diags = taskAttempt.getDiagnostics();
    assertTrue("Diagnostics should contain exit code",
        diags.stream().anyMatch(d -> d.contains("exit code 123")));
}

Step 6 — Format Patch and JIRA

git diff > /tmp/TEZ-ERRORMSG.001.patch

JIRA title pattern: [tez-dag] Improve error message for [specific failure scenario]


Reflection Questions

  1. What makes a good diagnostic message? List 4 properties.
  2. Why do projects accumulate bad error messages over time? (Hint: think about who writes the code vs. who runs it.)
  3. Find a Tez JIRA where the only change was improving a log or diagnostic message. Was the patch accepted? How long did the review take?

Level 9: Advanced Committer / PMC-Level Contributor

At this level you move beyond fixing bugs into shaping the project: writing performance-critical tests, analyzing regressions, participating in design discussions, and understanding how Apache governance works.

The committer path

Contributor → trusted contributor (10+ accepted patches)
           → committer candidate (PMC votes)
           → committer (can merge patches)
           → PMC member (vote on releases and project direction)

Becoming a committer is about demonstrated judgment — not just writing correct code, but consistently:

  • Choosing the minimum-impact fix over the clever refactor
  • Writing tests that catch real bugs, not just satisfy coverage metrics
  • Reviewing others' patches with constructive, specific feedback
  • Following up on issues you reported or started

What this level covers

| Topic | Lab |
|---|---|
| Write comprehensive scheduler behavior tests | Lab 9.1 |
| Analyze and quantify a performance regression | Lab 9.2 |

Lab 9.1 — Write Tests for Scheduler Behavior

Lab type: Build It — comprehensive test coverage
Estimated time: 3–4 hours
Tez module: tez-dag


Overview

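The Tez task scheduler layer manages how containers are requested from YARN and how pending tasks are assigned to available containers. In the AM it is driven by TaskSchedulerEventHandler (renamed TaskSchedulerManager in later releases), which sits on top of implementations such as YarnTaskSchedulerService and LocalTaskSchedulerService. CapacityScheduler and FairScheduler are YARN ResourceManager schedulers, not Tez classes.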

This is one of the least-tested areas of Tez. Well-written scheduler tests are highly valued by committers.


Step 1 — Understand the Scheduler Interface

find ~/tez-src -name "TaskScheduler.java" | head -3
find ~/tez-src \( -name "TaskSchedulerEventHandler.java" -o -name "TaskSchedulerManager.java" \) | head -3
find ~/tez-src -name "TestTaskScheduler*.java" | head -10

Open the scheduler interface and answer:

| # | Question |
|---|---|
| 1 | What events does TaskSchedulerEventHandler process? List all event types. |
| 2 | When a container becomes available, what is the algorithm for choosing which task to assign to it? |
| 3 | When Tez requests a container from YARN, what resource profile does it request? (CPU + memory?) |
| 4 | If YARN preempts a container, what does the scheduler do to the task that was running in it? |

Step 2 — Identify Missing Coverage

grep -n "public void test" \
  ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestTaskSchedulerEventHandler.java \
  | head -30

Find 3 scenarios that are NOT covered by existing tests. Good candidates:

  • Container allocation after task is cancelled (race condition scenario)
  • Scheduling under resource pressure (all containers allocated, new task arrives)
  • Task scheduled to a blacklisted node

Step 3 — Write 3 New Tests

For each missing scenario, write a test following the pattern of the existing tests. Each test must:

  1. Set up the scheduler with a mock RMCommunicator and DAGAppMaster
  2. Drive a sequence of events
  3. Assert on the scheduler's resulting state and on calls made to the mock YARN RM
@Test(timeout = 5000)
public void testTaskScheduledAfterContainerPreempted() {
    // TODO: set up scheduler with 1 running container
    // TODO: simulate YARN preemption of that container
    // TODO: verify the task is re-queued (not dropped)
    // TODO: simulate new container allocation
    // TODO: verify the task is re-scheduled to the new container
}

Step 4 — Run and Verify

mvn test -pl tez-dag -Dtest=TestTaskSchedulerEventHandler -q 2>&1 | tail -10

Step 5 — Reflection

| # | Question |
|---|---|
| 1 | The test uses mocks for YARN and the DAGAppMaster. What real behavior is NOT exercised by this approach? |
| 2 | A scheduler has inherently concurrent behavior. How do the existing tests handle thread safety? |
| 3 | If you were to write an integration test for the scheduler (using MiniTezCluster), what would be harder to set up than in a unit test? What would be easier to assert? |

Lab 9.2 — Analyze a Performance Regression

Lab type: Research & Benchmark
Estimated time: 3–4 hours


Overview

Performance regressions are among the most impactful bugs in Tez — a 10% slowdown in shuffle can translate to significant cost at scale. But they are also the hardest to reproduce and fix.

In this lab you will:

  1. Identify a performance-sensitive code path
  2. Write a micro-benchmark using JMH
  3. Compare two implementations and quantify the difference
  4. Write a JIRA with a clear, reproducible performance report

Step 1 — Identify a Hot Path

The most performance-critical paths in Tez:

| Path | Class | Why it matters |
|---|---|---|
| Record serialization | TezSerializer, WritableSerialization | Called once per record |
| Sort buffer writes | DefaultSorter.collect() | Called once per output record |
| Shuffle URL construction | Fetcher.getFetchList() | Called per fetch request |
| Counter increment | TezCounter.increment() | Called very frequently |
| BitSet operations | VertexManagerPlugin.onSourceTaskCompleted | Called per task completion |

Step 2 — Add JMH Dependencies

For a quick JMH benchmark within the project:

<!-- Add to level-4-waving-manager/pom.xml if you want to benchmark BitSet -->
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-core</artifactId>
  <version>1.37</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.openjdk.jmh</groupId>
  <artifactId>jmh-generator-annprocess</artifactId>
  <version>1.37</version>
  <scope>test</scope>
</dependency>

Step 3 — Write the Benchmark

Example: compare BitSet.andNot(clone) vs re-building the set from scratch:

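import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class WavingBenchmark {
    private BitSet scheduledBits, finishedBits;
    private Set<Integer> scheduledSet, finishedSet;

    // Build inputs once per trial so allocation is not part of the measurement.
    // Assumption: "big" sets hold the task ids 0..n-1.
    @Setup
    public void setUp() {
        scheduledBits = new BitSet(); scheduledBits.set(0, 1000);
        finishedBits = new BitSet();  finishedBits.set(0, 500);
        scheduledSet = new HashSet<>(); finishedSet = new HashSet<>();
        for (int i = 0; i < 1000; i++) scheduledSet.add(i);
        for (int i = 0; i < 500; i++) finishedSet.add(i);
    }

    @Benchmark
    public void benchmarkBitSetAndNot(Blackhole bh) {
        BitSet copy = (BitSet) scheduledBits.clone(); // andNot mutates its receiver
        copy.andNot(finishedBits);
        bh.consume(copy.isEmpty());
    }

    @Benchmark
    public void benchmarkManualIteration(Blackhole bh) {
        bh.consume(finishedSet.containsAll(scheduledSet));
    }
}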
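Before comparing timings, it is worth confirming that the two benchmark bodies answer the same question, namely "is every scheduled task finished?". The JDK-only sketch below checks that for three cases. `CompletionCheckSketch`, `bits`, and `ids` are hypothetical helpers, and task ids are assumed to be the integers 0..n-1.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Sketch: the BitSet and Set formulations of "all scheduled tasks finished". */
final class CompletionCheckSketch {

    static boolean allDoneBitSet(BitSet scheduled, BitSet finished) {
        BitSet remaining = (BitSet) scheduled.clone(); // andNot mutates, so copy first
        remaining.andNot(finished);                    // scheduled minus finished
        return remaining.isEmpty();
    }

    static boolean allDoneSet(Set<Integer> scheduled, Set<Integer> finished) {
        return finished.containsAll(scheduled);
    }

    /** Task ids 0..n-1 as a BitSet. */
    static BitSet bits(int n) {
        BitSet b = new BitSet();
        b.set(0, n);
        return b;
    }

    /** Task ids 0..n-1 as a HashSet. */
    static Set<Integer> ids(int n) {
        Set<Integer> s = new HashSet<>();
        for (int i = 0; i < n; i++) {
            s.add(i);
        }
        return s;
    }

    public static void main(String[] args) {
        for (int finished : new int[] {0, 500, 1000}) {
            boolean viaBits = allDoneBitSet(bits(1000), bits(finished));
            boolean viaSet = allDoneSet(ids(1000), ids(finished));
            System.out.println(finished + " finished: " + viaBits + " / " + viaSet);
        }
    }
}
```

If the two methods ever disagree, the benchmark is comparing different computations and its numbers mean nothing.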

Step 4 — Run and Analyze

cd book/projects
mvn -pl level-4-waving-manager test -Dtest=WavingBenchmark -q 2>&1 | tail -30

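Surefire does not run @Benchmark methods by itself, so this command assumes WavingBenchmark also has a JUnit test method that starts the JMH runner.

Record: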

  • Mean time per operation (nanoseconds)
  • Confidence intervals
  • Winner

Step 5 — Write the JIRA Performance Report

Summary: [ClassName] uses Set.containsAll() (one hash lookup per element) where BitSet.andNot() (one operation per 64-bit word) is available

Description:
  Micro-benchmark comparison of BitSet.andNot() vs Set.containsAll() for
  wave-completion detection in WavingVertexManager (and by extension any
  similar VertexManagerPlugin).

  Results (1000 tasks, 500 completed, JDK 11, M1 MacBook):

    BitSet.andNot():        X ± Y ns/op
    Set.containsAll():      X ± Y ns/op
    Speedup:                Nx

  For large DAGs with thousands of tasks, this difference compounds
  significantly over the lifetime of the DAG.

Patch: Switch from HashSet to BitSet in [ClassName].

Priority: Minor
Component: tez-dag

Reflection

| # | Question |
|---|---|
| 1 | At what scale (number of tasks per DAG) would the BitSet optimization matter in practice? At 10 tasks? 10,000? |
| 2 | JMH benchmarks measure code in isolation. What real-world factors could make the benchmark results misleading? |
| 3 | Performance patches are often held to a higher standard of review than correctness patches. Why? |

Contributor Mindset

This section is the "soft skills with hard edges" half of the curriculum. The technical chapters teach you how Tez works; this section teaches you how the Apache Tez project works — how decisions are made, how patches are accepted, how trust is earned, and how a contributor becomes a committer.

These are not optional skills. A technically excellent patch with poor process around it will sit on JIRA for months. A modest patch with clean process gets reviewed and committed.

Reading Order

The chapters are ordered to mirror the actual arc of a new contributor.

| # | Chapter | What it answers |
|---|---|---|
| 1 | Reading the Codebase | How do I navigate ~200k LOC without drowning? |
| 2 | Design via JIRA | Where does design happen in Apache projects? |
| 3 | Community Interaction | How do I talk to dev@ and JIRA without burning trust? |
| 4 | Patch Quality | What does a committer-ready patch look like? |
| 5 | Responding to Feedback | How do I handle review comments well? |
| 6 | Compatibility | What can I change without breaking users? |
| 7 | Meritocracy | How does someone become a committer or PMC member? |

Chapters 1–2 are pre-work — read them before opening any JIRA. Chapters 3–5 are operational — read them before submitting your first patch. Chapters 6–7 are strategic — read them when you start thinking beyond a single patch.

How This Complements the Technical Labs

The labs in Levels 1–9 build engineering competence inside the Tez codebase. This section builds the project-level competence needed to ship that work into Apache Tez itself.

The relationship is concrete:

| Technical chapter | Mindset chapter that pairs with it |
|---|---|
| Level 2 Lab 2: Prepare a Patch | Patch Quality |
| Level 3 deep dives on AM internals | Reading the Codebase |
| Level 5 Tez/Hive integration | Compatibility |
| Level 7 protocol & wire format | Compatibility |
| Capstone project (capstone/) | All seven mindset chapters |

If you are doing the Capstone, you should have read all seven chapters in this section by the time you reach Step 8 (the patch).

What This Section Is Not

It is not generic open-source advice. Every claim, template, and procedure here is grounded in:

  • The Apache Software Foundation Way
  • The Apache Tez JIRA project (TEZ)
  • The dev@tez.apache.org mailing-list archive
  • The tez-tools/src/main/resources/tez/checkstyle.xml and other in-repo policy files
  • The @InterfaceAudience / @InterfaceStability annotations in tez-api

Where a chapter generalises, it labels the generalisation. Where it states a Tez-specific rule, it cites the in-repo file or the JIRA where the rule was set.

Prerequisites

Before this section is useful you must have:

  • A local clone of Tez at ~/tez-src (git clone https://github.com/apache/tez.git)
  • A JIRA account at https://issues.apache.org/jira/
  • A subscription to dev@tez.apache.org (send empty mail to dev-subscribe@tez.apache.org)

An ASF ID is not required; that comes later, with committership.

Validation for the Section

You have absorbed this section when you can:

  1. Find any feature in Tez within 10 minutes by tracing from TezClient or DAGAppMaster.
  2. Write a JIRA description that a committer can act on without follow-up questions.
  3. Produce a patch that passes mvn checkstyle:check and mvn test in changed modules on the first try.
  4. Read a @InterfaceAudience annotation and predict what you may and may not change.
  5. Explain to a colleague the difference between contributor, committer, and PMC.

The next chapter — Reading the Codebase — gives you the navigation strategy you will use through everything that follows.

Reading a 200k+ LOC Apache Codebase

Apache Tez is roughly 200,000 lines of Java across 15+ Maven modules. No single human holds it all in their head — not even the most senior committers. The skill is not memory; it is navigation. This chapter gives you the strategies committers actually use.

Module Map First

Before reading any code, learn the module shape. Run this once and pin the output:

cd ~/tez-src
find . -maxdepth 2 -name pom.xml | sort

The modules that matter for ~90% of work:

| Module | What lives there | When you read it |
|---|---|---|
| tez-api | Public API: TezClient, DAG, Vertex, Edge, *Descriptor | Always start here |
| tez-common | Shared utilities, TezConfiguration, counters | Tracing configs |
| tez-runtime-internals | Task runtime, LogicalIOProcessorRuntimeTask | Following a task |
| tez-runtime-library | OrderedPartitionedKVOutput, shuffle inputs | I/O contracts |
| tez-dag | DAGAppMaster, schedulers, state machines | AM-side bugs |
| tez-mapreduce | MR compat: MRInput, MROutput | MR-on-Tez |
| tez-tests | MiniTezCluster, TestOrderedWordCount | Integration tests |
| tez-tools | Checkstyle config, swimlanes, analyzer | Process tooling |

Tez follows the standard Maven layout, as Hadoop does: code lives in <module>/src/main/java, tests in <module>/src/test/java. Protobufs live in <module>/src/main/proto.

Strategy 1: Start From the Public API, Trace Inward

Every Tez user program goes through tez-api. That makes it the only mandatory entry point. The reading order:

tez-api (what users see)
   ↓
tez-dag (what the AM does with it)
   ↓
tez-runtime-internals (what tasks do)
   ↓
tez-runtime-library (the I/Os tasks use)

Trace example — "where does parallelism come from?":

cd ~/tez-src
grep -rn "setParallelism" tez-api/src/main/java | head
grep -rn "setParallelism\|reconfigureVertex" tez-dag/src/main/java | head

You will find Vertex.setParallelism(int) in tez-api and follow it to VertexImpl.setParallelism in tez-dag. That arc — API → impl — is the canonical pattern for reading Tez.

Strategy 2: Protobufs Are the Source of Truth for Anything Serialized

Anything that crosses a process boundary (client → AM, AM → container, AM → history) is defined in protobuf. The protos are the contract; the Java is the implementation.

find ~/tez-src -name "*.proto" | sort

The four protos to internalise:

| Proto | Role |
|---|---|
| tez-api/src/main/proto/DAGApiRecords.proto | DAGPlan, VertexPlan, EdgePlan — the DAG on the wire |
| tez-api/src/main/proto/Events.proto | The event types that flow on the dispatcher |
| tez-common/src/main/proto/TezCommonProtos.proto | Counters, plugin descriptors |
| tez-dag/src/main/proto/DAGProtos.proto | AM-internal records |

When you see a class named *Proto (e.g. DAGProtos.DAGPlan) the generated code lives in target/generated-sources/ after a build. Don't read the generated code; read the .proto.

Practical rule: if you are changing a field that appears in a proto, you are changing wire compatibility. See Compatibility.

Strategy 3: IDE Call Hierarchy + git log -S

Two tools, used together, replace 80% of speculative reading.

Call hierarchy (IntelliJ: Ctrl-Alt-H, Eclipse: Ctrl-Alt-H) answers "who calls this?". Use it on entry points like TezClient.submitDAG to find every call site in tests and examples.

git log -S answers "when and why did this code appear?".

cd ~/tez-src
git log -S "reconfigureVertex" --oneline -- tez-dag/
git log -S "reconfigureVertex" --oneline -- tez-api/

Pick the oldest commit referenced and read its JIRA:

git show <sha> | head -30
# Look for "TEZ-NNNN" in the commit message

That JIRA is the design discussion. It is more valuable than the code.

Strategy 4: Tests Are Executable Spec

The Tez test suite is the cheapest way to learn what a class does. For any class Foo.java, look for TestFoo.java:

find ~/tez-src -name "TestVertexImpl.java"
find ~/tez-src -name "TestDAGImpl.java"
find ~/tez-src -name "TestShuffleVertexManager.java"

The test names alone form a behavior spec:

grep "  public void test" $(find ~/tez-src -name TestVertexImpl.java)

For runtime behavior, integration tests in tez-tests/ are the gold:

ls ~/tez-src/tez-tests/src/test/java/org/apache/tez/test/

TestTezJobs.java and TestExceptionPropagation.java walk full DAGs end-to-end on a MiniTezCluster. Read them before guessing how a feature behaves at runtime.

Strategy 5: Keep a Reading Log

Committers have working memory of the codebase because they wrote a lot of it. You don't. Compensate with notes. Keep one file:

mkdir -p ~/tez-notes
cat > ~/tez-notes/reading-log.md <<'EOF'
# Tez Reading Log

## YYYY-MM-DD — DAG submission path
- TezClient.submitDAG(DAG) in tez-api builds DAGPlan
- → DAGClientAMProtocolBlockingPB.submitDAG (RPC)
- → DAGAppMaster.submitDAGToAppMaster
- → DAGAppMaster.startDAG → AsyncDispatcher.getEventHandler().handle(DAGEventType.DAG_INIT)

## YYYY-MM-DD — Vertex parallelism reconfiguration
- VertexManagerPlugin.context.reconfigureVertex(...)
...
EOF

Re-reading three months later, the log is gold. Without it, you re-trace the same path.

Worked Exercise: TezClient.submitDAG → AsyncDispatcher

Goal: in 90 minutes, trace the path from a user calling tezClient.submitDAG(dag) to the event landing on the DAGAppMaster async dispatcher.

Step 1 (15 min) — Find the entry

cd ~/tez-src
find tez-api/src/main/java -name "TezClient.java"
grep -n "public DAGClient submitDAG" $(find tez-api/src/main/java -name TezClient.java)

You will find an overload that takes DAG dag. Read its body. Note that it does two things: builds a DAGPlan from the DAG, then sends it via an RPC stub.

Step 2 (20 min) — Identify the RPC

grep -rn "submitDAG" tez-api/src/main/proto/

Find DAGClientAMProtocol.proto. The SubmitDAGRequestProto carries the DAGPlan. The generated stub is DAGClientAMProtocolBlockingPB. The server side implements it in tez-dag.

grep -rn "implements DAGClientAMProtocolBlockingPB\|extends DAGClientAMProtocolBlockingPB" tez-dag/src/main/java

You will land in DAGClientAMProtocolBlockingPBServerImpl, a thin RPC wrapper that delegates to DAGClientHandler (in tez-dag/.../dag/app/).

Step 3 (20 min) — Server-side handling

grep -n "submitDAG" $(find tez-dag/src/main/java -name "DAGClientHandler.java")

Follow submitDAG → DAGAppMaster.submitDAGToAppMaster → DAGAppMaster.startDAG. Inside startDAG, you will see a DAG dag = createDAG(dagPlan) and then an event dispatched through dispatcher.getEventHandler().handle(...).

Step 4 (20 min) — The dispatcher

find tez-dag/src/main/java -name "DAGAppMaster.java"
grep -n "AsyncDispatcher\|dispatcher" $(find tez-dag/src/main/java -name DAGAppMaster.java) | head

Find where dispatcher is instantiated and where event handlers are registered. The handler for DAGEventType is the DAGImpl's state machine.

Step 5 (15 min) — Record it

Open your reading log and write the four-line summary. Cite the file and line for each hop.

Validation Artifacts

After this chapter you should produce and keep:

  1. A ~/tez-notes/module-map.md with one sentence per module.
  2. A ~/tez-notes/reading-log.md with the submitDAG trace from the exercise above.
  3. A grep-able list of the four protos and what each one defines.
  4. One git log -S command and the JIRA it surfaced, saved to the log.

When you can do the exercise without checking this page, you have the navigation skill. The next chapter — Design via JIRA — tells you where the design decisions behind that code actually lived.

Design via JIRA, Not PRs

Apache projects design in the open. In Tez, "the open" is the TEZ JIRA project and the dev@tez.apache.org mailing list — not the GitHub PR.

A PR with a "see what you think" attitude and no JIRA attached will be ignored. A JIRA with a clear problem statement and rough design will get responses within days, often from people who never read the PR. This chapter is about why, and how to use that system.

Why Not Just PRs?

GitHub PRs at Apache are mirrors of patches. They are convenient for diff viewing, but they are not the system of record. The system of record is:

| Artifact | System | Why there |
|---|---|---|
| Bug report, problem statement | JIRA | Searchable, citeable forever |
| Design discussion | JIRA + dev@ | Archived by the ASF, public |
| Patch / code review | JIRA attachment or PR linked from JIRA | Reviewed under ASF ICLA |
| Vote on release / committer | dev@ / private@ | Required by ASF policy |
| The final code | git | The result, not the discussion |

If a discussion happens only on a PR and the PR is later force-closed or the repo moves, the rationale evaporates. JIRA + mailing list don't move.

Concrete consequence: when you read code in tez-dag/ and ask "why?", the answer is almost certainly in a JIRA referenced from the commit message — see Reading the Codebase, Strategy 3.

The TEZ JIRA Workflow

A Tez JIRA moves through these statuses:

Open → In Progress → Patch Available → Resolved → Closed
                                    ↘ Reopened

Triggers:

| Transition | Triggered by | Means |
|---|---|---|
| Open → In Progress | Assignee starts work | Don't duplicate this |
| In Progress → Patch Available | Patch (or PR) is ready for review | Reviewers, please look |
| Patch Available → Resolved | Committer commits it | Done in trunk |
| Resolved → Closed | Release ships containing the fix | Done for users |
| Resolved → Reopened | Bug returns or revert needed | Re-do |

You drive the first two transitions yourself: Open → In Progress and In Progress → Patch Available. Resolving, closing, and reopening require a committer.
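
The workflow is small enough to encode as a lookup. This is a sketch only, assuming the split this chapter describes (the contributor drives the first two transitions, a committer the rest); may_transition is a hypothetical name and JIRA itself enforces nothing through it.

```shell
#!/usr/bin/env bash
# may_transition FROM TO: print "contributor" or "committer" for the
# transitions listed in the table; print "invalid" and return 1 otherwise.
may_transition() {
  case "$1->$2" in
    "Open->In Progress"|"In Progress->Patch Available")
      echo contributor ;;
    "Patch Available->Resolved"|"Resolved->Closed"|"Resolved->Reopened")
      echo committer ;;
    *)
      echo invalid
      return 1 ;;
  esac
}
```

For example, may_transition "Open" "In Progress" reports that the contributor owns that step, while a jump straight from Open to Closed is rejected.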

Reading Old JIRAs for Context

The single highest-leverage Tez skill is reading old JIRAs. Conventions:

  • Issues are referenced as TEZ-NNNN in commit messages and source comments. You will see // see TEZ-3045 or // TEZ-1597 peppered through the code.
  • Open any of them at https://issues.apache.org/jira/browse/TEZ-NNNN.
  • The "Activity" tab shows the design conversation. The "Attachments" tab shows the patch iterations (TEZ-NNNN.001.patch, TEZ-NNNN.002.patch, ...).

Try this now:

cd ~/tez-src
git log --all --oneline | grep -oE "TEZ-[0-9]+" | sort -u | tail -20

Pick one and open it in a browser. Read the description, the comments, and the patch iterations. You will see the design happen: alternatives considered, rejected, refined. This is more useful than any architecture document because it shows reasoning, not conclusions.

When to Open a JIRA Yourself

You open a JIRA before writing the patch when any of the following is true:

| Situation | Open JIRA? |
|---|---|
| Typo in Javadoc or log message | Yes (small, but track it) |
| One-line bug fix with obvious cause | Yes |
| Multi-file refactor | Yes, with a brief design |
| New public API | Yes, mandatory, with [DISCUSS] on dev@ first |
| New configuration key | Yes |
| Performance change with measurable impact | Yes, with benchmark plan |
| Anything touching DAGPlan proto | Yes, with compatibility note |

You do not need a JIRA to:

  • Ask a question on dev@ or user@
  • File a documentation question
  • Patch a private fork

The JIRA Description Skeleton

A Tez JIRA description that committers can act on contains, in order:

## Problem

(Two to four sentences. What is wrong. Who hits it.)

## Reproduction

(Steps to reproduce, or a code sample. If a test reproduces it, name the test class.)

## Root Cause

(One paragraph. Cite file and method.)

## Proposed Fix

(One paragraph. What you intend to do. Mention any alternatives considered.)

## Compatibility

(One sentence. Wire compat? API change? Config rename? "None." is a valid answer.)

## Test Plan

(One paragraph. Which tests pass after the change. Any new test added.)

A trivial bug fix may collapse Compatibility and Test Plan to one line each. A new API must expand them.
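
Before pasting a draft into JIRA, it can help to check that no section was dropped. A minimal sketch, assuming the draft is a local file using the six headings exactly as written above; check_jira_draft is a hypothetical helper, and it checks presence only, not order or content.

```shell
#!/usr/bin/env bash
# check_jira_draft FILE: verify FILE contains each skeleton heading on a
# line of its own. Prints every missing heading; returns 1 if any is missing.
check_jira_draft() {
  local file=$1 missing=0 section
  for section in "Problem" "Reproduction" "Root Cause" \
                 "Proposed Fix" "Compatibility" "Test Plan"; do
    if ! grep -qxF "## $section" "$file"; then
      echo "missing: ## $section"
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use would be: check_jira_draft ~/tez-notes/draft-jira.md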

Design Doc on a JIRA — Skeleton

For anything larger than a single-file fix, attach a design doc (Markdown or PDF) to the JIRA. The skeleton:

# TEZ-NNNN: <short title>

## 1. Problem
What is wrong today. Who is affected. Why "do nothing" is not acceptable.

## 2. Goals
Bulleted, testable. "DAGPlan submission survives a 10 MB plan without OOM."

## 3. Non-Goals
What this design explicitly will not address. Prevents scope creep.

## 4. Alternatives Considered
- Option A: <description>. Pros / Cons. Why rejected.
- Option B: <description>. Pros / Cons. Why rejected.
- Option C (chosen): <description>. Pros / Cons.

## 5. Chosen Approach
Architecture sketch. Mermaid or ASCII. Cite files that will change.

## 6. Compatibility
- Wire compat: <change to any proto? backward compatible?>
- API compat: <InterfaceAudience.Public touched? deprecation plan?>
- Config compat: <new keys? renamed keys? default change?>

## 7. Test Plan
- Unit tests: which classes
- Integration: MiniTezCluster scenarios
- Manual: any out-of-suite verification

## 8. Rollout
- Default off? On? Feature flag name?
- Migration steps for existing users.

Attach as TEZ-NNNN-design.md or TEZ-NNNN-design.pdf. Announce it on dev@ with subject [DISCUSS] TEZ-NNNN: <short title> and a link.

Expect 1–2 weeks of asynchronous discussion before consensus. Do not start patching until the design is at least loosely agreed — patches without design buy-in get rejected.

"See TEZ-NNNN" — The Codebase Convention

Search the Tez source for back-references:

cd ~/tez-src
grep -rn "TEZ-[0-9]" tez-dag/src/main/java | head -20

Every such reference is a permanent link from the code to a design conversation. When you add a non-obvious workaround, you do the same — leave a // TEZ-NNNN: <one line why> so the next reader can find your reasoning.
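
To see which JIRAs a module leans on most, the back-references can be tallied. A sketch under the assumption that every reference uses the TEZ-NNNN form; jira_ref_counts is a hypothetical name.

```shell
#!/usr/bin/env bash
# jira_ref_counts: read source text on stdin and print "COUNT TEZ-NNNN"
# for each distinct ID, most-referenced first (ties ordered by ID).
jira_ref_counts() {
  grep -oE 'TEZ-[0-9]+' \
    | sort \
    | uniq -c \
    | awk '{ print $1, $2 }' \
    | sort -k1,1nr -k2,2
}
```

Typical use would be: grep -rh "TEZ-[0-9]" tez-dag/src/main/java | jira_ref_counts | head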

When the Design Lives on dev@ Only

Some discussions never reach JIRA — release planning, branch policy, build infrastructure. Those live on dev@tez.apache.org only. Archive:

  • https://lists.apache.org/list.html?dev@tez.apache.org

Search by subject prefix:

| Prefix | Means |
|---|---|
| [DISCUSS] | Open question, no decision sought yet |
| [PROPOSAL] | Concrete proposal, feedback wanted |
| [VOTE] | Decision being made; 72h window |
| [ANNOUNCE] | One-way: release, new committer |
| [NOTICE] | One-way: infrastructure change |

Subscribing: send empty mail to dev-subscribe@tez.apache.org.

Validation Artifacts

After this chapter you should be able to produce:

  1. The URL of three different TEZ-NNNN JIRAs cited from the Tez source, and a one-line summary of what each one is about.
  2. A draft JIRA description (in a local file ~/tez-notes/draft-jira.md) for a bug or improvement you have noticed, following the skeleton above.
  3. A subscription confirmation to dev@tez.apache.org.
  4. One archived [DISCUSS] thread URL relevant to a Tez area you care about.

The next chapter — Community Interaction — covers how to actually post on dev@ and behave on JIRA without burning trust on day one.

Community Interaction

This chapter covers the operational mechanics of communicating with the Apache Tez community — dev@tez.apache.org, JIRA, and the project's chat presence. Most of the "rules" below are not Tez rules; they are Apache-wide conventions that 25 years of mailing lists have settled into. Violating them is not a hanging offence, but it does mark you as new and costs you a small amount of credibility you have not yet earned.

The Lists

Tez has the standard ASF list set:

| List | Purpose | Who reads |
|---|---|---|
| dev@tez.apache.org | Development discussion, design, votes | Contributors, committers, PMC |
| user@tez.apache.org | Usage questions, "how do I" | Users, some committers |
| commits@tez.apache.org | Auto-mailed commit notifications | Mostly bots; subscribe to follow trunk |
| issues@tez.apache.org | Auto-mailed JIRA notifications | Bots, some committers |
| private@tez.apache.org | PMC-only (new-committer votes, security) | PMC only |

Subscribe to a list by sending an empty mail to <list>-subscribe@tez.apache.org. Confirm the reply. Unsubscribe via <list>-unsubscribe@tez.apache.org.

Default for new contributors: subscribe to dev@ and user@. Add issues@ once you are actively tracking JIRAs.

Mail Etiquette: Subject Prefixes

Subject lines on dev@ use ASCII-bracketed prefixes so subscribers can filter. Use them.

| Prefix | When |
|---|---|
| [DISCUSS] | Open-ended question or design idea, no vote yet |
| [PROPOSAL] | Concrete proposal seeking comment |
| [VOTE] | Vote in progress; body has voting rules |
| [VOTE][RESULT] | Closing a vote; tallies the result |
| [ANNOUNCE] | One-way announcement (release, new committer) |
| [NOTICE] | Infrastructure / branch / policy change |
| [jira] [Created] etc. | Auto-prefixed by the JIRA bot; don't compose these |

For a JIRA-related question, the subject is usually Re: [jira] [Created] (TEZ-NNNN) <title> — a reply to the bot mail.

Examples of good subjects:

  • [DISCUSS] Promoting MROutput#getDelegationToken to @Public
  • [PROPOSAL] TEZ-4321: Caching DAG plans across submissions
  • [VOTE] Apache Tez 0.10.4 RC1
  • [ANNOUNCE] New Tez committer: NAME
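
A draft subject can be checked against the prefix table before sending. This is a sketch, assuming only the hand-written prefixes listed above count as valid; subject_kind is a hypothetical name and the list itself does not enforce anything.

```shell
#!/usr/bin/env bash
# subject_kind SUBJECT: print the leading bracketed prefix if it is one of
# the hand-written dev@ prefixes; return 1 (printing nothing) otherwise.
subject_kind() {
  case "$1" in
    "[VOTE][RESULT] "*) echo "[VOTE][RESULT]" ;;
    "[DISCUSS] "*)      echo "[DISCUSS]" ;;
    "[PROPOSAL] "*)     echo "[PROPOSAL]" ;;
    "[VOTE] "*)         echo "[VOTE]" ;;
    "[ANNOUNCE] "*)     echo "[ANNOUNCE]" ;;
    "[NOTICE] "*)       echo "[NOTICE]" ;;
    *)                  return 1 ;;
  esac
}
```

A subject such as "any updates?" is rejected, which is a reasonable prompt to add a prefix and a quoted reference before posting.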

Mail Etiquette: Formatting

The ASF lists are plaintext-first. The hard rules:

  1. Plain text only. No HTML, no rich text. Most clients have a "Send as plain text" toggle; set it as the default for *@apache.org recipients.
  2. Inline reply, not top-post. Quote the relevant lines, reply below each.
  3. Wrap at ~78 columns. Long unbroken lines render badly in archives.
  4. Sign off. First name or first + last; not your full corporate signature block.
  5. No attachments over a few KB. Patches go on JIRA, not the list.
  6. No images. Diagrams as ASCII or as links to images hosted elsewhere.
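
Rule 3 is easy to check mechanically if you draft mail in a text file first. A minimal sketch, assuming an ASCII draft (awk may count bytes rather than characters in some locales); long_lines is a hypothetical helper.

```shell
#!/usr/bin/env bash
# long_lines FILE [MAX]: print "LINENO: LENGTH" for each line longer than
# MAX characters (default 78). Returns 1 if at least one line is too long.
long_lines() {
  awk -v max="${2:-78}" '
    length($0) > max { printf "%d: %d\n", FNR, length($0); bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Quoted URLs are the usual exception; it is generally better to leave a long URL unbroken than to wrap it.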

A good dev@ reply looks like:

On Tue, May 7, 2024 at 10:14 AM, Foo Bar <foo@example.com> wrote:
> I think we should change the default of tez.am.resource.memory.mb
> from 1024 to 2048 to handle large DAGs better.

Agreed for large DAGs, but 2048 doubles the AM footprint for everyone
running small jobs (most CI users). Could we instead size it based on
DAGPlan size, falling back to 1024? Sketch:

  am_mem_mb = max(1024, dagPlanBytes / 1024 * 4)

I can prototype on TEZ-4XXX if there's interest.

-- 
Jane

What it doesn't have: HTML, a corporate disclaimer, a 2 MB inline screenshot, "+1" with no context, or "any updates?" with no quoted reference.

JIRA Etiquette

JIRA is the system of record for code-touching work. The mores:

Don't reassign

The Assignee field belongs to whoever is doing the work. If a JIRA is assigned to someone else, do not reassign it to yourself, even if it's been idle for a year. Comment first:

Hi @ASSIGNEE, I'd like to pick this up if you're not actively working on it. Happy to hand back if you have an in-flight patch. If I don't hear back in a week I'll assign to myself.

After a week of silence, then take it.

Ask before claiming high-traffic JIRAs

For high-visibility issues (release blockers, anything with multiple watchers), comment "I'll take a look at this" before you set yourself as assignee. This prevents two people working on the same fix.

"Patch Available" semantics

Setting status to Patch Available is a signal that means:

  • A patch (or PR linked from the JIRA) is attached
  • It applies cleanly to the current trunk
  • The author believes tests pass locally
  • The author is requesting review

It does not mean "I am still iterating." If you upload a draft, leave the status as In Progress and say so in a comment.

Status flow you control vs. don't

| You may set | Means |
|---|---|
| Open → In Progress | Starting work |
| In Progress → Patch Available | Ready for review |
| Patch Available → In Progress | Reopening to revise after feedback |
| Comment with new patch | Iteration |

| Committer-only | Means |
|---|---|
| Patch Available → Resolved | Committed |
| Resolved → Closed | Released |
| Any → Reopened | Bug returned |

Patch naming convention

Patches attached to JIRA use the convention TEZ-NNNN.NNN.patch:

TEZ-4321.001.patch   <- first iteration
TEZ-4321.002.patch   <- after first review round
TEZ-4321.003.patch   <- after second review round

Branch-specific patches add a branch suffix:

TEZ-4321.branch-0.10.001.patch

Old patches stay attached — never delete them. The history is part of the review record.
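
Picking the next iteration number by hand is error-prone once a JIRA has several attachments. A sketch, assuming trunk patches follow the TEZ-NNNN.NNN.patch form exactly; next_patch_name is a hypothetical helper and it deliberately ignores branch-specific names.

```shell
#!/usr/bin/env bash
# next_patch_name ID: read existing attachment names on stdin (one per
# line) and print the name for the next trunk iteration of ID.
# Branch-specific names such as TEZ-4321.branch-0.10.001.patch are ignored.
next_patch_name() {
  awk -v id="$1" '
    BEGIN { max = 0 }
    {
      n = split($0, part, ".")
      # Trunk patches have exactly three parts: ID, NNN, "patch".
      if (n == 3 && part[1] == id && part[3] == "patch" && part[2] ~ /^[0-9]+$/) {
        if (part[2] + 0 > max) max = part[2] + 0
      }
    }
    END { printf "%s.%03d.patch\n", id, max + 1 }
  '
}
```

Typical use would be: ls | next_patch_name TEZ-4321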

Where the Tez Community Currently Lives

Tez does not have an official Slack or Discord. The active channels are:

| Channel | Use |
|---|---|
| dev@tez.apache.org | Primary, for all dev discussion |
| user@tez.apache.org | Usage questions |
| JIRA | Per-issue discussion |
| ASF Slack (the-asf.slack.com), #tez if it exists | Informal, ephemeral |

Do not assume a #tez Slack channel exists; check first. The mailing list is the official channel, where decisions are made and archived. Slack or IRC is at most a hallway conversation that must be summarised back to the list.

Sister projects you may need to follow because Tez integrates with them:

  • dev@hive.apache.org — Hive on Tez execution issues
  • dev@hadoop.apache.org — YARN / HDFS compatibility
  • dev@pig.apache.org — Pig on Tez (mostly inactive but exists)

Self-Introduction Template

A first post to dev@tez.apache.org after subscribing is optional but helpful. Keep it short:

Subject: [DISCUSS] Introduction and intent to contribute

Hi all,

I'm <first> <last>, a <role> at <company / "independent">. I've been
using Tez via Hive in production for ~<N> months and have been
reading the codebase to understand <component / area>.

I'm interested in contributing in the area of <one or two concrete
areas, e.g. "shuffle reliability" or "AM logging">. I've worked
through Levels 1-4 of the open-source-engineer curriculum and have
TEZ-NNNN (small Javadoc fix) ready as my first patch.

Happy for any pointers on first issues to tackle.

Thanks,
<First>

What this does:

  • Signals you've done homework (not asking "how do I start?")
  • Names a concrete area so committers can match you to mentors
  • References a tiny first patch, so you've already shown you understand the workflow

What to avoid:

  • "I'd like to contribute, please assign me a task" (no committer will do this for you)
  • A list of grand redesigns
  • A corporate signature block

Asking a Question on user@ Well

The format that gets answers:

Subject: Tez 0.10.x: AM OOMing on submission of 200-vertex DAG

Versions:
  Tez 0.10.3
  Hadoop 3.3.6
  Hive 3.1.3
  JDK 11

Symptom:
  TezClient.submitDAG throws OOM after ~12 seconds. AM log attached
  shows GC overhead limit exceeded inside DAGImpl.init.

Reproduction:
  - submit DAG with 200 vertices, each with 5 inputs
  - tez.am.resource.memory.mb = 1024 (default)

What I tried:
  - bumping to 2048 — works
  - reducing parallelism — works around but unwanted

Question:
  Is there a known scaling limit for DAGPlan size with default AM
  memory? Should the AM default scale with DAGPlan size?

Logs / DAG: <link to gist or paste in JIRA>

It gives versions, symptom, reproduction, what was already tried, and a focused question. A question that omits any of these gets a "please provide more info" reply, costing a round-trip day.

Validation Artifacts

After this chapter you should have, on disk and in the public archive:

  1. A subscription confirmation to dev@tez.apache.org and user@tez.apache.org.
  2. A self-introduction email posted to dev@, with archive URL saved.
  3. One inline-reply (not top-post) reply to an existing dev@ thread.
  4. A draft JIRA in JIRA (status Open) describing a real issue you've noticed.
  5. A ~/tez-notes/etiquette.md cheatsheet with the subject prefixes table.

The next chapter — Patch Quality — is what your first attached patch needs to look like.

Patch Quality

A "patch" in Apache parlance is a unified diff attached to a JIRA (or, more recently, a GitHub PR linked from a JIRA). This chapter tells you what a committer is looking for when they open it for the first time. Internalising these expectations is the difference between a patch that gets committed in two review rounds and one that dies after a "please rebase" comment in month three.

What Committers Look For — In Reading Order

A committer reviewing your patch does, roughly, this:

1. Read JIRA description.        (30 sec)
2. Open the patch, skim the diff stat.   (30 sec)
3. Look at tests.                (2 min)
4. Look at the implementation.   (5 min)
5. Run mvn install / mvn test.   (background)
6. Comment.                      (variable)

Notice tests come before implementation. If the test diff is empty or weak, the implementation is read with suspicion. If the test diff is strong and minimal, the implementation is read with trust.

Rule 1: Minimum Diff

The single rule that most distinguishes a strong patch from a weak one. The diff should contain only the changes that the JIRA describes. Not:

  • A whitespace cleanup of the surrounding method
  • A rename of an unrelated variable you didn't like
  • An import reorder by your IDE
  • A bumped dependency version "while you were here"
  • A reformatted block

Every line you change costs the reviewer attention. Lines that don't serve the JIRA are a tax on the review.

Check before submitting:

cd ~/tez-src
git diff --stat origin/master
git diff origin/master | head -50

If git diff --stat shows changes in files unrelated to the JIRA, revert them:

git checkout origin/master -- path/to/unrelated/file
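
Spotting stray files is easier when the expected paths are stated up front. A minimal sketch, assuming you know which path prefixes the JIRA should touch; unrelated_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# unrelated_files PREFIX...: read changed paths on stdin (one per line, as
# printed by `git diff --name-only`) and print those that start with none
# of the given prefixes. Returns 1 if any such path was printed.
unrelated_files() {
  local path prefix ok found=0
  while IFS= read -r path; do
    ok=0
    for prefix in "$@"; do
      case "$path" in
        "$prefix"*) ok=1; break ;;
      esac
    done
    if [ "$ok" -eq 0 ]; then
      echo "$path"
      found=1
    fi
  done
  return "$found"
}
```

Typical use would be: git diff --name-only origin/master | unrelated_files tez-dag/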

Rule 2: No Unrelated Changes

The corollary to Rule 1. Even within a touched file, do not bundle unrelated improvements. If you notice a separate bug while fixing your bug:

# don't fix it here. Open a separate JIRA:
echo "Noticed: VertexImpl.java:842 catches Exception too broadly" >> ~/tez-notes/queue.md

File a follow-up JIRA at the end of the week. Two small patches beat one mixed patch every time.

Rule 3: Apache Commit Message Format

The exact format used in git log for committed Tez changes:

TEZ-NNNN: <short imperative summary, under 72 chars>. (<contributor-name> via <committer-name>)

Verify with:

cd ~/tez-src
git log --oneline -20

You will see lines like:

abc1234 TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gunther)
def5678 TEZ-4322: Add MR compat test for vectorized output. (John Smith via gopalv)

When you submit, your commit message has the contributor side only:

TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.

The committer appends (Jane Doe via <committer>) at commit time. Don't pre-fill it.

The summary line rules:

  • Imperative mood: "Fix", "Add", "Remove", "Refactor" — not "Fixed", "Adding".
  • Under 72 characters.
  • Ends with a period.
  • No trailing whitespace.
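
Most of these rules can be checked mechanically. The following is a sketch only: check_summary is a hypothetical helper, it cannot judge imperative mood, and it also rejects a pre-filled "(name via committer)" suffix because the committer adds that at commit time.

```shell
#!/usr/bin/env bash
# check_summary LINE: validate a commit summary line. Prints one message
# per violated rule; returns 1 if any rule is violated.
check_summary() {
  local line=$1 bad=0
  if ! printf '%s\n' "$line" | grep -qE '^TEZ-[0-9]+: [^[:space:]]'; then
    echo "must start with 'TEZ-NNNN: '"; bad=1
  fi
  if [ "${#line}" -ge 72 ]; then
    echo "must be under 72 characters (is ${#line})"; bad=1
  fi
  case "$line" in
    *[[:space:]]) echo "must not end with whitespace"; bad=1 ;;
    *.)           ;;
    *)            echo "must end with a period"; bad=1 ;;
  esac
  if printf '%s\n' "$line" | grep -qE '\(.* via .*\)'; then
    echo "omit the '(name via committer)' suffix"; bad=1
  fi
  return "$bad"
}
```

Typical use would be: check_summary "$(git log -1 --format=%s)"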

If the change needs more explanation, leave one blank line and add a body wrapped at 72 columns:

TEZ-4321: Fix NPE in VertexImpl.recover when no inputs.

When a vertex has no Inputs (a root data-source vertex with no
upstream edges), VertexImpl.recover called .iterator() on a null
inputs collection. The fix initialises inputs to an empty list in
the recover path.

Adds TestVertexImpl.testRecoverNoInputs covering the case.

Rule 4: Tests for Behavior Changes

Any behavior change must come with a test. This includes bug fixes — the test should fail before your fix and pass after. Verify:

cd ~/tez-src
# stash only the fix (main sources), keeping the new test in place
git stash push -- tez-dag/src/main/java
# run the new test
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# it should fail
git stash pop
mvn test -pl tez-dag -Dtest=TestVertexImpl#testRecoverNoInputs
# now it should pass

If the new test passes without the fix applied, it doesn't actually exercise the bug.

Exceptions where a test is not required:

| Change type | Test needed? |
|---|---|
| Javadoc fix | No |
| Log message string change | No |
| Comment / formatting (rare; should be its own patch) | No |
| Build / Maven config change | Usually no, but justify |
| Behavior change | Yes, always |

Rule 5: No Whitespace Churn

Whitespace-only diff lines are noise. IDEs love to insert them — turn off "format on save" for tez-src, or restrict it to lines you edited.

Detect before submitting:

cd ~/tez-src
git diff -w origin/master --stat
git diff origin/master --stat

If the second shows noticeably more changed lines or files than the first, you have whitespace churn. Either clean it up or, if it's pervasive, configure your editor and re-do the change.

Rule 6: Javadoc for @Public API

If you add or modify a method on a class annotated @InterfaceAudience.Public, it needs Javadoc. Classes usually import the annotation and write it as plain @Public, so match both spellings:

cd ~/tez-src
grep -rlE "@(InterfaceAudience\.)?Public" tez-api/src/main/java | head

For each such class, every public method has Javadoc with at least:

  • One-sentence summary
  • @param for each parameter
  • @return for non-void
  • @throws for each checked exception the method declares

If your patch adds a new public method without Javadoc, expect the first review comment to ask for it.

Rule 7: @InterfaceAudience and @InterfaceStability Annotations

Every public-ish class in tez-api is annotated. Example from Vertex.java:

@Public
@Evolving
public class Vertex {
    ...
}

The grid:

| | @Stable | @Evolving | @Unstable |
|---|---|---|---|
| @Public | Compat guaranteed across minor versions | May change between minor versions with warning | May change between any release |
| @LimitedPrivate({"Hive"}) | Stable for named projects | Evolving for named projects | Unstable, named projects only |
| @Private | Internal; do not depend on | Internal | Internal |

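When you add a new class to tez-api, you must annotate it. The annotations are Hadoop's: org.apache.hadoop.classification.InterfaceAudience and InterfaceStability, which reach Tez through the hadoop-annotations dependency rather than living in the Tez source tree. When in doubt, default to: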

@Public
@Unstable

so users see the class but know not to depend on its shape yet.

Rule 8: Pre-Submit Checklist

Before you upload TEZ-NNNN.001.patch, run each of these and have all pass.

cd ~/tez-src

# 1. Full compile, all modules, no tests.
mvn install -DskipTests

# 2. Checkstyle. Tez uses the config in tez-tools/.
mvn checkstyle:check

# 3. Tests in modules you changed.
# For tez-dag, tez-api, etc.:
mvn test -pl tez-dag
mvn test -pl tez-api

# 4. A representative integration test.
mvn test -pl tez-tests -Dtest=TestOrderedWordCount

# 5. Patch applies cleanly to current master.
git fetch origin
git rebase origin/master
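git diff origin/master > /tmp/TEZ-NNNN.001.patch
# Check against a pristine checkout; your working tree already contains the change.
git worktree add --detach /tmp/tez-check origin/master
git -C /tmp/tez-check apply --check /tmp/TEZ-NNNN.001.patch
git worktree remove /tmp/tez-check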

If any step fails, fix and re-run. Submit only when all pass.
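
The checklist is a natural fit for a fail-fast runner. A minimal sketch, assuming each step fits on one line; run_steps is a hypothetical helper, and because every step runs in its own bash process, a cd on one line does not carry over to the next.

```shell
#!/usr/bin/env bash
# run_steps: read one shell command per line on stdin, run them in order,
# and stop at the first failure. Blank lines and lines starting with #
# are skipped. Each command runs in a fresh `bash -c`.
run_steps() {
  local step n=0
  while IFS= read -r step; do
    case "$step" in
      ''|'#'*) continue ;;
    esac
    n=$((n + 1))
    echo "[$n] $step"
    # </dev/null keeps the command from consuming the remaining steps.
    if ! bash -c "$step" </dev/null; then
      echo "FAILED at step $n: $step" >&2
      return 1
    fi
  done
  echo "all $n steps passed"
}
```

Feeding it the commands from the checklist, one per line, gives a single pass/fail answer and names the first step that broke.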

Rule 9: Patch Generation

Generate the patch from a clean rebase against origin/master:

cd ~/tez-src
git fetch origin
git rebase origin/master           # resolves conflicts now, not at commit time
git diff origin/master --no-color --unified=5 > TEZ-NNNN.001.patch

The --unified=5 gives reviewers 5 lines of context instead of the default 3. This is a small kindness that makes review materially easier.

Inspect the patch before attaching:

wc -l TEZ-NNNN.001.patch          # how big is it?
head -30 TEZ-NNNN.001.patch       # right files?
grep -v '^+++ ' TEZ-NNNN.001.patch | grep -c '^+'   # added lines (file headers excluded)
grep -v '^--- ' TEZ-NNNN.001.patch | grep -c '^-'   # removed lines (file headers excluded)

A patch of 50–300 lines is comfortable for a single review round. A patch over 1000 lines will sit unreviewed until you split it.
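
The inspection can be folded into one pass. This is a sketch, assuming git-format diff input (each file section starts with "diff --git"); patch_stats is a hypothetical helper, and the "small" and "large" labels are my own names for sizes the guidance above does not cover.

```shell
#!/usr/bin/env bash
# patch_stats: read a git-format unified diff on stdin and print
# "added=N removed=M total=T verdict=V". Only lines inside hunks are
# counted as added or removed, so ---/+++ file headers are excluded.
# total is the diff's full line count.
patch_stats() {
  awk '
    /^diff --git / { in_hunk = 0 }        # new file section: stop counting
    /^@@ /         { in_hunk = 1; next }  # hunk header: start counting
    in_hunk && /^\+/ { added++ }
    in_hunk && /^-/  { removed++ }
    END {
      verdict = "small"
      if (NR >= 50)  verdict = "comfortable"
      if (NR > 300)  verdict = "large"
      if (NR > 1000) verdict = "split-it"
      printf "added=%d removed=%d total=%d verdict=%s\n", added, removed, NR, verdict
    }
  '
}
```

Typical use would be: patch_stats < TEZ-NNNN.001.patch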

Worked Example — A Minimal Trivial Patch

A real-shape patch for a Javadoc fix on Vertex.java:

diff --git a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
index abcdef1..1234567 100644
--- a/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
+++ b/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
@@ -180,6 +180,10 @@ public class Vertex {
   }

   /**
-   * Set the parallelism.
+   * Set the parallelism (number of tasks) for this Vertex.
+   *
+   * @param parallelism the number of tasks. Must be > 0 unless
+   *                    {@link #setVertexManagerPlugin} configures a dynamic plugin.
+   * @return this Vertex, for chaining.
    */
   public Vertex setParallelism(int parallelism) {

That's the entire patch: six changed lines, +5/-1. No test is needed (Javadoc only). It passes mvn checkstyle:check and mvn install -DskipTests, and the JIRA is TEZ-NNNN: Improve Javadoc on Vertex#setParallelism.

Anti-Patterns

What committers flag immediately:

| Anti-pattern | Why it's flagged |
|---|---|
| Reformat of an entire file | Hides the real change |
| // TODO: refactor comment added | Should be a separate JIRA |
| System.out.println left in | Use LOG, never System.out |
| e.printStackTrace() | Use LOG.warn(msg, e) |
| Catch Exception swallowing everything | Catch specific or rethrow |
| New configuration key with no @Public annotation | Won't be honored as stable |
| New method with throws Exception | Use specific exceptions |
| Test that always passes (no assertion) | Useless |
| Test depending on wall-clock timing | Flaky |
| @Ignore added to silence a failing test | Fix it or revert |

Validation Artifacts

After this chapter you should have:

  1. A ~/tez-notes/precommit.sh script running the pre-submit commands from Rule 8.
  2. One actual patch file TEZ-NNNN.001.patch on disk, even if you haven't uploaded it.
  3. A ~/tez-notes/patch-checklist.md cheatsheet from Rule 8.
  4. Knowledge of the @InterfaceAudience / @InterfaceStability matrix.

The next chapter — Responding to Feedback — covers what happens after you press "Attach".

Responding to Feedback

Your patch is attached. A committer comments. What happens next is the most underrated skill in open source: turning review comments into a committed patch without burning the reviewer's patience or your own. This chapter is the playbook.

The Asynchronous Reality

Apache review is asynchronous and bursty. The committer who reviews your patch may:

  • Be in a different time zone (most likely)
  • Be reviewing on weekends or commute time
  • Have other patches queued
  • Be the only person in the world who deeply knows the file you touched

Practical consequences:

| Reality | What it means for you |
|---|---|
| Reviews come in bursts, not steady drip | Respond within 24–48h of the burst, then wait |
| Patches sit for weeks between rounds | Keep a ~/tez-notes/in-flight.md list |
| Same committer often reviews 2–3 of your patches in one sitting | Have all of them ready |
| A committer may never come back | Polite ping on JIRA at 2 weeks, dev@ at 4 |

Set the expectation early — both for yourself and for reviewers — that a non-trivial patch takes 3–6 weeks from first attach to commit. Optimise for round-trip count, not round-trip duration.

Address Each Comment Explicitly

Reviewers leave per-line comments on a patch (on JIRA in older Tez, on a PR in newer). Each comment needs an explicit response. Not implicit. The committer should not have to diff your old and new patches to figure out which feedback you took.

The pattern:

Reviewer:  L243: This catches Exception too broadly. Tighten to IOException.

You (in JIRA comment when attaching .002):

  Addressed in .002:
    - L243: tightened to IOException; rethrowing wrapped TezException as before.
    - L301: added the missing null check you mentioned.
    - L427: pushed back; see explanation below.

This three-line response is more valuable than a perfect patch with no commentary. It shows you read every comment and decided about each one.

Don't Argue Without Evidence

When a committer says "this is wrong" and you disagree, the natural reflex is to defend. The Apache-effective reflex is to provide evidence.

Bad:

I don't think changing this would help.

Good:

I tried the suggested approach in a local branch. It causes TestVertexImpl#testRecover to fail because REASON. Output:

java.lang.AssertionError: expected 3 attempts, got 2
  at ...

Suggesting we keep the current approach with the additional comment you also asked for.

Three rules for pushback:

  1. Always try the alternative first. Often the committer is right and you didn't see it.
  2. Quote the failing test or benchmark. Numbers and stack traces close arguments.
  3. Offer the smallest possible compromise. "Keep current behavior but add the comment you asked for" is much easier to accept than "no."

When to Push Back

You should push back when:

  • The committer's suggestion would break a documented behavior of a @Public API.
  • The committer's suggestion contradicts another committer's suggestion (cite the other).
  • The committer's suggestion expands scope beyond the JIRA (offer to file a follow-up).
  • You have a measurement (perf, memory) that contradicts the suggestion.

You should not push back when:

  • It's a style preference and you don't strongly care. Take it; save your capital.
  • It's a test-coverage ask. Add the test.
  • It's a "split this into two patches" ask. Split it.
  • It's "rename this method." Rename it.

The principle: defend the substance of the patch, never the shape.

When to Abandon

Most patches that get abandoned should not have been opened in the first place. But some get abandoned mid-review and that's the right call. Signals:

| Signal | Right action |
|---|---|
| Two committers disagree on the approach, irreconcilable | Wait for them to resolve on dev@; don't ping-pong patches |
| The JIRA is rejected as "won't fix" after design discussion | Close the JIRA, archive the patch locally, move on |
| The required change is much larger than you estimated and you can't commit the time | Comment honestly, unassign yourself, leave the JIRA open |
| The codebase has changed significantly and a complete rewrite is needed | Comment, unassign, leave for someone else |

Abandoning is a respectable outcome. Ghosting a patch is not. If you can't continue, say so on the JIRA in one sentence:

Stepping away from this; my time has been redirected. Unassigning so someone else can pick it up. Latest patch (.003) is a good starting point but needs the test reviewer @NAME asked for.

Post a New Patch with a Clear Delta

When you upload TEZ-NNNN.002.patch, leave a JIRA comment that lists the deltas from .001:

Posted .002. Delta from .001:

- L243: tightened catch to IOException, per @<reviewer>.
- L301: added null check, per @<reviewer>.
- L427: kept current logic; rationale above.
- Added testRecoverNoInputs in TestVertexImpl.

mvn install -DskipTests, mvn checkstyle:check, mvn test -pl tez-dag all pass.

Why this matters:

  • Reviewer can re-review by diffing the delta, not the full patch.
  • Future readers of the JIRA see the iteration history at the JIRA level, not just in git.
  • It demonstrates the patch had real iteration, not a vibes-based "I changed some stuff."

Diff your own patches locally:

diff -u TEZ-NNNN.001.patch TEZ-NNNN.002.patch | less

Thank the Reviewer

After commit, comment on the JIRA:

Thanks @COMMITTER for the review and commit. Thanks @OTHER-REVIEWERS for the feedback.

This is not perfunctory. Apache is a long game. The committer who reviewed your first patch is likely to review your tenth. They are humans investing volunteer attention.

Acknowledgement also matters at the project level — it shows other onlookers that the project's reviewers are responsive, which makes the next contributor more likely to attempt a patch.

The Shepherd Committer

For non-trivial JIRAs, especially design-heavy ones, one committer often becomes the "shepherd" — the de facto reviewer and merge-committer. The relationship:

| Their role | Your role |
|---|---|
| Reviews each patch iteration | Addresses comments promptly |
| Surfaces concerns from other committers | Treats them as that committer's concerns, not the shepherd's |
| Commits the final patch | Provides commit message text |
| May ask for sub-JIRAs | Files them, links them |
| Champions the design on dev@ if questioned | Provides ammunition (numbers, tests) |

Spotting a shepherd: after 2–3 review rounds with the same committer, they're shepherding. Direct future questions on the JIRA to them ("@COMMITTER, would you prefer A or B for the rename?"). Don't ping multiple committers in parallel; that fragments attention.

When to Ping

JIRA pings have a half-life. Use them sparingly.

| Wait time since last activity | Action |
|---|---|
| < 1 week | Don't ping. Reviewers are busy. |
| 1–2 weeks | Comment on JIRA: "Friendly ping: anything blocking on my side?" |
| 2–4 weeks | Re-ping on JIRA, cc'ing any prior reviewer by @-mention. |
| > 4 weeks | Mention on dev@ in a [DISCUSS] thread: "TEZ-NNNN has been quiet for a month, anyone willing to take another look?" |

What kills a patch dead: pinging weekly or daily. After two such pings, reviewers deprioritise the patch out of self-defence. Don't.
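The four tiers in the table can be written down as a small lookup, which is handy if you keep last-activity dates in ~/tez-notes/in-flight.md. This is a minimal sketch, not project tooling: the class name, enum names, and the choice to put exactly 14 days into the third tier are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class PingSchedule {
    enum Action { DO_NOT_PING, COMMENT_ON_JIRA, REPING_WITH_MENTIONS, RAISE_ON_DEV_LIST }

    /** Maps whole days of silence on a JIRA to the matching tier of the ping table. */
    static Action actionFor(long daysQuiet) {
        if (daysQuiet < 0) {
            throw new IllegalArgumentException("daysQuiet must be >= 0: " + daysQuiet);
        }
        if (daysQuiet < 7) {
            return Action.DO_NOT_PING;           // less than 1 week
        }
        if (daysQuiet < 14) {
            return Action.COMMENT_ON_JIRA;       // 1-2 weeks
        }
        if (daysQuiet <= 28) {
            return Action.REPING_WITH_MENTIONS;  // 2-4 weeks (day 14 lands here by assumption)
        }
        return Action.RAISE_ON_DEV_LIST;         // more than 4 weeks
    }

    /** Convenience overload that works from calendar dates. */
    static Action actionFor(LocalDate lastActivity, LocalDate today) {
        return actionFor(ChronoUnit.DAYS.between(lastActivity, today));
    }

    public static void main(String[] args) {
        LocalDate lastActivity = LocalDate.of(2024, 3, 1); // illustrative date
        System.out.println(actionFor(lastActivity, LocalDate.of(2024, 3, 4)));  // DO_NOT_PING
        System.out.println(actionFor(lastActivity, LocalDate.of(2024, 3, 11))); // COMMENT_ON_JIRA
        System.out.println(actionFor(lastActivity, LocalDate.of(2024, 4, 15))); // RAISE_ON_DEV_LIST
    }
}
```

The point of encoding it is the first branch: the default answer for anything under a week is to do nothing.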

Worked Example — A Full Round-Trip

JIRA: TEZ-4321, "Fix NPE in VertexImpl.recover when no inputs."

Day 0:  You attach TEZ-4321.001.patch, set status to Patch Available.
Day 4:  Committer @gunther comments:
          L88: prefer Collections.emptyList() over new ArrayList<>()
          L92: add test for the no-inputs case
          L94: should we also handle no-outputs symmetrically?

Day 5:  You reply on JIRA:
          - L88: agreed, will fix.
          - L92: agreed, adding TestVertexImpl#testRecoverNoInputs.
          - L94: noticed but out of scope for this JIRA. Filed TEZ-4329 for follow-up.

Day 5:  You attach TEZ-4321.002.patch and a delta-summary comment.

Day 9:  @gunther comments: "+1 LGTM"

Day 10: @gunther commits as
          "TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gunther)"
        and sets status to Resolved.

Day 10: You comment:
          "Thanks @gunther. Working on TEZ-4329 next."

10 days, 2 patch rounds, 1 follow-up JIRA filed, 0 arguments. That is a healthy review.

When Feedback Comes from a Non-Committer

Non-committers can review too. Their +1 is non-binding (only committers' votes count for commit), but their feedback is often substantively excellent — they may know the area better than the committer who eventually commits.

Treat non-committer feedback exactly like committer feedback: address each comment, explain, iterate. Two non-binding +1s also signal to a committer that the patch is ready to consider, accelerating attention.

Validation Artifacts

After this chapter you should have:

  1. A ~/tez-notes/in-flight.md listing any JIRA you currently have a patch on, with the date of last activity.
  2. A template for the "delta from previous patch" comment, saved as ~/tez-notes/delta-template.md.
  3. Internalised the four-tier ping schedule.
  4. The reflex to thank the committer after merge.

The next chapter — Compatibility — is the technical knowledge you need so reviewers don't have to teach you compatibility rules during review.

Compatibility

Tez is a library that ships into long-lived production clusters running Hive, Pig, and custom DAG applications. A compatibility break in Tez ripples out to every downstream project that depends on it. This chapter is the operational knowledge of what you may and may not change without breaking users.

The Three Compatibility Surfaces

Tez has three distinct compatibility surfaces, each with different rules:

| Surface | What it covers | Where defined |
|---|---|---|
| API compatibility | Source/binary compat of Java classes | @InterfaceAudience/@InterfaceStability annotations in tez-api |
| Wire compatibility | Serialised messages over the network | protobufs in */src/main/proto/ |
| Configuration compatibility | Config keys and default values | TezConfiguration constants in tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java |

A single patch may touch zero, one, two, or all three. Knowing which surface you're touching tells you which rules apply.

API Compatibility — The Annotation Grid

Every class in tez-api is (or should be) annotated. The two-axis grid:

| | @Stable | @Evolving | @Unstable |
|---|---|---|---|
| @Public | Compat across minor versions. Major bump to change. | May change across minor versions with deprecation. | May change across any release. |
| @LimitedPrivate({"Hive"}) | Stable for named projects only (e.g. Hive). | Evolving for named projects. | Unstable, named projects only. |
| @Private | Internal. No external compat. | Internal. | Internal. |

The annotations live at tez-api/src/main/java/org/apache/hadoop/classification/:

ls ~/tez-src/tez-api/src/main/java/org/apache/hadoop/classification/
# InterfaceAudience.java
# InterfaceStability.java

Verify a class:

grep -B2 "^public class Vertex" ~/tez-src/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java

You will see:

@Public
@Evolving
public class Vertex {

That tells you: external users may write code against Vertex, but the class may evolve between minor versions. You may add methods. You should not remove or change the signature of an existing method without deprecation.

What You Can and Can't Change

The decision matrix for modifying an existing public method:

| Change | @Public @Stable | @Public @Evolving | @Public @Unstable | @Private |
|---|---|---|---|---|
| Add new method to class | OK | OK | OK | OK |
| Add overload (different signature) | OK | OK | OK | OK |
| Add optional parameter (new overload) | OK | OK | OK | OK |
| Rename method | Major version only | Deprecate first | OK with note in CHANGES.txt | OK |
| Change parameter type | Major version only | Deprecate + add new | OK | OK |
| Change return type (widening) | Major version only | OK with note | OK | OK |
| Change return type (narrowing) | Major version only | Major version only | OK | OK |
| Remove method | Major version only | Major version, after one minor release of deprecation | OK with note | OK |
| Change method behavior (same signature) | Avoid; needs dev@ discussion | Note in CHANGES.txt | OK | OK |

The default rule for @Public @Stable: assume you can't change it. To change it, you need dev@ agreement first.
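If you keep the matrix in your notes, it can also be kept as a lookup you query before opening a JIRA. The sketch below encodes a subset of the rows; the class, enum names, and verdict strings are illustrative stand-ins and are not part of Tez or Hadoop.

```java
import java.util.EnumMap;
import java.util.Map;

final class CompatMatrix {
    // Column order mirrors the matrix: Stable, Evolving, Unstable, Private.
    enum Stability { PUBLIC_STABLE, PUBLIC_EVOLVING, PUBLIC_UNSTABLE, PRIVATE }

    // A subset of the rows; extend as needed.
    enum Change { ADD_METHOD, RENAME_METHOD, CHANGE_PARAMETER_TYPE, NARROW_RETURN_TYPE, REMOVE_METHOD }

    private static final Map<Change, String[]> RULES = new EnumMap<>(Change.class);

    static {
        rule(Change.ADD_METHOD, "OK", "OK", "OK", "OK");
        rule(Change.RENAME_METHOD,
            "Major version only", "Deprecate first", "OK with changelog note", "OK");
        rule(Change.CHANGE_PARAMETER_TYPE,
            "Major version only", "Deprecate + add new", "OK", "OK");
        rule(Change.NARROW_RETURN_TYPE,
            "Major version only", "Major version only", "OK", "OK");
        rule(Change.REMOVE_METHOD,
            "Major version only", "Major version, after deprecation", "OK with changelog note", "OK");
    }

    private static void rule(Change change, String... verdicts) {
        if (verdicts.length != Stability.values().length) {
            throw new IllegalArgumentException("one verdict per stability level required");
        }
        RULES.put(change, verdicts);
    }

    /** Looks up the verdict for one change against one annotation combination. */
    static String verdict(Change change, Stability stability) {
        return RULES.get(change)[stability.ordinal()];
    }

    public static void main(String[] args) {
        System.out.println(verdict(Change.RENAME_METHOD, Stability.PUBLIC_EVOLVING)); // Deprecate first
        System.out.println(verdict(Change.REMOVE_METHOD, Stability.PUBLIC_STABLE));   // Major version only
    }
}
```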

Deprecation Procedure

When deprecating a @Public @Evolving method:

/**
 * @deprecated Since 0.10.5, use {@link #setParallelism(int, VertexLocationHint)} instead.
 *             This method will be removed in 0.12.0.
 */
@Deprecated
public Vertex setParallelism(int parallelism) {
    return setParallelism(parallelism, null);
}

Three required elements:

  1. @Deprecated annotation on the method.
  2. @deprecated Javadoc tag explaining what to use instead.
  3. A target removal version. Vague "may be removed" deprecations live forever.

Add a note to CHANGES.txt:

DEPRECATIONS:
  TEZ-NNNN: Vertex.setParallelism(int) is deprecated; use setParallelism(int, VertexLocationHint).
            Will be removed in 0.12.0.

Wire Compatibility — Protobufs

The DAGPlan protobuf message, defined in DAGApiRecords.proto, is the most compatibility-sensitive definition in Tez. It is the serialised contract between:

  • The Tez client (often inside Hive, Pig, or user code) and the AM
  • The AM and history (ATSHistoryLoggingService)
  • The AM and the recovery file

A DAGPlan written by a 0.10.3 client must be readable by a 0.10.5 AM. A DAGPlan written to a recovery file months ago must still be readable by today's AM.

The protobuf compatibility rules (protobuf 2.5 semantics, which Tez still uses for historic reasons):

| Change to a .proto | Wire compat impact |
|---|---|
| Add a new optional field with default | Forward + backward compatible |
| Add a new repeated field | Forward + backward compatible |
| Add a new required field | BREAKS old readers |
| Remove an optional field | BREAKS if old readers ignore unknowns badly |
| Rename a field (same tag) | OK in wire, breaks source compat |
| Change a field's tag number | BREAKS wire compat |
| Change a field's type | Usually BREAKS |
| Convert optional to repeated | BREAKS |
| Add a new enum value | BREAKS if old readers reject unknowns |

The hard rules for DAGApiRecords.proto (listed here with its sibling protos):

ls ~/tez-src/tez-api/src/main/proto/
# DAGApiRecords.proto
# DAGClientAMProtocol.proto
# Events.proto

  • Never reuse a tag number. Once tag 12 was used, it's used forever.
  • Never change a field's type. Widening int32 to int64 keeps the same varint encoding on the wire, but old readers truncate values that no longer fit in 32 bits, so treat it as a break anyway.
  • Never make an optional field required.
  • New fields go at the end with the next free tag number, marked optional.
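These rules are mechanical enough to check before you post a patch. The following is a rough sketch of such a check, assuming you paste the old and new body of a single message as text: it uses a simple regex, not a real protobuf parser, ignores nested messages and enums, and does not detect a retired tag reused with the same label and type. `ProtoTagCheck` and its methods are hypothetical names, not Tez tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProtoTagCheck {
    // Matches proto2 field lines such as "optional int32 num_tasks = 2;".
    private static final Pattern FIELD = Pattern.compile(
        "^\\s*(required|optional|repeated)\\s+([\\w.]+)\\s+\\w+\\s*=\\s*(\\d+)");

    /** Returns tag -> "label type" for every field line in one message body. */
    static Map<Integer, String> shapeByTag(String messageBody) {
        Map<Integer, String> byTag = new TreeMap<>();
        for (String line : messageBody.split("\n")) {
            Matcher m = FIELD.matcher(line);
            if (!m.find()) {
                continue; // not a field declaration
            }
            int tag = Integer.parseInt(m.group(3));
            String shape = m.group(1) + " " + m.group(2);
            if (byTag.put(tag, shape) != null) {
                throw new IllegalArgumentException("tag " + tag + " declared twice");
            }
        }
        return byTag;
    }

    /** Lists rule violations between the old and new version of one message. */
    static List<String> violations(String oldBody, String newBody) {
        Map<Integer, String> before = shapeByTag(oldBody);
        List<String> found = new ArrayList<>();
        for (Map.Entry<Integer, String> e : shapeByTag(newBody).entrySet()) {
            String old = before.get(e.getKey());
            if (old == null) {
                if (e.getValue().startsWith("required ")) {
                    found.add("tag " + e.getKey() + ": new field must not be required");
                }
            } else if (!old.equals(e.getValue())) {
                // Same tag, different label or type: a reuse or a type change.
                found.add("tag " + e.getKey() + ": " + old + " -> " + e.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String v1 = "required string name = 1;\noptional int32 num_tasks = 2;\n";
        String ok = v1 + "optional int32 max_attempts = 12;\n";
        String bad = "required string name = 1;\noptional int64 num_tasks = 2;\n";
        System.out.println(violations(v1, ok));  // []
        System.out.println(violations(v1, bad)); // [tag 2: optional int32 -> optional int64]
    }
}
```

Renames are deliberately not flagged, since a rename with the same tag is fine on the wire and only breaks source compatibility.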

When adding a new field:

 message VertexPlan {
   required string name = 1;
   optional int32 num_tasks = 2;
   ...
   optional int64 last_modified_time = 11;
+  optional int32 max_attempts = 12;
 }

The Java side should treat the new field as "may be absent" forever — old plans don't have it.
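In Java, "may be absent" usually means checking the generated has-accessor before trusting the getter. The sketch below illustrates the pattern without depending on protobuf: `VertexPlanView` is a hypothetical stand-in for the accessor pair protoc would generate for the illustrative max_attempts field, and the fallback value is an assumption, not a Tez constant.

```java
final class MaxAttemptsDefaulting {
    /** Hypothetical stand-in for the has/get pair generated for an optional proto2 field. */
    interface VertexPlanView {
        boolean hasMaxAttempts();
        int getMaxAttempts();
    }

    /** Assumed behaviour for plans written before the field existed. */
    static final int LEGACY_MAX_ATTEMPTS = 4;

    /** Never trust the getter alone: an unset int32 reads as 0, not as "absent". */
    static int effectiveMaxAttempts(VertexPlanView plan) {
        return plan.hasMaxAttempts() ? plan.getMaxAttempts() : LEGACY_MAX_ATTEMPTS;
    }

    /** Demo helper: null models a plan serialised without the field. */
    static VertexPlanView planWith(final Integer maxAttemptsOrNull) {
        return new VertexPlanView() {
            @Override public boolean hasMaxAttempts() {
                return maxAttemptsOrNull != null;
            }
            @Override public int getMaxAttempts() {
                // Mirrors proto2 getters, which return the type default when unset.
                return maxAttemptsOrNull == null ? 0 : maxAttemptsOrNull;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxAttempts(planWith(null))); // 4: old plan, field missing
        System.out.println(effectiveMaxAttempts(planWith(7)));    // 7: new plan
    }
}
```

The fallback branch has to stay for as long as old plans and recovery files can exist, which in practice means it stays.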

Recovery File Compatibility

The AM writes recovery files containing serialised DAGPlan and event records. On restart, the AM reads its own recovery file. After an upgrade, the new AM must still be able to read recovery files written by the version it replaced.

Practical rule: recovery is at least as wire-compat-sensitive as RPC. Treat every DAGPlan change as a recovery-format change. Tests:

find ~/tez-src -name "TestDAGRecovery*.java"
find ~/tez-src -name "TestRecovery*.java"

If your patch touches a proto, run these tests and add a new case demonstrating old-format recovery still works.

History / ATS Compatibility

The history record format (used by the Tez UI and ATS) is also a wire format:

find ~/tez-src -name "HistoryEvent*.java" | head
find ~/tez-src -name "HistoryEvent.proto"

A change here breaks Tez UI queries on historical DAGs. The compatibility rule is the same as for DAGPlan. The reviewer for any history-format patch is typically a Hive committer who depends on the Tez UI.

Configuration Compatibility

Configuration keys are defined in TezConfiguration:

grep "public static final String TEZ_" \
    ~/tez-src/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head -30

Each key looks like:

@ConfigurationProperty(type = "integer")
public static final String TEZ_AM_RESOURCE_MEMORY_MB = "tez.am.resource.memory.mb";
public static final int TEZ_AM_RESOURCE_MEMORY_MB_DEFAULT = 1024;

Adding a new key

OK at any time. Add the String constant, the _DEFAULT constant, an @Public / @Unstable (or @Evolving) annotation if the surrounding class is annotated, and a javadoc explaining the key and its valid range.

Renaming a key

This requires a deprecation alias. Tez has a deprecation mechanism via Hadoop's Configuration.addDeprecation. Pattern:

public static final String TEZ_AM_RESOURCE_MEMORY_MB = "tez.am.resource.memory.mb";

// Old key, deprecated since 0.10.5.
public static final String TEZ_AM_RESOURCE_MEMORY_MB_DEPRECATED = "tez.am.memory.mb";

static {
    Configuration.addDeprecation(
        TEZ_AM_RESOURCE_MEMORY_MB_DEPRECATED,
        TEZ_AM_RESOURCE_MEMORY_MB);
}

Old config files using the deprecated name continue to work. Log a warning on first read.
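To see why the alias keeps old config files working, it helps to model the lookup. The toy class below is not Hadoop's implementation (which also handles warnings, multiple replacement keys, and reloads); it only sketches the idea that reads and writes through either name resolve to one canonical key. `KeyAliases` is a hypothetical name, and the key strings are the ones from the pattern above.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyAliases {
    private final Map<String, String> oldToNew = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    /** Registers oldKey as a deprecated alias of newKey. */
    void addDeprecation(String oldKey, String newKey) {
        oldToNew.put(oldKey, newKey);
    }

    /** Resolves a possibly deprecated key to its current name. */
    private String canonical(String key) {
        return oldToNew.getOrDefault(key, key);
    }

    void set(String key, String value) {
        values.put(canonical(key), value);
    }

    int getInt(String key, int defaultValue) {
        String raw = values.get(canonical(key));
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        KeyAliases conf = new KeyAliases();
        conf.addDeprecation("tez.am.memory.mb", "tez.am.resource.memory.mb");
        conf.set("tez.am.memory.mb", "2048"); // an old config file still uses the old name
        System.out.println(conf.getInt("tez.am.resource.memory.mb", 1024)); // 2048
    }
}
```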

Removing a key

Only at a major version bump, after at least one minor version of deprecation. Document in CHANGES.txt and the release notes.

Changing a default

Treat as a behavior change. Requires dev@ discussion if the change affects perf or resource usage. Document the change explicitly:

DEFAULT CHANGES:
  TEZ-NNNN: tez.am.resource.memory.mb default changed from 1024 to 1536 to reduce OOMs
            on large DAGs. Users with tight container budgets should explicitly set the
            old value.

Compatibility Across Tez and Hive/Pig

Tez has cross-project compatibility commitments to Hive and Pig — they bundle Tez and expect a Tez version bump not to break them. The mechanism is @LimitedPrivate.

grep -rn "@LimitedPrivate" ~/tez-src/tez-api/src/main/java | head

A class annotated @LimitedPrivate({"Hive"}) has API compatibility guaranteed to Hive only. The Tez side may not break it without first warning dev@hive.apache.org. The Hive side commits to not relying on anything other than @LimitedPrivate or @Public APIs.

When you change a @LimitedPrivate({"Hive"}) class:

  1. Search Hive for usage: grep -rn <ClassName> ~/hive-src/ql/src/
  2. If Hive uses it, post a heads-up on dev@hive.apache.org referencing the JIRA.
  3. Consider providing both old and new methods for one Tez minor version.

Validation Artifacts

After this chapter you should have:

  1. A ~/tez-notes/compat-cheatsheet.md with the API matrix from above.
  2. A list of every .proto file in tez-api and which compat surface each protects.
  3. The set of files in tez-api/.../classification/ open in your IDE for reference.
  4. Knowledge of which Hive classes import from tez-api:
    grep -rn "import org.apache.tez" ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ | head
    
  5. The ability to predict, for any change, which compat surface(s) it touches and what the deprecation timeline would be.

The next chapter — Meritocracy — is the project-level perspective: how Apache Tez decides who gets to make compatibility decisions.

Meritocracy: Contributor → Committer → PMC

The Apache Way uses a specific, technical sense of the word "meritocracy" that is often misread. This chapter is what it actually means inside Apache Tez, how the path from casual contributor to PMC member works, and what each step really requires.

The Three Roles

| Role | Granted by | What it gives you | What it asks of you |
|---|---|---|---|
| Contributor | Nobody; anyone who contributes is one | JIRA account, ability to submit patches | Nothing formal |
| Committer | Vote on private@tez.apache.org by PMC | Commit access to apache/tez, vote rights on patches (non-binding for releases) | ICLA on file, ongoing engagement |
| PMC member | Vote on private@tez.apache.org by PMC | Binding vote on releases, vote rights on new committers and PMC members, board reporting share | Legal stewardship, release responsibility |

There is no fourth role. "Lead contributor" or "maintainer" are not Apache concepts. The "Chair" is a PMC member, appointed by the board on the PMC's recommendation, who files the project's board reports; some PMCs rotate the role.

What "Meritocracy" Actually Means at Apache

Apache uses "meritocracy" in a very specific sense: decisions and elevations are based on accumulated, evidenced contribution to the project — not on title, employer, or personal connections.

That is narrower than the colloquial meaning. It explicitly does not mean:

  • "Best engineer wins." Many excellent engineers are not committers because they have not engaged with this specific community.
  • "Most patches wins." LOC is not a measure of merit.
  • "Paid time on the project wins." Full-time paid Tez work, on its own, does not earn committership. The community must observe the contribution.
  • "Smartest design wins arguments." Arguments are won by evidence and consensus, not cleverness.

What it does mean:

  • Sustained, visible contribution over months
  • Quality demonstrated by patches getting committed with few iterations
  • Trust demonstrated by reasonable behavior on JIRA and dev@
  • Investment in the project itself, not just in your features

The Path to Committer

The committer vote is private; the criteria are not codified anywhere with bullet points. What committers actually look at, in rough order:

  1. Patch quality. Have your patches gone in with light review? Have you mastered the workflow in Patch Quality?
  2. Volume and sustained activity. Not LOC, but consistency. 10 small patches over 6 months is much stronger than 1 huge patch.
  3. Engagement breadth. Have you reviewed others' patches (with non-binding +1s)? Helped on user@ questions? Filed clean JIRAs?
  4. Judgement on dev@. Have you participated in design discussions? Were your contributions thoughtful, not just adding noise?
  5. Area coverage. Have you worked in more than one corner of the codebase, or are you trusted for a deep one? Either can earn the bit.
  6. Trust. Would the existing committers be comfortable with you committing your own patches?

There is no fixed threshold. Different projects have different bars; Tez is in the middle (not as strict as Hadoop, not as loose as a brand-new TLP).

Typical Trajectory

Month 1-2:   First few small patches (Javadoc, log messages, tiny bug fixes).
             Some friction in review as you learn conventions.
Month 3-6:   More substantive patches. Lower review iteration count.
             Reviewing others' patches with non-binding +1.
Month 6-12:  Larger patches with design discussion.
             Filing follow-up JIRAs after your patches.
             Recognised name on dev@.
Month 12+:   A PMC member notices and proposes you on private@.
             PMC discusses, votes. Vote happens silently.
             You receive a private email offering the bit.
             You accept privately; the PMC then posts an [ANNOUNCE] thread on dev@ and you acknowledge there.

The 12-month figure is a median, not a rule. Faster is possible with very sustained engagement; slower is common.

Accepting the Bit

If a PMC member emails you with an offer of committership, the steps:

  1. Accept privately, via reply to the offer email.
  2. The PMC raises an [ANNOUNCE] New Tez committer: <name> thread on dev@.
  3. You acknowledge publicly on the thread.
  4. ASF Infrastructure provisions your ASF ID (<id>@apache.org).
  5. You get karma to push to apache/tez.

What changes for you:

  • You can commit your own patches. Don't commit your own patches without review for the first few months. The community trust applies to your judgement of others' patches; your own still get reviewed.
  • You get a binding +1 vote on commits.
  • You get a non-binding +1 on releases (PMC +1 is binding).
  • You are now visible as part of the project. Behave accordingly on dev@, JIRA, and conferences.

The Path to PMC

PMC membership is a separate, later, additive step. Committership is necessary but not sufficient. Criteria, looser even than committership:

  1. Sustained activity as a committer. Months to years post-committer.
  2. Project-level judgement, not just code. Have you weighed in on release timing, compat questions, community-management issues?
  3. Willingness to take on release-management or PMC duties. Cutting a release, responding to security reports, mentoring new committers.
  4. Trust to handle confidential matters — security disclosures arrive on private@tez.apache.org, and PMC members must handle them carefully.

PMC votes are also private. You are notified by email; the public announcement is on dev@.

What PMC Members Do That Committers Don't

| Duty | Why PMC only |
|---|---|
| Binding +1 on releases | ASF policy: releases are PMC acts |
| Vote on new committers and PMC members | Self-perpetuating governance |
| Receive and process security reports | Confidentiality |
| Approve / sign release artifacts | Legal liability flows through PMC |
| Quarterly board reports | Stewardship to the foundation |
| Trademark guardianship | "Apache Tez" is a Foundation mark |
| Brand decisions (logos, names, conferences) | ASF authorises through PMCs |

Common Misconceptions

"I work on Tez full-time, so I should be a committer."

Paid time is irrelevant. The community can only assess what it can observe — public patches, public reviews, public discussion. Internal company work, no matter how extensive, does not exist from the project's perspective.

If your day job is Tez work, the way to convert that into committership is to do that work in the open: file JIRAs, attach patches, post designs.

"I wrote N lines of code, so I should be a committer."

LOC is not used. A contributor with 200 lines spread across 15 thoughtful patches is strictly stronger than one with 5000 lines in 2 mega-patches. Smaller, frequent, high-quality contributions demonstrate the judgement committership rewards.

"My company has N committers, so we should have the next slot."

Apache projects are explicitly company-independent. Many PMCs have an informal limit on the proportion of committers from any single employer (no more than ~50%) to preserve project independence. Companies do not have slots.

"I was a committer on project X, so I should get the bit here automatically."

You don't. Committership is per-project. Past contribution elsewhere is positive prior evidence but does not substitute for engagement on Tez.

"I have an ASF ICLA on file, so I'm a contributor."

An ICLA is a legal document covering future contributions. It does not make you a contributor; submitting a contribution makes you a contributor. ICLA is necessary for non-trivial contributions to be committed.

"There is a contributor-rank or leaderboard."

There isn't. Apache projects do not maintain rankings, badges, or stars. The closest thing is the CHANGES.txt file, which records the contributor name on each committed patch.

What Earns the Bit, Concretely

If you want a checklist, this is roughly it. None are individually required, but most committers tick most boxes by the time they're proposed:

  • 10+ patches committed, spanning multiple areas of the code.
  • At least one patch with non-trivial design discussion on dev@ or JIRA.
  • At least one bug found by you, reproduced by you, fixed by you, tested by you.
  • Reviewed at least 5 other contributors' patches with constructive non-binding +1s or -1s.
  • Helped answer questions on user@ or in JIRA comments.
  • Filed follow-up JIRAs when you noticed adjacent issues.
  • Behaved well in every public interaction, including when a patch was rejected.
  • Maintained existing patches as the codebase moved under them (rebased, addressed review).
  • Sustained over 6+ months, not concentrated in one sprint.
  • Not gaming any of the above (committers can tell).

What Earns PMC, Concretely

  • Committer for 1–3+ years.
  • Demonstrated judgement on dev@ beyond your own patches.
  • Have either cut a release or helped with one.
  • Have proposed or seconded other committers.
  • Have engaged with at least one cross-project compat concern.
  • Visible willingness to do PMC work (security, brand, board reports) — not just code.

Validation Artifacts

After this chapter you should have:

  1. A clear-eyed view of where you currently are on the path.
  2. A ~/tez-notes/karma.md listing every concrete thing you've done that the community can observe — patches, reviews, JIRA comments, dev@ posts.
  3. A goal for the next 3 months in terms of contribution shape, not LOC.
  4. The ability to explain the contributor / committer / PMC distinction to a colleague without using the word "lead."

This chapter closes the Contributor Mindset section. The next major section, Release & PMC Reality, takes you inside the committer and PMC view — what those roles actually look like from inside.

Issue Roadmap — Twelve Stages from Trivial to Release-Blocking

This roadmap is a deliberately ordered ladder of Apache Tez contributions. Each rung trains a specific skill, depends on the rung below it, and ends at a concrete review-ready patch. Skipping rungs is the most common reason contributors stall: a shuffle bug fix without state-machine fluency turns into a six-month patch thread, and a release-blocker triage call without compatibility reflexes turns into a reverted commit.

The stages are calibrated to the Tez 0.10.x codebase on disk at ~/tez-src. JIRA queries assume https://issues.apache.org/jira/projects/TEZ. Patch discussion happens on dev@tez.apache.org. Where stages reference real modules they use the exact paths you will see under ~/tez-src:

tez-api/                       public interfaces, descriptors, configuration keys
tez-common/                    IDs, util, log helpers, ATS/timeline shared code
tez-dag/                       AppMaster: DAGImpl, VertexImpl, TaskImpl, schedulers
tez-runtime-internals/         TezTaskRunner, LogicalIOProcessorRuntimeTask
tez-runtime-library/           ShuffleManager, Fetcher, IFile, MergeManager
tez-mapreduce/                 MR-shim inputs/outputs/processors
tez-tests/                     MiniTezCluster integration tests
tez-examples/                  OrderedWordCount, SimpleSessionExample, etc.
tez-plugins/tez-yarn-timeline-history/   ATS history events
tez-plugins/tez-aux-services/  NM-side ShuffleHandler hook
docs/                          User-facing site under src/site/markdown

The Twelve Stages

| # | Stage | Target skill | Prereq | Typical patch size | Review depth |
|---|---|---|---|---|---|
| 1 | Docs & tests | Reading the codebase, JIRA workflow, RAT/checkstyle | none | 1–30 lines | 1 reviewer |
| 2 | Build & logging hygiene | pom dep bands, slf4j idioms, LOG.isDebugEnabled() | 1 | 5–80 lines | 1 reviewer |
| 3 | Error message context | Exception chaining, ID propagation, tez-dag CONTEXT rule | 2 | 20–200 lines | 1–2 reviewers |
| 4 | State machine transitions | StateMachineFactory, InvalidStateTransitonException | 3 | 30–250 lines + test | 2 reviewers, dev@ ping |
| 5 | Scheduler bugs | TaskSchedulerManager, YarnTaskSchedulerService, AMRMClient | 4 | 50–500 lines + MiniCluster test | 2 reviewers |
| 6 | Shuffle & runtime | ShuffleManager, Fetcher, MergeManager, IFile | 5 | 80–600 lines + test | 2 reviewers |
| 7 | Hive-on-Tez compatibility | DAGPlan size, edge property contracts, session reuse | 5 or 6 | varies; often a tez-side + HIVE-side ticket | committers in both projects |
| 8 | YARN integration | AMRMToken, log aggregation, NM aux service, kerberos renewal | 5 | 50–400 lines | 2 reviewers, often YARN-side too |
| 9 | Flaky tests | DrainDispatcher, dispatcher-aware waits, port collisions | 4 | 20–150 lines per test | 1–2 reviewers; sometimes "stamped" |
| 10 | Performance regression | git bisect, async-profiler / JFR, JMH micro | 6 or 8 | 30–300 lines + bench evidence | 2 reviewers, dev@ design ping |
| 11 | Backward compatibility | @InterfaceAudience, @InterfaceStability, protobuf evolution | 4 | small code, long dev@ thread | committers + PMC |
| 12 | Release-blocking | RC voting, -1 binding, security CVE pipeline | committer | varies | PMC + release manager |
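The Prereq column forms a small dependency graph, and it can help to see it as data. The sketch below encodes the column as read from the table (stage 7 needs 5 or 6, stage 10 needs 6 or 8, stage 12 needs committer status) and checks only the direct prerequisite, not the whole chain beneath it. `StageLadder` and its methods are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StageLadder {
    // stage -> stages of which at least one must be complete (empty means no prerequisite).
    private static final Map<Integer, List<Integer>> PREREQS = new HashMap<>();

    static {
        stage(1);
        stage(2, 1);
        stage(3, 2);
        stage(4, 3);
        stage(5, 4);
        stage(6, 5);
        stage(7, 5, 6);
        stage(8, 5);
        stage(9, 4);
        stage(10, 6, 8);
        stage(11, 4);
    }

    private static void stage(int number, Integer... anyOf) {
        PREREQS.put(number, Arrays.asList(anyOf));
    }

    /** True if the direct prerequisite for the stage is met. */
    static boolean ready(int stage, Set<Integer> completed, boolean isCommitter) {
        if (stage == 12) {
            return isCommitter; // the table lists "committer", not a stage number
        }
        List<Integer> anyOf = PREREQS.get(stage);
        if (anyOf == null) {
            throw new IllegalArgumentException("unknown stage " + stage);
        }
        if (anyOf.isEmpty()) {
            return true;
        }
        for (int prereq : anyOf) {
            if (completed.contains(prereq)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> done = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(ready(5, done, false));  // true: stage 4 is complete
        System.out.println(ready(9, done, false));  // true: flaky tests only need stage 4
        System.out.println(ready(10, done, false)); // false: needs stage 6 or 8
    }
}
```

Note that stages 9 and 11 branch off stage 4, so the ladder is not strictly linear after the state-machine rung.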

How to Use This Roadmap

Pick a stage honestly

Find your rung by asking what the largest patch you have shipped is:

  • Never landed a Tez patch: start at Stage 1.
  • Landed a docs patch but never touched Java in tez-dag: Stage 2.
  • Comfortable with tez-common Java but never read a state machine: Stage 3.
  • Read VertexImpl.stateMachineFactory once and were confused: Stage 4.
  • Read it twice and could draw the state graph: Stage 5+.
  • Already a Tez committer: jump straight to Stages 10–12 for sharpening.

Do not jump rungs to chase a "cool" bug. A locality miscount in YarnTaskSchedulerService looks self-contained and isn't — the patch will land on state-machine transitions you have never edited.

One stage per PR

Resist the urge to fix two things in one patch. Reviewers reject mixed-concern patches almost reflexively. If you find a logging issue while fixing an error message, file a follow-up JIRA and move on. The roadmap rewards small surface area.

Always start with git log and git blame

Before touching a file, find the last 5 commits that modified it:

cd ~/tez-src
git log --oneline -n 5 -- tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
git blame -L 1200,1260 tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

The blame output tells you which committer cares about that area. CC them on the JIRA.

Time investment per stage

Calibrated against a working contributor who has the codebase checked out, can build locally with mvn clean install -DskipTests -Phadoop28, and has filed at least one JIRA before:

| Stage | First patch | Becoming fluent (5 patches landed) |
|---|---|---|
| 1 | half a day | 1 week |
| 2 | 1 day | 2 weeks |
| 3 | 1–2 days | 1 month |
| 4 | 3–5 days | 2–3 months |
| 5 | 1–2 weeks | 4–6 months |
| 6 | 2–4 weeks | 6 months |
| 7 | weeks per attribution call | a year of cross-project work |
| 8 | 1–3 weeks | 6 months |
| 9 | 1–3 days per flake | ongoing |
| 10 | weeks (perf is bisect-bound) | committer-level skill |
| 11 | weeks (dev@ design cycle) | committer-level skill |
| 12 | PMC-level responsibility | n/a |

Success criterion per stage

Each stage is "complete" for you when:

  • Stage 1: one docs and one test patch are committed to master.
  • Stage 2: at least two logging or build patches are committed without nits.
  • Stage 3: one error-context patch is committed with no reviewer asking "which DAG?"
  • Stage 4: one transition fix is committed and has a regression test in TestVertexImpl.
  • Stage 5: one scheduler patch is committed with a MiniTezCluster repro test.
  • Stage 6: one shuffle-runtime patch is committed with a deterministic repro.
  • Stage 7: one cross-project ticket is filed with a written attribution argument.
  • Stage 8: one YARN-integration patch is committed with explicit Hadoop-version evidence.
  • Stage 9: at least three flaky tests have been de-flaked.
  • Stage 10: one perf patch is committed with before/after benchmark numbers.
  • Stage 11: one compatibility-sensitive patch is committed with explicit annotations and dev@ sign-off.
  • Stage 12: you have helped triage at least one RC vote.

When to ask on dev@

Before writing any code for Stages 4 and above, send a short note to dev@tez.apache.org:

Subject: [DISCUSS] TEZ-XXXX — proposed approach

I see <symptom> at <file>:<line>. My read is <cause>. I plan to <fix>, with
a regression test in <test>. Would appreciate any context I'm missing before
I post a patch.

Three sentences. No essay. The list will tell you in 24 hours whether you are about to step on someone else's in-flight work.

When the roadmap does not apply

This roadmap is for bug fixes and small features. It is not for:

  • New runtime engines or scheduler rewrites — those are Tez Improvement Proposals (TEPs); start a dev@ thread, not a patch.
  • Hive query-engine changes that happen to surface in Tez — file on HIVE, not TEZ.
  • YARN-side fixes that Tez merely consumes — file on YARN, not TEZ.

Stage 7 teaches the attribution skill that keeps these in the right project.


What to read alongside this roadmap


What this roadmap is not

This roadmap is not a tutorial on Apache Tez itself. The deep dives in ../deep-dives/index.md cover the architecture; the labs in ../level-1/index.md onward cover the hands-on code reading. The roadmap assumes you can already build Tez from source, run the unit tests, and stand up a MiniTezCluster end-to-end. If you cannot, the prerequisite chapter is Level 1, Lab 1.1.

It is also not a generic Apache contribution guide. The Apache "How to Contribute" pages cover the cross-project mechanics (ICLA, JIRA account creation, mailing list etiquette). The roadmap assumes those are done.

Finally, it is not a roadmap for committership. Becoming a Tez committer is a separate path that the PMC manages. The roadmap teaches the skills that, applied consistently over time, make committership a reasonable outcome — but landing patches is necessary, not sufficient.

Reading order

If you read this book front-to-back, you will hit this chapter after the deep dives and before the capstone. That is the intended sequence:

  1. Read the deep dives to understand the architecture.
  2. Read this roadmap to understand the contribution ladder.
  3. Pick a rung and ship a patch.
  4. Come back to this roadmap when the patch lands, and step up a rung.
  5. After three or four rungs, attempt the capstone in ../capstone/index.md.

If you are jumping in mid-book, start at the rung that matches your current skill (see "Pick a stage honestly" above) and read the stage's companion deep dive at the same time.

A note on JQL

The JIRA queries in each stage are starting points. The Tez project's issue labelling has drifted over the years — labels like newbie and beginner are inconsistently applied. If a filter returns zero results, broaden it (remove a clause) before assuming the filter is wrong. Each stage gives at least one fallback grep-based candidate-finding method that does not depend on labels.

A second JQL tip: pin a "watched issues" filter for the components you care about. Tez has roughly a dozen components in JIRA; you do not need to watch all of them, but watching the two or three closest to your current rung is how you stay current on landed work.

A note on local clone hygiene

Every stage in this roadmap assumes you have a clean checkout at ~/tez-src. "Clean" means:

  • git status shows no untracked files outside .gitignore.
  • git branch shows you on master (or a topic branch you remember creating).
  • mvn clean install -DskipTests -Phadoop28 completes in under two minutes locally.

A messy checkout produces hard-to-reproduce results: a grep that catches your own WIP, a git bisect that visits commits whose builds were already broken by an unrelated local change, a mvn test that passes locally because of a stale ~/.m2 jar.

Refresh on Mondays:

cd ~/tez-src
git checkout master
git pull --ff-only
git clean -fdx
mvn -q clean install -DskipTests -Phadoop28

The git clean -fdx is aggressive: it removes everything not tracked by git, including ignored build output and IDE artifacts. Keep a backup of .idea/ (or equivalent) elsewhere if you customise it.

How the stages interlock

Each stage builds vocabulary the next stage uses without re-explaining:

  • Stage 1 teaches the patch artifact format. Every later stage assumes it.
  • Stage 2 teaches the LOG.isDebugEnabled() pattern. Stage 3 builds on it with the CONTEXT rule.
  • Stage 3 teaches you to navigate tez-dag. Stage 4 lives in tez-dag/...impl/.
  • Stage 4 teaches the state-machine DSL. Stage 5 reads the same DSL in the scheduler.
  • Stage 5 teaches MiniTezCluster. Stage 6 leans on it for every shuffle test.
  • Stage 6 teaches the runtime contracts. Stage 7 attributes bugs against those contracts to Hive.
  • Stage 8 teaches the YARN boundary. Stage 11 references it when discussing compat across Hadoop versions.
  • Stage 9 teaches deterministic testing. Stage 10 uses it as the baseline for benchmark stability.
  • Stage 10 teaches measurement. Stage 11 uses measurement as evidence for compat decisions.
  • Stage 11 teaches the audience/stability matrix. Stage 12 uses it when triaging blockers.

Skipping a stage means skipping a vocabulary. Reviewers will notice.

Now turn the page to Stage 1.

Stage 1 — Docs and Tests

What this stage teaches

Stage 1 is the on-ramp. The skills are deliberately non-technical:

  • Navigate the Apache JIRA workflow: claim a ticket, assign it to yourself, attach a patch, set "Patch Available", respond to review.
  • Run mvn apache-rat:check and mvn checkstyle:check cleanly.
  • Produce a git format-patch artifact that applies on master.
  • Wait for a Jenkins precommit run and read its output without panicking.

The contributions themselves are surgical: a docs typo, a missing @since tag, a @param javadoc that the linter complains about, a LOG.info whose message is misleading. Nothing in this stage will surprise a reviewer. That is the point: you are exercising the workflow so the next stages can be about code.

JIRA filter to find candidates

Real JQL you can paste into https://issues.apache.org/jira/issues:

project = TEZ
  AND labels in (newbie, beginner, "newbie-friendly", "low-hanging-fruit")
  AND resolution = Unresolved
  AND (component in (Documentation) OR summary ~ "typo" OR summary ~ "javadoc")
ORDER BY updated DESC

A second filter that often surfaces good Stage 1 work — javadoc that the build already flags:

project = TEZ AND status = Open AND text ~ "javadoc" AND text ~ "missing"

Open three candidates, read each comment thread end to end. Choose one that has no assignee, no patch attached, and was last updated more than three months ago. That is the abandoned-but-still-valid ticket: a perfect Stage 1.

If nothing fits, file your own. Walk the docs/src/site/markdown/ tree and grep for broken links, stale Hadoop version numbers, and configuration keys removed years ago:

cd ~/tez-src
grep -rn "tez\.am\.task\.max\.failed\.attempts" docs/src/site/markdown/
grep -rn "hadoop-2\.[0-6]" docs/src/site/markdown/
grep -rn "TODO\|FIXME\|XXX" docs/src/site/markdown/

A genuine doc bug found this way is fair game for your first JIRA.

Walked example — TezConfiguration javadoc missing @since

Symptom: a contributor reports on dev@ that TezConfiguration.TEZ_AM_RESOURCE_MEMORY_MB has no javadoc and no @since tag, so users cannot tell which release introduced the property.

Step 1 — Locate the symbol

cd ~/tez-src
grep -n "TEZ_AM_RESOURCE_MEMORY_MB" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head

Open the file. The relevant block looks roughly like:

@ConfigurationScope(Scope.AM)
public static final String TEZ_AM_RESOURCE_MEMORY_MB =
    TEZ_AM_PREFIX + "resource.memory.mb";
public static final int TEZ_AM_RESOURCE_MEMORY_MB_DEFAULT = 1024;

No javadoc, no @since. That is the bug.

Step 2 — Claim the JIRA

On https://issues.apache.org/jira/projects/TEZ:

  1. Click Create, set Project = TEZ, Issue Type = Improvement.
  2. Summary: Add @since tags and javadoc for TEZ_AM_RESOURCE_MEMORY_MB family.
  3. Component: tez-api. Affects Version: 0.10.3. Fix Version: leave blank — the release manager sets it.
  4. Description: state the symptom, paste the grep above, link the dev@ thread.
  5. Save, then click Assign to me.

Step 3 — Diff

--- a/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
+++ b/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
@@
+  /**
+   * Memory (in MB) requested for the AppMaster container. If the AM is launched
+   * by YARN, this is passed through to {@link
+   * org.apache.hadoop.yarn.api.records.Resource#setMemorySize(long)} on the
+   * {@code ApplicationSubmissionContext}.
+   *
+   * @since 0.5.0
+   */
   @ConfigurationScope(Scope.AM)
   public static final String TEZ_AM_RESOURCE_MEMORY_MB =
       TEZ_AM_PREFIX + "resource.memory.mb";
+  /** Default value of {@link #TEZ_AM_RESOURCE_MEMORY_MB}. @since 0.5.0 */
   public static final int TEZ_AM_RESOURCE_MEMORY_MB_DEFAULT = 1024;

Two rules for @since:

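  1. Look at the earliest commit that introduced the symbol, not the current version. Run git log --diff-filter=A -- tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java to find the commit that added the file, then git log -S "TEZ_AM_RESOURCE_MEMORY_MB" -- tez-api/... to list the commits that added or removed the string; the oldest entry is the one that introduced it. Cross-reference that commit hash against the release tags (git tag --contains <hash>); the earliest release tag listed is the @since value.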
  2. Never guess. If you cannot find the release, ask on dev@. A wrong @since is worse than no @since.

Step 4 — Build and lint

cd ~/tez-src
mvn -pl tez-api -am clean install -DskipTests -Phadoop28 -q
mvn -pl tez-api checkstyle:check -q
mvn -pl tez-api apache-rat:check -q
mvn -pl tez-api javadoc:javadoc -q 2>&1 | grep -i "error\|warning" | head

The javadoc target is the slowest gate in Tez. Run it. If it warns about an @link that no longer resolves, fix that in the same patch — reviewers will ask anyway.

Step 5 — Format and attach the patch

cd ~/tez-src
git add tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
git commit -m "TEZ-XXXX. Add @since tags for TEZ_AM_RESOURCE_MEMORY_MB family"
git format-patch -1 HEAD --stdout > /tmp/TEZ-XXXX.001.patch

The Tez convention is TEZ-XXXX.NNN.patch where NNN starts at 001 and increments on every reroll. Upload to the JIRA, click "Submit Patch" so the status flips to Patch Available. Jenkins precommit will pick it up within an hour and post results.

Step 6 — Respond to review

Almost certain reviewer requests for a docs patch:

  • "Add {@value} macros so the default appears inline."
  • "Wrap the line at 100 chars."
  • "Capitalise the first word of the javadoc sentence."

Reroll as 002, never overwrite the 001 file. Each reroll is an attachment in JIRA, not a force-push; reviewers compare attachments by name.
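The naming rule can be made mechanical. The following is a minimal sketch, not part of any Tez tooling: PatchNames is a hypothetical helper, and "TEZ-1234" is a made-up issue key. It assumes only the convention stated above, namely a three-digit counter that starts at 001 and increments on each reroll.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: computes the next TEZ-XXXX.NNN.patch attachment name. */
final class PatchNames {
  private PatchNames() {}

  /**
   * @param jiraKey  issue key such as "TEZ-1234" (illustrative)
   * @param attached names of attachments already on the issue
   * @return the file name to use for the next upload
   */
  static String next(String jiraKey, List<String> attached) {
    Pattern p = Pattern.compile(Pattern.quote(jiraKey) + "\\.(\\d{3})\\.patch");
    int highest = 0;
    for (String name : attached) {
      Matcher m = p.matcher(name);
      if (m.matches()) {
        // Only attachments that follow the convention count towards NNN.
        highest = Math.max(highest, Integer.parseInt(m.group(1)));
      }
    }
    return String.format("%s.%03d.patch", jiraKey, highest + 1);
  }

  public static void main(String[] args) {
    System.out.println(next("TEZ-1234", List.of()));
    System.out.println(next("TEZ-1234", List.of("TEZ-1234.001.patch", "notes.txt")));
  }
}
```

The first call yields the 001 name and the second yields 002; earlier attachments are never reused or overwritten.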

Pitfalls

  • Don't fix two bugs in one patch. A whitespace cleanup tacked onto a typo fix is the most common reason a Stage 1 patch sits unmerged for months.
  • Don't run mvn install without -DskipTests. The full test suite takes well over an hour. For a docs patch you need only the lint targets above.
  • Don't squash through git rebase -i master and then submit the output of git diff master. The Apache toolchain expects git format-patch -1 output, which carries the author, date, and commit message that a bare git diff omits.
  • Don't paste the diff into the JIRA description. Attach the .patch file.
  • Don't request a reviewer in the JIRA description. Use the Assignee field to assign to yourself and let committers self-select. CC on dev@ if it has been more than two weeks with no review.
  • Don't open a GitHub PR instead of a JIRA patch unless the project guide says so. As of 0.10.x, Tez accepts GitHub PRs but the JIRA is still the source of truth and must be referenced in the PR title.

Exit criteria — when you're ready for the next stage

You can move to Stage 2 when:

  • You have one merged docs or javadoc patch and one merged test-only patch (typically a missing @Test method or a broken assertion message in tez-tests/).
  • You have responded to at least one round of reviewer nits without needing the reviewer to walk you through git format-patch syntax.
  • A Jenkins precommit run on your patch no longer makes you nervous, and you can read the report and tell which warnings are pre-existing versus introduced by your change.
  • You can recite from memory: "JIRA first, branch from master, one logical change per patch, TEZ-XXXX.NNN.patch naming, attach not paste."

A second walked example — fixing a misleading log message

Symptom: a contributor sees a LOG.info in tez-dag that reads:

LOG.info("Vertex " + vertexName + " has " + numTasks + " tasks");

But it fires every time the vertex is re-initialised, not just on first initialisation. The message implies a one-shot event; operators have complained that they cannot grep the log to find unique vertices.

The diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
@@
-    LOG.info("Vertex " + vertexName + " has " + numTasks + " tasks");
+    initCount++;
+    LOG.info("Vertex {} (id={}) initialised with {} tasks (init count={})",
+        vertexName, vertexId, numTasks, initCount);

Three changes in one diff:

  1. The message uses slf4j placeholders.
  2. The vertex ID is added so operators can correlate with downstream ATS events.
  3. The init counter makes the "re-initialise" case visible.

This patch is technically a borderline Stage 3 candidate (it adds the vertex ID — see stage-3-error-messages.md). For a first patch, the JIRA description should explicitly say "I am only changing the log message; the init-count field is added but no transition behaviour changes." That framing keeps the patch in Stage 1 scope.

Test

A log-message change usually has no functional test. The reviewer signal is a manual run of a small OrderedWordCount against MiniTezCluster with the modified jar, and a grep of the resulting log to confirm the new format. Document the grep in the JIRA comments:

grep "initialised with" tez-am.log | head

When to file a follow-up

If, while working on a Stage 1 patch, you discover a bigger issue (suppose the javadoc is missing because the configuration key was silently renamed without an @since in either place), file a follow-up JIRA in the same component. Do not bundle the bigger fix into your Stage 1 patch.

Standard wording in your JIRA comments:

While working on TEZ-XXXX I noticed that TEZ_AM_RESOURCE_MEMORY_MB was
renamed from TEZ_AM_MEMORY_MB in 0.7.0 without an @deprecated on the
old key. Filed TEZ-YYYY to track the deprecation cleanup.

This habit — narrow Stage 1 patch + follow-up JIRA — is what reviewers mean when they say "keep patches focused." It is the skill the rest of the roadmap depends on.

Where Stage 1 patches go wrong

The two most common failure modes for a Stage 1 patch:

  1. Scope creep. The contributor "just fixes" three sibling issues while editing the file. Reviewers ask for a split. The contributor rerolls incompletely. Two months later the patch is abandoned.
  2. Silent rebase break. The contributor rebases on master, the patch no longer applies cleanly, but they never upload an 002 reroll. The committer sees a stale patch and moves on.

Neither failure is about code. Both are about workflow discipline. Stage 1 exists to drill that discipline before the stakes get higher.

Stage 2 will move you from documentation into code that runs in production AMs.

Stage 2 — Build and Logging Hygiene

What this stage teaches

Stage 2 teaches the smallest patches that touch running production code: build metadata (pom.xml), logging idioms, and dependency hygiene. You learn:

  • How Tez's dependency version bands work and which bumps are safe within a minor line.
  • The slf4j-api + log4j (or reload4j) logging stack as wired in tez-common, and the four idioms reviewers actively enforce.
  • How to remove deprecated Guava and Hadoop calls without breaking older Hadoop consumers in the supported compatibility band.
  • How to triage log-level mismatches: messages logged at INFO that should be DEBUG (and the reverse).

The patches are still small (5–80 lines) and the risk surface is small, but they go into the AM and the runtime tasks. A LOG.info in ShuffleManager that fires once per fetch will be seen by every operator running Hive-on-Tez.

JIRA filter to find candidates

project = TEZ
  AND resolution = Unresolved
  AND (summary ~ "logging" OR summary ~ "deprecated" OR summary ~ "guava"
       OR summary ~ "bump" OR summary ~ "upgrade dependency"
       OR summary ~ "System.out" OR description ~ "isDebugEnabled")
ORDER BY updated DESC

A second sweep for dependency bumps that the build flags:

project = TEZ AND component in (build) AND status = Open ORDER BY priority DESC

You can also generate candidates by inspecting the resolved dependency tree:

cd ~/tez-src
mvn -pl tez-common dependency:tree -DoutputType=text | grep -E "guava|jackson|netty"

Any line that shows Guava 12.x in transitive scope is a Stage 2 candidate, because Tez has been on Guava-shaded internals for years.

Walked example A — System.out.println in production code

Symptom: a grep finds three stray System.out.println calls in tez-runtime-library. They were left over from a debugging session and now show up in NodeManager stdout logs, polluting operator dashboards.

Step 1 — Find every offender

cd ~/tez-src
grep -rn "System\.out\.println\|System\.err\.println" \
  tez-runtime-library/src/main/java tez-runtime-internals/src/main/java tez-dag/src/main/java \
  | grep -v "/test/" | grep -v "examples"

Each hit is a separate JIRA candidate (one stage-2 patch per ticket). Pick one, file the JIRA, claim it.

Step 2 — The diff

Suppose the offender is in tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java:

--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java
@@
-    System.out.println("Spill " + numSpills + " starting, size=" + buffer.position());
+    LOG.debug("Spill {} starting, size={}", numSpills, buffer.position());

Three rules in one diff:

  1. Replace System.out with the class's existing slf4j LOG. If the file does not have one, add private static final Logger LOG = LoggerFactory.getLogger(...) at the top.
  2. Use slf4j {} placeholders, not string concatenation. The placeholder form avoids constructing the message string when the log level is filtered out.
  3. Wrap the call in LOG.isDebugEnabled() only when the argument list does non-trivial work (a toString() on a large object, a list copy, a .size() on a synchronized collection). Pure references (numbers, already-bound strings) do not need the guard.

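The third rule is the one reviewers nitpick most. The placeholder form already defers toString() until slf4j knows the level is enabled, so a guard around a plain LOG.debug("foo {}", x) where x is an int, or any reference you already hold, is unnecessary noise. But this:

LOG.debug("Pending {}", scheduledTasks.toString());   // explicit toString() is expensive and runs on every call

does benefit from a guard, because Java evaluates the argument expression before slf4j checks the level. Passing scheduledTasks itself would let the placeholder defer the call; the guard is for argument expressions that cannot be deferred, such as a list copy.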
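To make the evaluation order concrete, here is a minimal sketch that counts toString() calls with DEBUG disabled. GuardDemo and its debug method are stand-ins written for this example, not the slf4j API; they assume only the placeholder contract that arguments are formatted after the level check.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in logger with DEBUG disabled; illustrative only, not the slf4j API. */
final class GuardDemo {
  static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

  /** An object whose toString() we pretend is expensive. */
  static final class Pending {
    @Override public String toString() {
      TO_STRING_CALLS.incrementAndGet();
      return "Pending[...]";
    }
  }

  static final boolean DEBUG_ENABLED = false;

  static boolean isDebugEnabled() { return DEBUG_ENABLED; }

  /** Mimics the placeholder contract: the argument is formatted only if enabled. */
  static void debug(String format, Object arg) {
    if (!isDebugEnabled()) {
      return; // arg.toString() is never called on this path
    }
    System.out.println(format.replace("{}", String.valueOf(arg)));
  }

  /** Returns the toString() call count after each of the three call shapes. */
  static int[] run() {
    TO_STRING_CALLS.set(0);
    Pending pending = new Pending();

    debug("Pending {}", pending);              // reference only: work is deferred
    int afterReference = TO_STRING_CALLS.get();

    debug("Pending {}", pending.toString());   // caller does the work up front
    int afterEager = TO_STRING_CALLS.get();

    if (isDebugEnabled()) {                     // guard skips the work entirely
      debug("Pending {}", pending.toString());
    }
    int afterGuarded = TO_STRING_CALLS.get();

    return new int[] {afterReference, afterEager, afterGuarded};
  }

  public static void main(String[] args) {
    int[] counts = run();
    System.out.println(counts[0] + " " + counts[1] + " " + counts[2]); // prints "0 1 1"
  }
}
```

Only the unguarded call with an explicit toString() pays the cost while DEBUG is off, which is the case the guard exists for.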

Step 3 — Verify the build

mvn -pl tez-runtime-library -am clean install -DskipTests -Phadoop28 -q
mvn -pl tez-runtime-library checkstyle:check -q

There is no easy unit test for "no System.out left behind." The reviewer signal is a clean grep across the changed file plus a green checkstyle run.

Walked example B — pom.xml dep bump within the compat band

Symptom: jackson-databind 2.12.x has a known CVE; Tez is pinned to 2.12.6 in the parent POM. The compatibility band for the 0.10.x line allows bumps within the 2.12.* range.

Step 1 — Find the pin

cd ~/tez-src
grep -n "jackson-databind\|jackson.version\|jackson-core" pom.xml

Result, abbreviated:

pom.xml:178:    <jackson.version>2.12.6</jackson.version>

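Most jackson artifacts in Tez are governed by ${jackson.version} in the parent POM, so that is the only string you change. Before you do, confirm on Maven Central that every artifact governed by the property publishes the target version; four-part micro-patch releases are sometimes published for jackson-databind alone.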

Step 2 — The diff

--- a/pom.xml
+++ b/pom.xml
@@
-    <jackson.version>2.12.6</jackson.version>
+    <jackson.version>2.12.7.1</jackson.version>

That is the entire patch. The harder part is justifying it.

Step 3 — The JIRA description

Summary: Bump jackson-databind from 2.12.6 to 2.12.7.1

Description:
2.12.6 is affected by CVE-YYYY-NNNN. 2.12.7.1 is the latest patch on the 2.12
line and is API-compatible per the jackson maintainers' compat notes. We do not
bump to 2.13 / 2.14 here to keep Hive-on-Tez compatibility unchanged.

Verification:
  mvn clean install -DskipTests -Phadoop28
  mvn -pl tez-dag test -Dtest=TestDAGImpl
  mvn -pl tez-runtime-library test -Dtest=TestShuffleManager

Step 4 — Why "within the compat band" matters

If you bumped to 2.14, you would break Hive 3.x users who ship 2.13. A 2.12 → 2.12.7.1 bump is a one-line patch. A 2.12 → 2.14 bump is a six-month compatibility argument and lives in Stage 11. Stay on rung.

Walked example C — log-level mismatch

Symptom: a user reports their NodeManager logs are at 100GB/day. Investigation shows Fetcher is logging every single shuffle fetch at INFO:

LOG.info("Fetcher " + id + " connecting to " + host + ":" + port);

That message fires per attempt per source per fetch. For a 10k-task vertex it is catastrophic.

Diff

--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
@@
-    LOG.info("Fetcher " + id + " connecting to " + host + ":" + port);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Fetcher {} connecting to {}:{}", id, host, port);
+    }

Rules for INFO → DEBUG demotions:

  • The message fires more than once per task attempt → almost always DEBUG.
  • The message fires once per DAG lifecycle event (DAG start, vertex committed, task killed by user) → keep at INFO.
  • The message fires per exception → keep at WARN or ERROR per the existing level, never demote silently.
  • Never demote a log without dev@ confirmation if the message references a contract event (state transition, container release). Operators rely on those for postmortems.

The Fetcher example is uncontroversial; a LOG.info on every state transition in VertexImpl is not — that would be Stage 4.

Pitfalls

  • Don't introduce a logger dependency change in a logging patch. If the file imports org.apache.commons.logging.Log, do not migrate it to slf4j in this patch. That migration is a separate JIRA and a much larger surface area.
  • Don't use Throwable.printStackTrace() even in tests. Reviewers will flag it. Use LOG.error("msg", t) instead.
  • Don't bump a dep across a major version line in a Stage 2 patch. That is Stage 11.
  • Don't run mvn versions:use-latest-releases and submit the resulting diff. The bump must be justified per artifact with the CVE or the bug being fixed.
  • Don't remove deprecated Guava calls by adding new Guava calls. The Tez trajectory is off Guava in public code. Replace Preconditions.checkNotNull with Objects.requireNonNull (JDK 7+) — not with a different Guava class.
  • Don't add a LOG.debug guard around a string literal. LOG.debug("hello") needs no guard.
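The Guava pitfall above is easiest to see side by side. This is a small sketch under stated assumptions: SpillConfig is a hypothetical class invented for the example, and the Guava call appears only in a comment so the block compiles with the JDK alone.

```java
import java.util.Objects;

/** Hypothetical class, used only to show the JDK replacement for a Guava precondition. */
final class SpillConfig {
  private final String outputDir;

  SpillConfig(String outputDir) {
    // Before: this.outputDir = Preconditions.checkNotNull(outputDir, "outputDir");
    // After: same NullPointerException and message, no Guava import.
    this.outputDir = Objects.requireNonNull(outputDir, "outputDir");
  }

  String outputDir() {
    return outputDir;
  }

  public static void main(String[] args) {
    System.out.println(new SpillConfig("/tmp/spill").outputDir());
    try {
      new SpillConfig(null);
    } catch (NullPointerException e) {
      System.out.println("NPE: " + e.getMessage());
    }
  }
}
```

Both forms return the checked reference, so the swap is usually a one-line change per call site plus the removed import.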

Exit criteria — when you're ready for the next stage

Move on when:

  • You have shipped at least two logging-cleanup patches and one pom dep bump.
  • You can explain, without looking it up, when to add LOG.isDebugEnabled() and when not to.
  • You have read tez-common/src/main/java/org/apache/tez/common/CallableWithNdc.java and understand the NDC pattern used to attach DAG/Vertex IDs to log messages — that knowledge is the bridge to Stage 3.
  • Your last patch was reviewed and merged without a "split this into two JIRAs" comment.

Stage 3 layers on top: you keep the same surgical patch style, but now you make the content of error messages tell the operator which DAG and which vertex.

Appendix — finding logging hygiene candidates yourself

The JIRA filter at the top of this stage may return zero results during quiet periods. When that happens, you can manufacture candidates yourself with three grep patterns that have a high signal-to-noise ratio.

Pattern A — unguarded string concatenation in LOG.debug

cd ~/tez-src
grep -rn 'LOG\.debug(.*+' --include="*.java" tez-dag tez-runtime-internals \
    tez-runtime-library | grep -v isDebugEnabled | head -30

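This finds calls of the form LOG.debug("got " + counter) that allocate the concatenated string unconditionally. The grep -v only drops guards that sit on the same line, so check the preceding line by hand. Pick one, convert it to {} placeholders (adding an if (LOG.isDebugEnabled()) guard only if an argument is expensive to compute), and attach the patch to a JIRA.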

Pattern B — LOG.info calls with high call-site frequency

cd ~/tez-src
grep -rn 'LOG\.info' --include="*.java" tez-runtime-library | wc -l
grep -rn 'LOG\.info' --include="*.java" tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle | head

The shuffle path runs per fetch, so any LOG.info there can fire hundreds of thousands of times per DAG. Most are candidates for demotion to DEBUG.

Pattern C — pom files referencing pinned old versions

cd ~/tez-src
grep -rn "<version>" --include="pom.xml" | grep -E "jackson|commons-|guava|netty" \
  | grep -v -- "-test" | head -20

Cross-reference against the latest patch release on the package's GitHub releases page. If your match is two patch versions behind and the changelog mentions a security fix, you have a Stage 2 candidate.

The bar for these "self-found" candidates is the same: file a JIRA before coding, attach a 001 patch, wait for review.

Stage 3 — Error Messages and Exception Context

What this stage teaches

Stage 3 is the first stage where you change behaviour visible to operators in a production postmortem. You learn:

  • The CONTEXT rule for tez-dag: every error raised, logged, or rethrown inside the AppMaster must include the DAG ID, and the vertex/task/attempt ID wherever the call site has them in scope.
  • How to chain causes correctly: throw new TezException(msg, cause) instead of throw new TezException(msg) then cause.printStackTrace().
  • How to find exception sites that swallow the original cause: a catch (Exception e) followed by throw new RuntimeException("init failed") is the canonical bug.
  • How NDC (Nested Diagnostic Context, configured in tez-common) propagates IDs into log messages automatically — and how to add explicit IDs where NDC is not set up.

These patches are 20–200 lines, often single-method changes that touch error paths. The reviewer test is brutal but fair: "If this exception fires in a production AM log, can the on-call engineer identify the DAG, vertex, and task without cross-referencing any other log file?" If the answer is "no," the patch is not done.

JIRA filter to find candidates

project = TEZ
  AND resolution = Unresolved
  AND (text ~ "uninformative error" OR text ~ "missing context"
       OR text ~ "swallowed exception" OR text ~ "no DAG id"
       OR text ~ "improve error message"
       OR (description ~ "InvalidStateTransitonException" AND text ~ "stack trace"))
ORDER BY updated DESC

A second sweep — find your own candidates by grep:

cd ~/tez-src
# error sites that build a message without an ID
grep -rn 'throw new .*Exception(".*failed' tez-dag/src/main/java \
  | grep -v "ID\|Id\|getName" | head -30

# catch sites that drop the cause
grep -rn "catch (.*Exception .*)" tez-dag/src/main/java -A 2 \
  | grep -B1 "throw new" | grep -v ", e)" | head -30

The second grep is fuzzy; you will get false positives. But every true positive is a Stage 3 patch.

The CONTEXT rule for tez-dag

Every error inside the AppMaster must include enough state to identify which DAG instance on which AM on which application attempt threw it. The minimum fields, listed in priority order:

  1. The DAG ID (TezDAGID).
  2. The Vertex ID (TezVertexID) — required if the error is in a vertex context.
  3. The Task ID (TezTaskID) — required if in a task context.
  4. The Task Attempt ID (TezTaskAttemptID) — required if in an attempt context.
  5. The container ID — required for container-management errors.

Each of these IDs is a stable string (toString() returns the canonical form). They are present on every relevant impl object in tez-dag:

grep -n "getDagId\|getVertexId\|getTaskId\|getTaskAttemptId" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

If you are editing a method on VertexImpl, you have getVertexId() and getDagId() in scope. If you do not include them in the error, the patch is incomplete.

Walked example A — uninformative TezException in VertexImpl.maybeSendConfiguredEvent

Symptom: a user reports their DAG fails with:

2026-04-12 10:14:21,003 ERROR [Dispatcher thread] org.apache.tez.dag.app.dag.impl.VertexImpl:
  Vertex init failed
org.apache.tez.dag.api.TezException: init failed
    at org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:NNNN)

That error tells the operator nothing. No DAG ID, no vertex name, no cause.

Step 1 — Find the throw site

cd ~/tez-src
grep -n 'throw new TezException("init failed' \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

Read 20 lines of context around the hit. The method has vertexId, getDagId(), and getName() all in scope.

Step 2 — The diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
@@
-    } catch (AMUserCodeException e) {
-      throw new TezException("init failed");
-    }
+    } catch (AMUserCodeException e) {
+      String msg = String.format(
+          "Vertex %s (%s) of DAG %s failed during configured-event dispatch: %s",
+          getName(), vertexId, getDagId(), e.getMessage());
+      LOG.error(msg, e);
+      throw new TezException(msg, e);
+    }

What changed:

  1. The message now identifies the vertex name (human-readable), the vertex ID (machine-stable), and the DAG ID.
  2. The original exception is chained via the two-argument TezException constructor. The full stack trace survives.
  3. The error is also logged at ERROR with the cause. Belt and braces — some callers swallow the exception silently, and the log line is the only record that survives.
  4. String.format is used so the placeholders are visually aligned with the field names. Reviewers prefer it over +-concatenation when the message has more than three substitutions.

Step 3 — Regression test

Add to tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java:

@Test(timeout = 5000)
public void testInitFailureMessageIncludesIds() throws Exception {
  VertexImpl v = createVertexThatFailsInConfigured(); // existing helper pattern
  try {
    v.maybeSendConfiguredEvent();
    fail("expected TezException");
  } catch (TezException e) {
    assertTrue("message should contain vertex id",
        e.getMessage().contains(v.getVertexId().toString()));
    assertTrue("message should contain dag id",
        e.getMessage().contains(v.getDagId().toString()));
    assertNotNull("cause should be preserved", e.getCause());
  }
}

The test asserts on substring presence, not exact string equality. Reviewers reject exact-string assertions because they break the next time the message is rephrased.

Step 4 — Run targeted tests

cd ~/tez-src
mvn -pl tez-dag test -Dtest=TestVertexImpl -q 2>&1 | tail -40

The full TestVertexImpl suite takes 3–5 minutes on a laptop. Run it. A state-machine-adjacent change always risks breaking a sibling transition.

Walked example B — swallowed cause in DAGAppMaster.startDAG

Find the bug:

cd ~/tez-src
grep -rn "catch (.*Exception" tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java \
  -A 3 | grep -B1 "throw new" | head -20

Suppose the offender looks like:

try {
  initServices();
} catch (Exception e) {
  throw new TezUncheckedException("Failed to start AM");
}

The diff:

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
@@
     try {
       initServices();
     } catch (Exception e) {
-      throw new TezUncheckedException("Failed to start AM");
+      throw new TezUncheckedException(
+          "Failed to start AM for application " + appAttemptID + ": "
+              + e.getMessage(), e);
     }

Two fixes at once: the cause is preserved (the second constructor argument), and the message now includes the appAttemptID which the surrounding DAGAppMaster has in scope. This patch is small but high-leverage: the AM startup path is the single most common place a swallowed cause hides a real configuration bug.
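The difference between the two shapes is visible from the catching side. Below is a self-contained sketch: AmStartException stands in for a Tez exception type, initServices is a fake that always fails, and the error text and attempt ID are made up for illustration.

```java
/** Self-contained illustration; AmStartException stands in for a Tez exception type. */
final class CauseChainDemo {
  static final class AmStartException extends RuntimeException {
    AmStartException(String message) { super(message); }
    AmStartException(String message, Throwable cause) { super(message, cause); }
  }

  /** Pretend service init that fails with a configuration error (made-up text). */
  static void initServices() {
    throw new IllegalStateException("staging directory is not writable");
  }

  /** The buggy shape: the root cause is dropped. */
  static void startSwallowing() {
    try {
      initServices();
    } catch (Exception e) {
      throw new AmStartException("Failed to start AM");
    }
  }

  /** The fixed shape: context in the message, cause chained. */
  static void startChaining(String appAttemptId) {
    try {
      initServices();
    } catch (Exception e) {
      throw new AmStartException(
          "Failed to start AM for application " + appAttemptId + ": " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    try {
      startSwallowing();
    } catch (AmStartException e) {
      System.out.println("swallowed: cause=" + e.getCause());
    }
    try {
      startChaining("attempt-1"); // illustrative ID, not a real attempt ID format
    } catch (AmStartException e) {
      System.out.println("chained: cause=" + e.getCause().getMessage());
    }
  }
}
```

In the swallowing shape getCause() is null, so the operator never sees the configuration error; in the chained shape it survives alongside the attempt ID.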

Walked example C — log-only context via NDC

Some hot paths cannot afford a String.format per call. The Tez convention there is NDC. Look in tez-common/src/main/java/org/apache/tez/common/CallableWithNdc.java:

cat $(find ~/tez-src/tez-common/src/main/java -name "CallableWithNdc.java")

When the dispatcher invokes a vertex transition callback, it pushes the vertex ID onto the NDC stack. log4j's %x conversion pattern (lowercase; %X{...} prints the MDC instead) then includes the ID in every log line for the duration of the call. If you discover a log message in VertexImpl that lacks the vertex ID, first check whether NDC already provides it via the log pattern. If yes, the message is fine; if no, add the ID inline. A patch that adds an explicit ID where NDC already prints it will be rejected in review.
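The push, run, pop shape behind a diagnostic context is small enough to sketch. MiniNdc below is a toy written for this example: it is neither log4j's NDC nor Tez's CallableWithNdc, it takes a Supplier rather than a Callable to avoid checked exceptions, and the vertex ID string is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Supplier;

/** Toy per-thread diagnostic-context stack; a stand-in for NDC, not the real class. */
final class MiniNdc {
  private static final ThreadLocal<Deque<String>> STACK =
      ThreadLocal.withInitial(ArrayDeque::new);

  private MiniNdc() {}

  /** Renders the stack outermost-first, as a layout's NDC conversion would. */
  static String render() {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = STACK.get().descendingIterator();
    while (it.hasNext()) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(it.next());
    }
    return sb.toString();
  }

  /** Runs the body with the context pushed, and always pops it afterwards. */
  static <T> T withContext(String context, Supplier<T> body) {
    STACK.get().push(context);
    try {
      return body.get();
    } finally {
      STACK.get().pop();
    }
  }

  /** Formats a log line the way a pattern that includes the context would. */
  static String logLine(String message) {
    return "[" + render() + "] " + message;
  }

  public static void main(String[] args) {
    String inside = withContext("vertex_0001_1_00", () -> logLine("transition started"));
    System.out.println(inside);                          // context is present
    System.out.println(logLine("outside the callback")); // context is gone again
  }
}
```

The message text never mentions the ID, yet the line logged inside the callback carries it. That is why the check against the log pattern comes before adding an inline ID.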

Pitfalls

  • Don't include e.getStackTrace() in your message. The stack trace is what LOG.error(msg, e) is for. Concatenating it into the message turns a one-line log into a 60-line one.
  • Don't use e.toString() in messages. Use e.getMessage() so the message stays single-line; the stack trace lives in the chained throwable.
  • Don't catch Throwable to add context. Catching Throwable swallows OutOfMemoryError and ThreadDeath. Catch Exception (or the narrowest superclass that fits).
  • Don't add context that requires a lock. A getName() call that internally takes the vertex write-lock is a deadlock waiting to happen if the error path itself holds the lock. Always check the lock semantics of the getter you call in an error path.
  • Don't change the exception type to add context. throw new TezException is still a TezException after your patch; changing it to TezUncheckedException is a behaviour change and not allowed in Stage 3.
  • Don't add context that includes user data without redaction. If your error message includes a configuration value, check whether it could contain credentials. The Tez convention is to print the key, not the value, when the key matches .*\.(password|secret|token|credential).
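The redaction rule in the last pitfall can be sketched as a small helper. ConfRedactor is hypothetical and not a Tez class; it reuses the key pattern quoted above, and the "<redacted>" marker and the second key in main are illustrative choices.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: formats a config entry for an error message, hiding secrets. */
final class ConfRedactor {
  // Key pattern quoted from the text; a matching key has its value withheld.
  private static final Pattern SENSITIVE =
      Pattern.compile(".*\\.(password|secret|token|credential)");

  private ConfRedactor() {}

  static String describe(String key, String value) {
    if (SENSITIVE.matcher(key).matches()) {
      return key + "=<redacted>"; // print the key, never the value
    }
    return key + "=" + value;
  }

  public static void main(String[] args) {
    System.out.println(describe("tez.am.resource.memory.mb", "1024"));
    System.out.println(describe("example.keystore.password", "hunter2")); // made-up key
  }
}
```

Routing every configuration value through one such helper keeps the redaction decision out of individual error sites.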

Exit criteria — when you're ready for the next stage

Move to Stage 4 when:

  • You have shipped at least one error-context patch in tez-dag and one in tez-runtime-library that includes the DAG and vertex/task IDs.
  • A reviewer has accepted your test pattern (substring assertion, no exact-string match) without a comment.
  • You can find at least three more candidate error sites in five minutes of grepping without referring to this chapter.
  • You have read VertexImpl.maybeSendConfiguredEvent and the surrounding 200 lines without feeling lost — that file is the gateway to Stage 4.

Stage 4 will take you inside the state machines themselves.

Stage 4 — State Machine Transitions

What this stage teaches

Stage 4 is the first stage that requires you to understand the Tez AppMaster, not just navigate it. You learn:

  • The StateMachineFactory DSL used in Hadoop / Tez to declare finite state machines. The two canonical instances are VertexImpl.stateMachineFactory and TaskImpl.stateMachineFactory.
  • The InvalidStateTransitonException (note the historical typo — "Transiton", not "Transition" — preserved for API compatibility) that the state machine throws when an event arrives in a state with no registered transition.
  • How to add a transition with the right guard, without widening the surface area of the state machine accidentally.
  • The hard rule: never widen a transition without a dev@ design discussion. Adding a transition from RUNNING to KILLED on a new event class is a semantic change that may cascade to ATS, the client, and the speculator.
  • The TestVertexImpl and TestTaskImpl patterns for asserting that an event in a state produces an expected next state.

Patches are typically 30–250 lines: a transition table entry, a small guard helper, a fired event, and a deterministic regression test.

Reading order before you touch any code

  1. tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java — read the static stateMachineFactory block end to end. It is several hundred lines of .addTransition(...) calls. Diagram it on paper.
  2. tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java — same exercise for tasks.
  3. tez-common/src/main/java/org/apache/tez/state/StateMachineTez.java — the wrapper Tez puts around the Hadoop state machine.
  4. The deep dives state-machines and vertex-lifecycle. Do not skip these.

Then, and only then, file a JIRA.

JIRA filter to find candidates

The most fruitful filter:

project = TEZ
  AND resolution = Unresolved
  AND (text ~ "InvalidStateTransitonException" OR text ~ "Invalid event"
       OR text ~ "missing transition" OR description ~ "stateMachineFactory")
ORDER BY updated DESC

A second filter for postmortem-style tickets:

project = TEZ AND status = Open AND component in ("tez-dag")
  AND priority in (Major, Critical) AND (text ~ "VertexState" OR text ~ "TaskState")

Most real Stage 4 work comes from operator reports of an AM that crashed with InvalidStateTransitonException: Invalid event: X at Z. That stack trace is the smoking gun: state Z received event X and had no registered transition for it. The fix is one of:

  1. Add the transition with a guard (most common).
  2. Suppress the event in that state because it is a benign late delivery (use addTransition(state, state, event) — a self-loop).
  3. Fix the sender not to emit the event in that state (sometimes the bug is upstream).

Choosing wrong is the most common Stage 4 mistake. Pick option 3 only if you can prove the event should never have been emitted.

Walked example — unhandled duplicate V_INIT in VertexState.INITED

Symptom: an operator reports a recurring AM crash:

InvalidStateTransitonException: Invalid event: V_INIT at INITED
  at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(...)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:NNNN)

V_INIT is the event that moves a vertex out of NEW, so seeing it rejected is suspicious. Investigation reveals that a recently added early-error path emits V_INIT from a different thread before the main scheduler does. The first V_INIT is handled normally; the second arrives after the vertex has already left NEW.

Step 1 — Read the existing transitions

cd ~/tez-src
grep -n "addTransition(VertexState.NEW" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -20

You will see something like (simplified):

.addTransition(VertexState.NEW, VertexState.INITED,
    VertexEventType.V_INIT, new InitTransition())
.addTransition(VertexState.NEW, VertexState.FAILED,
    VertexEventType.V_TERMINATE, new TerminateNewVertexTransition())

V_INIT on NEW is registered, so the first V_INIT was handled and moved the vertex to INITED. The crash therefore comes from the second V_INIT, which arrived while the vertex was in INITED. Re-grep:

grep -n "addTransition(VertexState.INITED" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | grep "V_INIT"

No hit. That is the bug: V_INIT arriving in INITED is unhandled.

Step 2 — Decide: add, ignore, or fix upstream

V_INIT in INITED is a duplicate event. It is benign (the vertex is already initialised; the second message is redundant). The correct fix is to ignore the duplicate — a self-loop. This is the safe, narrow change.

We are not widening behaviour. We are saying: "in INITED, a redundant V_INIT is a no-op, not a crash."
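The difference a self-loop makes can be shown on a toy table-driven state machine. Everything here is illustrative: ToyVertex and its enums are not Tez types, and IllegalStateException stands in for InvalidStateTransitonException.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy state machine for illustration; not the Hadoop StateMachineFactory.
final class ToyVertex {
  enum State { NEW, INITED, RUNNING }
  enum Event { V_INIT, V_START }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State state = State.NEW;

  ToyVertex addTransition(State pre, State post, Event event) {
    table.computeIfAbsent(pre, s -> new EnumMap<>(Event.class)).put(event, post);
    return this;
  }

  /** Applies the event, or throws if the current state has no entry for it. */
  void handle(Event event) {
    Map<Event, State> row = table.get(state);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }
    state = next;
  }

  State getState() {
    return state;
  }

  static ToyVertex withoutSelfLoop() {
    return new ToyVertex()
        .addTransition(State.NEW, State.INITED, Event.V_INIT)
        .addTransition(State.INITED, State.RUNNING, Event.V_START);
  }

  static ToyVertex withSelfLoop() {
    // Same table plus INITED -> INITED on V_INIT: the duplicate becomes a no-op.
    return withoutSelfLoop()
        .addTransition(State.INITED, State.INITED, Event.V_INIT);
  }
}
```

With withoutSelfLoop(), a second V_INIT throws; with withSelfLoop(), it leaves the state at INITED. No other row of the table changes, which is why this counts as a narrow fix.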

Step 3 — The diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
@@
        .addTransition(VertexState.INITED, VertexState.RUNNING,
            VertexEventType.V_START, new StartTransition())
+
+       // A duplicate V_INIT can arrive when an early error path fires V_INIT
+       // concurrently with the scheduler. The vertex is already initialised;
+       // ignore the duplicate rather than crashing the AM. See TEZ-XXXX.
+       .addTransition(VertexState.INITED, VertexState.INITED,
+           VertexEventType.V_INIT, VERTEX_STATE_CHANGED_CALLBACK_NOOP)

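Here VERTEX_STATE_CHANGED_CALLBACK_NOOP stands for a SingleArcTransition that does nothing (a self-loop has a single target state, so MultipleArcTransition does not apply). It can be a shared constant or, more idiomatically, an instance of a small inner class, in which case the diff passes new IgnoreEventTransition() instead: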

private static class IgnoreEventTransition
    implements SingleArcTransition<VertexImpl, VertexEvent> {
  @Override
  public void transition(VertexImpl vertex, VertexEvent event) {
    LOG.debug("Ignoring duplicate {} on vertex {} in state {}",
        event.getType(), vertex.getVertexId(), vertex.getState());
  }
}

Two rules in this diff:

  1. The transition has a comment with the JIRA ID explaining why the self-loop exists. State-machine entries without comments are hard to remove safely two years later.
  2. The transition logs at DEBUG, not INFO. If the duplicate event is actually a symptom of a larger bug upstream, the debug log is what tells the operator.

Step 4 — Regression test in TestVertexImpl

@Test(timeout = 10000)
public void testDuplicateVInitInInitedIsNoOp() throws Exception {
  initAllVertices(VertexState.INITED);                  // existing helper
  VertexImpl v = vertices.get("vertex1");
  assertEquals(VertexState.INITED, v.getState());

  // Fire a second V_INIT — must not throw, must not change state
  v.handle(new VertexEvent(v.getVertexId(), VertexEventType.V_INIT));
  dispatcher.await();

  assertEquals("duplicate V_INIT should leave INITED unchanged",
      VertexState.INITED, v.getState());
}

The test pattern:

  • Use the existing initAllVertices(VertexState.INITED) helper. Do not invent your own bootstrap.
  • Always call dispatcher.await() after v.handle(...). TestVertexImpl uses DrainDispatcher, which is the only way to make event-driven tests deterministic.
  • Assert the post state. Never assert on internal counters unless the transition is supposed to change them.

Run it:

cd ~/tez-src
mvn -pl tez-dag test -Dtest=TestVertexImpl#testDuplicateVInitInInitedIsNoOp -q 2>&1 | tail -30

Then run the whole TestVertexImpl suite. A single transition addition has broken a sibling test more than once in Tez history.
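Why an await() call makes an event-driven test deterministic can be sketched with a toy dispatcher. ToyDrainDispatcher is a hypothetical class, not Tez's DrainDispatcher, and it assumes handlers do not enqueue further events.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Illustrative only: a single-threaded dispatcher whose await() returns once
// every event dispatched before the call has been handled.
final class ToyDrainDispatcher<E> implements AutoCloseable {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final Consumer<E> handler;

  ToyDrainDispatcher(Consumer<E> handler) {
    this.handler = handler;
  }

  void dispatch(E event) {
    worker.execute(() -> handler.accept(event));
  }

  /** Blocks until all previously dispatched events have been processed. */
  void await() {
    try {
      // One worker thread and a FIFO queue: this marker task runs last.
      worker.submit(() -> { }).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  @Override
  public void close() {
    worker.shutdownNow();
  }
}
```

An assertion placed after await() can no longer race with the handler thread, which is the property a sleep-based test only approximates.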

Step 5 — dev@ notification

Before you post the patch:

Subject: [DISCUSS] TEZ-XXXX — add INITED -> INITED self-loop for V_INIT

I have a repro for a recurring AM crash where V_INIT arrives twice. The state
machine currently has no INITED+V_INIT entry. Proposed fix: self-loop with a
debug log. Sender side (early-error path) is left unchanged on the grounds
that defensive handling in the state machine is cheaper than chasing every
sender. Would appreciate a sanity check before I post the patch.

If a committer replies "actually the sender is the bug, fix that instead," revise your approach. If the thread is silent for 48 hours, post the patch.

The "never widen without dev@" rule

What counts as widening:

  • Adding a transition from a non-terminal state to a terminal state on a new event. Example: RUNNING -> KILLED on V_USER_REQUEST_FORCE_KILL.
  • Adding a transition that changes a previously-rejected event into an accepted one with side effects (counters updated, downstream events emitted).
  • Removing a transition.

What is not widening:

  • Adding a self-loop that ignores a duplicate event (as above).
  • Adding a transition that converts an InvalidStateTransitonException into a controlled ERROR transition, when the event was clearly a fatal-bug signal.

The dev@ rule exists because state machines are observed externally: the AM emits state-changed events to ATS, the client poll loop watches them, the speculator reads them. Adding a transition is an API change for those observers, even if no Java type signature changes.

Pitfalls

  • Don't add transitions to fix symptoms. If you see InvalidStateTransitonException and the cause is "the sender shouldn't have emitted that event," fix the sender. Adding a transition to silence the exception hides the real bug.
  • Don't forget the regression test. Every transition patch must have a test that fires the event in the state and asserts the result. Tests using DrainDispatcher are the only ones reviewers accept.
  • Don't use Mockito.spy on VertexImpl. The state machine has private internal state that spies cannot reach reliably. Use the production class with the test helpers in TestVertexImpl and MockDAGAppMaster.
  • Don't change the transition() callback signature. Existing transitions use SingleArcTransition or MultipleArcTransition. Pick the matching one; do not introduce a new interface.
  • Don't ignore the typo. InvalidStateTransitonException (spelled "Transiton", without the second "i" of "Transition") is the canonical name in Hadoop. If you "fix" the typo in Tez code, you break binary compatibility with downstream callers that catch the exception by name.
  • Don't bundle a transition fix with an unrelated cleanup. Reviewers will ask you to split.

Exit criteria — when you're ready for the next stage

Move to Stage 5 when:

  • You have shipped one transition fix in VertexImpl or TaskImpl with a passing regression test in the corresponding Test* class.
  • You can draw the VertexImpl state diagram from memory (every VertexState value, the main transitions, the terminal set).
  • You have read TaskAttemptImpl.stateMachineFactory in full and recognise the similarities and differences to VertexImpl.
  • A committer has reviewed your transition patch and accepted the addition without asking for a dev@ design thread — meaning your choice of "ignore vs add vs fix sender" was correct.

Stage 5 takes you out of the AM event loop and into the scheduler.

Stage 5 — Scheduler Bugs

What this stage teaches

Stage 5 takes you out of the per-vertex event loop and into the AM-wide scheduling layer. You learn:

  • The split between TaskSchedulerManager (the multi-scheduler dispatch shim) and the concrete YarnTaskSchedulerService (the AMRMClient-backed scheduler used in production), plus the alternative LocalTaskSchedulerService used by local mode and tests.
  • How container requests, allocations, and releases flow through AMRMClient, including the heldContainer lifecycle and the canonical leak: a held container that is never returned to YARN after an onError callback fires.
  • Locality miscounts: the bookkeeping mistake where a node-local allocation is charged as rack-local in getAvailableContainers, distorting the affinity signal sent back to the AMRM protocol.
  • Priority inversion: a high-priority request stuck behind a low-priority pending list because the request was added to the wrong queue.
  • Container behaviour across AM failover: when the AM restarts with tez.am.am-rm.heartbeat.interval-ms retries, what should and should not be re-claimed.
  • How to write a MiniTezCluster-backed integration test, and when the cheaper AMRMClient stub pattern is sufficient.

Patches are 50–500 lines, often with a non-trivial test that needs MiniTezCluster or MiniYARNCluster. Reviewers are strict: a scheduler patch without a deterministic test is rejected on sight.

Reading order

  1. tez-dag/src/main/java/org/apache/tez/dag/app/rm/TaskSchedulerManager.java
  2. tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java
  3. tez-dag/src/main/java/org/apache/tez/dag/app/rm/container/AMContainerImpl.java
  4. tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestTaskSchedulerManager.java
  5. The deep dive scheduler.

Check how large these files are before you start:

cd ~/tez-src
wc -l tez-dag/src/main/java/org/apache/tez/dag/app/rm/*.java

If YarnTaskSchedulerService.java is over 2000 lines, that is expected.

JIRA filter to find candidates

project = TEZ
  AND component in ("tez-dag")
  AND resolution = Unresolved
  AND (text ~ "container leak" OR text ~ "scheduler" OR text ~ "locality"
       OR text ~ "priority" OR text ~ "AMRMClient" OR text ~ "heldContainer"
       OR description ~ "onError")
ORDER BY priority DESC, updated DESC

A second filter for AM-failover-related candidates:

project = TEZ AND resolution = Unresolved AND (text ~ "failover" OR text ~ "AM restart")
  AND component in ("tez-dag")

Walked example A — heldContainer never released after onError

Symptom: an operator reports their long-running session AM holds onto containers indefinitely after a transient RM disconnect. yarn application -status shows allocated containers far above what the running DAG should need.

Step 1 — Locate the leak path

cd ~/tez-src
grep -n "onError\|heldContainer\|releaseContainer" \
  tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java | head -30

You find a class field:

private final Map<ContainerId, HeldContainer> heldContainers = new HashMap<>();

and an onError(Throwable t) callback (declared by the AMRMClientAsync.CallbackHandler interface):

@Override
public void onError(Throwable t) {
  LOG.error("AMRMClient error", t);
  appContext.getEventHandler().handle(
      new DAGAppMasterEventSchedulingServiceError(t));
}

The bug: heldContainers is populated by onContainersAllocated but never drained in onError. When the AM recovers and the RM reissues the same container IDs, the map already has stale entries, and the new allocations are silently dropped (the bookkeeping path checks heldContainers.containsKey(id)). The containers are effectively leaked.

Step 2 — Diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java
@@
   @Override
   public void onError(Throwable t) {
     LOG.error("AMRMClient error", t);
+    // Before we tear down, release any containers we still hold. If we don't,
+    // a recovering RM will re-issue the same ContainerIds and the dedup
+    // bookkeeping below will silently drop the new allocations. See TEZ-XXXX.
+    synchronized (heldContainers) {
+      for (HeldContainer hc : heldContainers.values()) {
+        try {
+          amRmClient.releaseAssignedContainer(hc.getContainer().getId());
+        } catch (Exception releaseErr) {
+          LOG.warn("Failed to release {} during onError cleanup: {}",
+              hc.getContainer().getId(), releaseErr.getMessage());
+        }
+      }
+      heldContainers.clear();
+    }
     appContext.getEventHandler().handle(
         new DAGAppMasterEventSchedulingServiceError(t));
   }

Rules in this diff:

  1. The cleanup runs before the event is dispatched. Once the event fires, the AM may shut down handlers, and any release call would race.
  2. The cleanup is synchronized on the same monitor that other writers to heldContainers use. Find that monitor first; if there is none, you have a second bug to file separately. Do not introduce a new lock in this patch.
  3. Each release is wrapped individually. One failure must not prevent the others from being released.
  4. Logged failures are WARN, not ERROR. The AM is already in an error path; doubling the severity drowns the originating cause.
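Rules 2 and 3 can be exercised in isolation with a small bookkeeping class. HeldContainerBook is a hypothetical name, container IDs are plain strings, and the Consumer stands in for the RM client's release call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative bookkeeping; not the YarnTaskSchedulerService implementation.
final class HeldContainerBook {
  private final Map<String, String> held = new HashMap<>(); // container ID -> host
  private final Consumer<String> releaser;                  // stand-in for the RM client

  HeldContainerBook(Consumer<String> releaser) {
    this.releaser = releaser;
  }

  void hold(String id, String host) {
    synchronized (held) {
      held.put(id, host);
    }
  }

  /** Releases every held container and returns the IDs whose release failed. */
  List<String> releaseAll() {
    List<String> failed = new ArrayList<>();
    synchronized (held) { // same monitor as every other writer
      for (String id : held.keySet()) {
        try {
          releaser.accept(id);
        } catch (RuntimeException e) {
          failed.add(id); // keep going: one failure must not block the rest
        }
      }
      held.clear();
    }
    return failed;
  }

  int size() {
    synchronized (held) {
      return held.size();
    }
  }
}
```

The sketch calls the releaser while holding the monitor, mirroring the diff. Whether that is acceptable in the real scheduler depends on what the client call can block on, so check that before copying the shape.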

Step 3 — Test with AMRMClient stub

A full MiniTezCluster test for this is overkill. Stub the client:

@Test(timeout = 10000)
public void testOnErrorReleasesHeldContainers() throws Exception {
  AMRMClientAsync<CookieContainerRequest> mockRm =
      mock(AMRMClientAsync.class);
  YarnTaskSchedulerService scheduler =
      new YarnTaskSchedulerService(mockAppCallbackHandler, appContext, mockRm);
  scheduler.serviceInit(new Configuration());
  scheduler.serviceStart();

  // simulate two allocations
  Container c1 = newContainer("container_1");
  Container c2 = newContainer("container_2");
  scheduler.onContainersAllocated(Arrays.asList(c1, c2));

  // fire onError
  scheduler.onError(new RuntimeException("RM gone"));

  // verify both were released
  verify(mockRm).releaseAssignedContainer(c1.getId());
  verify(mockRm).releaseAssignedContainer(c2.getId());
  assertTrue(scheduler.getHeldContainersForTest().isEmpty());
}

The pattern uses Mockito on the AMRM client interface, not on the YarnTaskSchedulerService itself. getHeldContainersForTest() is a package-private accessor you add in the same patch, annotated with @VisibleForTesting.

Step 4 — Build, test, sign off

cd ~/tez-src
mvn -pl tez-dag test -Dtest=TestYarnTaskSchedulerService -q 2>&1 | tail -40
mvn -pl tez-tests test -Dtest=TestExternalTezServices -q 2>&1 | tail -10

The integration test (tez-tests) takes 5–10 minutes; skip it on the first local iteration but run it before the patch submission.

Walked example B — locality miscount

Symptom: a debug log shows node-local: 4, rack-local: 12, off-switch: 0 for a vertex whose input splits should give 14 node-local containers. The bookkeeping is off.

Locating the counter

cd ~/tez-src
grep -n "nodeLocal\|rackLocal\|offSwitch" \
  tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java | head -20

You find an assignContainer(...) path that compares the allocated host against the request's preferred host. The bug: the comparison is host.equals(request.getHosts()[0]), but host arrives as the FQDN node-1.cluster.local while the request carries the short name node-1. The exact-match comparison fails, the allocation is miscounted as rack-local, and the affinity penalty cascades into the next request.

Diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java
@@
-    if (host.equals(request.getHosts()[0])) {
+    // Hosts may be reported as FQDNs by the RM but as short names by the
+    // caller-supplied hint. Compare on the leading label to keep both forms
+    // equivalent. See TEZ-XXXX.
+    if (hostMatches(host, request.getHosts()[0])) {
       nodeLocalCount.incrementAndGet();
     } else if (rackOf(host).equals(rackOf(request.getHosts()[0]))) {
       rackLocalCount.incrementAndGet();
     } else {
       offSwitchCount.incrementAndGet();
     }
   }
+
+  static boolean hostMatches(String a, String b) {
+    if (a == null || b == null) return false;
+    return a.equals(b)
+        || leadingLabel(a).equals(leadingLabel(b));
+  }
+
+  private static String leadingLabel(String h) {
+    int dot = h.indexOf('.');
+    return dot < 0 ? h : h.substring(0, dot);
+  }

The accompanying test asserts the counter under both FQDN and short-name forms.
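A standalone copy of the helper makes the cases easy to check before wiring it into the scheduler. HostMatch is a hypothetical wrapper class; the two methods are the ones from the diff.

```java
// Standalone copy of the helper from the diff, so it can be exercised in isolation.
final class HostMatch {
  private HostMatch() {}

  static boolean hostMatches(String a, String b) {
    if (a == null || b == null) {
      return false;
    }
    return a.equals(b) || leadingLabel(a).equals(leadingLabel(b));
  }

  // Returns everything before the first dot, or the whole string if there is none.
  private static String leadingLabel(String h) {
    int dot = h.indexOf('.');
    return dot < 0 ? h : h.substring(0, dot);
  }
}
```

Comparing only the leading label has a cost: two FQDNs in different domains with the same first label compare equal, and so do dotted IPv4 addresses that share a first octet. If hosts can be reported as IP addresses in your cluster, the helper needs a stricter rule, and the test should include those cases.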

Walked example C — priority inversion

Symptom: a high-priority request (priority 5) waits indefinitely behind a long queue of priority-10 requests, even though the scheduler has capacity.

Root cause: the request was added to a queue keyed by the priority as a string, not as an int. "10" sorts before "5" in string ordering, so the lower-priority requests are served first. The fix is to use an Integer key, or a TreeMap with a numeric comparator. The diff and test follow the same pattern as above; the file is tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java near the requestsByPriority field.
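The ordering difference is easy to reproduce with a TreeMap. String keys are compared lexicographically, so "10" sorts before "5"; Integer keys are compared numerically. PriorityKeyOrdering is a hypothetical demo class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Demonstrates how the key type of a sorted map changes iteration order.
final class PriorityKeyOrdering {
  private PriorityKeyOrdering() {}

  /** Iteration order when priorities are stored under String keys. */
  static List<String> orderWithStringKeys(int... priorities) {
    TreeMap<String, Integer> byString = new TreeMap<>();
    for (int p : priorities) {
      byString.put(String.valueOf(p), p);
    }
    return new ArrayList<>(byString.keySet());
  }

  /** Iteration order when priorities are stored under Integer keys. */
  static List<Integer> orderWithIntKeys(int... priorities) {
    TreeMap<Integer, Integer> byInt = new TreeMap<>();
    for (int p : priorities) {
      byInt.put(p, p);
    }
    return new ArrayList<>(byInt.keySet());
  }
}
```

For the priorities 5, 10 and 0, the String-keyed map iterates as "0", "10", "5", while the Integer-keyed map iterates as 0, 5, 10.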

MiniTezCluster pattern

For bugs that only manifest end-to-end:

cd ~/tez-src
find tez-tests/src/test/java -name "TestMRRJobsDAGApi.java"

That file is the canonical worked example. The setup pattern:

private static MiniTezCluster tezCluster;

@BeforeClass
public static void setup() throws Exception {
  Configuration conf = new Configuration();
  tezCluster = new MiniTezCluster("TEZ-XXXX", 1, 1, 1);
  tezCluster.init(conf);
  tezCluster.start();
}

@AfterClass
public static void teardown() {
  if (tezCluster != null) {
    tezCluster.stop();
  }
}

Tests should:

  • Submit a small DAG (an OrderedWordCount derivative is fine).
  • Assert on DAGStatus and VertexStatus via the client.
  • Set tight tez.am.am-rm.heartbeat.interval-ms and tez.task.am.heartbeat.interval-ms overrides so retries fire quickly.

A MiniTezCluster test takes 30s+ per run; do not add more than one per JIRA.

Pitfalls

  • Don't mock the AppContext or the EventHandler if you can avoid it. Scheduler bugs often live in the handoff between scheduler and dispatcher. Mocking the dispatcher hides the bug.
  • Don't add Thread.sleep to scheduler tests. Use DrainDispatcher.await() or poll the scheduler's getHeldContainers() view with a timeout.
  • Don't introduce a new lock to fix a race. Most scheduler races are fixed by moving an existing line inside an existing synchronized block. Adding a new lock is a Stage 11 patch.
  • Don't change the AMRM heartbeat interval to make a test pass. That hides timing bugs that bite in production. Use the existing test helpers that drive the heartbeat synchronously.
  • Don't release containers in onContainersCompleted to "be safe". Hadoop's AMRMClient documentation forbids that; the container is already released by the RM, and a second release fires a confusing log line.
  • Don't fix a locality miscount by changing the comparison everywhere. The bug is usually a single inconsistency. Pin it down with a focused unit test before broadening the change.
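Polling with a timeout, as the second pitfall suggests, can be sketched as a small helper. WaitFor is a hypothetical class written for this example (Hadoop ships a comparable test utility). It still sleeps between polls, but it returns as soon as the condition holds and fails with a bounded wait rather than relying on a fixed delay.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: bounded polling instead of a fixed Thread.sleep.
final class WaitFor {
  private WaitFor() {}

  /** Polls until the condition holds or the timeout elapses; returns whether it held. */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A typical call in a scheduler test would be an assertion on waitFor(() -> view.isEmpty(), 5000, 10), where view is whatever read-only accessor the scheduler exposes.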

Exit criteria — when you're ready for the next stage

Move to Stage 6 when:

  • You have shipped one scheduler patch with a passing MiniTezCluster or AMRM stub regression test.
  • You can read YarnTaskSchedulerService.assignContainer without referring to external docs.
  • You have written a MiniTezCluster test from scratch and it runs locally in under a minute.
  • You can explain the heldContainer lifecycle to another contributor in five sentences.

Stage 6 moves you into the runtime: ShuffleManager, Fetcher, MergeManager.

Stage 6 — Shuffle and Runtime

What this stage teaches

Stage 6 is the runtime stage. You learn:

  • The shuffle pipeline: how ShuffleManager schedules Fetcher threads against the upstream task outputs, how MergeManager consolidates fetched segments, and how the result is presented to the downstream processor as a KeyValuesReader.
  • The on-disk IFile format and the off-by-one EOF bugs that haunt every serialiser written against it.
  • FetchedInput and the in-memory vs on-disk decision: how tez.runtime.shuffle.memory.limit.percent interacts with MergeManager.canShuffleToMemory.
  • Fetch-failure retry storms: when a single bad NodeManager triggers cascading fetcher restarts that swamp the AM event queue.
  • How to inject deterministic faults using the FaultInjectionFetcher pattern (or, where it does not exist, the equivalent test double).

Patches are 80–600 lines and almost always come with a MiniTezCluster test because the runtime contracts are too subtle for unit tests alone.

Reading order

  1. tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/ShuffleManager.java
  2. tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
  3. tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/MergeManager.java
  4. tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java
  5. The deep dive shuffle-sort.

Check how large these files are before you start:

cd ~/tez-src
wc -l tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/*.java
wc -l tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java

JIRA filter to find candidates

project = TEZ
  AND component in ("tez-runtime-library", "tez-runtime-internals")
  AND resolution = Unresolved
  AND (text ~ "shuffle" OR text ~ "fetcher" OR text ~ "MergeManager"
       OR text ~ "IFile" OR text ~ "FetchedInput" OR text ~ "spill")
ORDER BY priority DESC, updated DESC

A second filter for fetch-failure storms specifically:

project = TEZ AND text ~ "fetch failure" AND text ~ "retry" AND resolution = Unresolved

Walked example A — fetch-failure retry storm

Symptom: a 5k-task vertex runs on a cluster where one NodeManager goes bad. Within minutes the AM logs are flooded with INPUT_READ_ERROR events. The DAG eventually succeeds but takes hours instead of minutes. The AM event queue backs up to 100k+ pending events.

Step 1 — Trace the path

cd ~/tez-src
grep -n "INPUT_READ_ERROR\|reportReadError\|fetchFailures" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/ShuffleManager.java

You find ShuffleManager.reportReadError(...), which sends an InputReadErrorEvent to the AM for every failed fetch. With 5k downstream tasks each trying to fetch from the bad source, the AM receives 5k events per cycle. The AM dedupes by source attempt, but only after the events are on the queue.

Step 2 — Identify the fix

The minimal fix is a client-side debounce: a ShuffleManager should not report a failure for the same source attempt more than once per tez.runtime.shuffle.fetch-failure.report.cooldown-ms window. The Tez convention is to add the config key with a sensible default (here 5,000 ms).

Step 3 — Diff

--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/ShuffleManager.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/ShuffleManager.java
@@
+  private final ConcurrentMap<InputAttemptIdentifier, Long> lastReportedAt =
+      new ConcurrentHashMap<>();
+  private final long reportCooldownMs;
@@
   public void reportReadError(InputAttemptIdentifier srcAttempt, IOException e) {
+    long now = clock.getTime();
+    Long prev = lastReportedAt.get(srcAttempt);
+    if (prev != null && now - prev < reportCooldownMs) {
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Debouncing read-error report for {} (last={}ms ago)",
+            srcAttempt, now - prev);
+      }
+      return;
+    }
+    lastReportedAt.put(srcAttempt, now);
     inputContext.sendEvents(Collections.singletonList(
         createInputReadErrorEvent(srcAttempt, e)));
   }

Add the config key to TezRuntimeConfiguration:

+  public static final String TEZ_RUNTIME_SHUFFLE_FETCH_FAILURE_REPORT_COOLDOWN_MS =
+      TEZ_RUNTIME_PREFIX + "shuffle.fetch-failure.report.cooldown-ms";
+  public static final long
+      TEZ_RUNTIME_SHUFFLE_FETCH_FAILURE_REPORT_COOLDOWN_MS_DEFAULT = 5_000L;

And register it in the same file's tezRuntimeKeys set. The runtime filters configuration against that allowlist, and an unregistered key is silently dropped.
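The debounce logic itself can be tested without any Tez types. The sketch below is a variation on the diff, not a copy of it: it uses ConcurrentHashMap.compute so the check and the timestamp update happen atomically per key, where a separate get followed by put lets two concurrent callers both pass the check. ReportDebouncer and the injected LongSupplier clock are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

// Illustrative debouncer keyed by an arbitrary source ID; not a Tez class.
final class ReportDebouncer<K> {
  private final ConcurrentMap<K, Long> lastReportedAt = new ConcurrentHashMap<>();
  private final long cooldownMs;
  private final LongSupplier clockMs; // injected so tests can control time

  ReportDebouncer(long cooldownMs, LongSupplier clockMs) {
    this.cooldownMs = cooldownMs;
    this.clockMs = clockMs;
  }

  /** Returns true if the caller should send the report now. */
  boolean shouldReport(K source) {
    long now = clockMs.getAsLong();
    boolean[] allowed = {false};
    lastReportedAt.compute(source, (key, prev) -> {
      if (prev != null && now - prev < cooldownMs) {
        return prev;   // still cooling down: keep the old timestamp
      }
      allowed[0] = true;
      return now;      // record this report
    });
    return allowed[0];
  }
}
```

The map grows by one entry per distinct source and is never pruned here. In a long-lived component you would clear entries when the source attempt is known to be finished.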

Step 4 — Test with FaultInjectionFetcher pattern

There is no production FaultInjectionFetcher; the test pattern is to subclass ShuffleManager and override createFetcher to return a Fetcher that throws IOException on every call. The repro test sits in tez-runtime-library/src/test/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/TestShuffleManager.java:

@Test(timeout = 10000)
public void testReadErrorReportDebounce() throws Exception {
  ControlledClock clock = new ControlledClock();
  TezConfiguration conf = new TezConfiguration();
  conf.setLong(TEZ_RUNTIME_SHUFFLE_FETCH_FAILURE_REPORT_COOLDOWN_MS, 1000);

  ShuffleManager sm = createShuffleManager(conf, clock);
  InputAttemptIdentifier src = newInputAttempt(0);

  sm.reportReadError(src, new IOException("first"));
  sm.reportReadError(src, new IOException("second (debounced)"));
  sm.reportReadError(src, new IOException("third (debounced)"));

  // Only the first event should reach the inputContext
  verify(inputContext, times(1)).sendEvents(anyList());

  // Advance the clock past the cooldown; the next report must go through
  clock.setTime(clock.getTime() + 2000);
  sm.reportReadError(src, new IOException("after cooldown"));
  verify(inputContext, times(2)).sendEvents(anyList());
}

Follow it with a MiniTezCluster integration test that runs OrderedWordCount with a fault injected into a single Fetcher and confirms that the AM event queue stays bounded.

Walked example B — off-by-one in IFile EOF

Symptom: a reader of IFile-format data occasionally returns one extra zero-length record at the end of a segment. Downstream processors see a null/empty key and either throw or silently insert a bogus row.

Step 1 — Locate

cd ~/tez-src
grep -n "EOF_MARKER\|readNextKeyValue\|nextRawKey" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java

Read the Reader.nextRawKey loop and the EOF_MARKER constant. The classic bug shape: the loop tests bytesRead >= length after a successful read instead of before, allowing one extra iteration when the segment ends exactly on a record boundary.

Diff

--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java
@@
   public boolean nextRawKey(DataInputBuffer key) throws IOException {
+    // Stop at the segment boundary before reading another length prefix.
+    if (bytesRead >= segmentLength) {
+      return false;
+    }
     int recordLength = readVInt(dataIn);
     if (recordLength == EOF_MARKER) {
       return false;
     }
     ...
   }

The fix is a three-line guard placed before the read. The harder part is the test.
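The effect of checking the boundary before the read can be reproduced on a toy format. The reader below uses an invented one-byte length prefix and is far simpler than IFile; BoundedRecordReader and its guardEnabled flag exist only for this illustration.

```java
// Toy length-prefixed record reader over a byte array; the format is invented
// for illustration and is not the IFile layout.
final class BoundedRecordReader {
  private final byte[] data;
  private final int segmentLength;
  private final boolean guardEnabled;
  private int bytesRead = 0;

  BoundedRecordReader(byte[] data, int segmentLength, boolean guardEnabled) {
    this.data = data;
    this.segmentLength = segmentLength;
    this.guardEnabled = guardEnabled;
  }

  /** Advances past the next record; returns false when no record is left. */
  boolean next() {
    if (guardEnabled && bytesRead >= segmentLength) {
      return false;                    // logical end of segment: do not read further
    }
    if (bytesRead >= data.length) {
      return false;                    // physical end of the buffer
    }
    int len = data[bytesRead] & 0xFF;  // one-byte length prefix
    bytesRead += 1 + len;
    return true;
  }
}
```

With a buffer holding two records followed by zero padding and segmentLength set to the end of the second record, the guarded reader returns true, true, false. Without the guard, the third call reads a padding byte as a zero-length record and returns true, which is the phantom record from the symptom.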

Step 2 — Test

Add to tez-runtime-library/src/test/java/org/apache/tez/runtime/library/common/sort/impl/TestIFile.java:

@Test
public void testReaderStopsAtExactSegmentBoundary() throws Exception {
  // Write exactly two records, capture the byte length, construct a Reader
  // bounded to that byte length, and assert the third nextRawKey() returns
  // false without throwing.
  Path p = writeRecords(2);
  long segLen = fs.getFileStatus(p).getLen();
  Reader r = new Reader(conf, fs, p, codec, /*ifileReadAhead*/false, 0, segLen);
  assertTrue(r.nextRawKey(keyBuf));
  assertTrue(r.nextRawKey(keyBuf));
  assertFalse("must not return phantom third record",
      r.nextRawKey(keyBuf));
  r.close();
}

Run:

mvn -pl tez-runtime-library test -Dtest=TestIFile -q 2>&1 | tail -30

A reviewer will also ask for a check that bytesRead does not advance past segmentLength on a malformed input — add it.

Walked example C — MergeManager unexpected spill

Symptom: a small DAG that fits comfortably in memory still spills to disk. Investigation: MergeManager.canShuffleToMemory returns false for inputs smaller than the configured threshold because it compares against the total memory budget rather than the per-input share.

The bug shape is in MergeManager.canShuffleToMemory(long size): the decision rests on usedMemory + size > maxMemory * memoryLimitPercent alone, where it should also take singleShuffleLimit into account, so that an input well under the per-input limit is not refused only because aggregate usage is high.

The repro: a tiny OrderedWordCount on MiniTezCluster with tez.runtime.shuffle.memory.limit.percent=0.95 and a single 100KB input. The counter MERGED_MAP_OUTPUTS_DISK should be 0 and is not.

The fix and test follow the same pattern as the previous two examples.
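The shape of the corrected admission check can be sketched in isolation. This is a minimal illustration under stated assumptions, not the MergeManager source: ShuffleMemoryBudget, its constructor parameters, and reserve are hypothetical names, and the per-input cap is modelled as a fraction of the memory limit.

```java
/** Hypothetical model of the in-memory shuffle admission check. */
class ShuffleMemoryBudget {
  private final long memoryLimit;        // total in-memory shuffle budget, bytes
  private final long singleShuffleLimit; // largest single input admitted to memory
  private long usedMemory;

  ShuffleMemoryBudget(long maxMemory, double memoryLimitPercent,
      double singleShufflePercent) {
    this.memoryLimit = (long) (maxMemory * memoryLimitPercent);
    this.singleShuffleLimit = (long) (memoryLimit * singleShufflePercent);
  }

  /** True if an input of the given size may be fetched into memory. */
  boolean canShuffleToMemory(long size) {
    if (size > singleShuffleLimit) {
      return false; // fairness: one input may not take more than its share
    }
    // Reject when the reservation would reach the limit (>=, not >).
    return usedMemory + size < memoryLimit;
  }

  /** Reserves memory for an input; returns false if it must go to disk. */
  boolean reserve(long size) {
    if (!canShuffleToMemory(size)) {
      return false;
    }
    usedMemory += size;
    return true;
  }
}
```

A regression test for the real bug would assert the same two things at the MiniTezCluster level: a small input is admitted, and the boundary case is rejected.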

Pitfalls

  • Don't add Thread.sleep to a shuffle test. Use DrainDispatcher, the ControlledClock pattern, or a CountDownLatch driven by the production callback. Sleep-based shuffle tests are the #1 source of flakes in tez-runtime-library (see Stage 9).
  • Don't relax MergeManager thresholds to "fix" a memory error. The thresholds are a contract with the AM scheduler. If MergeManager runs out of memory, the bug is usually upstream — a Fetcher that should have used disk and went to memory.
  • Don't add a config key without registering it in tezRuntimeKeys. The runtime validates against an allowlist; an unregistered key is silently ignored.
  • Don't fix the IFile reader by widening the boundary check. Boundary bugs in IFile usually have a sibling bug in the writer. Read both before patching either.
  • Don't add a Fetcher retry loop that does not respect the AM's already-scheduled retry policy. Two retry loops in series turn a 3x retry into a 9x retry. Confirm via dispatcher trace that the AM is the only retry authority.
  • Don't change the on-disk IFile format without bumping IFile.VERSION. That is a Stage 11 patch and requires explicit back-compat shims.

Exit criteria — when you're ready for the next stage

Move to Stage 7 when:

  • You have shipped one shuffle or runtime patch with a deterministic MiniTezCluster regression test that passes in under two minutes.
  • You can recite the relationship between tez.runtime.shuffle.memory.limit.percent, tez.runtime.shuffle.fetch.buffer.percent, and the JVM heap.
  • You have read MergeManager.merge() end to end and can explain the on-disk vs in-memory branches.
  • A reviewer has accepted your fix without asking "is this the same bug as TEZ-XXXX?" — meaning you have learned to grep for prior art before patching.

Stage 7 takes you out of Tez code and into the Hive-on-Tez attribution skill.

Stage 7 — Hive-on-Tez Compatibility

What this stage teaches

Stage 7 is the cross-project stage. You learn:

  • The largest consumer of Tez in production is Hive. Bugs that look like Tez bugs are often Hive bugs that surface through Tez, and vice versa.
  • The contracts Hive depends on: DAGPlan size limits, edge property serialisation, session reuse via TezSessionPoolManager, and the HiveSplitGenerator event protocol.
  • The attribution decision tree: when to file on TEZ, when on HIVE, and when on both with a cross-reference.
  • The release-train interplay: Hive 3.x ships a specific Tez version; Hive 4.x ships a different one. A "fix" in Tez master does not automatically reach a Hive user until the next Hive release picks up a Tez release.
  • How to write an attribution argument in a JIRA description so committers in both projects agree on ownership before any code is written.

The "patch" deliverable for Stage 7 is often a JIRA, not code. A correct attribution call is the contribution; the code may be one line in each project or zero lines in Tez and a workaround in Hive.

JIRA filter to find candidates

project = TEZ AND text ~ "Hive" AND resolution = Unresolved
  ORDER BY updated DESC

Then on the Hive side:

project = HIVE
  AND (text ~ "Tez" OR text ~ "TezSession" OR text ~ "DAGPlan"
       OR text ~ "VertexManagerPlugin")
  AND resolution = Unresolved
  ORDER BY updated DESC

Cross-reference: a TEZ- ticket linked to a HIVE- ticket is a Stage 7 opportunity. The contribution is reading both, choosing the owner project, and writing the attribution.

The attribution decision tree

Given a symptom, walk this tree:

Is the symptom observed in a non-Hive Tez workload?
├── Yes  → Tez bug. File on TEZ. Stage 4–6 patch.
└── No (Hive-specific)
    │
    Does the symptom depend on a Hive class on the stack trace?
    ├── Yes (Hive frame is the top user-code frame)
    │   │
    │   Is the Tez API contract being misused by Hive?
    │   ├── Yes  → HIVE bug. File on HIVE. Tez may need a clearer
    │   │         contract / better exception message — file
    │   │         a follow-up TEZ ticket.
    │   └── No   → Possibly a Tez API contract gap. File on TEZ
    │             with a Hive repro, link the HIVE ticket.
    │
    └── No (the bug surfaces inside Tez code triggered by Hive's DAG)
        │
        Does the Hive DAG exercise an edge case Tez tests don't cover?
        ├── Yes  → Tez bug. File on TEZ. Add a Tez-side test that
        │         reproduces the shape without Hive.
        └── No   → File a `cross-project` ticket on TEZ with a
                  HIVE counter-ticket; sort ownership on dev@.

The tree is not law. It is the start of a dev@ conversation.
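The tree above is small enough to write down as a function, which is a useful way to check that you have covered every branch. The sketch below is illustrative only: AttributionTree, the Outcome names, and the four boolean inputs are invented labels for the questions in the tree.

```java
/** Hypothetical encoding of the attribution decision tree. */
class AttributionTree {
  enum Outcome {
    TEZ_BUG,                     // file on TEZ, Stage 4-6 patch
    HIVE_BUG_WITH_TEZ_FOLLOW_UP, // file on HIVE, follow-up TEZ ticket for the contract
    TEZ_CONTRACT_GAP,            // file on TEZ with a Hive repro, link the HIVE ticket
    TEZ_BUG_NEEDS_TEZ_TEST,      // file on TEZ, add a Hive-free reproduction
    CROSS_PROJECT                // ticket on both, sort ownership on dev@
  }

  static Outcome attribute(boolean seenInNonHiveWorkload,
      boolean hiveFrameOnTop,
      boolean hiveMisusesTezApi,
      boolean edgeCaseUncoveredByTezTests) {
    if (seenInNonHiveWorkload) {
      return Outcome.TEZ_BUG;
    }
    if (hiveFrameOnTop) {
      return hiveMisusesTezApi
          ? Outcome.HIVE_BUG_WITH_TEZ_FOLLOW_UP
          : Outcome.TEZ_CONTRACT_GAP;
    }
    return edgeCaseUncoveredByTezTests
        ? Outcome.TEZ_BUG_NEEDS_TEZ_TEST
        : Outcome.CROSS_PROJECT;
  }
}
```

Note that each path consults at most three of the four questions; the inputs that a path does not reach have no effect on the outcome.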

Walked example A — DAGPlan size exceeds limit on Hive autogenerated DAG

Symptom: a Hive 3.1 query with a large IN list (10k+ literals) submits a DAG that fails at TezClient.submitDAG with:

TezException: DAGPlan too large

The Tez default is 64MB on the wire. Hive can in principle stay under it, but the codegen path for very large IN lists doesn't truncate.

Step 1 — Attribution

Walk the tree:

  • Non-Hive workload? No, Hive-specific.
  • Hive on stack? Yes, HiveSplitGenerator.
  • Is Hive misusing Tez API? No. The DAGPlan is exactly the wire format Tez expects; Hive is sending a legitimate but large payload.
  • Supporting evidence (not a tree step on this branch): Tez tests only submit small DAGPlans, so this size edge case is uncovered.

Conclusion: this is a Tez API contract gap that Hive happens to hit first. The fix is twofold:

  1. Tez side: raise the configurable limit and improve the error message to tell the operator which key to bump. File on TEZ.
  2. Hive side: paginate the IN list literal codegen. File on HIVE.

The Tez patch is small and lands first.

Step 2 — The Tez-side diff

--- a/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
+++ b/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
@@
+  /**
+   * Maximum size (bytes) of the serialised {@link DAGPlan} that the AM
+   * accepts in a single submission. The default of 64MiB is a Hadoop
+   * IPC limit. Operators submitting very large DAGs (typically generated
+   * by upstream query engines) may need to raise this.
+   * @since 0.10.4
+   */
+  public static final String TEZ_DAG_PLAN_MAX_BYTES =
+      TEZ_PREFIX + "dag.plan.max.bytes";
+  public static final int TEZ_DAG_PLAN_MAX_BYTES_DEFAULT = 64 * 1024 * 1024;

And in tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java:

-    if (serialised.length > 64 * 1024 * 1024) {
-      throw new TezException("DAGPlan too large");
+    int max = conf.getInt(TEZ_DAG_PLAN_MAX_BYTES, TEZ_DAG_PLAN_MAX_BYTES_DEFAULT);
+    if (serialised.length > max) {
+      throw new TezException(String.format(
+          "DAGPlan serialised size %d exceeds limit %d. "
+              + "Raise %s on the submitter and AM, or reduce DAGPlan size "
+              + "(typically by pruning literal lists or split metadata).",
+          serialised.length, max, TEZ_DAG_PLAN_MAX_BYTES));
     }

The patch makes the limit explicit, configurable, and self-describing.
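The behaviour a unit test for this patch needs to pin down is easy to state in isolation: a plan at the limit passes, a plan one byte over fails with a message naming the key, and raising the key lifts the cap. The sketch below models that with a plain Map standing in for the Tez configuration; DagPlanSizeCheck and its method are hypothetical, and only the key name and default come from the diff above.

```java
import java.util.Map;

/** Hypothetical stand-alone model of the configurable DAGPlan size check. */
class DagPlanSizeCheck {
  static final String MAX_BYTES_KEY = "tez.dag.plan.max.bytes";
  static final int MAX_BYTES_DEFAULT = 64 * 1024 * 1024;

  /** Throws if the serialised plan is larger than the configured limit. */
  static void check(int serialisedLength, Map<String, String> conf) {
    int max = Integer.parseInt(
        conf.getOrDefault(MAX_BYTES_KEY, Integer.toString(MAX_BYTES_DEFAULT)));
    if (serialisedLength > max) {
      throw new IllegalStateException(String.format(
          "DAGPlan serialised size %d exceeds limit %d. "
              + "Raise %s or reduce DAGPlan size.",
          serialisedLength, max, MAX_BYTES_KEY));
    }
  }
}
```

The comparison is strict, so a plan of exactly the limit is accepted. Whatever the real test looks like, it should cover that boundary explicitly.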

Step 3 — The JIRA description (attribution argument)

Summary: Make DAGPlan size limit configurable and self-describing

Description:
Hive's HiveSplitGenerator can generate DAGPlans > 64MiB for queries with
very large IN lists. Currently Tez throws "DAGPlan too large" with no
actionable advice. The Hive side will paginate (HIVE-NNNNN), but Tez
should:

  1. Expose tez.dag.plan.max.bytes so operators can raise the cap.
  2. Produce an error message that names the key and the cause.

Attribution rationale:
  - This is a Tez API contract gap: legitimate DAGPlans should not be
    silently rejected with no recourse.
  - Hive is the first downstream that hits this; other DAG generators
    (Pig-on-Tez, custom DAGs from BI tools) will hit it next.
  - HIVE-NNNNN is filed in parallel for the codegen pagination.

Tests:
  - TestDAGAppMaster#testDAGPlanSizeLimitConfigurable
  - End-to-end repro left to HIVE-NNNNN (Tez has no test that builds a
    pathological 64MiB DAGPlan).

This is the cross-project pattern: the TEZ ticket cites HIVE-NNNNN explicitly, states the attribution rationale, and stops short of fixing Hive's behaviour.

Walked example B — edge property mismatch on Hive upgrade

Symptom: after upgrading Hive 3.1 → 3.2, certain queries fail with:

TezException: EdgeProperty mismatch on edge v1->v2: source class
  org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
  does not match sink class
  org.apache.tez.runtime.library.input.UnorderedKVInput

Tez rejects the DAG because the edge wiring is inconsistent.

Attribution: Hive 3.2 emitted a different sink type for that vertex. Tez is behaving correctly — it is enforcing the edge contract. This is a HIVE bug. File on HIVE. The Tez side requires no patch.

The contribution here is the attribution itself plus a Tez-side documentation note on the validator: "see EdgeProperty.checkCompatible for the rules enforced." Add a docs patch (Stage 1) if no such note exists.

Walked example C — TezSessionPoolManager reuse leak

Symptom: HiveServer2 uses TezSessionPoolManager to reuse AMs across queries. A specific Hive query path leaves the session in a state where the next query sees stale credentials.

Attribution: TezSessionPoolManager is a Hive class (in the Hive repo), even though it manages TezClient instances. Find it:

grep -rn "class TezSessionPoolManager" ~/hive-src/ql/src/java

The bug is in Hive. The Tez API used (TezClient.start()) is correct.

File on HIVE. The Tez contribution is zero code; it is the attribution call and the explanation in the JIRA comments that prevents the ticket bouncing.

Reading the Hive code path for attribution

Even though you may not commit to Hive, you must be able to read the Hive classes that touch Tez:

  • org.apache.hadoop.hive.ql.exec.tez.DagUtils — Hive's DAG builder.
  • org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator: Hive's input split generation, implemented as a Tez InputInitializer that runs inside the AM.
  • org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager — session reuse.
  • org.apache.hadoop.hive.ql.exec.tez.TezSessionState — per-session state.

Keep a Hive checkout next to your Tez checkout:

git clone https://github.com/apache/hive ~/hive-src

A grep across both:

grep -rn "DAGPlan\|VertexManagerPluginDescriptor" ~/hive-src/ql/src/java | head

is the start of every Stage 7 investigation.

Pitfalls

  • Don't fix a Hive bug in Tez. Even if the symptom appears on a Tez stack frame, do not patch Tez to work around an incorrect Hive use of the API. You will trap Tez into supporting buggy clients forever.
  • Don't expand a Tez API to "make Hive easier". That is a Stage 11 patch with a dev@ design thread; not a Stage 7 patch.
  • Don't assume the Hive committers will read your TEZ ticket. CC the appropriate Hive committers explicitly, or post a short note on dev@hive.apache.org linking the JIRA.
  • Don't promise a Tez backport to a specific Hive release. Release alignment is a separate conversation; you control your patch's landing in Tez, not when Hive picks it up.
  • Don't file the same bug on both projects without distinguishing the work. TEZ-NNNN should fix the Tez side; HIVE-NNNN should fix the Hive side; each ticket should cross-reference the other and say exactly what code lives in which project.
  • Don't break older Hive versions to fix newer ones. A Tez change that raises the minimum required Hive version is a Stage 11 / Stage 12 call.

Exit criteria — when you're ready for the next stage

Move to Stage 8 when:

  • You have correctly attributed at least one symptom to HIVE (saving Tez from an incorrect patch) and one to TEZ (with a Hive counter-ticket).
  • You have a ~/hive-src checkout next to ~/tez-src and have grepped across both at least three times during real investigation.
  • You can describe the lifecycle of a TezSessionState from creation to reuse to teardown in five sentences.
  • You have read EdgeProperty.checkCompatible and know which mismatches the Tez validator does and does not flag.

Stage 8 takes you into the YARN integration layer.

Stage 8 — YARN Integration

What this stage teaches

Stage 8 lives at the Tez/YARN boundary. You learn:

  • How the Tez AM acquires and renews its AMRMToken, and the canonical bug: long-running session AMs (multi-day Hive sessions) whose AMRMToken expires while the AM is mid-RPC.
  • Log aggregation: how Tez's container exit hooks interact with the NM's LogAggregationService. The canonical symptom: missing container logs after AM crash because the AM never told the NM to flush.
  • The NM aux service: the Tez ShuffleHandler (or the MR ShuffleHandler when configured) lives in tez-plugins/tez-aux-services. Version mismatches between AM-side tez-runtime-library and NM-side aux service cause shuffle failures with cryptic error messages.
  • Kerberos delegation token renewal across DAG lifecycles, especially when multiple DAGs in a session use the same Credentials object.
  • TezClient AMRMToken handling: where the token lives in the submitter process versus the AM.

Patches in this stage are 50–400 lines but often require a Hadoop-version-specific code path, so the tez-plugins/tez-aux-services profile structure matters more than in other stages.

Reading order

  1. tez-api/src/main/java/org/apache/tez/client/TezClient.java
  2. tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java — focus on the AMRMToken handling and the credential propagation.
  3. tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java
  4. The deep dive yarn-integration.

cd ~/tez-src
grep -rn "AMRMToken\|getCredentials\|TokenIdentifier" \
  tez-api/src/main/java tez-dag/src/main/java | head -30
ls tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/

JIRA filter to find candidates

project = TEZ
  AND component in ("tez-dag", "tez-plugins")
  AND resolution = Unresolved
  AND (text ~ "AMRMToken" OR text ~ "kerberos" OR text ~ "delegation token"
       OR text ~ "log aggregation" OR text ~ "ShuffleHandler"
       OR text ~ "aux service" OR description ~ "TokenExpired")
ORDER BY updated DESC

A second filter focused on long-running session bugs:

project = TEZ AND text ~ "session" AND text ~ "expired"
  AND resolution = Unresolved

Walked example A — AMRMToken expiry on long DAGs

Symptom: a Hive session AM runs for 36 hours. At hour 24 it starts logging:

SecretManager$InvalidToken: AMRMToken for application appattempt_X has expired.

The AM crashes mid-DAG. The user loses the long-running session and resubmits all in-progress queries.

Step 1 — Trace token lifetime

cd ~/tez-src
grep -n "AMRMToken\|registerApplicationMaster" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
grep -rn "renewMaxLifetime\|token-max-lifetime" tez-api tez-dag tez-common

YARN's yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs default is 24h. When the RM rotates the master key, the AM's cached AMRMToken becomes invalid. The fix is to detect a token-expired exception on the AMRM heartbeat path and re-acquire the token from the RM (which already exposes this via the heartbeat response in modern Hadoop versions).

Step 2 — Choose the right Hadoop version

tez-aux-services and tez-dag build against the configured Hadoop profile:

grep -rn "hadoop28\|hadoop29\|hadoop31" pom.xml | head

Token rollover handling differs across Hadoop minor versions. The patch must be a no-op on profiles where the Hadoop client already handles the rollover transparently. Confirm by:

grep -rn "AMRMToken" ~/hadoop-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | head

If AMRMClientAsyncImpl already loops on token expiry in Hadoop 3.x, your Tez patch is a Hadoop-2.x-only path guarded by an availability check.

Step 3 — Diff

--- a/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
+++ b/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
@@
   private void heartbeatLoop() {
     while (!shutdownRequested) {
       try {
         AllocateResponse resp = amRmClient.allocate(progress);
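+        // Hadoop 2.x clients did not transparently refresh the AMRMToken
+        // on master-key rollover. Adopt the rolled-over token whenever the
+        // RM returns one in the heartbeat response. See TEZ-XXXX.
+        if (resp.getAMRMToken() != null) {
+          // The (Text) cast picks the service-name overload; a bare null
+          // is ambiguous with the InetSocketAddress overload.
+          UserGroupInformation.getCurrentUser().addToken(
+              ConverterUtils.convertFromYarn(resp.getAMRMToken(), (Text) null));
+        }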
         processAllocations(resp);
       } catch (InvalidToken e) {
+        LOG.warn("AMRMToken invalid for {}, attempting re-register", appAttemptID);
+        try {
+          amRmClient.registerApplicationMaster(host, port, trackingUrl);
+          continue;
+        } catch (Exception reErr) {
+          LOG.error("Re-register failed; AM will exit", reErr);
+          throw new TezUncheckedException(reErr);
+        }
       } catch (Exception e) {
         ...
       }
     }
   }

Step 4 — Test

A unit test stubs the AMRMClient to throw an InvalidToken once and then return a healthy response, and asserts that registerApplicationMaster was called once and the loop continued. Pattern:

@Test(timeout = 10000)
public void testAmrmTokenReacquiredOnInvalidToken() throws Exception {
  AMRMClient mockRm = mock(AMRMClient.class);
  when(mockRm.allocate(anyFloat()))
      .thenThrow(new InvalidToken("expired"))
      .thenReturn(emptyAllocateResponse());
  DAGAppMaster am = createTestAM(mockRm);
  am.runOneHeartbeatIteration();
  verify(mockRm).registerApplicationMaster(anyString(), anyInt(), anyString());
  am.runOneHeartbeatIteration();
  // second iteration must succeed
}

A MiniYARNCluster test that triggers an actual key rollover is possible but slow; the unit test above is sufficient for review.
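If you want to reason about the retry behaviour before touching DAGAppMaster, a hand-written fake is enough. The sketch below is a simplified model under stated assumptions: HeartbeatSketch, RmClient, InvalidTokenException, and FlakyRm are all hypothetical names, and the real AMRMClient API is much wider than this two-method interface.

```java
import java.io.IOException;

/** Hypothetical model of "re-register once on invalid token, then retry". */
class HeartbeatSketch {
  static class InvalidTokenException extends IOException {
    InvalidTokenException(String message) {
      super(message);
    }
  }

  interface RmClient {
    String allocate() throws IOException;
    void register() throws IOException;
  }

  /** One heartbeat: on an invalid token, re-register and retry allocate once. */
  static String heartbeat(RmClient rm) throws IOException {
    try {
      return rm.allocate();
    } catch (InvalidTokenException e) {
      rm.register();        // a failure here propagates to the caller
      return rm.allocate(); // a second invalid token also propagates
    }
  }

  /** Fake RM that rejects the first allocate call, then recovers. */
  static class FlakyRm implements RmClient {
    int allocateCalls;
    int registerCalls;

    @Override
    public String allocate() throws IOException {
      if (allocateCalls++ == 0) {
        throw new InvalidTokenException("expired");
      }
      return "ok";
    }

    @Override
    public void register() {
      registerCalls++;
    }
  }
}
```

The single retry is deliberate: an unbounded loop around register would hide an RM that is rejecting the AM for a different reason.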

Walked example B — log aggregation race on AM crash

Symptom: an AM crashes (OutOfMemoryError). The cluster operator runs yarn logs -applicationId ... and gets nothing. The NodeManager's LogAggregationService reports the logs as never finalised.

Root cause: the JVM crashed before Tez's DAGAppMaster.shutdown() could flag the logs as aggregation-ready. NM's default is "wait for the AM to mark finalisation" rather than aggregating on container exit.

The fix

Tez registers a JVM shutdown hook (Runtime.getRuntime().addShutdownHook) that calls into the YARN LogAggregationContext to force-finalise. The hook must run before the JVM's normal exit handlers.

cd ~/tez-src
grep -n "addShutdownHook\|LogAggregationContext" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

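If a shutdown hook is registered but does not handle OutOfMemoryError, add a defensive try/catch (Throwable) inside the hook. Do not rely on registration order: hooks added through Runtime.addShutdownHook are started in an unspecified order and run concurrently. If this hook must run before or after others, register it through Hadoop's ShutdownHookManager, which runs hooks by explicit priority.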
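A defensive hook body can be sketched with the JDK alone. The names here (GuardedShutdownHook, guarded, register, and the lastResort callback) are hypothetical; the only real API used is Runtime.addShutdownHook.

```java
/** Hypothetical wrapper that keeps any Throwable from escaping a shutdown hook. */
class GuardedShutdownHook {
  /** Runs body; if it fails for any reason, runs lastResort instead. */
  static Runnable guarded(Runnable body, Runnable lastResort) {
    return () -> {
      try {
        body.run();
      } catch (Throwable t) {
        // Includes OutOfMemoryError. Keep this path allocation-light:
        // the heap may already be exhausted.
        try {
          lastResort.run();
        } catch (Throwable ignored) {
          // Nothing more can safely be done during JVM shutdown.
        }
      }
    };
  }

  /** Registers the guarded hook with the JVM and returns its thread. */
  static Thread register(Runnable body, Runnable lastResort) {
    Thread hook = new Thread(guarded(body, lastResort), "log-finalise-hook");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```

Catching Throwable is acceptable here only because the JVM is already exiting; the same pattern in a long-running loop would hide fatal errors.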

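The diff is small; the test is hard. The accepted pattern is a logged-evidence test: spin up a MiniYARNCluster, submit a DAG, terminate the AM process with SIGTERM, and assert that the NM log aggregation finalised the logs within a bounded time. Do not use kill -9 for this: SIGKILL cannot be caught, so the JVM exits without running any shutdown hook and the test would not exercise the fix. This test belongs in tez-tests and is slow (~30s).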

Walked example C — NM aux service version mismatch

Symptom: a cluster operator deploys Tez 0.10.3 but the NMs still run the Tez 0.10.1 aux service. Shuffle fails with:

IOException: Unknown shuffle handler version: 1

The fix is in tez-aux-services plus a docs note: the aux service on every NM must match the AM-side tez-runtime-library minor version. The Tez patch is twofold:

  1. The aux service must report its version in the protocol handshake.
  2. The client side must produce a self-describing error message that names the NM, the version it reported, and the version the AM expected.

--- a/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
+++ b/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/Fetcher.java
@@
-    if (serverVersion != EXPECTED_SHUFFLE_VERSION) {
-      throw new IOException("Unknown shuffle handler version: " + serverVersion);
+    if (serverVersion != EXPECTED_SHUFFLE_VERSION) {
+      throw new IOException(String.format(
+          "Tez shuffle handler version mismatch on %s:%d: server=%d, expected=%d. "
+              + "Likely cause: NodeManager aux-service jar is older than the AM. "
+              + "Ensure tez-aux-services-%s.jar is deployed to every NM.",
+          host, port, serverVersion, EXPECTED_SHUFFLE_VERSION,
+          TezVersionInfo.getVersion()));
     }

The patch is one improved error message and one documentation update in docs/src/site/markdown/install.md.

Pitfalls

  • Don't add a new JVM shutdown hook without considering ordering. Java does not guarantee shutdown hook order; if two hooks rely on each other, you must serialise them explicitly.
  • Don't catch Throwable outside a shutdown path. Catching Throwable in the heartbeat loop will swallow OutOfMemoryError and leave the AM in an undefined state.
  • Don't conflate AMRMToken with delegation tokens. AMRMToken authenticates the AM to the RM; delegation tokens authenticate the AM/tasks to HDFS or other services. Renewal paths and lifetimes are different.
  • Don't deploy a fix that requires the operator to redeploy tez-aux-services without saying so in the release notes. Aux service upgrades require an NM restart; that is operationally expensive.
  • Don't assume the Hadoop version on disk is the Hadoop version in production. Test against the minimum Hadoop version supported by your Tez release line (see pom.xml profile defs).
  • Don't hard-code token renewal intervals. Use the YARN-side configuration keys directly (yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs).

Exit criteria — when you're ready for the next stage

Move to Stage 9 when:

  • You have shipped one YARN-integration patch with evidence (in the JIRA description) of which Hadoop minor versions you tested against.
  • You can describe the AMRMToken lifecycle in five sentences including the master-key rollover.
  • You have read the LogAggregationContext API in the Hadoop source and understand the logIncludePattern / logExcludePattern interplay.
  • You have a tez-plugins/tez-aux-services build that runs locally and you understand which NMs need it.

Stage 9 returns to the in-repo skill set with a focus on test stability.

Stage 9 — Flaky Tests

What this stage teaches

Stage 9 is the unglamorous-but-essential stage. You learn:

  • The Tez flake taxonomy: Thread.sleep races, undrained AsyncDispatcher, MiniTezCluster port collisions, and @Test(timeout=...) budgets that were too tight for slow CI.
  • How to distinguish a flake (passes locally, fails on Jenkins 1-in-30 runs) from a real intermittent bug (manifests in production under load). Flakes are tests; intermittent bugs are not.
  • The DrainDispatcher.await() refactor: how to convert a sleep-based synchronisation to an event-drain-based one.
  • The @Rule and TestName patterns for diagnosing which test in a suite leaks state into the next.
  • When a flake fix is also a production code fix (the test was right; the code had a race).

Patches are 20–150 lines per test. They rarely change production code. The ones that do warrant a Stage 4–6 ticket in addition to the test fix.

JIRA filter to find candidates

project = TEZ
  AND (text ~ "flaky" OR text ~ "intermittent" OR labels = "flaky-test")
  AND resolution = Unresolved
ORDER BY updated DESC

A second source: Jenkins precommit history. Pick any open JIRA, find its Jenkins URL in the comments, click through to recent runs, look for tests that failed in one run and passed in the next on the same patch. Those tests are flake candidates regardless of whether a JIRA already exists.

A third source: your own mvn test output. Run any tez-dag test suite three times in a row:

cd ~/tez-src
for i in 1 2 3; do
  mvn -pl tez-dag test -Dtest=TestVertexImpl -q 2>&1 | tail -5
done

Any failure across the three passes that doesn't repeat is a flake to investigate.

The Tez flake taxonomy

1. Thread.sleep races

The most common shape:

worker.submitJob(j);
Thread.sleep(500);                 // "wait for it to start"
assertTrue(worker.isJobRunning(j));

On a slow CI box, 500ms may not be enough. On a fast box, the job may have completed before the assertion. Both fail.

The fix is a poll with timeout:

worker.submitJob(j);
TestUtils.waitFor(() -> worker.isJobRunning(j), /*pollMs*/50, /*timeoutMs*/30_000);
assertTrue(worker.isJobRunning(j));

If TestUtils.waitFor does not exist in the module, copy the pattern from org.apache.tez.test.GenericCounter or write one yourself in three lines.
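If you do end up writing the helper yourself, a minimal version looks like the sketch below. The class name WaitFor is a placeholder and the signature is an assumption modelled on the call shown above; Hadoop's test utilities ship a helper of similar shape (GenericTestUtils.waitFor) if that jar is already on your test classpath.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical poll-with-timeout helper for tests. */
class WaitFor {
  /**
   * Polls condition every pollMs until it is true.
   * Throws TimeoutException if it is still false after timeoutMs.
   */
  static void waitFor(BooleanSupplier condition, long pollMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(pollMs);
    }
  }
}
```

The helper uses System.nanoTime rather than wall-clock time so that a clock adjustment on the CI host cannot shorten or extend the wait. InterruptedException is propagated, not swallowed.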

2. Undrained AsyncDispatcher

The dispatcher is event-driven. A test that fires an event and immediately asserts on state will see the pre-event state half the time.

The fix is DrainDispatcher.await():

cd ~/tez-src
grep -rn "class DrainDispatcher" tez-common/src/main/java tez-dag/src/test

Find the canonical class. The refactor:

-    dispatcher.getEventHandler().handle(new VertexEvent(vid, VertexEventType.V_INIT));
-    Thread.sleep(200);
-    assertEquals(VertexState.INITED, vertex.getState());
+    dispatcher.getEventHandler().handle(new VertexEvent(vid, VertexEventType.V_INIT));
+    dispatcher.await();
+    assertEquals(VertexState.INITED, vertex.getState());

The contract: await() returns when the event queue is empty and the last event has been fully handled (including any subsequent events the handler itself emitted). If the test still flakes after this refactor, the handler is emitting events to a different dispatcher (e.g. a child component has its own). Find it and drain that one too.
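The await() contract is easier to trust once you have seen how little it takes to implement. The sketch below is a toy model, not the Tez DrainDispatcher: DrainableDispatcher and its methods are hypothetical. It counts events that are enqueued but not yet fully handled, so follow-up events emitted by a handler keep await() blocked until they too are done.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

/** Hypothetical single-threaded dispatcher with a drain barrier. */
class DrainableDispatcher<E> implements AutoCloseable {
  private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
  private final Object lock = new Object();
  private int pending; // enqueued but not yet fully handled; guarded by lock
  private final Thread worker;

  DrainableDispatcher(BiConsumer<DrainableDispatcher<E>, E> handler) {
    worker = new Thread(() -> {
      try {
        while (true) {
          E event = queue.take();
          try {
            handler.accept(this, event); // may call handle() for follow-ups
          } finally {
            synchronized (lock) {
              if (--pending == 0) {
                lock.notifyAll();
              }
            }
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // close() was called
      }
    }, "dispatcher");
    worker.setDaemon(true);
    worker.start();
  }

  void handle(E event) {
    synchronized (lock) {
      pending++; // counted before enqueue so await() cannot miss it
    }
    queue.add(event);
  }

  /** Blocks until every event, including follow-ups, has been handled. */
  void await() throws InterruptedException {
    synchronized (lock) {
      while (pending > 0) {
        lock.wait();
      }
    }
  }

  @Override
  public void close() {
    worker.interrupt();
  }
}
```

The model also shows the limit of the contract: events sent to a second dispatcher are invisible to this counter, which is why a child component's dispatcher has to be drained separately.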

3. MiniTezCluster port collisions

The default MiniTezCluster binds a fixed RM port. Two suites running in parallel on the same machine collide. The fix is per-suite port randomisation:

-    tezCluster = new MiniTezCluster("test", 1, 1, 1);
+    // testName is a JUnit rule field: @Rule public TestName testName = new TestName();
+    tezCluster = new MiniTezCluster(testName.getMethodName(), 1, 1, 1);
+    Configuration conf = new Configuration();
+    conf.set(YarnConfiguration.RM_ADDRESS, "localhost:0");  // port 0 = OS-assigned
+    conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, "localhost:0");
+    conf.set(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS, "localhost:0");
+    tezCluster.init(conf);

Port 0 tells the OS to assign an unused port. Then read the actual port from the cluster after start:

int rmPort = tezCluster.getConfig()
    .getSocketAddr(YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT)
    .getPort();
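The port-0 mechanism itself is plain JDK behaviour and can be checked without any cluster. The sketch below uses only java.net; the class and method names are placeholders.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Demonstrates OS-assigned ports: bind to 0, then ask which port was chosen. */
class EphemeralPort {
  /** Binds a loopback server socket to an OS-chosen free port. */
  static ServerSocket bindEphemeral() throws IOException {
    return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
  }
}
```

Two sockets bound this way at the same time always get different ports, which is what lets two suites share a CI agent. The port is only reserved while the socket stays open, so read it from the running service; do not probe for a free port and bind later.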

4. @Test(timeout=...) too tight

A test with @Test(timeout=1000) may pass on a developer's M3 Pro and fail on a contention-laden Jenkins agent. Raise the timeout to a value that comfortably covers the slow CI but is still bounded:

-  @Test(timeout = 1000)
+  @Test(timeout = 30_000)
   public void testInitTransitionRunsOnce() { ... }

The Tez convention: 30s for unit tests, 300s for MiniTezCluster tests. Never @Test(timeout = 0) — a hung test will block CI for hours.

Walked example — TestShuffleManager flake

Symptom: testReadErrorReportDebounce fails 1-in-12 runs on Jenkins with:

expected:<1> but was:<2>

i.e. the verify on inputContext.sendEvents saw two calls when one was expected.

Step 1 — Reproduce locally

cd ~/tez-src
for i in $(seq 1 50); do
  mvn -pl tez-runtime-library test \
    -Dtest=TestShuffleManager#testReadErrorReportDebounce \
    -q 2>&1 | tail -3
done | grep -c "FAILED"

A local reproduction at 1/50 frequency is good enough to start.

Step 2 — Diagnose

Read the test. The pattern:

sm.reportReadError(src, new IOException("first"));
sm.reportReadError(src, new IOException("second"));
verify(inputContext, times(1)).sendEvents(anyList());

reportReadError may dispatch to an internal executor. The verify runs before the executor has serviced the call. The Mockito verify sees only the synchronous call most of the time; the async one fires 1-in-12.

Step 3 — Fix

Replace verify with a timeout-bounded verify:

-    verify(inputContext, times(1)).sendEvents(anyList());
+    verify(inputContext, timeout(5_000).times(1)).sendEvents(anyList());

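Mockito.timeout(ms) polls until the expected interactions match and returns as soon as they do, so the test waits up to 5 seconds before failing. Because it returns early, it cannot detect a stray extra call that arrives after the count first matched; Mockito.after(ms) waits the full period when that is what you need to assert.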

A bigger refactor (preferred): inject a deterministic executor:

ShuffleManager sm = createShuffleManager(conf, new DirectExecutor());

where DirectExecutor is a java.util.concurrent.Executor whose execute runs synchronously on the caller thread. Now there is no race, and the original verify(..., times(1)) is correct.
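A direct executor is a few lines of JDK code. The version below is one possible implementation of the class named above; whether the module already has an equivalent (Guava, for example, offers MoreExecutors.directExecutor()) is something to check before adding a new one.

```java
import java.util.concurrent.Executor;

/** Runs each task synchronously on the calling thread. */
final class DirectExecutor implements Executor {
  @Override
  public void execute(Runnable command) {
    command.run(); // no queue, no second thread, so nothing to race against
  }
}
```

With this executor injected, every side effect of the call under test has happened by the time the call returns, so an exact-count verify is safe. The trade-off is that the test no longer exercises the real threading, which is why the production executor still needs its own coverage elsewhere.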

The reviewer rule: prefer the deterministic executor refactor over Mockito.timeout. The timeout-based fix masks future races; the deterministic fix eliminates them.

Step 4 — Confirm the fix

Run the loop again:

for i in $(seq 1 200); do
  mvn -pl tez-runtime-library test \
    -Dtest=TestShuffleManager#testReadErrorReportDebounce -q 2>&1 | tail -3
done | grep -c "FAILED"

200 runs, zero failures, is the bar. Don't ship a flake fix you have not stress-tested.

When a flake is a real bug

Sometimes a test flakes because the production code has a race. If the "obvious" flake fix is to insert a sleep or relax an assertion, stop and ask: could a production caller exercise the same race?

Example: VertexImpl.handle returning before all event-emission side effects complete. The flaky test can be made to pass by adding dispatcher.await(), but a production caller doing the same sequence sees a partially-applied state. That is a Stage 4 bug, not a Stage 9 bug.

The decision rule:

  • The test races against an internal event queue → flake fix.
  • The test races against a public contract method → file a real bug.

Pitfalls

  • Don't @Ignore a flake to "fix" CI. The next contributor will silently remove the @Ignore and re-introduce the flake. File a real ticket with a written analysis even if you don't fix it.
  • Don't bump the @Test(timeout) without reasoning. A 30s timeout is evidence the test does real work; a 30000s timeout is evidence the test is broken.
  • Don't replace assertEquals with assertTrue(... contains ...) to silence a flake. That weakens the assertion permanently and hides the underlying race.
  • Don't refactor a test class wholesale in a flake patch. Fix the one test. If the class needs a wholesale refactor, file a separate JIRA.
  • Don't use Thread.yield() to fix a race. It is not a guarantee; it is a hint. Always use a real synchronisation primitive (CountDownLatch, dispatcher.await(), Future.get()).
  • Don't catch InterruptedException and ignore it. The Tez convention is Thread.currentThread().interrupt(); throw new ... so the interrupt status propagates.
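The interrupt-handling convention in the last bullet looks like this in practice. The sketch is illustrative: InterruptAware and awaitOrFail are hypothetical names, and wrapping in IOException is an assumption, since the exception type to rethrow depends on the method's contract.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing "restore the flag, then rethrow". */
class InterruptAware {
  static void awaitOrFail(CountDownLatch latch, long timeoutMs) throws IOException {
    try {
      if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
        throw new IOException("timed out after " + timeoutMs + " ms");
      }
    } catch (InterruptedException e) {
      // Restore the flag so callers further up can still observe the interrupt.
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting", e);
    }
  }
}
```

Catching InterruptedException clears the thread's interrupt status, so without the interrupt() call the shutdown request would be lost for everything above this frame.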

Exit criteria — when you're ready for the next stage

Move to Stage 10 when:

  • You have de-flaked at least three tests with confirmed 200-run stability.
  • You have caught at least one real production race that was masquerading as a flake.
  • You can name the four flake patterns by heart (sleep races, undrained dispatcher, port collisions, tight timeouts).
  • A reviewer has accepted your deterministic-executor refactor as the preferred pattern over Mockito.timeout.

Stage 10 turns the focus to performance regressions.

Stage 10 — Performance Regressions

What this stage teaches

Stage 10 is where you stop fixing bugs and start measuring. You learn:

  • The Tez perf-regression workflow: identify symptom, git bisect to the culprit commit, profile under load, attribute the cost, ship a fix with before/after numbers.
  • Microbenchmarking with tez-examples/OrderedWordCount as the canonical small DAG. When that is too coarse, JMH at the call-site level.
  • Profilers: async-profiler for CPU/lock contention, JFR for allocation/GC pressure. When to use which.
  • The two perf hotspots most often blamed first: AsyncDispatcher queue contention and IFile record encoding.
  • How to file a perf-regression JIRA that committers take seriously: numbers, methodology, reproducibility, and a fix bounded in scope.

Patches are 30–300 lines, always with benchmark evidence in the JIRA. A performance patch without numbers is a no-op.

JIRA filter to find candidates

project = TEZ
  AND resolution = Unresolved
  AND (text ~ "performance regression" OR text ~ "slow"
       OR text ~ "contention" OR text ~ "allocation"
       OR labels = "performance")
ORDER BY priority DESC, updated DESC

A second source is the dev@ archive — search for "slowdown" or "regression" in the last six months. Operators often report perf issues without filing a JIRA. The first contribution is filing the JIRA with a repro.

The Tez perf-regression workflow

1. Reproduce the regression with a number

Never start a perf investigation with a vibe. Get a number:

cd ~/tez-src
mvn -pl tez-examples -am clean install -DskipTests -Phadoop28 -q
# Then run OrderedWordCount end-to-end on MiniTezCluster
mvn -pl tez-tests test -Dtest=TestExternalTezServices#testOrderedWordCount -q

For a more isolated benchmark, write a JMH micro:

find ~/tez-src -name "pom.xml" -exec grep -l jmh {} \;

If JMH is not in the test pom, add it scoped to test only — never to compile.

2. git bisect to the culprit commit

Suppose the regression is "OrderedWordCount on a 10-node MiniTezCluster went from 12s to 19s between 0.10.2 and 0.10.3":

cd ~/tez-src
git bisect start
git bisect bad 0.10.3
git bisect good 0.10.2

# Each step:
mvn clean install -DskipTests -Phadoop28 -q
mvn -pl tez-tests test -Dtest=TestExternalTezServices#testOrderedWordCount -q
# Record the wall time. Then:
git bisect good   # or 'git bisect bad'

Twenty commits between the two releases means log2(20) ≈ 4.3, so at most 5 bisect steps. Bisect to the single commit, then read its diff. Often the commit is innocent and the regression comes from a sibling commit interacting with it; bisect is the start of the investigation, not the end.
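
The step count is just a ceiling of a base-2 logarithm. A small sketch of the arithmetic, assuming a linear history (git's own estimate can differ by a step on merge-heavy ranges); the class name is hypothetical:

```java
class BisectSteps {

  /** Upper bound on bisect steps for the given number of candidate commits: ceil(log2(n)). */
  static int maxSteps(int commits) {
    if (commits < 1) {
      throw new IllegalArgumentException("commits must be >= 1");
    }
    // For n >= 1, the bit length of (n - 1) equals ceil(log2(n)).
    return 32 - Integer.numberOfLeadingZeros(commits - 1);
  }

  public static void main(String[] args) {
    for (int n : new int[] {20, 100, 1000}) {
      System.out.println(n + " commits: at most " + maxSteps(n) + " steps");
    }
  }
}
```

Multiply the step count by the build-plus-benchmark time per step to budget the bisect before you start it.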

3. Profile under load

Once you suspect a region of code, profile:

# async-profiler: CPU samples
$ASYNC_PROFILER/profiler.sh -d 60 -f /tmp/dag.html -e cpu <AM-pid>

# JFR: GC + allocation
jcmd <AM-pid> JFR.start name=tez duration=60s filename=/tmp/dag.jfr

Profile the AM, not the submitting client. The AM is the long-running process where contention manifests.

For a per-task profile:

// In a one-off test only — never in production code
conf.set(TezConfiguration.TEZ_TASK_LAUNCH_CMD_OPTS,
    "-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=/tmp/task-%p.jfr");

4. Attribute the cost

Read the flame graph. A single fat frame above the noise floor is your target. Most Tez regressions land in one of three buckets:

  • Lock contention on AsyncDispatcher.eventQueue or VertexImpl.writeLock.
  • Allocation pressure from IFile.Writer or MergeManager building short-lived buffers in a tight loop.
  • GC overhead from a long-lived collection that grows unbounded (e.g. a HashMap keyed by TaskAttemptId that is never pruned).

5. Ship a fix with numbers

A Stage 10 JIRA description must include:

Methodology:
  - Hardware: 12-core M3 Pro, 36GB RAM.
  - Command: mvn -pl tez-tests test -Dtest=...
  - Runs: 5 cold, 10 warm, report median + p95.
  - Hadoop profile: hadoop28.

Before (TEZ master at <hash>): median 19.0s, p95 22.1s.
After  (this patch on top):    median 12.4s, p95 13.7s.

Profile evidence: flame graph attached. AsyncDispatcher.handle was 38% CPU
before, 4% after.

A reviewer will ask for the profile artifact. Attach it.
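
The "median + p95" line in the methodology needs a stated definition, because percentile conventions differ. The sketch below assumes the nearest-rank definition; the class name and the sample values in main are illustrative only, not measurements.

```java
import java.util.Arrays;

class RunStats {

  /** Median of the samples; averages the two middle values for an even count. */
  static double median(double[] samples) {
    double[] sorted = sortedCopy(samples);
    int n = sorted.length;
    return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
  static double percentile(double[] samples, double p) {
    double[] sorted = sortedCopy(samples);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  private static double[] sortedCopy(double[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("need at least one sample");
    }
    double[] copy = samples.clone();   // leave the caller's array untouched
    Arrays.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    // Illustrative wall times in seconds for ten warm runs (made-up values).
    double[] warm = {12.1, 12.4, 12.3, 13.7, 12.5, 12.2, 12.6, 12.4, 12.3, 12.8};
    System.out.println("median=" + median(warm) + " p95=" + percentile(warm, 95));
  }
}
```

With only ten warm runs the nearest-rank p95 is simply the slowest run, which is worth saying in the JIRA so nobody reads more into it.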

Walked example A — AsyncDispatcher queue contention

Symptom: AM throughput collapses on DAGs with > 10k tasks. Profile shows 38% of CPU in AsyncDispatcher.handle under LinkedBlockingQueue.put.

Step 1 — Diagnose

cd ~/tez-src
grep -n "LinkedBlockingQueue\|eventQueue" \
  tez-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java

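(The class is technically Hadoop's AsyncDispatcher, but Tez subclasses and configures it in tez-common.) The dispatcher is multi-producer, single-consumer: many threads enqueue events while one dispatcher thread drains the queue. That pattern would benefit from a partitioned queue keyed by event type.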

Step 2 — The fix surface

Two acceptable approaches:

  1. Sharded dispatcher: partition events by destination ID so each shard has its own queue. Tez has the building blocks but not the wiring; the patch is the wiring.
  2. Batched event submission: collect events on the producer side and submit in groups, reducing lock acquisitions per task.

Both are large patches. The Stage 10 contribution is one of them, with a clear scope: "sharded dispatcher for vertex events only", not "rewrite AsyncDispatcher".
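
To make approach 1 concrete, here is a minimal sketch of the sharding idea using only the JDK. It is not the Tez or Hadoop AsyncDispatcher API: ShardedDispatcher, its Event type, and the string-based handler are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

class ShardedDispatcher {

  private static final class Event {
    final String destinationId;
    final String payload;

    Event(String destinationId, String payload) {
      this.destinationId = destinationId;
      this.payload = payload;
    }
  }

  // Sentinel compared by identity; tells a shard thread to exit.
  private static final Event POISON = new Event("", "");

  private final List<BlockingQueue<Event>> queues = new ArrayList<>();
  private final List<Thread> workers = new ArrayList<>();

  ShardedDispatcher(int shards, BiConsumer<String, String> handler) {
    if (shards < 1) {
      throw new IllegalArgumentException("shards must be >= 1");
    }
    for (int i = 0; i < shards; i++) {
      BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
      Thread worker = new Thread(() -> drain(queue, handler), "dispatcher-shard-" + i);
      worker.setDaemon(true);
      queues.add(queue);
      workers.add(worker);
      worker.start();
    }
  }

  /** Producers contend only on the queue of the shard that owns destinationId. */
  void dispatch(String destinationId, String payload) {
    int shard = Math.floorMod(destinationId.hashCode(), queues.size());
    queues.get(shard).add(new Event(destinationId, payload));
  }

  private static void drain(BlockingQueue<Event> queue, BiConsumer<String, String> handler) {
    try {
      while (true) {
        Event event = queue.take();
        if (event == POISON) {
          return;
        }
        handler.accept(event.destinationId, event.payload);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Lets every shard finish its queued events, then stops the shard threads. */
  void shutdown() {
    for (BlockingQueue<Event> queue : queues) {
      queue.add(POISON);
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    ShardedDispatcher dispatcher =
        new ShardedDispatcher(4, (dest, payload) -> System.out.println(dest + " <- " + payload));
    dispatcher.dispatch("vertex_1", "V_START");
    dispatcher.dispatch("vertex_2", "V_START");
    dispatcher.dispatch("vertex_1", "V_COMPLETED");
    dispatcher.shutdown();
  }
}
```

The design trade-off is visible in dispatch: events for one destination stay in order because they always land in the same shard, but there is no longer a global order across destinations. Any consumer that relies on a global sequence has to be identified before this kind of change is proposed.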

Step 3 — Numbers

For the sharded-dispatcher patch on a 10k-task OrderedWordCount:

Before: 19.0s median, 22.1s p95.
After:  12.4s median, 13.7s p95.
AsyncDispatcher.handle: 38% → 4% CPU.

These numbers go into the JIRA description, with a flame graph attached.

Step 4 — dev@ design ping

Any Stage 10 patch above ~50 lines deserves a dev@ thread:

Subject: [DISCUSS] TEZ-XXXX — shard AsyncDispatcher by destination type

I have a repro for AM throughput collapse on 10k-task DAGs. Profile attached.
Proposed fix: shard the AsyncDispatcher event queue by destination type
(Vertex / Task / TaskAttempt / Container). Numbers: 19s -> 12s median.

Open questions:
  1. Default shard count: I propose 4 with a configurable override.
  2. Compat: AsyncDispatcher is org.apache.hadoop, so we shim in tez-common.
  3. Tests: TestAsyncDispatcher + the existing scheduler integration tests.

Comments welcome before I post the patch.

If a committer flags an unexpected constraint (e.g. "we cannot shard because ATS event ordering depends on global sequence"), redesign before coding.

Walked example B — IFile record encoding hot path

Symptom: profile shows 22% CPU in IFile.Writer.append under WritableUtils.writeVInt. Allocation profile shows two byte[] per record.

Diagnose:

cd ~/tez-src
grep -n "writeVInt\|writeVLong\|new byte\[" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java

The hot path allocates a fresh byte[] per record for VInt encoding. The fix is a reusable scratch buffer per Writer instance:

+  private final byte[] vIntBuf = new byte[9];
+
   public void append(DataInputBuffer key, DataInputBuffer value) throws IOException {
-    byte[] scratch = new byte[9];
-    int n = encodeVInt(key.getLength(), scratch);
-    out.write(scratch, 0, n);
+    int n = encodeVInt(key.getLength(), vIntBuf);
+    out.write(vIntBuf, 0, n);
     ...
   }

The patch is six lines. The justification is the JMH micro:

JMH benchmark: IFileWriter.append for 1M small records.
Before: 14.2 us/op, 32B/op allocation.
After:   8.7 us/op,  0B/op allocation.

This is a textbook Stage 10 patch: small, measurable, attributable.
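
The scratch-buffer idea is easy to try outside Tez. The sketch below is a standalone illustration, not the IFile.Writer code: ScratchVInt and encodeVLong are hypothetical names, and the byte layout follows the Hadoop-style variable-length encoding as I understand it (one length byte plus up to eight payload bytes), which you should treat as an assumption and check against WritableUtils before relying on it.

```java
import java.io.ByteArrayOutputStream;

class ScratchVInt {

  // Allocated once per writer and reused for every record: 1 length byte + up to 8 payload bytes.
  private final byte[] scratch = new byte[9];
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  /** Encodes v into buf starting at index 0 and returns the number of bytes written. */
  static int encodeVLong(long v, byte[] buf) {
    if (v >= -112 && v <= 127) {
      buf[0] = (byte) v;   // small values fit in a single byte
      return 1;
    }
    int len = -112;
    if (v < 0) {
      v ^= -1L;            // one's complement, so the loop below sees a non-negative value
      len = -120;
    }
    for (long tmp = v; tmp != 0; tmp >>= 8) {
      len--;               // one step per payload byte needed
    }
    buf[0] = (byte) len;   // first byte carries the sign and the payload length
    int payload = (len < -120) ? -(len + 120) : -(len + 112);
    for (int idx = payload; idx != 0; idx--) {
      int shift = (idx - 1) * 8;
      buf[1 + payload - idx] = (byte) (v >> shift);   // big-endian payload
    }
    return 1 + payload;
  }

  /** Appends a record length without allocating a new array. */
  void appendLength(int length) {
    int n = encodeVLong(length, scratch);
    out.write(scratch, 0, n);
  }

  byte[] bytes() {
    return out.toByteArray();
  }

  public static void main(String[] args) {
    ScratchVInt writer = new ScratchVInt();
    writer.appendLength(5);
    writer.appendLength(300);
    System.out.println(writer.bytes().length + " bytes written");
  }
}
```

The scratch array is an instance field, so a writer that uses it is not safe to share between threads without external synchronisation. State that in the patch description if the class you are changing has more than one caller.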

Pitfalls

  • Don't ship a perf patch without numbers. Reviewers will reject it. "Looks faster" is not evidence.
  • Don't benchmark on the same machine you developed on without warm-up. Always run cold + warm passes; report median + p95.
  • Don't compare across different Hadoop profiles. Pick one profile and hold it constant.
  • Don't widen the scope of a perf patch mid-review. "I found another hotspot while I was here" → new JIRA.
  • Don't use micro-benchmark numbers in isolation. Always show the end-to-end impact too. A 2x improvement in IFile.Writer.append that yields 0.1% end-to-end improvement may not be worth merging.
  • Don't git bisect against a tree with unrelated WIP. git bisect is deterministic only against a clean tree.
  • Don't profile in production without the operator's consent. Even async-profiler has overhead; the operator should know.

Exit criteria — when you're ready for the next stage

Move to Stage 11 when:

  • You have shipped one perf patch with documented before/after numbers and an attached profile.
  • You can git bisect 20 commits without referring to documentation.
  • You have read at least one async-profiler flame graph for Tez and identified the hotspot without help.
  • A committer has accepted your patch's methodology section as sufficient evidence.

Stage 11 takes you into the compatibility contract.

Stage 11 — Backward Compatibility

What this stage teaches

Stage 11 is where every change you make is constrained by what was there before. You learn:

  • The Apache @InterfaceAudience and @InterfaceStability annotations and what they obligate you to preserve.
  • The Tez API surface: which packages are Public, which are LimitedPrivate("Hive,Pig"), and which are Private. The audience determines the cost of breaking a contract.
  • How to evolve a protobuf message without breaking older clients (optional fields, never reuse field numbers, never change a field type).
  • The deprecation cycle: how long a deprecated symbol must remain before removal, and what evidence is required to declare it ready for removal.
  • How to negotiate the dev@ conversation when a change is technically compatible but operationally disruptive.

The patches in this stage are often small. The thread is long. A compatibility change without a dev@ design thread is a Stage 11 patch that will be reverted.

The annotation taxonomy

Three audience levels:

| Annotation | Meaning | Examples |
| --- | --- | --- |
| @InterfaceAudience.Public | Any external consumer may call this. Removal is a major-version break. | TezClient, DAG, Vertex, Edge, Processor, most of tez-api. |
| @InterfaceAudience.LimitedPrivate({"Hive","Pig"}) | Only the named projects may call this. Coordinate with them before changing. | Some internal-ish tez-api helpers used by Hive's DagUtils. |
| @InterfaceAudience.Private | Internal to Tez. Free to change. | Everything in tez-dag/src/main/java/org/apache/tez/dag/app/.... |

Three stability levels:

| Annotation | Meaning |
| --- | --- |
| @InterfaceStability.Stable | Compatible across minor versions. Removal requires a major bump. |
| @InterfaceStability.Evolving | May change between minor versions, but deprecation cycle expected. |
| @InterfaceStability.Unstable | Free to break at any time. |

The combined matrix gives nine cells. Most public Tez API is Public + Stable: the most expensive to change. Most internal Tez API is Private + Unstable: free to change.

Find the annotations:

cd ~/tez-src
grep -rn "@InterfaceAudience\|@InterfaceStability" tez-api/src/main/java | head -20

JIRA filter to find candidates

project = TEZ AND resolution = Unresolved
  AND (text ~ "deprecate" OR text ~ "compatibility"
       OR text ~ "InterfaceAudience" OR text ~ "protobuf"
       OR labels = "incompatible")
ORDER BY priority DESC, updated DESC

Walked example A — adding an optional protobuf field

Symptom: Tez wants to add a per-vertex "originating-user-class" string to the DAGPlan so the AM can attribute resource usage. The DAGPlan is wire-serialised to YARN's RM cache, so older AMs must continue to deserialise plans without the new field.

Step 1 — Locate the proto

cd ~/tez-src
find . -name "*.proto" | head
grep -n "message VertexPlan" $(find . -name "*.proto") | head

Read the existing VertexPlan message. Note the highest field number in use (say, 12). The new field must use a new number, not a recycled one.

Step 2 — The diff

--- a/tez-api/src/main/proto/DAGProtos.proto
+++ b/tez-api/src/main/proto/DAGProtos.proto
@@
 message VertexPlan {
   optional string name = 1;
   ...
   optional int32 task_resource_memory_mb = 12;
+  // @since 0.10.4 — optional; old AMs ignore unknown fields.
+  optional string originating_user_class = 13;
 }

Three rules:

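  1. The field is optional. Never required: a required field breaks every old reader and writer that does not know about it. Tez uses proto2, where each field carries an explicit label, and optional is the only safe label for a field added after the first release.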
  2. The field number 13 has never been used before. Search the entire git history:
    git log -p -S "= 13" -- tez-api/src/main/proto/DAGProtos.proto
    
    to confirm.
  3. The comment names the introduction release. Future contributors will use it to decide whether the field is safe to assume in their code path.
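
A quick mechanical check for rule 2 can complement the git history search. The sketch below scans the text of one message body for field numbers that are in use or reserved and proposes the next free one. It is a rough helper under stated assumptions: the class name is hypothetical, it only sees the current file (not history), it expects a single message body with no nested enums, and it does not handle reserved ranges such as "9 to 11".

```java
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProtoFieldNumbers {

  // Matches the "name = 13;" or "name = 13 [" part of a field declaration.
  private static final Pattern FIELD = Pattern.compile("\\b\\w+\\s*=\\s*(\\d+)\\s*[;\\[]");
  // Matches "reserved 7;" or "reserved 7, 9;" (plain numbers only).
  private static final Pattern RESERVED = Pattern.compile("\\breserved\\s+([\\d,\\s]+);");

  /** Returns every field number that is declared or reserved in the message body. */
  static TreeSet<Integer> takenNumbers(String messageBody) {
    TreeSet<Integer> taken = new TreeSet<>();
    Matcher field = FIELD.matcher(messageBody);
    while (field.find()) {
      taken.add(Integer.parseInt(field.group(1)));
    }
    Matcher reserved = RESERVED.matcher(messageBody);
    while (reserved.find()) {
      for (String token : reserved.group(1).split(",")) {
        String trimmed = token.trim();
        if (!trimmed.isEmpty()) {
          taken.add(Integer.parseInt(trimmed));
        }
      }
    }
    return taken;
  }

  /** Proposes one past the highest number seen, so no earlier number is ever recycled. */
  static int nextFreeNumber(String messageBody) {
    TreeSet<Integer> taken = takenNumbers(messageBody);
    return taken.isEmpty() ? 1 : taken.last() + 1;
  }

  public static void main(String[] args) {
    String body = "optional string name = 1;\n"
        + "optional int32 task_resource_memory_mb = 12;\n";
    System.out.println("next free field number: " + nextFreeNumber(body));
  }
}
```

Choosing "highest plus one" rather than the lowest gap is deliberate: a gap in the numbering may be a field that was removed in an earlier release, and only the history search can tell you that.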

Step 3 — Producer and consumer sides

The producer in tez-api/src/main/java/org/apache/tez/dag/api/DAG.java sets the field when known and leaves it unset when not. The consumer in tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java must tolerate the unset case:

+    if (vertexPlan.hasOriginatingUserClass()) {
+      this.originatingUserClass = vertexPlan.getOriginatingUserClass();
+    } else {
+      this.originatingUserClass = null;
+    }

The reviewer will reject any consumer that calls getOriginatingUserClass() without first calling hasOriginatingUserClass(). Proto2 optional fields return a default ("" for strings) when unset, which is not the same as "absent".

Step 4 — Test the back-compat

The test is a serialisation round-trip with an older binary deserialiser:

@Test
public void testOldAMCanDeserialiseNewPlan() throws Exception {
  VertexPlan newPlan = VertexPlan.newBuilder()
      .setName("v1")
      .setOriginatingUserClass("com.example.Job")
      .build();
  byte[] wire = newPlan.toByteArray();

  // NOTE: VertexPlan here is the current generated class, which already knows
  // field 13, so this parse only proves the round trip. To emulate an older AM,
  // parse with a descriptor that lacks the field (for example via DynamicMessage).
  VertexPlan parsed = VertexPlan.parseFrom(wire);
  assertEquals("v1", parsed.getName());
  // In a reader that lacks field 13, the value is kept in getUnknownFields()
  // and ignored by the AM's logic. That is the contract.
}

A real test against an older Tez jar is also valuable; check it in as a resource.

Walked example B — deprecating a public method

Symptom: TezClient.submitDAG(DAG) returns a DAGClient whose getDAGStatus contract is unclear. A new method submitDAGWithStatus(DAG) returns a typed future. The old method should be deprecated.

The diff

--- a/tez-api/src/main/java/org/apache/tez/client/TezClient.java
+++ b/tez-api/src/main/java/org/apache/tez/client/TezClient.java
@@
+  /**
+   * @deprecated as of 0.10.4. Use {@link #submitDAGWithStatus(DAG)} which
+   *     returns a typed future. This method will be removed in 0.11.0.
+   *     See <a href="https://issues.apache.org/jira/browse/TEZ-XXXX">TEZ-XXXX</a>.
+   */
+  @Deprecated
   public DAGClient submitDAG(DAG dag) throws ... { ... }

Rules for deprecation:

  1. The Javadoc names the replacement, the removal version, and the JIRA with the rationale.
  2. The @Deprecated annotation is on the method, not the class.
  3. The implementation is unchanged. Deprecation is a docs-and-annotation change; behaviour stays the same so existing callers continue to work.
  4. Never delete a deprecated method in the same patch. Deprecation and removal are separate releases. The minimum cycle in Tez is one minor release as deprecated, then removal in the next major.

The removal patch goes in only when:

  • The deprecation has been in a released version for at least one minor cycle.
  • Search of downstream code (Hive, Pig, the Tez examples) confirms no remaining callers.
  • A dev@ thread has confirmed removal is acceptable.

Walked example C — changing a LimitedPrivate("Hive") API

Symptom: a LimitedPrivate("Hive") helper in tez-api is mis-named. You want to rename it.

This is not a free change, despite LimitedPrivate. The audience ("Hive") must be coordinated with. The workflow:

  1. File the TEZ ticket with the rename proposal.
  2. Search the Hive source for the existing name; if any caller uses it, write the HIVE-side patch first (deprecation-import shim).
  3. Add the new name in Tez. Keep the old name as a @Deprecated wrapper for one release.
  4. Remove the old name in Tez only after Hive has shipped a release that uses the new name.

The contribution often spans two Tez releases and two Hive releases. That is the cost of LimitedPrivate.
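
Step 3 of the workflow, reduced to its shape. Both method names and the class are hypothetical; the point is that the old name survives as a thin @Deprecated wrapper that delegates to the new one, so behaviour cannot drift between the two.

```java
import java.util.Locale;

class RenamedHelper {

  /** New, correctly spelled name. All logic lives here. */
  public static String normalizeVertexName(String raw) {
    return raw.trim().toLowerCase(Locale.ROOT);
  }

  /**
   * Old, misspelled name.
   *
   * @deprecated use {@link #normalizeVertexName(String)}; kept for one release so
   *     existing callers keep compiling while they migrate.
   */
  @Deprecated
  public static String normaliseVertxName(String raw) {
    return normalizeVertexName(raw);   // delegate, never duplicate
  }

  public static void main(String[] args) {
    System.out.println(normalizeVertexName("  Map_1 "));
  }
}
```

Callers of the old name get a compiler deprecation warning but no behaviour change, which is what lets the downstream project migrate on its own release schedule.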

Pitfalls

  • Don't reuse a protobuf field number after removing a field. Reserve it with reserved 7; in the proto file. Recycling a number breaks cross-version readers in undetectable ways.
  • Don't change the type of a protobuf field. string → bytes looks identical on the wire but is incompatible at parse time. Add a new field with a new number; deprecate the old.
  • Don't widen a Private API to Public without a dev@ thread. Once public, you cannot retract.
  • Don't remove a @Deprecated method in the same release that introduces the deprecation. That defeats the purpose of deprecation.
  • Don't change the default value of a configuration key without a dev@ thread. Default changes are invisible to compile-time checks but catastrophic in production. They are a Stage 12-adjacent change.
  • Don't introduce a new Stable annotation lightly. Once Stable, the method is locked for a major-version cycle.
  • Don't assume Hadoop's compatibility annotations are identical in meaning. They are similar but have project-specific nuance; read the Tez project's BUILDING.txt and the dev@ archive before relying on them.

Exit criteria — when you're ready for the next stage

Move to Stage 12 when:

  • You have shipped one compatibility-sensitive change (a protobuf evolution, a deprecation, or an API rename) with explicit annotations and dev@ sign-off.
  • You can recite the audience × stability matrix and pick the correct cell for an arbitrary tez-api class.
  • You have written a deprecation Javadoc that named the replacement, the removal version, and the JIRA without being prompted.
  • You have read the BUILDING.txt and dev@-archived compatibility guidance for Tez and Hadoop.

Stage 12 is the final stage: release-blocking issues and PMC-level work.

Stage 12 — Release-Blocking Issues

What this stage teaches

Stage 12 is the committer/PMC stage. You learn:

  • The four categories of release blockers: data loss, correctness regressions, AM crash, security CVE.
  • How to triage a candidate blocker during an RC vote: what evidence is required, who must be CC'd, and what the deadline-pressure tradeoffs are.
  • The Apache release process from a committer's seat: building an RC, signing artifacts, calling a [VOTE] thread, the 72-hour rule, and the meaning of +1 binding, -1 binding, +1, and 0 votes.
  • The Tez release notes format and what a release blocker contributes to it.
  • Security CVE handling: the private security@ list, embargoed disclosure, and the path from private patch to public release.

This is the only stage where you may be voting on someone else's work as much as writing your own. The patch surface is identical to earlier stages; the context in which you act is different.

JIRA filter to find candidates

project = TEZ
  AND priority in (Blocker, Critical)
  AND resolution = Unresolved
ORDER BY priority DESC, updated DESC

The set is small at any given time. During an RC vote it grows fast.

A second filter for the RC voting period:

project = TEZ AND priority = Blocker AND created > -7d

The four categories of release blockers

1. Data loss

The strictest category. Any code path where a successfully-acknowledged write can be lost, or a successfully-acknowledged read can return wrong data, is a data-loss blocker. Examples in Tez history:

  • A MergeManager spill that double-counted records and silently dropped one.
  • A Fetcher that ignored a checksum mismatch and returned corrupted bytes to the downstream processor.
  • A DAGRecovery path that reconstructed an incorrect parent vertex state after AM restart.

Triage: the JIRA description must contain a deterministic repro that the release manager can run in under five minutes. Without a repro, the issue is not a blocker — it is a "to be investigated" ticket.

2. Correctness regressions

A query that returned correct results in version N-1 returns wrong results in version N. The bar is lower than data loss (the data is still there; the output is wrong) but the triage is the same. A correctness regression that affects a single Hive query path is a blocker.

3. AM crash

Any reproducible InvalidStateTransitonException in master is a blocker during an RC. Operators expect the AM to survive their workload. An AM crash on a Hive-emitted DAG that worked in the previous release blocks the RC even if the DAG itself is "unusual" — the AM must be defensive against its inputs.

4. Security CVE

A demonstrated CVE in a Tez-owned class is a blocker regardless of whether it has been exploited. The disclosure path is security@tez.apache.org first, then the public JIRA only after the fix is ready.

Triage during an RC vote

The RC vote pattern on dev@:

Subject: [VOTE] Release Apache Tez 0.10.4 (RC1)

Hi,

I've prepared the first release candidate for Tez 0.10.4. The artifacts
are at:
  https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-rc1/

The git tag is:
  https://github.com/apache/tez/releases/tag/release-0.10.4-rc1

The release notes are:
  CHANGES.txt at the top of the tag.

Please verify the signatures, run the smoke tests, and vote:
  [+1] release this RC
  [0]  no opinion
  [-1] do not release (please explain)

The vote is open for 72 hours.

Your job, as a contributor evaluating the RC:

  1. Verify the artifact:
    curl -O https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-rc1/apache-tez-0.10.4-src.tar.gz
    curl -O https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-rc1/apache-tez-0.10.4-src.tar.gz.asc
    gpg --verify apache-tez-0.10.4-src.tar.gz.asc apache-tez-0.10.4-src.tar.gz
    
  2. Build from source:
    tar xf apache-tez-0.10.4-src.tar.gz
    cd apache-tez-0.10.4-src
    mvn clean install -DskipTests -Phadoop28
    
  3. Run a smoke test:
    mvn -pl tez-tests test -Dtest=TestExternalTezServices -Phadoop28
    
  4. Reply on the vote thread with your evidence.
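
Step 1 checks the signature. Release candidates usually also ship a checksum file next to each artifact; the sketch below assumes a .sha512 file exists for the tarball (the vote mail above does not list one, so treat the file name as an assumption). It uses only the JDK, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class Sha512Check {

  /** Returns the SHA-512 digest of data as 128 lowercase hex characters. */
  static String sha512Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-512");
      StringBuilder hex = new StringBuilder(128);
      for (byte b : md.digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-512 is not available in this JRE", e);
    }
  }

  /**
   * Checks the artifact against the contents of a checksum file. Whitespace, letter case,
   * and any file name in the checksum file are ignored, since the layout varies by tool.
   */
  static boolean matches(byte[] artifact, String checksumFileContents) {
    String normalized = checksumFileContents.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    return normalized.contains(sha512Hex(artifact));
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: Sha512Check <artifact> <artifact.sha512>");
      return;
    }
    byte[] artifact = Files.readAllBytes(Paths.get(args[0]));
    String checksum = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
    System.out.println(matches(artifact, checksum) ? "checksum OK" : "checksum MISMATCH");
  }
}
```

A matching checksum only shows the download is intact. It does not replace the gpg check, which is what ties the artifact to the release manager's key.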

Vote semantics

| Vote | Meaning |
| --- | --- |
| +1 binding | PMC member endorses release. At least three are required, and binding +1 votes must outnumber binding -1 votes. |
| +1 | Non-PMC endorses. Counts for momentum, not the binding count. |
| 0 | No opinion. Often used to indicate "I built it, smoke test passed, but I can't speak to my use case." |
| -1 binding | PMC member opposes the release. Release votes are majority votes and cannot be vetoed, but in practice a well-evidenced -1 usually leads the release manager to cancel the RC. |
| -1 | Non-PMC objection. Not binding, but committers will read it. |

A -1 vote must include the reason. "Build failed" is not enough; "build failed because X test fails reproducibly on Hadoop 3.x profile, evidence at URL" is.
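
The binding count can be written down as a small tally. This sketch assumes the ASF majority-approval rule for release votes as the ASF voting policy describes it: at least three binding +1 votes and more binding +1 than binding -1 votes. Non-binding votes are ignored by the rule. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class VoteTally {

  enum Value { PLUS_ONE, ZERO, MINUS_ONE }

  static final class Vote {
    final Value value;
    final boolean binding;   // true when cast by a PMC member

    Vote(Value value, boolean binding) {
      this.value = value;
      this.binding = binding;
    }
  }

  /** Counts votes with the given value and binding status. */
  static long count(List<Vote> votes, Value value, boolean binding) {
    return votes.stream().filter(v -> v.value == value && v.binding == binding).count();
  }

  /** Applies the assumed rule: at least three binding +1 and more binding +1 than binding -1. */
  static boolean passes(List<Vote> votes) {
    long plus = count(votes, Value.PLUS_ONE, true);
    long minus = count(votes, Value.MINUS_ONE, true);
    return plus >= 3 && plus > minus;
  }

  public static void main(String[] args) {
    List<Vote> votes = Arrays.asList(
        new Vote(Value.PLUS_ONE, true),
        new Vote(Value.PLUS_ONE, true),
        new Vote(Value.PLUS_ONE, true),
        new Vote(Value.MINUS_ONE, false),   // non-binding: read, but not counted
        new Vote(Value.ZERO, false));
    System.out.println("passes: " + passes(votes));
  }
}
```

The arithmetic is the easy part. A release manager will normally stop and address a credible -1, binding or not, long before the tally matters.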

Walked example — discovering a blocker during RC vote

Symptom: during the 0.10.4 RC1 vote, you run the smoke test and observe a test failure in TestShuffleManager#testReadErrorReportDebounce that did not happen in 0.10.3.

Step 1 — Reproduce

cd apache-tez-0.10.4-src
for i in 1 2 3; do
  mvn -pl tez-runtime-library test \
    -Dtest=TestShuffleManager#testReadErrorReportDebounce -q 2>&1 | tail -5
done

If the failure is 3/3, it is reproducible. If 1/3, it is a flake (Stage 9 issue, not a blocker).

Step 2 — Identify the cause

# Run this from a git clone; the extracted source tarball has no history.
cd ~/tez-src
git log v0.10.3..release-0.10.4-rc1 -- \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped

You see a commit that changed the debounce window default from 5000ms to 500ms. The test was written against 5000ms; the change silently broke it.

Step 3 — Decide blocker vs not

A failing unit test in an RC is not automatically a blocker. The question is: does the underlying behaviour change affect production?

  • If the default change is intentional and the test should be updated → not a blocker. Fix the test in 0.10.4 hotfix or 0.10.5.
  • If the default change is unintentional or it breaks production users → blocker. RC1 must be cancelled; RC2 reverts the default change.

For this example, suppose the default change was intentional but the release notes don't mention it. The behaviour change is operator-visible (fetch-failure reports now arrive 10x more often, may overwhelm the AM event queue). That makes it a blocker for a different reason than the test failure: an undocumented behaviour change.

Step 4 — Vote and document

Subject: Re: [VOTE] Release Apache Tez 0.10.4 (RC1)

[-1] non-binding

While building the RC and running the smoke tests, I observed:
  TestShuffleManager#testReadErrorReportDebounce fails 3/3 runs.

Root cause: commit <hash> changed the default of
tez.runtime.shuffle.fetch-failure.report.cooldown-ms from 5000 to 500.
This is operator-visible behaviour change not noted in CHANGES.txt.

Recommendation: either revert the default in RC2 with the new default
deferred to 0.11.0, or keep the new default and update CHANGES.txt to
flag the operator impact and update the test.

Filed TEZ-XXXX with the analysis.

The release manager will respond, either by cancelling RC1 and cutting an RC2 that fixes the issue (rebuild, vote again) or by arguing why the change is acceptable.

Release notes

The Tez release notes live in CHANGES.txt at the repo root, organised by release. The format:

Release 0.10.4 - 2026-XX-XX

  NEW FEATURES:
    TEZ-XXXX. Sharded AsyncDispatcher for high-fanout DAGs. (you)

  IMPROVEMENTS:
    TEZ-YYYY. Make DAGPlan size limit configurable. (you)

  BUG FIXES:
    TEZ-ZZZZ. Release held containers on AMRM onError. (you)

  INCOMPATIBLE CHANGES:
    TEZ-AAAA. Default of tez.runtime.shuffle.fetch-failure.report.cooldown-ms
              changed from 5000 to 500. Operators of long-running session AMs
              should evaluate AM event-queue capacity. (you)

Every patch that lands during the release cycle gets a line. The release manager assembles the file from the JIRA "Fix Version" field; contributors make the lines short and accurate.

Security CVE pipeline

The path from "I think I found a CVE" to a public release:

  1. Do not file a public JIRA. Email security@tez.apache.org (the private list, monitored by PMC members).
  2. Wait for acknowledgement (typically within 48 hours).
  3. Work with the security responder on a fix privately, in a private branch.
  4. Once the fix is ready, request a CVE ID via the Apache security team (or MITRE via the responder).
  5. Build a release that includes the fix.
  6. Publish the release; then the CVE is disclosed publicly with a JIRA.

The embargo window is typically 30–90 days. Contributors who report through the private channel and respect the embargo are credited in the advisory.

Pitfalls

  • Don't +1 a release you have not built and smoke-tested. A +1 carries weight; do not give it as a courtesy.
  • Don't -1 without evidence. A well-founded -1 can stall or cancel an RC, so the bar for evidence is high.
  • Don't escalate a Stage 9 flake to a blocker. Reproduce three times before voting.
  • Don't disclose a security vulnerability publicly before the embargo expires. Apache projects take this very seriously; a leak can lose you committer status.
  • Don't file Priority: Blocker casually. Reserve it for the four categories above. JIRA pollution diminishes the signal.
  • Don't merge a "must-have" fix during an active RC vote without cancelling the RC first. Mid-vote merges invalidate the artifact and reset the 72-hour clock.
  • Don't assume the release manager will notice your concern unprompted. Reply on the vote thread, even if only with a 0 and a comment.

Exit criteria — there is no next stage

Stage 12 is the final rung of this roadmap. The exit criterion is that you continue — you are now operating as a committer-track contributor. The next steps are not stages but ongoing practices:

  • Participate in every RC vote with a built artifact and a smoke-test result, even if your vote is only a 0.
  • Watch the security@ and dev@ lists daily.
  • Mentor a new contributor through Stages 1–4 every year.
  • Read every CHANGES.txt diff for every release line you care about.
  • Send a quarterly note to dev@ on which areas of the codebase you are willing to review, so contributors know where to ask.

If you have walked all twelve stages, you are the Apache Tez committer the project needed when you started reading this book.

Deep Dives: Reading Order

This directory contains 21 deep-dive chapters. They are the reference material behind the Level curriculum. Each chapter is self-contained, but most chapters depend on a handful of earlier ones. Read in the order below the first time through; thereafter use the index as a lookup.

The chapters are grouped by subsystem. For each chapter we list:

  • Title — the file.
  • One-line summary — what you should walk away knowing.
  • Consumed by — which Levels/Labs depend on it.

Group 1 — The DAG Model and the Client

These four chapters define "what is a Tez job" before any execution machinery exists.

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 1 | dag-model.md | DAG/Vertex/Edge as immutable plan; DAGPlan protobuf; validation rules | Level 1 (all labs); Level 2 lab 2.1 |
| 2 | logical-physical.md | How the logical DAG becomes a physical execution plan with concrete parallelism | Level 4 lab 4.2; Level 5 lab 5.1 |
| 3 | tez-client.md | Client-side bring-up: session mode, local resources, AM start, submission RPC | Level 3 lab 3.1; Level 7 lab 7.1 |
| 4 | dag-client.md | Status polling, kill, error reporting; RPC vs ATS backends | Level 3 lab 3.1; Level 8 lab 8.1 |

Start here. Without the DAG model in your head, every later chapter feels like trivia.


Group 2 — AM Lifecycle and Dispatch

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 5 | dag-app-master.md | AM as YARN application; dispatchers, heartbeats, recovery | Level 3 lab 3.2; Level 8 lab 8.2 |
| 6 | state-machines.md | Hadoop StateMachineFactory API; dispatcher invariants; tests | Level 4 labs 4.1, 4.3, 4.4 |
| 7 | event-routing.md | The event hierarchy; "events are the only mutation API" rule | Level 4 (all labs) |

These chapters explain how the AM mutates state. They must precede the per-entity lifecycle chapters that follow.


Group 3 — Per-Entity Lifecycle

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 8 | vertex-lifecycle.md | VertexImpl state machine: NEW → SUCCEEDED, plus failure/kill paths | Level 4 lab 4.2 |
| 9 | task-lifecycle.md | TaskImpl state machine; speculation; max-failed-attempts | Level 4 lab 4.3 |
| 10 | task-attempt-lifecycle.md | TaskAttemptImpl state machine; container assignment; termination causes | Level 4 lab 4.4; Level 8 lab 8.2 |

Read 8, 9, 10 in this order. Each refers backward to events from chapter 7 and state-machine primitives from chapter 6.


Group 4 — Input/Processor/Output

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 11 | ipo-abstractions.md | LogicalInput/LogicalOutput/Processor; lifecycle methods; merged inputs | Level 5 lab 5.1; Level 7 lab 7.1 |
| 12 | tez-runtime.md | TezTaskRunner2, LogicalIOProcessorRuntimeTask, the umbilical | Level 5 lab 5.1 |

These chapters live inside tez-runtime-internals and tez-runtime-library — the JVM the task actually runs in.


Group 5 — Shuffle, Sort, and Counters

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 13 | shuffle-sort.md | Sorter implementations, IFile, ShuffleManager, Fetcher, MergeManager | Level 5 labs 5.2, 5.3 |
| 14 | counters-diagnostics.md | TezCounters, framework counters, custom counters, ATS publication | Level 8 lab 8.1 |

If you skip 13, do not attempt to debug shuffle issues in production. Always read it cold before opening a fetcher-related JIRA.


Group 6 — Scheduling and Resources

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 15 | scheduler.md | TaskSchedulerManager, YarnTaskSchedulerService, AMRM heartbeats | Level 6 lab 6.2 |
| 16 | container-reuse.md | AMContainerImpl lifecycle; reuse policy; idle timeouts | Level 6 labs 6.1, 6.2 |
| 17 | yarn-integration.md | YARN tokens, AMRM client, app master failover, log aggregation | Level 6 lab 6.2 |

Group 7 — Modes and Integrations

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 18 | local-mode.md | LocalContainerLauncher, debugging without YARN | Level 2 labs |
| 19 | hive-integration.md | Hive TezTask, edge usage, DynamicPartitionPruning, ATS spans | Level 7 (Hive labs h1–h6) |

Group 8 — Failure, Recovery, and Testing

| # | File | Summary | Consumed by |
| --- | --- | --- | --- |
| 20 | failure-handling.md | Task retry, vertex rerun, AM restart, recovery records | Level 8 lab 8.2 |
| 21 | testing-framework.md | MiniTezCluster, MockContainerLauncher, DrainDispatcher, fault injection | Level 2 labs; Level 4 labs |

A note on order vs index

The deep-dives are an index — they exist to be looked up later. The first read should follow the table above. But when you return to fix a bug, jump directly to the chapter most relevant and use the cross-references inside it.

Every chapter ends with a Validation: prove you understand this section. Treat that as the gate before declaring the chapter "read."

DAG Model

A Tez DAG is an immutable plan for a distributed computation. This chapter describes the model classes (DAG, Vertex, Edge, EdgeProperty, DataSourceDescriptor, DataSinkDescriptor, *Descriptor), the protobuf representation that crosses the wire, and the validation rules that turn a "DAG you wrote" into a "DAG the AM will accept."

After this chapter you should be able to write a small DAG by hand, predict which EdgeManager implementation will be picked for each edge, and find any classification rule in the source.


The classes you actually call from a client

All of these live in tez-api:

tez-api/src/main/java/org/apache/tez/dag/api/
  DAG.java
  Vertex.java
  Edge.java
  EdgeProperty.java
  InputDescriptor.java
  OutputDescriptor.java
  ProcessorDescriptor.java
  VertexManagerPluginDescriptor.java
  DataSourceDescriptor.java
  DataSinkDescriptor.java
  EntityDescriptor.java          (base class for all *Descriptors)
  GroupInputEdge.java            (multi-source unioning edge)
  VertexGroup.java               (group of vertices for grouped commits)

Use this command to inspect the API surface:

grep -n "^public " tez-api/src/main/java/org/apache/tez/dag/api/DAG.java | head -40

Every class above is immutable by convention once handed to TezClient. You may mutate via the builder methods (addVertex, addEdge, addDataSource) before submission. After submission the only way to change the plan is via VertexManagerPlugin callbacks (see vertex-lifecycle.md and the Level 4 lab on VertexManager).


EdgeProperty — three orthogonal axes

EdgeProperty.create(DataMovementType, DataSourceType, SchedulingType, OutputDescriptor, InputDescriptor) is the single most important factory method in the API. Each of its three enum arguments controls an independent aspect of the edge.

grep -n "enum " tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java

The three enums:

| Enum | Values | What it controls |
|------|--------|------------------|
| DataMovementType | ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM | How outputs are routed from src to dst tasks |
| DataSourceType | PERSISTED, PERSISTED_RELIABLE, EPHEMERAL | Durability of intermediate data |
| SchedulingType | SEQUENTIAL, CONCURRENT | Whether dst tasks must wait for src to finish |

Edge type matrix (movement × scheduling)

| Movement | Scheduling | Typical use | EdgeManager impl |
|----------|------------|-------------|------------------|
| SCATTER_GATHER | SEQUENTIAL | Map → Reduce shuffle | ScatterGatherEdgeManager (the AM-internal default) |
| ONE_TO_ONE | SEQUENTIAL | Sorted reducer → re-sorter (rare) | OneToOneEdgeManager |
| BROADCAST | SEQUENTIAL | Small-side join broadcast | BroadcastEdgeManager |
| CUSTOM | SEQUENTIAL | Hive cartesian product, custom partitioner | User-supplied EdgeManagerPlugin |
| BROADCAST | CONCURRENT | Streaming push between long-running tasks | BroadcastEdgeManager |
| SCATTER_GATHER | CONCURRENT | (Unusual; generally invalid for shuffles) | n/a |

Locate the actual EdgeManager implementations:

find tez-dag/src/main/java -name "*EdgeManager*"

Key files (exact names vary slightly by branch):

tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/
  OneToOneEdgeManagerOnDemand.java
  ScatterGatherEdgeManager.java
  BroadcastEdgeManager.java

Read the AM-side Edge.java (in tez-dag, not the tez-api class of the same name) to see how it wires the right manager based on EdgeProperty:

grep -n "EdgeManager\|edgeManager\|createEdgeManager" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java
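
The selection described above can be pictured as a switch on the movement type. The following is a minimal, self-contained model, not Tez code: the class EdgeManagerSelectionSketch and its method are hypothetical, the returned strings are the class names from the file listing above, and the real logic on your branch may differ in detail.

```java
import java.util.Optional;

final class EdgeManagerSelectionSketch {
    // Mirrors the DataMovementType values listed in the enum table.
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM }

    /**
     * Returns the simple name of the built-in manager for a movement type,
     * or the user-supplied plugin class name for CUSTOM (hypothetical helper).
     */
    static String selectEdgeManager(DataMovementType type, Optional<String> customPluginClass) {
        switch (type) {
            case ONE_TO_ONE:
                return "OneToOneEdgeManagerOnDemand";
            case BROADCAST:
                return "BroadcastEdgeManager";
            case SCATTER_GATHER:
                return "ScatterGatherEdgeManager";
            case CUSTOM:
                // A CUSTOM edge is only meaningful when the user names a plugin.
                return customPluginClass.orElseThrow(() ->
                    new IllegalArgumentException("CUSTOM edge requires an EdgeManagerPlugin class"));
            default:
                throw new IllegalStateException("Unknown movement type: " + type);
        }
    }

    public static void main(String[] args) {
        for (DataMovementType t : DataMovementType.values()) {
            System.out.println(t + " -> " + selectEdgeManager(t, Optional.of("com.example.MyEdgeManager")));
        }
    }
}
```

Note that SchedulingType and DataSourceType do not appear in the switch: in this model, only the movement axis decides the manager.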

DataSourceDescriptor vs Input

Beginners frequently confuse these two:

| Concept | Class | Defined in | Lives during |
|---------|-------|------------|--------------|
| Plan-time root-input definition | DataSourceDescriptor | tez-api | Client + AM (planning) |
| Runtime input attached to a task | Input (interface) | tez-api | Task JVM (execution) |

A DataSourceDescriptor describes "how to materialize splits for this vertex": an InputDescriptor plus an optional InputInitializerDescriptor and optional credentials. The AM may run an InputInitializer (e.g., MRInputAMSplitGenerator) to enumerate splits before the vertex starts. The result of that initialization becomes InputDataInformationEvents pushed to tasks (see ipo-abstractions.md and event-routing.md).

At task time the input class is instantiated from the InputDescriptor and called with initialize() → start() → getReader() → close(). The task never sees the DataSourceDescriptor.


The DAGPlan protobuf — the wire format

tez-api/src/main/proto/DAGApiRecords.proto

Inspect:

grep -n "^message " tez-api/src/main/proto/DAGApiRecords.proto

Key messages:

  • DAGPlan — root: name, vertices, edges, plan-level configs, credentials, ACLs.
  • VertexPlan — name, processor descriptor, parallelism, location hints, associated edges, root inputs.
  • EdgePlan — source/dest vertex names, edge properties, edge manager descriptor.
  • TezEntityDescriptorProto{class_name, user_payload, history_text} — the serialized form of any *Descriptor.
  • RootInputLeafOutputProto — the protobuf encoding of DataSourceDescriptor and DataSinkDescriptor.

The conversion from API classes to protobuf happens in:

grep -rn "createDAGPlan\|toProtoFormat" tez-api/src/main/java/org/apache/tez/dag/api/ | head

Specifically DAG.createDag(...) and DagTypeConverters (a kitchen-sink class of to/from helpers).


Validation — what DAG.verify() checks

grep -n "private void.*verify\|public void verify" \
  tez-api/src/main/java/org/apache/tez/dag/api/DAG.java

DAG.verify(restricted=true) enforces, at minimum:

  1. Name uniqueness — vertex names and DAG name are unique.
  2. No cycles — DFS over the edge graph; throws IllegalStateException ("DAG contains a cycle") if any back-edge is found.
  3. Parallelism rules:
    • ONE_TO_ONE edges require source.parallelism == dest.parallelism if both are statically set.
    • Vertices with BROADCAST outputs must have a finite parallelism (since each downstream task receives every output).
  4. Descriptor non-null for required slots (Processor, Output for vertices that produce, Input for vertices that consume).
  5. No "dangling" data sources — every root input is on a real vertex.
  6. VertexManagerPlugin specified explicitly for vertices that need dynamic reconfig (else a default is chosen — see vertex-lifecycle.md for the default rules).

Read the body of verify(...) line-by-line; the comments cite the JIRA that added each check.
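
To make checks 1 and 2 concrete, here is a small self-contained model of name uniqueness and DFS cycle detection. It is a sketch under stated assumptions: DagCycleCheckSketch and its methods are hypothetical, vertices are plain strings, and the exception messages only imitate the ones quoted in this chapter. The real DAG.verify() does considerably more.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DagCycleCheckSketch {
    // Adjacency list: vertex name -> names of downstream vertices.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addVertex(String name) {
        // Name uniqueness: reject a second vertex with the same name.
        if (edges.putIfAbsent(name, new ArrayList<>()) != null) {
            throw new IllegalStateException("Vertex " + name + " already defined");
        }
    }

    void addEdge(String src, String dst) {
        if (!edges.containsKey(src) || !edges.containsKey(dst)) {
            throw new IllegalArgumentException("Both vertices must be added before the edge");
        }
        edges.get(src).add(dst);
    }

    /** Depth-first search; a vertex seen again while still on the current path is a back-edge. */
    void verify() {
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String v : edges.keySet()) {
            visit(v, finished, onPath);
        }
    }

    private void visit(String v, Set<String> finished, Set<String> onPath) {
        if (finished.contains(v)) {
            return; // already fully explored, cannot be part of a new cycle
        }
        if (!onPath.add(v)) {
            throw new IllegalStateException("DAG contains a cycle through vertex " + v);
        }
        for (String next : edges.get(v)) {
            visit(next, finished, onPath);
        }
        onPath.remove(v);
        finished.add(v);
    }

    public static void main(String[] args) {
        DagCycleCheckSketch dag = new DagCycleCheckSketch();
        dag.addVertex("map");
        dag.addVertex("reduce");
        dag.addEdge("map", "reduce");
        dag.verify();
        System.out.println("acyclic plan accepted");
    }
}
```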


How a DAG becomes a plan, end-to-end

flowchart LR
    A[User code: new DAG] --> B[addVertex/addEdge/addDataSource]
    B --> C[TezClient.submitDAG]
    C --> D[DAG.verify]
    D -->|ok| E[DAG.createDag -> DAGPlan proto]
    E --> F[RPC DAGClientAMProtocol.submitDAG]
    F --> G[DAGAppMaster: DAGImpl init]
    G --> H[VertexImpl per VertexPlan]
    H --> I[Edge per EdgePlan; EdgeManager selected]

Each arrow has a citation:

  • verify: DAG.verify(...).
  • createDag: DAG.createDag(BinaryConfig, Credentials, Map<String,LocalResource>, JobTokenSecretManager, boolean tezLrsAsArchive).
  • AM-side: DAGImpl.init() and VertexImpl.constructInputDescriptors(), Edge.<init> (in tez-dag, not the tez-api Edge).

Reading exercise

# Top-level surface
sed -n '1,80p' tez-api/src/main/java/org/apache/tez/dag/api/DAG.java
sed -n '1,80p' tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
sed -n '1,80p' tez-api/src/main/java/org/apache/tez/dag/api/Edge.java

# All the places where DAGPlan is constructed
grep -rn "DAGPlan.newBuilder" tez-api/src/main/java | head

# Cycle detection
grep -n "cycle\|cycleFound\|visit" \
  tez-api/src/main/java/org/apache/tez/dag/api/DAG.java

Answer:

  1. What exception class does DAG.verify() throw on a cycle, and what does its message contain that helps a user diagnose the offending vertex?
  2. Which method on Vertex is used to attach a DataSourceDescriptor? Which to attach a DataSinkDescriptor?
  3. What is the role of DagTypeConverters and why is it preferred over each class owning its own toProto/fromProto methods?
  4. When you call Edge.create(srcV, dstV, EdgeProperty.create(...)), where is the resulting Edge registered? On the source vertex? Destination? The DAG itself?
  5. Suppose you call dag.addVertex(v) twice with the same v instance. What happens, and where in DAG.java is the protection?
  6. What is the difference between DataSourceType.PERSISTED and DataSourceType.PERSISTED_RELIABLE? Find the consumer (search tez-dag for uses of DataSourceType).

Common bugs and symptoms

| Symptom | Root cause | Where to look |
|---------|------------|---------------|
| IllegalStateException: DAG contains a cycle at submission | Accidentally added a back-edge | DAG.verify |
| Vertex starts with parallelism -1 and never runs | setParallelism(-1) and no VertexManagerPlugin to reconfigure | VertexImpl.initialize; check for "parallelism not set" |
| Job hangs with all vertices in INITED | A DataSourceDescriptor has an initializer that never emits events | Search AM log for InputInitializerEvent; cross-reference initializer impl |
| ClassNotFoundException at task start for your Processor | The class is in client classpath but not uploaded as a local resource | TezClient.addAppMasterLocalFiles not called; see tez-client.md |
| EdgeManager mismatch between sides; task hangs reading | Custom EdgeManagerPlugin returns inconsistent partition counts | Always run TestEdgeManagerSelf on your plugin |
| DAGPlan proto exceeds 64 MB | Encoding huge userPayload directly into the plan | Use a side file via LocalResource; payload is byte[] not free-form storage |

Validation: prove you understand this

  1. Write, on a whiteboard, a 4-vertex DAG with two SCATTER_GATHER edges and one BROADCAST edge. Annotate each edge with its three EdgeProperty enums. Justify each choice.
  2. Given an edge with (SCATTER_GATHER, PERSISTED, SEQUENTIAL), name the EdgeManager class that will be selected at runtime and the source file where the selection logic lives.
  3. From memory, list the five required arguments to EdgeProperty.create(...).
  4. Open DAG.verify() and identify the first five checks. For each, propose a one-line DAG that would fail it.
  5. In a new method getAllRootInputs(DAG), walk the DAG and return all DataSourceDescriptor objects across all vertices. Compile it; check against DAG.java's own helpers.

TezClient

TezClient is the client-side API: the class your driver code instantiates to start an AM, submit DAGs, and (optionally) keep the AM alive across DAGs. This chapter walks bring-up, the session vs non-session distinction, local resource staging, RPC submission, and ATS hookup.

After this chapter you should be able to point at every line of code that runs between TezClient.create(...) and the moment a DAG appears inside the AM ready to be start()ed.


Files to open

tez-api/src/main/java/org/apache/tez/client/
  TezClient.java
  TezClientUtils.java
  TezSessionImpl.java
  FrameworkClient.java
  TezYarnClient.java            (YARN-backed FrameworkClient)
  LocalClient.java              (in-process FrameworkClient for local mode)

Plus the YARN-AM protocol definition:

tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClient.java
tez-api/src/main/proto/DAGClientAMProtocol.proto

Two modes: session and non-session

The mode is chosen at TezClient.create(...):

TezClient client = TezClient.create(
    "MyApp",
    tezConf,
    isSession  /* true = session mode */);

| Property | Non-session | Session |
|----------|-------------|---------|
| AM lifetime | Per DAG | Across many DAGs |
| start() semantics | No-op (AM launched at submitDAG) | Launches AM and waits for it to register |
| Allowed DAGs in flight | 1 | 1 (sequential within a session by default) |
| Keep-alive | n/a | tez.session.am.dag.submit.timeout.secs |
| Use case | One-shot jobs (CLI tools, scheduled batch) | Latency-sensitive (Hive, Pig, interactive) |

The AM keep-alive timer is critical. In session mode, after a DAG completes the AM waits for the configured timeout for a new DAG. If none arrives, it shuts down to free YARN resources. Find the timer:

grep -n "AMSessionDAGSubmitTimeout\|dag.submit.timeout" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
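
The keep-alive rule is simple enough to model in a few lines. The sketch below is an illustration only: SessionIdleTimerSketch and its methods are hypothetical, time is passed in explicitly so the behavior is deterministic, and the real AM uses its own timer machinery.

```java
final class SessionIdleTimerSketch {
    private final long timeoutMillis;
    private long lastActivityMillis;
    private boolean dagRunning;

    /** timeoutSecs plays the role of tez.session.am.dag.submit.timeout.secs. */
    SessionIdleTimerSketch(long timeoutSecs, long nowMillis) {
        this.timeoutMillis = timeoutSecs * 1000L;
        this.lastActivityMillis = nowMillis;
    }

    void onDagSubmitted(long nowMillis) {
        dagRunning = true;
        lastActivityMillis = nowMillis;
    }

    void onDagFinished(long nowMillis) {
        dagRunning = false;
        lastActivityMillis = nowMillis; // the idle clock restarts when a DAG completes
    }

    /** True once the session has been idle for the full timeout with no DAG in flight. */
    boolean shouldShutDown(long nowMillis) {
        return !dagRunning && nowMillis - lastActivityMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        SessionIdleTimerSketch timer = new SessionIdleTimerSketch(300, 0L);
        timer.onDagSubmitted(10_000L);
        timer.onDagFinished(60_000L);
        System.out.println("shut down at t=100s? " + timer.shouldShutDown(100_000L));
        System.out.println("shut down at t=360s? " + timer.shouldShutDown(360_000L));
    }
}
```

A client that submits its next DAG after the timer has fired will find the AM gone, which is the SessionNotRunning case discussed later in this chapter.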

Bring-up control flow

sequenceDiagram
    participant U as User code
    participant TC as TezClient
    participant TCU as TezClientUtils
    participant YC as TezYarnClient
    participant RM as YARN RM
    participant AM as DAGAppMaster

    U->>TC: TezClient.create(name, conf, isSession)
    U->>TC: addAppMasterLocalFiles(map)
    U->>TC: start()
    TC->>TCU: createApplicationSubmissionContext(...)
    TCU->>TCU: stage local resources to HDFS
    TCU->>TCU: build classpath & env
    TC->>YC: submitApplication(appSubmissionContext)
    YC->>RM: submitApplication
    RM-->>YC: appId
    Note over RM,AM: RM launches AM container
    AM->>AM: serviceInit, serviceStart
    AM-->>TC: AM registers via heartbeat; TC sees RUNNING
    U->>TC: submitDAG(dag)
    TC->>AM: DAGClientAMProtocol.submitDAG(rpcCall)
    AM-->>TC: dagId
    TC-->>U: DAGClient

Where each call lives:

  • TezClient.start() → TezClientUtils.createFinalConfProtoForApp() → TezClientUtils.createApplicationSubmissionContext() → frameworkClient.submitApplication(...).
  • TezClient.submitDAG(dag) → getSessionAMProxy() → dagAMProtocol.submitDAG(submitRequest) (the YARN AM proxy).

grep -n "submitApplication\|submitDAG\|dagAMProtocol" \
  tez-api/src/main/java/org/apache/tez/client/TezClient.java

Local resources that TezClientUtils uploads

A YARN container starts with a clean working directory plus whatever local resources the AM submission context declares. For Tez, that includes:

  1. Tez framework tarball — pointed to by tez.lib.uris (or a local jar list). Contains tez-api.jar, tez-dag.jar, tez-runtime-*.jar, etc.
  2. User application jars — anything you added via TezClient.addAppMasterLocalFiles(Map<String, LocalResource>) plus addTaskLocalFiles.
  3. The DAGPlan is not a local resource. It is sent via the submitDAG RPC payload.

Inspect:

grep -n "tez.lib.uris\|TezConfiguration.TEZ_LIB_URIS\|addAppMasterLocalFiles" \
  tez-api/src/main/java/org/apache/tez/client/TezClient.java \
  tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java

The AMRM token is delivered by YARN when the container starts; Tez does not manage it directly.


The submission RPC

The protocol is defined in:

tez-api/src/main/proto/DAGClientAMProtocol.proto

grep -n "rpc " tez-api/src/main/proto/DAGClientAMProtocol.proto

Key RPCs:

| RPC | What it does |
|-----|--------------|
| submitDAG | Submit a new DAG to a running AM |
| getDAGStatus | Poll status (also used by DAGClient) |
| getVertexStatus | Poll a specific vertex |
| tryKillDAG | Initiate kill |
| shutdownSession | Stop the AM in session mode |

The RPC server lives in the AM (DAGClientHandler and its Protobuf implementation):

grep -rn "DAGClientAMProtocol\|submitDAG" \
  tez-dag/src/main/java/org/apache/tez/dag/api/client/ 2>/dev/null | head

ATS / Timeline Service integration

When tez.history.logging.service.class is set to ATSHistoryLoggingService (the default in many distros), TezClient does not publish events itself — the AM does, via the HistoryEventHandler. However, TezClient does:

  • Propagate tez.history.logging.service.class to the AM as part of the configuration it ships with the application.
  • Provide ATS credentials in the application submission context.

Read:

grep -rn "ATSHistoryLoggingService\|YARN_TIMELINE_SERVICE" \
  tez-api/src/main/java/org/apache/tez/client/

For the AM-side, see counters-diagnostics.md.


TezSessionImpl vs TezClient

There is a subclass relationship: TezSessionImpl was the older name; modern Tez uses TezClient with isSession=true, but TezSessionImpl still appears in some codepaths. The two are largely interchangeable. Inspect both:

grep -n "class TezClient\|class TezSessionImpl" \
  tez-api/src/main/java/org/apache/tez/client/*.java

Reading exercise

sed -n '1,120p' tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -n "submitDAG\b" tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -n "stopSession\|stop\|close" \
  tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -rn "submitApplication" tez-api/src/main/java/org/apache/tez/client/

Answer:

  1. What is the difference between TezClient.stop() in session vs non-session mode?
  2. When TezClient.submitDAG() is called for a DAG that conflicts with one currently running in the session, what happens?
  3. Find the timeout used while waiting for the AM to reach RUNNING after start(). Which config key controls it?
  4. What pre-condition does submitDAG enforce on the DAG's vertex names with respect to previously-submitted DAGs in the same session?
  5. Trace addAppMasterLocalFiles(...) end-to-end. Where do those files end up on HDFS?
  6. Why is tez.lib.uris sometimes a directory and sometimes a tarball? What does TezClientUtils.setupTezJarsLocalResources do for each case?

Common bugs and symptoms

| Symptom | Root cause | Fix |
|---------|------------|-----|
| AM never reaches RUNNING; client hangs in start() | tez.lib.uris points to a path the NodeManager can't read | Verify HDFS perms; check NM logs |
| submitDAG throws SessionNotRunning | AM died (idle timeout, crash) | Catch, recreate TezClient, resubmit |
| submitDAG blocks forever | Previous DAG still in flight in the session | Don't reuse session for parallel DAGs; or wait |
| IOException: Failed to submit application | RM rejected (queue full, ACL) | Inspect RM logs; verify queue config |
| AM starts but cannot talk back to client | Client behind NAT; AM cannot reach client's RPC server | Use polling-only DAGClient; avoid callbacks |
| Tasks fail with ClassNotFoundException for user code | addTaskLocalFiles not called for that jar | Add jars via both addAppMasterLocalFiles and addTaskLocalFiles if used in tasks |

Validation: prove you understand this

  1. Write a 30-line Java driver that creates a TezClient in session mode, submits two DAGs back-to-back, prints both DAGClient.getDAGStatus() results, and shuts down cleanly.
  2. From TezClient.java, list every method that ultimately reaches dagAMProtocol.
  3. Explain why addAppMasterLocalFiles is a Map<String, LocalResource> and not a List<Path>.
  4. From the proto file DAGClientAMProtocol.proto, write the exact request message used by submitDAG.
  5. Reproduce the "AM idle timeout" path on MiniTezCluster: submit one DAG, wait past the configured timeout, attempt a second submit, observe the exception class and message.

DAGClient

DAGClient is the read-only client-side handle to a submitted DAG. It is returned by TezClient.submitDAG(...) and lives until the DAG completes (or the user kills it). This chapter covers status polling, the StatusGetOpts flag, the RPC vs ATS backends, error reporting, and the contract DAGClient exposes to callers like Hive, Pig, and CLI drivers.

After this chapter you should know which backend a given DAGClient instance is using, what fields will be populated, and which calls block vs poll.


Files to open

tez-api/src/main/java/org/apache/tez/dag/api/client/
  DAGClient.java                       (abstract base)
  DAGStatus.java                       (the snapshot type)
  VertexStatus.java
  Progress.java
  StatusGetOpts.java                   (enum: GET_COUNTERS, GET_MEMORY_USAGE)

  rpc/
    DAGClientRPCImpl.java              (talks to the AM via DAGClientAMProtocol)
    DAGClientImplLocal.java            (in-process; for LocalClient)

  registry/                            (service discovery if applicable)

ATS-backed variant:

tez-plugins/tez-yarn-timeline-history-with-fs/  or
tez-plugins/tez-yarn-timeline-history/
  src/main/java/org/apache/tez/dag/api/client/DAGClientTimelineImpl.java

(Module names vary across versions; locate with find . -name "DAGClientTimelineImpl.java".)


Core API

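// Abridged. Imports omitted: java.io.Closeable, java.io.IOException, java.util.Set,
// org.apache.tez.dag.api.TezException. Check your branch for the exact signatures.
public abstract class DAGClient implements Closeable {
  // Execution context string, intended for logging.
  public abstract String getExecutionContext();
  // One-shot status query; opts may request counters or memory usage.
  public abstract DAGStatus getDAGStatus(Set<StatusGetOpts> opts) throws IOException, TezException;
  public abstract DAGStatus getDAGStatus(Set<StatusGetOpts> opts, long timeoutMillis) throws IOException, TezException;
  public abstract VertexStatus getVertexStatus(String vertexName, Set<StatusGetOpts> opts) throws IOException, TezException;
  // Blocking calls: return only once the DAG reaches a terminal state.
  public abstract DAGStatus waitForCompletion() throws IOException, TezException, InterruptedException;
  public abstract DAGStatus waitForCompletionWithStatusUpdates(Set<StatusGetOpts> opts) throws IOException, TezException, InterruptedException;
  // Asynchronous kill request; does not wait for the DAG to stop.
  public abstract void tryKillDAG() throws IOException, TezException;
  // ...
}

grep -n "public abstract\|public " \
  tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClient.java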

DAGStatus — what callers actually consume

grep -n "public " tez-api/src/main/java/org/apache/tez/dag/api/client/DAGStatus.java

Fields you'll see in production triage:

| Field | Populated by | Notes |
|-------|--------------|-------|
| state (DAGStatus.State) | Always | SUBMITTED/INITING/RUNNING/SUCCEEDED/FAILED/KILLED/ERROR |
| progress | RPC backend; ATS backend may lag | Progress per vertex + aggregate |
| diagnostics | On terminal states | Newline-joined messages |
| counters | Only if StatusGetOpts.GET_COUNTERS passed | Expensive over RPC |
| memoryUsage | Only if StatusGetOpts.GET_MEMORY_USAGE passed | Aggregated across containers |

Note: state is not the same as VertexStatus.State. Vertex states are richer (INITED, RUNNING, COMMITTING, SUCCEEDED, etc.) — see vertex-lifecycle.md. DAG state is a roll-up.
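
The opt-in nature of the expensive fields can be modeled as follows. This is a hedged illustration rather than Tez code: StatusOptsSketch, Snapshot, and the sample counter values are hypothetical; only the two option names come from this chapter.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class StatusOptsSketch {
    enum StatusGetOpts { GET_COUNTERS, GET_MEMORY_USAGE }

    /** Immutable snapshot; optional fields stay null unless the caller asked for them. */
    static final class Snapshot {
        final String state;
        final Map<String, Long> counters;
        final Long memoryUsedBytes;

        Snapshot(String state, Map<String, Long> counters, Long memoryUsedBytes) {
            this.state = state;
            this.counters = counters;
            this.memoryUsedBytes = memoryUsedBytes;
        }
    }

    static Snapshot buildSnapshot(Set<StatusGetOpts> opts) {
        Map<String, Long> counters = null;
        if (opts.contains(StatusGetOpts.GET_COUNTERS)) {
            // Stand-in for the costly walk over the counter tree (values are made up).
            Map<String, Long> walked = new LinkedHashMap<>();
            walked.put("FILE_BYTES_WRITTEN", 1024L);
            counters = Collections.unmodifiableMap(walked);
        }
        Long memory = opts.contains(StatusGetOpts.GET_MEMORY_USAGE) ? Long.valueOf(512L * 1024 * 1024) : null;
        return new Snapshot("RUNNING", counters, memory);
    }

    public static void main(String[] args) {
        Snapshot cheap = buildSnapshot(EnumSet.noneOf(StatusGetOpts.class));
        Snapshot full = buildSnapshot(EnumSet.allOf(StatusGetOpts.class));
        System.out.println("cheap has counters: " + (cheap.counters != null));
        System.out.println("full has counters: " + (full.counters != null));
    }
}
```

The design point is that the state field is always cheap to produce, so a tight polling loop should pass an empty option set and request counters only when it needs them.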


RPC backend: DAGClientRPCImpl

grep -n "DAGClientAMProtocol\|proxy" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientRPCImpl.java

Behavior:

  • Each getDAGStatus(opts) is a synchronous RPC to the AM.
  • Default timeout per call is governed by tez.dag.am.client.am-connect-timeout-secs.
  • If GET_COUNTERS is set, the AM serializes the entire TezCounters tree (potentially MBs); avoid in tight loops.
  • waitForCompletion() is implemented as a polling loop with backoff. Find the loop:

grep -n "waitForCompletion\|sleep\|poll" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientRPCImpl.java
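
A polling loop with capped backoff has the general shape sketched below. Everything here is an assumption made for illustration: PollUntilTerminalSketch is hypothetical, the doubling policy and the delay values are invented, and the status call and sleeper are injected so the loop can run without an AM. Compare it against the real loop you find with the grep above.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

final class PollUntilTerminalSketch {
    // Same value names as DAGStatus.State in this chapter.
    enum State {
        SUBMITTED, INITING, RUNNING, SUCCEEDED, KILLED, FAILED, ERROR;

        boolean isTerminal() {
            return this == SUCCEEDED || this == KILLED || this == FAILED || this == ERROR;
        }
    }

    /**
     * Polls statusCall until it reports a terminal state. Between polls it asks
     * sleeper to wait, doubling the delay each time up to maxDelayMs.
     */
    static State waitForCompletion(Supplier<State> statusCall, LongConsumer sleeper,
                                   long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        while (true) {
            State state = statusCall.get();
            if (state.isTerminal()) {
                return state;
            }
            sleeper.accept(delay);
            delay = Math.min(maxDelayMs, delay * 2);
        }
    }

    public static void main(String[] args) {
        State[] script = { State.INITING, State.RUNNING, State.RUNNING, State.SUCCEEDED };
        int[] cursor = { 0 };
        State end = waitForCompletion(
            () -> script[cursor[0]++],
            ms -> System.out.println("would sleep " + ms + " ms"),
            100, 300);
        System.out.println("final state: " + end);
    }
}
```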

ATS backend: DAGClientTimelineImpl

When the AM has exited but ATS retains history, status is fetched from the ATS REST API (or RM web UI) instead. This is critical for post-mortem and "why did my job fail" UIs.

Behavior differences from RPC:

  • Eventually consistent (ATS publication is async; see counters-diagnostics.md).
  • state is the final state recorded; intermediate states between two ATS events are invisible.
  • Counters are available if ATSHistoryLoggingService was active and the event made it past the publisher queue.

Search for the fallback path that picks ATS when RPC fails:

grep -rn "DAGClientTimelineImpl\|getDAGAndAMURL\|RPCFailed\|amProxyFailed" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/ \
  tez-api/src/main/java/org/apache/tez/client/

tryKillDAG() — the only mutation

Although DAGClient is otherwise a read-only handle, it has exactly one mutating method: tryKillDAG. It asks the AM to start the kill path, but does not block until the DAG is dead.

grep -n "tryKillDAG\|killDAG" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClient.java \
  tez-api/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientRPCImpl.java

To wait for the kill to take effect:

client.tryKillDAG();
DAGStatus status = client.waitForCompletion();
// status.getState() will be KILLED, or another terminal state if the DAG finished before the kill landed

Status populate flow

sequenceDiagram
    participant U as User code
    participant DC as DAGClientRPCImpl
    participant AM as DAGAppMaster
    participant DH as DAGClientHandler
    participant DI as DAGImpl

    U->>DC: getDAGStatus(opts)
    DC->>AM: RPC: getDAGStatus(dagId, opts)
    AM->>DH: dispatch
    DH->>DI: dagImpl.getDAGStatus(opts)
    DI-->>DH: DAGStatusProto
    DH-->>AM: response
    AM-->>DC: response bytes
    DC-->>U: DAGStatus

The conversion DAGImpl → DAGStatusProto happens in DAGImpl.getDAGStatus() (in tez-dag). For GET_COUNTERS, the AM walks the counter aggregation tree — expensive.


Reading exercise

# Surface
sed -n '1,80p' tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClient.java

# State enum
grep -n "public enum State\b" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/DAGStatus.java

# RPC polling loop
grep -n "waitForCompletion\|backoff\|sleep" \
  tez-api/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientRPCImpl.java

Answer:

  1. What is the difference between waitForCompletion() and waitForCompletionWithStatusUpdates(opts)?
  2. What happens if GET_COUNTERS is requested but the DAG is still INITING?
  3. List the exact DAGStatus.State enum values and the terminal subset.
  4. From the polling loop, what is the maximum sleep between polls?
  5. When tryKillDAG() is called after the DAG already finished, what does the RPC return? Is it an error?
  6. In DAGClientTimelineImpl, how is the "I don't see a SUCCEEDED event yet" case distinguished from "the DAG is still running"?

Common bugs and symptoms

| Symptom | Root cause | Fix |
|---------|------------|-----|
| waitForCompletion() returns RUNNING forever | AM crashed, RPC keeps timing out | Add timeout; check AM log; fall back to ATS |
| Counters are stale by ~30s | AM aggregation interval | tez.am.aggregate.counters.interval-secs |
| tryKillDAG() returns immediately but DAG keeps running for minutes | Kill is async; tasks must drain | Always follow with waitForCompletion |
| Hive sees DAGStatus.State=ERROR with no diagnostics | AM crashed before publishing | Check NM container log for the AM |
| ATS-backed status missing for a recently completed DAG | ATS publisher queue backed up | Wait; or query ATS REST directly |
| Inconsistent state between RPC and ATS for same DAG | Race during AM shutdown; ATS publishes after final RPC | Trust RPC while AM lives, ATS after |

Validation: prove you understand this

  1. Write a 20-line program that polls getDAGStatus(GET_COUNTERS) once a second and prints the FILE_BYTES_WRITTEN counter from each snapshot.
  2. List the StatusGetOpts enum values (check the source; there may be fewer or more than the two named earlier in this chapter) and what each adds to the payload.
  3. From DAGClient.java, draw the inheritance/factory diagram for how a DAGClient instance is actually constructed (look at TezClient.submitDAG to see which subclass is returned in YARN vs local mode).
  4. Force the RPC backend to fail and confirm whether (or not) Tez falls back to the ATS backend automatically. Cite the line that performs the fallback.
  5. Explain why DAGStatus is a snapshot rather than an observable.

DAGAppMaster

DAGAppMaster is Tez's YARN ApplicationMaster: a single JVM, launched by the YARN ResourceManager, that owns one or more DAGs over its lifetime. This chapter describes its bring-up, its dispatcher topology, its YARN-facing heartbeats, and the recovery service that lets it restart after a crash.

After this chapter you should be able to map any AM log line in the first 60 seconds of operation to a method in DAGAppMaster.java.


Files to open

tez-dag/src/main/java/org/apache/tez/dag/app/
  DAGAppMaster.java                          (the AM main class)
  TaskCommunicatorManager.java               (task umbilical multiplexer)
  ContainerHeartbeatHandler.java             (container liveness)
  rm/
    TaskSchedulerManager.java                (one per scheduler instance)
    YarnTaskSchedulerService.java            (the default scheduler impl)
    container/
      AMContainerImpl.java                   (container state machine)
  launcher/
    ContainerLauncherManager.java
    DagContainerLauncher.java                (varies by version)
    LocalContainerLauncher.java              (in-process)
  recovery/
    RecoveryService.java                     (event log; restart path)
  dag/impl/
    DAGImpl.java
    VertexImpl.java
    TaskImpl.java
    TaskAttemptImpl.java

Bring-up: serviceInit and serviceStart

DAGAppMaster extends AbstractService. YARN launches the AM container, which enters through main(); from there control flows as follows:

main()
  -> DAGAppMaster.create / new DAGAppMaster(...)
  -> init(conf)
       -> serviceInit(conf)
          - parse appAttemptId
          - load credentials
          - construct AsyncDispatcher
          - construct + register child services: TaskSchedulerManager,
              ContainerLauncherManager, TaskCommunicatorManager,
              RecoveryService (if enabled), HistoryEventHandler, ATSHook
          - register event handlers on the dispatcher
  -> start()
       -> serviceStart()
          - start child services (they each start their own threads)
          - if not session mode: handle the inline DAG plan
          - if session mode: enter idle loop, wait for submitDAG RPC

Inspect the boundaries:

grep -n "serviceInit\|serviceStart\|serviceStop" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

The AsyncDispatcher and registered handlers

DAGAppMaster builds one AsyncDispatcher (from hadoop-yarn-common) and registers a handler per event type. The contract is:

  • Handlers run on a single dispatch thread by default, so events are processed one at a time in arrival order.
  • Handlers must be fast (no blocking I/O); they should mutate state and emit follow-on events.

Find the registrations:

grep -n "dispatcher.register\|register(.*\.class" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
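
The register-then-dispatch contract can be reduced to a queue, a handler map keyed by the event-type enum class, and one consumer thread. The sketch below is a simplified model, not the dispatcher Tez uses: MiniDispatcherSketch, its Event class, and the two sample enums are hypothetical, and error handling is minimal.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class MiniDispatcherSketch {
    /** An event is tagged with an enum constant; the enum's class selects the handler. */
    static final class Event {
        final Enum<?> type;
        final String payload;

        Event(Enum<?> type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    enum DagEventType { DAG_INIT, DAG_START }
    enum VertexEventType { V_INIT }

    private static final Event STOP = new Event(null, "stop"); // sentinel, never handed to a handler
    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<Class<?>, Consumer<Event>> handlers = new ConcurrentHashMap<>();
    private final Thread dispatchThread = new Thread(this::loop, "dispatcher-sketch");

    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Event> handler) {
        handlers.put(eventTypeClass, handler);
    }

    void start() {
        dispatchThread.start();
    }

    /** Callable from any thread; the event is handled later on the dispatch thread. */
    void dispatch(Event event) {
        queue.add(event);
    }

    /** Handles everything queued before this call, then stops the dispatch thread. */
    void stop() {
        queue.add(STOP);
        try {
            dispatchThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void loop() {
        try {
            while (true) {
                Event event = queue.take();
                if (event == STOP) {
                    return;
                }
                Consumer<Event> handler = handlers.get(event.type.getDeclaringClass());
                if (handler == null) {
                    throw new IllegalStateException("No handler registered for " + event.type);
                }
                handler.accept(event); // a slow handler here delays every later event
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        MiniDispatcherSketch dispatcher = new MiniDispatcherSketch();
        dispatcher.register(DagEventType.class, e -> System.out.println("DAG handler: " + e.type));
        dispatcher.register(VertexEventType.class, e -> System.out.println("Vertex handler: " + e.type));
        dispatcher.start();
        dispatcher.dispatch(new Event(DagEventType.DAG_INIT, "dag_1"));
        dispatcher.dispatch(new Event(VertexEventType.V_INIT, "v1"));
        dispatcher.stop();
    }
}
```

Because a single thread drains the queue, one handler that blocks on I/O stalls every event type behind it, which is why handlers are expected to mutate state quickly and emit follow-on events.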

Typical registrations (names approximate by version):

| Event type | Handler class | Owned subsystem |
|------------|---------------|-----------------|
| DAGEventType | DAGEventDispatcher (forwards to DAGImpl.handle) | DAG lifecycle |
| VertexEventType | VertexEventDispatcher (forwards to VertexImpl.handle) | Vertex lifecycle |
| TaskEventType | TaskEventDispatcher (forwards to TaskImpl.handle) | Task lifecycle |
| TaskAttemptEventType | TaskAttemptEventDispatcher (forwards to TaskAttemptImpl.handle) | Attempt lifecycle |
| AMSchedulerEventType | AMSchedulerEventDispatcher (forwards to TaskSchedulerManager) | Scheduling |
| AMContainerEventType | container event dispatcher | Container state |
| AMNodeEventType | node event dispatcher | Node tracking |
| ContainerLauncherEventType | launcher dispatcher | Launch/stop containers |
| TaskCommunicatorEventType | comms dispatcher | Per-launcher umbilical |
| HistoryEventType | history event dispatcher | ATS/log publication |
| SpeculatorEventType | speculator dispatcher | Speculation (if enabled) |
| DAGAppMasterEventType | AM itself | Lifecycle (e.g., shutdown) |
| RecoveryEventType | recovery dispatcher | Recovery log |

The handlers themselves are inner classes or top-level dispatchers found in:

grep -rn "extends EventHandler\|implements EventHandler" \
  tez-dag/src/main/java/org/apache/tez/dag/app/ | head -20

Event flow diagram

flowchart TB
    subgraph "Sources of events"
        TC[Task heartbeat]
        SCH[Scheduler callback]
        TL[Container launcher]
        UC[User: submitDAG/killDAG]
        RC[Recovery on restart]
    end
    TC --> D
    SCH --> D
    TL --> D
    UC --> D
    RC --> D
    D[AsyncDispatcher] --> DH[DAGEventDispatcher]
    D --> VH[VertexEventDispatcher]
    D --> TH[TaskEventDispatcher]
    D --> AH[TaskAttemptEventDispatcher]
    D --> SH[AMSchedulerEventDispatcher]
    D --> HH[HistoryEventDispatcher]
    DH --> DI[DAGImpl]
    VH --> VI[VertexImpl]
    TH --> TI[TaskImpl]
    AH --> TAI[TaskAttemptImpl]
    SH --> TSM[TaskSchedulerManager]
    HH --> HEH[HistoryEventHandler]

Everything flows through D. There is no other way to mutate the state of a DAG, vertex, task, or attempt. See event-routing.md.


YARN-facing components

AMRM heartbeat (the resource conversation)

TaskSchedulerManager (and underneath, YarnTaskSchedulerService) maintains an AMRMClient (from YARN). This heartbeats with the RM at a configurable interval (tez.am.am-rm.heartbeat.interval-ms.max) carrying:

  • ContainerRequests for new tasks.
  • ContainerReleases for freed containers.
  • Progress percent (visible in yarn application -status).

Responses contain:

  • AllocatedContainers (RM granted).
  • CompletedContainersStatuses (RM tells us a container died).

grep -n "heartbeat\|AMRMClient\|allocate" \
  tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java | head

Container heartbeat (the liveness check)

ContainerHeartbeatHandler tracks the wall time of the last heartbeat() call from each running container's umbilical. If a container goes silent past tez.task.timeout-ms, the AM declares the container unresponsive and kills the attempt.

grep -n "ContainerHeartbeatHandler\|tez.task.timeout" \
  tez-dag/src/main/java/org/apache/tez/dag/app/ContainerHeartbeatHandler.java
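
The liveness rule amounts to remembering the last heartbeat time per container and periodically sweeping for silent ones. Below is a minimal model under stated assumptions: HeartbeatTrackerSketch and its methods are hypothetical, time is supplied by the caller instead of a clock thread, and the real handler's bookkeeping differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HeartbeatTrackerSketch {
    private final long timeoutMillis;
    // containerId -> wall time of the last heartbeat; TreeMap keeps sweep output ordered.
    private final Map<String, Long> lastHeartbeatMillis = new TreeMap<>();

    HeartbeatTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void register(String containerId, long nowMillis) {
        lastHeartbeatMillis.put(containerId, nowMillis);
    }

    /** Records a heartbeat; unknown or already-expired containers are ignored. */
    void heartbeat(String containerId, long nowMillis) {
        lastHeartbeatMillis.computeIfPresent(containerId, (id, previous) -> nowMillis);
    }

    /** Removes and returns every container that has been silent for longer than the timeout. */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        lastHeartbeatMillis.entrySet().removeIf(entry -> {
            boolean silent = nowMillis - entry.getValue() > timeoutMillis;
            if (silent) {
                expired.add(entry.getKey());
            }
            return silent;
        });
        return expired;
    }

    public static void main(String[] args) {
        HeartbeatTrackerSketch tracker = new HeartbeatTrackerSketch(5_000L);
        tracker.register("container_1", 0L);
        tracker.register("container_2", 0L);
        tracker.heartbeat("container_1", 4_000L);
        System.out.println("expired at t=5001: " + tracker.expire(5_001L));
    }
}
```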

Task umbilical (the per-task RPC server)

TaskCommunicatorManager runs an in-AM RPC server (the umbilical) that tasks call into for:

  • getTask() — pick up assigned task.
  • statusUpdate(...) / heartbeat(...) — progress and liveness.
  • done(...) / fatalError(...) — completion.
  • outputReady(...) / inputEvents(...) — runtime data plane.

The umbilical protocol is TezTaskUmbilicalProtocol:

find . -name "TezTaskUmbilicalProtocol.java"

Recovery: surviving an AM restart

When recovery is enabled (tez.dag.recovery.enabled=true), RecoveryService writes a log of state-changing events to HDFS. If YARN then grants another AM attempt (bounded by tez.am.max.app.attempts and the cluster's yarn.resourcemanager.am.max-attempts), the restarted AM gets a new appAttemptId but the same appId, and it:

  1. Reads the recovery log under ${tez.staging-dir}/$appId/recovery/$attemptId/.
  2. Replays events into DAGImpl, VertexImpl, etc., to rebuild in-memory state up to the last durable point.
  3. Resumes execution: completed tasks remain completed, in-flight tasks are relaunched.

grep -rn "RecoveryService\|RecoveryEvent\|replayEvents" \
  tez-dag/src/main/java/org/apache/tez/dag/app/recovery/ | head

Note: recovery is per-DAG, not per-task. A vertex that was RUNNING becomes RUNNING again; tasks that completed stay completed; tasks that were in flight get fresh attempts.
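
The replay rule for tasks (completed stays completed, in-flight gets a fresh attempt) can be modeled on a toy event log. This is an illustrative sketch only: RecoveryReplaySketch, its event kinds, and the Outcome enum are hypothetical and far simpler than the records RecoveryService actually writes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RecoveryReplaySketch {
    enum Kind { TASK_STARTED, TASK_SUCCEEDED }

    static final class LoggedEvent {
        final Kind kind;
        final String taskId;

        LoggedEvent(Kind kind, String taskId) {
            this.kind = kind;
            this.taskId = taskId;
        }
    }

    enum Outcome { KEEP_COMPLETED, RELAUNCH }

    /** Replays the log in order and decides what the restarted AM does with each task. */
    static Map<String, Outcome> replay(List<LoggedEvent> log) {
        Map<String, Outcome> decisions = new LinkedHashMap<>();
        for (LoggedEvent event : log) {
            switch (event.kind) {
                case TASK_STARTED:
                    // Started but not yet known to be finished: needs a fresh attempt.
                    decisions.putIfAbsent(event.taskId, Outcome.RELAUNCH);
                    break;
                case TASK_SUCCEEDED:
                    // A durable success record overrides the in-flight default.
                    decisions.put(event.taskId, Outcome.KEEP_COMPLETED);
                    break;
                default:
                    throw new IllegalStateException("Unknown event kind: " + event.kind);
            }
        }
        return decisions;
    }

    public static void main(String[] args) {
        List<LoggedEvent> log = Arrays.asList(
            new LoggedEvent(Kind.TASK_STARTED, "task_0"),
            new LoggedEvent(Kind.TASK_STARTED, "task_1"),
            new LoggedEvent(Kind.TASK_SUCCEEDED, "task_0"));
        System.out.println(replay(log));
    }
}
```

If the log is truncated before a success record reaches HDFS, the model relaunches that task, which matches the "last durable point" wording above.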


Reading exercise

# Bring-up
sed -n '1,200p' tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
grep -n "serviceInit\|serviceStart" tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

# Handlers
grep -n "register\b" tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -30

# Session vs non-session control
grep -n "isSession\|sessionMode" tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -20

# Recovery hookup
grep -n "RecoveryService\|recoveryEnabled" tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head

Answer:

  1. In what order are the child services started in serviceStart? Why does order matter?
  2. List the first three events that flow through the dispatcher when an AM in non-session mode starts.
  3. What thread does DAGImpl.handle(DAGEvent) execute on? Is it the same thread as VertexImpl.handle(VertexEvent)?
  4. Where is the appAttemptId > 1 check that decides whether to start fresh or recover?
  5. What is the difference between DAGAppMaster.shutdown() and DAGAppMaster.serviceStop()?
  6. Find the line that emits the first "DAGAppMaster started" log statement (or its modern equivalent).

Common bugs and symptoms

| Symptom | Root cause | Where to look |
|---|---|---|
| AM dies immediately with NPE in serviceInit | Missing or wrong tez.lib.uris; jars not found | NM container log; verify HDFS perms |
| AM hangs forever after serviceStart in session mode | No DAGs submitted; tez.session.am.dag.submit.timeout.secs exhausted | Increase timeout; or check why client isn't submitting |
| Tasks all fail with "container lost" after a long GC | AM GC pause exceeded heartbeat budget; RM killed AM | Tune AM heap; reduce dispatcher pressure |
| Recovery replays but stalls in INITIALIZING | Recovery log truncated mid-vertex-init | Look for SummaryEventWriter errors in prior attempt |
| Event dispatcher queue grows without bound | A handler is doing blocking I/O on the dispatch thread | Take a thread dump; verify which event is stuck |
| AM exits with ERROR and no DAG transition | An uncaught exception bubbled out of an event handler | grep "Error in dispatcher thread" in AM log |

Validation: prove you understand this

  1. From memory, list ten event-type→handler registrations in DAGAppMaster.
  2. Draw the event flow from TezTaskUmbilicalProtocol.heartbeat to TaskAttemptImpl.handle(TA_DONE).
  3. Reproduce a single-DAG, non-session AM bring-up on MiniTezCluster and identify the log line emitted by each child-service start.
  4. Read the RecoveryService writer and identify which event types are persisted vs in-memory-only.
  5. Explain why the dispatcher must be single-threaded and what would break if you parallelized it.

VertexImpl Lifecycle

VertexImpl is the AM-side representation of a single Vertex in a running DAG. Its lifecycle is a Hadoop state machine with 11 states and dozens of events. This chapter walks the happy path (NEW → SUCCEEDED), the major failure and kill paths, and the rules that govern transitions.

After this chapter you should be able to draw the state machine on a whiteboard and predict every state transition for any event in any state.


File

tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

This is one of the largest files in Tez (typically 4000+ lines). Skim once top-to-bottom, then read the stateMachineFactory block carefully.

grep -n "stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

The factory is a single chained builder defined near the top of the file (roughly 200–600 lines depending on version).


The states

grep -n "VertexState\." tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head
# or
grep -n "public enum\|enum VertexState" \
  tez-api/src/main/java/org/apache/tez/dag/api/event/VertexState.java \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

The full state set (names exact as of 0.10.x):

| State | Meaning |
|---|---|
| NEW | Just constructed; no events seen |
| INITIALIZING | Inputs being initialized (e.g., split generation) |
| INITED | Ready to run; awaiting V_START |
| RUNNING | Tasks executing |
| COMMITTING | All tasks succeeded; outputs being committed |
| SUCCEEDED | Terminal: all good |
| TERMINATING | Failure/kill in progress; awaiting task drain |
| KILLED | Terminal: killed externally |
| FAILED | Terminal: failed (own fault) |
| ERROR | Terminal: AM internal error |
| RECOVERING | (Recovery only) replaying events into this vertex |

State × event matrix (happy path)

| State | Event | Next state | Action |
|---|---|---|---|
| NEW | V_INIT | INITIALIZING | construct inputs, kick off InputInitializers |
| INITIALIZING | V_ROOT_INPUT_INITIALIZED | INITIALIZING | accumulate events; if all done → INITED |
| INITIALIZING | V_ROOT_INPUT_FAILED | TERMINATING | bubble failure |
| INITIALIZING | V_INIT_COMPLETED | INITED | finalize parallelism if not set |
| INITED | V_START | RUNNING | schedule tasks via VertexManagerPlugin |
| RUNNING | V_TASK_COMPLETED (success) | RUNNING | bump counter; if all done → COMMITTING |
| RUNNING | V_TASK_COMPLETED (final fail) | TERMINATING | initiate cleanup |
| RUNNING | V_TASK_RESCHEDULED | RUNNING | rerun a task |
| COMMITTING | V_COMMIT_COMPLETED | SUCCEEDED | publish history |
| COMMITTING | V_COMMIT_FAILED | TERMINATING | rerun or fail |

For the complete matrix, count the addTransition(...) calls:

grep -c "addTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

There are usually >100 transitions registered. Where a transition carries a comment naming the bug or JIRA that motivated it, read the comment and the ticket.


Failure path walk

stateDiagram-v2
    [*] --> NEW
    NEW --> INITIALIZING: V_INIT
    INITIALIZING --> INITED: V_INIT_COMPLETED
    INITIALIZING --> TERMINATING: V_ROOT_INPUT_FAILED
    INITED --> RUNNING: V_START
    RUNNING --> COMMITTING: all tasks SUCCEEDED
    RUNNING --> TERMINATING: any task FAILED beyond max-attempts
    RUNNING --> TERMINATING: V_TERMINATE
    COMMITTING --> SUCCEEDED: V_COMMIT_COMPLETED
    COMMITTING --> TERMINATING: V_COMMIT_FAILED
    TERMINATING --> FAILED
    TERMINATING --> KILLED
    SUCCEEDED --> [*]
    FAILED --> [*]
    KILLED --> [*]

TERMINATING exists because a vertex cannot just jump to FAILED — it must first kill all running tasks and clean up its outputs. The transition from TERMINATING to a terminal state happens when the task count reaches zero.
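
The drain-then-finish behaviour can be sketched with a counter and a remembered reason. This is a hypothetical model, not VertexImpl: the TerminatingDrain class, the Reason enum, and the rule that maps a reason to FAILED or KILLED are assumptions made for illustration, so verify the real rule against the source.

```java
// Hypothetical sketch: drain running tasks before entering a terminal state.
public class TerminatingDrain {
  public enum State { RUNNING, TERMINATING, FAILED, KILLED }
  public enum Reason { TASK_FAILURE, EXTERNAL_KILL }

  private State state = State.RUNNING;
  private Reason reason;
  private int runningTasks;

  public TerminatingDrain(int runningTasks) {
    this.runningTasks = runningTasks;
  }

  // Begin termination: remember why, then wait for the tasks to drain.
  public void terminate(Reason why) {
    if (state != State.RUNNING) {
      return;   // already terminating or terminal
    }
    reason = why;
    state = State.TERMINATING;
    maybeFinish();
  }

  // Called once per task that has acknowledged its kill or otherwise ended.
  public void taskEnded() {
    if (runningTasks > 0) {
      runningTasks--;
    }
    maybeFinish();
  }

  // Leave TERMINATING only when no task is left running.
  private void maybeFinish() {
    if (state == State.TERMINATING && runningTasks == 0) {
      state = (reason == Reason.TASK_FAILURE) ? State.FAILED : State.KILLED;
    }
  }

  public State getState() {
    return state;
  }

  public static void main(String[] args) {
    TerminatingDrain vertex = new TerminatingDrain(2);
    vertex.terminate(Reason.TASK_FAILURE);
    System.out.println(vertex.getState());   // still draining
    vertex.taskEnded();
    vertex.taskEnded();
    System.out.println(vertex.getState());   // terminal
  }
}
```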


Vertex initialization in detail

V_INIT is the most complex transition. The handler must:

  1. Construct each root InputDescriptor and call its InputInitializer.
  2. If parallelism is -1, defer task creation until either the VertexManagerPlugin calls reconfigureVertex(...) or the root inputs report concrete counts.
  3. Construct downstream Edge objects (the AM-side Edge, not the tez-api one) and bind their EdgeManagers.
  4. Initialize the VertexManagerPlugin. Its onVertexStarted callback is not invoked here; it fires later, on V_START.

Read the body:

grep -n "InitTransition\|RootInputInitTransition\|RECOVERING" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -20

The commit path

A vertex whose DataSink declares an OutputCommitter must run a commit phase after all tasks succeed. The commit:

  • Runs on the AM (not in tasks).
  • May fail, which sends the vertex to TERMINATING (V_COMMIT_FAILED) instead of SUCCEEDED.
  • Holds the vertex in COMMITTING for the duration.

Vertex-group commit (when multiple vertices write to a shared VertexGroup) is coordinated by DAGImpl; individual VertexImpls just signal that they are ready to commit.

grep -n "CommittingTransition\|commitOutput\|OutputCommitter" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

Reading exercise

# State machine block
sed -n '1,500p' tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

# Count transitions
grep -c "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

# Find every event that can take the vertex to FAILED
grep -n "VertexState.FAILED" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

# Find the InitTransition body
grep -n "private.*class.*Transition\b" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

Answer:

  1. List five events that can take the vertex from RUNNING to TERMINATING.
  2. What determines the final state (FAILED vs KILLED) once TERMINATING completes?
  3. Why is INITED distinct from RUNNING — what does V_START actually trigger?
  4. How is parallelism set when a vertex starts with parallelism = -1?
  5. What happens to in-flight tasks when a vertex transitions to TERMINATING?
  6. Why does the state machine have a separate COMMITTING state instead of committing inside RUNNING?

Common bugs and symptoms

| Symptom | Root cause | Where to look |
|---|---|---|
| InvalidStateTransitonException: Invalid event: V_TASK_COMPLETED at SUCCEEDED | A late task completion event arrived after vertex completed (race) | Check task retry logic; add a no-op transition |
| Vertex stuck in INITIALIZING forever | Root input initializer never emitted events | Check InputInitializerEvents in log; cross-check initializer impl |
| Vertex transitions to FAILED but the failing task was killed externally | Bug in TaskAttemptImpl setting the wrong termination cause | See task-attempt-lifecycle.md |
| All tasks succeed but vertex stays in COMMITTING | Output committer hangs | Check committer for synchronous slow I/O; consider async |
| Recovery replays into RUNNING but tasks aren't relaunched | Missing recovery event for in-flight tasks | Look for VertexTaskStartEvent gaps in recovery log |
| V_TERMINATE causes vertex to stay in TERMINATING with one task lingering | Container heartbeat timeout > kill deadline | Tune tez.task.timeout-ms |

Validation: prove you understand this

  1. From memory, list all 11 VertexState values from the table above with a one-line meaning.
  2. Without running code, predict the next state for: (NEW, V_TERMINATE), (INITIALIZING, V_TERMINATE), (RUNNING, V_TASK_RESCHEDULED), (COMMITTING, V_TASK_RESCHEDULED). Verify against the source.
  3. Find the JIRA reference next to one transition you don't understand; read the JIRA; come back and explain why the transition exists.
  4. Write a unit test that drives a VertexImpl from NEW to SUCCEEDED using DrainDispatcher. (Use TestVertexImpl as a template.)
  5. Modify VertexImpl to add a no-op transition for some (state, event) pair currently absent; update TestVertexImpl in the same patch. Compile.

TaskImpl Lifecycle

TaskImpl is the AM-side representation of one logical task within a vertex. It is a relatively small state machine, but it owns a critical piece of policy: which attempt of this task is the "winner." This chapter walks the states, the attempt management rules, speculation, and the max-failed threshold that promotes a task to "this whole vertex must fail."

After this chapter you should be able to explain why a task with three failed attempts may still be RUNNING while another with one failed attempt is already FAILED.


File

tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java

Tests:

tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java

The states

grep -n "TaskState\." tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
# The enum's location differs between branches; locate it before grepping
find . -name "TaskState*.java"

| State | Meaning |
|---|---|
| NEW | Constructed; no attempts yet |
| SCHEDULED | First attempt requested from scheduler |
| RUNNING | At least one attempt is RUNNING |
| SUCCEEDED | Terminal: one attempt succeeded; task complete |
| KILLED | Terminal: explicitly killed (vertex termination, user) |
| FAILED | Terminal: max attempts exceeded |

TaskImpl does not have INITIALIZING or TERMINATING — those concerns belong to the vertex.


State × event matrix

| State | Event | Next state | Action |
|---|---|---|---|
| NEW | T_SCHEDULE | SCHEDULED | create first TaskAttemptImpl, send TA_SCHEDULE |
| SCHEDULED | T_ATTEMPT_LAUNCHED | RUNNING | mark first attempt as running |
| RUNNING | T_ATTEMPT_SUCCEEDED | SUCCEEDED | pick this attempt as the winner; kill others (if speculating) |
| RUNNING | T_ATTEMPT_FAILED | RUNNING (retry) or FAILED (exceeded) | spawn new attempt or terminate |
| RUNNING | T_ATTEMPT_KILLED | RUNNING | no-op unless this was last attempt |
| RUNNING | T_ADD_SPEC_ATTEMPT | RUNNING | spawn a duplicate attempt |
| RUNNING | T_TERMINATE | KILLED | kill all attempts |
| any | T_RECOVER_* | recovered state | replay events |

Count transitions:

grep -c "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java

Retry: how max-failed-attempts works

The config:

grep -n "TASK_MAX_FAILED_ATTEMPTS\|tez.am.task.max.failed.attempts" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

The default is 4 in most branches. A task becomes FAILED only after that many attempts have failed; attempts that were killed do not count.

Failed vs killed distinction:

| Outcome | Counts toward max.failed.attempts? |
|---|---|
| TaskAttempt failed (own crash, processor exception) | yes |
| TaskAttempt killed by speculation (lost the race) | no |
| TaskAttempt killed because vertex terminated | no |
| TaskAttempt killed because container preempted | no |

The classification is owned by TaskAttemptTerminationCause (see task-attempt-lifecycle.md). TaskImpl.handle consults the cause when deciding whether to retry or fail.

grep -n "TerminationCause\|isFailureCause" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
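
The failed-versus-killed accounting can be sketched as a counter that only some causes are allowed to bump. The FailureBudget class, its Cause enum, and the Decision result are hypothetical names invented for this illustration; the real decision lives in TaskImpl and keys off TaskAttemptTerminationCause.

```java
// Hypothetical sketch of the retry decision; cause names are illustrative.
public class FailureBudget {
  public enum Cause {
    PROCESSOR_ERROR(true),
    LOST_SPECULATION_RACE(false),
    VERTEX_TERMINATED(false),
    CONTAINER_PREEMPTED(false);

    final boolean countsAsFailure;

    Cause(boolean countsAsFailure) {
      this.countsAsFailure = countsAsFailure;
    }
  }

  public enum Decision { RETRY, FAIL_TASK }

  private final int maxFailedAttempts;   // stands in for tez.am.task.max.failed.attempts
  private int failedAttempts;

  public FailureBudget(int maxFailedAttempts) {
    this.maxFailedAttempts = maxFailedAttempts;
  }

  // Only failure causes consume the budget; kills leave it untouched.
  public Decision onAttemptEnded(Cause cause) {
    if (cause.countsAsFailure) {
      failedAttempts++;
    }
    return failedAttempts >= maxFailedAttempts ? Decision.FAIL_TASK : Decision.RETRY;
  }

  public int getFailedAttempts() {
    return failedAttempts;
  }

  public static void main(String[] args) {
    FailureBudget budget = new FailureBudget(4);
    budget.onAttemptEnded(Cause.PROCESSOR_ERROR);
    budget.onAttemptEnded(Cause.CONTAINER_PREEMPTED);   // killed: not counted
    budget.onAttemptEnded(Cause.PROCESSOR_ERROR);
    System.out.println("failed so far: " + budget.getFailedAttempts());
  }
}
```

With a budget of 4, a task that has burned three failures is still retried, while a task configured with a budget of 1 fails on its first counted failure.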

Speculation

Speculation runs a second copy of a task before the first finishes, hoping the second wins. Implementation:

tez-dag/src/main/java/org/apache/tez/dag/app/dag/speculate/
  LegacySpeculator.java
  SimpleSpeculator.java                    (varies by version)
  legacy/RuntimeTaskStatsEstimator.java

The speculator emits T_ADD_SPEC_ATTEMPT events into the dispatcher; the task spawns an additional attempt. The first attempt to succeed wins; the others are killed with a speculation-specific cause such as TERMINATED_INEFFECTIVE_SPECULATION (the exact name varies by version). Killed attempts do not count toward max.failed.attempts.

Enabled by:

grep -n "tez.am.speculation.enabled\|speculation" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head

"Best attempt" selection

When multiple attempts of the same task exist, the first to send TA_DONE (successful completion) wins. The handler:

  1. Marks that attempt as the canonical one (cached in TaskImpl).
  2. Iterates remaining attempts, sending each a kill event.
  3. Transitions task to SUCCEEDED.

grep -n "successfulAttempt\|setWinnerAttempt\|markSuccessful" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head

Downstream vertices reading from this task's output use the winner's outputLocationHint for shuffle (see shuffle-sort.md).
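
The three steps of winner selection can be sketched as follows. WinnerSelection and its integer attempt IDs are hypothetical; the sketch also assumes that a success arriving after a winner exists is simply ignored, which you should confirm against the real handler.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "first success wins"; not TaskImpl itself.
public class WinnerSelection {
  private final Set<Integer> liveAttempts = new LinkedHashSet<>();
  private Integer winner;   // null until some attempt succeeds

  public void attemptLaunched(int attemptId) {
    liveAttempts.add(attemptId);
  }

  // Record a success and return the attempts that must now be killed.
  // A success that arrives after a winner was chosen changes nothing.
  public List<Integer> attemptSucceeded(int attemptId) {
    List<Integer> toKill = new ArrayList<>();
    if (winner != null) {
      return toKill;
    }
    winner = attemptId;                 // step 1: cache the canonical attempt
    liveAttempts.remove(attemptId);
    toKill.addAll(liveAttempts);        // step 2: everyone else gets a kill
    liveAttempts.clear();
    return toKill;                      // step 3: caller moves task to SUCCEEDED
  }

  public Integer getWinner() {
    return winner;
  }

  public static void main(String[] args) {
    WinnerSelection task = new WinnerSelection();
    task.attemptLaunched(0);
    task.attemptLaunched(1);            // speculative duplicate
    System.out.println("kill: " + task.attemptSucceeded(1));
    System.out.println("winner: " + task.getWinner());
  }
}
```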


Reading exercise

# Surface
sed -n '1,120p' tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java

# State machine block
grep -n "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head -40

# Retry logic
grep -n "addAttempt\|nextAttemptNumber\|createAttempt" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head

# Speculation hook
grep -rn "T_ADD_SPEC_ATTEMPT" tez-dag/src/main/java/org/apache/tez/dag/app/ | head

Answer:

  1. What is the precise condition for transitioning from RUNNING to FAILED on a T_ATTEMPT_FAILED event? Cite the line.
  2. Where is a new TaskAttemptImpl constructed? Is it a public method or private to TaskImpl?
  3. How does TaskImpl know whether a failed attempt should count toward the failure budget?
  4. In what state can T_ADD_SPEC_ATTEMPT arrive? What does the handler do?
  5. Why does TaskImpl not own its own scheduling? Who does?
  6. When a task succeeds with two parallel attempts, which one becomes the downstream input? How is the loser cleaned up?

Common bugs and symptoms

| Symptom | Root cause | Where to look |
|---|---|---|
| Task retries forever and never fails | max.failed.attempts set absurdly high; or all failures classified as "kill" | Check config; verify TerminationCause for each failure |
| Speculation kills the original just after it succeeds (lost work) | Race on markSuccessful and speculative-attempt kill | Ensure speculator backs off when task is completing |
| Task SUCCEEDED but a sibling attempt still appears as RUNNING for a long time | Container slow to acknowledge kill | Look at ContainerHeartbeatHandler and TA_KILL_REQUEST |
| Task reported as succeeded but downstream cannot fetch outputs | Race between TA_DONE and output ready event | Check ordering of outputReady umbilical calls |
| Recovery brings task back as RUNNING even though it had finished | Missing TaskFinishedEvent in recovery log | Investigate RecoveryService flush boundaries |

Validation: prove you understand this

  1. Draw the TaskImpl state machine from memory, including all six states.
  2. From TestTaskImpl, identify a test that drives a task to FAILED. Walk the events it sends.
  3. List the TaskAttemptTerminationCause values that do not count toward max.failed.attempts. Cite the enum and the consumer.
  4. Trace, line by line, what TaskImpl does when T_ATTEMPT_SUCCEEDED arrives for the second of two concurrent attempts.
  5. Modify TaskImpl to log the winner's attempt number explicitly at the INFO level. Run a MiniTezCluster job and observe.

TaskAttemptImpl Lifecycle

TaskAttemptImpl is the AM-side representation of a single execution attempt of a task. It owns the container assignment, the umbilical, the output commit decision, and — critically — the TaskAttemptTerminationCause that drives upstream retry decisions.

After this chapter you should be able to look at any TaskAttemptImpl state in an AM log and explain (a) what container holds it, (b) which umbilical calls have or have not landed, and (c) what its termination cause will be if it dies right now.


File

tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java

Tests:

tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskAttempt.java

Termination cause enum:

tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptTerminationCause.java

The states (typical 0.10.x naming)

| State | Meaning |
|---|---|
| NEW | Constructed; not yet given to scheduler |
| START_WAIT | Request sent to scheduler; awaiting container |
| SUBMITTED | Container allocated; awaiting launch ack (some versions) |
| RUNNING | Container launched; processor executing |
| SUCCEEDED | Terminal: TA_DONE received |
| KILL_IN_PROGRESS | Kill requested; awaiting confirmation |
| KILLED | Terminal: killed before/during execution |
| FAIL_IN_PROGRESS | Failure recognized; cleaning up |
| FAILED | Terminal: failed (counts against max.failed.attempts) |

Exact list varies by branch. Verify:

grep -n "TaskAttemptStateInternal\." \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head

Tez separates the external state (TaskAttemptState in tez-api, the coarse enum visible to ATS) from the richer internal state machine state. Mapping:

| Internal | External |
|---|---|
| NEW, START_WAIT, SUBMITTED, RUNNING | STARTING / RUNNING |
| SUCCEEDED | SUCCEEDED |
| KILL_IN_PROGRESS, KILLED | KILLED |
| FAIL_IN_PROGRESS, FAILED | FAILED |
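
The mapping is a pure function from the internal enum to the external one. The sketch below uses hypothetical enums that copy the names from the tables in this chapter; the split of the first row (pre-launch states report STARTING, RUNNING reports RUNNING) is an assumption, so check the real conversion in TaskAttemptImpl.

```java
// Hypothetical sketch of the internal-to-external collapse; not Tez code.
public class AttemptStateMapping {
  public enum Internal {
    NEW, START_WAIT, SUBMITTED, RUNNING, SUCCEEDED,
    KILL_IN_PROGRESS, KILLED, FAIL_IN_PROGRESS, FAILED
  }

  public enum External { STARTING, RUNNING, SUCCEEDED, KILLED, FAILED }

  // Collapse the fine-grained internal state into what outside observers see.
  public static External toExternal(Internal state) {
    switch (state) {
      case NEW:
      case START_WAIT:
      case SUBMITTED:
        return External.STARTING;   // assumed: anything before launch
      case RUNNING:
        return External.RUNNING;
      case SUCCEEDED:
        return External.SUCCEEDED;
      case KILL_IN_PROGRESS:
      case KILLED:
        return External.KILLED;     // cleanup is invisible externally
      case FAIL_IN_PROGRESS:
      case FAILED:
        return External.FAILED;
      default:
        throw new IllegalArgumentException("Unmapped state: " + state);
    }
  }

  public static void main(String[] args) {
    for (Internal s : Internal.values()) {
      System.out.println(s + " -> " + toExternal(s));
    }
  }
}
```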

State × event matrix (key transitions)

| State | Event | Next state | Notes |
|---|---|---|---|
| NEW | TA_SCHEDULE | START_WAIT | request container |
| START_WAIT | TA_STARTED | SUBMITTED/RUNNING | container launched |
| START_WAIT | TA_CONTAINER_TERMINATING | KILL_IN_PROGRESS | preemption before launch |
| RUNNING | TA_DONE | SUCCEEDED | done(...) umbilical call |
| RUNNING | TA_FAILED | FAIL_IN_PROGRESS | processor threw |
| RUNNING | TA_TIMED_OUT | FAIL_IN_PROGRESS | heartbeat exceeded tez.task.timeout-ms |
| RUNNING | TA_KILL_REQUEST | KILL_IN_PROGRESS | external kill |
| RUNNING | TA_CONTAINER_TERMINATED | FAIL_IN_PROGRESS / KILL_IN_PROGRESS | NM said container died |
| KILL_IN_PROGRESS | TA_CONTAINER_TERMINATED | KILLED | cleanup done |
| FAIL_IN_PROGRESS | TA_CONTAINER_TERMINATED | FAILED | cleanup done |

grep -c "addTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java

Container assignment

When a TaskAttempt becomes schedulable, the AM:

  1. Builds a ContainerRequest (resource, priority, locality).
  2. Hands it to TaskSchedulerManager.allocateTask(...).
  3. The scheduler (YarnTaskSchedulerService) eventually matches a granted container.
  4. The match drives an AMSchedulerEventTAEnded/...TALaunchRequest flow that updates the TaskAttemptImpl state.
  5. ContainerLauncherManager actually starts the JVM via NMClient.

grep -n "allocateTask\|deallocateTask\|AMSchedulerEvent" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head

The container is not assigned at construction; that's why the START_WAIT state exists. Some configurations short-circuit this via container reuse (the scheduler offers a free, idle container).

See container-reuse.md and scheduler.md.


Output commit rules (per attempt)

For attempts of vertices with an OutputCommitter:

| Condition | Who commits? |
|---|---|
| Output commits are at the task level (tez.am.commit-all-outputs-on-dag-success=false) | Each TaskAttemptImpl runs commit() from inside the task JVM (via processor) |
| Output commits are at the vertex level (default for MROutput) | Only the AM commits, after all tasks succeed (see vertex-lifecycle.md) |

Losing speculative attempts must not commit. The setOutputCommitted(true) flag on TaskAttemptImpl records who actually committed. The AM ensures exactly one attempt of each task has outputCommitted=true.

grep -n "outputCommitted\|commitOutput\|noCommit" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head

TaskAttemptTerminationCause — the policy enum

sed -n '1,200p' \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptTerminationCause.java

Categories (the exact enum is long):

| Cause | Counts as failure? | Typical trigger |
|---|---|---|
| TERMINATED_BY_CLIENT | No | User killed DAG |
| TERMINATED_AT_SHUTDOWN | No | AM shutting down |
| TERMINATED_INEFFECTIVE_SPECULATION | No | Lost the speculation race |
| INTERNAL_PREEMPTION | No | AM preempted it (e.g., for higher-priority work) |
| EXTERNAL_PREEMPTION | No | YARN preempted the container |
| CONTAINER_EXITED | Yes (default) | Container died mid-run |
| NODE_FAILED | Yes | NM died |
| TASK_HEARTBEAT_ERROR | Yes | Heartbeat timeout |
| OUTPUT_LOST | Yes | Downstream reported output gone (rerun) |
| APPLICATION_ERROR | Yes | Processor threw |

TaskImpl uses cause.causesFailure() (or equivalent) to decide whether to bump the failure counter.


Reading exercise

sed -n '1,160p' tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java
grep -n "TaskAttemptStateInternal\." \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head -20
grep -n "TerminationCause" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head -20

# Heartbeat timeout path
grep -n "TA_TIMED_OUT\|heartbeatTimeout" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head

Answer:

  1. What event arrives when an attempt's container heartbeat times out? What issues it?
  2. What is the difference between TA_FAILED and TA_CONTAINER_TERMINATED? When does each fire?
  3. Which TaskAttemptTerminationCause values are not counted toward tez.am.task.max.failed.attempts?
  4. In what state does an attempt sit during container provisioning?
  5. What does outputCommitted track, and how is it used by the AM to choose the canonical attempt?
  6. Why are there separate FAIL_IN_PROGRESS and FAILED states (likewise for kill)?

Common bugs and symptoms

| Symptom | Root cause | Where to look |
|---|---|---|
| Attempt stuck in START_WAIT for minutes | Scheduler can't satisfy locality/resource | TaskSchedulerManager log; relax locality |
| Attempt marked FAILED when container was preempted | TerminationCause set incorrectly | Check the TA_CONTAINER_TERMINATED handler |
| Two attempts both commit outputs (data corruption) | setOutputCommitted race; speculative commit | Run TestSpeculation; ensure committer is idempotent |
| TaskAttempt heartbeat timeout fires even though task was running | AM GC pause; clock skew | Tune AM heap; check NM/AM clock drift |
| Recovery comes back with all attempts FAILED | Recovery log lacks TaskAttemptStartedEvent for last attempt | Force flush before submitting next event |
| KILL_IN_PROGRESS lingers | TA_CONTAINER_TERMINATED never arrives | NM is dead; AM eventually times out container |

Validation: prove you understand this

  1. Without running code: given an attempt in RUNNING and event TA_CONTAINER_TERMINATED with cause INTERNAL_PREEMPTION, what is the next state and does the failure counter increment?
  2. From the enum, list every TaskAttemptTerminationCause and tag each "counts" / "does not count".
  3. Reproduce a heartbeat timeout on MiniTezCluster by suspending a task JVM. Identify the exact log line that transitions the attempt.
  4. Walk the path from TaskCommunicatorManager.heartbeat returning a LATEST_RESPONSE_TIMEOUT to TaskAttemptImpl.handle(TA_TIMED_OUT).
  5. Verify that a speculative-loser attempt does not corrupt counters by reading the kill-handler code.

State Machines

Tez's AM uses Hadoop's StateMachineFactory extensively: every long-lived entity (DAGImpl, VertexImpl, TaskImpl, TaskAttemptImpl, container state objects) is a state machine. This chapter explains the API, the dispatcher contract that keeps state machines correct, the AsyncDispatcher vs DrainDispatcher distinction, the common InvalidStateTransitonException bug class, and the discipline required to add a transition safely.

After this chapter you should be able to write a small state machine from scratch and review a transition-modifying patch.


The API

The factory lives in:

hadoop-yarn-common
  org/apache/hadoop/yarn/state/StateMachineFactory.java
  org/apache/hadoop/yarn/state/SingleArcTransition.java
  org/apache/hadoop/yarn/state/MultipleArcTransition.java
  org/apache/hadoop/yarn/state/InvalidStateTransitonException.java

(Yes, the exception is spelled Transiton in the Hadoop source — historical typo, preserved for compatibility. Greps that look for Transition will miss it.)

Skeleton:

private static final StateMachineFactory<MyEntity, MyState, MyEvtType, MyEvt>
    stateMachineFactory =
    new StateMachineFactory<MyEntity, MyState, MyEvtType, MyEvt>(MyState.NEW)

        // Single-arc: state, nextState, event, transition
        .addTransition(MyState.NEW, MyState.RUNNING,
            MyEvtType.START,
            new StartTransition())

        // Multiple-arc: state, set of possible next states, event, transition
        .addTransition(MyState.RUNNING,
            EnumSet.of(MyState.SUCCEEDED, MyState.FAILED),
            MyEvtType.DONE,
            new DoneTransition())

        // Self-loop: state, state, event, transition
        .addTransition(MyState.RUNNING, MyState.RUNNING,
            MyEvtType.HEARTBEAT,
            new HeartbeatTransition())

        // No-op self-loop with no transition object
        .addTransition(MyState.SUCCEEDED, MyState.SUCCEEDED,
            EnumSet.of(MyEvtType.HEARTBEAT))

        .installTopology();

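installTopology() returns the finalized factory, which is normally held in a static final field and shared by all instances. Each instance then makes its own state machine from it: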

private final StateMachine<MyState, MyEvtType, MyEvt> stateMachine =
    stateMachineFactory.make(this);

public void handle(MyEvt event) {
  writeLock.lock();
  try {
    MyState oldState = stateMachine.getCurrentState();
    try {
      stateMachine.doTransition(event.getType(), event);
    } catch (InvalidStateTransitonException e) {
      LOG.error("Invalid event " + event.getType() + " at " + oldState);
      // typically: re-throw or transition to ERROR
    }
  } finally {
    writeLock.unlock();
  }
}

Single-arc vs multiple-arc

| Concept | When to use | Implementation |
|---|---|---|
| SingleArcTransition<OPERAND, EVENT> | The next state is always the same | void transition(OPERAND op, EVENT event) |
| MultipleArcTransition<OPERAND, EVENT, STATE> | Next state depends on event content | STATE transition(OPERAND op, EVENT event) (returns the chosen state) |

You almost always start with SingleArcTransition. Promote to MultipleArcTransition only when the next state legitimately depends on runtime data (e.g., "if task count == 0 then SUCCEEDED else RUNNING").
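
To see the single-arc and multiple-arc ideas without any Hadoop dependency, the following sketch imitates the pattern with a lookup table. It is not StateMachineFactory: MiniStateMachine, its Arc class, and the integer exit code carried by each event are hypothetical, and an IllegalStateException stands in for InvalidStateTransitonException.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical, dependency-free imitation of the pattern; not Hadoop's class.
public class MiniStateMachine {
  public enum State { NEW, RUNNING, SUCCEEDED, FAILED }
  public enum EventType { START, DONE, HEARTBEAT }

  // A registered arc: the states it may end in, plus a hook that picks one.
  static final class Arc {
    final Set<State> allowed;
    final Function<Integer, State> hook;   // input: exit code carried by the event

    Arc(Set<State> allowed, Function<Integer, State> hook) {
      this.allowed = allowed;
      this.hook = hook;
    }
  }

  private final Map<State, Map<EventType, Arc>> table = new EnumMap<>(State.class);
  private State current = State.NEW;

  public MiniStateMachine() {
    // Single-arc: the next state is fixed.
    add(State.NEW, EventType.START, EnumSet.of(State.RUNNING), code -> State.RUNNING);
    // Multiple-arc: the hook inspects event data and returns the chosen state.
    add(State.RUNNING, EventType.DONE, EnumSet.of(State.SUCCEEDED, State.FAILED),
        code -> code == 0 ? State.SUCCEEDED : State.FAILED);
    // No-op self-loops, so a late heartbeat after success is tolerated.
    add(State.RUNNING, EventType.HEARTBEAT, EnumSet.of(State.RUNNING), code -> State.RUNNING);
    add(State.SUCCEEDED, EventType.HEARTBEAT, EnumSet.of(State.SUCCEEDED), code -> State.SUCCEEDED);
  }

  private void add(State from, EventType type, Set<State> allowed, Function<Integer, State> hook) {
    table.computeIfAbsent(from, s -> new EnumMap<EventType, Arc>(EventType.class))
        .put(type, new Arc(allowed, hook));
  }

  // Look up the (state, event) pair; an unregistered pair is a protocol error.
  public State handle(EventType type, int exitCode) {
    Map<EventType, Arc> arcs = table.get(current);
    Arc arc = (arcs == null) ? null : arcs.get(type);
    if (arc == null) {
      throw new IllegalStateException("Invalid event: " + type + " at " + current);
    }
    State next = arc.hook.apply(exitCode);
    if (!arc.allowed.contains(next)) {
      throw new IllegalStateException("Hook returned undeclared state " + next);
    }
    current = next;
    return current;
  }

  public State getState() {
    return current;
  }

  public static void main(String[] args) {
    MiniStateMachine sm = new MiniStateMachine();
    sm.handle(EventType.START, 0);
    sm.handle(EventType.DONE, 0);
    sm.handle(EventType.HEARTBEAT, 0);   // late heartbeat: registered no-op
    System.out.println(sm.getState());
  }
}
```

FAILED deliberately has no HEARTBEAT self-loop here, so a late heartbeat after a failure reproduces the missing-transition error class in miniature.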


Dispatcher contract

State machines are not thread-safe by themselves. Tez upholds correctness via the single-dispatcher-thread invariant:

  • All events for a DAGAppMaster's state machines flow through one AsyncDispatcher.
  • The dispatcher has one thread that pulls events and calls handle(event).
  • Therefore handlers run serially; no two handle() calls overlap.

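This invariant is the reason two transitions on VertexImpl never race each other. The write lock taken in handle() guards against concurrent readers on other threads (for example, status queries), not against other handlers. Break the invariant and you get races no test will catch consistently.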

grep -n "AsyncDispatcher\|GenericEventHandler" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

AsyncDispatcher vs DrainDispatcher

| Class | Where | Behavior |
|---|---|---|
| AsyncDispatcher | Production | Background thread; events processed asynchronously |
| DrainDispatcher | Tests | Same API; tests call await() to block until queue empty |

Tests use DrainDispatcher so they can assert state after a known set of events has been processed:

DrainDispatcher dispatcher = new DrainDispatcher();
dispatcher.register(VertexEventType.class, vertexEventHandler);
dispatcher.init(conf);
dispatcher.start();
dispatcher.getEventHandler().handle(new VertexEvent(...));
dispatcher.await();   // blocks until queue empty
assertEquals(VertexState.RUNNING, vertex.getState());

find . -name "DrainDispatcher.java"
grep -rn "new DrainDispatcher" tez-dag/src/test/java | head
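
Both the single-thread invariant and the await() idea can be imitated with plain JDK classes. The sketch below is not Hadoop's AsyncDispatcher or Tez's DrainDispatcher: MiniDispatcher and all of its methods are hypothetical, and its await() only covers events enqueued before the call, not follow-up events that handlers emit afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of a single-threaded dispatcher with a drain barrier.
public class MiniDispatcher<E> {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final Consumer<E> handler;
  private final Thread worker;

  public MiniDispatcher(Consumer<E> handler) {
    this.handler = handler;
    this.worker = new Thread(this::drainLoop, "mini-dispatcher");
    this.worker.setDaemon(true);
  }

  // One thread pulls one item at a time, so handler calls never overlap.
  private void drainLoop() {
    try {
      while (true) {
        queue.take().run();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();   // stop() was called
    }
  }

  public void start() {
    worker.start();
  }

  public void dispatch(E event) {
    queue.add(() -> handler.accept(event));
  }

  // Enqueue a marker behind everything already queued and wait for it to run.
  public void await() {
    CountDownLatch drained = new CountDownLatch(1);
    queue.add(drained::countDown);
    try {
      drained.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    }
  }

  public void stop() {
    worker.interrupt();
  }

  public static void main(String[] args) {
    List<String> seen = new ArrayList<>();   // written only by the dispatcher thread
    MiniDispatcher<String> dispatcher = new MiniDispatcher<String>(seen::add);
    dispatcher.start();
    dispatcher.dispatch("V_INIT");
    dispatcher.dispatch("V_START");
    dispatcher.await();                      // safe to read `seen` after this returns
    System.out.println(seen);
    dispatcher.stop();
  }
}
```

The unsynchronized ArrayList is safe here only because a single thread mutates it and the latch orders those writes before the read in main, which is the same argument the dispatcher contract makes for handler state.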

InvalidStateTransitonException

Thrown when doTransition(type, event) finds no registered handler for the (currentState, eventType) pair. The exception message has the form:

Invalid event: V_TASK_RESCHEDULED at SUCCEEDED

Common causes:

  1. A late event arrived after the entity reached a terminal state (race between cancellation and a completion event).
  2. A new code path emits an event but the receiving state machine forgot to register a handler.
  3. The event sender misunderstood the protocol.

Fixing one of these almost always requires:

  • Adding a (state, eventType, sameState) no-op transition (case 1).
  • Adding a real transition (case 2).
  • Removing the bogus emit (case 3).

Never silently catch and swallow the exception in production code: it indicates a real protocol violation. Do not let it escape the handler either, because an unhandled exception on the dispatch thread is a worse outcome than a graceful error. Log it and move the entity to an error state.


How to add a transition safely

The process to follow when modifying a state machine:

  1. Find the existing transitions for the state — read all addTransition(STATE, ...) lines.
  2. Identify the gap — confirm the event is not already handled.
  3. Add the transition where the file's existing ordering puts it (the factory blocks group transitions by source state).
  4. Add a unit test to the corresponding Test*Impl class that triggers the new event in the relevant state.
  5. Update related no-op transitions for terminal states (a new event needs no-op handlers in SUCCEEDED, FAILED, KILLED).
  6. Run all tests in the module before opening a PR.

The discipline "always update the test in the same patch" is enforced by reviewers. PRs that change VertexImpl without changes to TestVertexImpl are typically blocked.


Reading exercise

# Find the factory blocks
grep -n "stateMachineFactory" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/*.java

# Count transitions per entity
for f in tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java \
         tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java \
         tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java \
         tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java; do
  echo "$f $(grep -c addTransition $f)"
done

# Look at one transition impl
grep -n "class StartTransition\|class InitTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head

Answer:

  1. Why is the exception named InvalidStateTransitonException (with a typo)? What would happen if you renamed it?
  2. Which Tez class uses MultipleArcTransition most heavily, and why?
  3. What does installTopology() return, and why is the factory typically a static final field?
  4. In TestVertexImpl, find the DrainDispatcher.await() calls. Why are they essential and what failure mode occurs if you forget?
  5. If two threads call vertexImpl.handle(event) concurrently — bypassing the dispatcher — what specific bug class arises?
  6. Read one MultipleArcTransition and explain how its return value determines the next state.

Common bugs and symptoms

| Symptom | Root cause | Fix |
|---|---|---|
| InvalidStateTransitonException: Invalid event: X at TERMINAL_STATE | Late event after terminal state | Add a no-op transition |
| Test passes locally, fails on CI intermittently | DrainDispatcher.await() missing or called too early | Always call await() between event sends and asserts |
| State machine mutates wrong fields | Transition class accidentally captures outer state | Make transition classes static; pass everything via the event |
| Dispatcher thread deadlocks | Handler is doing blocking I/O on dispatch thread | Move I/O to a worker; emit a follow-up event when done |
| addTransition for a no-op fails to compile | Wrong arity overload | Use the variant with EnumSet<EventType> |
| Adding a transition silently breaks recovery | Recovery replay hits the new event in an old state | Cover the recovery test path; recovery uses the same SM |

Validation: prove you understand this

  1. Implement a Light state machine with states OFF, ON, BROKEN and events TOGGLE, BREAK. Compile and unit-test.
  2. Find every SingleArcTransition in VertexImpl that is registered as static final — explain why static.
  3. Take an InvalidStateTransitonException from a real AM log; map it to the exact (state, event) pair and propose either a fix or a JIRA.
  4. Run TestVertexImpl#testKilledTasksHandling. Identify every DrainDispatcher.await() call and what it guards.
  5. Add a (SUCCEEDED, T_HEARTBEAT, SUCCEEDED) no-op to TaskImpl and the corresponding test in TestTaskImpl. Ensure all tests pass.

Event Routing

Events are the only sanctioned API for mutating any AM-side entity. This chapter catalogs the event hierarchy, explains the "events are the only mutation API" rule, walks how a single task-completion percolates up to the DAG, and shows where each event is registered and dispatched.

After this chapter you should be able to trace any state transition in the AM back through the chain of events that caused it.


The hierarchy

hadoop-yarn-common
  org/apache/hadoop/yarn/event/AbstractEvent<EVT_TYPE>
  org/apache/hadoop/yarn/event/EventHandler<E>
  org/apache/hadoop/yarn/event/AsyncDispatcher

tez-dag
  org/apache/tez/dag/app/dag/event/
    DAGEvent (subclasses: DAGEventStart, DAGEventDAGAttemptStarted, ...)
    VertexEvent (subclasses: VertexEventTaskCompleted, VertexEventVertexCompleted, ...)
    TaskEvent (subclasses: TaskEventTAUpdate, TaskEventTermination, ...)
    TaskAttemptEvent (subclasses: TaskAttemptEventStartedRemotely, ...)
    AMSchedulerEvent
    AMContainerEvent
    AMNodeEvent
    SpeculatorEvent
    ...

Hint to grep all event classes:

find tez-dag/src/main/java/org/apache/tez/dag/app -path "*event*" -name "*.java" \
  | xargs grep -l "extends AbstractEvent\|extends DAGEvent\|extends VertexEvent" \
  | head -30

The AbstractEvent<E> base has two fields: an event type (enum) and a timestamp. Concrete event classes add payloads (e.g., VertexEventTaskCompleted carries the TezTaskID and the task's final TaskState).


The "events are the only mutation API" rule

This rule is the bedrock of correctness:

Any change to the externally observable state of a DAGImpl, VertexImpl, TaskImpl, or TaskAttemptImpl must occur inside a state-machine transition handler, triggered by an event that flowed through the AsyncDispatcher.

Concretely:

  • Never call a setter directly on VertexImpl from another thread.
  • Never have one entity reach into another and mutate. Send an event.
  • The only "side door" is read-only getters (intentionally not synchronized; callers tolerate slight staleness).

Why this rule:

  1. Concurrency safety: the dispatcher serializes everything. Direct mutation re-introduces races.
  2. Auditability: events appear in the AM log; field writes do not.
  3. Recoverability: RecoveryService writes events; replay rebuilds state. Mutations outside events are invisible to recovery.
  4. Testability: DrainDispatcher controls the world; bypass it and tests become non-deterministic.

A patch that calls a mutator method outside a transition handler is, by convention, immediately rejected.
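The concurrency argument can be illustrated with a small sketch. It assumes a hypothetical MiniDispatcher and CounterEntity (neither is a Tez or YARN class): any thread may enqueue, but only the single dispatch thread ever touches the entity, so the counter needs no lock on its write path.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical entity: its counter is only ever written by the dispatch thread.
final class CounterEntity {
  private int completed; // plain field, no synchronization on the write path

  void handle(String event) {
    if ("TASK_COMPLETED".equals(event)) {
      completed++;
    }
  }

  int getCompleted() { return completed; }
}

// Hypothetical stand-in for a single-threaded async dispatcher.
final class MiniDispatcher {
  private static final String STOP = "__STOP__";
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Thread worker;

  MiniDispatcher(CounterEntity entity) {
    worker = new Thread(() -> {
      try {
        for (String e = queue.take(); !STOP.equals(e); e = queue.take()) {
          entity.handle(e); // the only place the entity is mutated
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
  }

  // Safe to call from any thread: it only enqueues.
  void send(String event) { queue.add(event); }

  // Processes everything sent so far, then stops the worker.
  void stopAndDrain() {
    queue.add(STOP);
    try {
      worker.join();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }
}
```

If the senders called entity.handle(...) directly instead of send(...), the unsynchronized increment would race and the final count could come up short.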


Bubble-up: a task completion to the DAG

sequenceDiagram
    participant TA as TaskAttemptImpl
    participant T as TaskImpl
    participant V as VertexImpl
    participant D as DAGImpl
    participant DI as Dispatcher

    Note over TA: heartbeat -> done(...) on umbilical
    TA->>TA: handle(TA_DONE)
    TA-->>DI: emit T_ATTEMPT_SUCCEEDED
    DI->>T: handle(T_ATTEMPT_SUCCEEDED)
    T->>T: mark winner; check siblings
    T-->>DI: emit V_TASK_COMPLETED (success)
    DI->>V: handle(V_TASK_COMPLETED)
    V->>V: bump succeededTaskCount
    alt All tasks done
        V-->>DI: emit V_COMMIT_REQUEST (if applicable)
        V-->>DI: emit DAG_VERTEX_COMPLETED
        DI->>D: handle(DAG_VERTEX_COMPLETED)
        D->>D: bump succeededVertexCount
    end

Every handle(...) arrow is a state-machine transition. Every emit is an eventHandler.handle(...) call made inside the transition body.

Find the emit sites:

grep -n "eventHandler.handle\b" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
grep -n "eventHandler.handle\b" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head
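To make the bubble-up concrete, here is a deliberately simplified, synchronous model of the chain in the diagram. The BubbleUp class is hypothetical; the event names are copied from the diagram, and the commit step is left out. It is a sketch of the event flow only, not of the real transition classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model: one vertex with a fixed number of tasks, one attempt per task.
final class BubbleUp {
  private final Queue<String> dispatcher = new ArrayDeque<>();
  final List<String> trace = new ArrayList<>();
  private final int tasksInVertex;
  private int succeededTaskCount;
  private int succeededVertexCount;

  BubbleUp(int tasksInVertex) { this.tasksInVertex = tasksInVertex; }

  // Entry point: an attempt reports done; everything else follows from events.
  void attemptDone() {
    dispatcher.add("TA_DONE");
    drain();
  }

  private void drain() {
    for (String e = dispatcher.poll(); e != null; e = dispatcher.poll()) {
      trace.add(e);
      switch (e) {
        case "TA_DONE":
          dispatcher.add("T_ATTEMPT_SUCCEEDED"); // attempt -> task
          break;
        case "T_ATTEMPT_SUCCEEDED":
          dispatcher.add("V_TASK_COMPLETED");    // task -> vertex
          break;
        case "V_TASK_COMPLETED":
          if (++succeededTaskCount == tasksInVertex) {
            dispatcher.add("DAG_VERTEX_COMPLETED"); // vertex -> DAG
          }
          break;
        case "DAG_VERTEX_COMPLETED":
          succeededVertexCount++;
          break;
        default:
          throw new IllegalStateException("unhandled event " + e);
      }
    }
  }

  int succeededVertexCount() { return succeededVertexCount; }
}
```

With two tasks in the vertex, the first attemptDone() stops at V_TASK_COMPLETED; only the second one reaches DAG_VERTEX_COMPLETED.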

Where events are registered

Registrations live in DAGAppMaster.serviceInit (see dag-app-master.md):

grep -n "dispatcher.register\|register(.*\.class" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

Each registration maps an event type to a handler. Most handlers are inner classes that delegate to entity.handle(event):

private class TaskEventDispatcher implements EventHandler<TaskEvent> {
  @Override
  public void handle(TaskEvent event) {
    DAG dag = context.getCurrentDAG();
    Task task = dag.getVertex(event.getTaskID().getVertexID())
                   .getTask(event.getTaskID());
    ((EventHandler<TaskEvent>) task).handle(event);
  }
}

Why the indirection: events carry IDs, not object references. The dispatcher handler does the resolve, then forwards.
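The resolve-then-forward step can be sketched in isolation. SketchTask and TaskEventResolver below are hypothetical names, and plain strings stand in for TezTaskID; the point is only that the lookup happens at dispatch time, so the emitter never needs a reference to the target object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical task entity; it just counts the events it was handed.
final class SketchTask {
  int eventsSeen;

  void handle(String eventType) { eventsSeen++; }
}

// Hypothetical resolver: looks the target up at dispatch time, not at emit time.
final class TaskEventResolver {
  private final Map<String, SketchTask> tasksById = new HashMap<>();

  void register(String taskId, SketchTask task) { tasksById.put(taskId, task); }

  // Returns false when the ID does not resolve to a registered task.
  boolean dispatch(String taskId, String eventType) {
    SketchTask task = tasksById.get(taskId);
    if (task == null) {
      return false;
    }
    task.handle(eventType);
    return true;
  }
}
```

How an unresolvable ID should be treated (log, drop, fail) is a design decision this sketch does not take a position on.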


Per-entity event types

| Entity | Event type enum | Where emitted |
| --- | --- | --- |
| DAGImpl | DAGEventType | Vertex completions, kill, recovery |
| VertexImpl | VertexEventType | Task completions, manager callbacks, root input events |
| TaskImpl | TaskEventType | Attempt completions, speculation, kill |
| TaskAttemptImpl | TaskAttemptEventType | Container events, umbilical events |
| TaskSchedulerManager | AMSchedulerEventType | New requests, completions, container availability |
| AMContainerImpl | AMContainerEventType | Launch, assignment, completion |
| HistoryEventHandler | HistoryEventType | Any history-loggable change |

Each enum lives next to the event class:

ls tez-dag/src/main/java/org/apache/tez/dag/app/dag/event/*EventType.java

Reading exercise

# Catalog
find tez-dag/src/main/java/org/apache/tez/dag/app -name "*Event.java" \
  | head -40

# Find a transition that emits other events
grep -B2 -A15 "class CommitCompletedTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

# Find AMSchedulerEvent emit sites
grep -rn "AMSchedulerEvent" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/ | head

# Compare: emit vs direct mutation
grep -n "eventHandler.handle" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | wc -l

Answer:

  1. Why do events carry IDs (e.g., TezTaskID) rather than object references?
  2. Find an event that crosses subsystems: e.g., TaskAttemptImpl emitting an AMSchedulerEvent. What is the receiver and what action does it take?
  3. List the four classes of events that VertexImpl.handle reacts to and the three classes it emits.
  4. How does the AM ensure ordering when multiple events for the same entity are emitted in quick succession?
  5. What happens if a transition handler throws an uncaught exception? Which thread catches it?
  6. Find one event that has no consumers (dead code). If you find one, propose its removal in a JIRA.

Common bugs and symptoms

| Symptom | Root cause | Where to look |
| --- | --- | --- |
| Inconsistent state visible to two getters | Direct mutation outside dispatcher | Audit for setters called from non-handler code |
| Event "lost": entity never sees it | Forgot to register handler in DAGAppMaster.serviceInit | Add registration; add unit test |
| Replay during recovery diverges from original run | An event was emitted but not recorded (recovery log gap) | RecoveryService writer filter |
| Deadlock when one entity event handler tries to read another entity | Reader path uses a lock held elsewhere | Prefer event-emit over cross-entity reads |
| Test hangs in DrainDispatcher.await() | Transition emitted an event of a type with no handler in test | Register the missing handler (no-op is fine) |
| One subsystem floods the dispatcher | Storm of small events (e.g., per-heartbeat) | Batch in the emitter; or upgrade to a separate dispatcher |

Validation: prove you understand this

  1. Pick one transition in TaskAttemptImpl and trace every event it emits; for each, name the receiving entity.
  2. Open DAGAppMaster and list every event type registered, in order.
  3. Walk a V_KILL from DAGImpl.killDAG down to a TaskAttemptImpl actually shutting down its container.
  4. Write a unit test that triggers a transition with an event whose payload is malformed; verify the dispatcher logs the error without crashing.
  5. Explain why moving from AsyncDispatcher to a multi-threaded dispatcher would break Tez and what would have to change to support it.

IPO Abstractions

Input, Processor, Output (collectively "IPO") are the three core runtime contracts. A Tez task is built from one processor, zero or more inputs, and zero or more outputs. This chapter walks the abstractions, the distinction between the LogicalInput/LogicalOutput layer (rich, modern) and the plain Input/Output layer (used for raw byte pipelines), the lifecycle methods, merged inputs, root vs intermediate inputs, and the minimum skeleton needed to write a new input or output.

After this chapter you should be able to read any concrete IPO class in tez-runtime-library and explain what each lifecycle method is for.


The interfaces

tez-api/src/main/java/org/apache/tez/runtime/api/
  Input.java
  Output.java
  Processor.java
  LogicalInput.java                          (extends Input)
  LogicalOutput.java                         (extends Output)
  LogicalIOProcessor.java                    (extends Processor)
  AbstractLogicalInput.java                  (base class for custom inputs)
  AbstractLogicalOutput.java
  AbstractLogicalIOProcessor.java
  Reader.java                                (the byte/record stream interface)
  Writer.java
  MergedLogicalInput.java                    (combines multiple inputs)
  InputContext.java
  OutputContext.java
  ProcessorContext.java
  Event.java                                 (DataMovementEvent, etc.)

grep -n "^public " tez-api/src/main/java/org/apache/tez/runtime/api/LogicalInput.java

Plain Input/Output vs LogicalInput/LogicalOutput

| Layer | Class | Why it exists |
| --- | --- | --- |
| Low-level | Input | Bare contract: provides a Reader |
| Low-level | Output | Bare contract: provides a Writer |
| High-level | LogicalInput | Adds events, lifecycle, knowledge of upstream completion |
| High-level | LogicalOutput | Adds events (to AM and downstream) |

Almost all production inputs/outputs are LogicalInput/LogicalOutput. The plain layer exists for primitive byte-stream cases (rarely used directly).


Lifecycle methods (LogicalInput)

public abstract class AbstractLogicalInput implements LogicalInput {
  // Called by the runtime when the task starts. Setup; no I/O yet.
  public abstract List<Event> initialize() throws Exception;

  // Called after `initialize` for *all* inputs has completed.
  // Begin actively pulling data.
  public abstract void start() throws Exception;

  // The processor calls this to get a Reader for this input.
  public abstract Reader getReader() throws Exception;

  // Handle data movement / control events from the AM (e.g., upstream task done).
  public abstract void handleEvents(List<Event> inputEvents) throws Exception;

  // Final cleanup; close streams; return any final events.
  public abstract List<Event> close() throws Exception;
}

Order in a task's life:

constructor(InputContext, physicalInputCount) -> initialize -> start -> getReader -> close

initialize returns events to the AM (for example, InputInitializerEvents that ask the AM to do more split work). Most inputs return an empty list.
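The phase ordering across several inputs can be sketched with a recording stub. RecordingInput and LifecycleRunner are hypothetical, and the sketch simplifies on purpose: it ignores handleEvents and does not model who triggers start() in the real runtime. It shows only the ordering claim that every input is initialized before any input is started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input; it only records lifecycle calls.
final class RecordingInput {
  private final String name;
  private final List<String> log;

  RecordingInput(String name, List<String> log) {
    this.name = name;
    this.log = log;
  }

  void initialize() { log.add(name + ".initialize"); }
  void start()      { log.add(name + ".start"); }
  void getReader()  { log.add(name + ".getReader"); }
  void close()      { log.add(name + ".close"); }
}

final class LifecycleRunner {
  // Drives the phases in the order the text describes.
  static List<String> run(String... inputNames) {
    List<String> log = new ArrayList<>();
    List<RecordingInput> inputs = new ArrayList<>();
    for (String n : inputNames) {
      inputs.add(new RecordingInput(n, log));
    }
    for (RecordingInput in : inputs) in.initialize(); // all inputs first
    for (RecordingInput in : inputs) in.start();      // only then start pulling
    for (RecordingInput in : inputs) in.getReader();  // the processor would read here
    for (RecordingInput in : inputs) in.close();
    return log;
  }
}
```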


Lifecycle methods (LogicalOutput)

Mirror of input:

public abstract class AbstractLogicalOutput implements LogicalOutput {
  public abstract List<Event> initialize() throws Exception;
  public abstract void start() throws Exception;
  public abstract Writer getWriter() throws Exception;
  public abstract void handleEvents(List<Event> outputEvents) throws Exception;
  public abstract List<Event> close() throws Exception;
}

The close of an output is the most consequential call: it flushes pending data and returns the events (a CompositeDataMovementEvent, and a VertexManagerEvent where applicable) that tell the AM, and through it the downstream vertices, what this output produced.


Root inputs vs intermediate inputs

| Kind | Source of data | Initializer runs where? |
| --- | --- | --- |
| Root input | External (HDFS, HBase, Kafka) | AM-side: InputInitializer enumerates splits, emits InputDataInformationEvents |
| Intermediate input | Upstream Tez vertex output | No initializer; data arrives via DataMovementEvent from the AM |

MRInput is the canonical root input. Its AM-side initializer (MRInputAMSplitGenerator) calls InputFormat.getSplits(...) and pushes the resulting splits to tasks.

Intermediate inputs (e.g., OrderedGroupedKVInput) receive their data descriptors from the AM via DataMovementEvents — one event per upstream task completion, carrying the upstream task's location and partition.


MergedLogicalInput

When a vertex has multiple physical inputs that should look like one to the processor (e.g., a vertex group union), Tez wraps them in a MergedLogicalInput:

grep -n "MergedLogicalInput\|getInputs\|getReader" \
  tez-api/src/main/java/org/apache/tez/runtime/api/MergedLogicalInput.java

The processor calls getReader() once; the merged input combines all underlying readers. Common subclasses live in tez-runtime-library:

  • OrderedGroupedMergedKVInput: merges K/V streams while preserving sort order.
  • ConcatenatedMergedKeyValueInput: concatenates the underlying readers one after another.
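The concatenating variant is the easier of the two to picture. The sketch below uses plain java.util.Iterator in place of Tez readers, and ConcatenatedReader is a hypothetical name; it shows the idea of presenting several readers as one, not the library's implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical: presents several readers as one by exhausting them in order.
final class ConcatenatedReader implements Iterator<String> {
  private final Iterator<Iterator<String>> sources;
  private Iterator<String> current;

  ConcatenatedReader(List<Iterator<String>> readers) {
    this.sources = readers.iterator();
  }

  @Override
  public boolean hasNext() {
    // Skip over exhausted (or empty) readers until one has data or none remain.
    while ((current == null || !current.hasNext()) && sources.hasNext()) {
      current = sources.next();
    }
    return current != null && current.hasNext();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

An order-preserving merge would instead need to compare the head record of every underlying reader on each step, typically with a priority queue.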

Events flowing between AM and task

| Event class | Direction | Carries |
| --- | --- | --- |
| DataMovementEvent | AM → task input | Source task index, source URL/path, partition |
| InputReadErrorEvent | task input → AM | "This source URL is broken, please re-route" |
| CompositeDataMovementEvent | task output → AM (then forwarded) | Bulk version of DataMovementEvent |
| InputDataInformationEvent | AM → task input | Concrete split (root inputs only) |
| InputInitializerEvent | task → AM (initializer) | Custom signal to the initializer |
| VertexManagerEvent | task output → AM (vertex manager) | Stats for auto-parallelism (ShuffleVertexManager) |

ls tez-api/src/main/java/org/apache/tez/runtime/api/events/

Minimal LogicalInput skeleton

package com.example;

import org.apache.tez.runtime.api.AbstractLogicalInput;
import org.apache.tez.runtime.api.Event;
import org.apache.tez.runtime.api.InputContext;
import org.apache.tez.runtime.api.Reader;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class HelloLogicalInput extends AbstractLogicalInput {

  public HelloLogicalInput(InputContext ctx, int physicalInputCount) {
    super(ctx, physicalInputCount);
  }

  @Override
  public List<Event> initialize() throws IOException {
    // Allocate resources here. Do not do I/O.
    return Collections.emptyList();
  }

  @Override
  public void start() throws IOException {
    // Begin background fetch threads if any.
  }

  @Override
  public Reader getReader() throws IOException {
    // Simplest possible Reader: an empty anonymous subclass that exposes no records.
    // Real inputs return a typed reader (for example a KeyValueReader).
    return new Reader() {};
  }

  @Override
  public void handleEvents(List<Event> events) throws IOException {
    // Receive DataMovementEvents from the AM. Build internal routing.
  }

  @Override
  public List<Event> close() throws IOException {
    // Release resources; this skeleton has no final events to report.
    return Collections.emptyList();
  }
}

Real implementations to read for reference:

find tez-runtime-library/src/main/java -name "OrderedGrouped*Input*.java"
find tez-runtime-library/src/main/java -name "Unordered*Input.java"

Reading exercise

sed -n '1,140p' tez-api/src/main/java/org/apache/tez/runtime/api/LogicalInput.java
sed -n '1,140p' tez-api/src/main/java/org/apache/tez/runtime/api/LogicalOutput.java
grep -rn "extends AbstractLogicalInput" tez-runtime-library/src/main/java | head
grep -rn "extends AbstractLogicalOutput" tez-runtime-library/src/main/java | head

# Event flow
ls tez-api/src/main/java/org/apache/tez/runtime/api/events/

Answer:

  1. What is the ordering guarantee between initialize() calls across the multiple inputs/outputs of a task?
  2. When does start() get called relative to getReader()?
  3. What's the difference in return semantics between getReader() of a LogicalInput vs a MergedLogicalInput?
  4. Find one concrete LogicalOutput; identify what event types its close() returns and what downstream effect each has.
  5. Why does initialize() return List<Event> instead of void?
  6. What is the difference between InputInitializerEvent and InputDataInformationEvent? Who emits each?

Common bugs and symptoms

| Symptom | Root cause | Where to look |
| --- | --- | --- |
| Task hangs in getReader() | Input's start() never returned; deadlock with handler | Always make start() non-blocking |
| NullPointerException in handleEvents | Events arrived before initialize(); you're using a field not yet set | Allocate state in initialize() |
| Downstream sees half the data | close() returned Collections.emptyList() when it should have emitted DME | Always emit completion events |
| Custom input never receives DataMovementEvents | EdgeManager on the upstream side not aware of your partitioning | Check edge property OutputDescriptor matches your InputDescriptor |
| Root input never starts | Initializer's handleInputInitializerEvent not implemented | Provide a default; never silently drop |
| Task succeeds but produces no output | Writer was never flushed (forgot close()) | Verify with IFile size = 0 in logs |

Validation: prove you understand this

  1. Write a minimal LogicalInput that produces 100 fixed strings via its Reader. Wire it into a one-vertex DAG and run on MiniTezCluster.
  2. From OrderedGroupedKVInput, identify exactly when handleEvents is called and what it does with each event.
  3. List the event classes in org.apache.tez.runtime.api.events (use the ls output from the reading exercise) and state the direction each one travels.
  4. Diagram the events flowing from one upstream task's LogicalOutput.close() to a downstream task's LogicalInput.handleEvents().
  5. Explain why initialize() is split from start() rather than collapsed into a single method.

Logical vs Physical Plan

Tez exposes two planes to the user and to the runtime:

  • Logical plan — what the application author writes: vertices, edges, edge properties. Lives in tez-api. Immutable once submitted (mostly).
  • Physical plan — what the AM actually schedules: task instances per vertex, per-edge routing decisions, container assignments. Lives in tez-dag. Mutable at runtime via VertexManager reconfiguration and EdgeManagerPlugin routing.

This chapter walks the boundary between them.


The logical plane

A logical DAG is a DAG object containing Vertex objects connected by Edge objects, each carrying an EdgeProperty.

ls tez-api/src/main/java/org/apache/tez/dag/api/ | head -30

Key classes:

| Class | File | Purpose |
| --- | --- | --- |
| DAG | tez-api/src/main/java/org/apache/tez/dag/api/DAG.java | The DAG builder. |
| Vertex | tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java | Logical vertex with processor + target parallelism. |
| Edge | tez-api/src/main/java/org/apache/tez/dag/api/Edge.java | Logical edge between two vertices. |
| EdgeProperty | tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java | Routing + scheduling + storage characteristics. |

EdgeProperty — four orthogonal axes

grep -n "enum DataMovementType\|enum DataSourceType\|enum SchedulingType" \
  tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java

public enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM }
public enum DataSourceType   { PERSISTED, PERSISTED_RELIABLE, EPHEMERAL }
public enum SchedulingType   { SEQUENTIAL, CONCURRENT }

| Axis | Values | What it controls |
| --- | --- | --- |
| DataMovementType | ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM | How source outputs map to destination inputs. |
| DataSourceType | PERSISTED, PERSISTED_RELIABLE, EPHEMERAL | Whether outputs survive a task failure; affects re-execution policy. |
| SchedulingType | SEQUENTIAL, CONCURRENT | Whether destination can start before source completes (required for pipelined shuffle and broadcast). |
| OutputDescriptor / InputDescriptor | class names + payloads | The IO classes wired on each side of the edge. |

A logical edge says nothing about which destination task index reads from which source task index. That decision is the EdgeManagerPlugin.


The physical plane

When the AM initializes a DAG it builds:

  • VertexImpl per logical Vertex (tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java)
  • TaskImpl[] per vertex, sized by parallelism (tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java)
  • TaskAttemptImpl per attempt of each task (tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java)
  • Edge runtime objects with an active EdgeManagerPlugin (tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java)

wc -l tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/{VertexImpl,TaskImpl,TaskAttemptImpl,Edge}.java

Mapping logical to physical

flowchart LR
  subgraph logical[Logical]
    LV1[Vertex A parallelism=3]
    LV2[Vertex B parallelism=2]
    LV1 -- "EdgeProperty SCATTER_GATHER" --> LV2
  end
  subgraph physical[Physical]
    A0[A.0] --> B0[B.0]
    A0 --> B1[B.1]
    A1[A.1] --> B0
    A1 --> B1
    A2[A.2] --> B0
    A2 --> B1
  end
  logical --> physical

Every source attempt produces partitions for every destination task. The EdgeManager decides which output partition goes to which input.


EdgeManagerPlugin — the routing brain

find tez-api/src/main/java -name "EdgeManagerPlugin.java"
cat tez-api/src/main/java/org/apache/tez/dag/api/EdgeManagerPlugin.java

The contract (paraphrased):

public abstract class EdgeManagerPlugin {
  public abstract void routeDataMovementEventToDestination(
      DataMovementEvent event,
      int srcTaskIndex,
      int outputIndex,
      Map<Integer, List<Integer>> destTaskAndInputIndices);

  public abstract int getNumDestinationConsumerTasks(int srcTaskIndex);
  public abstract int getDestinationConsumerTaskNumber(int srcTaskIndex,
                                                       int srcOutputIndex);
  public abstract int getNumDestinationTaskPhysicalInputs(int destTaskIndex);
  public abstract int getNumSourceTaskPhysicalOutputs(int srcTaskIndex);
}

Built-in implementations

find tez-dag/src/main/java -name "*EdgeManager*.java"

| Plugin | DataMovementType | Routing rule |
| --- | --- | --- |
| ScatterGatherEdgeManager | SCATTER_GATHER | Source task i produces N partitions; destination d reads partition d from every source. |
| BroadcastEdgeManager | BROADCAST | Every source output is consumed by every destination task. |
| OneToOneEdgeManager | ONE_TO_ONE | Requires srcParallelism == destParallelism. Source i → destination i. |
| User-supplied | CUSTOM | Anything. Cartesian product, range partitioning, etc. |

Inspecting routing on a live AM

grep -n "EdgeManager\|setCustomEdgeManager" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java | head -20

For each destination task Edge.sendTezEventToDestinationTasks() consults the plugin to expand source outputs into per-destination input events. The destination task receives a DataMovementEvent per logical input partition.


SCATTER_GATHER walkthrough

Source: A with parallelism 3, each task emits N partitions. Destination: B with parallelism 2.

cat tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/ScatterGatherEdgeManager.java

For source task A.1 emitting partitions [0, 1]:

| Call | Returns |
| --- | --- |
| getNumSourceTaskPhysicalOutputs(1) | 2 (= destination parallelism) |
| getNumDestinationTaskPhysicalInputs(0) | 3 (= source parallelism) |
| getNumDestinationConsumerTasks(1) | 2 |
| routeDataMovementEventToDestination(event, 1, 0, out) | out = { 0 -> [1] } |
| routeDataMovementEventToDestination(event, 1, 1, out) | out = { 1 -> [1] } |

Invariant: numSrcOutputs == destParallelism, numDestInputs == srcParallelism.
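The routing rule and the invariant can be checked with a few lines of plain Java. ScatterGatherSketch is a hypothetical model written for this walkthrough, not the Tez class, and its method names are shortened on purpose. It encodes "partition p of source task s goes to destination task p, at input index s".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scatter-gather rule; not the Tez implementation.
final class ScatterGatherSketch {
  private final int srcParallelism;
  private final int destParallelism;

  ScatterGatherSketch(int srcParallelism, int destParallelism) {
    this.srcParallelism = srcParallelism;
    this.destParallelism = destParallelism;
  }

  // One output partition per destination task.
  int numSourceTaskPhysicalOutputs() { return destParallelism; }

  // One physical input per source task.
  int numDestinationTaskPhysicalInputs() { return srcParallelism; }

  // Returns { destinationTaskIndex -> [inputIndex] } for one source output.
  Map<Integer, List<Integer>> route(int srcTaskIndex, int outputIndex) {
    if (srcTaskIndex < 0 || srcTaskIndex >= srcParallelism
        || outputIndex < 0 || outputIndex >= destParallelism) {
      throw new IndexOutOfBoundsException(
          "src=" + srcTaskIndex + ", output=" + outputIndex);
    }
    Map<Integer, List<Integer>> out = new HashMap<>();
    out.put(outputIndex, Collections.singletonList(srcTaskIndex));
    return out;
  }
}
```

For A with parallelism 3 and B with parallelism 2, route(1, 0) yields { 0 -> [1] } and route(1, 1) yields { 1 -> [1] }, matching the table rows.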


ONE_TO_ONE walkthrough

cat tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/OneToOneEdgeManager.java

Requires numSrcTasks == numDestTasks. Each source produces exactly one partition consumed by exactly one destination of the same index.

Common bug: changing destination parallelism via reconfigureVertex while a ONE_TO_ONE edge feeds it. Tez throws at edge initialization.
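A minimal sketch of the same precondition, using a hypothetical OneToOneSketch class (not the Tez implementation; the exception type and message are illustrative only):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one-to-one routing with its parallelism precondition.
final class OneToOneSketch {
  private final int parallelism;

  OneToOneSketch(int srcParallelism, int destParallelism) {
    // Both sides must agree, otherwise "source i -> destination i" is undefined.
    if (srcParallelism != destParallelism) {
      throw new IllegalStateException(
          "srcParallelism " + srcParallelism + " != destParallelism " + destParallelism);
    }
    this.parallelism = srcParallelism;
  }

  // Source i feeds destination i; each destination has a single input (index 0).
  Map<Integer, List<Integer>> route(int srcTaskIndex) {
    if (srcTaskIndex < 0 || srcTaskIndex >= parallelism) {
      throw new IndexOutOfBoundsException("src=" + srcTaskIndex);
    }
    Map<Integer, List<Integer>> out = new HashMap<>();
    out.put(srcTaskIndex, Collections.singletonList(0));
    return out;
  }
}
```

Constructing OneToOneSketch(100, 17) fails immediately, which is the shape of failure the common bug above produces when only one side of the edge is resized.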


BROADCAST walkthrough

cat tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/BroadcastEdgeManager.java

Source emits a single logical output. Every destination task receives one input event per source task. Cost scales as srcParallelism * destParallelism — large broadcast vertices are an antipattern.
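The cost claim is easy to quantify. The sketch below is a hypothetical BroadcastSketch (not the Tez class); it assumes, for illustration, that the destination's input index equals the source task index, and it counts one routed event per (source, destination) pair.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of broadcast routing and its event count.
final class BroadcastSketch {
  // Every destination task gets an entry for this source's single output.
  static Map<Integer, List<Integer>> route(int srcTaskIndex, int destParallelism) {
    Map<Integer, List<Integer>> out = new HashMap<>();
    for (int d = 0; d < destParallelism; d++) {
      out.put(d, Collections.singletonList(srcTaskIndex)); // assumed input index
    }
    return out;
  }

  // Total routed events once every source task has finished.
  static long totalEvents(int srcParallelism, int destParallelism) {
    return (long) srcParallelism * destParallelism;
  }
}
```

For example, totalEvents(500, 1000) is 500000, and each of those events also implies a fetch of the full source output by the destination.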


CUSTOM walkthrough — CartesianProductEdgeManager

find . -name "CartesianProductEdgeManager*.java"
wc -l $(find . -name "CartesianProductEdgeManager*.java")

CartesianProductVertexManager chunks source outputs and creates a 2D grid of destination tasks; the edge manager projects (srcChunkX, srcChunkY) → destIndex.

The CUSTOM movement type is the contract by which Hive ships its own routing for unconventional joins.


Runtime mutation: parallelism reconfiguration

A logical Vertex declares a target parallelism; the physical parallelism can change before the vertex starts running, via the VertexManager:

grep -n "reconfigureVertex" tez-api/src/main/java/org/apache/tez/dag/api/VertexManagerPluginContext.java

reconfigureVertex(int parallelism, VertexLocationHint, Map<String,EdgeProperty>) does three things in one atomic step inside VertexImpl:

  1. Resizes the TaskImpl[] array (must happen before any task is scheduled).
  2. Re-installs EdgeManagerPlugin instances on incoming edges.
  3. Updates location hints used by the scheduler.

Read the state machine guard:

grep -n "reconfigureVertex\|VertexState.INITED\|VertexState.INITIALIZING" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -30

Reconfiguration is illegal once any task has been scheduled.

Worked example: ShuffleVertexManager auto-parallelism

  1. Vertex R declared with parallelism = 100 (pessimistic upper bound).
  2. Upstream tasks emit VertexManagerEvent payloads with byte counts per partition.
  3. ShuffleVertexManager.onVertexManagerEventReceived accumulates totals.
  4. After the slow-start threshold, it computes target = ceil(totalBytes / desiredTaskInputSize) clamped to [minParallelism, originalParallelism].
  5. Calls reconfigureVertex(target, null, updatedEdgeProps).
  6. VertexImpl resizes from 100 → e.g. 17 task instances.
  7. The edge manager on the incoming SCATTER_GATHER edge is rebuilt to route 100-partition outputs into 17 destinations (merging at the destination).
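Step 4 of the worked example is plain integer arithmetic. The helper below is a hypothetical sketch of that formula only (AutoParallelismSketch is not a Tez class, and the real ShuffleVertexManager applies further conditions that are not modeled here):

```java
// Hypothetical helper for: ceil(totalBytes / desiredTaskInputSize),
// clamped to [minParallelism, originalParallelism].
final class AutoParallelismSketch {
  static int targetParallelism(long totalBytes, long desiredTaskInputSize,
                               int minParallelism, int originalParallelism) {
    if (totalBytes < 0 || desiredTaskInputSize <= 0) {
      throw new IllegalArgumentException("sizes must be non-negative / positive");
    }
    if (minParallelism < 1 || minParallelism > originalParallelism) {
      throw new IllegalArgumentException("need 1 <= min <= original");
    }
    // Integer ceiling without floating point: floor division plus one for a remainder.
    long raw = totalBytes / desiredTaskInputSize
        + (totalBytes % desiredTaskInputSize == 0 ? 0 : 1);
    long clamped = Math.max(minParallelism, Math.min(originalParallelism, raw));
    return (int) clamped;
  }
}
```

With a desired input size of 100 MiB and just under 1700 MiB observed, the result is 17, as in step 6. The clamp keeps the value from ever exceeding the declared parallelism of 100.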

Reading exercise

  1. grep -n "createEdgeManager\|edgeManager =" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/Edge.java — when is the EdgeManagerPlugin instantiated?
  2. cat tez-api/src/main/java/org/apache/tez/dag/api/EdgeProperty.java | head -100 — list all factory methods on EdgeProperty. Which require an EdgeManagerPluginDescriptor?
  3. grep -n "setParallelism\|setVertexParallelism" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head — which state transitions accept a parallelism change?
  4. grep -rn "OneToOneEdgeManager\|ScatterGatherEdgeManager\|BroadcastEdgeManager" tez-dag/src/test — list the unit tests covering each built-in routing.
  5. find . -name "CartesianProductEdgeManager.java" | xargs head -120 — what state must this plugin keep across destination task initializations?
  6. grep -n "EdgeProperty\." ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java | head — which edge property combinations does Hive build?

Answer these:

  • For SCATTER_GATHER, what is the size of the output partition array of one source task?
  • For ONE_TO_ONE, what happens if the upstream vertex auto-parallelizes from 100 → 17 after the destination has been initialized?
  • For BROADCAST, what is the data volume amplification?
  • Which EdgeManager methods are called on every destination task init, and which once per edge?

Common bugs and symptoms

| Symptom | Likely cause |
| --- | --- |
| Vertex failed: Cannot change parallelism after tasks scheduled | VertexManager.reconfigureVertex invoked after scheduleTasks. Fix ordering. |
| OneToOneEdgeManager: srcParallelism != destParallelism | Auto-parallelism broke the ONE_TO_ONE invariant. Forbid auto-parallelism on ONE_TO_ONE edges. |
| Destination task receives 0 DataMovementEvents | Custom EdgeManagerPlugin returned 0 from getNumDestinationTaskPhysicalInputs. |
| Hive query produces wrong row counts after a custom join | CUSTOM EdgeManagerPlugin mis-routed partitions; fence-post bug in routeDataMovementEventToDestination. |
| BROADCAST edge OOMs the destination | Source parallelism × payload size exceeds destination heap; switch to PERSISTED source type and stream from disk. |

Validation: prove you understand this

  1. Given Vertex A (parallelism=4) connected by a SCATTER_GATHER edge to Vertex B (parallelism=3), compute the number of DataMovementEvents B.1 receives. Show the work.
  2. Explain in one sentence each: when does EdgeManagerPlugin get re-instantiated, and when does it survive across reconfiguration?
  3. Write a one-paragraph rejection of "let's just use BROADCAST for our 500-task lookup vertex" referencing concrete cost.
  4. Identify the exact line in VertexImpl.java where reconfigureVertex is rejected if tasks have been scheduled. Cite path + line number from grep -n.
  5. Sketch a CUSTOM EdgeManagerPlugin for range-partitioned merge: source task i emits keys in range [i*R, (i+1)*R); the destination is K tasks where K may differ from source parallelism. Define getNumDestinationTaskPhysicalInputs and the routing rule in code.

Shuffle and Sort

The shuffle layer is where Tez moves data between vertices. It splits into two halves, both living in tez-runtime-library:

  • Sort path (producer side): partition, sort, spill, merge. OrderedPartitionedKVOutput → PipelinedSorter / DefaultSorter → IFile segments on local disk.
  • Shuffle path (consumer side): fetch, merge, iterate. ShuffleManager → Fetcher → FetchedInput → MergeManager → ValuesIterator.

Between them sits the YARN ShuffleHandler aux service inside the NodeManager that serves spilled segments over HTTP.

ls tez-runtime-library/src/main/java/org/apache/tez/runtime/library/

The producer side

OrderedPartitionedKVOutput

find tez-runtime-library/src/main/java -name "OrderedPartitionedKVOutput.java"
wc -l $(find tez-runtime-library/src/main/java -name "OrderedPartitionedKVOutput.java")

The output that powers MapReduce-style shuffles. Lifecycle:

  1. initialize() — creates a Sorter (Pipelined or Default), allocates tez.runtime.io.sort.mb of byte buffer, registers as a MemoryUpdateCallback with the MemoryDistributor.
  2. getWriter() — returns a KeyValueWriter that delegates to the sorter.
  3. close() — calls sorter.flush() to merge spills into final segments and emits CompositeDataMovementEvent per partition with offsets into the merged file.

Two sorter implementations

find tez-runtime-library/src/main/java -name "PipelinedSorter.java" \
                                       -o -name "DefaultSorter.java"

| Sorter | Strategy | When to pick |
| --- | --- | --- |
| DefaultSorter | Single in-memory accumulator; quicksort by (partition, key); spill when buffer crosses tez.runtime.sort.spill.percent; final merge of all spills. | MapReduce parity, conservative memory. |
| PipelinedSorter | Multi-buffer accumulator; concurrent spill thread; per-partition sort and merge; final spill writes the merged output in one pass. | Large outputs, faster; default in Hive. |

Configuration knobs:

| Key | Default | Effect |
| --- | --- | --- |
| tez.runtime.io.sort.mb | 100 | Sort buffer in MB. Reused for both sorters. |
| tez.runtime.sort.spill.percent | 0.8 | Threshold to start spilling (DefaultSorter). |
| tez.runtime.sorter.class | PIPELINED | PIPELINED or LEGACY (DefaultSorter). |
| tez.runtime.compress | false | Per-segment compression. |
| tez.runtime.compress.codec | DefaultCodec | Snappy, Lz4, Gzip. |
| tez.runtime.combiner.class | unset | Combiner run during spill merge. |

IFile on-disk format

IFile is the segment format both sorters write.

find tez-runtime-library/src/main/java -name "IFile.java"
grep -n "class Writer\|class Reader\|EOF_MARKER\|writeKVPair" \
  tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java | head -30

Per-record layout:

+---------------+---------------+----------------+------------------+
| keyLen (VInt) | valLen (VInt) | key bytes (KL) | value bytes (VL) |
+---------------+---------------+----------------+------------------+

End of segment:

keyLen = -1, valLen = -1   // EOF_MARKER

If compression is enabled, the bytes between the partition header and EOF_MARKER are compressed; the record framing is inside the compressed stream.
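The record framing above can be sketched in a few lines of Java. This is a minimal illustration, not Tez's IFile.Writer: the class and method names are hypothetical, the variable-length integer encoding follows the scheme used by Hadoop's WritableUtils as best recalled here, and any file header, checksum, or compression wrapper the real writer adds is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: mimics the record framing described above, not the real IFile.Writer. */
final class IFileLayoutSketch {
    static final int EOF_MARKER = -1;

    /** Variable-length integer encoding in the style of Hadoop's WritableUtils (assumed). */
    static void writeVInt(DataOutputStream out, long value) throws IOException {
        if (value >= -112 && value <= 127) {
            out.writeByte((int) value);           // small values fit in a single byte
            return;
        }
        int len = -112;
        if (value < 0) {
            value ^= -1L;                         // one's complement for negatives
            len = -120;
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            len--;                                // one step per payload byte needed
        }
        out.writeByte(len);                       // first byte encodes sign and byte count
        int payloadBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payloadBytes; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.writeByte((int) ((value >> shift) & 0xFF));   // big-endian payload
        }
    }

    /** Frames each (key, value) pair, then terminates the segment with two EOF markers. */
    static byte[] writeSegment(String[][] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String[] kv : records) {
                byte[] key = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] val = kv[1].getBytes(StandardCharsets.UTF_8);
                writeVInt(out, key.length);       // keyLen
                writeVInt(out, val.length);       // valLen
                out.write(key);                   // key bytes
                out.write(val);                   // value bytes
            }
            writeVInt(out, EOF_MARKER);           // keyLen = -1
            writeVInt(out, EOF_MARKER);           // valLen = -1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Calling IFileLayoutSketch.writeSegment with a handful of short records and dumping the bytes in hex is a quick way to check a hand-drawn layout against the framing rules.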

A sorter writes one IFile segment per partition per spill. After the final merge, an IFile.OutputStream produces one file per output with an *.index sibling that records (rawLen, partLen, compressedLen) per partition.

find tez-runtime-library/src/main/java -name "TezSpillRecord.java"
grep -n "rawLength\|partLength\|compressedLength" \
  $(find tez-runtime-library/src/main/java -name "TezSpillRecord.java")

The index file is what ShuffleHandler reads when a fetcher asks for partition p of source attempt (vertex, task, attempt).

Combiner integration

Both sorters honor tez.runtime.combiner.class. The combiner is invoked during the merge step (not during accumulation), running over sorted runs:

grep -n "combiner\|combineAndSpill\|runCombiner" \
  $(find tez-runtime-library/src/main/java -name "DefaultSorter.java")

A correct combiner is associative and commutative on the value space; Tez gives no guarantee on how many merge phases run it.

Spill walkthrough

sequenceDiagram
  participant P as Processor
  participant W as KeyValueWriter
  participant S as Sorter (Pipelined)
  participant D as Local disk
  P->>W: write(K, V) [N times]
  W->>S: collect into KV buffer
  S->>S: buffer crosses sort.spill.percent
  S->>D: spill_0.out (partitioned, sorted)
  S->>D: spill_0.out.index
  Note over S: continue accepting writes into next buffer
  P->>W: close()
  W->>S: flush()
  S->>D: merge spill_0..spill_N -> file.out + file.out.index
  S-->>P: CompositeDataMovementEvent per partition

The consumer side

OrderedGroupedKVInput and ShuffleManager

find tez-runtime-library/src/main/java -name "OrderedGroupedKVInput.java"
find tez-runtime-library/src/main/java -name "ShuffleManager.java"
find tez-runtime-library/src/main/java -name "Shuffle.java"

OrderedGroupedKVInput.initialize() constructs a Shuffle, which holds:

  • ShuffleManager — pool of Fetcher threads and inbound event queue.
  • MergeManager — receives FetchedInputs, decides in-memory vs disk placement, kicks off background merges.
  • ValuesIterator — the reader the processor sees.

Fetcher

find tez-runtime-library/src/main/java -name "Fetcher.java"
wc -l $(find tez-runtime-library/src/main/java -name "Fetcher.java")

A Fetcher is a thread that connects via HTTP to the NodeManager ShuffleHandler running on the source task's node:

GET /mapOutput?job=<jobId>&dag=<dagId>&reduce=<partition>&map=<attempt1,attempt2,...>

Multi-map response: ShuffleHandler streams all requested attempts back-to-back, each prefixed with a header (MapOutputInfo). The Fetcher reads the header, decides if the payload fits in memory (MergeManager.reserve), and either writes to an in-memory buffer or directly to disk.

Key configs:

| Key | Default | Effect |
|---|---|---|
| tez.runtime.shuffle.parallel.copies | 20 | Fetcher thread count per task. |
| tez.runtime.shuffle.connect.timeout | 30000 | HTTP connect timeout (ms). |
| tez.runtime.shuffle.read.timeout | 180000 | HTTP socket read timeout (ms). |
| tez.runtime.shuffle.fetch.max.task.output.at.once | 20 | Max attempts per HTTP request. |
| tez.runtime.shuffle.memory.limit.percent | 0.25 | Max fraction of heap held in-memory before forcing disk. |
| tez.runtime.shuffle.merge.percent | 0.9 | When in-mem buffer crosses this, kick a merge. |
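To make the interaction between source hosts and the per-request attempt cap concrete, here is a rough sketch of how attempts could be batched into HTTP requests. The class and method names are hypothetical, the URL shape is copied from the request line shown earlier, and grouping strictly by host is an assumption of this sketch rather than a description of the actual Fetcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: batches source attempts per host into /mapOutput requests. */
final class FetchBatchSketch {
    /**
     * attemptsByHost maps "host:port" of a source NodeManager to the attempt IDs it holds.
     * Each host's attempts are split into groups of at most maxPerRequest, one URL per group.
     */
    static List<String> buildRequests(Map<String, List<String>> attemptsByHost,
                                      String jobId, int dagId, int partition,
                                      int maxPerRequest) {
        if (maxPerRequest < 1) {
            throw new IllegalArgumentException("maxPerRequest must be >= 1");
        }
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : attemptsByHost.entrySet()) {
            List<String> attempts = entry.getValue();
            for (int from = 0; from < attempts.size(); from += maxPerRequest) {
                int to = Math.min(from + maxPerRequest, attempts.size());
                String maps = String.join(",", attempts.subList(from, to));
                urls.add("http://" + entry.getKey() + "/mapOutput?job=" + jobId
                        + "&dag=" + dagId + "&reduce=" + partition + "&map=" + maps);
            }
        }
        return urls;
    }
}
```

Under this model the number of requests is the sum over hosts of ceil(attempts on host / cap), so the way source attempts are spread across nodes matters as much as the cap itself.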

FetchedInput

grep -n "abstract class FetchedInput\|MemoryFetchedInput\|DiskFetchedInput" \
  $(find tez-runtime-library/src/main/java -name "FetchedInput.java")

A FetchedInput is one source partition payload. Two subclasses:

  • MemoryFetchedInput — bytes held in a ByteBuffer.
  • DiskFetchedInput — bytes on local disk under tez.runtime.shuffle.tmp.directory.

The MergeManager decides which based on size and current in-memory budget.

MergeManager

find tez-runtime-library/src/main/java -name "MergeManager.java"

Three merge tracks:

  1. In-memory merge — N in-memory inputs are merged into one in-memory buffer or spilled to disk.
  2. On-disk merge — N on-disk inputs are merged into a single larger on-disk segment.
  3. Final merge — at processor pull time, remaining in-memory and on-disk inputs are merged into a unified KeyValuesReader.

grep -n "InMemoryMerger\|OnDiskMerger\|finalMerge\|mergeFactor" \
  $(find tez-runtime-library/src/main/java -name "MergeManager.java") | head -20

tez.runtime.io.sort.factor (default 100) sets the maximum number of segments merged in one pass; more segments than that trigger multiple passes.
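A simple round-based model shows how the merge factor turns segment count into pass count. This is only a sketch under that assumption: the real merger may size its first pass differently, and the class name is hypothetical.

```java
/** Hypothetical model: every pass merges groups of up to `factor` segments into one. */
final class MergePassSketch {
    static int mergePasses(int segments, int factor) {
        if (segments < 0 || factor < 2) {
            throw new IllegalArgumentException("segments must be >= 0 and factor >= 2");
        }
        int passes = 0;
        while (segments > 1) {
            segments = (segments + factor - 1) / factor;   // ceiling division
            passes++;
        }
        return passes;
    }
}
```

With a factor of 100, anything up to 100 segments needs one pass and 250 segments need two, which is why a flood of tiny segments is what hurts.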

ValuesIterator

find tez-runtime-library/src/main/java -name "ValuesIterator.java"
grep -n "next\|groupingKey\|valuesIter" \
  $(find tez-runtime-library/src/main/java -name "ValuesIterator.java") | head

Wraps the merged sorted stream, presenting (key, Iterable<value>) pairs to the processor — the classic reducer API.
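The grouping step can be illustrated with a small eager version. The real ValuesIterator is lazy and works on serialized keys with a comparator; the sketch below assumes String keys, uses a hypothetical class name, and relies on the input already being sorted by key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, eager stand-in for the (key, values) grouping a reducer sees. */
final class GroupingSketch {
    /** Groups runs of consecutive equal keys; the input must already be sorted by key. */
    static Map<String, List<String>> group(List<Map.Entry<String, String>> sorted) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        String currentKey = null;
        List<String> currentValues = null;
        for (Map.Entry<String, String> kv : sorted) {
            if (currentValues == null || !kv.getKey().equals(currentKey)) {
                currentKey = kv.getKey();              // key changed: start a new group
                currentValues = new ArrayList<>();
                groups.put(currentKey, currentValues);
            }
            currentValues.add(kv.getValue());
        }
        return groups;
    }
}
```

Because only adjacent keys are compared, an unsorted input would silently produce wrong groups, which is exactly why the merge has to finish before the processor starts iterating.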

Shuffle walkthrough

sequenceDiagram
  participant T as Task processor
  participant SM as ShuffleManager
  participant F as Fetcher
  participant NM as Source NM (ShuffleHandler)
  participant MM as MergeManager
  SM->>F: assign source attempt + partition
  F->>NM: GET /mapOutput?...
  NM-->>F: stream attempt headers + IFile bytes
  F->>MM: reserve(size)
  alt fits in memory
    MM-->>F: MemoryFetchedInput
  else too big
    MM-->>F: DiskFetchedInput
  end
  F->>MM: commit FetchedInput
  MM->>MM: kick InMemoryMerger / OnDiskMerger when thresholds crossed
  T->>SM: getReader() (blocks until all inputs done)
  SM->>MM: finalMerge()
  MM-->>T: KeyValuesReader (ValuesIterator)

ShuffleHandler is YARN's, not Tez's

ls /opt/hadoop/share/hadoop/yarn/lib/ | grep shuffle    # cluster path varies

org.apache.hadoop.mapred.ShuffleHandler lives in Hadoop. NodeManagers load it as an aux service via yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Tez piggybacks on this — Tez ships no NodeManager-side fetch service. Misconfigured aux services are a common cause of ConnectException in Fetcher.


Reading exercise

  1. grep -n "EOF_MARKER\|writeRecord" $(find tez-runtime-library/src/main/java -name "IFile.java") — verify the EOF sentinel value.
  2. wc -l $(find tez-runtime-library/src/main/java -name "PipelinedSorter.java" -o -name "DefaultSorter.java") — which is larger? Hypothesize why.
  3. grep -rn "tez.runtime.io.sort.mb\|tez.runtime.sort.spill.percent" tez-runtime-library/src/main/java — find every read site for these keys.
  4. grep -n "GET /mapOutput\|reduce=\|map=" $(find ~ -name ShuffleHandler.java 2>/dev/null | head -1) — read the exact request format.
  5. cat $(find tez-runtime-library/src/main/java -name "ShuffleManager.java") | head -200 — how is back-pressure on Fetcher threads applied?
  6. grep -n "combiner" tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/*.java — at what phases does the combiner run?

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| Fetcher: java.net.ConnectException | mapreduce_shuffle aux service not configured or NM not running. |
| Shuffle error: java.io.IOException: Failed on local exception: org.apache.hadoop.security.AccessControlException | ShuffleSecret missing or stale; check JobTokenSecretManager. |
| OOM during sort | tez.runtime.io.sort.mb too high relative to container JVM heap. |
| OOM during shuffle | tez.runtime.shuffle.memory.limit.percent too high; in-memory inputs starve heap. |
| Premature EOF from inputStream | Source task wrote partial IFile (killed mid-spill); destination retries from another attempt. |
| Wrong reducer output count | Combiner not idempotent across merge passes. |
| OnDiskMerger thrashing | io.sort.factor too low; many tiny segments forcing many merge passes. |
| Long shuffle plateau | One source NM saturated; HDFS-local fetch concentration. |

Validation: prove you understand this

  1. Sketch the byte layout of an IFile segment containing 3 records and a single partition. Show key/val lengths and the EOF marker.
  2. A reducer task reads from 200 mappers. With tez.runtime.shuffle.parallel.copies=20 and tez.runtime.shuffle.fetch.max.task.output.at.once=20, what is the minimum number of HTTP requests the fetcher pool must make? Justify.
  3. Explain why PipelinedSorter reduces wall time but not CPU time.
  4. Given a 10 GB shuffle into a 4 GB heap reducer with tez.runtime.shuffle.memory.limit.percent=0.25, predict which inputs go to disk versus memory and why.
  5. Identify the exact file and method where the URL pattern ?reduce=&map= is constructed on the Tez fetcher side. Use grep.

Tez Runtime Internals

The Tez runtime is the code that runs inside the container, not inside the AM. Its job: boot a JVM, accept tasks from the AM over umbilical RPC, run them to completion, and report status.

Three modules collaborate:

  • tez-runtime-internals — process boot, task driver, umbilical client, memory broker.
  • tez-runtime-library — concrete Input / Processor / Output implementations (KV, shuffle, etc).
  • tez-api — the SPI the user implements (AbstractLogicalInput, AbstractLogicalOutput, AbstractLogicalIOProcessor).

ls tez-runtime-internals/src/main/java/org/apache/tez/runtime/

The container process: TezChild

TezChild.main() is the JVM entry point for every Tez task container.

find tez-runtime-internals/src/main/java -name "TezChild.java"
grep -n "public static void main\|new TezChild\|run()" \
  $(find tez-runtime-internals/src/main/java -name "TezChild.java")

Boot sequence (paraphrased from TezChild.java):

  1. Read JVM args: AM host, AM port, container ID, application attempt ID, PID, JVM identifier.
  2. Read the security tokens from $HADOOP_TOKEN_FILE_LOCATION.
  3. Construct TezTaskUmbilicalProtocol RPC proxy pointing at the AM TaskAttemptListenerImpl.
  4. Enter TezChild.run() — an infinite loop:
     a. umbilical.getTask(containerContext) — blocks until the AM hands us a ContainerTask.
     b. If ContainerTask.shouldDie(), exit cleanly.
     c. Otherwise build a TezTaskRunner2 for the assigned attempt and call runner.run().
     d. Loop — this is container reuse: same JVM, next task.

flowchart TD
  S[JVM start] --> P[Parse args + tokens]
  P --> R[RPC connect to AM]
  R --> L{umbilical.getTask}
  L -- shouldDie --> X[exit]
  L -- new task --> T[TezTaskRunner2.run]
  T --> L
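The loop in the flowchart reduces to a few lines. The types below are hypothetical stand-ins for ContainerTask and the umbilical proxy, kept deliberately tiny; the real TezChild also handles tokens, errors, and task runner construction.

```java
/** Hypothetical miniature of the TezChild task loop. */
final class ReuseLoopSketch {
    /** Stand-in for ContainerTask: either a die signal or a unit of work. */
    static final class ContainerTask {
        final boolean shouldDie;
        final Runnable work;

        ContainerTask(boolean shouldDie, Runnable work) {
            this.shouldDie = shouldDie;
            this.work = work;
        }
    }

    /** Stand-in for the umbilical RPC proxy; getTask() is assumed to block until work arrives. */
    interface Umbilical {
        ContainerTask getTask();
    }

    /** Runs tasks in the same JVM until told to die; returns how many tasks ran. */
    static int runLoop(Umbilical umbilical) {
        int tasksRun = 0;
        while (true) {
            ContainerTask next = umbilical.getTask();
            if (next.shouldDie) {
                return tasksRun;          // clean exit: the AM has no more work for us
            }
            next.work.run();              // in Tez this would be TezTaskRunner2.run()
            tasksRun++;                   // loop again: same JVM, next task
        }
    }
}
```

Note that the loop itself holds no policy: whether the container is reused depends entirely on what the other end of getTask() returns.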

Why container reuse needs this loop

Allocating a YARN container costs hundreds of ms; starting a JVM costs seconds. Tez amortizes both by running multiple tasks in the same TezChild process. See container-reuse.md for the AM side.


TezTaskRunner2 — the task driver

find tez-runtime-internals/src/main/java -name "TezTaskRunner2.java"
wc -l $(find tez-runtime-internals/src/main/java -name "TezTaskRunner2.java")

Per-attempt driver. Owns:

  • a LogicalIOProcessorRuntimeTask (the actual task body),
  • the input/output initialization thread pool,
  • abort hooks (kill, fatal error, timeout).

Lifecycle of a single attempt:

sequenceDiagram
  participant TC as TezChild
  participant TR as TezTaskRunner2
  participant T as LogicalIOProcessorRuntimeTask
  participant IO as Inputs / Outputs
  participant P as Processor
  TC->>TR: new + run()
  TR->>T: initialize()
  T->>IO: input.initialize() (parallel)
  T->>IO: output.initialize() (parallel)
  T->>P: processor.initialize()
  TR->>T: run() => processor.run(inputs, outputs)
  P->>IO: read inputs, write outputs
  T->>IO: output.close() (parallel)
  T->>IO: input.close() (parallel)
  TR->>TC: result (success/failure)

Failure routes:

  • Input init throws → TaskFailedException to AM, attempt fails.
  • Processor throws → same.
  • AM sends kill via umbilical heartbeat reply → TezTaskRunner2.killTask() interrupts the processor thread.
  • Fatal error on any IO → TaskReporter.notifyFatalError() short-circuits the run.

LogicalIOProcessorRuntimeTask — orchestrator

find tez-runtime-internals/src/main/java -name "LogicalIOProcessorRuntimeTask.java"
wc -l $(find tez-runtime-internals/src/main/java -name "LogicalIOProcessorRuntimeTask.java")

This is the class that actually instantiates the user's IPO triple.

initialize() does, in order:

  1. Build the TezConfiguration for this task from the AM-provided TaskSpec.
  2. Build the MemoryDistributor (next section) over all IOs.
  3. For each InputSpec: instantiate the input class, set its InputContext, call initialize() on a worker thread.
  4. Same for each OutputSpec.
  5. Instantiate the processor; call processor.initialize(processorContext).
  6. Wait for all input/output initialize() calls to complete (parallel).

run():

  1. Block until every input reports it has data (or signals empty).
  2. Call processor.run(inputs, outputs) on the main thread.
  3. On return, call output.close() for every output (parallel), then input.close() for every input (parallel).
  4. Collect counters; hand the final TaskStatus back to TezTaskRunner2.

Key field:

grep -n "initializerCompletionService\|runInputRunnable\|runOutputRunnable" \
  $(find tez-runtime-internals/src/main/java -name "LogicalIOProcessorRuntimeTask.java") | head

Parallel init is what makes Tez fast for processors with many inputs (e.g. multi-input joins).


MemoryDistributor

find tez-runtime-internals/src/main/java -name "MemoryDistributor.java"
cat $(find tez-runtime-internals/src/main/java -name "MemoryDistributor.java") | head -160

A single broker that hands out portions of the task's JVM heap to IOs that ask for memory.

Flow:

  1. At task init, each Input / Output calls context.requestInitialMemory(size, callback) with what it would like to reserve (e.g. OrderedPartitionedKVOutput requests tez.runtime.io.sort.mb MB).
  2. MemoryDistributor.makeInitialAllocations() runs an InitialMemoryAllocator plugin (default: WeightedScalingMemoryDistributor) to scale requests down to fit the container's available heap.
  3. Allocations are dispatched to callbacks; each IO learns its actual budget via MemoryUpdateCallback.memoryAssigned(long).
  4. IO classes resize their buffers accordingly.

Configuration knobs:

| Key | Effect |
|---|---|
| tez.runtime.task.initial.memory.allocator.class | Plugin to use. Default WeightedScalingMemoryDistributor. |
| tez.task.scale.memory.enabled | Master toggle. |
| tez.task.scale.memory.ratios | Per-IO-class weight overrides. |
| tez.task.scale.memory.reserve-fraction | Reserved for processor/JVM. |

grep -n "requestInitialMemory\|memoryAssigned" \
  $(find tez-runtime-library/src/main/java -name "OrderedPartitionedKVOutput.java")

Without the distributor an output would request its configured size verbatim, potentially OOMing the container when summed across IOs.
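The core idea of scaling requests down to fit can be shown with plain proportional scaling. This is a deliberate simplification with hypothetical names: it applies no per-class weights and no reserve fraction, so it is not the algorithm of WeightedScalingMemoryDistributor, only the general shape of what an allocator plugin does.

```java
/** Hypothetical allocator: scales all requests by the same ratio when they do not fit. */
final class MemoryScalingSketch {
    /**
     * requests and available must use the same unit (bytes or MB).
     * Returns the granted amount for each request, in request order.
     */
    static long[] scale(long[] requests, long available) {
        long total = 0;
        for (long r : requests) {
            total += r;
        }
        if (total <= available) {
            return requests.clone();                       // everything fits: grant as asked
        }
        long[] granted = new long[requests.length];
        for (int i = 0; i < requests.length; i++) {
            granted[i] = requests[i] * available / total;  // same ratio for every IO
        }
        return granted;
    }
}
```

Each IO would then learn its granted figure through its memoryAssigned callback and size its buffers from that number rather than from its configured request.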


TaskReporter and the umbilical

find tez-runtime-internals/src/main/java -name "TaskReporter*.java"
find tez-api/src/main/java -name "TezTaskUmbilicalProtocol.java"

TaskReporter runs a heartbeat thread per task attempt. Each cycle:

  1. Collect outbound events (counter updates, completion events from completed IOs).
  2. Call umbilical.heartbeat(request) where request contains attempt ID, counters, status messages, and the outbound TezEvent list.
  3. Decode the reply: AM may push back inbound TezEvents (e.g. DataMovementEvents from upstream tasks), a shouldDie flag, or a shouldReset flag.
  4. Dispatch inbound events into the appropriate Input via LogicalIOProcessorRuntimeTask.handleEvents().

Interval: tez.task.am.heartbeat.interval-ms (default 100) plus a counter-update interval tez.task.am.heartbeat.counter.interval-ms (default 4000).

Why heartbeats carry events

Tez has no separate "event bus" between AM and containers. Everything piggybacks on the umbilical heartbeat. This means:

  • Event latency is bounded below by heartbeat.interval-ms.
  • A wedged umbilical (network partition) blocks all task communication; tez.task.timeout-ms (default 5 minutes) eventually fires and the AM considers the attempt lost.

End-to-end task lifecycle inside the JVM

grep -n "phase\|TaskRunnerPhase" $(find tez-runtime-internals/src/main/java -name "TezTaskRunner2.java") | head

| Phase | Owner | What happens |
|---|---|---|
| 1 Receive | TezChild.run | umbilical.getTask returns a ContainerTask. |
| 2 Build | TezTaskRunner2 | Construct LogicalIOProcessorRuntimeTask, hook up TaskReporter. |
| 3 Init | LogicalIOProcessorRuntimeTask.initialize | MemoryDistributor + parallel IO init + processor init. |
| 4 Run | LogicalIOProcessorRuntimeTask.run | processor.run(inputs, outputs). |
| 5 Close | same | Outputs close (flush spills, emit DataMovementEvents), inputs close. |
| 6 Report | TaskReporter final tick | Send counters + completion event. AM transitions attempt to SUCCEEDED. |
| 7 Loop | TezChild.run | Discard task object, request next. |

Reading exercise

  1. grep -n "shouldDie\|exit(" $(find tez-runtime-internals/src/main/java -name "TezChild.java") — list every termination path.
  2. grep -n "initialize()\|run()\|close()" $(find tez-runtime-internals/src/main/java -name "LogicalIOProcessorRuntimeTask.java") | head -40 — verify the lifecycle order.
  3. cat $(find tez-runtime-internals/src/main/java -name "MemoryDistributor.java") | head -100 — how does it handle the case where summed requests exceed available?
  4. grep -n "heartbeat\|TezTaskUmbilical" $(find tez-runtime-internals/src/main/java -name "TaskReporter.java") | head — find the heartbeat loop body.
  5. cat tez-api/src/main/java/org/apache/tez/runtime/api/AbstractLogicalIOProcessor.java — read the user-facing processor contract.
  6. find tez-runtime-internals/src/main/java -name "*.java" | xargs wc -l | sort -n | tail -20 — find the biggest classes in the runtime module.

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| Container OOM during init | MemoryDistributor disabled or summed IO requests exceed heap. Enable tez.task.scale.memory.enabled. |
| TaskAttempt timed out after 5 min of no heartbeat | TaskReporter thread died (uncaught exception) or RPC hung. |
| Processor sees zero events | Inbound events not delivered — check TaskReporter.heartbeat reply path; common when tez.task.am.heartbeat.interval-ms raised too high. |
| Container reuse off, JVMs constantly spinning up | TezChild.run loop returns shouldDie too eagerly; check AM-side AMContainerImpl reuse decision. |
| IllegalStateException: Cannot reserve more memory | IO requesting after makeInitialAllocations already ran. |
| Outputs never close (process hangs) | Processor never returned from run(); usually an infinite loop on a KeyValuesReader. |

Validation: prove you understand this

  1. Trace, with file:method references, the path from TezChild.main to processor.run for a single attempt.
  2. Explain in two sentences why LogicalIOProcessorRuntimeTask.initialize parallelizes input/output init. Cite the field name.
  3. Given a container with 1 GB heap, one OrderedPartitionedKVOutput requesting 512 MB and two OrderedGroupedKVInputs requesting 256 MB each, compute the actual allocations under the default WeightedScalingMemoryDistributor.
  4. Identify the single umbilical method that delivers inbound TezEvents to the task. Cite the file and the field on the response object.
  5. Sketch the smallest possible AbstractLogicalIOProcessor that prints the class names of all configured inputs and exits. Include initialize, handleEvents, run, close.

Scheduler

The scheduler is the AM-side component that turns task launch requests into running containers. It lives in tez-dag under org.apache.tez.dag.app.rm.

Two-layer design:

  • TaskSchedulerManager — single dispatcher and router. Receives AMSchedulerEvents from the rest of the AM, forwards to the right scheduler instance.
  • TaskScheduler instances — one per scheduler ID. In practice almost always YarnTaskSchedulerService (production) or LocalTaskSchedulerService (tez.local.mode=true). External pluggable schedulers (LLAP) also slot in here.

ls tez-dag/src/main/java/org/apache/tez/dag/app/rm/

TaskSchedulerManager

find tez-dag/src/main/java -name "TaskSchedulerManager.java"
wc -l $(find tez-dag/src/main/java -name "TaskSchedulerManager.java")

Implements EventHandler<AMSchedulerEvent> and is wired into the AM AsyncDispatcher. Every scheduling decision in the AM starts by enqueuing an AMSchedulerEvent.

Event types

find tez-dag/src/main/java -name "AMSchedulerEvent*.java"
grep -rn "extends AMSchedulerEvent" tez-dag/src/main/java

| Event | Source | Purpose |
|---|---|---|
| AMSchedulerEventTALaunchRequest | TaskAttemptImpl after a TA is ready to schedule | Ask scheduler to launch this attempt. |
| AMSchedulerEventTAStateUpdated | TaskAttemptImpl on completion | Notify scheduler the container is now releasable. |
| AMSchedulerEventContainerCompleted | YARN RM callback | RM told us a container died. |
| AMSchedulerEventDeallocateContainer | various | Force-release a held container. |
| AMSchedulerEventNodeBlacklistUpdate | NodeTracker | Add/remove node from blacklist. |
| AMSchedulerEventDAGStart, AMSchedulerEventVertexStateUpdated | DAGImpl, VertexImpl | DAG lifecycle hints (drives priority adjustments). |

TaskSchedulerManager.handle(event) switches on event type and forwards via getTaskScheduler(event.getSchedulerId()).handleEvent(...).


YarnTaskSchedulerService

find tez-dag/src/main/java -name "YarnTaskSchedulerService.java"
wc -l $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java")

This is where Tez talks to YARN. Owns:

  • AMRMClientAsync — async RM heartbeat client.
  • Map<Priority, BlockingQueue<CookieContainerRequest>> — outstanding requests, bucketed by priority.
  • Map<ContainerId, HeldContainer> — currently-assigned containers (see container-reuse.md).
  • A DelayedContainerManager thread that releases idle reused containers.

Request flow

sequenceDiagram
  participant TA as TaskAttemptImpl
  participant TSM as TaskSchedulerManager
  participant Y as YarnTaskSchedulerService
  participant RM as YARN RM
  TA->>TSM: AMSchedulerEventTALaunchRequest
  TSM->>Y: allocateTask(...)
  Y->>Y: build CookieContainerRequest (priority, resource, locality)
  Y->>RM: addContainerRequest (via AMRMClientAsync)
  Note over RM: scheduler matches request to a node
  RM-->>Y: onContainersAllocated([Container])
  Y->>Y: assignContainer() — match to a pending request
  Y->>TSM: containerAllocated(taskAttempt, container)
  TSM->>TA: TAEventContainerAssigned

Matching: priority + locality

grep -n "assignContainer\|matchContainerToRequest\|getMatchingRequests" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") | head -20

When a container arrives, assignContainer walks pending requests at the container's priority. For each:

  1. NODE_LOCAL — container's node matches a hint host of the request.
  2. RACK_LOCAL — same rack but different host.
  3. ANY — locality wildcard.

AMRMClientAsync already biases matches by locality on the YARN side; this pass is the AM-side tiebreaker when multiple requests are eligible.

| Hint level | YARN request | Tez match |
|---|---|---|
| NODE_LOCAL | host + rack + * | accepts container on the exact host |
| RACK_LOCAL | rack + * | accepts container on the same rack |
| ANY | * only | accepts any container at this priority |
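The three-tier preference can be sketched as a classification followed by a best-of scan. All names below are hypothetical, every request is treated as having relaxed locality, and the tie-break (first pending request wins) is an assumption of the sketch rather than documented scheduler behaviour.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical locality tiebreaker between pending requests at one priority. */
final class LocalityMatchSketch {
    /** Declared best-first so that a lower ordinal means a tighter match. */
    enum Level { NODE_LOCAL, RACK_LOCAL, ANY }

    static final class Request {
        final String id;
        final Set<String> hosts;   // hint hosts, possibly empty
        final Set<String> racks;   // hint racks, possibly empty

        Request(String id, Set<String> hosts, Set<String> racks) {
            this.id = id;
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    /** How well a container on (host, rack) matches the request's hints. */
    static Level classify(Request request, String host, String rack) {
        if (request.hosts.contains(host)) {
            return Level.NODE_LOCAL;
        }
        if (request.racks.contains(rack)) {
            return Level.RACK_LOCAL;
        }
        return Level.ANY;
    }

    /** Picks the tightest match; ties go to the earliest request. Returns null if none pending. */
    static Request pick(List<Request> pending, String host, String rack) {
        Request best = null;
        for (Request candidate : pending) {
            if (best == null
                    || classify(candidate, host, rack).compareTo(classify(best, host, rack)) < 0) {
                best = candidate;
            }
        }
        return best;
    }
}
```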

TaskLocationHint is set on TaskAttemptImpl either from the input split (MRInput), the VertexLocationHint (provided by VertexManager), or left null.

Priorities

grep -n "Priority\|priorityForVertex" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java" \
                                -o -name "DAGImpl.java") | head

Tez assigns each vertex a priority class derived from its topological order in the DAG. In YARN a lower numeric value means a higher priority, so upstream (source) vertices get the lower values and downstream vertices the higher ones; source tasks therefore complete first and free their containers for downstream consumers. Priority is also the primary key for container reuse matching.
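As a rough illustration of deriving a priority from topological position, the sketch below computes each vertex's longest distance from a root and maps it to a number. The names are hypothetical, and the mapping depth + 1 is a placeholder chosen for the example, not the formula Tez uses; check DAGImpl and the DAG scheduler classes for the real one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: topological depth per vertex, then a placeholder priority mapping. */
final class VertexPrioritySketch {
    /** Longest distance from any root, computed in topological order (Kahn's algorithm). */
    static Map<String, Integer> depths(List<String> vertices, List<String[]> edges) {
        Map<String, List<String>> downstream = new HashMap<>();
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, Integer> depth = new LinkedHashMap<>();
        for (String v : vertices) {
            downstream.put(v, new ArrayList<>());
            inDegree.put(v, 0);
            depth.put(v, 0);
        }
        for (String[] edge : edges) {                 // edge[0] feeds edge[1]
            downstream.get(edge[0]).add(edge[1]);
            inDegree.merge(edge[1], 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String v : vertices) {
            if (inDegree.get(v) == 0) {
                ready.add(v);                         // roots have no upstream vertex
            }
        }
        while (!ready.isEmpty()) {
            String v = ready.poll();
            for (String w : downstream.get(v)) {
                depth.put(w, Math.max(depth.get(w), depth.get(v) + 1));
                if (inDegree.merge(w, -1, Integer::sum) == 0) {
                    ready.add(w);                     // all of w's inputs have been visited
                }
            }
        }
        return depth;
    }

    /** Placeholder mapping: lower value is scheduled first, so roots get the lowest value. */
    static int priorityValue(int depth) {
        return depth + 1;
    }
}
```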

RM callbacks

grep -n "AMRMClientAsync.CallbackHandler\|onContainersAllocated\|onContainersCompleted\|onShutdownRequest" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java")

YarnTaskSchedulerService implements AMRMClientAsync.CallbackHandler:

  • onContainersAllocated(List<Container>) — enqueue for assignment.
  • onContainersCompleted(List<ContainerStatus>) — translate exit status into AMSchedulerEventContainerCompleted.
  • onShutdownRequest() — RM asked AM to die (e.g. lost AM attempt).
  • onNodesUpdated(List<NodeReport>) — update node health for blacklisting.
  • getProgress() — AM tells RM its overall DAG progress.

LocalTaskSchedulerService

find tez-dag/src/main/java -name "LocalTaskSchedulerService.java"
wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java")

Same contract as YarnTaskSchedulerService but bypasses YARN:

  • A bounded ExecutorService of LocalContainer worker threads stands in for the YARN cluster.
  • allocateTask instantly synthesizes a fake Container and dispatches containerAllocated.
  • The container launcher (LocalContainerLauncher) runs TezChild in the same JVM on the executor.

Used by tez.local.mode=true and MiniTezCluster tests of certain flavors. See local-mode.md.


Pluggable schedulers

grep -n "tez.am.task.scheduler.classes\|TASK_SCHEDULER_SERVICE_CLASS" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

Configuration:

tez.am.task.scheduler.classes = <comma-separated FQNs>

TaskSchedulerManager instantiates one per ID. Hive's LLAP plugs in a custom scheduler that talks to LLAP daemons instead of YARN.


Walkthrough: launching a single task attempt

  1. VertexImpl decides to schedule task T.k (via VertexManager or scaling).
  2. TaskImpl creates TaskAttemptImpl for attempt 0 → state NEW.
  3. TaskAttemptImpl transitions to START_WAIT, dispatches AMSchedulerEventTALaunchRequest with TaskLocationHint and capability.
  4. TaskSchedulerManager.handle routes to the configured scheduler.
  5. YarnTaskSchedulerService.allocateTask constructs CookieContainerRequest(priority, capability, hosts, racks, relaxLocality=true) and calls AMRMClientAsync.addContainerRequest.
  6. RM schedules → callback onContainersAllocated([c]).
  7. assignContainer(c) finds the matching pending request, calls informAppAboutAssignment → TaskSchedulerManager.containerAllocated.
  8. TaskSchedulerManager dispatches AMContainerEventAssignTA to AMContainerImpl, then TAEventContainerAssigned to TaskAttemptImpl.
  9. AMContainerImpl asks ContainerLauncherImpl to launch the container (or reuse a held one).
  10. TezChild starts (or accepts new task via reuse loop). The umbilical fires up; the attempt transitions to RUNNING.

sequenceDiagram
  participant V as VertexImpl
  participant TA as TaskAttemptImpl
  participant TSM as TaskSchedulerManager
  participant Y as YarnTaskSchedulerService
  participant AC as AMContainerImpl
  participant RM as YARN RM
  V->>TA: schedule
  TA->>TSM: AMSchedulerEventTALaunchRequest
  TSM->>Y: allocateTask
  Y->>RM: addContainerRequest
  RM-->>Y: onContainersAllocated
  Y->>TSM: containerAllocated
  TSM->>AC: AMContainerEventAssignTA
  TSM->>TA: TAEventContainerAssigned
  AC->>RM: start container (via NMClient)

Reading exercise

  1. cat $(find tez-dag/src/main/java -name "TaskSchedulerManager.java") | head -200 — list the event types handled.
  2. grep -n "addContainerRequest\|removeContainerRequest\|releaseAssignedContainer" $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") — find all RM client interactions.
  3. grep -n "NODE_LOCAL\|RACK_LOCAL\|OFF_SWITCH\|ANY" $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") — how is locality classified?
  4. grep -n "CookieContainerRequest" $(find tez-dag/src/main/java -name "*.java" | grep rm) — what is the "cookie"? (Hint: opaque payload to thread reuse data through AMRMClient.)
  5. wc -l $(find tez-dag/src/main/java/org/apache/tez/dag/app/rm -name "*.java") — which file dominates? Likely YarnTaskSchedulerService ≫ everything.
  6. grep -n "Priority.newInstance\|priority(" $(find tez-dag/src/main/java -name "VertexImpl.java" -o -name "DAGImpl.java") — where is per-vertex priority computed?

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| AM stuck "0 containers running" | RM has no capacity at requested priority; queue at capacity. Check yarn application -status. |
| All tasks scheduled OFF_SWITCH | TaskLocationHint not propagated through VertexManager. |
| Tasks fail with Container released by AM | YarnTaskSchedulerService released a container that an attempt still owned — usually a state machine race; see failure-handling.md. |
| Reuse not happening | Priorities mismatch between completed and pending tasks; check tez.am.container.reuse.locality.delay-allocation-millis. |
| AM heartbeat thread blocked | A scheduler callback (onContainersAllocated) ran a slow blocking op on the RM client thread. Keep callbacks light. |
| IllegalStateException: Priority N not registered | allocateTask called for a vertex whose priority class was never bootstrapped. |

Validation: prove you understand this

  1. Walk an AMSchedulerEventTALaunchRequest from dispatch in TaskAttemptImpl to a YARN AMRMClient.addContainerRequest call. Cite file paths.
  2. Explain the difference between priority (YARN concept) and DAG priority (Tez concept) and where Tez sets each.
  3. Given a 100-task Vertex A followed by a 10-task Vertex B, what priority class does each get and why?
  4. Describe how YarnTaskSchedulerService decides between two pending requests at the same priority when a container arrives.
  5. Identify the single method on YarnTaskSchedulerService that the RM callback thread invokes when containers become available. Cite file:line.

Container Reuse

Container reuse is the single biggest reason Tez runs short-task DAGs faster than MapReduce. This chapter explains why, where the policy lives, and how to debug it when it stops working.


Why reuse matters

Container allocation has three costs:

  1. RM round-trip: addContainerRequest, the RM scheduling cycle (yarn.scheduler.capacity.node-locality-delay can add further delay), then onContainersAllocated.
  2. NM container launch: ContainerLaunchContext setup, localization of resources, NodeManager forking the JVM.
  3. JVM warmup: classloading, JIT, GC tuning.

For a 5-second task on a fresh container the wall time looks like:

| Phase | Time (ms) |
|---|---|
| AM request → RM allocate | 200–2000 |
| NM launch + localization | 500–3000 |
| JVM start | 500–2000 |
| Task work | 5000 |
| Overhead share | roughly 20–60% |

For 100 such tasks, paying that overhead 100 times turns a DAG that should finish in ~10s into one that takes 60–90s. Reuse drops the overhead to near zero for tasks 2..N on the same container.
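The arithmetic behind these figures is easy to reproduce. The sketch below uses hypothetical names and models the simplest case, tasks running back to back in a single slot, so it ignores parallelism across containers.

```java
/** Hypothetical back-of-envelope model for per-task launch overhead. */
final class ReuseOverheadSketch {
    /** Fraction of a single task's wall time spent on allocation, launch, and JVM start. */
    static double overheadShare(long overheadMs, long workMs) {
        return (double) overheadMs / (overheadMs + workMs);
    }

    /** Wall time for `tasks` tasks run one after another in one slot. */
    static long serialWallMs(int tasks, long overheadMs, long workMs, boolean reuse) {
        // With reuse the launch overhead is paid once; without it, once per task.
        long launches = reuse ? Math.min(1, tasks) : tasks;
        return launches * overheadMs + (long) tasks * workMs;
    }
}
```

Plugging in the sum of the low ends of the three overhead rows, and then the sum of the high ends, gives the range of overhead shares for a 5-second task; comparing serialWallMs with and without reuse shows how the saving grows with task count.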


The reuse loop in TezChild

See tez-runtime.md. The single key fact: after each completed task, TezChild.run() calls umbilical.getTask() again instead of exiting. As long as the AM hands it work, the same JVM keeps running.

grep -n "umbilical.getTask\|shouldDie\|run()" \
  $(find tez-runtime-internals/src/main/java -name "TezChild.java") | head

So the entire reuse policy is implemented on the AM side — the container asks "what next?" and the AM decides whether to give it another task or release it.


AMContainerImpl — per-container state machine

find tez-dag/src/main/java -name "AMContainerImpl.java"
wc -l $(find tez-dag/src/main/java -name "AMContainerImpl.java")
grep -n "AMContainerState\|enum AMContainerState" \
  $(find tez-dag/src/main/java -name "AMContainerState.java" \
                                -o -name "AMContainerImpl.java") | head

Each YARN container the AM holds has a corresponding AMContainerImpl state machine. States include:

| State | Meaning |
|---|---|
| ALLOCATED | RM has assigned the container; not yet launched. |
| LAUNCHING | NMClient is forking the JVM. |
| IDLE | Launched, no task assigned (reuse candidate). |
| RUNNING | A task attempt is currently executing. |
| STOP_REQUESTED / COMPLETED | Releasing or released. |

The transition RUNNING → IDLE is the moment Tez decides between reuse and release.


HeldContainer

grep -n "HeldContainer\|heldContainers\|delayedContainers" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") | head -20

HeldContainer is the scheduler-side view of an idle reused container:

| Field | Purpose |
|---|---|
| container | The underlying YARN Container (resource, node, priority). |
| priority | The priority class it was originally allocated at. |
| lastTaskActivity | Timestamp of the last task completion. |
| nextScheduleTime | When DelayedContainerManager will reconsider it. |
| localityMatchLevel | Tracks the locality at which it can still be matched. |

When a task completes, AMContainerImpl reports back to YarnTaskSchedulerService which wraps the container in a HeldContainer and queues it for matching.


Matching: who gets the held container?

Algorithm (paraphrased from YarnTaskSchedulerService):

  1. Walk pending requests at the same priority as the held container's original allocation.
  2. Prefer requests with locality matching the container's node, then rack, then any.
  3. Verify resource compatibility: container's Resource must satisfy the request's capability.
  4. If a match exists, dispatch reuse to the matched TaskAttemptImpl.
  5. If no match, leave the container as HeldContainer and schedule the DelayedContainerManager to re-evaluate after the locality delay.

grep -n "tryAssignReUsedContainer\|matchHeldContainerToRequest\|getMatchingRequests" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") | head
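The five steps above can be sketched in a few lines. This is a simplification under stated assumptions: Request and Held are hypothetical types, resources are reduced to a single memory figure, and locality is a plain string comparison. The real scheduler tracks far more.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified held-container matching. Types and fields are hypothetical, not Tez classes. */
final class HeldContainerMatchSketch {

    static final class Request {
        final String taskId;
        final int priority;
        final String node;
        final String rack;
        final int memoryMb;

        Request(String taskId, int priority, String node, String rack, int memoryMb) {
            this.taskId = taskId;
            this.priority = priority;
            this.node = node;
            this.rack = rack;
            this.memoryMb = memoryMb;
        }
    }

    static final class Held {
        final int priority;
        final String node;
        final String rack;
        final int memoryMb;

        Held(int priority, String node, String rack, int memoryMb) {
            this.priority = priority;
            this.node = node;
            this.rack = rack;
            this.memoryMb = memoryMb;
        }
    }

    /** Returns the task id that should reuse the container, or null to keep holding it. */
    static String match(Held held, List<Request> pending) {
        Request best = null;
        int bestScore = -1;
        for (Request r : pending) {
            if (r.priority != held.priority) {
                continue; // step 1: same priority only
            }
            if (r.memoryMb > held.memoryMb) {
                continue; // step 3: the container must be large enough
            }
            // step 2: a node match beats a rack match, which beats anything else
            int score = held.node.equals(r.node) ? 2 : (held.rack.equals(r.rack) ? 1 : 0);
            if (score > bestScore) {
                best = r;
                bestScore = score;
            }
        }
        return best == null ? null : best.taskId; // step 4 on a hit, step 5 on null
    }

    public static void main(String[] args) {
        Held held = new Held(1, "node1", "rackA", 4096);
        List<Request> pending = Arrays.asList(
            new Request("t1", 2, "node1", "rackA", 1024),  // wrong priority
            new Request("t2", 1, "node2", "rackA", 1024),  // rack-local
            new Request("t3", 1, "node1", "rackA", 8192),  // too large
            new Request("t4", 1, "node1", "rackA", 2048)); // node-local and fits
        System.out.println(match(held, pending)); // prints t4
    }
}
```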

Why priority-strict matching?

Tez does not reuse a container allocated for priority class P1 for a task of priority P2 because RM accounting attributed the container to the P1 queue/request. Crossing priority classes would corrupt fairness and create double-counting in the RM's view of demand.


Idle timeout

grep -n "tez.am.container.idle.release-timeout" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

Two timeouts bracket the wait:

| Key | Default (ms) | Meaning |
|---|---|---|
| tez.am.container.idle.release-timeout-min.millis | 5000 | Don't release before this much idle time. |
| tez.am.container.idle.release-timeout-max.millis | 10000 | Definitely release after this much. |

DelayedContainerManager runs a periodic sweep. For each HeldContainer:

  • If now - lastActivity < min, wait.
  • If min ≤ now - lastActivity < max, try a relaxed-locality match.
  • If now - lastActivity ≥ max, release back to YARN (AMRMClient.releaseAssignedContainer).

Why a range? Avoids thundering-herd releases when a wave of tasks finishes simultaneously, and gives the AM a window to re-match before paying the allocate-from-scratch cost.
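The three-way decision in the sweep, as described above, reduces to two comparisons. A minimal sketch with hypothetical names (IdleSweepSketch, Action, decide); the real sweep also consults pending requests and locality state.

```java
/** The per-container sweep decision as described above; names are hypothetical. */
final class IdleSweepSketch {

    enum Action { WAIT, TRY_RELAXED_MATCH, RELEASE }

    /** Decides what to do with a held container given how long it has been idle. */
    static Action decide(long nowMs, long lastActivityMs, long minMs, long maxMs) {
        long idleMs = nowMs - lastActivityMs;
        if (idleMs < minMs) {
            return Action.WAIT;              // too early to give up on a strict match
        }
        if (idleMs < maxMs) {
            return Action.TRY_RELAXED_MATCH; // inside the window: loosen locality
        }
        return Action.RELEASE;               // past the window: hand it back to YARN
    }

    public static void main(String[] args) {
        long min = 5000;
        long max = 10000;
        for (long idle : new long[] {1000, 5000, 9999, 10000}) {
            System.out.println(idle + " ms idle -> " + decide(idle, 0, min, max));
        }
    }
}
```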


Locality re-matching

grep -n "localityMatchLevel\|adjustLocalityMatch\|fallbackMatch" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") | head

A held container starts at NODE_LOCAL. Each sweep without a match relaxes the level:

NODE_LOCAL → RACK_LOCAL → ANY → release.

tez.am.container.reuse.locality.delay-allocation-millis (default 250) is the per-step delay. Higher values raise locality at the cost of throughput; lower values give up locality faster.
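The relaxation ladder can be modeled as a walk down an ordered enum. This sketch assumes one step per unmatched sweep and ignores the fallback toggles; Level, relax, and levelAfter are hypothetical names.

```java
/** Locality relaxation as a walk down an ordered ladder; names are hypothetical. */
final class LocalityRelaxSketch {

    enum Level { NODE_LOCAL, RACK_LOCAL, ANY, RELEASE }

    /** One sweep without a match moves the container one step down; RELEASE is terminal. */
    static Level relax(Level current) {
        int next = Math.min(current.ordinal() + 1, Level.RELEASE.ordinal());
        return Level.values()[next];
    }

    /** Level reached after the given number of unmatched sweeps, starting at NODE_LOCAL. */
    static Level levelAfter(int unmatchedSweeps) {
        Level level = Level.NODE_LOCAL;
        for (int i = 0; i < unmatchedSweeps; i++) {
            level = relax(level);
        }
        return level;
    }

    public static void main(String[] args) {
        for (int sweeps = 0; sweeps <= 4; sweeps++) {
            System.out.println(sweeps + " unmatched sweeps -> " + levelAfter(sweeps));
        }
    }
}
```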


DAG transitions and reuse

grep -n "tez.am.container.reuse.across-dags.enabled\|tez.am.container.reuse.enabled" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

Reuse policy at DAG boundaries:

| Key | Default | Effect |
|---|---|---|
| tez.am.container.reuse.enabled | true | Master toggle. |
| tez.am.container.reuse.rack-fallback.enabled | true | Allow RACK_LOCAL fallback. |
| tez.am.container.reuse.non-local-fallback.enabled | false | Allow ANY-locality fallback. |
| tez.am.container.reuse.new-containers.enabled | true | Reuse a brand-new container for a different task than originally requested. |
| tez.am.mode.session | false (Hive turns it on) | Session mode: the AM stays alive across DAGs, which is what makes inter-DAG reuse possible. Hive holds the AM across queries. |

When Session mode is on (Hive's TezSessionPoolManager does this), the AM holds containers across DAGs, so the first DAG warms the JVMs that the second DAG reuses.


Failure modes

Stale credentials

grep -n "credentials\|Token\|getCredentials" \
  $(find tez-dag/src/main/java -name "ContainerLauncherImpl.java" \
                                -o -name "AMContainerImpl.java") | head

If a DAG uses delegation tokens (HDFS, HiveMetastore) that expire mid-session, reused containers still hold the old tokens. Symptoms: tasks fail with SecretManager$InvalidToken on file open. Fix: token renewal via TokenRenewer, or release reused containers between DAGs that use tokens with short TTLs.

Leaked containers on AM failover

grep -n "recoverContainer\|onAMRestart" \
  $(find tez-dag/src/main/java -name "*.java" | head -50) | head

When the AM dies and YARN restarts attempt 2, the old containers are still running. YARN passes them to the new AM via getContainersFromPreviousAttempts. If the new AM mis-handles the priority mapping, those containers can become orphaned — neither released nor reused — until the YARN-level yarn.am.liveness-monitor.expiry-interval-ms kicks in.

Resource fragmentation

Tez does not reshape containers. A 4 GB container allocated for a heavyweight mapper sits idle through the reduce phase if reducers want 2 GB containers — the 4 GB block is not subdivided.

Container blacklisting

grep -n "blacklist\|NodeTracker" \
  $(find tez-dag/src/main/java -name "*.java" | grep rm) | head

A node accumulating task failures gets blacklisted; held containers on that node are released even within the idle window.


Tuning playbook

| Goal | Tune |
|---|---|
| Reduce p50 task latency | Increase tez.am.container.idle.release-timeout-max.millis to keep JVMs warm longer. |
| Reduce YARN queue pressure | Lower tez.am.container.idle.release-timeout-min.millis to return idle containers faster. |
| Improve locality on long DAGs | Increase tez.am.container.reuse.locality.delay-allocation-millis. |
| Hive interactive queries | Enable session pools (hive.server2.tez.initialize.default.sessions) and large reuse windows. |
| Debugging "why was this container released?" | Set log4j level for org.apache.tez.dag.app.rm to DEBUG. |

Reading exercise

  1. wc -l $(find tez-dag/src/main/java -name "AMContainerImpl.java") then read the state machine declaration block. Count states and transitions.
  2. grep -n "DelayedContainerManager" $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") — find the sweep loop.
  3. grep -rn "idle.release-timeout" tez-dag/src/main/java — list all read sites for the idle timeout.
  4. grep -n "previousAttemptContainers\|registerApplicationMaster" $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") — how does the AM enumerate inherited containers on failover?
  5. cat tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | grep -A 1 "REUSE\|REUSE_ENABLED" — list every reuse-related config key.
  6. grep -n "containerCompleted" $(find tez-dag/src/main/java -name "AMContainerImpl.java") — where does the AM learn that the JVM exited unexpectedly?

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| 0% reuse despite tez.am.container.reuse.enabled=true | Priority mismatches; verify with AM log Container released because no matching request. |
| Hive query slow after token refresh | Reused container holding stale HiveMetastore delegation token. Release after refresh or shorten reuse window. |
| AM log spam: Released container X because expired | Tasks complete faster than the next wave is dispatched, so containers idle out first; raise the idle.release-timeout-min/max window. |
| YARN queue at 100% but tasks pending | Held containers at wrong priority blocking new allocations; check nm-rm-heartbeat-interval-ms. |
| Containers orphaned after AM crash | New AM did not register previous containers; check getContainersFromPreviousAttempts handling. |

Validation: prove you understand this

  1. Describe the four-step locality relaxation a HeldContainer undergoes.
  2. Why is priority-strict matching enforced even when relaxing locality? Cite the RM accounting consequence.
  3. Given idle.release-timeout-min=5000, idle.release-timeout-max=10000, and 200 ms between successive task completions on the same vertex, what fraction of containers get reused?
  4. Identify the exact configuration key that controls whether RM-fresh containers can be assigned to a task different from the one that triggered the request. Cite file:line.
  5. Sketch the sequence of AM events when an AMContainer transitions RUNNING → IDLE → RUNNING with reuse, including which state machine emits each event.

Local Mode

Tez ships two "no YARN" execution paths:

  • Local mode: tez.local.mode=true. The whole AM + all containers run in the calling JVM. No RM, no NM, no networking.
  • MiniTezCluster: a real YARN MiniCluster (RM + NMs as threads) with a real Tez AM submitted as a YARN app. Networking goes over loopback.

Both let you test without a cluster, but they are not interchangeable. This chapter explains the wiring and the tradeoffs.


Why a no-YARN path exists

Production Tez requires YARN to allocate containers. For:

  • IDE-driven unit tests of vertex managers, edge managers, processors;
  • short reproducers in JIRAs;
  • in-process pipelines (e.g. running a DAG inline from a Hive Driver in a test);

paying YARN startup cost (30+ seconds) is intolerable. Local mode is the escape hatch.

grep -rn "tez.local.mode\|LOCAL_MODE" \
  tez-api/src/main/java tez-dag/src/main/java | head

How tez.local.mode=true rewires the AM

grep -n "TEZ_LOCAL_MODE\|isLocalMode\|getBoolean.*LOCAL" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

When tez.local.mode=true:

  1. TezClient.start() does not submit to YARN. Instead it constructs a DAGAppMaster instance directly in the client JVM and starts it as a service.
  2. TaskSchedulerManager is configured with LocalTaskSchedulerService instead of YarnTaskSchedulerService.
  3. ContainerLauncherManager uses LocalContainerLauncher instead of ContainerLauncherImpl.
  4. TaskCommunicatorManager uses TezLocalTaskCommunicatorImpl which bypasses RPC entirely.

The net effect: the AM, scheduler, container launcher, and TezChilds all live in the same JVM, talking via in-process queues.


LocalTaskSchedulerService

find tez-dag/src/main/java -name "LocalTaskSchedulerService.java"
wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java")

Mirrors YarnTaskSchedulerService but the "resource pool" is a thread pool.

| Concept | YARN version | Local version |
|---|---|---|
| Resource pool | YARN cluster | ExecutorService of bounded thread count |
| allocateTask | AMRMClient.addContainerRequest | enqueue to local queue, immediately synthesize fake Container |
| releaseAssignedContainer | RM release | return thread to pool |
| Locality | NODE_LOCAL/RACK_LOCAL | always ANY (single "node") |
| Priority | YARN priority class | honored as a queue-ordering hint |

Configuration:

grep -n "TEZ_AM_INLINE_TASK_EXECUTION_MAX_TASKS\|tez.am.inline" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

tez.am.inline.task.execution.max-tasks (default 1) controls thread-pool size in local mode. Bumping this exposes concurrency bugs that production container parallelism would also expose.


LocalContainerLauncher

find tez-dag/src/main/java -name "LocalContainerLauncher.java"

When the AM "launches" a local container, the launcher allocates a LocalContainer worker that runs TezChild logic in the same process:

  • No new JVM.
  • No serialization of the ContainerLaunchContext — the AM hands the TaskSpec directly to the local task runner.
  • The umbilical "RPC" is a Java method call on an in-process object.

This means: local mode does not exercise the RPC layer, classpath construction, NM localization, or token plumbing. Bugs in those paths are invisible to local-mode tests.


What local mode does not exercise

| Layer | Skipped in local mode |
|---|---|
| YARN RM scheduling | yes |
| NodeManager container launch | yes |
| Resource localization (HDFS download) | yes |
| AMRMToken / ClientToAMToken | yes |
| HDFS shuffle path (uses local FS only) | yes |
| ShuffleHandler aux service | yes |
| RPC serialization | yes |
| JVM cold start / classloader isolation | yes |

What it does exercise: the DAG state machine, VertexManagers, EdgeManagers, sort/merge code, processors, and the umbilical event flow.


MiniTezCluster

find tez-tests/src/test/java -name "MiniTezCluster.java"
wc -l $(find tez-tests/src/test/java -name "MiniTezCluster.java")

A real cluster compressed onto one host. Inherits MiniYARNCluster from Hadoop:

  • One RM thread.
  • N NM threads (configurable).
  • A Tez AM submitted as a normal YARN application.
  • TezChild runs in separate JVMs spawned by NM ContainerExecutor.
  • HDFS is MiniDFSCluster (a few NameNode + DataNode threads in the same JVM) or a RawLocalFileSystem.

grep -n "MiniYARNCluster\|MiniDFSCluster\|appJar\|deploy" \
  $(find tez-tests/src/test/java -name "MiniTezCluster.java") | head

Setup pattern

grep -rn "MiniTezCluster\b" tez-tests/src/test/java | head -10

MiniTezCluster cluster = new MiniTezCluster("test", numNMs, numLocalDirs, numLogDirs);
cluster.init(conf);
cluster.start();

TezConfiguration tezConf = new TezConfiguration(cluster.getConfig());
TezClient client = TezClient.create("test", tezConf);
client.start();
client.waitTillReady();
client.submitDAG(myDag);

When MiniTezCluster is the right tool

  • You are exercising RPC, security, or localization code.
  • You hit ShuffleHandler paths or HDFS-backed recovery (see failure-handling.md).
  • You're reproducing a bug that involves real container lifecycle (kill -9 vs orderly shutdown) — MiniCluster can forkProcess and SIGKILL.
  • You need realistic counters and ATS event flow.

When MiniTezCluster is the wrong tool

  • Pure VertexManager logic — use local mode or mock dispatcher.
  • Pure IFile / sort behavior — use a unit test on the runtime-library classes directly.
  • Anything where 30–60 s startup + heavy memory cost (~1 GB minimum) is intolerable.

Side-by-side comparison

| Aspect | Local mode | MiniTezCluster |
|---|---|---|
| Startup | < 1 s | 30–60 s |
| Memory | ~256 MB | 1 GB+ |
| YARN exercised | no | yes (in-process) |
| RPC exercised | no | yes (loopback) |
| Tokens exercised | no | yes (simple, unkerberized by default) |
| Separate JVMs for tasks | no | yes |
| HDFS | RawLocal | MiniDFS or RawLocal |
| Shuffle path | no ShuffleHandler | full ShuffleHandler |
| Use case | unit / integration of AM logic | end-to-end integration tests |
| Example test class | TestLocalMode | TestOrderedWordCount |

find tez-tests/src/test/java -name "TestLocalMode.java" \
                              -o -name "TestOrderedWordCount.java"

Worked example: switching between modes in one test

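// Imports omitted. TestBothModes is a hypothetical class name used for illustration.
@RunWith(Parameterized.class)
public class TestBothModes {
  @Parameters(name = "{0}")
  public static Iterable<Object[]> modes() {
    return Arrays.asList(new Object[][] {{"local"}, {"mini"}});
  }

  private final String mode;
  private TezConfiguration conf;
  private MiniTezCluster miniCluster; // stays null in local mode
  private TezClient client;

  public TestBothModes(String mode) {
    this.mode = mode;
  }

  @Before
  public void setUp() throws Exception {
    conf = new TezConfiguration();
    if ("local".equals(mode)) {
      conf.set("fs.defaultFS", "file:///");
      conf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
    } else {
      miniCluster = new MiniTezCluster("test", 1, 1, 1);
      miniCluster.init(conf);
      miniCluster.start();
      conf = new TezConfiguration(miniCluster.getConfig());
    }
    client = TezClient.create("t", conf);
    client.start();
    client.waitTillReady();
  }

  @After
  public void tearDown() throws Exception {
    if (client != null) client.stop();           // stop the session before the cluster
    if (miniCluster != null) miniCluster.stop(); // only set in the "mini" run
  }
}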

This is the pattern in several Tez tests where a feature must work in both universes.


Reading exercise

  1. wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java" -o -name "YarnTaskSchedulerService.java") — confirm the local version is much smaller.
  2. grep -n "tez.local.mode" $(find tez-dag/src/main/java -name "DAGAppMaster.java") — find every branch that depends on local mode.
  3. cat $(find tez-dag/src/main/java -name "LocalContainerLauncher.java") | head -160 — how does it run TezChild without a fork?
  4. find tez-tests/src/test/java -name "MiniTezCluster.java" -exec grep -n "ShuffleHandler\|aux-services" {} \; — verify MiniTezCluster wires the YARN aux service.
  5. grep -rn "TEZ_LOCAL_MODE" tez-api tez-dag tez-runtime-internals | head — list every config read site.
  6. find tez-tests/src/test/java -name "TestLocal*" -o -name "TestMRR*" — read one local-mode and one MiniCluster test, side by side.

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| Test passes in local mode, fails on cluster | Local mode skipped RPC/localization/tokens. Add a MiniCluster variant. |
| MiniCluster test times out at waitTillReady | RM never registered the AM. Check tez-site.xml is on the AM classpath in the MiniCluster config. |
| Local-mode race conditions only visible with inline.task.execution.max-tasks > 1 | Single-threaded local mode hides ordering bugs in VertexManager and dispatchers. |
| ClassNotFoundException for custom processor in MiniCluster | Container localization needs the JAR; either put it on the launch classpath or register via LocalResources. |
| MiniCluster blows the heap | Even 1 NM + MiniDFS already needs about 1 GB; bump the test JVM heap and keep the NM count at 1. |
| Hive integration test wedges only in MiniCluster | Hive needs full Hadoop config; check hadoop.security.authentication=simple in test conf. |

Validation: prove you understand this

  1. List four layers that local mode does not exercise. For each, name a bug class it can hide.
  2. In local mode, where does the "RPC" between TezChild and the AM actually happen? Cite the file path.
  3. Why is tez.am.inline.task.execution.max-tasks=1 the default in local mode? What test reliability tradeoff does it enforce?
  4. Given a reproducer for a bug in ShuffleHandler aux-service interaction, explain why a TestLocalMode-style test cannot reproduce it, and what the minimum MiniCluster setup is.
  5. Show the minimum TezConfiguration setup for local mode in code. Three lines max.

Hive on Tez

Hive is the largest single consumer of Tez. Roughly 70% of bug reports filed against Tez originate in a Hive query; many "Tez bugs" turn out to be Hive bugs, and vice versa. This chapter walks the compile boundary, explains how Hive operators map to Tez I/P/O, and gives a triage tree for attribution.


The compile boundary

Hive's query compiler produces a TezWork, a graph of BaseWork nodes (MapWork, ReduceWork, MergeJoinWork, etc). TezTask.execute walks TezWork and constructs a Tez DAG.

ls ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/

Key files:

| File | Role |
|---|---|
| TezTask.java | Hive's Task impl; builds the DAG and submits via TezSessionState. |
| DagUtils.java | DAG construction helpers (createVertex, createEdge, etc). |
| TezSessionPoolManager.java | Warm session pool; keeps AMs alive between queries. |
| TezSessionState.java | One Hive session ↔ one Tez AM. |
| TezProcessor.java | The LogicalIOProcessor that runs Hive operator pipelines inside a Tez task. |

wc -l ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/{TezTask,DagUtils,TezSessionPoolManager,TezProcessor}.java

TezTask.execute — high-level flow

grep -n "execute\|build\|submitDAG" \
  ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java | head -30

Steps:

  1. Acquire a TezSessionState from TezSessionPoolManager (or open a new one).
  2. build(jobConf, work, scratchDir, ...) — call DagUtils to turn each BaseWork into a Tez Vertex and each TezEdgeProperty into a Tez Edge.
  3. submit(dag, sessionState): calls tezClient.submitDAG(dag).
  4. Poll dagClient.getDAGStatus(...) until terminal.
  5. Surface counters + diagnostics back to Hive.

DagUtils.createVertex

grep -n "createVertex\|createEdge\|createEdgeProperty\|setVertexManagerPlugin" \
  ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java | head -30

For a MapWork:

| Hive concept | Tez vertex configuration |
|---|---|
| Operator tree starting with TableScanOperator | processor = MapTezProcessor (subclass of TezProcessor) |
| Number of input splits | parallelism = splits.length (often overridden by grouping) |
| Per-split input | DataSourceDescriptor with MRInputLegacy and the InputFormat |
| Combiner | Edge-level (downstream ReduceWork configures it as a combiner.class) |

For a ReduceWork:

| Hive concept | Tez vertex configuration |
|---|---|
| Operator tree starting at ReduceSinkOperator's consumer | processor = ReduceTezProcessor |
| Target parallelism | numReducers (from Hive's Operator tree, optionally auto-parallelized via ShuffleVertexManager) |
| Sort key codec | OrderedGroupedKVInput.KEY_CLASS, KEY_COMPARATOR_CLASS |
| setVertexManagerPlugin | ShuffleVertexManager with auto-parallelism if hive.tez.auto.reducer.parallelism=true |

For a MergeJoinWork:

  • processor = MergeJoinProcessor
  • Multiple sorted inputs (one per join side) using OrderedGroupedKVInput
  • A custom or built-in vertex manager that coordinates inputs

Operator → IPO mapping

Hive operators run inside a Tez task — they are not Tez constructs. The mapping happens at the input/output boundary of the vertex.

| Position | Hive operator | Tez wiring |
|---|---|---|
| Vertex entry (map side) | TableScanOperator | MRInputLegacy (tez-mapreduce) emits (key, value) from InputFormat |
| Vertex middle | Filter / Select / GroupBy partial / etc | Pure in-memory operator chain inside TezProcessor.process |
| Vertex exit (shuffle producer) | ReduceSinkOperator | OrderedPartitionedKVOutput with Hive's HiveKey serializer and partitioner |
| Vertex entry (reduce side) | First operator after the boundary | OrderedGroupedKVInput provides a KeyValuesReader; ReduceRecordProcessor adapts it into Hive's tuple-at-a-time interface |
| Vertex middle (reduce) | GroupBy aggregation, Join, etc | Operator chain |
| Vertex exit (final) | FileSinkOperator | MROutputLegacy writes to HDFS |
| Broadcast join build | HashTableSinkOperator | UnorderedKVOutput (or in newer Hive a BROADCAST-typed edge) feeding the probe vertex |
| Broadcast join probe | MapJoinOperator | UnorderedKVInput on a BROADCAST edge |

grep -rn "OrderedPartitionedKVOutput\|OrderedGroupedKVInput\|UnorderedKVOutput\|UnorderedKVInput" \
  ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez | head -20

TezProcessor adapter

grep -n "class TezProcessor\|class MapTezProcessor\|class ReduceTezProcessor\|process(" \
  ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java

TezProcessor.run(inputs, outputs):

  1. Pull the singular input (MRInputLegacy or first OrderedGroupedKVInput).
  2. Construct a RecordSource that adapts the Tez reader into Hive's Operator.process(Object row, int tag) calling convention.
  3. Run the operator tree until the input is drained.
  4. Call forward(EOF) to drain operator buffers.
  5. Close outputs in reverse order.

The processor is intentionally thin — all the interesting logic is in the Hive operator chain.


TezSessionPoolManager

find ~/hive-src/ql/src/java -name "TezSessionPoolManager.java"
wc -l ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java

A Tez session = a long-lived Tez AM holding zero or more idle containers ready to accept the next DAG.

| Config | Default | Effect |
|---|---|---|
| hive.server2.tez.default.queues | default | Pre-warm sessions per YARN queue. |
| hive.server2.tez.sessions.per.default.queue | 1 | Number of pre-warm sessions per queue. |
| hive.server2.tez.initialize.default.sessions | false | Start them at HS2 boot. |
| hive.tez.exec.print.summary | false | Surface Tez counters in query output. |

Pool flow:

  1. HS2 starts. If initialize.default.sessions=true, launches N AMs per queue.
  2. Query comes in. HS2 calls TezSessionPoolManager.getSession(queue) — gets an idle session or opens a new one.
  3. Session executes the DAG; AM holds containers across DAGs (see container-reuse.md).
  4. On session return, AM remains idle awaiting next DAG.
  5. On idle timeout (hive.server2.session.check.interval), pool may close sessions.
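The pool flow above can be sketched as an idle stack with an open-on-miss fallback. This is a toy under stated assumptions, not Hive's TezSessionPoolManager: the names are hypothetical, there is a single queue, and idle-timeout eviction is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy warm-session pool. Names are hypothetical; this is not Hive's TezSessionPoolManager. */
final class SessionPoolSketch {

    /** Stand-in for one long-lived AM. */
    static final class Session {
        final int id;

        Session(int id) {
            this.id = id;
        }
    }

    private final Deque<Session> idle = new ArrayDeque<>();
    private int opened;

    /** Pre-warms the given number of sessions, as HS2 would at boot when configured to. */
    SessionPoolSketch(int prewarm) {
        for (int i = 0; i < prewarm; i++) {
            idle.add(open());
        }
    }

    private Session open() {
        return new Session(++opened); // the expensive step; in reality an AM launch
    }

    /** Hands out an idle session if one exists, otherwise opens a new one. */
    synchronized Session getSession() {
        Session s = idle.poll();
        return s != null ? s : open();
    }

    /** Returns a session to the pool, where it stays warm for the next query. */
    synchronized void returnSession(Session s) {
        idle.push(s);
    }

    synchronized int openedCount() {
        return opened;
    }

    public static void main(String[] args) {
        SessionPoolSketch pool = new SessionPoolSketch(1);
        Session first = pool.getSession();  // served from the warm pool
        Session second = pool.getSession(); // pool empty: pays the open cost
        pool.returnSession(first);
        Session third = pool.getSession();  // the same warm session again
        System.out.println("opened=" + pool.openedCount() + " reused=" + (third == first));
    }
}
```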

LLAP

LLAP (Live Long And Process) is a different execution model that replaces per-query YARN containers with long-lived per-node daemons. The Tez AM still coordinates, but instead of asking YARN for containers it asks LLAP daemons for "fragments".

find ~/hive-src/llap-* -type d -maxdepth 2 2>/dev/null | head

Key differences (do not extrapolate Tez-on-YARN debugging to LLAP):

  • Containers are replaced by LlapTaskExecutorService worker slots.
  • The shuffle path uses a Netty-based fetcher (LlapShuffleHandler).
  • The Tez scheduler plugin is LlapTaskSchedulerService (in hive-llap-server).
  • Container reuse is not relevant — LLAP slots are always "hot".

This chapter does not cover LLAP further; treat it as a separate world.


Bug attribution: where does it really live?

Triage tree. Symptom: query fails or returns wrong result.

flowchart TD
  S[Failure observed] --> Q1{Failure message mentions Hive operator?}
  Q1 -- yes --> H1[Hive bug: open against HIVE]
  Q1 -- no --> Q2{Failure in TezChild / IFile / Fetcher?}
  Q2 -- yes --> T1[Tez bug: open against TEZ]
  Q2 -- no --> Q3{Failure in container launch / RM allocation?}
  Q3 -- yes --> Y1[YARN bug: open against YARN]
  Q3 -- no --> Q4{Wrong result not crash?}
  Q4 -- yes --> Q5{Reproduce with same DAG via TestOrderedWordCount-style?}
  Q5 -- no --> H1
  Q5 -- yes --> T1

Practical heuristics:

| Stack trace contains | Probably |
|---|---|
| org.apache.hadoop.hive.ql.exec.Operator | Hive |
| org.apache.tez.runtime.library | Tez |
| org.apache.tez.dag.app.rm | Tez (scheduling) |
| org.apache.hadoop.yarn | YARN |
| ShuffleHandler | YARN-side (mapreduce auxservice) |
| LlapDaemon | LLAP (Hive) |
| MapJoinOperator + OOM | Hive (join planning), even though the OOM happens in a Tez container |

Wrong-result bugs almost always live in Hive (operator semantics) unless you can isolate the same DAG with synthetic data in TestOrderedWordCount style.
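These heuristics are mechanical enough to script as a first-pass filter. A minimal sketch, assuming plain substring checks and a most-specific-first ordering; both the ordering and the class name TriageSketch are choices made for illustration, not an established tool.

```java
/** First-pass owner guess from a stack trace, using the substring heuristics above. */
final class TriageSketch {

    /** Returns a probable owner. Rules are checked most specific first (an assumption). */
    static String owner(String stackTrace) {
        boolean oom = stackTrace.contains("OutOfMemoryError");
        if (oom && stackTrace.contains("MapJoinOperator")) {
            return "Hive (join planning)"; // beats the Tez frames that surround it
        }
        if (stackTrace.contains("LlapDaemon")) {
            return "LLAP (Hive)";
        }
        if (stackTrace.contains("org.apache.hadoop.hive.ql.exec.Operator")) {
            return "Hive";
        }
        if (stackTrace.contains("ShuffleHandler")) {
            return "YARN (aux service)";
        }
        if (stackTrace.contains("org.apache.tez.dag.app.rm")) {
            return "Tez (scheduling)";
        }
        if (stackTrace.contains("org.apache.tez.runtime.library")) {
            return "Tez";
        }
        if (stackTrace.contains("org.apache.hadoop.yarn")) {
            return "YARN";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String trace = "java.lang.OutOfMemoryError: Java heap space\n"
            + "  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(...)\n"
            + "  at org.apache.tez.runtime.task.TezChild.run(...)";
        System.out.println(owner(trace));
    }
}
```

Treat the result as a starting point for triage only; a wrong-result bug produces no stack trace at all and still needs the reproducer test described above.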


Reading exercise

  1. cat ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java | head -200 — read the top of TezTask.execute.
  2. grep -n "createEdgeProperty\|EdgeProperty\.create" ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java — list all edge property factories Hive uses.
  3. grep -rn "ShuffleVertexManager\|RootInputVertexManager" ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez — when does Hive set each manager?
  4. find ~/hive-src/ql/src/java -name "TezProcessor.java" -exec wc -l {} \; — confirm the processor is < 1000 lines (it's an adapter, not the brain).
  5. grep -rn "TezSessionPoolManager.getSession" ~/hive-src/service/src — when does HS2 acquire sessions?
  6. cat ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java | head -100 — see how a session wraps a TezClient.

Common bugs and symptoms

| Symptom | Likely owner |
|---|---|
| MetaException mid-query | Hive (HMS client) |
| Container OOM during reduce join | Hive operator (map-join build size); Tez cannot size around an oversized hash table |
| Wrong row counts after a query rewrite | Hive optimizer or MapJoinOperator semantics |
| Fetcher: ConnectException to nm:13562 | YARN (aux-service mis-config) |
| AM dies with org.apache.tez.dag.app.DAGAppMaster: Vertex failed and the diagnostic mentions only TezProcessor, no Hive class | Tez bug; open a reproducer DAG without Hive |
| Slow first query after HS2 restart | No warm sessions; enable initialize.default.sessions |
| Stale ACL after GRANT reissue | Hive (HMS); Tez containers cache delegation tokens, see container-reuse.md |

Validation: prove you understand this

  1. List the Hive operators on the source and destination sides of a SCATTER_GATHER shuffle edge and map each side to the Tez Input or Output class.
  2. Identify the Hive method that finally calls tezClient.submitDAG. Cite path + grep command.
  3. Given a query that succeeds in standalone HS2 but fails in HS2 with session pooling on, name two likely failure modes and where to look.
  4. Explain why a MapJoinOperator OOM is a Hive bug even though the OOM stack trace is rooted in TezChild.
  5. Show, in three lines, the conditional inside DagUtils that decides whether to install ShuffleVertexManager on a reduce vertex. (Find via grep; quote the file:line.)

YARN Integration

The Tez AM is, from YARN's perspective, an ordinary YARN application: an ApplicationMaster running in a container, talking to the ResourceManager to request more containers, talking to NodeManagers to launch them, and writing events to a Timeline Server.

This chapter walks every YARN-facing interface Tez touches.


DAGAppMaster as a YARN AM

find tez-dag/src/main/java -name "DAGAppMaster.java"
wc -l $(find tez-dag/src/main/java -name "DAGAppMaster.java")
grep -n "main(\|serviceStart\|serviceInit" \
  $(find tez-dag/src/main/java -name "DAGAppMaster.java") | head

Boot sequence when YARN launches the AM container:

  1. NodeManager runs the AM command line (constructed by TezClientUtils), which is essentially java -cp <classpath> org.apache.tez.dag.app.DAGAppMaster.
  2. DAGAppMaster.main parses environment for ApplicationAttemptId, container ID, AM Resource, etc.
  3. Constructs the DAGAppMaster service tree (state machines, dispatchers, schedulers, ATS publisher).
  4. serviceStart() registers with the RM via AMRMClientAsync.registerApplicationMaster.
  5. Starts an RPC server for client connections (DAGClient, TezTaskUmbilicalProtocol).
  6. Waits for DAG submissions over the client RPC (or, for non-session mode, picks up the pre-submitted DAG from local disk).

Key clients owned by the AM:

| Client | Purpose | Library |
|---|---|---|
| AMRMClientAsync | RM heartbeat: request/release containers | hadoop-yarn-client |
| NMClientAsync | NM RPC: launch/stop containers | hadoop-yarn-client |
| TimelineClient | ATS event publisher | hadoop-yarn-client |
| DFSClient | HDFS access for recovery & temp files | hadoop-hdfs-client |

AMRMClientAsync

grep -n "AMRMClientAsync\|addContainerRequest\|releaseAssignedContainer\|allocate" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java") | head -20

The async wrapper around AMRMClient. Tez uses it instead of the sync client so allocate-callbacks fire on a dedicated thread.

Lifecycle:

  1. Register: registerApplicationMaster(host, rpcPort, trackingUrl). This is the AM telling the RM "I'm alive, here is where to find me."
  2. Allocate loop: a background thread heartbeats at the interval passed to AMRMClientAsync.createAMRMClientAsync, which must stay well below yarn.am.liveness-monitor.expiry-interval-ms or the RM expires the AM. On each heartbeat the AMRM client sends:
    • Pending container requests (added via addContainerRequest).
    • Containers to release.
    • Application progress (0..1).
     It receives:
    • Newly allocated containers.
    • Completed container statuses.
    • Updated node reports (for blacklisting).
    • Decommissioned-node reports.
  3. Unregister: unregisterApplicationMaster(state, msg, trackingUrl) on AM shutdown.

grep -n "CallbackHandler\|onContainersAllocated\|onContainersCompleted\|onShutdownRequest\|onNodesUpdated" \
  $(find tez-dag/src/main/java -name "YarnTaskSchedulerService.java")

These callbacks run on the AMRM client's internal thread; Tez keeps them short by forwarding to its own dispatcher.
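The hand-off pattern looks roughly like this: callbacks only enqueue, and a separate dispatcher thread does the work. This is a self-contained sketch with hypothetical names and string events; it does not use the YARN client classes, and Tez's own dispatcher is considerably richer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Callbacks enqueue; a dispatcher thread handles. Names are hypothetical, not YARN APIs. */
final class CallbackForwardingSketch {

    private static final String STOP = "STOP"; // sentinel that ends the dispatch loop

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> handled = new CopyOnWriteArrayList<>();
    private final Thread dispatcher = new Thread(this::dispatchLoop, "dispatcher");

    void start() {
        dispatcher.start();
    }

    /** Called on the RM client's thread: enqueue and return immediately. */
    void onContainersAllocated(List<String> containerIds) {
        for (String id : containerIds) {
            queue.add("ALLOCATED " + id);
        }
    }

    /** Called on the RM client's thread: enqueue and return immediately. */
    void onContainersCompleted(List<String> containerIds) {
        for (String id : containerIds) {
            queue.add("COMPLETED " + id);
        }
    }

    /** Runs on the dispatcher thread, where slow handling does not stall heartbeats. */
    private void dispatchLoop() {
        try {
            while (true) {
                String event = queue.take();
                if (STOP.equals(event)) {
                    return;
                }
                handled.add(event); // placeholder for real event handling
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets queued events drain, stops the dispatcher, and returns what it handled. */
    List<String> stopAndGetHandled() {
        queue.add(STOP);
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        CallbackForwardingSketch sketch = new CallbackForwardingSketch();
        sketch.start();
        sketch.onContainersAllocated(Arrays.asList("c1", "c2"));
        sketch.onContainersCompleted(Arrays.asList("c1"));
        System.out.println(sketch.stopAndGetHandled());
    }
}
```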


NMClientAsync and ContainerLauncherImpl

find tez-dag/src/main/java -name "ContainerLauncherImpl.java"
wc -l $(find tez-dag/src/main/java -name "ContainerLauncherImpl.java")
grep -n "NMClientAsync\|startContainerAsync\|stopContainerAsync" \
  $(find tez-dag/src/main/java -name "ContainerLauncherImpl.java")

After the RM allocates a container, Tez must tell the relevant NM to actually launch the JVM. ContainerLauncherImpl uses NMClientAsync to send startContainerAsync(container, containerLaunchContext).

ContainerLaunchContext

grep -n "buildContainerLaunchContext\|ContainerLaunchContext\|setLocalResources\|setEnvironment\|setCommands" \
  $(find tez-dag/src/main/java -name "ContainerLauncherImpl.java" \
                                -o -name "AMContainerHelpers.java")

The CLC is what NM uses to fork the JVM. It carries:

| Field | What Tez puts there |
|---|---|
| commands | java <jvm opts> -Dlog4j.configuration=... org.apache.tez.runtime.task.TezChild <args> |
| environment | CLASSPATH, JVM_PID, container ID, AM host/port |
| localResources | Tez tarball, user JARs, any HDFS-distributed resources |
| tokens | Delegation tokens (HDFS, HMS, etc) for the container to use |
| serviceData | Per-aux-service payload (e.g. mapreduce_shuffle job secret) |

grep -n "ServiceData\|JobTokenSecretManager\|shuffleSecret" \
  $(find tez-dag/src/main/java -name "*.java") | head

The serviceData map entry under key mapreduce_shuffle carries the serialized JobToken that NM's ShuffleHandler will use to authorize fetch requests — this is why mapreduce_shuffle must be configured as an NM aux-service even for Tez DAGs.


Tokens

grep -rn "AMRMToken\|ClientToAMToken\|TimelineDelegationToken" \
  tez-dag/src/main/java | head

| Token | Issued by | Used for | Where it lives |
|---|---|---|---|
| AMRMToken | RM, auto-injected into AM's Credentials | AM↔RM RPC | AM JVM credentials |
| ClientToAMToken | RM, returned to client at submit | Client (DAGClient) ↔ AM RPC | Client credentials |
| TimelineDelegationToken | Timeline Server | AM → Timeline publisher | AM credentials, refreshed periodically |
| HDFS delegation token | NN | Tasks reading/writing HDFS | Container credentials |
| Hive Metastore token | HMS | Tasks calling HMS | Container credentials, via Hive code path |

Delegation tokens are collected on the client at submit time (TezClientUtils does this) and shipped to the AM with the application's credentials; the AM then passes them to NMs in the CLC. Tokens that expire mid-DAG must be renewed by a TokenRenewer.


Log aggregation

grep -rn "log-aggregation\|LogAggregationService" \
  $(find ~/hadoop-src -name "*.java" 2>/dev/null | head -3) 2>/dev/null | head

YARN log aggregation is configured in yarn-site.xml:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
</property>

When enabled, every container's stdout, stderr, and syslog are uploaded to HDFS under /app-logs/<user>/logs/<applicationId>/<nodeAddress> once the application finishes (or periodically, if rolling aggregation is configured). Retrieve with:

yarn logs -applicationId application_1234_0001 -containerId container_..._01
yarn logs -applicationId application_1234_0001 -appOwner alice

Without aggregation, logs sit in ${yarn.nodemanager.log-dirs}/application_.../container_.../ on each NM until cleaned by yarn.nodemanager.log.retain-seconds.
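
As a quick illustration of the two locations described above, the following Java sketch assembles both paths from their components. The class and method names (ContainerLogPaths, localLogDir, aggregatedLogDir) are hypothetical helpers, not YARN APIs, and the aggregated layout simply mirrors the pattern quoted in this section; newer Hadoop releases may lay out the remote directory differently.

```java
/** Sketch: where a container's logs live with and without aggregation (hypothetical helper). */
final class ContainerLogPaths {

  /** NM-local directory: <log-dir>/<applicationId>/<containerId>. */
  static String localLogDir(String nmLogDir, String applicationId, String containerId) {
    return nmLogDir + "/" + applicationId + "/" + containerId;
  }

  /** Aggregated directory, following the pattern in this section: <remote-dir>/<user>/logs/<applicationId>. */
  static String aggregatedLogDir(String remoteAppLogDir, String user, String applicationId) {
    return remoteAppLogDir + "/" + user + "/logs/" + applicationId;
  }

  public static void main(String[] args) {
    // All identifiers below are made-up example values.
    String app = "application_1234_0001";
    String container = "container_1234_0001_01_000002";
    System.out.println(localLogDir("/var/log/hadoop-yarn/containers", app, container) + "/stderr");
    System.out.println(aggregatedLogDir("/app-logs", "alice", app));
  }
}
```

On a real cluster, substitute one entry of yarn.nodemanager.log-dirs for the first argument; a NM with several log dirs places each container's directory under one of them.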


Timeline Server (ATS)

Tez publishes a rich event stream to ATS for post-mortem debugging.

find tez-plugins -type d -name "tez-yarn-timeline*"
ls tez-plugins/

Three flavors exist in the wild:

| Version | Tez plugin module | Notes |
|---|---|---|
| ATSv1 | tez-yarn-timeline-history | Original; LevelDB-backed Timeline Server. Deprecated. |
| ATSv1.5 | tez-yarn-timeline-history-with-acls and tez-yarn-timeline-history-with-fs | Adds entity-file staging to HDFS; reduces ATS write load. |
| ATSv2 | tez-yarn-timeline-history-with-fs + ATSv2 reader configuration | HBase-backed, scalable; requires Hadoop 3.x. |

grep -rn "TimelineClient\|TIMELINE_HISTORY\|HistoryEventHandler" \
  tez-plugins/tez-yarn-timeline-history*/src/main/java | head

What gets published:

  • AppLaunchedEvent
  • DAGSubmittedEvent, DAGInitializedEvent, DAGStartedEvent, DAGFinishedEvent
  • VertexInitializedEvent, VertexStartedEvent, VertexFinishedEvent
  • TaskStartedEvent, TaskFinishedEvent
  • TaskAttemptStartedEvent, TaskAttemptFinishedEvent
  • ContainerLaunchedEvent, ContainerStoppedEvent

The Tez UI (Ambari, standalone) reads these events to render the DAG view, vertex graphs, task swimlanes, and counter trees.

ls tez-ui/src/main 2>/dev/null

Configuration cheat sheet

grep -n "YARN\|ATS\|TIMELINE\|LOG_AGG" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head -20

| Key | Default | Effect |
|---|---|---|
| tez.am.am-rm.heartbeat.interval-ms.max | 1000 | Cap on AMRM heartbeat interval. |
| tez.am.client.am.port-range | auto | Port range for the AM's client-facing RPC server. |
| tez.am.container.lookup.timeout-ms | 30000 | How long to wait for an NM ack before failing the launch. |
| tez.history.logging.service.class | (varies) | Which ATS plugin to use. |
| tez.am.tez-ui.history-url.template | template | Where the UI is hosted; surfaced in DAGStatus. |

yarn CLI behaviors for Tez apps

| Command | Behavior on a Tez app |
|---|---|
| yarn application -list | Lists Tez AMs alongside MR/Spark; type tag is TEZ. |
| yarn application -status <appId> | Shows AM state, RM tracking URL, ATS tracking URL (if configured). |
| yarn application -kill <appId> | RM SIGKILLs the AM container; Tez state is lost (no recovery beyond what RecoveryService already wrote). |
| yarn logs -applicationId <appId> | Streams aggregated logs of all containers (AM and TezChild). |
| yarn node -list | Useful for confirming aux-service mapreduce_shuffle is up on each NM. |

Reading exercise

  1. Find every AM-lifecycle call: grep -n "registerApplicationMaster\|unregisterApplicationMaster" $(find tez-dag/src/main/java -name "*.java")
  2. What environment variables does the AM pass to each container? grep -rn "setupContainerEnvironment\|buildContainerEnvironment" tez-dag/src/main/java tez-api/src/main/java | head
  3. Read the launch path: cat $(find tez-dag/src/main/java -name "ContainerLauncherImpl.java") | head -200
  4. Verify the aux-service name is hard-coded: grep -rn "mapreduce_shuffle" tez-dag/src/main/java tez-api/src/main/java
  5. Which classes assemble ATS entities? find tez-plugins -name "*.java" | xargs grep -l "TimelineEntity" | head -3
  6. Locate serviceInit and list every service added to the composite: cat $(find tez-dag/src/main/java -name "DAGAppMaster.java") | head -300

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| AM exits with InvalidApplicationMasterRequestException | AM tried to register twice or after un-register; usually a re-init bug. |
| Auxiliary service mapreduce_shuffle not configured | yarn.nodemanager.aux-services in yarn-site.xml does not list mapreduce_shuffle. |
| ConnectionRefused from Fetcher | NodeManager aux-service crashed or wrong shuffle port. |
| AM dies "RM expired" | AMRM heartbeat thread blocked or paused for GC > expiry interval. |
| ATS empty for completed app | tez.history.logging.service.class mis-set, or ATS not running. |
| yarn logs returns "Logs not aggregated" | Application still running, container did not finish cleanly, or aggregation not enabled. |
| ClientToAMToken auth fail | Client and AM disagree on cluster security; check both have the same hadoop.security.authentication. |

Validation: prove you understand this

  1. Trace the exact call path from DAGAppMaster.serviceStart to AMRMClientAsync.registerApplicationMaster.
  2. List the contents of the ContainerLaunchContext.serviceData map that Tez populates, and explain who reads each entry.
  3. Explain why a long AM pause for a full GC can manifest as an RM expired shutdown, and which config controls the threshold.
  4. For an app with yarn.log-aggregation-enable=false and a TezChild that crashed, give the exact filesystem path on the NM where its stderr lives. Use the configured yarn.nodemanager.log-dirs as a variable.
  5. Name the three ATS plugin modules, and pick the right one for a Hadoop 3.x cluster targeting HBase-backed ATSv2.

Failure Handling

A Tez DAG fails for many reasons: a corrupted input split, a flaky NM, an OOM in the user processor, a Kerberos token expiry, an RM connectivity blip, an AM crash. Tez has a layered escalation model: small failures are absorbed, big ones propagate, and the AM persists enough state to recover from its own death.

This chapter walks the escalation, the failure taxonomy, and the recovery machinery.


Escalation: attempt → task → vertex → DAG

flowchart TD
  TA[TaskAttempt fails] -->|retry budget| TA2[New TaskAttempt]
  TA -->|exhausted| T[Task fails]
  T -->|failure policy| V[Vertex fails]
  V -->|fail-on-vertex-failure| D[DAG fails]

Default behavior:

| Layer | Configuration | Default | Effect when exceeded |
|---|---|---|---|
| TaskAttempt | tez.am.task.max.failed.attempts | 4 | Mark Task as failed |
| Task | tez.am.vertex.max.task.failed.attempts (no direct knob; per-vertex policy) | varies | Vertex fails on first failed task by default |
| Vertex | per-DAG failure policy | fail-fast | DAG fails |

grep -n "MAX_FAILED_ATTEMPTS\|MAX_TASK_ATTEMPTS\|TEZ_AM_TASK_MAX" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

TaskAttemptTerminationCause

find tez-dag/src/main/java -name "TaskAttemptTerminationCause.java"
cat $(find tez-dag/src/main/java -name "TaskAttemptTerminationCause.java")

The enum names every reason a TaskAttempt can end up non-SUCCEEDED. A selection:

| Cause | Source | Retryable? |
|---|---|---|
| TERMINATED_BY_CLIENT | user-initiated DAG kill | no |
| INTERNAL_PREEMPTION | scheduler preempted to make room | yes |
| EXTERNAL_PREEMPTION | YARN preempted the container | yes |
| CONTAINER_LAUNCH_FAILED | NM rejected the launch | retried on a new container |
| CONTAINER_EXITED | TezChild exited without a status update | yes |
| CONTAINER_STOPPED | AM stopped the container intentionally | depends |
| NODE_FAILED | NM died | yes, on a different node |
| NODE_BLACKLISTED | node accumulated too many failures | retried elsewhere |
| OUTPUT_LOST | downstream reported missing output | yes, re-run source TA |
| INPUT_READ_ERROR | TA failed reading shuffle from a source | yes |
| APPLICATION_ERROR | uncaught exception in user code | usually no, but retried up to attempt budget |
| FRAMEWORK_ERROR | uncaught exception in Tez code | sometimes no |
| OTHER_TASK_ATTEMPT_KILLED_DUPLICATE | speculative duplicate lost | no (not a failure) |

This enum is the most important debugging signal — every failed attempt in ATS / AM log surfaces a cause from this list.


TaskAttempt failure retries

grep -n "max.failed.attempts\|maxFailedAttempts\|attemptFailed" \
  $(find tez-dag/src/main/java -name "TaskImpl.java")

On a TA failure:

  1. TaskAttemptImpl transitions to FAILED (or KILLED if the cause is in the "killed" subset).
  2. TaskImpl increments its failed-attempt counter.
  3. If counter < tez.am.task.max.failed.attempts, TaskImpl schedules a new TaskAttemptImpl (incremented attempt index).
  4. Otherwise TaskImpl transitions to FAILED and reports up to VertexImpl.

Some causes are not counted against the budget (e.g. OUTPUT_LOST, NODE_FAILED) — these are infrastructure failures, not user-code failures.
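
The retry decision described above can be sketched in plain Java. This is a simplified model, not the TaskImpl code: the class name RetryBudget, the Decision enum, and the exact set of uncounted causes are assumptions made for illustration (the real predicate lives in the Tez sources that the grep below points at).

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch of the per-task retry decision; a simplified model, not TaskImpl. */
final class RetryBudget {
  /** A few causes from the table above; the subset is illustrative. */
  enum Cause { APPLICATION_ERROR, FRAMEWORK_ERROR, CONTAINER_EXITED, OUTPUT_LOST, NODE_FAILED }

  enum Decision { RETRY, FAIL_TASK }

  /** Assumption: these causes are infrastructure failures and are not counted. */
  static final Set<Cause> NOT_COUNTED = EnumSet.of(Cause.OUTPUT_LOST, Cause.NODE_FAILED);

  private final int maxFailedAttempts;   // plays the role of tez.am.task.max.failed.attempts
  private int failedAttempts;

  RetryBudget(int maxFailedAttempts) {
    this.maxFailedAttempts = maxFailedAttempts;
  }

  /** Records one ended attempt and reports whether the task gets another attempt. */
  Decision onAttemptFailed(Cause cause) {
    if (!NOT_COUNTED.contains(cause)) {
      failedAttempts++;
    }
    return failedAttempts < maxFailedAttempts ? Decision.RETRY : Decision.FAIL_TASK;
  }

  int failedAttempts() {
    return failedAttempts;
  }

  public static void main(String[] args) {
    RetryBudget budget = new RetryBudget(4);
    System.out.println(budget.onAttemptFailed(Cause.NODE_FAILED));   // not counted against the budget
    for (int i = 0; i < 4; i++) {
      System.out.println(budget.onAttemptFailed(Cause.APPLICATION_ERROR));
    }
  }
}
```

With a budget of 4, the first three counted failures still yield RETRY and the fourth yields FAIL_TASK, which matches the "counter < max" comparison in step 3.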

grep -n "isFatalFailure\|isExternalError\|countAsFailure" \
  $(find tez-dag/src/main/java -name "*.java" | head -50)

Node blacklisting

grep -rn "NodeTracker\|blacklist\|BLACKLISTED" \
  tez-dag/src/main/java/org/apache/tez/dag/app | head

Per-node failure accounting:

| Trigger | Effect |
|---|---|
| N task attempts fail on the same node within a window | Add node to blacklist for this app |
| NodeReport from RM says UNHEALTHY | Add node to blacklist immediately |

| Key | Meaning |
|---|---|
| tez.am.maxtaskfailures.per.node | Per-node failure threshold (default 10) |
| tez.am.node-blacklisting.enabled | Master toggle |
| tez.am.node-blacklisting.ignore-threshold-node-percent | Don't blacklist if it would remove more than N% of the cluster |

A blacklisted node:

  • No new container requests go to it.
  • Held containers on it are released.
  • Existing attempts already running on it are allowed to finish (not preemptively killed).
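
A minimal sketch of the per-node accounting, assuming a simple lifetime counter per node (the time window is omitted) and assuming that blacklisting is ignored as a whole once the blacklisted share of the cluster exceeds the threshold. NodeBlacklist and its methods are hypothetical names for illustration, not Tez classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of per-node failure accounting with an ignore threshold; not Tez's implementation. */
final class NodeBlacklist {
  private final int maxFailuresPerNode;      // plays the role of tez.am.maxtaskfailures.per.node
  private final int ignoreThresholdPercent;  // plays the role of ...ignore-threshold-node-percent
  private final int clusterSize;
  private final Map<String, Integer> failures = new HashMap<>();
  private final Set<String> blacklisted = new HashSet<>();

  NodeBlacklist(int maxFailuresPerNode, int ignoreThresholdPercent, int clusterSize) {
    this.maxFailuresPerNode = maxFailuresPerNode;
    this.ignoreThresholdPercent = ignoreThresholdPercent;
    this.clusterSize = clusterSize;
  }

  /** Counts one failed attempt on a node; returns true if the node is effectively blacklisted now. */
  boolean onAttemptFailed(String node) {
    int count = failures.merge(node, 1, Integer::sum);
    if (count >= maxFailuresPerNode) {
      blacklisted.add(node);
    }
    return isBlacklisted(node);
  }

  /** True once the blacklisted share of the cluster exceeds the threshold percentage. */
  boolean blacklistingIgnored() {
    return blacklisted.size() * 100 > ignoreThresholdPercent * clusterSize;
  }

  boolean isBlacklisted(String node) {
    return blacklisted.contains(node) && !blacklistingIgnored();
  }

  public static void main(String[] args) {
    NodeBlacklist tracker = new NodeBlacklist(3, 33, 8);   // example values only
    for (int i = 1; i <= 3; i++) {
      System.out.println("failure " + i + " on nm1 -> blacklisted=" + tracker.onAttemptFailed("nm1"));
    }
  }
}
```

The threshold check is what keeps a small or unhealthy cluster schedulable: without it, enough failures could blacklist every node.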

Output loss

A common late-stage failure: a downstream task reads from a shuffle source and finds the source's output is gone (the NM died, the disk was wiped, etc).

grep -rn "OUTPUT_LOST\|reportSourceTaskAttemptFailed\|inputFailedEvent" \
  tez-runtime-library/src/main/java tez-dag/src/main/java | head

Flow:

  1. Destination TA's Fetcher fails permanently on source S.a.0.
  2. Destination TA sends InputReadErrorEvent via umbilical heartbeat.
  3. AM's VertexImpl receives the event, marks S.a.0 as OUTPUT_LOST.
  4. TaskImpl for S.a schedules a new attempt S.a.1.
  5. New attempt re-runs, produces fresh outputs, downstream resumes.

This is the cascading-rerun engine — and a source of pathological behavior when a single bad disk poisons many downstream tasks.
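
The flow above can be modeled in a few lines of Java. The sketch assumes that only the first loss report for a given source attempt schedules a re-run and that later reports for the same attempt are ignored; that de-duplication rule, along with the SourceTaskModel class and its methods, is an illustrative assumption rather than a description of the Tez state machine.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of source-side bookkeeping for lost output; a simplified model, not TaskImpl. */
final class SourceTaskModel {
  enum AttemptState { RUNNING, SUCCEEDED, OUTPUT_LOST }

  private final List<AttemptState> attempts = new ArrayList<>();

  /** Starts a new attempt and returns its index (0, 1, 2, ...). */
  int startAttempt() {
    attempts.add(AttemptState.RUNNING);
    return attempts.size() - 1;
  }

  void succeed(int attempt) {
    attempts.set(attempt, AttemptState.SUCCEEDED);
  }

  AttemptState state(int attempt) {
    return attempts.get(attempt);
  }

  /**
   * A consumer reports that it cannot fetch this attempt's output.
   * Returns the index of the re-run attempt, or -1 if the attempt is not in SUCCEEDED
   * (for example because an earlier report already marked it lost).
   */
  int onInputReadError(int attempt) {
    if (attempts.get(attempt) != AttemptState.SUCCEEDED) {
      return -1;
    }
    attempts.set(attempt, AttemptState.OUTPUT_LOST);
    return startAttempt();
  }

  public static void main(String[] args) {
    SourceTaskModel task = new SourceTaskModel();   // stands in for source task S.a
    int first = task.startAttempt();                // S.a.0
    task.succeed(first);
    System.out.println("first report  -> re-run as attempt " + task.onInputReadError(first));
    System.out.println("second report -> " + task.onInputReadError(first));
  }
}
```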


AM failover

find tez-dag/src/main/java -name "RecoveryService.java"
find tez-dag/src/main/java -name "RecoveryEventHandler.java"
wc -l $(find tez-dag/src/main/java -name "RecoveryService.java")

YARN keeps a small budget of AM restarts (yarn.resourcemanager.am.max-attempts, default 2). When the AM crashes:

  1. RM allocates a fresh AM container, attempt index incremented.
  2. New AM boots, sees attempt > 1, enters recovery mode.
  3. RecoveryService reads the recovery log from HDFS (written by attempt 1).
  4. Replays events to reconstruct DAG, Vertex, Task, TaskAttempt state.
  5. Inherits any pre-existing containers via RegisterApplicationMasterResponse.getContainersFromPreviousAttempts().
  6. Resumes scheduling from the last consistent state.
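
Step 4, replaying the log, amounts to folding an ordered event stream into the last known state per entity. The sketch below shows that idea with a deliberately tiny, hypothetical event model (two event kinds, string entity IDs); the real recovery events carry far more data and the real replay logic lives in the DAG, Vertex, Task, and TaskAttempt state machines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of replaying an append-only event log; the event model is hypothetical. */
final class RecoveryReplay {
  enum Kind { STARTED, FINISHED }

  static final class Event {
    final String entityId;
    final Kind kind;

    Event(String entityId, Kind kind) {
      this.entityId = entityId;
      this.kind = kind;
    }
  }

  /** Folds the log, in write order, into the last recorded state per entity. */
  static Map<String, Kind> replay(List<Event> log) {
    Map<String, Kind> state = new LinkedHashMap<>();
    for (Event e : log) {
      state.put(e.entityId, e.kind);
    }
    return state;
  }

  /** Entities with a start but no finish were in flight when the previous AM attempt died. */
  static List<String> inFlight(Map<String, Kind> state) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Kind> e : state.entrySet()) {
      if (e.getValue() == Kind.STARTED) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Event> log = Arrays.asList(
        new Event("task_0", Kind.STARTED),
        new Event("task_0", Kind.FINISHED),
        new Event("task_1", Kind.STARTED));   // its finish was never flushed
    System.out.println(inFlight(replay(log)));
  }
}
```

Anything reported as in flight is work the new AM attempt has to either re-attach to or schedule again.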

RecoveryService

grep -n "writeEvent\|flush\|recover\|RecoveryEventType" \
  $(find tez-dag/src/main/java -name "RecoveryService.java" \
                                -o -name "RecoveryEventHandler.java")

Append-only event log on HDFS, one file per app attempt:

hdfs:///tmp/staging/<user>/.staging/application_<id>/appattempt_<id>_NNNNNN/
  recovery/
    summary.dag_1.recovery
    dag_1.recovery

Event kinds:

| Event | When written |
|---|---|
| DAGSubmittedEvent | DAG arrives at AM |
| DAGInitializedEvent | DAG state machine reaches INITED |
| DAGStartedEvent | DAG reaches RUNNING |
| DAGFinishedEvent | DAG terminal state |
| VertexInitializedEvent, VertexStartedEvent, VertexFinishedEvent | mirror state transitions |
| TaskStartedEvent, TaskFinishedEvent | per task |
| TaskAttemptStartedEvent, TaskAttemptFinishedEvent | per attempt |
| VertexConfigurationDoneEvent | parallelism finalized |

find tez-dag/src/main/java -name "*Event*.java" -path "*recovery*"

Configuration

| Key | Default | Effect |
|---|---|---|
| tez.dag.recovery.enabled | true | Master toggle. |
| tez.dag.recovery.flush.interval.secs | 30 | Periodic fsync of the recovery log. |
| tez.dag.recovery.io.buffer.size | 8192 | Buffer for the writer. |
| yarn.resourcemanager.am.max-attempts (YARN) | 2 | Caps recovery attempts. |

What recovery can and cannot recover

| Can | Cannot |
|---|---|
| DAG / Vertex / Task / TA state at last flush | In-flight events lost since last flush |
| Counter snapshots written to recovery log | Real-time counter updates between flushes |
| Container assignments | NM-side container state; those are rediscovered via getContainersFromPreviousAttempts |
| User payload of DAGPlan | User in-memory state inside a custom VertexManagerPlugin |

A VertexManagerPlugin that holds in-memory state across events must override getState() / setState() to participate in recovery — otherwise it starts fresh on AM attempt 2.


DAG-level termination causes

find tez-dag/src/main/java -name "DAGTerminationCause.java"
cat $(find tez-dag/src/main/java -name "DAGTerminationCause.java")

| Cause | Trigger |
|---|---|
| DAG_KILL | client called dagClient.tryKillDAG() |
| VERTEX_FAILURE | a vertex transitioned to FAILED |
| INIT_FAILURE | DAG init failed (bad plan, bad input) |
| INTERNAL_ERROR | unhandled exception inside AM |
| AM_USERCODE_FAILURE | user-supplied plugin threw |
| OUT_OF_TEZ_TASK_RESOURCES | scheduler could not satisfy resource requests |
| RECOVERY_FAILURE | replay couldn't reconstruct prior state |

Reading exercise

  1. Count terminal transitions: grep -n "transition\|FAILED\|KILLED" $(find tez-dag/src/main/java -name "TaskAttemptImpl.java") | head -40
  2. What triggers OUTPUT_LOST? grep -rn "OUTPUT_LOST" tez-dag/src/main/java tez-runtime-library/src/main/java
  3. Read the writer loop: cat $(find tez-dag/src/main/java -name "RecoveryService.java") | head -200
  4. List all recovery event classes: grep -n "RecoveryEvent" $(find tez-dag/src/main/java -name "*.java" | head -50)
  5. Size up the three termination-cause enums: wc -l $(find tez-dag/src/main/java -name "TaskAttemptTerminationCause.java" -o -name "DAGTerminationCause.java" -o -name "VertexTerminationCause.java")
  6. Where is blacklist enforcement implemented? grep -rn "node-blacklisting\|blacklistNode" tez-dag/src/main/java | head

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| OUTPUT_LOST cascade kills the DAG | One bad NM is poisoning downstream; blacklist or pin off it. |
| Recovery infinite-loops on attempt 2 | Corrupt recovery log; check fsync gating and tez.dag.recovery.flush.interval.secs. |
| INTERNAL_PREEMPTION repeatedly | Tez scheduler is preempting its own attempts; usually a higher-priority vertex starving lower; tune priorities. |
| All attempts of one task fail in < 1s | User code throws deterministically; cause is APPLICATION_ERROR. |
| DAG hangs forever after one task fails | Vertex failure policy is permissive (rare); look at the vertex transition. |
| NODE_BLACKLISTED removes 100% of cluster | ignore-threshold-node-percent not set; the DAG is now unschedulable. |
| AM crashes, attempt 2 boots, but tasks restart from scratch | Recovery disabled or HDFS staging dir not accessible to attempt 2. |

Validation: prove you understand this

  1. List five TaskAttemptTerminationCause values that do not count against the attempt budget. Cite where the predicate lives.
  2. Explain in two sentences how an OUTPUT_LOST on source S.a.0 triggers a re-run of S.a, not just S.a.0's downstream consumers.
  3. Identify the HDFS path pattern under which recovery events are written. Give the exact path components.
  4. Describe what happens to in-flight DataMovementEvents when the AM crashes mid-DAG and AM attempt 2 takes over.
  5. Given tez.am.maxtaskfailures.per.node=3 and an 8-node cluster, what is the smallest sequence of task failures that triggers blacklisting? Show the math.

Counters and Diagnostics

When a Tez DAG misbehaves, you have two primary signals: counters (numeric aggregates from every task) and diagnostics strings (free-text causes at every level of the hierarchy). This chapter is the operator's reference for both.


TezCounters

find tez-api/src/main/java -name "TezCounters.java"
wc -l $(find tez-api/src/main/java -name "TezCounters.java")
grep -n "class TezCounters\|addGroup\|getGroup\|findCounter" \
  $(find tez-api/src/main/java -name "TezCounters.java")

TezCounters is a typed map of (groupName) → CounterGroup → (counterName) → Counter. It is hash-cons style: identical strings share storage. Counters are long values with thread-safe increment.

find tez-api/src/main/java -name "TaskCounter.java"
cat $(find tez-api/src/main/java -name "TaskCounter.java")

Standard groups

| Group | Source class | What lives there |
|---|---|---|
| org.apache.tez.common.counters.TaskCounter | TaskCounter enum | Per-task framework metrics |
| org.apache.tez.common.counters.DAGCounter | DAGCounter enum | Per-DAG aggregate metrics |
| org.apache.tez.common.counters.FileSystemCounter | FileSystemCounter | Per-FS bytes-read/written |
| org.apache.hadoop.mapreduce.JobCounter | (legacy MR) | Compatibility shim |
| User-defined | <your class name> | App code |

Key TaskCounter values

grep -n "INPUT_RECORDS_PROCESSED\|OUTPUT_RECORDS\|SPILLED_RECORDS\|SHUFFLE_BYTES\|GC_TIME_MILLIS\|REDUCE_INPUT_GROUPS" \
  $(find tez-api/src/main/java -name "TaskCounter.java")

| Counter | Meaning |
|---|---|
| INPUT_RECORDS_PROCESSED | Records read from logical inputs |
| OUTPUT_RECORDS | Records written to logical outputs |
| OUTPUT_BYTES | Serialized bytes written, before compression |
| OUTPUT_BYTES_PHYSICAL | Bytes actually written to disk (after compression, if enabled) |
| SPILLED_RECORDS | Records spilled by sorter |
| NUM_SPILLS | Number of spill files created |
| MERGED_MAP_OUTPUTS | Source outputs merged on the consuming (shuffle) side |
| SHUFFLE_BYTES | Bytes fetched by shuffle |
| SHUFFLE_BYTES_TO_MEM, SHUFFLE_BYTES_TO_DISK | Fetcher allocation split |
| REDUCE_INPUT_GROUPS | Distinct keys seen by a KeyValuesReader |
| REDUCE_INPUT_RECORDS | Total values across all groups |
| GC_TIME_MILLIS | Sum of GC time during the task |
| CPU_MILLISECONDS | Process CPU time |
| COMMITTED_HEAP_BYTES | Heap size at task end |
| PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES | Process memory snapshot |

DAGCounter

find tez-api/src/main/java -name "DAGCounter.java"
cat $(find tez-api/src/main/java -name "DAGCounter.java")

| Counter | Meaning |
|---|---|
| NUM_SUCCEEDED_TASKS | Aggregated across all vertices |
| NUM_KILLED_TASKS | Speculative duplicates + user kills |
| NUM_FAILED_TASKS | TA failures (counts every failed attempt) |
| TOTAL_LAUNCHED_TASKS | Lifetime sum |
| OTHER_LOCAL_TASKS, RACK_LOCAL_TASKS, DATA_LOCAL_TASKS | Locality histogram |
| AM_CPU_MILLISECONDS, AM_GC_TIME_MILLIS | AM process counters |
| WALL_CLOCK_MILLIS | DAG submission → completion |

Aggregation: TA → task → vertex → DAG

flowchart TD
  TA[TaskAttempt counters] -->|flushed via heartbeat| T[Task counters]
  T -->|on TASK_SUCCEEDED| V[Vertex counters]
  V -->|on VERTEX_SUCCEEDED| D[DAG counters]

Mechanism:

  1. Each LogicalIOProcessorRuntimeTask accumulates counters in process.
  2. TaskReporter heartbeat carries a snapshot to the AM via TezTaskUmbilicalProtocol.statusUpdate.
  3. AM's TaskAttemptImpl stores the latest snapshot.
  4. On TASK_SUCCEEDED, the winning attempt's counters become the Task counters; other attempts are discarded.
  5. On VERTEX_SUCCEEDED, VertexImpl sums all task counters into the vertex counters.
  6. On DAG_SUCCEEDED, DAGImpl sums all vertex counters into DAG counters and includes AM_* and DAG_* self-counters.

grep -n "incrCounters\|aggregateCounters\|getCounters\|setCounters" \
  $(find tez-dag/src/main/java -name "TaskAttemptImpl.java" \
                                -o -name "TaskImpl.java" \
                                -o -name "VertexImpl.java" \
                                -o -name "DAGImpl.java") | head -30
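
The roll-up in steps 4 to 6 can be sketched with plain maps. This is only a model of the arithmetic: CounterRollup, taskCounters, and sum are hypothetical names, counters are flattened to a name-to-long map, and the real classes use TezCounters objects with groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of counter roll-up from attempts to task to vertex; plain maps stand in for TezCounters. */
final class CounterRollup {

  /** Task counters are the winning attempt's snapshot; other attempts are discarded. */
  static Map<String, Long> taskCounters(Map<Integer, Map<String, Long>> attemptSnapshots,
                                        int winningAttempt) {
    return attemptSnapshots.get(winningAttempt);
  }

  /** Sums counter values across children (tasks into a vertex, or vertices into a DAG). */
  static Map<String, Long> sum(List<Map<String, Long>> children) {
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> child : children) {
      for (Map.Entry<String, Long> e : child.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Task 0 ran twice; attempt 1 won, so attempt 0's partial count is dropped.
    Map<Integer, Map<String, Long>> task0Attempts = new HashMap<>();
    task0Attempts.put(0, Map.of("OUTPUT_RECORDS", 40L));
    task0Attempts.put(1, Map.of("OUTPUT_RECORDS", 100L));

    List<Map<String, Long>> tasks = new ArrayList<>();
    tasks.add(taskCounters(task0Attempts, 1));
    tasks.add(Map.of("OUTPUT_RECORDS", 50L));   // task 1, single attempt

    System.out.println("vertex counters: " + sum(tasks));
  }
}
```

The same sum function models the vertex-to-DAG step; only the children change.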

Counter limits (and how they kill DAGs)

grep -n "COUNTERS_MAX\|TEZ_COUNTERS_MAX\|countersMax" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

| Key | Default | Cap on |
|---|---|---|
| tez.counters.max | 1200 | Total counter count per TezCounters instance |
| tez.counters.max.groups | 500 | Group count |
| tez.counters.group-name.max | 256 | Length of a group name |
| tez.counters.counter-name.max | 64 | Length of a counter name |

Exceeding any limit throws LimitExceededException. This typically happens when:

  • An app creates a counter per unique key (e.g. per file path).
  • A user vertex manager creates per-task counters.
  • A DAG has very many vertices, each contributing many counters, and the DAG-level aggregate blows the cap.

The exception propagates up the heartbeat path and kills the DAG with INTERNAL_ERROR. Look for LimitExceededException in the AM log to confirm.
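
To make the failure mode concrete, here is a small Java model of a capped counter set. BoundedCounters and its nested LimitExceeded exception are hypothetical stand-ins written for this example; they imitate the behavior described above but are not the Tez Limits or LimitExceededException classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a counter set with a hard cap on distinct counters; not the Tez implementation. */
final class BoundedCounters {
  /** Hypothetical stand-in for the exception thrown when a limit is hit. */
  static final class LimitExceeded extends RuntimeException {
    LimitExceeded(String message) {
      super(message);
    }
  }

  private final int maxCounters;      // plays the role of tez.counters.max
  private final int maxNameLength;    // plays the role of tez.counters.counter-name.max
  private final Map<String, Long> counters = new HashMap<>();

  BoundedCounters(int maxCounters, int maxNameLength) {
    this.maxCounters = maxCounters;
    this.maxNameLength = maxNameLength;
  }

  /** Increments a counter, creating it if needed; creation is what can hit the cap. */
  void increment(String name, long delta) {
    if (name.length() > maxNameLength) {
      throw new LimitExceeded("Counter name too long: " + name.length() + " > " + maxNameLength);
    }
    if (!counters.containsKey(name) && counters.size() >= maxCounters) {
      throw new LimitExceeded("Too many counters: " + (counters.size() + 1) + " max=" + maxCounters);
    }
    counters.merge(name, delta, Long::sum);
  }

  int size() {
    return counters.size();
  }

  public static void main(String[] args) {
    BoundedCounters counters = new BoundedCounters(3, 64);
    try {
      // Anti-pattern from the list above: one counter per input file.
      for (int i = 0; i < 10; i++) {
        counters.increment("BYTES_READ_part-" + i, 1024);
      }
    } catch (LimitExceeded e) {
      System.out.println("failed after " + counters.size() + " counters: " + e.getMessage());
    }
  }
}
```

Incrementing an existing counter never trips the cap; only creating a new name does, which is why per-key counters are the usual culprit.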


Diagnostics strings

Every level (TA → Task → Vertex → DAG) has a List<String> of diagnostics.

| Level | Class | Populated by |
|---|---|---|
| TaskAttempt | TaskAttemptImpl | User exception stacks, framework errors, container exit reasons |
| Task | TaskImpl | Aggregate of failed attempt diagnostics + scheduling diagnostics |
| Vertex | VertexImpl | Aggregate of failed task diagnostics + vertex manager events |
| DAG | DAGImpl | Aggregate of failed vertex diagnostics + DAG-level events |

grep -n "addDiagnostic\|diagnostics\|getDiagnostics" \
  $(find tez-dag/src/main/java -name "TaskAttemptImpl.java" \
                                -o -name "TaskImpl.java" \
                                -o -name "VertexImpl.java" \
                                -o -name "DAGImpl.java") | head -40

When a DAG completes, DAGStatus.getDiagnostics() is the union of every diagnostic at every level. This is what tez-tool and the Tez UI display.


Where to find diagnostics

| Surface | Path | Notes |
|---|---|---|
| Client return value | DAGStatus.getDiagnostics() | Concatenated strings |
| AM log | syslog | Search for DIAG:, ERROR, the cause keyword |
| ATS | DAGFinishedEvent.diagnostics, VertexFinishedEvent.diagnostics, etc. | One field per entity |
| Tez UI | DAG / Vertex / Task page | Renders the same ATS fields |
| dag.dot (if dumped) | local file written by TezClient when enabled | Static plan only, no diagnostics |
| Counter dump from CLI | tez-tool dump-counters <appId> | Counter snapshots |

grep -rn "DIAG\|addDiagnosticInfo" tez-dag/src/main/java | head -20

Counters in the AM log

A typical successful-task log line:

TaskAttempt: [attempt_1_0_00_000000_0]
  TASK_ATTEMPT_FINISHED ...
  counters: Counters: 26
    org.apache.tez.common.counters.TaskCounter
      INPUT_RECORDS_PROCESSED=12345
      OUTPUT_RECORDS=12345
      OUTPUT_BYTES=4567890
      ...

grep -rn "Counters: " tez-dag/src/main/java | head

For diagnostic grepping, search the AM log for:

| Pattern | What it finds |
|---|---|
| DIAG: | Diagnostics appends |
| Counters: | Counter dumps |
| LimitExceededException | Counter limit hits |
| TaskAttemptTerminationCause | Failure causes |
| TERMINATED_BY_CLIENT | User-initiated kills |
| OUTPUT_LOST | Cascading reruns |

Custom counters

User code accesses counters via the IPO context:

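import java.util.*;
import org.apache.tez.common.counters.TezCounters;
import org.apache.tez.runtime.api.*;
public class MyProcessor extends AbstractLogicalIOProcessor {
  public MyProcessor(ProcessorContext context) { super(context); }  // constructor the framework invokes
  @Override public void initialize() {}                             // no setup needed for this example
  @Override public void handleEvents(List<Event> events) {}         // no events consumed
  @Override public void close() {}
  @Override
  public void run(Map<String, LogicalInput> inputs, Map<String, LogicalOutput> outputs) throws Exception {
    TezCounters counters = getContext().getCounters();
    counters.findCounter("MyApp", "ROWS_FILTERED").increment(1);    // group "MyApp", counter "ROWS_FILTERED"
  }
}

grep -rn "getContext().getCounters\|getCounters()" \
  tez-tests/src/main/java tez-examples/src/main/java | head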

Operational guidance:

  • Cap group/counter cardinality at compile time. Never use unbounded user input as a counter name.
  • One group per app; many counters per group.
  • Counter names are visible in ATS forever — treat them as a stable API.
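
One way to follow the first rule is to map unbounded input onto a fixed enum before it ever becomes a counter name. The sketch below is a hypothetical helper (CounterNames, FileKind, and the FILES_READ_ prefix are invented for illustration) showing the idea for file paths: however many distinct paths a job sees, it can only ever create four counters.

```java
import java.util.Locale;

/** Sketch: derive counter names from a fixed enum so cardinality is bounded at compile time. */
final class CounterNames {
  /** The complete, closed set of buckets; OTHER absorbs everything unexpected. */
  enum FileKind { ORC, PARQUET, TEXT, OTHER }

  static FileKind classify(String path) {
    String p = path.toLowerCase(Locale.ROOT);
    if (p.endsWith(".orc")) {
      return FileKind.ORC;
    }
    if (p.endsWith(".parquet")) {
      return FileKind.PARQUET;
    }
    if (p.endsWith(".txt") || p.endsWith(".csv")) {
      return FileKind.TEXT;
    }
    return FileKind.OTHER;
  }

  /** Counter name for a path; at most FileKind.values().length distinct results. */
  static String counterName(String path) {
    return "FILES_READ_" + classify(path).name();
  }

  public static void main(String[] args) {
    String[] paths = {"/warehouse/t/part-0.orc", "/warehouse/t/part-1.ORC", "/tmp/upload.bin"};
    for (String path : paths) {
      System.out.println(path + " -> " + counterName(path));
    }
  }
}
```

The resulting name would then be passed to findCounter in place of anything derived directly from user data.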

Reading exercise

  1. Read the enum: cat $(find tez-api/src/main/java -name "TaskCounter.java")
  2. Find every place the runtime increments counters: grep -n "incrCounter\|addCounters" $(find tez-runtime-library/src/main/java -name "*.java") | head -20
  3. Trace the kill path: grep -rn "LimitExceededException" tez-api/src/main/java tez-dag/src/main/java
  4. Look at tez-tools for counter-dump tooling: find tez-tools -type f -name "*.java" | head
  5. Count the call sites and build a mental model of "where diagnostics flow in": grep -rn "addDiagnosticInfo\|addDiagnostic" tez-dag/src/main/java | wc -l
  6. Open the Tez UI for a recent app, navigate DAG → Vertex → Task, and compare each level's counter view against what the AM log shows.

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| DAG fails with LimitExceededException | Too many counters; search the AM log for the limit that triggered. |
| Counters at DAG level don't sum to vertex counters | One vertex failed; its counters are excluded from the sum. |
| Counter group missing from ATS | Counter was never incremented (zero is not stored). |
| Diagnostics string truncated | ATS field length limit; check yarn.timeline-service.client.max-attempts and entity size. |
| INPUT_RECORDS_PROCESSED is zero but task succeeded | Input had zero rows, or a custom IPO does not increment the standard counter. |
| SHUFFLE_BYTES_TO_DISK >> SHUFFLE_BYTES_TO_MEM | Fetcher exhausted memory budget; tune tez.runtime.shuffle.memory.limit.percent. |
| Wall clock huge vs CPU millis | Task spent most time waiting (shuffle, GC, blocked); not CPU bound. |

Validation: prove you understand this

  1. Name the four standard counter groups and the class that defines each.
  2. Explain why two attempts of the same task can have different counter values, and what happens to the loser's counters.
  3. Calculate the smallest DAG that can hit tez.counters.max=1200, assuming the TaskCounter group contributes 26 counters per vertex on success.
  4. Trace the path of a single counter increment in user code through the classes that aggregate it up to the DAGStatus returned to the client.
  5. Given an AM log line DIAG: TaskAttempt attempt_1_0_05_000003_2 failed, cause=APPLICATION_ERROR, list the four levels where this diagnostic ultimately appears and the exact classes that store each copy.

Testing Framework

Tez ships three tiers of tests, each with a different cost/coverage tradeoff. Knowing which tier to use for a given change — and which patterns are considered idiomatic — is the difference between a patch that lands and one that sits in review.


Three tiers

| Tier | Module | Boots... | Run cost | Use for |
|---|---|---|---|---|
| Unit | each module's src/test/java | nothing real; pure mocks + dispatcher | seconds | State-machine transitions, parsers, helper classes |
| Mini-cluster | tez-tests/src/test/java | MiniTezCluster (MiniYARNCluster + Tez session) | seconds-to-minutes | End-to-end DAGs in a JVM |
| Full cluster | external | real YARN cluster | minutes | Release validation, perf tests |

find . -name "MiniTezCluster.java"
wc -l $(find . -name "MiniTezCluster.java")

Unit testing state machines

The dominant pattern for Tez unit tests is arrange-state, send-event, drain-dispatcher, assert. Reference: TestVertexImpl, TestTaskImpl, TestTaskAttemptImpl, TestDAGImpl.

find tez-dag/src/test/java -name "TestVertexImpl.java"
wc -l $(find tez-dag/src/test/java -name "TestVertexImpl.java")
grep -n "DrainDispatcher\|MockVertex\|MockDAG\|setupVertices\|dispatcher.await" \
  $(find tez-dag/src/test/java -name "TestVertexImpl.java") | head -20

Building blocks

| Class | Purpose |
|---|---|
| DrainDispatcher | Synchronous-ish event dispatcher; await() blocks until queue drains |
| MockVertex, MockDAG, MockTask, etc. | Lightweight stand-ins that satisfy Vertex etc. interfaces |
| MockClock | Controllable clock for time-dependent transitions |
| MockHistoryEventHandler | Captures recovery / ATS events for assertion |
| Mockito (mock, when, verify) | Mocks for collaborators (TaskSchedulerManager, etc.) |

Recipe

@Test
public void testVertexInitsAfterAllInputsReady() throws Exception {
  // 1. Arrange
  DrainDispatcher dispatcher = new DrainDispatcher();
  dispatcher.init(new Configuration());
  dispatcher.start();

  TaskSchedulerManager sched = mock(TaskSchedulerManager.class);
  DAG dag = mock(DAG.class);
  when(dag.getID()).thenReturn(TezDAGID.getInstance(appAttemptId, 1));

  VertexImpl v = new VertexImpl(vertexId, plan, name, conf,
      dispatcher.getEventHandler(),
      mock(TaskCommunicatorManagerInterface.class),
      mockClock, taskHeartbeatHandler, mockAppContext,
      VertexLocationHint.create(...), dispatcher,
      mockVertexManager, ...);

  // 2. Act
  dispatcher.getEventHandler().handle(
      new VertexEvent(vertexId, VertexEventType.V_INIT));
  dispatcher.await();

  // 3. Assert
  assertEquals(VertexState.INITED, v.getState());
  verify(sched, never()).taskAllocated(any(), any(), any());
}

Key idioms:

  • Never call Thread.sleep. Always dispatcher.await().
  • Never assume event ordering unless you've sent events sequentially through the same dispatcher.
  • Mock the AppContext aggressively. It's the god-object; mocking it lets each test isolate exactly the collaborators it cares about.
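
To see why dispatcher.await() removes the need for sleeps, it helps to look at how such a dispatcher can be built. The following is a JDK-only sketch of the idea, not the DrainDispatcher used by the Tez tests: MiniDrainDispatcher is a hypothetical class that counts pending events and lets await() block until that count reaches zero, including events that handlers post while running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Sketch of a drainable single-threaded event dispatcher; not the Tez/YARN DrainDispatcher. */
final class MiniDrainDispatcher<E> {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final Consumer<E> handler;
  private int pending;   // events queued or being handled; guarded by "this"

  MiniDrainDispatcher(Consumer<E> handler) {
    this.handler = handler;
  }

  /** Queues an event; it is handled asynchronously, in posting order, on the worker thread. */
  void handle(E event) {
    synchronized (this) {
      pending++;
    }
    worker.execute(() -> {
      try {
        handler.accept(event);
      } finally {
        synchronized (this) {
          pending--;
          notifyAll();
        }
      }
    });
  }

  /** Blocks until every queued event, including ones posted by handlers, has been handled. */
  synchronized void await() {
    try {
      while (pending > 0) {
        wait();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    }
  }

  void stop() {
    worker.shutdown();
  }

  public static void main(String[] args) {
    List<String> handled = new ArrayList<>();
    MiniDrainDispatcher<String> dispatcher = new MiniDrainDispatcher<>(handled::add);
    dispatcher.handle("V_INIT");
    dispatcher.handle("V_START");
    dispatcher.await();              // deterministic: no Thread.sleep needed
    System.out.println(handled);
    dispatcher.stop();
  }
}
```

Because a handler that posts a follow-up event increments the pending count before its own decrement, await() cannot return between the two, which is the property a state-machine test relies on.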

MiniTezCluster tests

find tez-tests/src/test/java -name "TestOrderedWordCount.java" \
                              -o -name "TestMRRJobsDAGApi.java" \
                              -o -name "TestExtServicesWithLocalMode.java" | head
wc -l $(find tez-tests/src/test/java -name "TestOrderedWordCount.java")

MiniTezCluster boots:

  • A MiniYARNCluster (in-process RM + N NMs).
  • A MiniDFSCluster (in-process NN + DNs) — optional.
  • A TezClient configured against the mini cluster.

grep -n "MiniTezCluster\|new MiniYARNCluster\|setup\|tearDown" \
  $(find tez-tests/src/test/java -name "TestOrderedWordCount.java")

Lifecycle

flowchart TD
  setUp[BeforeClass: setup] --> mini[Start MiniTezCluster]
  mini --> tez[Create TezClient]
  tez --> test1[Test: build DAG]
  test1 --> submit[submitDAG]
  submit --> wait[waitForCompletion]
  wait --> assert[Assert DAGStatus + counters]
  assert --> tear[AfterClass: tearDown]
  tear --> stop[Stop TezClient + cluster]

Common shape

@BeforeClass
public static void setup() throws Exception {
  conf = new Configuration();
  conf.setInt(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS, 100);
  miniTezCluster = new MiniTezCluster("name", 1, 1, 1);
  miniTezCluster.init(conf);
  miniTezCluster.start();
  TezConfiguration tezConf = new TezConfiguration(miniTezCluster.getConfig());
  tezClient = TezClient.create("test", tezConf);
  tezClient.start();
}

@AfterClass
public static void tearDown() throws Exception {
  tezClient.stop();
  miniTezCluster.stop();
}

@Test(timeout = 60_000)
public void testWordCount() throws Exception {
  DAG dag = buildWordCountDAG();
  DAGClient client = tezClient.submitDAG(dag);
  DAGStatus status = client.waitForCompletionWithStatusUpdates(EnumSet.of(StatusGetOpts.GET_COUNTERS));
  assertEquals(DAGStatus.State.SUCCEEDED, status.getState());
  assertEquals(EXPECTED_ROW_COUNT,
      status.getDAGCounters().findCounter(TaskCounter.OUTPUT_RECORDS).getValue());
}

@Test(timeout = ...) is mandatory

A mini-cluster test that hangs blocks the whole CI build. Every MiniTezCluster test has a JUnit timeout in the 60-300 second range.


Local mode for tests

Faster than MiniTezCluster: no YARN, no DFS, everything in-process.

grep -rn "TEZ_LOCAL_MODE\|setLocalMode\|tez.local.mode" \
  tez-tests/src/test/java tez-runtime-library/src/test/java | head

Used for:

  • Unit-style integration tests where YARN isn't relevant.
  • Examples / smoke tests in tez-examples.
  • Quick repro of runtime issues — see the local mode deep dive.

Patterns: do and don't

Do

grep -rn "DrainDispatcher\|await()" tez-dag/src/test/java | wc -l

  • Send all setup events synchronously, then call dispatcher.await().
  • Use MockClock and advance it explicitly.
  • Capture emitted events with a custom handler and assert on the collection.
  • Use @Before / @After to reset shared dispatcher and mocks.
  • Mock external collaborators (TaskScheduler, ContainerLauncher, NMClient); never instantiate the real ones in unit tests.
  • Bound parallelism in mini-cluster tests (numNodeManagers=1 is usually fine).
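
The "advance the clock explicitly" idiom can be shown without any Tez classes. In this sketch, ManualClock plays the role that MockClock plays in the Tez tests, and HeartbeatMonitor is a hypothetical time-dependent component invented for the example; the point is that the test, not the wall clock, decides when a timeout elapses.

```java
/** Sketch of a manually advanced clock and a component that depends on it; names are hypothetical. */
final class ManualClock {
  private long nowMillis;

  ManualClock(long startMillis) {
    this.nowMillis = startMillis;
  }

  long getTime() {
    return nowMillis;
  }

  /** Moves time forward; nothing else ever changes the reading. */
  void advance(long millis) {
    nowMillis += millis;
  }

  /** A time-dependent component: expires when no heartbeat arrived within the timeout. */
  static final class HeartbeatMonitor {
    private final ManualClock clock;
    private final long timeoutMillis;
    private long lastHeartbeat;

    HeartbeatMonitor(ManualClock clock, long timeoutMillis) {
      this.clock = clock;
      this.timeoutMillis = timeoutMillis;
      this.lastHeartbeat = clock.getTime();
    }

    void heartbeat() {
      lastHeartbeat = clock.getTime();
    }

    boolean isExpired() {
      return clock.getTime() - lastHeartbeat > timeoutMillis;
    }
  }

  public static void main(String[] args) {
    ManualClock clock = new ManualClock(0);
    HeartbeatMonitor monitor = new HeartbeatMonitor(clock, 1000);
    clock.advance(999);
    System.out.println("expired after 999 ms?  " + monitor.isExpired());
    clock.advance(2);
    System.out.println("expired after 1001 ms? " + monitor.isExpired());
  }
}
```

A test written this way runs in microseconds and gives the same answer on a loaded CI machine as on a laptop.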

Don't

| Anti-pattern | Why it bites |
|---|---|
| Thread.sleep(N) to wait for state | Flake city; transition time depends on machine load. |
| while (vertex.getState() != X) busy loop | Same flake, plus burns CPU. |
| Assume e1 happens before e2 when both posted async | Dispatcher orders by arrival, not posting. |
| Static state across tests | Tests run in some JVM order; static leaks corrupt later tests. |
| Real network calls in unit tests | Slow, flaky, often forbidden in CI sandboxes. |
| System.exit from tested code paths | Kills the JVM running the test runner. |

CI / build integration

cat pom.xml | head -100
grep -n "surefire\|failsafe" pom.xml
| Maven plugin | Runs | Default include patterns |
|---|---|---|
| maven-surefire-plugin | unit tests under src/test/java | Test*.java, *Test.java, *Tests.java, *TestCase.java |
| maven-failsafe-plugin | integration tests | IT*.java, *IT.java, *ITCase.java |

Tez puts MiniTezCluster tests under surefire as well (no separation), which is one reason mvn test is slow. Run a single test:

mvn test -pl tez-dag -Dtest=TestVertexImpl
mvn test -pl tez-tests -Dtest=TestOrderedWordCount#testWordCount

Test-only utilities

find . -path "*/src/main/java/*" -name "Test*.java" | head
find . -path "*/src/test/java/*" -name "Mock*.java" | head

Helpful classes (some live under src/main so they're reusable downstream):

| Class | Module | Purpose |
|---|---|---|
| MiniTezCluster | tez-tests | Bootstrap an in-process cluster |
| TezClientForTest | tez-api | Subclass exposing internals |
| MockDAG, MockVertex, MockTask | tez-dag test sources | Plain-old objects implementing state-machine interfaces |
| TestProcessor, TestInput, TestOutput | tez-tests | No-op IPOs for plan plumbing tests |
| DrainDispatcher | hadoop-yarn-common (depended upon) | Dispatcher with await() |

Reading exercise

  1. cat $(find tez-dag/src/test/java -name "TestVertexImpl.java") | head -150 — read the setup + first test.
  2. grep -n "@Test" $(find tez-dag/src/test/java -name "TestTaskImpl.java" -o -name "TestVertexImpl.java" -o -name "TestDAGImpl.java") | wc -l — get a sense of the test surface.
  3. cat $(find tez-tests/src/test/java -name "TestOrderedWordCount.java") | head -200 — see a real MiniTezCluster test.
  4. grep -rn "dispatcher.await\|DrainDispatcher" tez-dag/src/test/java | wc -l — confirm the pattern is universal.
  5. grep -rn "Thread.sleep" tez-dag/src/test/java | head — find any stragglers using the anti-pattern; understand why each one is there (usually waiting on real OS state, e.g. a port).
  6. mvn -pl tez-dag test -Dtest=TestVertexImpl -DfailIfNoTests=false — run one and read the output structure.

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| Test passes locally, flakes in CI | Thread.sleep waiting for transition; replace with dispatcher.await(). |
| MiniTezCluster test hangs forever | Missing @Test(timeout = …); AM never finishes due to test bug. |
| BindException in mini-cluster | Previous test didn't stop(); ports leaked. |
| State machine throws InvalidStateTransitionException in test | Test sent event in wrong state; check arrange step. |
| Mock returns null from getDAG() | Forgot to stub when(appContext.getCurrentDAG()). |
| OutOfMemoryError: Java heap space in surefire | Each test forking JVM holds too much; tune argLine=-Xmx1g in pom. |
| Test depends on counter being non-zero, but it's zero | Counter incremented in code path the mock skipped; verify the code under test actually ran. |

Validation: prove you understand this

  1. Outline the four-step recipe for a state-machine unit test, with the exact call to drain the dispatcher.
  2. Name three classes from tez-dag/src/test/java that implement the Mock* pattern and what each replaces.
  3. Explain why Thread.sleep is an anti-pattern in Tez tests and what the correct alternative is for time-dependent transitions.
  4. Given a hang in TestVertexImpl#testTaskKill, list the first three diagnostics you'd inspect (no debugger).
  5. Describe the difference between a MiniTezCluster test and a local-mode test, and give one scenario where each is the correct choice.

Hive-on-Tez Labs

Hive on Tez is the production context that has carried Tez through the last decade. Most large Hive deployments run on Tez. Understanding the Tez/Hive boundary is therefore not a niche skill: it is the production debugging skill for both projects.

These labs work from a SQL query down through Hive compilation, into a Tez DAG, into running tasks, and back out through failure attribution and remediation. They are deliberately hands-on; every step has commands to run against ~/tez-src and ~/hive-src.

Prerequisites

| Tool | Required version | Why |
|---|---|---|
| Apache Tez | 0.10.x | Matches the rest of this book |
| Apache Hive | 3.x or 4.x | Production-relevant; Hive 2 is end of life |
| Hadoop | 3.3.x | Tez and Hive both target this |
| JDK | 11 (Hive 4) or 8 (Hive 3) | Per project requirements |
| Local clones | ~/tez-src, ~/hive-src | All commands assume these paths |

If you only have one of Hive 3 vs Hive 4, the labs work either way — they call out the delta where it matters. Class paths used throughout these labs (the integration boundary):

org.apache.hadoop.hive.ql.exec.tez.TezTask                  — Hive's "execute on Tez" task
org.apache.hadoop.hive.ql.exec.tez.DagUtils                 — Builds Tez DAG from Hive plan
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager    — Pools Tez sessions
org.apache.hadoop.hive.ql.exec.tez.TezSessionState          — One Hive session = one Tez AM
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource          — Map-side record source
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource       — Reduce-side record source

Verify these exist in your tree:

find ~/hive-src -path "*ql/exec/tez/TezTask.java"
find ~/hive-src -path "*ql/exec/tez/DagUtils.java"
find ~/hive-src -path "*ql/exec/tez/TezSessionPoolManager.java"
find ~/hive-src -path "*ql/exec/tez/TezSessionState.java"
find ~/hive-src -path "*ql/exec/tez/MapRecordSource.java"
find ~/hive-src -path "*ql/exec/tez/ReduceRecordSource.java"

If any are missing, your Hive tree may be too old. Hive 3.1.x and 4.0.x both have all six.

The Tez/Hive Boundary, At a Glance

The boundary is one Hive class — TezTask — and a handful of supporting utilities. Above the boundary, Hive owns: SQL parsing, semantic analysis, logical plan, physical plan (MapWork/ReduceWork). Below the boundary, Tez owns: DAG execution, task scheduling, shuffle, recovery.

flowchart TD
  subgraph Hive
    A[SQL Query] --> B[Parser]
    B --> C[Semantic Analyzer]
    C --> D[Logical Plan]
    D --> E[Physical Plan<br/>MapWork / ReduceWork]
    E --> F[TezTask.execute]
    F --> G[DagUtils.createVertex<br/>DagUtils.createEdge]
    G --> H[DAG object]
  end
  subgraph Tez
    H --> I[TezClient.submitDAG]
    I --> J[DAGAppMaster<br/>tez-dag]
    J --> K[Vertex tasks<br/>tez-runtime-internals]
    K --> L[Shuffle I/O<br/>tez-runtime-library]
  end

That TezTask → DagUtils → DAG → submitDAG sequence is the entire integration surface. The six labs below walk it from the top (Lab H1) to the runtime (Lab H6).

Lab Index

| Lab | Goal | Output artifact |
|---|---|---|
| H1: SQL → DAG | Trace a SELECT...GROUP BY...ORDER BY from SQL to a labelled Tez DAG | DAG diagram |
| H2: Inspect DAG | Capture and inspect the DAG Hive submits | EXPLAIN output + .dot file |
| H3: Debug a query | Walk from a "Vertex failed" message to the actual exception | Failure narrative |
| H4: Bug attribution | Use stack-trace top frame to attribute to Hive, Tez runtime, Tez AM, or YARN | Decision tree applied |
| H5: Reproducing bugs | Build a minimum reproducer for a Hive-on-Tez bug | Repro tarball |
| H6: Diagnostics | Write a small diagnostic patch (log, counter, config) and attach to JIRA | Patch + JIRA |

Reading Order

H1 and H2 are foundational — do them in order. H3 and H4 are debugging skills that build on each other. H5 and H6 are the contributor-facing skills you need to file a useful Hive-on-Tez JIRA from a production observation.

If you are coming to this section from the Capstone, H4 and H5 are the most directly relevant.

Where the Real Work Happens

The Tez/Hive boundary is one of the most-asked-about areas on both project mailing lists. The labs are written so that, when you encounter a production issue, you can:

  1. Read the stack trace and attribute it (H4).
  2. Locate the SQL that produced the DAG (H1).
  3. Capture the DAG and find the relevant vertex (H2).
  4. Identify the failing task and its log (H3).
  5. Reproduce it minimally on MiniTezCluster (H5).
  6. Attach a diagnostic patch to a JIRA to get more data from the reporter (H6).

That six-step routine, executed crisply, is what gets Hive-on-Tez JIRAs resolved.

Validation for the Section

You have absorbed the Hive-on-Tez section when, given a freshly-failing query in a production Hive-on-Tez deployment, you can:

  1. Within 10 minutes, identify which project owns the failure (Hive / Tez / YARN).
  2. Within 30 minutes, locate the relevant code on both sides of the boundary.
  3. Within 1 hour, capture the DAG and the failing task's log.
  4. Within a day, produce a minimum reproducer on MiniTezCluster.
  5. Within a week, file a JIRA on the right project with all the data needed.

That is the standard a Hive-on-Tez committer holds themselves to. The labs build the muscle.

Lab H1: SQL → DAG

Background

A user writes:

SELECT a, COUNT(*) FROM t GROUP BY a ORDER BY a;

Hive compiles this into a Tez DAG with three vertices and two edges. This lab walks the compilation path: parser → semantic analyzer → logical plan → physical plan (MapWork/ReduceWork) → TezTask → DagUtils.createVertex/createEdge → submitted DAG.

By the end you will have a labelled DAG diagram for this query and you will be able to trace any similar query from SQL to runtime topology.


Setup

cd ~/hive-src
git log --oneline -1                    # know the version you're on
find . -name "TezTask.java"             # boundary class
find . -name "DagUtils.java"            # DAG construction

A representative test table (use Hive CLI or beeline):

CREATE TABLE t (a INT, b STRING)
  STORED AS ORC;

INSERT INTO t VALUES (1,'x'),(1,'y'),(2,'z'),(3,'p'),(3,'q'),(3,'r');

The query under study:

SELECT a, COUNT(*) AS c
  FROM t
  GROUP BY a
  ORDER BY a;

Step 1: Parser (lexing, AST)

Hive uses ANTLR. The grammar lives in:

find ~/hive-src -name "HiveParser.g" -o -name "HiveLexer.g"

The parser produces an AST. From the CLI:

EXPLAIN AST SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

You will see a Lisp-style tree:

(TOK_QUERY
  (TOK_FROM (TOK_TABREF (TOK_TABNAME t)))
  (TOK_INSERT
    (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
    (TOK_SELECT
      (TOK_SELEXPR (TOK_TABLE_OR_COL a))
      (TOK_SELEXPR (TOK_FUNCTIONSTAR COUNT) c))
    (TOK_GROUPBY (TOK_TABLE_OR_COL a))
    (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL a)))))

The AST is the input to the next phase.


Step 2: Semantic Analyzer

The AST goes through SemanticAnalyzer:

find ~/hive-src -name "SemanticAnalyzer.java" | head

It resolves table references, expands *, type-checks aggregates, and produces a Query Block (QB) tree → Operator tree (logical plan).

EXPLAIN LOGICAL SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

You see operators like TS (TableScan), SEL (Select), GBY (GroupBy), RS (ReduceSink), FS (FileSink). Two GBY (partial and final aggregation) and two RS (one per shuffle edge) are typical for a GROUP BY ... ORDER BY.


Step 3: Physical Plan — MapWork, ReduceWork

The logical operator tree is converted to a physical plan whose top-level units are MapWork, ReduceWork, and MergeJoinWork. For our query, Hive produces three Work units:

| Work | Purpose | Operators inside |
|---|---|---|
| MapWork (Map 1) | Read t, partial aggregate by a | TS → SEL → GBY → RS |
| ReduceWork (Reducer 2) | Final aggregate by a, prepare for sort | GBY → RS |
| ReduceWork (Reducer 3) | Total-order sort by a, write output | SEL → FS |

Inspect the structures:

grep -rn "class MapWork" ~/hive-src/ql/src/java/
grep -rn "class ReduceWork" ~/hive-src/ql/src/java/

Get this from Hive directly:

EXPLAIN SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

Look for the Stage: Stage-1 / Tez block and the per-vertex sections (Map 1, Reducer 2, Reducer 3).


Step 4: TezTask — The Boundary

The Hive-side execution entry point for a Tez query:

grep -n "public int execute" $(find ~/hive-src -name TezTask.java)

TezTask.execute (its parameter list varies by Hive version; the grep above shows the one in your tree) does roughly:

  1. Acquire a TezSessionState (existing pooled session or new one) via TezSessionPoolManager.
  2. Build a DAG from MapWork/ReduceWork via DagUtils.
  3. Submit the DAG via the session's TezClient.submitDAG.
  4. Block on the DAGClient for completion.
  5. Surface counters and diagnostics.

The DAG-building call:

grep -n "DagUtils\|dagUtils\.create" $(find ~/hive-src -name TezTask.java)

You will see the DagUtils calls that assemble the DAG; the TezTask method that wraps them is named differently across Hive versions.


Step 5: DagUtils.createVertex / createEdge

The mapping from Hive Work units to Tez Vertex happens here:

find ~/hive-src -name "DagUtils.java"
grep -n "createVertex\|public Vertex " $(find ~/hive-src -name DagUtils.java)
grep -n "createEdge\|public Edge "     $(find ~/hive-src -name DagUtils.java)

For our query, DagUtils produces:

| Hive Work | Tez Vertex | Processor descriptor |
|---|---|---|
| MapWork "Map 1" | Vertex "Map 1" | MapTezProcessor |
| ReduceWork "Reducer 2" | Vertex "Reducer 2" | ReduceTezProcessor |
| ReduceWork "Reducer 3" | Vertex "Reducer 3" | ReduceTezProcessor |

And two edges:

| From | To | EdgeProperty kind |
|---|---|---|
| Map 1 | Reducer 2 | SCATTER_GATHER (shuffle) |
| Reducer 2 | Reducer 3 | SCATTER_GATHER (with a 1-task sink for total order) |

The "1-task sink for total order" is how Hive forces a single reducer for ORDER BY (no LIMIT): Reducer 3 has parallelism 1.
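
To make the three-vertex, two-edge shape concrete, here is a minimal plain-Java sketch. DagSketch, its nested Vertex and Edge types, and the RUNTIME_DECIDED marker are hypothetical names invented for illustration; they are not Tez or Hive classes, and the DOT text it prints is hand-written, not the file Tez emits.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the DAG shape. None of these types are Tez or Hive
// classes; they exist only to make the topology concrete.
final class DagSketch {
  static final int RUNTIME_DECIDED = -1; // parallelism chosen from splits or reducer tuning

  static final class Vertex {
    final String name;
    final String processor;
    final int parallelism;

    Vertex(String name, String processor, int parallelism) {
      this.name = name;
      this.processor = processor;
      this.parallelism = parallelism;
    }
  }

  static final class Edge {
    final Vertex from;
    final Vertex to;
    final String kind;

    Edge(Vertex from, Vertex to, String kind) {
      this.from = from;
      this.to = to;
      this.kind = kind;
    }
  }

  final List<Vertex> vertices = new ArrayList<>();
  final List<Edge> edges = new ArrayList<>();

  Vertex addVertex(String name, String processor, int parallelism) {
    Vertex v = new Vertex(name, processor, parallelism);
    vertices.add(v);
    return v;
  }

  void addEdge(Vertex from, Vertex to, String kind) {
    edges.add(new Edge(from, to, kind));
  }

  // The shape described above for GROUP BY ... ORDER BY.
  static DagSketch groupByOrderBy() {
    DagSketch dag = new DagSketch();
    Vertex m1 = dag.addVertex("Map 1", "MapTezProcessor", RUNTIME_DECIDED);
    Vertex r2 = dag.addVertex("Reducer 2", "ReduceTezProcessor", RUNTIME_DECIDED);
    Vertex r3 = dag.addVertex("Reducer 3", "ReduceTezProcessor", 1); // single task for total order
    dag.addEdge(m1, r2, "SCATTER_GATHER");
    dag.addEdge(r2, r3, "SCATTER_GATHER");
    return dag;
  }

  // Renders the model as Graphviz DOT text.
  String toDot() {
    StringBuilder sb = new StringBuilder("digraph dag {\n");
    for (Vertex v : vertices) {
      sb.append("  \"").append(v.name).append("\" [label=\"")
          .append(v.name).append("\\n").append(v.processor).append("\"];\n");
    }
    for (Edge e : edges) {
      sb.append("  \"").append(e.from.name).append("\" -> \"")
          .append(e.to.name).append("\" [label=\"").append(e.kind).append("\"];\n");
    }
    return sb.append("}\n").toString();
  }

  public static void main(String[] args) {
    System.out.print(groupByOrderBy().toDot());
  }
}
```

Comparing this hand-built shape with what Hive actually submits is a quick way to spot a build that plans the query differently.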


Step 6: The Submitted DAG

Once the DAG is built, TezTask submits it via the session:

grep -n "submitDAG" $(find ~/hive-src -name TezTask.java)
grep -n "submitDAG" $(find ~/hive-src -name TezSessionState.java)

The call lands on TezClient.submitDAG(DAG dag) in tez-api:

grep -n "public DAGClient submitDAG" \
  $(find ~/tez-src/tez-api/src/main/java -name TezClient.java)

From there, Reading the Codebase Step 2's worked exercise picks up.


Step 7: Validation — Labelled DAG Diagram

Build this diagram for our query and save it.

flowchart TD
  M1["Map 1<br/>processor: MapTezProcessor<br/>operators: TS → SEL → GBY → RS<br/>parallelism: numSplits(t)"]
  R2["Reducer 2<br/>processor: ReduceTezProcessor<br/>operators: GBY → RS<br/>parallelism: hive.exec.reducers.* tuning"]
  R3["Reducer 3<br/>processor: ReduceTezProcessor<br/>operators: SEL → FS<br/>parallelism: 1 (ORDER BY)"]
  M1 -->|"SCATTER_GATHER<br/>partition on a"| R2
  R2 -->|"SCATTER_GATHER<br/>partition on sort key"| R3

Capture this as your validation artifact (~/tez-notes/hive-h1-dag.md).


Step 8: Print the DAG via Hive

Hive has a setting to print a runtime summary of the executed DAG:

SET hive.exec.print.summary=true;
SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

The summary, printed after the query, lists each vertex, its task count, and counters. Confirm the topology matches the diagram. (If you see four vertices, you may be on a build that splits ORDER BY differently; record the actual topology.)

For more detail, tez.am.dag.dot.file.location writes a .dot file — used in Lab H2.


Step 9: Counter Pop Quiz

After the query runs (with hive.exec.print.summary=true), find:

| Counter | Where it lives | What it measures |
|---|---|---|
| INPUT_RECORDS_PROCESSED | Map 1 | Rows read from t |
| OUTPUT_RECORDS | Map 1 | Records emitted to shuffle (post partial-aggregate) |
| REDUCE_INPUT_GROUPS | Reducer 2 | Distinct a values seen |
| OUTPUT_RECORDS | Reducer 2 | Records to Reducer 3 |
| OUTPUT_RECORDS | Reducer 3 | Final result row count |

For our 6-row input with 3 distinct values of a:

| Counter | Expected |
|---|---|
| Map 1 INPUT_RECORDS_PROCESSED | 6 |
| Map 1 OUTPUT_RECORDS | 3 (after partial GBY) |
| Reducer 2 REDUCE_INPUT_GROUPS | 3 |
| Reducer 2 OUTPUT_RECORDS | 3 |
| Reducer 3 OUTPUT_RECORDS | 3 |

Verify against your actual run.
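
The expected values follow from simple arithmetic, sketched below. CounterSketch is a hypothetical helper, and it assumes a single map task whose partial aggregation emits exactly one record per distinct key; with several map tasks, or if the map-side hash aggregation flushes early, Map 1 OUTPUT_RECORDS can be higher than the number of distinct keys.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Worked arithmetic for the expected counters. Assumes ONE map task whose
// partial aggregation emits exactly one record per distinct key.
final class CounterSketch {
  // Returns {Map 1 input, Map 1 output, Reducer 2 input groups,
  //          Reducer 2 output, Reducer 3 output}.
  static long[] expected(int[] aColumn) {
    Map<Integer, Long> partial = new TreeMap<>();
    for (int a : aColumn) {
      partial.merge(a, 1L, Long::sum); // map-side partial COUNT(*) per key
    }
    long mapInput = aColumn.length;        // every row of t is read once
    long mapOutput = partial.size();       // one shuffle record per distinct key
    long reduceGroups = partial.size();    // Reducer 2 sees each key as one group
    long reducer2Output = reduceGroups;    // one final (a, count) row per group
    long reducer3Output = reducer2Output;  // the sort reorders rows and drops none
    return new long[] {mapInput, mapOutput, reduceGroups, reducer2Output, reducer3Output};
  }

  public static void main(String[] args) {
    int[] a = {1, 1, 2, 3, 3, 3}; // column a of the six inserted rows
    System.out.println(Arrays.toString(expected(a)));
  }
}
```

If your run disagrees with these numbers, check the task count of Map 1 first: more than one task is the usual reason Map 1 OUTPUT_RECORDS exceeds 3.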


Validation Artifacts

  1. The labelled mermaid DAG diagram saved at ~/tez-notes/hive-h1-dag.md.
  2. The EXPLAIN AST, EXPLAIN LOGICAL, and EXPLAIN outputs saved.
  3. The hive.exec.print.summary output for the actual run.
  4. The counter table above, with your actual numbers filled in.
  5. The grep results for createVertex and createEdge in DagUtils.java saved as ~/tez-notes/hive-h1-dagutils.txt.

You can now trace any Hive query through compilation to a Tez DAG. The next lab — Lab H2: Inspect the DAG — adds the production-grade techniques for capturing and inspecting that DAG at runtime.

Lab H2: Inspecting the Hive-Emitted DAG

Background

Lab H1 traced compilation to derive the DAG by reading code. In production, you can't always re-derive it; you need to capture the DAG Hive submitted to Tez. This lab covers five production-grade ways to do that:

  1. EXPLAIN FORMATTED from Hive.
  2. EXPLAIN VECTORIZATION DETAIL from Hive.
  3. TezTask logging at DEBUG level.
  4. The Tez UI (backed by YARN ATS or Tez SimpleHistoryLoggingService).
  5. The tez.am.dag.dot.file.location graphviz dump.

Plus the cross-cutting skill: mapping each Hive operator in the captured DAG to its Tez Input/Processor/Output (I/P/O).


Setup

# Hive CLI or beeline. Use the same table from H1:
CREATE TABLE IF NOT EXISTS t (a INT, b STRING) STORED AS ORC;
INSERT INTO t VALUES (1,'x'),(1,'y'),(2,'z'),(3,'p'),(3,'q'),(3,'r');

Verify Tez is the execution engine:

SET hive.execution.engine;        -- should be 'tez'

If not:

SET hive.execution.engine=tez;

Method 1: EXPLAIN FORMATTED

EXPLAIN FORMATTED emits a JSON-ish structure with operator details. Useful for programmatic parsing.

EXPLAIN FORMATTED
  SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

Snippet of the output (structure varies by Hive version):

{
  "STAGE DEPENDENCIES": {
    "Stage-1": {"ROOT STAGE": "TRUE"},
    "Stage-0": {"DEPENDENT STAGES": "Stage-1"}
  },
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "DagId:": "...",
        "Edges:": {
          "Reducer 2": [{"parent": "Map 1", "type": "SIMPLE_EDGE"}],
          "Reducer 3": [{"parent": "Reducer 2", "type": "SIMPLE_EDGE"}]
        },
        "Vertices:": {
          "Map 1": {
            "Map Operator Tree:": [...],
            "Execution mode:": "vectorized"
          },
          "Reducer 2": { ... },
          "Reducer 3": { ... }
        }
      }
    }
  }
}

Save it:

hive -e "EXPLAIN FORMATTED SELECT a, COUNT(*) FROM t GROUP BY a ORDER BY a;" \
  > ~/tez-notes/hive-h2-explain-formatted.json

What it tells you that EXPLAIN doesn't:

  • Edge types between vertices (SIMPLE_EDGE, BROADCAST_EDGE, CUSTOM_SIMPLE_EDGE, CUSTOM_EDGE).
  • Execution mode per vertex (vectorized, llap, neither).
  • The full operator tree per vertex, including row-schema annotations.

Method 2: EXPLAIN VECTORIZATION DETAIL

When a query runs slower than expected on Tez, vectorization is the first thing to check. EXPLAIN VECTORIZATION DETAIL shows per-operator whether vectorization succeeded and, if not, why.

EXPLAIN VECTORIZATION DETAIL
  SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

Look for per-vertex Execution mode: vectorized and per-operator Vectorized: true. If you see notVectorizedReason: <reason>, that's the diagnostic.

Common notVectorizedReason values:

| Reason | Cause |
|---|---|
| UDF X is not vectorized | Hive lacks a vectorized impl of a UDF you used |
| Reduce vectorization disabled | hive.vectorized.execution.reduce.enabled=false |
| MAP_JOIN with key types ... | Vectorized map-join doesn't support the key type combo |
| Column type X not supported | Vectorization doesn't handle the column type (DECIMAL precision, etc.) |

This explains a class of Hive-on-Tez perf surprises that are unrelated to Tez itself.


Method 3: TezTask Logging

Increase the log level on TezTask to capture the DAG it submitted:

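-- hive.root.logger is read when logging initialises, so pass it at CLI startup:
--   hive --hiveconf hive.root.logger=DEBUG,console
-- or, more targeted, from inside a session:
SET hive.log.explain.output=true;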

hive.log.explain.output=true writes the EXPLAIN to the Hive log on each query — useful in production where you can't get a CLI run but can grep the log.

grep -A100 "DAG description" /var/log/hive/hive-server2.log | head -200

For the most detail, set DEBUG specifically on the Tez integration:

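# in Hive's log4j configuration, not hive-site.xml (log4j 1.x syntax shown;
# Hive 2 and later use log4j2 via hive-log4j2.properties, where the syntax differs):
log4j.logger.org.apache.hadoop.hive.ql.exec.tez=DEBUG
log4j.logger.org.apache.tez.dag.api=DEBUG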

In DEBUG you see:

  • The serialised DAGPlan size at submit time.
  • Each Vertex's name, parallelism, processor descriptor class.
  • Each Edge's source, destination, data-source / data-movement / scheduling type.

Method 4: Tez UI

The Tez UI runs against YARN Timeline Service (ATS) or against the file-system SimpleHistoryLoggingService. When configured, every Tez DAG submitted by Hive (or anything else) is captured.

Capture is enabled via tez.history.logging.service.class:

grep "tez.history.logging.service.class" ~/tez-src/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

Once a DAG runs, browse to:

http://<atstimeline-host>:8188/applicationhistory/

or for the standalone Tez UI:

http://<tez-ui-host>:9999/tez-ui/

Click into a DAG to see:

  • Per-vertex stats (tasks, attempts, succeeded, failed, killed).
  • Edges with type and statistics (BYTES_TRANSFERRED).
  • A graphical DAG view.
  • Per-task and per-attempt logs.

For an offline cluster, the file-system logger writes JSON files under tez.simple.history.logging.dir. They can be loaded into the Tez UI later.


Method 5: tez.am.dag.dot.file.location

For visual inspection, Tez can write each DAG as a Graphviz .dot file:

SET tez.am.dag.dot.file.location=/tmp/tez-dags;
SELECT a, COUNT(*) AS c FROM t GROUP BY a ORDER BY a;

After the query:

ls /tmp/tez-dags/
# <app-id>_<dag-name>.dot

dot -Tpng /tmp/tez-dags/<file>.dot -o ~/tez-notes/hive-h2-dag.png

The .dot has the same nodes/edges as the Tez UI, in a portable format.

Caveat: the location is written from the AM, so on a real cluster it lands on the AM node, not the client. Configure the path to a shared filesystem or copy after the fact.


Mapping Hive Operators to Tez I/P/O

Now the cross-cutting skill: each Hive operator inside a Vertex maps to one of Tez's three runtime roles — Input, Processor, or Output. For our query:

Map 1 (vertex)

| Hive operator | Tez role | Tez class |
|---|---|---|
| TableScan | Input | MRInput (from tez-mapreduce) or HiveInputFormat adapter |
| Select | (inside Processor) | |
| GroupBy (partial) | (inside Processor) | |
| ReduceSink | Output | OrderedPartitionedKVOutput (from tez-runtime-library) |

The Processor itself: MapTezProcessor. Find it:

find ~/hive-src -name "MapTezProcessor.java"

Reducer 2 (vertex)

| Hive operator | Tez role | Tez class |
|---|---|---|
| (shuffle in) | Input | OrderedGroupedKVInput |
| GroupBy (final) | (inside Processor) | |
| ReduceSink | Output | OrderedPartitionedKVOutput |

Processor: ReduceTezProcessor. Find it:

find ~/hive-src -name "ReduceTezProcessor.java"

Reducer 3 (vertex)

| Hive operator | Tez role | Tez class |
|---|---|---|
| (shuffle in) | Input | OrderedGroupedKVInput |
| Select | (inside Processor) | |
| FileSink | Output | MROutput (from tez-mapreduce) |

Validation — A Side-by-Side Table

Build this for your captured DAG and save it:

| Vertex | Tasks | Inputs (class, source) | Processor | Outputs (class, dest) |
|---|---|---|---|---|
| Map 1 | (from EXPLAIN) | MRInput ← t ORC files | MapTezProcessor | OrderedPartitionedKVOutput → Reducer 2 |
| Reducer 2 | (from EXPLAIN) | OrderedGroupedKVInput ← Map 1 | ReduceTezProcessor | OrderedPartitionedKVOutput → Reducer 3 |
| Reducer 3 | 1 | OrderedGroupedKVInput ← Reducer 2 | ReduceTezProcessor | MROutput → query result location |

Save as ~/tez-notes/hive-h2-iop-mapping.md.


Worked Differences Across Methods

When the capture methods agree, you have ground truth. When they disagree:

| Disagreement | Likely cause |
|---|---|
| EXPLAIN FORMATTED shows N vertices, runtime UI shows N+1 | Dynamic vertex insertion (CBO, runtime statistics) |
| tez.am.dag.dot.file.location shows fewer edges than UI | Edges added by VertexManager at runtime (see Lab 4.2) |
| UI shows BROADCAST_EDGE, EXPLAIN says SIMPLE_EDGE | Hive's EXPLAIN is sometimes loose on edge type; trust the UI |
| Parallelism in UI differs from EXPLAIN's mapred.reduce.tasks | tez.shuffle.vertex.manager reconfigured parallelism at runtime |

Each disagreement is informative — it shows you which subsystem made the dynamic decision.


Production Diagnostic Routine

When asked "why is this query slow on Tez?":

  1. EXPLAIN FORMATTED to see the planned DAG.
  2. EXPLAIN VECTORIZATION DETAIL to spot non-vectorized operators.
  3. Run with hive.exec.print.summary=true to get the runtime summary.
  4. Open the Tez UI for the DAG, look at per-vertex and per-edge stats.
  5. Compare planned parallelism to actual (VertexManager may have changed it).
  6. Identify the bottleneck vertex by WALL_CLOCK_MILLIS or OUTPUT_RECORDS skew.

Most slowness is one of: vectorization failure, parallelism mismatch, data skew on a shuffle key, or AM overhead for a many-vertex DAG.
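
Step 6 of the routine can be made mechanical. The sketch below assumes you have already copied one number per task (for example OUTPUT_RECORDS or wall-clock milliseconds) out of the Tez UI; SkewSketch and the threshold passed to isSkewed are hypothetical choices, not Tez or Hive features. It compares the largest task with the median task, which is less sensitive to a single outlier than a comparison against the mean.

```java
import java.util.Arrays;

// Flags a vertex as skewed when its largest task dwarfs the median task.
// The caller supplies the threshold; any particular value is a judgment call.
final class SkewSketch {
  // Ratio of the maximum per-task value to the median per-task value.
  static double skewRatio(long[] perTaskValues) {
    if (perTaskValues.length == 0) {
      throw new IllegalArgumentException("no tasks");
    }
    long[] sorted = perTaskValues.clone();
    Arrays.sort(sorted);
    long max = sorted[sorted.length - 1];
    int mid = sorted.length / 2;
    double median = (sorted.length % 2 == 1)
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2.0;
    if (median == 0) {
      return max == 0 ? 1.0 : Double.POSITIVE_INFINITY;
    }
    return max / median;
  }

  static boolean isSkewed(long[] perTaskValues, double threshold) {
    return skewRatio(perTaskValues) > threshold;
  }

  public static void main(String[] args) {
    long[] outputRecords = {100, 110, 90, 105, 2000}; // made-up per-task values
    System.out.println("skew ratio = " + skewRatio(outputRecords));
  }
}
```

A ratio near 1 means the tasks are balanced; a large ratio on a reducer vertex usually points at data skew on the shuffle key rather than at Tez itself.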


Validation Artifacts

  1. The EXPLAIN FORMATTED JSON saved to ~/tez-notes/hive-h2-explain-formatted.json.
  2. The EXPLAIN VECTORIZATION DETAIL saved to ~/tez-notes/hive-h2-vec.txt.
  3. A .png rendered from the .dot saved to ~/tez-notes/hive-h2-dag.png.
  4. The Tez UI URL for the actual DAG run, bookmarked.
  5. The Hive-operator-to-Tez-I/P/O table above, filled in for your captured DAG.

Once you can capture and read the DAG with each of these methods, you are ready for failure analysis in Lab H3: Debug a Failed Query.

Lab H3: Debugging a Failed Query

Background

Production Hive-on-Tez failures usually surface as one line in the Hive console:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1718000000000_4321_1_00,
diagnostics=[Task failed, taskId=task_1718000000000_4321_1_00_000003,
diagnostics=[TaskAttempt 0 failed, info=[
Container container_e123_1718000000000_4321_01_000007 failed.
Exit code: 1
Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
... ]]]

That message is only the tip of the iceberg. The actual exception is buried 3–4 hops away. This lab is the operational walk from that tip to the root-cause stack trace, with a fabricated-but-realistic example.


The Failure Hop Sequence

flowchart TD
  H[Hive console error<br/>'Vertex failed, vertexName=Map 1']
  H --> A[AM log<br/>tez-dag log on the AM container]
  A --> T[TaskAttempt diagnostics<br/>which task, which container]
  T --> C[Container stderr / stdout log<br/>on the worker node]
  C --> E[Actual exception<br/>the root cause]
  E --> X[Attribute to Hive / Tez runtime / Tez AM / YARN]

Five hops. Most engineers can do hop 1 (read the console). Few can do hops 2–4 without guidance. This lab is the guidance.


Step 1: Parse the Console Message

Take the message above and extract the identifiers:

| Identifier | Value (in our example) | Use for |
|---|---|---|
| Application ID | application_1718000000000_4321 | YARN log retrieval |
| DAG ID | dag_1718000000000_4321_1 | Tez UI URL |
| Vertex ID | vertex_1718000000000_4321_1_00 | The failing vertex; here 00 ≈ Map 1 |
| Task ID | task_1718000000000_4321_1_00_000003 | Which task within the vertex |
| Attempt | 0 | First attempt failed |
| Container ID | container_e123_1718000000000_4321_01_000007 | Where the work was running |
| Exit code | 1 | Process died abnormally |

The format is consistent across all Hive-on-Tez failures. Memorise the structure.
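
Because the structure is regular, the extraction can be scripted. The following is a minimal sketch: FailureIds is a hypothetical helper, and its regular expressions are assumptions derived from the ID shapes shown in the example message, not from a Tez or Hive parsing API. The application and DAG IDs do not appear literally in the message, so the sketch rebuilds them from the fields of the task ID.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls the identifiers out of a "Vertex failed" console message.
// The regexes are assumptions based on the ID shapes shown in this lab.
final class FailureIds {
  private static final Pattern VERTEX_NAME = Pattern.compile("vertexName=([^,]+),");
  private static final Pattern TASK =
      Pattern.compile("taskId=task_(\\d+)_(\\d+)_(\\d+)_(\\d+)_(\\d+)");
  private static final Pattern ATTEMPT = Pattern.compile("TaskAttempt (\\d+) failed");
  private static final Pattern CONTAINER =
      Pattern.compile("container_(?:e\\d+_)?\\d+_\\d+_\\d+_\\d+");

  static Map<String, String> parse(String message) {
    Map<String, String> ids = new LinkedHashMap<>();
    Matcher name = VERTEX_NAME.matcher(message);
    if (name.find()) {
      ids.put("vertexName", name.group(1));
    }
    Matcher task = TASK.matcher(message);
    if (task.find()) {
      String app = task.group(1) + "_" + task.group(2); // cluster timestamp + app sequence
      String dag = app + "_" + task.group(3);           // + DAG index within the app
      String vertex = dag + "_" + task.group(4);        // + vertex index within the DAG
      ids.put("applicationId", "application_" + app);
      ids.put("dagId", "dag_" + dag);
      ids.put("vertexId", "vertex_" + vertex);
      ids.put("taskId", "task_" + vertex + "_" + task.group(5));
    }
    Matcher attempt = ATTEMPT.matcher(message);
    if (attempt.find()) {
      ids.put("attempt", attempt.group(1));
    }
    Matcher container = CONTAINER.matcher(message);
    if (container.find()) {
      ids.put("containerId", container.group());
    }
    return ids;
  }

  public static void main(String[] args) {
    String message = "Vertex failed, vertexName=Map 1, vertexId=vertex_1718000000000_4321_1_00, "
        + "diagnostics=[Task failed, taskId=task_1718000000000_4321_1_00_000003, "
        + "diagnostics=[TaskAttempt 0 failed, info=[ "
        + "Container container_e123_1718000000000_4321_01_000007 failed.";
    parse(message).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```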


Step 2: Get the AM Log

The Tez AM is itself a YARN container. Its log is fetched with yarn logs:

yarn logs -applicationId application_1718000000000_4321 \
  -containerId container_e123_1718000000000_4321_01_000001

The AM container is typically _01_000001, the first container of the first application attempt (a restarted AM runs under a later attempt number, such as _02_). The log streams to stdout. Pipe to a file:

yarn logs -applicationId application_1718000000000_4321 \
  -containerId container_e123_1718000000000_4321_01_000001 \
  > ~/tez-notes/hive-h3-amlog.txt

The AM log contains the DAGAppMaster lifecycle, vertex state transitions, and diagnostics aggregated from failing tasks.
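
When you only have the worker's container ID from the console message, the AM container ID can usually be guessed from it. This is a minimal sketch, assuming the AM is container 000001 of the same application attempt as the worker; AmLog is a hypothetical helper, and the guess is wrong if the AM was restarted under a different attempt than the one the worker ID names.

```java
// Derives a likely AM container ID from a worker container ID and builds
// the yarn logs command line. Assumes the AM is container 000001 of the
// SAME application attempt, which is typical but not guaranteed.
final class AmLog {
  private static final String PREFIX = "container_";

  static String amContainerId(String workerContainerId) {
    int lastSep = workerContainerId.lastIndexOf('_');
    if (!workerContainerId.startsWith(PREFIX) || lastSep < PREFIX.length()) {
      throw new IllegalArgumentException("not a container ID: " + workerContainerId);
    }
    // Keep everything up to and including the attempt number; swap in sequence 000001.
    return workerContainerId.substring(0, lastSep) + "_000001";
  }

  static String yarnLogsCommand(String applicationId, String containerId) {
    return "yarn logs -applicationId " + applicationId + " -containerId " + containerId;
  }

  public static void main(String[] args) {
    String am = amContainerId("container_e123_1718000000000_4321_01_000007");
    System.out.println(yarnLogsCommand("application_1718000000000_4321", am));
  }
}
```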

Search for our failing task:

grep -n "task_1718000000000_4321_1_00_000003" ~/tez-notes/hive-h3-amlog.txt | head

You will see lines like:

2024-06-10 14:22:11,432 [INFO ] TaskImpl - task_..._000003 transitioned from SCHEDULED to RUNNING
2024-06-10 14:22:13,108 [INFO ] TaskAttemptImpl - attempt_..._000003_0 transitioned from RUNNING to FAILED
2024-06-10 14:22:13,108 [WARN ] TaskImpl - Diagnostics for ..._000003_0:
  Container ..._000007 failed.
  Exit code: 1
  ... [Last 4096 bytes of stderr] ...

The "Last 4096 bytes of stderr" is the AM's view of why the container died. It's truncated. For the full container log, hop 3.


Step 3: Get the Container Log

The container ID from the AM log (container_..._000007) is the worker. Its log:

yarn logs -applicationId application_1718000000000_4321 \
  -containerId container_e123_1718000000000_4321_01_000007 \
  > ~/tez-notes/hive-h3-container-007.txt

The container log contains the full stdout and stderr from the Tez task runtime (LogicalIOProcessorRuntimeTask), including all logged exceptions and any user-code output.

The container log structure:

LogType:stdout
...
LogType:syslog
2024-06-10 14:22:12,856 [INFO ] LogicalIOProcessorRuntimeTask - Initializing task ...
2024-06-10 14:22:12,891 [INFO ] MRInput - Initializing MRInput for ...
2024-06-10 14:22:13,007 [WARN ] MRInput - ...
2024-06-10 14:22:13,084 [ERROR] LogicalIOProcessorRuntimeTask - Failed to execute task
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
  Hive Runtime Error while processing row {"a":3,"b":"q"}
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:418)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:223)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        ...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load UDF X
        at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:1893)
        at org.apache.hadoop.hive.ql.exec.UDFBridge.<init>(UDFBridge.java:54)
        ...
Caused by: java.lang.ClassNotFoundException: com.example.udf.X
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        ...
LogType:stderr

This is the actual exception. The Caused by: chain walks from Hive's wrapping exception down to the JVM-level cause.


Step 4: Walk the Exception

Reading the trace top-down for our example:

FrameTells you
java.lang.RuntimeExceptionContainer exit, generic
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":3,"b":"q"}Hive boundary; you know the input row
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow:91Hive Tez map-side row processor
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run:418Hive Tez map record processor
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run:223Hive's Tez Processor adapter
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run:374Tez runtime task
org.apache.tez.runtime.task.TaskRunner2Callable...Tez runtime task launcher

Now the Caused by: chain:

| Cause | Tells you |
|---|---|
| HiveException: Unable to load UDF X | The proximate Hive problem |
| ClassNotFoundException: com.example.udf.X | The root: classloader can't find UDF |

So the root cause is a UDF class missing from the classpath of the Tez task. That's a Hive (or user) issue, not a Tez issue. See Lab H4 for how to make that attribution rigorously.
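
On a long container log, pulling out just the exception headers makes the chain easier to read. This is a minimal sketch: CauseChain is a hypothetical helper that assumes the standard Java stack-trace layout (a header line, indented "at ..." frames, then "Caused by:" headers). It treats the last header as the root cause.

```java
import java.util.ArrayList;
import java.util.List;

// Extracts the exception headers from a Java stack trace: the top-level
// exception first, then each "Caused by:" entry. Frames are skipped.
final class CauseChain {
  private static final String PREFIX = "Caused by: ";

  static List<String> headers(String trace) {
    List<String> out = new ArrayList<>();
    for (String raw : trace.split("\\r?\\n")) {
      String line = raw.trim();
      if (line.startsWith(PREFIX)) {
        out.add(line.substring(PREFIX.length()));
      } else if (out.isEmpty() && !line.isEmpty() && !line.startsWith("at ")) {
        out.add(line); // first non-frame line is the top-level exception
      }
    }
    return out;
  }

  // The last header in the chain, or null for an empty trace.
  static String rootCause(String trace) {
    List<String> chain = headers(trace);
    return chain.isEmpty() ? null : chain.get(chain.size() - 1);
  }

  public static void main(String[] args) {
    String trace =
        "java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:\n"
        + "        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)\n"
        + "Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load UDF X\n"
        + "        at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:1893)\n"
        + "Caused by: java.lang.ClassNotFoundException: com.example.udf.X\n"
        + "        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)\n";
    System.out.println(rootCause(trace));
  }
}
```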


Step 5: Attribute the Failure

Apply the decision rule from H4 (preview):

The package of the top frame whose code you can change indicates the project.

Top frames in order:

  1. java.lang.RuntimeException — JVM, not actionable.
  2. org.apache.hadoop.hive.ql.metadata.HiveException — Hive, but generic wrap; keep walking.
  3. org.apache.hadoop.hive.ql.exec.tez.MapRecordSource:91 — Hive code, specific. Stop here for the top frame: this is Hive's MapRecordSource.

Then the Caused by: chain:

  1. HiveException: Unable to load UDF X — Hive.
  2. ClassNotFoundException: com.example.udf.X — root cause.

Attribution: Hive (the proximate code is MapRecordSource) and user (the missing class is the user's UDF jar). Tez is not at fault — it correctly ran the task, the Hive code, and surfaced the exception. Tez's job is to provide a stack trace, which it did.
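
The top-frame rule can be sketched as a package-prefix lookup. Attribution is a hypothetical helper, and the prefix-to-owner mapping below is a coarse assumption made for illustration; the decision tree in Lab H4 is the authoritative version. Frames from the JDK, third-party libraries, or user code return no owner, so the walk continues to the next frame.

```java
import java.util.Arrays;
import java.util.List;

// Attributes a failure to a project by walking frames top-down and
// stopping at the first one whose package belongs to a known owner.
final class Attribution {
  // The prefixes are an illustrative assumption, not an official mapping.
  static String ownerOf(String frame) {
    if (frame.startsWith("org.apache.hadoop.hive.")) return "Hive";
    if (frame.startsWith("org.apache.tez.dag.app.")) return "Tez AM";
    if (frame.startsWith("org.apache.tez.runtime.")) return "Tez runtime";
    if (frame.startsWith("org.apache.hadoop.yarn.")) return "YARN";
    return null; // JDK, third-party, or user code: keep walking
  }

  static String attribute(List<String> framesTopDown) {
    for (String frame : framesTopDown) {
      String owner = ownerOf(frame);
      if (owner != null) {
        return owner;
      }
    }
    return "unattributed";
  }

  public static void main(String[] args) {
    List<String> frames = Arrays.asList(
        "org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow",
        "org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run",
        "org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run");
    System.out.println(attribute(frames));
  }
}
```

The lookup only names the owning codebase of the proximate frame. As the UDF example shows, the root cause in the Caused by: chain can still belong to someone else, here the user's missing jar.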

The fix is to ensure the UDF jar is on the AuxJar list:

ADD JAR /path/to/udf.jar;

or in hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/auxlib/udf.jar</value>
</property>

Tooling Shortcuts

Get all container logs at once

yarn logs -applicationId application_1718000000000_4321 \
  > ~/tez-notes/hive-h3-all.txt

For a large DAG with many containers, this is large (often 100s of MB). Use the per-container form when you know which one to look at.

Search across container logs

grep -B2 -A20 "java\.lang\.\|Caused by" ~/tez-notes/hive-h3-all.txt | head -100

Find the failing task fast

grep "FAILED\|state changed.*FAILED\|attempt.*FAILED" ~/tez-notes/hive-h3-amlog.txt

Tez UI shortcut

If your cluster has the Tez UI, the per-task log links are one click. The UI URL pattern:

http://<tez-ui-host>:9999/tez-ui/#/tez-dag/dag_1718000000000_4321_1

From that page, navigate to Map 1 → task 000003 → attempt 0 → "logs". The UI fetches the container log automatically.
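When you start from the UI URL rather than the console, you still need the application ID for yarn logs. The sketch below derives it from the DAG ID, assuming the dag_<clusterTimestamp>_<appSequence>_<dagSequence> layout visible in the identifiers used in this lab; TezIds is a hypothetical helper name.

```java
/** Hypothetical helper for moving between Tez and YARN identifiers. */
final class TezIds {
  /** dag_<clusterTimestamp>_<appSeq>_<dagSeq> becomes application_<clusterTimestamp>_<appSeq>. */
  static String applicationIdOf(String dagId) {
    String[] parts = dagId.split("_");
    if (parts.length != 4 || !parts[0].equals("dag")) {
      throw new IllegalArgumentException("not a DAG ID: " + dagId);
    }
    return "application_" + parts[1] + "_" + parts[2];
  }
}
```

Applied to dag_1718000000000_4321_1 this yields application_1718000000000_4321, the ID passed to yarn logs earlier in this section.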


A Second Worked Example — Tez Runtime Failure

Console:

Vertex failed, vertexName=Reducer 2, ...
Container ... failed. Exit code: 1

Container log top of stack:

java.io.IOException: Failed on local exception: java.io.IOException: Failed to fetch shuffle data
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:391)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Fetcher.copyFromHost(Fetcher.java:355)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Fetcher.run(Fetcher.java:262)
        ...
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        ...

Top actionable frame: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler:391.

Attribution: Tez runtime library, specifically the shuffle fetcher. The root cause, ConnectException: Connection refused, points to the upstream task's container being gone (killed, evicted, or unreachable over the network). Investigation continues into the upstream container's log.

This is the canonical Tez shuffle failure shape. The reproduction is in H5.


A Third Worked Example — AM Failure

Console:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Application application_1718000000000_4321 failed with state FAILED.
Diagnostics: Application application_1718000000000_4321 failed 2 times due to AM Container ... exited with exitCode: -103 ...

The AM itself died. Container log of the AM:

[ERROR] DAGAppMaster - Caught exception while running DAGAppMaster
java.lang.OutOfMemoryError: Java heap space
        at org.apache.tez.dag.app.dag.impl.VertexImpl.<init>(VertexImpl.java:412)
        ...

Top frame: org.apache.tez.dag.app.dag.impl.VertexImpl:412. Attribution: Tez AM. Root cause: AM heap too small for the DAG (tez.am.resource.memory.mb). Fix is configuration; if reproducible at the default, file a JIRA against Tez requesting either a smarter default or a sizing recommendation.


Validation Artifacts

For our first example, save:

  1. The console error verbatim (~/tez-notes/hive-h3-console.txt).
  2. The parsed-identifiers table (Application ID, DAG ID, Vertex ID, Task ID, Container ID).
  3. The AM log fragment showing the task transition to FAILED.
  4. The container log fragment showing the full exception with Caused by: chain.
  5. The attribution paragraph: which project owns the bug, and why.
  6. The fix you propose.

Once you can produce that artifact for an arbitrary Hive-on-Tez failure, you can debug one. The next lab — Lab H4: Bug Attribution — makes the attribution rigorous with a decision tree and four more worked examples.

Lab H4: Bug Attribution

Background

A failing Hive-on-Tez query may be a Hive bug, a Tez runtime bug, a Tez AM bug, a YARN bug, a Hadoop common bug, a JVM bug, a user bug, or an infrastructure bug. Filing it on the wrong project wastes the reporter's time and the maintainer's. This lab gives you a mechanical decision tree to attribute correctly from a stack trace, plus four worked examples.


The Decision Tree

Given a stack trace (after Lab H3 has surfaced it):

flowchart TD
  S[Start: have stack trace]
  S --> T1[Find top frame whose package you can change]
  T1 --> P{Package prefix?}
  P -->|org.apache.hadoop.hive.*| H[Hive bug]
  P -->|org.apache.tez.runtime.library.*| TR[Tez runtime library<br/>tez-runtime-library]
  P -->|org.apache.tez.runtime.*<br/>not .library| TRI[Tez runtime internals<br/>tez-runtime-internals]
  P -->|org.apache.tez.dag.app.*| TA[Tez AM<br/>tez-dag]
  P -->|org.apache.tez.dag.api.*| TC[Tez client / API<br/>tez-api]
  P -->|org.apache.tez.client.*| TC
  P -->|org.apache.hadoop.yarn.*| Y[YARN bug]
  P -->|org.apache.hadoop.hdfs.*| HD[HDFS bug]
  P -->|org.apache.hadoop.mapred.*| MR[Hadoop MR compat<br/>tez-mapreduce]
  P -->|user package| U[User code bug]
  P -->|java.*, sun.*| J[Walk down to next frame]
  J --> T1
  H --> CD[Then check Caused by chain]
  TR --> CD
  TRI --> CD
  TA --> CD
  Y --> CD
  HD --> CD
  CD --> R[Root cause may shift attribution]
  R --> END[File on the project that owns the actionable code]

The rule in one sentence: find the top frame in actionable code, name its package prefix, and read off the project.
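The first two boxes of the tree (find the top actionable frame, then read off the package prefix) can be sketched in a few lines. This is an illustration under stated assumptions, not a tool shipped with Tez or Hive: the class Attribution and its method names are hypothetical, the prefix list is copied from the flowchart, and anything the flowchart does not name (including third-party libraries such as protobuf) is reported as unclassified for you to judge by hand.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper mirroring the decision tree; not part of Tez or Hive. */
final class Attribution {
  // Insertion order matters: more specific prefixes come before broader ones.
  private static final Map<String, String> PREFIX_TO_OWNER = new LinkedHashMap<>();
  static {
    PREFIX_TO_OWNER.put("org.apache.hadoop.hive.", "Hive");
    PREFIX_TO_OWNER.put("org.apache.tez.runtime.library.", "Tez (tez-runtime-library)");
    PREFIX_TO_OWNER.put("org.apache.tez.runtime.", "Tez (tez-runtime-internals)");
    PREFIX_TO_OWNER.put("org.apache.tez.dag.app.", "Tez (tez-dag)");
    PREFIX_TO_OWNER.put("org.apache.tez.dag.api.", "Tez (tez-api)");
    PREFIX_TO_OWNER.put("org.apache.tez.client.", "Tez (tez-api)");
    PREFIX_TO_OWNER.put("org.apache.hadoop.yarn.", "YARN");
    PREFIX_TO_OWNER.put("org.apache.hadoop.hdfs.", "HDFS");
    PREFIX_TO_OWNER.put("org.apache.hadoop.mapred.", "Tez (tez-mapreduce)");
  }

  /** JVM frames are never the answer; the tree says to walk down past them. */
  static boolean isJvmFrame(String className) {
    return className.startsWith("java.")
        || className.startsWith("sun.")
        || className.startsWith("jdk.");
  }

  /** First non-JVM class in a top-of-stack-first list, or null if there is none. */
  static String topActionableFrame(List<String> frameClasses) {
    for (String cls : frameClasses) {
      if (!isJvmFrame(cls)) {
        return cls;
      }
    }
    return null;
  }

  /** Owner named by the first matching prefix. */
  static String owner(String className) {
    for (Map.Entry<String, String> e : PREFIX_TO_OWNER.entrySet()) {
      if (className.startsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return "unclassified (user code or third-party library)";
  }
}
```

The sketch stops where the tree says "Then check Caused by chain": the root cause can still move the attribution, and that step needs a human reading the trace.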


Package → Project → Module Table

| Package prefix | Project | Module / area | Where to file |
|---|---|---|---|
| org.apache.hadoop.hive.ql.exec.tez.* | Hive | Tez integration | https://issues.apache.org/jira/projects/HIVE |
| org.apache.hadoop.hive.ql.exec.* (not .tez) | Hive | Operators | HIVE JIRA |
| org.apache.hadoop.hive.ql.metadata.* | Hive | Metadata / UDF | HIVE JIRA |
| org.apache.hadoop.hive.serde2.* | Hive | Serialization | HIVE JIRA |
| org.apache.hadoop.hive.* (any other) | Hive | Core | HIVE JIRA |
| org.apache.tez.runtime.library.* | Tez | tez-runtime-library | TEZ JIRA |
| org.apache.tez.runtime.task.* | Tez | tez-runtime-internals | TEZ JIRA |
| org.apache.tez.runtime.* (not .library, not .task) | Tez | tez-runtime-internals | TEZ JIRA |
| org.apache.tez.dag.app.dag.impl.* | Tez | tez-dag (state machines) | TEZ JIRA |
| org.apache.tez.dag.app.rm.* | Tez | tez-dag (RM client / container scheduling) | TEZ JIRA |
| org.apache.tez.dag.app.launcher.* | Tez | tez-dag (container launcher) | TEZ JIRA |
| org.apache.tez.dag.app.* (other) | Tez | tez-dag (AM core) | TEZ JIRA |
| org.apache.tez.dag.api.* | Tez | tez-api (DAG / Vertex / Edge) | TEZ JIRA |
| org.apache.tez.client.* | Tez | tez-api (TezClient) | TEZ JIRA |
| org.apache.tez.mapreduce.* | Tez | tez-mapreduce (MRInput/MROutput) | TEZ JIRA |
| org.apache.hadoop.yarn.client.* | YARN | Client | YARN JIRA |
| org.apache.hadoop.yarn.server.resourcemanager.* | YARN | RM | YARN JIRA |
| org.apache.hadoop.yarn.server.nodemanager.* | YARN | NM | YARN JIRA |
| org.apache.hadoop.hdfs.* | HDFS | Client / DN / NN | HDFS JIRA |
| org.apache.hadoop.mapred.* | MR compat | tez-mapreduce for MR-on-Tez | TEZ JIRA |
| org.apache.hadoop.io.* / .fs.* / .conf.* | Hadoop common | hadoop-common | HADOOP JIRA |
| com.<user>.* / org.<user>.* (not apache) | User code | n/a | Fix locally |
| java.*, sun.*, jdk.* | JVM | walk down | (not the cause; keep looking) |

Verify the modules against your tree:

find ~/tez-src -maxdepth 2 -name pom.xml | sort
find ~/hive-src -maxdepth 3 -name pom.xml | head

Example 1: UDF Not Found (Hive bug → User bug)

Trace (from Lab H3):

java.lang.RuntimeException: ...
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:418)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:223)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(...)
        ...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load UDF X
        at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:1893)
        ...
Caused by: java.lang.ClassNotFoundException: com.example.udf.X

Apply the tree:

  1. Top actionable frame: org.apache.hadoop.hive.ql.exec.tez.MapRecordSource:91.
  2. Package: org.apache.hadoop.hive.ql.exec.tez.*.
  3. Project: Hive (the Tez integration code).
  4. Check Caused by: root is ClassNotFoundException: com.example.udf.X — a user class.
  5. Adjust: this is user error (their UDF jar isn't on the classpath), raised by Hive's function registry and propagated through Hive's Tez integration. No bug to file.

Fix: ADD JAR or hive.aux.jars.path.

If the same trace came with Caused by: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDFBridge, then the root is a Hive class missing from the Hive distribution — file on HIVE.


Example 2: Shuffle Fetch Failure (Tez runtime bug)

Trace:

java.io.IOException: Failed to fetch shuffle data
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler.copyFailed(ShuffleScheduler.java:391)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Fetcher.copyFromHost(Fetcher.java:355)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Fetcher.run(Fetcher.java:262)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)

Apply the tree:

  1. Top actionable frame: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.ShuffleScheduler:391.
  2. Package: org.apache.tez.runtime.library.*.
  3. Project: Tez, module tez-runtime-library.
  4. Check Caused by: ConnectException — network.
  5. Adjust: the root is a network/infra failure. The shuffle code surfaced it correctly; not a bug in itself. But:
    • If this happens once with sporadic node failures: infrastructure issue, no bug.
    • If this happens frequently and the fetcher isn't retrying enough times before giving up: Tez bug — file on TEZ asking to bump or expose tez.runtime.shuffle.connect.timeout/retry counts.
    • If the upstream container died because of an AM scheduling bug: Tez AM bug, file on TEZ with the AM log evidence.

Verify the retry config:

grep "shuffle.connect\|shuffle.fetch.retry\|shuffle.read.timeout" \
  ~/tez-src/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/api/TezRuntimeConfiguration.java

Example 3: AM OOM During DAG Submit (Tez AM bug)

AM container log:

[ERROR] DAGAppMaster - Caught exception while running DAGAppMaster
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3210)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(...)
        ...
        at com.google.protobuf.ByteString.copyFrom(ByteString.java:194)
        at org.apache.tez.dag.api.records.DAGProtos$DAGPlan.toBuilder(DAGProtos.java:...)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.<init>(VertexImpl.java:412)
        at org.apache.tez.dag.app.DAGAppMaster.createDAG(DAGAppMaster.java:...)

Apply the tree:

  1. Top actionable frame: skip JVM/protobuf frames. First Tez frame: org.apache.tez.dag.app.dag.impl.VertexImpl:412.
  2. Package: org.apache.tez.dag.app.*.
  3. Project: Tez, module tez-dag (AM).
  4. Check Caused by: none — just the OOM.

Attribution: Tez AM. The proximate cause is constructing VertexImpl from a large DAGPlan. Three possible JIRA shapes:

  • "Tez AM OOMs on submission of N-vertex DAG at default tez.am.resource.memory.mb" — file requesting smarter sizing or doc.
  • "VertexImpl construction allocates O(N²) memory in inputs" — file with a profile and a fix suggestion.
  • "DAGPlan toBuilder() materialises a full copy" — file as a perf bug.

The correct shape depends on profile evidence. Without profiling, file the sizing/doc variant first; the deeper variants follow.


Example 4: NodeManager Lost (YARN bug)

AM log:

2024-06-10 ... [WARN ] AMContainerImpl - Container container_..._000007 transitioned from RUNNING to STOPPED. exitStatus -100
2024-06-10 ... [WARN ] DAGAppMaster - Container ..._000007 completed unexpectedly; will be rescheduled
2024-06-10 ... [WARN ] RMContainerRequestor - Lost node nm-12.example.com
2024-06-10 ... [INFO ] DAGAppMaster - Marking task attempt as failed due to lost node: attempt_..._000003_0

Apply the tree:

  1. Top frame: there is no stack trace here, only log lines. The first component logging is AMContainerImpl (in the AM's org.apache.tez.dag.app.rm.* area), and what it logs is the AM's correct reaction to a node loss, not a bug.

  2. The substantive cause is "NodeManager nm-12 lost" — diagnose by checking NodeManager log on that host:

    yarn node -list -all | grep nm-12
    tail -200 /var/log/hadoop-yarn/yarn-nodemanager.log  # on nm-12
    
  3. Common NM-side root causes:

    • NM heap OOM (NM stops responding to RM heartbeats) → YARN bug or NM tuning.
    • Network partition → infra.
    • Disk full on NM local-dirs → ops issue.

Attribution:

  • If the NM died from OOM, file on the YARN JIRA project.
  • If Tez AM didn't reschedule the lost task correctly, file on TEZ. But the AM log here shows correct reaction, so that's not in play.
  • If Tez's TaskScheduler retried the task on the same lost node repeatedly, file on TEZ (a scheduler awareness issue).

Cross-Project Patterns

Some failure modes have a well-known cross-project shape. Memorise the shapes:

| Shape | Likely project | Quick diagnostic |
|---|---|---|
| ClassCastException inside MapRecordSource / ReduceRecordSource | Hive (schema mismatch in vectorization) | Check EXPLAIN VECTORIZATION DETAIL |
| IOException: Stream is closed in shuffle reader | Tez runtime library | Check upstream container alive |
| TaskCommitDeniedException | Tez AM speculative-exec coordination | Check tez.am.speculation.enabled |
| NoSuchMethodError on a Tez or Hive class | Version skew | Check classpath; check mvn dependency:tree |
| IllegalArgumentException: Wrong FS | Hadoop FS | Check fs.defaultFS, core-site.xml |
| Container killed by OOM killer (exit code 137) | YARN or workload | Check container memory request vs JVM heap |
| org.apache.hadoop.security.AccessControlException | HDFS or Hive Ranger | Permissions issue, not a code bug |
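Exit codes are a shape of their own. The sketch below decodes the ones that show up most often in container diagnostics. It assumes the negative values defined in YARN's ContainerExitStatus class (verify them against your Hadoop version) and the shell convention that a process killed by signal N exits with 128 + N. ExitCodes is a hypothetical helper name.

```java
/** Hypothetical decoder for container exit codes seen in AM and NM diagnostics. */
final class ExitCodes {
  static String describe(int exitCode) {
    switch (exitCode) {
      case 0:
        return "success";
      // Values below are assumed from YARN's ContainerExitStatus; check your version.
      case -100:
        return "ABORTED: container released by the application or lost with its node";
      case -101:
        return "DISKS_FAILED: too many local disks failed on the node";
      case -102:
        return "PREEMPTED: container preempted by the scheduler";
      case -103:
        return "KILLED_EXCEEDED_VMEM: virtual memory limit exceeded";
      case -104:
        return "KILLED_EXCEEDED_PMEM: physical memory limit exceeded";
      default:
        if (exitCode > 128 && exitCode < 256) {
          // Shell convention: 128 + signal number, so 137 is signal 9 (SIGKILL).
          return "killed by signal " + (exitCode - 128);
        }
        return "process-specific exit code " + exitCode;
    }
  }
}
```

A 137 therefore decodes to signal 9, which is what the kernel OOM killer sends; a -104 says YARN itself killed the container for exceeding its physical memory request.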

What to Do With the Attribution

Having attributed correctly:

| Attribution | Action |
|---|---|
| Hive | File on https://issues.apache.org/jira/projects/HIVE with Tez in summary if relevant |
| Tez tez-runtime-library | File on https://issues.apache.org/jira/projects/TEZ, component Runtime Library |
| Tez tez-runtime-internals | File on TEZ, component Runtime Internals |
| Tez tez-dag (AM) | File on TEZ, component AM |
| Tez tez-api | File on TEZ, component Client / API |
| Tez tez-mapreduce | File on TEZ, component MR Compat |
| YARN | File on https://issues.apache.org/jira/projects/YARN |
| HDFS | File on https://issues.apache.org/jira/projects/HDFS |
| User | Fix locally, no JIRA |
| Infrastructure | Operations issue, no JIRA |
| Multiple (Hive needs change AND Tez needs change) | File on both, cross-reference |

In all cases, the JIRA description follows the skeleton in Design via JIRA.


Validation Artifacts

After this lab:

  1. The decision tree printed and pinned at your desk (or in ~/tez-notes/).
  2. The Package → Project → Module table memorised or saved as ~/tez-notes/hive-h4-attribution.md.
  3. Four attributions, one for each worked example, written out in your own words.
  4. The reflex: never file a JIRA on a project whose code does not appear at the top of the actionable stack.

The next lab — Lab H5: Reproducing Bugs — covers how to turn an attributed bug into a minimum reproducer suitable to attach to a JIRA.

Lab H5: Reproducing Bugs

Background

A JIRA without a reproducer drifts. A JIRA with a clean reproducer gets attention. "Clean" means: minimal schema, minimal data, minimal query, runnable in under a minute on a local MiniTezCluster or MiniHS2. This lab is the procedure.

The Hive integration test framework (hive-itests) is the source of every pattern you need. Reading its existing tests is the cheapest education.


The Three Reduction Axes

To minimise a reproducer, reduce along three independent axes:

| Axis | Reduce | Stop reducing when |
|---|---|---|
| Schema | Drop unused columns; simplify types | Removing a column makes the bug disappear |
| Data | Reduce row count; generate synthetic data | Reducing rows makes the bug disappear |
| Query | Drop joins, predicates, projections | Dropping a clause makes the bug disappear |

The goal is the smallest schema × smallest data × smallest query that still reproduces.


Setup — Local MiniHS2 + MiniTezCluster

MiniHS2 is a single-JVM HiveServer2 that runs against a MiniTezCluster (a single-JVM YARN). Together they let you reproduce a Hive-on-Tez bug in seconds without an external cluster.

Existing reference in your tree:

find ~/hive-src/itests -name "MiniHS2.java" | head
find ~/hive-src/itests -name "TestMiniLlapVectorArrowWithLlapIODisabled.java" | head
find ~/tez-src/tez-tests -name "MiniTezCluster.java"

A reproducer test class skeleton (Hive 3/4 style):

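// Package names below match recent Hive trees; confirm them with the find commands above.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.jdbc.miniHS2.MiniHS2;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class TestMyBugRepro {
  private MiniHS2 miniHS2;

  @Before
  public void setUp() throws Exception {
    HiveConf conf = new HiveConf();
    conf.set("hive.execution.engine", "tez");
    // tez.lib.dir is a system property you pass to the test JVM.
    conf.set("tez.lib.uris",
        "file://" + System.getProperty("tez.lib.dir"));
    miniHS2 = new MiniHS2.Builder()
        .withConf(conf)
        .withMiniMR()   // intended to bring up the mini cluster; confirm in MiniHS2.Builder that your version runs Tez here
        .build();
    miniHS2.start(new HashMap<>());
  }

  @After
  public void tearDown() throws Exception {
    if (miniHS2 != null) {
      miniHS2.stop();
    }
  }

  @Test
  public void reproBug() throws Exception {
    try (Connection c = DriverManager.getConnection(miniHS2.getJdbcURL());
         Statement s = c.createStatement()) {
      s.execute("CREATE TABLE t (...) STORED AS ORC");
      s.execute("INSERT INTO t VALUES (...)");
      try (ResultSet rs = s.executeQuery("SELECT ...")) {
        // assert behaviour or expect exception
      }
    }
  }
}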

Run with mvn test -pl itests -Dtest=TestMyBugRepro.


Reducing the Schema

Starting from a real production table with 200 columns, reduce iteratively:

  1. Identify referenced columns. Read the failing query; note which columns the SELECT, WHERE, GROUP BY, JOIN, ORDER BY actually reference.
  2. Drop everything else. Make a new test schema with only the referenced columns.
  3. Re-run. Does the bug still reproduce? If yes, you've reduced. If no, you've found a column that's load-bearing; add it back and look for why.
  4. Simplify remaining types. Replace DECIMAL(38,10) with DECIMAL(10,2) if the bug doesn't depend on precision. Replace STRUCT<...> with STRING if you can. Replace partition columns with non-partitioned tables unless the partition is load-bearing.
  5. Stop when reduction breaks the repro.

For our running example query:

SELECT a, COUNT(*) FROM t GROUP BY a ORDER BY a;

Only column a is referenced. Schema reduces to:

CREATE TABLE t (a INT) STORED AS ORC;

If the bug needs the second column for some reason (e.g. ORC stripe layout), keep it.


Reducing the Data — JoinDataGen Pattern

Hive's itests includes data generators for systematic minimisation. The most common pattern is JoinDataGen for generating join inputs at controlled cardinalities:

find ~/hive-src -name "JoinDataGen*.java" -o -name "*DataGen*.java" | head

The pattern (adapt for your bug):

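import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;

public final class TestDataGen {
  private TestDataGen() {}

  /** Inserts rowCount single-column rows with keys drawn from [0, distinctKeys). */
  public static void writeIntRows(String tableName, int rowCount, int distinctKeys,
                                   Statement s) throws SQLException {
    if (rowCount < 1) {
      return; // an empty VALUES list is not valid SQL
    }
    Random r = new Random(42); // fixed seed keeps the reproducer deterministic
    StringBuilder values = new StringBuilder();
    for (int i = 0; i < rowCount; i++) {
      if (i > 0) values.append(",");
      values.append("(").append(r.nextInt(distinctKeys)).append(")");
    }
    s.execute("INSERT INTO " + tableName + " VALUES " + values);
  }
}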

Reduce data:

  1. Start with original data size. 1 billion rows? Reduce to 1 million.
  2. Halve until bug disappears. Binary-search the row count: 1M → 500K → 250K → ...
  3. At the smallest row count that still repros, vary distinct-key count. Bug may need 5 distinct keys (skew) or 500K (cardinality). Find which.
  4. Vary value distribution. If the bug needs a skewed distribution (one key gets 90% of rows), generate that explicitly.
  5. Document the minimum. "Bug reproduces at >= 1024 rows with >= 8 distinct keys."
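The halving in step 2 is an ordinary binary search. A minimal sketch follows, assuming the bug is monotonic in row count (if n rows reproduce, every larger count up to the known-failing size also reproduces); the predicate stands in for "load n rows, run the query, check for the bug", and RowCountBisect is a hypothetical name. A bug that needs an exact row count breaks the monotonicity assumption and has to be probed point by point instead.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: finds the smallest row count that still reproduces a bug. */
final class RowCountBisect {
  /**
   * Smallest n in [1, knownFailing] for which reproduces.test(n) is true,
   * assuming every count between that n and knownFailing also reproduces.
   */
  static int minimalRowCount(int knownFailing, IntPredicate reproduces) {
    if (knownFailing < 1 || !reproduces.test(knownFailing)) {
      throw new IllegalArgumentException("knownFailing must reproduce the bug");
    }
    int lo = 1;
    int hi = knownFailing; // invariant: hi rows reproduce the bug
    while (lo < hi) {
      int mid = lo + (hi - lo) / 2;
      if (reproduces.test(mid)) {
        hi = mid;       // still fails: the minimum is mid or smaller
      } else {
        lo = mid + 1;   // passes: the minimum is larger than mid
      }
    }
    return hi;
  }
}
```

Starting from one million rows this needs about twenty probes, so even a one-minute repro run finds the minimum in well under an hour.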

For our running example, with no actual bug, the minimum is whatever you need to exercise the GROUP BY + ORDER BY path — single-digit rows are enough.


Reducing the Query

Remove clauses one at a time and re-test:

  1. Remove ORDER BY → does the bug still happen? (Probably not, if the bug is in the total-order reducer.)
  2. Remove the aggregate → does the bug still happen?
  3. Remove WHERE predicates one at a time.
  4. Remove JOINs; if the join is the cause, simplify to a 2-table join, then to a tiny-on-tiny join.
  5. Replace MAP JOIN with SHUFFLE JOIN by disabling map joins (hive.auto.convert.join=false) and re-test.

A reproducer query of 3 lines beats a reproducer query of 30 lines, even for the same bug.
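The one-at-a-time removal can be written down as a small loop. This is a sketch under assumptions: the query is modelled as a list of removable clauses, the predicate stands in for "assemble the query from these clauses, run it, check for the bug", and ClauseReducer is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: drops clauses one at a time while the bug still reproduces. */
final class ClauseReducer {
  static List<String> reduce(List<String> clauses, Predicate<List<String>> reproduces) {
    List<String> current = new ArrayList<>(clauses);
    boolean changed = true;
    while (changed) {
      changed = false;
      for (int i = 0; i < current.size(); i++) {
        List<String> candidate = new ArrayList<>(current);
        candidate.remove(i); // drop the clause at position i
        if (reproduces.test(candidate)) {
          current = candidate;
          changed = true;
          break; // restart the scan on the smaller query
        }
      }
    }
    return current; // no single remaining clause can be removed
  }
}
```

The result is minimal only in the sense that no single clause can be dropped; two clauses that each mask the other's removal would both survive, which is usually good enough for a JIRA reproducer.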


Capturing the Artifacts

A complete bug-report artifact set:

| Artifact | Why |
|---|---|
| CREATE TABLE DDL for every table involved | Reproducer setup |
| Data generation code or inline INSERT values | Reproducer setup |
| The minimal query | The test |
| SET hive.* lines that were necessary | Configuration |
| The expected behavior (correct result) | Oracle |
| The actual behavior (incorrect result or exception) | Symptom |
| EXPLAIN FORMATTED output | Plan |
| AM log fragment showing failure | Diagnostic |
| Container log fragment showing exception | Diagnostic |
| Tez and Hive version | Version |

Bundle into a single artifact:

cd ~/tez-notes
mkdir hive-h5-repro
cp ddl.sql hive-h5-repro/
cp gen.sql hive-h5-repro/
cp query.sql hive-h5-repro/
cp explain.txt hive-h5-repro/
cp amlog-fragment.txt hive-h5-repro/
cp container-log-fragment.txt hive-h5-repro/
cat > hive-h5-repro/README.md <<EOF
# Repro for HIVE-XXXXX / TEZ-XXXX

Tez version: 0.10.X
Hive version: 4.0.X
Hadoop version: 3.3.X
JDK: 11

Setup:  hive -f ddl.sql && hive -f gen.sql
Repro:  hive -f query.sql
Expected: rows = N, max value = M.
Actual:   exception in container log (see container-log-fragment.txt).
EOF
tar czf hive-h5-repro.tar.gz hive-h5-repro/

Attach hive-h5-repro.tar.gz to the JIRA. A reproducer in this shape gets opened by maintainers; one without these elements doesn't.


When MiniTezCluster Doesn't Reproduce

A bug that reproduces on a production cluster but not on MiniTezCluster is the worst shape. Common causes:

| Cause | Diagnostic |
|---|---|
| Multi-node shuffle behavior; mini cluster is single-node | Force multiple containers per node; can't fully simulate |
| Container OOM at production memory; mini cluster doesn't have memory pressure | Configure mini cluster with tight memory limits |
| Concurrent DAG submissions; mini cluster has none | Run multiple parallel tests |
| ORC stripe layout; needs production-size files | Generate larger ORC files |
| Production data distribution; mini cluster has uniform | Use realistic random seed and distribution |
| Speculative execution; not enabled in mini by default | Enable with tez.am.speculation.enabled=true |

If none of these reduce, the bug may be in cluster-only code paths (RM scheduling edge cases). Document that the reproducer requires N nodes and attach what evidence you have.


A Worked Reproducer — Hypothetical Bug

Suppose a bug: COUNT(*) returns 0 when input table has exactly 1024 rows and vectorization is enabled. (Imaginary; for the pattern.)

Schema

CREATE TABLE t (a INT) STORED AS ORC;

Data

-- produce exactly 1024 rows:
INSERT INTO t SELECT pos AS a FROM (
  SELECT explode(sequence(1, 1024)) AS pos
) s;

(sequence(...) may or may not be available depending on your Hive version; use the equivalent row generator for your version.)
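If no row-generating function is available in your version, a plain multi-row INSERT ... VALUES statement built on the client side gives the same exact row count. A minimal sketch, with ExactRows as a hypothetical helper name and the table assumed to have a single INT column as in the schema above:

```java
/** Hypothetical helper: builds an INSERT with exactly rowCount single-column rows. */
final class ExactRows {
  static String insertStatement(String table, int rowCount) {
    if (rowCount < 1) {
      throw new IllegalArgumentException("rowCount must be >= 1");
    }
    StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
    for (int i = 1; i <= rowCount; i++) {
      if (i > 1) {
        sql.append(',');
      }
      sql.append('(').append(i).append(')'); // values 1..rowCount, one per row
    }
    return sql.toString();
  }
}
```

Pass the result of insertStatement("t", 1024) to Statement.execute in the reproducer test. Because the row count is a parameter, the 1023 and 2048 variants used during trial reductions come from the same code.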

Query

SET hive.vectorized.execution.enabled=true;
SELECT COUNT(*) FROM t;

Expected vs Actual

Expected: 1024
Actual:   0

EXPLAIN

EXPLAIN VECTORIZATION DETAIL SELECT COUNT(*) FROM t;

Save the output. Look for Execution mode: vectorized and any odd Vectorized: false on a key operator.

Trial Reductions

  • 1023 rows: no bug.
  • 1024 rows: bug.
  • 2048 rows: test it; in this example, no bug.
  • Vectorization off at 1024 rows: test it; in this example, no bug. Re-enable vectorization afterwards.

Document the conditions:

Bug reproduces at:
  - row count exactly 1024
  - hive.vectorized.execution.enabled=true
Bug does NOT reproduce at:
  - row count != 1024
  - hive.vectorized.execution.enabled=false

That's a sharp, actionable bug. Attribution (by Lab H4): likely Hive's vectorized aggregation code path. File on HIVE.


Production-to-Test Translation

When a real production bug is reported to you with no reproducer:

  1. Get the query. From the user, from hive.log (hive.server2.logging.operation.enabled), or from HiveServer2 audit logs.
  2. Get the schema. Run SHOW CREATE TABLE on each involved table; copy.
  3. Get a sample of data. A few hundred to a few thousand rows. Anonymise PII if needed.
  4. Get the version triplet. Tez / Hive / Hadoop.
  5. Reproduce. Stand up MiniHS2, load the schema, load the sample data, run the query.
  6. If it reproduces, reduce. Apply the three axes.
  7. If it doesn't reproduce, expand. More data, more nodes, more concurrency.

A one-day cycle for a complex production bug is fast. A one-week cycle is realistic for something subtle.


Validation Artifacts

After this lab:

  1. A complete reproducer artifact (a hive-h5-repro.tar.gz-style bundle) for a real or imagined Hive-on-Tez bug.
  2. A TestMyBugRepro.java skeleton you can adapt.
  3. The three-axes reduction discipline applied at least once.
  4. The reflex to capture the version triplet (Tez/Hive/Hadoop) on every reproducer.

The next lab, Lab H6: Writing a Diagnostic Patch, covers what to do when you can't reproduce locally and need to ask the production reporter to capture more data.

Lab H6: Writing a Diagnostic Patch

Background

You have a Hive-on-Tez bug report from production. You can't reproduce locally (Lab H5 didn't work). You need more data. The way to get it is a diagnostic patch — a small change that adds logging, counters, or a debug toggle without changing behavior, attached to the JIRA, that the reporter can apply and re-run.

A well-shaped diagnostic patch:

  1. Adds boundary-INFO logging at the suspected fault site.
  2. Adds a TezCounter so the data is captured in the standard counter mechanism.
  3. Adds a debug-only TezConfiguration switch so the cost is opt-in.

This lab walks the three patterns.


Pattern 1: Boundary INFO Logging

A "boundary" is the point at which control flows from one subsystem to another:

| Boundary | Example |
|---|---|
| Hive → Tez submit | TezTask.execute → TezSession.submitDAG |
| Tez AM → Container | DAGAppMaster.scheduleTaskAttempt → ContainerLauncherImpl.launch |
| Container → Task | LogicalIOProcessorRuntimeTask.run → Processor.run |
| Task → Input shuffle | OrderedGroupedKVInput.start |
| Task → Output shuffle | OrderedPartitionedKVOutput.start |

INFO at a boundary is cheap, lasts the lifetime of a task, and gives the next debugger a structured trail.

Example patch (illustrative diff)

Suppose the bug is "DAG submission occasionally takes >10s on large DAGs." A diagnostic patch in TezTask:

diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
index abcdef1..2345678 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
@@ -201,7 +201,12 @@ public class TezTask extends Task<TezWork> {
   private DAGClient submit(DAG dag, TezSessionState session) throws Exception {
+    long submitStartNs = System.nanoTime();
+    int dagPlanBytes = dag.createDag(conf, null, null, null, false).getSerializedSize();
+    LOG.info("HIVE-XXXX diag: about to submitDAG, dagName={}, vertices={}, planBytes={}",
+        dag.getName(), dag.getVertices().size(), dagPlanBytes);
     DAGClient client = session.getSession().submitDAG(dag);
+    LOG.info("HIVE-XXXX diag: submitDAG returned in {} ms",
+        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitStartNs));
     return client;
   }

Rules for the patch:

  • Tag every log line with the JIRA ID. The reporter greps for HIVE-XXXX diag: to find your data.
  • INFO, not DEBUG. The reporter must not have to change log levels.
  • Structured key=value or {} placeholders. Easy to parse.
  • Cheap. Measure or log only what's needed; no full-DAG dumps unless explicitly asked.
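The payoff of the tag plus key=value rules is that the reporter's output can be parsed without guesswork. A minimal sketch of such a parser, assuming values contain no commas or whitespace; DiagLine is a hypothetical helper, not part of Hive or Tez.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for log lines tagged "<JIRA-ID> diag:". */
final class DiagLine {
  // key=value where the value runs up to the next comma or whitespace
  private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=([^,\\s]+)");

  /** Key/value pairs found after the tag, in order; empty if the tag is absent. */
  static Map<String, String> parse(String line, String jiraId) {
    Map<String, String> fields = new LinkedHashMap<>();
    String marker = jiraId + " diag:";
    int at = line.indexOf(marker);
    if (at < 0) {
      return fields; // not one of our diagnostic lines
    }
    Matcher m = KEY_VALUE.matcher(line.substring(at + marker.length()));
    while (m.find()) {
      fields.put(m.group(1), m.group(2));
    }
    return fields;
  }
}
```

Anything before the tag (timestamp, level, logger name) is ignored, so the same parser works on raw container logs and on grep output.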

Pattern 2: A New TezCounter

Counters are the production-safe way to surface a number. They aggregate across tasks and are visible in the Tez UI, in hive.exec.print.summary output, and in the AM log.

Define a new counter

Tez counters are enums. The Tez-side counters:

find ~/tez-src -name "DAGCounter.java" -o -name "TaskCounter.java"

A diagnostic counter is one more enum constant, for example in TaskCounter:

public enum TaskCounter {
  // ... existing ...
  REDUCE_INPUT_GROUPS,
  REDUCE_OUTPUT_RECORDS,
  // new for diagnostic:
  /** TEZ-XXXX diag: number of shuffle fetch retries on this task. */
  SHUFFLE_FETCH_RETRIES,
}

The Hive-side counters live in:

find ~/hive-src -name "OperatorVariation.java" -o -name "HiveCounter*.java" | head

For Hive-side, use a Reporter.incrCounter or operator counter mechanism, depending on Hive version.

Increment it where it matters

In the suspected hot spot:

-        copyFromHost(host);
+        try {
+          copyFromHost(host);
+        } catch (IOException e) {
+          context.getCounters().findCounter(TaskCounter.SHUFFLE_FETCH_RETRIES).increment(1);
+          throw e;
+        }

After the reporter runs with the patch:

SET hive.exec.print.summary=true;
-- repro the bug

The summary will show SHUFFLE_FETCH_RETRIES = N per task, surfacing data that was previously invisible.

Counters vs logs

| Aspect | Counter | Log |
|---|---|---|
| Aggregation across tasks | Automatic | Manual |
| Production safety | High | High |
| Persistence | Long (ATS / Tez UI) | Short (container log rotation) |
| Detail per event | None (just a count) | Full message |
| Cost | Near zero | Low to moderate |

Use both for big diagnostics: a counter to know "this happened N times" and a log to know "and the first time, here's what it looked like."
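The count-everything, log-the-first pattern is small enough to show in full. The sketch below uses plain JDK types so it runs standalone: an AtomicLong stands in for the TezCounter and a Consumer<String> stands in for the logger. Both substitutions, and the class name CountAndLogFirst, are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical illustration: count every event, log details only for the first. */
final class CountAndLogFirst {
  private final AtomicLong count = new AtomicLong(); // stand-in for a TezCounter
  private final Consumer<String> log;                // stand-in for LOG.info

  CountAndLogFirst(Consumer<String> log) {
    this.log = log;
  }

  void record(String detail) {
    // incrementAndGet is atomic, so exactly one caller sees the value 1
    if (count.incrementAndGet() == 1) {
      log.accept("TEZ-XXXX diag: first occurrence: " + detail);
    }
  }

  long count() {
    return count.get();
  }
}
```

In a real patch the increment would go through the task's counters and the message through the class's existing logger; the shape stays the same, and the log volume stays constant no matter how often the event fires.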


Pattern 3: A Debug TezConfiguration Switch

For more invasive diagnostics — extra log lines that would be too noisy by default, or extra checks that have a measurable cost — gate them behind a config switch.

Define the switch

In TezConfiguration (Tez side) or HiveConf (Hive side):

// TezConfiguration.java
@Private @Unstable
public static final String TEZ_AM_DIAGNOSTICS_VERBOSE = "tez.am.diagnostics.verbose";
public static final boolean TEZ_AM_DIAGNOSTICS_VERBOSE_DEFAULT = false;

Use @Private and @Unstable for a diagnostic key — see Compatibility. It signals "this is not a supported API, may be removed once the bug is fixed."

Gate the diagnostic

private final boolean verboseDiagnostics;

public VertexImpl(...) {
  this.verboseDiagnostics = conf.getBoolean(
      TezConfiguration.TEZ_AM_DIAGNOSTICS_VERBOSE,
      TezConfiguration.TEZ_AM_DIAGNOSTICS_VERBOSE_DEFAULT);
}

public void scheduleTasks(...) {
  // ... existing logic ...
  if (verboseDiagnostics) {
    LOG.info("TEZ-XXXX diag: scheduling {} tasks for vertex {}; first task locations: {}",
        tasksToSchedule.size(), getName(),
        tasksToSchedule.subList(0, Math.min(5, tasksToSchedule.size())));
  }
}

Reporter applies the patch and turns it on:

SET tez.am.diagnostics.verbose=true;
-- repro the bug
SET tez.am.diagnostics.verbose=false;

When the bug is diagnosed, the switch is removed in the proper fix. It is not a supported config — the JIRA tracks both the diagnostic patch (to be reverted) and the real fix.


Assembling the Diagnostic Patch

A complete patch for attachment:

  1. One new INFO log line at the boundary you suspect.
  2. One new counter if there's a count to track.
  3. One debug switch if the diagnostic has cost.
  4. JIRA description with:
    • What the patch adds.
    • How to apply it.
    • How to enable any switch.
    • What output the reporter should capture and attach.
  5. Test that the patch compiles and runs the existing tests — diagnostic patches must not change behavior.

Skeleton JIRA comment

Diagnostic patch attached: TEZ-XXXX.diag.001.patch

Adds:
  - INFO log "TEZ-XXXX diag:" in VertexImpl.scheduleTasks
  - TaskCounter SHUFFLE_FETCH_RETRIES
  - Config switch tez.am.diagnostics.verbose (default false)

To reproduce with the patch:

  1. Apply: git apply TEZ-XXXX.diag.001.patch
  2. Build: mvn install -DskipTests
  3. Run query that reproduces the issue, with:
       SET tez.am.diagnostics.verbose=true;
       SET hive.exec.print.summary=true;
  4. Attach:
       - The AM log (yarn logs -applicationId ...)
       - The full Tez summary output
       - One container log from a failing task

Will use the data to file a proper fix. The diagnostic patch is not for commit;
the fix patch will be separate.

Thanks,
<First>

When the Reporter Can't Apply a Patch

Some reporters can't patch their cluster (locked-down enterprise environment). In that case:

  1. Ask for what data they can capture: AM log, container logs, Tez UI screenshots, counter values.
  2. Tell them which existing INFO-level logs to grep for.
  3. Tell them which existing counters to read off.
  4. If a config switch already exists that increases logging, point at it (e.g. tez.am.log.level=DEBUG, tez.task.log.level=DEBUG, or tez.am.history.logging.enabled=true).

You don't always get a diagnostic patch onto the cluster. The skill is to plan the diagnostic so you get as much as possible from what's already shipped.


After the Diagnostic Runs

The reporter attaches:

  • AM log with TEZ-XXXX diag: lines.
  • Tez summary with counters.
  • Container log fragments.

You analyse:

  1. What did the diagnostic counter show?
  2. What did the diagnostic log line tell you?
  3. Where is the actual root cause?

Once you know the root cause:

  1. File a separate JIRA for the real fix (or repurpose the diagnostic JIRA).
  2. Attach a proper fix patch (no diag, no INFO noise, no @Private config keys).
  3. Note in the JIRA comment that the diagnostic patch is being abandoned in favor of the fix patch.

Worked Example — Slow DAG Submission

Production bug: "Hive query takes 30 seconds in TezTask.submit before the DAG starts running on large DAGs."

Diagnostic patch

(As above) — adds two INFO lines in TezTask.submit capturing time and DAGPlan size.

Reporter runs

HIVE-XXXX diag: about to submitDAG, dagName=Hive_..., vertices=347, planBytes=8421567
HIVE-XXXX diag: submitDAG returned in 28341 ms

Analysis

  • 347 vertices: large DAG, but not absurd.
  • DAGPlan 8.4 MB: very large.
  • 28 seconds for submitDAG: most likely RPC + protobuf parse on the AM side.

Further diagnosis

Add a Tez-side INFO in DAGAppMaster.submitDAGToAppMaster:

TEZ-XXXX diag: received DAGPlan of {} bytes, deserializing
TEZ-XXXX diag: createDAG completed in {} ms
TEZ-XXXX diag: VertexImpl construction completed in {} ms

Re-run on the cluster. Pinpoint where the 28 seconds go.

Likely result: the timings show VertexImpl construction dominating, for example because it is O(N²) in vertex count. File the fix patch with the timing evidence from these log lines.
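The per-phase timing lines above can be produced with a small helper. What follows is a minimal sketch, not Tez code: the class name DiagTimer and its methods are hypothetical, and in a real diagnostic patch the returned string would be passed to the class's existing LOG.info call rather than handled directly.

```java
import java.util.function.LongSupplier;

// Hypothetical helper (not part of Tez): builds JIRA-tagged timing lines.
final class DiagTimer {
  private final String jiraId;          // e.g. "TEZ-XXXX"; every line carries it so it can be grepped
  private final LongSupplier nanoClock; // injected so the helper can be tested without sleeping
  private long startNanos;

  DiagTimer(String jiraId, LongSupplier nanoClock) {
    this.jiraId = jiraId;
    this.nanoClock = nanoClock;
    this.startNanos = nanoClock.getAsLong();
  }

  // Returns the log line for the phase that just finished and restarts the clock.
  String phaseDone(String phase) {
    long now = nanoClock.getAsLong();
    long elapsedMs = (now - startNanos) / 1_000_000L;
    startNanos = now;
    return jiraId + " diag: " + phase + " completed in " + elapsedMs + " ms";
  }
}
```

In production code the clock argument would be System::nanoTime. Restarting the clock after each phase means the lines add up to the total, which makes the slow phase easy to spot.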


What a Diagnostic Patch Is Not

  • Not a place to add unrelated improvements.
  • Not a permanent feature; it gets reverted after the bug is fixed.
  • Not a substitute for a proper reproducer (combine both when you can).
  • Not a place to use @Public APIs — diagnostic config keys are @Private, @Unstable.
  • Not committable to trunk as-is.

Validation Artifacts

After this lab:

  1. A ~/tez-notes/diag-patch-template.md containing the JIRA-comment template above.
  2. One worked diagnostic patch (real or imagined) following the three patterns.
  3. The reflex to tag every diagnostic log line with the JIRA ID.
  4. The discipline to file diagnostic and fix as separate JIRAs (or stages of one JIRA).

This lab closes the Hive-on-Tez Labs section. You now have the full toolkit: trace a SQL query into a DAG, capture the DAG four ways, walk a failure to its root, attribute it to the right project, reproduce it minimally, and instrument it for remote diagnosis. That toolkit is the practising-Tez-committer skill at the Tez/Hive boundary.

Release & PMC Reality

This section takes you inside the committer and PMC view of Apache Tez. It is written for two audiences:

  1. Contributors who want to understand what a committer is reading when they review your patch, why a release vote takes 72 hours, and what a PMC member actually does between commits.
  2. New committers and PMC members on Tez (or any other ASF project) who need the operational playbook nobody hands them.

The chapters are deliberately not aspirational. They are the mechanics — what email to send, what file to sign, what the [VOTE] thread template looks like, where the LICENSE and NOTICE rules are bright lines.

Reading Order

| # | Chapter | Audience |
|---|---------|----------|
| 1 | Mailing Lists | Everyone |
| 2 | JIRA & Code Review | Contributors and committers |
| 3 | Committer Mindset | New committers, contributors who want to think like one |
| 4 | Release Voting | PMC and release managers |
| 5 | PMC Responsibilities | PMC members |
| 6 | Licensing | Everyone touching dependencies; PMC for releases |
| 7 | Code Style & Trust | All contributors |

Chapters 1–3 and 6–7 are useful to contributors. Chapters 4–5 are PMC-facing but worth reading earlier to understand why committers behave the way they do at release time.

How This Section Differs From the Mindset Section

The Contributor Mindset section answered the question "how do I behave so my work gets accepted?" This section answers "what is the work being done by the people who accept it?" — the asymmetric view from the other side.

You don't need to be a committer to read this material. You need to internalise it before you become one, so the offer doesn't catch you off guard.

What This Section Is Not

This section is not a replacement for the official ASF policy documents or for the onboarding a new committer receives.

It is a faithful, project-specific summary of what those documents and that onboarding actually contain, written so that a contributor can build accurate expectations and a new committer can move fast without surprises.

Prerequisites

Before this section is fully useful:

  • You have read the Contributor Mindset section.
  • You have a JIRA account at https://issues.apache.org/jira/.
  • You are subscribed to dev@tez.apache.org.
  • You have a local clone of Tez at ~/tez-src.

If you are a new Tez committer:

  • You have received your ASF ID (<id>@apache.org).
  • You have set up GPG (we'll cover this in Release Voting).
  • You are subscribed to private@tez.apache.org.

Validation for the Section

You have absorbed this section when you can:

  1. Compose a [VOTE] thread email for an RC without consulting a template.
  2. Read a LICENSE change in a patch and predict if it would block a release.
  3. Explain why Tez is RTC (Review Then Commit) and not CTR (Commit Then Review).
  4. Predict, before opening a JIRA, which committer will likely shepherd it.
  5. Identify the category-A / category-B / category-X status of a dependency you want to add.
  6. Run mvn apache-rat:check and read its output.

The next chapter — Mailing Lists — covers the operational mechanics of the ASF list system that this entire section relies on.

Mailing Lists

Mailing lists are the spine of Apache governance. Every decision that affects the project — design, release, new committer, security disclosure — happens on a mailing list, in an archived thread, with a documented vote when required. This chapter is the operational manual for the Tez lists.

The Tez Lists

| List | Purpose | Subscribe | Notes |
|------|---------|-----------|-------|
| dev@tez.apache.org | Development discussion, design, votes | dev-subscribe@tez.apache.org | Primary list. Read first, post sparingly. |
| user@tez.apache.org | Usage questions | user-subscribe@tez.apache.org | Lower-traffic. Answer here if you can. |
| commits@tez.apache.org | Git commit notifications | commits-subscribe@tez.apache.org | Bot-driven. Subscribe to follow trunk live. |
| issues@tez.apache.org | JIRA event notifications | issues-subscribe@tez.apache.org | Bot-driven. Verbose; use a filter rule. |
| private@tez.apache.org | PMC-only | (Auto on PMC) | New-committer votes, security reports. |

Archive: https://lists.apache.org/list.html?dev@tez.apache.org and equivalent for each list. Anything posted is public forever (except private@, which is archived but not public).

Subscribing

# From the address you want subscribed:
echo "" | mail -s "" dev-subscribe@tez.apache.org
# You will get a confirmation request. Reply to it.

For multiple lists, repeat. To unsubscribe, replace subscribe with unsubscribe.

Filtering

issues@tez.apache.org posts dozens of mails per day. Set a Gmail / Outlook / Thunderbird rule to file it into a folder. Same for commits@tez.apache.org if you subscribe.

For dev@, file by subject prefix:

| Prefix | Folder |
|--------|--------|
| [VOTE] | dev-vote (read same-day) |
| [ANNOUNCE] | dev-announce (read same-day) |
| [NOTICE] | dev-notice (read same-day) |
| [DISCUSS] | dev-discuss (read within the week) |
| [PROPOSAL] | dev-proposal (read within the week) |
| (anything else) | dev-misc |
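The routing in the table can be written down as a rule set before translating it into a mail client's filter syntax. This is a minimal sketch under assumptions: the class DevListFilter is hypothetical, and it matches the prefix anywhere in the subject so that replies ("Re: [VOTE] ...") land in the same folder as the original.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rule set mirroring the prefix table; folder names are taken from the table.
final class DevListFilter {
  private static final Map<String, String> FOLDERS = new LinkedHashMap<>();
  static {
    FOLDERS.put("[VOTE]", "dev-vote");
    FOLDERS.put("[ANNOUNCE]", "dev-announce");
    FOLDERS.put("[NOTICE]", "dev-notice");
    FOLDERS.put("[DISCUSS]", "dev-discuss");
    FOLDERS.put("[PROPOSAL]", "dev-proposal");
  }

  // Checks rules in table order; the first prefix found in the subject wins.
  static String folderFor(String subject) {
    for (Map.Entry<String, String> rule : FOLDERS.entrySet()) {
      if (subject.contains(rule.getKey())) {
        return rule.getValue();
      }
    }
    return "dev-misc"; // anything without a recognised prefix
  }
}
```

Because [VOTE] is checked first, a "[VOTE][RESULT]" subject files into dev-vote alongside the thread it closes.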

ASF Mailing-List Mores

Lists predate the web at Apache. The conventions are old and load-bearing.

Plain text only

HTML mail is dropped by some clients, breaks quoting, and bloats archives. Apache lists are plain-text. Configure your mail client:

  • Gmail web: in the compose window, More options (⋮) → Plain text mode
  • Mutt / mu4e / aerc: already plain
  • Outlook: File → Options → Mail → Plain Text

Inline reply, not top-post

The Apache convention is to reply under the relevant quoted text, quoting only the part you're answering. Trim aggressively.

On Tue, May 7, 2024, Foo Bar wrote:
> Should we bump the default for tez.am.resource.memory.mb?

Yes, but conditionally. See the sizing sketch on TEZ-4321.

> And what about the AM heap?

Same patch; -Xmx is computed from -resource.memory.mb in the launch
command. We don't need a separate knob.

-- 
Jane

Top-posting (your full reply at the top, the original quoted in full below) makes archive threads unreadable. People will gently note this once; do not require a second note.

No attachments

Patches go on JIRA. Logs and stack traces go in a gist or a pastebin and are linked. Long output goes as an attachment to the JIRA, not the email.

A 2 MB attachment forces hundreds of subscribers to download it. A link forces only the interested.

Sign off

A short sign-off — first name, or first + last — is conventional. No corporate signature block, no legal disclaimer, no "Sent from my iPhone."

If you must have a signature, put it under the standard signature separator, a line containing only "-- " (dash, dash, space), so mail clients can suppress it.

Subject hygiene

Subject prefixes are filterable. Use them.

| Prefix | When |
|--------|------|
| [DISCUSS] | Open question, no decision sought yet |
| [PROPOSAL] | Concrete proposal, comment wanted |
| [VOTE] | Vote in progress; body has voting rules |
| [VOTE][RESULT] | Closing a vote; tallies the result |
| [ANNOUNCE] | One-way announcement (release, new committer) |
| [NOTICE] | Infrastructure / policy change |

Don't prefix replies. The Re: is enough; subscribers' filters key off the embedded [VOTE] already.

Reply-To etiquette

ASF lists are configured to set Reply-To: list. So your reply goes to the list by default. Don't break it by manually rewriting the To:.

If you want to reply privately to the sender (rare — use only for personal/off-topic), explicitly remove the list and address them.

[VOTE] Mechanics

ASF votes are the formal decision mechanism. They use a fixed +1 / 0 / -1 syntax.

Voting tokens

| Token | Meaning |
|-------|---------|
| +1 | I approve. |
| +0 | I'm slightly positive but won't block. |
| 0 | I have no opinion. |
| -0 | I'm slightly negative but won't block. |
| -1 | I disapprove. |

On a code change, a -1 is a veto and a heavy tool. It must be accompanied by a technical justification; a -1 without justification is invalid. Once a valid -1 is cast on a code change, the issue must be resolved (typically by revision) before the change proceeds. On a release vote a -1 is not a veto; it counts against the +1 tally.

Binding vs non-binding votes

| Vote topic | Who is binding |
|------------|----------------|
| Code change | Committers and PMC |
| Release artifact | PMC only |
| New committer | PMC only |
| New PMC member | PMC only |
| Project mechanics (board reports, etc.) | PMC only |

Non-binding votes are welcomed and counted, but only the binding count determines the outcome.

Required minimums

For releases, the ASF rule:

  • 72-hour minimum vote duration.
  • At least 3 binding +1 votes.
  • More +1 than -1 votes.

If those conditions aren't met by close, the vote is extended or fails. See Release Voting for the full mechanics.
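The three release conditions, together with the binding/non-binding distinction, reduce to a small predicate. The sketch below is illustrative only: ReleaseVote and Ballot are hypothetical names, and the thresholds are the ones listed above.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical tally helper for a release vote.
final class ReleaseVote {
  static final class Ballot {
    final int value;        // +1, 0, or -1
    final boolean binding;  // true only for PMC members on a release vote
    Ballot(int value, boolean binding) {
      this.value = value;
      this.binding = binding;
    }
  }

  // Passes when the vote ran at least 72 hours, has at least 3 binding +1,
  // and has more binding +1 than binding -1.
  static boolean passes(List<Ballot> ballots, Duration open) {
    int plus = 0;
    int minus = 0;
    for (Ballot b : ballots) {
      if (!b.binding) {
        continue; // non-binding votes are recorded in the result mail but do not decide
      }
      if (b.value > 0) {
        plus++;
      } else if (b.value < 0) {
        minus++;
      }
    }
    return open.compareTo(Duration.ofHours(72)) >= 0 && plus >= 3 && plus > minus;
  }
}
```

A vote with two binding +1 and any number of non-binding +1 does not pass under this predicate, which is the common reason a release vote gets extended.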

For code changes in Tez (RTC project — see JIRA & Code Review):

  • Typically 1 binding +1 (a committer) is sufficient to commit, after review.
  • A -1 from any committer or PMC member blocks the commit pending resolution.

For new committers / PMC:

  • Run on private@.
  • Typically a few-day vote window.
  • Pass: more +1 than -1; common practice is at least 3 +1.

Lazy consensus

Many decisions don't require a vote. The mechanism is lazy consensus:

"I'm planning to do X. Speak up if you disagree; otherwise I'll proceed in 72 hours."

Used for things like cutting a branch, scheduling a release-vote window, or applying a trivial fix. The poster picks a reasonable window (24–72 hours). Silence = consent.

Lazy consensus is not for irreversible decisions (release, license change, PMC membership). Those require an explicit vote.

Composing a [VOTE] Email

Template — release vote (the full version is in Release Voting):

Subject: [VOTE] Apache Tez 0.10.4 RC1

Hi all,

I'd like to call a vote on releasing Apache Tez 0.10.4 RC1.

Source release:  https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-RC1/
Git tag:         release-0.10.4-rc1
Commit hash:     <full sha>
Staging Nexus:   https://repository.apache.org/content/repositories/orgapachetez-NNNN/

KEYS file:       https://downloads.apache.org/tez/KEYS
Signed with key: <your key id and fingerprint>

The vote will be open for 72 hours.

[ ] +1 Release this package
[ ]  0 No opinion
[ ] -1 Do not release because ...

My +1.

Thanks,
<First>

Template — new committer (run on private@):

Subject: [VOTE] New Tez committer: <First Last>

Hi PMC,

I'd like to propose <First Last> as a new committer on Apache Tez.

<First Last> has been contributing since <month year> and has had
<N> patches committed, spanning <areas>. Highlights:

  - TEZ-NNNN: <one line>
  - TEZ-NNNN: <one line>
  - Active reviewer on TEZ-XXX, TEZ-YYY.

They've shown <judgement / quality / breadth>.

Vote open for 72 hours.

[ ] +1
[ ]  0
[ ] -1

My +1.

Thanks,
<First>

Template — closing a vote:

Subject: [VOTE][RESULT] Apache Tez 0.10.4 RC1

Hi all,

The vote on Apache Tez 0.10.4 RC1 has passed with the following tally:

Binding +1: <list of names>
Non-binding +1: <list of names>
0: <names if any>
-1: <names if any, with reasons>

I'll proceed with the release steps.

Thanks to everyone who voted.

<First>

Lazy Consensus Examples

Good lazy-consensus posts:

  • "I'm cutting branch branch-0.10.5 from current master tomorrow at 12:00 UTC unless there's objection."
  • "Planning to apply TEZ-4321 (one-line log fix, trivial) by end of week unless someone flags it. Patch is .001 on JIRA."
  • "Will cancel the 0.10.5 RC1 vote and roll RC2 tomorrow due to the LICENSE finding."

Bad lazy-consensus posts:

  • "Going to release 0.10.5 next week." (Requires a [VOTE].)
  • "Going to add NAME as committer." (Requires a [VOTE] on private@.)
  • "Going to remove the deprecated key X." (User-visible behavior; requires [DISCUSS] → consensus.)

When You're New on the List

The first month of reading a list:

  • Read every [VOTE] thread.
  • Read every [DISCUSS] thread.
  • Skim [jira] [Created] mails.
  • Post nothing initially.

After the first month:

  • Reply to a user@ question you can answer.
  • Post a self-introduction (see Community Interaction).
  • Comment on a [DISCUSS] thread once you have substance.

Validation Artifacts

After this chapter:

  1. Subscriptions confirmed to dev@, user@, and (if your mail client tolerates the volume) issues@.
  2. Mail-client filters configured for the subject prefixes table.
  3. A ~/tez-notes/vote-templates.md containing the three templates above.
  4. The reflex to inline-reply, not top-post.
  5. One archived [VOTE] thread URL bookmarked for reference.

The next chapter — JIRA & Code Review — is the operational view of what code review looks like from the committer side of the table.

JIRA & Code Review — Inside a Tez Review

This chapter is the committer view of code review. Read it as a contributor, and your patches will become reviewable. Read it as a new committer, and you'll have a workflow.

Tez is RTC

Apache projects choose between two commit philosophies:

| Model | Meaning | Used by |
|-------|---------|---------|
| RTC (Review Then Commit) | Patch must be reviewed and +1'd before commit | Tez, Hive, Hadoop (for most code) |
| CTR (Commit Then Review) | Committer may commit and discuss after | Some smaller projects, certain Hadoop subsystems |

Tez is RTC. The implication: every commit went through at least one review round. A patch sitting at "Patch Available" with no review is blocked on reviewer attention, not on the contributor's velocity; the committer pool is finite.

The RTC exception: trivial fixes (typos, log message edits, javadoc improvements) may be committed by a committer without an explicit +1, but the commit message references the JIRA and the patch is still attached for the record.

How a Committer Reads a Patch

When a committer opens your patch (in JIRA, GitHub PR, or git apply locally), the sequence is roughly:

1. Read the JIRA description.              30s
2. git apply --check on a fresh clone.     30s
3. Look at git diff --stat.                30s
4. Read the test changes.                  2-5 min
5. Read the implementation changes.        5-15 min
6. Run mvn checkstyle:check.               30s
7. Run mvn test in changed modules.        2-15 min
8. Optionally: run an integration test.    5-30 min
9. Comment.                                Variable

The first three steps determine whether the patch gets the full read or a bounce. If the JIRA is unclear or the diff doesn't apply or includes unrelated changes, the patch goes back without step 5.

The Skim Phase

A committer skimming git diff --stat is looking for:

  • File count and module spread. A patch touching one module is easy; one touching five is suspicious.
  • Tests in the diff. No tests in a behavior-changing patch is a red flag.
  • Generated files in the diff. target/, *.iml, .idea/ — never committed.
  • Whitespace-only churn. git diff -w should not be vastly smaller than git diff.

If any of these are off, expect a comment before the implementation is read.
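A contributor can run the same skim on their own diff before attaching it. The following is a rough sketch, assuming the changed paths have already been extracted (for example from git diff --stat) and that tests live under a Maven-style src/test/ directory; the class DiffSkim is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-review check over the list of changed paths in a patch.
final class DiffSkim {
  static List<String> redFlags(List<String> changedPaths) {
    List<String> flags = new ArrayList<>();
    boolean hasTest = false;
    for (String p : changedPaths) {
      boolean generated = p.startsWith("target/") || p.contains("/target/")
          || p.startsWith(".idea/") || p.contains("/.idea/")
          || p.endsWith(".iml");
      if (generated) {
        flags.add("generated file: " + p);
      }
      if (p.contains("/src/test/")) {
        hasTest = true; // assumes the Maven source layout
      }
    }
    if (!hasTest) {
      flags.add("no test changes");
    }
    return flags;
  }
}
```

An empty result does not mean the patch is good; it only means the patch will not be bounced for the mechanical reasons listed above.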

The Test Phase

Committers read tests before implementation because the test reveals intent. A good test named testRecoverNoInputs tells the reviewer:

  • The bug is in recovery.
  • The trigger is "no inputs."
  • The fix should not break recovery in any other case.

If the test is missing, weak (no assertions, or assertions that would pass without the fix), or named generically (testMethod1), the reviewer assumes the implementation is also weak.

The Implementation Phase

By the time the reviewer reads the code, they have a mental model from the JIRA, the test name, and the diff stat. The implementation read is checking:

  • Does the code match the intent of the JIRA and test?
  • Is the change minimal — does it touch what it must, and only what it must?
  • Are exceptions handled appropriately for the file's conventions?
  • Is logging at the right level (DEBUG for hot paths, INFO for state transitions, WARN for recoverable, ERROR for unrecoverable)?
  • Are there obvious thread-safety issues (state visible across threads, shared mutable collections)?
  • Are there back-compat concerns? (See Compatibility)

Comment Phrasing

Committer comments follow soft conventions that contributors should recognise — they encode meaning beyond the literal text.

| Comment | Means |
|---------|-------|
| "Nit: ..." | Stylistic preference; you may take it or push back without controversy. |
| "Suggestion: ..." | Reviewer thinks there's a better way but isn't blocking. |
| "Concern: ..." | Reviewer wants this addressed before commit. |
| "I don't think this is right." | Block; must be resolved. |
| "Have you considered X?" | Genuine question; respond with your reasoning. |
| "Let's discuss on dev@." | Issue is bigger than the patch; design discussion needed. |
| "+1 LGTM" | Approval (informal). |
| "+1 pending checkstyle" | Conditional approval. |
| "-1, see ..." | Veto; must be resolved before commit. |

For the reciprocal etiquette on responses, see Responding to Feedback: acknowledge every comment explicitly, fix what's fixable, and push back with evidence on what's not.

Patch Available → Reviewed Lifecycle

The JIRA state transitions for a typical patch:

Open
 |  (contributor starts)
 v
In Progress
 |  (contributor attaches .001)
 v
Patch Available  ← reviewer reads here
 |  (review comments)
 v
In Progress  ← contributor revises
 |  (attaches .002)
 v
Patch Available
 |  (LGTM)
 v
Resolved (committer commits to trunk)
 |  (release ships)
 v
Closed

The patch attachments accumulate: .001, .002, .003. They are never deleted. Future readers can reconstruct the review by walking through them.

GitHub-PR-based reviews follow the same lifecycle, but the iteration happens in the PR's commit history rather than separate .NNN.patch files. The JIRA still moves through the states above.
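The lifecycle diagram can be read as a small state machine. The sketch below encodes only the transitions drawn above; the real JIRA workflow permits others (reopening, for instance) that are left out here, and the class name JiraLifecycle is hypothetical.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the transitions shown in the lifecycle diagram.
final class JiraLifecycle {
  enum State { OPEN, IN_PROGRESS, PATCH_AVAILABLE, RESOLVED, CLOSED }

  private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
  static {
    NEXT.put(State.OPEN, EnumSet.of(State.IN_PROGRESS));
    NEXT.put(State.IN_PROGRESS, EnumSet.of(State.PATCH_AVAILABLE));
    // Review either sends the patch back for revision or ends in a commit.
    NEXT.put(State.PATCH_AVAILABLE, EnumSet.of(State.IN_PROGRESS, State.RESOLVED));
    NEXT.put(State.RESOLVED, EnumSet.of(State.CLOSED));
    NEXT.put(State.CLOSED, EnumSet.noneOf(State.class));
  }

  static boolean canMove(State from, State to) {
    return NEXT.get(from).contains(to);
  }
}
```

The only loop is between In Progress and Patch Available, which is where the .001, .002, .003 attachments accumulate.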

Backport Patches

A patch may need to land on multiple branches (e.g. master and branch-0.10). The contributor attaches both:

TEZ-4321.001.patch                 (for master)
TEZ-4321.branch-0.10.001.patch     (for the maintenance branch)

The committer reviews and commits each. The JIRA comment notes the commits:

Committed to master: <sha>
Committed to branch-0.10: <sha>
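The attachment names carry three facts: the JIRA, the target branch, and the revision. A minimal sketch of a parser for the two shapes shown above follows; the class PatchName is hypothetical, and treating a missing branch segment as master is an assumption drawn from the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for names like TEZ-4321.001.patch and TEZ-4321.branch-0.10.001.patch.
final class PatchName {
  private static final Pattern NAME =
      Pattern.compile("^(TEZ-\\d+)\\.(?:(branch-[0-9.]+)\\.)?(\\d{3})\\.patch$");

  final String jira;
  final String branch;
  final int revision;

  private PatchName(String jira, String branch, int revision) {
    this.jira = jira;
    this.branch = branch;
    this.revision = revision;
  }

  static PatchName parse(String fileName) {
    Matcher m = NAME.matcher(fileName);
    if (!m.matches()) {
      throw new IllegalArgumentException("not a recognised patch name: " + fileName);
    }
    // No branch segment is taken to mean the patch targets master.
    String branch = m.group(2) == null ? "master" : m.group(2);
    return new PatchName(m.group(1), branch, Integer.parseInt(m.group(3)));
  }
}
```

Other naming variants, such as the .diag. form used for diagnostic patches, are deliberately rejected by this pattern rather than guessed at.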

The Committer's Pre-Commit Checklist

A committer about to commit runs:

cd ~/tez-src
git fetch origin
git checkout master
git merge --ff-only origin/master
git apply --check /tmp/TEZ-4321.003.patch
git apply /tmp/TEZ-4321.003.patch

mvn install -DskipTests
mvn checkstyle:check
mvn test -pl tez-dag,tez-api      # changed modules
mvn test -pl tez-tests -Dtest=TestOrderedWordCount

git add -A
git commit -s -m "TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gunther)"
git push origin master

Notes on the commit step:

  • -s adds a Signed-off-by: trailer. Tez doesn't require it (the ASF relies on contributor license agreements rather than a DCO), so treat it as optional.
  • The (Jane Doe via gunther) suffix is added by the committer, not the contributor.
  • The push goes to apache/tez (committer karma required).

After push:

1. Update JIRA: status → Resolved, set Fix Version (e.g. "0.10.5").
2. Comment on JIRA with the commit SHA.
3. Thank the contributor.

Holding the "No" Muscle

A subtle and underappreciated committer skill is declining patches that shouldn't go in. A patch can be technically correct and still not belong in trunk — too narrow a use case, too much added complexity, the wrong layer.

Wording for a respectful decline:

Thanks for the patch. After reading, I'm not comfortable taking this in trunk because REASON. I appreciate the work, and I'd encourage ALTERNATIVE-PATH. Closing the JIRA as Won't Fix; if there's broader consensus on dev@ for a different approach, happy to reopen.

The "no" muscle is not natural. Committers learn it because the alternative — accepting every patch — accumulates technical debt that the committer pool will pay forever. See Committer Mindset.

When to Refactor Unsolicited Code in a Patch

A contributor's patch sometimes lands in a corner of the code the committer would like to clean up. The temptation is to do the cleanup at commit time. Don't.

The rules:

  • Never modify the contributor's diff at commit. The patch attached to JIRA must match what was reviewed.
  • File a follow-up JIRA for the cleanup. Reference the contributor in CC.
  • If the patch creates a refactoring opportunity, take it later. Not in this commit.

The exception: trivial cleanups the contributor agreed to in review may be applied at commit. The JIRA comment notes them. Example:

Committed with a small change: extracted the new logic into a private
helper method as discussed in review. Attaching the committed patch
as .004 for the record.

What Goes On the GitHub PR vs. JIRA

Tez accepts patches as JIRA attachments and as GitHub PRs (linked from the JIRA). The mapping:

| Lives on | What |
|----------|------|
| JIRA | Description, design discussion, root cause, attachments (.NNN.patch), final commit reference |
| GitHub PR (if used) | Line-by-line comments, CI run results, iterative push history |

A PR without a linked JIRA is incomplete; the JIRA is the system of record. A JIRA without a PR is fine — many Tez patches are still JIRA-attachment-only.

If you open a PR, link it on the JIRA in the first comment and set the JIRA to "Patch Available."

Worked Example — A Full Review Cycle

JIRA: TEZ-4321. "Fix NPE in VertexImpl.recover when no inputs."

Day 0   You: file JIRA with description, repro, root cause.
        Set yourself as assignee, status In Progress.
        Attach .001 patch; status → Patch Available.

Day 3   Committer @gopalv: applies patch locally, runs tests, reviews.
        Comments on JIRA:
          - L88: prefer Collections.emptyList().
          - L92: add a test for the no-inputs case.
          - L94: should we handle no-outputs symmetrically? Concern: see
            VertexImpl.recover at L142, looks like the same shape.

Day 4   You: reply on JIRA.
          - L88: agreed.
          - L92: agreed; adding testRecoverNoInputs.
          - L94: I see the parallel but think it's a separate JIRA.
            Filing TEZ-4329 to track.
        Attach .002.

Day 7   @gopalv: re-reviews. "+1 LGTM."

Day 8   @gopalv: commits.
        "TEZ-4321: Fix NPE in VertexImpl.recover when no inputs. (Jane Doe via gopalv)"
        Sets JIRA Resolved, Fix Version 0.10.5.

Day 8   You: comment "Thanks @gopalv. Working on TEZ-4329 next."

That is a healthy review — 2 patch rounds, 1 follow-up JIRA filed, no friction.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/reviewer-vocab.md cheatsheet from the comment-phrasing table.
  2. The four checklist steps committers run pre-commit, saved for when you are one.
  3. The discipline to never modify a contributor's diff at commit (with an exception only for explicit reviewer-author agreement).
  4. The reflex to comment "Thanks @COMMITTER" after a merge of your patch.

The next chapter — Committer Mindset — takes the perspective further: the judgement model committers use across many patches and many years.

Committer Mindset

Becoming a committer is a one-day event. Thinking like one is a multi-year practice. This chapter sketches the practice: the asymmetries, the recurring trade-offs, and the mental model that distinguishes "writes good patches" from "stewards the codebase."

The Long-Lived Code Tax

A contributor writes a patch and leaves. A committer commits a patch and inherits it forever. Every line a committer approves is theirs to debug at 11pm three years later when it breaks in production.

Practical consequence: the committer's "yes" is a much heavier word than the contributor's "this would be nice." Committers reflexively ask:

| Question | Why |
|----------|-----|
| Who will maintain this in 2 years? | Code without a maintainer becomes everyone's problem |
| Is the complexity proportional to the value? | Complex code is paid for in every future bug |
| Does this make tez-dag harder to onboard into? | Onboarding cost is real |
| What's the failure mode at 10x scale? | Tez runs in production clusters at scale |
| Does this lock us into a design we'll regret? | API and proto changes are forever |

These are not abstractions. Every committer has at least one patch they regret approving. That memory is the source of the "no" muscle.

Reasoning About Compatibility

The compatibility surface is exhaustively documented in Compatibility. The mindset around it:

  • Default to backwards-compat. A change that breaks no one is always preferable to one that breaks anyone, even if uglier.
  • A deprecation is a promise. If you deprecate a method "to be removed in 0.12," it had better be removable in 0.12 — which means no production user can still be on it by then, which means the deprecation window has to be long enough to drain.
  • Wire compat is not negotiable. A DAGPlan change that breaks recovery from an old AM means a cluster can't roll-restart safely. That's a P0 production issue.
  • Configuration compatibility is silent until it isn't. Renaming a key without a deprecation alias breaks every cluster that has the old key in tez-site.xml. Reviewers will catch this if they're paying attention; committers must always pay attention.
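The deprecation-alias idea in the last bullet can be illustrated with a stand-in configuration object. This is a sketch only: AliasedConf is hypothetical, the key names in the usage note are invented for illustration, and although Hadoop's Configuration class has its own deprecation support, this code does not use it.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in configuration object showing how an alias keeps an old key working after a rename.
final class AliasedConf {
  private final Map<String, String> values = new HashMap<>();
  private final Map<String, String> oldToNew = new HashMap<>();

  // Registers oldKey as a deprecated alias of newKey.
  void addDeprecation(String oldKey, String newKey) {
    oldToNew.put(oldKey, newKey);
  }

  // Writes through the alias, so a site file that still uses the old key is honoured.
  void set(String key, String value) {
    values.put(oldToNew.getOrDefault(key, key), value);
  }

  String get(String key, String defaultValue) {
    return values.getOrDefault(oldToNew.getOrDefault(key, key), defaultValue);
  }
}
```

With an alias from a made-up "tez.example.old.key" to "tez.example.new.key", a cluster whose tez-site.xml sets only the old key still sees its value when the code reads the new key. Without the alias the read silently falls back to the default, which is the failure the bullet describes.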

The mental model: imagine you are the SRE on call at 1 AM at a Fortune 500 company that runs Tez via Hive. What does this patch do to your night?

Reasoning About Performance

Tez runs in the hot path of Hive on terabyte-scale workloads. A 5 ms regression in a per-task code path is real money. The mindset:

  • Measure, don't guess. A patch claiming performance benefit needs numbers, not intuition. A patch claiming no performance impact in a hot path still needs a check.
  • Hot vs. cold paths. Optimisations matter in tez-runtime-library and the per-task paths of tez-runtime-internals. They matter much less in tez-dag AM startup code that runs once per DAG.
  • GC is performance. A patch that allocates an extra object per task adds GC pressure at scale. Reuse buffers; use primitives; bound queues.
  • Logging is performance. LOG.debug("..." + obj) allocates the string even when DEBUG is off. Use LOG.debug("... {}", obj) instead.
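The cost difference in the logging bullet can be made visible by counting toString() calls. The sketch below uses a hand-written stand-in logger with DEBUG switched off instead of SLF4J, so it runs without dependencies; LazyLogDemo and its members are hypothetical names.

```java
// Stand-in logger with DEBUG disabled; mimics the shape of SLF4J's debug(String) and debug(String, Object).
final class LazyLogDemo {
  static int toStringCalls = 0;
  static boolean debugEnabled = false;

  static final class Payload {
    @Override
    public String toString() {
      toStringCalls++; // counts how often the message argument is rendered
      return "payload";
    }
  }

  static void debug(String msg) {
    if (debugEnabled) {
      System.out.println(msg);
    }
  }

  static void debug(String format, Object arg) {
    if (debugEnabled) {
      System.out.println(format.replace("{}", String.valueOf(arg)));
    }
  }

  // Concatenation builds the string, and calls toString(), before the level check runs.
  static int eager() {
    toStringCalls = 0;
    debug("state: " + new Payload());
    return toStringCalls;
  }

  // The placeholder form hands over the object; nothing is rendered while DEBUG is off.
  static int parameterized() {
    toStringCalls = 0;
    debug("state: {}", new Payload());
    return toStringCalls;
  }
}
```

Here eager() reports one toString() call and parameterized() reports none. The Payload object is still allocated in both cases, so in a per-record path the argument itself should also be something that already exists.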

The committer reading a patch in a hot path keeps these questions ready:

  • Does this allocate per-record? Per-batch? Per-DAG?
  • Is the allocation reusable / poolable?
  • Is the log statement guarded or formatted?
  • Has the contributor said how this performs at scale?

Reasoning About Complexity

Complexity has a half-life of bugs. The reviewing committer's complexity check:

| Complexity addition | What it costs |
|---------------------|---------------|
| A new abstract base class | A new mental model for readers |
| A new configuration key | Documentation, default-tuning, deprecation later |
| A new state in a state machine | Combinatorial new transitions to test |
| A new event type | New event dispatcher cases, new history entries |
| A new public method | Compatibility commitment |
| A new dependency | Licensing review, attack surface, build complexity |

A patch that adds, say, a new configuration key for a corner-case behavior is not trivially "yes" even if the code is correct. The cost of the key — documentation, tuning, eventual deprecation — must justify the value.

The reflexive committer question: "Could this be a default, with no key?" If the answer is yes, skip the key.

Reasoning About Risk

Different code paths carry different risk profiles:

| Path | Risk |
|------|------|
| tez-tools/ | Low. Process tooling; broken doesn't affect runtime. |
| tez-mapreduce/ | Medium. Affects MR-on-Tez users; relatively well-tested. |
| tez-runtime-library/ | High. In the per-task hot path. |
| tez-runtime-internals/ | High. Task runtime; affects every DAG. |
| tez-dag/ AM scheduling | High. AM bugs lose work. |
| tez-dag/ DAG planning | Very high. Errors are bad DAGs. |
| tez-api/ | Very high. Public API; breaking it breaks downstream projects. |
| tez-api/src/main/proto/ | Critical. Wire format; cluster-rolling-restart implications. |

Committers calibrate review depth to risk. A 50-line patch in tez-tools/ may get a quick read and +1. A 50-line patch in tez-api/src/main/proto/ gets word-by-word scrutiny, a [DISCUSS] thread, and possibly a -1 if the protobuf change is anything other than additive.
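The risk table lends itself to a lookup a contributor can run over their own diff to anticipate review depth. This is a sketch under assumptions: ReviewRisk is hypothetical, tez-dag/ is rated at its higher value because a path alone cannot distinguish scheduling from planning code, and paths not in the table default to MEDIUM.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup from changed path to the risk levels in the table.
final class ReviewRisk {
  enum Risk { LOW, MEDIUM, HIGH, VERY_HIGH, CRITICAL }

  // Most specific prefix first, so the proto directory wins over tez-api/.
  private static final Map<String, Risk> BY_PREFIX = new LinkedHashMap<>();
  static {
    BY_PREFIX.put("tez-api/src/main/proto/", Risk.CRITICAL);
    BY_PREFIX.put("tez-api/", Risk.VERY_HIGH);
    BY_PREFIX.put("tez-dag/", Risk.VERY_HIGH);          // assumption: take the higher of the two tez-dag rows
    BY_PREFIX.put("tez-runtime-internals/", Risk.HIGH);
    BY_PREFIX.put("tez-runtime-library/", Risk.HIGH);
    BY_PREFIX.put("tez-mapreduce/", Risk.MEDIUM);
    BY_PREFIX.put("tez-tools/", Risk.LOW);
  }

  static Risk of(String path) {
    for (Map.Entry<String, Risk> e : BY_PREFIX.entrySet()) {
      if (path.startsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return Risk.MEDIUM; // assumption: unknown modules get a middle rating
  }

  // A patch is as risky as the riskiest file it touches.
  static Risk ofPatch(List<String> paths) {
    Risk max = Risk.LOW;
    for (String p : paths) {
      Risk r = of(p);
      if (r.compareTo(max) > 0) {
        max = r;
      }
    }
    return max;
  }
}
```

Taking the maximum over the patch reflects how review works in practice: one touched .proto file sets the scrutiny level for the whole change.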

The "No" Muscle — When and How

The hardest committer skill is saying no. Not no-by-silence (the default and worst form), but explicit, kind, decisive no. Patterns for when to use it:

| Situation | Pattern of "no" |
|-----------|-----------------|
| Patch fixes a real but rare bug at the cost of significant new complexity | "Let's not fix this in code; document the workaround and close as Won't Fix." |
| Patch adds a feature with one user (the contributor) | "Could you maintain this as an out-of-tree plugin? VertexManagerPlugin exists for this." |
| Patch is technically correct but encodes a design that conflicts with planned direction | "We're going a different way on dev@ thread XYZ; let's wait." |
| Patch is correct but vastly over-scoped | "Could you split into 3 JIRAs? Happy to commit them one at a time." |
| Patch is correct but in a part of the codebase being rewritten | "Let's wait for TEZ-NNNN to land first; this conflicts." |

The crucial thing about saying no: do it early, explicitly, and once. Don't ghost the patch. The contributor's time is worth your one paragraph of explanation.

When to Refactor Unsolicited

A patch lands in a part of the codebase the committer has been wanting to refactor. The temptation is to do the refactor in or alongside the commit. Don't, except in narrow cases.

The rules:

  • Refactor neither in the contributor's patch nor in the same commit. Their patch must match what was reviewed.
  • File a follow-up JIRA for the refactor. Reference the contributor in CC; they often have context.
  • Do the refactor in a separate review cycle. Either you do it (review by someone else) or someone else does it (review by you).
  • Exception: If the contributor's patch sits in code that is literally being moved or removed by an imminent committed patch, coordinate. Either delay the contributor's patch or rebase the imminent one.

Mentoring Pattern

A committer's leverage is not just commits — it's mentoring. The well-trodden Apache mentoring pattern:

  1. Notice a thoughtful new contributor. Their first patch was clean; they responded well to feedback; they asked good questions on dev@.
  2. Suggest a JIRA in your area. Comment on a JIRA: "This would be a good fit for NAME based on their recent work on TEZ-XXXX."
  3. Shepherd it. Review their patch yourself, fast. Set expectations on iteration count.
  4. Make them visible. Refer to their work on dev@. Cite them in commits as you would any contributor.
  5. Eventually propose them. When they hit the rough bar from Meritocracy, propose them on private@.

A committer who has mentored two or three contributors into committership has done more for the project than one who has committed thousands of patches.

Time Allocation

Newly-minted committers underestimate how time-consuming the role is. A rough budget for sustained committership:

| Activity | Weekly time |
|---|---|
| Reviewing patches | 2–4 hours |
| Filing or shepherding your own patches | 2–4 hours |
| dev@ discussion participation | 1–2 hours |
| JIRA triage (closing dups, asking for repros) | 0.5–1 hour |
| Mentoring | 0.5–1 hour |
| Release work (during release windows) | 4–8 hours |

A committer who spends 0.5 hours/week on the project will be reactive at best and become inactive within a year. A committer who spends 4+ hours/week stewards the codebase.

Avoiding Burnout

The committer pool at any Apache project is finite. Burnout is a real failure mode:

| Burnout signal | Self-rescue |
|---|---|
| Reviewing patches feels like a chore | Take a 2-week formal break; tell dev@ |
| You're saying yes to patches you don't believe in | Practice saying no |
| You're the only reviewer for an area | Mentor someone into co-reviewing |
| You're sleeping less because of a release window | Ask the PMC to split the RM duties |
| You haven't filed a JIRA you cared about in months | Stop reviewing for a week; write |

Committership is voluntary. Stepping back is honourable. Emeritus committer status exists at Apache for those who want a graceful exit; you can come back later.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/committer-questions.md of the five recurring questions a committer asks of every patch.
  2. The discipline to score each Tez file path you touch by risk tier.
  3. The vocabulary to say no, in writing, with no rancour.
  4. The plan to do mentoring at some point in your committer life.

The next chapter — Release Voting — is the operational manual for the most visible PMC-level work: cutting a release.

Release Voting

Cutting an Apache Tez release is a procedural, legal, and cryptographic operation. It is the most formal thing the PMC does. This chapter is the operational manual: the steps, the artifacts, the vote thread, and the failure modes.

The authoritative reference is the ASF Release Distribution Policy. This chapter is the Tez-specific overlay on top of it.

What "Release" Means at Apache

An Apache release has a precise legal meaning. Only source artifacts are official Apache releases. Binary artifacts (jars in Maven Central, Docker images) are convenience artifacts that the PMC may publish but that are not the legal release.

Practical consequence: every vote is a vote on the source release. Binaries derive from it.

Release Artifacts

A Tez release consists of:

| Artifact | Where | Format |
|---|---|---|
| Source tarball | dist.apache.org | apache-tez-X.Y.Z-src.tar.gz |
| ASCII-armored signature | dist.apache.org | apache-tez-X.Y.Z-src.tar.gz.asc |
| SHA-512 checksum | dist.apache.org | apache-tez-X.Y.Z-src.tar.gz.sha512 |
| (Optional) binary tarball | dist.apache.org | apache-tez-X.Y.Z-bin.tar.gz plus .asc and .sha512 |
| Staged Maven jars | repository.apache.org (Nexus) | Standard Maven layout |
| Git tag | apache/tez | release-X.Y.Z-rcN then release-X.Y.Z |

Notes:

  • MD5 and SHA-1 are forbidden for release checksums (ASF policy since 2019). Use SHA-512 (preferred) or SHA-256.
  • The signature must be ASCII-armored (.asc), not binary.
  • The signing key must be in the project KEYS file at https://downloads.apache.org/tez/KEYS, and the public key must also be published on a public keyserver.

Prerequisites — One-Time PMC Setup

Before you can RM (release-manage), once:

# 1. Generate a GPG key (4096-bit RSA).
gpg --full-generate-key

# 2. Submit the public key to keyservers.
gpg --send-keys <KEY_ID>

# 3. Add your key to the Tez KEYS file.
svn co https://dist.apache.org/repos/dist/release/tez tez-dist-release
cd tez-dist-release
(gpg --list-sigs <KEY_ID> && gpg --armor --export <KEY_ID>) >> KEYS
svn commit KEYS -m "Add <Your Name>'s release-signing key"

# 4. Verify it lands at:
#    https://downloads.apache.org/tez/KEYS

The Nexus staging access:

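# Create ~/.m2/settings.xml with the staging credentials.
# If the file already exists, do not append to it: a second <settings>
# root makes the file invalid XML. Merge the <server> block into the
# existing <servers> element by hand instead.
mkdir -p ~/.m2
[ -e ~/.m2/settings.xml ] || cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <servers>
    <server>
      <id>apache.releases.https</id>
      <username>YOUR_APACHE_ID</username>
      <!-- Prefer an encrypted value from: mvn --encrypt-password -->
      <password>YOUR_APACHE_LDAP_PASSWORD</password>
    </server>
  </servers>
</settings>
EOF
chmod 600 ~/.m2/settings.xml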

The Release Cut

Roughly the sequence the release manager runs:

cd ~/tez-src
git fetch origin

# 1. Branch (for X.Y.0 releases) or check out the maintenance branch.
git checkout -b branch-0.11 origin/master      # new minor line (X.Y.0); branch name is illustrative
# or
git checkout branch-0.10                       # patch release such as 0.10.4

# 2. Update version, commit, tag, push.
#    generateBackupPoms=false avoids stray pom.xml.versionsBackup files.
mvn versions:set -DnewVersion=0.10.4 -DgenerateBackupPoms=false
git commit -am "Setting version to 0.10.4 for release"
git tag release-0.10.4-rc1
git push origin branch-0.10
git push origin release-0.10.4-rc1

# 3. Build everything; tests must pass.
mvn clean install
mvn apache-rat:check

# 4. Build source tarball.
mvn clean package -Pdist,docs,src -DskipTests
ls tez-dist/target/                       # apache-tez-0.10.4-src.tar.gz

# 5. Sign and checksum, from inside the output directory so the
#    .sha512 file records a bare file name.
cd tez-dist/target
gpg --armor --output apache-tez-0.10.4-src.tar.gz.asc \
    --detach-sign apache-tez-0.10.4-src.tar.gz
sha512sum apache-tez-0.10.4-src.tar.gz > apache-tez-0.10.4-src.tar.gz.sha512

# 6. Stage to dist.apache.org/dev. Check out outside target/ so a later
#    "mvn clean" cannot delete the svn working copy.
svn co https://dist.apache.org/repos/dist/dev/tez ~/tez-dev
mkdir ~/tez-dev/tez-0.10.4-RC1
cp apache-tez-0.10.4-src.tar.gz* ~/tez-dev/tez-0.10.4-RC1/
cd ~/tez-dev
svn add tez-0.10.4-RC1
svn commit -m "Apache Tez 0.10.4 RC1"

# 7. Stage Maven artifacts (from the source tree, not the svn checkout).
cd ~/tez-src
mvn clean deploy -Papache-release -DskipTests
#    Then on https://repository.apache.org, log in, find your
#    staging repo (orgapachetez-NNNN), "Close" it.

The exact Maven profiles differ across Tez versions; check ~/tez-src/RELEASING.txt and the release notes for the prior release for the recipe in use.
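
Most respin mistakes are naming mistakes: a tag that says rc1 next to a staging directory that says RC2. A minimal sketch of deriving every name from one version and one RC number, following the naming used in the steps above; the function names are hypothetical and not part of any Tez script.

```shell
# Derive release-candidate names from a version ($1) and an RC number ($2).
rc_tag()       { printf 'release-%s-rc%s\n' "$1" "$2"; }   # git tag for the candidate
rc_stage_dir() { printf 'tez-%s-RC%s\n' "$1" "$2"; }       # directory under dist/dev/tez
final_tag()    { printf 'release-%s\n' "$1"; }             # git tag after the vote passes
src_tarball()  { printf 'apache-tez-%s-src.tar.gz\n' "$1"; }

# List the three files that must be staged for the source release.
rc_files() (
  tarball=$(src_tarball "$1")
  printf '%s\n%s.asc\n%s.sha512\n' "$tarball" "$tarball" "$tarball"
)
```

For example, `rc_tag 0.10.4 2` prints `release-0.10.4-rc2` and `rc_stage_dir 0.10.4 2` prints `tez-0.10.4-RC2`, so a respin only changes one argument.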

The [VOTE] Email

After staging, you send the vote. The template:

Subject: [VOTE] Apache Tez 0.10.4 RC1

Hi all,

I'd like to call a vote on releasing Apache Tez 0.10.4 RC1.

Notable changes since 0.10.3:
  - TEZ-NNNN: <one line>
  - TEZ-MMMM: <one line>
  - <N> additional fixes; see CHANGES.txt for the full list.

Source release:
  https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-RC1/

The release was signed with key:
  <KEY_ID>  <fingerprint>

KEYS file:
  https://downloads.apache.org/tez/KEYS

Git tag:        release-0.10.4-rc1
Git commit:     <full 40-char sha>

Staging repository for Maven:
  https://repository.apache.org/content/repositories/orgapachetez-NNNN/

The vote will be open for 72 hours.

Please verify and vote:

  [ ] +1 Release this package
  [ ]  0 No opinion
  [ ] -1 Do not release this package because ...

Verification steps (https://www.apache.org/info/verification.html):
  - Download src.tar.gz, .asc, .sha512.
  - Verify SHA512: sha512sum -c apache-tez-0.10.4-src.tar.gz.sha512
  - Verify signature:
      gpg --import KEYS
      gpg --verify apache-tez-0.10.4-src.tar.gz.asc apache-tez-0.10.4-src.tar.gz
  - Untar; check LICENSE and NOTICE.
  - Build: mvn clean install -DskipTests

My +1.

Thanks,
<First Last>

Send to dev@tez.apache.org. Subject [VOTE] Apache Tez 0.10.4 RC1.

What Voters Verify

A binding +1 is not just trust. It carries a check. PMC voters typically:

| Check | Command / location |
|---|---|
| Source artifact downloads | wget from dist.apache.org/repos/dist/dev/tez/... |
| Signature is valid and from a Tez committer | gpg --verify against KEYS file |
| SHA-512 matches | sha512sum -c |
| LICENSE is correct and current | Read it |
| NOTICE reflects bundled third-party | Read it; cross-check against LICENSE |
| DISCLAIMER present if incubating (not for Tez since 2014) | Check |
| No binary files in source tree | find apache-tez-X.Y.Z-src -type f \( -name '*.jar' -o -name '*.class' \) |
| Apache RAT clean | mvn apache-rat:check |
| Builds clean | mvn clean install -DskipTests |
| Tests pass (optional but valued) | mvn test |
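
The mechanical checks can be wrapped in a few functions so that every RC is checked the same way. This is a sketch under assumptions: the function names are hypothetical, the .asc and .sha512 files are assumed to sit next to the tarball, and the KEYS file is assumed to have been imported already with gpg --import.

```shell
# List files in an unpacked source tree that look like compiled binaries.
# The parentheses matter: without them, -type f binds only to the first -name.
list_binaries() {
  find "$1" -type f \( -name '*.jar' -o -name '*.class' \) | sort
}

# Succeed only if the unpacked tree contains no such files.
tree_is_source_only() {
  [ -z "$(list_binaries "$1")" ]
}

# Verify checksum and signature of a downloaded tarball ($1 = path to it).
# Runs in a subshell so the caller's working directory is untouched.
verify_artifact() (
  cd "$(dirname "$1")" || exit 1
  name=$(basename "$1")
  sha512sum -c "$name.sha512" || exit 1
  gpg --verify "$name.asc" "$name" || exit 1
)
```

A voter would run `verify_artifact` on the downloaded tarball, untar it, and then run `tree_is_source_only` on the unpacked directory. Reading LICENSE and NOTICE still has to be done by a person.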

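A voter who finds anything wrong with the source tarball can vote -1. Under ASF policy a release vote is decided by majority approval and cannot be vetoed, but a well-founded -1 on any of the blockers below normally leads the release manager to cancel the vote and respin. Common -1 reasons:

| Reason | Severity |
|---|---|
| Missing or broken signature | Blocker (must respin) |
| MD5 / SHA-1 only | Blocker |
| Binary files in source tree | Blocker |
| Missing or wrong LICENSE | Blocker |
| Missing or wrong NOTICE | Blocker |
| GPL or category-X dep | Blocker |
| RAT failure | Blocker |
| Apache headers missing | Blocker |
| Failed unit tests of significance | Usually a blocker |
| Build failure | Blocker |
| Documentation issue | Often non-blocking, opinion |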

Vote Pass Criteria

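The release passes if, after the 72-hour minimum:

  • At least 3 binding +1 votes from PMC members.
  • More binding +1 than binding -1. Non-binding votes are welcome and recorded in the result, but they do not decide the outcome.
  • By convention, no unaddressed binding -1. A release vote cannot formally be vetoed, but the release manager should answer or fix every concern raised.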

If criteria fail:

  • Extend the vote by 24–48 hours and ask explicitly for more attention.
  • Or cancel and roll RC2 with the fixes.
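
The tally itself is simple arithmetic. A minimal sketch, assuming the rule as ASF release policy states it (majority approval: at least three binding +1 and more binding +1 than binding -1, after at least 72 hours); the function name is hypothetical.

```shell
# Sketch of the release-vote arithmetic. Arguments:
#   $1 = binding +1 count, $2 = binding -1 count, $3 = hours the vote has been open.
# Non-binding votes are listed in the result mail but do not enter the tally.
release_vote_result() {
  if [ "$3" -lt 72 ]; then
    echo "open"            # minimum voting window not reached yet
  elif [ "$1" -ge 3 ] && [ "$1" -gt "$2" ]; then
    echo "passed"
  else
    echo "not-passed"      # extend the vote or roll a new RC
  fi
}
```

For example, `release_vote_result 2 0 96` prints `not-passed`: two binding +1 votes are not enough however long the vote stays open, which is the case where an explicit ping to the PMC is the right next step.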

Closing the Vote

The release manager closes:

Subject: [VOTE][RESULT] Apache Tez 0.10.4 RC1

Hi all,

The vote on Apache Tez 0.10.4 RC1 has passed.

Binding +1: <names of PMC voters>
Non-binding +1: <names>
0: <names>
-1: <names with reasons, if any>

Proceeding with the release steps.

Thanks to everyone who voted.

<First>

If the vote fails:

Subject: [VOTE][RESULT] Apache Tez 0.10.4 RC1

The vote did not pass. Issues raised:
  - <issue from voter>
  - <issue from voter>

Rolling RC2 with these fixes. Expect a new [VOTE] thread within
<N> days.

<First>

Promoting the Release

After the vote passes:

# 1. Move source from dev to release.
svn mv \
  https://dist.apache.org/repos/dist/dev/tez/tez-0.10.4-RC1 \
  https://dist.apache.org/repos/dist/release/tez/0.10.4 \
  -m "Releasing Apache Tez 0.10.4"

# 2. Promote Nexus staging repo to release (one-click in Nexus UI).

# 3. Tag the final release.
cd ~/tez-src
git tag release-0.10.4 release-0.10.4-rc1
git push origin release-0.10.4

# 4. Wait 24h for mirrors.

# 5. Update the Tez website with download links.

# 6. Send ANNOUNCE.

The announce email goes to announce@apache.org (BCC), dev@tez.apache.org, user@tez.apache.org, and your usual ASF lists for downstream projects (e.g. dev@hive.apache.org):

Subject: [ANNOUNCE] Apache Tez 0.10.4 released

The Apache Tez community is pleased to announce the release of
Apache Tez 0.10.4.

Apache Tez is an application framework that allows for a complex
directed acyclic graph of tasks for processing data. It is built
atop Apache Hadoop YARN.

Highlights:
  - <user-facing change>
  - <user-facing change>

Download:    https://tez.apache.org/releases/0.10.4/
Release notes: https://tez.apache.org/releases/0.10.4/release-notes.html

Thanks to everyone who contributed.

The Apache Tez team

RC Iteration Patterns

A first RC often does not pass, especially for larger releases. Typical RC counts by release type:

| Release type | Typical RCs |
|---|---|
| Patch (0.10.X) | 1–2 |
| Minor (0.10.0, 0.11.0) | 2–4 |
| Major (1.0.0 if it happened) | 4+ |

Each RC means: cancel vote, fix issues, re-tag (release-X.Y.Z-rcN+1), respin tarball, re-sign, re-stage Nexus (new staging repo), re-send [VOTE]. Plan for 1–3 weeks per release cycle.

Common Failure Modes

| Failure | Recovery |
|---|---|
| Signature key not in KEYS file | Stop, update KEYS, restart vote |
| RAT failure on a new file | Add Apache header, respin |
| Forgot to update CHANGES.txt | Update, respin |
| Stray .class or .jar in src tree | Clean, respin |
| Missing LICENSE entry for new bundled dep | Add LICENSE entry + NOTICE if needed, respin |
| Vote got fewer than 3 binding +1 in 72h | Extend with explicit ping to PMC |
| -1 on the source artifact for a legitimate issue | Respin |
| Maven staging mistake | Drop staging repo in Nexus, re-stage |

Validation Artifacts

After this chapter you should have:

  1. A GPG key generated and added to the project KEYS file (if you are PMC).
  2. A ~/tez-notes/release-checklist.md with the seven RM steps.
  3. The [VOTE] and [VOTE][RESULT] templates saved.
  4. The discipline to never vote +1 on an RC you haven't checked at least signature + LICENSE + a build.
  5. The ASF Infra Slack channel bookmarked, in case Nexus or dist.apache.org misbehaves.

The next chapter — PMC Responsibilities — covers the rest of what PMC membership entails, beyond releases.

PMC Responsibilities

PMC (Project Management Committee) membership at Apache is not a senior-engineer title. It is a stewardship role with explicit legal, brand, community, and release responsibilities. This chapter is the operational manual for what PMC members actually do between releases.

The Tez PMC list is at private@tez.apache.org. Public PMC members are listed at https://tez.apache.org/team-list.html (or the equivalent on the current site).

The Four Buckets of PMC Work

| Bucket | Examples | Frequency |
|---|---|---|
| Legal | License headers, NOTICE file, third-party LICENSE entries, ICLA matching | Per-patch and per-release |
| Brand | Trademark protection, conference talk approvals, logo use | Quarterly to annual |
| Community | Moderating list, voting new committers, mentoring, code of conduct enforcement | Continuous |
| Releases | Voting on RCs, cutting RCs, post-release announce | Per-release |

Plus one cross-bucket: board reporting, quarterly.

License Headers

Every source file in the Tez tree must have an Apache 2.0 license header. Tez uses Apache RAT to enforce this.

cd ~/tez-src
mvn apache-rat:check

The expected header for a .java file:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

If RAT fails on a release candidate, the release cannot ship. PMC members reviewing a release verify RAT cleanliness as part of vote-time checks (see Release Voting).

For non-.java files (.proto, .xml, .sh, .md), the same content with the appropriate comment delimiters.

NOTICE File

The NOTICE file at the repo root carries:

  • The required Apache attribution line.
  • Required attribution for any bundled third-party code that explicitly demands it.

cat ~/tez-src/NOTICE

Most BSD-, MIT-, and Apache-licensed dependencies do not require NOTICE entries. Some do (notably ones with NOTICE files of their own, which by Apache convention propagate into bundlers). The rule of thumb: if a dependency ships a NOTICE file, copy the required text into Tez's NOTICE.

Common error: adding random "thanks to" lines. NOTICE is not a thank-you file; it is a legal artifact. Keep it minimal and correct.

LICENSE File

LICENSE at the repo root is the Apache License 2.0 plus appendices for any bundled third-party code under different licenses.

For Tez, the appendices are mostly absent because the source release bundles no third-party source. The binary release (the convenience tarball) may bundle jars, and each bundled jar's license must then be added as an appendix.

If you are a PMC member adding a new dependency that gets bundled in the binary release:

  1. Identify the dependency's license (read it, don't guess).
  2. Verify category (A, B, or X) — see Licensing.
  3. If A: update LICENSE appendix; sometimes NOTICE.
  4. If B: requires PMC discussion + LICENSE / NOTICE updates.
  5. If X: stop. Cannot be bundled. May only be a runtime-optional dep, never a hard one.

ICLA Matching

Every non-trivial contribution must come from someone with an Apache ICLA on file. ICLA records are kept by the ASF Secretary; PMC members can verify one by emailing secretary@apache.org with the contributor's name.

In practice, for casual contributors:

  • Trivial patches (Javadoc, typo) do not require ICLA.
  • Anything substantive does.
  • The contributor sends the ICLA themselves; PMC verifies it landed.

If a substantial patch is committed without an ICLA on file, that is a legal exposure for the foundation. PMC members must catch this before commit.

Brand Responsibilities

"Apache Tez" is a trademark of the Apache Software Foundation. The PMC is the steward.

| Brand decision | PMC action |
|---|---|
| New logo | PMC vote, register with VP Brand Management |
| Conference talk titled "Apache Tez" | OK; speaker should follow trademark guidelines |
| Conference talk titled "Tez" without Apache | Polite ask: please use full mark |
| Third-party product named "TezCloud" | Likely refer to VP Brand; could be misleading |
| Third-party product built on Tez, named differently | OK; clarify attribution if uncertain |
| Use of the Tez feather logo in a slide deck | OK with attribution |

For specifics see the ASF Trademark and Brand Policy. When in doubt, the PMC defers to trademarks@apache.org.

Community Responsibilities

Moderation

Most ASF mailing lists are moderated for non-subscribers (subscribers post freely). The moderation work is light: approving first posts, rejecting spam.

Tez has a small mod team (typically a couple of PMC members). Add dev-moderate@ or similar to your mail filter to spot moderation requests.

If subscriber behavior on a list becomes problematic — flame wars, code-of-conduct violations — the PMC handles it. Typical escalation:

  1. Off-list private email from a PMC member to the offending subscriber.
  2. If unaddressed, a public on-list warning.
  3. If unaddressed, removal from the list (rare; requires PMC vote).

For severe cases (harassment, security threats), escalate immediately to board@apache.org.

Voting New Committers

The committer-bit process, from the PMC's side:

1. PMC member observes a strong contributor (see meritocracy chapter).
2. PMC member emails private@tez.apache.org with [VOTE] thread.
3. PMC members vote +1 / 0 / -1 (usually +1, sometimes 0 with rationale).
4. Vote runs ~72 hours; passes with at least 3 binding +1 and no binding -1.
5. PMC member privately emails the contributor with the offer.
6. On acceptance, ASF Infra is notified to provision the ASF account.
7. PMC announces publicly on dev@.

A -1 from a PMC member on a committer vote requires a concrete reason. "Doesn't feel right" is not enough; "two recent JIRAs showed inadequate care for compatibility" is.

PMC members may vote 0 if they don't know the contributor well — common, no shame in it.

Voting New PMC Members

Same mechanism as committer, except:

  • All committers are pre-considered, so the candidate is always a sitting committer.
  • The bar is higher (judgement, willingness to do PMC work, see Meritocracy).

After acceptance, the candidate is invited to the PMC. The Apache Board confirms.

Code of Conduct

Apache projects follow the ASF Code of Conduct. The PMC is the enforcement body within the project. Most enforcement is gentle and private. Serious cases are escalated to the board.

Release Responsibilities

Covered in detail in Release Voting. The PMC-specific elements:

  • Binding +1 votes on release artifacts are PMC-only.
  • At least 3 binding +1 required for a release to pass.
  • PMC member is the release manager (or supervises if a non-PMC committer is designated by lazy consensus to RM under PMC oversight).
  • Post-release, PMC member ensures the announce@apache.org mail goes out and the website is updated.

Security Reports

Security disclosures arrive at private@tez.apache.org or security@apache.org. The process:

1. Acknowledge receipt within 48 hours.
2. PMC investigates in private; reproduce.
3. Develop a fix in a private branch (not in apache/tez until disclosure).
4. Determine severity (CVSS) and assign a CVE.
5. Coordinate disclosure timing with downstream projects (Hive, etc).
6. Cut a release containing the fix.
7. Send disclosure to oss-security and security@apache.org with CVE and details.

The discipline: never discuss security issues on public lists or public JIRA until the fix has been released and disclosure is published.

If you are new to PMC, read the ASF Security Team process before you need it.

Board Reporting

The Apache Board oversees every project via quarterly reports. The Tez PMC submits a report each quarter (or per the schedule the board sets — currently quarterly with projects rotated through). The chair (or a delegate) submits it via https://reporter.apache.org/.

A standard report contains:

  • Community activity (new committers, new PMC members, list activity)
  • Releases since last report
  • Brand or legal issues
  • Health concerns the board should know about

The board looks for warning signs:

| Warning | Board concern |
|---|---|
| No releases in many quarters | Is the project dormant? |
| All committers from one company | Is the project independent? |
| Mailing-list activity falling | Is the community shrinking? |
| Code-of-conduct issues unresolved | Is the PMC functional? |

The chair is responsible for filing on time. If the report is late, the board notices.

Time Commitment

A PMC member with no other ASF roles spends roughly:

| Activity | Monthly time |
|---|---|
| Reviewing private@ traffic | 1–2 hours |
| Voting on releases (when there is one) | 1–3 hours per release |
| Voting on new committers | 30 minutes per vote |
| Board reporting (every 3 months) | 1–2 hours |
| Security incidents (when they happen) | Variable; possibly days |
| Committer work on top of PMC duties | (as before) |

A PMC member who is also chair adds the report-filing burden and acts as the project's ambassador to the board.

Stepping Back

PMC membership is permanent until you step back. Emeritus PMC status exists for those who have stepped away from active project work but want to remain available for consultation.

To go emeritus:

Subject: [NOTICE] Going emeritus PMC

Hi all,

Effective <date>, I'm moving to emeritus PMC status on Tez. My
involvement in the project has tapered and I want the active PMC
to reflect who's currently doing the work.

Please feel free to reach out if you ever want a sanity check on
something I worked on historically.

Thanks for the years of collaboration.

<First>

The PMC removes you from the active roster. You retain your ASF account, and you may return to active status later by vote.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/pmc-duties.md listing the four buckets and a one-line example of each.
  2. A subscription to private@tez.apache.org (when you are PMC).
  3. Knowledge of how to verify an ICLA, how to find the trademark policy, how to file a board report.
  4. A reflex to escalate security reports to private@ immediately and never discuss them publicly until disclosure.

The next chapter — Licensing — drills into the legal bucket: ALv2, LICENSE/NOTICE rules, and category A/B/X.

Licensing

Apache licensing is precise. The rules are not "be reasonable about open source"; they are a specific framework administered by Apache Legal. Getting them wrong blocks a release. This chapter is the working knowledge needed by committers and PMC, plus the bits every contributor should know before adding a dependency.

The Apache License 2.0

Apache Tez is licensed under the Apache License, Version 2.0 ("ALv2"). This is a permissive license that allows:

  • Use, reproduction, modification, distribution
  • Commercial use
  • Patent grant (explicitly, unlike MIT/BSD)
  • Sublicensing under different terms (with attribution)

In exchange:

  • You include the LICENSE and NOTICE in distributions
  • You note significant modifications
  • You preserve attribution and patent grants

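In practice, ALv2 is a permissive, non-copyleft license. It is compatible with most other open-source licenses; the notable exception is GPL 2.0. Compatibility with GPL 3.0 is one-way: ALv2 code can be included in a GPLv3 work, but GPLv3 code cannot be included in an ALv2 work.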

The Three Files in the Tez Repo Root

| File | Purpose |
|---|---|
| LICENSE | The Apache License 2.0 text, plus appendices for any bundled third-party code under different licenses |
| NOTICE | Required attributions for bundled code (Apache + any NOTICE-bearing deps) |
| KEYS (in dist, not repo) | PGP keys used to sign releases |

ls ~/tez-src/LICENSE ~/tez-src/NOTICE
cat ~/tez-src/NOTICE

For Tez source releases, LICENSE and NOTICE are typically short — the source tarball bundles no third-party code. For convenience binary releases, both grow with the bundled jars.

Category A / B / X — The Dependency Classes

Apache Legal classifies third-party licenses into categories. The full list is at Apache Legal Resolved. Summary:

| Category | Meaning | Examples | Can it be a Tez dependency? |
|---|---|---|---|
| A | Compatible with ALv2 | ALv2, MIT, BSD 2/3-clause, ISC | Yes; document in LICENSE/NOTICE if bundled |
| B | Compatible with conditions | EPL 1.0/2.0, CDDL 1.0/1.1, MPL 1.1/2.0, IBM Public License 1.0 | Yes, but only bundled in binary form, not as source. Add LICENSE/NOTICE entry. |
| X | Incompatible | GPL (any version), AGPL, LGPL 2.0/2.1/3.0, SSPL, BUSL, CC-BY-NC | No. May not be bundled in any release. Runtime-optional dep only, with care. |

The hard cases:

  • LGPL is category X for binary distribution but acceptable as an optional runtime dependency. Be careful; this is one of the most-asked questions on legal-discuss@apache.org.
  • CC-BY-SA and other ShareAlike licenses depend on the work: data and documentation are sometimes B, sometimes X.
  • Bespoke licenses (custom permissive licenses) must be reviewed before use.

If you are uncertain, post on legal-discuss@apache.org with a link to the license text. Don't guess.
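
For a personal cheatsheet, the categories can be written as a lookup. This is a sketch, not an authoritative list: the function names are hypothetical, the use of SPDX identifiers as input is an assumption, and the mapping follows the ASF third-party license policy, which places all MPL versions in category B. Anything the lookup does not recognise should go to legal-discuss@apache.org.

```shell
# Map an SPDX license identifier to an ASF category (A, B, X, or unknown).
license_category() {
  case "$1" in
    Apache-2.0|MIT|BSD-2-Clause|BSD-3-Clause|ISC)
      echo "A" ;;
    EPL-1.0|EPL-2.0|CDDL-1.0|CDDL-1.1|MPL-1.1|MPL-2.0|IPL-1.0)
      echo "B" ;;
    GPL-*|AGPL-*|LGPL-*|SSPL-*|BUSL-*|CC-BY-NC-*)
      echo "X" ;;
    *)
      echo "unknown" ;;   # bespoke or unlisted: ask, don't guess
  esac
}

# Succeed if a dependency under this license may be bundled at all.
# Category B still needs PMC discussion and binary-only bundling.
may_bundle() {
  case "$(license_category "$1")" in
    A|B) return 0 ;;
    *)   return 1 ;;
  esac
}
```

Treating "unknown" the same as X in `may_bundle` is deliberate: the safe default for an unreviewed license is not to ship it.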

"GPL Contamination"

Apache projects cannot ship GPL code. The rule has corollaries that catch people:

| Action | OK? |
|---|---|
| Tez code calls a GPL library via reflection at runtime | No; if the library must be present, it's a dep |
| Tez code can optionally integrate with a GPL tool the user installs themselves | Yes; runtime-optional, user-supplied |
| Tez ships a GPL jar in the binary tarball | No |
| Tez build script downloads a GPL jar during build | No (this is contamination) |
| Tez source contains a comment "see SOME GPL CODE for reference" | Risky; get review |
| Tez source copies a snippet from GPL code | No; pollutes the codebase |

The conservative rule: GPL code may exist near Tez (a user's runtime environment) but not in Tez (source or binary distribution).

Adding a New Dependency — Procedure

When a patch proposes a new third-party dependency:

  1. Identify the license. Open the project's LICENSE file. Don't rely on the GitHub "License" sidebar; it can be wrong.
  2. Classify. Category A, B, or X (above). If A, proceed. If B, plan for LICENSE / NOTICE updates and PMC discussion. If X, stop.
  3. Check transitive deps. A category-A library may pull in a category-X transitive. Use mvn dependency:tree and verify every transitive's license.
  4. Justify. On the JIRA, explain why this dep is needed and why no in-tree alternative suffices.
  5. Update LICENSE. If the dep is bundled in the binary release (it usually is), add an appendix entry naming the dep, its license, and where to find the full license text.
  6. Update NOTICE. If the dep ships a NOTICE file, copy the required text into Tez's NOTICE. Read the dep's NOTICE; not all of it is required.
  7. Test the build. Run mvn apache-rat:check and a full build. The dep should not produce RAT-flagged files (most don't).
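
For step 3, the output of mvn dependency:tree is easier to audit as a flat, de-duplicated list. A minimal sketch, assuming the plugin's usual text output, where each dependency line looks like `[INFO] +- group:artifact:type:version:scope`; the function name is hypothetical, and output from a different plugin version should be spot-checked before trusting the result.

```shell
# Read `mvn dependency:tree` output on stdin and print each distinct
# dependency as group:artifact:version, sorted.
# The module's own line (four fields, no scope) is skipped on purpose.
tree_coords() {
  sed -n 's/^\[INFO\] [^A-Za-z0-9]*\([^ ]*:[^ ]*:[^ ]*:[^ ]*:[^ ]*\).*$/\1/p' |
    awk -F: '{ print $1 ":" $2 ":" $(NF-1) }' |   # version is the field before scope
    sort -u
}
```

Typical use is `mvn dependency:tree | tree_coords > deps.txt`, then working down the list and recording the license of each entry. The list tells you what to check; it does not tell you the license.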

PMC members review the dependency before commit. If you are PMC, ask:

  • Is the license correctly classified?
  • Is the dep maintained?
  • What is the size cost (Tez binary tarball grows by N MB)?
  • Are there security advisories against the version proposed?

Apache RAT in Tez Pre-commit

Apache RAT (Release Audit Tool) checks that every source file has an Apache license header. It is part of every Tez release vote and should be part of every contributor's pre-submit.

Run:

cd ~/tez-src
mvn apache-rat:check

Output on success:

[INFO] BUILD SUCCESS

Output on failure:

[ERROR] Files with unapproved licenses:
  tez-dag/src/main/java/.../NewClass.java

The fix is to add the license header. The standard Java header is at the top of any existing Tez Java file; copy it.

RAT can be configured to allow certain files to be exempt (e.g. generated .proto-derived files, META-INF/). The exemption config lives in the parent pom.xml:

grep -A20 "apache-rat-plugin" ~/tez-src/pom.xml

Adding a new file type that legitimately can't carry a header (e.g. a JSON test fixture) requires updating the exemption list and noting it in the JIRA.

License Header Template

For .java:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

For .proto:

//
// Licensed to the Apache Software Foundation (ASF) ... (same content with // comments)
//

For .xml:

<!--
   Licensed to the Apache Software Foundation (ASF) ... (same content)
-->

For .sh / .py:

#
# Licensed to the Apache Software Foundation (ASF) ... (same content with # comments)
#

For .md: by convention, no header is needed for markdown docs in the source tree, but project policy may require one. Check mvn apache-rat:check output.
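
A quick local check can catch a missing header before RAT does. This is a rough sketch, not a replacement for RAT: the function names are hypothetical, and it only looks for the header's opening phrase in the first 20 lines of each file, so it ignores RAT's exemption list and cannot tell a complete header from a truncated one.

```shell
# Succeed if the first 20 lines of a file contain the ASF header's opening phrase.
has_asf_header() {
  head -n 20 "$1" | grep -q 'Licensed to the Apache Software Foundation (ASF)'
}

# Read file paths from stdin and print the ones that lack the header.
missing_headers() {
  while IFS= read -r f; do
    [ -f "$f" ] || continue            # skip deleted or non-regular paths
    has_asf_header "$f" || printf '%s\n' "$f"
  done
}
```

In a pre-submit script this could run as `git diff --name-only origin/master... | missing_headers`, followed by mvn apache-rat:check, which remains the authority.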

The Tez NOTICE File

A typical Tez NOTICE:

Apache Tez
Copyright 2014-YYYY The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).

Plus, if bundled deps require:

This product bundles SomeLibrary, which is available under
the Foo Bar License. See <path or URL>.

NOTICE is not:

  • A list of contributors (that's CHANGES.txt and git).
  • A thank-you list.
  • A list of services or users.

Keep it minimal and legally precise.

Source vs Binary Release — Different Rules

Apache makes a sharp distinction:

| Aspect | Source release | Binary release |
|---|---|---|
| Status | Official Apache release | Convenience artifact |
| What's bundled | Source code only | Compiled jars, possibly third-party jars |
| Must have ALv2 LICENSE | Yes | Yes |
| Must have NOTICE | Yes | Yes; longer than source NOTICE |
| Must pass RAT | Yes | RAT runs against the source it was built from; bundled jars are not RAT-checked |
| Category B bundling | Not in source form | Allowed in binary form with LICENSE/NOTICE entry |
| Category X bundling | Never | Never |

Practical implication: a source release rarely bundles anything except Tez's own source. The binary release, built as tez-dist/target/apache-tez-X.Y.Z-bin.tar.gz, contains all the runtime jars Tez depends on (Hadoop, Jackson, etc.).

Common Licensing Mistakes

| Mistake | Caught by | Fix |
|---|---|---|
| New file without Apache header | mvn apache-rat:check | Add header |
| Random third-party snippet pasted into Tez | Code review | Replace with original code or pull in via dep |
| New category-B dep with no LICENSE update | PMC at release vote | Update LICENSE |
| New category-X dep | PMC at release vote | Remove dep |
| NOTICE accidentally cleared | Code review | Restore from prior release |
| Copyright (c) Company Name in a file | Code review | Replace with Apache header; Company-owned code requires CLA review |

What ICLAs and CCLAs Cover

Two contributor license agreements:

| CLA | Who signs | What it covers |
|---|---|---|
| ICLA (Individual) | An individual contributor | Their personal contributions |
| CCLA (Corporate) | A company's authorised signatory | Contributions by listed employees |

An ICLA is required for any non-trivial contribution. A CCLA is required if the contribution is made in the contributor's capacity as a company employee.

PMC members can verify ICLA status via secretary@apache.org. For a casual single-patch contributor, the trivial-patch exception often applies and no ICLA is needed; for a contributor on the path to committer, the ICLA needs to be on file by the second or third patch.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/license-categories.md cheatsheet of A/B/X with examples.
  2. The reflex to run mvn apache-rat:check in your pre-submit script.
  3. The discipline to check a new dep's category before opening a JIRA proposing it.
  4. The ability to read Tez's NOTICE file and confirm what each line is there for.

The next chapter — Code Style & Trust — closes the section with the operational mechanics of style enforcement and the trust ladder a contributor climbs.

Code Style & Trust

The Tez project enforces a specific code style via checkstyle. The style itself is less interesting than the trust mechanism it embodies: an automated, opinionated style is how a project of dozens of committers and hundreds of contributors keeps its codebase coherent without requiring every reviewer to argue about braces.

This chapter is the practical guide to the style, the tools that enforce it, and the trust ladder a contributor climbs from first patch to commit bit.

Where the Style Lives

Tez's checkstyle configuration:

cat ~/tez-src/tez-tools/src/main/resources/tez/checkstyle.xml

This file is the source of truth. If a reviewer says "your patch fails checkstyle," they mean this file is unhappy.

The file is invoked by the parent pom.xml:

grep -A10 "maven-checkstyle-plugin" ~/tez-src/pom.xml

Verify locally:

cd ~/tez-src
mvn checkstyle:check

On success, Maven ends with BUILD SUCCESS and exits 0 without reporting any violations. On failure, the output lists each violation with its file and line number.

The Rules That Matter

The full ruleset is the file above. The rules that catch contributors most often:

| Rule | What it enforces |
|---|---|
| Line length | Usually 120 chars max |
| Indentation | 2 spaces (not 4, not tabs) |
| Imports | No wildcard imports; specific order |
| Brace style | Egyptian ({ on same line) |
| Unused imports | Disallowed |
| Member ordering | Static fields, instance fields, constructors, methods |
| Trailing whitespace | Disallowed |
| Final newline | Required |
| @Override annotations | Required when overriding |
| Javadoc on public methods of @Public classes | Required |

The full list is in the file. Notable absences:

  • Tez does not enforce a strict naming convention beyond standard Java (camelCase, PascalCase for classes).
  • Tez does not enforce method length limits (so committers must catch overly long methods in review).
  • Tez does not enforce strict cyclomatic complexity (same).

So checkstyle is a floor, not a ceiling. Passing it doesn't mean the patch is well-styled in the human sense — it means the obvious mechanical violations are absent.

IDE Setup

Configure your IDE to match. IntelliJ:

1. File → Settings → Editor → Code Style → Java.
2. Set Tab size: 2; Indent: 2; Continuation indent: 4.
3. Use spaces, not tabs.
4. Wrapping: hard wrap at 120.
5. Import → Class count to use import with '*': 999.
6. Final newline: under Editor → General, enable "Ensure every saved file ends with a line break".

Or import the Hadoop / Tez IntelliJ style file if one is in the repo:

find ~/tez-src -name "*.xml" | xargs grep -l "CodeStyle" 2>/dev/null | head

Eclipse: Window → Preferences → Java → Code Style → Formatter, import an XML if one is provided in tez-tools/.

VS Code with the Java extension: edit .vscode/settings.json per workspace:

{
  "java.format.settings.url": "tez-tools/src/main/resources/tez/eclipse-formatter.xml",
  "editor.tabSize": 2,
  "editor.insertSpaces": true,
  "files.insertFinalNewline": true,
  "files.trimTrailingWhitespace": true
}

The goal: at save time, your IDE produces checkstyle-passing code.

Catching Violations Pre-Submit

The pre-submit script (from Patch Quality):

#!/usr/bin/env bash
set -e
cd ~/tez-src
mvn install -DskipTests
mvn checkstyle:check
git diff --check                       # detects whitespace errors
mvn test -pl tez-dag,tez-api

git diff --check is a free win — it catches trailing whitespace and conflict markers before they reach the reviewer.
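
The kind of mechanical check involved can be sketched in a few lines. This is an illustration of the idea only, not git's implementation: git diff --check inspects only changed lines and honours core.whitespace, whereas the hypothetical findProblems helper below scans every line it is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WhitespaceCheckSketch {
  // Returns one message per problem, prefixed with the 1-based line number.
  public static List<String> findProblems(List<String> lines) {
    List<String> problems = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      int lineNo = i + 1;
      if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
        problems.add(lineNo + ": trailing whitespace");
      }
      if (line.startsWith("<<<<<<< ") || line.equals("=======") || line.startsWith(">>>>>>> ")) {
        problems.add(lineNo + ": leftover conflict marker");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList("int a = 1; ", "<<<<<<< HEAD", "int b = 2;");
    for (String problem : findProblems(sample)) {
      System.out.println(problem);
    }
  }
}
```

Both problems are invisible in most editors, which is why an automated check is cheaper than relying on the reviewer's eyes.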

The Trust Ladder

Style is the visible surface of a deeper thing: trust. The contributor-to-committer path is a multi-step climb up a trust ladder.

Step 0: Anonymous reader.
        Reads the codebase.
        Trust: none required.

Step 1: First-time contributor (Javadoc fix).
        Patch passes mechanical checks.
        Trust to receive: a few minutes of review attention.

Step 2: Multi-patch contributor.
        Several patches in over weeks/months.
        Trust to receive: a sympathetic reviewer who will guide.
        Trust to give: explain your reasoning on JIRA without being asked.

Step 3: Repeat contributor in one area.
        Becomes recognised as an expert in that area.
        Trust to receive: their +1 (non-binding) carries weight on patches in that area.
        Trust to give: stay engaged on follow-up issues.

Step 4: Reviewer.
        Provides non-binding +1 on others' patches with insight.
        Trust to receive: PMC members notice.
        Trust to give: your reviews must be substantive, not drive-by +1s.

Step 5: Committer (the bit).
        Granted by PMC vote on private@.
        Trust to receive: commit access to apache/tez.
        Trust to give: review patches in your areas, mentor newcomers, attend to dev@.

Step 6: PMC member.
        Granted later, after sustained committership.
        Trust to receive: binding release vote, security-disclosure access.
        Trust to give: stewardship duties (legal, brand, community, releases).

Each step takes months of consistent engagement. The ladder is asymmetric: the contribution required to climb each step grows roughly linearly, but the trust granted grows roughly exponentially.

Patterns Committers Want

Beyond mechanical style, certain patterns mark a patch as "from someone who gets it":

Use the existing logging idiom

private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);

// Then in method:
LOG.info("Initialized vertex {} with {} tasks", vertexName, numTasks);

Not System.out.println. Not LOG.info("Initialized vertex " + vertexName + ...): concatenation builds the string on every call, even when INFO is disabled, whereas the parameterized SLF4J form only formats the message if the level is enabled.
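
The cost difference can be observed directly. The sketch below is JDK-only and uses java.util.logging's Supplier overload as a stand-in for SLF4J's {} placeholders (an assumption made so the example runs without dependencies); VertexName and the render counter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingSketch {
  static final AtomicInteger RENDER_COUNT = new AtomicInteger();

  // Hypothetical value whose string form we pretend is expensive to build.
  public static final class VertexName {
    @Override
    public String toString() {
      RENDER_COUNT.incrementAndGet();
      return "vertex-1";
    }
  }

  // A logger with INFO disabled, as on a production cluster logging at WARN.
  public static Logger quietLogger() {
    Logger log = Logger.getLogger("lazy-logging-sketch");
    log.setLevel(Level.WARNING);
    return log;
  }

  public static int rendersWithConcatenation(Logger log, VertexName name) {
    RENDER_COUNT.set(0);
    log.info("Initialized vertex " + name);        // string is built before info() runs
    return RENDER_COUNT.get();
  }

  public static int rendersWithDeferredMessage(Logger log, VertexName name) {
    RENDER_COUNT.set(0);
    log.info(() -> "Initialized vertex " + name);  // supplier runs only if INFO is enabled
    return RENDER_COUNT.get();
  }

  public static void main(String[] args) {
    Logger log = quietLogger();
    VertexName name = new VertexName();
    System.out.println("concatenation renders: " + rendersWithConcatenation(log, name));
    System.out.println("deferred renders:      " + rendersWithDeferredMessage(log, name));
  }
}
```

With INFO disabled, the concatenating call still renders the value once and the deferred call renders it zero times. On a hot path such as per-event logging inside a state machine, that difference adds up.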

Use existing helper classes

If tez-common has a TezUtils helper for serialising a config to a byte buffer, use it. Don't write a new helper inline. Search:

grep -rn "class.*Utils" ~/tez-src/tez-common/src/main/java

Match the surrounding file's style for ambiguous things

If the file uses final on every parameter, your additions should too. If the file uses single-letter loop variables (for (int i = 0; ...)), don't suddenly switch to for (int taskIndex = 0; ...). Match the file.

Avoid speculative generality

Don't introduce an interface "in case we need a second implementation later." Don't add a configuration key "in case someone wants to tune this." Both increase the surface area the committer pool must maintain forever.

Cite the JIRA in non-obvious code

// TEZ-4321: handle the case where inputs is null after recovery.
if (inputs == null) {
  inputs = Collections.emptyList();
}

The comment is a permanent breadcrumb back to the design discussion.

Keep try/catch narrow

// Good
try {
  state = readState();
} catch (IOException e) {
  LOG.warn("Failed to read state for {}", id, e);
  return defaultState();
}

// Bad: catches too much
try {
  state = readState();
  process(state);              // <-- different exception domain
  publish(state);              // <-- different exception domain
} catch (Exception e) {          // <-- swallows everything
  LOG.error("Something failed", e);
}
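
A runnable sketch of why the distinction matters follows. The readState and process methods here are hypothetical stand-ins with failure switches, not Tez code. The point is that the narrow form lets a processing failure reach the caller, while the broad form reports every failure the same way.

```java
import java.io.IOException;

public class NarrowCatchSketch {
  // Hypothetical stand-in: an I/O step that can fail recoverably.
  static String readState(boolean failRead) throws IOException {
    if (failRead) {
      throw new IOException("disk unavailable");
    }
    return "state";
  }

  // Hypothetical stand-in: a logic step whose failure is a real bug.
  static void process(String state, boolean failProcess) {
    if (failProcess) {
      throw new IllegalStateException("invalid transition for " + state);
    }
  }

  // Narrow: only the read is guarded; a process() failure propagates.
  public static String narrow(boolean failRead, boolean failProcess) {
    String state;
    try {
      state = readState(failRead);
    } catch (IOException e) {
      return "default";
    }
    process(state, failProcess);
    return state;
  }

  // Broad: I/O errors and logic bugs collapse into one indistinguishable path.
  public static String broad(boolean failRead, boolean failProcess) {
    try {
      String state = readState(failRead);
      process(state, failProcess);
      return state;
    } catch (Exception e) {
      return "swallowed";
    }
  }

  public static void main(String[] args) {
    System.out.println("narrow, read fails:   " + narrow(true, false));
    System.out.println("broad, process fails: " + broad(false, true));
  }
}
```

In the broad version a logic bug in process() looks exactly like a transient read error, which is the situation a reviewer is trying to prevent.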

Don't add @SuppressWarnings without justification

// Bad
@SuppressWarnings("unchecked")
public List<T> getStuff() { ... }

// Good
@SuppressWarnings("unchecked") // safe; we control all writers
public List<T> getStuff() { ... }

A bare @SuppressWarnings is a code smell that says "I didn't want to deal with the real warning."

Use specific exception types in throws

// Bad
public DAG build() throws Exception { ... }

// Good
public DAG build() throws TezException, IOException { ... }

throws Exception defeats the type system. Reviewers will ask for specifics.

How Trust Is Withdrawn

Trust is built one patch at a time; it can also erode. Things that erode committer trust in a contributor:

| Behavior | Erosion |
|---|---|
| Ghosting a patch mid-review | Significant; reviewer's time wasted |
| Re-attaching the same patch without addressing comments | Significant; wastes another review cycle |
| Arguing without evidence | Moderate; teaches reviewer to expect friction |
| Pinging weekly | Moderate; reviewer learns to deprioritise |
| Submitting a patch that breaks tests | Mild if rare; serious if pattern |
| Committing your own patch without review (as committer) | Serious; loss of community trust |
| Reverting another committer's work without discussion | Very serious; potential PMC issue |
| Public criticism of a committer for their review | Very serious |

The recoverable: explain, apologise, address the underlying issue. Trust returns.

The non-recoverable: code-of-conduct violations. PMC handles these privately.

From First Patch to Commit Bit — The Arc

A realistic 12-month arc for a contributor on the path:

Month 1   First Javadoc fix.   Review takes 2 weeks (reviewer wasn't sure).
          You learn the patch generation workflow.
Month 2   Three small bug fixes.   Review faster (reviewer knows you).
          You learn checkstyle, run it pre-submit.
Month 3   Mid-sized refactor.   Two review rounds, no friction.
          You start filing follow-up JIRAs from things you notice.
Month 4-5 You review someone else's patch with a substantive +1.
          A PMC member notices on dev@.
Month 6   First design discussion on a JIRA.   You write a one-page design.
          Review goes well; consensus reached.
Month 7-8 You're patch-author on the implementation.   Three review rounds.
          Final commit feels routine.
Month 9   You shepherd a new contributor through their first patch.
          PMC notices.
Month 10  You're proposed on private@.   Vote passes.
          You're a committer.
Month 11  You commit your first patch (someone else's, reviewed by you).
          You explicitly don't commit your own work unreviewed.
Month 12  You're routine.   You review 2-3 patches a month, file 2-3.
          The flywheel.

This is one path, not the only path. Some contributors hit the bit at month 6 (extremely sustained activity); some at month 24+ (slower but steady). The trust ladder doesn't have a clock; it has a contribution count + sustained behavior pattern.

Validation Artifacts

After this chapter:

  1. Your IDE is configured to produce checkstyle-passing code at save time.
  2. Your pre-submit script runs mvn checkstyle:check and git diff --check.
  3. A ~/tez-notes/style-patterns.md listing the "patterns committers want" above.
  4. A clear-eyed estimate of where you are on the trust ladder, and what step is next.

This chapter closes the Release & PMC Reality section. The next major section, Hive-on-Tez Labs, is operational engineering at the Tez/Hive boundary — the most common production context for Tez today.

Capstone Project

The Capstone is the bridge from "I have read the Tez codebase" to "I have shipped a non-trivial fix that an Apache Tez committer merged into master." Everything in Levels 1–7 was preparation. This is the work.

You will pick one real, open Apache Tez JIRA, reproduce it against a current build, trace the failure through the codebase, identify the root cause, write a minimum-diff patch with deterministic tests, get it through precommit (Yetus / GitHub Actions), respond to review comments, and land the change. Then you write it up so the next person can learn from your investigation.

This chapter is the table of contents. The ten step-chapters that follow are the work itself.


Prerequisites

Do not start the Capstone until you can answer "yes" to every one of these:

  • Levels 1–7 complete. You can read DAGImpl, VertexImpl, TaskImpl, TaskAttemptImpl, AsyncDispatcher, the shuffle path (ShuffleManager, Fetcher, MergeManager), and at least one VertexManagerPlugin (ShuffleVertexManager or RootInputVertexManager) without a guide open.
  • You have built Tez from source. mvn clean install -DskipTests succeeds on your machine, and mvn test -pl tez-dag finishes (some flakes are normal — see Stage 9 of the issue roadmap).
  • You have run MiniTezCluster locally. mvn test -pl tez-tests -Dtest=TestOrderedWordCount goes green.
  • You have a working JIRA + Apache ID (or a GitHub account ready to PR).
  • You have read the Tez contribution guide: https://tez.apache.org/contribution_guide.html and https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute.

If any of these is "no," stop. Go back. The Capstone is unforgiving of partial preparation — you will spend three weeks confused instead of three weeks shipping.


The 10-Step Flow

flowchart TD
    A[Step 1: Issue Selection] --> B[Step 2: Reproduction]
    B --> C[Step 3: Execution Path Analysis]
    C --> D[Step 4: Root Cause Identification]
    D --> E[Step 5: Implementation]
    E --> F[Step 6: Testing]
    F --> G[Step 7: Validation]
    G --> H[Step 8: Patch / PR]
    H --> I[Step 9: JIRA + Docs]
    I --> J[Step 10: Engineering Write-Up]
    G -.fail.-> D
    F -.fail.-> E
    H -.review.-> E

The dotted arrows are the loops you will actually run. Nobody gets root cause right on the first hypothesis. Nobody passes precommit on the first push. Plan for two or three iterations through Steps 4–8 before you land.


Deliverables

By the time you mark the Capstone done, every one of these artifacts exists:

| # | Artifact | Lives in |
|---|---|---|
| 1 | Failing reproducer test (a JUnit test that fails on master without your patch and passes with it) | tez-tests/ or a module-local src/test/java/... |
| 2 | Root-cause document (200–500 words, with file:line citations) | capstone-work/root-cause.md in your fork |
| 3 | Minimum-diff patch | A branch on your fork of apache/tez |
| 4 | Unit tests using DrainDispatcher / mock dispatcher (if state-machine related) | The relevant src/test/java |
| 5 | Integration test using MiniTezCluster (if end-to-end behavior changed) | tez-tests/src/test/java/org/apache/tez/test/ |
| 6 | Validation report (output of mvn test -pl <module>, checkstyle, spotbugs, RAT) | capstone-work/validation.md |
| 7 | GitHub PR against apache/tez:master (or .patch file attached to JIRA) | https://github.com/apache/tez/pulls |
| 8 | JIRA updated: status = "Patch Available," PR linked, release-notes filled if user-visible | https://issues.apache.org/jira/browse/TEZ-NNNN |
| 9 | Engineering write-up (500–1000 words: problem, investigation, design, alternatives, lessons) | Personal blog, Apache wiki page, or dev@ summary |

Every one. No exceptions. The write-up is not optional — it is how the community (and your future self) learns from your investigation.


100-Point Rubric Summary

The full rubric lives in evaluation-rubric.md. Headline:

| Area | Weight |
|---|---|
| Problem articulation (symptom vs. root cause separation, conditions) | 20 |
| Execution-path mastery (file:line citations, diagram, accuracy) | 20 |
| Implementation quality (minimum diff, conventions, no scope creep) | 20 |
| Testing (unit + integration, deterministic, coverage) | 15 |
| Review responsiveness (addresses comments, iteration cadence) | 10 |
| Documentation (JIRA, code comments, write-up) | 10 |
| Community interaction (mailing-list etiquette, handoff hygiene) | 5 |

Tier thresholds:

  • 80+ — credible Tez contributor. You can sustain a steady patch flow.
  • 90+ — committer-ready. You are doing work a committer would do without hand-holding.
  • 95+ — PMC-track. You are leading work others want to follow.

You will self-grade in Step 10. Be honest. Inflated self-grades are visible from orbit when a committer reads your write-up.
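
For self-grading, the weights and thresholds above can be written down as a small calculator. This is only a convenience sketch: the class name and the "below threshold" label for scores under 80 are assumptions, and the full rubric file remains the authority.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RubricTierSketch {
  // Area weights as listed in the rubric summary.
  public static Map<String, Integer> weights() {
    Map<String, Integer> w = new LinkedHashMap<>();
    w.put("Problem articulation", 20);
    w.put("Execution-path mastery", 20);
    w.put("Implementation quality", 20);
    w.put("Testing", 15);
    w.put("Review responsiveness", 10);
    w.put("Documentation", 10);
    w.put("Community interaction", 5);
    return w;
  }

  public static int maxScore() {
    int sum = 0;
    for (int weight : weights().values()) {
      sum += weight;
    }
    return sum;
  }

  public static String tier(int score) {
    if (score >= 95) {
      return "PMC-track";
    }
    if (score >= 90) {
      return "committer-ready";
    }
    if (score >= 80) {
      return "credible Tez contributor";
    }
    return "below threshold"; // assumed label; the rubric names no tier under 80
  }

  public static void main(String[] args) {
    System.out.println("maximum score: " + maxScore());
    System.out.println("score 92 -> " + tier(92));
  }
}
```

The weights sum to 100, so a score is simply the sum of the points earned in each area.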


Timeline

The Capstone is a 4–6 week effort if you have one focused evening per weekday plus weekend mornings. Less than that and you risk losing context between sessions (which is far more expensive than people expect for state-machine code).

| Week | Steps | Hours |
|---|---|---|
| 1 | 1–2: Pick an issue, build a deterministic reproducer | 10–15 |
| 2 | 3–4: Trace execution, identify root cause | 12–18 |
| 3 | 5–6: Implement fix, write unit + integration tests | 12–18 |
| 4 | 7–8: Validate, prepare patch / PR, push | 8–12 |
| 5 | 8–9: Review iteration (two or three rounds is normal) | 6–10 |
| 6 | 10: Write-up, JIRA cleanup, retrospective | 4–6 |

If you blow past six weeks, that is a signal — not a failure. Either the issue is larger than it looked (in which case, pause and renegotiate scope in the JIRA), or you are stuck on a specific step (in which case, ask on dev@tez.apache.org).
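
To judge whether the budget is realistic for you, it helps to add up the weekly ranges from the timeline. A small sketch, with the hour figures copied from the table and everything else hypothetical:

```java
public class TimelineBudgetSketch {
  // {minHours, maxHours} for weeks 1 through 6, as given in the timeline.
  static final int[][] WEEKLY_HOURS = {
    {10, 15}, {12, 18}, {12, 18}, {8, 12}, {6, 10}, {4, 6}
  };

  // column 0 sums the low estimates, column 1 the high estimates.
  public static int total(int column) {
    int sum = 0;
    for (int[] week : WEEKLY_HOURS) {
      sum += week[column];
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println("low estimate:  " + total(0) + " hours");
    System.out.println("high estimate: " + total(1) + " hours");
  }
}
```

The totals come to 52 and 79 hours, or roughly 9 to 13 hours per week across the six weeks.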


Success Indicators

You will know it is working when:

  1. A committer comments "+1" or "LGTM, will commit shortly" on your PR.
  2. Your fix appears in git log apache/master, and its cherry-pick (marked "(cherry picked from commit ...)") lands on the next release branch.
  3. The JIRA you claimed flips to "Resolved / Fixed in X.Y.Z" with your name on it.
  4. Your write-up gets traffic — search-engine hits, a comment from another contributor, a question on user@.
  5. The next time you pick a JIRA, you reach root cause in days, not weeks.

You will know it is failing when:

  1. You are still editing files in Step 5 with no failing test in hand from Step 2.
  2. Your PR description says "I think this might fix it."
  3. You have not run mvn test -pl tez-dag end-to-end in over a week.
  4. You are arguing in PR comments instead of changing code or asking questions.

If you spot a failure signal, do not push through. Stop, reread the relevant step chapter, and reset.


How to Use This Chapter

Read all ten step-chapters once, end-to-end, before you start Step 1. You need the shape of the whole journey in your head — Step 4 (root cause) makes choices that Step 6 (testing) depends on; Step 8 (patch) assumes you have artifacts from Steps 2 and 7. Skim now, deep-read each as you arrive at it.

Then go to Step 1: Issue Selection. Pick the issue. The clock starts when you comment "Working on this" on the JIRA.


Validation / Self-check

Before starting Step 1, confirm:

  1. You can produce, from memory, the file path of DAGAppMaster, DAGImpl, VertexImpl, TaskImpl, and AsyncDispatcher.
  2. mvn clean install -DskipTests completes against your local ~/tez-src/ clone.
  3. mvn test -pl tez-tests -Dtest=TestOrderedWordCount passes.
  4. You have a capstone-work/ directory in your fork ready for the root-cause.md, validation.md, and writeup.md deliverables.
  5. You have skimmed every step-chapter once.
  6. You have set aside 4–6 calendar weeks with realistic time budget.
  7. You have subscribed to dev@tez.apache.org (send an empty email to dev-subscribe@tez.apache.org and reply to the confirmation message) and issues@tez.apache.org.

Step 1: Issue Selection

Picking the wrong issue is the most expensive mistake in the Capstone. Two weeks of investigation on a JIRA that turns out to be a duplicate, a WONTFIX, or a multi-month rearchitecture is two weeks you do not get back. The goal of this step is not to find a perfect issue. It is to find a tractable issue that exercises the parts of Tez you actually know.

Budget: 1–3 days. If you are past day 4 and still triaging, your standards are too high.


Where the Real Issues Live

Apache Tez tracks issues in JIRA at:

https://issues.apache.org/jira/projects/TEZ

There is no good-first-issue label on Tez (unlike Hadoop). The closest proxies are newbie, very small subtasks of larger umbrellas, and stale unassigned bugs with reproducers attached. You will write your own JQL.

Starter JQL Queries

Run these in JIRA's "Advanced" search box. Open each in a separate tab; do not chase one result before you have seen the whole landscape.

1. Unassigned open bugs, sorted by recency:

project = TEZ AND status in (Open, "In Progress")
  AND assignee is EMPTY
  AND type = Bug
ORDER BY created DESC

2. Bugs with reproducers attached (the gold standard):

project = TEZ AND status = Open
  AND type = Bug
  AND attachments is not EMPTY
ORDER BY updated DESC

3. Newbie-labeled (small surface area):

project = TEZ AND status = Open
  AND (labels = newbie OR labels = beginner OR labels = "low-hanging-fruit")
ORDER BY priority DESC, created DESC

4. Flaky tests (Stage 9 territory, often great Capstone fodder):

project = TEZ AND status = Open
  AND (summary ~ "flaky" OR summary ~ "intermittent" OR description ~ "flaky")
ORDER BY votes DESC

5. Open bugs touching modules you know:

project = TEZ AND status = Open AND type = Bug
  AND (component in ("tez-dag", "tez-runtime-internals", "tez-runtime-library")
       OR summary ~ "VertexImpl"
       OR summary ~ "ShuffleManager"
       OR summary ~ "AsyncDispatcher")
ORDER BY created DESC

Cast a wide net. Pull 20+ candidates into a scratchpad. You will trim aggressively.


Triage: Pick 5 Finalists from 20

For each candidate, spend 10–15 minutes — no more — answering this single question: "Could I write a failing test for this today?" If "no" or "I have no idea," drop it. If "probably yes, here's how," keep it.

Concrete triage protocol:

  1. Read the JIRA description and every comment. Watch for "I cannot reproduce" or "this is a duplicate of TEZ-XXXX" buried at the bottom.
  2. Check git log --grep "TEZ-NNNN" in your ~/tez-src/ clone — has it already been partially fixed?
  3. Search the dev@ mailing list archive for the issue number: https://lists.apache.org/list.html?dev@tez.apache.org.
  4. Open the linked files in your editor. Are they in tez-dag, tez-runtime-*, tez-api (familiar territory), or tez-ui, tez-plugins, tez-yarn-timeline-* (less familiar — skip unless you specifically studied them)?
  5. Note the Affects-Versions field. If it only affects 0.8.x and master has been rewritten in the area, the fix may not be portable.

Keep the 5 finalists in a markdown table:

| TEZ-NNNN | Title | Component | Reproducer? | Last activity | My read |
|---|---|---|---|---|---|
| TEZ-4321 | Fetcher hangs on connection reset | tez-runtime-library | none | 2024-11 | Plausible; I know ShuffleManager |
| TEZ-4456 | VertexImpl NPE on V_ROUTE_EVENT after kill | tez-dag | stack trace only | 2025-02 | Race-y; familiar state machine |
| ... | | | | | |

Scoring Rubric

Score each finalist 0–2 in each column. The winner is the highest aggregate.

| Criterion | 0 | 1 | 2 |
|---|---|---|---|
| Clarity | Description is one sentence and ambiguous | Description names symptom but not conditions | Clear symptom + reproduction conditions in description |
| Scope | Open-ended ("refactor X") | Bounded but spans modules | Bounded to one or two classes |
| Isolation | Requires Hive/Pig running | Needs MiniTezCluster | Can be reproduced in pure unit test |
| Testability | No clear failing assertion possible | Failing assertion possible after MiniTezCluster run | Failing assertion possible in DrainDispatcher test |
| Alignment | Touches code I have never read | Touches one familiar class | Touches 2–3 classes I have studied in Levels 4–6 |
| Community engagement | Last activity > 2 years, no watchers | Some activity in last year | Recently discussed; a committer responded |

Total possible: 12. Anything below 7 is risky. Pick the 9+ candidate.


Three Worked Examples

These are illustrative archetypes, not literal current JIRAs.

Candidate A: "ShuffleManager retries forever on IOException: Connection reset"

  • Clarity: 2 (description names the exception and the loop).
  • Scope: 2 (one class: ShuffleManager or Fetcher).
  • Isolation: 1 (need a fake Fetcher to inject the exception).
  • Testability: 2 (mock-based unit test with retry counter assertion).
  • Alignment: 2 (you read this in Level 5).
  • Community engagement: 1 (one committer comment, no resolution).
  • Total: 10. Pick this.

Candidate B: "Refactor DAGImpl state machine to use enum-based transitions"

  • Clarity: 1 (vague — "refactor").
  • Scope: 0 (touches DAGImpl, every event handler, every test).
  • Isolation: 0 (no failing behavior to test).
  • Testability: 0 (regression-only testing).
  • Alignment: 1 (you know DAGImpl but this is huge).
  • Community engagement: 0 (no committer +1).
  • Total: 2. Skip. This is a months-long design proposal, not a bug.

Candidate C: "Container reuse logs say assigned then released for same container"

  • Clarity: 2 (you can pull the log lines from the description).
  • Scope: 1 (touches TaskSchedulerManager and possibly YarnTaskSchedulerService).
  • Isolation: 0 (need MiniYARNCluster — slow, flaky, environment-sensitive).
  • Testability: 1 (assertions are on log content + scheduler state).
  • Alignment: 1 (you read TaskSchedulerManager once).
  • Community engagement: 2 (recent discussion).
  • Total: 7. Borderline. Pick only if you have no candidate above 8 and you budget extra time for the YARN harness.
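
The arithmetic for the three candidates can be checked with a short sketch. The class and method names are hypothetical, and the "borderline" band for totals of 7 and 8 is one reading of the guidance above (below 7 risky, 9+ pick), not a rule stated by the rubric.

```java
public class IssueScoreSketch {
  // Scores in order: clarity, scope, isolation, testability, alignment, community engagement.
  public static int total(int... scores) {
    if (scores.length != 6) {
      throw new IllegalArgumentException("expected six criteria");
    }
    int sum = 0;
    for (int score : scores) {
      if (score < 0 || score > 2) {
        throw new IllegalArgumentException("each score must be 0, 1 or 2");
      }
      sum += score;
    }
    return sum;
  }

  public static String verdict(int total) {
    if (total >= 9) {
      return "pick";
    }
    if (total >= 7) {
      return "borderline"; // assumed band between the two stated cut-offs
    }
    return "risky";
  }

  public static void main(String[] args) {
    int a = total(2, 2, 1, 2, 2, 1);
    int b = total(1, 0, 0, 0, 1, 0);
    int c = total(2, 1, 0, 1, 1, 2);
    System.out.println("Candidate A: " + a + " -> " + verdict(a));
    System.out.println("Candidate B: " + b + " -> " + verdict(b));
    System.out.println("Candidate C: " + c + " -> " + verdict(c));
  }
}
```

Keeping the per-criterion scores rather than only the total makes it easy to see later why a candidate lost, which is useful when you revisit near-misses.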

Claiming the Issue

Once you decide, claim it publicly. This is non-negotiable — it prevents wasted work by others, and it commits you.

JIRA comment template

Hi — I'd like to work on this as part of an extended Tez learning project.

My plan:
1. Build a deterministic reproducer (target: <date+1 week>).
2. Root-cause analysis (target: <date+2 weeks>).
3. Patch + tests posted for review (target: <date+4 weeks>).

I'll post weekly updates here. If anyone with context has pointers on
<specific question, e.g. "whether this race was discussed in TEZ-NNNN">,
I'd be grateful. Otherwise I'll start on the reproducer this week.

— <Your Name>

Then assign the JIRA to yourself (you need a JIRA account; the Tez PMC grants contributor role on request — comment "please grant contributor role" on any issue and a PMC member will action it within a few days).

If you get no response in 5 business days

Post to dev@tez.apache.org:

Subject: [TEZ-NNNN] Working on this — any context before I dive in?

Hi all,

I left a comment on TEZ-NNNN <link> last week saying I plan to work on it. No
objections so far, so I'm starting on a reproducer this week. If anyone has
historical context — especially whether this overlaps with TEZ-XXXX — please
shout. Otherwise I'll update the JIRA as I make progress.

Thanks,
<Your Name>

If still no response after another week, proceed. Silence on a small bug is permission. (Silence on a redesign proposal is not — different beast.)


Red Flags: Issues to Skip

  • Last comment is from a committer saying "we should think about this more." You are not the right person to land a design call.
  • Open for >5 years with multiple abandoned patches. Something is structurally hard. Not Capstone material — pick later.
  • Touches tez-ui (Ember 1.x). The UI is on a separate lifecycle; build and test setup is divergent from the JVM modules you studied.
  • "Upgrade dependency X to version Y." Looks easy, ends up rebuilding the shuffle stack to handle a Guava API change. Skip unless you specifically want this experience.
  • Critical or Blocker priority with no patch. A committer would already be on it. If they are not, the issue may be misclassified or stale-critical.
  • Reproducer requires a specific Hive version + a 1TB TPC-DS run. No.

Validation / Self-check

Before you advance to Step 2, produce:

  1. A markdown table of your 5 finalists with full scoring rubric, saved as capstone-work/issue-shortlist.md.
  2. The TEZ-NNNN number of your chosen issue, posted as a JIRA comment claiming it.
  3. A 1-paragraph statement of why you picked it (which two criteria scored highest and which scored lowest).
  4. A self-assigned target date for Step 2 (deterministic reproducer in hand).
  5. Subscription confirmed to dev@tez.apache.org and the JIRA itself (click the "Start watching" eye icon).
  6. Your fork of apache/tez exists on GitHub with a branch named tez-NNNN-<short-slug> checked out locally.
  7. A note in capstone-work/issue-shortlist.md of any near-miss candidates you may revisit after the Capstone — these are your next contributions.

Step 2: Reproduction

You do not have a bug until you have a failing test. Stack traces in JIRA comments are circumstantial evidence; a deterministic, automated reproducer is proof. Until you have one, every hypothesis in Step 4 is unverifiable and every "fix" in Step 5 is theater.

Goal of this step: a JUnit test that fails on a clean checkout of apache/tez:master without your patch, in under two minutes, on five out of five runs.


Where Reproducers Live

MiniTezCluster is the Tez-specific harness that boots an in-process YARN cluster plus a DAGAppMaster against the local filesystem. It is the closest thing to a real deployment that you can debug from your IDE.

find ~/tez-src/tez-tests -name "MiniTezCluster.java"
# tez-tests/src/test/java/org/apache/tez/test/MiniTezCluster.java

Read it first, then read one consumer:

grep -n "MiniTezCluster" \
  ~/tez-src/tez-tests/src/test/java/org/apache/tez/test/TestTezJobs.java
grep -n "MiniTezCluster" \
  ~/tez-src/tez-tests/src/test/java/org/apache/tez/test/TestOrderedWordCount.java

TestTezJobs is the canonical "wire up a real cluster, submit a small DAG, assert on the output" example. TestOrderedWordCount is the lighter-weight end-to-end sanity check.

For pure unit-level reproducers (no YARN, no shuffle), use the patterns in:

~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java
~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskAttempt.java

These use DrainDispatcher (an AsyncDispatcher subclass whose await() blocks until the event queue has been drained, which lets you control event ordering deterministically); see Step 6 for the full pattern.


Three Reproducer Templates

Pick the template that matches your issue type.

Template A: Race-Condition Reproducer (state-machine level)

When the bug is "two events arrive in an unexpected order and the state machine NPEs / wedges / drops a task," you need DrainDispatcher plus controlled event ordering. No MiniTezCluster.

package org.apache.tez.dag.app.dag.impl;

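// Import paths below are as we expect them in tez-dag; confirm against your checkout.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.DrainDispatcher;
import org.apache.tez.dag.api.oldrecords.TaskState;
import org.apache.tez.dag.app.AppContext;
import org.apache.tez.dag.app.dag.VertexState;
import org.apache.tez.dag.app.dag.event.VertexEvent;
import org.apache.tez.dag.app.dag.event.VertexEventSourceTaskAttemptCompleted;
import org.apache.tez.dag.app.dag.event.VertexEventTaskCompleted;
import org.apache.tez.dag.app.dag.event.VertexEventType;
import org.apache.tez.dag.records.TezTaskID;
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class TestVertexImplTezNNNNRepro {

  private DrainDispatcher dispatcher;
  private VertexImpl vertex;
  private AppContext appContext;

  @Before
  public void setUp() {
    dispatcher = new DrainDispatcher();
    dispatcher.register(VertexEventType.class, vertexEventHandler());
    // A Hadoop service must be initialised before it is started.
    dispatcher.init(new Configuration());
    dispatcher.start();
    // MockAppContext.create(), createVertex() and vertexEventHandler() are
    // placeholders: copy the equivalent wiring from TestVertexImpl's setUp().
    appContext = MockAppContext.create();
    vertex = createVertex(appContext, dispatcher);
    vertex.handle(new VertexEvent(vertex.getVertexId(), VertexEventType.V_INIT));
    dispatcher.await();
  }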

  @Test
  public void reproTaskCompletionBeforeRouteEvent() throws Exception {
    // 1. Drive vertex to RUNNING.
    vertex.handle(new VertexEvent(vertex.getVertexId(), VertexEventType.V_START));
    dispatcher.await();
    assertEquals(VertexState.RUNNING, vertex.getState());

    // 2. Inject a task completion BEFORE the V_ROUTE_EVENT that the bug requires
    //    has been processed. This is the race window from the JIRA.
    TezTaskID t0 = vertex.getTask(0).getTaskId();
    vertex.handle(new VertexEventTaskCompleted(t0, TaskState.SUCCEEDED));

    // Do NOT call dispatcher.await() yet — interleave a second event.
    vertex.handle(new VertexEventSourceTaskAttemptCompleted(...));

    dispatcher.await();

    // 3. Assertion that fails on master, passes with fix.
    assertEquals(VertexState.SUCCEEDED, vertex.getState());
    //                     ^^^^^^^^^^^ on master this is FAILED due to the race
  }
}

Key principles:

  • Drive the state machine by handing events to vertex.handle() directly, not by going through a scheduler.
  • Use dispatcher.await() to deterministically drain the queue between phases.
  • The failing assertion is on a getState() or counter, not on log output.
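To see why draining between phases makes the ordering deterministic, here is a minimal JDK-only sketch. ToyDrainDispatcher and DrainDemo are hypothetical names invented for this illustration; this is not the Hadoop or Tez DrainDispatcher, only a model of the same idea: events are handled on a worker thread, and await() blocks until everything dispatched so far has been handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical toy dispatcher; a model of the idea, not the real DrainDispatcher. */
final class ToyDrainDispatcher<E> {
  private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
  private final Object lock = new Object();
  private final Consumer<E> handler;
  private int pending;   // events dispatched but not yet fully handled

  ToyDrainDispatcher(Consumer<E> handler) {
    this.handler = handler;
    Thread worker = new Thread(this::loop, "toy-dispatcher");
    worker.setDaemon(true);
    worker.start();
  }

  void dispatch(E event) {
    synchronized (lock) { pending++; }
    queue.add(event);
  }

  /** Blocks until every event dispatched so far has been handled. */
  void await() throws InterruptedException {
    synchronized (lock) {
      while (pending > 0) lock.wait();
    }
  }

  private void loop() {
    try {
      while (true) {
        E event = queue.take();
        try {
          handler.accept(event);
        } finally {
          synchronized (lock) { pending--; lock.notifyAll(); }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

final class DrainDemo {
  /** Phase 1 is drained before phase 2 is injected, so the observed order is fixed. */
  static List<String> run() {
    List<String> seen = new ArrayList<>();   // read by the caller only after await()
    ToyDrainDispatcher<String> d = new ToyDrainDispatcher<String>(seen::add);
    try {
      d.dispatch("V_INIT");
      d.await();                                    // end of phase 1
      d.dispatch("V_TASK_COMPLETED");
      d.dispatch("V_SOURCE_TASK_ATTEMPT_COMPLETED"); // interleaved without a drain
      d.await();                                    // end of phase 2
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The design point is the pending counter: it is incremented before the event is queued and decremented only after the handler returns, so await() cannot return while a handler is still running or while a handler has enqueued follow-up events.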

Template B: Configuration / Validation Reproducer

When the bug is "setting tez.foo=bar is silently ignored / produces wrong behavior," reproduce at the API layer.

@Test
public void testConfigKeyHonored() throws Exception {
  TezConfiguration conf = new TezConfiguration();
  conf.set(TezConfiguration.TEZ_AM_FOO_BAR, "42");

  DAG dag = DAG.create("test-dag");
  Vertex v = Vertex.create("v1", ProcessorDescriptor.create(NoOpProcessor.class.getName()), 4);
  dag.addVertex(v);

  // The component under test reads conf — instantiate it directly.
  FooComponent foo = new FooComponent(conf);
  assertEquals(42, foo.getEffectiveValue());
  //                ^^ on master this is the default (e.g. 100) because conf is ignored
}

No cluster, no DAG submission. Just instantiate the class that reads the config and assert the effective value. The fix usually changes one conf.get() call.
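A JDK-only sketch of the same shape of bug, using java.util.Properties in place of TezConfiguration. ToyFooComponent, the key name, and the default of 100 are all hypothetical; the point is only that a lookup against the wrong key falls back to the default without any error, and that the fix is a single changed lookup.

```java
import java.util.Properties;

/** Hypothetical component; stands in for whichever class reads the key. */
final class ToyFooComponent {
  static final String KEY = "tez.am.foo.bar";   // hypothetical key name
  static final int DEFAULT = 100;               // hypothetical default

  private final int effectiveValue;

  ToyFooComponent(Properties conf, boolean withFix) {
    // Buggy variant: looks up a key nobody sets, so the default wins silently.
    String lookedUp = withFix ? KEY : "tez.am.foo-bar";
    this.effectiveValue =
        Integer.parseInt(conf.getProperty(lookedUp, Integer.toString(DEFAULT)));
  }

  int getEffectiveValue() {
    return effectiveValue;
  }

  /** Sets KEY to the given value and returns what the component actually uses. */
  static int effectiveFor(String value, boolean withFix) {
    Properties conf = new Properties();
    conf.setProperty(KEY, value);
    return new ToyFooComponent(conf, withFix).getEffectiveValue();
  }

  public static void main(String[] args) {
    System.out.println(effectiveFor("42", false));  // buggy lookup: default is used
    System.out.println(effectiveFor("42", true));   // fixed lookup: setting is honored
  }
}
```

Asserting on the effective value, rather than on the configuration object, is what makes the test fail before the fix and pass after it.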

Template C: Shuffle / Correctness Reproducer

When the bug is "output is wrong" (missing rows, duplicated rows, partial sort), you need MiniTezCluster and a small DAG with deterministic input.

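import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.client.DAGClient;
import org.apache.tez.dag.api.client.DAGStatus;
import org.apache.tez.test.MiniTezCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

// writeKnownInput(), buildTwoVertexDAG() and countRows() are helpers you write yourself.
public class TestShuffleCorrectnessTezNNNN {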

  private static MiniTezCluster mrrTezCluster;
  private static FileSystem fs;

  @BeforeClass
  public static void setup() throws Exception {
    Configuration conf = new Configuration();
    fs = FileSystem.getLocal(conf);
    mrrTezCluster = new MiniTezCluster("TestShuffleRepro", 1, 1, 1);
    mrrTezCluster.init(conf);
    mrrTezCluster.start();
  }

  @AfterClass
  public static void cleanup() throws Exception {
    if (mrrTezCluster != null) mrrTezCluster.stop();
  }

  @Test(timeout = 120_000)
  public void reproPartitionedOutputMissingRows() throws Exception {
    Path inputDir = new Path("/tmp/repro-input-" + System.nanoTime());
    Path outputDir = new Path("/tmp/repro-output-" + System.nanoTime());
    writeKnownInput(fs, inputDir, /*rows=*/ 10_000);

    TezConfiguration tezConf = new TezConfiguration(mrrTezCluster.getConfig());
    DAG dag = buildTwoVertexDAG(inputDir, outputDir);

    TezClient client = TezClient.create("repro", tezConf);
    client.start();
    try {
      DAGClient dagClient = client.submitDAG(dag);
      DAGStatus status = dagClient.waitForCompletionWithStatusUpdates(null);
      assertEquals(DAGStatus.State.SUCCEEDED, status.getState());

      long outputRowCount = countRows(fs, outputDir);
      // On master this is 9_973 (27 rows lost in shuffle). With fix: 10_000.
      assertEquals(10_000L, outputRowCount);
    } finally {
      client.stop();
    }
  }
}

Build with deterministic input (fixed seed if random) so the missing-row count is reproducible across runs.
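A minimal sketch of what "fixed seed" means in practice. KnownInput and its row format are hypothetical; the only guarantee relied on is that java.util.Random produces the same sequence for the same seed, so two runs generate identical rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical input generator for a reproducer; the row format is an assumption. */
final class KnownInput {
  /** Generates `rows` tab-separated "key<TAB>rowIndex" lines from a fixed seed. */
  static List<String> generate(int rows, long seed) {
    Random rnd = new Random(seed);          // same seed, same sequence, every run
    List<String> out = new ArrayList<>(rows);
    for (int i = 0; i < rows; i++) {
      out.add("k" + rnd.nextInt(1000) + "\t" + i);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> first = generate(10_000, 42L);
    List<String> second = generate(10_000, 42L);
    System.out.println(first.equals(second));   // identical input across runs
  }
}
```

Keeping the row index in the value also makes it possible to say which rows went missing, not only how many.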


Logging: See What the State Machine Is Actually Doing

A reproducer without logs is half a reproducer. You will spend Step 4 staring at these logs.

Drop this into your test resources at src/test/resources/log4j.properties (or log4j2.properties for newer modules — check which the module uses):

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss.SSS} %-5p [%t] %c{1}: %m%n

# Tez AM internals — the state-machine event log lives here
log4j.logger.org.apache.tez.dag.app.DAGAppMaster=DEBUG
log4j.logger.org.apache.tez.dag.app.dag.impl.DAGImpl=DEBUG
log4j.logger.org.apache.tez.dag.app.dag.impl.VertexImpl=DEBUG
log4j.logger.org.apache.tez.dag.app.dag.impl.TaskImpl=DEBUG
log4j.logger.org.apache.tez.dag.app.dag.impl.TaskAttemptImpl=DEBUG

# Async dispatcher event flow
log4j.logger.org.apache.tez.common.AsyncDispatcher=DEBUG

# Runtime task lifecycle
log4j.logger.org.apache.tez.runtime.task=DEBUG
log4j.logger.org.apache.tez.runtime.LogicalIOProcessorRuntimeTask=DEBUG

# Shuffle internals
log4j.logger.org.apache.tez.runtime.library.common.shuffle=DEBUG
log4j.logger.org.apache.tez.runtime.library.common.shuffle.impl.ShuffleManager=DEBUG
log4j.logger.org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Fetcher=DEBUG

# Scheduler
log4j.logger.org.apache.tez.dag.app.rm.TaskSchedulerManager=DEBUG
log4j.logger.org.apache.tez.dag.app.rm.YarnTaskSchedulerService=DEBUG

The most useful grep over the output chains two patterns: the first selects the state-machine classes, the second keeps only the state and event lines:

grep -E "VertexImpl|TaskImpl|TaskAttemptImpl" target/surefire-reports/*.txt \
  | grep -E "state|State|Event|EVENT"

That gives you the state-transition trace, which is what you'll diagram in Step 3.

Capturing container logs from MiniTezCluster

MiniTezCluster writes container logs (where your tasks' stderr/stdout end up) under the surefire working directory:

<module>/target/<test-class>-tmpDir/<application-id>/container-logs/

Or, in newer YARN versions:

<module>/target/MiniMRYarnCluster-localDir-nm-X_Y/usercache/<user>/appcache/<app>/container_*/

Find them with:

find ~/tez-src/tez-tests/target -name "syslog" -path "*container*" -mmin -30

Read syslog (TaskAttempt logs) and stderr (uncaught exceptions). The prelaunch.out and directory.info files explain what was actually launched.


Verify Determinism

Five runs. If even one is green, your reproducer is not deterministic yet — it is a coin flip you happen to have caught. Fix the race window before declaring victory.

cd ~/tez-src
for i in 1 2 3 4 5; do
  echo "=== Run $i ==="
  mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNNRepro -q 2>&1 \
    | tail -20
done

Expected output: five FAILs with the same assertion failure on the same line.

If you see 4 FAIL / 1 PASS:

  • Adding a Thread.sleep is the wrong answer. (Reread Step 6.)
  • Insert an explicit event ordering: drain the dispatcher between every event, inject the conflicting events as a Future you control.
  • Use CountDownLatch to gate the producer thread until the consumer is at a known state.
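The CountDownLatch gate from the list above can be sketched with the JDK alone. LatchGate and the event labels are hypothetical; the producer thread is started early but cannot emit its event until the consumer has reached a known state and released the latch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo of gating a producer thread on a consumer-side milestone. */
final class LatchGate {
  static List<String> run() {
    List<String> order = new CopyOnWriteArrayList<>();
    CountDownLatch consumerReady = new CountDownLatch(1);

    Thread producer = new Thread(() -> {
      try {
        consumerReady.await();            // gate: blocked until the consumer signals
        order.add("producer:event");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();                     // started first, but still cannot race ahead

    order.add("consumer:reached-known-state");
    consumerReady.countDown();            // open the gate exactly once

    try {
      producer.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return order;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Unlike a sleep, the latch encodes the ordering you need, so the interleaving is the same on a loaded CI machine as on your laptop.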

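If you cannot get to 5/5 fails, the bug may genuinely depend on external timing (network, GC). In that case, escalate to a stress-test pattern: run the inner test body 100 times in a loop inside one test method, count the failing iterations, and assert that the failure rate is above 50%. (JUnit 5's @RepeatedTest repeats a test but treats each repetition as a separate test, so it cannot assert on an aggregate rate, and the templates above use JUnit 4.) Less ideal, but acceptable for some shuffle race bugs.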
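A failure-rate harness for that stress-test pattern can be as small as the sketch below. FailureRate is a hypothetical helper; the body you pass in should run one attempt of your reproducer and return true when the bug showed up.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: measures how often a flaky reproducer actually reproduces. */
final class FailureRate {
  /** Runs `body` `runs` times; `body` returns true when the bug reproduced. */
  static double measure(int runs, BooleanSupplier body) {
    if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
    int failures = 0;
    for (int i = 0; i < runs; i++) {
      if (body.getAsBoolean()) failures++;
    }
    return (double) failures / runs;
  }

  public static void main(String[] args) {
    // Stand-in body that "reproduces" on three of every four attempts.
    int[] attempt = {0};
    double rate = measure(100, () -> (attempt[0]++ % 4) != 0);
    System.out.println(rate);
  }
}
```

Inside a JUnit 4 test you would call measure(100, ...) once and assert on the returned rate, so the test reports a single pass or fail instead of 100 independent results.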


Validation / Self-check

By the end of Step 2 you must have:

  1. A new test file under <module>/src/test/java/... named Test<Component>Tez<NNNN>Repro.java (the Repro suffix is for your workflow; you'll rename it to a real test name in Step 6).
  2. The test fails on a clean ~/tez-src/ at master with an assertion error (not a setup error, not a timeout — an assertion error).
  3. Five consecutive runs produce the same failure on the same line.
  4. The failure happens in under 120 seconds per run.
  5. A log4j.properties snippet in src/test/resources/ enabling debug logging on the relevant Tez packages.
  6. A captured log excerpt (paste into capstone-work/repro-logs.txt) showing the state-machine trace at the moment of failure.
  7. A one-paragraph description of the failure mode in your own words, saved to capstone-work/repro-summary.md. You will refine this into the root-cause document in Step 4.

Step 3: Execution Path Analysis

You have a failing test. Now you map the path the request takes from the moment TezClient.submitDAG() is called through every event, dispatcher hop, and state transition until the failure manifests. This map is the foundation for every hypothesis in Step 4. A wrong map produces a wrong root cause.

Budget: 2–4 evenings. The work is reading code, grep, and drawing.


The Canonical Submit Path

Every DAG that fails went through this skeleton path before it failed. Memorize it; you will use it as the reference axis when you sketch where your particular failure deviates.

TezClient.submitDAG(DAG)
    [tez-api/src/main/java/org/apache/tez/client/TezClient.java]
        |
        v
TezClient.submitDAGSession() or submitDAGApplication()
        |  (session vs. non-session — see TezClient.java for branch)
        v
DAGClientHandler.submitDAG(...)
    [tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java]
        |
        v
DAGAppMaster.submitDAGToAppMaster(...)
    [tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java]
        |
        v
DAGAppMaster.startDAG(...)
        |  - builds DAGImpl
        |  - emits DAGEventType.DAG_INIT
        v
AsyncDispatcher.dispatch(DAGEvent)
    [tez-common/src/main/java/org/apache/tez/common/AsyncDispatcher.java]
    (Tez's own Dispatcher implementation, modeled on the AsyncDispatcher in
     hadoop-yarn-common; confirm the path in your checkout with the greps below)
        |
        v
DAGImpl.handle(DAGEvent)
    [tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java]
        |  state NEW --DAG_INIT--> INITED
        |  emits DAGEventType.DAG_START
        v
DAGImpl on DAG_START
        |  state INITED --DAG_START--> RUNNING
        |  for each Vertex: emits VertexEvent V_INIT
        v
VertexImpl.handle(VertexEventType.V_INIT)
    [tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java]
        |  state NEW --V_INIT--> INITIALIZING
        |  invokes VertexManagerPlugin.initialize()
        |  on success emits V_INITED
        v
VertexImpl on V_INITED, then on V_START
        |  state INITIALIZING --V_INITED--> INITED
        |  state INITED --V_START--> RUNNING
        |  schedules tasks via TaskImpl events (T_SCHEDULE)
        v
TaskImpl.handle(T_SCHEDULE)
    [tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java]
        |  state NEW --T_SCHEDULE--> SCHEDULED
        |  spawns a TaskAttemptImpl, emits TA_SCHEDULE
        v
TaskAttemptImpl.handle(TA_SCHEDULE)
    [tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java]
        |  state NEW --TA_SCHEDULE--> START_WAIT
        |  requests container from TaskSchedulerManager
        v
TaskSchedulerManager / YarnTaskSchedulerService
    [tez-dag/src/main/java/org/apache/tez/dag/app/rm/]
        |  assigns container, emits TA_CONTAINER_LAUNCHED
        v
TaskAttemptImpl receives TA_CONTAINER_LAUNCHED
        |  state START_WAIT --TA_CONTAINER_LAUNCHED--> RUNNING
        |  the container is now actually running our task
        v
[ container process boots ]
TezTaskRunner2.run()
    [tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezTaskRunner2.java]
        |
        v
TezChild / TaskRunner instantiates LogicalIOProcessorRuntimeTask
        |
        v
LogicalIOProcessorRuntimeTask.run()
    [tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java]
        |  initializes Inputs, Outputs, Processor
        |  calls Processor.run(inputs, outputs)
        v
[ user code runs — e.g. OrderedWordCount or your DAG's processor ]
        |
        v
heartbeat -> TaskAttemptListener -> TaskAttemptImpl TA_DONE / TA_FAILED

That is the skeleton. Your job in this step is to find the segment where your failure occurs and draw it with line numbers.


Run These Greps

These greps locate the actual file paths and method bodies on your local clone. Run them in ~/tez-src/. Each one gives you a line number to open.

# Entry: submitDAG
grep -n "public.*submitDAG" \
  tez-api/src/main/java/org/apache/tez/client/TezClient.java

# Server-side intake
grep -n "submitDAG\|startDAG" \
  tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

# DAGImpl handlers
grep -nE "addTransition|stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | head -40

# VertexImpl state machine
grep -nE "addTransition|stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -60

# TaskImpl state machine
grep -nE "addTransition|stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head -60

# TaskAttemptImpl state machine
grep -nE "addTransition|stateMachineFactory" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskAttemptImpl.java | head -80

# Dispatcher (searches every AsyncDispatcher.java in the tree, whichever module holds it)
grep -rn --include=AsyncDispatcher.java "class AsyncDispatcher\|dispatch\b" .

# Runtime task entry
grep -n "public void run\|class TezTaskRunner2" \
  tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezTaskRunner2.java

grep -n "public void run\|initialize\|class LogicalIOProcessorRuntimeTask" \
  tez-runtime-internals/src/main/java/org/apache/tez/runtime/LogicalIOProcessorRuntimeTask.java

Open each line in your editor. Read the transition table. Note which event you care about and which state(s) it is legal in.
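"Which state(s) is this event legal in" is a lookup over the transition table. The sketch below models that lookup with plain enums. ToyTransitionTable, its states, and its events are a hypothetical, heavily simplified stand-in for what the addTransition() calls build; the real tables have many more states and multi-target transitions.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of a state-machine transition table. */
final class ToyTransitionTable {
  enum State { NEW, INITIALIZING, INITED, RUNNING }
  enum Event { V_INIT, V_INITED, V_START, V_TASK_COMPLETED }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);

  ToyTransitionTable add(State from, Event on, State to) {
    table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    return this;
  }

  /** Every state that has a registered transition for the given event. */
  Set<State> legalStatesFor(Event event) {
    Set<State> legal = EnumSet.noneOf(State.class);
    table.forEach((from, byEvent) -> {
      if (byEvent.containsKey(event)) legal.add(from);
    });
    return legal;
  }

  static ToyTransitionTable sample() {
    return new ToyTransitionTable()
        .add(State.NEW, Event.V_INIT, State.INITIALIZING)
        .add(State.INITIALIZING, Event.V_INITED, State.INITED)
        .add(State.INITED, Event.V_START, State.RUNNING)
        .add(State.RUNNING, Event.V_TASK_COMPLETED, State.RUNNING);
  }

  public static void main(String[] args) {
    // In this toy table a task completion is only legal once the vertex is RUNNING.
    System.out.println(sample().legalStatesFor(Event.V_TASK_COMPLETED));
  }
}
```

When you read the real table, build the same answer by hand: list every source state that registers your event, then check whether the state your probe trace shows is on that list.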


Locate Your Specific Failure Segment

The skeleton is the highway; your bug is at one specific exit. Use these heuristics:

| Symptom in repro logs | Likely segment |
|---|---|
| VertexImpl ... transitioned from RUNNING to FAILED | VertexImpl state machine: transition on V_TASK_RESCHEDULED or V_INTERNAL_ERROR |
| TaskAttemptImpl ... NPE | TaskAttemptImpl event handlers; check container-launched and TA_DONE paths |
| NPE in AsyncDispatcher.dispatch | Race between dispatcher start/stop and event submission |
| ShuffleManager: too many fetch failures | Fetcher retry/timeout; ShuffleManager.fetchFailure() |
| IFile checksum mismatch | IFile.Writer/Reader; check spill+merge |
| OutOfMemory ... GROUP_COMPARATOR | MergeManager memory math; ifile spill thresholds |
| Container released before TA_DONE | TaskSchedulerManager reuse path; check container release races |

Once you know your segment, draw it.


Build the Path Diagram

Two formats. Do both — they validate each other.

Text-arrow form (paste into the root-cause doc)

Use this in JIRA comments and PR descriptions. It survives any rendering.

TezClient.submitDAG (TezClient.java:485)
  -> DAGClientHandler.submitDAG (DAGClientHandler.java:152)
  -> DAGAppMaster.startDAG (DAGAppMaster.java:1234)
  -> DAGImpl NEW --DAG_INIT--> INITED (DAGImpl.java:340)
  -> DAGImpl INITED --DAG_START--> RUNNING (DAGImpl.java:380)
  -> VertexImpl v1 NEW --V_INIT--> INITIALIZING (VertexImpl.java:1820)
  -> VertexImpl v1 INITIALIZING --V_INITED--> INITED (VertexImpl.java:1856)
  -> VertexImpl v1 INITED --V_START--> RUNNING (VertexImpl.java:1901)
  -> [21 TaskImpl T_SCHEDULE events fired]
  -> TaskImpl t0 NEW --T_SCHEDULE--> SCHEDULED (TaskImpl.java:412)
  -> TaskAttemptImpl t0.0 NEW --TA_SCHEDULE--> START_WAIT (TaskAttemptImpl.java:560)
  -> [container assigned]
  -> TaskAttemptImpl t0.0 START_WAIT --TA_CONTAINER_LAUNCHED--> RUNNING (...:610)
  -> [container starts LogicalIOProcessorRuntimeTask]
  -> ShuffleManager.run starts fetcher loop
  -> Fetcher.fetchNext throws IOException (Fetcher.java:289)  <-- FAILURE HERE
  -> ShuffleManager.fetchFailure -> InputReadErrorEvent
  -> TaskAttemptImpl t0.0 RUNNING --TA_FAILED--> FAILED

Cite real line numbers from your checkout. Future-you will thank you.

Mermaid diagram (for the write-up and PR)

sequenceDiagram
    participant C as Client
    participant AM as DAGAppMaster
    participant D as DAGImpl
    participant V as VertexImpl v1
    participant T as TaskImpl t0
    participant TA as TaskAttempt t0.0
    participant SM as ShuffleManager
    participant F as Fetcher

    C->>AM: submitDAG
    AM->>D: DAG_INIT
    D->>D: NEW -> INITED
    AM->>D: DAG_START
    D->>V: V_INIT
    V->>V: NEW -> INITIALIZING -> INITED
    D->>V: V_START
    V->>T: T_SCHEDULE
    T->>TA: TA_SCHEDULE
    TA->>TA: NEW -> START_WAIT
    Note over TA: container assigned + launched
    TA->>TA: START_WAIT -> RUNNING
    TA->>SM: shuffle starts
    SM->>F: fetchNext
    F-->>SM: IOException
    SM->>TA: InputReadErrorEvent (TA_FAILED)
    TA->>TA: RUNNING -> FAILED

Both diagrams say the same thing. Together they pass review with a committer because they prove you actually read the code instead of paraphrasing the JIRA.


Verify Empirically with Temporary LOG.info() Probes

The map is a hypothesis. Confirm it with probes. Add temporary logging at the points you think your event traverses. Pattern:

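// In VertexImpl.java, reuse the class-level logger the class already declares
// (do not redeclare it inside a method):
//   private static final Logger LOG = LoggerFactory.getLogger(VertexImpl.class);
// Then, inside the handler you suspect, with NNNN replaced by your JIRA number:
LOG.info("PROBE-TEZNNNN: V_INIT entered for vertex={} state={}",
    getName(), getState());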

Rules for probes:

  • Prefix every probe with PROBE-TEZ<NNNN> so you can grep them in one pass and delete in one pass.
  • Use LOG.info not LOG.debug so they appear without changing log config.
  • Include the field values you care about (state, event type, IDs).
  • Never commit probes. They are scaffolding for Step 4.

After re-running your test:

mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNNRepro -q 2>&1 \
  | grep "PROBE-TEZNNNN" | tee /tmp/probe-trace.txt

Compare the probe trace to your diagram. Discrepancies are the most valuable output of this whole step — they are exactly where your mental model differs from the code.

Common discrepancies to watch for:

  • "I thought this handler ran once. It ran three times." (Re-entrancy bug.)
  • "I thought events arrived in order A,B,C. They arrived B,A,C." (Async dispatch reordering.)
  • "I thought the vertex was in RUNNING. It was in INITED." (Wrong assumption about state at the time of the event.)

When a probe surprises you, do not delete the probe. Lean in. That is the shortest path to root cause.
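Comparing the probe trace to the diagram is mechanical once both are lists of event names. A small sketch, assuming you have reduced each side to one label per line; TraceDiff is a hypothetical helper, not part of any Tez tooling.

```java
import java.util.List;

/** Hypothetical helper: finds where the observed probe trace leaves the expected path. */
final class TraceDiff {
  /** Returns the index of the first differing position, or -1 if the traces are identical. */
  static int firstDivergence(List<String> expected, List<String> observed) {
    int common = Math.min(expected.size(), observed.size());
    for (int i = 0; i < common; i++) {
      if (!expected.get(i).equals(observed.get(i))) return i;
    }
    // Same prefix: they diverge where the shorter one ends, unless the lengths match.
    return expected.size() == observed.size() ? -1 : common;
  }

  public static void main(String[] args) {
    List<String> diagram = List.of("V_INIT", "V_START", "V_TASK_COMPLETED");
    List<String> probes = List.of("V_INIT", "V_TASK_COMPLETED", "V_START");
    System.out.println(firstDivergence(diagram, probes));   // position of the first surprise
  }
}
```

The returned index tells you which probe to read first; everything after that point in the trace is downstream of the first wrong assumption.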


Output

Your Step 3 deliverables live in capstone-work/execution-path/:

  • path-skeleton.md — text-arrow form with line numbers.
  • path.mmd — the mermaid source.
  • probe-trace.txt — grep output from the probe run.
  • notes.md — three to five surprises you found while reading.

Validation / Self-check

Before you advance to Step 4, you must:

  1. Be able to name, from memory, every state transition between TezClient.submitDAG() and your failure point.
  2. Have file:line citations for every transition in your diagram, against your ~/tez-src/ HEAD.
  3. Have run the repro with PROBE-TEZ<NNNN> log statements and confirmed the sequence matches your diagram (or, more usefully, noted where it diverges).
  4. Have removed every probe from your working tree before any commit (git diff should not contain "PROBE-").
  5. Have at least one "surprise" noted in notes.md — if you have zero, you did not look hard enough.
  6. Be able to answer: "Which event, in which state, on which class, fires the handler that produces the failure?" in one sentence.
  7. Have the mermaid diagram render without syntax errors (mdbook serve your capstone-work folder, or paste into mermaid.live).

Step 4: Root Cause Identification

A symptom is "the test fails." A root cause is "this specific line, in this specific state, when this specific event arrives, performs this specific incorrect operation, because of this specific design assumption that no longer holds." If your statement does not have that shape, you have not found root cause yet.

This step is mostly thinking. The tools are five-whys, git blame, and git bisect. The output is a 200–500 word root-cause document and a tested hypothesis.


Five Whys, Applied to a State-Machine Race

The five-whys technique sounds trite. It is not. The discipline of asking "why" five times in a row forces you past the first plausible explanation (almost always wrong) and into the actual design defect (almost always two or three levels deeper than you initially thought).

Worked example: vertex stays in RUNNING after all tasks succeed

Symptom from Step 2: assertEquals(SUCCEEDED, vertex.getState()) fails with expected SUCCEEDED but was RUNNING. Repro is deterministic at 5/5.

Why 1: Why is the vertex still in RUNNING?

Because the transition to SUCCEEDED requires all tasks to have completed AND the vertex's completion handler to have been invoked. Looking at the probe trace from Step 3, the completion handler was invoked. So the transition was attempted.

Why 2: Why did the transition not happen even though the handler ran?

Because the handler returned a new state that depends on a counter (completedTaskCount). The probe shows completedTaskCount = 19 when the handler ran, but the vertex has 20 tasks. So the guard says "not done yet."

Why 3: Why is the count 19 when all 20 task-completed events were fired?

Because the count is incremented inside the handler, AFTER a check that re-routes certain V_TASK_COMPLETED events back through another handler. The re-route fires for the 20th task (look at VertexImpl.java around line 2750 — the if (recoveryData != null) branch). The re-routed event is queued but the test's dispatcher.await() returns before the queue is fully drained.

Why 4: Why does dispatcher.await() return before the re-routed event is processed?

Because AsyncDispatcher.await() waits for the current queue to drain, but the re-route enqueues into a secondary queue (the recovery dispatcher) which is not joined by the primary await.

Why 5: Why are there two dispatchers, and why does the test only await one?

Because recovery events were added in TEZ-2877 as a separate dispatch path to avoid blocking the main event loop during recovery replay. The test setup predates that change. The test never knew there was a second queue to wait on.

Root cause statement: The 20th V_TASK_COMPLETED event is enqueued into the recovery dispatcher rather than handled directly when recoveryData != null, and the test (and any caller relying on the primary dispatcher having drained) observes a stale completedTaskCount. The fix is either to (a) join the recovery dispatcher in await(), (b) handle the recovery-data branch synchronously when not actually replaying recovery, or (c) document that callers must use a different barrier.

That is a root cause. The fix direction is now obvious-ish. You can argue between (a), (b), (c) — but you know what each one changes.
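The mechanism in this worked example can be modeled in a few lines, which is a useful sanity check before you touch the real code. TwoQueueModel is a hypothetical single-threaded model: the two queues stand in for the primary and recovery dispatchers of the example, and the names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of the worked example: a barrier that drains only one of two queues. */
final class TwoQueueModel {
  private final Queue<Runnable> primary = new ArrayDeque<>();
  private final Queue<Runnable> secondary = new ArrayDeque<>();  // stand-in "recovery" queue
  int completedTaskCount = 0;

  /** Enqueues a task-completed event; a re-routed one is counted only via the second queue. */
  void taskCompleted(boolean reroute) {
    primary.add(() -> {
      if (reroute) {
        secondary.add(() -> completedTaskCount++);
      } else {
        completedTaskCount++;
      }
    });
  }

  void awaitPrimaryOnly() {
    drain(primary);
  }

  void awaitBoth() {
    while (!primary.isEmpty() || !secondary.isEmpty()) {
      drain(primary);
      drain(secondary);
    }
  }

  private static void drain(Queue<Runnable> queue) {
    Runnable next;
    while ((next = queue.poll()) != null) next.run();
  }

  /** Returns {count seen after the partial barrier, count seen after the full barrier}. */
  static int[] demo() {
    TwoQueueModel model = new TwoQueueModel();
    for (int i = 0; i < 20; i++) {
      model.taskCompleted(i == 19);      // only the 20th completion is re-routed
    }
    model.awaitPrimaryOnly();
    int stale = model.completedTaskCount;
    model.awaitBoth();
    return new int[] {stale, model.completedTaskCount};
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println(counts[0] + " then " + counts[1]);
  }
}
```

If a model this small reproduces the stale count, the mechanism in your root-cause statement is at least self-consistent; whether the real code behaves the same way is what the probes and the revert experiment establish.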


Git Archaeology

Once you have a candidate cause, ask: when did this break? And why did the person who wrote it think it was correct?

git log --follow -p -S<token>

Find every commit that introduced or removed a specific string or method name:

cd ~/tez-src

# Every commit that touched the recovery dispatcher branch
git log --follow -p -S "recoveryData != null" \
  -- tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

# Every commit that mentions the counter
git log --follow -p -S "completedTaskCount" \
  -- tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

# The original change that added recovery dispatching
git log --all --grep="TEZ-2877" --oneline

-S ("pickaxe") matches commits where the count of that string changed — either added or removed. It is the single most powerful git command in this entire chapter. Learn it.
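The "count changed" rule has a consequence worth internalizing: a commit that only moves the string within a file does not match, because the number of occurrences is unchanged. A small model of that rule, with Pickaxe as a hypothetical helper name (this imitates the matching rule only; it does not call git):

```java
/** Hypothetical model of the -S matching rule: occurrence count before vs. after. */
final class Pickaxe {
  /** Counts non-overlapping occurrences of `token` in `text`. */
  static int count(String text, String token) {
    if (token.isEmpty()) throw new IllegalArgumentException("token must be non-empty");
    int occurrences = 0;
    int from = text.indexOf(token);
    while (from >= 0) {
      occurrences++;
      from = text.indexOf(token, from + token.length());
    }
    return occurrences;
  }

  /** True when a change from `before` to `after` alters how often `token` occurs. */
  static boolean matches(String before, String after, String token) {
    return count(before, token) != count(after, token);
  }

  public static void main(String[] args) {
    String token = "recoveryData != null";
    String added = "if (recoveryData != null) {\nfoo();\n";
    String moved = "foo();\nif (recoveryData != null) {\n";
    System.out.println(matches("foo();\n", added, token));  // string introduced
    System.out.println(matches(added, moved, token));       // string only moved
  }
}
```

When a move or an in-place edit of the surrounding line is what you are hunting, -G with a regex matches on changed lines instead of on counts.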

git blame -L <start>,<end>

Once you know the file and lines, find the commit and committer:

git blame -L 2740,2770 \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java

Output looks like:

a1b2c3d4 (Alice 2018-04-12 09:34:18 -0700 2745)     if (recoveryData != null) {
a1b2c3d4 (Alice 2018-04-12 09:34:18 -0700 2746)       handleRecovery(event);
a1b2c3d4 (Alice 2018-04-12 09:34:18 -0700 2747)       return;
a1b2c3d4 (Alice 2018-04-12 09:34:18 -0700 2748)     }

Then read the commit:

git show a1b2c3d4
git log -1 --format="%B" a1b2c3d4

Look for the JIRA reference in the commit message (TEZ-NNNN: ...). Open that JIRA. Read every comment. Often you will discover:

  • The change was made to fix a different bug (recovery correctness) and introduced your bug as collateral.
  • There was a comment on the original JIRA flagging the exact concern you are hitting. ("This might race with the test dispatcher pattern" — and it did.)
  • The fix you are considering was discussed and rejected for a reason you must now address.

git bisect for Regressions

If the bug is a regression — works in 0.9.x, broken in 0.10.x — bisect tells you the exact commit that introduced it. This is the highest-confidence signal in all of root-cause work.

cd ~/tez-src
git bisect start
git bisect bad master
git bisect good rel/release-0.9.2

# git checks out a midpoint commit. Build and run the repro:
mvn install -DskipTests -pl tez-dag -am -q
mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNNRepro -q

# Mark the commit with exactly ONE of the following, depending on the result.
# Test FAILS at this commit: the bug is already present here.
git bisect bad
# Test PASSES at this commit: the bug was introduced later.
git bisect good

# Repeat. git narrows to one commit in log2(N) steps.

Once bisect converges:

a1b2c3d4 is the first bad commit
commit a1b2c3d4
Author: Alice <alice@example.org>
Date:   Thu Apr 12 09:34:18 2018
    TEZ-2877: Add recovery dispatcher path

Now you know:

  • The JIRA that introduced the regression.
  • The author (potential reviewer for your fix — Cc them).
  • The exact diff to study.

Automating bisect with git bisect run <script> is also fair game once you have a return-code-clean reproducer command.
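To get a feel for why bisect converges so quickly, here is the search it performs, reduced to a linear history. BisectModel is a hypothetical simplification: real history is a graph and git picks midpoints accordingly, but the halving argument is the same.

```java
import java.util.function.IntPredicate;

/** Hypothetical model of bisect over a linear history of numbered commits. */
final class BisectModel {
  /**
   * Precondition: commit `knownGood` passes, commit `knownBad` fails, and every
   * commit after the first bad one also fails.
   * Returns {index of first bad commit, number of commits tested}.
   */
  static int[] firstBad(int knownGood, int knownBad, IntPredicate isBad) {
    int good = knownGood;
    int bad = knownBad;
    int tested = 0;
    while (bad - good > 1) {
      int mid = good + (bad - good) / 2;
      tested++;                          // one build plus one repro run
      if (isBad.test(mid)) {
        bad = mid;                       // bug already present: look earlier
      } else {
        good = mid;                      // bug not yet present: look later
      }
    }
    return new int[] {bad, tested};
  }

  public static void main(String[] args) {
    // 1024 commits between the known-good and known-bad points; commit 700 broke it.
    int[] result = firstBad(0, 1024, commit -> commit >= 700);
    System.out.println("first bad = " + result[0] + ", builds = " + result[1]);
  }
}
```

Each tested commit costs one build and one repro run, which is why a fast, return-code-clean reproducer matters more here than anywhere else in the workflow.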


Writing the Root-Cause Statement

This document goes into your JIRA, into your PR description, and into your write-up. 200–500 words, no more, no less. Use this template:

## Root cause: TEZ-NNNN

### Symptom
<one sentence — what the user sees>

### Trigger conditions
- <condition 1, e.g. recovery data is non-null when V_TASK_COMPLETED fires>
- <condition 2, e.g. only on the last task in a vertex>
- <condition 3 if any>

### Affected code
- `tez-dag/src/main/java/.../VertexImpl.java#L2745-L2748` (the recovery branch)
- `tez-dag/src/main/java/.../AsyncDispatcher.java#L210` (`await()` does not
  join the secondary queue)

### Mechanism
<three to five sentences explaining the actual defect. Use words like "because",
"as a result", "however". This is the part most people get wrong — they describe
the symptom again instead of the mechanism. The mechanism answers: of the
many ways this code could have been written, why does the current way produce
this wrong answer?>

### Introducing change
- TEZ-2877 (commit a1b2c3d4) added the recovery-dispatch branch without
  updating `AsyncDispatcher.await()` to join the recovery queue.
- The original JIRA flagged this as a concern (link to comment) but the
  resolution was deferred ("we don't await in production paths, only in
  tests").

### Fix direction
Three options considered:

1. **Join the recovery dispatcher in `await()`.** Smallest change. Risk: may
   slow recovery in production if a slow recovery handler blocks the await.
2. **Handle the recovery branch synchronously when not replaying.** Larger
   change, narrower blast radius. Recommended.
3. **Document that tests must use a new barrier.** Cheapest. Pushes burden
   onto every test author. Rejected.

Recommended: option 2. See Step 5 for the diff.

Save as capstone-work/root-cause.md.


Validating the Hypothesis

A root cause is not validated until you have demonstrated it. Two ways:

1. Revert the introducing commit and re-run the repro

git checkout master
git revert --no-commit a1b2c3d4   # introducing commit from bisect
mvn install -DskipTests -pl tez-dag -am -q
mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNNRepro -q

If the test now PASSES (because the change you reverted is what introduced the bug), your root cause is at least partially correct. If it still FAILS, the introducing commit is not the root cause — there is a deeper issue.

Reset before you go any further:

git reset --hard origin/master

2. Make a minimal one-line "patch" that confirms the mechanism

You are not writing the real fix yet. You are confirming the mechanism. For the example above:

--- a/tez-dag/.../VertexImpl.java
+++ b/tez-dag/.../VertexImpl.java
@@ -2745,3 +2745,3 @@
-    if (recoveryData != null) {
+    if (recoveryData != null && isReplayingRecovery()) {
       handleRecovery(event);
       return;
     }

(Assume isReplayingRecovery() does not exist yet — pretend it returns false in tests, true only during actual recovery replay.) Apply this, re-run the repro. If it passes, the mechanism is confirmed even if the actual API does not exist yet.

If the test still fails: your mechanism is wrong. Go back to the five-whys.

If the test now passes but breaks 14 other tests: your fix direction is too broad. Go back to "fix direction" in the root-cause statement and pick a narrower option.


Validation / Self-check

Before advancing to Step 5:

  1. capstone-work/root-cause.md exists, follows the template, is 200–500 words.
  2. You can name the introducing commit (full SHA) and JIRA.
  3. You ran git bisect to convergence (or proved bisect doesn't apply because the bug existed since the file was first added — note this in the doc).
  4. You ran a "revert introducing commit" experiment and saw the test go green (or have a documented reason the revert doesn't apply).
  5. You wrote a one-line throwaway "mechanism confirmation" patch and saw the test pass on it.
  6. You have read every comment on the introducing JIRA.
  7. You can articulate three fix directions and explain why you rejected two of them in one sentence each.

Step 5: Implementation

Your fix is the smallest diff that makes the failing test pass without breaking any other test. Period. Anything else — a refactor you noticed, a TODO you want to address, a better name for a field — belongs in a separate JIRA, not this PR.

Committer reviewers' single biggest objection to first-time contributors is scope creep. The second is API hygiene. This chapter is about both.


Minimum-Diff Principle

The fix should change as few lines as possible while addressing the root cause identified in Step 4. Everything that survives compilation but is not strictly required to fix the bug is review surface area. Review surface area is the enemy of "merged this week."

Too much

-  public void handleVertexCompleted(VertexEvent event) {
-    if (recoveryData != null) {
-      handleRecovery(event);
-      return;
-    }
-    completedTaskCount++;
-    if (completedTaskCount == numTasks) {
-      transitionToSucceeded();
-    }
-  }
+  // Refactored to use Optional for clarity
+  public void handleVertexCompleted(final VertexEvent event) {
+    Optional.ofNullable(recoveryData)
+      .filter(rd -> isReplayingRecovery())
+      .ifPresentOrElse(
+          rd -> handleRecovery(event),
+          () -> {
+            this.completedTaskCount = this.completedTaskCount + 1;
+            this.maybeTransitionToSucceeded();
+          });
+  }
+
+  private void maybeTransitionToSucceeded() {
+    if (completedTaskCount == numTasks) {
+      transitionToSucceeded();
+    }
+  }

This will be rejected. You changed five things (Optional chain, final keyword, method extraction, control-flow shape, formatting). A committer cannot tell which change is the actual fix without re-deriving the root cause from scratch.

Just right

   public void handleVertexCompleted(VertexEvent event) {
-    if (recoveryData != null) {
+    if (recoveryData != null && isReplayingRecovery()) {
       handleRecovery(event);
       return;
     }
     completedTaskCount++;
     if (completedTaskCount == numTasks) {
       transitionToSucceeded();
     }
   }

One line. The change matches the root-cause statement verbatim. A reviewer reads it, opens the root-cause doc, agrees in 30 seconds.

The extracted helper, the final keyword, the Optional rewrite: all may be good ideas. File them as separate JIRAs after this lands.

The Boy Scout rule does NOT apply

In a green-field project, "leave the campground cleaner than you found it" is fine. In Apache project review, drive-by cleanups block your fix because they expand the review and trigger objections you do not need to deal with to land the actual bug fix. Resist the urge.


Where Does the Fix Go? A Decision Tree

Is the bug a check that should have rejected an input but didn't?
    -> Guard condition (likely in a setter or builder).
       Example: TezConfiguration.validate(), DAG.verify().

Is the bug a wrong state machine transition?
    -> State-machine transition table edit.
       Look for stateMachineFactory.addTransition() in the affected *Impl class.
       The fix is usually adding/removing a transition or changing its target state.

Is the bug a config key being read at the wrong place or with the wrong default?
    -> Config validation in the constructor of the class that reads it.
       Or a fix to where conf.get() / conf.getInt() is called.

Is the bug a logic error in business code (wrong arithmetic, wrong comparator,
missing close())?
    -> Logic bug. Fix is local to the offending method.
       Add a test that asserts the corrected behavior.

Is the bug a race?
    -> First, prove it is actually a race with DrainDispatcher. Most "races"
       turn out to be logic bugs that *look* race-y because event ordering
       is non-obvious.
    -> If genuinely a race: usually a missing dispatcher.await, a missing
       volatile, or a transition guard that isn't atomic with a counter
       increment. Synchronize the smallest critical section.

Is the bug a memory issue (OOM, off-heap leak)?
    -> Almost never in scope for a first Capstone. Pause and consult a committer.
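
The race branch above mentions a transition guard that is not atomic with a counter increment. The following is a minimal plain-Java sketch of the smallest fix for that shape of bug, assuming the counter can be touched from more than one thread. CompletionCounter and its methods are hypothetical names for illustration, not Tez classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a vertex's task-completion bookkeeping. */
final class CompletionCounter {
  private final int numTasks;
  private final AtomicInteger completed = new AtomicInteger();

  CompletionCounter(int numTasks) {
    this.numTasks = numTasks;
  }

  /**
   * Records one completion and reports whether this call reached numTasks.
   * incrementAndGet() makes the increment and the read one atomic step,
   * so exactly one caller observes the final count.
   */
  boolean recordCompletionAndCheckDone() {
    return completed.incrementAndGet() == numTasks;
  }

  /** Spreads numTasks completions over several threads; returns how many calls saw "done". */
  static int countDoneSignals(int numTasks, int threads) {
    CompletionCounter counter = new CompletionCounter(numTasks);
    AtomicInteger doneSignals = new AtomicInteger();
    int share = numTasks / threads; // assumes numTasks is divisible by threads
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < share; i++) {
          if (counter.recordCompletionAndCheckDone()) {
            doneSignals.incrementAndGet();
          }
        }
      });
      workers[t].start();
    }
    try {
      for (Thread w : workers) {
        w.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return doneSignals.get();
  }
}
```

With a separate `count++` followed by `if (count == numTasks)`, two threads can both miss or both hit the final value. Folding the two into one atomic call keeps the critical section as small as it can be.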

Configuration Keys: The Right Way

You will be tempted to "add a knob": a new tez.foo.bar flag that defaults to the old (buggy) behavior and lets users opt in to the fix. Resist. Knobs are an admission that you don't trust your fix. If your fix is correct, it should be the new default; if it isn't, fix the fix, not the user's configuration burden.

When a knob IS justified:

  • The fix changes a performance-sensitive default that may regress some users.
  • The fix changes user-visible output format (release-note required).
  • The fix is gated on a long-deprecation window and the old behavior must remain available for one or two releases.

When you DO add a key, conform to Tez convention. Read:

grep -n "TEZ_AM\|TEZ_TASK\|TEZ_RUNTIME" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java \
  | head -40

You will see the pattern:

/**
 * Maximum number of times an AM can attempt to launch a task before failing
 * the task.
 * <p>
 * Default: {@link #TEZ_AM_TASK_MAX_FAILED_ATTEMPTS_DEFAULT}.
 *
 * @since 0.9.0
 */
@ConfigurationScope(Scope.AM)
@ConfigurationProperty(type = "integer")
public static final String TEZ_AM_TASK_MAX_FAILED_ATTEMPTS =
    TEZ_AM_PREFIX + "task.max.failed.attempts";
public static final int TEZ_AM_TASK_MAX_FAILED_ATTEMPTS_DEFAULT = 4;

Mandatory elements for any new key:

  • Javadoc that explains what the knob does and when to change it.
  • @since X.Y.Z matching the next release version.
  • @ConfigurationScope (AM, VERTEX, TASK, CLIENT).
  • @ConfigurationProperty(type = "integer" / "long" / "boolean" / "string").
  • A _DEFAULT constant alongside.
  • Use the right prefix constant (TEZ_AM_PREFIX, TEZ_RUNTIME_PREFIX, etc.).
  • Add to tez-api/src/main/resources/META-INF/services/... if the doc-gen needs to pick it up (check existing keys to see if their config-doc generator catches up automatically or needs manual entries).

A new key that violates any of these will fail review.
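
Declaring the key is half the job. The class that reads it still has to apply the default and reject nonsensical values with a message that names the key. The sketch below shows that shape in plain Java. It uses a Map<String, String> as a stand-in for the real Configuration object, and ExampleKeys, EXAMPLE_RETRY_LIMIT, and the "tez.am." prefix literal are hypothetical names chosen for illustration. In Tez the constants would live in TezConfiguration with the annotations and Javadoc listed above.

```java
import java.util.Map;

/** Hypothetical key holder; a Map stands in for the real Configuration. */
final class ExampleKeys {
  static final String PREFIX = "tez.am."; // stand-in for the TEZ_AM_PREFIX constant
  static final String EXAMPLE_RETRY_LIMIT = PREFIX + "example.retry.limit"; // hypothetical key
  static final int EXAMPLE_RETRY_LIMIT_DEFAULT = 4;

  private ExampleKeys() {}

  /** Reads the key, falls back to the default, and rejects bad values with context. */
  static int readRetryLimit(Map<String, String> conf) {
    String raw = conf.get(EXAMPLE_RETRY_LIMIT);
    if (raw == null) {
      return EXAMPLE_RETRY_LIMIT_DEFAULT;
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          EXAMPLE_RETRY_LIMIT + " must be an integer, got '" + raw + "'", e);
    }
    if (value < 1) {
      throw new IllegalArgumentException(
          EXAMPLE_RETRY_LIMIT + " must be >= 1, got " + value);
    }
    return value;
  }
}
```

Validating once, where the value is read, means a bad setting fails at startup with the key name in the message instead of surfacing later as an unrelated exception.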


Tez Coding Style

Read the existing class you are editing. Match its style exactly. The project-wide rules below are necessary but not sufficient — the file-local conventions matter just as much.

Logging

Always slf4j, never log4j directly, never System.out:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private static final Logger LOG = LoggerFactory.getLogger(VertexImpl.class);

LOG.info("Vertex {} transitioned from {} to {} on event {}",
    getName(), oldState, newState, event.getType());

Use {} parameterization, never string concatenation in log args. Use the exception form LOG.error("Failed to schedule task {}", taskId, ex) rather than concatenating ex.toString().

Preconditions

Tez uses Guava Preconditions heavily. Use it for invariants and argument checks:

import com.google.common.base.Preconditions;

Preconditions.checkNotNull(event, "event must not be null");
Preconditions.checkArgument(parallelism > 0,
    "parallelism must be positive, got %s for vertex %s", parallelism, vertexName);
Preconditions.checkState(getState() == VertexState.RUNNING,
    "Vertex %s must be RUNNING to receive %s, was %s",
    getName(), event.getType(), getState());

The variadic %s form is preferable to string concatenation because the message is only formatted when the check fails; a passing check pays at most for boxing and the varargs array.
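
To make the cost argument concrete, here is a small plain-Java sketch of a Preconditions-style check that counts how often a message is actually built. LazyCheck is a hypothetical stand-in written for this illustration; it is not Guava's implementation and uses String.format where Guava has its own lenient formatter.

```java
/** Hypothetical Preconditions-style helper; counts real message formatting. */
final class LazyCheck {
  static int formatCalls = 0; // incremented only when a message is built

  private LazyCheck() {}

  static void checkArgument(boolean ok, String template, Object... args) {
    if (!ok) {
      formatCalls++;
      throw new IllegalArgumentException(String.format(template, args));
    }
    // Passing check: the template and arguments are never turned into a String.
  }
}
```

By contrast, a call such as checkArgument(ok, "got " + parallelism + " for " + vertexName) builds the string on every invocation, whether or not the check fails.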

Exception messages

Always include the context: which vertex, which task ID, which state, which event. Diagnosing a Tez bug from a stack trace alone is hard enough; an exception message that just says "invalid state" is hostile.

Bad:

throw new IllegalStateException("invalid state");

Good:

throw new IllegalStateException(String.format(
    "Vertex %s received event %s in state %s, which is not legal. "
        + "Expected one of [RUNNING, INITED].",
    getName(), event.getType(), getState()));

Forbidden

  • System.out.println / System.err.println (use LOG).
  • e.printStackTrace() (use LOG.error("...", e)).
  • Thread.sleep in production code unless you have a // TEZ-NNNN: justification comment AND a committer agreed in review.
  • New synchronized methods on hot paths — discuss in the JIRA before adding.
  • Adding new dependencies to pom.xml without discussion. This is a major re-review trigger.

Imports

  • No wildcard imports (import foo.bar.*;). The project's checkstyle catches these and you will fail precommit.
  • Group order: java, javax, org, com, third-party, project. Most IDEs handle this automatically.

Tests

Discussed fully in Step 6, but: every fix must come with at least one test that fails on master and passes with your fix. No test, no merge.


Building Incrementally

Do not try to write the whole fix and run the whole test suite. That feedback loop is too slow. Instead:

# Tight loop: compile + run only the changed module's affected test.
mvn install -DskipTests -pl tez-api,tez-common -am -q && \
  mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNNRepro -q

# When that goes green, broaden:
mvn test -pl tez-dag -Dtest='TestVertex*' -q

# Finally, full module:
mvn test -pl tez-dag -q

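If your fix touches tez-api, you have to rebuild every downstream module. Mind the direction of the Maven flags: -am ("also make") builds the upstream modules that the -pl list depends on, while -amd ("also make dependents") builds the downstream modules that depend on it. After a tez-api change, mvn install -DskipTests -pl tez-api -amd is the form that rebuilds the dependents.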


When You Get Stuck

Hard rule: if you have not made forward progress in three sessions, post on the JIRA. Format:

Status update: I have the repro from Step 2 passing/failing as expected. My
working hypothesis is <one-sentence>. I have tried:

1. <approach A> — does not work because <observed result>.
2. <approach B> — does not work because <observed result>.

I am unsure whether to (a) <option a> or (b) <option b>. The constraint I am
trying to satisfy is <invariant>. If anyone has context on whether <approach C>
was considered for a related JIRA, please share.

Reproducer is at <link to gist or branch>.

This is not failure. This is community engagement done right. Committers respect contributors who ask sharp questions with context attached. They ignore contributors who ask "any update?" or "can you help?"


Validation / Self-check

Before advancing to Step 6:

  1. Your fix is committed to your branch as a single commit with the title TEZ-NNNN: <short summary> and a body that references the root-cause document.
  2. git diff origin/master --stat shows the smallest plausible diff (single digit files changed, double-digit lines at most for a typical bug fix).
  3. The diff contains zero unrelated changes (no formatting-only changes, no import reordering not caused by your edit, no Javadoc cleanups in methods you didn't touch).
  4. mvn install -DskipTests -pl <changed-module> -am -q succeeds.
  5. The Step 2 reproducer test now passes (you'll generalize the test in Step 6 — the repro itself is still the gating signal).
  6. If you added a TezConfiguration key, it has all required annotations, Javadoc, _DEFAULT constant, and @since tag.
  7. You have re-read your diff line by line and convinced yourself every line change is required by the root cause. Strike anything that isn't.

Step 6: Testing

Your reproducer from Step 2 is the minimum — it proves the bug existed. The tests in this step prove that the fix is correct, that it stays correct, and that the next person who edits this code path will notice if they break it again. A good test suite is the most durable artifact you ship.

Two kinds of tests are required. Unit tests using a controlled dispatcher (fast, deterministic, surgical) and at least one integration test on MiniTezCluster (slow, realistic, end-to-end). Both. Always both.


Unit Tests with DrainDispatcher

The single most important Tez test pattern: synchronous, deterministic state-machine testing. Read the canonical example top to bottom before you write your own:

~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java
~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskAttempt.java
~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java

Each is 1000+ lines. They are not light reading. They are also the only authoritative source on what is and isn't testable at the unit layer.

What DrainDispatcher Does

DrainDispatcher is Hadoop's test dispatcher (it ships with the hadoop-yarn-common test sources). It is an AsyncDispatcher with one addition: await() blocks the calling thread until every queued event has been handled. Events still run on the dispatcher's own thread, in queue order, but your test does not proceed until the queue is empty. This gives you two superpowers:

  1. Deterministic event ordering. You can dispatch A, dispatch B, await, and you know A's handler completed before B's started.
  2. No timing dependence. The test thread never races the dispatcher, so bugs reproduce on every machine, not just under contention.
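
The mechanism is easier to trust once you have seen it in miniature. The sketch below is a hypothetical, plain-Java imitation of a drain-style dispatcher: one worker thread handles events in order, and await() returns only when nothing is queued or running, including follow-up events dispatched by a handler. MiniDrainDispatcher is an illustration written for this page, not the Hadoop class, and the real implementation differs in detail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical miniature of a drain-style dispatcher; not the Hadoop class. */
final class MiniDrainDispatcher {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final Object lock = new Object();
  private int pending = 0; // events queued or currently being handled

  /** Queues a handler; handlers run one at a time, in submission order. */
  void dispatch(Runnable handler) {
    synchronized (lock) {
      pending++;
    }
    worker.execute(() -> {
      try {
        handler.run();
      } finally {
        synchronized (lock) {
          pending--;
          lock.notifyAll();
        }
      }
    });
  }

  /** Blocks the caller until every queued event, including follow-ups, is handled. */
  void await() throws InterruptedException {
    synchronized (lock) {
      while (pending > 0) {
        lock.wait();
      }
    }
  }

  void stop() {
    worker.shutdown();
  }

  /** Demo: event A dispatches a follow-up; await() returns only after all three ran. */
  static List<String> demo() {
    List<String> log = Collections.synchronizedList(new ArrayList<>());
    MiniDrainDispatcher d = new MiniDrainDispatcher();
    try {
      d.dispatch(() -> {
        log.add("A");
        d.dispatch(() -> log.add("A-followup"));
      });
      d.dispatch(() -> log.add("B"));
      d.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      d.stop();
    }
    return new ArrayList<>(log);
  }
}
```

The follow-up event is counted before its parent handler finishes, so the pending count cannot drop to zero while work is still on its way. That is the property that lets a test assert on state immediately after await().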

State-Transition Test Pattern

The template every state-machine unit test follows:

@Test
public void testV_TASK_COMPLETED_inRunningWithRecovery() throws Exception {
  // 1. Arrange: drive the SUT to the state under test.
  vertex.handle(new VertexEvent(vertex.getVertexId(), VertexEventType.V_INIT));
  dispatcher.await();
  vertex.handle(new VertexEvent(vertex.getVertexId(), VertexEventType.V_START));
  dispatcher.await();
  assertEquals(VertexState.RUNNING, vertex.getState());

  // 2. Set up the precondition that triggers the bug.
  vertex.setRecoveryData(mockRecoveryData());

  // 3. Act: fire the event under test.
  TezTaskID lastTaskId = vertex.getTask(vertex.getNumTasks() - 1).getTaskId();
  vertex.handle(new VertexEventTaskCompleted(lastTaskId, TaskState.SUCCEEDED));
  dispatcher.await();

  // 4. Assert: the new state and any side-effect counters.
  assertEquals(VertexState.SUCCEEDED, vertex.getState());
  assertEquals(vertex.getNumTasks(), vertex.getCompletedTaskCount());
  assertFalse("vertex must not call handleRecovery when not actually replaying",
      vertex.getRecoveryHandlerCalled());
}

The sections — arrange, set precondition, act, assert — should always be visible. Reviewers skim for that shape. Hidden setup inside helpers makes the test harder to debug when it fails on a future change.

Build a Negative Test Too

You proved the bug is fixed. Now prove the non-buggy path still works:

@Test
public void testV_TASK_COMPLETED_inRunningWithoutRecovery() throws Exception {
  // Same arrange/state machinery, but recoveryData stays null.
  vertex.handle(new VertexEvent(vertex.getVertexId(), VertexEventType.V_INIT));
  dispatcher.await();
  // ...
  TezTaskID lastTaskId = vertex.getTask(vertex.getNumTasks() - 1).getTaskId();
  vertex.handle(new VertexEventTaskCompleted(lastTaskId, TaskState.SUCCEEDED));
  dispatcher.await();
  // Without recovery data, the existing transition behavior is unchanged.
  assertEquals(VertexState.SUCCEEDED, vertex.getState());
}

The negative test catches the regression where someone "fixes" your fix by removing the recovery branch entirely.

Test Both Branches of Every Guard You Added

If your fix is:

if (recoveryData != null && isReplayingRecovery()) { ... }

You owe four tests, one per combination:

| recoveryData == null | isReplayingRecovery() returns | Expected branch |
|---|---|---|
| true | n/a (short-circuited) | non-recovery path |
| false | true | recovery path |
| false | false | non-recovery path (this is the bug fix) |
| true | true | non-recovery path (impossible? assert it cannot happen) |

The last row is the kind of test that catches a future refactor where someone deletes the short-circuit.
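
One way to pin down all four rows is to extract the guard into a pure function and feed it a supplier that fails if it is ever evaluated with null recovery data. The sketch below is plain Java with hypothetical names (RecoveryGuard, takesRecoveryPath, mustNotBeCalled); in a real Tez test you would drive VertexImpl through the dispatcher instead.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical extraction of the guard, for illustration only. */
final class RecoveryGuard {
  private RecoveryGuard() {}

  /** Mirrors the guard: recoveryData != null && isReplayingRecovery(). */
  static boolean takesRecoveryPath(Object recoveryData, BooleanSupplier isReplayingRecovery) {
    return recoveryData != null && isReplayingRecovery.getAsBoolean();
  }

  /** Supplier that fails loudly if the short-circuit is ever removed. */
  static BooleanSupplier mustNotBeCalled() {
    return () -> {
      throw new AssertionError("isReplayingRecovery() evaluated with null recoveryData");
    };
  }
}
```

Passing mustNotBeCalled() for the two rows where recoveryData is null turns "the short-circuit exists" from an assumption into an assertion.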


MockAppContext, MockHistoryEventHandler, and friends

Building a VertexImpl in a unit test requires a small zoo of collaborators (an AppContext, an event handler, an EdgeManager, etc.). Don't try to build them all from scratch — copy the helpers from TestVertexImpl.

grep -nE "private.*setUp\(|class Mock|createVertex\(" \
  tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestVertexImpl.java \
  | head -30

You'll see helper methods like createVertex(...), createDAG(...), and inner MockHistoryEventHandler. Use them as a template; do not duplicate them in your own test if you can extend the existing test class with a new @Test method.


Integration Tests with MiniTezCluster

Unit tests prove the fix works in isolation. Integration tests prove it works when wired up to a real YARN cluster (in-process, but real). For correctness bugs and shuffle bugs, this is non-negotiable.

Canonical example:

~/tez-src/tez-tests/src/test/java/org/apache/tez/test/TestOrderedWordCount.java

Read its setUp / tearDown carefully. The pattern:

private static MiniTezCluster mrrTezCluster;
private static Path TEST_ROOT_DIR;

@BeforeClass
public static void setup() throws IOException {
  Configuration conf = new Configuration();
  TEST_ROOT_DIR = new Path("target", TestYourFix.class.getName() + "-tmpDir");
  mrrTezCluster = new MiniTezCluster(TestYourFix.class.getSimpleName(),
      /*numNodeManagers=*/ 1, /*numLocalDirs=*/ 1, /*numLogDirs=*/ 1);
  mrrTezCluster.init(conf);
  mrrTezCluster.start();
}

@AfterClass
public static void tearDown() {
  if (mrrTezCluster != null) {
    mrrTezCluster.stop();
    mrrTezCluster = null;
  }
}

@Test(timeout = 180_000)
public void testTezNNNNFixEndToEnd() throws Exception {
  TezConfiguration tezConf = new TezConfiguration(mrrTezCluster.getConfig());
  DAG dag = buildDAGThatExercisesFix();

  TezClient tezClient = TezClient.create("test-tez-NNNN", tezConf);
  tezClient.start();
  try {
    DAGClient dagClient = tezClient.submitDAG(dag);
    DAGStatus status = dagClient.waitForCompletionWithStatusUpdates(
        EnumSet.of(StatusGetOpts.GET_COUNTERS));

    assertEquals(DAGStatus.State.SUCCEEDED, status.getState());

    // The actual assertion — what proves the fix works end-to-end:
    long counterVal = status.getDAGCounters()
        .findCounter(YourCounterGroup.class.getName(), "ExpectedCounter")
        .getValue();
    assertEquals(20L, counterVal);
  } finally {
    tezClient.stop();
  }
}

awaitVertexState and the Deterministic Polling Pattern

MiniTezCluster tests look async (real cluster, real time) but you can still write deterministic assertions. Use the await* helpers in the tez-tests test utility classes:

grep -rn "awaitVertexState\|awaitDAGCompletion\|awaitTaskAttempt" \
  ~/tez-src/tez-tests/src/test/java/

Pattern:

TestTezUtils.awaitVertexState(dagClient, "v1", VertexStatus.State.SUCCEEDED, 60_000);

This polls with backoff up to the timeout. It never returns early on a spurious signal and never sleeps a fixed wallclock duration.
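
If the helper you need does not exist for the condition you are waiting on, the shape is simple enough to write locally. The following is a minimal sketch of polling with capped exponential backoff, assuming nothing about the Tez helpers' actual signatures. PollUntil is a hypothetical name, and the 10 ms starting pause and 1 s cap are arbitrary choices.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical poll-with-backoff helper; not a Tez utility. */
final class PollUntil {
  private PollUntil() {}

  /** Returns true as soon as the condition holds, false if the timeout elapses first. */
  static boolean await(BooleanSupplier condition, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    long pauseMs = 10;
    while (!condition.getAsBoolean()) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false;
      }
      try {
        // Never sleep past the deadline; double the pause up to a 1 s cap.
        Thread.sleep(Math.min(pauseMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      pauseMs = Math.min(pauseMs * 2, 1_000);
    }
    return true;
  }
}
```

The test asserts on the return value, so the only thing that varies between machines is how long the wait takes, not whether the assertion holds.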


Determinism Rules

Hard rules. Violating any of them gets your PR sent back.

| Rule | Bad | Good |
|---|---|---|
| No Thread.sleep | Thread.sleep(500) | dispatcher.await() or awaitVertexState(...) |
| No wallclock waits | while (!done && System.currentTimeMillis() < deadline) {...} | latch.await(60, SECONDS) driven by event callback |
| No Random without seed | new Random() | new Random(42L) |
| No timezone-dependent assertion | assertEquals("2024-...", LocalDate.now()) | inject Clock |
| No order-dependent assertion on a Set | assertEquals(List.of("a","b"), new HashSet<>(...)) | sort first or use containsInAnyOrder |
| Tests must clean up tmpdirs | leaving target/...-tmpDir between runs | @After removes it or uses unique nanoTime() path |
| No global mutable state | static int counter = 0; shared across tests | per-test instance state |
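
Three of the "Good" cells above fit in a few lines of plain JDK code. The sketch below is illustrative only; DeterminismExamples and its method names are hypothetical, and the fixed instant is an arbitrary example value.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical helpers showing deterministic alternatives. */
final class DeterminismExamples {
  private DeterminismExamples() {}

  /** A fixed UTC clock for tests, e.g. fixedUtcClock("2024-01-15T23:30:00Z"). */
  static Clock fixedUtcClock(String isoInstant) {
    return Clock.fixed(Instant.parse(isoInstant), ZoneOffset.UTC);
  }

  /** Code under test takes a Clock instead of calling LocalDate.now() directly. */
  static String datestamp(Clock clock) {
    return LocalDate.now(clock).toString();
  }

  /** Same seed, same sequence: a failing run can be replayed exactly. */
  static int firstRandom(long seed) {
    return new Random(seed).nextInt(1_000);
  }

  /** Sort before comparing so Set iteration order cannot affect the assertion. */
  static List<String> sorted(Set<String> values) {
    List<String> out = new ArrayList<>(values);
    Collections.sort(out);
    return out;
  }
}
```

The instant in the example sits half an hour before midnight UTC on purpose: with the system default zone, the same assertion would pass or fail depending on where the build machine is.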

Tez has shipped many flaky-test fixes. Read a few of them:

cd ~/tez-src
git log --oneline --grep="flaky\|intermittent" | head -20
git show <flaky-fix-sha>

Notice the pattern — most flaky fixes are replacing a Thread.sleep with an event-driven await, or replacing a counter assertion with a state assertion.


Coverage Target

You do not need 100% line coverage on the file you touched. You do need ~80% coverage on the lines you changed, plus tests that exercise every new branch (true and false sides).

Spot-check coverage:

mvn -pl tez-dag -Dtest='TestVertexImpl*' \
  org.jacoco:jacoco-maven-plugin:prepare-agent \
  test \
  org.jacoco:jacoco-maven-plugin:report

# Open tez-dag/target/site/jacoco/index.html

If your changed lines show red, add a test before pushing.


A Complete Test That Fails on Master, Passes With Fix

The deliverable for this step is a test (typically two or three @Test methods on the same class) that:

  1. Fails on a clean checkout of origin/master — assertion error, not a compilation error, not a setup error.
  2. Passes when run against your fix branch.
  3. Runs in under 10 seconds for unit tests, under 3 minutes for integration tests.
  4. Has zero flakes in 10 consecutive runs.

Verify the fourth with a loop (prefix the mvn command with time if you also want to check the third):

for i in {1..10}; do
  echo "=== Run $i ==="
  mvn test -pl tez-dag -Dtest=TestVertexImplTezNNNN -q || break
done

If even one run fails, you have a flaky test. Fix it before pushing. A flaky test you ship is technical debt every other contributor will pay.


Test Naming

Tez convention:

  • Unit test file: Test<ClassUnderTest>.java lives in <module>/src/test/java/<package>/. If TestVertexImpl.java already exists, add a new @Test method there rather than a new file.
  • Test method: test<Method>_<Condition>_<ExpectedResult> or test<Scenario>_<ExpectedBehavior>.
  • Bad: testFoo, testBug, testCase1.
  • Good: testV_TASK_COMPLETED_inRunningWithRecoveryData_doesNotShortCircuit.

The verbose name is the test's documentation. Future-you reading the failure output of CI will be glad for the verbosity.


Validation / Self-check

Before advancing to Step 7:

  1. At least two @Test methods exist that fail on origin/master and pass on your branch.
  2. At least one of them uses DrainDispatcher for deterministic event ordering (or has a documented reason it doesn't — pure unit, no events).
  3. At least one integration test on MiniTezCluster is present if your fix affects end-to-end behavior (correctness, shuffle, scheduling).
  4. Ten consecutive runs of your tests are all green.
  5. Every new conditional branch in your production code has at least one test that exercises each side.
  6. No Thread.sleep, no wallclock waits, no unseeded Random, no order-dependent assertions on unordered collections.
  7. mvn test -pl <module> runs your tests in under the budget (10s unit, 3min integration).

Step 7: Validation

Your patch compiles. Your new tests pass. That is not enough. Validation is proving that the rest of the build — full module test suites, the static analyzers Tez runs, the legal scanner, the end-to-end examples — is also still green. Reviewers will not run this for you. They will check that you ran it and reject the PR if you didn't.

Budget: 1–2 evenings. Most of it is waiting on mvn test.


The Validation Checklist

In order. Do not skip steps because the previous step passed.

  1. Full test suite of every module you touched.
  2. Full clean build of the whole repo.
  3. Checkstyle.
  4. SpotBugs.
  5. Apache RAT (license header check).
  6. TestOrderedWordCount end-to-end.
  7. Re-run your original Step 2 reproducer to confirm green.
  8. Regression sweep of any module that depends on what you changed.
  9. Performance validation (if perf-relevant).

Capture the output of each into capstone-work/validation/. You'll cite it in the PR description.


1. Full Module Tests

The module you changed:

cd ~/tez-src
mvn test -pl tez-dag -q 2>&1 | tee capstone-work/validation/01-tez-dag-test.log

This will take 5–20 minutes depending on the module. tez-dag is the slowest non-integration module. While it runs, work on the diff cleanup.

When it finishes, scroll to the summary lines. Look for:

[INFO] Tests run: 1342, Failures: 0, Errors: 0, Skipped: 17

If you see Failures > 0, open every failure. Then triage:

  • My fix caused it. Go back to Step 5. Reread the test. Either your fix is wrong, or the test is wrong (rare — assume the test is right until proven otherwise).
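  • It is a known flaky test. Search the commit history (git log --grep="<TestName>") and JIRA for the test name. If there is an open ticket, link it in your PR description ("known flake, see TEZ-XYZ"). If there is not, file one before claiming the green.
  • It is also broken on master. Verify on a clean checkout of master, for example git worktree add ../tez-master origin/master, then run the same mvn test there. Do not rely on git stash && mvn test ... && git stash pop: your fix is already committed, so stash does not remove it, and && skips the pop when the test fails. If it fails on master too, link the JIRA or file one. Do not let your PR be the one to surface a pre-existing failure silently.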

Run for every module you touched. If you touched tez-api, you touched everything downstream — plan accordingly.


2. Full Clean Build

The compilation gate. Catches missing imports, accidental Java-version features, downstream API breaks:

mvn clean install -DskipTests -q 2>&1 \
  | tee capstone-work/validation/02-clean-install.log

Expect a clean BUILD SUCCESS. Common failures:

  • Missing import. Your IDE auto-imported something not on the classpath of a downstream module.
  • API break. You changed a public method signature in tez-api and a downstream caller broke. Either revert the signature change or update the caller.
  • Java version. You used var or text blocks. Tez compiles to a JDK baseline (check pom.xml for <maven.compiler.target>). Use compatible syntax.

3. Checkstyle

Tez uses checkstyle aggressively. Run:

mvn checkstyle:check -q 2>&1 \
  | tee capstone-work/validation/03-checkstyle.log

Or, per module:

mvn checkstyle:check -pl tez-dag

Common violations and fixes:

| Violation | Fix |
|---|---|
| Line longer than 120 chars | Break the line. Indent continuation 4 spaces. |
| Wildcard import | Replace with explicit imports. |
| Missing javadoc on public method | Add /** ... */ block. |
| Trailing whitespace | Configure your editor to strip it on save. |
| Tab character | Convert to 2 spaces (Tez uses 2-space indent in most modules). |
| Method ordering | Public before private; static before instance. |

The checkstyle config lives at tez-build-tools/src/main/resources/tez/checkstyle/checkstyle.xml — read it to understand the rules.


4. SpotBugs

Static analysis for null-deref, unchecked cast, dead-store, etc.:

mvn spotbugs:check -q 2>&1 \
  | tee capstone-work/validation/04-spotbugs.log

If it fails, view the report:

mvn spotbugs:gui -pl tez-dag

Common warnings worth fixing:

  • NP_NULL_ON_SOME_PATH — your new code dereferences a value that can be null on some branch.
  • EI_EXPOSE_REP — your getter returns a mutable internal collection directly. Wrap in Collections.unmodifiableList(...) or copy.
  • RV_RETURN_VALUE_IGNORED_BAD_PRACTICE — the result of file.delete() was ignored.
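
For reference, here is what the usual fix for each of the three warnings looks like in plain Java. SpotBugsFriendly and its members are hypothetical names for illustration; the point is the pattern, not the class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical class showing the usual fixes for three common warnings. */
final class SpotBugsFriendly {
  private final List<String> taskIds = new ArrayList<>();

  void addTask(String id) {
    taskIds.add(id);
  }

  /** EI_EXPOSE_REP: hand out a read-only view, not the internal list. */
  List<String> getTaskIds() {
    return Collections.unmodifiableList(taskIds);
  }

  /** NP_NULL_ON_SOME_PATH: handle the null branch before dereferencing. */
  static int nameLength(String name) {
    return name == null ? 0 : name.length();
  }

  /** RV_RETURN_VALUE_IGNORED_BAD_PRACTICE: surface the outcome of the delete. */
  static boolean deleteIfPresent(Path file) {
    try {
      return Files.deleteIfExists(file); // false if the file was not there
    } catch (IOException e) {
      throw new UncheckedIOException("could not delete " + file, e);
    }
  }
}
```

Files.deleteIfExists reports failure through an exception, which is harder to ignore by accident than the boolean returned by File.delete().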

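Warnings already present on master are not your problem to fix, but the analyzer will fail the build if your change introduces new ones. To see which are new, run the check on both branches, copy each tez-dag/target/spotbugsXml.xml somewhere outside target/, and compare the two copies (diff, or git diff --no-index). A plain git diff origin/master on that path shows nothing useful, because target/ is build output and is not tracked.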


5. Apache RAT (License Headers)

Every new .java, .xml, .properties file must carry the ASL header. RAT enforces this:

mvn apache-rat:check -q 2>&1 \
  | tee capstone-work/validation/05-rat.log

If it complains about your new test file, prepend the standard header:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

(Copy from any existing Tez file — it is the canonical form.)

For shell, properties, and XML files, use the appropriate comment syntax. Look at neighboring files in the same directory.


6. TestOrderedWordCount End-to-End

The closest thing to a smoke test of "does Tez actually still work for a real user workload":

mvn test -pl tez-tests -Dtest=TestOrderedWordCount -q 2>&1 \
  | tee capstone-work/validation/06-orderedwordcount.log

Takes 2–5 minutes. If this fails when your unit tests pass, your fix likely broke an interaction your unit test didn't exercise. Common culprits:

  • You changed an event ordering and a downstream component assumed the old ordering.
  • You added a config key default that breaks the example's expectations.
  • Your MiniTezCluster test is leaking state into a sibling test.

7. Re-Run Your Original Step 2 Reproducer

Sanity check. The thing you set out to fix is still fixed:

mvn test -pl <module> -Dtest=<YourReproTest> 2>&1 \
  | tee capstone-work/validation/07-repro.log

Five runs:

for i in 1 2 3 4 5; do
  mvn test -pl <module> -Dtest=<YourReproTest> -q
done

Five greens. Or you have not actually shipped a fix.


8. Regression Sweep

Run the test suite of every module that depends on what you changed. If you touched tez-api, that is everything. If you touched tez-runtime-library, that is at least tez-tests, tez-mapreduce, and tez-examples.

# Identify dependents
grep -l "tez-runtime-library" $(find ~/tez-src -name pom.xml)

# Run each
mvn test -pl tez-mapreduce -q | tail -5
mvn test -pl tez-examples -q | tail -5
mvn test -pl tez-tests -q | tail -10

If tez-tests takes too long (it can — there are real MiniTezCluster runs in there), at least run the tests whose name contains your changed class:

mvn test -pl tez-tests -Dtest='*Vertex*' -q

9. Performance Validation (If Relevant)

Skip this section unless your fix touches scheduling, shuffle, or any code path documented as "hot." For those, use async-profiler or JFR to capture a flamegraph before and after.

async-profiler pattern

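# Baseline: on origin/master, start the JVM under test (e.g. a MiniTezCluster
# integration test) inside the Maven JVM. forkCount=0 replaces the
# deprecated forkMode=never.
mvn test -pl tez-tests -Dtest=TestPerfWorkload -DforkCount=0 &
TEST_PID=$!

# Attach profiler
~/async-profiler/profiler.sh -d 60 -f /tmp/flame-before.svg $TEST_PID
wait $TEST_PID

# Check out your fix branch, rebuild, then start a fresh JVM and profile that one
mvn test -pl tez-tests -Dtest=TestPerfWorkload -DforkCount=0 &
TEST_PID=$!
~/async-profiler/profiler.sh -d 60 -f /tmp/flame-after.svg $TEST_PID
wait $TEST_PID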

Compare the two SVGs. The stack frames you care about (e.g. ShuffleManager.run, MergeManager.merge) should not be wider after your fix than before. If they are, you have introduced a regression and you owe the JIRA an explanation.

Simpler: timing assertions in a JUnit test

@Test
public void testShuffleNotSlowerAfterFix() throws Exception {
  long start = System.nanoTime();
  runShuffleWorkload();
  long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  // Loose bound — assert no >30% regression vs. a previously-measured baseline.
  assertTrue("shuffle took " + elapsedMs + "ms, expected < 15000",
      elapsedMs < 15_000);
}

Brittle. Only add if perf is truly the concern.
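
If you do add one, a relative bound against a recorded baseline is somewhat less brittle than a bare absolute number. The sketch below shows one way to express "no more than 30% slower than baseline" using the median of several runs. PerfBudget and its methods are hypothetical names, and the baseline value is something you would have to measure and record yourself.

```java
import java.util.Arrays;

/** Hypothetical helpers for a relative timing check. */
final class PerfBudget {
  private PerfBudget() {}

  /** Median wall time in ms over several runs; the median damps one-off GC or JIT pauses. */
  static long medianMillis(Runnable workload, int runs) {
    long[] samples = new long[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      workload.run();
      samples[i] = (System.nanoTime() - start) / 1_000_000;
    }
    Arrays.sort(samples);
    return samples[runs / 2];
  }

  /** True when measured <= baseline * (1 + tolerance); tolerance 0.30 means 30%. */
  static boolean withinBudget(long measuredMs, long baselineMs, double tolerance) {
    return measuredMs <= Math.round(baselineMs * (1.0 + tolerance));
  }
}
```

A test would then assert withinBudget(medianMillis(this::runShuffleWorkload, 5), baselineMs, 0.30). It is still sensitive to the machine it runs on, so keep it out of the default suite unless a committer asks for it.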


The Validation Report

Compile everything into one document for the PR:

# Validation report for TEZ-NNNN

## Environment
- JDK: `java -version` -> openjdk version "11.0.21"
- Maven: `mvn -version` -> Apache Maven 3.9.6
- OS: macOS 14.2 / Linux 5.15.0-91-generic
- Tez HEAD: `git rev-parse origin/master` -> a1b2c3d4

## Results

| Check | Status | Notes |
|---|---|---|
| `mvn test -pl tez-dag` | PASS | 1342 tests, 0 failures, 17 skipped |
| `mvn clean install -DskipTests` | PASS | |
| `mvn checkstyle:check` | PASS | |
| `mvn spotbugs:check` | PASS | |
| `mvn apache-rat:check` | PASS | |
| `mvn test -pl tez-tests -Dtest=TestOrderedWordCount` | PASS | |
| Original reproducer | PASS (5/5 runs) | |
| `mvn test -pl tez-mapreduce` | PASS | |
| `mvn test -pl tez-examples` | PASS | |

## Known flakes encountered
- TestSomething#testWhatever — pre-existing flake, see TEZ-XXXX, not caused by this change.

## Performance
- Not applicable / no perf-relevant code paths touched.

Save as capstone-work/validation/REPORT.md. Paste it (or a summary plus link) into your PR description.


Validation / Self-check

Before advancing to Step 8:

  1. capstone-work/validation/ contains one log file per check (logs 01–07 at minimum).
  2. capstone-work/validation/REPORT.md exists with the table above filled in honestly.
  3. Every check passes, or every failure is documented as a pre-existing issue with a JIRA link.
  4. You re-ran your Step 2 reproducer five times with your fix applied and got 5/5 green.
  5. You ran the test suite of at least one module that depends on the one you changed (regression sweep).
  6. No new SpotBugs warnings introduced (diff against master baseline).
  7. The validation report is short enough to paste into a PR description without making the reviewer scroll for a screen.

Step 8: Patch Preparation

You have working code, working tests, and a green validation run. Now you package the change so it can land. Modern Tez does this via GitHub PR; older Tez (and still some committers' preference) is .patch files attached to the JIRA. You should know how to do both.

This step is the easiest to skip past and the easiest to lose a week on if you do it sloppily. Treat the PR title, description, and commit message as seriously as the code.


Modern Tez: GitHub Pull Request

Apache Tez has accepted GitHub pull requests, each tied to a TEZ-NNNN JIRA ticket, for several years. The flow:

# 1. Make sure your branch is up to date with master
cd ~/tez-src
git remote -v
# origin    git@github.com:<you>/tez.git
# apache    https://github.com/apache/tez.git

git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# Resolve any conflicts. Rebuild and re-run your tests after rebase.

# 2. Squash to a single clean commit (or 2-3 if logically separable)
git rebase -i apache/master
# In the editor: pick the first commit, squash the rest. Edit the combined
# commit message to one final version.

# 3. Push to your fork
git push --force-with-lease origin tez-NNNN-<slug>

# 4. Open a PR via https://github.com/apache/tez
# Title: TEZ-NNNN: <short summary, imperative mood>
# Base: apache/tez:master

--force-with-lease instead of --force: protects against overwriting a collaborator's commit if someone pushed to your branch between your fetch and your push.

Commit Message Template

TEZ-NNNN: Fix VertexImpl recovery branch short-circuiting non-recovery path

Reverts the unconditional short-circuit added in TEZ-2877 so that
V_TASK_COMPLETED events on the final task are processed by the standard
transition when no recovery is in progress. The original short-circuit
assumed any non-null recoveryData implies an active replay; this assumption
broke when recoveryData is populated speculatively at vertex initialization
even though no replay will occur.

The fix gates the recovery branch on the new isReplayingRecovery() predicate.
The previous behavior is preserved for actual recovery scenarios.

Tests:
- New unit test TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery
- New integration test in tez-tests verifying end-to-end DAG success
  with recoveryData populated.
- Existing TestOrderedWordCount and full tez-dag suite pass.

Note the shape:

  • Title line: TEZ-NNNN: <verb phrase, imperative mood, < 72 chars>.
  • Blank line.
  • Body paragraphs: what the change does, why, what assumption broke.
  • Tests: explicit list of tests added or affected.

No "should fix" or "I think." Past tense for what you did, present tense for what the code does after the change.


PR Title

TEZ-NNNN: <Short imperative summary>
  • TEZ-NNNN prefix is mandatory. The bot uses it to link to JIRA.
  • Imperative mood: "Fix race", "Add config", "Avoid NPE". Not "Fixed race", not "Fixing race".
  • < 72 characters total including the prefix.

Bad: "Fix bug", "Updates to VertexImpl", "My fix for TEZ-NNNN".

Good: "TEZ-4567: Honor isReplayingRecovery in VertexImpl completion path".
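
To make the prefix rule concrete, here is a small sketch of how a title maps to a JIRA link. It is an assumption about the general idea, not the actual bot's implementation; the class and method names are hypothetical. The URL form is the one used in the PR description template.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative PR-title check (hypothetical helper, not the project's bot). */
final class PrTitleCheck {
  // Group 1 captures the JIRA key, e.g. "TEZ-4567".
  private static final Pattern TITLE = Pattern.compile("^(TEZ-\\d+): \\S.*$");

  /** JIRA browse URL implied by the title prefix, or empty if the prefix is missing. */
  static Optional<String> jiraUrl(String title) {
    Matcher m = TITLE.matcher(title);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of("https://issues.apache.org/jira/browse/" + m.group(1));
  }

  /** True when the title carries the prefix and is under 72 characters in total. */
  static boolean isAcceptable(String title) {
    return jiraUrl(title).isPresent() && title.length() < 72;
  }

  public static void main(String[] args) {
    String[] samples = {
      "Fix bug",
      "My fix for TEZ-4567",
      "TEZ-4567: Honor isReplayingRecovery in VertexImpl completion path"
    };
    for (String s : samples) {
      System.out.println(isAcceptable(s) + "  " + s);
    }
  }
}
```

Note that "My fix for TEZ-4567" is rejected even though it mentions the issue: the key has to be the prefix. Imperative mood is not checked here; that part is left to the author and the reviewer.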


PR Description Template

## JIRA

https://issues.apache.org/jira/browse/TEZ-NNNN

## Problem

<2-4 sentences. Symptom + trigger conditions. Cite the root-cause doc.>

When the last `V_TASK_COMPLETED` event arrives for a vertex with non-null
`recoveryData` outside an actual recovery replay, the event is unconditionally
re-routed through `handleRecovery()` rather than processed by the standard
transition. As a result, `completedTaskCount` is not incremented and the vertex
fails to transition to SUCCEEDED. This affects DAGs whose AM populates
recovery data speculatively at vertex initialization.

## Root cause

See `capstone-work/root-cause.md` (or paste inline if short).

Introduced in TEZ-2877 (commit a1b2c3d4).

## Fix

Gate the recovery short-circuit on a new `isReplayingRecovery()` predicate
that returns true only during active replay. Minimum-diff (one production
line + one new private method).

## Testing

- **New:** `TestVertexImpl#testV_TASK_COMPLETED_inRunningWithRecovery` —
  unit test using `DrainDispatcher` that reproduces the failure on master
  and passes with this fix. Plus a negative-control test.
- **New:** `TestTezNNNNFixIntegration` — `MiniTezCluster` end-to-end test
  that runs a 20-task vertex with speculatively-populated recoveryData and
  asserts DAG SUCCEEDED.
- **Existing:** Full `tez-dag` suite (1342 tests) passes. `TestOrderedWordCount`
  passes. Validation report in commit message footer.

## Backward compatibility

None affected. The fix changes behavior only for the broken case (no replay
in progress). Recovery scenarios are unchanged.

## Configuration

No new keys.

Adjust sections to your fix. The structure stays the same.


GitHub Actions / Yetus Precommit

When you open the PR, GitHub Actions runs the precommit checks. The full config lives in .github/workflows/ — read it:

ls ~/tez-src/.github/workflows/
cat ~/tez-src/.github/workflows/build.yml

Common checks (subject to change as the workflow evolves):

| Check | What it runs | Failure means |
|---|---|---|
| Compile | mvn install -DskipTests | Build broken on some module |
| Tests | mvn test for each module | Some test failed (yours or flake) |
| Checkstyle | mvn checkstyle:check | Style violation in changed file |
| Javadoc | mvn javadoc:javadoc | Broken Javadoc reference or missing tag |
| RAT | mvn apache-rat:check | New file missing ASL header |
| License | License snippet check | Same as RAT, or LICENSE/NOTICE drift |
| SpotBugs | mvn spotbugs:check | New static-analysis warning |

Failures appear on the PR as red ✖ marks. Click into the failing job to read the log. Common first-PR failures:

  • Javadoc broken: You referenced {@link Foo#bar} and bar doesn't exist. Either fix the link or remove it.
  • Checkstyle: A line exceeded 120 chars or an unused import slipped in.
  • License: New file missing the header. Add it.
  • Test: A flake. Re-run the workflow ("Re-run all failed jobs" in the Actions tab). If it goes green on retry, leave a comment: "Re-ran job — flake, see TEZ-XYZ." If it fails again, your fix probably broke it.

Push fixes as new commits on the same branch. The PR auto-updates. After review approval, you'll squash on merge.


Old-Style: .patch Files on JIRA

Some committers still review .patch attachments. Know the convention.

Generate

git format-patch apache/master -o /tmp/
# Produces /tmp/0001-TEZ-NNNN-Fix-...patch

Or, for one combined diff:

git diff apache/master..HEAD > /tmp/TEZ-NNNN.01.patch

Naming convention

TEZ-<NNNN>.<iteration>.patch. So your first attachment is TEZ-4567.01.patch, second iteration after review feedback is TEZ-4567.02.patch. Some committers use TEZ-4567.001.patch (three-digit). Match whatever pattern the most recent committer used on that issue.

For a branch-specific patch (e.g. against branch-0.10):

TEZ-4567.branch-0.10.01.patch.
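
The naming convention can be expressed as a tiny formatter. This is a sketch of the two-digit variant described above; the class `PatchName` and its parameters are hypothetical, and the three-digit style some committers use would need a different width.

```java
import java.util.Locale;

/** Illustrative builder for JIRA patch attachment names (hypothetical helper). */
final class PatchName {
  /**
   * @param issue     numeric part of the JIRA key, e.g. 4567 for TEZ-4567
   * @param branch    release branch such as "branch-0.10", or null for master
   * @param iteration 1 for the first attachment, 2 after the first round of feedback, ...
   */
  static String of(int issue, String branch, int iteration) {
    String branchPart = (branch == null) ? "" : branch + ".";
    // Locale.ROOT keeps the digits ASCII regardless of the machine's locale.
    return String.format(Locale.ROOT, "TEZ-%d.%s%02d.patch", issue, branchPart, iteration);
  }

  public static void main(String[] args) {
    System.out.println(of(4567, null, 1));          // TEZ-4567.01.patch
    System.out.println(of(4567, null, 2));          // TEZ-4567.02.patch
    System.out.println(of(4567, "branch-0.10", 1)); // TEZ-4567.branch-0.10.01.patch
  }
}
```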

Attach

In JIRA: "Attach files" → upload. Then "More" → "Patch Available" to flip the state. Cancel patch (revert to "In Progress") if you find a problem before review starts.

The JIRA workflow is covered fully in Step 9. The patch-file mechanics live here.


Rebasing on Master Without Losing Review Comments

GitHub's PR view loses inline comment threads when you force-push a rebase that changes the SHAs reviewers commented on. To minimize the damage:

  1. Don't rebase mid-review unless you have to. Merge-commits from apache/master into your branch are usually acceptable during active review; squash at the end.
  2. When you do rebase, leave a comment: "Force-pushed to rebase on master (was <old SHA>, now <new SHA>). All review threads should still be visible against the latest commit."
  3. Squash only at the very end, after approval, just before merge.
  4. If you really break the comment threads, post a summary comment listing "what was at line X became line Y in the new push." Reviewers appreciate it.

To rebase:

git fetch apache
git checkout tez-NNNN-<slug>
git rebase apache/master
# If conflicts: edit, git add, git rebase --continue
mvn install -DskipTests -pl <module> -am -q
mvn test -pl <module> -Dtest=<YourTests> -q
git push --force-with-lease origin tez-NNNN-<slug>

Co-Author and Sign-Off

If a committer or another contributor materially helped (suggested the fix direction, found the root cause), credit them:

TEZ-NNNN: <summary>

<body>

Co-authored-by: Alice <alice@example.org>

Tez does not require a Signed-off-by line (it is not a DCO project — it requires an Apache CLA), but committers appreciate when you note influences in the commit message.


What Reviewers Look For First

In rough order:

  1. PR title and JIRA link — wrong format, instant correction request.
  2. Description quality — vague description, "please clarify" comment.
  3. Diff size — > ~200 lines for a "bug fix" gets scrutiny on scope creep.
  4. Tests present — no tests, immediate request.
  5. Tests fail on master, pass with fix — confirms the test is actually testing the fix, not just a happy path.
  6. Production diff is minimum to fix the bug — every extra change has to justify itself.
  7. Style and convention compliance — checkstyle and tests must be green.
  8. API hygiene — no public methods added/removed without discussion.
  9. Backward compatibility — does the fix change observable behavior for non-buggy cases? If yes, was it discussed?

Optimize for the first seven before you push. The last two are usually discussed in JIRA comments before the PR opens.


Validation / Self-check

Before advancing to Step 9:

  1. PR exists on apache/tez with a title in the format TEZ-NNNN: <summary>.
  2. PR description follows the template; cites the root-cause document.
  3. Commit message follows the template (title, body, tests footer).
  4. GitHub Actions precommit is green (every check), or every red has a documented and accepted explanation.
  5. Branch is rebased on a recent apache/master (within last 24-48h ideally).
  6. The PR URL is pasted into the JIRA as a comment.
  7. You can articulate in one paragraph why every line of your diff is necessary, if a reviewer asks.

Step 9: JIRA and Documentation

The JIRA is the project's permanent memory. The PR is ephemeral — it lives on GitHub, gets merged, fades into git log. The JIRA is what users grep when they hit a similar bug two years later, what release managers read when compiling release notes, what new contributors find when researching prior art. Treating it as a checkbox is the laziest possible thing you can do.

This step is short on procedure and heavy on hygiene. Twenty minutes done well saves three different people an hour each later.


The Status Workflow

Tez uses Apache's standard JIRA workflow. The states you will pass through:

Open  ->  In Progress  ->  Patch Available  ->  Resolved  ->  Closed
                              ^                    ^
                              |                    |
                            (you)              (committer)

| State | Set by | Meaning |
|---|---|---|
| Open | Reporter | Bug exists, nobody is working on it. |
| In Progress | Assignee | Someone is actively investigating. |
| Patch Available | Assignee | A patch / PR is ready for committer review. |
| Resolved | Committer | Patch merged. Resolution: Fixed plus Fix Version. |
| Closed | Anyone | Verified in a release. Often skipped; many Tez tickets stay Resolved indefinitely. |

Transitioning correctly

  • You move it to In Progress when you claim it in Step 1.
  • You move it to Patch Available when your PR is open and precommit is green. This is the signal "ready for human review, not just CI."
  • You do NOT move it to Resolved. Only a committer does that when they merge. Setting it yourself will be reverted, and you will look new.
  • If a committer asks you to revise, the state usually stays at Patch Available. Move back to In Progress only if you'll be rewriting significantly (multi-day rework).
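
The transition rules above amount to a small state machine with one role-restricted edge. The sketch below models only what this section describes; the enum and method names are hypothetical, and the real JIRA workflow configuration may permit more transitions than these.

```java
/** Illustrative model of the JIRA status workflow described above (hypothetical, simplified). */
final class JiraWorkflow {
  enum Status { OPEN, IN_PROGRESS, PATCH_AVAILABLE, RESOLVED, CLOSED }

  enum Role { CONTRIBUTOR, COMMITTER }

  /** Returns true if the given role may move an issue from one status to the other. */
  static boolean allowed(Role role, Status from, Status to) {
    switch (from) {
      case OPEN:
        return to == Status.IN_PROGRESS;      // claim the issue
      case IN_PROGRESS:
        return to == Status.PATCH_AVAILABLE;  // PR open and precommit green
      case PATCH_AVAILABLE:
        if (to == Status.IN_PROGRESS) {
          return true;                        // cancel patch or significant rework
        }
        // Only a committer resolves, at merge time.
        return to == Status.RESOLVED && role == Role.COMMITTER;
      case RESOLVED:
        return to == Status.CLOSED;           // verified in a release
      default:
        return false;                         // CLOSED is terminal in this sketch
    }
  }

  public static void main(String[] args) {
    System.out.println(allowed(Role.CONTRIBUTOR, Status.PATCH_AVAILABLE, Status.RESOLVED)); // false
    System.out.println(allowed(Role.COMMITTER, Status.PATCH_AVAILABLE, Status.RESOLVED));   // true
  }
}
```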

The Patch Available ritual

When you flip to Patch Available, leave a comment:

PR is now open at <link>, precommit is green, ready for review.

Summary: <one paragraph from the PR description>.

Tests: <list>.

Specific reviewer requests: <if any, e.g. "would appreciate a look from
@someone since they wrote the original code">.

This wakes up the JIRA's watchers (committers who follow issues@) and gives them enough context to decide whether to pick it up.


Required Fields

| Field | Who sets | What to set |
|---|---|---|
| Assignee | You | Yourself (Step 1). |
| Component | You | tez-dag, tez-runtime-library, etc. (whichever module you primarily changed). |
| Affects Version | Reporter or you | The earliest version where the bug reproduces. |
| Fix Version | Committer | Leave blank. Only PMC/committers set this. You can comment "suggesting fix version X.Y.Z" if you have a strong opinion. |
| Priority | Reporter or PMC | Don't bump your own. Comment if you think the priority is wrong. |
| Labels | You | Add flaky-test / recovery / shuffle if it helps grep later. Don't invent vanity labels. |
| Release Notes | You, if user-visible | Mandatory if behavior, API, or configuration changes are visible. See below. |
| Linked Issues | You | Link the PR (web link) and any related JIRAs. See below. |

Release Notes

If your fix changes anything a user can observe — output format, config key default, error message, performance characteristic — fill out the "Release Notes" field. Format:

Fixed an issue where vertices with speculatively-populated recovery data
would not transition to SUCCEEDED after all tasks completed. Affects DAGs
submitted via TezClient when checkpoint-based recovery is enabled. No
configuration or API change is required.

Two to four sentences. Past tense ("Fixed"). User-facing language ("DAGs", "TezClient"), not implementation jargon ("V_TASK_COMPLETED handler").

If your fix is purely internal (refactor of a private method, test-only change), leave Release Notes blank. The release manager will skip it.


Linking the PR

Issue Links → "is related to" → Web Link → paste the GitHub PR URL.

Tez also has a bot that auto-links a PR to the JIRA when the PR title starts with TEZ-NNNN:. The bot fires within minutes. If after an hour the JIRA does not have a "GitHub Pull Request" link visible, add it manually:

JIRA → More → Link → Web Link → URL: https://github.com/apache/tez/pull/<NNN> → Link Text: GitHub PR.

If your fix interacts with other tickets, link them explicitly:

| Relation | When to use |
|---|---|
| is duplicated by | Another JIRA is a duplicate of yours (close that one). |
| duplicates | Yours is the duplicate (close yours, work on the older one). |
| is related to | Touches similar code but distinct issue. |
| is blocked by | You cannot land until another JIRA lands first. |
| is caused by | Bisect identified TEZ-XYZ as the regression source. |
| supersedes | Your fix replaces an older abandoned attempt. |

Be conservative. Spurious links pollute the issue graph. Cross-link only where the connection is concrete.


Code Comments in the Fix

The JIRA explains what and why at the project level. Inline code comments explain why at the file level for the next person editing this line.

Good inline comment patterns:

// TEZ-NNNN: only short-circuit when recovery replay is actually in progress;
// recoveryData may be populated speculatively at vertex init even when no
// replay will occur. See the JIRA for the affected scenario.
if (recoveryData != null && isReplayingRecovery()) {
  handleRecovery(event);
  return;
}

Rules:

  • Cite the JIRA number. Future-you grepping the file for TEZ- will find the context immediately.
  • Explain the non-obvious invariant, not what the code obviously does. Never write // increment counter next to count++.
  • One or two lines max. If you need a paragraph, write the design note in the class Javadoc or in a markdown doc under docs/.
  • Don't paste the entire root-cause document. The JIRA holds that.

Notifying Watchers

After Patch Available, the JIRA's watchers see an email. If you want a specific committer's attention (e.g. the author of the introducing commit from your git bisect), @mention them in a JIRA comment:

[~alice] (author of TEZ-2877) — would appreciate a sanity check on the
recovery short-circuit gating in this PR, since you wrote the original
branch. No urgency.

The [~jira-username] syntax is JIRA's mention. Find the username from their JIRA profile URL (https://issues.apache.org/jira/people/<username>).

Do this once. Do not @-mention in every subsequent comment — committers filter their inboxes.


Backporting Fix to Branches

For most Capstone work, you fix on master and stop. But if your bug affects a maintained release branch and a committer asks you to backport:

  1. Comment on the JIRA: "Will backport to branch-0.10 once master patch lands."
  2. After merge to master, create a new branch from apache/branch-0.10:
    git fetch apache
    git checkout -b tez-NNNN-branch-0.10 apache/branch-0.10
    git cherry-pick <master-fix-commit-sha>
    # Resolve conflicts (often minor; sometimes major if branch diverged).
    
  3. Run validation on the branch (same Step 7 checks).
  4. Open a separate PR titled TEZ-NNNN (branch-0.10): <summary> or attach a TEZ-NNNN.branch-0.10.01.patch to the same JIRA.

Each branch's PR/patch is a separate review.


After Merge

When a committer merges your PR:

  1. The PR is closed automatically, and they'll comment "Committed to master, thanks @you" with the merged-commit SHA.
  2. They (or the bot) set the JIRA to Resolved with Resolution: Fixed and Fix Version: X.Y.Z.
  3. You comment with a thanks and any follow-up plans:
    Thanks for the review and merge, [~alice]. I'll watch for the next RC
    to verify it lands cleanly. Filed TEZ-MMMM for the follow-up refactor
    we discussed.
    
  4. If you spotted a related improvement during review, file the follow-up JIRA immediately — do not let it slip.

Documentation Beyond the JIRA

Most bug fixes need no further doc. Exceptions:

| Change | Where to document |
|---|---|
| New config key | tez-api/src/main/resources/META-INF/services/... if not auto-generated; reference from the Tez site config docs page. |
| New public API | Javadoc on the new method/class + the relevant docs/<feature>.md if one exists. |
| Behavior change visible to operators | A note in CHANGELOG.md (committer usually handles), and a JIRA Release Notes entry (you write this). |
| New tunable or debug flag for operators | Mention in the Tez configuration reference page (commit to the tez-site/ directory or open a JIRA for the site update). |

When in doubt, ask in the JIRA: "Should I update the docs page for X as part of this, or as a follow-up JIRA?" Committers will tell you.


Validation / Self-check

Before advancing to Step 10:

  1. JIRA status is Patch Available with a comment summarizing the change and linking the PR.
  2. Assignee is you.
  3. Component is set to the right module.
  4. Affects Version is set to a real Tez version where the bug reproduces.
  5. Release Notes field is filled in (or explicitly blank with a one-line "internal only" justification in the PR description).
  6. PR is linked under Issue Links → Web Link.
  7. Any related JIRAs are cross-linked with the correct relation (is caused by / is related to / etc).
  8. Inline code comments cite TEZ-NNNN where the change is non-obvious.
  9. If a committer was specifically helpful (author of regressing commit, reviewer on related work), you @-mentioned them once, not repeatedly.

Step 10: Engineering Write-Up

The patch is merged. The JIRA is Resolved. Most contributors stop here. The ones who become committers write the post. The write-up is the artifact that travels with you when you change jobs, apply for a committer vote, or get cited by another contributor doing similar work.

Aim for eight hundred to a thousand words, most of them written in the four hours right after merge, while the dead ends are still fresh.


Why It Matters

Three audiences:

  1. Future you. Six months from now you'll touch this code again and want to remember what you tried.
  2. The next contributor working a similar bug. They'll find your post via Google ("Tez vertex stuck RUNNING") and shortcut a week of work.
  3. The committers / PMC evaluating you for a vote. They want to see that you can communicate engineering reasoning, not just produce diffs.

A good write-up is not a press release. It is a postmortem: honest about what you tried, including the failed approaches.


The Template

Sections in order, suggested word counts.

Title (one line)

Fixing TEZ-NNNN: <one-line technical summary>

Examples:

  • "Fixing TEZ-4567: A speculative-recovery short-circuit race in VertexImpl"
  • "Fixing TEZ-3982: Why our shuffle was 30% slower on small inputs"
  • "Fixing TEZ-2451: An off-by-one in MergeManager spill accounting"

Technical, specific. Not "My first Apache Tez contribution" — write that post separately on your blog. The engineering post stands on its own.

Problem (100–150 words)

What broke, for whom, under what conditions. Plain English, but precise.

Tez vertices configured with checkpoint-based recovery would intermittently
fail to transition to SUCCEEDED, leaving the DAG in RUNNING state until the
AM hit its global timeout. The bug only manifested when the application
master pre-populated recovery data at vertex initialization (rather than
lazily during an actual replay), which is the path used by long-running
Tez sessions reusing AMs across DAG submissions.

The symptom was a stalled DAG with all tasks reporting SUCCEEDED in the
counters but no DAGFinishedEvent in the AM log. Affected Tez 0.9.x and
0.10.0 onward.

State the symptom (what the user sees), the trigger condition (when it manifests), and the affected version range. No code yet.

Investigation Log (200–300 words)

The most valuable section. Walk through what you tried, including the hypotheses that were wrong.

Initial hypothesis was a task-scheduler bug — we suspected
TaskSchedulerManager was dropping a TASK_COMPLETED event under load.
DrainDispatcher-based reproducers in isolation showed no event loss, so
we ruled this out within a day.

Second hypothesis: a state-machine transition guard rejecting the final
event. Adding TRACE logging to VertexImpl confirmed V_TASK_COMPLETED was
arriving and being dispatched, but completedTaskCount remained one short
of total. This shifted attention from "the event is missing" to "the
event is processed but not by the expected handler."

Reading VertexImpl.handle(...) line by line revealed the recovery
short-circuit at line ~2400: `if (recoveryData != null) { handleRecovery(...); }`.
A git blame placed this in TEZ-2877 (commit a1b2c3d4), where the
assumption "non-null recoveryData implies active replay" was reasonable
at the time but became invalid when TEZ-3105 introduced speculative
recovery-data population at vertex init.

The actual race: V_TASK_COMPLETED for the final task arrived at the
moment when recoveryData was populated but isRecovering() would have
returned false — there was no isRecovering() check.

Three to five hypotheses, in the order you tried them. Each with one sentence on what suggested it and one sentence on what disproved it. The dead ends are not embarrassments — they are the work, and they teach readers what not to spend a week on.

Root Cause (50–100 words)

One paragraph, the truth as you now understand it.

The vertex state machine's V_TASK_COMPLETED handler in the RUNNING state
short-circuited any event to handleRecovery() when recoveryData was non-null,
regardless of whether a recovery replay was actually in progress. Speculative
population of recoveryData at vertex initialization (TEZ-3105) made the
guard fire in normal execution, routing terminal events to the recovery
path which silently ignored them when not replaying. The completedTaskCount
counter never reached totalTaskCount, blocking the SUCCEEDED transition.

Cite the introducing JIRA. Cite the bisect commit if you have it.

Final Design (150–200 words)

What you actually changed and why this design over alternatives.

The fix introduces an isReplayingRecovery() predicate that returns true
only when a recovery replay is in flight (tracked by an existing
RecoveryState flag in DAGAppMaster). The short-circuit is gated on this
predicate:

  if (recoveryData != null && isReplayingRecovery()) { ... }

This is a one-line production change plus a four-line predicate method.
It preserves all behavior for actual recovery scenarios and corrects the
behavior only for the speculatively-populated case.

Show the diff size and the principle ("minimum surface area"). Note any public API impact (here: none).

Alternatives Considered (100–150 words)

Two to three alternatives you rejected, with the reason.

**Alternative 1: stop populating recoveryData speculatively at vertex init.**
Rejected: TEZ-3105 documented performance reasons for the eager population
(avoids a stall when actual recovery kicks in). Reverting it would
regress that path.

**Alternative 2: have handleRecovery() forward the event back to the
standard transition when not replaying.** Rejected: it works, but couples
the recovery path to internal knowledge of which events the standard
transition needs. The gate-at-source approach is local and reviewable.

**Alternative 3: remove the short-circuit entirely and let handleRecovery()
no-op when not replaying.** Rejected: changes the semantics of every other
event flowing through the recovery path, with broader behavioral risk for
a narrowly-scoped bug.

This is the section that separates contributor-quality write-ups from committer-quality ones. Anyone can ship a fix. Articulating why this fix and not the obvious alternatives demonstrates engineering judgment.

Performance / Behavior Impact (50–100 words)

If perf-relevant, numbers from Step 7. Otherwise, one sentence:

No measurable performance impact. The new predicate is a single field
read on a hot path (VertexImpl.handle) but the original short-circuit
already paid this cost on every event. Validated via TestOrderedWordCount
runtime: no statistically significant change across 10 runs.

Lessons Learned (100–150 words)

The transferable insights, written for a peer. Things you would tell yourself before starting.

- Recovery code in Tez has always been the sharpest edge: it is the
  least-tested path because it only runs during AM failover, and most
  developer environments don't trigger it. When a bug touches recovery
  data flow, assume the test coverage is thin and add reproducers
  aggressively.
- `git log -S` (the pickaxe search) and `git bisect` together were decisive:
  bisect found the introducing commit (TEZ-2877), and pickaxe on the changed
  expression showed it had never had a guard. Without bisect this would have
  been a week of code archaeology.
- DrainDispatcher in TestVertexImpl is underused. The repro test for this
  bug took two hours to write once I learned the pattern, and it is now
  permanent regression protection.

Three to five bullets. Concrete enough that a peer at another project could apply them.

Links

- JIRA: https://issues.apache.org/jira/browse/TEZ-NNNN
- PR: https://github.com/apache/tez/pull/<NNN>
- Merged commit: <SHA>
- Introducing commit (TEZ-2877): <SHA>

Where to Publish

Three venues, in roughly decreasing order of effort and impact.

1. Personal blog or company engineering blog

Full ~1000-word write-up. SEO-friendly title with the JIRA number and a keyword phrase users would search for ("Tez vertex stuck RUNNING fix"). Link prominently to JIRA and PR. This is the version that follows you across jobs.

2. Apache wiki / Tez documentation

Shorter version (300–500 words) focused on the lesson, not the personal narrative. Filed under a relevant page (recovery troubleshooting, debugging state machines). Requires wiki access — committers will grant it once you have a few merged contributions.

3. dev@ summary email

A two-to-three-paragraph summary on dev@tez.apache.org with subject [TEZ-NNNN] Notes on the fix. It lets watchers and PMC members see the engineering reasoning without having to read the whole PR. Optional, but it earns goodwill.

Subject: [TEZ-NNNN] Notes on the fix

Hi all,

Merged TEZ-NNNN this morning. Quick notes on the investigation since
recovery bugs are uncommon and the root cause was a non-obvious
interaction with TEZ-3105:

<2 paragraphs of summary>

Full write-up: <link to blog post>

Thanks again to Alice for the review.

Anti-Patterns

What separates write-ups that help from ones that don't:

  • "I learned a lot working on this!" — Yes, we know. Cut it. The artifact is the engineering, not the feel-good.
  • Personal narrative dominating the engineering. Save the "my journey into open source" angle for a separate post. Engineering posts get cited and reread. Narrative posts get one-time clicks.
  • Sanitized version where you "knew the answer all along." Nobody believes this and it actively misleads new contributors who feel inadequate when their investigation is messy. Be honest about the dead ends.
  • No code snippets. A write-up without showing the actual diff or the symptomatic log line is unfalsifiable.
  • No links. JIRA, PR, commit — all three minimum. A write-up without the JIRA link is unreviewable.
  • Word-padding to look thorough. A tight 600-word write-up that respects the reader beats a 2000-word slog every time.

Validation / Self-check

Before declaring the Capstone complete:

  1. The write-up is published at a URL you can share (blog, GitHub Gist, capstone-work/writeup.md in a public repo).
  2. It is 500–1000 words; not 200 (too thin) and not 3000 (padding).
  3. Investigation Log section contains at least two hypotheses you ruled out, not only the winning one.
  4. Alternatives Considered section names at least two designs you rejected with reasons.
  5. Lessons Learned section has three to five bullets, each concrete enough to be reusable by another contributor.
  6. JIRA, PR, and merged-commit SHA are all linked.
  7. The write-up reads as something a peer engineer would respect, not a triumphalist blog post.
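
Items 2 and 6 of this checklist can be checked mechanically. The sketch below assumes the write-up is available as a plain string; the class `WriteUpCheck` and its verdict strings are hypothetical. It counts whitespace-separated words against the 500–1000 range and looks for the JIRA and PR URL forms used earlier in this curriculum. It does not attempt to detect the merged-commit SHA.

```java
/** Illustrative self-check for write-up length and links (hypothetical helper). */
final class WriteUpCheck {
  /** Counts whitespace-separated words. */
  static int wordCount(String text) {
    String trimmed = text.trim();
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
  }

  /** Applies the 500-1000 word range from the checklist. */
  static String verdict(int words) {
    if (words < 500) {
      return "too thin";
    }
    if (words > 1000) {
      return "likely padded";
    }
    return "in range";
  }

  /** True when both a JIRA link and a PR link appear somewhere in the text. */
  static boolean hasJiraAndPrLinks(String text) {
    return text.contains("https://issues.apache.org/jira/browse/TEZ-")
        && text.contains("https://github.com/apache/tez/pull/");
  }

  public static void main(String[] args) {
    String draft = "Fixing TEZ-4567 ... see https://issues.apache.org/jira/browse/TEZ-4567";
    int words = wordCount(draft);
    System.out.println(words + " words: " + verdict(words));
    System.out.println("links present: " + hasJiraAndPrLinks(draft));
  }
}
```

The remaining checklist items (honest dead ends, reasoned alternatives, reusable lessons) are judgment calls that no word counter can make.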

Evaluation Rubric

A 100-point self-grading rubric for the Capstone. Score yourself honestly after you finish Step 10. The scoring is calibrated against what Tez committers actually look for — not what feels good to read.

The point of this rubric is not the score. It is the diagnostic: a low score on one dimension tells you exactly where to invest the next contribution.


Scoring Dimensions

Seven dimensions, weighted by how much they matter for review outcomes.

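| # | Dimension | Points |
|---|---|---|
| 1 | Problem articulation | 20 |
| 2 | Execution-path mastery | 20 |
| 3 | Implementation quality | 20 |
| 4 | Testing | 15 |
| 5 | Review responsiveness | 10 |
| 6 | Documentation | 10 |
| 7 | Community interaction | 5 |
| | Total | 100 |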
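
Since the rubric is meant as a diagnostic rather than a grade, it helps to look at each dimension relative to its maximum, not only at the total. The following sketch does both; the class `RubricScore` and the sample scores are hypothetical, and the weights are the ones in the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative tally for the seven-dimension rubric (hypothetical helper). */
final class RubricScore {
  // Maximum points per dimension, in table order.
  static final Map<String, Integer> MAX_POINTS = new LinkedHashMap<>();
  static {
    MAX_POINTS.put("Problem articulation", 20);
    MAX_POINTS.put("Execution-path mastery", 20);
    MAX_POINTS.put("Implementation quality", 20);
    MAX_POINTS.put("Testing", 15);
    MAX_POINTS.put("Review responsiveness", 10);
    MAX_POINTS.put("Documentation", 10);
    MAX_POINTS.put("Community interaction", 5);
  }

  /** Builds a score map from seven values given in table order. */
  static Map<String, Integer> of(int... points) {
    if (points.length != MAX_POINTS.size()) {
      throw new IllegalArgumentException("expected " + MAX_POINTS.size() + " scores");
    }
    Map<String, Integer> scores = new LinkedHashMap<>();
    int i = 0;
    for (String dimension : MAX_POINTS.keySet()) {
      scores.put(dimension, points[i++]);
    }
    return scores;
  }

  /** Sums the scores after checking each is within 0..max for its dimension. */
  static int total(Map<String, Integer> scores) {
    int sum = 0;
    for (Map.Entry<String, Integer> e : MAX_POINTS.entrySet()) {
      sum += checked(scores, e.getKey(), e.getValue());
    }
    return sum;
  }

  /** Dimension with the lowest score as a fraction of its maximum (first one wins ties). */
  static String weakest(Map<String, Integer> scores) {
    String worst = null;
    double worstRatio = Double.MAX_VALUE;
    for (Map.Entry<String, Integer> e : MAX_POINTS.entrySet()) {
      double ratio = checked(scores, e.getKey(), e.getValue()) / (double) e.getValue();
      if (ratio < worstRatio) {
        worstRatio = ratio;
        worst = e.getKey();
      }
    }
    return worst;
  }

  private static int checked(Map<String, Integer> scores, String dimension, int max) {
    Integer s = scores.get(dimension);
    if (s == null || s < 0 || s > max) {
      throw new IllegalArgumentException("bad score for " + dimension);
    }
    return s;
  }

  public static void main(String[] args) {
    Map<String, Integer> mine = of(18, 15, 16, 12, 8, 9, 4); // sample self-assessment
    System.out.println(total(mine));   // prints 82
    System.out.println(weakest(mine)); // prints Execution-path mastery
  }
}
```

In the sample, an 82 overall still points at execution-path tracing as the place to invest in the next contribution.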

1. Problem Articulation (20 pts)

Can you state, in one paragraph, what was broken, for whom, under what conditions?

| Score | What it looks like |
|---|---|
| 18-20 | Crisp one-paragraph statement covering symptom, trigger conditions, affected version range, and operational impact. Distinguishes "this is what the user sees" from "this is the underlying mechanism." Could be read aloud at a standup and a peer would correctly grasp the bug. |
| 14-17 | Clear symptom but trigger conditions vague ("happens sometimes under load"). OR trigger clear but conflates symptom with root cause. |
| 10-13 | Reader needs to ask follow-up questions to understand what was broken. Uses jargon without grounding it in user-visible behavior. |
| 5-9 | Mostly restates the JIRA title. No conditions. No version impact. |
| 0-4 | "It was broken and I fixed it." |

Look for: the word "intermittent" used without a documented trigger; conflation of symptom (vertex stuck) with cause (event short-circuit). Both are red flags.


2. Execution-Path Mastery (20 pts)

Did you actually trace the code, or did you guess?

| Score | What it looks like |
|---|---|
| 18-20 | Step-3 document maps the full path from user submission to bug location with file:line citations at every layer. Includes a diagram (mermaid or text-arrow). Cites the AsyncDispatcher event hop and the specific state-machine transition where the bug fires. Reviewer reading it could open each file at each line and follow the logic without asking questions. |
| 14-17 | Most layers cited but one or two skipped ("then the event reaches VertexImpl"). Diagram present but missing a critical hop. |
| 10-13 | Cites the location of the bug correctly but does not trace how execution reached it. No diagram. |
| 5-9 | Vague references ("the dispatcher handles it") without file:line. |
| 0-4 | No execution-path document, or it is just a paragraph of prose. |

Look for: presence of tez-api/src/main/...-style paths with line numbers that match the resolved commit SHA.


3. Implementation Quality (20 pts)

Diff hygiene, scope discipline, convention compliance.

| Score | What it looks like |
|---|---|
| 18-20 | Minimum-diff fix. Production change measured in tens of lines, not hundreds. Every changed line is justifiable in one sentence. No drive-by refactors, no opportunistic renames. Public API surface unchanged unless required. Naming, slf4j logging style, Preconditions, exception messages all match Tez conventions. Checkstyle, SpotBugs, RAT all green without manual overrides. |
| 14-17 | Mostly minimum-diff but one or two stray changes that don't belong. Conventions mostly followed; minor style nits a reviewer would flag. |
| 10-13 | Fix works but is broader than necessary. Scope creep ("while I was here I cleaned up..."). Conventions inconsistently applied. |
| 5-9 | Significant scope creep. Public API changed unnecessarily. Style violations would block precommit without revision. |
| 0-4 | Diff is so large reviewers would request it be broken up before reviewing. OR breaks public API silently. |

Look for: scope-creep tells, such as git diff apache/master --stat listing files unrelated to the bug.


4. Testing (15 pts)

Coverage, determinism, regression value.

| Score | What it looks like |
|---|---|
| 14-15 | New unit test reproduces the bug deterministically on master (DrainDispatcher or equivalent), passes with fix. Negative-control test (similar input where the bug should NOT trigger) included. Branch coverage on the changed lines is high. Integration test with MiniTezCluster confirms the fix in an end-to-end DAG. No Thread.sleep, no wall-clock dependencies, no order-dependent assertions. Test ran 10x in a loop without flake. |
| 11-13 | Unit test present and deterministic but no negative control. OR has an integration test but the unit test is weak. |
| 7-10 | Unit test present but uses Thread.sleep or is otherwise non-deterministic. Coverage of fix path incomplete. |
| 3-6 | Test exists but only checks the happy path; would have passed before the fix. |
| 0-2 | No new tests, or tests that fail on master AND on the fix. |

Look for: presence of dispatcher.await() rather than Thread.sleep; a test name that describes the scenario (testV_TASK_COMPLETED_inRunningWithRecovery) rather than the method (testHandle).
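
Why draining beats sleeping can be shown without any Tez dependency. The sketch below is a stand-in, not Tez's DrainDispatcher: `MiniDrainDispatcher` is a hypothetical class that handles events on a single worker thread and whose `await()` returns only after every previously dispatched event has run. A test built this way asserts on a known state instead of hoping a sleep was long enough.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal drainable dispatcher for illustration only; NOT Tez's DrainDispatcher. */
final class MiniDrainDispatcher implements AutoCloseable {
  // One worker thread, so events are handled sequentially in submission order.
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  /** Queues an event handler to run asynchronously. */
  void dispatch(Runnable event) {
    worker.execute(event);
  }

  /**
   * Blocks until every event dispatched before this call has been handled.
   * Limitation of this sketch: events that handlers dispatch themselves may
   * land behind the marker and are not waited for.
   */
  void await() {
    try {
      // The empty marker task is queued last, so it finishes after all earlier events.
      worker.submit(() -> { }).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  @Override
  public void close() {
    worker.shutdown();
  }

  public static void main(String[] args) {
    AtomicInteger completedTasks = new AtomicInteger();
    try (MiniDrainDispatcher dispatcher = new MiniDrainDispatcher()) {
      for (int i = 0; i < 20; i++) {
        dispatcher.dispatch(completedTasks::incrementAndGet);
      }
      dispatcher.await(); // deterministic: no Thread.sleep, no wall-clock guess
      System.out.println(completedTasks.get()); // prints 20
    }
  }
}
```

The same pattern is what makes a 10x loop of the test meaningful: if the assertion follows a drain, repeated runs cannot differ because of scheduling.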


5. Review Responsiveness (10 pts)

How well you ran the review cycle.

| Score | What it looks like |
|---|---|
| 9-10 | Every reviewer comment addressed in code or with a substantive reply. Iteration cadence < 48h on most comments. Disagreements (when they happened) made the technical case without defensiveness. Updated PR description after material changes so the top-of-PR text stays accurate. |
| 7-8 | Addresses comments correctly but slow (multi-day gaps). OR addresses most comments but lets a few stylistic ones slide without acknowledgement. |
| 5-6 | Defensive on at least one comment ("but I think my way is fine"). OR force-pushed without summarizing the diff for reviewers. |
| 2-4 | Required multiple reminders from reviewers. Comments not addressed cleanly. |
| 0-1 | PR went silent for > 2 weeks without explanation, or contributor argued every comment. |

Look for: PR review threads marked "resolved" by the contributor with a substantive commit pushed, not just a reply.


6. Documentation (10 pts)

JIRA fields, code comments, write-up presence.

| Score | What it looks like |
|---|---|
| 9-10 | JIRA has Component, Affects Version, Release Notes (if user-visible), PR link, and relevant cross-links. In-code comments cite TEZ-NNNN where the change is non-obvious. Write-up exists at a public URL. JIRA status correctly walked through In Progress -> Patch Available. |
| 7-8 | JIRA mostly filled but Release Notes missing on a user-visible change. Code comments present but don't cite the JIRA. |
| 5-6 | JIRA workflow followed but fields incomplete. No write-up beyond the PR description. |
| 2-4 | JIRA fields blank or wrong. Comments absent at the surprising lines. |
| 0-1 | No JIRA hygiene at all. |

Look for: the JIRA's "Release Notes" field being populated or an explicit note explaining why it's intentionally blank.


7. Community Interaction (5 pts)

Mailing list etiquette, claiming/handoff hygiene.

| Score | What it looks like |
|---|---|
| 5 | Claimed the JIRA before starting. Posted to dev@ only when meaningful (design question, summary after merge). Used [TEZ-NNNN] subject prefix. Was reachable during review. Thanked reviewers explicitly. If they hit a wall, posted clearly with "stuck on X, considering A/B/C, leaning A because Y." |
| 3-4 | Mostly good etiquette; one minor slip (claimed late, or one off-topic mailing-list post). |
| 1-2 | Did not claim the JIRA before working. OR sent mailing-list traffic that was really just chat ("does anyone know..."). |
| 0 | Worked silently for weeks, then dropped a PR with no JIRA assignment and no context. |

Look for: a JIRA comment by the contributor before the first PR push, along the lines of "Working on this, will have a patch in a few days."


Tier Thresholds

Where you land tells you what to do next.

| Score | Tier | Interpretation |
|---|---|---|
| 95-100 | PMC-ready | This is the quality of work that earns a committer vote, given a track record of several such contributions over months. You are operating at the level of someone the PMC would trust to maintain a module. |
| 90-94 | Committer-ready | You are writing patches at committer quality. With 3-5 such contributions across different modules over 6-12 months and demonstrated review participation on others' patches, a vote is plausible. |
| 80-89 | Strong contributor | A reliable contributor whose patches need minimal review iteration. Keep building the track record; this is the level where committers actively look forward to reviewing your work. |
| 65-79 | Contributor | Solid bug-fix-grade work. Patches land with normal review iteration. Most contributions to most projects live here, and it is honorable work. |
| 50-64 | Learning | Patches eventually land but with significant reviewer guidance. Use the next contribution to focus on the dimension where you scored lowest. |
| < 50 | Foundational gap | The contribution may have merged, but the process skipped enough corners that another reviewer or future maintainer is paying a tax. Restart with a smaller bug and apply the rubric end-to-end. |

The tier is not a personality assessment. It is calibrated to the artifact you produced for this one Capstone. The same person can score 65 on one contribution and 95 on the next.
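
The tier thresholds can be expressed as a small lookup. The following is a minimal sketch, assuming the total is an integer from 0 to 100; the class and method names (TierLookup, tierFor) are made up for illustration and are not part of any tooling the curriculum provides.

```java
final class TierLookup {

    /** Maps a rubric total (0-100) to the tier label from the thresholds table. */
    static String tierFor(int total) {
        if (total < 0 || total > 100) {
            throw new IllegalArgumentException("total must be between 0 and 100, got " + total);
        }
        if (total >= 95) {
            return "PMC-ready";
        }
        if (total >= 90) {
            return "Committer-ready";
        }
        if (total >= 80) {
            return "Strong contributor";
        }
        if (total >= 65) {
            return "Contributor";
        }
        if (total >= 50) {
            return "Learning";
        }
        return "Foundational gap";
    }

    public static void main(String[] args) {
        // Sample totals chosen to sit on band boundaries.
        int[] samples = {100, 94, 80, 79, 50, 49};
        for (int total : samples) {
            System.out.println(total + " -> " + tierFor(total));
        }
    }
}
```

Checking the boundary values (94 versus 95, 49 versus 50) is the quickest way to confirm the lookup agrees with the table.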


How to Self-Grade

Block 30 minutes. Open this rubric. Open your own artifacts side by side (JIRA, PR, code, root-cause doc, write-up, validation report). Score each dimension by reading the band descriptions and picking the one that most honestly matches what you produced.

Two rules:

  1. No interpolation upward. If you're between 14 and 17 on a dimension and unsure, score 14. The optimist's tax.
  2. One independent reviewer. Ask a peer (ideally another contributor) to score independently on the same rubric. If your scores differ by more than 10 points on any dimension, talk about it. The difference is where the calibration lives.

Record both scores in capstone-work/self-grade.md along with one sentence per dimension on what would have moved the score up one band. This becomes the input for the next contribution's plan.
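
The second rule can be checked mechanically once both score sheets exist. The sketch below is one possible way to do it, assuming each sheet is a map from dimension name to points; the class name, method names, and the sample scores in main are all hypothetical and exist only to show the comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SelfGradeComparison {

    /** A per-dimension gap larger than this many points should be discussed. */
    static final int DISCUSSION_THRESHOLD = 10;

    /** Returns the dimensions where the two graders differ by more than the threshold. */
    static List<String> dimensionsToDiscuss(Map<String, Integer> self, Map<String, Integer> peer) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : self.entrySet()) {
            Integer peerScore = peer.get(entry.getKey());
            if (peerScore == null) {
                throw new IllegalArgumentException("peer score missing for " + entry.getKey());
            }
            if (Math.abs(entry.getValue() - peerScore) > DISCUSSION_THRESHOLD) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }

    /** Sums the per-dimension scores into the rubric total. */
    static int total(Map<String, Integer> scores) {
        int sum = 0;
        for (int score : scores.values()) {
            sum += score;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical scores for two dimensions only; a real sheet lists all seven.
        Map<String, Integer> self = new LinkedHashMap<>();
        self.put("Implementation quality", 17);
        self.put("Testing", 12);

        Map<String, Integer> peer = new LinkedHashMap<>();
        peer.put("Implementation quality", 5);
        peer.put("Testing", 10);

        System.out.println("Discuss: " + dimensionsToDiscuss(self, peer));
        System.out.println("Self subtotal: " + total(self) + ", peer subtotal: " + total(peer));
    }
}
```

With these sample numbers only Implementation quality is flagged, since the Testing scores differ by two points. The comparison tells you where to talk; it does not decide whose score is right.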


What to Do With a Low Score

| Lowest dimension | Next contribution focus |
|---|---|
| Problem articulation | Pick a smaller, sharper bug. Write the one-paragraph statement before opening the JIRA edit, and post it for review. |
| Execution-path mastery | Pick a bug in a layer you've never traced (e.g. you've done DAG-level, now do shuffle-level). Force yourself to write the path doc before reading the existing tests. |
| Implementation quality | Pick a bug where the minimum fix is < 10 lines. Practice the discipline of leaving the surrounding code untouched. |
| Testing | Pick a flaky-test JIRA (Stage 9 of the roadmap). The whole bug is about testing discipline. |
| Review responsiveness | Pick a bug in a high-traffic area where you'll get more reviewers. Set a 24-hour SLA for yourself on every comment. |
| Documentation | Pick a bug that requires a Release Notes entry. Write the entry before the fix is done. |
| Community interaction | Reply substantively to three other contributors' patches before opening your next one. |

Validation / Self-check

Before declaring the Capstone done:

  1. capstone-work/self-grade.md exists with a score per dimension and a total.
  2. The total is honest, not aspirational — you can defend each dimension's score with citations to your own artifacts.
  3. At least one independent reviewer has also scored, and disagreements of more than 10 points on any dimension have been discussed.
  4. The lowest dimension is identified and the next contribution's focus is written down.
  5. The score is recorded somewhere you'll see again in 3 months (calendar reminder, journal, follow-on JIRA list).
  6. You understand that the tier label ("Contributor", "Committer-ready") describes this one piece of work, not you.
  7. You have a candidate next bug picked, with the focus dimension in mind.