Lab 3.1: Trace a DAG Submission End-to-End

Background

A DAG goes from a Java object constructed with the API to tasks running in containers through a sequence of method calls, IPC calls, and event posts that spans six class boundaries and three JVMs. This lab asks you to trace that path precisely: for each boundary, record the class name, the method name, and the data that crosses it.

The skill being practiced is reconstructing this trace from code, not from documentation. That means reading DAGAppMaster.java, DAGImpl.java, VertexImpl.java, and TezChild.java and following the chain yourself.


The Six Class Boundaries

[1] TezClient.submitDAG(dag)
         │
         │  DAGClientAMProtocol (IPC) — carries: SubmitDAGRequest{DAGPlan}
         ▼
[2] DAGClientHandler.submitDAG(request)   [in DAGAppMaster]
         │
         │  posts: DAGAppMasterEvent(NEW_DAG_SUBMITTED)
         ▼
[3] DAGAppMaster.handle(event)
         │
         │  calls createDag(dagPlan) → new DAGImpl(...)
         │  posts: DAGEvent(DAG_INIT)
         ▼
[4] DAGImpl.handle(DAGEvent{DAG_INIT})
         │
         │  InitTransition: initializes all VertexImpl objects
         │  posts: VertexEvent(V_INIT) for each vertex
         ▼
[5] VertexImpl.handle(VertexEvent{V_INIT})
         │
         │  InitTransition: sets up tasks, calls VertexManager
         │  posts: VertexEvent(V_START) when ready
         │  posts: TaskEvent(T_SCHEDULE) for each task
         ▼
[6] TaskImpl → TaskAttemptImpl → ContainerLauncher → NM
         │
         │  NM starts container JVM: TezChild.main()
         ▼
[Container JVM] TezChild receives task assignment via TezTaskUmbilicalProtocol
         │
         ▼
LogicalIOProcessorRuntimeTask.run()  — Processor.run() called
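
Before opening the real sources, it can help to see the shape of an event-driven chain in isolation. The following is a toy sketch, not Tez code: the Dispatcher class and its methods are hypothetical, only the event names come from the diagram above, and V_START is left out for brevity. It shows how each handler does a little work and then posts the next event, which is why the trace has to be followed through event types rather than through direct method calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Toy model of the event chain in the diagram. Hypothetical classes;
// only the event names are taken from the lab text.
class EventChainSketch {
    // Single-threaded stand-in for a dispatcher: handlers may post
    // follow-on events, which are processed in FIFO order.
    static final class Dispatcher {
        private final Queue<String> queue = new ArrayDeque<>();
        private final Map<String, Consumer<Dispatcher>> handlers = new HashMap<>();
        final List<String> trace = new ArrayList<>();

        void on(String eventType, Consumer<Dispatcher> handler) {
            handlers.put(eventType, handler);
        }

        void post(String eventType) {
            queue.add(eventType);
        }

        // Process events until the queue is empty, recording each one.
        void drain() {
            String event;
            while ((event = queue.poll()) != null) {
                trace.add(event);
                Consumer<Dispatcher> handler = handlers.get(event);
                if (handler != null) {
                    handler.accept(this);
                }
            }
        }
    }

    static List<String> run(int vertices, int tasksPerVertex) {
        Dispatcher d = new Dispatcher();
        d.on("NEW_DAG_SUBMITTED", x -> x.post("DAG_INIT"));
        d.on("DAG_INIT", x -> {
            for (int v = 0; v < vertices; v++) x.post("V_INIT");
        });
        d.on("V_INIT", x -> {
            for (int t = 0; t < tasksPerVertex; t++) x.post("T_SCHEDULE");
        });
        d.post("NEW_DAG_SUBMITTED");
        d.drain();
        return d.trace;
    }

    public static void main(String[] args) {
        run(2, 1).forEach(System.out::println);
    }
}
```

Running it with two vertices and one task each prints the events in the order they were handled. In the real AM the handlers live in different classes, so each arrow in the diagram corresponds to a handler you have to locate by event type.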

Step-by-Step Tasks

Step 1: Find the Entry Point in TezClient

Open tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java.

Find the submitDAG(DAG dag) method. Answer:

  1. What is the name of the IPC protocol interface used to communicate with the AM?
  2. What does TezClient do if it does not yet have an AM to talk to (session not started)?
  3. What method on the DAG object serializes it to a DAGPlan protobuf?
  4. What request object wraps the DAGPlan before it is sent over IPC?

# Find the IPC protocol interface
grep -n "Protocol" tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java | head -10

# Find DAGPlan construction
grep -n "DAGPlan\|createDag\|getPlan" tez-api/src/main/java/org/apache/tez/dag/api/TezClient.java

Step 2: Find the AM-side IPC Handler

The AM exposes the DAGClientAMProtocol interface. The server-side implementation lives in the tez-dag module, so search the whole module rather than a single package.

# Find the implementation of submitDAG on the AM side
grep -rn "submitDAG" tez-dag/src/main/java/ | grep -v test

Open the handler class. Answer:

  1. What is the exact class name that implements DAGClientAMProtocol?
  2. What event type does it post to the AsyncDispatcher after receiving the DAGPlan?
  3. Does the submitDAG call on the AM side block until the DAG completes, or does it return immediately?
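
To answer question 3 you need to recognize what a non-blocking handler looks like. The sketch below is a hypothetical illustration, not the Tez handler: the class, the "accepted:" acknowledgement, and the string-typed events are all invented for the example. It shows a handler that only enqueues an event and returns, with the actual work done later on a separate dispatcher thread. Whether the real submitDAG follows this pattern is for you to confirm in the code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of a non-blocking RPC handler. Hypothetical names throughout.
class AsyncSubmitSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final CountDownLatch handled = new CountDownLatch(1);
    private volatile String lastHandled;

    // Handler: enqueue an event and return an acknowledgement right away.
    String submit(String planName) {
        events.add("NEW_DAG_SUBMITTED:" + planName);
        return "accepted:" + planName;
    }

    // Dispatcher thread: takes one event off the queue and records it.
    void startDispatcher() {
        Thread t = new Thread(() -> {
            try {
                lastHandled = events.take();
                handled.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "dispatcher");
        t.setDaemon(true);
        t.start();
    }

    // Returns {acknowledgement, event seen by the dispatcher}.
    static String[] demo() {
        AsyncSubmitSketch am = new AsyncSubmitSketch();
        String ack = am.submit("wordcount"); // returns before any handling happens
        am.startDispatcher();
        try {
            am.handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new String[] { ack, am.lastHandled };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

The acknowledgement is available before the dispatcher thread has even started, which is the property to look for: a handler that returns a response without waiting on the outcome of the event it posted.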

Step 3: Trace DAGAppMaster Initialization

Open tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java.

Find the serviceInit() and serviceStart() methods. Read the component initialization order:

  1. List the components initialized across serviceInit() and serviceStart(), in order
  2. Find where AsyncDispatcher is created and started
  3. Find where the DAGEventDispatcher (the component that routes DAGEvents to DAGImpl) is registered

# Find component initialization
grep -n "addService\|serviceStart\|startService" \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -20
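
When you look for where a handler is "registered", it helps to know the routing idea you are looking for. The sketch below is a simplified stand-in, not the Hadoop or Tez dispatcher: the enums, the register and dispatch methods, and the log strings are assumptions made for illustration. It routes each event to a handler keyed by the enum class that declares the event type, so one handler receives every event of a given family.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy routing table keyed by event-type enum class. Hypothetical names.
class RegistrationSketch {
    enum DagEventType { DAG_INIT, DAG_START }
    enum VertexEventType { V_INIT, V_START }

    private final Map<Class<?>, Consumer<Enum<?>>> handlers = new HashMap<>();
    final List<String> log = new ArrayList<>();

    // One handler per family of events (per enum class).
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Enum<?>> handler) {
        handlers.put(eventTypeClass, handler);
    }

    // Look up the handler by the enum class that declares the constant.
    // getDeclaringClass() is used instead of getClass() because constants
    // with their own bodies have a different runtime class.
    void dispatch(Enum<?> eventType) {
        Consumer<Enum<?>> handler = handlers.get(eventType.getDeclaringClass());
        if (handler == null) {
            throw new IllegalStateException("No handler registered for " + eventType);
        }
        handler.accept(eventType);
    }

    static List<String> demo() {
        RegistrationSketch d = new RegistrationSketch();
        d.register(DagEventType.class, e -> d.log.add("dag handler got " + e));
        d.register(VertexEventType.class, e -> d.log.add("vertex handler got " + e));
        d.dispatch(VertexEventType.V_INIT);
        d.dispatch(DagEventType.DAG_INIT);
        return d.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In DAGAppMaster, look for the equivalent registration calls and note which event-type class each handler is bound to; that binding is what connects step [3] to step [4] in the diagram.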

Step 4: Read the DAGImpl Init Transition

Open tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java.

Find the StateMachineFactory definition. Locate the transition for DAGEventType.DAG_INIT.

The transition handler class is InitTransition. Find it in the same file.

Answer:

  1. What does InitTransition.transition() do with each vertex in the DAG?
  2. After initializing vertices, what event does DAGImpl post?
  3. Under what condition does the init transition immediately move to RUNNING vs waiting?

# Find the init transition
grep -n "InitTransition\|DAG_INIT" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | head -20
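
A state machine definition reads more easily once you have seen a minimal one. The sketch below is a toy, assuming a table keyed by (current state, event type) whose entries do the work and return the next state; the State and EventType enums, the Dag class, and the vertexCount condition are all hypothetical and do not reproduce DAGImpl. It shows the case question 3 is about: one transition with more than one possible destination state, chosen by a condition checked inside the transition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy transition table. Hypothetical states, events, and condition.
class StateMachineSketch {
    enum State { NEW, INITED, RUNNING, FAILED }
    enum EventType { INIT, START }

    // Hypothetical context object that transitions can inspect.
    static final class Dag {
        State state = State.NEW;
        final int vertexCount;
        Dag(int vertexCount) { this.vertexCount = vertexCount; }
    }

    // (current state, event type) -> hook that returns the next state.
    private static final Map<String, Function<Dag, State>> TABLE = new HashMap<>();
    static {
        // Two possible destinations from the same (state, event) pair.
        TABLE.put(key(State.NEW, EventType.INIT),
                dag -> dag.vertexCount > 0 ? State.INITED : State.FAILED);
        // A single fixed destination.
        TABLE.put(key(State.INITED, EventType.START), dag -> State.RUNNING);
    }

    private static String key(State s, EventType e) {
        return s + "/" + e;
    }

    // Apply the transition registered for the DAG's current state.
    static State handle(Dag dag, EventType event) {
        Function<Dag, State> transition = TABLE.get(key(dag.state, event));
        if (transition == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + dag.state);
        }
        dag.state = transition.apply(dag);
        return dag.state;
    }

    public static void main(String[] args) {
        Dag dag = new Dag(3);
        System.out.println(handle(dag, EventType.INIT));
        System.out.println(handle(dag, EventType.START));
    }
}
```

When reading the real factory definition, check whether the DAG_INIT entry lists a single destination state or a set of them; a set tells you the transition body decides the outcome.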

Step 5: Read the VertexImpl Init Transition

Open tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java.

Find the transition triggered by event V_INIT and note which state it starts from. The handler is InitTransition (a different class from the one in DAGImpl).

Answer:

  1. What is the VertexManager and when is it invoked during initialization?
  2. How does VertexImpl know how many tasks to create (the parallelism)?
  3. What event does VertexImpl send to DAGImpl when initialization completes?

# Find vertex init transition
grep -n "V_INIT\|InitTransition" \
  tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java | head -20
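
Question 2 is easier to investigate if you keep two cases apart: a task count that is fixed up front, and one that is only supplied at runtime. The sketch below models that distinction and nothing more. Using -1 for "not yet known" follows the convention of the Tez Vertex API; the Vertex class here, setParallelism, and tryCreateTasks are hypothetical and are not VertexImpl's methods.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of fixed vs. deferred parallelism. Hypothetical names.
class ParallelismSketch {
    static final int UNKNOWN = -1;

    static final class Vertex {
        final String name;
        private int parallelism;
        final List<String> tasks = new ArrayList<>();

        Vertex(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }

        // Called by whatever component decides the task count at runtime.
        void setParallelism(int n) {
            if (n < 0) throw new IllegalArgumentException("parallelism must be >= 0");
            parallelism = n;
        }

        // Returns true if tasks exist after the call, false if the count
        // is still unknown and initialization has to wait.
        boolean tryCreateTasks() {
            if (parallelism == UNKNOWN) return false;
            if (tasks.isEmpty()) {
                for (int i = 0; i < parallelism; i++) tasks.add(name + "_task_" + i);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Vertex deferred = new Vertex("summer", UNKNOWN);
        System.out.println(deferred.tryCreateTasks()); // count not known yet
        deferred.setParallelism(3);
        System.out.println(deferred.tryCreateTasks() + " " + deferred.tasks);
    }
}
```

In VertexImpl, look for the code path that handles the "still unknown" case and identify which component eventually supplies the number.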

Step 6: Trace the Container Launch

After tasks are scheduled, TaskAttemptImpl requests a container from the TaskScheduler. When a container is assigned, ContainerLauncher builds the launch context.

# Find the container launch command construction
grep -rn "containerLaunchContext\|getContainerLaunchContext\|vargs" \
  tez-dag/src/main/java/org/apache/tez/dag/app/launcher/ | grep -v test | head -10

Answer:

  1. What is the main class of the container JVM? (The class with main() that YARN launches)
  2. What information is passed to TezChild via system properties vs environment variables?
  3. How does TezChild know which task to run when it starts?
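
The grep above looks for a list usually named vargs, which is a common way to assemble a JVM command line piece by piece. The sketch below shows that pattern in its simplest form. Everything in it is illustrative: the main class com.example.ChildMain and the two arguments are placeholders, not what Tez passes, and finding the real values is the point of the questions above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the "vargs" pattern for building a launch command.
// All values are placeholders, not the arguments Tez uses.
class LaunchCommandSketch {
    static String buildCommand(String javaOpts, String mainClass, List<String> args) {
        List<String> vargs = new ArrayList<>();
        vargs.add("$JAVA_HOME/bin/java");          // expanded on the node at launch time
        if (javaOpts != null && !javaOpts.trim().isEmpty()) {
            vargs.add(javaOpts.trim());            // heap size, GC flags, -D properties
        }
        vargs.add(mainClass);                      // the class whose main() is launched
        vargs.addAll(args);                        // positional arguments for main()
        return String.join(" ", vargs);
    }

    public static void main(String[] args) {
        System.out.println(buildCommand(
                "-Xmx512m", "com.example.ChildMain", List.of("am-host", "4321")));
    }
}
```

When you find the real command construction, sort each piece into one of three groups: JVM options (including -D system properties), positional arguments to main(), and values set separately in the container's environment.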

Step 7: Read TezChild.main()

Open tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezChild.java. (TezChild is not in the tez-dag module; the find command below confirms the location in your checkout.)

Find the main() method and the run() loop.

Answer:

  1. What IPC interface does TezChild use to communicate with the AM?
  2. What does TezChild do when it receives a TaskSpec from the AM?
  3. What class is instantiated to actually run the processor?

# Find TezChild
find . -name "TezChild.java" -path "*/src/main/java/*"
wc -l $(find . -name "TezChild.java" -path "*/src/main/java/*")
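
A container process that asks its master for work typically runs a loop like the one sketched below. This is a generic illustration under stated assumptions: the Umbilical interface, the Reply class with its shouldDie flag, and the task names are hypothetical stand-ins, not the TezTaskUmbilicalProtocol signatures. Use it as a template for what to look for in TezChild's run() loop: where the request is made, how "no task yet" is handled, and what ends the loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy poll-for-work loop. Hypothetical interface and reply type.
class UmbilicalLoopSketch {
    // Reply from the master: a task to run, nothing yet, or an order to exit.
    static final class Reply {
        final String taskName;
        final boolean shouldDie;
        Reply(String taskName, boolean shouldDie) {
            this.taskName = taskName;
            this.shouldDie = shouldDie;
        }
    }

    // Stand-in for the RPC interface between container and master.
    interface Umbilical {
        Reply getTask(String containerId);
    }

    static List<String> runLoop(Umbilical umbilical, String containerId) {
        List<String> executed = new ArrayList<>();
        while (true) {
            Reply reply = umbilical.getTask(containerId);
            if (reply.shouldDie) break;             // master told us to exit
            if (reply.taskName == null) continue;   // nothing yet; a real loop would sleep first
            executed.add("ran " + reply.taskName);  // stand-in for running the processor
        }
        return executed;
    }

    // Drives the loop with a scripted sequence of replies.
    static List<String> demo() {
        Queue<Reply> scripted = new ArrayDeque<>(List.of(
                new Reply(null, false),
                new Reply("map_0", false),
                new Reply("map_1", false),
                new Reply(null, true)));
        return runLoop(id -> scripted.remove(), "container_01");
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Note that one container can run several tasks in sequence in this model. Check whether TezChild does the same, since that affects how you read the container reuse logic on the AM side.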

Complete the Trace Table

Fill in this table by reading the code (not from this page or any other documentation):

Step | Class                         | Method                      | Data / Event
-----+-------------------------------+-----------------------------+----------------------------------------------------
1    | TezClient                     | submitDAG()                 | Sends SubmitDAGRequest{DAGPlan} via IPC
2    | ?                             | submitDAG()                 | Posts event ???
3    | DAGAppMaster                  | handle()                    | Creates DAGImpl, posts DAGEvent{DAG_INIT}
4    | DAGImpl                       | InitTransition.transition() | Posts VertexEvent{V_INIT} for each vertex
5    | VertexImpl                    | InitTransition.transition() | Posts TaskEvent{T_SCHEDULE} for each task
6    | TaskAttemptImpl               | ?                           | Requests container from RM via TaskScheduler
7    | ContainerLauncher             | ?                           | Launches container JVM with TezChild as main class
8    | TezChild                      | run()                       | Receives task spec, starts processor
9    | LogicalIOProcessorRuntimeTask | run()                       | Calls Processor.run()

Fill in the ? cells from the actual code. Each cell should contain the real method name.


Expected Output

A completed trace table with all cells filled from code, not from documentation. Each answer should be verifiable by pointing to a specific line in a specific file.

Example format for your notes:

Step 2: DAGClientHandler.submitDAG()
  in: tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
  line: ~1234
  posts: DAGAppMasterEvent(NEW_DAG_SUBMITTED)

Stretch Goals

  1. Find the AsyncDispatcher queue size configuration. What happens if the queue fills up?

    grep -rn "AsyncDispatcher\|dispatcher.queue" \
      tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java | head -10
    
  2. Find where the AM is told to exit when the DAG completes:

    grep -n "stop\|shutdown\|exit" \
      tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java | grep -i "succeeded\|complete"
    
  3. Trace what happens to a TA_DONE event from TezChild back to DAGImpl:

    • TezChild calls a method on the umbilical
    • The AM receives it and posts a TaskAttemptEvent
    • TaskAttemptImpl transitions to SUCCEEDED
    • The chain continues up to DAGImpl

    Identify every class and event in this reverse chain.
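
For the third stretch goal, the reverse chain is a completion roll-up: each level counts finished children and notifies its parent when the last one is done. The sketch below models only that counting logic. The Node class and the direct recursive call are simplifications made for illustration; in the AM each upward step is an event posted through the dispatcher, and naming those events is the exercise.

```java
import java.util.ArrayList;
import java.util.List;

// Toy completion roll-up: attempt -> task -> vertex -> DAG.
// Hypothetical classes; the real chain uses dispatcher events, not direct calls.
class CompletionRollupSketch {
    static final class Node {
        final String name;
        final Node parent;
        final int expectedChildren;
        int completedChildren;

        Node(String name, Node parent, int expectedChildren) {
            this.name = name;
            this.parent = parent;
            this.expectedChildren = expectedChildren;
        }
    }

    // Record one finished child; if it was the last, notify the parent.
    static void childCompleted(Node node, String child, List<String> log) {
        node.completedChildren++;
        log.add(node.name + " <- " + child + " completed");
        if (node.completedChildren == node.expectedChildren) {
            if (node.parent != null) {
                childCompleted(node.parent, node.name, log);
            } else {
                log.add(node.name + " SUCCEEDED");
            }
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Node dag = new Node("dag", null, 1);        // one vertex
        Node vertex = new Node("vertex", dag, 2);   // two tasks
        Node task0 = new Node("task0", vertex, 1);  // one attempt each
        Node task1 = new Node("task1", vertex, 1);
        childCompleted(task0, "attempt0", log);
        childCompleted(task1, "attempt1", log);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The first attempt stops at the vertex because one task is still outstanding; the second carries through to the DAG. In your trace, record for each level which counter or check plays the role of completedChildren == expectedChildren.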