TezClient
TezClient is the client-side API: the class your driver code instantiates
to start an AM, submit DAGs, and (optionally) keep the AM alive across DAGs.
This chapter walks bring-up, the session vs non-session distinction, local
resource staging, RPC submission, and ATS hookup.
After this chapter you should be able to point at every line of code that runs
between TezClient.create(...) and the moment a DAG appears inside the AM
ready to be start()ed.
Files to open
tez-api/src/main/java/org/apache/tez/client/
TezClient.java
TezClientUtils.java
TezSessionImpl.java
FrameworkClient.java
TezYarnClient.java (YARN-backed FrameworkClient)
LocalClient.java (in-process FrameworkClient for local mode)
Plus the YARN-AM protocol definition:
tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClient.java
tez-api/src/main/proto/DAGClientAMProtocol.proto
Two modes: session and non-session
The mode is chosen at TezClient.create(...):
TezClient client = TezClient.create(
"MyApp",
tezConf,
isSession /* true = session mode */);
| Property | Non-session | Session |
|---|---|---|
| AM lifetime | Per DAG | Across many DAGs |
start() semantics | No-op (AM launched at submitDAG) | Launches AM and waits for it to register |
| Allowed DAGs in flight | 1 | 1 (sequential within a session by default) |
| Keep-alive | n/a | tez.session.am.dag.submit.timeout.secs |
| Use case | One-shot jobs (CLI tools, scheduled batch) | Latency-sensitive (Hive, Pig, interactive) |
The AM keep-alive timer is critical. In session mode, after a DAG completes the AM waits for the configured timeout for a new DAG. If none arrives, it shuts down to free YARN resources. Find the timer:
grep -n "AMSessionDAGSubmitTimeout\|dag.submit.timeout" \
tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
Bring-up control flow
sequenceDiagram
participant U as User code
participant TC as TezClient
participant TCU as TezClientUtils
participant YC as TezYarnClient
participant RM as YARN RM
participant AM as DAGAppMaster
U->>TC: TezClient.create(name, conf, isSession)
U->>TC: addAppMasterLocalFiles(map)
U->>TC: start()
TC->>TCU: createApplicationSubmissionContext(...)
TCU->>TCU: stage local resources to HDFS
TCU->>TCU: build classpath & env
TC->>YC: submitApplication(appSubmissionContext)
YC->>RM: submitApplication
RM-->>YC: appId
Note over RM,AM: RM launches AM container
AM->>AM: serviceInit, serviceStart
AM-->>TC: AM registers via heartbeat; TC sees RUNNING
U->>TC: submitDAG(dag)
TC->>AM: DAGClientAMProtocol.submitDAG(rpcCall)
AM-->>TC: dagId
TC-->>U: DAGClient
Where each call lives:
TezClient.start()→TezClientUtils.createFinalConfProtoForApp()→TezClientUtils.createApplicationSubmissionContext()→frameworkClient.submitApplication(...).TezClient.submitDAG(dag)→getSessionAMProxy()→dagAMProtocol.submitDAG(submitRequest)(the YARN AM proxy).
grep -n "submitApplication\|submitDAG\|dagAMProtocol" \
tez-api/src/main/java/org/apache/tez/client/TezClient.java
Local resources that TezClientUtils uploads
A YARN container starts with a clean working directory plus whatever local resources the AM submission context declares. For Tez, that includes:
- Tez framework tarball — pointed to by
tez.lib.uris(or a local jar list). Containstez-api.jar,tez-dag.jar,tez-runtime-*.jar, etc. - User application jars — anything you added via
TezClient.addAppMasterLocalFiles(Map<String, LocalResource>)plusaddTaskLocalFiles. - The DAGPlan — not a local resource. It is sent via the
submitDAGRPC payload.
Inspect:
grep -n "tez.lib.uris\|TezConfiguration.TEZ_LIB_URIS\|addAppMasterLocalFiles" \
tez-api/src/main/java/org/apache/tez/client/TezClient.java \
tez-api/src/main/java/org/apache/tez/client/TezClientUtils.java
The AMRM token is delivered by YARN when the container starts; Tez does not manage it directly.
The submission RPC
The protocol is defined in:
tez-api/src/main/proto/DAGClientAMProtocol.proto
grep -n "rpc " tez-api/src/main/proto/DAGClientAMProtocol.proto
Key RPCs:
| RPC | What it does |
|---|---|
submitDAG | Submit a new DAG to a running AM |
getDAGStatus | Poll status (also used by DAGClient) |
getVertexStatus | Poll a specific vertex |
tryKillDAG | Initiate kill |
shutdownSession | Stop the AM in session mode |
The RPC server lives in the AM (DAGClientHandler and its Protobuf
implementation):
grep -rn "DAGClientAMProtocol\|submitDAG" \
tez-dag/src/main/java/org/apache/tez/dag/api/client/ 2>/dev/null | head
ATS / Timeline Service integration
When tez.history.logging.service.class is set to
ATSHistoryLoggingService (the default in many distros), TezClient does
not publish events itself — the AM does, via the HistoryEventHandler.
However, TezClient does:
- Set
tez.history.logging.service.classinto the AM env. - Provide ATS credentials in the application submission context.
Read:
grep -rn "ATSHistoryLoggingService\|YARN_TIMELINE_SERVICE" \
tez-api/src/main/java/org/apache/tez/client/
For the AM-side, see counters-diagnostics.md.
TezSessionImpl vs TezClient
There is a subclass relationship: TezSessionImpl was the older name; modern
Tez uses TezClient with isSession=true, but TezSessionImpl still
appears in some codepaths. The two are largely interchangeable. Inspect both:
grep -n "class TezClient\|class TezSessionImpl" \
tez-api/src/main/java/org/apache/tez/client/*.java
Reading exercise
sed -n '1,120p' tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -n "submitDAG\b" tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -n "stopSession\|stop\|close" \
tez-api/src/main/java/org/apache/tez/client/TezClient.java
grep -rn "submitApplication" tez-api/src/main/java/org/apache/tez/client/
Answer:
- What is the difference between
TezClient.stop()in session vs non-session mode? - When
TezClient.submitDAG()is called for a DAG that conflicts with one currently running in the session, what happens? - Find the timeout used while waiting for the AM to reach
RUNNINGafterstart(). Which config key controls it? - What pre-condition does
submitDAGenforce on the DAG's vertex names with respect to previously-submitted DAGs in the same session? - Trace
addAppMasterLocalFiles(...)end-to-end. Where do those files end up on HDFS? - Why is
tez.lib.urissometimes a directory and sometimes a tarball? What doesTezClientUtils.setupTezJarsLocalResourcesdo for each case?
Common bugs and symptoms
| Symptom | Root cause | Fix |
|---|---|---|
AM never reaches RUNNING; client hangs in start() | tez.lib.uris points to a path the NodeManager can't read | Verify HDFS perms; check NM logs |
submitDAG throws SessionNotRunning | AM died (idle timeout, crash) | Catch, recreate TezClient, resubmit |
submitDAG blocks forever | Previous DAG still in flight in the session | Don't reuse session for parallel DAGs; or wait |
IOException: Failed to submit application | RM rejected (queue full, ACL) | Inspect RM logs; verify queue config |
| AM starts but cannot talk back to client | Client behind NAT; AM cannot reach client's RPC server | Use polling-only DAGClient; avoid callbacks |
Tasks fail with ClassNotFoundException for user code | addTaskLocalFiles not called for that jar | Add jars via both addAppMasterLocalFiles and addTaskLocalFiles if used in tasks |
Validation: prove you understand this
- Write a 30-line Java driver that creates a
TezClientin session mode, submits two DAGs back-to-back, prints bothDAGClient.getDAGStatus()results, and shuts down cleanly. - From
TezClient.java, list every method that ultimately reachesdagAMProtocol. - Explain why
addAppMasterLocalFilesis aMap<String, LocalResource>and not aList<Path>. - From the proto file
DAGClientAMProtocol.proto, write the exact request message used bysubmitDAG. - Reproduce the "AM idle timeout" path on
MiniTezCluster: submit one DAG, wait past the configured timeout, attempt a second submit, observe the exception class and message.