TaskImpl Lifecycle
TaskImpl is the AM-side representation of one logical task within a vertex.
It is a relatively small state machine, but it owns a critical piece of
policy: which attempt of this task is the "winner." This chapter walks
the states, the attempt-management rules, speculation, and the max-failed
threshold at which a task's failure escalates to "this whole vertex must fail."
After this chapter you should be able to explain why a task with three failed
attempts may still be RUNNING while another with one failed attempt is
already FAILED.
File:
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
Tests:
tez-dag/src/test/java/org/apache/tez/dag/app/dag/impl/TestTaskImpl.java
The states
grep -n "TaskState\." tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
grep -n "public enum TaskState\|enum TaskState" \
tez-api/src/main/java/org/apache/tez/dag/api/event/TaskState.java
| State | Meaning |
|---|---|
| NEW | Constructed; no attempts yet |
| SCHEDULED | First attempt requested from scheduler |
| RUNNING | At least one attempt is RUNNING |
| SUCCEEDED | Terminal: one attempt succeeded; task complete |
| KILLED | Terminal: explicitly killed (vertex termination, user) |
| FAILED | Terminal: max attempts exceeded |
TaskImpl has no INITIALIZING or TERMINATING state; those concerns
belong to the vertex.
State × event matrix
| State | Event | Next state | Action |
|---|---|---|---|
| NEW | T_SCHEDULE | SCHEDULED | create first TaskAttemptImpl, send TA_SCHEDULE |
| SCHEDULED | T_ATTEMPT_LAUNCHED | RUNNING | mark first attempt as running |
| RUNNING | T_ATTEMPT_SUCCEEDED | SUCCEEDED | pick this attempt as the winner; kill others (if speculating) |
| RUNNING | T_ATTEMPT_FAILED | RUNNING (retry) or FAILED (exceeded) | spawn new attempt or terminate |
| RUNNING | T_ATTEMPT_KILLED | RUNNING | no-op unless this was the last attempt |
| RUNNING | T_ADD_SPEC_ATTEMPT | RUNNING | spawn a duplicate attempt |
| RUNNING | T_TERMINATE | KILLED | kill all attempts |
| any | T_RECOVER_* | recovered state | replay events |
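The matrix can be read as a pure function from (state, event) to next state. The following is a minimal sketch under that reading, not the real TaskImpl code: the type names SketchTaskState, SketchTaskEvent, and TaskTransitionSketch are hypothetical, the budgetExhausted flag stands in for the max-failed-attempts check, and the T_RECOVER_* row is left out.

```java
// Hypothetical, simplified mirror of the matrix above; these are not the real Tez types.
enum SketchTaskState { NEW, SCHEDULED, RUNNING, SUCCEEDED, KILLED, FAILED }

enum SketchTaskEvent {
  T_SCHEDULE, T_ATTEMPT_LAUNCHED, T_ATTEMPT_SUCCEEDED,
  T_ATTEMPT_FAILED, T_ATTEMPT_KILLED, T_ADD_SPEC_ATTEMPT, T_TERMINATE
}

final class TaskTransitionSketch {
  private TaskTransitionSketch() {}

  /**
   * Returns the next state for (state, event) as listed in the matrix.
   * budgetExhausted is an assumed input that only matters for T_ATTEMPT_FAILED.
   * Pairs that the matrix does not list are rejected.
   */
  static SketchTaskState next(SketchTaskState state, SketchTaskEvent event,
                              boolean budgetExhausted) {
    switch (state) {
      case NEW:
        if (event == SketchTaskEvent.T_SCHEDULE) {
          return SketchTaskState.SCHEDULED;
        }
        break;
      case SCHEDULED:
        if (event == SketchTaskEvent.T_ATTEMPT_LAUNCHED) {
          return SketchTaskState.RUNNING;
        }
        break;
      case RUNNING:
        switch (event) {
          case T_ATTEMPT_SUCCEEDED:
            return SketchTaskState.SUCCEEDED;
          case T_ATTEMPT_FAILED:
            // Retry keeps the task RUNNING; an exhausted budget fails it.
            return budgetExhausted ? SketchTaskState.FAILED : SketchTaskState.RUNNING;
          case T_ATTEMPT_KILLED:
          case T_ADD_SPEC_ATTEMPT:
            return SketchTaskState.RUNNING;
          case T_TERMINATE:
            return SketchTaskState.KILLED;
          default:
            break;
        }
        break;
      default:
        // SUCCEEDED, KILLED, FAILED are terminal in this sketch.
        break;
    }
    throw new IllegalStateException("No transition for " + event + " in state " + state);
  }
}
```

For instance, `TaskTransitionSketch.next(SketchTaskState.RUNNING, SketchTaskEvent.T_ATTEMPT_FAILED, false)` stays in RUNNING, while the same call with `true` yields FAILED. The real class registers its transitions through addTransition calls, so treat this switch only as a reading aid.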
Count transitions:
grep -c "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
Retry: how max-failed-attempts works
The config:
grep -n "TASK_MAX_FAILED_ATTEMPTS\|tez.am.task.max.failed.attempts" \
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
Default is 4 in most branches; a task becomes FAILED only once the number
of attempts that have failed (not been killed) reaches this limit.
Failed vs killed distinction:
| Outcome | Counts toward max.failed.attempts? |
|---|---|
| TaskAttempt failed (own crash, processor exception) | yes |
| TaskAttempt killed by speculation (lost the race) | no |
| TaskAttempt killed because vertex terminated | no |
| TaskAttempt killed because container preempted | no |
The classification is owned by TaskAttemptTerminationCause (see
task-attempt-lifecycle.md). TaskImpl.handle consults the cause when
deciding whether to retry or fail.
grep -n "TerminationCause\|isFailureCause" \
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
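The counting rule from the table can be sketched as a small budget object. This is an illustration under stated assumptions, not the TaskImpl implementation: AttemptOutcome and FailureBudgetSketch are hypothetical names, the four outcomes simply mirror the table rows (the real TaskAttemptTerminationCause has more members), and the limit is passed in rather than read from the configuration.

```java
// Hypothetical outcome categories taken from the table above.
enum AttemptOutcome {
  FAILED_OWN_ERROR(true),
  KILLED_BY_SPECULATION(false),
  KILLED_VERTEX_TERMINATED(false),
  KILLED_CONTAINER_PREEMPTED(false);

  /** True if this outcome consumes one unit of the failure budget. */
  final boolean countsTowardBudget;

  AttemptOutcome(boolean countsTowardBudget) {
    this.countsTowardBudget = countsTowardBudget;
  }
}

final class FailureBudgetSketch {
  private final int maxFailedAttempts;
  private int failedAttempts;

  FailureBudgetSketch(int maxFailedAttempts) {
    if (maxFailedAttempts < 1) {
      throw new IllegalArgumentException("maxFailedAttempts must be >= 1");
    }
    this.maxFailedAttempts = maxFailedAttempts;
  }

  /** Records one ended attempt; returns true once the task should move to FAILED. */
  boolean recordAndCheckExhausted(AttemptOutcome outcome) {
    if (outcome.countsTowardBudget) {
      failedAttempts++;
    }
    return failedAttempts >= maxFailedAttempts;
  }

  public static void main(String[] args) {
    // Limit 4: three real failures still leave room for a retry.
    FailureBudgetSketch generous = new FailureBudgetSketch(4);
    boolean exhausted = false;
    for (int i = 0; i < 3; i++) {
      exhausted = generous.recordAndCheckExhausted(AttemptOutcome.FAILED_OWN_ERROR);
    }
    System.out.println("limit 4, three failures, exhausted: " + exhausted);

    // Limit 1: the first real failure exhausts the budget.
    FailureBudgetSketch strict = new FailureBudgetSketch(1);
    System.out.println("limit 1, one failure, exhausted: "
        + strict.recordAndCheckExhausted(AttemptOutcome.FAILED_OWN_ERROR));
  }
}
```

Under the assumption that two tasks run with different limits, this shows one way a task with three failed attempts can still be retrying while another with a single failure is already out of budget. Kills never move the counter, whatever the limit.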
Speculation
Speculation runs a second copy of a task before the first finishes, hoping the second wins. Implementation:
tez-dag/src/main/java/org/apache/tez/dag/app/dag/speculate/
LegacySpeculator.java
SimpleSpeculator.java (varies by version)
legacy/RuntimeTaskStatsEstimator.java
The speculator emits T_ADD_SPEC_ATTEMPT events into the dispatcher; the
task spawns an additional attempt. The first attempt to succeed wins; the
others are killed with cause TERMINATED_BY_OWNER (or similar). Killed
attempts do not count toward max.failed.attempts.
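As a rough picture of what "spawns an additional attempt" means for the task's bookkeeping, here is a minimal sketch. SpeculativeTaskSketch and its methods are hypothetical names; the attempt numbering starting at 0 and the guard that rejects speculation when nothing is running are assumptions of this sketch, not confirmed TaskImpl behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: attempt bookkeeping for one task while speculation is enabled.
final class SpeculativeTaskSketch {
  private final List<Integer> runningAttempts = new ArrayList<>();
  private int nextAttemptNumber = 0;

  /** Starts the original attempt and returns its attempt number. */
  int startFirstAttempt() {
    if (nextAttemptNumber != 0) {
      throw new IllegalStateException("first attempt already started");
    }
    return spawn();
  }

  /**
   * Mirrors the handling of T_ADD_SPEC_ATTEMPT: a duplicate attempt of the same
   * task is started alongside the attempts that are already running.
   */
  int addSpeculativeAttempt() {
    if (runningAttempts.isEmpty()) {
      throw new IllegalStateException("no running attempt to speculate against");
    }
    return spawn();
  }

  /** Returns a copy so that callers cannot mutate the internal list. */
  List<Integer> runningAttempts() {
    return new ArrayList<>(runningAttempts);
  }

  private int spawn() {
    int attemptNumber = nextAttemptNumber++;
    runningAttempts.add(attemptNumber);
    return attemptNumber;
  }
}
```

After `startFirstAttempt()` followed by `addSpeculativeAttempt()`, the sketch holds two running attempts of the same task, which is the situation the winner-selection rule has to resolve.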
Enabled by:
grep -n "tez.am.speculation.enabled\|speculation" \
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head
"Best attempt" selection
When multiple attempts of the same task exist, the first to send
TA_DONE (successful completion) wins. The handler:
- Marks that attempt as the canonical one (cached in TaskImpl).
- Iterates remaining attempts, sending each a kill event.
- Transitions task to SUCCEEDED.
grep -n "successfulAttempt\|setWinnerAttempt\|markSuccessful" \
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
Downstream vertices reading from this task's output use the winner's
outputLocationHint for shuffle (see shuffle-sort.md).
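The three handler steps in this section (mark the winner, kill the rest, move to SUCCEEDED) can be sketched as follows. WinnerSelectionSketch is a hypothetical class; ignoring a late success from a losing attempt is an assumption of this sketch, and the real handler should be checked against TaskImpl.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of first-success-wins selection among concurrent attempts.
final class WinnerSelectionSketch {
  private final Set<Integer> liveAttempts = new LinkedHashSet<>();
  private Integer winner; // null until the first success is recorded

  void attemptStarted(int attemptNumber) {
    if (winner != null) {
      throw new IllegalStateException("task already succeeded");
    }
    liveAttempts.add(attemptNumber);
  }

  /**
   * Handles a success report and returns the attempt numbers that should be
   * sent a kill event. A success that arrives after a winner was chosen is
   * ignored (assumption of this sketch) and returns an empty list.
   */
  List<Integer> attemptSucceeded(int attemptNumber) {
    if (winner != null) {
      return new ArrayList<>();
    }
    if (!liveAttempts.remove(attemptNumber)) {
      throw new IllegalArgumentException("unknown attempt " + attemptNumber);
    }
    winner = attemptNumber;                                  // step 1: cache the winner
    List<Integer> toKill = new ArrayList<>(liveAttempts);    // step 2: everyone else is killed
    liveAttempts.clear();
    return toKill;                                           // step 3: task is now SUCCEEDED
  }

  /** Returns the winning attempt number, or null if no attempt has succeeded yet. */
  Integer winner() {
    return winner;
  }

  boolean succeeded() {
    return winner != null;
  }

  public static void main(String[] args) {
    WinnerSelectionSketch task = new WinnerSelectionSketch();
    task.attemptStarted(0);
    task.attemptStarted(1);
    System.out.println("kill after attempt 1 succeeds: " + task.attemptSucceeded(1));
    System.out.println("winner: " + task.winner());
  }
}
```

In this sketch the speculative attempt 1 finishes first, so attempt 0 is the one returned for killing and downstream consumers would read attempt 1's output.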
Reading exercise
# Surface
sed -n '1,120p' tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java
# State machine block
grep -n "addTransition" tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head -40
# Retry logic
grep -n "addAttempt\|nextAttemptNumber\|createAttempt" \
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java | head
# Speculation hook
grep -rn "T_ADD_SPEC_ATTEMPT" tez-dag/src/main/java/org/apache/tez/dag/app/ | head
Answer:
- What is the precise condition for transitioning from RUNNING to FAILED on a T_ATTEMPT_FAILED event? Cite the line.
- Where is a new TaskAttemptImpl constructed? Is it a public method or private to TaskImpl?
- How does TaskImpl know whether a failed attempt should count toward the failure budget?
- In what state can T_ADD_SPEC_ATTEMPT arrive? What does the handler do?
- Why does TaskImpl not own its own scheduling? Who does?
- When a task succeeds with two parallel attempts, which one becomes the downstream input? How is the loser cleaned up?
Common bugs and symptoms
| Symptom | Root cause | Where to look |
|---|---|---|
| Task retries forever and never fails | max.failed.attempts set absurdly high; or all failures classified as "kill" | Check config; verify TerminationCause for each failure |
| Speculation kills the original just after it succeeds (lost work) | Race on markSuccessful and speculative-attempt kill | Ensure the speculator backs off while the task is completing |
| Task SUCCEEDED but a sibling attempt still appears as RUNNING for a long time | Container slow to acknowledge kill | Look at ContainerHeartbeatHandler and TA_KILL_REQUEST |
| Task reported as SUCCEEDED but downstream cannot fetch outputs | Race between TA_DONE and output ready event | Check ordering of outputReady umbilical calls |
| Recovery brings task back as RUNNING even though it had finished | Missing TaskFinishedEvent in recovery log | Investigate RecoveryService flush boundaries |
Validation: prove you understand this
- Draw the TaskImpl state machine from memory, including all six states.
- From TestTaskImpl, identify a test that drives a task to FAILED. Walk the events it sends.
- List the four TaskAttemptTerminationCause categories that do not count toward max.failed.attempts. Cite the enum and the consumer.
- Trace, line by line, what TaskImpl does when T_ATTEMPT_SUCCEEDED arrives for the second of two concurrent attempts.
- Modify TaskImpl to log the winner's attempt number explicitly at the INFO level. Run a MiniTezCluster job and observe.