Local Mode
Tez ships two "no YARN" execution paths:
- Local mode: tez.local.mode=true. The whole AM and all containers run in the calling JVM. No RM, no NM, no networking.
- MiniTezCluster: a real YARN MiniCluster (RM + NMs as threads) with a real Tez AM submitted as a YARN app. Networking goes over loopback.
Both let you test without a cluster, but they are not interchangeable. This chapter explains the wiring and the tradeoffs.
Why a no-YARN path exists
Production Tez requires YARN to allocate containers. For:
- IDE-driven unit tests of vertex managers, edge managers, processors;
- short reproducers in JIRAs;
- in-process pipelines (e.g. running a DAG inline from a Hive Driver in a test);
paying YARN startup cost (30+ seconds) is intolerable. Local mode is the escape hatch.
grep -rn "tez.local.mode\|LOCAL_MODE" \
tez-api/src/main/java tez-dag/src/main/java | head
How tez.local.mode=true rewires the AM
grep -n "TEZ_LOCAL_MODE\|isLocalMode\|getBoolean.*LOCAL" \
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java \
tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
When tez.local.mode=true:
- TezClient.start() does not submit to YARN. Instead it constructs a DAGAppMaster instance directly in the client JVM and starts it as a service.
- TaskSchedulerManager is configured with LocalTaskSchedulerService instead of YarnTaskSchedulerService.
- ContainerLauncherManager uses LocalContainerLauncher instead of ContainerLauncherImpl.
- TaskCommunicatorManager uses TezLocalTaskCommunicatorImpl, which bypasses RPC entirely.
The net effect: the AM, scheduler, container launcher, and TezChild task runners all
live in the same JVM, talking via in-process queues.
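The substitutions listed above amount to a lookup keyed on one boolean. The sketch below is a JDK-only model of that choice, not Tez code: the class LocalModeWiring and its select method are hypothetical, and only the component names in the strings come from this section. The non-local communicator is not named in this chapter, so it appears as a placeholder label.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical model of the wiring choice; not part of Tez. */
public class LocalModeWiring {

  /** Returns manager name -> implementation name, chosen by the tez.local.mode flag. */
  public static Map<String, String> select(Properties conf) {
    // An unset flag is treated as false, so the YARN-backed implementations win.
    boolean local = Boolean.parseBoolean(conf.getProperty("tez.local.mode", "false"));
    Map<String, String> wiring = new LinkedHashMap<>();
    wiring.put("TaskSchedulerManager",
        local ? "LocalTaskSchedulerService" : "YarnTaskSchedulerService");
    wiring.put("ContainerLauncherManager",
        local ? "LocalContainerLauncher" : "ContainerLauncherImpl");
    // Placeholder label for the non-local side: the chapter names only the local class.
    wiring.put("TaskCommunicatorManager",
        local ? "TezLocalTaskCommunicatorImpl" : "RPC-based communicator");
    return wiring;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("tez.local.mode", "true");
    select(conf).forEach((manager, impl) -> System.out.println(manager + " -> " + impl));
  }
}
```

Reading the real branches in DAGAppMaster with the grep above is still the authoritative check; the model only shows the shape of the decision.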
LocalTaskSchedulerService
find tez-dag/src/main/java -name "LocalTaskSchedulerService.java"
wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java")
Mirrors YarnTaskSchedulerService but the "resource pool" is a thread pool.
| Concept | Yarn version | Local version |
|---|---|---|
| Resource pool | YARN cluster | ExecutorService of bounded thread count |
| allocateTask | AMRMClient.addContainerRequest | enqueue to local queue, immediately synthesize fake Container |
| releaseAssignedContainer | RM release | return thread to pool |
| Locality | NODE_LOCAL/RACK_LOCAL | always ANY (single "node") |
| Priority | YARN priority class | honored as a queue-ordering hint |
Configuration:
grep -n "TEZ_AM_INLINE_TASK_EXECUTION_MAX_TASKS\|tez.am.inline" \
tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
tez.am.inline.task.execution.max-tasks (default 1) controls thread-pool
size in local mode. Bumping this exposes concurrency bugs that production
container parallelism would also expose.
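To make the "thread pool as resource pool" idea concrete, here is a JDK-only sketch of a bounded pool whose queue is ordered by priority. It is a model under stated assumptions, not the LocalTaskSchedulerService implementation: the names BoundedLocalPool, Request, and runDemo are hypothetical, and the sketch assumes that a lower priority value runs first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a thread pool used as the "resource pool"; not Tez code. */
public class BoundedLocalPool {

  /** A task request; priority orders the queue (assumption: lower value runs first). */
  static final class Request implements Runnable, Comparable<Request> {
    final int priority;
    final Runnable body;
    Request(int priority, Runnable body) { this.priority = priority; this.body = body; }
    @Override public void run() { body.run(); }
    @Override public int compareTo(Request o) { return Integer.compare(priority, o.priority); }
  }

  /** Queues requests with priorities 3, 1, 2 behind busy workers; returns the run order. */
  public static List<Integer> runDemo(int maxTasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxTasks, maxTasks, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<>());
    List<Integer> order = Collections.synchronizedList(new ArrayList<>());
    CountDownLatch gate = new CountDownLatch(1);
    // Occupy every worker so the later requests must wait in the priority queue.
    for (int i = 0; i < maxTasks; i++) {
      pool.execute(new Request(0, () -> {
        try { gate.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      }));
    }
    for (int p : new int[] {3, 1, 2}) {
      pool.execute(new Request(p, () -> order.add(p)));
    }
    gate.countDown();   // free the workers; they now drain the queue
    pool.shutdown();    // queued requests still run after shutdown()
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(order);
  }

  public static void main(String[] args) {
    System.out.println(runDemo(1)); // prints [1, 2, 3]
  }
}
```

With one worker the order is fully determined by priority. With runDemo(2) or more, several requests run at once and the recorded order can vary between runs, which is the kind of nondeterminism that raising the max-tasks setting introduces into a test.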
LocalContainerLauncher
find tez-dag/src/main/java -name "LocalContainerLauncher.java"
When the AM "launches" a local container, the launcher allocates a
LocalContainer worker that runs TezChild logic in the same process:
- No new JVM.
- No serialization of the ContainerLaunchContext: the AM hands the TaskSpec directly to the local task runner.
- The umbilical "RPC" is a Java method call on an in-process object.
This means: local mode does not exercise the RPC layer, classpath construction, NM localization, or token plumbing. Bugs in those paths are invisible to local-mode tests.
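A small JDK-only illustration of why an in-process handoff hides wire-level bugs. Everything here is hypothetical (HandoffDemo and Spec are not Tez types), and plain Java serialization stands in for whatever encoding the real RPC layer uses; the point is only that a payload which cannot be encoded passes untouched when it is handed over by reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical illustration; none of these types exist in Tez. */
public class HandoffDemo {

  /** A payload that claims to be Serializable but holds a field that is not. */
  static final class Spec implements Serializable {
    private static final long serialVersionUID = 1L;
    final String vertexName = "v1";
    final Object handle = new Object(); // java.lang.Object is not Serializable
  }

  /** In-process handoff: the receiver gets the very same object reference. */
  static boolean inProcessHandoff(Spec spec) {
    Spec received = spec; // a plain method call, nothing is encoded
    return received == spec;
  }

  /** Wire-style handoff: the payload must survive encoding to bytes first. */
  static boolean serializedHandoff(Spec spec) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(spec);
      return true;
    } catch (NotSerializableException e) {
      return false; // the defect surfaces only when bytes are actually produced
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Spec spec = new Spec();
    System.out.println("in-process ok: " + inProcessHandoff(spec));  // prints in-process ok: true
    System.out.println("serialized ok: " + serializedHandoff(spec)); // prints serialized ok: false
  }
}
```

The same object succeeds on the first path and fails on the second, which mirrors a test that is green in local mode and red on a cluster.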
What local mode does not exercise
| Layer | Exercised in local mode? |
|---|---|
| YARN RM scheduling | ✗ |
| NodeManager container launch | ✗ |
| Resource localization (HDFS download) | ✗ |
| AMRMToken / ClientToAMToken | ✗ |
| HDFS shuffle path (uses local FS only) | ✗ |
| ShuffleHandler aux service | ✗ |
| RPC serialization | ✗ |
| JVM cold start / classloader isolation | ✗ |
What it does exercise: the DAG state machine, VertexManagers, EdgeManagers, sort/merge code, processors, and the umbilical event flow.
MiniTezCluster
find tez-tests/src/test/java -name "MiniTezCluster.java"
wc -l $(find tez-tests/src/test/java -name "MiniTezCluster.java")
A real cluster compressed onto one host. It extends Hadoop's
MiniYARNCluster:
- One RM thread.
- N NM threads (configurable).
- A Tez AM submitted as a normal YARN application.
- TezChild runs in separate JVMs spawned by the NM ContainerExecutor.
- HDFS is MiniDFSCluster (a few NameNode + DataNode threads in the same JVM) or a RawLocalFileSystem.
grep -n "MiniYARNCluster\|MiniDFSCluster\|appJar\|deploy" \
$(find tez-tests/src/test/java -name "MiniTezCluster.java") | head
Setup pattern
grep -rn "MiniTezCluster\b" tez-tests/src/test/java | head -10
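// Constructor arguments follow MiniYARNCluster: test name, NodeManager count,
// local-dir count, log-dir count. Verify against the constructor in your Tez version.
MiniTezCluster cluster = new MiniTezCluster("test", numNMs, numLocalDirs, numLogDirs);
cluster.init(conf);    // conf: the Hadoop Configuration prepared by the test
cluster.start();
// Build the client config from the running cluster so it carries the cluster's addresses.
TezConfiguration tezConf = new TezConfiguration(cluster.getConfig());
TezClient client = TezClient.create("test", tezConf);
client.start();
client.waitTillReady();    // wait for the AM before submitting
client.submitDAG(myDag);   // myDag: a DAG built elsewhere in the test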
When MiniTezCluster is the right tool
- You are exercising RPC, security, or localization code.
- You hit ShuffleHandler paths or HDFS-backed recovery (see failure-handling.md).
- You're reproducing a bug that involves real container lifecycle (kill -9 vs orderly shutdown); MiniCluster can forkProcess and SIGKILL.
- You need realistic counters and ATS event flow.
When MiniTezCluster is the wrong tool
- Pure VertexManager logic — use local mode or mock dispatcher.
- Pure IFile / sort behavior — use a unit test on the runtime-library classes directly.
- Anything where 30–60 s startup + heavy memory cost (~1 GB minimum) is intolerable.
Side-by-side comparison
| Aspect | Local mode | MiniTezCluster |
|---|---|---|
| Startup | < 1 s | 30–60 s |
| Memory | ~256 MB | 1 GB+ |
| YARN exercised | no | yes (in-process) |
| RPC exercised | no | yes (loopback) |
| Tokens exercised | no | yes (simple, unkerberized by default) |
| Separate JVMs for tasks | no | yes |
| HDFS | RawLocal | MiniDFS or RawLocal |
| Shuffle path | no ShuffleHandler | full ShuffleHandler |
| Use case | unit / integration of AM logic | end-to-end integration tests |
| Example test class | TestLocalMode | TestOrderedWordCount |
find tez-tests/src/test/java -name "TestLocalMode.java" \
-o -name "TestOrderedWordCount.java"
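The comparison can be condensed into a rule of thumb: if a test needs any layer that only MiniTezCluster exercises, use MiniTezCluster; otherwise prefer local mode for its startup time. The helper below is a hypothetical encoding of that rule (ModeChooser, Need, and Mode are illustrative names, not Tez types), with the layer list taken from this chapter.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper encoding the comparison above; not part of Tez. */
public class ModeChooser {

  enum Need { RPC, TOKENS, SEPARATE_TASK_JVMS, SHUFFLE_HANDLER, LOCALIZATION, AM_LOGIC, FAST_STARTUP }

  enum Mode { LOCAL_MODE, MINI_TEZ_CLUSTER }

  /** Layers that, per this chapter, local mode does not exercise. */
  private static final Set<Need> CLUSTER_ONLY = EnumSet.of(
      Need.RPC, Need.TOKENS, Need.SEPARATE_TASK_JVMS, Need.SHUFFLE_HANDLER, Need.LOCALIZATION);

  /** Picks MiniTezCluster as soon as one cluster-only layer is required. */
  static Mode choose(Set<Need> needs) {
    for (Need need : needs) {
      if (CLUSTER_ONLY.contains(need)) {
        return Mode.MINI_TEZ_CLUSTER;
      }
    }
    return Mode.LOCAL_MODE;
  }

  public static void main(String[] args) {
    System.out.println(choose(EnumSet.of(Need.AM_LOGIC, Need.FAST_STARTUP)));    // prints LOCAL_MODE
    System.out.println(choose(EnumSet.of(Need.AM_LOGIC, Need.SHUFFLE_HANDLER))); // prints MINI_TEZ_CLUSTER
  }
}
```

A test that asks for both FAST_STARTUP and a cluster-only layer still resolves to MiniTezCluster, since coverage cannot be traded for speed in that case.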
Worked example: switching between modes in one test
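import java.util.Arrays;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.test.MiniTezCluster;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;
// Class name is illustrative; every test method runs once per mode.
@RunWith(Parameterized.class)
public class TestBothModes {
  private final String mode;
  private TezConfiguration conf;
  private MiniTezCluster miniCluster;
  private TezClient client;
  public TestBothModes(String mode) { this.mode = mode; }
  @Parameters
  public static Iterable<Object[]> modes() {
    return Arrays.asList(new Object[][] {{"local"}, {"mini"}});
  }
  @Before
  public void setUp() throws Exception {
    conf = new TezConfiguration();
    if ("local".equals(mode)) {
      // Local mode: local filesystem, AM and tasks inside this JVM.
      conf.set("fs.defaultFS", "file:///");
      conf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
    } else {
      miniCluster = new MiniTezCluster("test", 1, 1, 1);
      miniCluster.init(conf);
      miniCluster.start();
      // Re-derive the config so it carries the MiniCluster's addresses.
      conf = new TezConfiguration(miniCluster.getConfig());
    }
    client = TezClient.create("t", conf);
    client.start();
    client.waitTillReady();
  }
}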
Several Tez tests use this pattern when a feature must work in both modes.
Reading exercise
- wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java" -o -name "YarnTaskSchedulerService.java"): confirm the local version is much smaller.
- grep -n "TEZ_LOCAL_MODE\|isLocal" $(find tez-dag/src/main/java -name "DAGAppMaster.java"): find every branch that depends on local mode.
- cat $(find tez-dag/src/main/java -name "LocalContainerLauncher.java") | head -160: how does it run TezChild without a fork?
- find tez-tests/src/test/java -name "MiniTezCluster.java" -exec grep -n "ShuffleHandler\|aux-services" {} \;: verify MiniTezCluster wires the YARN aux service.
- grep -rn "TEZ_LOCAL_MODE" tez-api tez-dag tez-runtime-internals | head: list every config read site.
- find tez-tests/src/test/java -name "TestLocal*" -o -name "TestMRR*": read one local-mode and one MiniCluster test, side by side.
Common bugs and symptoms
| Symptom | Likely cause |
|---|---|
| Test passes in local mode, fails on cluster | Local mode skipped RPC/localization/tokens. Add a MiniCluster variant. |
| MiniCluster test times out at waitTillReady | RM never registered the AM. Check tez-site.xml is on the AM classpath in the MiniCluster config. |
| Local-mode race conditions only visible with tez.am.inline.task.execution.max-tasks > 1 | Single-threaded local mode hides ordering bugs in VertexManager and dispatchers. |
| ClassNotFoundException for custom processor in MiniCluster | Container localization needs the JAR; either put it on the launch classpath or register via LocalResources. |
| MiniCluster blows the heap | Even 1 NM + MiniDFS already needs about 1 GB; raise the test JVM heap, and if you run several NMs, reduce the count to 1. |
| Hive integration test wedges only in MiniCluster | Hive needs full Hadoop config; check hadoop.security.authentication=simple in test conf. |
Validation: prove you understand this
- List four layers that local mode does not exercise. For each, name a bug class it can hide.
- In local mode, where does the "RPC" between TezChild and the AM actually happen? Cite the file path.
- Why is tez.am.inline.task.execution.max-tasks=1 the default in local mode? What test reliability tradeoff does it enforce?
- Given a reproducer for a bug in ShuffleHandler aux-service interaction, explain why a TestLocalMode-style test cannot reproduce it, and what the minimum MiniCluster setup is.
- Show the minimum TezConfiguration setup for local mode in code. Three lines max.