Local Mode

Tez ships two "no YARN" execution paths:

  • Local mode (tez.local.mode=true): the whole AM and all containers run in the calling JVM. No RM, no NM, no networking.
  • MiniTezCluster: a real YARN MiniCluster (RM + NMs as threads) with a real Tez AM submitted as a YARN app. Networking goes over loopback.

Both let you test without a cluster, but they are not interchangeable. This chapter explains the wiring and the tradeoffs.


Why a no-YARN path exists

Production Tez requires YARN to allocate containers. But for:

  • IDE-driven unit tests of vertex managers, edge managers, processors;
  • short reproducers in JIRAs;
  • in-process pipelines (e.g. running a DAG inline from a Hive Driver in a test);

paying the YARN startup cost (30+ seconds) on every run is intolerable. Local mode is the escape hatch.

grep -rn "tez.local.mode\|LOCAL_MODE" \
  tez-api/src/main/java tez-dag/src/main/java | head

How tez.local.mode=true rewires the AM

grep -n "TEZ_LOCAL_MODE\|isLocalMode\|getBoolean.*LOCAL" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java \
  tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java

When tez.local.mode=true:

  1. TezClient.start() does not submit to YARN. Instead it constructs a DAGAppMaster instance directly in the client JVM and starts it as a service.
  2. TaskSchedulerManager is configured with LocalTaskSchedulerService instead of YarnTaskSchedulerService.
  3. ContainerLauncherManager uses LocalContainerLauncher instead of TezContainerLauncherImpl.
  4. TaskCommunicatorManager uses TezLocalTaskCommunicatorImpl which bypasses RPC entirely.

The net effect: the AM, scheduler, container launcher, and all TezChild instances live in the same JVM, talking via in-process queues.


LocalTaskSchedulerService

find tez-dag/src/main/java -name "LocalTaskSchedulerService.java"
wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java")

Mirrors YarnTaskSchedulerService but the "resource pool" is a thread pool.

| Concept | Yarn version | Local version |
|---|---|---|
| Resource pool | YARN cluster | ExecutorService of bounded thread count |
| allocateTask | AMRMClient.addContainerRequest | enqueue to local queue, immediately synthesize fake Container |
| releaseAssignedContainer | RM release | return thread to pool |
| Locality | NODE_LOCAL/RACK_LOCAL | always ANY (single "node") |
| Priority | YARN priority class | honored as a queue-ordering hint |
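
The allocateTask and Priority rows can be pictured with a toy model. This is a sketch under stated assumptions, not Tez source: the class name LocalSchedulerSketch, the container-id format, and the rule that a lower priority value is served first are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a local task scheduler. Hypothetical; not Tez code. */
public class LocalSchedulerSketch {
  /** A pending request. Assumption: lower priority value is served first. */
  static class Request {
    final String task;
    final int priority;
    Request(String task, int priority) {
      this.task = task;
      this.priority = priority;
    }
  }

  private final PriorityBlockingQueue<Request> pending =
      new PriorityBlockingQueue<Request>(11, (a, b) -> Integer.compare(a.priority, b.priority));
  private final AtomicInteger nextContainerId = new AtomicInteger(1);

  /** Enqueue locally; nothing is sent to a ResourceManager. */
  void allocateTask(String task, int priority) {
    pending.add(new Request(task, priority));
  }

  /** Drain in priority order, pairing each task with a synthesized container id. */
  List<String> assignAll() {
    List<String> assigned = new ArrayList<>();
    Request r;
    while ((r = pending.poll()) != null) {
      assigned.add(r.task + "@container_local_" + nextContainerId.getAndIncrement());
    }
    return assigned;
  }

  /** Submits three requests out of order and returns the assignment order. */
  public static List<String> demo() {
    LocalSchedulerSketch scheduler = new LocalSchedulerSketch();
    scheduler.allocateTask("reducer", 5);
    scheduler.allocateTask("mapper", 1);
    scheduler.allocateTask("merger", 3);
    return scheduler.assignAll();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The point of the model is that priority only reorders a local queue and the "container" is just an id handed out on the spot. No locality matching happens, because there is only one "node".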

Configuration:

grep -n "TEZ_AM_INLINE_TASK_EXECUTION_MAX_TASKS\|tez.am.inline" \
  tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java

tez.am.inline.task.execution.max-tasks (default 1) controls thread-pool size in local mode. Bumping this exposes concurrency bugs that production container parallelism would also expose.
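
To see why the pool size matters for test coverage, here is a minimal stdlib-only sketch. It is not Tez code: the class InlinePoolSketch and its method are hypothetical, and the fixed thread pool merely stands in for the local-mode task pool. It measures how many tasks are ever running at the same time for a given pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for a bounded local task pool. Hypothetical; not Tez code. */
public class InlinePoolSketch {
  /** Runs `tasks` short jobs on `poolSize` threads; returns the peak number running at once. */
  public static int peakConcurrency(int poolSize, int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    for (int i = 0; i < tasks; i++) {
      pool.submit(() -> {
        // Record how many tasks are in flight right now.
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
          Thread.sleep(20); // simulated task work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          running.decrementAndGet();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("pool=1 peak=" + peakConcurrency(1, 4));
    System.out.println("pool=4 peak=" + peakConcurrency(4, 8));
  }
}
```

With a pool of one thread the peak is always 1, so two tasks never overlap and ordering bugs between them cannot surface. With a larger pool the peak can rise up to the pool size, which is the situation a multi-container cluster produces.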


LocalContainerLauncher

find tez-dag/src/main/java -name "LocalContainerLauncher.java"

When the AM "launches" a local container, the launcher allocates a LocalContainer worker that runs TezChild logic in the same process:

  • No new JVM.
  • No serialization of the ContainerLaunchContext — the AM hands the TaskSpec directly to the local task runner.
  • The umbilical "RPC" is a Java method call on an in-process object.

This means: local mode does not exercise the RPC layer, classpath construction, NM localization, or token plumbing. Bugs in those paths are invisible to local-mode tests.
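
A small stdlib-only sketch of how an in-process hand-off can hide a wire-format bug. Everything here is hypothetical (UmbilicalSketch, ToyTaskSpec, UserPayload), and Java serialization is only a stand-in for whatever wire format the real umbilical uses. The same object passes the in-process path and fails the serialized path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Toy comparison of in-process vs wire hand-off. Hypothetical; not Tez code. */
public class UmbilicalSketch {
  /** A payload type whose author forgot to make it serializable. */
  static class UserPayload {
    final String value = "settings";
  }

  /** Stand-in for a task spec that carries a user payload. */
  static class ToyTaskSpec implements Serializable {
    private static final long serialVersionUID = 1L;
    final String taskName;
    final UserPayload payload;
    ToyTaskSpec(String taskName, UserPayload payload) {
      this.taskName = taskName;
      this.payload = payload;
    }
  }

  interface Umbilical {
    ToyTaskSpec getTask() throws IOException;
  }

  /** In-process path: a plain method call that returns the same object. */
  static Umbilical inProcess(ToyTaskSpec spec) {
    return () -> spec;
  }

  /** Wire path: the spec must be serialized before it can be delivered. */
  static Umbilical overWire(ToyTaskSpec spec) {
    return () -> {
      try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
        out.writeObject(spec);
      }
      return spec; // a real wire path would deserialize a copy on the far side
    };
  }

  /** Returns false when delivery fails because the spec cannot be serialized. */
  static boolean delivers(Umbilical umbilical) {
    try {
      return umbilical.getTask() != null;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static boolean inProcessDelivers() {
    return delivers(inProcess(new ToyTaskSpec("map_0", new UserPayload())));
  }

  public static boolean wireDelivers() {
    return delivers(overWire(new ToyTaskSpec("map_0", new UserPayload())));
  }

  public static void main(String[] args) {
    System.out.println("in-process delivers: " + inProcessDelivers());
    System.out.println("wire delivers: " + wireDelivers());
  }
}
```

A test that only ever takes the in-process path would pass here, which is the kind of false confidence a local-mode-only test suite can produce.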


What local mode does not exercise

| Layer | Skipped in local mode |
|---|---|
| YARN RM scheduling | yes |
| NodeManager container launch | yes |
| Resource localization (HDFS download) | yes |
| AMRMToken / ClientToAMToken | yes |
| HDFS shuffle path (uses local FS only) | yes |
| ShuffleHandler aux service | yes |
| RPC serialization | yes |
| JVM cold start / classloader isolation | yes |

What it does exercise: the DAG state machine, VertexManagers, EdgeManagers, sort/merge code, processors, and the umbilical event flow.


MiniTezCluster

find tez-tests/src/test/java -name "MiniTezCluster.java"
wc -l $(find tez-tests/src/test/java -name "MiniTezCluster.java")

A real cluster compressed onto one host. It extends Hadoop's MiniYARNCluster:

  • One RM thread.
  • N NM threads (configurable).
  • A Tez AM submitted as a normal YARN application.
  • TezChild runs in separate JVMs spawned by NM ContainerExecutor.
  • HDFS is MiniDFSCluster (a few NameNode + DataNode threads in the same JVM) or a RawLocalFileSystem.

grep -n "MiniYARNCluster\|MiniDFSCluster\|appJar\|deploy" \
  $(find tez-tests/src/test/java -name "MiniTezCluster.java") | head

Setup pattern

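grep -rn "MiniTezCluster\b" tez-tests/src/test/java | head -10

// Arguments: test name, NodeManager count, local dirs per NM, log dirs per NM.
MiniTezCluster cluster = new MiniTezCluster("test", numNMs, numLocalDirs, numLogDirs);
cluster.init(conf);   // conf: a Hadoop Configuration prepared by the test
cluster.start();

// Build the client config from the running cluster so it picks up the RM address.
TezConfiguration tezConf = new TezConfiguration(cluster.getConfig());
TezClient client = TezClient.create("test", tezConf);
client.start();
client.waitTillReady();   // waits for the session AM when session mode is enabled
client.submitDAG(myDag);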

When MiniTezCluster is the right tool

  • You are exercising RPC, security, or localization code.
  • You hit ShuffleHandler paths or HDFS-backed recovery (see failure-handling.md).
  • You're reproducing a bug that involves real container lifecycle (kill -9 vs orderly shutdown). Because containers are separate processes, a MiniCluster test can SIGKILL one.
  • You need realistic counters and ATS event flow.

When MiniTezCluster is the wrong tool

  • Pure VertexManager logic — use local mode or mock dispatcher.
  • Pure IFile / sort behavior — use a unit test on the runtime-library classes directly.
  • Anything where 30–60 s startup + heavy memory cost (~1 GB minimum) is intolerable.

Side-by-side comparison

| Aspect | Local mode | MiniTezCluster |
|---|---|---|
| Startup | < 1 s | 30–60 s |
| Memory | ~256 MB | 1 GB+ |
| YARN exercised | no | yes (in-process) |
| RPC exercised | no | yes (loopback) |
| Tokens exercised | no | yes (simple, unkerberized by default) |
| Separate JVMs for tasks | no | yes |
| HDFS | RawLocal | MiniDFS or RawLocal |
| Shuffle path | no ShuffleHandler | full ShuffleHandler |
| Use case | unit / integration of AM logic | end-to-end integration tests |
| Example test class | TestLocalMode | TestOrderedWordCount |

find tez-tests/src/test/java -name "TestLocalMode.java" \
                              -o -name "TestOrderedWordCount.java"

Worked example: switching between modes in one test

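import java.util.Arrays;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.test.MiniTezCluster;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestBothModes {   // illustrative class name
  private final String mode;
  private TezConfiguration conf;
  private MiniTezCluster miniCluster;
  private TezClient client;

  public TestBothModes(String mode) { this.mode = mode; }

  @Parameters
  public static Iterable<Object[]> modes() {
    return Arrays.asList(new Object[][] {{"local"}, {"mini"}});
  }

  @Before
  public void setUp() throws Exception {
    conf = new TezConfiguration();
    if ("local".equals(mode)) {
      // Local mode: everything in this JVM, on the local filesystem.
      conf.set("fs.defaultFS", "file:///");
      conf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
    } else {
      // MiniCluster mode: start the cluster, then adopt its config.
      miniCluster = new MiniTezCluster("test", 1, 1, 1);
      miniCluster.init(conf);
      miniCluster.start();
      conf = new TezConfiguration(miniCluster.getConfig());
    }
    client = TezClient.create("t", conf);
    client.start();
    client.waitTillReady();
  }
}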

This is the pattern in several Tez tests where a feature must work in both universes.


Reading exercise

  1. Run wc -l $(find tez-dag/src/main/java -name "LocalTaskSchedulerService.java" -o -name "YarnTaskSchedulerService.java") and confirm the local version is much smaller.
  2. Run grep -n "TEZ_LOCAL_MODE\|isLocal" $(find tez-dag/src/main/java -name "DAGAppMaster.java") to find every branch that depends on local mode. The key is normally read through the TezConfiguration constant, so a grep for the literal "tez.local.mode" is likely to miss these sites.
  3. Run head -160 $(find tez-dag/src/main/java -name "LocalContainerLauncher.java"). How does it run TezChild without a fork?
  4. Run find tez-tests/src/test/java -name "MiniTezCluster.java" -exec grep -n "ShuffleHandler\|aux-services" {} \; to verify that MiniTezCluster wires the YARN aux service.
  5. Run grep -rn "TEZ_LOCAL_MODE" tez-api tez-dag tez-runtime-internals | head to list every config read site.
  6. Run find tez-tests/src/test/java -name "TestLocal*" -o -name "TestMRR*", then read one local-mode and one MiniCluster test side by side.

Common bugs and symptoms

| Symptom | Likely cause |
|---|---|
| Test passes in local mode, fails on cluster | Local mode skipped RPC/localization/tokens. Add a MiniCluster variant. |
| MiniCluster test times out at waitTillReady | RM never registered the AM. Check tez-site.xml is on the AM classpath in the MiniCluster config. |
| Local-mode race conditions only visible with inline.task.execution.max-tasks > 1 | Single-threaded local mode hides ordering bugs in VertexManager and dispatchers. |
| ClassNotFoundException for custom processor in MiniCluster | Container localization needs the JAR; either put it on the launch classpath or register via LocalResources. |
| MiniCluster blows the heap | Default 1 NM + MiniDFS already 1 GB; bump JVM heap or reduce NM count to 1. |
| Hive integration test wedges only in MiniCluster | Hive needs full Hadoop config; check hadoop.security.authentication=simple in test conf. |

Validation: prove you understand this

  1. List four layers that local mode does not exercise. For each, name a bug class it can hide.
  2. In local mode, where does the "RPC" between TezChild and the AM actually happen? Cite the file path.
  3. Why is tez.am.inline.task.execution.max-tasks=1 the default in local mode? What test reliability tradeoff does it enforce?
  4. Given a reproducer for a bug in ShuffleHandler aux-service interaction, explain why a TestLocalMode-style test cannot reproduce it, and what the minimum MiniCluster setup is.
  5. Show the minimum TezConfiguration setup for local mode in code. Three lines max.