Lab 9.1 — Write Tests for Scheduler Behavior
Lab type: Build It — comprehensive test coverage
Estimated time: 3–4 hours
Tez module: tez-dag
Overview
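The Tez task scheduling layer manages how containers are requested from YARN and how pending tasks are assigned to available containers. TaskSchedulerEventHandler (renamed TaskSchedulerManager in later Tez releases) receives scheduling events and forwards them to a pluggable scheduler implementation such as YarnTaskSchedulerService. YARN's CapacityScheduler and FairScheduler are not part of Tez: they run inside the ResourceManager and decide which queue receives resources, while the Tez scheduler decides which task runs in a container once YARN has granted it.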
Scheduler logic is concurrent and reacts to asynchronous responses from YARN, so its edge cases are easy to miss and hard to reproduce. Well-targeted scheduler tests are therefore a valuable contribution.
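To make the two responsibilities concrete, here is a minimal sketch, assuming a toy single-queue scheduler. `ToyScheduler` and its methods are hypothetical and not part of the Tez API. It records each container request sent to a fake ResourceManager and hands each granted container to the oldest pending task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the Tez API: it only illustrates the two jobs
// described above (requesting containers, assigning pending tasks to them).
class ToyScheduler {
    // Every container request sent to the (fake) ResourceManager.
    final List<String> containerRequests = new ArrayList<>();
    // Tasks waiting for a container, in arrival order.
    final Deque<String> pending = new ArrayDeque<>();
    // containerId -> taskId currently running in that container.
    final Map<String, String> running = new HashMap<>();

    // A new task arrives: queue it and ask the RM for one container.
    void taskArrived(String taskId) {
        pending.addLast(taskId);
        containerRequests.add("request-for-" + taskId);
    }

    // The RM grants a container: give it to the oldest pending task.
    // Returns the assigned task, or null if nothing was waiting
    // (a real scheduler would then hold or release the container).
    String containerAllocated(String containerId) {
        String task = pending.pollFirst();
        if (task != null) {
            running.put(containerId, task);
        }
        return task;
    }
}
```

The real scheduler adds locality, priorities, container reuse, and asynchronous callbacks; the toy only fixes the vocabulary used in the rest of the lab.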
Step 1 — Understand the Scheduler Interface
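```bash
find ~/tez-src -name "TaskScheduler.java" | head -3
find ~/tez-src -name "TaskSchedulerEventHandler.java" | head -3
# Later Tez releases name this class TaskSchedulerManager
find ~/tez-src -name "TaskSchedulerManager.java" | head -3
find ~/tez-src -name "TestTaskScheduler*.java" | head -10
```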
Open the scheduler interface and the event handler, then answer:
| # | Question |
|---|---|
| 1 | What events does TaskSchedulerEventHandler process? List all event types. |
| 2 | When a container becomes available, what is the algorithm for choosing which task to assign to it? |
| 3 | When Tez requests a container from YARN, what resource profile does it request? (CPU + memory?) |
| 4 | If YARN preempts a container, what does the scheduler do to the task that was running in it? |
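As a reference point for Question 2, the following hypothetical sketch shows one common way to match a free container to a pending task: prefer a task whose data is on the container's node, then one on the same rack, then any task. `LocalityMatcher` is an illustration, not Tez's code; real schedulers may also wait a configurable delay before relaxing locality, so compare this against what you find in the source.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration, NOT Tez's implementation: choose a pending task
// for a newly available container by locality preference.
class LocalityMatcher {
    // A pending task plus the hosts and racks that hold its input data.
    static final class PendingTask {
        final String id;
        final Set<String> hosts;
        final Set<String> racks;

        PendingTask(String id, Set<String> hosts, Set<String> racks) {
            this.id = id;
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    // Returns the chosen task, or null if nothing is pending.
    static PendingTask pick(List<PendingTask> pending, String host, String rack) {
        for (PendingTask t : pending) {
            if (t.hosts.contains(host)) return t;   // 1. node-local
        }
        for (PendingTask t : pending) {
            if (t.racks.contains(rack)) return t;   // 2. rack-local
        }
        // 3. off-switch: fall back to the oldest pending task
        return pending.isEmpty() ? null : pending.get(0);
    }
}
```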
Step 2 — Identify Missing Coverage
```bash
# In later Tez releases the test class is TestTaskSchedulerManager.java
grep -n "public void test" \
  ~/tez-src/tez-dag/src/test/java/org/apache/tez/dag/app/rm/TestTaskSchedulerEventHandler.java \
  | head -30
```
Find 3 scenarios that are NOT covered by existing tests. Good candidates:
- Container allocation after task is cancelled (race condition scenario)
- Scheduling under resource pressure (all containers allocated, new task arrives)
- Task scheduled to a blacklisted node
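The first and third candidate scenarios boil down to invariants a test should check. This hypothetical toy model (`EdgeCaseScheduler` is not a Tez class) assumes the scheduler releases a container back to the RM when its intended task was cancelled and nothing else is waiting, or when the container sits on a blacklisted node. Whether Tez releases, reuses, or holds such containers is exactly what your test should confirm against the real code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model (NOT Tez classes) of the expected behavior in two
// candidate scenarios: a container that arrives after its task was cancelled,
// and a container that lands on a blacklisted node.
class EdgeCaseScheduler {
    final Deque<String> pending = new ArrayDeque<>();
    final Set<String> blacklistedNodes = new HashSet<>();
    // Containers handed back to the (fake) RM, recorded so a test can assert on them.
    final List<String> released = new ArrayList<>();

    void submit(String taskId) { pending.addLast(taskId); }

    // Cancelling removes the task from the queue, but the RM may still grant
    // the container that was requested for it (the race).
    void cancel(String taskId) { pending.remove(taskId); }

    // Returns the task assigned to the container, or null if it was released.
    String containerAllocated(String containerId, String node) {
        if (blacklistedNodes.contains(node) || pending.isEmpty()) {
            // This toy model never runs work on a bad node and never keeps an idle container.
            released.add(containerId);
            return null;
        }
        return pending.pollFirst();
    }
}
```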
Step 3 — Write 3 New Tests
For each missing scenario, write a test following the pattern of the existing tests. Each test must:
- Set up the scheduler with mocks standing in for the YARN RM client and the DAGAppMaster (reuse whatever the existing tests mock)
- Drive a sequence of events
- Assert on the scheduler's resulting state and on the calls made to the mock YARN RM
```java
@Test(timeout = 5000)
public void testTaskScheduledAfterContainerPreempted() {
    // TODO: set up scheduler with 1 running container
    // TODO: simulate YARN preemption of that container
    // TODO: verify the task is re-queued (not dropped)
    // TODO: simulate new container allocation
    // TODO: verify the task is re-scheduled to the new container
}
```
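One way to see the shape of the finished test is to fill in the skeleton's TODOs against a hypothetical toy scheduler. `RecordingRm`, `PreemptAwareScheduler`, and `PreemptionScenario` are illustrative names, not Tez classes, and a hand-written recording fake stands in for the Mockito mocks used in the existing Tez tests. The assumed semantics (a preempted task goes back to the front of the queue and a new container is requested for it) are a design choice of this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the YARN RM client: records calls so the test can
// assert on them, the way verify(...) would on a Mockito mock.
class RecordingRm {
    final List<String> calls = new ArrayList<>();
    void requestContainer(String taskId) { calls.add("request:" + taskId); }
}

// Hypothetical toy scheduler (NOT the Tez API) with just enough behavior
// to exercise the preemption scenario.
class PreemptAwareScheduler {
    private final RecordingRm rm;
    final Deque<String> pending = new ArrayDeque<>();
    final Map<String, String> running = new HashMap<>(); // containerId -> taskId

    PreemptAwareScheduler(RecordingRm rm) { this.rm = rm; }

    void submit(String taskId) {
        pending.addLast(taskId);
        rm.requestContainer(taskId);
    }

    void containerAllocated(String containerId) {
        String task = pending.pollFirst();
        if (task != null) running.put(containerId, task);
    }

    // On preemption the task is re-queued at the front and re-requested,
    // never silently dropped.
    void containerPreempted(String containerId) {
        String task = running.remove(containerId);
        if (task != null) {
            pending.addFirst(task);
            rm.requestContainer(task);
        }
    }
}

class PreemptionScenario {
    // Mirrors the TODOs in testTaskScheduledAfterContainerPreempted.
    static void run() {
        RecordingRm rm = new RecordingRm();
        PreemptAwareScheduler s = new PreemptAwareScheduler(rm);

        // Set up the scheduler with 1 running container.
        s.submit("t1");
        s.containerAllocated("c1");
        check("t1".equals(s.running.get("c1")), "t1 should run in c1");

        // Simulate preemption; the task must be re-queued and re-requested.
        s.containerPreempted("c1");
        check(s.pending.contains("t1"), "t1 should be pending again");
        check(rm.calls.equals(List.of("request:t1", "request:t1")), "t1 should be re-requested");

        // Simulate a new allocation; the task must be re-scheduled to it.
        s.containerAllocated("c2");
        check("t1".equals(s.running.get("c2")), "t1 should run in c2");
        check(s.pending.isEmpty(), "nothing should be left pending");
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```

In the real lab, the same three phases go inside the JUnit `@Test(timeout = 5000)` method, with `verify(...)` on the mocked RM client replacing the `rm.calls` check.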
Step 4 — Run and Verify
```bash
# On later Tez releases use -Dtest=TestTaskSchedulerManager
mvn test -pl tez-dag -Dtest=TestTaskSchedulerEventHandler -q 2>&1 | tail -10
```
Step 5 — Reflection
| # | Question |
|---|---|
| 1 | The test uses mocks for YARN and the DAGAppMaster. What real behavior is NOT exercised by this approach? |
| 2 | A scheduler has inherently concurrent behavior. How do the existing tests handle thread safety? |
| 3 | If you were to write an integration test for the scheduler (using MiniTezCluster), what would be harder to set up than in a unit test? What would be easier to assert? |
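For Reflection question 2, here is one technique for stress-testing concurrent event handling, sketched against a hypothetical `SyncScheduler` (not Tez code). A `CountDownLatch` holds worker threads at a start gate so cancellations and allocations race, and the check is an invariant rather than an exact ordering. The existing Tez tests may take a different approach, such as draining events on the test thread, so compare before adopting this one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy scheduler (NOT Tez code): every method is synchronized so
// allocations and cancellations from different threads cannot interleave.
class SyncScheduler {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<String, String> assigned = new HashMap<>(); // container -> task
    private final Set<String> cancelled = new HashSet<>();

    synchronized void submit(String task) { pending.addLast(task); }

    synchronized void cancel(String task) {
        if (pending.remove(task)) cancelled.add(task);
    }

    synchronized void containerAllocated(String container) {
        String task = pending.pollFirst();
        if (task != null) assigned.put(container, task);
    }

    // Each task is assigned at most once, never both assigned and cancelled,
    // and every submitted task is accounted for exactly once.
    synchronized boolean invariantHolds(int submitted) {
        Set<String> assignedTasks = new HashSet<>(assigned.values());
        if (assignedTasks.size() != assigned.size()) return false;
        for (String t : cancelled) if (assignedTasks.contains(t)) return false;
        return assignedTasks.size() + cancelled.size() + pending.size() == submitted;
    }
}

class ConcurrentStress {
    static boolean run(int tasks) {
        SyncScheduler s = new SyncScheduler();
        for (int i = 0; i < tasks; i++) s.submit("t" + i);
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            final int n = i;
            pool.submit(() -> { start.await(); s.cancel("t" + n); return null; });
            pool.submit(() -> { start.await(); s.containerAllocated("c" + n); return null; });
        }
        start.countDown(); // open the gate; queued work then races on the pool threads
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.invariantHolds(tasks);
    }
}
```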