Step 7: Validation
Your patch compiles. Your new tests pass. That is not enough. Validation is proving that the rest of the build — full module test suites, the static analyzers Tez runs, the legal scanner, the end-to-end examples — is also still green. Reviewers will not run this for you. They will check that you ran it and reject the PR if you didn't.
Budget: 1–2 evenings. Most of it is waiting on mvn test.
The Validation Checklist
In order. Do not skip steps because the previous step passed.
- Full test suite of every module you touched.
- Full clean build of the whole repo.
- Checkstyle.
- SpotBugs.
- Apache RAT (license header check).
- TestOrderedWordCount end-to-end.
- Re-run your original Step 2 reproducer to confirm green.
- Regression sweep of any module that depends on what you changed.
- Performance validation (if perf-relevant).
Capture the output of each into capstone-work/validation/. You'll cite it
in the PR description.
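One wrinkle with the cmd | tee log pattern used throughout this step: the pipeline's exit status is tee's, not Maven's, so a failed check can look green to a script. A minimal sketch of a wrapper that keeps both the log and the real exit code, assuming a POSIX shell. The name run_check and the temporary .status file are illustrative, not part of Tez.

```shell
# Hypothetical helper: run a command, stream its output to the terminal
# and to a log under capstone-work/validation/, and return the command's
# own exit status instead of tee's.
VALIDATION_DIR="${VALIDATION_DIR:-capstone-work/validation}"

run_check() {
  name="$1"; shift
  mkdir -p "$VALIDATION_DIR"
  log="$VALIDATION_DIR/$name.log"
  status_file="$VALIDATION_DIR/$name.status"
  # The braces record the real exit code before tee can mask it.
  { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
  rc="$(cat "$status_file")"
  rm -f "$status_file"
  return "$rc"
}

# Example, from the repo root:
#   run_check 01-tez-dag-test mvn test -pl tez-dag
```

If you use it, the log names stay the same as in the commands below; only the exit status becomes trustworthy.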
1. Full Module Tests
The module you changed:
cd ~/tez-src
# No -q here: quiet mode would hide the [INFO] summary line you need below.
mvn test -pl tez-dag 2>&1 | tee capstone-work/validation/01-tez-dag-test.log
This will take 5–20 minutes depending on the module. tez-dag is the slowest
non-integration module. While it runs, work on the diff cleanup.
When it finishes, scroll to the summary lines. Look for:
[INFO] Tests run: 1342, Failures: 0, Errors: 0, Skipped: 17
If you see Failures > 0, open every failure. Then triage:
- My fix caused it. Go back to Step 5. Reread the test. Either your fix is wrong, or the test is wrong (rare — assume the test is right until proven otherwise).
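- It is a known flaky test. Search the commit history for JIRA IDs that mention it: git log --grep="<TestName>". If there is an open ticket, link it in your PR description ("known flake, see TEZ-XYZ"). If there is not, file one before claiming the green.
- It is also broken on master. Verify by running git stash && mvn test ... ; git stash pop. Use ; before the pop, not &&, or a failing test run leaves your changes stashed. This only tests master if your fix is still uncommitted; if you have already committed it, check out origin/master instead. If it fails on master too, link the JIRA or file one. Do not let your PR be the one to surface a pre-existing failure silently.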
Run for every module you touched. If you touched tez-api, you touched
everything downstream — plan accordingly.
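Scrolling a long log for the summary is error-prone. A minimal sketch of pulling the totals out of a captured log, assuming the log was written without -q and uses the usual "Tests run: N, Failures: N, Errors: N, Skipped: N" format. The function name surefire_totals is hypothetical.

```shell
# Hypothetical helper: print the per-module totals lines from a test log
# and exit non-zero if any of them reports failures or errors.
# Per-class result lines also start with "Tests run:" but carry
# "Time elapsed", so they are skipped.
surefire_totals() {
  awk '
    /Tests run:/ && !/Time elapsed/ {
      print
      line = $0
      gsub(/,/, "", line)
      n = split(line, f, " ")
      for (i = 1; i < n; i++) {
        if (f[i] == "Failures:" || f[i] == "Errors:") bad += f[i + 1]
      }
    }
    END { if (bad > 0) exit 1 }
  ' "$1"
}

# Example:
#   surefire_totals capstone-work/validation/01-tez-dag-test.log
```

A non-zero exit covers both Failures and Errors; treat either one as a reason to start the triage above.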
2. Full Clean Build
The compilation gate. Catches missing imports, accidental Java-version features, downstream API breaks:
mvn clean install -DskipTests -q 2>&1 \
| tee capstone-work/validation/02-clean-install.log
Expect a clean BUILD SUCCESS. Because -q suppresses [INFO] output, success shows up as a log with no [ERROR] lines. Common failures:
- Missing import. Your IDE auto-imported something not on the classpath of a downstream module.
- API break. You changed a public method signature in tez-api and a downstream caller broke. Either revert the signature change or update the caller.
- Java version. You used var or text blocks. Tez compiles to a JDK baseline (check pom.xml for <maven.compiler.target>). Use compatible syntax.
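To look up the baseline without opening the pom, something like the following may do. It is a sketch that assumes the property is written on a single line; the function name compiler_target is hypothetical, and it prints nothing if the target is configured some other way.

```shell
# Hypothetical helper: print the value of <maven.compiler.target> from a pom.
# Assumes opening tag, value, and closing tag share one line.
compiler_target() {
  sed -n 's:.*<maven\.compiler\.target>\(.*\)</maven\.compiler\.target>.*:\1:p' "$1"
}

# Example, from the repo root:
#   compiler_target pom.xml
```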
3. Checkstyle
Tez uses checkstyle aggressively. Run:
mvn checkstyle:check -q 2>&1 \
| tee capstone-work/validation/03-checkstyle.log
Or, per module:
mvn checkstyle:check -pl tez-dag
Common violations and fixes:
| Violation | Fix |
|---|---|
| Line longer than 120 chars | Break the line. Indent continuation 4 spaces. |
| Wildcard import | Replace with explicit imports. |
| Missing javadoc on public method | Add /** ... */ block. |
| Trailing whitespace | Configure your editor to strip it on save. |
| Tab character | Convert to 2 spaces (Tez uses 2-space indent in most modules). |
| Method ordering | Public before private; static before instance. |
The checkstyle config lives at tez-build-tools/src/main/resources/tez/checkstyle/checkstyle.xml
— read it to understand the rules.
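Three of the violations in the table are purely mechanical and can be caught before Maven starts. A rough pre-scan, assuming the 120-character limit from the table; checkstyle remains the authority, and the function name style_prescan is hypothetical.

```shell
# Hypothetical pre-scan: report long lines, tab characters, and trailing
# whitespace in the given files. Exits non-zero if anything was reported.
style_prescan() {
  awk '
    length($0) > 120 { printf "%s:%d: line longer than 120 chars\n", FILENAME, FNR; bad = 1 }
    /\t/             { printf "%s:%d: tab character\n", FILENAME, FNR; bad = 1 }
    /[ \t]+$/        { printf "%s:%d: trailing whitespace\n", FILENAME, FNR; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}

# Example: scan only the Java files your branch changed.
#   style_prescan $(git diff --name-only origin/master -- '*.java')
```

It knows nothing about imports, javadoc, or method ordering, so a clean pre-scan does not replace the real check.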
4. SpotBugs
Static analysis for null-deref, unchecked cast, dead-store, etc.:
mvn spotbugs:check -q 2>&1 \
| tee capstone-work/validation/04-spotbugs.log
If it fails, view the report:
mvn spotbugs:gui -pl tez-dag
Common warnings worth fixing:
- NP_NULL_ON_SOME_PATH: your new code dereferences a value that can be null on some branch.
- EI_EXPOSE_REP: your getter returns a mutable internal collection directly. Wrap in Collections.unmodifiableList(...) or copy.
- RV_RETURN_VALUE_IGNORED_BAD_PRACTICE: the result of file.delete() was ignored.
Warnings already present on master are not your problem to fix, but the
analyzer will fail the build if your change introduces new ones.
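To see which are new, run the analyzer on both branches, keep a copy of
tez-dag/target/spotbugsXml.xml from each run, and diff the two copies.
(git diff origin/master cannot do this directly: target/ is build output
and is not tracked.)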
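A raw diff of two SpotBugs XML reports is noisy. A coarser but readable comparison is to count warning types in each report and list the ones that occur more often on your branch. The sketch below assumes each finding is a BugInstance element with a type attribute; the function names and the saved-copy file names in the example are hypothetical.

```shell
# Hypothetical helper: one warning type per finding, sorted.
bug_types() {
  grep -o '<BugInstance[^>]*' "$1" | sed -n 's/.* type="\([^"]*\)".*/\1/p' | sort
}

# Print each warning type once per extra occurrence in the branch report
# compared with the master report.
new_spotbugs_types() {
  work="$(mktemp -d)"
  bug_types "$1" > "$work/master"
  bug_types "$2" > "$work/branch"
  # comm -13 keeps only lines that appear in the second file.
  comm -13 "$work/master" "$work/branch"
  rm -rf "$work"
}

# Example, with copies saved from a run on each branch:
#   new_spotbugs_types /tmp/spotbugs-master.xml /tmp/spotbugs-branch.xml
```

This compares types only, not locations, so a warning that moved from one class to another will not show up. Open the report when the counts look suspicious.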
5. Apache RAT (License Headers)
Every new .java, .xml, .properties file must carry the ASL header.
RAT enforces this:
mvn apache-rat:check -q 2>&1 \
| tee capstone-work/validation/05-rat.log
If it complains about your new test file, prepend the standard header:
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
(Copy from any existing Tez file — it is the canonical form.)
For shell, properties, and XML files, use the appropriate comment syntax. Look at neighboring files in the same directory.
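RAT takes a while on the full repo. For a quick look at just the files you added, a scan like this can help. It is a sketch: the 20-line window and the function name missing_asf_header are assumptions, and RAT's own verdict is the one that counts.

```shell
# Hypothetical pre-check: print every given file whose first 20 lines do
# not contain the opening sentence of the ASF header.
# Exits non-zero if any file is missing it.
missing_asf_header() {
  missing=0
  for f in "$@"; do
    if ! head -n 20 "$f" | grep -q 'Licensed to the Apache Software Foundation'; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check only files added on your branch.
#   missing_asf_header $(git diff --name-only --diff-filter=A origin/master)
```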
6. TestOrderedWordCount End-to-End
The closest thing to a smoke test of "does Tez actually still work for a real user workload":
mvn test -pl tez-tests -Dtest=TestOrderedWordCount -q 2>&1 \
| tee capstone-work/validation/06-orderedwordcount.log
Takes 2–5 minutes. If this fails when your unit tests pass, your fix likely broke an interaction your unit test didn't exercise. Common culprits:
- You changed an event ordering and a downstream component assumed the old ordering.
- You added a config key default that breaks the example's expectations.
- Your MiniTezCluster test is leaking state into a sibling test.
7. Re-Run Your Original Step 2 Reproducer
Sanity check. The thing you set out to fix is still fixed:
mvn test -pl <module> -Dtest=<YourReproTest> 2>&1 \
| tee capstone-work/validation/07-repro.log
Five runs:
for i in 1 2 3 4 5; do
mvn test -pl <module> -Dtest=<YourReproTest> -q
done
Five greens. Or you have not actually shipped a fix.
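The loop above runs five times but leaves the counting to you. A small variant that tallies the result, as a sketch; the function name repeat_check is hypothetical.

```shell
# Hypothetical helper: run a command N times, print how many runs passed,
# and exit non-zero unless every run passed.
repeat_check() {
  runs="$1"; shift
  passed=0
  i=1
  while [ "$i" -le "$runs" ]; do
    if "$@"; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "$passed/$runs green"
  [ "$passed" -eq "$runs" ]
}

# Example:
#   repeat_check 5 mvn test -pl <module> -Dtest=<YourReproTest> -q
```

It keeps going after a red run on purpose: 3/5 tells you more about a flaky fix than stopping at the first failure.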
8. Regression Sweep
Run the test suite of every module that depends on what you changed. If you
touched tez-api, that is everything. If you touched tez-runtime-library,
that is at least tez-tests, tez-mapreduce, and tez-examples.
# Identify dependents
grep -l "tez-runtime-library" $(find ~/tez-src -name pom.xml)
# Run each
mvn test -pl tez-mapreduce -q | tail -5
mvn test -pl tez-examples -q | tail -5
mvn test -pl tez-tests -q | tail -10
If tez-tests takes too long (it can — there are real MiniTezCluster
runs in there), at least run the tests whose name contains your changed
class:
mvn test -pl tez-tests -Dtest='*Vertex*' -q
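The grep above prints pom paths, including the root pom and the changed module's own pom. A sketch that narrows this to module directories you can pass to -pl, assuming each module's directory is named after its artifactId. It finds direct dependents only, and the function name dependent_modules is hypothetical.

```shell
# Hypothetical helper: list module directories (relative to the repo root)
# whose pom.xml mentions the given artifactId, excluding the root pom and
# the module itself. Pass the root without a trailing slash.
dependent_modules() {
  artifact="$1"; root="$2"
  find "$root" -name pom.xml -exec grep -l "<artifactId>$artifact</artifactId>" {} + |
    while IFS= read -r pom; do
      dir="$(dirname "$pom")"
      [ "$dir" = "$root" ] && continue                    # root pom
      [ "$(basename "$dir")" = "$artifact" ] && continue  # the module itself
      rel=${dir#"$root"/}
      echo "$rel"
    done | sort
}

# Example:
#   for m in $(dependent_modules tez-runtime-library ~/tez-src); do
#     mvn test -pl "$m" -q | tail -5
#   done
```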
9. Performance Validation (If Relevant)
Skip this section unless your fix touches scheduling, shuffle, or any code path documented as "hot." For those, use async-profiler or JFR to capture a flamegraph before and after.
async-profiler pattern
# Start the JVM under test (e.g. a MiniTezCluster integration test)
mvn test -pl tez-tests -Dtest=TestPerfWorkload -DforkMode=never &
TEST_PID=$!
# Attach profiler
~/async-profiler/profiler.sh -d 60 -f /tmp/flame-before.svg $TEST_PID
# Apply your fix, restart the test as above so TEST_PID points at the new JVM, then repeat
~/async-profiler/profiler.sh -d 60 -f /tmp/flame-after.svg $TEST_PID
Compare the two SVGs. The stack frames you care about (e.g.
ShuffleManager.run, MergeManager.merge) should not be wider after your
fix than before. If they are, you have introduced a regression and you owe
the JIRA an explanation.
Simpler: timing assertions in a JUnit test
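import static org.junit.Assert.assertTrue;

import java.util.concurrent.TimeUnit;
import org.junit.Test;

// Inside your test class; runShuffleWorkload() is your own workload driver.
@Test
public void testShuffleNotSlowerAfterFix() throws Exception {
  long start = System.nanoTime();
  runShuffleWorkload();
  long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  // Loose absolute bound: pick a limit about 30% above a baseline you measured beforehand.
  assertTrue("shuffle took " + elapsedMs + "ms, expected < 15000",
      elapsedMs < 15_000);
}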
Brittle. Only add if perf is truly the concern.
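If you time runs by hand instead, the 30% rule is plain integer arithmetic. A sketch, with a hypothetical function name and made-up numbers in the example:

```shell
# Hypothetical helper: succeed if elapsed_ms is no more than pct percent
# above baseline_ms. Integer math only, so no bc or awk is needed.
within_budget() {
  baseline_ms="$1"; elapsed_ms="$2"; pct="$3"
  [ $((elapsed_ms * 100)) -le $((baseline_ms * (100 + pct))) ]
}

# Example with made-up timings: a baseline of 11500 ms allows up to 14950 ms.
#   within_budget 11500 14000 30 && echo "within budget"
```

Measure the baseline on the same machine and under the same load as the run you compare it to, or the comparison means little.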
The Validation Report
Compile everything into one document for the PR:
# Validation report for TEZ-NNNN
## Environment
- JDK: `java -version` -> openjdk version "11.0.21"
- Maven: `mvn -version` -> Apache Maven 3.9.6
- OS: macOS 14.2 / Linux 5.15.0-91-generic
- Tez HEAD: `git rev-parse origin/master` -> a1b2c3d4
## Results
| Check | Status | Notes |
|---|---|---|
| `mvn test -pl tez-dag` | PASS | 1342 tests, 0 failures, 17 skipped |
| `mvn clean install -DskipTests` | PASS | |
| `mvn checkstyle:check` | PASS | |
| `mvn spotbugs:check` | PASS | |
| `mvn apache-rat:check` | PASS | |
| `mvn test -pl tez-tests -Dtest=TestOrderedWordCount` | PASS | |
| Original reproducer | PASS (5/5 runs) | |
| `mvn test -pl tez-mapreduce` | PASS | |
| `mvn test -pl tez-examples` | PASS | |
## Known flakes encountered
- TestSomething#testWhatever — pre-existing flake, see TEZ-XXXX, not caused by this change.
## Performance
- Not applicable / no perf-relevant code paths touched.
Save as capstone-work/validation/REPORT.md. Paste it (or a summary plus
link) into your PR description.
Validation / Self-check
Before advancing to Step 8:
- capstone-work/validation/ contains one log file per check (logs 01–07 at minimum).
- capstone-work/validation/REPORT.md exists with the table above filled in honestly.
- Every check passes, or every failure is documented as a pre-existing issue with a JIRA link.
- You re-ran your Step 2 reproducer five times with your fix applied and got 5/5 green.
- You ran the test suite of at least one module that depends on the one you changed (regression sweep).
- No new SpotBugs warnings introduced (diff against master baseline).
- The validation report is short enough to paste into a PR description without making the reviewer scroll more than one screen.
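The first two items can be checked mechanically. A minimal sketch, assuming the NN-name.log naming used by the commands in this step; the function name check_validation_dir is hypothetical.

```shell
# Hypothetical self-check: confirm logs 01 through 07 and REPORT.md exist
# in the validation directory. Prints what is missing and exits non-zero
# if anything is.
check_validation_dir() {
  dir="$1"
  incomplete=0
  for n in 01 02 03 04 05 06 07; do
    # If the glob matches nothing it stays literal, and -e fails.
    set -- "$dir"/"$n"-*.log
    if [ ! -e "$1" ]; then
      echo "missing log $n"
      incomplete=1
    fi
  done
  if [ ! -f "$dir/REPORT.md" ]; then
    echo "missing REPORT.md"
    incomplete=1
  fi
  return "$incomplete"
}

# Example:
#   check_validation_dir capstone-work/validation
```

It checks presence only. Whether the table is filled in honestly is still on you.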