Lab 1.1: Build Apache Tez from Source
Background
Apache Tez is a multi-module Maven project. Building from source is the mandatory first step for any contributor — you need the ability to make code changes, rebuild specific modules, and run tests against your local changes. This lab walks through the full build, from cloning to verifying artifacts.
Why This Lab Matters for Contributors
- You cannot submit a credible patch without first verifying it builds cleanly
- Knowing which Maven flags control which modules saves hours during development
- Understanding the build structure helps you scope test runs efficiently
- Build failures are sometimes real bugs — knowing a clean build baseline lets you detect regressions
Prerequisites
Verify before starting:
java -version # Must be Java 8 or Java 11
mvn -version # Must be Maven 3.6.3 or newer
git --version # Must be 2.x
Disk space: at least 10 GB free. The full build with tests generates large artifacts.
Memory: at least 8 GB RAM. The tez-dag unit tests can spike to 4 GB during parallel runs.
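The version checks above can also be scripted. The following is a minimal sketch, not something shipped with Tez: version_ge is a hypothetical helper name, and it assumes a sort that supports -V (GNU coreutils and recent BSD/macOS versions do).

```shell
# Hypothetical helper: succeeds if dotted version $1 is >= dotted version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check Maven only if it is installed; the first line of `mvn -version`
# has the form "Apache Maven <version> (...)".
if command -v mvn >/dev/null 2>&1; then
  mvn_version=$(mvn -version 2>/dev/null | awk '/^Apache Maven/ { print $3; exit }')
  if [ -n "$mvn_version" ] && version_ge "$mvn_version" 3.6.3; then
    echo "Maven $mvn_version is new enough"
  else
    echo "Maven ${mvn_version:-unknown} does not meet the 3.6.3 minimum"
  fi
fi
```

The same helper works for the Git check, for example version_ge "$(git --version | awk '{ print $3 }')" 2.0.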
Step-by-Step Tasks
Step 1: Clone the Repository
git clone https://github.com/apache/tez.git
cd tez
The GitHub repository at https://github.com/apache/tez is a mirror of the canonical
Apache GitBox repository, and it is fine to clone it for development. In Apache's
traditional workflow the patch is attached to the JIRA issue rather than sent as a
GitHub PR; check the project's current contribution guide before submitting, because
it may also accept pull requests that reference the JIRA issue.
Verify the remote:
git remote -v
# origin https://github.com/apache/tez.git (fetch)
# origin https://github.com/apache/tez.git (push)
Step 2: Inspect the Branch Structure
git branch -r | grep -v HEAD | sort
You will see branches like:
- origin/master: development trunk
- origin/branch-0.10: stable release branch
- origin/branch-0.9: older stable branch
For contributor work, use master unless you are reproducing an issue specific to a
release branch. Bug fixes for release branches are typically backported from master.
Check the current Hadoop dependency in pom.xml:
grep -m1 "hadoop.version" pom.xml
This tells you which Hadoop version Tez is built against. The default Hadoop version target controls which APIs are available.
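If you want only the value rather than the whole matching line, a small helper can strip the XML tags. This is a sketch under stated assumptions: pom_property is a hypothetical name, and it assumes the property's opening and closing tags sit on one line, which is the usual layout in a POM's properties block.

```shell
# Hypothetical helper: print the first <name>value</name> found in a POM.
# $1 = path to pom.xml, $2 = property name (e.g. hadoop.version).
# Assumes the opening and closing tags are on the same line.
pom_property() {
  sed -n "s:.*<$2>\(.*\)</$2>.*:\1:p" "$1" | head -n 1
}

# Print the value when run from a directory that contains a pom.xml.
if [ -f pom.xml ]; then
  echo "hadoop.version = $(pom_property pom.xml hadoop.version)"
fi
```

Maven can also resolve the property itself, including any override from an active profile: mvn help:evaluate -Dexpression=hadoop.version -q -DforceStdout (this needs maven-help-plugin 3.1.0 or newer).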
Step 3: Full Build (Skip Tests)
mvn install -DskipTests -q
Expected duration: 5–15 minutes depending on hardware and Maven cache state.
The first run downloads all dependencies. With a warm Maven cache (~/.m2/repository),
subsequent builds are much faster: nothing is downloaded again, and the compiler plugin
recompiles only modules whose sources changed, although the other plugins still run.
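What -DskipTests does:
Skips execution of the tests but still compiles the test classes, so a compile error in
test code still fails the build. Use this for iterative development when you are not
changing test code. To skip test compilation as well, use -Dmaven.test.skip=true, which
can break modules that depend on another module's test JAR.
What -q does:
Enables quiet mode: Maven prints only errors, so a successful build produces no output.
Remove -q if you need to debug build failures.
When a build run without -q completes, the output ends with: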
[INFO] BUILD SUCCESS
[INFO] Total time: X min Y s
If you see BUILD FAILURE, go to the Troubleshooting section below.
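Because -q prints nothing on success, it can help to keep the full log in a file and rely on Maven's exit status instead. A minimal sketch, assuming a hypothetical wrapper named run_logged (not part of Maven or Tez):

```shell
# Hypothetical wrapper: run a command, keep its full output in a log file,
# and print a one-line verdict (plus the log tail on failure).
# $1 = log file, remaining arguments = command to run.
run_logged() {
  log_file=$1
  shift
  if "$@" >"$log_file" 2>&1; then
    echo "OK: $*"
  else
    rc=$?
    echo "FAILED (exit $rc): $*"
    tail -n 20 "$log_file"
    return "$rc"
  fi
}

# Usage (not executed here):
#   run_logged build.log mvn install -DskipTests
```

Dropping -q in the wrapped command keeps the INFO output in the log, so the BUILD SUCCESS or BUILD FAILURE banner is available for later inspection without flooding the terminal.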
Step 4: Verify Build Artifacts
After a successful build, key JARs exist in each module's target/ directory:
find . -name "tez-dag-*.jar" -not -path "*/test-*" | grep -v sources
# Expected: ./tez-dag/target/tez-dag-<version>.jar
find . -name "tez-api-*.jar" -not -path "*/test-*" | grep -v sources
# Expected: ./tez-api/target/tez-api-<version>.jar
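The find commands above filter out source JARs only, so a module that also packages a -tests JAR may show more than one hit. The sketch below narrows the result to the main artifact; main_jars is a hypothetical helper, and the three suffixes it skips are the conventional Maven classifier names rather than anything specific to Tez.

```shell
# Hypothetical helper: list a module's main JAR, skipping the auxiliary
# -sources, -tests and -javadoc JARs that may sit next to it.
# $1 = search root, $2 = artifact prefix (e.g. tez-dag).
main_jars() {
  find "$1" -path "*/target/$2-*.jar" \
    ! -name '*-sources.jar' ! -name '*-tests.jar' ! -name '*-javadoc.jar' |
    sort
}

# Usage (from the repository root, not executed here):
#   main_jars . tez-dag
```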
The assembled distribution tarball is built by a separate command:
mvn package -DskipTests -Pdist -q
ls tez-dist/target/*.tar.gz
This produces the full binary distribution used by HDP and other distributions.
Step 5: Build a Single Module
During development you will almost always build a single module to save time:
# Build only tez-dag and its dependencies
mvn install -DskipTests -pl tez-dag -am -q
# Build only tez-api (omit -am once its upstream modules are already installed in ~/.m2)
mvn install -DskipTests -pl tez-api -q
-pl specifies the module path. -am (also-make) builds all upstream dependencies first.
This is the command you will run hundreds of times during contributor work.
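Since you will type the -pl list so often, you can derive it from the files you changed. This is a sketch, not an established Tez workflow: modules_from_paths is a hypothetical helper, and it assumes every changed top-level directory is itself a Maven module (nested modules inside an aggregator directory, or non-module directories, would need extra handling).

```shell
# Hypothetical helper: read file paths on stdin and print the distinct
# top-level directories as a comma-separated list suitable for -pl.
# Assumption: each top-level directory is itself a Maven module.
# Files in the repository root (no slash in the path) are ignored.
modules_from_paths() {
  awk -F/ 'NF > 1 { print $1 }' | sort -u | paste -sd, -
}

# Usage (not executed here): rebuild only what differs from master.
#   mvn install -DskipTests -pl "$(git diff --name-only master | modules_from_paths)" -am
```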
Step 6: Configure IntelliJ IDEA
IntelliJ handles Maven multi-module projects natively.
- File → Open → select the tez/ directory (the one containing pom.xml)
- IntelliJ detects the Maven project and imports all modules
- When prompted, select the JDK that matches the build (Java 8 or Java 11)
- Wait for the initial index build to complete (2–5 minutes)
Verify the import worked:
- Open tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java
- Ctrl+Click on any class reference; it should navigate correctly
- Open Find Class (Cmd+O / Ctrl+N) and search TestDAGImpl; it should find the test
Enable checkstyle integration:
- Install the CheckStyle-IDEA plugin (Settings → Plugins)
- Configure it to use src/config/checkstyle.xml in the Tez repo root
- This gives you real-time checkstyle feedback as you edit
Implementation Requirements
This lab has no code to implement. Deliverables are:
- A successful mvn install -DskipTests run (screenshot or terminal output)
- Identification of the Hadoop version Tez is built against
- Location of the tez-dag-<version>.jar artifact
- A working IntelliJ project that resolves all imports
Troubleshooting Common Build Failures
"Source/Target Java version mismatch"
error: Source option X is no longer supported. Use Y or later.
Cause: Your JAVA_HOME or java in PATH is the wrong version.
Fix:
export JAVA_HOME=$(/usr/libexec/java_home -v 11) # macOS
export PATH=$JAVA_HOME/bin:$PATH
java -version # verify
mvn install -DskipTests -q
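A common source of this error is that Maven runs on a different JDK than the java on your PATH, because the mvn launcher honors JAVA_HOME. The sketch below extracts the Java version that Maven itself reports; both function names are hypothetical, and the parsing assumes the "Java version: ..." line that mvn -version prints.

```shell
# Hypothetical helpers for checking which JDK Maven itself runs on.

# Read `mvn -version` output on stdin and print the Java version it reports
# (the line has the form "Java version: 11.0.21, vendor: ...").
maven_java_version() {
  sed -n 's/^Java version: \([^, ]*\).*/\1/p' | head -n 1
}

# Reduce a Java version string to its major number:
# legacy "1.8.0_392" -> 8, modern "11.0.21" -> 11.
java_major() {
  case $1 in
    1.*) rest=${1#1.}; echo "${rest%%.*}" ;;
    *)   echo "${1%%[.+-]*}" ;;
  esac
}

# Usage (not executed here):
#   java_major "$(mvn -version | maven_java_version)"
```

If the number printed differs from what java -version shows, fix JAVA_HOME rather than PATH.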
"Cannot resolve dependency: org.apache.hadoop:..."
Cause: The required Hadoop version is not in Maven Central or your local cache.
Fix: Ensure Maven Central is reachable. If building offline, use an internal repository
mirror. On a clean machine with network access this should not occur.
"Killed" or "Out of Memory"
Cause: The Maven JVM runs out of heap, or the operating system kills it under memory pressure.
Fix:
export MAVEN_OPTS="-Xmx4g" # heap for the Maven JVM itself
mvn install -DskipTests -q
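To avoid exporting MAVEN_OPTS in every new shell, Maven 3.3.1 and newer also read JVM options from a .mvn/jvm.config file in the project root. A minimal sketch, assuming you want this only in your local checkout: write_jvm_config is a hypothetical helper, it overwrites an existing file, and the file should stay out of any patch you submit.

```shell
# Hypothetical helper: write per-project JVM options for Maven (3.3.1+).
# $1 = project root, $2 = maximum heap (e.g. 4g).
# Overwrites any existing .mvn/jvm.config in that directory.
write_jvm_config() {
  mkdir -p "$1/.mvn" && printf '%s\n' "-Xmx$2" > "$1/.mvn/jvm.config"
}

# Usage (from the repository root, not executed here):
#   write_jvm_config . 4g
```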
"ERROR: Failed to execute goal ... tez-tests"
Cause: The tez-tests module requires specific integration test infrastructure.
Fix: Build only the modules you need:
mvn install -DskipTests -pl tez-api,tez-dag,tez-runtime-library,tez-examples -am -q
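Expected Output
Run without -q, a successful full build ends with a reactor summary like the following (module list abbreviated, timings illustrative):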
[INFO] Reactor Summary:
[INFO] Apache Tez ......................................... SUCCESS [ 2.345 s]
[INFO] tez-api ............................................ SUCCESS [ 15.678 s]
[INFO] tez-dag ............................................ SUCCESS [ 45.123 s]
[INFO] tez-runtime-internals .............................. SUCCESS [ 12.456 s]
[INFO] tez-runtime-library ................................ SUCCESS [ 18.789 s]
[INFO] tez-mapreduce ...................................... SUCCESS [ 8.012 s]
[INFO] tez-examples ....................................... SUCCESS [ 5.234 s]
...
[INFO] BUILD SUCCESS
Stretch Goals
- Build against a specific Hadoop version by overriding the hadoop.version property:
  mvn install -DskipTests -Dhadoop.version=3.3.6 -q
- Inspect the effective POM for tez-dag to see all inherited dependency versions:
  mvn help:effective-pom -pl tez-dag | grep -A3 "dependency>"
- Identify which modules depend on tez-api by inspecting all pom.xml files:
  grep -r "tez-api" */pom.xml | grep "artifactId"
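For the last stretch goal, the raw grep output can be condensed into a list of module names. This is a sketch under stated assumptions: modules_depending_on is a hypothetical helper, it only detects that a pom.xml mentions the artifactId (which can include dependencyManagement or exclusion entries), and it assumes module directory names match their artifactIds.

```shell
# Hypothetical helper: list top-level modules whose pom.xml mentions an
# artifactId. $1 = repository root, $2 = artifactId (e.g. tez-api).
# Assumption: module directory names match their artifactIds, so the
# artifact's own module can be dropped by name.
modules_depending_on() {
  grep -l "<artifactId>$2</artifactId>" "$1"/*/pom.xml 2>/dev/null |
    while read -r pom; do
      basename "$(dirname "$pom")"
    done | grep -vx "$2" | sort
}

# Usage (from the repository root, not executed here):
#   modules_depending_on . tez-api
```

For a resolved answer that includes transitive dependencies, Maven's own report is more precise: mvn dependency:tree -Dincludes=org.apache.tez:tez-api.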
Related Real-World Issue Types
- Build breakage issues (e.g., dependency version conflicts) — you can observe but not fix at Level 1
- Java version compatibility issues — important context when reading bug reports