Compatibility
Tez is a library that ships into long-lived production clusters running Hive, Pig, and custom DAG applications. A compatibility break in Tez ripples out to every downstream project that depends on it. This chapter covers what you may and may not change without breaking users.
The Three Compatibility Surfaces
Tez has three distinct compatibility surfaces, each with different rules:
| Surface | What it covers | Where defined |
|---|---|---|
| API compatibility | Source/binary compat of Java classes | @InterfaceAudience/@InterfaceStability annotations in tez-api |
| Wire compatibility | Serialised messages over the network | protobufs in */src/main/proto/ |
| Configuration compatibility | Config keys and default values | TezConfiguration constants in tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java |
A single patch may touch zero, one, two, or all three. Knowing which surface you're touching tells you which rules apply.
API Compatibility — The Annotation Grid
Every class in tez-api is (or should be) annotated. The two-axis grid:
| | @Stable | @Evolving | @Unstable |
|---|---|---|---|
| @Public | Compat across minor versions. Major bump to change. | May change across minor versions with deprecation. | May change across any release. |
| @LimitedPrivate({"Hive"}) | Stable for named projects only (e.g. Hive). | Evolving for named projects. | Unstable, named projects only. |
| @Private | Internal. No external compat. | Internal. | Internal. |
The annotations live at tez-api/src/main/java/org/apache/hadoop/classification/:
ls ~/tez-src/tez-api/src/main/java/org/apache/hadoop/classification/
# InterfaceAudience.java
# InterfaceStability.java
Verify a class:
grep -B2 "^public class Vertex" ~/tez-src/tez-api/src/main/java/org/apache/tez/dag/api/Vertex.java
You will see:
@Public
@Evolving
public class Vertex {
That tells you: external users may write code against Vertex, but the class may evolve
between minor versions. You may add methods. You should not remove or change the signature
of an existing method without deprecation.
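The same lookup can be done reflectively. The following is a minimal sketch, not Tez code: the five annotations are hypothetical stand-ins declared locally so the example compiles without Tez or Hadoop on the classpath, and it assumes the real annotations are retained at runtime, which is worth checking before relying on this approach. @LimitedPrivate is left out for brevity.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class CompatLookup {
    // Hypothetical stand-ins for the audience/stability annotations, declared
    // locally so the sketch compiles without Tez or Hadoop on the classpath.
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface Private {}
    @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
    @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

    // Stand-in annotated the same way as the Vertex listing above.
    @Public @Evolving
    static class ExampleVertex {}

    // A class nobody has annotated yet.
    static class Unannotated {}

    /** Returns "audience/stability", using "unannotated" where an axis is missing. */
    static String describe(Class<?> c) {
        String audience = c.isAnnotationPresent(Public.class) ? "Public"
                : c.isAnnotationPresent(Private.class) ? "Private"
                : "unannotated";
        String stability = c.isAnnotationPresent(Stable.class) ? "Stable"
                : c.isAnnotationPresent(Evolving.class) ? "Evolving"
                : c.isAnnotationPresent(Unstable.class) ? "Unstable"
                : "unannotated";
        return audience + "/" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleVertex.class)); // Public/Evolving
        System.out.println(describe(Unannotated.class));   // unannotated/unannotated
    }
}
```

A class that reports "unannotated" on either axis is a candidate for the "(or should be)" caveat above: its compatibility level has not been declared yet.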
What You Can and Can't Change
The decision matrix for changing an existing public class or its methods:
| Change | @Public @Stable | @Public @Evolving | @Public @Unstable | @Private |
|---|---|---|---|---|
| Add new method to class | OK | OK | OK | OK |
| Add overload (different signature) | OK | OK | OK | OK |
| Add optional parameter (new overload) | OK | OK | OK | OK |
| Rename method | Major version only | Deprecate first | OK with note in CHANGES.txt | OK |
| Change parameter type | Major version only | Deprecate + add new | OK | OK |
| Change return type (widening) | Major version only | OK with note | OK | OK |
| Change return type (narrowing) | Major version only | Major version only | OK | OK |
| Remove method | Major version only | Major after 1 minor deprecation | OK with note | OK |
| Change method behavior (same signature) | Avoid; needs dev@ discussion | Note in CHANGES.txt | OK | OK |
The default rule for @Public @Stable: assume you can't change it. To change it, you
need dev@ agreement first.
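For review checklists it can help to have the matrix in executable form. The sketch below is one possible encoding, assuming nothing beyond the table itself: the enum names, the ChangeMatrix class, and the needsMajorVersion helper are all hypothetical, the two "add overload" rows are merged into one, and the verdict strings are abbreviated from the table cells.

```java
import java.util.EnumMap;
import java.util.Map;

class ChangeMatrix {
    // Column order matches the table: Stable, Evolving, Unstable, Private.
    enum Level { PUBLIC_STABLE, PUBLIC_EVOLVING, PUBLIC_UNSTABLE, PRIVATE }

    enum Change {
        ADD_METHOD, ADD_OVERLOAD, RENAME_METHOD, CHANGE_PARAM_TYPE,
        WIDEN_RETURN, NARROW_RETURN, REMOVE_METHOD, CHANGE_BEHAVIOR
    }

    static final String MAJOR = "Major version only";

    // One row per change type, one cell per Level, copied from the table above.
    private static final Map<Change, String[]> RULES = new EnumMap<>(Change.class);
    static {
        RULES.put(Change.ADD_METHOD,        new String[] {"OK", "OK", "OK", "OK"});
        RULES.put(Change.ADD_OVERLOAD,      new String[] {"OK", "OK", "OK", "OK"});
        RULES.put(Change.RENAME_METHOD,     new String[] {MAJOR, "Deprecate first", "OK with changelog note", "OK"});
        RULES.put(Change.CHANGE_PARAM_TYPE, new String[] {MAJOR, "Deprecate + add new", "OK", "OK"});
        RULES.put(Change.WIDEN_RETURN,      new String[] {MAJOR, "OK with note", "OK", "OK"});
        RULES.put(Change.NARROW_RETURN,     new String[] {MAJOR, MAJOR, "OK", "OK"});
        RULES.put(Change.REMOVE_METHOD,     new String[] {MAJOR, "Major after 1 minor deprecation", "OK with note", "OK"});
        RULES.put(Change.CHANGE_BEHAVIOR,   new String[] {"Avoid; needs dev@ discussion", "Changelog note", "OK", "OK"});
    }

    /** Looks up the table cell for a change at a given compatibility level. */
    static String verdict(Change change, Level level) {
        return RULES.get(change)[level.ordinal()];
    }

    /** True when the cell says the change has to wait for a major release. */
    static boolean needsMajorVersion(Change change, Level level) {
        return verdict(change, level).startsWith("Major");
    }

    public static void main(String[] args) {
        System.out.println(verdict(Change.RENAME_METHOD, Level.PUBLIC_EVOLVING));
        System.out.println(needsMajorVersion(Change.NARROW_RETURN, Level.PUBLIC_EVOLVING));
    }
}
```

Keeping the cells as plain strings keeps the sketch close to the table; a real tool would more likely model the verdicts as an enum of their own.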
Deprecation Procedure
When deprecating a @Public @Evolving method:
/**
* @deprecated Since 0.10.5, use {@link #setParallelism(int, VertexLocationHint)} instead.
* This method will be removed in 0.12.0.
*/
@Deprecated
public Vertex setParallelism(int parallelism) {
return setParallelism(parallelism, null);
}
Three required elements:
- @Deprecated annotation on the method.
- @deprecated Javadoc tag explaining what to use instead.
- A target removal version. Vague "may be removed" deprecations live forever.
Add a note to CHANGES.txt:
DEPRECATIONS:
TEZ-NNNN: Vertex.setParallelism(int) is deprecated; use setParallelism(int, VertexLocationHint).
Will be removed in 0.12.0.
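On Java 9 and later, the @Deprecated annotation itself can carry part of this information through its since and forRemoval attributes, which makes deprecations auditable by reflection. The sketch below assumes a Java 9+ toolchain, which may not match the Java level Tez targets. ExampleVertex is a hypothetical stand-in (with a String in place of VertexLocationHint), and deprecatedMethods is an illustrative helper, not an existing Tez utility.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeprecationAudit {
    // Hypothetical stand-in for a public API class; not the real Vertex.
    static class ExampleVertex {
        private int parallelism;
        private String locationHint;

        /**
         * @deprecated Since 0.10.5, use {@link #setParallelism(int, String)} instead.
         */
        @Deprecated(since = "0.10.5", forRemoval = true)
        public ExampleVertex setParallelism(int parallelism) {
            // The old entry point delegates, so both paths share one implementation.
            return setParallelism(parallelism, null);
        }

        public ExampleVertex setParallelism(int parallelism, String locationHint) {
            this.parallelism = parallelism;
            this.locationHint = locationHint;
            return this;
        }

        public int getParallelism() {
            return parallelism;
        }
    }

    /** Lists deprecated methods as "name/paramCount since=... forRemoval=...", sorted. */
    static List<String> deprecatedMethods(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            Deprecated d = m.getAnnotation(Deprecated.class);
            if (d != null) {
                out.add(m.getName() + "/" + m.getParameterCount()
                        + " since=" + d.since() + " forRemoval=" + d.forRemoval());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(deprecatedMethods(ExampleVertex.class));
        // The deprecated overload still works because it delegates.
        System.out.println(new ExampleVertex().setParallelism(4).getParallelism());
    }
}
```

The annotation attributes do not replace the Javadoc tag: the planned removal version and the suggested replacement still have to be written out in the comment.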
Wire Compatibility — Protobufs
The DAGPlan protobuf is the most compatibility-sensitive file in Tez. It is the
serialised contract between:
- The Tez client (often inside Hive, Pig, or user code) and the AM
- The AM and history (ATSHistoryLoggingService)
- The AM and the recovery file
A DAGPlan written by a 0.10.3 client must be readable by a 0.10.5 AM. Likewise, a DAGPlan
stored in a recovery file months ago must still be readable by the AM running today.
The protobuf compatibility rules (protobuf 2.5 semantics, which Tez still uses for historic reasons):
| Change to a .proto | Wire compat impact |
|---|---|
| Add a new optional field with default | Forward + backward compatible |
| Add a new repeated field | Forward + backward compatible |
| Add a new required field | BREAKS old readers |
| Remove an optional field | BREAKS if old readers ignore unknowns badly |
| Rename a field (same tag) | OK in wire, breaks source compat |
| Change a field's tag number | BREAKS wire compat |
| Change a field's type | Usually BREAKS |
| Convert optional to repeated | BREAKS |
| Add a new enum value | BREAKS if old readers reject unknowns |
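Several rows of this table follow from how protobuf puts fields on the wire: each field is a key, (tag << 3) | wire_type, followed by its value, and readers skip keys they do not recognise. The sketch below hand-rolls just enough of that encoding to show the effect for varint fields. It is an illustration only: WireSketch and its methods are hypothetical, it handles only wire type 0, and real code should use the generated protobuf classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class WireSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Appends one varint-typed field: key = (tag << 3) | wire type 0, then the value.
    static void writeVarintField(ByteArrayOutputStream out, int tag, long value) {
        writeVarint(out, (long) tag << 3);
        writeVarint(out, value);
    }

    /** Encodes (tag, value) pairs, e.g. encode(2, 8, 12, 4). */
    static byte[] encode(long... pairs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            writeVarintField(out, (int) pairs[i], pairs[i + 1]);
        }
        return out.toByteArray();
    }

    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Decodes a message, keeping only the tags this reader knows and skipping the rest. */
    static Map<Integer, Long> read(byte[] buf, Set<Integer> knownTags) {
        Map<Integer, Long> fields = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long key = readVarint(buf, pos);
            int tag = (int) (key >>> 3);
            if ((key & 7) != 0) {
                throw new IllegalArgumentException("sketch handles varint fields only");
            }
            long value = readVarint(buf, pos);
            if (knownTags.contains(tag)) {
                fields.put(tag, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // New writer adds tag 12; an old reader that knows only tag 2 is unaffected.
        System.out.println(read(encode(2, 8, 12, 4), Set.of(2)));
        // Old message read by a new reader: tag 12 is absent, so a default is needed.
        System.out.println(read(encode(2, 8), Set.of(2, 12)).getOrDefault(12, -1L));
        // Moving a field from tag 2 to tag 3 makes it invisible to the old reader.
        System.out.println(read(encode(3, 8), Set.of(2)));
    }
}
```

The three calls in main correspond to the "add optional field" row in both directions and to the "change a field's tag number" row.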
The hard rules for DAGApiRecords.proto, one of the protos in tez-api:
ls ~/tez-src/tez-api/src/main/proto/
# DAGApiRecords.proto
# DAGClientAMProtocol.proto
# Events.proto
- Never reuse a tag number. Once tag 12 was used, it's used forever.
- Never change a field's type. Even widening (int32 to int64) is unsafe: the bytes still parse, but old readers truncate values that no longer fit in 32 bits.
- Never make an optional field required.
- New fields go at the end with the next free tag number, marked optional.
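Most of these rules can be checked mechanically by comparing the old and new text of a message. The sketch below is a rough, regex-based illustration rather than a real .proto parser: ProtoTagCheck and its methods are hypothetical, it expects one proto2 field per line, and it ignores nested messages, enums, and comments. It compares label and type per tag, so a tag reused with an identical label and type would slip through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProtoTagCheck {
    // Matches proto2 field lines such as "optional int32 num_tasks = 2;".
    private static final Pattern FIELD = Pattern.compile(
            "^\\s*(optional|required|repeated)\\s+([\\w.]+)\\s+(\\w+)\\s*=\\s*(\\d+)");

    /** Maps tag number to "label type" for each field line in one message body. */
    static Map<Integer, String> fieldsByTag(String messageBody) {
        Map<Integer, String> out = new TreeMap<>();
        for (String line : messageBody.split("\n")) {
            Matcher m = FIELD.matcher(line);
            if (m.find()) {
                out.put(Integer.parseInt(m.group(4)), m.group(1) + " " + m.group(2));
            }
        }
        return out;
    }

    /** Reports tags whose label or type changed, and new fields declared required. */
    static List<String> violations(String oldBody, String newBody) {
        Map<Integer, String> before = fieldsByTag(oldBody);
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, String> e : fieldsByTag(newBody).entrySet()) {
            int tag = e.getKey();
            String was = before.get(tag);
            if (was == null) {
                if (e.getValue().startsWith("required ")) {
                    problems.add("tag " + tag + ": new field is required");
                }
            } else if (!was.equals(e.getValue())) {
                problems.add("tag " + tag + ": " + was + " -> " + e.getValue());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String oldBody = "required string name = 1;\noptional int32 num_tasks = 2;\n";
        String good = oldBody + "optional int32 max_attempts = 12;\n";
        String bad = "required string name = 1;\noptional int64 num_tasks = 2;\n"
                + "required int32 max_attempts = 12;\n";
        System.out.println(violations(oldBody, good));
        System.out.println(violations(oldBody, bad));
    }
}
```

Renames are deliberately not flagged, since the table above treats a rename with the same tag as wire-compatible.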
When adding a new field:
message VertexPlan {
required string name = 1;
optional int32 num_tasks = 2;
...
optional int64 last_modified_time = 11;
+ optional int32 max_attempts = 12;
}
The Java side should treat the new field as "may be absent" forever — old plans don't have it.
Recovery File Compatibility
The AM writes recovery files containing serialised DAGPlan and event records. On restart, the AM reads its own recovery file, which may have been written before an upgrade. A patched AM must therefore be able to read recovery files written by the version it replaced.
Practical rule: recovery is at least as wire-compat-sensitive as RPC. Treat every
DAGPlan change as a recovery-format change. Tests:
find ~/tez-src -name "TestDAGRecovery*.java"
find ~/tez-src -name "TestRecovery*.java"
If your patch touches a proto, run these tests and add a new case demonstrating old-format recovery still works.
History / ATS Compatibility
The history record format (used by the Tez UI and ATS) is also a wire format:
find ~/tez-src -name "HistoryEvent*.java" | head
find ~/tez-src -name "HistoryEvent.proto"
A change here breaks Tez UI queries on historical DAGs. The compatibility rule is the
same as for DAGPlan. The reviewer for any history-format patch is typically a Hive
committer who depends on the Tez UI.
Configuration Compatibility
Configuration keys are defined in TezConfiguration:
grep "public static final String TEZ_" \
~/tez-src/tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java | head -30
Each key looks like:
@ConfigurationProperty(type = "integer")
public static final String TEZ_AM_RESOURCE_MEMORY_MB = "tez.am.resource.memory.mb";
public static final int TEZ_AM_RESOURCE_MEMORY_MB_DEFAULT = 1024;
Adding a new key
OK at any time. Add the String constant, the _DEFAULT constant, an @Public /
@Unstable (or @Evolving) annotation if the surrounding class is annotated, and a
javadoc explaining the key and its valid range.
Renaming a key
This requires a deprecation alias. Tez has a deprecation mechanism via Hadoop's
Configuration.addDeprecation. Pattern:
public static final String TEZ_AM_RESOURCE_MEMORY_MB = "tez.am.resource.memory.mb";
// Old key, deprecated since 0.10.5.
public static final String TEZ_AM_RESOURCE_MEMORY_MB_DEPRECATED = "tez.am.memory.mb";
static {
Configuration.addDeprecation(
TEZ_AM_RESOURCE_MEMORY_MB_DEPRECATED,
TEZ_AM_RESOURCE_MEMORY_MB);
}
Old config files using the deprecated name continue to work. Hadoop's Configuration is expected to log a deprecation warning the first time the old key is used.
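To make the aliasing behaviour concrete, here is a small stand-in that mimics it with plain maps. It is not Hadoop's implementation and shares no code with it: KeyAliasSketch and all of its methods are hypothetical, and the warn-once behaviour is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class KeyAliasSketch {
    private final Map<String, String> deprecatedToNew = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> warned = new HashSet<>();

    /** Registers oldKey as a deprecated alias for newKey. */
    void addDeprecation(String oldKey, String newKey) {
        deprecatedToNew.put(oldKey, newKey);
    }

    /** Stores the value under the current key, whichever name the caller used. */
    void set(String key, String value) {
        values.put(resolve(key), value);
    }

    /** Reads through the alias table, falling back to the supplied default. */
    String get(String key, String defaultValue) {
        return values.getOrDefault(resolve(key), defaultValue);
    }

    /** Number of distinct deprecated keys that have triggered a warning. */
    int warningCount() {
        return warned.size();
    }

    // Maps a deprecated key to its replacement, warning once per deprecated key.
    private String resolve(String key) {
        String newKey = deprecatedToNew.get(key);
        if (newKey == null) {
            return key;
        }
        if (warned.add(key)) {
            System.err.println(key + " is deprecated; use " + newKey + " instead");
        }
        return newKey;
    }

    public static void main(String[] args) {
        KeyAliasSketch conf = new KeyAliasSketch();
        conf.addDeprecation("tez.am.memory.mb", "tez.am.resource.memory.mb");
        conf.set("tez.am.memory.mb", "2048"); // an old config file using the old name
        System.out.println(conf.get("tez.am.resource.memory.mb", "1024"));
    }
}
```

The point of the pattern is that code reading the new key sees values written under the old one, so old configuration files keep working during the deprecation window.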
Removing a key
Only at a major version bump, after at least one minor version of deprecation. Document
in CHANGES.txt and the release notes.
Changing a default
Treat as a behavior change. Requires dev@ discussion if the change affects perf or
resource usage. Document the change explicitly:
DEFAULT CHANGES:
TEZ-NNNN: tez.am.resource.memory.mb default changed from 1024 to 1536 to reduce OOMs
on large DAGs. Users with tight container budgets should explicitly set the
old value.
Compatibility Across Tez and Hive/Pig
Tez has cross-project compatibility commitments to Hive and Pig — they bundle Tez and
expect a Tez version bump not to break them. The mechanism is @LimitedPrivate.
grep -rn "@LimitedPrivate" ~/tez-src/tez-api/src/main/java | head
A class annotated @LimitedPrivate({"Hive"}) has API compatibility guaranteed to Hive
only. The Tez side may not break it without first warning dev@hive.apache.org. The
Hive side commits to not relying on anything other than @LimitedPrivate or @Public
APIs.
When you change a @LimitedPrivate({"Hive"}) class:
- Search Hive for usage: grep -rn <ClassName> ~/hive-src/ql/src/
- If Hive uses it, post a heads-up on dev@hive.apache.org referencing the JIRA.
- Consider providing both old and new methods for one Tez minor version.
Validation Artifacts
After this chapter you should have:
- A ~/tez-notes/compat-cheatsheet.md with the API matrix from above.
- A list of every .proto file in tez-api and which compat surface each protects.
- The set of files in tez-api/.../classification/ open in your IDE for reference.
- Knowledge of which Hive classes import from tez-api: grep -rn "import org.apache.tez" ~/hive-src/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ | head
- The ability to predict, for any change, which compat surface(s) it touches and what the deprecation timeline would be.
The next chapter — Meritocracy — is the project-level perspective: how Apache Tez decides who gets to make compatibility decisions.