Stage 11 — Backward Compatibility
What this stage teaches
Stage 11 is where every change you make is constrained by what was there before. You learn:
- The Apache @InterfaceAudience and @InterfaceStability annotations and what they obligate you to preserve.
- The Tez API surface: which packages are Public, which are LimitedPrivate({"Hive","Pig"}), and which are Private. The audience determines the cost of breaking a contract.
- How to evolve a protobuf message without breaking older clients (optional fields, never reuse field numbers, never change a field type).
- The deprecation cycle: how long a deprecated symbol must remain before removal, and what evidence is required to declare it ready for removal.
- How to negotiate the dev@ conversation when a change is technically compatible but operationally disruptive.
The patches in this stage are often small. The thread is long. A compatibility change without a dev@ design thread is a Stage 11 patch that will be reverted.
The annotation taxonomy
Three audience levels:
| Annotation | Meaning | Examples |
|---|---|---|
| @InterfaceAudience.Public | Any external consumer may call this. Removal is a major-version break. | TezClient, DAG, Vertex, Edge, Processor, most of tez-api. |
| @InterfaceAudience.LimitedPrivate({"Hive","Pig"}) | Only the named projects may call this. Coordinate with them before changing. | Some internal-ish tez-api helpers used by Hive's DagUtils. |
| @InterfaceAudience.Private | Internal to Tez. Free to change. | Everything in tez-dag/src/main/java/org/apache/tez/dag/app/.... |
Three stability levels:
| Annotation | Meaning |
|---|---|
| @InterfaceStability.Stable | Compatible across minor versions. Removal requires a major bump. |
| @InterfaceStability.Evolving | May change between minor versions, but a deprecation cycle is expected. |
| @InterfaceStability.Unstable | Free to break at any time. |
The combined matrix gives nine cells. Most public Tez API is Public + Stable, the most expensive cell to change. Most internal Tez API is Private + Unstable, which is free to change.
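As a rough illustration of picking a cell in the matrix, the sketch below reads an audience and a stability annotation off a class by reflection. The annotations here are local stand-ins declared only so the example compiles on its own, and the three annotated classes are hypothetical; in a real checkout you would look at the annotations actually imported in the source file.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class CompatMatrix {
  // Stand-in annotations (assumption: declared locally for the sketch,
  // they only mirror the three audience and three stability levels).
  @Retention(RetentionPolicy.RUNTIME) @interface Public {}
  @Retention(RetentionPolicy.RUNTIME) @interface LimitedPrivate { String[] value(); }
  @Retention(RetentionPolicy.RUNTIME) @interface Private {}
  @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
  @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}
  @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

  // Hypothetical classes, one per cell of interest.
  @Public @Stable static class ClientFacing {}
  @LimitedPrivate({"Hive", "Pig"}) @Evolving static class HiveHelper {}
  @Private @Unstable static class AmInternal {}

  static String audience(Class<?> c) {
    if (c.isAnnotationPresent(Public.class)) return "Public";
    if (c.isAnnotationPresent(LimitedPrivate.class)) return "LimitedPrivate";
    if (c.isAnnotationPresent(Private.class)) return "Private";
    return "unannotated";
  }

  static String stability(Class<?> c) {
    if (c.isAnnotationPresent(Stable.class)) return "Stable";
    if (c.isAnnotationPresent(Evolving.class)) return "Evolving";
    if (c.isAnnotationPresent(Unstable.class)) return "Unstable";
    return "unannotated";
  }

  // The matrix cell is simply the pair of the two answers.
  static String cell(Class<?> c) {
    return audience(c) + " + " + stability(c);
  }

  public static void main(String[] args) {
    System.out.println(cell(ClientFacing.class));
    System.out.println(cell(HiveHelper.class));
    System.out.println(cell(AmInternal.class));
  }
}
```

A class that comes back "unannotated" on either axis is worth raising on the ticket before you change it, since its contract has never been stated.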
Find the annotations:
cd ~/tez-src
grep -rn "@InterfaceAudience\|@InterfaceStability" tez-api/src/main/java | head -20
JIRA filter to find candidates
project = TEZ AND resolution = Unresolved
AND (text ~ "deprecate" OR text ~ "compatibility"
OR text ~ "InterfaceAudience" OR text ~ "protobuf"
OR labels = "incompatible")
ORDER BY priority DESC, updated DESC
Walked example A — adding an optional protobuf field
Symptom: Tez wants to add a per-vertex "originating-user-class" string to the DAGPlan so the AM can attribute resource usage. The DAGPlan is wire-serialised to YARN's RM cache, so older AMs must continue to deserialise plans without the new field.
Step 1 — Locate the proto
cd ~/tez-src
find . -name "*.proto" | head
grep -n "message VertexPlan" $(find . -name "*.proto") | head
Read the existing VertexPlan message. Note the highest field number in use
(say, 12). The new field must use a new number, not a recycled one.
Step 2 — The diff
--- a/tez-api/src/main/proto/DAGProtos.proto
+++ b/tez-api/src/main/proto/DAGProtos.proto
@@
message VertexPlan {
optional string name = 1;
...
optional int32 task_resource_memory_mb = 12;
+ // @since 0.10.4 — optional; old AMs ignore unknown fields.
+ optional string originating_user_class = 13;
}
Three rules:
- The field is optional. Never required: a required field that an old writer omits, or that you later remove, breaks readers. Tez uses proto2, where every field carries an explicit label, and optional is the one to use for fields added later.
- The field number 13 has never been used before. Search the entire git history to confirm:
  git log -p -S "= 13" -- tez-api/src/main/proto/DAGProtos.proto
- The comment names the introduction release. Future contributors will use it to decide whether the field is safe to assume in their code path.
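The rules above can be turned into a mechanical check. The following is a minimal sketch, not a Tez or protobuf tool: it assumes you have already extracted each version of the message into a map from field number to declaration text, plus a set of numbers that were used in the past and later removed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ProtoEvolutionCheck {
  /**
   * Compares two versions of one message.
   * before / after: field number -> declaration, e.g. "optional string name".
   * retired: numbers that existed in some earlier revision and were removed.
   */
  static List<String> violations(Map<Integer, String> before,
                                 Map<Integer, String> after,
                                 Set<Integer> retired) {
    List<String> problems = new ArrayList<>();
    // TreeMap gives a stable, number-ordered report.
    for (Map.Entry<Integer, String> e : new TreeMap<>(after).entrySet()) {
      int number = e.getKey();
      String decl = e.getValue();
      String old = before.get(number);
      if (old != null) {
        // Existing number: label, type and name must be untouched.
        if (!old.equals(decl)) {
          problems.add("field " + number + " changed: " + old + " -> " + decl);
        }
      } else {
        // New number: must be fresh and must not be required.
        if (retired.contains(number)) {
          problems.add("field " + number + " reuses a retired number");
        }
        if (decl.startsWith("required ")) {
          problems.add("field " + number + " is new but required");
        }
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    Map<Integer, String> before = new HashMap<>();
    before.put(1, "optional string name");
    before.put(12, "optional int32 task_resource_memory_mb");
    Map<Integer, String> after = new HashMap<>(before);
    after.put(13, "optional string originating_user_class");
    System.out.println(violations(before, after, new HashSet<Integer>()));
  }
}
```

The check deliberately compares declarations as plain strings, so even a rename is reported. That is stricter than the wire format requires, but it forces the change to be discussed rather than slipped in.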
Step 3 — Producer and consumer sides
The producer in tez-api/src/main/java/org/apache/tez/dag/api/DAG.java
sets the field when known and leaves it unset when not. The consumer in
tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java
must tolerate the unset case:
+    if (vertexPlan.hasOriginatingUserClass()) {
+      this.originatingUserClass = vertexPlan.getOriginatingUserClass();
+    } else {
+      this.originatingUserClass = null;
+    }
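To see why the has-check in that fragment matters, here is a minimal sketch with a hand-written stand-in for the generated accessors. PlanStub is hypothetical and only imitates the proto2 behaviour of returning an empty string for an unset optional string field; it is not the generated VertexPlan class.

```java
public class HasBeforeGet {
  // Hypothetical stand-in for a generated proto2 message with one
  // optional string field. A null backing value means "never set".
  static final class PlanStub {
    private final String value;

    PlanStub(String value) {
      this.value = value;
    }

    boolean hasOriginatingUserClass() {
      return value != null;
    }

    // Like a proto2 getter, this never returns null: unset reads as "".
    String getOriginatingUserClass() {
      return value == null ? "" : value;
    }
  }

  // Consumer-side helper: maps "unset" to null, keeps "" as a real value.
  static String originatingUserClassOrNull(PlanStub plan) {
    return plan.hasOriginatingUserClass() ? plan.getOriginatingUserClass() : null;
  }

  public static void main(String[] args) {
    PlanStub fromOldClient = new PlanStub(null); // field absent on the wire
    PlanStub explicitEmpty = new PlanStub("");   // field present, empty
    // The getter alone cannot tell these two apart.
    System.out.println(fromOldClient.getOriginatingUserClass()
        .equals(explicitEmpty.getOriginatingUserClass()));
    // The has-check can.
    System.out.println(originatingUserClassOrNull(fromOldClient) == null);
    System.out.println(originatingUserClassOrNull(explicitEmpty) == null);
  }
}
```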
The reviewer will reject any consumer that calls getOriginatingUserClass()
without first calling hasOriginatingUserClass(). Proto2 optional fields
return a default ("" for strings) when unset, which is not the same as
"absent".
Step 4 — Test the back-compat
The test is a serialisation round-trip in which bytes written with the new field are parsed by a reader that predates it:
@Test
public void testOldAMCanDeserialiseNewPlan() throws Exception {
  VertexPlan newPlan = VertexPlan.newBuilder()
      .setName("v1")
      .setOriginatingUserClass("com.example.Job")
      .build();
  byte[] wire = newPlan.toByteArray();
  // To act as an older AM, parse with a class generated from the
  // pre-change .proto, or with a DynamicMessage built from that older
  // descriptor. Parsing with the current VertexPlan, as below, checks
  // only the round trip, because this class already knows field 13.
  VertexPlan parsed = VertexPlan.parseFrom(wire);
  assertEquals("v1", parsed.getName());
  // With the older descriptor, field 13 lands in getUnknownFields()
  // and is ignored by the AM's logic. That is the contract.
}
A real test against an older Tez jar is also valuable; check it in as a resource.
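The reason an old reader tolerates the new field is visible at the wire level: each field is prefixed by a tag carrying its number and wire type, so a reader can skip numbers it does not recognise. The sketch below hand-encodes two string fields and decodes them with a reader that knows only field 1. It uses no protobuf library, handles only length-delimited fields, and is an illustration of the encoding rather than a substitute for the real parser.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class OldReaderSketch {
  // Writes a base-128 varint (non-negative values only in this sketch).
  static void writeVarint(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  // Writes one string field: tag = (number << 3) | 2, then length, then bytes.
  static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    writeVarint(out, (fieldNumber << 3) | 2);
    writeVarint(out, bytes.length);
    out.write(bytes, 0, bytes.length);
  }

  static int readVarint(byte[] buf, int[] pos) {
    int result = 0;
    int shift = 0;
    while (true) {
      int b = buf[pos[0]++] & 0xFF;
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }

  // "Old" reader: understands field 1 (name) and skips every other field.
  static String readNameOnly(byte[] wire) {
    int[] pos = {0};
    String name = null;
    while (pos[0] < wire.length) {
      int tag = readVarint(wire, pos);
      int fieldNumber = tag >>> 3;
      int wireType = tag & 7;
      if (wireType != 2) {
        throw new IllegalStateException("sketch handles only wire type 2");
      }
      int length = readVarint(wire, pos);
      if (fieldNumber == 1) {
        name = new String(wire, pos[0], length, StandardCharsets.UTF_8);
      }
      pos[0] += length; // known or unknown, step over the payload
    }
    return name;
  }

  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeString(out, 1, "v1");
    writeString(out, 13, "com.example.Job"); // the field the old reader lacks
    System.out.println(readNameOnly(out.toByteArray()));
  }
}
```

This also shows why recycling a field number is dangerous: the reader has nothing but the number to decide how to interpret the payload.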
Walked example B — deprecating a public method
Symptom: TezClient.submitDAG(DAG) returns a DAGClient whose getDAGStatus
contract is unclear. A new method submitDAGWithStatus(DAG) returns a
typed future. The old method should be deprecated.
The diff
--- a/tez-api/src/main/java/org/apache/tez/client/TezClient.java
+++ b/tez-api/src/main/java/org/apache/tez/client/TezClient.java
@@
+ /**
+ * @deprecated as of 0.10.4. Use {@link #submitDAGWithStatus(DAG)} which
+ * returns a typed future. This method will be removed in 0.11.0.
+ * See <a href="https://issues.apache.org/jira/browse/TEZ-XXXX">TEZ-XXXX</a>.
+ */
+ @Deprecated
public DAGClient submitDAG(DAG dag) throws ... { ... }
Rules for deprecation:
- The Javadoc names the replacement, the removal version, and the JIRA with the rationale.
- The @Deprecated annotation is on the method, not the class.
- The implementation is unchanged. Deprecation is a docs-and-annotation change; behaviour stays the same so existing callers continue to work.
- Never delete a deprecated method in the same patch. Deprecation and removal are separate releases. The minimum cycle in Tez is one minor release as deprecated, then removal in the next major.
The removal patch goes in only when:
- The deprecation has been in a released version for at least one minor cycle.
- Search of downstream code (Hive, Pig, the Tez examples) confirms no remaining callers.
- A dev@ thread has confirmed removal is acceptable.
Walked example C — changing a LimitedPrivate("Hive") API
Symptom: a LimitedPrivate("Hive") helper in tez-api is mis-named. You
want to rename it.
This is not a free change, despite LimitedPrivate. You must coordinate with the named audience ("Hive") before renaming. The workflow:
- File the TEZ ticket with the rename proposal.
- Search the Hive source for the existing name; if any caller uses it, write the HIVE-side patch first (deprecation-import shim).
- Add the new name in Tez. Keep the old name as a @Deprecated wrapper for one release.
- Remove the old name in Tez only after Hive has shipped a release that uses the new name.
The contribution often spans two Tez releases and two Hive releases. That is
the cost of LimitedPrivate.
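The wrapper step of that workflow might look like the sketch below. The helper class and both method names are hypothetical, chosen only to show the shape: the new name holds the implementation and the old name delegates to it, so callers compiled against either name see identical behaviour during the overlap.

```java
public class RenameShim {
  // Hypothetical helper; neither method name is real Tez API.
  static final class NamingHelper {
    /** New, correctly named entry point. Holds the implementation. */
    static String qualifiedDagName(String queue, String dagName) {
      return queue + "/" + dagName;
    }

    /**
     * @deprecated Use {@link #qualifiedDagName(String, String)}. Kept as a
     * thin wrapper until the downstream project has released against the
     * new name.
     */
    @Deprecated
    static String getQualName(String queue, String dagName) {
      // Delegate instead of duplicating logic, so the two cannot drift.
      return qualifiedDagName(queue, dagName);
    }
  }

  public static void main(String[] args) {
    String viaNew = NamingHelper.qualifiedDagName("default", "etl");
    String viaOld = NamingHelper.getQualName("default", "etl");
    System.out.println(viaNew.equals(viaOld));
  }
}
```

Because the old method only delegates, the eventual removal patch is a pure deletion, which keeps it easy to review.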
Pitfalls
- Don't reuse a protobuf field number after removing a field. Reserve it with reserved 7; in the proto file. Recycling a number breaks cross-version readers in undetectable ways.
- Don't change the type of a protobuf field. string→bytes shares a wire type, but the generated accessors change (String versus ByteString), and bytes that are not valid UTF-8 no longer read back as a string. Add a new field with a new number; deprecate the old.
- Don't widen a Private API to Public without a dev@ thread. Once public, you cannot retract.
- Don't remove a @Deprecated method in the same release that introduces the deprecation. That defeats the purpose of deprecation.
- Don't change the default value of a configuration key without a dev@ thread. Default changes are invisible to compile-time checks but can be catastrophic in production. They are a Stage 12-adjacent change.
- Don't introduce a new Stable annotation lightly. Once Stable, the method is locked for a major-version cycle.
- Don't assume Hadoop's compatibility annotations are identical in meaning. They are similar but have project-specific nuance; read the Tez project's BUILDING.txt and the dev@ archive before relying on them.
Exit criteria — when you're ready for the next stage
Move to Stage 12 when:
- You have shipped one compatibility-sensitive change (a protobuf evolution, a deprecation, or an API rename) with explicit annotations and dev@ sign-off.
- You can recite the audience × stability matrix and pick the correct cell for an arbitrary tez-api class.
- You have written a deprecation Javadoc that named the replacement, the removal version, and the JIRA without being prompted.
- You have read the BUILDING.txt and dev@-archived compatibility guidance for Tez and Hadoop.
Stage 12 is the final stage: release-blocking issues and PMC-level work.