Licensing
Apache licensing is precise. The rules are not "be reasonable about open source"; they are a specific framework administered by Apache Legal. Getting them wrong blocks a release. This chapter is the working knowledge needed by committers and PMC, plus the bits every contributor should know before adding a dependency.
The Apache License 2.0
Apache Tez is licensed under the Apache License, Version 2.0 ("ALv2"). This is a permissive license that allows:
- Use, reproduction, modification, distribution
- Commercial use
- Patent grant (explicitly, unlike MIT/BSD)
- Sublicensing under different terms (with attribution)
In exchange:
- You include the LICENSE and NOTICE in distributions
- You note significant modifications
- You preserve attribution and patent grants
Practically, ALv2 is a permissive, non-copyleft license. It is compatible with almost everything except GPL 2.0, and is only one-way compatible with GPL 3.0: ALv2 code can be included in a GPL 3.0 work, but GPL 3.0 code cannot be included in an ALv2 work.
The Three Files in the Tez Repo Root
| File | Purpose |
|---|---|
| LICENSE | The Apache License 2.0 text, plus appendices for any bundled third-party code under different licenses |
| NOTICE | Required attributions for bundled code (Apache + any NOTICE-bearing deps) |
| KEYS (in dist, not repo) | PGP keys used to sign releases |
ls ~/tez-src/LICENSE ~/tez-src/NOTICE
cat ~/tez-src/NOTICE
For Tez source releases, LICENSE and NOTICE are typically short — the source tarball
bundles no third-party code. For convenience binary releases, both grow with the bundled
jars.
Category A / B / X — The Dependency Classes
Apache Legal classifies third-party licenses into categories. The full list is at Apache Legal Resolved. Summary:
| Category | Meaning | Examples | Can it be a Tez dependency? |
|---|---|---|---|
| A | Compatible with ALv2 | ALv2, MIT, BSD 2/3-clause, ISC | Yes; document in LICENSE/NOTICE if bundled |
| B | Compatible with conditions | EPL 1.0/2.0, CDDL 1.0/1.1, MPL 1.1/2.0, IBM Public License 1.0 | Yes, but only as bundled binary, not source. Add LICENSE/NOTICE entry. |
| X | Incompatible | GPL (any version), AGPL, LGPL (see the hard cases below), SSPL, BUSL, CC-BY-NC | No. May not be bundled in any release. Runtime optional dep only, with care. |
The hard cases:
- LGPL is category X for binary distribution but acceptable as an optional runtime dependency. Be careful; this is one of the most-asked questions on legal-discuss@apache.org.
- CC-BY-SA and other ShareAlike licenses depend on the work: data and documentation are sometimes B, sometimes X.
- Bespoke licenses (custom permissive licenses) must be reviewed before use.
If you are uncertain, post on legal-discuss@apache.org with a link to the license text.
Don't guess.
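The category table can be kept as a small lookup for personal notes. The sketch below is illustrative only: the SPDX-style identifiers used as keys and the function names classify and may_bundle are assumptions of this example, and the authoritative list remains Apache Legal's. Anything not in the table comes back as REVIEW, matching the "don't guess" rule.

```python
# Illustrative subset only; Apache Legal's resolved list is authoritative.
# Keys are SPDX-style identifiers, chosen here for convenience (an assumption).
CATEGORY_BY_LICENSE = {
    "Apache-2.0": "A", "MIT": "A", "BSD-2-Clause": "A",
    "BSD-3-Clause": "A", "ISC": "A",
    "EPL-1.0": "B", "EPL-2.0": "B", "CDDL-1.0": "B",
    "CDDL-1.1": "B", "MPL-1.1": "B", "IPL-1.0": "B",
    "GPL-2.0-only": "X", "GPL-3.0-only": "X", "AGPL-3.0-only": "X",
    "LGPL-2.1-only": "X", "SSPL-1.0": "X", "BUSL-1.1": "X",
    "CC-BY-NC-4.0": "X",
}


def classify(license_id):
    """Return 'A', 'B', 'X', or 'REVIEW' for licenses not in the table."""
    return CATEGORY_BY_LICENSE.get(license_id, "REVIEW")


def may_bundle(license_id, form):
    """Whether a dep under this license may ship in a 'source' or 'binary' release."""
    if form not in ("source", "binary"):
        raise ValueError("form must be 'source' or 'binary'")
    category = classify(license_id)
    if category == "A":
        return True
    if category == "B":
        return form == "binary"  # category B: binary form only
    return False  # X, or unknown and awaiting review
```

A False from may_bundle for an unknown license does not mean the license is forbidden; it means the question still has to go to legal-discuss@apache.org.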
"GPL Contamination"
Apache projects cannot ship GPL code. The rule has corollaries that catch people:
| Action | OK? |
|---|---|
| Tez code calls a GPL library via reflection at runtime | No — if the library must be present, it's a dep |
| Tez code can optionally integrate with a GPL tool the user installs themselves | Yes — runtime-optional, user-supplied |
| Tez ships a GPL jar in the binary tarball | No |
| Tez build script downloads a GPL jar during build | No (this is contamination) |
| Tez source contains a comment "see SOME GPL CODE for reference" | Risky — get review |
| Tez source copies a snippet from GPL code | No — pollutes the codebase |
The conservative rule: GPL code may exist near Tez (a user's runtime environment) but not in Tez (source or binary distribution).
Adding a New Dependency — Procedure
When a patch proposes a new third-party dependency:
- Identify the license. Open the project's LICENSE file. Don't read the GitHub "License" sidebar; it can be wrong.
- Classify. Category A, B, or X (above). If A, proceed. If B, plan for LICENSE / NOTICE updates and PMC discussion. If X, stop.
- Check transitive deps. A category-A library may pull in a category-X transitive. Use mvn dependency:tree and verify every transitive's license.
- Justify. On the JIRA, explain why this dep is needed and why no in-tree alternative suffices.
- Update LICENSE. If the dep is bundled in the binary release (it usually is), add an appendix entry naming the dep, its license, and where to find the full license text.
- Update NOTICE. If the dep ships a NOTICE file, copy the required text into Tez's NOTICE. Read the dep's NOTICE; not all of it is required.
- Test the build. Run mvn apache-rat:check and a full build. The dep should not produce RAT-flagged files (most don't).
The PMC reviews the dependency before commit. If you are on the PMC, ask:
- Is the license correctly classified?
- Is the dep maintained?
- What is the size cost (Tez binary tarball grows by N MB)?
- Are there security advisories against the version proposed?
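The transitive-dependency step is easy to skip when the tree is long. As a rough aid, the sketch below pulls coordinates out of saved mvn dependency:tree text output and lists those not yet in a hand-maintained review record. It assumes the default text format, where each coordinate reads group:artifact:type[:classifier]:version[:scope]; the function names and the shape of the review record are hypothetical.

```python
import re

# "[INFO] " followed by the tree-drawing characters Maven prints before a coordinate.
_PREFIX = re.compile(r"^\[INFO\]\s?[|+\\\-\s]*")
# group:artifact:type[:classifier]:version[:scope], i.e. 4 to 6 colon-separated parts.
_COORD = re.compile(r"^[\w.\-]+(?::[\w.\-]+){3,5}$")


def parse_dependency_tree(text):
    """Return (group, artifact, version) tuples found in dependency:tree output."""
    coords = []
    for line in text.splitlines():
        if not line.startswith("[INFO]"):
            continue
        rest = _PREFIX.sub("", line).split()
        if not rest or not _COORD.match(rest[0]):
            continue  # banner, timing, or other non-coordinate line
        parts = rest[0].split(":")
        if len(parts) == 4:
            version = parts[3]   # root module: no scope suffix
        else:
            version = parts[-2]  # assumed: version sits just before the scope
        coords.append((parts[0], parts[1], version))
    return coords


def unreviewed(coords, reviewed):
    """Coordinates whose (group, artifact) has no entry in the review record."""
    return [c for c in coords if (c[0], c[1]) not in reviewed]
```

The review record is just a dict keyed by (group, artifact); what goes into it still comes from reading each dependency's LICENSE file, not from this script.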
Apache RAT in Tez Pre-commit
Apache RAT (Release Audit Tool) checks that every source file has an Apache license header. It is part of every Tez release vote and should be part of every contributor's pre-submit.
Run:
cd ~/tez-src
mvn apache-rat:check
Output on success:
[INFO] BUILD SUCCESS
Output on failure:
[ERROR] Files with unapproved licenses:
tez-dag/src/main/java/.../NewClass.java
The fix is to add the license header. The standard Java header is at the top of any existing Tez Java file; copy it.
RAT can be configured to allow certain files to be exempt (e.g. generated .proto-derived
files, META-INF/). The exemption config lives in the parent pom.xml:
grep -A20 "apache-rat-plugin" ~/tez-src/pom.xml
Adding a new file type that legitimately can't carry a header (e.g. a JSON test fixture) requires updating the exemption list and noting it in the JIRA.
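For a quick look before the Maven run, a few lines of Python can approximate the header check. This is not RAT and knows nothing about the exemption list in pom.xml; it only looks for the first line of the ASF header near the top of each file. The function names, the 30-line window, and the suffix list are assumptions of this sketch.

```python
from pathlib import Path

# First line of the standard ASF header; present in every comment style.
HEADER_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(text, max_lines=30):
    """True if the marker appears within the first max_lines lines of text."""
    head = "\n".join(text.splitlines()[:max_lines])
    return HEADER_MARKER in head


def files_missing_header(root, suffixes=(".java", ".proto", ".xml", ".sh", ".py")):
    """Paths under root with a matching suffix and no header marker near the top."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_asf_header(text):
                missing.append(str(path))
    return missing
```

Because it walks everything under root, it will also report generated files under target/ directories; mvn apache-rat:check remains the check that counts.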
License Header Template
For .java:
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
For .proto:
//
// Licensed to the Apache Software Foundation (ASF) ... (same content with // comments)
//
For .xml:
<!--
Licensed to the Apache Software Foundation (ASF) ... (same content)
-->
For .sh / .py:
#
# Licensed to the Apache Software Foundation (ASF) ... (same content with # comments)
#
For .md: by convention, no header is needed for markdown docs in the source tree, but
project policy may require one. Check mvn apache-rat:check output.
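Since the four templates differ only in comment syntax, they can be generated from one copy of the text. The sketch below is one way to do that; render_header and the suffix-to-style mapping are hypothetical names, and the wording is the header shown above for .java.

```python
HEADER_LINES = [
    "Licensed to the Apache Software Foundation (ASF) under one",
    "or more contributor license agreements. See the NOTICE file",
    "distributed with this work for additional information",
    "regarding copyright ownership. The ASF licenses this file",
    "to you under the Apache License, Version 2.0 (the",
    '"License"); you may not use this file except in compliance',
    "with the License. You may obtain a copy of the License at",
    "",
    "    http://www.apache.org/licenses/LICENSE-2.0",
    "",
    "Unless required by applicable law or agreed to in writing, software",
    'distributed under the License is distributed on an "AS IS" BASIS,',
    "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
    "See the License for the specific language governing permissions and",
    "limitations under the License.",
]

# Line-comment prefix per file suffix (assumed mapping for this sketch).
LINE_COMMENT = {".proto": "//", ".sh": "#", ".py": "#"}


def render_header(suffix):
    """Return the license header text in the comment style for a file suffix."""
    if suffix == ".java":
        body = [(" * " + line).rstrip() for line in HEADER_LINES]
        return "\n".join(["/**", *body, " */"])
    if suffix == ".xml":
        body = [("  " + line).rstrip() for line in HEADER_LINES]
        return "\n".join(["<!--", *body, "-->"])
    if suffix in LINE_COMMENT:
        prefix = LINE_COMMENT[suffix]
        body = [(prefix + " " + line).rstrip() for line in HEADER_LINES]
        return "\n".join([prefix, *body, prefix])
    raise ValueError("no header style known for " + suffix)
```

Placement still needs a human eye: in a shell script the header goes after the shebang line, and in XML it goes after the XML declaration if there is one.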
The Tez NOTICE File
A typical Tez NOTICE:
Apache Tez
Copyright 2014-YYYY The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).
Plus, if bundled deps require:
This product bundles SomeLibrary, which is available under
the Foo Bar License. See <path or URL>.
NOTICE is not:
- A list of contributors (that's CHANGES.txt and git).
- A thank-you list.
- A list of services or users.
Keep it minimal and legally precise.
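A NOTICE that has been accidentally cleared or truncated is easy to miss in review. The sketch below checks only for the three ASF lines of the typical NOTICE shown above; the patterns are assumptions drawn from that example rather than a legal definition of a valid NOTICE, and notice_problems is a hypothetical name.

```python
import re

# (label, pattern) pairs for the lines a typical Tez NOTICE is expected to carry.
REQUIRED = [
    ("product name line", re.compile(r"^Apache Tez$", re.MULTILINE)),
    ("ASF copyright line",
     re.compile(r"^Copyright .*The Apache Software Foundation$", re.MULTILINE)),
    ("ASF attribution",
     re.compile(r"The Apache Software Foundation \(https?://www\.apache\.org/\)\.")),
]


def notice_problems(text):
    """Return the labels of expected NOTICE lines that are missing from text."""
    return [label for label, pattern in REQUIRED if not pattern.search(text)]
```

An empty result says the ASF boilerplate is intact. It says nothing about whether the attributions for bundled deps are complete; that still means reading each dep's own NOTICE.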
Source vs Binary Release — Different Rules
Apache makes a sharp distinction:
| Aspect | Source release | Binary release |
|---|---|---|
| Status | Official Apache release | Convenience artifact |
| What's bundled | Source code only | Compiled jars, possibly third-party jars |
| Must have ALv2 LICENSE | Yes | Yes |
| Must have NOTICE | Yes | Yes; longer than source NOTICE |
| Must pass RAT | Yes | RAT runs against the source the binary is built from; bundled jars are exempt |
| Category B bundling | Not in source form | Allowed in binary form with LICENSE/NOTICE entry |
| Category X bundling | Never | Never |
Practical implication: a source release rarely bundles anything except Tez's own source.
A binary release is the tez-dist/target/apache-tez-X.Y.Z-bin.tar.gz tarball, which contains
all the runtime jars Tez depends on (Hadoop, Jackson, etc.).
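To see what a binary tarball actually bundles, and therefore what the binary LICENSE and NOTICE have to account for, the jar names can be listed straight from the archive. A minimal sketch using Python's standard tarfile module; the function names are hypothetical and the tarball path is whatever your build produced.

```python
import tarfile
from pathlib import PurePosixPath


def jar_names(member_names):
    """Sorted, de-duplicated jar file names from a list of archive member paths."""
    return sorted({PurePosixPath(name).name
                   for name in member_names if name.endswith(".jar")})


def bundled_jars(tarball_path):
    """Jar file names inside a gzip-compressed binary release tarball."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        return jar_names(tar.getnames())
```

Each name in the result should trace to either Tez itself or a dependency whose license has been classified and, where required, recorded in LICENSE and NOTICE.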
Common Licensing Mistakes
| Mistake | Caught by | Fix |
|---|---|---|
| New file without Apache header | mvn apache-rat:check | Add header |
| Random third-party snippet pasted into Tez | Code review | Replace with original code or pull in via dep |
| New category-B dep with no LICENSE update | PMC at release vote | Update LICENSE |
| New category-X dep | PMC at release vote | Remove dep |
| NOTICE accidentally cleared | Code review | Restore from prior release |
| Copyright (c) Company Name in a file | Code review | Replace with Apache header; Company-owned code requires CLA review |
What ICLAs and CCLAs Cover
Two contributor license agreements:
| CLA | Who signs | What it covers |
|---|---|---|
| ICLA (Individual) | An individual contributor | Their personal contributions |
| CCLA (Corporate) | A company's authorised signatory | Contributions by listed employees |
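An ICLA is required of every committer and expected for any non-trivial contribution. A CCLA is appropriate in addition when the contributor's employer holds rights to the work, which is commonly the case for contributions made in the contributor's capacity as a company employee.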
PMC members can verify ICLA status via secretary@apache.org. For a casual single-patch
contributor, the trivial-patch exception often applies and no ICLA is needed; for a
contributor on path to committer, the ICLA needs to be on file by the second or third
patch.
Validation Artifacts
After this chapter:
- A ~/tez-notes/license-categories.md cheatsheet of A/B/X with examples.
- The reflex to run mvn apache-rat:check in your pre-submit script.
- The discipline to check a new dep's category before opening a JIRA proposing it.
- The ability to read Tez's NOTICE file and confirm what each line is there for.
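The pre-submit reflex can be as small as a wrapper around the Maven command. A minimal sketch, assuming mvn is on the PATH; run_rat is a hypothetical name, and the runner parameter exists only so the function can be exercised without invoking Maven.

```python
import subprocess


def run_rat(src_dir, runner=subprocess.run):
    """Run `mvn apache-rat:check` in src_dir; True when Maven exits with status 0."""
    result = runner(["mvn", "apache-rat:check"], cwd=src_dir)
    return result.returncode == 0
```

A pre-submit script would call run_rat on the checkout directory and stop with a non-zero exit status when it returns False.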
The next chapter — Code Style & Trust — closes the section with the operational mechanics of style enforcement and the trust ladder a contributor climbs.