Licensing

Apache licensing is precise. The rules are not "be reasonable about open source"; they are a specific framework administered by Apache Legal. Getting them wrong blocks a release. This chapter covers the working knowledge committers and PMC members need, plus what every contributor should know before adding a dependency.

The Apache License 2.0

Apache Tez is licensed under the Apache License, Version 2.0 ("ALv2"). This is a permissive license that allows:

  • Use, reproduction, modification, distribution
  • Commercial use
  • Patent grant (explicitly, unlike MIT/BSD)
  • Sublicensing under different terms (with attribution)

In exchange:

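  • You include a copy of the LICENSE (and the NOTICE, if there is one) in distributions
  • You mark modified files with a prominent notice stating that you changed them
  • You retain existing copyright, patent, trademark, and attribution notices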

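In practice, ALv2 is a permissive license with no copyleft. GPL compatibility is the main caveat: ALv2 is incompatible with GPL 2.0, and only one-way compatible with GPL 3.0 (ALv2 code may be included in a GPLv3 project, but GPLv3 code may not be included in an ALv2 project).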

The Three Licensing Files

| File | Purpose |
|---|---|
| LICENSE | The Apache License 2.0 text, plus appendices for any bundled third-party code under different licenses |
| NOTICE | Required attributions for bundled code (Apache + any NOTICE-bearing deps) |
| KEYS (in dist, not repo) | PGP keys used to sign releases |

ls ~/tez-src/LICENSE ~/tez-src/NOTICE
cat ~/tez-src/NOTICE

For Tez source releases, LICENSE and NOTICE are typically short, because the source tarball normally bundles no third-party code. For convenience binary releases, both files grow with each bundled jar.

Category A / B / X — The Dependency Classes

Apache Legal classifies third-party licenses into categories. The full list is in the ASF 3rd Party License Policy (https://www.apache.org/legal/resolved.html). Summary:

| Category | Meaning | Examples | Can it be a Tez dependency? |
|---|---|---|---|
| A | Compatible with ALv2 | ALv2, MIT, BSD 2/3-clause, ISC | Yes; document in LICENSE/NOTICE if bundled |
| B | Compatible with conditions | EPL 1.0/2.0, CDDL 1.0/1.1, MPL 1.1/2.0, IBM Public License 1.0 | Yes, but only as bundled binary, not source. Add LICENSE/NOTICE entry. |
| X | Incompatible | GPL (any version), AGPL, LGPL (any version), SSPL, BUSL, CC-BY-NC | No. May not be bundled in any release. Runtime optional dep only, with care. |

The hard cases:

  • LGPL is category X for binary distribution but acceptable as an optional runtime dependency. Be careful; this is one of the most-asked questions on legal-discuss@apache.org.
  • CC-BY-SA and other ShareAlike licenses depend on the work: data and documentation are sometimes B, sometimes X.
  • Bespoke licenses (custom permissive licenses) must be reviewed before use.

If you are uncertain, post on legal-discuss@apache.org with a link to the license text. Don't guess.
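To make the table easier to apply, here is a minimal Java sketch of a lookup from license identifiers to categories. The class name LicenseCategories, its methods, and the use of SPDX identifiers as keys are assumptions for illustration. The map holds only the examples from the table, and anything it does not recognise comes back UNKNOWN, which should send you to legal-discuss@ rather than to a guess.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not an ASF tool: maps a few SPDX identifiers to the
 * ASF third-party license categories. Deliberately incomplete; anything
 * not listed is UNKNOWN.
 */
final class LicenseCategories {

    enum Category { A, B, X, UNKNOWN }

    private static final Map<String, Category> KNOWN = new HashMap<>();

    static {
        // Only the examples from the category table; the ASF policy page is authoritative.
        for (String id : new String[] {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC"}) {
            KNOWN.put(id, Category.A);
        }
        for (String id : new String[] {"EPL-1.0", "EPL-2.0", "CDDL-1.0", "CDDL-1.1",
                                       "MPL-1.1", "MPL-2.0", "IPL-1.0"}) {
            KNOWN.put(id, Category.B);
        }
        for (String id : new String[] {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only",
                                       "LGPL-2.1-only", "SSPL-1.0", "BUSL-1.1", "CC-BY-NC-4.0"}) {
            KNOWN.put(id, Category.X);
        }
    }

    /** Exact-match lookup; unrecognised or missing identifiers are UNKNOWN. */
    static Category classify(String spdxId) {
        if (spdxId == null) {
            return Category.UNKNOWN;
        }
        return KNOWN.getOrDefault(spdxId.trim(), Category.UNKNOWN);
    }

    /** Mirrors the "Classify" step of the dependency procedure. */
    static String nextStep(String spdxId) {
        switch (classify(spdxId)) {
            case A: return "proceed";
            case B: return "plan LICENSE/NOTICE updates and PMC discussion";
            case X: return "stop";
            default: return "ask legal-discuss@apache.org";
        }
    }
}
```

The design choice worth copying is the explicit UNKNOWN: a lookup that silently defaults to A would encode exactly the guessing the policy forbids.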

"GPL Contamination"

Apache projects cannot ship GPL code. The rule has corollaries that catch people:

| Action | OK? |
|---|---|
| Tez code calls a GPL library via reflection at runtime | No: if the library must be present, it's a dep |
| Tez code can optionally integrate with a GPL tool the user installs themselves | Yes: runtime-optional, user-supplied |
| Tez ships a GPL jar in the binary tarball | No |
| Tez build script downloads a GPL jar during build | No (this is contamination) |
| Tez source contains a comment "see SOME GPL CODE for reference" | Risky: get review |
| Tez source copies a snippet from GPL code | No: it pollutes the codebase |

The conservative rule: GPL code may exist near Tez (a user's runtime environment) but not in Tez (source or binary distribution).
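The "runtime-optional, user-supplied" row has a recognisable code shape. The sketch below is hypothetical (the class name, method names, and the class name com.example.hypothetical.GplCodec used in the usage note are illustrative): the code compiles and runs without the external library and only probes for it on the classpath the user assembled. Nothing is bundled or downloaded. It says nothing about whether a particular integration is acceptable; that is a question for the PMC and legal-discuss@.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a runtime-optional integration. Core behaviour
 * must not depend on the probed class being present.
 */
final class OptionalIntegration {

    /** Returns the class if the user put it on the classpath, else empty. */
    static Optional<Class<?>> findUserSupplied(String className) {
        try {
            // initialize=false: check presence without running static initializers.
            return Optional.of(Class.forName(className, false,
                    OptionalIntegration.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            // Absent or unloadable: fall back to the built-in behaviour.
            return Optional.empty();
        }
    }

    static String describe(String className) {
        return findUserSupplied(className).isPresent()
                ? "optional integration enabled"
                : "optional integration disabled; core features unaffected";
    }
}
```

For example, OptionalIntegration.describe("com.example.hypothetical.GplCodec") reports the integration as disabled on a classpath that does not contain that class. If the feature cannot work at all without the library, the reflection is cosmetic and the library is a dependency, as the first table row says.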

Adding a New Dependency — Procedure

When a patch proposes a new third-party dependency:

  1. Identify the license. Open the project's LICENSE file. Don't rely on the GitHub "License" sidebar; it can be wrong.
  2. Classify. Category A, B, or X (above). If A, proceed. If B, plan for LICENSE / NOTICE updates and PMC discussion. If X, stop.
  3. Check transitive deps. A category-A library may pull in a category-X transitive dependency. Run mvn dependency:tree and verify the license of every transitive dependency.
  4. Justify. On the JIRA, explain why this dep is needed and why no in-tree alternative suffices.
  5. Update LICENSE. If the dep is bundled in the binary release (it usually is), add an appendix entry naming the dep, its license, and where to find the full license text.
  6. Update NOTICE. If the dep ships a NOTICE file, copy the required text into Tez's NOTICE. Read the dep's NOTICE; not all of it is required.
  7. Test the build. Run mvn apache-rat:check and a full build. The dep should not produce RAT-flagged files (most don't).
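Step 3 is tedious by hand on a large tree. A minimal sketch, assuming the default (non-verbose) text format of mvn dependency:tree, either as console output or as a file written with -DoutputFile, that turns the tree into coordinates with their depth so each transitive entry can be checked against its LICENSE file. The class and method names are hypothetical, and the parser does not look up licenses itself; that part stays manual.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical text-parsing sketch, not a Maven API: extracts coordinates
 * from dependency:tree output.
 */
final class DependencyTreeParser {

    static final class Dep {
        final String groupId;
        final String artifactId;
        final String version;
        final String scope; // empty for the module itself
        final int depth;    // 0 = the module, 1 = direct dep, 2+ = transitive

        Dep(String groupId, String artifactId, String version, String scope, int depth) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.scope = scope;
            this.depth = depth;
        }

        boolean isTransitive() {
            return depth >= 2;
        }

        @Override
        public String toString() {
            return groupId + ":" + artifactId + ":" + version;
        }
    }

    // Optional "[INFO] " prefix, then tree-drawing characters (| + \ - and spaces),
    // then the first non-space token, which holds the coordinates.
    private static final Pattern LINE =
            Pattern.compile("^(?:\\[INFO\\] )?([|+\\\\\\- ]*)(\\S+)");

    static List<Dep> parse(List<String> lines) {
        List<Dep> deps = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue;
            }
            int depth = m.group(1).length() / 3; // each tree level is 3 characters wide
            String token = m.group(2);
            if (token.startsWith("(")) {
                continue; // "omitted" entries that appear with -Dverbose
            }
            String[] p = token.split(":");
            if (p.length == 4 && depth == 0) {   // module: group:artifact:packaging:version
                deps.add(new Dep(p[0], p[1], p[3], "", 0));
            } else if (p.length == 5) {          // group:artifact:type:version:scope
                deps.add(new Dep(p[0], p[1], p[3], p[4], depth));
            } else if (p.length == 6) {          // group:artifact:type:classifier:version:scope
                deps.add(new Dep(p[0], p[1], p[4], p[5], depth));
            }
            // Anything else (banners, BUILD SUCCESS, timings) is ignored.
        }
        return deps;
    }
}
```

Filtering the result with isTransitive() gives the list that step 3 asks you to verify; direct dependencies still need the same license check in step 1.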

The PMC reviews the dependency before commit. If you are on the PMC, ask:

  • Is the license correctly classified?
  • Is the dep maintained?
  • What is the size cost (Tez binary tarball grows by N MB)?
  • Are there security advisories against the version proposed?

Apache RAT in Tez Pre-commit

Apache RAT (Release Audit Tool) checks that every source file carries an approved license header; for Tez's own files, that is the Apache header. It is part of every Tez release vote and should be part of every contributor's pre-submit.

Run:

cd ~/tez-src
mvn apache-rat:check

Output on success:

[INFO] BUILD SUCCESS

Output on failure (abridged; the plugin writes the full report to target/rat.txt in the failing module):

[ERROR] Files with unapproved licenses:
  tez-dag/src/main/java/.../NewClass.java

The fix is to add the license header. The standard Java header is at the top of any existing Tez Java file; copy it.
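A quick local approximation of this check can be handy in an editor hook. The following is a hypothetical sketch, not Apache RAT: it only looks for the first sentence of the ASF header within the first 30 lines (an assumed limit), whereas RAT recognises several approved licenses and is what release votes rely on. Run mvn apache-rat:check before submitting regardless.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical pre-submit helper, NOT a replacement for RAT: checks that
 * the opening sentence of the ASF header appears near the top of a file.
 */
final class AsfHeaderCheck {

    // First line of the ASF header; identical in the .java, .proto, .xml and script variants.
    private static final String MARKER =
            "Licensed to the Apache Software Foundation (ASF) under one";

    private static final int MAX_LINES = 30; // assumption: the header sits near the top

    static boolean hasAsfHeader(List<String> lines) {
        int limit = Math.min(lines.size(), MAX_LINES);
        for (int i = 0; i < limit; i++) {
            if (lines.get(i).contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    static boolean hasAsfHeader(Path file) throws IOException {
        return hasAsfHeader(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Because the marker text is the same whatever the comment syntax, one substring check covers every template in the next section.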

RAT can be configured to exempt certain files from the check (e.g. generated .proto-derived files, META-INF/). The exemption config lives in the parent pom.xml:

grep -A20 "apache-rat-plugin" ~/tez-src/pom.xml

Adding a new file type that legitimately can't carry a header (e.g. a JSON test fixture) requires updating the exemption list and noting it in the JIRA.
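How an exemption list behaves can be sketched with the JDK's PathMatcher. The class name and the sample patterns are hypothetical, and the sketch uses java.nio glob syntax, which is close to, but not the same as, the Ant-style patterns Maven plugins typically use in their excludes: for example, the glob **/*.json does not match a fixture.json at the repository root, while an Ant pattern with the same text would.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of header exemptions: a repo-relative path is
 * exempt when it matches any exclusion glob.
 */
final class HeaderExemptions {

    private final List<PathMatcher> matchers = new ArrayList<>();

    HeaderExemptions(List<String> globs) {
        FileSystem fs = FileSystems.getDefault();
        for (String glob : globs) {
            matchers.add(fs.getPathMatcher("glob:" + glob));
        }
    }

    boolean isExempt(String relativePath) {
        Path path = Paths.get(relativePath);
        for (PathMatcher m : matchers) {
            if (m.matches(path)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the patterns narrow (a test-resources directory rather than every JSON file) makes it less likely that a source file which should carry a header slips through unnoticed.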

License Header Template

For .java:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

For .proto:

//
// Licensed to the Apache Software Foundation (ASF) ... (same content with // comments)
//

For .xml:

<!--
   Licensed to the Apache Software Foundation (ASF) ... (same content)
-->

For .sh / .py (after the #! line, if there is one, which must remain the first line):

#
# Licensed to the Apache Software Foundation (ASF) ... (same content with # comments)
#

For .md: many projects exempt markdown docs in the source tree from the header requirement, but whether Tez does depends on its RAT exclusion list. Check the mvn apache-rat:check output.
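Since only the comment syntax differs between these templates, the wrapping can be mechanised. A minimal sketch with hypothetical class and method names: render produces the header for a file name, and addHeader inserts it while keeping a #! line or an XML declaration first, since both must stay on the first line of the file. File types without a template here (such as .md) raise an error rather than guessing a style.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: wraps the ASF header in per-file-type comment syntax. */
final class HeaderRenderer {

    static final List<String> ASF_HEADER = Arrays.asList(
            "Licensed to the Apache Software Foundation (ASF) under one",
            "or more contributor license agreements.  See the NOTICE file",
            "distributed with this work for additional information",
            "regarding copyright ownership.  The ASF licenses this file",
            "to you under the Apache License, Version 2.0 (the",
            "\"License\"); you may not use this file except in compliance",
            "with the License.  You may obtain a copy of the License at",
            "",
            "    http://www.apache.org/licenses/LICENSE-2.0",
            "",
            "Unless required by applicable law or agreed to in writing, software",
            "distributed under the License is distributed on an \"AS IS\" BASIS,",
            "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
            "See the License for the specific language governing permissions and",
            "limitations under the License.");

    /** Header lines in the comment style of the templates above. */
    static List<String> render(String fileName) {
        List<String> out = new ArrayList<>();
        if (fileName.endsWith(".java")) {
            out.add("/**");
            for (String l : ASF_HEADER) out.add(l.isEmpty() ? " *" : " * " + l);
            out.add(" */");
        } else if (fileName.endsWith(".proto")) {
            out.add("//");
            for (String l : ASF_HEADER) out.add(l.isEmpty() ? "//" : "// " + l);
            out.add("//");
        } else if (fileName.endsWith(".xml")) {
            out.add("<!--");
            for (String l : ASF_HEADER) out.add(l.isEmpty() ? "" : "   " + l);
            out.add("-->");
        } else if (fileName.endsWith(".sh") || fileName.endsWith(".py")) {
            out.add("#");
            for (String l : ASF_HEADER) out.add(l.isEmpty() ? "#" : "# " + l);
            out.add("#");
        } else {
            throw new IllegalArgumentException("No header template for " + fileName);
        }
        return out;
    }

    /** Prepends the header, keeping a #! line or <?xml ...?> declaration first. */
    static List<String> addHeader(String fileName, List<String> fileLines) {
        List<String> out = new ArrayList<>();
        int start = 0;
        if (!fileLines.isEmpty()
                && (fileLines.get(0).startsWith("#!") || fileLines.get(0).startsWith("<?xml"))) {
            out.add(fileLines.get(0));
            start = 1;
        }
        out.addAll(render(fileName));
        out.addAll(fileLines.subList(start, fileLines.size()));
        return out;
    }
}
```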

The Tez NOTICE File

A typical Tez NOTICE:

Apache Tez
Copyright 2014-YYYY The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).

Plus, where bundled dependencies require it:

This product bundles SomeLibrary, which is available under
the Foo Bar License. See <path or URL>.

NOTICE is not:

  • A list of contributors (that's CHANGES.txt and git).
  • A thank-you list.
  • A list of services or users.

Keep it minimal and legally precise.
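The structure above can be assembled mechanically from the fixed preamble plus whatever bundled dependencies require. The sketch below is hypothetical (class and parameter names are illustrative). Deciding which upstream text is legally required remains a human judgement, so the sketch takes that text as input; the current year is passed in rather than read from the clock so the output is reproducible.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: builds a minimal NOTICE from the preamble plus required notices. */
final class NoticeBuilder {

    static String build(String product, int firstYear, int currentYear, List<String> requiredNotices) {
        StringBuilder sb = new StringBuilder();
        sb.append(product).append('\n');
        sb.append("Copyright ").append(firstYear);
        if (currentYear > firstYear) {
            sb.append('-').append(currentYear);
        }
        sb.append(" The Apache Software Foundation\n\n");
        sb.append("This product includes software developed at\n");
        sb.append("The Apache Software Foundation (https://www.apache.org/).\n");

        // Several jars from one upstream project often ship identical NOTICE text:
        // keep each required notice once, in first-seen order, and drop blanks.
        Set<String> unique = new LinkedHashSet<>();
        for (String notice : requiredNotices) {
            String trimmed = notice.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        for (String notice : unique) {
            sb.append('\n').append(notice).append('\n');
        }
        return sb.toString();
    }
}
```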

Source vs Binary Release — Different Rules

Apache makes a sharp distinction:

| Aspect | Source release | Binary release |
|---|---|---|
| Status | Official Apache release | Convenience artifact |
| What's bundled | Source code only | Compiled jars, possibly third-party jars |
| Must have ALv2 LICENSE | Yes | Yes |
| Must have NOTICE | Yes | Yes; usually longer than the source NOTICE |
| Must pass RAT | Yes | Covered by the check on the source it is built from; bundled third-party jars are exempt |
| Category B bundling | Generally not allowed (narrow exceptions for small, unmodified files) | Allowed with LICENSE/NOTICE entry |
| Category X bundling | Never | Never |

Practical implication: a source release rarely bundles anything except Tez's own source. A binary release is the built tarball (tez-dist/target/apache-tez-X.Y.Z-bin.tar.gz), which contains the runtime jars Tez depends on (Hadoop, Jackson, etc.).
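The two bundling rows of the table reduce to a small decision function. This is a hypothetical memory aid with illustrative enum and method names; the ASF third-party license policy and legal-discuss@ remain the authority, particularly for the narrow Category B source exceptions.

```java
/**
 * Hypothetical memory aid for the source-vs-binary table: may a dependency
 * of a given ASF category be bundled in a given release type?
 */
final class BundlingPolicy {

    enum Category { A, B, X }

    enum ReleaseType { SOURCE, BINARY }

    enum Verdict {
        BUNDLE_AND_DOCUMENT, // allowed; add LICENSE pointer and NOTICE text as needed
        EXCEPTION_ONLY,      // normally not allowed; narrow exceptions, ask first
        NEVER
    }

    static Verdict canBundle(Category category, ReleaseType release) {
        switch (category) {
            case A:
                return Verdict.BUNDLE_AND_DOCUMENT;
            case B:
                // Category B is bundled in binary (compiled) form only.
                return release == ReleaseType.BINARY
                        ? Verdict.BUNDLE_AND_DOCUMENT
                        : Verdict.EXCEPTION_ONLY;
            case X:
            default:
                return Verdict.NEVER;
        }
    }
}
```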

Common Licensing Mistakes

| Mistake | Caught by | Fix |
|---|---|---|
| New file without Apache header | mvn apache-rat:check | Add header |
| Random third-party snippet pasted into Tez | Code review | Replace with original code or pull in via dep |
| New category-B dep with no LICENSE update | PMC at release vote | Update LICENSE |
| New category-X dep | PMC at release vote | Remove dep |
| NOTICE accidentally cleared | Code review | Restore from prior release |
| Copyright (c) Company Name in a file | Code review | Replace with Apache header; company-owned code requires CLA review |

What ICLAs and CCLAs Cover

Two contributor license agreements:

| CLA | Who signs | What it covers |
|---|---|---|
| ICLA (Individual) | An individual contributor | Their personal contributions |
| CCLA (Corporate) | A company's authorised signatory | Contributions by listed employees |

An ICLA must be on file before a contributor can become a committer, and projects commonly ask contributors of substantial work to sign one earlier. A CCLA is appropriate when the contribution is made in the contributor's capacity as a company employee; it does not replace each employee's own ICLA.

PMC members can verify ICLA status via secretary@apache.org. For a casual single-patch contributor, the trivial-patch exception often applies and no ICLA is needed; for a contributor on a path to committership, get the ICLA on file early so it does not hold up the committer invitation.

Validation Artifacts

After this chapter:

  1. A ~/tez-notes/license-categories.md cheatsheet of A/B/X with examples.
  2. The reflex to run mvn apache-rat:check in your pre-submit script.
  3. The discipline to check a new dep's category before opening a JIRA proposing it.
  4. The ability to read Tez's NOTICE file and confirm what each line is there for.

The next chapter — Code Style & Trust — closes the section with the operational mechanics of style enforcement and the trust ladder a contributor climbs.