Chapter 5. Using Apache Pig

Hortonworks Data Platform deploys Apache Pig for your Hadoop cluster.

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The structure of Pig programs is amenable to substantial parallelization, which enables them to handle very large data sets.

At present, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs. Its language layer consists of a textual language called Pig Latin, which is easy to use, optimized, and extensible.
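To make "a compiler that produces sequences of MapReduce programs" concrete, the sketch below models in plain Python what a small Pig Latin word-count script conceptually turns into: a map phase, a shuffle that performs the GROUP BY, and a reduce phase that performs the COUNT. The Pig Latin script in the comment is only an illustration and does not come from this chapter. The Python functions (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names. Pig's `TOKENIZE` is approximated here by splitting on whitespace. This is a simplified, single-process model, not the execution plan Pig's compiler actually generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Illustrative Pig Latin word count (an assumption, not from this chapter):
#   lines   = LOAD 'input.txt' AS (line:chararray);
#   words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   grouped = GROUP words BY word;
#   counts  = FOREACH grouped GENERATE group, COUNT(words);


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for each token (whitespace split approximates TOKENIZE)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: collect every value emitted for the same key (the GROUP BY)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate each group's values (the COUNT), with keys in sorted order."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as one compiled MapReduce job would."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

On a real cluster, the map and reduce phases run in parallel across many machines, which is what lets Pig scripts scale to very large data sets. A longer Pig Latin script would typically compile into a chain of several such jobs.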

Pig Documentation

See the Pig documentation.

Pig Wiki Docs

See the Pig wiki for additional documentation.

Pig JIRAs

Issue tracking for Pig bugs and improvements can be found here: Pig JIRAs.

Pig Mailing Lists

Information about the Pig mailing lists and their archives can be found here: Apache Pig Mailing Lists.
