Read “Programming Pig” by Alan F. Gates for free from O’Reilly Media’s Open Feedback Publishing System.

Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin, for expressing these data flows.


Pig Latin includes operators for many of the traditional data operations (join, sort, filter, etc.), and it lets users develop their own functions for reading, processing, and writing data. Like Hadoop, Pig is an Apache open source project. This means users are free to download it as source or binary, use it themselves and in their products, change it as they see fit, and contribute to it.

Table of Contents

  • Introduction
  • Installing and Running Pig
  • Grunt
  • Pig’s Data Model
  • Introduction to Pig Latin
  • Advanced Pig Latin
  • Developing and Testing Pig Latin Scripts
  • Making Pig Fly
  • Embedding Pig Latin in Python
  • Writing Evaluation and Filter Functions
  • Writing Load and Store Functions
  • Pig and Other Members of the Hadoop Community

Book Details

Author(s): Alan Gates
Format(s): HTML
Number of pages: 222
