“Data-Intensive Text Processing with MapReduce”, written by Jimmy Lin and Chris Dyer, is freely available in PDF format. The book focuses on MapReduce algorithm design, with an emphasis on text-processing algorithms common in natural language processing, information retrieval, and machine learning.
MapReduce is both a programming model for expressing distributed computations over massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed at Google and builds on well-known principles in parallel and distributed processing that date back several decades.
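Word count is the usual first illustration of the programming model. In this model the programmer supplies a map function that emits (key, value) pairs and a reduce function that combines all values sharing a key. The framework handles grouping the intermediate pairs by key. The following is a minimal single-process sketch in Python that assumes the input fits in memory. The names `run_mapreduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical and are not part of Hadoop's API. A real execution framework distributes these steps across many machines and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical, single-process simulation of the MapReduce programming model.
# A real framework (e.g. Hadoop) runs mappers and reducers on many machines.

Pair = Tuple[str, int]


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Pair]],
    reducer: Callable[[str, List[int]], Iterator[Pair]],
) -> List[Pair]:
    # Map phase: each input record may emit any number of (key, value) pairs.
    # Shuffle phase (simulated): group all intermediate values by their key.
    groups: defaultdict = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce phase: call the reducer once per key, in sorted key order,
    # with the full list of values collected for that key.
    output: List[Pair] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def word_count_mapper(document: str) -> Iterator[Pair]:
    # Emit (word, 1) for every whitespace-separated, lowercased token.
    for word in document.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> Iterator[Pair]:
    # Sum the partial counts for a single word.
    yield word, sum(counts)


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_mapreduce(docs, word_count_mapper, word_count_reducer))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Because the mapper and reducer see only one record or one key at a time, the framework is free to run many copies of each in parallel. This independence is what lets the model scale out across a cluster.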
MapReduce has since been widely adopted through Hadoop, an open-source implementation whose development was led by Yahoo and which is now an Apache project. Today a vibrant software ecosystem has sprung up around Hadoop, with significant activity in both industry and academia.
Table of Contents
- MapReduce Basics
- MapReduce Algorithm Design
- Inverted Indexing for Text Retrieval
- Graph Algorithms
- EM Algorithms for Text Processing
- Closing Remarks
Publisher: Morgan and Claypool Publishers
File size: 1.71 MB
Number of pages: 175