Data-Intensive Text Processing with MapReduce

April 8, 2012

Download PDF: Data-Intensive Text Processing with MapReduce

“Data-Intensive Text Processing with MapReduce”, written by Jimmy Lin and Chris Dyer, is available as a free PDF. The book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning.

Book Description

MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed by Google and built on well-known principles in parallel and distributed processing dating back several decades.

MapReduce has since enjoyed widespread adoption via an open-source implementation called Hadoop, whose development was led by Yahoo (now an Apache project). Today, a vibrant software ecosystem has sprung up around Hadoop, with significant activity in both industry and academia.
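
To make the programming model described above more concrete, here is a minimal single-process sketch of word counting in the MapReduce style, written in plain Python. It is not taken from the book and does not use Hadoop. The names map_fn, reduce_fn, and run_mapreduce, as well as the two sample documents, are hypothetical and chosen only for illustration. A real framework would run the map and reduce phases in parallel across a cluster, while this sketch only imitates the data flow (map, group by key, reduce) in memory.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Mapper (hypothetical name): emit (word, 1) for every token."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reducer (hypothetical name): sum the partial counts for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory driver; a real framework distributes these phases."""
    # Map phase, followed by grouping intermediate values by key.
    # The grouping stands in for the framework's shuffle-and-sort step.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: each key is processed together with all of its values.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Illustrative input: (document id, document text) pairs.
docs = [
    ("d1", "the quick brown fox"),
    ("d2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The mapper and reducer never share state. All communication goes through the key-value pairs they emit, which is what allows an execution framework to spread the work over many machines.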

Table of Contents

Introduction
MapReduce Basics
MapReduce Algorithm Design
Inverted Indexing for Text Retrieval
Graph Algorithms
EM Algorithms for Text Processing
Closing Remarks

Download Free PDF / Read Online

Author(s): Jimmy Lin and Chris Dyer.
Publisher: Morgan and Claypool Publishers
Format(s): PDF
File size: 1.71 MB
Number of pages: 175
