Data-Intensive Text Processing with MapReduce - Jimmy Lin

Contents

Acknowledgments

1 Introduction

1.1 Computing in the Clouds

1.2 Big Ideas

1.3 Why Is This Different?

1.4 What This Book Is Not

2 MapReduce Basics

2.1 Functional Programming Roots

2.2 Mappers and Reducers

2.3 The Execution Framework

2.4 Partitioners and Combiners

2.5 The Distributed File System

2.6 Hadoop Cluster Architecture

2.7 Summary

3 MapReduce Algorithm Design

3.1 Local Aggregation

3.1.1 Combiners and In-Mapper Combining

3.1.2 Algorithmic Correctness with Local Aggregation

3.2 Pairs and Stripes

3.3 Computing Relative Frequencies

3.4 Secondary Sorting

3.5 Relational Joins

3.5.1 Reduce-Side Join

3.5.2 Map-Side Join

3.5.3 Memory-Backed Join

3.6 Summary

4 Inverted Indexing for Text Retrieval

4.1 Web Crawling

4.2 Inverted Indexes

4.3 Inverted Indexing: Baseline Implementation

4.4 Inverted Indexing: Revised Implementation

4.5 Index Compression

4.5.1 Byte-Aligned and Word-Aligned Codes

4.5.2 Bit-Aligned Codes

4.5.3 Postings Compression

4.6 What About Retrieval?

4.7 Summary and Additional Readings

5 Graph Algorithms

5.1 Graph Representations

5.2 Parallel Breadth-First Search

5.3 PageRank

5.4 Issues with Graph Processing

5.5 Summary and Additional Readings

6 EM Algorithms for Text Processing

6.1 Expectation Maximization

6.1.1 Maximum Likelihood Estimation

6.1.2 A Latent Variable Marble Game

6.1.3 MLE with Latent Variables

6.1.4 Expectation Maximization

6.1.5 An EM Example

6.2 Hidden Markov Models

6.2.1 Three Questions for Hidden Markov Models

6.2.2 The Forward Algorithm

6.2.3 The Viterbi Algorithm

6.2.4 Parameter Estimation for HMMs

6.2.5 Forward-Backward Training: Summary

6.3 EM in MapReduce

6.3.1 HMM Training in MapReduce

6.4 Case Study: Word Alignment for Statistical Machine Translation

6.4.1 Statistical Phrase-Based Translation

6.4.2 Brief Digression: Language Modeling with MapReduce

6.4.3 Word Alignment

6.4.4 Experiments

6.5 EM-Like Algorithms

6.5.1 Gradient-Based Optimization and Log-Linear Models

6.6 Summary and Additional Readings

7 Closing Remarks

7.1 Limitations of MapReduce

7.2 Alternative Computing Paradigms

7.3 MapReduce and Beyond

Bibliography

Authors’ Biographies
