Preface

Getting Started
  Programming Environment Setup
  Example 1: Simplest Possible App in Cascading
  Build and Run
  Cascading Taxonomy
  Example 2: The Ubiquitous Word Count
  Flow Diagrams
  Predictability at Scale

Extending Pipe Assemblies
  Example 3: Customized Operations
  Scrubbing Tokens
  Example 4: Replicated Joins
  Stop Words and Replicated Joins
  Comparing with Apache Pig
  Comparing with Apache Hive

Test-Driven Development
  Example 5: TF-IDF Implementation
  Example 6: TF-IDF with Testing
  A Word or Two About Testing

Scalding - A Scala DSL for Cascading
  Why Use Scalding?
  Getting Started with Scalding
  Example 3 in Scalding: Word Count with Customized Operations
  A Word or Two about Functional Programming
  Example 4 in Scalding: Replicated Joins
  Build Scalding Apps with Gradle
  Running on Amazon AWS

Cascalog - A Clojure DSL for Cascading
  Why Use Cascalog?
  Getting Started with Cascalog
  Example 1 in Cascalog: Simplest Possible App
  Example 4 in Cascalog: Replicated Joins
  Example 6 in Cascalog: TF-IDF with Testing
  Cascalog Technology and Uses

Beyond MapReduce
  Applications and Organizations
  Lingual, a DSL for ANSI SQL
  Using the SQL Command Shell
  Using the JDBC Driver
  Integrating with Desktop Tools
  Pattern, a DSL for Predictive Model Markup Language
  Getting Started with Pattern
  Predefined App for PMML
  Integrating Pattern into Cascading Apps
  Customer Experiments
  Technology Roadmap for Pattern

The Workflow Abstraction
  Key Insights
  Pattern Language
  Literate Programming
  Separation of Concerns
  Functional Relational Programming
  Enterprise vs. Start-Ups

Case Study: City of Palo Alto Open Data
  Why Open Data?
  City of Palo Alto
  Moving from Raw Sources to Data Products
  Calibrating Metrics for the Recommender
  Spatial Indexing
  Personalization
  Recommendations
  Build and Run
  Key Points of the Recommender Workflow

Troubleshooting Workflows

Index