Enterprise Data Workflows with Cascading: Streamlined Enterprise Data Management and Analysis

ISBN-10: 1449358721

ISBN-13: 9781449358723

Edition: 2013

Authors: Paco Nathan

List price: $31.99

Description:

Despite Hadoop's growing use in the enterprise, building applications for it is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months learning the intricacies of MapReduce. Whether you’re a developer, data scientist, or system/IT administrator, you’ll quickly learn Cascading’s streamlined approach to data processing, data filtering, and workflow optimization, using sample apps based on Java, Scala, and Clojure. Companies such as Etsy, Razorfish, TeleNav, and Twitter already use Cascading for mission-critical applications. This book…
Book details

Copyright year: 2013
Publisher: O'Reilly Media, Incorporated
Publication date: 8/13/2013
Binding: Paperback
Pages: 170
Size: 7.00" wide x 9.19" long x 0.35" tall
Weight: 0.638
Language: English

Author of the "Corporate Metabolism" series, which now continues with "Liber 118". Formerly a contributing editor for bOING-bOING magazine, also formerly a writer for Mondo 2000, Wired, etc. Co-founder of FringeWare, and former editor of FringeWare Review. We launched a very early e-commerce web site (late 1992) as one of the first online bookstores, plus a popular bookstore in Austin based on subculture titles ("long tail")... That became a hub for weekly salons and performance art shows, including SRL, Robert Anton Wilson, Church of the SubGenius, and much more oddness. "One of the cornerstones of Modern Mass Weirdness in the late 20th Century" - Ivan Stang.

Preface
Getting Started
Programming Environment Setup
Example 1: Simplest Possible App in Cascading
Build and Run
Cascading Taxonomy
Example 2: The Ubiquitous Word Count
Flow Diagrams
Predictability at Scale
Extending Pipe Assemblies
Example 3: Customized Operations
Scrubbing Tokens
Example 4: Replicated Joins
Stop Words and Replicated Joins
Comparing with Apache Pig
Comparing with Apache Hive
Test-Driven Development
Example 5: TF-IDF Implementation
Example 6: TF-IDF with Testing
A Word or Two About Testing
Scalding: A Scala DSL for Cascading
Why Use Scalding?
Getting Started with Scalding
Example 3 in Scalding: Word Count with Customized Operations
A Word or Two about Functional Programming
Example 4 in Scalding: Replicated Joins
Build Scalding Apps with Gradle
Running on Amazon AWS
Cascalog: A Clojure DSL for Cascading
Why Use Cascalog?
Getting Started with Cascalog
Example 1 in Cascalog: Simplest Possible App
Example 4 in Cascalog: Replicated Joins
Example 6 in Cascalog: TF-IDF with Testing
Cascalog Technology and Uses
Beyond MapReduce
Applications and Organizations
Lingual, a DSL for ANSI SQL
Using the SQL Command Shell
Using the JDBC Driver
Integrating with Desktop Tools
Pattern, a DSL for Predictive Model Markup Language
Getting Started with Pattern
Predefined App for PMML
Integrating Pattern into Cascading Apps
Customer Experiments
Technology Roadmap for Pattern
The Workflow Abstraction
Key Insights
Pattern Language
Literate Programming
Separation of Concerns
Functional Relational Programming
Enterprise vs. Start-Ups
Case Study: City of Palo Alto Open Data
Why Open Data?
City of Palo Alto
Moving from Raw Sources to Data Products
Calibrating Metrics for the Recommender
Spatial Indexing
Personalization
Recommendations
Build and Run
Key Points of the Recommender Workflow
Troubleshooting Workflows
Index