In Hands-On Big Data Analytics with PySpark, you will learn how to use Spark and its Python API to build high-performance analytics on big data, and you will discover techniques for testing, immunizing, and parallelizing Spark jobs. You will also learn how to source data from popular data hosting platforms and formats, including HDFS, Hive, JSON, and S3, and how to work with large datasets in PySpark to gain practical big data experience. (Limited-time offer)
Table of Contents
- Installing Pyspark and Setting up Your Development Environment
- Getting Your Big Data into the Spark Environment Using RDDs
- Big Data Cleaning and Wrangling with Spark Notebooks
- Aggregating and Summarizing Data into Useful Reports
- Powerful Exploratory Data Analysis with MLlib
- Putting Structure on Your Big Data with SparkSQL
- Transformations and Actions
- Immutable Design
- Avoiding Shuffle and Reducing Operational Expenses
- Saving Data in the Correct Format
- Working with the Spark Key/Value API
- Testing Apache Spark Jobs
- Leveraging the Spark GraphX API
Download Free PDF / Read Online
Author(s): James Cross, Rudy Lai, Bartłomiej Potaczek
Publisher: Packt Publishing
Published: March 2019
Format(s): Online
File size: –
Number of pages: 182
Download / View Link(s): This offer has ended.
Free as of 10/12/2023.