Spark: A Gentle Introduction
Spark is a cluster-computing framework for processing large volumes of data. In this talk I will give a broad introduction to Spark, covering the following core topics:
- Spark versus MapReduce;
- RDDs, DataFrames and Datasets;
- installation and setup;
- access from Python or R; and
- using the Machine Learning Library.
I can't promise to make you an expert, but I'll share tips and resources that will help you get started.