Spark: A Gentle Introduction

Spark is a cluster-computing framework for processing large volumes of data. In this talk I will give a broad introduction to Spark, covering the following core topics:

  • Spark versus MapReduce;
  • RDDs, DataFrames and Datasets;
  • installation and setup;
  • access from Python or R; and
  • using the Machine Learning Library.

I won't promise to make you an expert, but I'll share tips and resources that will help you get started.
