Accessing PySpark from a Jupyter Notebook
Andrew B. Collier
It’d be great to interact with PySpark from a Jupyter Notebook. This post describes how to get that set up. It assumes that you’ve installed Spark like this.
- Install the findspark package.
$ pip3 install findspark
- Make sure that the SPARK_HOME environment variable is defined.
- Launch a Jupyter Notebook.
$ jupyter notebook
- Import the findspark package, then call findspark.init() to locate the Spark installation and make the pyspark module importable. See below for a simple example.
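The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive setup: the helper names spark_home_or_none and start_local_session, the application name and the local[*] master are all assumptions made for illustration. It also assumes that findspark is installed and that SPARK_HOME points at a working Spark installation.

```python
import os


def spark_home_or_none(env=os.environ):
    """Return the SPARK_HOME path if it is set and non-empty, else None.

    Hypothetical helper: a quick sanity check to run before starting Spark.
    """
    value = env.get("SPARK_HOME", "").strip()
    return value or None


def start_local_session(app_name="notebook-example"):
    """Locate Spark with findspark, then start a local SparkSession.

    Hypothetical helper: the app name and local[*] master are assumptions.
    """
    # Imported lazily so that defining this function does not require Spark.
    import findspark

    # With no argument, findspark.init() looks at SPARK_HOME first.
    findspark.init()

    # Unless pyspark is already on sys.path, it only becomes importable
    # once findspark.init() has run.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .getOrCreate()
    )
```

In a notebook cell you might first check that spark_home_or_none() returns a path, then run spark = start_local_session() and try something small such as spark.range(5).show() to confirm that the session works.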