Check out R-bloggers for more excellent content!

## A Shiny Comrades Marathon Pacing App

2019-06-04

The Comrades Marathon is an epic ultramarathon run each year between Durban and Pietermaritzburg (South Africa). A few years ago I put together a simple spreadsheet for generating a Comrades Marathon pacing strategy. But the spreadsheet was clunky to use and laborious to maintain. Plus I was frustrated by the crude plots (largely due to my limited spreadsheet proficiency). It seemed like an excellent opportunity to create a Shiny app. Read more »

## emayili: Sending Email from R

2019-05-27

At Exegetic we do a lot of automated reporting with R. Being able to easily and reliably send emails is a high priority. There is already a selection of packages for sending email from R: {mailR} {gmailr} {blastula} {blatr} (Windows) {mail} and {sendmailR}. We’ve had the most experience with the first two, both of which are really solid packages. However, {gmailr} uses the Google Mail API so it doesn’t work with all SMTP servers and {mailR} has a dependency on {rJava} which can be a bit of a hurdle for deploying in some environments. Read more »

## Setting up an R Admin Group

2019-04-11

When I set up an R server for clients they often want to be able to install packages so that all users on the machine have access to them. This requires them to be able to install the packages onto the root filesystem rather that under their individual home directories. It would be easy enough to give them su access, but this is a risky approach. There are so many other things on the system that they could break with this level of power. Read more »

## Sliding Puzzle Solvable?

2019-04-10

I’m helping develop a new game concept, which is based on the sliding puzzle game. The idea is to randomise the initial configuration of the puzzle. However, I quickly discovered that half of the resulting configurations were not solvable. Not good! Here are two approaches to getting a solvable puzzle: build it (by randomly moving tiles from a known solvable configuration) or generate random configurations and check whether solvable. Read more »

## Integrating Qlik Sense and R

2019-03-26

Components Qlik Sense is a tool for exploratory data analysis and visualisation. It’s powerful and versatile. It’s can, however, be significantly enhanced by interfacing with R. Qlik Sense does not currently integrate directly with R. However, it’s not too tricky to get the two systems talking to each other. We’ll need two things to make this happen: Rserve — A TCP/IP server which allows other programs to use R without initialising a separate R process or linking against an R library; and SSE R-plugin — A server-side extension (SSE) which provides the interface between Qlik Sense and Rserve. Read more »

## satRday (Paris) 2019

2019-02-25

21 February 2019 Arrived in Paris rather late after catching the Eurostar from London. Trip nearly started on a bad note when I underestimated the time required to checkin, get through passport control and security. Sat down on the train literally as it departed. 22 February 2019 Early start, working on my tutorial for satRday. When the Sun came up I went out for a trot, primarily to get acquainted with the neighbourhood but also to locate the grave of Jim Morrison. Read more »

## Docker Images for R: r-base versus r-apt

2019-01-21

I need to deploy a Plumber API in a Docker container. The API has some R package dependencies which need to be baked into the Docker image. There are a few options for the base image: r-base tidyverse or r-apt. The first option, r-base, would require building the dependencies from source, a somewhat time consuming operation. The last option, r-apt, makes it possible to install most packages using apt, which is likely to be much quicker. Read more »

## RServe: Getting Started

2019-01-21

Rserve is a server which allows other programs to use the facilities of R via TCP/IP. Installing Since Rserve gets installed to system folders, you need to do the install as the root user. # Become root. $sudo su # Run R as root.$ R> install.packages("Rserve") Running To launch as a daemon. $R CMD Rserve To launch in debug mode.$ R CMD Rserve. Read more »

## JSON Payload for POST Request

2019-01-10

Starting with JSON body because this is the way that most API documentation will give you the payload examples. body = '{ "filters": { "keywords": ["money","government"], "award_type_codes": [ "A", "B", "C", "D" ] }, "fields": [ "Award ID", "Mod", "Recipient Name", "Action Date", "Transaction Amount", "Awarding Agency", "Awarding Sub Agency", "Award Type" ], "page": 1, "limit": 35, "sort": "Transaction Amount", "order": "desc" }' library(httr) Send the body as a JSON string. Read more »

## Where does .Renviron live on Citrix?

2019-01-08

At one of my clients I run RStudio under Citrix in order to have access to their data. For the most part this works fine. However, every time I visit them I spend the first few minutes of my day installing packages because my environment does not seem to be persisted from one session to the next. I finally had a gap and decided to fix the problem. Where are the packages being installed? Read more »

## Survey Raking: An Illustration

2018-12-26

Analysing survey data can be tricky. There’s often a mismatch between the characteristics of the survey respondents and and those of the general population. If the discrepancies are not accounted for then the survey results can (and generally will!) be misleading. A common approach to this problem is to weight the individual survey responses so that the marginal proportions of the survey are close to those of the population. Raking (also known as proportional fitting, sample-balancing, or ratio estimation) is a technique for generating the required weights. Read more »

2018-12-14

## Scraping the Turkey Accordion

2018-12-12

One of the things I like most about web scraping is that almost every site comes with a new set of challenges. The Accordion Concept I recently had to scrape a few product pages from the site of a large retailer. I discovered that these pages use an “accordion” to present the product attributes. Only a single panel of the accordion is visible at any one time. So, for example, you toggle the Details panel open to see the associated content. Read more »

## Installing RStudio & Shiny Servers

2018-11-13

I did a remote install of Ubuntu Server today. This was somewhat novel because it’s the first time that I have not had physical access to the machine I was installing on. The server install went very smoothly indeed. The next tasks were to install RStudio Server and Shiny Server. The installation process for each of these is well documented on the RStudio web site: Installing RStudio Server and Installing Shiny Server. Read more »

## Accessing Open Data from AWS

2018-11-04

There’s a magnificent variety of open data available on AWS. To see the full list, head over to the Registry of Open Data on AWS. When you find something that’s of interest to you, click through to the respective page. The vital piece of information on this page is the Amazon Resource Name (ARN). Grab the final portion of the ARN. That’s the string that uniquely identifies the bucket on S3. Read more »

## Embedding Dependencies into a HTML File

2018-10-31

I use HTML to generate slide decks. Usually my HTML will reference a host of other files on my machine (CSS, JavaScript and images). If I want to distribute my deck then I have a couple of options: just send the HTML file without all of the dependencies or send the HTML file and dependencies (normally wrapped up in some sort of archive). Both of these have problems. In the former case the HTML just ends up looking like ass because it relies on all of those dependencies to sort out the aesthetics. Read more »

## DNS on Ubuntu 18.04

2018-10-25

For years it’s been simple to set up DNS on a Linux machine. Just add a couple of entries to /etc/resolv.conf and you’re done. # Use Google's public DNS servers. nameserver 8.8.4.4 nameserver 8.8.8.8 But things change and now it’s not that simple. If you now edit /etc/resolv.conf on Ubuntu you’ll find that the edits are ephemeral. If you restart (or even hibernate) your machine then they’ll be overwritten by default content. Read more »

## @pyconza (2018): Data Science and Bayes with Python

2018-10-15

I’ve just returned from PyConZA (2018), held at the Birchwood Hotel in Boksburg North (Johannesburg) on 11-12 October. A great conference with a super selection of talks and great catering. Obviously when the PyCon call for papers came out I was feeling ambitious because I submitted a Workshop and a Talk. They were both accepted, so that put the pressure on a bit. Workshop I gave a full day pre-conference workshop on 10 October entitled “Introduction to Python for Data Science”. Read more »

## Docker Images for Spark

2018-09-28

I recently put together a short training course on Spark. One of the initial components of the course involved deploying a Spark cluster on AWS. I wanted to have Jupyter Notebook and RStudio servers available on the master node too and the easiest way to make that happen was to install Docker and then run appropriate images. There’s already a jupyter/pyspark-notebook image which includes Spark and Jupyter. It’s a simple matter to extend the rocker/verse image (which already includes RStudio server, the tidyverse, devtools and some publishing utilities) to include the sparklyr package. Read more »

## DIY VPN with Docker

2018-09-11

I’ve worked with both ExpressVPN and NordVPN. Both are great services but, from my perspective, have one major shortcoming: they’re currently blocked by Amazon Web Services (AWS). When using either of them you are simply not able to access any of the AWS services. The most common scenario in which I’d be using a VPN is if I’m on a restrictive network where I’m only able to access web sites. Read more »

## Refining an AWS IAM Policy for Flintrock

2018-09-08

Flintrock is a tool for launching a Spark cluster on AWS. To get it working initially I needed an IAM (Identity and Access Management) user with the following policies: AmazonEC2FullAccess and IAMFullAccess. Without these I got errors like botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetInstanceProfile operation: User: arn:aws:iam::690534650866:user/datawookie is not authorized to perform: iam:GetInstanceProfile on resource: instance profile EMR_EC2_DefaultRole and botocore.exceptions.ClientError: An error occurred (UnauthorizedOperation) when calling the DescribeVpcs operation: You are not authorized to perform this operation. Read more »

## Diagnosing RStudio Startup Issues

2018-09-07

Yesterday I tried to start RStudio and something weird happened: the window launched but it was blank and unresponsive. I tried dpkg --remove and then re-installed. Same problem. I tried dpkg --remove followed by dpkg --purge and then re-installed. Same problem. I renamed by .R folder. Still the same problem. A sense of desperation was beginning to set in: most of my projects rely on RStudio. After trying a selection of other options I consulted the Internet Oracle and learned that I could get additional diagnostics using Read more »

## Chairing a Conference Session

2018-08-09

There are many factors which can determine the success of a conference: the location, the venue, the catering, the speakers, the social programme, the contents of the swag bag… However, in my opinion, one of the most important components of an enjoyable conference is a collection of competent chairpersons, for they will ensure that all aspects of the sessions (the very core of a conference!) run smoothly. Chairing a session at a conference is a challenging and important responsibility. Read more »

## Stan.jl Setup

2018-07-25

I’m busy preparing a poster about Stan.jl for JuliaCon 2018. Getting set up is pretty simple, although there are some minor details that I thought I’d document. Read more »

## What's New in R 3.5.0?

2018-07-09

A complete list of the changes in R 3.5.0 can be found here. I’m picking out two (personal) highlights here. Read more »

## Updating R on Ubuntu

2018-07-09

Today I finally got around to updating my R to 3.5 (or, more specifically, 3.5.1). The complete instructions for doing the update on Ubuntu are available here. I’ve paraphrased them below. Read more »

## eRum (2018) Top Twenty

2018-05-18

My Top 20 highlights about eRum (2018) in Budapest. In no particular order: Read more »

## Travelling Salesman with ggmap

2018-05-10

I’ve been testing out some ideas around the Travelling Salesman Problem using TSP and ggmap. For illustration I’ll find the optimal route between the following addresses: Read more »

## Classification: Get the Balance Right

2018-04-21

For classification problems the positive class (which is what you’re normally trying to predict) is often sparsely represented in the data. Unless you do something to address this imbalance then your classifier is likely to be rather underwhelming. Achieving a reasonable balance in the proportions of the target classes is seldom emphasised. Perhaps it’s not very sexy. But it can have a massive effect on a model. Read more »

## Workshop: Web Scraping with R

2018-04-12

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping. Read more »