R, Docker and Checkpoint: A Route to Reproducibility

2019-08-28     R Docker

I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup? Only one reasonable way to find out: give it a try. Below is the simple Dockerfile I used. Here are the fundamental components of what it does: Read more »

All Roads Lead to Rome

2019-07-28     R

I was inspired by this visualisation, showing the optimal routes (by car) from the geographic centre of the USA to all counties. The proverb “All Roads Lead to Rome” immediately came to mind and I set out to hack together something along that theme. This is what was required: find a list of major cities in Europe and Asia; use OSRM to generate routes from each of these cities to Rome. Read more »

Using Shared Memory with OSRM

2019-07-26     OSRM Linux

If you have multiple applications accessing OSRM data then it does not make sense for each of those to have a separate copy of the data resident in memory. This is especially true if you’re using a relatively large map, in which case memory consumed by multiple processes might be enormous. An alternative is to store the map data in shared memory, allowing multiple processes to access a single copy of the data. Read more »

Recreating 'Unknown Pleasures' graphic

2019-07-15     R

For some time I’ve wanted to recreate the cover art from Joy Division’s Unknown Pleasures album. The visualisation depicts successive pulses from the pulsar PSR B1919+21, discovered by Jocelyn Bell in 1967.

Data

The first obstacle was acquiring the data. I found a D3 visualisation by Mike Bostock. This in turn pointed me to a CSV file in a gist belonging to @borgar. After reading the CSV data into pulsar I applied some light wrangling (the raw data is a matrix). Read more »

Comrades Marathon (2019) Splits

2019-07-01     R running

I’m looking at ways to effectively visualise the splits data for the 2019 edition of the Comrades Marathon. My objectives are to provide: an overall view of the splits across the entire field and a detailed view for individual runners (relative to the rest of the field).

Ridge Plot

My working solution for visualising the global splits data is a ridgeline plot created with the {ggridges} package. Read more »

Medal Breakdown at Comrades Marathon (2019)

2019-06-30     R running

A quick breakdown of the medal distribution at the 2019 edition of the Comrades Marathon. This is what the medal categories correspond to:

- Gold: first 10 men and women
- Wally Hayward (men): 11th position to sub-6:00
- Isavel Roche-Kelly (women): 11th position to sub-7:30
- Silver (men): 6:00 to sub-7:30
- Bill Rowan: 7:30 to sub-9:00
- Robert Mtshali: 9:00 to sub-10:00
- Bronze: 10:00 to sub-11:00
- Vic Clapham: 11:00 to sub-12:00

Read more »
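The categories above can be read as a simple decision rule. Here is a minimal sketch in Python, assuming that position means finishing position within the runner's own gender and that finish time is given in minutes; the function name and arguments are illustrative and not taken from the post. Times of 12:00 or slower are not covered by the list, so the sketch returns None for them.

```python
def medal(gender, position, minutes):
    """Map a finisher to a medal category.

    gender   -- "M" or "F"
    position -- finishing position within that gender (assumption)
    minutes  -- finish time in minutes
    """
    if position <= 10:
        return "Gold"
    if gender == "M" and minutes < 6 * 60:
        return "Wally Hayward"
    if gender == "F" and minutes < 7.5 * 60:
        return "Isavel Roche-Kelly"
    if gender == "M" and minutes < 7.5 * 60:
        return "Silver"
    if minutes < 9 * 60:
        return "Bill Rowan"
    if minutes < 10 * 60:
        return "Robert Mtshali"
    if minutes < 11 * 60:
        return "Bronze"
    if minutes < 12 * 60:
        return "Vic Clapham"
    return None  # not covered by the categories listed above
```

For example, `medal("M", 200, 440)` falls in the 6:00 to sub-7:30 band for men and so returns "Silver", while the same time for a woman returns "Isavel Roche-Kelly".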

Comrades Marathon (2019) Start Delay

2019-06-15     R running

How long does it take to cross the start line at the Comrades Marathon? If you’re lucky enough to be starting in one of the batches which is close to the front then this might be a matter of seconds to a couple of minutes. But if you’re in a batch closer to the back then this could be anything up to ten or eleven minutes. This is an agonising wait when all you want to do is start running. Read more »

A Shiny Comrades Marathon Pacing App

2019-06-04     R Shiny running

The Comrades Marathon is an epic ultramarathon run each year between Durban and Pietermaritzburg (South Africa). A few years ago I put together a simple spreadsheet for generating a Comrades Marathon pacing strategy. But the spreadsheet was clunky to use and laborious to maintain. Plus I was frustrated by the crude plots (largely due to my limited spreadsheet proficiency). It seemed like an excellent opportunity to create a Shiny app. Read more »

emayili: Sending Email from R

2019-05-27     R

At Exegetic we do a lot of automated reporting with R. Being able to easily and reliably send emails is a high priority. There is already a selection of packages for sending email from R: {mailR}, {gmailr}, {blastula}, {blatr} (Windows), {mail} and {sendmailR}. We’ve had the most experience with the first two, both of which are really solid packages. However, {gmailr} uses the Google Mail API, so it doesn’t work with all SMTP servers, and {mailR} has a dependency on {rJava}, which can be a bit of a hurdle for deploying in some environments. Read more »

Setting up an R Admin Group

2019-04-11     R

When I set up an R server for clients they often want to be able to install packages so that all users on the machine have access to them. This requires them to be able to install the packages onto the root filesystem rather than under their individual home directories. It would be easy enough to give them su access, but this is a risky approach. There are so many other things on the system that they could break with this level of power. Read more »

Sliding Puzzle Solvable?

2019-04-10     Python

I’m helping develop a new game concept, which is based on the sliding puzzle game. The idea is to randomise the initial configuration of the puzzle. However, I quickly discovered that half of the resulting configurations were not solvable. Not good! Here are two approaches to getting a solvable puzzle: build it (by randomly moving tiles from a known solvable configuration) or generate random configurations and check whether solvable. Read more »
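The second approach (generate, then check) hinges on a solvability test. Below is a minimal sketch in Python of the standard inversion-parity check. It assumes a square puzzle stored as a flat, row-major list with 0 for the blank, and a goal state with the blank in the last position. The function names are illustrative and not taken from the post.

```python
import random


def count_inversions(tiles):
    """Count pairs of tiles that are out of order, ignoring the blank (0)."""
    values = [t for t in tiles if t != 0]
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] > values[j]
    )


def is_solvable(tiles, width):
    """Parity test for a width x width puzzle (blank = 0, goal has blank last)."""
    inversions = count_inversions(tiles)
    if width % 2 == 1:
        # Odd width: solvable exactly when the inversion count is even.
        return inversions % 2 == 0
    # Even width: the blank's row, counted from the bottom (1-indexed), matters too.
    blank_row_from_bottom = width - tiles.index(0) // width
    return (inversions + blank_row_from_bottom) % 2 == 1


def random_solvable(width, rng=random):
    """Shuffle until the configuration passes the parity test."""
    tiles = list(range(width * width))
    while True:
        rng.shuffle(tiles)
        if is_solvable(tiles, width):
            return tiles[:]
```

Since about half of all shuffles pass the test, the rejection loop in `random_solvable` needs two attempts on average.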

Integrating Qlik Sense and R

2019-03-26     R Docker

Components

Qlik Sense is a tool for exploratory data analysis and visualisation. It’s powerful and versatile. It can, however, be significantly enhanced by interfacing with R. Qlik Sense does not currently integrate directly with R. However, it’s not too tricky to get the two systems talking to each other. We’ll need two things to make this happen:

- Rserve: a TCP/IP server which allows other programs to use R without initialising a separate R process or linking against an R library; and
- SSE R-plugin: a server-side extension (SSE) which provides the interface between Qlik Sense and Rserve.

Read more »

satRday (Paris) 2019

2019-02-25     conference

21 February 2019

Arrived in Paris rather late after catching the Eurostar from London. Trip nearly started on a bad note when I underestimated the time required to check in and get through passport control and security. Sat down on the train literally as it departed.

22 February 2019

Early start, working on my tutorial for satRday. When the Sun came up I went out for a trot, primarily to get acquainted with the neighbourhood but also to locate the grave of Jim Morrison. Read more »

Docker Images for R: r-base versus r-apt

2019-01-21     R Docker

I need to deploy a Plumber API in a Docker container. The API has some R package dependencies which need to be baked into the Docker image. There are a few options for the base image: r-base, tidyverse or r-apt. The first option, r-base, would require building the dependencies from source, a somewhat time-consuming operation. The last option, r-apt, makes it possible to install most packages using apt, which is likely to be much quicker. Read more »

RServe: Getting Started

2019-01-21     R

Rserve is a server which allows other programs to use the facilities of R via TCP/IP.

Installing

Since Rserve gets installed to system folders, you need to do the install as the root user.

    # Become root.
    $ sudo su
    # Run R as root.
    $ R
    > install.packages("Rserve")

Running

To launch as a daemon.

    $ R CMD Rserve

To launch in debug mode. $ R CMD Rserve. Read more »

JSON Payload for POST Request

2019-01-10     R

We start with a JSON body because this is how most API documentation gives payload examples.

    body = '{
      "filters": {
        "keywords": ["money", "government"],
        "award_type_codes": ["A", "B", "C", "D"]
      },
      "fields": [
        "Award ID", "Mod", "Recipient Name", "Action Date",
        "Transaction Amount", "Awarding Agency", "Awarding Sub Agency", "Award Type"
      ],
      "page": 1,
      "limit": 35,
      "sort": "Transaction Amount",
      "order": "desc"
    }'

    library(httr)

Send the body as a JSON string. Read more »

Where does .Renviron live on Citrix?

2019-01-08     R

At one of my clients I run RStudio under Citrix in order to have access to their data. For the most part this works fine. However, every time I visit them I spend the first few minutes of my day installing packages because my environment does not seem to be persisted from one session to the next. I finally had a gap and decided to fix the problem. Where are the packages being installed? Read more »

Survey Raking: An Illustration

2018-12-26     R survey

Analysing survey data can be tricky. There’s often a mismatch between the characteristics of the survey respondents and those of the general population. If the discrepancies are not accounted for then the survey results can (and generally will!) be misleading. A common approach to this problem is to weight the individual survey responses so that the marginal proportions of the survey are close to those of the population. Raking (also known as proportional fitting, sample-balancing, or ratio estimation) is a technique for generating the required weights. Read more »
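To make the idea concrete, here is a minimal sketch of raking in plain Python. It adjusts the weights for one variable at a time so that the weighted margin matches the population proportion, and repeats until the adjustments become negligible. The data layout (a list of dicts), the function names and the toy figures are all assumptions for illustration; they are not taken from the post, which works in R.

```python
def rake(respondents, targets, max_iter=100, tol=1e-8):
    """Return one weight per respondent.

    respondents -- list of dicts, e.g. {"sex": "F", "age": "young"}
    targets     -- {variable: {category: population proportion}}
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_change = 0.0
        for variable, proportions in targets.items():
            total = sum(weights)
            # Current weighted total in each category of this variable.
            current = {}
            for w, r in zip(weights, respondents):
                current[r[variable]] = current.get(r[variable], 0.0) + w
            # Scale each weight so this variable's margin matches its target.
            for i, r in enumerate(respondents):
                category = r[variable]
                factor = proportions[category] * total / current[category]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights


def weighted_margin(respondents, weights, variable):
    """Weighted share of each category of a variable."""
    total = sum(weights)
    shares = {}
    for w, r in zip(weights, respondents):
        shares[r[variable]] = shares.get(r[variable], 0.0) + w / total
    return shares


# Toy sample (hypothetical): 10 respondents, skewed relative to the targets.
sample = (
    [{"sex": "F", "age": "young"}] * 3
    + [{"sex": "F", "age": "old"}] * 1
    + [{"sex": "M", "age": "young"}] * 2
    + [{"sex": "M", "age": "old"}] * 4
)
population = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}
weights = rake(sample, population)
```

After raking, `weighted_margin(sample, weights, "sex")` and `weighted_margin(sample, weights, "age")` should agree with the population proportions to within the tolerance.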

Citrix Receiver on Ubuntu 18.04

2018-12-14     linux

How to set up access to Citrix from Ubuntu 18.04. Read more »

Scraping the Turkey Accordion

2018-12-12     R web-scraping

One of the things I like most about web scraping is that almost every site comes with a new set of challenges.

The Accordion Concept

I recently had to scrape a few product pages from the site of a large retailer. I discovered that these pages use an “accordion” to present the product attributes. Only a single panel of the accordion is visible at any one time. So, for example, you toggle the Details panel open to see the associated content. Read more »

Installing RStudio & Shiny Servers

2018-11-13     R Shiny

I did a remote install of Ubuntu Server today. This was somewhat novel because it’s the first time that I have not had physical access to the machine I was installing on. The server install went very smoothly indeed. The next tasks were to install RStudio Server and Shiny Server. The installation process for each of these is well documented on the RStudio web site: Installing RStudio Server and Installing Shiny Server. Read more »

Accessing Open Data from AWS

2018-11-04     aws

There’s a magnificent variety of open data available on AWS. To see the full list, head over to the Registry of Open Data on AWS. When you find something that’s of interest to you, click through to the respective page. The vital piece of information on this page is the Amazon Resource Name (ARN). Grab the final portion of the ARN. That’s the string that uniquely identifies the bucket on S3. Read more »
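As a small illustration of grabbing the final portion of the ARN, here is a sketch in Python. S3 bucket ARNs have the form arn:aws:s3:::bucket-name, so the bucket name is whatever follows the last colon. The bucket name used below is hypothetical and the helper name is not from the post.

```python
def bucket_from_arn(arn):
    """Return the final colon-separated portion of an ARN."""
    return arn.rsplit(":", 1)[-1]


# Hypothetical ARN, for illustration only.
print(bucket_from_arn("arn:aws:s3:::example-open-data"))
```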

Embedding Dependencies into a HTML File

2018-10-31     tool speaking

I use HTML to generate slide decks. Usually my HTML will reference a host of other files on my machine (CSS, JavaScript and images). If I want to distribute my deck then I have a couple of options: just send the HTML file without all of the dependencies or send the HTML file and dependencies (normally wrapped up in some sort of archive). Both of these have problems. In the former case the HTML just ends up looking like ass because it relies on all of those dependencies to sort out the aesthetics. Read more »

DNS on Ubuntu 18.04

2018-10-25     Ubuntu

For years it’s been simple to set up DNS on a Linux machine. Just add a couple of entries to /etc/resolv.conf and you’re done.

    # Use Google's public DNS servers.
    nameserver 8.8.4.4
    nameserver 8.8.8.8

But things change and now it’s not that simple. If you now edit /etc/resolv.conf on Ubuntu you’ll find that the edits are ephemeral. If you restart (or even hibernate) your machine then they’ll be overwritten by default content. Read more »

@pyconza (2018): Data Science and Bayes with Python

2018-10-15     python conference

I’ve just returned from PyConZA (2018), held at the Birchwood Hotel in Boksburg North (Johannesburg) on 11-12 October. A great conference with a super selection of talks and great catering. Obviously when the PyCon call for papers came out I was feeling ambitious because I submitted a Workshop and a Talk. They were both accepted, so that put the pressure on a bit.

Workshop

I gave a full-day pre-conference workshop on 10 October entitled “Introduction to Python for Data Science”. Read more »

Docker Images for Spark

2018-09-28     Docker Spark

I recently put together a short training course on Spark. One of the initial components of the course involved deploying a Spark cluster on AWS. I wanted to have Jupyter Notebook and RStudio servers available on the master node too and the easiest way to make that happen was to install Docker and then run appropriate images. There’s already a jupyter/pyspark-notebook image which includes Spark and Jupyter. It’s a simple matter to extend the rocker/verse image (which already includes RStudio server, the tidyverse, devtools and some publishing utilities) to include the sparklyr package. Read more »

DIY VPN with Docker

2018-09-11     Docker

I’ve worked with both ExpressVPN and NordVPN. Both are great services but, from my perspective, have one major shortcoming: they’re currently blocked by Amazon Web Services (AWS). When using either of them you are simply not able to access any of the AWS services. The most common scenario in which I’d be using a VPN is if I’m on a restrictive network where I’m only able to access web sites. Read more »

Refining an AWS IAM Policy for Flintrock

2018-09-08     spark aws

Flintrock is a tool for launching a Spark cluster on AWS. To get it working initially I needed an IAM (Identity and Access Management) user with the following policies: AmazonEC2FullAccess and IAMFullAccess. Without these I got errors like botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetInstanceProfile operation: User: arn:aws:iam::690534650866:user/datawookie is not authorized to perform: iam:GetInstanceProfile on resource: instance profile EMR_EC2_DefaultRole and botocore.exceptions.ClientError: An error occurred (UnauthorizedOperation) when calling the DescribeVpcs operation: You are not authorized to perform this operation. Read more »

Diagnosing RStudio Startup Issues

2018-09-07     R

Yesterday I tried to start RStudio and something weird happened: the window launched but it was blank and unresponsive. I tried dpkg --remove and then re-installed. Same problem. I tried dpkg --remove followed by dpkg --purge and then re-installed. Same problem. I renamed my .R folder. Still the same problem. A sense of desperation was beginning to set in: most of my projects rely on RStudio. After trying a selection of other options I consulted the Internet Oracle and learned that I could get additional diagnostics using Read more »

Chairing a Conference Session

2018-08-09     speaking

There are many factors which can determine the success of a conference: the location, the venue, the catering, the speakers, the social programme, the contents of the swag bag… However, in my opinion, one of the most important components of an enjoyable conference is a collection of competent chairpersons, for they will ensure that all aspects of the sessions (the very core of a conference!) run smoothly. Chairing a session at a conference is a challenging and important responsibility. Read more »