SQL Server from Ubuntu

2018-02-05     SQL Linux

Setting up the prerequisites for accessing a SQL Server database from Ubuntu. Read more »

Installing rJava on Ubuntu

2018-02-05     R Linux

Installing the rJava package on Ubuntu is not quite as simple as installing most other R packages. Some quick notes on how to do it. Read more »

Linux VM on Azure

2018-02-05     Azure Linux

A quick tutorial on how to create a Linux VM on Azure. Read more »

Ethereum: DIY Tools for Smart Contracts

2018-01-19     Ethereum

What tools do you need to start working with Ethereum smart contracts? The Solidity Online Compiler provides a quick way to experiment with smart contracts without installing any software on your machine. Another promising online alternative is Cosmo. However, at some stage you’ll probably want to put together a local Ethereum development environment. Here are some suggestions for how to do that on an Ubuntu machine. Since I’m just feeling my way into this new domain, I’m not sure to what degree all of these are necessary. I do know for sure that Truffle and testrpc are crucial. Read more »

Ethereum: Running a Node

2018-01-19     Ethereum

Once you’ve installed Geth you’re ready to run your own Ethereum node. Read more »

NTP: Synchronise Your Watches

2018-01-11     NTP

Just like an old-fashioned grandfather clock, the time on your computer’s clock can slowly drift. You can quickly verify the accuracy of your clock by comparing it to an accurate online time reference. It’s not unusual for it to be anything from a few seconds to a couple of minutes out. For most purposes this is not a major issue, but some applications are very time sensitive. NTP (Network Time Protocol) synchronises your computer’s clock with a network of accurate time servers, ensuring that it stays accurate. There’s a lot to be said about NTP, but this is a quick guide to getting it up and running on an Ubuntu machine. Read more »
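Under the hood, NTP estimates how far your clock is out using four timestamps from each request/response exchange with a server. Here is a minimal Python sketch of the clock offset and round-trip delay formulas from the NTP specification (RFC 5905). The names t0 to t3 and the helper function are my own labels, and a real daemon such as ntpd filters and combines many such samples from several servers rather than trusting a single exchange.

```python
from dataclasses import dataclass


@dataclass
class NtpTimestamps:
    t0: float  # client sends request (read from the client clock)
    t1: float  # server receives request (server clock)
    t2: float  # server sends reply (server clock)
    t3: float  # client receives reply (client clock)


def offset_and_delay(ts: NtpTimestamps) -> tuple:
    """Return (offset, delay) in seconds for one NTP exchange.

    A positive offset means the client clock is behind the server clock.
    The delay is the network round trip, excluding the server's processing time.
    """
    offset = ((ts.t1 - ts.t0) + (ts.t2 - ts.t3)) / 2
    delay = (ts.t3 - ts.t0) - (ts.t2 - ts.t1)
    return offset, delay
```

For a client running 2 s behind the server, with 0.05 s of network latency each way and 0.01 s of server processing, `offset_and_delay(NtpTimestamps(100.0, 102.05, 102.06, 100.11))` gives an offset of about 2.0 s and a delay of about 0.1 s. The formula assumes the outbound and return legs take equally long, which is why NTP prefers samples with small delays.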

An Ethereum Package for R

2018-01-07     Ethereum

Bitcoin has become synonymous with “cryptocurrency”. Ethereum is another cryptocurrency which, although not as hyped as Bitcoin, presents some attractive characteristics. The foremost of these is the ability to create sophisticated smart contracts. This post introduces the new ether package for interacting with the Ethereum network from R. Read more »

Moving a Running Process to screen

2017-12-30     Linux

I’m not sure how many times this has happened to me, but it’s not infrequent. I’m working in a remote session and I start a long-running job. Then, some time later, I want to disconnect from the session but realise that if I do, the job will be killed. I should have started the job in screen or tmux! So, is it possible to transfer the running process to screen (or, equally, to tmux)? Well, it turns out that it is, using the reptyr utility. I discovered this thanks to a LinkedIn post by Bruce Werdschinski. A slight refinement of his process is documented below. Read more »

Creating an Amazon Machine Image

2017-12-04     AWS

Creating an Amazon Machine Image (AMI) makes it quick and simple to rebuild a specific EC2 setup. This post illustrates the process by creating an AMI with ethminer and NVIDIA GPU drivers. Of course you’d never use this for mining Ether because the hardware costs are still too high! Read more »

Using Large Maps with OSRM

2017-11-27     OSRM

How do you deal with large data sets in OSRM? Some quick notes on processing monster PBF files and getting them ready to serve with OSRM. Something to consider up front: if you are RAM-limited, this process is going to take a very long time due to swapping. It might make sense to spin up a big cloud instance (like an r4.8xlarge) for a couple of hours. You’ll get the job done much more quickly and it’ll definitely be worth it. Read more »

EC2 Missing Disk Space


This morning I created an r3.xlarge spot instance on EC2. The job I’m planning to run requires a good wad of data to be uploaded, which is why I chose the r3.xlarge instance: it’s cost effective and, according to AWS, has 80 GB of SSD storage. So I was a little surprised when I connected to the running instance and found that the root partition was only around 8 GB. This is what I did to claim that missing disk space. Read more »

Variable Names: Camel Case to Underscore Delimited

2017-11-20     R

A project I’m working on has a bunch of different data sources. Some of them have column names in camel case. Others are underscore-delimited. My OCD rebels at this disarray and demands either one or the other. If it were just a few columns and I were only going to have to do this once, then I’d probably just do it quickly by hand. But there are many columns, and it’s very likely that there’ll be more data in the future and the process will need to be repeated. Seems like something that should be easy to automate. Read more »
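The post itself does this in R. As a rough illustration of the idea, here is a minimal Python sketch using two regular-expression substitutions. The function name `camel_to_snake` is hypothetical, and column names containing digits or other separators may need extra rules.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camel case name to lowercase underscore-delimited form."""
    # Break between a lowercase letter or digit and a following capital: "dayOfWeek" -> "day_Of_Week".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Break an acronym off a following capitalised word: "HTTPResponse" -> "HTTP_Response".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()
```

Applied to a set of column names, `[camel_to_snake(c) for c in ["dayOfWeek", "HTTPResponseCode", "userID"]]` gives `["day_of_week", "http_response_code", "user_id"]`. Names that are already underscore-delimited pass through unchanged, so the function can be run over every data source without first checking which convention it uses.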

Analysis of Feedback from satRday [Cape Town] 2017

2017-11-15     R satRday Conference

We recently announced the second satRday (Cape Town) conference, scheduled to take place on 17 March 2018. Obviously we want this to be bigger and better than this year’s event, so we are paying careful attention to the feedback that we received from the first event. This is a quick analysis of the feedback. We sold 192 tickets and gave out 11 complimentary tickets for the event. There were 107 responses to the feedback survey, so we heard back from more than half of the people who attended, which is hopefully a representative sample. Read more »
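A quick check of the “more than half” figure, using only the numbers quoted above. This sketch assumes that every issued ticket corresponds to an attendee; if some ticket holders didn’t show up, the true response rate among attendees is higher still.

```python
# Figures quoted in the summary above.
tickets_sold = 192
complimentary = 11
responses = 107

# Assumes every issued ticket corresponds to one attendee.
tickets_issued = tickets_sold + complimentary
response_rate = responses / tickets_issued

print(f"{tickets_issued} tickets, {response_rate:.1%} response rate")
# prints: 203 tickets, 52.7% response rate
```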

Durban Twitter Analysis

2017-11-10     R sentiment

I was invited to give a talk at Digifest (Durban University of Technology) on 10 November 2017. Looking at the other speakers and talks on the programme, I realised that my normal range of topics would not be suitable. I needed to do something more in line with their mission to “celebrate the creative spirit through multimedia projects from disciplines such as visual and performing arts” and to promote “collaboration across art, science and technology”. Definitely outside my current domain, but consistent with many of the things that I have been aspiring to. To be honest, I was pleased to be invited, but when I sat down to consider what I would talk about, I found myself at a loss. I’m not currently engaged in anything that ticks many of those boxes. But I am loath to turn down an opportunity to speak. So I made a plan. In retrospect it was not a terribly good plan. But it was workable. I decided to speak about gauging sentiment relating to the city of Durban using data from Twitter. This post touches on some of my results. Read more »

Speaking Bucket List

2017-10-21     Speaking

A list of conferences and meetups I’d like to speak at in the next few years. Read more »

Installing NVIDIA Graphics Driver on Ubuntu

2017-10-07     Linux GPU

Recipe for installing the NVIDIA binary drivers on Ubuntu. Read more »

Running OSRM with Docker

2017-10-07     Docker OSRM

I’ve now been through the process of setting up OSRM a few times. While it’s not exactly taxing, it seemed like a prime candidate for automation. Read more »

Exporting HTML Presentations to PDF

2017-10-05     Speaking

Building a presentation with reveal.js is such a pleasure. And the results look so good. I seriously doubt that I will ever use anything like PowerPoint again. Although it’s possible to export a presentation directly to PDF using a style sheet, this doesn’t always work perfectly (IMHO). Fortunately there’s another way: decktape. It works with reveal.js and a bunch of other HTML5 presentation frameworks. Read more »

Quick WordPress Install with Docker

2017-09-22     Docker Wordpress Linux MySQL

I’ve just put together a WordPress site for my older daughter. It’s hosted on DigitalOcean and all of the infrastructure is handled with Docker. This post describes the steps in the (easy) install process. Read more »

Diagnosing Killed Jobs on EC2

2017-09-21     Linux AWS

I’ve got a long-running optimisation problem on an EC2 instance. Yesterday it was mysteriously killed. I shrugged it off as an anomaly and restarted the job. However, this morning it was killed again. Definitely not a coincidence! So I investigated. This is what I found and how I am resolving the problem. Read more »

Removing Redundant Hostnames with NGINX

2017-09-15     NGINX

While poring over my Google Analytics data, I noticed a notification about redundant hostnames. Obviously this is not a train smash, but it is compromising the quality of my data. And it also offends my OCD. This is what I did to fix the problem. Read more »

Creating an S3 Bucket

2017-09-14     AWS

There are many good reasons to use S3 (Simple Storage Service) storage. This is a quick overview of how to create an S3 bucket. Read more »

Installing Docker on Ubuntu

2017-09-14     Docker Linux

This procedure works on both my laptop and a fresh EC2 instance. Read more »

Hosting a Plumber API on AWS

2017-09-14     AWS R Plumber

I’ve been putting together a small proof-of-concept API using R and plumber. It works flawlessly on my local machine and I was planning on deploying it on an EC2 instance to demo it for a client. However, I ran into a snag: despite opening the required port in my Security Group, I was not able to access the API. This is what I needed to do to get it working. Read more »

Creating an AWS Spot Instance

2017-09-13     AWS

EC2 Spot Instances can make computing very affordable by providing access to unused EC2 capacity at significant discounts. Read more »

Building a Local OSRM Instance

2017-09-11     R OSRM

The Open Source Routing Machine (OSRM) is a library for calculating routes, distances and travel times between spatial locations. It can be accessed via either an HTTP or a C++ API. Since it’s open source, you can also install it locally, download appropriate map data and start making efficient travel calculations. These are the instructions for getting OSRM installed on an Ubuntu machine and hooking up the osrm R package. Read more »
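The HTTP API can be called from any language, not just via the osrm R package. Here is a minimal Python sketch, assuming `osrm-routed` is listening on its default port 5000 on localhost and using the route service (`/route/v1/{profile}/{coordinates}`). Note that OSRM expects coordinates as longitude,latitude, the reverse of the usual order. The helper names are hypothetical.

```python
import json
from typing import Sequence, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen


def route_url(points: Sequence[Tuple[float, float]],
              host: str = "http://localhost:5000",
              profile: str = "driving") -> str:
    """Build an OSRM route request for (lat, lon) points."""
    # OSRM wants "lon,lat" pairs separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in points)
    # overview=false skips the route geometry; we only want totals.
    query = urlencode({"overview": "false"})
    return f"{host}/route/v1/{profile}/{coords}?{query}"


def distance_and_duration(response: dict) -> Tuple[float, float]:
    """Extract distance (metres) and duration (seconds) of the first route."""
    if response.get("code") != "Ok":
        raise ValueError(f"OSRM error: {response.get('code')}")
    best = response["routes"][0]
    return best["distance"], best["duration"]


def fetch_route(points: Sequence[Tuple[float, float]], **kwargs) -> Tuple[float, float]:
    """Query a running OSRM server (requires the server to be up)."""
    with urlopen(route_url(points, **kwargs)) as resp:
        return distance_and_duration(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes it easy to check both without a running server; `fetch_route` is the only piece that actually talks to OSRM.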

Global Variables in R Packages

2017-09-07     R

I know that global variables are from the Devil, but sometimes you just can’t get around them. I’m building a small package for a client that relies on a data file. For various reasons, that file is not part of the package and can reside in different locations on users’ machines. Furthermore, there are users on both Windows and Linux machines. Read more »

Driving AWS from the Command Line

2017-08-31     AWS

Although it’s very handy (and easy) to set up some cloud resources using the AWS Management Console, once you know what you need it makes a lot of sense to automate the process. Fortunately there’s a neat little command line tool, aws, which makes this eminently possible. The AWS CLI Command Reference is the definitive resource for this tool. There’s a mind-boggling array of possibilities. We’ll take a look at a small selection of them. Read more »

Route Asymmetry in Google Maps

2017-08-23     R

I have been retrieving some route information using Rodrigo Azuero’s gmapsdistance package and noted that there was some asymmetry in the results: the time and distance for the trip from A to B was not necessarily always the same as the time and distance for the trip from B to A. Although in retrospect this seems self-evident, it merited further investigation. Read more »
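One way to spot this systematically is to retrieve times in both directions and compare them. Here is a minimal Python sketch, assuming the travel times have already been retrieved (for example with gmapsdistance) into a dictionary keyed by (origin, destination). The function name and the `tolerance` parameter are hypothetical. One-way streets and turn restrictions are typical reasons why A to B and B to A can differ.

```python
from itertools import combinations


def asymmetric_pairs(durations: dict, tolerance: float = 0.0) -> list:
    """Return (a, b, forward, backward) for pairs whose A->B and B->A times differ.

    `durations` maps (origin, destination) -> travel time. Pairs missing
    either direction are skipped.
    """
    places = sorted({p for pair in durations for p in pair})
    result = []
    for a, b in combinations(places, 2):
        if (a, b) in durations and (b, a) in durations:
            forward, backward = durations[(a, b)], durations[(b, a)]
            if abs(forward - backward) > tolerance:
                result.append((a, b, forward, backward))
    return result
```

A non-zero `tolerance` filters out small differences that are just rounding noise, leaving only the pairs where the asymmetry is large enough to matter.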

Retrieving Kaggle Data from the Command Line

2017-08-21     Kaggle AWS

We’ve been building some models for Kaggle competitions using an EC2 instance for compute. I initially downloaded the data locally and then pushed it onto EC2 using SCP. But there had to be a more efficient way to do this, especially given the blazing fast bandwidth available on AWS. Enter kaggle-cli. Update: Apparently kaggle-cli has been deprecated in favour of kaggle-api. More information below. Read more »