Docker Images for R: r-base versus r-apt

2019-01-21

I need to deploy a Plumber API in a Docker container. The API has some R package dependencies which need to be baked into the Docker image. There are a few options for the base image: r-base, tidyverse or r-apt. The first option, r-base, would require building the dependencies from source, a somewhat time-consuming operation. The last option, r-apt, makes it possible to install most packages using apt, which is likely to be much quicker. Read more »

Where does .Renviron live on Citrix?

2019-01-08

At one of my clients I run RStudio under Citrix in order to have access to their data. For the most part this works fine. However, every time I visit them I spend the first few minutes of my day installing packages because my environment does not seem to be persisted from one session to the next. I finally had a gap and decided to fix the problem. Where are the packages being installed? Read more »

Survey Raking: An Illustration

2018-12-26

Analysing survey data can be tricky. There’s often a mismatch between the characteristics of the survey respondents and those of the general population. If the discrepancies are not accounted for then the survey results can (and generally will!) be misleading. A common approach to this problem is to weight the individual survey responses so that the marginal proportions of the survey are close to those of the population. Raking (also known as proportional fitting, sample-balancing, or ratio estimation) is a technique for generating the required weights. Read more »
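To make the idea of raking concrete, here is a minimal sketch in plain Python. It is not taken from the post itself: the function names (rake, weighted_share), the ten made-up respondents and the 50/50 population targets are all assumptions chosen for illustration. The weights are repeatedly rescaled, one variable at a time, until the weighted margins of the sample match the target proportions.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Rake survey weights by iterative proportional fitting.

    rows    -- list of dicts, one per respondent (variable -> category)
    targets -- dict: variable -> {category: population proportion}
    Returns one weight per respondent; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            grand = sum(weights)
            # Rescale so this variable's margin matches its target.
            for i, row in enumerate(rows):
                factor = target[row[var]] * grand / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # no margin needed adjusting: converged
            break
    return weights


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents with row[var] == category."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == category)
    return hit / sum(weights)


# Hypothetical sample: women and younger people are over-represented.
sample = (
    [{"sex": "F", "age": "young"}] * 4
    + [{"sex": "F", "age": "old"}] * 2
    + [{"sex": "M", "age": "young"}] * 1
    + [{"sex": "M", "age": "old"}] * 3
)
# Assumed population margins (illustrative only).
population = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"young": 0.5, "old": 0.5},
}

raked = rake(sample, population)
print(weighted_share(sample, raked, "sex", "F"))
print(weighted_share(sample, raked, "age", "young"))
```

After raking, both printed shares should sit very close to the 0.5 targets, whereas the unweighted sample is 60% female and 50% young. A real analysis would more likely use an established survey package than hand-rolled code like this.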

2018-12-14

There’s a Debian package available for Citrix Receiver, so in principle this task should be trivial. It’s not. Simply installing the package leaves you with an SSL error whenever you try to connect to a Citrix resource. You need to jump through a couple of extra hoops to get it actually working. Installing the Package Download the package from here (scroll down to the “Debian Packages” section). Install it. Read more »

Scraping the Turkey Accordion

2018-12-12

One of the things I like most about web scraping is that almost every site comes with a new set of challenges. The Accordion Concept I recently had to scrape a few product pages from the site of a large retailer. I discovered that these pages use an “accordion” to present the product attributes. Only a single panel of the accordion is visible at any one time. So, for example, you toggle the Details panel open to see the associated content. Read more »

Installing RStudio & Shiny Servers

2018-11-13

I did a remote install of Ubuntu Server today. This was somewhat novel because it’s the first time that I have not had physical access to the machine I was installing on. The server install went very smoothly indeed. The next tasks were to install RStudio Server and Shiny Server. The installation process for each of these is well documented on the RStudio web site: Installing RStudio Server and Installing Shiny Server. Read more »

Accessing Open Data from AWS

2018-11-04

There’s a magnificent variety of open data available on AWS. To see the full list, head over to the Registry of Open Data on AWS. When you find something that’s of interest to you, click through to the respective page. The vital piece of information on this page is the Amazon Resource Name (ARN). Grab the final portion of the ARN. That’s the string that uniquely identifies the bucket on S3. Read more »
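As a rough illustration of "grab the final portion of the ARN", the sketch below pulls the bucket name out of an S3 ARN using only string handling. The helper name bucket_from_arn and the ARN arn:aws:s3:::example-open-data are hypothetical, not taken from the registry; the only thing assumed about real ARNs is the standard arn:partition:service:region:account-id:resource layout.

```python
def bucket_from_arn(arn):
    """Return the bucket name from an S3 ARN.

    S3 bucket ARNs look like arn:aws:s3:::bucket-name, optionally
    followed by /key-prefix. Region and account ID are empty.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    resource = parts[5]
    # Drop any object key or prefix that follows the bucket name.
    return resource.split("/", 1)[0]


# Hypothetical ARN, for illustration only.
print(bucket_from_arn("arn:aws:s3:::example-open-data"))  # example-open-data
```

The resulting name is what you would pass to an S3 client or to the AWS command line tool when listing the bucket's contents.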

Embedding Dependencies into a HTML File

2018-10-31

I use HTML to generate slide decks. Usually my HTML will reference a host of other files on my machine (CSS, JavaScript and images). If I want to distribute my deck then I have a couple of options: just send the HTML file without all of the dependencies or send the HTML file and dependencies (normally wrapped up in some sort of archive). Both of these have problems. In the former case the HTML just ends up looking like ass because it relies on all of those dependencies to sort out the aesthetics. Read more »

DNS on Ubuntu 18.04

2018-10-25

For years it’s been simple to set up DNS on a Linux machine. Just add a couple of entries to /etc/resolv.conf and you’re done.

    # Use Google's public DNS servers.
    nameserver 8.8.4.4
    nameserver 8.8.8.8

But things change and now it’s not that simple. If you now edit /etc/resolv.conf on Ubuntu you’ll find that the edits are ephemeral. If you restart (or even hibernate) your machine then they’ll be overwritten by default content. Read more »

@pyconza (2018): Data Science and Bayes with Python

2018-10-15

I’ve just returned from PyConZA (2018), held at the Birchwood Hotel in Boksburg North (Johannesburg) on 11-12 October. A great conference with a super selection of talks and great catering. Obviously when the PyCon call for papers came out I was feeling ambitious because I submitted a Workshop and a Talk. They were both accepted, so that put the pressure on a bit. Workshop I gave a full-day pre-conference workshop on 10 October entitled “Introduction to Python for Data Science”. Read more »

Docker Images for Spark

2018-09-28

I recently put together a short training course on Spark. One of the initial components of the course involved deploying a Spark cluster on AWS. I wanted to have Jupyter Notebook and RStudio servers available on the master node too and the easiest way to make that happen was to install Docker and then run appropriate images. There’s already a jupyter/pyspark-notebook image which includes Spark and Jupyter. It’s a simple matter to extend the rocker/verse image (which already includes RStudio server, the tidyverse, devtools and some publishing utilities) to include the sparklyr package. Read more »

DIY VPN with Docker

2018-09-11

I’ve worked with both ExpressVPN and NordVPN. Both are great services but, from my perspective, have one major shortcoming: they’re currently blocked by Amazon Web Services (AWS). When using either of them you are simply not able to access any of the AWS services. The most common scenario in which I’d be using a VPN is if I’m on a restrictive network where I’m only able to access web sites. Read more »

Refining an AWS IAM Policy for Flintrock

2018-09-08

Flintrock is a tool for launching a Spark cluster on AWS. To get it working initially I needed an IAM (Identity and Access Management) user with the following policies: AmazonEC2FullAccess and IAMFullAccess. Without these I got errors like botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetInstanceProfile operation: User: arn:aws:iam::690534650866:user/datawookie is not authorized to perform: iam:GetInstanceProfile on resource: instance profile EMR_EC2_DefaultRole and botocore.exceptions.ClientError: An error occurred (UnauthorizedOperation) when calling the DescribeVpcs operation: You are not authorized to perform this operation. Read more »

Diagnosing RStudio Startup Issues

2018-09-07

Yesterday I tried to start RStudio and something weird happened: the window launched but it was blank and unresponsive. I tried dpkg --remove and then re-installed. Same problem. I tried dpkg --remove followed by dpkg --purge and then re-installed. Same problem. I renamed my .R folder. Still the same problem. A sense of desperation was beginning to set in: most of my projects rely on RStudio. After trying a selection of other options I consulted the Internet Oracle and learned that I could get additional diagnostics using Read more »

Chairing a Conference Session

2018-08-09

There are many factors which can determine the success of a conference: the location, the venue, the catering, the speakers, the social programme, the contents of the swag bag… However, in my opinion, one of the most important components of an enjoyable conference is a collection of competent chairpersons, for they will ensure that all aspects of the sessions (the very core of a conference!) run smoothly. Chairing a session at a conference is a challenging and important responsibility. Read more »

Stan.jl Setup

2018-07-25

I’m busy preparing a poster about Stan.jl for JuliaCon 2018. Getting set up is pretty simple, although there are some minor details that I thought I’d document. Read more »

What's New in R 3.5.0?

2018-07-09

A complete list of the changes in R 3.5.0 can be found here. I’m picking out two (personal) highlights here. Read more »

Updating R on Ubuntu

2018-07-09

Today I finally got around to updating my R to 3.5 (or, more specifically, 3.5.1). The complete instructions for doing the update on Ubuntu are available here. I’ve paraphrased them below. Read more »

eRum (2018) Top Twenty

2018-05-18

My Top 20 highlights about eRum (2018) in Budapest. In no particular order: Read more »

Travelling Salesman with ggmap

2018-05-10

I’ve been testing out some ideas around the Travelling Salesman Problem using TSP and ggmap. For illustration I’ll find the optimal route between the following addresses: Read more »

Classification: Get the Balance Right

2018-04-21

For classification problems the positive class (which is what you’re normally trying to predict) is often sparsely represented in the data. Unless you do something to address this imbalance then your classifier is likely to be rather underwhelming. Achieving a reasonable balance in the proportions of the target classes is seldom emphasised. Perhaps it’s not very sexy. But it can have a massive effect on a model. Read more »

Workshop: Web Scraping with R

2018-04-12

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping. Read more »

Tips for Lightning Talks

2018-04-06

It seems a little counter-intuitive, but a 5 minute lightning talk is far more difficult to prepare (and present!) than a standard 20 minute or longer talk. The principal challenge is fitting everything that you want to say into the allotted time, while still maintaining an engaging narrative. At the recent satRday conference in Cape Town (17 March 2018) we had a number of great lightning talks. A few of the speakers gave us their tips on creating a brilliant lightning talk. Read more »

Restoring a Django Backup

2018-02-23

It took me a little while to figure out the correct sequence for restoring a Django backup. If you have borked your database, this is how to put it back together. Read more »

Extending DataGrip Evaluation

2018-02-16

DataGrip is a great tool for accessing a wide range of databases. You can get a free 30 day evaluation license. But perhaps you want to evaluate for a tiny bit longer? Read more »

Installing DataGrip on Ubuntu

2018-02-16

Download the DataGrip archive.

Unpack the archive.

    $ tar -zxvf datagrip-2018.1.4.tar.gz

Rename the folder.

    $ mv DataGrip-2018.1.4/ datagrip

Change the owner to root (this needs sudo).

    $ sudo chown -R root:root datagrip

Move to /opt.

    $ sudo mv datagrip /opt/

Link it into PATH.

    $ sudo ln -s /opt/datagrip/bin/datagrip.sh /usr/local/bin/datagrip

Start it from the terminal.

    $ datagrip

SQL Server from Ubuntu

2018-02-05

Setting up the requisites to access a SQL Server database from Ubuntu. Read more »

Installing rJava on Ubuntu

2018-02-05

Installing the rJava package on Ubuntu is not quite as simple as most other R packages. Some quick notes on how to do it. Read more »

Linux VM on Azure

2018-02-05

A quick tutorial on how to create a Linux VM on Azure. Read more »

Ethereum: DIY Tools for Smart Contracts

2018-01-19

What tools do you need to start working with Ethereum smart contracts? The Solidity Online Compiler provides a quick way to experiment with smart contracts without installing any software on your machine. Another promising online alternative is Cosmo. However, at some stage you’ll probably want to put together a local Ethereum development environment. Here are some suggestions for how to do that on an Ubuntu machine. Since I’m just feeling my way into this new domain, I’m not sure to what degree all of these are necessary. I do know for sure that Truffle and testrpc are crucial. Read more »