2016-08-08

I’m working on a project in which I need to systematically parse a number of RSS and Atom feeds from within R. I was somewhat surprised to find that no package currently exists on CRAN to handle this task. So this presented the opportunity for a bit of DIY. You can find the fruits of my morning’s labour here. Read more »

## Web Scraping and "invalid multibyte string"

2016-08-02

A couple of my collaborators have had trouble using read_html() from the xml2 package to access this Wikipedia page. Specifically they have been getting errors like this: Read more »

## John Green: The Nerd's Guide to Learning Everything Online

2016-08-02

99% of my learning in the last decade has happened online, so this resonates with me.

## Sportsbook Betting (Part 1): Odds

2016-08-01

This series of articles was written as support material for Statistics exercises in a course that I’m teaching for iXperience. In the series I’ll be using illustrative examples for wagering on a variety of Sportsbook events including Horse Racing, Rugby and Tennis. The same principles can be applied across essentially all betting markets. Read more »

## Arthur Benjamin: Teach statistics before calculus!

2016-07-29

Arthur Benjamin thinks that the end goal of teaching Mathematics at school should be Statistics rather than Calculus. He has a point: in terms of understanding things in the real world, Statistics is definitely more powerful. These ideas are quite compatible with those of Conrad Wolfram, who thinks that we should be using computers more extensively in Mathematics education. Read more »

2016-07-28

2016-07-26

## Conrad Wolfram: Teaching kids real math with computers

2016-07-25

Conrad Wolfram gives a thought-provoking talk on a different way to teach Mathematics in schools. Read more »

## Mortality by Year and Age

2016-07-22

Taking another look at the data from the lifespan package. The plot below shows the evolution of mortality in the US as a function of year and age. Read more »

## Life Expectancy by Country

2016-07-20

I was rather inspired by this plot on Wikipedia’s List of Countries by Life Expectancy. Read more »

## Mortality Rate by Age

2016-07-19

Working further with the mortality data from http://www.cdc.gov/, I’ve added a breakdown of deaths by age and gender to the lifespan package on GitHub. Read more »

## Escalating Life Expectancy

2016-07-18

I’ve added mortality data to the lifespan package. A result that immediately emerges from these data is that average life expectancy is steadily climbing. Read more »

## Birth Month by Gender

2016-07-16

Based on some feedback to a previous post I normalised the birth counts by the (average) number of days in each month. As pointed out by a reader, the results indicate a gradual increase in the number of conceptions during (northern hemisphere) Autumn and Winter, roughly up to the end of December. Normalising the data to give births per day also shifts the peak from August to September. Read more »
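The normalisation described above can be sketched in a few lines of shell and awk. This is only an illustration: the function name `births_per_day` and the input counts are made up (they are not taken from the post's data), and the days-per-month table assumes February averages 28.25 days.

```shell
# Sketch: convert monthly birth counts to births per day.
# Input lines: "<month number> <count>". The counts used below are
# hypothetical, chosen only to show the effect of the normalisation.
births_per_day() {
  awk '
    BEGIN {
      # Average days per month (February taken as 28.25 to allow for leap years).
      split("31 28.25 31 30 31 30 31 31 30 31 30 31", days, " ")
    }
    { printf "%d %.2f\n", $1, $2 / days[$1] }
  '
}

# Two months with different raw counts but the same daily rate.
printf '%s\n' "2 2825" "8 3100" | births_per_day
```

Months with more days accumulate more births at the same daily rate, which is why dividing by month length can move the position of the peak.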

## Most Probable Birth Month

2016-07-15

In a previous post I showed that the data from www.baseball-reference.com support Malcolm Gladwell’s contention that more professional baseball players are born in August than any other month. Although this might be explained by the 31 July cutoff for admission to baseball leagues, it was suggested that it could also be linked to a larger proportion of babies being born in August. Read more »

## Streaming from zip to bz2

2016-07-08

I’ve got a massive bunch of zip archives, each of which contains only a single file. And the name of the enclosed file varies. Dealing with these data is painful. It’d be a lot more convenient if the files were compressed with gzip or bzip2 and had a consistent naming convention. How would you go about making that conversion without actually unpacking the zip archive, finding the name of the enclosed file and then recompressing? Read more »

## Major League Baseball Birth Months

2016-07-05

> The cutoff date for almost all nonschool baseball leagues in the United States is July 31, with the result that more major league players are born in August than in any other month.
>
> Malcolm Gladwell, Outliers

A quick analysis to confirm Gladwell’s assertion above, using data scraped from www.baseball-reference.com. Read more »

## Upgrading Ubuntu 16.04 to Linux Kernel 4.4.12

2016-06-04

I’ve had a few minor hardware issues with the default kernel in Ubuntu 16.04. For example, hibernate does not work on my laptop. So, in an effort to resolve these problems, I upgraded from the 4.4.0 version of the kernel to 4.4.12. Nothing tricky involved, but here’s the process.

Grab the headers and image.

```
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.12-xenial/linux-headers-4.4.12-040412-generic_4.4.12-040412.201606011712_amd64.deb
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.12-xenial/linux-headers-4.4.12-040412_4.4.12-040412.201606011712_all.deb
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.12-xenial/linux-image-4.4.12-040412-generic_4.4.12-040412.201606011712_amd64.deb
```

Then, become root and install the kernel. Read more »

## satRday in Cape Town

2016-05-26

We are planning to host one of the three inaugural satRday conferences in Cape Town during 2017. The [R Consortium](https://www.r-consortium.org/) has committed to funding three of these events: one will be in Hungary, another will be somewhere in the USA and the third will be at an international destination. At present Cape Town is dicing it out with Monterrey (Mexico) for the third location. We just need your votes to make Cape Town’s plans a reality. Read more »

2016-05-18

Great video bringing back some good memories. Read more »

2016-05-12

2016-04-13

## The Next Rembrandt

2016-04-06

Creating The Next Rembrandt: using data to touch the human soul. How a team from ING, Microsoft, TU Delft, Mauritshuis and Rembrandthuis used technology to synthesise a painting in the style of the Dutch master, Rembrandt, almost 350 years after his death.

2016-03-19

## International Open Data Day

2016-03-05

As part of International Open Data Day we spent the morning with a bunch of like-minded people poring over some open Census South Africa data. Excellent initiative, @opendatadurban! I’m very excited to see where this is all going and look forward to contributing to the journey! The data above show the distribution of ages in a segment of the South African population who have either no schooling (blue) or have completed Grade 12 (orange). Read more »

## R, HDF5 Data and Lightning

2016-02-23

I used to spend an inordinate amount of time digging through lightning data. These data came from a number of sources, the World Wide Lightning Location Network (WWLLN) and LIS/OTD being the most common. I recently needed to work with some Hierarchical Data Format (HDF) data. HDF is something of a niche format and, since that was the format used for the LIS/OTD data, I went to review those old scripts. It was very pleasant rediscovering work I did some time ago. Read more »

## GPS Doodling

2016-02-22

Stephen Lund combines two of my passions: technology and exercise. Awesome. Durban Doodles coming soon. Read more »

2016-02-12

## Automating R scripts under Windows

2016-02-11

Setting up an automated job under Linux is a cinch thanks to cron. Doing the same under Windows is a little trickier, but still eminently doable. Read more »

2016-02-10

2016-02-08