Linda Liukas: Teaching Kids About Computers


Can a computer write poetry?


Kaggle: Santa's Stolen Sleigh

2016-01-22     R

This morning I read Wendy Kan’s post on Creating Santa’s Stolen Sleigh. I hadn’t really thought much about the process of constructing an optimisation competition, but Wendy gave some interesting insights into the considerations involved in designing a competition which is fun and challenging yet still computationally feasible without military-grade hardware. This seems like an opportune time to jot down some of my personal notes and also take a look at the results. I know that this sort of discussion is normally the prerogative of the winners and I trust that my ramblings won’t be viewed as presumptuous. Read more »

Casting a Wide (and Sparse) Matrix in R

2016-01-19     R

I routinely use melt() and dcast() from the reshape2 package as part of my data munging workflow. Recently I’ve noticed that the data frames I’ve been casting are often extremely sparse. Stashing these in a dense data structure just feels wasteful. And the dismal drone of page thrashing is unpleasant. So I had a look around for an alternative. As it turns out, it’s remarkably easy to cast a sparse matrix using sparseMatrix() from the Matrix package. Read more »

Kaggle: Walmart Trip Type Classification

2016-01-15     R

Walmart Trip Type Classification was my first real foray into the world of Kaggle and I’m hooked. I previously dabbled in What’s Cooking but that was as part of a team and the team didn’t work out particularly well. As a learning experience the competition was second to none. My final entry put me at position 155 out of 1061 entries which, although not a stellar performance by any means, is just inside the top 15% and I’m pretty happy with that. Below are a few notes on the competition. Read more »

MongoDB: Installing on Windows 7

2016-01-13     MongoDB

It’s not my personal choice, but I have to spend a lot of my time working under Windows. Installing MongoDB under Ubuntu is a snap. Getting it going under Windows seems to require jumping through a few more hoops. Here are my notes. I hope that somebody will find them useful. Read more »

Review: Mastering Python Scientific Computing

2016-01-11     Python Book Review

Review: Learning Shiny

2016-01-05     R Shiny

I was asked to review Learning Shiny (Hernán G. Resnizky, Packt Publishing, 2015). I found the book to be useful, motivating and generally easy to read. I’d already spent some time dabbling with Shiny, but the book helped me graduate from paddling in the shallows to wading out into the Shiny sea. Read more »

Using Checksum to Guess Message Length: Not a Good Idea!

2015-12-22     R

A question posed by one of my colleagues: can a checksum be used to guess message length? My immediate response was negative and, as it turns out, a simple simulation supported this knee-jerk reaction. Read more »



For a moment this morning I was regretting the fact that R doesn’t have a goto statement, but then… Read more »

Making Sense of Logarithmic Loss

2015-12-14     R

Logarithmic Loss, or simply Log Loss, is a classification loss function often used as an evaluation metric in Kaggle competitions. Since success in these competitions hinges on effectively minimising the Log Loss, it makes sense to have some understanding of how this metric is calculated and how it should be interpreted. Log Loss quantifies the accuracy of a classifier by penalising false classifications. Minimising the Log Loss is basically equivalent to maximising the accuracy of the classifier, but there is a subtle twist which we’ll get to in a moment. Read more »
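For reference, the usual textbook definition of Log Loss for a binary classifier is sketched below. It is not taken from the post itself, and the symbols N, y_i and p_i are labels introduced here for illustration.

```latex
\[
\mathrm{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
\]
```

Here N is the number of observations, y_i is the true label (0 or 1) and p_i is the predicted probability that observation i belongs to class 1. A confident but wrong prediction is penalised heavily: when y_i = 1 the term -log p_i grows without bound as p_i approaches 0.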

Installing XGBoost on Ubuntu

2015-12-09     R Python

2015 Data Science Salary Survey

2015-12-04     R

The recently published 2015 Data Science Salary Survey conducted by O’Reilly takes a look at the salaries received, tools used and other interesting facts about Data Scientists around the world. It’s based on a survey of over 600 respondents from a variety of industries. The entire report is well worth a read, but I’ve picked out some highlights below. The majority (67%) of the respondents in the survey were from the United States. Read more »

Evolution of First Names: Unisex Names and Nicknames


Evolution of First Names: Fashionable and Popular Names


Last week I took a high-level look at the trends in children’s names over the last century. Today I’ll dig a little deeper and examine the ebb and flow in popularity of some specific names. Read more »

Visualising James Bond movies


Graph from Sparse Adjacency Matrix

2015-11-12     R

I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I’d have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which in retrospect seems rather trivial) was that elements in my adjacency matrix were occupied by the pipe symbol. Read more »

Evolution of First Names: Changes over the Last Century


In light of recent developments, a bit of work that I did almost two years ago has become rather relevant. Read more »

LIBOR and Bond Yields

2015-11-06     R

I’ve just been looking at the historical relationship between the London Interbank Offered Rate (LIBOR) and government bond yields. LIBOR data can be found at Quandl and comes in CSV format, so it’s pretty simple to digest. The bond data can be sourced from the US Department of the Treasury. It comes as XML and requires a little more work.
> treasury.xml = xmlParse('data/treasury-yield.xml')
> xml.field = function(name) {
+ xpathSApply(xmlRoot(treasury. Read more »

Guy Kawasaki on Personal Branding


Kelsey Jones of Search Engine Journal interviews Guy Kawasaki of Canva. The key take-home message is that maintaining a personal brand is vital even if you are permanently employed. Specifically, it’s important to keep a visible record of who you have worked for and your personal successes. "I'm living proof. I did one thing right for Apple thirty years ago. I've been coasting ever since. Just need to do one thing really right." Read more »

MonthOfJulia Day 38: Imaging

2015-10-30     Julia

MonthOfJulia Day 37: Fourier Techniques

2015-10-26     Julia

Data Scientists: Respect in the Workplace?


Data Scientists are often among the best educated and most experienced on a team. Are you getting the respect you deserve? Read more »

Gitflow: A successful Git branching model

2015-10-20     Git

MonthOfJulia Day 36: Markdown

2015-10-19     Julia

Data Science Teams


And even that insanely curious data scientist, if he or she insists on working alone, won’t be able to produce the most valuable insights. Those come from high-performing teams combining individuals who are individually curious and naturally creative, but also collaborative in their approach to the art and science of experimentation. A great data science team is like a jazz quartet, where individuals are always riffing off of one another, and each takes the music to a new and unexpected place. Read more »

WordPress: Underscores and SyntaxHighlighter Evolved


Underscores are invisible in the code that I’m displaying on WordPress using SyntaxHighlighter Evolved. After a bit of research I found that this was due to the line height being set too small. Read more »

Review: Beautiful Data

2015-10-15     R Python Book Review

MonthOfJulia Day 35: Mapping

2015-10-15     Julia

MonthOfJulia Day 34: Networking

2015-10-13     Julia