
Clustering Time Series Data

2017-04-25     Machine Learning

I have been looking at methods for clustering time domain data and recently read TSclust: An R Package for Time Series Clustering by Pablo Montero and José Vilar. Here are the results of my initial experiments with the TSclust package. Read more »

Bulgaria Web Summit

2017-04-16     Conference

The Bulgaria Web Summit happened on 7 and 8 April 2017 at the Inter Expo Center in Sofia, Bulgaria. Read more »

Relationship between Race Distance and Gender Ratio

2017-04-09     R

Google Quick, Draw!


Spent a very diverting few minutes playing with Quick, Draw! this morning, which is one of the cool AI Experiments hosted by Google. Read more »

Simple School Maths Problem


A simple problem sent through to me by one of my running friends: There are 6 red cards and 1 black card in a box. Busi and Khanya take turns to draw a card at random from the box, with Busi being the first one to draw. The first person who draws the black card will win the game (assume that the game can go on indefinitely). If the cards are drawn with replacement, determine the probability that Khanya will win, showing all working. Read more »
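
The card problem above reduces to a geometric series. As a rough check, here is a minimal Python sketch, assuming draws with replacement so that the chance of drawing the black card is 1/7 on every turn. The function names are illustrative and not taken from the original post.

```python
from fractions import Fraction
import random


def second_player_win_probability(red=6, black=1):
    """Exact probability that the second player draws the black card first."""
    p = Fraction(black, red + black)  # chance of black on any single draw
    q = 1 - p                         # chance of red on any single draw
    # The second player wins on draw 2, 4, 6, ...:
    # q*p + q^3*p + q^5*p + ... = q*p / (1 - q^2)
    return q * p / (1 - q ** 2)


def simulate(trials=100_000, red=6, black=1, seed=1):
    """Estimate the same probability by playing the game many times."""
    rng = random.Random(seed)
    p = black / (red + black)
    wins = 0
    for _ in range(trials):
        draw = 1
        # Keep drawing (with replacement) until the black card turns up.
        while rng.random() >= p:
            draw += 1
        if draw % 2 == 0:  # even-numbered draws belong to the second player
            wins += 1
    return wins / trials


print(second_player_win_probability())  # 6/13
print(simulate())                        # should land close to 6/13
```

Under these assumptions Khanya, drawing second, wins with probability 6/13, slightly less than half because Busi gets the first chance at the black card.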

satRday Cape Town: Call for Submissions

2016-10-26     R Conference

satRday Cape Town will happen on 18 February 2017 at Workshop 17, Victoria & Alfred Waterfront, Cape Town, South Africa. Read more »

Zeynep Tufekci: Machine intelligence and human morals

2016-10-24     Machine Learning TED Talk

fast-neural-style: Real-Time Style Transfer

2016-10-07     Machine Learning

I followed up a reference to fast-neural-style from Twitter and spent a glorious hour experimenting with this code. Very cool stuff indeed. It’s documented in Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi and Fei-Fei Li. The basic idea is to use feed-forward convolutional neural networks to generate image transformations. The networks are trained using perceptual loss functions and effectively apply style transfer. What is “style transfer”? Read more »

Fitting a Statistical Distribution to Sampled Data

2016-10-05     R

I’m generally not too interested in fitting analytical distributions to my data. With large enough samples (which I am normally fortunate enough to have!) I can safely assume normality for most statistics of interest. Recently I had a relatively small chunk of data and finding a decent analytical approximation was important. So I had a look at the tools available in R for addressing this problem. The fitdistrplus package seemed like a good option. Read more »

Talks about Bots

2016-10-04     Machine Learning

Seth Juarez and Matt Winkler having an informal chat about bots. Matt Winkler talking about Bots as the Next UX: Expanding Your Apps with Conversation at the Microsoft Machine Learning & Data Science Summit (2016). At the confluence of the rise in messaging applications, advances in text and language processing, and mobile form factors, bots are emerging as a key area of innovation and excitement. Bots (or conversation agents) are rapidly becoming an integral part of your digital experience: they are as vital a way for people to interact with a service or application as is a web site or a mobile experience. Read more »

Rafal Lukawiecki - Putting Science into the Business of Data Science


A talk by Rafal Lukawiecki at the Microsoft Machine Learning & Data Science Summit (2016). Data science relies on the scientific method of reasoning to help make business decisions based on analytics. Let Rafal explain how his customers apply the trusted processes and the principles of hypothesis testing with machine learning and statistics towards solving their day-to-day, practical business problems. Rafal will speak from his 10 years of experience in data mining and statistics, using the Microsoft data platform for high-value customer identification, recommendation and gap analysis, customer paths and acquisition modelling, price optimization and other forms of advanced analytics. Read more »

Edward Tufte - The Future of Data Analysis


A keynote talk by Edward Tufte at the Microsoft Machine Learning & Data Science Summit (2016). Introduction by David Smith.

Python: First Steps with MongoDB

2016-09-28     MongoDB Python

I’m busy working my way through Kyle Banker’s MongoDB in Action. Much of the example code in the book is given in Ruby. Despite the fact that I’d love to learn more about Ruby, for the moment it makes more sense for me to follow along with Python. Read more »

xkcd: Hand Sanitiser

2016-09-21     xkcd

Neha Narula: The future of money

2016-09-20     Blockchain TED Talk

View POST Data using Chrome Developer Tools

2016-09-19     Web Scraping

When figuring out how to formulate the contents of a POST request it’s often useful to see the “typical” fields submitted directly from a web form. Open Developer Tools in Chrome. Select the Network tab (at the top). Submit the form. Watch the magic happening in the Developer Tools console. Click on the first document listed in the Developer Tools console, then select the `Headers` tab. That’s just scratching the surface of the wealth of information available on the Network tab. Read more »
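
Once the form fields have been read off the Network tab, they can be replayed from a script. A minimal sketch using only Python's standard library follows; the URL and the field names are placeholders standing in for whatever the real form submits.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field names and values, standing in for the form data
# shown on the Headers tab.
form_fields = {"username": "alice", "query": "time series"}

# Encode the fields the way a browser encodes a standard HTML form.
body = urlencode(form_fields).encode("ascii")

request = Request(
    "https://example.com/search",  # placeholder URL
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(request.get_method())   # POST
print(body.decode("ascii"))   # username=alice&query=time+series

# Passing the request to urllib.request.urlopen() would actually send it.
```

Comparing the encoded body with the form data shown in Developer Tools is a quick way to confirm that the script submits the same fields as the browser.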

Deleting All Nodes and Relationships

2016-09-15     Neo4j

Seems that I am doing this a lot: deleting my entire graph (all nodes and relationships) and rebuilding from scratch. I guess that this is part of the learning process. Route 1: Delete Relationships then Nodes. A relationship is constrained to join a start node to an end node. Every relationship must be associated with at least one node (a relationship may begin and end on the same node). No such constraint exists for nodes. Read more »

Running Cypher Queries from File on Windows

2016-09-14     Neo4j

Recent packages of Neo4j for Windows do not include neo4j-shell. The Neo4j browser will only accept one statement at a time, making scripts consisting of multiple Cypher commands a problem. Read more »

Remote Access to Neo4j on Windows

2016-09-13     Neo4j

Installing Neo4j on Ubuntu 16.04

2016-09-06     Neo4j Linux

Some instructions for installing Neo4j on Ubuntu 16.04. More for my own benefit than anything else. Installing Java: Neo4j is implemented in Java, so you’ll need to have the Java Runtime Environment (JRE) installed. If you already have this up and running, go ahead and skip this step. Run `sudo apt install default-jre default-jre-headless`. Check whether you can now run the java executable with `java`. If that works for you, great! It didn’t immediately work on one of my machines. Read more »

James Veitch: When you reply to spam email

2016-09-05     TED Talk

PLOS Subject Keywords: Association Rules

2016-09-01     R Association Rules

In a previous post I detailed the process of compiling data on subject keywords used in articles published in PLOS journals. In this instalment I’ll be using those data to mine Association Rules with the arules package. Good references on the topic of Association Rules are Section 14.2 of The Elements of Statistical Learning (2009) by Hastie, Tibshirani and Friedman; and Introduction to arules by Hahsler, Grün, Hornik and Buchta. Read more »

ubeR: A Package for the Uber API

2016-08-31     R

Uber exposes an extensive API for interacting with their service. ubeR is an R package for working with that API, which Arthur Wu and I put together during a hackathon at iXperience. Installation: The package is currently hosted on GitHub. Installation is simple using the devtools package: run `devtools::install_github("DataWookie/ubeR")` followed by `library(ubeR)`. Authentication: To work with the API you’ll need to create a new application for the Rides API. Set Redirect URL to http://localhost:1410/. Read more »

Talks about the Blockchain

2016-08-29     Blockchain TED Talk

Finally educating myself about the blockchain. These videos are a good place to start: Don Tapscott, “How the blockchain is changing money and business”, and Bettina Warburg, “How the blockchain will radically transform the economy”.

PLOS Subject Keywords: Gathering Data

2016-08-24     R Association Rules Collaborative Filtering

I’m putting together a couple of articles on Collaborative Filtering and Association Rules. Naturally, the first step is finding suitable data for illustrative purposes. Read more »

Sportsbook Betting (Part 3): Evolving Odds

2016-08-23     R Gambling

In previous instalments in this series I have not taken into account how odds can change over time. Read more »

Garmin ANT on Ubuntu

2016-08-22     Linux

I finally got tired of booting up Windows to download data from my Garmin 910XT. I tried to get my old Ubuntu 15.04 system to recognise my ANT stick but failed. Now that I have a stable Ubuntu 16.04 system the time seems ripe. openant: Install openant, a Python library for downloading and uploading files from ANT-FS compliant devices. Download the zip file from Unpack the archive and install using $ sudo python setup. Read more »

Anthony Goldbloom: The jobs we'll lose to machines

2016-08-22     Machine Learning TED Talk

Sportsbook Betting (Part 2): Bookmakers' Odds

2016-08-10     R Gambling

In the first instalment of this series we gained an understanding of the various types of odds used in Sportsbook betting and the link between those odds and implied probabilities. We noted that the implied probabilities for all possible outcomes in an event may sum to more than 100%. At first sight this seems a bit odd. It certainly appears to violate the basic principles of statistics. However, this anomaly is the mechanism by which bookmakers assure their profits. Read more »
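
To make the point about implied probabilities concrete, here is a small Python sketch. It assumes decimal odds, where the implied probability is simply the reciprocal of the odds, and the three prices are made up for illustration rather than taken from any real bookmaker.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied probabilities (1 / odds)."""
    return [1 / odds for odds in decimal_odds]


# Hypothetical decimal odds for the three outcomes of a single event
# (for example home win, draw, away win).
odds = [2.10, 3.40, 3.60]

probs = implied_probabilities(odds)
total = sum(probs)

# The amount by which the total exceeds 1 is the bookmaker's margin.
margin = total - 1

# Rescaling by the total gives probabilities that sum to exactly 1.
fair_probs = [p / total for p in probs]

print(f"sum of implied probabilities: {total:.3f}")  # about 1.048
print(f"bookmaker margin: {margin:.1%}")
print([round(p, 3) for p in fair_probs])
```

With these made-up prices the implied probabilities add up to roughly 105%, and the excess of about 5% is the margin built into the odds.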

Animated Mortality

2016-08-09     R