
Rafal Lukawiecki - Putting Science into the Business of Data Science


A talk by Rafal Lukawiecki at the Microsoft Machine Learning & Data Science Summit (2016). Data science relies on the scientific method of reasoning to help make business decisions based on analytics. Let Rafal explain how his customers apply the trusted processes and the principles of hypothesis testing with machine learning and statistics towards solving their day-to-day, practical business problems. Rafal will speak from his 10 years of experience in data mining and statistics, using the Microsoft data platform for high-value customer identification, recommendation and gap analysis, customer paths and acquisition modelling, price optimization and other forms of advanced analytics. Read more »

Edward Tufte - The Future of Data Analysis


A keynote talk by Edward Tufte at the Microsoft Machine Learning & Data Science Summit (2016). Introduction by David Smith.

Python: First Steps with MongoDB

2016-09-28     MongoDB Python

I’m busy working my way through Kyle Banker’s MongoDB in Action. Much of the example code in the book is given in Ruby. Despite the fact that I’d love to learn more about Ruby, for the moment it makes more sense for me to follow along with Python. Read more »

xkcd: Hand Sanitiser

2016-09-21     xkcd

Neha Narula: The future of money

2016-09-20     Blockchain TED Talk

View POST Data using Chrome Developer Tools

2016-09-19     Web Scraping

When figuring out how to formulate the contents of a POST request it’s often useful to see the “typical” fields submitted directly from a web form.

1. Open Developer Tools in Chrome.
2. Select the Network tab (at the top).
3. Submit the form. Watch the magic happening in the Developer Tools console.
4. Click on the first document listed in the Developer Tools console, then select the `Headers` tab.

That’s just scratching the surface of the wealth of information available on the Network tab. Read more »
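Once the fields are known, they can be replayed from a script. A minimal sketch using only the Python standard library is given here; the field names, values and URL are made up for illustration, and the request is built but never sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical form fields, standing in for whatever the Network tab showed.
form_fields = {"username": "alice", "search": "graph databases", "page": "1"}

# Encode the fields the way a browser does for a standard HTML form.
body = urlencode(form_fields).encode("utf-8")

# Build (but do not send) the POST request. The URL is a placeholder.
request = Request(
    "https://example.com/search",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(request.get_method())
print(body.decode("utf-8"))

# Decoding the body again recovers the original fields (values come back as lists).
print(parse_qs(body.decode("utf-8")))
```

Passing `request` to `urllib.request.urlopen()` would actually submit it; that step is left out because the target here is fictitious.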

Deleting All Nodes and Relationships

2016-09-15     Neo4j

Seems that I am doing this a lot: deleting my entire graph (all nodes and relationships) and rebuilding from scratch. I guess that this is part of the learning process.

Route 1: Delete Relationships then Nodes

A relationship is constrained to join a start node to an end node. Every relationship must be associated with at least one node (a relationship may begin and end on the same node). No such constraint exists for nodes. Read more »

Running Cypher Queries from File on Windows

2016-09-14     Neo4j

Recent packages of Neo4j for Windows do not include neo4j-shell. The Neo4j browser will only accept one statement at a time, making scripts consisting of multiple Cypher commands a problem. Read more »

Remote Access to Neo4j on Windows

2016-09-13     Neo4j

Installing Neo4j on Ubuntu 16.04

2016-09-06     Neo4j Linux

Some instructions for installing Neo4j on Ubuntu 16.04. More for my own benefit than anything else.

Installing Java

Neo4j is implemented in Java, so you’ll need to have the Java Runtime Environment (JRE) installed. If you already have this up and running, go ahead and skip this step.

    sudo apt install default-jre default-jre-headless

Check whether you can now run the java executable.

    java

If that works for you, great! It didn’t immediately work on one of my machines. Read more »

James Veitch: When you reply to spam email

2016-09-05     TED Talk

PLOS Subject Keywords: Association Rules

2016-09-01     R Association Rules

In a previous post I detailed the process of compiling data on subject keywords used in articles published in PLOS journals. In this instalment I’ll be using those data to mine Association Rules with the arules package. Good references on the topic of Association Rules are Section 14.2 of The Elements of Statistical Learning (2009) by Hastie, Tibshirani and Friedman; and Introduction to arules by Hahsler, Grün, Hornik and Buchta. Read more »
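The three quantities usually reported for an association rule (support, confidence and lift) can be illustrated with a few lines of plain Python. This is only a sketch of the definitions, not the arules API, and the keyword sets below are invented rather than taken from the PLOS data.

```python
# Hypothetical "transactions": the set of subject keywords attached to each article.
transactions = [
    {"genetics", "evolution"},
    {"genetics", "evolution", "ecology"},
    {"genetics", "statistics"},
    {"ecology", "statistics"},
]


def support(itemset):
    """Fraction of transactions which contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs) for the rule lhs => rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)


def lift(lhs, rhs):
    """Confidence relative to the baseline frequency of rhs."""
    return confidence(lhs, rhs) / support(rhs)


rule = ({"genetics"}, {"evolution"})
print("support:   ", support(rule[0] | rule[1]))
print("confidence:", confidence(*rule))
print("lift:      ", lift(*rule))
```

A lift above 1 suggests that the two keywords occur together more often than they would if they were independent.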

ubeR: A Package for the Uber API

2016-08-31     R

Uber exposes an extensive API for interacting with their service. ubeR is an R package for working with that API which Arthur Wu and I put together during a Hackathon at iXperience.

Installation

The package is currently hosted on GitHub. Installation is simple using the devtools package.

    > devtools::install_github("DataWookie/ubeR")
    > library(ubeR)

Authentication

To work with the API you’ll need to create a new application for the Rides API. Set Redirect URL to http://localhost:1410/. Read more »

Talks about the Blockchain

2016-08-29     Blockchain TED Talk

Finally educating myself about the blockchain. These videos are a good place to start.

Don Tapscott: How the blockchain is changing money and business
Bettina Warburg: How the blockchain will radically transform the economy

PLOS Subject Keywords: Gathering Data

2016-08-24     R Association Rules Collaborative Filtering

I’m putting together a couple of articles on Collaborative Filtering and Association Rules. Naturally, the first step is finding suitable data for illustrative purposes. Read more »

Sportsbook Betting (Part 3): Evolving Odds

2016-08-23     R Gambling

In previous instalments in this series I have not taken into account how odds can change over time. Read more »

Garmin ANT on Ubuntu

2016-08-22     Linux

I finally got tired of booting up Windows to download data from my Garmin 910XT. I tried to get my old Ubuntu 15.04 system to recognise my ANT stick but failed. Now that I have a stable Ubuntu 16.04 system the time seems ripe.

openant

Install openant, a Python library for downloading and uploading files from ANT-FS compliant devices. Download the zip file from Unpack the archive and install using $ sudo python setup. Read more »

Anthony Goldbloom: The jobs we'll lose to machines

2016-08-22     Machine Learning TED Talk

Sportsbook Betting (Part 2): Bookmakers' Odds

2016-08-10     R Gambling

In the first instalment of this series we gained an understanding of the various types of odds used in Sportsbook betting and the link between those odds and implied probabilities. We noted that the implied probabilities for all possible outcomes in an event may sum to more than 100%. At first sight this seems a bit odd. It certainly appears to violate the basic principles of statistics. However, this anomaly is the mechanism by which bookmakers assure their profits. Read more »
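A small worked example makes the point concrete. The sketch below assumes decimal odds, for which the implied probability is simply the reciprocal of the odds; the odds themselves are invented for a three-outcome event and do not come from the series.

```python
def implied_probability(decimal_odds):
    """Implied probability for decimal odds (assumed convention: 1 / odds)."""
    return 1.0 / decimal_odds


# Hypothetical decimal odds for the three outcomes of a single match.
odds = {"home": 2.10, "draw": 3.30, "away": 3.60}

probabilities = {outcome: implied_probability(o) for outcome, o in odds.items()}
total = sum(probabilities.values())

# The excess over 100% is the bookmaker's margin, often called the overround.
overround = total - 1.0

for outcome, p in probabilities.items():
    print(f"{outcome:5s} {p:.3f}")
print(f"total {total:.3f}")
print(f"overround {overround:.1%}")
```

With these made-up odds the implied probabilities sum to a little under 1.06, so the bookmaker's margin is a little under 6%.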

Animated Mortality

2016-08-09     R

feedeR: Reading RSS and Atom Feeds from R

2016-08-08     R

I’m working on a project in which I need to systematically parse a number of RSS and Atom feeds from within R. I was somewhat surprised to find that no package currently exists on CRAN to handle this task. So this presented the opportunity for a bit of DIY. You can find the fruits of my morning’s labour here. Read more »

Web Scraping and "invalid multibyte string"

2016-08-02     R Web Scraping

A couple of my collaborators have had trouble using read_html() from the xml2 package to access this Wikipedia page. Specifically they have been getting errors like this: Read more »

John Green: The Nerd's Guide to Learning Everything Online

2016-08-02     TED Talk

99% of my learning in the last decade has happened online, so this resonates with me.

Sportsbook Betting (Part 1): Odds

2016-08-01     R Gambling

This series of articles was written as support material for Statistics exercises in a course that I’m teaching for iXperience. In the series I’ll be using illustrative examples for wagering on a variety of Sportsbook events including Horse Racing, Rugby and Tennis. The same principles can be applied across essentially all betting markets. Read more »

Arthur Benjamin: Teach statistics before calculus!

2016-07-29     TED Talk Teaching

Arthur Benjamin thinks that the end goal of teaching Mathematics at school should be Statistics rather than Calculus. He has a point: in terms of understanding things in the real world, Statistics is definitely more powerful. These ideas are quite compatible with those of Conrad Wolfram, who thinks that we should be using computers more extensively in Mathematics education. Read more »

Building a Life Table

2016-07-28     R

Calculating Pi using Buffon's Needle

2016-07-26     R

Conrad Wolfram: Teaching kids real math with computers

2016-07-25     TED Talk Teaching

Conrad Wolfram gives a thought-provoking talk on a different way to teach Mathematics in schools. Read more »

Mortality by Year and Age

2016-07-22     R

Taking another look at the data from the lifespan package. The plot below shows the evolution of mortality in the US as a function of year and age. Read more »

Life Expectancy by Country

2016-07-20     R

I was rather inspired by this plot on Wikipedia’s List of Countries by Life Expectancy. Read more »