
Amy Cuddy: Your body language shapes who you are...

2013-11-27     TED Talk

… but be careful with your plots: they might be misinterpreted. Amy Cuddy gives a great talk. It provided me with lots to think about, and I will happily confess that I have struck a few power poses (but only after ensuring that I am quite alone)! Read more »

Deriving a Priority Queue from a Plain Vanilla Queue

2013-11-26     R

Following up on my recent post about implementing a queue as a reference class, I am going to derive a Priority Queue class. Read more »
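
As a rough illustration of the idea, a priority queue can inherit from a plain queue through the contains argument of setRefClass(). The sketch below is not the code from the post: the class layout, the push()/pop() method names and the priorities field are all assumptions made for the example, and the plain Queue is only a minimal stand-in.

```r
library(methods)

# Minimal stand-in for the plain queue (names are hypothetical).
Queue <- setRefClass(
  "Queue",
  fields = list(items = "list"),
  methods = list(
    push = function(x) {
      items <<- c(items, list(x))
      invisible(.self)
    },
    pop = function() {
      if (length(items) == 0) stop("queue is empty")
      value <- items[[1]]
      items <<- items[-1]
      value
    }
  )
)

# Derived class: items are kept sorted by decreasing priority.
PriorityQueue <- setRefClass(
  "PriorityQueue",
  contains = "Queue",
  fields = list(priorities = "numeric"),
  methods = list(
    push = function(x, priority) {
      # Number of existing items that stay ahead of the new one
      # (ties keep first-in, first-out order).
      pos <- sum(priorities >= priority)
      items <<- append(items, list(x), after = pos)
      priorities <<- append(priorities, priority, after = pos)
      invisible(.self)
    },
    pop = function() {
      value <- callSuper()          # reuse the parent's pop()
      priorities <<- priorities[-1]
      value
    }
  )
)

pq <- PriorityQueue$new()
pq$push("low", 1)
pq$push("high", 5)
pq$push("mid", 3)
first <- pq$pop()
second <- pq$pop()
```

Keeping the items sorted on insertion makes pop() trivial; a heap would scale better for long queues, but that is beyond this sketch.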

Implementing a Queue as a Reference Class

2013-11-24     R

I am working on a simulation for an Automatic Repeat-reQuest (ARQ) algorithm. After trying various options, I concluded that I would need an implementation of a queue to make this problem tractable. R does not have a native queue data structure, so this seemed like a good opportunity to implement one and learn something about Reference Classes in the process. The Implementation: We use setRefClass() to create a generator function which will create objects of the Queue class. Read more »
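
To give a flavour of what such a generator looks like, here is a minimal sketch of a first-in, first-out queue built with setRefClass(). It is not the implementation from the post: the items field and the enqueue(), dequeue() and size() methods are hypothetical names chosen for the example.

```r
library(methods)

# setRefClass() returns a generator; Queue$new() creates instances.
Queue <- setRefClass(
  "Queue",
  fields = list(items = "list"),
  methods = list(
    enqueue = function(x) {
      # Add an item to the back of the queue.
      items <<- c(items, list(x))
      invisible(.self)
    },
    dequeue = function() {
      # Remove and return the item at the front of the queue.
      if (length(items) == 0) stop("queue is empty")
      value <- items[[1]]
      items <<- items[-1]
      value
    },
    size = function() {
      length(items)
    }
  )
)

q <- Queue$new()
q$enqueue("a")
q$enqueue("b")
first <- q$dequeue()
```

Because reference class objects are mutable, q is modified in place by each call, which is what makes this style convenient inside a simulation loop.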

Lightning Activity Predictions For Single Buoy Moorings

2013-11-24     talk: standard

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are a little short on text because I like the audience to hear the content rather than read it. The objective with this project was to develop a model which would predict the occurrence of lightning in the vicinity of a Single Buoy Mooring (SBM). Analysis and visualisations were all done in R. I used data from the World Wide Lightning Location Network (WWLLN) and considered four possible models: Neural Network, Conditional Inference Tree, Support Vector Machine and Random Forest. Of the four, Random Forests produced the best performance. The preliminary results from the Random Forests model are very promising: there is good agreement between predicted and observed lightning occurrence in the vicinity of the SBM. Read more »

Iterators in R

2013-11-14     R

According to Wikipedia, an iterator is “an object that enables a programmer to traverse a container”. A collection of items (stashed in a container) can be thought of as being “iterable” if there is a logical progression from one element to the next (so a list is iterable, while a set is not). An iterator is then an object for moving through the container, one item at a time. Read more »
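
The idea can be sketched in base R with a closure that remembers its position in the container. The function make_iterator() and its has_next()/next_value() interface are hypothetical names used only for this illustration.

```r
# A closure-based iterator over a vector or list (illustrative only).
make_iterator <- function(x) {
  i <- 0  # index of the last item handed out
  list(
    has_next = function() i < length(x),
    next_value = function() {
      if (i >= length(x)) stop("iterator exhausted")
      i <<- i + 1
      x[[i]]
    }
  )
}

it <- make_iterator(c("a", "b", "c"))
seen <- character(0)
while (it$has_next()) {
  seen <- c(seen, it$next_value())
}
```

The container itself is never modified; only the counter hidden inside the closure advances.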

Introduction to Fractals

2013-11-04     R

A short while ago I was contracted to write a short piece entitled “Introduction to Fractals”. The article can be found here. Admittedly it is hard to do justice to the topic in less than 1000 words. Both of the illustrations were created with R. Read more »

Percolation Threshold: Including Next-Nearest Neighbours

2013-11-01     R

In my previous post about estimating the Percolation Threshold on a square lattice, I only considered flow from a given cell to its four nearest neighbours. It is a relatively simple matter to extend the recursive flow algorithm to include other configurations as well. Malarz and Galam (2005) considered the problem of percolation on a square lattice for various ranges of neighbour links. Below is their illustration of (a) nearest neighbour “NN” and (b) next-nearest neighbour “NNN” links. Read more »
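
One way to make the neighbourhood configurable is to describe each set of links as a table of row and column offsets. The sketch below assumes that the NNN links are the four diagonal cells, which is my reading of the terminology rather than something taken from the post; the names nn_offsets, nnn_offsets and neighbours() are hypothetical.

```r
# Offsets (row, column) for nearest neighbours: up, down, left, right.
nn_offsets <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
# Offsets for next-nearest neighbours, assumed here to be the diagonals.
nnn_offsets <- rbind(c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))

# Cells reachable from (row, col) on an n x n lattice for a set of offsets.
neighbours <- function(row, col, n, offsets) {
  cells <- sweep(offsets, 2, c(row, col), "+")
  inside <- cells[, 1] >= 1 & cells[, 1] <= n &
    cells[, 2] >= 1 & cells[, 2] <= n
  cells[inside, , drop = FALSE]
}

centre_nn <- neighbours(2, 2, 3, nn_offsets)
corner_both <- neighbours(1, 1, 3, rbind(nn_offsets, nnn_offsets))
```

A flow algorithm that loops over the rows returned by neighbours() can then switch between configurations simply by passing a different offsets table.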

Percolation Threshold on a Square Lattice

2013-10-30     R

Manfred Schroeder touches on the topic of percolation a number of times in his encyclopaedic book on fractals (Schroeder, M. (1991) Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise). Percolation has numerous practical applications, the most interesting of which (from my perspective) is the flow of hot water through ground coffee! The problem of percolation can be posed as follows: suppose that a liquid is poured onto a solid block of some substance. If the substance is porous then it is possible for the liquid to seep through the pores and make it all the way to the bottom of the block. Whether or not this happens is determined by the connectivity of the pores within the substance. If it is extremely porous then it is very likely that there will be an open path of pores connecting the top to the bottom and the liquid will flow freely. If, on the other hand, the porosity is low then such a path may not exist. Evidently there is a critical porosity threshold which divides these two regimes. Read more »
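
The question of whether an open path connects the top to the bottom can be sketched as a flood fill on a logical matrix, where TRUE marks an open pore. This is an illustrative, stack-based version and not the recursive code from the post; the function name percolates() and the matrix layout are assumptions.

```r
# Does liquid entering at the top row reach the bottom row, moving only
# between open cells that share an edge?
percolates <- function(open) {
  n <- nrow(open)
  m <- ncol(open)
  wet <- matrix(FALSE, n, m)
  # Start from every open cell in the top row.
  stack <- lapply(which(open[1, ]), function(j) c(1L, j))
  for (cell in stack) wet[cell[1], cell[2]] <- TRUE
  steps <- list(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))
  while (length(stack) > 0) {
    cell <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    for (s in steps) {
      r <- cell[1] + s[1]
      k <- cell[2] + s[2]
      if (r >= 1 && r <= n && k >= 1 && k <= m && open[r, k] && !wet[r, k]) {
        wet[r, k] <- TRUE
        stack[[length(stack) + 1]] <- c(r, k)
      }
    }
  }
  any(wet[n, ])
}

# A random lattice in which each cell is open with probability p.
set.seed(1)
p <- 0.6
lattice <- matrix(runif(20 * 20) < p, 20, 20)
result <- percolates(lattice)
```

Repeating the last three lines for many lattices over a range of p values, and recording the fraction that percolate, gives an empirical picture of where the two regimes separate.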

Eric Berlow and Sean Gourley: Mapping Ideas Worth Spreading

2013-10-25     TED Talk

Plotting Times of Discrete Events

2013-10-19     R

I recently enjoyed reading O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122. doi:10.1111/j.2041-210X.2010.00021.x. Read more »

Applying the Same Operation to a Number of Variables

2013-10-14     R

Just a quick note on a short hack that I cobbled together this morning. I have an analysis where I need to perform the same set of operations to a list of variables. In order to do this in a compact and robust way, I wanted to write a loop that would run through the variables and apply the operations to each of them in turn. This can be done using get() and assign(). Read more »
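
A minimal sketch of the pattern: get() fetches a variable by name and assign() stores a result under a constructed name. The variables height and weight, the ".scaled" suffix and the standardising operation are all made up for the example.

```r
# Hypothetical variables to be processed.
height <- c(1.2, 3.4, 5.6)
weight <- c(10, 20, 30)

for (name in c("height", "weight")) {
  values <- get(name)                          # look the variable up by name
  scaled <- (values - mean(values)) / sd(values)
  assign(paste0(name, ".scaled"), scaled)      # store under a new name
}
```

After the loop, height.scaled and weight.scaled exist alongside the originals. Keeping the variables in a named list and using lapply() is often a tidier alternative, but the loop above matches the get()/assign() approach described here.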

Mounting a sshfs volume via the crontab

2013-10-06     Linux

I need to mount a directory from my laptop on my desktop machine using sshfs. At first I was not making the mount terribly regularly, so I did it manually each time that I needed it. However, the frequency increased over time and I was eventually mounting it every day (or multiple times during the course of a day!). This was a perfect opportunity to employ some automation. Read more »

Top 250 Movies at IMDb

2013-10-03     R web scraping

Some years ago I allowed myself to accept a challenge to read the Top 100 Novels of All Time (complete list here). This list was put together by Richard Lacayo and Lev Grossman at Time Magazine. To start with I could tick off a number of books that I had already read. That left me with around 75 books outstanding. So I knuckled down. The Lord of the Rings had been on my reading list for a number of years, so this was my first project. Read more »

Flushing Live MetaTrader Logs to Disk

2013-09-18

The logs generated by expert advisors and indicators when running live on MetaTrader are displayed in the Experts tab at the bottom of the terminal window. Sometimes it is more convenient to analyse these logs offline (especially since the order of the records in the terminal runs in a rather counter-intuitive bottom-to-top order!). However, because writing to the log files is buffered, there can be a delay before what you see in the terminal is actually written to disk. Read more »

Clustering Lightning Discharges to Identify Storms

2013-09-13     talk: standard

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are relatively devoid of text because I like the audience to hear the content rather than read it. The central message of the presentation is that clustering lightning discharges into storms is not a trivial task, but still a worthwhile challenge because it can lead to some very interesting science! Read more »

Clustering the Words of William Shakespeare

2013-09-10     R

In my previous post I used the tm package to do some simple text mining on the Complete Works of William Shakespeare. Today I am taking some of those results and using them to generate word clusters. Read more »

MetaTrader Time Zones

2013-09-09

Time zones on MetaTrader can be slightly confusing. There are two important time zones: the time zone of the broker’s server and your local time zone. And these need not be the same. Read more »

Text Mining the Complete Works of William Shakespeare

2013-09-05     R

I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of business was getting my hands on all that text. Fortunately it is available from a number of sources. I chose to use Project Gutenberg. Read more »

What can be learned from 5 million books

2013-08-29

This talk by Jean-Baptiste Michel and Erez Lieberman Aiden is phenomenal. The associated article is also well worth checking out: Michel, J.-B., et al. (2011). Quantitative Analysis of Culture Using Millions of Digitized Books. Science, 331, 176–182. Read more »

Presenting Conformance Statistics

2013-08-27     R

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points. The Data: The data came as a spreadsheet with multiple sheets. Each of the sheets had a slightly different format, so the easiest thing to do was to save each one as a CSV file and then import them individually into R. Read more »

The Wonders of foreach

2013-08-25     R

Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago, but has really come to fruition over the last few weeks. First we need a reproducible example. Preferably something which is numerically intensive. > max. Read more »

Fitting a Model by Maximum Likelihood

2013-08-18     R

Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It basically sets out to answer the question: what model parameters are most likely to characterise a given set of data? First you need to select a model for the data. And the model must have one or more (unknown) parameters. As the name implies, MLE proceeds to maximise a likelihood function, which in turn maximises the agreement between the model and the data. Read more »
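
As a small worked example of the procedure, the sketch below fits the rate of an exponential distribution to simulated data by minimising the negative log-likelihood. The choice of model, the simulated sample and the search interval are assumptions made for illustration and are not taken from the post.

```r
set.seed(1)
x <- rexp(500, rate = 2)   # simulated data with a known rate

# Negative log-likelihood of an exponential model with the given rate.
negloglik <- function(rate) {
  -sum(dexp(x, rate = rate, log = TRUE))
}

# Minimising the negative log-likelihood maximises the likelihood.
fit <- optimize(negloglik, interval = c(0.01, 20), tol = 1e-8)
rate_hat <- fit$minimum

# For this model the maximum likelihood estimate has a closed form,
# which provides a check on the numerical result.
rate_exact <- 1 / mean(x)
```

For models with several parameters the same pattern applies, with optim() taking the place of optimize().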

Finding Correlations in Data with Uncertainty: Classical Solution

2013-08-13     R

This post follows up on my previous one, prompted by an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation with weights being the inverse of the measurement variance. Read more »
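
A weighted correlation of this kind can be sketched with cov.wt() from the stats package, which accepts per-observation weights and returns a correlation matrix when cor = TRUE. The data below are simulated, and the use of a single per-point standard deviation (sigma) is an assumption for the example rather than the set-up from the post.

```r
set.seed(42)
n <- 50
x <- runif(n)
sigma <- runif(n, 0.05, 0.5)        # hypothetical measurement sd per point
y <- 2 * x + rnorm(n, sd = sigma)   # noisier points carry larger errors

w <- 1 / sigma^2                    # weights: inverse measurement variance

# cov.wt() rescales the weights to sum to one internally.
weighted_cor <- cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
unweighted_cor <- cor(x, y)
```

Points with small uncertainty dominate the weighted estimate, so it is less affected by the noisiest measurements than the ordinary correlation.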

Finding Correlations in Data with Uncertainty: Bootstrap Solution

2013-08-11     R

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when considering correlations. To what degree are uncertainties important? A moment’s thought would suggest that if the uncertainties are large enough then they should have a rather significant effect on correlation, or more properly, the uncertainty measure associated with the correlation. Read more »
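
One plausible way to propagate measurement uncertainty into the correlation is sketched below: each replicate resamples the observations with replacement and then perturbs every value by noise drawn from its own uncertainty. This is an assumption about how such a bootstrap might be set up, not necessarily the procedure used in the post, and the data and error sizes are simulated.

```r
set.seed(7)
n <- 40
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)
x_err <- rep(0.2, n)   # hypothetical 1-sigma uncertainties on x
y_err <- rep(0.3, n)   # hypothetical 1-sigma uncertainties on y

boot_cor <- replicate(2000, {
  idx <- sample(n, replace = TRUE)          # resample observations
  x_star <- rnorm(n, mean = x[idx], sd = x_err[idx])
  y_star <- rnorm(n, mean = y[idx], sd = y_err[idx])
  cor(x_star, y_star)
})

# Percentile interval for the correlation.
ci <- quantile(boot_cor, c(0.025, 0.975))
```

Increasing x_err and y_err widens the interval, which is the effect on the uncertainty of the correlation that the paragraph above anticipates.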

Finding Your MetaTrader Log Files

2013-08-08

Debugging an indicator or expert advisor (EA) can be a tricky business. Especially when you are doing the debugging remotely. So I write my MQL code to include copious amounts of debugging information to log files. The contents of these log files can be used to diagnose any problems. This article tells you where you can find those files. Testing Logs: When you are running an EA under the strategy tester, the log files are written to the tester\logs directory (see the red rectangle in the directory tree above). Read more »

A Chart of Recent Comrades Marathon Winners

2013-07-30     R running

Continuing on my quest to document the Comrades Marathon results, today I have put together a chart showing the winners of both the men's and ladies' races since 1980. Click on the image below to see a larger version. The analysis started off with the same data set that I was working with before, from which I extracted only the records for the winners. > winners = subset(results, gender.position == 1, select = c(year, name, gender, race. Read more »

Modelling the Age of the Oldest Person You Know

2013-07-29

The blog post How old is the oldest person you know? by Arthur Charpentier was inspired by Prudential’s stickers campaign which asks you to record the age of the oldest person you know by placing a blue sticker on a number line. The result is a histogram of ages. The original experiment was carried out using 400 real stickers in a park in Austin. Read more »

Comrades Marathon Inference Trees

2013-07-19     R running

Following up on my previous posts regarding the results of the Comrades Marathon, I was planning on putting together a set of models which would predict likelihood to finish and probable finishing time. Along the way I got distracted by something else that is just as interesting and which produces results which readily yield to qualitative interpretation: Conditional Inference Trees as implemented in the R package party. Just to recall what the data look like: Read more »

Optimising a Noisy Objective Function

2013-07-16     R

I am busy with a project where I need to calibrate the Heston Model to some Asian options data. The model has been implemented as a function which executes a Monte Carlo (MC) simulation. As a result, the objective function is rather noisy. There are a number of algorithms for dealing with this sort of problem, and here I simply give a brief overview of some of them. Read more »

Tutorial: Compiling Indicators and Expert Advisors from Source

2013-06-25

When you receive the code for an expert advisor or indicator which we have developed for you, it will come in a package consisting of include files (with a .mqh extension) and source code files (with a .mq4 extension). So, what do you do with them? Read more »