Ireland is typically a stable region from a seismic perspective, as it is distant from the major plate boundaries where subduction and sea-floor spreading occur. However, in reading the following article, I was surprised to discover that earthquakes occur quite frequently both off the Irish coast and within the country itself. Most of these events… Continue reading Mapping earthquakes off the Irish Coast using the leaflet package in R

## Cats are great and so is the forcats R package

Cats are great. Perhaps Hadley Wickham and Lionel Henry think so too, given the wonderful choice of name for their purrr package. Hadley Wickham has also created a superb, wittily named package called forcats, possibly an abbreviation of "for categoricals" or an anagram of "factors", which is very, very useful to the data scientist. In…

## Useful dplyr Functions (w/examples)

The R package dplyr is an extremely useful resource for data cleaning, manipulation, visualisation and analysis. It contains a large number of very useful functions and is, without doubt, one of my top 3 R packages today (ggplot2 and reshape2 being the others). When I was learning how to use dplyr for the first time,…

## Ordinary Least Squares (OLS) Linear Regression in R

Ordinary Least Squares (OLS) linear regression is a statistical technique used for the analysis and modelling of linear relationships between a response variable and one or more predictor variables. If the relationship between two variables appears to be linear, then a straight line can be fit to the data in order to model the relationship.…

## Bland-Altman/Tukey Mean-Difference Plots using ggplot2

A very useful data visualisation tool in science, particularly in medical and sports settings, is the Bland-Altman/Tukey Mean-Difference plot. When comparing two sets of measurements for the same variable made by different instruments, it is often required to determine whether the instruments are in agreement or not. Correlation and linear regression can tell us something…

## ggplot2 style plotting in Python

R is my language of choice for data science but a good data scientist should have some knowledge of all of the great tools available to them. Recently, I have been gleefully using Python for machine learning problems (specifically pandas and the wonderful scikit-learn). However, for all its greatness, I couldn't help but feel it…

## Naive Bayes Classification in R (Part 2)

Following on from Part 1 of this two-part post, I would now like to explain how the Naive Bayes classifier works before applying it to a classification problem involving breast cancer data. The dataset is sourced from Matjaz Zwitter and Milan Soklic from the Institute of Oncology, University Medical Center in Ljubljana, Slovenia (formerly Yugoslavia) and…

## Naive Bayes Classification in R (Part 1)

A very useful machine learning method which, for its simplicity, is incredibly successful in many real-world applications is the Naive Bayes classifier. I am currently taking a machine learning module as part of my data science college course and this week's practical work involved a classification problem using the Naive Bayes method. I…

## Predicting the Willingen 2017 men’s ski jumping competition

In an earlier post of mine, I carried out an analysis on ski jumping data for Zakopane, Poland and attempted to predict which athletes would end up on the podium. I also created a classification tree and tested it on the 2017 competition data with good results. For this side project of mine, I hope…

## Predicting the Zakopane 2017 men’s ski jumping competition

When I was a young boy with a wild imagination, I used to try my hand at numerous sports ranging from tennis to Gaelic football to soccer, each with varying degrees of success. Living in the countryside throughout my childhood, a big garden allowed me to construct vivid simulations of soccer championships (crowd and all)…