I feel like 2013 holds a lot of data analysis for me, so I’d like to start the year off by learning a language that excels at statistical analysis and visualization. Enter R, a language that has gotten quite popular over the past few years. In the interest of expanding my horizons, I decided to try to learn it using Code School’s Try R course. Code School courses can be a little simplistic if you have programming experience, but since I have never looked at R before, this one seems appropriate.
Lesson One: Using R
The first lesson covers basic variable assignment, functions, and expressions in the REPL environment. Pretty simple for anyone with a programming background, but it does introduce the somewhat unusual assignment operator in R:
x <- 42
This is going to prove a little confusing for me, as I’ve recently been using CoffeeScript a lot, with its -> operator for defining functions. I keep swapping the two operators during the lesson.
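For anyone else tripping over the arrows, here is a small sketch of R's assignment forms next to a function definition. The variable and function names are just illustrative; the operators themselves are standard R.

```r
# Leftward assignment, the form the course uses
x <- 42

# Plain equals also assigns at the top level
y = 42

# R has a rightward arrow too, but it assigns; it does not define a function
42 -> z

# Functions are defined with the function keyword and assigned like any value
square <- function(n) n * n

square(x)  # 1764
```

So in R, an arrow in either direction always means assignment, which is a useful thing to remember when CoffeeScript habits kick in.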
Lesson Two: Vectors
Now we are getting somewhere. In order for me to do any statistical analysis, I’m going to need some data structures. Vectors are the fundamental one-dimensional data structure in R. Code School does an excellent job in this lesson of moving into data visualization early and seamlessly.
> vesselsSunk <- c(4, 5, 1)
> barplot(vesselsSunk)
To be honest, I’ve never before used a language that ships with a barplot function out of the box. At this point in the lesson, I’m pretty excited to keep going. Lesson 2 covers vector math and plotting.
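The vector math part is worth a quick sketch. It reuses the vesselsSunk vector from above; the other variable names are my own, and this is only a sample of what element-wise operations look like, not the course's exact exercises.

```r
vesselsSunk <- c(4, 5, 1)

# Arithmetic applies to every element at once
plusOne <- vesselsSunk + 1   # 5 6 2
doubled <- vesselsSunk * 2   # 8 10 2

# Two vectors of the same length combine element by element
combined <- vesselsSunk + c(1, 2, 3)   # 5 7 4

# Comparisons return a logical vector
moreThanThree <- vesselsSunk > 3   # TRUE TRUE FALSE

# Aggregates collapse the vector to a single value
total <- sum(vesselsSunk)   # 10
```

No loops are needed for any of this, which seems to be a big part of why vectors sit at the center of the language.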
Lesson Three: Matrices
Moving on to two-dimensional data sets. I can almost feel the correlation coefficients and multiple regressions in my near future.
It turns out that this is kind of an odd chapter. We look at basic matrix construction and manipulation.
# Construct a matrix
> elevation <- matrix(0, 3, 4)
> elevation
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
# Edit a value
> elevation[2, 2] <- 1
I suppose the lesson is successful in showing how to use matrices, but I don’t feel that it imparts much insight into the language. Similarly, we are introduced to the contour, persp, and image functions, but they remain fairly magical at the end of the lesson.
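To take a little of the magic out, here is a minimal sketch of the three plotting calls. The 10x10 matrix and the single low point are made-up values for illustration, not the course's data. Each function takes a numeric matrix and treats the cell values as heights.

```r
# A flat 10x10 "map" at height 1 with one dip (illustrative values)
elevation <- matrix(1, 10, 10)
elevation[4, 6] <- 0

# Contour lines, like a topographic map
contour(elevation)

# A 3D perspective view; expand scales the vertical axis
persp(elevation, expand = 0.2)

# A heat map, one colored cell per matrix entry
image(elevation)
```

When run as a script rather than in the REPL, R writes these plots to a file instead of opening a window.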
Lesson Four: Summary Statistics
Mean, median, and standard deviation. This one took about 2 minutes to complete, but it is obviously very important if you never took statistics.
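All three are one-liners in R. A quick sketch, using a small made-up sample (the limbs name and its values are illustrative):

```r
# Hypothetical sample data
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)

avg <- mean(limbs)      # 3.5
mid <- median(limbs)    # 4
spread <- sd(limbs)     # sample standard deviation, roughly 0.76
```

Note that sd computes the sample standard deviation, dividing by n - 1 rather than n.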
Lessons Five and Six: Factors & Data Frames
R’s factors and data frames provide nice ways to group and categorize data. Once you understand factors, you can group a set of users by age or by other distinguishing characteristics. I really enjoyed these lessons; they were very practical.
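As a rough sketch of the users-by-age idea, assuming a made-up users table (every column name and value here is hypothetical):

```r
# Hypothetical users; ageGroup is a factor, i.e. a categorical column
users <- data.frame(
  name     = c("ann", "bob", "cat", "dan"),
  ageGroup = factor(c("18-29", "30-44", "18-29", "45+")),
  visits   = c(12, 7, 3, 9)
)

# The distinct categories the factor knows about
groups <- levels(users$ageGroup)

# How many users fall into each group
counts <- table(users$ageGroup)

# Average visits per age group
avgVisits <- tapply(users$visits, users$ageGroup, mean)
```

The factor is what makes the grouping work: table and tapply both use its levels to decide which rows belong together.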
Lesson Seven: Real-World Data
A great finish to the lessons: a bit of analysis on real-world software piracy data. We finally got an example of data correlation using R, which I’m pretty excited to use on some data sets I’m looking at.
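A minimal sketch of what a correlation check looks like, using invented numbers rather than the course's piracy data set (the gdp and piracy vectors below are purely illustrative):

```r
# Made-up figures for five hypothetical countries
gdp    <- c(1000, 5000, 12000, 25000, 40000)  # GDP per capita
piracy <- c(90, 75, 60, 40, 25)               # piracy rate, percent

# Pearson correlation coefficient, between -1 and 1
r <- cor(gdp, piracy)

# cor.test adds a significance test on top of the coefficient
result <- cor.test(gdp, piracy)
result$estimate   # the same coefficient as r
result$p.value    # the smaller this is, the less likely the pattern is chance
```

With these invented values the coefficient comes out negative, meaning piracy falls as GDP rises.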
I’d recommend the Try R course to any developer who is interested in data analysis and visualization. I’d REALLY recommend the course for anyone with an interest in statistics and data analysis who doesn’t know anything about programming. It really is that easy. Good work, Code School.