>> Noel O'Boyle

R code

R is a statistical programming environment. It's free, very hands-on, and contains implementations of a huge number of statistical methods.

Plotting things

To reduce the size of label text (particularly when viewing the results of hclust), use plot(myhclust, cex=0.7).
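A minimal sketch of the above, assuming the built-in USArrests data set as a stand-in for real data (the name hc is just illustrative):

```r
# Cluster the 50 US states on the built-in USArrests data (illustrative data only)
hc <- hclust(dist(USArrests))

# cex scales the label text; 0.7 draws the leaf labels at 70% of the default size
plot(hc, cex = 0.7)
```

In a batch run with no screen device, R writes the plot to Rplots.pdf in the working directory.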

For the output of rpart, use...margin=0.05...

Things I keep forgetting

Attach a column to a dataframe with cbind. Use rbind for rows. Join a number to a vector with c(1,1:5).
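As a quick reminder of how these three fit together, here is a small sketch (the data frame and column names are made up for illustration):

```r
# A toy data frame; stringsAsFactors = FALSE keeps y as plain character in older R versions
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# cbind adds a column (the new column must have one value per row)
df2 <- cbind(df, z = c(TRUE, FALSE, TRUE))

# rbind adds a row (the new row must have the same column names)
df3 <- rbind(df2, data.frame(x = 4L, y = "d", z = FALSE, stringsAsFactors = FALSE))

# c joins a number to a vector
v <- c(1, 1:5)   # 1 1 2 3 4 5
```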

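The vector 1:n is useful for loops. For stepped loops, use 1:n*m, where m is the step size; the colon binds more tightly than the multiplication, so this gives m, 2m, ..., n*m. For non-integer values you can use seq(1,n,m), which runs from 1 up to n in steps of m.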

The vector rep(0,10) repeats 0 ten times (useful for initialising a vector).
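The sequence and rep idioms above can be sketched together like this (n, m and totals are just illustrative names):

```r
n <- 5
m <- 3

plain   <- 1:n            # 1 2 3 4 5
stepped <- 1:n * m        # 3 6 9 12 15, i.e. (1:n) * m
by_m    <- seq(1, n, m)   # 1 4, from 1 up to n in steps of m
quarter <- seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00

# Initialise a vector of ten zeros, then fill it in a loop
totals <- rep(0, 10)
for (i in 1:10) {
  totals[i] <- i^2
}
```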

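You can use 'subset' to find subsets that match specific criteria. It does much the same job as the bracket notation, e.g. subset(df, x > 3) versus df[df$x > 3, ], except that subset treats a missing (NA) condition as false and drops that row.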
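To compare subset with the bracket notation, here is a small sketch on a made-up data frame that contains one NA (all names here are illustrative):

```r
df <- data.frame(id = 1:5, score = c(10, NA, 30, 40, 50))

# subset: rows where the condition is NA are dropped
by_subset <- subset(df, score > 25)

# Brackets with a logical condition: the NA comparison yields an extra row of NAs
by_bracket <- df[df$score > 25, ]

# Wrapping the condition in which() discards the NA, matching subset
by_which <- df[which(df$score > 25), ]

nrow(by_subset)    # 3
nrow(by_bracket)   # 4
```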

To run an R script in batch mode use: R CMD BATCH --vanilla myscript.R.

Looking things up in help

Try ?"[".

Random not so random

I've just run a stochastic algorithm ten times in batch mode, but every run gave the same answer, down to the noise. Weird, huh? After much head-scratching, I realised that a workspace saved in the directory was being restored each time. As a result the random seed was the same in every case (random numbers are generated by a deterministic algorithm, and its state is stored in the workspace as .Random.seed, so it gets saved and restored along with everything else). To avoid this, after much more head-scratching and frustrating manual reading, I will use the following to run all of my jobs in future:

R CMD BATCH --vanilla myfile.R
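The effect can be reproduced within a single session. This is only a sketch of the mechanism: it copies the generator state by hand, standing in for what a saved and restored workspace does (the variable names are illustrative):

```r
set.seed(42)          # any seed will do; this just makes sure the generator has a state
first <- runif(3)

# .Random.seed lives in the global environment, which is why it ends up in a saved workspace
saved_state <- get(".Random.seed", envir = globalenv())

second <- runif(3)

# Putting the old state back mimics restoring the workspace at the start of a new job
assign(".Random.seed", saved_state, envir = globalenv())
replay <- runif(3)

identical(second, replay)   # TRUE: the "random" numbers repeat exactly
```

Running with --vanilla means no workspace is restored at startup (and none is saved at exit), so each job gets a fresh seed.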