>> Noel O'Boyle

R or SciPy for Cheminformatics?

Python, through the SciPy numerical extension, offers an alternative to using R in cheminformatics. You will understand what I mean if you have ever tried to code an algorithm in R.

Advantages of R

Disadvantages of R

  • Trying to figure out how to do something in R (even when you remember having done it before) removes years from your life.
  • To compare the two, I used a Pentium 4 with 1 GB of RAM to read the same data files and calculate the principal components (after scaling). The data files contained different numbers of molecules, with 10 descriptor values for each. In the tables below, all times are in seconds.

    For R jobs, .RData was removed and then the command R CMD BATCH myscript.R was used.

    Reading in data

    The data is in the format required by the R command read.table. Read times for each method:

    Method           300K cmpds   600K cmpds   1.6M cmpds
    Python           6.8          13.9         41
    R (read.table)   42           105          NA
    R (scan)         9            20           56

    The scan method in R requires more parameters. In addition, the data is either read in as a list or a single vector, and requires a transformation to a data frame before it is of use.
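
    A minimal sketch of reading such a file on the Python side, assuming a whitespace-delimited file with one header row and only numeric columns. The function name read_descriptors and the sample data are made up for illustration, and np.loadtxt is one option rather than necessarily the call that was timed here.

```python
import io

import numpy as np


def read_descriptors(source, header_rows=1):
    """Read a whitespace-delimited numeric table into a 2-D array.

    `source` may be a filename or an open file-like object.
    Assumes every column after the header is numeric.
    """
    return np.loadtxt(source, skiprows=header_rows, ndmin=2)


# Tiny in-memory stand-in for a descriptor file (hypothetical values).
sample = io.StringIO(
    "d1 d2 d3\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
)
data = read_descriptors(sample)
print(data.shape)
```

    The result is already a plain 2-D array (one row per molecule, one column per descriptor), so no further conversion is needed before the PCA step.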

    PCA

    R uses singular value decomposition (SVD) to do the PCA, whereas in Python I found the eigenvalues of the covariance matrix using SciPy.

    Language         300K cmpds   600K cmpds   1.6M cmpds
    Python           2.2          3.6          42
    R (read.table)   5            10           NA
    R (scan)         3            5            29
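
    The two routes to PCA mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the code that was timed: it uses numpy.linalg (scipy.linalg has an eigh counterpart), the function names pca_eig and pca_svd are hypothetical, the demo data is random, and "scaling" is taken to mean centering each descriptor and dividing by its standard deviation.

```python
import numpy as np


def autoscale(data):
    """Center each column and scale it to unit variance (assumed scaling)."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


def pca_eig(data):
    """PCA via eigendecomposition of the covariance matrix of scaled data."""
    scaled = autoscale(data)
    cov = np.cov(scaled, rowvar=False)      # descriptors are columns
    eig_vals, eig_vecs = np.linalg.eigh(cov)  # symmetric matrix, ascending order
    order = np.argsort(eig_vals)[::-1]      # largest variance first
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    scores = scaled @ eig_vecs              # project molecules onto components
    return eig_vals, eig_vecs, scores


def pca_svd(data):
    """PCA via SVD of the scaled data matrix."""
    scaled = autoscale(data)
    _, sing_vals, vt = np.linalg.svd(scaled, full_matrices=False)
    variances = sing_vals ** 2 / (scaled.shape[0] - 1)
    return variances, vt.T


# Random stand-in for a descriptor table: 200 molecules, 10 descriptors.
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 10))

eig_vals, eig_vecs, scores = pca_eig(demo)
svd_vals, svd_vecs = pca_svd(demo)
print(np.allclose(eig_vals, svd_vals))
```

    Both routes give the same component variances. The covariance route only ever decomposes a 10 x 10 matrix, however many molecules there are, whereas the SVD works on the full data matrix.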