A Friendly Statistics Toolbox for Microarray Analysis

The friendly statistics toolbox for microarray analysis (FSPMA) is an R library that is controlled by a definition file. FSPMA is available under the GPL 2 license. It is free software that comes with NO WARRANTY.

Functionality

FSPMA (Sykacek et al. 2005) is an R library for analysing microarray data. Its central idea is to base the analysis on a definition file that describes the experiment and the analysis steps to be carried out. The definition file makes it possible to run an analysis without adapting or writing R scripts, and it also serves as documentation of the analysis run. FSPMA can be used with data from different platforms (single and two colour arrays), with optional preprocessing steps done before the data are loaded into FSPMA. The main restriction is that the experiment must be a balanced reference design.

The analysis includes handling of samples flagged as bad quality, conventional normalisation and normalisation with spike RNA, calculation of ANOVA tables and variance components, and finally gene ranking based on within ANOVA contrasts and on per gene ANOVA models. FSPMA is wrapped around YASMA (Wernisch et al. 2003), which it extends by some preprocessing and normalisation options and by more general contrasts that allow, for example, the analysis of longitudinal studies. To find out more about FSPMA's functionality, it is best to consult the FSPMA tutorial (Sykacek & Furlong 2005), which is part of FSPMA's documentation files.

Installation

R 2.14.0 under MS Windows 64

The easiest way to install FSPMA for R under Windows is to download the corresponding binary distribution. If your R version is not supported, you have to download the source distribution found below and build it with the development tools for Windows. Details of the latter can be found at the Rtools page.
R version    Download link
R 2.9.1      fspmawin_R.2.14.0.zip

Either approach will install FSPMA and a slightly modified version of the YASMA library (Wernisch et al. 2003), on which FSPMA depends. Some minor modifications of the original YASMA library were necessary for compatibility with R ver. 2.1 and with the R Win32 tools, which do not support drand48() random number generation. The Win32 port uses R's internal uniform random number generator instead.

R 2.14.0 with other operating systems, source distribution

For all other operating systems, download the source distribution fspmax_122011.zip. After unzipping the file, running the command "fspmaxinstall.sh" installs FSPMA and a modified version of the YASMA library (Wernisch et al. 2003). The source distribution of the library is known to work with Linux, Apple OSX and MS Windows.

Testing the installation and learning about FSPMA

To test the installation, use the examples provided in FSPMA's online help; see the package overview for details. There are five zip archives containing definition files and the corresponding data files. These examples are meant for evaluation purposes and contain a small number of genes from a larger study done by R. Furlong of the Dept. of Pathology, University of Cambridge, so the run time of each example is short.

Downloading and extracting fspma-tutorial.zip from the package overview page in FSPMA's online help provides the "fspma.Rnw" Sweave file (see the R help on how to use Sweave), which together with the experiments generates the LaTeX sources of the FSPMA tutorial (Sykacek & Furlong 2005). This step runs all code fragments in the tutorial and requires that all experimental data and the Sweave file reside unzipped in the same directory.

Individual experiments can be run by downloading and extracting the relevant archive into a local directory. The analysis is started by invoking "fspma.wrapper" on the R command line with the name of the definition file as parameter, exactly as shown in the tutorial. Refer to (Sykacek & Furlong 2005) for further details on the output of such analysis runs and on how to produce different visualisations of the data and the analysis results.

A real world dataset

We provide here an additional documented definition file for a public Affymetrix dataset. This file must be uncompressed (e.g. with gunzip) and stored in a directory of your choice. The microarray data analysed by this file have been published as (Small et al. 2005) and can be downloaded from the NCBI GEO Datasets server under reference GDS660. These data files must be stored in the same directory as the definition file. Unlike the examples provided with the library, this definition file provides an analysis of realistic size. In particular, evaluating base level comparisons, which are shortcuts for several pairwise comparisons, and k nearest neighbour imputation can be computationally quite demanding. The definition file of this example is discussed in (Sykacek 2005), which is also part of FSPMA's online help. To run the analysis, start R in that directory and type the following commands at the command line.

> library(fspma)
> ret <- fspma.wrapper('tstsgd_A.def')

As soon as the script terminates, there will be several additional files in that directory: the normalised raw data with a corresponding effects description, a file with an ANOVA table and variance components, and several files containing the rank lists that correspond to the different tests. The variance components will not show up in this analysis, since there is only one random effect, which is captured by the residual noise.
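Since the names of the output files depend on the definition file, it can help to list exactly which files a run has added to the directory. The following is only a minimal sketch: the helper name new_files_after is hypothetical and not part of FSPMA, and the demonstration uses a stand-in function that writes a single placeholder file so that it runs without FSPMA installed.

```r
# Report which files a function call creates in a directory.
# `run` is any zero-argument function. With FSPMA installed it could be
#   function() fspma.wrapper("tstsgd_A.def")
# (new_files_after is a hypothetical helper, not an FSPMA function).
new_files_after <- function(run, dir = ".") {
  before <- list.files(dir)         # snapshot before the run
  run()                             # execute the analysis
  setdiff(list.files(dir), before)  # files that appeared during the run
}

# Stand-in demonstration that does not need FSPMA: the "analysis" just
# writes one placeholder file into a temporary directory.
demo_dir <- tempfile("fspma_demo_")
dir.create(demo_dir)
created <- new_files_after(
  function() writeLines("placeholder", file.path(demo_dir, "result.tsv")),
  dir = demo_dir
)
print(created)
```

For a real run one would start R in the directory that holds the definition file and the data files, and pass the fspma.wrapper call instead of the stand-in function.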

Further Information

FSPMA comes with extensive documentation. There are two tutorial-like technical reports: the first provides an overview and the second a detailed discussion of definition files. In addition, all user level functions of FSPMA are described in detail in the online help.

FSPMA is also used for teaching in my FSPMA lecture. In addition, I offer to supervise enthusiastic students in their master thesis projects if they want to help extend FSPMA's functionality by adding or improving graphical user interfaces or by interfacing to other libraries.

Acknowledgements

This work was done at the Department of Pathology and the Department of Genetics, University of Cambridge and funded by the BBSRC's Exploiting Genomics initiative under ref. 8/EGH16106, "Shared Genetic Pathways in Cell Number Control". FSPMA is joint work with Gos Micklem and Rob Furlong and relies heavily on Lorenz Wernisch's YASMA package.

References

(Small et al. 2005)
C. L. Small, J. E. Shima, M. Uzumcu, M. K. Skinner, and M. D. Griswold. Profiling gene expression during the differentiation and development of the murine embryonic gonad. Biol Reprod., 72(2):492–501, 2005.
(Sykacek et al. 2005)
P. Sykacek, R. Furlong, and G. Micklem. A Friendly Statistics Package for Microarray Analysis. Abstract and PDF available from Bioinformatics Advance Access. An early preprint is available here as PDF and gzipped PostScript.
(Sykacek & Furlong 2005)
P. Sykacek and R. Furlong. A FSPMA tutorial. Available as PDF and as gzipped PostScript.
(Sykacek 2005)
P. Sykacek. A reference to FSPMA definition files. Available as PDF and as gzipped PostScript.
(Wernisch et al. 2003)
L. Wernisch, S. L. Kendall, S. Soneji, A. Wietzorrek, T. Parish, J. Hinds, P. G. Butcher, and N. G. Stoker. Analysis of whole-genome microarray replicates using mixed models. Bioinformatics, 19(1):53–61, 2003.

Software repository for "The impact of quantitative microarray optimization on gene expression analysis"

P. Sykacek (1,2), D. P. Kreil (1), L. Meadows, R. Auburn, B. Fischer, S. Russell, and G. Micklem


1: joint first authorship, 2: corresponding author.

Software which implements the quantitative evaluation procedure for calibrating microarray laboratories proposed in "The impact of quantitative microarray optimization on gene expression analysis" (Sykacek et al., 2010, submitted) can be downloaded here. We provide bmc_code_supp_2010.zip as a zip archive and bmc_code_supp_2010.tgz as a gzipped tar archive. After downloading, the zip archive can be expanded by issuing the command "unzip bmc_code_supp_2010.zip" on the command line, and the gzipped tar archive by issuing the command "tar -xzf bmc_code_supp_2010.tgz". Alternatively, the file browsers of most modern operating systems can be used for locating and expanding these archives. After expanding, the current directory contains the new directory ./bmc_code_supp. Within it, the folder ./bmc_code_supp/eval/ contains data files and evaluation scripts which allow repeating all analysis steps of the quantitative calibration approach proposed in the paper.
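For readers who prefer to stay inside R, the same expansion can be sketched with the unzip() and untar() functions from R's utils package. This is only an illustration: the helper name expand_archive is hypothetical, and the example call assumes that the downloaded archive sits in the current working directory.

```r
# Expand a .zip, .tgz or .tar.gz archive into `exdir` using base R.
# (expand_archive is a hypothetical helper, not part of the code supplement.)
expand_archive <- function(archive, exdir = ".") {
  if (!file.exists(archive)) stop("archive not found: ", archive)
  if (grepl("\\.zip$", archive, ignore.case = TRUE)) {
    utils::unzip(archive, exdir = exdir)
  } else if (grepl("\\.(tgz|tar\\.gz)$", archive, ignore.case = TRUE)) {
    utils::untar(archive, exdir = exdir)
  } else {
    stop("unsupported archive type: ", archive)
  }
  # Return the extracted file names without printing them.
  invisible(list.files(exdir, recursive = TRUE))
}

# Run only if the download is present in the working directory.
if (file.exists("bmc_code_supp_2010.tgz")) {
  expand_archive("bmc_code_supp_2010.tgz")
  print(list.files("bmc_code_supp/eval"))
}
```

The result should be the same directory layout as with the command line tools described above.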

The data files were generated with FSPMA from the raw data (Blue-Fuse quantified images) provided here, using location removal as normalisation method. Other normalisation methods such as vsn did not alter the results.

Data files for analysis

bfs_fly2_lc_raweffdesc.tsv contains the effects description, which in this case is mainly used for identifying the samples that correspond to different hybridisation temperatures.
bfs_fly2_lc_rawlogG.tsv contains the expression values of all samples and genes for male Drosophila flies (dye swap resolved).
bfs_fly2_lc_rawlogR.tsv contains the expression values of all samples and genes for female Drosophila flies (dye swap resolved).
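Although the evaluation itself is done in MATLAB, the three data files are plain tab-separated text and can be inspected from R as well. The sketch below rests on assumptions: the helper name read_tsv_table is hypothetical, the presence of a header row has not been checked against the actual files (set header = FALSE if there is none), and the working directory is assumed to be ./bmc_code_supp/eval/.

```r
# Read one tab-separated table into a data frame.
# Whether the files carry a header row is an assumption; adjust `header`.
# (read_tsv_table is a hypothetical helper, not part of the code supplement.)
read_tsv_table <- function(path, header = TRUE) {
  utils::read.delim(path, header = header, check.names = FALSE,
                    stringsAsFactors = FALSE)
}

files <- c(effects = "bfs_fly2_lc_raweffdesc.tsv",
           male    = "bfs_fly2_lc_rawlogG.tsv",
           female  = "bfs_fly2_lc_rawlogR.tsv")

# Run only if all three files are present in the working directory.
if (all(file.exists(files))) {
  tables <- lapply(files, read_tsv_table)
  # Show rows and columns of each table.
  print(sapply(tables, dim))
  # The male and female expression tables are expected to have the same shape.
  stopifnot(identical(dim(tables$male), dim(tables$female)))
}
```

Comparing the dimensions of the two expression tables is a quick sanity check that the download and extraction were complete.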

MATLAB Scripts

There are two MATLAB scripts which allow redoing the calculations and evaluations proposed in the paper.

fly_eval_selections.m calculates the proposed quantitative measures, using several library functions provided in this code supplement. The only additional dependency is introduced by the ANOVA based gene rankings, which are calculated as well and depend on the MATLAB Statistics Toolbox. Users without the Statistics Toolbox are advised to comment out all relevant code fragments. For test purposes it is recommended to change the flag dosubsel=0; to dosubsel=1; to speed up calculations.
fly_plot_selections.m generates the tables and plots from the quantitative evaluation measures obtained with fly_eval_selections.m. It should be called after the evaluations with fly_eval_selections.m have completed.

All code provided with this code supplement, that is the scripts and the required library functions in the directory ../bmc_code_supp/mlablib/, is released under the GPL 2 license. This implies that anyone can modify and use the code under the conditions detailed in this license. An important implication of GPL 2 is that we take no responsibility for wrong conclusions or other damage which might be caused by using the proposed method or software. The library functions come with extensive documentation, which allows adapting the evaluation scripts discussed above to specific user needs.