The Microarray Centre also offers analysis and bioinformatics-related services. Our Bioinformatics Team provides
many services, such as discussion and analysis of your array-based project and experimental design assessments,
complimentary to UHNMAC customers and UHN researchers.
 Take a look through our databases: human CpG, mouse CpG and cDNA data centres
 Contact our Bioinformatics Manager, Carl Virtanen
 Current UHNMAC clients can use our secure User Portal to retrieve data and check the status of a project
Visit our Bioinformatics Website
Data analysis for microarray projects
A basic data analysis package is available on a per-project basis. Our data analysis service is also available to customers with microarray data obtained from other facilities. The basic data analysis package includes:
 consultation on experimental design (complimentary)
 extensive quality control, filtering, and normalisation
 statistical analyses such as clustering and t-tests/ANOVA using the R/Bioconductor and GeneSpring software packages
 data available via secure online User Portal
Pricing for advanced data analysis is also quoted on a per-project basis. Advanced data analysis can include:
 gene ontology
 pathway analysis
 literature searches
 sequence and SNP analysis from raw chromatograms
 BLAST searches
 multiple alignments and trees
 primer design
 antibody epitope prediction
 computer programming
We have also put together some information that you may find helpful. If you have any suggestions or additions, please contact us and let us know!

Look through resources and links to find solutions for your needs
Academic and open source
Commercial
Explanations of various terms used in bioinformatics
Hierarchical Clustering
K-means Clustering
SAM/PAM
Quick questions:
 How many replicates are needed for a microarray experiment?

As with any scientific experiment, the reliability of your data increases with the number of replicates performed. Of course, if you are using a limited source of RNA, the number of replicates you can run will also be limited. Most groups strive to have at least 3 replicates to allow for statistical analysis. For UHNMAC Service packages that include data analysis, a minimum of 3 replicates is required.
 I've done my microarray experiment and the Cy5/Cy3 ratios are all between 0.5 and 2. What did I do wrong?

It is possible that you did nothing wrong! When comparing two RNA samples, the majority of genes will not be differentially expressed and thus the majority will have ratios around 1. If you are sure that at least a few genes should be differentially expressed, you may have to repeat the experiment (and do a reciprocal labelling!) to verify the results.
 What is preprocessing?

Preprocessing is a step that extracts or enhances meaningful data characteristics and is often performed prior to analyzing or “processing” the data. General preprocessing techniques include log transformations, combining replicates, eliminating outliers, use of control spots, and normalisation.
 What is normalisation? What is the difference between global and subgrid normalisation?

Normalisation means adjusting microarray data to account for systematic differences across data sets. Most often, normalisation is used to account for the different dye efficiencies in a two-colour experiment.
Global normalisation takes into account all areas of the array during normalisation. Significant local effects can heavily influence this method. Subgrid normalisation calculates the normalisation factor for each subgrid independently, making this method far less sensitive to local variations on the array.
 Do you use housekeeping genes for normalisation?

Due to the controversy over what constitutes a housekeeping gene (for a given organism, tissue, condition, etc), we tend not to use housekeeping genes for normalisation.
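Normalising without housekeeping genes usually means deriving the normalisation factor from all spots, either across the whole array (global) or within each subgrid. The following is a minimal Python sketch of both approaches, assuming the data are already log2 ratios and that the median is used as the centring value; the function names and the example values are hypothetical and do not represent the UHNMAC pipeline.

```python
from statistics import median

def global_normalise(log_ratios):
    """Subtract the array-wide median log ratio from every spot."""
    centre = median(log_ratios)
    return [m - centre for m in log_ratios]

def subgrid_normalise(log_ratios, subgrid_ids):
    """Subtract each subgrid's own median log ratio from its spots."""
    by_grid = {}
    for m, g in zip(log_ratios, subgrid_ids):
        by_grid.setdefault(g, []).append(m)
    centres = {g: median(values) for g, values in by_grid.items()}
    return [m - centres[g] for m, g in zip(log_ratios, subgrid_ids)]

# Hypothetical log2 ratios: subgrid "B" carries a local bias of about +1.
log_ratios = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0]
subgrid_ids = ["A", "A", "A", "B", "B", "B"]

global_result = global_normalise(log_ratios)
subgrid_result = subgrid_normalise(log_ratios, subgrid_ids)

print("global :", [round(v, 2) for v in global_result])
print("subgrid:", [round(v, 2) for v in subgrid_result])
```

In this toy data set the global method leaves subgrid B shifted away from zero, because the local bias is averaged over the whole array, while the subgrid method centres each block on zero.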
 What is LOWESS?

LOWESS, also known as LOESS, stands for LOcally WEighted Scatterplot Smoothing and is a form of locally weighted polynomial regression. The general idea for this kind of normalisation is to fit a mathematical function through the data to obtain a model of the distortion, and then use this model to adjust the data. The LOWESS function is a curve-fitting equation. It performs a local fit to the data in an intensity-dependent manner. The intensity value for each spot is normalised based on the data distribution in the immediate neighbourhood of the spot's intensity.
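The local fitting idea can be sketched in a few lines of Python. This is a simplified illustration, not a full LOWESS implementation: it assumes a local straight-line fit with tricube weights and omits the robustness iterations of the published method. The function name, the `frac` parameter, and the data are hypothetical.

```python
def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear fit of y on x (no robustness iterations)."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (itself included).
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights: nearby points count most, points at or beyond h count zero.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical spots: a_values is the average log2 intensity, m_values the log2 ratio.
a_values = [float(v) for v in range(6, 16)]
m_values = [0.8 - 0.05 * a for a in a_values]  # purely intensity-dependent bias

trend = lowess_fit(a_values, m_values)
m_normalised = [m - t for m, t in zip(m_values, trend)]
print([round(v, 6) for v in m_normalised])
```

Subtracting the fitted trend from each log ratio removes the intensity-dependent distortion. For real data, an established routine (for example in R/Bioconductor) would normally be preferred over a hand-written fit.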
 Why log ratios? Why log base 2?

The logarithmic transformation provides values that are more easily interpretable and more biologically meaningful. It is convenient to log transform numbers in order to eliminate misleading disproportion between two relative changes. For example, assume two spots both have intensity values of 1000 in the control sample, and values of 100 and 10,000 in the treated sample. The absolute difference between the control and treated samples is 900 and 9000, respectively, for the two spots. However, from a biological point of view, the phenomenon is the same: a 10-fold change in both genes (a 10-fold decrease for one gene, and a 10-fold increase for the other). By using a log transformation, fold changes happening around small intensity values will be comparable to fold changes happening around large intensity values. In this example, using log base 10, one gene has a log ratio of –1 and the other of 1.
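The worked example above can be checked with a short Python sketch. The variable names are illustrative only, and base 10 is used here because it matches the 10-fold changes in the example.

```python
import math

control = 1000
treated = [100, 10000]  # the two spots from the example

differences = [abs(t - control) for t in treated]
ratios = [t / control for t in treated]
log10_ratios = [math.log10(r) for r in ratios]

print("absolute differences:", differences)
print("ratios:", ratios)
print("log10 ratios:", [round(v, 6) for v in log10_ratios])
```

The absolute differences are very unequal, but the two log ratios have the same magnitude and opposite signs.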
Log base 2 has the advantage of producing a continuous spectrum of values and treating up- and down-regulated genes in a similar way. A gene up-regulated by a factor of 2 has a log2(ratio) of 1, a gene down-regulated by a factor of 2 has a log2(ratio) of –1, and a gene with no change in expression (ratio of 1) has a log2(ratio) equal to zero. The log base 2 transformation is convenient and makes further analysis and data interpretation easier.
 What is one-way ANOVA? Two-way ANOVA?

ANOVA stands for Analysis of Variance. The idea behind ANOVA is to compare the variability between groups with the variability within groups. One-way ANOVA investigates the data by considering only one factor, or in other words, only one way of partitioning the data into groups. Two-way ANOVA considers that the data can be grouped by two factors at once.
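As a rough illustration of the one-way case, the Python sketch below computes the F statistic, which is the ratio of the between-group mean square to the within-group mean square. The function name and the expression values are hypothetical, and the sketch stops at the F statistic: obtaining a p-value requires the F distribution, which a statistics package would supply.

```python
def one_way_anova_f(groups):
    """Return the F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical log2 expression of one gene: three conditions, three replicates each.
condition_a = [1.0, 2.0, 3.0]
condition_b = [2.0, 3.0, 4.0]
condition_c = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f([condition_a, condition_b, condition_c])
print("F =", round(f_stat, 3))
```

A large F value indicates that the group means differ by more than the scatter within the groups would suggest.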
 What is an M versus A plot?

M is defined as log2(LexE/LexR) and A as (log2(LexE × LexR))/2, where LexE and LexR are the two channel intensities measured for a spot. This plot, as opposed to a log vs log plot, allows for the rapid identification of skewed data. Data points in a perfectly normalised data set will be centred on the M=0 axis.
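The two quantities are straightforward to compute. The Python sketch below reads the formulas above as M = log2(LexE/LexR) and A = log2(LexE × LexR)/2, with LexE and LexR taken to be the two channel intensities of a spot; that reading, the function name, and the example intensities are assumptions made for illustration.

```python
import math

def ma_values(lex_e, lex_r):
    """Return (M, A) for one spot from its two channel intensities."""
    m = math.log2(lex_e / lex_r)      # log ratio: difference between channels
    a = math.log2(lex_e * lex_r) / 2  # average log intensity of the spot
    return m, a

# Hypothetical spots: (LexE, LexR) intensity pairs.
spots = [(1024, 1024), (2048, 512), (300, 1200)]
for lex_e, lex_r in spots:
    m, a = ma_values(lex_e, lex_r)
    print(f"LexE={lex_e:5d} LexR={lex_r:5d}  M={m:+.2f}  A={a:.2f}")
```

Plotting M against A for every spot gives the M versus A plot; a spot with equal intensity in both channels lies on M = 0.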
 What do you do with saturated spots?

In a 16-bit TIFF file, spots with an intensity value of 65,535 (the largest value that 16 bits can hold) are considered saturated. The true intensity of these spots is unknown, so they are flagged and often excluded from further analysis.
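Flagging can be sketched as a simple threshold test. The largest value an unsigned 16-bit pixel can store is 2^16 - 1 = 65,535, so the Python sketch below treats any intensity at or above that limit as saturated; the function name, spot names, and intensities are hypothetical.

```python
SATURATION_LIMIT = 2 ** 16 - 1  # 65,535: largest value of an unsigned 16-bit pixel

def flag_saturated(intensities, limit=SATURATION_LIMIT):
    """Return the names of spots whose intensity has reached the limit."""
    return [name for name, value in intensities.items() if value >= limit]

# Hypothetical spot intensities from one channel.
intensities = {"spot_1": 1520, "spot_2": 65535, "spot_3": 40210}

saturated = flag_saturated(intensities)
usable = {name: v for name, v in intensities.items() if name not in saturated}
print("saturated:", saturated)
print("usable:", sorted(usable))
```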
 What are distance metrics? (Euclidean, Manhattan)

A distance metric, also known as a similarity metric, is a function that takes two points (x and y) in an n-dimensional space and has the following properties: symmetry, positivity, and the triangle inequality. The Euclidean distance is the straight-line (shortest) distance between x and y, while the Manhattan (city block) distance is one in which movement can only be parallel to the coordinate axes.
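Both distances are easy to compute directly. The Python sketch below uses hypothetical function names and points, and works for points of any dimension as long as both have the same length.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City block distance: sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(x, y))

x, y, z = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

print("Euclidean:", euclidean(x, y))
print("Manhattan:", manhattan(x, y))

# Symmetry and the triangle inequality for these three points.
print(euclidean(x, y) == euclidean(y, x))
print(euclidean(x, z) <= euclidean(x, y) + euclidean(y, z))
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points.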
 What is PCA?

Principal Component Analysis (PCA) is a numerical procedure carried out to reduce the dimensionality of the data set, identify new meaningful underlying variables, and magnify the trends in the data (increasing the separation of poorly correlated elements and bringing highly correlated elements closer together). PCA rotates the data space, aligning the directions of the greatest variability in the data (the first and second principal components) with the x and y axes of the scatter plot.
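For two-dimensional data the rotation can be sketched without any libraries. The Python example below is a minimal illustration that finds only the first principal component by power iteration on the covariance matrix; the function name, starting vector, and data are hypothetical. It assumes the data have non-zero variance and that the first component is not perpendicular to the starting vector (1, 1). Real analyses would normally use a library routine.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit vector of greatest variance, centred points) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    # Entries of the 2 x 2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centred) / (n - 1)
    cyy = sum(y * y for _, y in centred) / (n - 1)
    cxy = sum(x * y for x, y in centred) / (n - 1)
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vx, vy = 1.0, 1.0
    for _ in range(iterations):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centred

# Hypothetical data lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

(pc_x, pc_y), centred = first_principal_component(points)
scores = [x * pc_x + y * pc_y for x, y in centred]  # coordinates along PC1
print("PC1 direction:", round(pc_x, 3), round(pc_y, 3))
print("PC1 scores:", [round(s, 2) for s in scores])
```

Projecting the centred points onto this direction gives their coordinates along the first principal component, which is the axis that captures most of the spread in this data set.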
 How can I validate my microarray results?

Validation is most often performed by real-time (quantitative) PCR. Validation can also be performed using NanoString assays, BioPlex assays, and the Ziplex platform. More information about data validation services offered at the UHNMAC can be found here.
 What is MIAME?

Minimum Information About a Microarray Experiment (MIAME) is a set of guidelines that outlines the minimum information required to interpret unambiguously, and possibly verify, microarray experiments. Visit the Microarray Gene Expression Data Society (MGED) for more information.
Brazma, A, et al. Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics, 2001, 29(4):365-371
 What is gene ontology?

Gene Ontology (GO) is a controlled vocabulary to describe gene and gene product attributes of any organism. The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The three organising principles of GO are molecular function, biological process and cellular component.