Elsevier

Methods in Enzymology

Volume 411, 2006, Pages 134-193
[9] TM4 Microarray Software Suite

https://doi.org/10.1016/S0076-6879(06)11009-5

Abstract

Powerful specialized software is essential for managing, quantifying, and ultimately deriving scientific insight from results of a microarray experiment. We have developed a suite of software applications, known as TM4, to support such gene expression studies. The suite consists of open‐source tools for data management and reporting, image analysis, normalization and pipeline control, and data mining and visualization. An integrated MIAME‐compliant MySQL database is included. This chapter describes each component of the suite and includes a sample analysis walk‐through.

Introduction

The Human Genome Project was envisioned as a grand endeavor that would change biology by providing a catalog of genes in humans and other model organisms. Although a large number of genome sequencing projects, including that of the human genome, have been declared finished, the collection of the sequence itself has not fundamentally altered our approach to understanding biological systems. Rather, it has been the development of techniques and technologies that allow us to analyze patterns of expression for sets of genes, proteins, or metabolites approaching the total number that are active in an organism at any given point in time.

Since their introduction in 1995 (Lipshutz 1995, Schena 1995), DNA microarrays have matured significantly to become the most widely used technique for the analysis of global patterns of expression and represent a technology that is now used routinely as a means of generating testable hypotheses prior to other studies. DNA microarrays consist of an arrayed collection of probes bound to a solid substrate that are used to interrogate the levels of gene expression using hybridization to labeled nucleic acids and detection of those hybridization events. Although microarray technology is still evolving, the development of robust and reliable commercial platforms, combined with a significant decrease in the cost of an assay, has resulted in an explosion of gene expression data. The challenge of doing an expression profiling experiment is no longer in the generation of data, but rather in effectively capturing the information and using it to explore the biology of the systems under study.

In that regard, the role of software in a study involving microarrays cannot be overstated. Specialized tools are available to complement the experimental procedure and subsequent data analysis. Data management software is used to capture vital information describing the laboratory portion of a microarray experiment. Scanned microarray slides are processed and quantified using image analysis software. Normalization utilities ready data for comparisons and further analysis. Data mining and visualization tools can then help explore data from many perspectives. When used together, such software becomes a system to maximize the utility of the microarray experiment and gain better insight into the biology of interest.

We have developed a suite of software applications to support gene expression studies. This suite, called TM4, consists of a comprehensive set of tools that allow users to collect, manage, and effectively analyze data from microarray experiments. This chapter describes the TM4 suite and each of its components. The chapter concludes with an example analysis using a real data set and several analysis techniques.

The four major applications of TM4 are Madam, Spotfinder, Midas, and MeV. Each application in the suite is publicly and freely available, including its source code, which is OSI‐certified open source under the Artistic License (http://www.opensource.org/licenses/artistic-license.php).

Madam is the primary data entry, tracking, and reporting system of TM4. A series of data entry forms provide users with an organized method of recording their experimental parameters and data. Query and reporting tools present important data on a variety of entities, such as a single hybridization or an entire study. This application also serves as a repository for other tools in the data management and reporting realm. These include a polymerase chain reaction (PCR) scoring and microtiter plate loading utility, a study design tool, and a free‐form SQL query window. Madam works closely in conjunction with a MIAME‐compliant relational database to carry out its functions. The role of such a database is described elsewhere (Troein et al., 2006).

Spotfinder is a multichannel image analysis tool. This application provides the means to load the output of a microarray scanning operation—typically a pair of 16‐bit tagged image format file (TIFF) images (Timlin, 2006). Semiautomatic grid construction and several methods to adjust the placement of each grid cell manually allow for accurate spot detection. The intensity of each spot can then be quantified and written to an output file along with related spot parameters and flags (Minor, 2006). A number of quality control displays are available, helping users detect systemic issues in slide production.

Midas is a normalization and filtering tool used to process raw data output from Spotfinder and prepare it for further analysis and data mining. Users create a project file by chaining together multiple normalization, filtering, and quality control (QC) modules in a graphical workflow builder. The input options make it possible to process a single slide, a pair of slides, or a whole study's worth of raw expression data consistently. A graphing system illustrates the effects of normalization with a variety of detailed plots. These graphs can be embedded in a Midas summary report, a PDF‐formatted file that also contains a description of the data processing procedure used.
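
The idea of chaining processing modules can be sketched in a few lines of Python. This is a minimal illustration only, not the Midas project file format or its implementation; the step names (low_intensity_filter, scale_channel_b), the spot fields "a" and "b", and the sample values are all hypothetical.

```python
def low_intensity_filter(threshold):
    """Build a step that drops spots whose intensity is below threshold in either channel."""
    def step(spots):
        return [s for s in spots if s["a"] >= threshold and s["b"] >= threshold]
    step.__name__ = f"low_intensity_filter({threshold})"
    return step


def scale_channel_b(factor):
    """Build a step that rescales channel b by a constant factor (hypothetical module)."""
    def step(spots):
        return [{**s, "b": s["b"] * factor} for s in spots]
    step.__name__ = f"scale_channel_b({factor})"
    return step


def run_pipeline(spots, steps):
    """Apply each step in order and record how many spots remain after it."""
    log = []
    for step in steps:
        spots = step(spots)
        log.append((step.__name__, len(spots)))
    return spots, log


# Hypothetical raw data: one dict per spot, one intensity per channel.
raw_spots = [
    {"id": 1, "a": 500, "b": 1000},
    {"id": 2, "a": 20, "b": 900},
    {"id": 3, "a": 700, "b": 350},
]

processed, step_log = run_pipeline(
    raw_spots, [low_intensity_filter(100), scale_channel_b(0.5)]
)
```

Because the same ordered list of steps can be applied to any number of input files, a chain of this kind gives every slide in a study identical treatment, and the step log doubles as a record of the procedure used.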

MeV is the main data analysis and visualization tool of TM4. Users can load raw or normalized data from a variety of input file types. A broad range of algorithms is available, including those for clustering, classification, and statistical tests. The graphical interface simplifies navigation between algorithm results. An integrated scripting interface and an XML‐based script format provide a means to analyze data sets in a regimented and reproducible fashion.

Although these applications were designed with interconnectivity in mind, each can be used independently of the others. In addition to the .mev format of TM4 (tab‐delimited text with standardized column headers and comment rows), several other popular input and output formats are supported. While originally designed for two‐dye fluorescent microarray systems, TM4 has been expanded to support other technologies, such as the Affymetrix GeneChip platform (Dalma‐Weiszhauz et al., 2006).
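
A file laid out as tab‐delimited text with a header row and comment rows can be read with standard tools. The sketch below assumes that comment rows begin with "#" and that the first non‐comment row holds the column headers; the column names in the sample are illustrative and are not taken from the .mev specification.

```python
import csv
import io


def read_tab_delimited(text, comment_prefix="#"):
    """Split tab-delimited text into (comments, rows).

    Assumptions: comment rows start with `comment_prefix`, and the first
    non-comment row contains the column headers.
    """
    comments = []
    data_lines = []
    for line in text.splitlines():
        if line.startswith(comment_prefix):
            comments.append(line[len(comment_prefix):].strip())
        elif line.strip():
            data_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return comments, list(reader)


# Hypothetical sample; these headers are placeholders, not real .mev headers.
SAMPLE = (
    "# version: example\n"
    "spot_id\tintensity_a\tintensity_b\n"
    "1\t1200\t2400\n"
    "2\t800\t790\n"
)

comments, rows = read_tab_delimited(SAMPLE)
```

Values are returned as strings keyed by header name, so a caller would convert the intensity columns to numbers before further processing.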

A SourceForge web site (http://sourceforge.net/projects/tm4) serves as the central code repository for TM4. This site also hosts the application downloads, user mailing lists, and discussion forums. The TM4 development team actively provides technical support via email. System requirements for each application are detailed in the documentation included with the download. The entire TM4 suite, including software, documentation, and sample data, can be downloaded from http://www.tm4.org.

The TM4 suite was originally developed at The Institute for Genomic Research, under the direction of principal investigator Dr. John Quackenbush. Grants to Dr. Quackenbush for TM4 development were provided by The National Cancer Institute, The National Science Foundation, The National Heart, Lung and Blood Institute, and the NHLBI's Programs for Genomics Applications (PGA). Details regarding the ongoing development of TM4 and the teams responsible are available at the aforementioned SourceForge site. Beyond the main TM4 development team, many organizations and individuals have contributed to this open source project. Their contributions and affiliations are listed in the documentation for each application. The development of TM4 continues through collaborative efforts of groups worldwide, but with work now concentrated at three primary sites: John Quackenbush and his group at the Dana‐Farber Cancer Institute and Harvard School of Public Health; members of the Pathogen Functional Genomics Resource Center's microarray software group at The Institute for Genomic Research; and Roger Bumgarner and his group at the University of Washington.

Section snippets

MADAM

Madam (also referred to as MADAM) is the data manager of TM4. It handles the tasks of data entry, tracking, and reporting while serving as an interface to a relational microarray database. Madam offers a series of data entry pages, which provide the user an easy method to load the database with information about their microarray experiments. Several report types display vital information about various stages of the experiment and let the user track the progress. Madam also houses several

Spotfinder

Image processing is a key component of the microarray experiment (Minor 2006, Timlin 2006). Each two‐color spotted microarray slide will typically produce two gray‐scale 16‐bit images in TIFF format. Each image corresponds to a single labeling dye such that the two images complement each other spatially and need to be processed in parallel. The microarray TIFF image is the end product of the portions of the microarray experiment conducted in the laboratory.
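
Processing the two images in parallel means applying the same cell geometry to both channels. The following sketch shows the idea on tiny synthetic images held as nested lists. The background estimate (median of the cell's border pixels) and the function names are assumptions chosen for illustration; they do not describe the method Spotfinder uses.

```python
from statistics import median


def quantify_cell(image, top, left, size):
    """Return the background-corrected integrated intensity of one square cell.

    Assumption: background is the median of the cell's border pixels, and the
    signal is summed over the interior pixels after subtracting it.
    """
    cell = [row[left:left + size] for row in image[top:top + size]]
    border = [
        cell[r][c]
        for r in range(size)
        for c in range(size)
        if r in (0, size - 1) or c in (0, size - 1)
    ]
    background = median(border)
    interior = [
        cell[r][c] for r in range(1, size - 1) for c in range(1, size - 1)
    ]
    return sum(max(p - background, 0) for p in interior)


def quantify_both_channels(image_a, image_b, top, left, size):
    """Apply identical cell coordinates to both channel images."""
    return (
        quantify_cell(image_a, top, left, size),
        quantify_cell(image_b, top, left, size),
    )


# Synthetic 4x4 "images" (hypothetical values within the 16-bit range).
CHANNEL_A = [
    [100, 100, 100, 100],
    [100, 900, 1100, 100],
    [100, 1000, 1000, 100],
    [100, 100, 100, 100],
]
CHANNEL_B = [
    [50, 50, 50, 50],
    [50, 250, 250, 50],
    [50, 250, 250, 50],
    [50, 50, 50, 50],
]

signal_a, signal_b = quantify_both_channels(CHANNEL_A, CHANNEL_B, 0, 0, 4)
```

Using one set of coordinates for both images is what keeps the two measurements for a spot spatially matched.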

Despite being digital media, in

Program Settings

Check the program settings and change them based on the actual slide type; if the number of channels is changed, it is necessary to close and restart the program. The user also may change the visualization scale factor depending on the image size and available video card memory; it is recommended to keep default settings for the initial use. Use the menu bar to go to Settings→General Settings. Make sure that the Channel Number is set to 2 for a two‐dye experiment and that the Scale Factor also

MIDAS

Microarray analysis is a comparative analysis. In a two‐color experiment, cDNA or mRNA abundances are compared between two samples. During a microarray experiment, the different samples are dyed with Cy3 (green) and Cy5 (red) fluorophores and are cohybridized to a glass slide. After scanning the slide and performing an image‐processing procedure, the intensities for each spot, for both green and red channels, are recorded.
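
The comparison between channels is commonly expressed as a log2 ratio of red to green intensity for each spot. The sketch below computes these ratios and, purely as an illustration, rescales the Cy5 channel so that both channels have the same summed intensity before the ratio is taken. That rescaling is one simple adjustment chosen here as an assumption; it is not presented as the procedure Midas applies, and the intensity values are invented.

```python
import math


def log2_ratios(cy5, cy3):
    """Return log2(Cy5 / Cy3) for each spot."""
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]


def equalize_total_intensity(cy5, cy3):
    """Rescale Cy5 so that its summed intensity matches that of Cy3.

    Illustrative assumption: a single global scaling factor is adequate.
    """
    factor = sum(cy3) / sum(cy5)
    return [red * factor for red in cy5], list(cy3)


# Hypothetical spot intensities for four spots.
cy5 = [2000.0, 4000.0, 1000.0, 1000.0]
cy3 = [1000.0, 1000.0, 1000.0, 1000.0]

raw = log2_ratios(cy5, cy3)
scaled_cy5, scaled_cy3 = equalize_total_intensity(cy5, cy3)
adjusted = log2_ratios(scaled_cy5, scaled_cy3)
```

A log2 ratio of 1 corresponds to a twofold higher signal in the Cy5 channel and a ratio of -1 to a twofold lower signal, which makes increases and decreases symmetric around zero.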

An underlying assumption in microarray analysis is that differences

MeV

After spot scanning and normalization comes the data analysis step that is usually of most interest to microarray practitioners, namely mining data to look for biologically significant patterns of gene expression (Ayroles 2006, Downey 2006; Neal and Westwood, Reimers 2006, Royce 2006). The MeV (MultiExperiment Viewer) software incorporates an extensive array of clustering, statistical, and visualization tools that can be used to analyze preprocessed microarray data. An intuitive and

Sample Analysis Walk‐Through

This section presents a sample Midas and MeV analysis that takes data through filtering and normalization, clustering and statistical analysis, and on to biological role analysis. To take full advantage of this walk‐through it is best to download the applications and the sample data set so that one can follow along. Data for this analysis walk‐through can be downloaded from this ftp site: ftp://ftp.tigr.org/pub/software/Microarray/MeV/MIE_data/. Each section indicates the proper files to use to

References (62)

  • T.E. Royce et al.

    Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies

    Methods Enzymol.

    (2006)
  • J.A. Timlin

    Scanning microarrays: Current methods and future directions

    Methods Enzymol.

    (2006)
  • C. Troein et al.

    An introduction to BioArray Software Environment

    Methods Enzymol.

    (2006)
  • P.L. Whetzel et al.

    Using ontologies to annotate microarray experiments

    Methods Enzymol.

    (2006)
  • M. Ashburner et al.

    Gene ontology: Tool for the unification of biology

    Nature Genet.

    (2000)
  • A. Ben‐Dor et al.

    Clustering gene expression patterns

    J. Comput. Biol.

    (1999)
  • A. Brazma et al.

    Minimum information about a microarray experiment (MIAME)‐toward standards for microarray data

    Nature Genet.

    (2001)
  • A. Brazma et al.

    ArrayExpress: A public repository for microarray gene expression data at the EBI

    Nucleic Acids Res.

    (2003)
  • M.P. Brown et al.

    Knowledge‐based analysis of microarray gene expression data by using support vector machines

    Proc. Natl. Acad. Sci. USA

    (2000)
  • Y. Chen et al.

    Ratio‐based decisions and the quantitative analysis of cDNA microarray images

    J. Biomed. Optics

    (1997)
  • G. Chu et al.

    SAM “Significance Analysis of Microarrays.”

    (2002)
  • A.C. Culhane et al.

    Between‐group analysis of microarray data

    Bioinformatics

    (2002)
  • D.D. Dalma‐Weiszhauz et al.

    The Affymetrix GeneChip® platform: An overview

    Methods Enzymol.

    (2006)
  • J. Dopazo et al.

    Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree

    J. Mol. Evol.

    (1997)
  • S. Dudoit et al.

    Multiple hypothesis testing in microarray experiments

    Stat. Sci.

    (2003)
  • S. Dudoit et al.

    Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments

    (2000)
  • R. Edgar et al.

    Gene Expression Omnibus: NCBI gene expression and hybridization array data repository

    Nucleic Acids Res.

    (2002)
  • M.B. Eisen et al.

    Cluster analysis and display of genome‐wide expression patterns

    Proc. Natl. Acad. Sci. USA

    (1998)
  • K. Fellenberg et al.

    Correspondence analysis applied to microarray data

    Proc. Natl. Acad. Sci. USA

    (2001)
  • D.B. Finkelstein et al.

    Iterative linear regression by sector: Renormalization of cDNA microarray data and cluster analysis weighted by cross homology

  • J. Gollub et al.

    Clustering microarray data

    Methods Enzymol.

    (2006)