Large-scale simultaneous inference with applications to the detection of differential expression with. (with discussion)


  • Geoffrey J. McLachlan University of Queensland
  • Kent Wang University of Queensland
  • Shu-Kay Ng University of Queensland



An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We consider a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or, with more specific assumptions, are computationally intensive. By converting the value of the test statistic used to test the significance of each gene to a z-score, we can use a simple two-component normal mixture to model the distribution of this score adequately. In the context of the application of this approach to a well-known breast cancer data set, we consider some of the issues associated with the detection of differential expression, including the case where an empirical null distribution is needed in place of the standard normal (the theoretical null) and the case where none of the genes might be differentially expressed. We also describe briefly some initial results on a cluster analysis approach to this problem, which attempts to model the joint distribution of the individual gene expressions. This latter approach thus has to make distributional assumptions that are not necessary with the former approach based on the z-scores. However, in the case where the distributional assumptions are valid, it has the potential to provide a more powerful analysis.
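As an illustration of the z-score approach summarised above, the following is a minimal sketch and not the authors' implementation. It assumes that each gene's test statistic has already been reduced to a P-value, that the z-score is taken as the standard normal quantile of one minus the P-value (so that small P-values give large positive scores), and that the null component is fixed at the theoretical N(0, 1) null. The function names, starting values, and iteration count are hypothetical choices made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def p_to_z(p):
    """Convert a P-value to a z-score (assumed convention: small p -> large z)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return STD_NORMAL.inv_cdf(1.0 - p)


def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_null(z, pi0, mu1, sigma1):
    """Posterior probability that a gene with score z belongs to the null component."""
    f0 = pi0 * normal_pdf(z, 0.0, 1.0)
    f1 = (1.0 - pi0) * normal_pdf(z, mu1, sigma1)
    return f0 / (f0 + f1)


def fit_two_component(z, n_iter=200):
    """EM for pi0 * N(0, 1) + (1 - pi0) * N(mu1, sigma1^2), null held fixed."""
    pi0, mu1, sigma1 = 0.9, 2.0, 1.0  # hypothetical starting values
    for _ in range(n_iter):
        # E-step: posterior probability of the null for every score
        tau0 = [posterior_null(zj, pi0, mu1, sigma1) for zj in z]
        # M-step: update the mixing proportion and the non-null component
        w1 = [1.0 - t for t in tau0]
        s1 = max(sum(w1), 1e-12)
        pi0 = sum(tau0) / len(z)
        mu1 = sum(w * zj for w, zj in zip(w1, z)) / s1
        var1 = sum(w * (zj - mu1) ** 2 for w, zj in zip(w1, z)) / s1
        sigma1 = max(math.sqrt(var1), 1e-3)  # guard against a collapsing variance
    return pi0, mu1, sigma1


if __name__ == "__main__":
    # Simulated scores (not real data): 900 null genes and 100 non-null genes.
    random.seed(1)
    scores = [random.gauss(0.0, 1.0) for _ in range(900)]
    scores += [random.gauss(3.0, 1.0) for _ in range(100)]
    pi0, mu1, sigma1 = fit_two_component(scores)
    print("estimated pi0:", round(pi0, 3))
    print("posterior null probability at z = 4:", posterior_null(4.0, pi0, mu1, sigma1))
```

Genes would then be flagged as differentially expressed when their estimated posterior probability of being null falls below a chosen threshold. Where the theoretical null fits poorly, the same EM steps could be extended to estimate the mean and variance of the null component as well, which corresponds to using an empirical null.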






How to Cite

McLachlan, G. J., Wang, K., & Ng, S.-K. (2008). Large-scale simultaneous inference with applications to the detection of differential expression with. (with discussion). Statistica, 68(1), 3–30.