Evaluating sensitivity and specificity of three diagnostic tests when the “gold standard” is unavailable


  • Dario Basso Università degli Studi di Padova
  • Katia Capello Istituto Zooprofilattico Sperimentale delle Venezie
  • Livio Corain Università degli Studi di Padova
  • Luigi Salmaso Università degli Studi di Padova




Diagnostic tests may be assessed through two indicators of diagnostic reliability, sensitivity and specificity. In practice, these indicators can be estimated only if a “gold standard” test is available, that is, a test whose diagnosis is the most reliable one available as to the prevalence of an illness in a population.
Starting from a real case study related to cattle Q fever disease in small ruminants, the aim of this work is to determine which of the three examined diagnostic tests is the best, given that there is neither a priori information on the sensitivity and specificity of the three tests nor a reference “gold standard” diagnostic test. Moreover, the incidence of the disease in the reference population is unknown.
Our approach, which is mainly descriptive in nature, derives estimates of the sensitivity and specificity of the diagnostic tests from the incidence of the disease. The estimates are obtained by minimizing a least-squares criterion, and a simulation study shows that, on average, the method provides unbiased estimates of the unknown parameters. Applying the method to the real case study makes it possible to establish a hierarchy among the three diagnostic tests in question.
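
The abstract does not spell out the model behind the least-squares fit. As a minimal sketch only, the Python snippet below assumes that the three tests are conditionally independent given the true disease status, so that the probability of each of the eight result patterns is a function of the prevalence and of the three sensitivities and specificities. The function names, the parameter ordering and the numerical values are hypothetical and are not taken from the paper.

```python
from itertools import product


def cell_probabilities(prevalence, sens, spec):
    """Probability of each result pattern (1 = positive, 0 = negative).

    Assumption (not stated in the abstract): the tests are conditionally
    independent given the true disease status.
    """
    probs = {}
    for pattern in product((0, 1), repeat=len(sens)):
        p_diseased = prevalence        # contribution of diseased subjects
        p_healthy = 1.0 - prevalence   # contribution of healthy subjects
        for result, se, sp in zip(pattern, sens, spec):
            p_diseased *= se if result == 1 else 1.0 - se
            p_healthy *= 1.0 - sp if result == 1 else sp
        probs[pattern] = p_diseased + p_healthy
    return probs


def least_squares_loss(params, observed_freq):
    """Sum of squared differences between observed and model frequencies.

    params = [prevalence, se1, se2, se3, sp1, sp2, sp3] (hypothetical order).
    """
    prevalence, sens, spec = params[0], params[1:4], params[4:7]
    expected = cell_probabilities(prevalence, sens, spec)
    return sum((observed_freq[k] - expected[k]) ** 2 for k in expected)


# Illustrative values only: they are not estimates from the case study.
true_params = [0.2, 0.9, 0.8, 0.7, 0.95, 0.9, 0.85]
observed = cell_probabilities(true_params[0], true_params[1:4], true_params[4:7])

print(least_squares_loss(true_params, observed))  # loss at the generating values
print(least_squares_loss([0.5] * 7, observed))    # loss at an arbitrary guess
```

In this sketch the loss vanishes at the generating parameter values and is positive at the arbitrary guess. A bound-constrained optimizer that keeps all seven parameters in [0, 1] could then be applied to `least_squares_loss`; which optimizer and which exact criterion the authors used is not stated in the abstract.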






How to Cite

Basso, D., Capello, K., Corain, L., & Salmaso, L. (2009). Evaluating sensitivity and specificity of three diagnostic tests when the “gold standard” is unavailable. Statistica, 69(1), 15–26. https://doi.org/10.6092/issn.1973-2201/3545