Italian contributions on some recent research topics in cluster analysis


  • Daniela Giovanna Calò, Alma Mater Studiorum - Università di Bologna



The paper presents a selective view of the issues currently attracting the interest of Italian statisticians working on clustering methods and applications. It does not aim to provide a comprehensive overview of the wealth of methods developed in Italy on the selected topics: it focuses on methods for quantitative data and, within this context, only on the most recent literature. The common thread is the set of developments in quantitative data clustering inspired by the complex nature of the data now arising in a broad range of applications.


G. ADELFIO, M. CHIODI, A. D’ALESSANDRO, D. LUZIO, (2010), Clustering of waveforms-data based on FPCA direction, in “Proceedings of COMPSTAT 2010”, Physica-Verlag.

M. ALFÒ, L. NIEDDU, D. VICARI, (2009), Finite mixture models for mapping spatially dependent disease counts, “Biometrical Journal”, 51, pp. 84-97.

A.C. ATKINSON, M. RIANI, (2007), Exploratory tools for clustering multivariate data, “Computational Statistics and Data Analysis”, 52, pp. 272-285.

A.C. ATKINSON, M. RIANI, A. CERIOLI, (2010), The Forward Search: theory and data analysis, “Journal of the Korean Statistical Society”, 39, pp. 117-134.

L. AUGUGLIARO, A. MINEO, (2011), Plaid model for microarray data: an enhancement of the pruning step, in B. Fichet et al. (eds.) “Classification and multivariate analysis for complex data structures”, pp. 447-456. Springer, Heidelberg.

J. BAEK, G.J. MCLACHLAN, L. FLACK, (2010), Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data, “IEEE Transactions on Pattern Analysis and Machine Intelligence”, 32, pp. 1298-1309.

S. BALBI, R. MIELE, G. SCEPI, (2010), Clustering of documents from a two-way viewpoint, in “JADT 2010: 10th International Conference on Statistical Analysis of Textual Data”.

A. BALZANELLA, Y. LECHEVALLIER, R. VERDE, (2011), Clustering multiple data streams, in S. Ingrassia, et al. (eds.) “New Perspectives in Statistical Modeling and Data Analysis”, Springer.

J.D. BANFIELD, A.E. RAFTERY, (1993), Model-based Gaussian and non-Gaussian clustering, “Biometrics”, 49, pp. 803-821.

R. BARAGONA, (2010), Dissimilarity indexes for clustering multivariate time series, available at

R. BARAGONA, F. BATTAGLIA, I. POLI, (2011), Evolutionary Statistical Procedures, Springer-Verlag, Heidelberg.

F. BARTOLUCCI, (2005), Clustering univariate observations via mixtures of unimodal normal mixtures, “Journal of Classification”, 22, pp. 203-219.

J.P. BAUDRY, A.E. RAFTERY, G. CELEUX, K. LO, R. GOTTARDO, (2010), Combining mixture components for clustering, “Journal of Computational and Graphical Statistics”, 19, pp. 332-353.

D.G. CALÒ, C. VIROLI, (2010), A dimensionally reduced finite mixture model for multilevel data, “Journal of Multivariate Analysis”, 101, pp. 2543-2553.

A. CERIOLI, (2010), Multivariate outlier detection with high-breakdown estimators, “Journal of the American Statistical Association”, 105, pp. 147-156.

R. COPPI, P. D’URSO, P. GIORDANI, (2010), A fuzzy clustering model for multivariate spatial time series, “Journal of Classification”, 27, pp. 54-88.

R. COPPI, P. D’URSO, P. GIORDANI, (2011), Fuzzy and possibilistic clustering for fuzzy data, “Computational Statistics & Data Analysis”, doi: 10.1016/j.csda.2010.09.013.

M. CORDUAS, (2010), Mining time series data: a selective survey, in F. Palumbo et al. (eds.) “Data Analysis and Classification”, pp. 355-362. Springer, Heidelberg.

M. CORDUAS, D. PICCOLO, (2008), Time series clustering and classification by the autoregressive metric, “Computational Statistics & Data Analysis”, 52, pp. 1860-1872.

P. CORETTO, C. HENNIG, (2010), A simulation study to compare robust clustering methods based on mixtures, “Advances in Data Analysis and Classification”, 4, pp. 111-135.

P. CORETTO, C. HENNIG, (2011), Maximum likelihood estimation of heterogeneous mixtures of Gaussian and uniform distributions, “Journal of Statistical Planning and Inference”, 141, pp. 462-473.

L. DE ANGELIS, (2011), The multidimensional measurement of poverty: a longitudinal analysis, in “JOCLAD2011 - Book of Abstract”, pp. 49-52.

A. DE GREGORIO, S.M. IACUS, (2010), Clustering of discretely observed diffusion processes, “Computational Statistics & Data Analysis”, 54, pp. 598-606.

T. DI BATTISTA, S.A. GATTONE, A. DE SANCTIS, (2011), Dealing with FDA estimation methods, in S. Ingrassia, et al. (eds.) “New Perspectives in Statistical Modeling and Data Analysis”, Springer.

E. DIDAY, M. NOIRHOMME, (2008), Symbolic Data Analysis, Wiley, New York.

P. D’URSO, (2000), Dissimilarity measures for time trajectories, “Statistical Methods & Applications”, pp. 53-83.

P. D’URSO, E.A. MAHARAJ, (2009), Autocorrelation-based fuzzy clustering of time series, “Fuzzy Sets and Systems”, 160, pp. 3565-3589.

P. D’URSO, E.A. MAHARAJ, Wavelet-based clustering of multivariate time series, “Fuzzy Sets and Systems”, in press.

A. FARCOMENI, (2009), Robust double clustering, “Journal of Classification”, 26, pp. 77-101.

G. GALIMBERTI, A. MONTANARI, C. VIROLI, (2008), Penalized factor mixture analysis for variable selection in clustered data, “Computational Statistics & Data Analysis”, 53, pp. 4301-4310.

G. GALIMBERTI, G. SOFFRITTI, (2009), Discovering multidimensional unobserved heterogeneity through model-based cluster analysis, available at

L.A. GARCÍA-ESCUDERO, A. GORDALIZA, C. MATRÁN, A. MAYO-ISCAR, (2010), A review of robust clustering methods, “Advances in Data Analysis and Classification”, 4, pp. 89-109.

N. GERSHENFELD, B. SCHONER, F. METOIS, (1999), Cluster-weighted modelling for time-series analysis, “Nature”, 397, pp. 329-332.

F. GIORDANO, M. LA ROCCA, M.L. PARRELLA, (2011), Clustering complex time series databases, in B. Fichet et al. (eds.) “Classification and multivariate analysis for complex data structures”, pp. 417-426. Springer, Heidelberg.

F. GRESELIN, S. INGRASSIA, (2010), Constrained monotone EM algorithms for mixtures of multivariate t distributions, “Statistics and Computing”, 20, pp. 9-22.

D.J. HAND, (2009), Modern statistics: the myth and the magic, “Journal of the Royal Statistical Society”, A, 172, pp. 287-306.

C. HENNIG, (2004), Breakdown points for maximum likelihood-estimators of location-scale mixtures, “Annals of Statistics”, 32, pp. 1313-1340.

C. HENNIG, (2010), Methods for merging Gaussian mixture components, “Advances in Data Analysis and Classification”, 4, pp. 3-34.

S. INGRASSIA, R. ROCCI, (2007), Constrained monotone EM algorithms for finite mixture of multivariate Gaussians, “Computational Statistics & Data Analysis”, 51, pp. 5339-5351.

S. INGRASSIA, R. ROCCI, (2011), Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints, “Computational Statistics & Data Analysis”, 55, pp. 1715-1725.

S. INGRASSIA, C. MINOTTI, G. VITTADINI, (2010), Cluster Weighted Modelling with Student-t components, available at

A. IODICE D’ENZA, F. PALUMBO, M. GREENACRE, (2008), Exploratory data analysis leading towards the most interesting simple association rules, “Computational Statistics & Data Analysis”, 52, pp. 3269-3281.

A. IRPINO, R. VERDE, (2008), Dynamic clustering of interval data using a Wasserstein-based distance, “Pattern Recognition Letters”, 29, pp. 1648-1658.

T.I. LIN, (2009), Maximum likelihood estimation for multivariate skew normal mixture models, “Journal of Multivariate Analysis”, 100, pp. 257-265.

E.A. MAHARAJ, P. D’URSO, (2011), Fuzzy clustering of time series in the frequency domain, “Information Sciences”, 181, pp. 1187-1211.

E.A. MAHARAJ, P. D’URSO, D.U.A. GALAGEDERA, (2010), Wavelet-based fuzzy clustering of time series, “Journal of Classification”, 27, pp. 231-275.

F. MARTELLA, M. ALFÒ, M. VICHI, (2010), Biclustering of gene expression data by an extension of mixtures of factor analyzers, “The International Journal of Biostatistics”, 4, doi: 10.2202/1557-4679.1078.

A. MARUOTTI, R. ROCCI, (2010), A semiparametric approach to mixed non-homogeneous hidden Markov models, available at SVI_Vicari/851-1532-1-DR.pdf.

A. MARUOTTI, T. RYDEN, (2009), A semiparametric approach to hidden Markov models under longitudinal observations, “Statistics and Computing”, 19, pp. 381-393.

C. MAUGIS, G. CELEUX, M.L. MARTIN-MAGNIETTE, (2009), Variable selection for clustering with Gaussian mixture models, “Biometrics”, 65, pp. 701-709.

G.J. MCLACHLAN, R.W. BEAN, L. BEN-TOVIM JONES, (2007), Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution, “Computational Statistics & Data Analysis”, 51, pp. 5327-5338.

A. MONTANARI, C. VIROLI, (2010), Heteroscedastic factor mixture analysis, “Statistical Modelling”, 10, pp. 441-460.

A. MONTANARI, C. VIROLI, (2010), The independent factor analysis approach to latent variable modeling, “Statistics”, 44, pp. 397-416.

I. MORLINI, (2007), Searching for structure in measurements of air pollutant concentration, “Environmetrics”, 18, pp. 823-840.

I. MORLINI, S. ZANI, (2010), A dissimilarity measure between two hierarchical clusterings, in “CLADAG 2010- Book of Abstract”, pp. 219-210.

E. OTRANTO, (2008), Clustering heteroscedastic time series by model-based procedures, “Computational Statistics & Data Analysis”, 52, pp. 4685-4698.

E. OTRANTO, (2010), Identifying financial time series with similar dynamical conditional correlation, “Computational Statistics & Data Analysis”, 54, pp. 1-15.

F. PALUMBO, D. VISTOCCO, A. MORINEAU, (2008), Huge multidimensional data visualization: back to the virtue of principal coordinates and dendrograms in the new computer age, in C. Chun-Houh et al. (eds.) “Handbook of Data Visualization”, pp. 349-387. Springer, Heidelberg.

D. PEEL, G. MCLACHLAN, (2000), Robust mixture modeling using the t-distribution, “Statistics and Computing”, 10, pp. 339-348.

D. PICCOLO, (1990), A distance measure for classifying ARIMA models, “Journal of Time Series Analysis”, 11, pp. 153-164.

D. PIGOLI, L.M. SANGALLI, (2010), Wavelet smoothing for curves in more than one dimension, available at

A.E. RAFTERY, N. DEAN, (2006), Variable selection for model-based cluster analysis, “Journal of the American Statistical Association”, 101, pp. 168-178.

M. RIANI, A.C. ATKINSON, A. CERIOLI, (2009), Finding an unknown number of multivariate outliers, “Journal of the Royal Statistical Society”, B, 71, pp. 447-466.

R. ROCCI, (2010), Mixing mixtures of Gaussians, GfKl-CLADAG 2010 Book of Abstracts, pp. 27-28.

R. ROCCI, M. VICHI, (2005), Three-mode component analysis with crisp or fuzzy partition of units, “Psychometrika”, 70, pp. 715-736.

R. ROCCI, M. VICHI, (2008), Two-mode multi-partitioning, “Computational Statistics & Data Analysis”, 52, pp. 1984-2003.

E. ROMANO, A. BALZANELLA, R. VERDE, (2010), A new regionalization method for spatially dependent functional data based on local variogram models: an application on environmental data, available at

P.J. ROUSSEEUW, K. VAN DRIESSEN, (1999), A fast algorithm for the minimum covariance determinant estimator, “Technometrics”, 41, pp. 212-223.

L.M. SANGALLI, P. SECCHI, S. VANTINI, V. VITELLI, (2010), k-mean alignment for curve clustering, “Computational Statistics & Data Analysis”, 54, pp. 1219-1233.

L. SCRUCCA, (2010), Genetic algorithms for subset selection in model-based clustering, available at

L. SCRUCCA, (2010), Dimension reduction for model-based clustering, “Statistics and Computing”, 20, pp. 471-484.

I. VAN MECHELEN, H.-H. BOCK, P. DE BOECK, (2004), Two-mode clustering methods: a structured overview, “Statistical Methods in Medical Research”, 13, pp. 363-394.

R. VERDE, A. IRPINO, (2008), Comparing histogram data using a Mahalanobis Wasserstein distance, in P. Brito (ed.), “COMPSTAT 2008”, pp. 77-89. Physica-Verlag, Berlin.

J.K. VERMUNT, B. TRAN, J. MAGIDSON, (2008), Latent class models in longitudinal research, in S. Menard (ed.), “Handbook of Longitudinal Research: Design, Measurement, and Analysis”, pp. 373-385. Burlington, MA.

D. VICARI, M. ALFÒ, (2010), Clustering discrete choice data, in Y. Lechevallier, G. Saporta (eds.) “Proceedings of COMPSTAT 2010”, pp. 369-378. Physica-Verlag, Heidelberg.

M. VICHI, (2000), Double k-means clustering for simultaneous classification of objects and variables, in S. Borra et al. (eds.), “Advances in Classification and Data Analysis”, pp. 43-52. Springer, Berlin.

M. VICHI, (2010), Clustering longitudinal multivariate observations, Personal communication.

M. VICHI, H.A.L. KIERS, (2001), Factorial k-means analysis for two-way data, “Computational Statistics & Data Analysis”, 37, pp. 49-64.

D. VICARI, M. VICHI, (2009), Structural classification analysis of three-way dissimilarity data, “Journal of Classification”, 26, pp. 121-154.

M. VICHI, G. SAPORTA, (2009), Clustering and disjoint principal component analysis, “Computational Statistics & Data Analysis”, 53, pp. 3194-3208.

C. VIROLI, (2010), Dimensionally reduced model-based clustering through mixtures of factor mixture analyzers, “Journal of Classification”, 27, pp. 363-388.

K. WANG, S.-K. NG, G.J. MCLACHLAN, (2010), Multivariate Skew-t Mixture Models, in “DICTA ’09”, doi:10.1109/DICTA.2009.88.

M.S. YANG, K.L. WU, (2006), Unsupervised possibilistic clustering, “Pattern Recognition”, 39, pp. 5-21.




How to Cite

Calò, D. G. (2012). Italian contributions on some recent research topics in cluster analysis. Statistica, 72(3), 271–286.