Some statistical analyses of the Italian language


  • Alfredo Rizzi Università di Roma “La Sapienza”



Certain quantitative characteristics of contemporary Italian have been examined. The analyses draw on three sources of data:
— Lessico di frequenza della lingua italiana contemporanea (LIF), published by IBM Italy and edited by U. Bortolini, C. Tagliavini and A. Zampolli. The LIF is based on 500,000 words taken in equal parts from theatre, novels, cinema, periodicals and supplements.
— Corriere della Sera from 11-12 July 1880 (approx. 100,000 letters);
— Corriere della Sera from 12 May 1980.
This research is intended only as an initial contribution to a further in-depth study of the statistical characteristics of contemporary Italian.
The following aspects have been examined:
— distribution of syllables in the Italian language;
— distribution of monograms, digraphs, trigrams;
— average length of words;
— distribution of punctuation (full stops, commas, exclamation marks, etc.);
— beginnings and ends of words.
As regards the first point, the distribution of syllables in the Italian language, out of a total of approximately 560,000 words, 1,400 distinct syllables were found, of which 60% were syllables of two letters, 27% of three, 1.8% of one and 48% of four.
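
Several of the letter-level quantities listed above (monogram, digraph and trigram distributions, average word length, punctuation counts) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the procedure used in the paper: the tokenisation rule, the function names and the sample sentence are all hypothetical, and n-grams are counted within words only.

```python
from collections import Counter
import re


def words(text):
    # Assumption: a word is a run of letters, including accented Italian vowels.
    return re.findall(r"[a-zàèéìíòóùú]+", text.lower())


def ngram_counts(tokens, n):
    # Count letter n-grams inside each word; no n-gram spans a word boundary.
    counts = Counter()
    for w in tokens:
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


def relative(counts):
    # Turn absolute counts into relative frequencies.
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def average_word_length(tokens):
    # Mean number of letters per word.
    return sum(len(w) for w in tokens) / len(tokens) if tokens else 0.0


def punctuation_counts(text):
    # Count a small, assumed set of punctuation marks.
    return Counter(ch for ch in text if ch in ".,;:!?")


# Illustrative sentence only; it is not taken from the corpora in the paper.
sample = "La casa è bella, la casa è grande!"
toks = words(sample)

monograms = ngram_counts(toks, 1)
digraphs = ngram_counts(toks, 2)
trigrams = ngram_counts(toks, 3)
initials = Counter(w[0] for w in toks)   # word beginnings
finals = Counter(w[-1] for w in toks)    # word endings

print(monograms.most_common(3))
print(digraphs.most_common(3))
print(average_word_length(toks))
print(punctuation_counts(sample))
```

Applied to a large corpus instead of a single sentence, the same counters would give the kind of frequency tables the abstract refers to. Syllable counts would additionally require a syllabification rule for Italian, which this sketch does not attempt.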

How to Cite

Rizzi, A. (1985). Some statistical analyses of the Italian language. Statistica, 45(1), 7–31.


