Some statistical analyses of the italian language
DOI: https://doi.org/10.6092/issn.1973-2201/668
Abstract
Certain quantitative characteristics of contemporary Italian have been examined. The analyses draw on three sources of data:
— Lessico di frequenza della lingua italiana contemporanea (LIF), published by IBM Italy and edited by U. Bortolini, C. Tagliavini and A. Zampolli. LIF covers 500,000 words taken in equal parts from the Theatre, Novels, Cinema, Periodicals, Supplements;
— Corriere della Sera from 11-12 July 1880 (approx. 100,000 letters);
— Corriere della Sera 12 May 1980.
This research is intended only as an initial contribution to a more in-depth study of the statistical characteristics of contemporary Italian.
The following aspects have been examined:
— distribution of syllables in the Italian language;
— distribution of monograms, digraphs, trigrams;
— average length of words;
— distribution of punctuation (full stops, commas, exclamation marks, etc.);
— beginnings and ends of words.
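The quantities listed above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper, which the abstract does not describe. The function names, the tokenisation rule (a word is a maximal run of Unicode letters), the set of punctuation marks and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re

# Assumption: a "word" is a maximal run of Unicode letters (accented vowels included).
WORD_RE = re.compile(r"[^\W\d_]+")

def letter_ngrams(text, n):
    """Count letter n-grams inside words (n=1 monograms, n=2 digraphs, n=3 trigrams)."""
    counts = Counter()
    for word in WORD_RE.findall(text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def average_word_length(text):
    """Mean number of letters per word; 0.0 for text without words."""
    words = WORD_RE.findall(text)
    return sum(len(w) for w in words) / len(words) if words else 0.0

def punctuation_counts(text, marks=".,;:!?"):
    """Count occurrences of each punctuation mark in `marks` (an assumed set)."""
    return Counter(ch for ch in text if ch in marks)

def word_edges(text, k=1):
    """Count the first k and last k letters of every word."""
    words = WORD_RE.findall(text.lower())
    starts = Counter(w[:k] for w in words)
    ends = Counter(w[-k:] for w in words)
    return starts, ends

# Hypothetical sample sentence, not taken from the corpora named in the abstract.
sample = "La casa è bella, la sera!"
print(letter_ngrams(sample, 2).most_common(3))
print(average_word_length(sample))
print(punctuation_counts(sample))
print(word_edges(sample)[1].most_common(2))
```

Counting n-grams inside words only, rather than across spaces, is one possible convention; a study of running text might instead treat the space as a symbol of its own.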
As regards the first point, the distribution of syllables in the Italian language, a total of approximately 560,000 words was found to contain 1,400 distinct syllables, of which 60% were syllables of two letters, 27% of three, l.8% of one and 48% of four.
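A breakdown of syllables by letter count can be sketched as follows, assuming the words have already been divided into syllables; the syllabification itself is not shown. The abstract does not say whether its percentages are taken over distinct syllables or over syllable occurrences, so this hypothetical helper reports both. The function name, the `-` separator and the sample words are illustrative assumptions.

```python
from collections import Counter

def syllable_length_shares(syllabified_words, sep="-"):
    """Return (number of distinct syllables,
               share of distinct syllables by letter count,
               share of syllable occurrences by letter count).

    Assumption: each word arrives already syllabified, e.g. "ca-sa".
    """
    occurrences = Counter()
    for word in syllabified_words:
        occurrences.update(s for s in word.lower().split(sep) if s)
    if not occurrences:
        return 0, {}, {}

    by_type = Counter(len(s) for s in occurrences)   # each distinct syllable counted once
    by_token = Counter()
    for syllable, count in occurrences.items():      # each occurrence counted
        by_token[len(syllable)] += count

    n_types = sum(by_type.values())
    n_tokens = sum(by_token.values())
    type_shares = {k: v / n_types for k, v in by_type.items()}
    token_shares = {k: v / n_tokens for k, v in by_token.items()}
    return n_types, type_shares, token_shares

# Hypothetical input, not data from the paper.
words = ["ca-sa", "stra-da", "a-mi-co", "ca-ne"]
n_distinct, type_shares, token_shares = syllable_length_shares(words)
print(n_distinct, type_shares, token_shares)
```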
How to Cite
Rizzi, A. (1985). Some statistical analyses of the italian language. Statistica, 45(1), 7–31. https://doi.org/10.6092/issn.1973-2201/668
License
Copyright (c) 1985 Statistica
Copyrights and publishing rights of all the texts on this journal belong to the respective authors without restrictions.
This journal is licensed under a Creative Commons Attribution 4.0 International License (full legal code).