Notes on lexical strategies, structural strategies and surface clause indexes in the C-ORAL-ROM spoken corpora / E. CRESTI. - Print. - (2005), pp. 209-256.
Notes on lexical strategies, structural strategies and surface clause indexes in the C-ORAL-ROM spoken corpora
CRESTI, EMANUELA
2005
Abstract
The contribution is the last chapter of the volume accompanying the DVD publication of the spoken Romance corpus C-ORAL-ROM. The C-ORAL-ROM resource is a multilingual corpus of spontaneous speech for the main Romance languages (French, Italian, Portuguese, Spanish). It comprises around 1,200,000 words and 124 hours of speech and is integrated with tools for exploiting linguistic information at the textual and acoustic levels. C-ORAL-ROM is the main outcome of an EU project in the IST programme of the 5th Framework Programme (IST2000-26228), coordinated by Emanuela Cresti; it is distributed through ELDA and John Benjamins Publishing Company and is widely used in the scientific community. After a general introduction comparing spoken and written texts, this paper develops a detailed corpus-based description of the most relevant strategies in the lexicon and syntax of speech shared by the four Romance languages. The topics considered include: quantitative data on the percentages of verbs and nouns (namely, the higher proportion of verbs, which varies with the diaphasic variation built into the corpus design), set against a substantial homogeneity across the four languages; the high occurrence of verbless utterances; and some general tendencies in utterance information structure, where simple utterances are the most frequent, followed by the topic-comment pattern. The final part of the paper analyses coordination, subordination and negation in speech through the automatic retrieval of the most common coordinating and subordinating conjunctions and negative adverbs, along with their distribution. In conclusion, the percentage data and the specific construction strategies emerging from the corpus analysis allow us to propose a new perspective on the study of spoken language.
File: cresti-strategies-1.pdf
Access: closed
Type: final refereed version (postprint, accepted manuscript)
License: all rights reserved
Size: 1.84 MB
Format: Adobe PDF
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.