Evaluation of consensus on the annotation of terminal and non-terminal prosodic breaks in the C-ORAL-ROM corpus / MONEGLIA M.; FABBRI M.; QUAZZA S.; PANIZZA A.; DANIELI M.; GARRIDO J. M.; SWERTS M. - Print. - (2005), pp. 257-276.

Evaluation of consensus on the annotation of terminal and non-terminal prosodic breaks in the C-ORAL-ROM corpus

MONEGLIA, MASSIMO
2005

Abstract

C-ORAL-ROM, Integrated Reference Corpora for Spoken Romance Languages, is a multilingual corpus of spontaneous speech delivered within the IST Program. The corpora are tagged for terminal and non-terminal prosodic breaks. Terminal breaks are considered the most perceptually relevant cues for determining utterance boundaries in spontaneous speech resources. This chapter presents the evaluation of inter-annotator agreement carried out by LOQUENDO and reports the reliability of both the delivered tagging and the annotation scheme adopted. At the cross-linguistic level, the data show a very high K coefficient (between 0.77 and 0.92, depending on the language resource). Strong agreement was also recorded specifically for terminal breaks. The data thus show that annotating utterances in terms of their prosodic breaks captures relevant perceptual facts, and that the proposed coding scheme can be applied in a highly replicable way.
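
As an illustration of what a K coefficient measures, the sketch below computes Cohen's kappa for two annotators who label the same word boundaries. It is a minimal example under stated assumptions, not the procedure used in the chapter: the two-annotator setting, the label set ("T", "N", "-"), the function name `cohen_kappa`, and the toy data are all hypothetical, and the chapter may rely on a different (for example, multi-annotator) variant of the statistic.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: one label per word boundary.
# "T" = terminal break, "N" = non-terminal break, "-" = no break.
annotator_1 = ["-", "N", "-", "T", "-", "-", "N", "T", "-", "T"]
annotator_2 = ["-", "N", "-", "T", "-", "N", "N", "T", "-", "-"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.683
```

In this toy case the annotators agree on 8 of 10 boundaries (observed agreement 0.80), while the agreement expected by chance from their label frequencies is 0.37, giving K = (0.80 - 0.37) / (1 - 0.37) ≈ 0.68. Kappa cannot exceed 1, so values near 0.8 or 0.9 indicate agreement well beyond chance.
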
Year: 2005
ISBN: 9789027222862
In: C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages, Amsterdam / Philadelphia, pp. 257-276
Authors: MONEGLIA M.; FABBRI M.; QUAZZA S.; PANIZZA A.; DANIELI M.; GARRIDO J. M.; SWERTS M.
Files in this item:
File: moneglia 2.pdf (closed access)
Type: Final refereed version (Postprint, Accepted manuscript)
License: All rights reserved
Size: 530.15 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/229877