Evaluating the abilities of LLMs and SpeechLMs in discovering implicit contents of Italian political speeches / Lorenzo Gregori; Walter Paci; Alessandro Panunzi. - ELECTRONIC. - (2026), pp. 165-170.

Evaluating the abilities of LLMs and SpeechLMs in discovering implicit contents of Italian political speeches

Lorenzo Gregori; Walter Paci; Alessandro Panunzi
2026

Abstract

This research investigates the pragmatic competence of Large Language Models (LLMs) in interpreting implicit meanings within Italian political discourse. Using the IMPAQTS-PIDMM dataset, a multimodal benchmark derived from the 2.5-million-token IMPAQTS corpus, the experiment evaluates how effectively models identify tendentious content such as presuppositions and implicatures. The study compares the performance of text-only LLMs against speech-based models (SpeechLMs) that process both audio and transcriptions, to determine whether acoustic cues enhance understanding. The results reveal that text-only models significantly outperform multimodal variants, with Qwen2.5-72B achieving the highest global accuracy of 0.863. Surprisingly, the inclusion of audio did not improve performance: SpeechLMs such as GPT-4o-mini-audio-preview and Qwen2-Audio-7B-Instruct obtained lower accuracy scores and a higher frequency of missed answers than their text-only equivalents. Across all tested architectures, models generally processed presuppositions better than implicatures.
ISBN: 978-2-493814-76-0
Venue: The 3rd Workshop on Natural Language Processing for Political Sciences (PoliticalNLP 2026) @ LREC 2026
Pages: 165-170

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1471182