Landslides and floods are the most frequent and widespread natural hazards in Italy, causing fatalities and damage to urban areas. Event inventories are most commonly compiled with traditional methods such as photo-interpretation, remote sensing, or the retrieval of data from technical reports. These approaches rarely rely on automated or real-time updates. Retrieving data from newspapers with dedicated data mining algorithms provides continuous feedback from the real world and extends the pool of exploitable data. Mass media data offer information about disaster situations with relatively high temporal and spatial resolution, which makes it possible to map natural hazards across many locations. Several techniques have been developed to mine data for different natural hazards, but they have rarely been applied to landslide and flood news. The Semantic Engine to Classification and Geotagging News (SECaGN) algorithm, based on a semantic engine, automatically retrieves information from online newspapers. A total of 184,322 newspaper articles were harvested from 2010 to 2019, referring to 32,525 landslide news items and 34,560 flood news items in Italy. In this work, the data harvested by SECaGN were manually classified by news relevance, localization accuracy, and time of publication. Most of the news items referred to recent events or referred generically to landslides or floods (remediation work, hazard scenarios), and only a small fraction consisted of wrong news. This classification made it possible to identify the “true news” and to reject inappropriate data, reducing uncertainty. The harvested data were used to identify the media impact of the events (both landslides and floods), their temporal distribution, and the areas where most events occurred, allowing a fast hazard estimation for the country. The retrieved news data were then compared with traditional sensors (e.g. rain gauges) and with official reports on victims, damage, funds for soil protection, and risk maps.
The results did not show any clear correlation between the distribution of news and the other parameters, although regions that experienced a considerable number of events recorded lower funds for soil protection, and vice versa. In conclusion, this work demonstrated that data automatically retrieved from newspapers can be used to create a reliable landslide (and flood) inventory, which can serve as a proxy for hazard assessment over wide areas and help investigate the distribution of the phenomena and their correlation with other parameters, providing a powerful tool for rapid hazard assessment in support of public authorities and decision makers.
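The filtering and aggregation steps described in the abstract could be sketched roughly as follows. This is only an illustration and not the SECaGN implementation: the `NewsItem` schema, the label values ("event", "generic", "wrong"), the `localized` flag, and the sample records are all assumptions introduced here, since the abstract does not specify how the manual classification was encoded.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    """One harvested news item after manual classification (hypothetical schema)."""
    hazard: str      # "landslide" or "flood"
    region: str      # administrative region the item was geotagged to
    year: int        # year of publication
    relevance: str   # assumed labels: "event", "generic", "wrong"
    localized: bool  # True if the geotag was judged accurate enough


def keep_true_news(items):
    """Keep items labelled as actual events with an acceptable geotag."""
    return [it for it in items if it.relevance == "event" and it.localized]


def count_by_region(items, hazard):
    """Number of retained news items per region for one hazard type."""
    return Counter(it.region for it in items if it.hazard == hazard)


def count_by_year(items, hazard):
    """Number of retained news items per publication year for one hazard type."""
    return Counter(it.year for it in items if it.hazard == hazard)


# Invented sample records, for illustration only (not data from the study).
sample = [
    NewsItem("landslide", "Toscana", 2014, "event", True),
    NewsItem("landslide", "Toscana", 2015, "generic", True),
    NewsItem("flood", "Liguria", 2014, "event", True),
    NewsItem("landslide", "Liguria", 2014, "wrong", False),
]

true_news = keep_true_news(sample)
print(count_by_region(true_news, "landslide"))
print(count_by_year(true_news, "flood"))
```

Per-region counts of this kind are the sort of quantity that could then be set against rain gauge records or soil protection funds, as the comparison in the abstract suggests.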
Using automated web data mining for natural hazard assessment / Rosi A.; Franceschini R.; Casagli N.; Catani F. - Electronic. - (2023), pp. 11984-11984. (Paper presented at the EGU General Assembly 2023, held in Vienna, Austria, 24–28 April 2023) [10.5194/egusphere-egu23-11984].
Using automated web data mining for natural hazard assessment
Franceschini R.; Casagli N.
2023