Detecting Domain-Specific Ambiguities: An NLP Approach Based on Wikipedia Crawling and Word Embeddings / Ferrari, Alessio; Donati, Beatrice; Gnesi, Stefania. - Electronic. - (2017), pp. 393-399. (Paper presented at the 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW), 2017) [10.1109/REW.2017.20].
Detecting Domain-Specific Ambiguities: An NLP Approach Based on Wikipedia Crawling and Word Embeddings
Ferrari, Alessio; Donati, Beatrice; Gnesi, Stefania
2017
Abstract
In the software process, unresolved natural language (NL) ambiguities in the early requirements phases may cause problems in later stages of development. Although methods exist to detect domain-independent ambiguities, ambiguities are also influenced by the domain-specific background of the stakeholders involved in the requirements process. In this paper, we aim to estimate the degree of ambiguity of typical computer science words (e.g., system, database, interface) when used in different application domains. To this end, we apply a natural language processing (NLP) approach based on Wikipedia crawling and word embeddings, a novel technique for representing the meaning of words through compact numerical vectors. Our preliminary experiments, performed on five different domains, show promising results. The approach allows us to estimate how the meaning of computer science words varies when they are used in different domains. Further validation of the method will indicate which words need to be carefully defined in advance by the requirements analyst to avoid misunderstandings when editing documents and interacting with experts in the considered domains.
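As a rough illustration of the kind of analysis described in the abstract, the following Python sketch trains one word2vec model per domain corpus and estimates how much a computer science word shifts in meaning between two domains by comparing the overlap of its nearest-neighbor words in each model. This is a minimal sketch under stated assumptions, not the authors' implementation: the corpus file names are hypothetical stand-ins for Wikipedia crawls of the domains, and neighbor-set overlap is used here only because vectors from separately trained models are not directly comparable.

```python
# Minimal sketch (not the authors' exact pipeline): train separate word2vec
# models on two domain-specific corpora and estimate how much the meaning of
# a computer science word varies between domains via nearest-neighbor overlap.
# The corpus file names below are hypothetical placeholders.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess


def load_corpus(path):
    """Read one sentence per line and tokenize it."""
    with open(path, encoding="utf-8") as f:
        return [simple_preprocess(line) for line in f]


def train_model(sentences):
    # Vectors from separately trained models live in different spaces,
    # so below we compare neighbor sets rather than the vectors themselves.
    return Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)


def neighbor_overlap(word, model_a, model_b, topn=20):
    """Jaccard overlap of the word's nearest neighbors in the two models.
    Low overlap suggests the word is used differently in the two domains."""
    if word not in model_a.wv or word not in model_b.wv:
        return None
    near_a = {w for w, _ in model_a.wv.most_similar(word, topn=topn)}
    near_b = {w for w, _ in model_b.wv.most_similar(word, topn=topn)}
    return len(near_a & near_b) / len(near_a | near_b)


if __name__ == "__main__":
    # "medicine.txt" and "mechanics.txt" stand in for Wikipedia crawls of two domains.
    med = train_model(load_corpus("medicine.txt"))
    mec = train_model(load_corpus("mechanics.txt"))
    for term in ["system", "database", "interface"]:
        print(term, neighbor_overlap(term, med, mec))
```

In this sketch, a low overlap score for a term such as "interface" would flag it as a candidate for explicit definition before requirements elicitation in that domain.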