Asking about Technical Debt: Characteristics and Automatic Identification of Technical Debt Questions on Stack Overflow / Kozanidis N.; Verdecchia R.; Guzman E. - Electronic. - (2022), pp. 45-56. (Paper presented at the 16th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2022, held in fin in 2022) [10.1145/3544902.3546245].
Asking about Technical Debt: Characteristics and Automatic Identification of Technical Debt Questions on Stack Overflow
Verdecchia R.
2022
Abstract
Background: Q&A sites make it possible to study how users reference and request support on technical debt. To date, only a few studies, each focusing on narrow aspects, have investigated technical debt on Stack Overflow. Aims: We aim to gain an in-depth understanding of the characteristics of technical debt questions on Stack Overflow. In addition, we assess whether identification strategies based on machine learning can be used to automatically identify and classify technical debt questions. Method: We use automated and manual processes to identify technical debt questions on Stack Overflow. The final set of 415 questions is analyzed to study (i) technical debt types, (ii) question length, (iii) perceived urgency, (iv) sentiment, and (v) themes. Natural language processing and machine learning techniques are used to assess whether questions can be identified and classified automatically. Results: Architecture debt is the most recurring debt type, followed by code and design debt. Most questions display mild urgency, and the frequency of questions declines steadily as urgency rises. Question length varies across debt types. Sentiment is mostly neutral. Twenty-nine recurrent themes emerge. Machine learning can be used to identify technical debt questions and to classify binary urgency, but not to classify debt types. Conclusions: Different patterns emerge from the analysis of technical debt questions on Stack Overflow. The results provide further insights into the phenomenon and support the adoption of a more comprehensive strategy to identify technical debt questions.

File | Size | Format |
---|---|---|
3544902.3546245.pdf (open access; type: publisher's PDF, version of record; license: Open Access) | 2.51 MB | Adobe PDF |
Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.