A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database / R. Campagni; D. Merlini; M. C. Verri. - ELECTRONIC. - 2:(2016), pp. 99-106. (Paper presented at the CSEDU 2016 conference, held in Rome, Italy, on 21-23 April).

A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database

CAMPAGNI, RENZA;MERLINI, DONATELLA;VERRI, MARIA CECILIA
2016

Abstract

In a data mining project developed on a relational database, constructing the data set for the analysis often requires significant effort. The database usually contains a series of normalized tables that must be joined, aggregated and processed appropriately to build the data set. This process generates various SQL queries that are written independently of each other, in a disordered manner. As a result, the database grows with tables and views that are not present at the conceptual level, and this can cause problems for the development of the database. In this paper we consider a typical database containing data about students, courses and exams, and illustrate some SQL transformations that build a data set for sequential pattern analysis, possibly combined with clustering and classification. In particular, we introduce into the student database some interesting patterns representing the relationship between the exams taken by students in various periods and the career of each student. This is achieved by introducing a particular encoding of the career of a student. The resulting table can be analyzed with clustering and classification algorithms. We present a case study following this organization.
2016
VIII International Conference on Computer Supported Education Proceedings
CSEDU 2016
Rome, Italy
21-23 April
R. Campagni; D. Merlini; M. C. Verri
Files in this record:
CSEDU_2016_46.pdf

Closed access

Type: Publisher's PDF (Version of record)
License: All rights reserved
Size: 186.16 kB
Format: Adobe PDF

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1039159