Financial Econometric Analysis at Ultra-High Frequency: Data Handling Concerns / G. Gallo; C.T. Brownlees. - In: Computational Statistics & Data Analysis. - ISSN 0167-9473. - Print. - 51 (2006), pp. 2232-2245. [10.1016/j.csda.2006.09.030]

Financial Econometric Analysis at Ultra-High Frequency: Data Handling Concerns

GALLO, GIAMPIERO MARIA
2006

Abstract

Collecting data at ultra-high frequency on financial markets requires the manipulation of complex databases and, possibly, the correction of errors present in the data. The New York Stock Exchange is chosen to provide evidence of the problems affecting ultra-high-frequency data sets. Standard filters can be applied to remove bad records from the trades and quotes data. A method for outlier detection is proposed to remove data which do not correspond to plausible market activity. Several methods of aggregating the data are suggested, from which the time series of interest for econometric analysis can be constructed. As an example of the relevance of the procedure, the autoregressive conditional duration model is estimated on price durations. Failure to purge the data of “wrong” ticks is likely to shorten the financial durations between substantial price movements and to alter the autocorrelation profile of the series. The estimated coefficients and overall model diagnostics are considerably altered in the absence of appropriate steps in data cleaning. Overall, the difference in the coefficients is bigger between the dirty series and the clean series than among series filtered with different algorithms.
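
The claim that "wrong" ticks shorten price durations can be illustrated with a small sketch. It assumes one common working definition of a price duration (the time elapsed until the price has moved by at least a fixed threshold from the last reference price); this is not necessarily the exact construction used in the paper. The function name `price_durations`, the threshold, and the tick values are hypothetical and chosen only for illustration. Prices are given in integer cents to avoid floating-point comparisons.

```python
from typing import List, Tuple

# A tick is (time in seconds, price in cents). Values below are made up.
Tick = Tuple[int, int]


def price_durations(ticks: List[Tick], threshold: int) -> List[int]:
    """Waiting times between price moves of at least `threshold` cents.

    The reference point is reset each time the threshold is reached.
    This is an illustrative definition, assumed for this sketch.
    """
    if not ticks:
        return []
    durations: List[int] = []
    ref_time, ref_price = ticks[0]
    for time, price in ticks[1:]:
        if abs(price - ref_price) >= threshold:
            durations.append(time - ref_time)
            ref_time, ref_price = time, price
    return durations


# A slowly drifting "clean" series: one 10-cent move over 50 seconds.
clean = [(0, 1000), (10, 1002), (20, 1003), (30, 1005), (40, 1006), (50, 1011)]

# The same series with a single hypothetical misrecorded tick at t = 15.
dirty = clean[:2] + [(15, 1050)] + clean[2:]

print(price_durations(clean, threshold=10))  # [50]
print(price_durations(dirty, threshold=10))  # [15, 5]
```

In this toy example the single bad tick triggers two spurious threshold crossings (the jump to the wrong price and the return from it), so one 50-second duration is replaced by two much shorter ones. This is the kind of distortion the abstract describes.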
Files in this item:
brownleesgallo_csda.pdf (Adobe PDF, 1.26 MB)
Access: closed
Type: Final refereed version (Postprint, Accepted manuscript)
License: All rights reserved

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/210321