Big Data...a few Outliers = Big Mistakes. Un nuovo processo per l'individuazione di outliers (A new process for identifying outliers)

Maurizio Rosina

Authors

Maurizio Rosina RLD – Ricerca e Laboratorio Digitale – Società Generale d'Informatica

Abstract

Searching for and identifying outliers is a fundamental step, generally carried out before the processing that is meant to yield consistent results. The new approach devised for identifying outliers in R² (bivariate data) relies on geometric and statistical techniques that are largely independent of the type of data distribution. It rests on four methodological pillars: clustering, the convex hull peeling technique, a specific metric, and Chebyshev's inequality, which holds for any univariate distribution with finite mean and variance. The modularity and generality of the approach, together with an outlier search based on strictly statistical parameters, make it a practical everyday tool for anyone who needs to process bivariate data with confidence that outliers have been identified beforehand.
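To make two of the pillars named in the abstract more concrete, the following is a minimal Python sketch of convex hull peeling combined with a Chebyshev-based threshold. It is an illustration under stated assumptions, not the procedure described in the article: the clustering step is omitted, plain Euclidean distance from the centroid of the peeled core stands in for the article's specific metric, and all function names and parameters (`peel_depths`, `chebyshev_k`, `flag_outliers`, `p`, `peel_layers`) are hypothetical.

```python
import math
from statistics import mean, pstdev

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices only (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))

def peel_depths(points):
    """Convex hull peeling: depth 1 is the outermost hull, depth 2 the next layer, ..."""
    remaining, depth, layer = set(points), {}, 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)
        layer += 1
    return depth

def chebyshev_k(p):
    """Smallest k for which Chebyshev guarantees P(|X - mu| >= k * sigma) <= p."""
    return 1.0 / math.sqrt(p)

def flag_outliers(points, p=0.05, peel_layers=1):
    """Flag points whose distance from the core centroid exceeds mu + k * sigma.

    Assumptions of this sketch (not taken from the article): the core is what
    remains after peeling `peel_layers` hull layers, the score is the Euclidean
    distance to the core centroid, and only large distances are flagged.
    """
    depth = peel_depths(points)
    core = [q for q in points if depth[q] > peel_layers] or list(points)
    cx = mean([x for x, _ in core])
    cy = mean([y for _, y in core])
    dist = {q: math.hypot(q[0] - cx, q[1] - cy) for q in points}
    core_d = [dist[q] for q in core]
    mu, sigma = mean(core_d), pstdev(core_d)
    k = chebyshev_k(p)
    return [q for q in points if dist[q] - mu > k * sigma]

# Usage: a 5 x 5 grid of points plus one distant point.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
sample = grid + [(100.0, 100.0)]
print(flag_outliers(sample))
```

Estimating the mean and standard deviation on the peeled core, rather than on all points, is a design choice of this sketch: it keeps a few extreme points from inflating the threshold they are tested against. Because Chebyshev's inequality makes no distributional assumption, the resulting threshold is conservative compared with a normal-theory cutoff.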

References

Amidan B. G., Ferryman T. A., Cooley S. K. (2005) Data Outlier Detection using the Chebyshev Theorem, IEEE Aerospace Conference Proceedings

Ester M., Kriegel H-P., Sander J., Xu X. (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining

Porzio G. C., Ragozini G. (2000) Peeling multivariate data sets: a new approach, Quaderni di Statistica, Vol. 2

Riani M., Zani S. (1998) Generalized Distance Measures for Asymmetric Multivariate Distributions, in Advances in Data Science and Classification: Proceedings of the 6th Conference of the International Federation of Classification Societies (IFCS-98), Università "La Sapienza", Rome, 21–24 July, 503-508, Springer

Savage R. (1961) Probability Inequalities of the Tchebycheff Type, Journal of Research of the National Bureau of Standards, B. Mathematics and Mathematical Physics, Vol. 65B, No. 3

Zani S., Riani M., Corbellini A. (1998) Robust bivariate boxplots and multiple outlier detection, Computational Statistics & Data Analysis, Elsevier

Published

2018-05-08

Issue

Vol. 22 No. 1 (2018): GEOmedia 1 2018

Section

REPORT

License

Authors who publish in this journal agree to the following terms:

Authors retain the rights to their work and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with acknowledgment of its authorship and of its first publication in this journal.
Authors may enter into other non-exclusive licensing arrangements for distributing the published version of the work (e.g. depositing it in an institutional repository or publishing it in a monograph), provided they state that it was first published in this journal.
Authors may share their work online (e.g. in institutional repositories or on their own website) before and during the submission process, since this can lead to productive exchanges and increase citations of the published work (see The Effect of Open Access).

How to cite

Big Data...a few Outliers = Big Mistakes. Un nuovo processo per l’individuazione di outliers. (2018). GEOmedia, 22(1). https://ojs.mediageo.it/index.php/GEOmedia/article/view/1520
