On Studying the Effect of Data Quality on Classification Performances - Université Clermont Auvergne
Conference paper, Year: 2022

On Studying the Effect of Data Quality on Classification Performances

Abstract

The field of data repairing is very active and produces many repairing methods. This abundance of options can make choosing a repairing method hard. Our research question stems from this problem: is it always better to repair data? Our work is placed in the context of classification tasks and numerical data. We investigate our research question through five criteria: (C1) the perceived difficulty of using a repairing method according to experts, (C2) the impact of the level of data degradation on accuracy and F1 score, (C3) the effectiveness of the repairing tool, (C4) the impact of the type of error present in the data, and (C5) the impact of the classification model used. Our main contributions are a method to evaluate the difficulty of using a repairing tool and a study of experimental results on accuracy and F1 score investigating C2 to C5. We were able to identify two categories of error types: those with a low impact and those with a high impact on accuracy and F1 score. We also observed that repairing methods perform similarly at both very low and very high levels of errors.
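As a rough illustration of criterion C2, the sketch below degrades synthetic numerical training data at several levels, repairs it, and reports accuracy and F1 score on clean test data. Everything in it is an assumption made for illustration and is not taken from the paper: the error type (missing values), the repair (column-mean imputation), the classifier (nearest centroid), the Gaussian toy data, and all function names.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Binary F1 score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def degrade(rows, level, rng):
    """Hypothetical error injection: each value becomes None with probability `level`."""
    return [[None if rng.random() < level else v for v in row] for row in rows]

def repair_mean(rows):
    """Hypothetical repair: fill None with the column mean of observed values (0.0 if none)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids, rows):
    """Assign each row to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: dist(centroids[c], r)) for r in rows]

def make_data(n, rng):
    """Synthetic two-class numerical data: two Gaussian blobs (toy data, not the paper's)."""
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 * label
        rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(label)
    return rows, labels

def run_experiment(levels, seed=0):
    """Return (level, accuracy, f1) for a model trained on degraded-then-repaired data."""
    rng = random.Random(seed)
    x_train, y_train = make_data(200, rng)
    x_test, y_test = make_data(100, rng)
    results = []
    for level in levels:
        x_repaired = repair_mean(degrade(x_train, level, rng))
        model = fit_centroids(x_repaired, y_train)
        y_pred = predict(model, x_test)
        results.append((level, accuracy(y_test, y_pred), f1_score(y_test, y_pred)))
    return results
```

Calling `run_experiment((0.0, 0.2, 0.5, 0.8))` returns one `(level, accuracy, f1)` tuple per degradation level. A study along the lines of the abstract would also vary the error type, the repairing tool, and the classification model (C3 to C5), which this sketch keeps fixed.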
Main file
IDEAL2022-paper2690.pdf (650.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03938077, version 1 (13-01-2023)

Identifiers

Cite

Roxane Jouseau, Sébastien Salva, Chafik Samir. On Studying the Effect of Data Quality on Classification Performances. Intelligent Data Engineering and Automated Learning - IDEAL 2022, Nov 2022, Manchester, United Kingdom. pp.82-93, ⟨10.1007/978-3-031-21753-1_9⟩. ⟨hal-03938077⟩
