Comparison of Boxplots for Outlier Detection in Performance Modelling

Authors: Joice Mary Philip 1, Abraham George 2

Date of Publication: 9th October 2021

Abstract: Distorted values that creep into data through sampling, experimental, instrumental, manual, data handling or data processing errors can mislead the prediction of performance. Misfits in observational data have to be diagnosed and treated before modelling. The quality of data on material characteristics determines the accuracy of the performance prediction of a product. This paper considers the reported incompetence of models on a research data set and the reason for the model inaccuracy. Examination of the data under study using Tukey’s traditional boxplot and two other medcouple-based adjusted boxplots indicated the presence of outliers in the data on the characteristics of different types of fly ash. Skewness in the data on fly ash characteristics, revealed through histograms and density plots, was dealt with by transforming the data. The impact of data transformation on outlier detection is studied for the three boxplots. The suitability of each method for the detection of outliers is assessed using sensitivity and specificity calculations. Sensitivity, or True Positive Rate, is found to be maximum in modified adjusted boxplots, while specificity, or True Negative Rate, is found to be maximum in traditional boxplots. Adjusted boxplots showed the least variation in results between transformed and nontransformed data, which suggests that they are suitable for nontransformed data. Performance models could predict well for the winsorised data based on adjusted boxplots.
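
As an illustration of the kind of calculation the abstract describes, the following minimal Python sketch flags outliers with Tukey's traditional boxplot fences (Q1 - 1.5 IQR, Q3 + 1.5 IQR) and then computes sensitivity and specificity against known labels. The function names, the sample values and the ground-truth labels are all hypothetical and are not taken from the paper's fly ash data; the quartile convention used here is also an assumption, and the medcouple-based adjusted boxplots are not shown.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return a True/False flag per value: True if outside Tukey's fences."""
    # Quartiles by linear interpolation (one of several common conventions).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def sensitivity_specificity(flagged, truth):
    """Sensitivity (TPR) and specificity (TNR) of flags against true labels."""
    pairs = list(zip(flagged, truth))
    tp = sum(f and t for f, t in pairs)          # outliers correctly flagged
    fn = sum((not f) and t for f, t in pairs)    # outliers missed
    tn = sum((not f) and (not t) for f, t in pairs)  # inliers correctly kept
    fp = sum(f and (not t) for f, t in pairs)    # inliers wrongly flagged
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical sample: nine ordinary readings and one distorted value.
data = [10, 12, 11, 13, 12, 11, 14, 12, 13, 40]
truth = [False] * 9 + [True]  # assumed ground truth, for illustration only

flags = tukey_outliers(data)
sens, spec = sensitivity_specificity(flags, truth)
print(flags, sens, spec)
```

Comparing several detection methods in this way requires some reference labelling of which points are truly outliers; how that labelling is obtained is not stated in the abstract, so it is simply assumed here.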

