Abstract: Many data mining techniques cannot be applied directly to incomplete datasets that contain missing values. Furthermore, missing values can significantly reduce the effectiveness of a mining algorithm, so missing data management is an indispensable data preprocessing step. This paper proposes a statistics-based imputation method, the grey class center missing value imputation (GCCMVI) approach. Missing values are imputed from the class center and the standard deviation. Whether the standard deviation is added to, subtracted from, or left out of the class center is decided by comparing a threshold with the relevance between the class center and the instance, where relevance is computed by grey relational analysis. After the missing values are filled, the complete dataset is used to train a support vector machine (SVM) classifier. Comparative experiments are carried out on three datasets of different types. Classification accuracy, imputation performance, and imputation time are used as criteria to evaluate the proposed algorithm. Experimental results show that it significantly improves classification accuracy and imputation performance.
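The imputation rule summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so several details are assumptions: the relevance is taken to be the mean grey relational coefficient in its common form with distinguishing coefficient `zeta = 0.5`, computed only over the attributes observed in the instance; the threshold value `0.7` is arbitrary; the standard deviation is the population standard deviation; and the sign of the adjustment follows the average offset of the instance from the class center. The function names `grey_relational_grade` and `impute_by_class_center` are hypothetical.

```python
from statistics import mean, pstdev


def grey_relational_grade(reference, candidate, zeta=0.5):
    """Mean grey relational coefficient between two equal-length sequences."""
    deltas = [abs(r - c) for r, c in zip(reference, candidate)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:
        return 1.0  # identical sequences: maximal relevance
    return mean([(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas])


def impute_by_class_center(rows, labels, threshold=0.7, zeta=0.5):
    """Fill None entries from the class center, optionally shifted by one std.

    Assumes every feature has at least one observed value in every class.
    """
    filled = [list(row) for row in rows]
    n_features = len(rows[0])
    for label in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == label]
        center, spread = [], []
        for j in range(n_features):
            observed = [rows[i][j] for i in members if rows[i][j] is not None]
            center.append(mean(observed))    # class center of feature j
            spread.append(pstdev(observed))  # population std of feature j
        for i in members:
            seen = [j for j in range(n_features) if rows[i][j] is not None]
            gaps = [j for j in range(n_features) if rows[i][j] is None]
            if not gaps:
                continue
            grade = 1.0
            if seen:
                grade = grey_relational_grade(
                    [center[j] for j in seen], [rows[i][j] for j in seen], zeta
                )
            if not seen or grade >= threshold:
                # Instance is close to its class center: use the center itself.
                for j in gaps:
                    filled[i][j] = center[j]
            else:
                # Assumed sign rule: shift toward the side the instance lies on.
                offset = mean([rows[i][j] - center[j] for j in seen])
                sign = 1.0 if offset >= 0 else -1.0
                for j in gaps:
                    filled[i][j] = center[j] + sign * spread[j]
    return filled
```

Calling `impute_by_class_center(rows, labels)` on a list of feature rows with `None` marking missing entries returns a completed copy and leaves the input untouched. Raising `threshold` makes the rule shift more instances away from the plain class center.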