Abstract: When a Naive Bayes classifier is used for text classification, the feature selection method has a direct impact on classifier performance. This paper proposes a maximum discrimination (MD) feature selection algorithm. After N class probability distributions are obtained through training, the ability of each feature in the feature vector d of a test sample to discriminate between categories is evaluated, and a new feature vector ε is constructed for classification, so that the selected features have the maximum discrimination capacity for text categorization. Simulation results show that, compared with the cMFD, CSFS, and CMFS feature selection algorithms, the MD feature selection algorithm achieves higher classification accuracy when fewer features are selected.
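The pipeline summarized in the abstract (train class distributions, score each feature's discrimination ability, classify on the reduced vector) can be illustrated with a minimal Python sketch. The abstract does not define the discrimination measure, so the sketch assumes one: each feature's contribution to the pairwise symmetric Kullback-Leibler (J) divergence between class-conditional multinomial distributions. It also ranks features once from the training estimates rather than per test sample. All function names and the toy data are hypothetical, and this is not presented as the paper's MD algorithm.

```python
import math
from itertools import combinations


def train_multinomial_nb(docs, labels, n_classes, alpha=1.0):
    """Estimate priors and Laplace-smoothed term probabilities per class.

    docs: list of term-count vectors; labels: class index for each document.
    Assumes every class has at least one training document.
    """
    n_features = len(docs[0])
    class_docs = [0] * n_classes
    counts = [[0.0] * n_features for _ in range(n_classes)]
    for vec, c in zip(docs, labels):
        class_docs[c] += 1
        for k, v in enumerate(vec):
            counts[c][k] += v
    priors = [n / len(docs) for n in class_docs]
    cond = []
    for c in range(n_classes):
        total = sum(counts[c]) + alpha * n_features
        cond.append([(counts[c][k] + alpha) / total for k in range(n_features)])
    return priors, cond


def discrimination_scores(cond):
    """ASSUMED measure: per-feature share of the pairwise J-divergence."""
    n_features = len(cond[0])
    scores = [0.0] * n_features
    for p, q in combinations(cond, 2):
        for k in range(n_features):
            # (p - q) * log(p / q) is non-negative and zero when p == q.
            scores[k] += (p[k] - q[k]) * math.log(p[k] / q[k])
    return scores


def select_features(scores, m):
    """Return the indices of the m highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:m]


def classify(vec, priors, cond, selected):
    """Pick the class with the highest log-posterior over selected features."""
    best, best_lp = None, -math.inf
    for c, prior in enumerate(priors):
        # Renormalize the term probabilities over the selected subset.
        z = sum(cond[c][k] for k in selected)
        lp = math.log(prior)
        lp += sum(vec[k] * math.log(cond[c][k] / z) for k in selected)
        if lp > best_lp:
            best, best_lp = c, lp
    return best


# Toy data: terms 0 and 1 separate the classes, terms 2 and 3 do not.
docs = [[3, 0, 1, 1], [2, 0, 1, 1], [0, 3, 1, 1], [0, 2, 1, 1]]
labels = [0, 0, 1, 1]
priors, cond = train_multinomial_nb(docs, labels, n_classes=2)
selected = select_features(discrimination_scores(cond), m=2)
print(sorted(selected), classify([2, 0, 1, 1], priors, cond, selected))
```

On this toy corpus the two uninformative terms receive a score of zero, so keeping only two features retains everything the classifier needs. A per-sample variant would compute the scores only over the terms present in the test document before building the reduced vector.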