Abstract: Public datasets used in named entity recognition research often have imbalanced class labels, which limits further performance gains for methods based on statistical learning models. To address this problem, a class label balancing method based on a genetic algorithm is proposed. The method modifies the fitness function and the gene combination rules so that newly generated samples augment the original dataset and balance its label distribution. To verify its validity, the proposed method was compared with balanced undersampling and random oversampling using a BiLSTM-CRF model on the CoNLL 2003 and JNLPBA datasets. The results show that the proposed method increased recall by 3.26% and F1 by 1.70% on the CoNLL 2003 dataset, and recall by 2.44% and F1 by 1.03% on the JNLPBA dataset. These results demonstrate that the proposed method effectively alleviates class imbalance and improves named entity recognition performance.
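The idea of balancing labels by generating new samples can be pictured with a toy sketch. This is not the method evaluated in the paper: the fitness measure (the max/min ratio of entity-token counts per class), the crossover rule (splicing two sentences at entity boundaries), the example sentences, and every function name below are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def label_counts(sentences):
    """Count entity tokens per class, ignoring 'O' and the B-/I- prefix."""
    counts = Counter()
    for sent in sentences:
        for _, tag in sent:
            if tag != "O":
                counts[tag.split("-", 1)[-1]] += 1
    return counts


def imbalance(counts, classes):
    """Assumed fitness: max/min class count ratio (1.0 means balanced)."""
    values = [counts.get(c, 0) for c in classes]
    return max(values) / max(min(values), 1)


def crossover(a, b, rng):
    """Assumed gene combination: prefix of a + suffix of b.

    Cuts are only allowed before a token that is not tagged I-*, so no
    entity span is split and the child keeps a valid BIO sequence.
    """
    cuts_a = [i for i in range(1, len(a)) if not a[i][1].startswith("I-")]
    cuts_b = [j for j in range(len(b)) if not b[j][1].startswith("I-")]
    if not cuts_a or not cuts_b:
        return None
    return a[:rng.choice(cuts_a)] + b[rng.choice(cuts_b):]


def balance(sentences, classes, target=1.5, generations=200, offspring=20, seed=0):
    """Append generated sentences while they reduce the imbalance ratio."""
    rng = random.Random(seed)
    augmented = list(sentences)
    counts = label_counts(augmented)
    for _ in range(generations):
        current = imbalance(counts, classes)
        if current <= target:
            break
        best, best_score = None, current
        for _ in range(offspring):
            a, b = rng.sample(sentences, 2)
            child = crossover(a, b, rng)
            if child is None:
                continue
            score = imbalance(counts + label_counts([child]), classes)
            if score < best_score:  # keep only children that improve balance
                best, best_score = child, score
        if best is not None:
            augmented.append(best)
            counts += label_counts([best])
    return augmented


# Hypothetical toy corpus: PER tokens outnumber LOC tokens.
data = [
    [("John", "B-PER"), ("Smith", "I-PER"), ("lives", "O"), ("in", "O"), ("Paris", "B-LOC")],
    [("Mary", "B-PER"), ("met", "O"), ("Anna", "B-PER"), ("yesterday", "O")],
    [("Peter", "B-PER"), ("Jones", "I-PER"), ("called", "O"), ("Tom", "B-PER")],
    [("She", "O"), ("flew", "O"), ("to", "O"), ("Rome", "B-LOC"), ("today", "O")],
]
classes = ["PER", "LOC"]
augmented = balance(data, classes)
print("imbalance before:", imbalance(label_counts(data), classes))
print("imbalance after: ", imbalance(label_counts(augmented), classes))
```

Because a child is accepted only when it lowers the ratio, the augmented set is never less balanced than the original under this measure. A realistic fitness function would also need to account for the linguistic plausibility of the generated sentences, which this sketch ignores.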