Method for solving class imbalance of named entity recognition dataset

CLC Number: TP391


Abstract:

The public datasets used in named entity recognition research are often imbalanced in their class labels, which limits further performance gains for methods based on statistical learning models. To address this problem, a class-label balancing method based on a genetic algorithm is proposed. By modifying the fitness function and the gene combination rules, the method attempts to balance a dataset by generating new samples that augment the original data. To verify its validity, the proposed method was compared with balanced undersampling and random oversampling using a BiLSTM-CRF model on the CoNLL 2003 and JNLPBA datasets. The results show that the proposed method increased recall by 3.26% and the F1 score by 1.70% on the CoNLL 2003 dataset, and increased recall by 2.44% and the F1 score by 1.03% on the JNLPBA dataset. These experimental results demonstrate that the proposed method can effectively alleviate class imbalance and improve named entity recognition performance.
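The abstract does not specify the fitness function or the gene combination rules, so the following is only a hypothetical sketch of the general idea: a genetic algorithm searching for an augmentation of the training set whose entity-label counts are more balanced. It assumes a simplified encoding in which each gene is the number of extra copies of one existing sentence (the paper instead generates new samples), and an assumed fitness equal to the negative coefficient of variation of the label counts. The toy corpus, `MAX_COPIES`, and all function names are hypothetical and are not taken from the paper.

```python
import random
import statistics
from collections import Counter

# Hypothetical toy corpus: each sentence is summarized by the entity labels it contains.
SENTENCES = [
    Counter({"PER": 3, "LOC": 1}),
    Counter({"PER": 2, "ORG": 1}),
    Counter({"LOC": 2}),
    Counter({"MISC": 1}),
    Counter({"ORG": 2, "MISC": 1}),
    Counter({"PER": 4}),
]
LABELS = ["PER", "LOC", "ORG", "MISC"]
MAX_COPIES = 3  # assumed upper bound on extra copies of any one sentence


def label_counts(chromosome):
    """Entity-label totals for the original corpus plus the extra copies encoded in the chromosome."""
    total = Counter()
    for sentence, extra in zip(SENTENCES, chromosome):
        for label, n in sentence.items():
            total[label] += n * (1 + extra)
    return [total[label] for label in LABELS]


def fitness(chromosome):
    """Assumed fitness: negative coefficient of variation of label counts (higher = more balanced)."""
    counts = label_counts(chromosome)
    return -statistics.pstdev(counts) / statistics.mean(counts)


def tournament(population, rng, k=3):
    """Tournament selection: return the fittest of k randomly sampled chromosomes."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chromosome, rng, rate=0.1):
    """With probability `rate`, replace a gene with a new random copy count."""
    return [rng.randint(0, MAX_COPIES) if rng.random() < rate else g for g in chromosome]


def evolve(generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, MAX_COPIES) for _ in SENTENCES] for _ in range(pop_size)]
    # Include the unaugmented corpus so the result is never worse than doing nothing.
    population[0] = [0] * len(SENTENCES)
    best = max(population, key=fitness)
    for _ in range(generations):
        population = [
            mutate(crossover(tournament(population, rng), tournament(population, rng), rng), rng)
            for _ in range(pop_size)
        ]
        best = max(population + [best], key=fitness)  # elitism: keep the best seen so far
    return best


if __name__ == "__main__":
    original = [0] * len(SENTENCES)
    best = evolve()
    print("before:", dict(zip(LABELS, label_counts(original))), round(fitness(original), 3))
    print("after: ", dict(zip(LABELS, label_counts(best))), round(fitness(best), 3))
```

Seeding the population with the all-zero chromosome and keeping the best individual across generations means the returned augmentation is never less balanced than the original under this assumed fitness. A practical version would likely also penalize the number of added samples, a term this sketch omits.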


Cite this article as: Xu Lidan, Liu Jiayong, He Xiang. Method for solving class imbalance of named entity recognition dataset [J]. J Sichuan Univ: Nat Sci Ed, 2020, 57: 82.

History
  • Received: May 11, 2019
  • Revised: August 30, 2019
  • Adopted: September 5, 2019
  • Online: January 15, 2020