Named Entity Recognition Based on Local Adversarial Training

Affiliation: College of Cybersecurity, Sichuan University

CLC number: TP391.1

Funding: Sichuan Province Key Research and Development Program (2020YFG0076); Sichuan University Fund (2020SCUNG205); National Natural Science Foundation of China (U2066203, 61473197)

    Abstract:

    In named entity recognition (NER), datasets commonly contain boundary samples that are easily confused, both between entities and non-entities and among entity categories, and this confusion substantially degrades the performance of NER methods. This paper proposes an NER method that takes BiLSTM-CRF as the baseline model and combines hard example selection with targeted-attack adversarial training. The method first selects hard examples, which contain a large number of boundary samples. Exploiting the fact that boundary samples are easily perturbed away from their correct category, it then applies a targeted attack that follows the error probability distribution of the confusion matrix to generate adversarial samples. The adversarial samples are mixed with the original data for adversarial training, which strengthens the model's ability to recognize confusable boundary samples. To verify the method, global and local adversarial training based on non-targeted attacks and global adversarial training based on targeted attacks are designed as comparison experiments. Experimental results show that the proposed method improves the quality of adversarial samples while retaining the advantages of adversarial training, raising the F1 score on the JNLPBA, MalwareTextDB, and Drugbank datasets by 1.34%, 6.03%, and 3.65%, respectively.
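    The abstract does not give implementation details, so the following is only a minimal sketch of one way the "targeted attack according to the confusion matrix error probability distribution" could choose its attack targets. It assumes token-level gold and predicted labels are available from a baseline model; the function names `error_distribution` and `sample_target` and the example labels are hypothetical, and the step that actually perturbs the embeddings toward the sampled target is omitted.

```python
import random
from collections import Counter


def error_distribution(gold, pred, labels):
    """Build, for each gold label, the normalized distribution of the
    wrong labels the model predicted (off-diagonal confusion-matrix rows).
    Assumption: this stands in for the paper's error probability distribution."""
    counts = {g: Counter() for g in labels}
    for g, p in zip(gold, pred):
        if g != p:  # only misclassifications contribute
            counts[g][p] += 1
    dist = {}
    for g in labels:
        total = sum(counts[g].values())
        if total:  # skip labels that were never confused
            dist[g] = {p: c / total for p, c in counts[g].items()}
    return dist


def sample_target(gold_label, dist, rng):
    """Sample an attack target for one token in proportion to how often
    its gold label is mistaken for each other label.
    Returns None when the label has no recorded errors."""
    row = dist.get(gold_label)
    if not row:
        return None
    targets = sorted(row)
    weights = [row[t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


# Hypothetical token-level labels, for illustration only.
gold = ["O", "B-protein", "B-protein", "B-DNA", "O", "B-protein"]
pred = ["O", "B-DNA", "O", "B-protein", "O", "B-DNA"]
labels = ["O", "B-protein", "B-DNA"]

dist = error_distribution(gold, pred, labels)
target = sample_target("B-protein", dist, random.Random(0))
```

    Sampling targets in proportion to observed errors concentrates the attack on the label pairs the model already confuses, which is the intuition the abstract gives for preferring a targeted attack over a non-targeted one on boundary samples.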

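Cite this article

Citation format: Li Jing (李静), Cheng Pengsen (程芃森), Xu Lidan (许丽丹), Liu Jiayong (刘嘉勇). Named Entity Recognition Based on Local Adversarial Training [J]. Journal of Sichuan University (Natural Science Edition), 2021, 58: 023003.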

History
  • Received: 2020-06-17
  • Last revised: 2020-11-11
  • Accepted: 2020-11-19
  • Published online: 2021-04-02