Controllable text generation technology based on topic model and self-supervised learning
Original title (Chinese): 结合主题模型与自监督学习的可控文本生成技术研究

Affiliation:

School of Cyber Science and Engineering, Sichuan University

CLC number:

TP391

Fund Project:

Key R&D Project of the Sichuan Provincial Department of Science and Technology (2021YFG0156)

    Abstract:

    Supervised learning methods based on large pre-trained language models have achieved excellent results on controllable text generation tasks, but these studies focus on controlling high-level attributes of the generated text, such as sentiment and topic, and neglect the problem of generalization. Existing methods based on self-supervised learning instead use sentence-level training to give the model the ability to complete whole sentences, which enables controlled generation at the word and phrase level, but their ability to generate sentences strongly related to a specific attribute still needs improvement. To address this, this paper proposes a multi-granularity training method that combines the word level (fine-grained) with the sentence level (coarse-grained): a word-level topic model lets the model learn topic-level semantics and thereby acquire topic-to-text generation ability, while sentence-level self-supervised training lets the model learn whole-sentence representations and thereby acquire sentence-completion ability. By combining the topic model with self-supervised learning, the model achieves better results in controlled generation at the word and phrase level. Experiments show that the proposed model outperforms existing baseline models in topic relevance and on conventional text generation metrics.

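Cite this article

Citation format: Hu Yi (胡益), Liu Jiayong (刘嘉勇), Dai Jinqiao (代金鞘), Jia Peng (贾鹏). Controllable text generation technology based on topic model and self-supervised learning [J]. Journal of Sichuan University (Natural Science Edition), 2023, 60: 053002.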

History
  • Received: 2022-10-14
  • Last revised: 2022-12-14
  • Accepted: 2022-12-23
  • Published online: 2023-10-12