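Word sense induction (WSI) is the process of identifying the senses of a polysemous word, given a corpus that contains the word; it is usually approached with clustering methods. This paper proposes a topic-model-based method for Chinese word sense induction, which uses the topic probability distributions of documents to infer the sense distribution of a polysemous word. Experimental results show that the proposed method achieves an F-score of 77.58% on the test data.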
Abstract:
Word Sense Induction (WSI) is the process of identifying the senses of a word given its contexts, and it is often treated as a clustering task. In this paper, we present an approach to Chinese word sense induction based on topic modeling. The key to our methodology is using the probabilistic assignment of topics to documents to estimate sense distributions. Experimental results show that our method achieves an F-score of 77.58% on the development data set.
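As a rough illustration of how document topic distributions can be turned into induced senses, the sketch below assigns each instance (document) of a target word to its most probable topic and averages the per-document distributions to obtain a sense distribution for the word. It is not the authors' implementation: the per-document topic distributions are assumed to come from some topic model such as LDA, the averaging step and the function names (`induce_senses`, `clustering_fscore`) are hypothetical, and the F-score shown is one common class-weighted clustering variant used in WSI evaluation, which the source does not confirm is the exact measure behind the 77.58% figure.

```python
from collections import Counter


def induce_senses(doc_topics):
    """Map each document containing the target word to an induced sense.

    doc_topics: list of per-document topic distributions (lists of floats that
    sum to 1), assumed to be estimated beforehand by a topic model such as LDA.
    Returns (labels, sense_dist), where labels[i] is the most probable topic of
    document i and sense_dist is the mean topic distribution over all documents.
    """
    k = len(doc_topics[0])
    # Hard assignment: the induced sense of an instance is its argmax topic.
    labels = [max(range(k), key=theta.__getitem__) for theta in doc_topics]
    # Hypothetical aggregation: average the document distributions per topic.
    sense_dist = [sum(theta[t] for theta in doc_topics) / len(doc_topics)
                  for t in range(k)]
    return labels, sense_dist


def clustering_fscore(gold, pred):
    """Class-weighted clustering F-score.

    For each gold sense, take the best F1 against any induced cluster, then
    weight that score by the gold sense's share of all instances.
    """
    n = len(gold)
    gold_sizes = Counter(gold)
    pred_sizes = Counter(pred)
    overlap = Counter(zip(gold, pred))
    total = 0.0
    for g, g_size in gold_sizes.items():
        best = 0.0
        for p, p_size in pred_sizes.items():
            o = overlap[(g, p)]
            if o == 0:
                continue
            precision = o / p_size
            recall = o / g_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (g_size / n) * best
    return total


if __name__ == "__main__":
    # Toy topic distributions for four instances of one ambiguous word.
    thetas = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
    labels, dist = induce_senses(thetas)
    print(labels, dist)
    print(clustering_fscore(["a", "a", "b", "b"], labels))
```

The hard argmax assignment keeps the mapping from topics to senses simple; a soft variant could instead keep each instance's full topic distribution as its sense distribution.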