Malware family classification based on text embedding feature representation

CLC Number: TP309.7

    Abstract:

    Automation, efficiency, and granularity are major challenges in malware detection and classification. The successful application of deep learning to image processing, speech recognition, and natural language processing has, to some extent, relieved the heavy manpower and time costs imposed by traditional analysis methods. This paper describes mal2vec, an automatic, efficient, and fine-grained malware analysis method that treats each malware sample as a text rich in behavioral semantic information. The text consists of the API call sequence recorded while the malware executes dynamically. We train the classical neural probabilistic model Doc2Vec on the resulting set of texts. Experimental results show that classification performance is significantly improved over that reported by Rieck et al. In particular, unlike other deep learning methods, this method allows the intermediate results of model training to be extracted as explicit representations. These explicit intermediate results are interpretable and allow us to analyze the behavior patterns of malware families at a fine-grained level.
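    The abstract treats each sample's dynamic API-call trace as a document and embeds it with Doc2Vec. Below is a minimal, dependency-free sketch of that idea. It is not the authors' mal2vec code. It assumes the PV-DBOW variant of Doc2Vec with uniformly drawn negative samples (a simplification: word2vec-style implementations draw noise tokens from a smoothed unigram distribution) and a nearest-centroid cosine classifier on top of the document vectors. The abstract does not state the Doc2Vec variant, hyperparameters, or downstream classifier, so these choices, all function names, and the toy traces and family labels are hypothetical.

```python
import math
import random

# Illustrative sketch only. The PV-DBOW variant, uniform negative sampling,
# hyperparameters, and nearest-centroid classifier are assumptions,
# not the paper's confirmed setup.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    # Clamp so math.exp cannot overflow on large scores.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def _sgd_step(doc_vec, tok_id, out_vecs, rng, lr, negative, update_out):
    """One PV-DBOW update: predict token `tok_id` from the document vector,
    using one positive target and up to `negative` uniformly drawn noise tokens."""
    targets = [(tok_id, 1.0)]
    for _ in range(negative):
        noise = rng.randrange(len(out_vecs))
        if noise != tok_id:  # skip noise draws that hit the true token
            targets.append((noise, 0.0))
    grad = [0.0] * len(doc_vec)
    for w, label in targets:
        wv = out_vecs[w]
        g = lr * (label - sigmoid(dot(doc_vec, wv)))
        for k in range(len(doc_vec)):
            grad[k] += g * wv[k]  # doc-vector gradient uses the pre-update wv
            if update_out:
                wv[k] += g * doc_vec[k]  # update the output (token) vector
    for k in range(len(doc_vec)):
        doc_vec[k] += grad[k]


def train(docs, dim=16, epochs=100, lr=0.05, negative=3, seed=0):
    """Learn one vector per API-call document; returns (doc_vecs, out_vecs, index)."""
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    doc_vecs = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]  # token vectors start at zero, as in word2vec
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            for t in doc:
                _sgd_step(doc_vecs[d], index[t], out_vecs, rng, lr, negative, True)
    return doc_vecs, out_vecs, index


def infer(tokens, out_vecs, index, epochs=100, lr=0.05, negative=3, seed=1):
    """Embed an unseen trace by training a fresh doc vector with token vectors frozen."""
    rng = random.Random(seed)
    dim = len(out_vecs[0])
    vec = [(rng.random() - 0.5) / dim for _ in range(dim)]
    known = [index[t] for t in tokens if t in index]  # API names unseen in training are dropped
    for _ in range(epochs):
        for tok_id in known:
            _sgd_step(vec, tok_id, out_vecs, rng, lr, negative, False)
    return vec


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def nearest_centroid(vec, doc_vecs, labels):
    """Return the family whose summed vector is most cosine-similar to `vec`.
    Summing instead of averaging is fine because cosine ignores magnitude."""
    sums = {}
    for v, lab in zip(doc_vecs, labels):
        acc = sums.setdefault(lab, [0.0] * len(v))
        for k, x in enumerate(v):
            acc[k] += x
    return max(sums, key=lambda lab: cosine(vec, sums[lab]))


if __name__ == "__main__":
    # Toy, made-up traces and family labels, for illustration only.
    traces = [
        ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle", "RegSetValueExW"],
        ["CreateFileW", "WriteFile", "RegSetValueExW", "CloseHandle"],
        ["socket", "connect", "send", "recv", "closesocket"],
        ["socket", "connect", "send", "send", "recv"],
    ]
    labels = ["dropper", "dropper", "backdoor", "backdoor"]
    doc_vecs, out_vecs, index = train(traces)
    query = infer(["socket", "connect", "recv", "send"], out_vecs, index)
    print(nearest_centroid(query, doc_vecs, labels))
```

    In practice an established implementation such as gensim's Doc2Vec would replace these hand-written loops; the sketch only makes the mechanics visible. The learned `doc_vecs` are one example of an explicit, inspectable by-product of training, though the abstract does not say whether this is what the authors extract for interpretation.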


Cite this article as: Zhang Tao, Wang Junfeng. Malware family classification based on text embedding feature representation [J]. J Sichuan Univ: Nat Sci Ed, 2019, 56: 441.

History
  • Received: August 09, 2018
  • Revised: December 15, 2018
  • Adopted: December 21, 2018
  • Online: May 29, 2019