Malware family classification based on text embedding feature representation

Volume 56, Issue 3, 2019, pp. 441-449

CLC number: TP309.7

Abstract:

Automation, efficiency, and granularity are major challenges in malware detection and classification. The successful application of deep learning to image processing, speech recognition, and natural language processing has, to some extent, relieved the heavy manpower and time costs imposed by traditional analysis methods. This paper presents mal2vec, an automatic, efficient, and fine-grained malware analysis method that treats each malware sample as a text rich in behavioral semantics. The text consists of the API call sequence recorded while the malware is dynamically executed. We train on the resulting text set with Doc2Vec, a classical neural probabilistic model. Experimental results show that our method significantly improves classification performance over that of Rieck et al. In particular, unlike other deep learning methods, mal2vec can extract the intermediate results of model training as explicit representations. These explicit intermediate results are interpretable and allow us to analyze the behavior patterns of malware families at a fine-grained level.
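
The abstract describes the pipeline only at a high level. As a rough illustration, the following dependency-free Python sketch treats each dynamic API call trace as a document and learns one vector per sample with PV-DBOW (one of the two Doc2Vec variants) using negative sampling, then assigns a family by cosine similarity. It is not the mal2vec implementation: the abstract does not state which Doc2Vec variant, preprocessing, hyperparameters, or downstream classifier the authors used, so the helper names `trace_to_document`, `train_pv_dbow`, `cosine`, and `nearest_centroid`, all hyperparameter values, and the nearest-centroid classifier are hypothetical choices made for this example. A real experiment would more likely rely on an established Doc2Vec library.

```python
import math
import random
from collections import Counter


def trace_to_document(api_calls):
    """Turn a dynamic API call trace into a list of word tokens.

    Hypothetical preprocessing: strip whitespace, drop empty entries, lower-case.
    """
    return [call.strip().lower() for call in api_calls if call.strip()]


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_pv_dbow(docs, dim=16, epochs=50, lr=0.05, negative=3, seed=0):
    """Minimal PV-DBOW with negative sampling (illustrative hyperparameters).

    docs: dict mapping a sample tag (e.g. a file hash) to its token list.
    Returns a dict mapping each tag to its learned document vector.
    """
    rng = random.Random(seed)
    vocab = sorted({w for words in docs.values() for w in words})
    # Document vectors start small and random; output word vectors start at zero.
    doc_vec = {t: [(rng.random() - 0.5) / dim for _ in range(dim)] for t in docs}
    out_vec = {w: [0.0] * dim for w in vocab}
    # Noise distribution for negatives: unigram counts raised to the 0.75 power.
    counts = Counter(w for words in docs.values() for w in words)
    weights = [counts[w] ** 0.75 for w in vocab]
    # PV-DBOW: the document vector alone must predict each token in the document.
    pairs = [(t, w) for t, words in docs.items() for w in words]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for tag, word in pairs:
            d = doc_vec[tag]
            grad_d = [0.0] * dim
            negatives = [n for n in rng.choices(vocab, weights, k=negative) if n != word]
            targets = [(word, 1.0)] + [(n, 0.0) for n in negatives]
            for w, label in targets:
                o = out_vec[w]
                score = sigmoid(sum(a * b for a, b in zip(d, o)))
                g = lr * (label - score)
                for i in range(dim):
                    grad_d[i] += g * o[i]  # accumulate the update for the document vector
                    o[i] += g * d[i]       # update the output word vector immediately
            for i in range(dim):
                d[i] += grad_d[i]
    return doc_vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def nearest_centroid(vectors, labels, query):
    """Hypothetical classifier: pick the family whose mean vector is most cosine-similar."""
    by_family = {}
    for tag, family in labels.items():
        by_family.setdefault(family, []).append(vectors[tag])
    centroids = {f: [sum(col) / len(vs) for col in zip(*vs)] for f, vs in by_family.items()}
    return max(centroids, key=lambda f: cosine(centroids[f], query))
```

For example, given a toy corpus such as `{"s1": trace_to_document(["NtCreateFile", "NtWriteFile", "NtClose"]), ...}` (made-up traces, not data from the paper), `train_pv_dbow(docs, dim=8)` returns an 8-dimensional vector per sample tag. `nearest_centroid` then labels a new vector with the closest family. Nearest-centroid is used here only because it needs no extra dependencies; any classifier over the document vectors would fill the same slot.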

Cite this article as: Zhang Tao, Wang Junfeng. Malware family classification based on text embedding feature representation [J]. J Sichuan Univ: Nat Sci Ed, 2019, 56: 441.

History

Received: August 09, 2018
Revised: December 15, 2018
Adopted: December 21, 2018
Online: May 29, 2019