A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence

Eslam Amer*, Ivan Zelinka

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

A malware API call graph derived from API call sequences is considered a representative technique for understanding the behavioral characteristics of malware. In practice, however, building a behavioral graph for each malware sample is cumbersome. To resolve this issue, we examine how to generate a simple behavioral graph that characterizes malware. In this paper, we introduce the use of word embedding to understand the contextual relationships that exist between API functions in malware call sequences. We also propose a method that segregates individual functions with similar contextual traits into clusters. Our experimental results show that there is a significant distinction between malware and goodware call sequences. Based on this distinction, we introduce a new method, based on the Markov chain, to detect and predict malware. By modeling the behavior of malware and goodware API call sequences, we generate a semantic transition matrix that depicts the actual relations between API functions. Our models return an average detection precision of 0.990, with a false positive rate of 0.010. We also propose a prediction methodology that predicts whether an API call sequence is malicious from its initial API calls. Our model returns an average prediction accuracy of 0.997. Our approach can therefore block malicious payloads before they complete execution, rather than detecting them afterward, and so avoids having to repair the damage.
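The Markov chain idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cluster labels (`file`, `registry`, `process`, `memory`, `network`), the toy training sequences, the add-alpha smoothing, and the log-likelihood-ratio decision rule are all assumptions made for illustration. Each API call is assumed to have already been mapped to a cluster label; one transition matrix is estimated per class, and a sequence (or only its first few calls, for early prediction) is assigned to the class whose matrix explains it better.

```python
import math
from collections import defaultdict

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate a first-order Markov transition matrix with add-alpha smoothing."""
    counts = {s: defaultdict(float) for s in states}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    matrix = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        matrix[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return matrix

def log_likelihood(seq, matrix):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(matrix[p][n]) for p, n in zip(seq, seq[1:]))

def classify(seq, malware_matrix, goodware_matrix, prefix_len=None):
    """Label a sequence by log-likelihood ratio; prefix_len limits it to the initial calls."""
    if prefix_len is not None:
        seq = seq[:prefix_len]
    score = log_likelihood(seq, malware_matrix) - log_likelihood(seq, goodware_matrix)
    return "malware" if score > 0 else "goodware"

# Hypothetical cluster-label sequences, invented for this example only.
malware_train = [
    ["process", "memory", "memory", "network"],
    ["registry", "process", "memory", "network"],
]
goodware_train = [
    ["file", "file", "registry", "file"],
    ["file", "registry", "file", "file"],
]
states = sorted({s for seq in malware_train + goodware_train for s in seq})

malware_matrix = fit_transitions(malware_train, states)
goodware_matrix = fit_transitions(goodware_train, states)

full_label = classify(["process", "memory", "network"], malware_matrix, goodware_matrix)
benign_label = classify(["file", "registry", "file"], malware_matrix, goodware_matrix)
# Early prediction: only the first two calls of a longer sequence are scored.
early_label = classify(["process", "memory", "network", "file"],
                       malware_matrix, goodware_matrix, prefix_len=2)
print(full_label, benign_label, early_label)
```

Smoothing keeps every transition probability nonzero, so an unseen transition lowers a sequence's score instead of making the logarithm undefined. In a realistic setting the decision threshold and the prefix length would need to be tuned on held-out data.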

Original language: English
Article number: 101760
Number of pages: 15
Journal: Computers and Security
Volume: 92
Early online date: 11 Feb 2020
Publication status: Published - 1 May 2020

Keywords

  • API call sequence
  • Chain sequence
  • Malware detection
  • Malware prediction
  • Word embedding
