MALGRA: Machine learning and N-GRAM malware feature extraction and detection system

Research output: Contribution to journal › Article › peer-review

Standard

MALGRA: Machine learning and N-GRAM malware feature extraction and detection system. / Ali, Muhammad; Shiaeles, Stavros; Bendiab, Gueltoum; Ghita, Bogdan.

In: Electronics (Switzerland), Vol. 9, No. 11, 1777, 26.10.2020.

Bibtex

@article{103cc5eb42364133bc75ebc85eb6cd13,
title = "MALGRA: Machine learning and N-GRAM malware feature extraction and detection system",
abstract = "Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative used to identify the most significant N-grams features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.",
keywords = "API call, Decision Tree, Dynamic analysis, Logistic Regression, Machine learning, Malware, N-grams, Naive Bayes, Random Forests, Sandbox, SNDBOX",
author = "Muhammad Ali and Stavros Shiaeles and Gueltoum Bendiab and Bogdan Ghita",
year = "2020",
month = oct,
day = "26",
doi = "10.3390/electronics9111777",
language = "English",
volume = "9",
journal = "Electronics",
issn = "2079-9292",
publisher = "MDPI AG",
number = "11",
}

RIS

TY - JOUR

T1 - MALGRA

T2 - Machine learning and N-GRAM malware feature extraction and detection system

AU - Ali, Muhammad

AU - Shiaeles, Stavros

AU - Bendiab, Gueltoum

AU - Ghita, Bogdan

PY - 2020/10/26

Y1 - 2020/10/26

N2 - Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative used to identify the most significant N-grams features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.

AB - Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative used to identify the most significant N-grams features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.

KW - API call

KW - Decision Tree

KW - Dynamic analysis

KW - Logistic Regression

KW - Machine learning

KW - Malware

KW - N-grams

KW - Naive Bayes

KW - Random Forests

KW - Sandbox

KW - SNDBOX

UR - http://www.scopus.com/inward/record.url?scp=85094179131&partnerID=8YFLogxK

U2 - 10.3390/electronics9111777

DO - 10.3390/electronics9111777

M3 - Article

AN - SCOPUS:85094179131

VL - 9

JO - Electronics

JF - Electronics

SN - 2079-9292

IS - 11

M1 - 1777

ER -

ID: 23441431