MALGRA: Machine learning and N-GRAM malware feature extraction and detection system
Research output: Contribution to journal › Article › peer-review
Standard
MALGRA: Machine learning and N-GRAM malware feature extraction and detection system. / Ali, Muhammad; Shiaeles, Stavros; Bendiab, Gueltoum; Ghita, Bogdan.
In: Electronics (Switzerland), Vol. 9, No. 11, 1777, 26.10.2020.
RIS
TY - JOUR
T1 - MALGRA
T2 - Machine learning and N-GRAM malware feature extraction and detection system
AU - Ali, Muhammad
AU - Shiaeles, Stavros
AU - Bendiab, Gueltoum
AU - Ghita, Bogdan
PY - 2020/10/26
Y1 - 2020/10/26
N2 - Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective because attackers use techniques such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting IT infrastructure against such threats. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel way to identify the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy of the classifiers evaluated.
KW - API call
KW - Decision Tree
KW - Dynamic analysis
KW - Logistic Regression
KW - Machine learning
KW - Malware
KW - N-grams
KW - Naive Bayes
KW - Random Forests
KW - Sandbox
KW - SNDBOX
UR - http://www.scopus.com/inward/record.url?scp=85094179131&partnerID=8YFLogxK
U2 - 10.3390/electronics9111777
DO - 10.3390/electronics9111777
M3 - Article
AN - SCOPUS:85094179131
VL - 9
JO - Electronics
JF - Electronics
SN - 2079-9292
IS - 11
M1 - 1777
ER -
ID: 23441431
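
The abstract describes representing dynamic-analysis output as N-grams and using TF-IDF to pick out the most significant ones. A minimal sketch of that idea follows. It assumes each sample is a sequence of API call names (suggested by the "API call" keyword) and uses the plain textbook TF-IDF formula, which may differ from the variant used in the paper. The function names and the three traces are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def ngrams(calls, n):
    """Return the contiguous n-grams of a sequence of API call names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def tfidf(traces, n=2):
    """Return one {n-gram: TF-IDF weight} dict per trace.

    Assumption: tf = count / n-grams in the trace, idf = log(N / df).
    """
    counts = [Counter(ngrams(trace, n)) for trace in traces]
    # Document frequency: in how many traces each n-gram appears.
    df = Counter(gram for c in counts for gram in c)
    total = len(traces)
    weights = []
    for c in counts:
        length = sum(c.values()) or 1  # guard against traces shorter than n
        weights.append({
            gram: (k / length) * math.log(total / df[gram])
            for gram, k in c.items()
        })
    return weights


# Hypothetical traces, not taken from the paper's dataset.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CreateProcessW"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
]
weights = tfidf(traces, n=2)
```

An N-gram shared by several traces, such as `("CreateFileW", "WriteFile")` here, receives a lower weight than one unique to a single trace. The resulting weights could then serve as features for a supervised classifier such as the Logistic Regression model named in the abstract; that step is not shown.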