For this study, we used the Doc2Vec approach for feature extraction, with a context window size of 2, a minimum word frequency of 2, a sampling rate of 0.001, a learning rate of 0.025, a minimum learning rate of 1.0E-4, a layer size of 200 (i.e., 200-dimensional embeddings), a batch size of 10000 and 40 epochs. Distributed Memory (DM) was used as the embedding learning algorithm, with a negative sampling rate of 5.0. Before feature extraction, all tweets were pre-processed by converting characters to lower case; removing stop words, numbers, punctuation and words of no more than 3 characters; and stemming the remaining words with the Snowball stemmer. Three classifiers were then trained: an SVM with a linear kernel, random forests (RF) and gradient boosted trees (GBT). In the testing stage, the same pre-processing and feature extraction were applied to each test instance, and each pair of the three trained classifiers (SVM+RF, SVM+GBT and RF+GBT) was fused by averaging their per-class probabilities.
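A minimal sketch of the pre-processing and late-fusion steps described above, in Python. The stop-word list and the example probabilities are illustrative assumptions (the study's exact lists and toolkit are not given here), and stemming is omitted for brevity; a commented gensim `Doc2Vec` call shows how the reported hyperparameters would roughly map onto that library's parameters.

```python
import re
import numpy as np

# Illustrative stop-word subset; the study's actual list is not specified here.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

def preprocess(tweet):
    """Lowercase, strip numbers and punctuation, drop stop words and words
    of no more than 3 characters (Snowball stemming omitted for brevity)."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 3]

# With gensim, the reported Doc2Vec configuration would roughly correspond to:
# Doc2Vec(dm=1, vector_size=200, window=2, min_count=2, sample=1e-3,
#         alpha=0.025, min_alpha=1e-4, negative=5, epochs=40)

def fuse_pair(proba_a, proba_b):
    """Late fusion of two classifiers: average their per-class probabilities."""
    return (np.asarray(proba_a) + np.asarray(proba_b)) / 2.0

# Hypothetical per-class probabilities for one test instance (3 classes):
p_svm = [0.6, 0.3, 0.1]
p_rf = [0.4, 0.4, 0.2]
fused = fuse_pair(p_svm, p_rf)  # SVM+RF pairing; SVM+GBT and RF+GBT are analogous
```

The predicted label for the fused pair would then be the class with the highest averaged probability (`np.argmax(fused)`).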
- Name: CEUR Workshop Proceedings
- Workshop: Evaluation of Human Language Technologies for Iberian Languages
- Period: 19/09/18 → 21/09/18
- Multi-classifier Fusion
- Social Media