Identification and classification of misogynous tweets using multi-classifier fusion

Han Liu, Fatima Chiroma, Mihaela Cocea

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution



For this study, we used the Doc2Vec embedding approach for feature extraction, with a context window size of 2, a minimum word frequency of 2, a sampling rate of 0.001, a learning rate of 0.025, a minimum learning rate of 1.0E-4, 200 layers, a batch size of 10000 and 40 epochs. Distributed Memory (DM) was used as the embedding learning algorithm, with a negative sampling rate of 5.0. Before feature extraction, all tweets were pre-processed by converting characters to lower case, removing stop words, numbers, punctuation and words containing no more than 3 characters, and stemming the remaining words with the Snowball Stemmer. Three classifiers were then trained: SVM with a linear kernel, random forests (RF) and gradient boosted trees (GBT). In the testing stage, the same text pre-processing and feature extraction were applied to the test instances separately, and each pair of the three trained classifiers (SVM+RF, SVM+GBT and RF+GBT) was fused by averaging the probabilities for each class.
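The pairwise fusion step described above can be illustrated with a minimal sketch. It assumes each trained classifier already yields a probability per class for a test tweet. The function names, label names and probability values below are hypothetical and are not taken from the paper, which also does not state how probabilities were obtained from the SVM.

```python
from itertools import combinations


def fuse_by_averaging(prob_a, prob_b):
    """Average two per-class probability vectors element-wise."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]


def predict_fused(prob_a, prob_b, labels):
    """Return the label with the highest averaged probability, plus the averages."""
    fused = fuse_by_averaging(prob_a, prob_b)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


# Hypothetical label names and classifier outputs for a single test tweet.
labels = ["non-misogynous", "misogynous"]
outputs = {
    "SVM": [0.625, 0.375],
    "RF": [0.25, 0.75],
    "GBT": [0.125, 0.875],
}

# Fuse every pair of classifiers: SVM+RF, SVM+GBT and RF+GBT.
for name_a, name_b in combinations(outputs, 2):
    label, fused = predict_fused(outputs[name_a], outputs[name_b], labels)
    print(f"{name_a}+{name_b}: {label} {fused}")
```

In this made-up example the SVM alone would favour the first class, but every fused pair favours the second, which shows how averaging lets a more confident classifier outweigh a less confident one.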
Original language: English
Title of host publication: Proceedings of the Third Workshop on Evaluation of Human Language Technologies for Iberian Languages
Editors: Paolo Rosso, Julio Gonzalo, Raquel Martínez, Soto Montalvo, Jorge Carrillo-de-Albornoz
Publisher: CEUR Workshop Proceedings
Number of pages: 6
Publication status: Published - 27 Jul 2018
Event: Evaluation of Human Language Technologies for Iberian Languages: IberEval 2018 - Seville, Spain
Duration: 19 Sept 2018 to 21 Sept 2018

Publication series

Name: CEUR Workshop Proceedings
ISSN (Print): 1613-0073


Workshop: Evaluation of Human Language Technologies for Iberian Languages


  • Misogynous
  • Multi-classifier Fusion
  • Social Media

