Question categorization and classification using grammar based approach

Research output: Contribution to journal › Article › peer-review


Abstract

Question answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used to find relevant results, many difficulties remain because of the continuous increase in the amount of web content. Question Classification (QC) plays an important role in question-answering systems, and identifying question types is one of the major tasks in improving the classification process. A broad range of QC approaches has been proposed to address the classification problem; most of these are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC), which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between different question types. The paper presents a wide range of experiments; the results show that GQCC using the J48 classifier outperformed other classification methods, with 90.1% accuracy.
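The idea of classifying questions by grammatical pattern rather than by bag-of-words can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the tag names (`WH_PERSON`, `CN_PLACE`, `PN`, and so on), the example questions, and the helper functions `to_pattern`, `train` and `predict` are hypothetical and are not taken from the paper. A most-frequent-label lookup keyed on the first two tags stands in for the J48 decision tree reported in the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon mapping words to coarse syntactic categories.
# The tag set and entries are illustrative assumptions, not the paper's.
LEXICON = {
    "who": "WH_PERSON", "where": "WH_PLACE", "when": "WH_TIME",
    "how": "WH_HOW", "many": "QUANT",
    "is": "VERB", "was": "VERB", "did": "VERB", "are": "VERB",
    "the": "DET", "of": "PREP", "in": "PREP",
    "president": "CN_PERSON", "author": "CN_PERSON",
    "city": "CN_PLACE", "capital": "CN_PLACE",
    "people": "CN_GENERIC", "planets": "CN_GENERIC",
}


def to_pattern(question):
    """Turn a question into a tuple of syntactic-category tags."""
    tokens = question.rstrip("?").split()
    tags = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in LEXICON:
            tags.append(LEXICON[low])
        elif tok[0].isdigit():
            tags.append("NUM")    # numeral
        elif tok[0].isupper() and i > 0:
            tags.append("PN")     # capitalised, not sentence-initial: proper noun
        else:
            tags.append("OTHER")
    return tuple(tags)


def train(examples):
    """Count labels per leading tag pair (a crude stand-in for a decision tree)."""
    table = defaultdict(Counter)
    overall = Counter()
    for question, label in examples:
        key = to_pattern(question)[:2]
        table[key][label] += 1
        overall[label] += 1
    return table, overall


def predict(model, question):
    """Return the most frequent label for the pattern prefix, else the global majority."""
    table, overall = model
    counts = table.get(to_pattern(question)[:2]) or overall
    return counts.most_common(1)[0][0]


# Hypothetical training questions and labels.
EXAMPLES = [
    ("Who is the president of France?", "PERSON"),
    ("Who was the author of Hamlet?", "PERSON"),
    ("Where is the capital of Peru?", "LOCATION"),
    ("How many planets are in the sky?", "NUMBER"),
    ("When did Rome fall?", "DATE"),
]

model = train(EXAMPLES)
```

With this model, `predict(model, "Who is the author of Dracula?")` looks up the prefix `("WH_PERSON", "VERB")` and returns the label seen most often for it in the training examples. A real system would replace the lexicon with a proper tagger and the lookup table with a learned classifier.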
Original language: English
Pages (from-to): 1228-1243
Number of pages: 16
Journal: Information Processing and Management
Volume: 54
Issue number: 6
Early online date: 29 May 2018
Publication status: Published - 1 Nov 2018

Keywords

  • question classification
  • machine learning
  • text mining
  • text classification
  • natural language processing (NLP)
