Question categorization and classification using grammar based approach

Research output: Contribution to journal › Article

Question answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used to find relevant results, many difficulties remain because of the continuous increase in the amount of web content. Question Classification (QC) plays an important role in question-answering systems, and identifying question types is one of the major tasks in improving the classification process. A broad range of QC approaches has been proposed to address the classification problem; most of these are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC), which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between different question types. The paper presents a wide range of experiments; the results show that GQCC using the J48 classifier outperformed other classification methods, with 90.1% accuracy.
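
As a rough illustration of what "exploiting the structure of the questions" can mean, the following minimal Python sketch turns a question into a sequence of coarse grammatical tags and a few counts derived from it. The tag names, the tiny word lists and the function names (`tag_token`, `question_pattern`, `pattern_features`) are assumptions made for this example only; the abstract does not specify the paper's actual tagset, patterns or features. Feature dictionaries of this kind could then be passed to a decision-tree learner such as J48 (Weka's C4.5 implementation).

```python
import re
from collections import Counter

# Hypothetical word lists, invented for illustration only; the paper's
# real syntactic categories are not given in the abstract.
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}
AUX_VERBS = {"is", "are", "was", "were", "do", "does", "did", "can"}
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "at", "to"}


def tag_token(token, is_first):
    """Assign one coarse tag to a token using simple surface rules."""
    lower = token.lower()
    if lower in WH_WORDS:
        return "WH"
    if lower in AUX_VERBS:
        return "AUX"
    if lower in FUNCTION_WORDS:
        return "FW"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "NUM"   # numeral
    if token[0].isupper() and not is_first:
        return "NNP"   # capitalised mid-sentence: assume proper noun
    return "NN"        # crude fallback: any other word counts as a common noun


def question_pattern(question):
    """Return the question as a list of coarse grammatical tags."""
    tokens = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", question)
    return [tag_token(tok, i == 0) for i, tok in enumerate(tokens)]


def pattern_features(question):
    """Summarise the tag pattern as features a classifier could consume."""
    tags = question_pattern(question)
    counts = Counter(tags)
    return {
        "first_tag": tags[0] if tags else "",
        "n_NN": counts["NN"],
        "n_NNP": counts["NNP"],
        "n_NUM": counts["NUM"],
    }


if __name__ == "__main__":
    print(question_pattern("Who is the president of France?"))
    print(pattern_features("How many people lived in Paris in 1900?"))
```

A real system would replace the rule-based `tag_token` with a proper part-of-speech tagger; the point here is only that the classifier sees grammatical categories rather than the words themselves.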
Original language: English
Journal: Information Processing and Management
Early online date: 29 May 2018
DOIs
State: Early online - 29 May 2018

Documents

  • Question_Classification_accepted

    Accepted author manuscript (Post-print), 358 KB, PDF-document

    Due to publisher’s copyright restrictions, this document is not freely available to download from this website until: 29/11/18

    License: CC BY-NC-ND


ID: 10536660