Abstract
In real-life classification problems, prior information about the problem and expert knowledge of the domain are often used to obtain reliable and consistent solutions. This is especially true in fields where the data is ambiguous, such as text, in which the same words can appear in seemingly similar texts yet carry different meanings. A promising avenue for text classification is machine learning, which has been shown to perform well in a variety of applications, including query classification and sentiment analysis. Many of the proposed approaches rely on the bag-of-words representation, which loses information about the structure of the text. In this paper, we propose a Customised Grammar Framework for text classification, which exploits domain-related information and a new way of representing text as a series of syntactic categories that form syntactic patterns. The framework employs a formal grammar approach to transform the text into the syntactic pattern representation. We applied the framework to the query classification problem, and our results show that it outperforms previous approaches in terms of classification performance.
Original language | English |
---|---|
Pages (from-to) | 164-180 |
Journal | Expert Systems with Applications |
Volume | 135 |
Early online date | 7 Jun 2019 |
Publication status | Published - 30 Nov 2019 |
Keywords
- Natural Language Processing
- Information Retrieval
- Text Classification
- Query Classification
- Machine Learning