Abstract
Entity extraction has become very important for developing many Natural Language Processing (NLP) applications. In this paper, we present a new algorithm for extracting entities from Arabic text. The approach uses a semi-structured knowledge source, Arabic Wikipedia, to predict the words that constitute an Arabic entity. Our method is generic and can be applied directly to other languages to extract entities. The proposed method is designed to analyze Arabic text hierarchically using variable-length N-grams. The experimental results show that the proposed system is very efficient at detecting entities in a large set of Arabic news.
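The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "variable-length N-gram" analysis against a Wikipedia-derived resource: scan the token sequence and, at each position, try the longest window first and shrink it until a match is found. The function name `extract_entities`, the `max_n` parameter, whitespace tokenization, and the tiny `TITLES` set (a stand-in for Arabic Wikipedia article titles) are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Set, Tuple


def extract_entities(
    tokens: List[str],
    titles: Set[Tuple[str, ...]],
    max_n: int = 4,
) -> List[str]:
    """Greedy longest-match lookup of token n-grams in a set of known titles.

    Assumption: `titles` holds tokenized article titles (hypothetical stand-in
    for Arabic Wikipedia); the paper's actual prediction step may differ.
    """
    entities: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest n-gram starting at position i, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in titles:
                entities.append(" ".join(candidate))
                i += n  # skip past the matched entity
                matched = True
                break
        if not matched:
            i += 1  # no n-gram starts here; move to the next token
    return entities


# Hypothetical title set and sentence, used only to exercise the sketch.
TITLES = {("جامعة", "القاهرة"), ("القاهرة",), ("مصر",)}
tokens = "زار الوفد جامعة القاهرة في مصر".split()
print(extract_entities(tokens, TITLES))
```

Trying longer windows before shorter ones means the two-word title "جامعة القاهرة" (Cairo University) is preferred over the single-word title "القاهرة" (Cairo) that it contains, which is one simple way a multi-word entity can be kept whole.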
Original language | English |
---|---|
Title of host publication | International Conference on Informatics and Systems, INFOS 2016 |
Subtitle of host publication | Proceedings of the 10th International Conference on Informatics and Systems |
Publisher | Association for Computing Machinery (ACM) |
Pages | 56-60 |
Number of pages | 5 |
ISBN (Electronic) | 9781450340625 |
Publication status | Published - 9 May 2016 |
Event | 10th International Conference on Informatics and Systems, INFOS 2016 - Cairo, Egypt; Duration: 9 May 2016 → 11 May 2016 |
Conference
Conference | 10th International Conference on Informatics and Systems, INFOS 2016 |
---|---|
Country/Territory | Egypt |
City | Cairo |
Period | 9/05/16 → 11/05/16 |
Keywords
- Arabic Wikipedia
- Entity
- Information extraction
- N-gram
- Natural language processing