Hierarchical N-gram algorithm for extracting Arabic entities

Eslam Amer, Heba M. Khalil, Tarek El-Shistawy

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Entity extraction has become very important for developing many Natural Language Processing (NLP) applications. In this paper, we present a new algorithm to extract entities from Arabic text. The approach uses a semi-structured knowledge source, Arabic Wikipedia, to predict the words that constitute an Arabic entity. Our method is generic and can be applied directly to other languages to extract entities. The proposed method is designed to analyze Arabic text hierarchically with variable-length N-grams. The experimental results show that the proposed system is very efficient in detecting entities in a large set of Arabic news articles.
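
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a hierarchical, variable-length N-gram scan could work, not the authors' method. It assumes a greedy longest-match-first lookup: at each token position the widest N-gram is tried first, and the window shrinks level by level until a match is found. The function name `extract_entities`, the `max_n` parameter, and the small `titles` set (a stand-in for a title list derived from Arabic Wikipedia) are all hypothetical.

```python
from typing import List, Set, Tuple


def extract_entities(
    tokens: List[str],
    known_titles: Set[Tuple[str, ...]],
    max_n: int = 4,
) -> List[str]:
    """Greedy scan: at each position, try the longest N-gram first, then shorter ones."""
    entities: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Start with the widest window that fits and shrink it one level at a time.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in known_titles:
                entities.append(" ".join(candidate))
                i += n  # skip past the matched entity
                matched = True
                break
        if not matched:
            i += 1  # no entity starts here; move to the next token
    return entities


# Hypothetical stand-in for titles taken from Arabic Wikipedia (assumption).
titles = {
    ("جامعة", "القاهرة"),
    ("مصر",),
}
tokens = "زار الوفد جامعة القاهرة في مصر".split()
print(extract_entities(tokens, titles))
```

Trying longer windows first means a multi-word title is preferred over any shorter title it contains. A real system would also need Arabic-specific tokenization and normalization, which this sketch omits.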

Original language: English
Title of host publication: International Conference on Informatics and Systems, INFOS 2016
Subtitle of host publication: Proceedings of the 10th International Conference on Informatics and Systems
Publisher: Association for Computing Machinery (ACM)
Pages: 56-60
Number of pages: 5
ISBN (Electronic): 9781450340625
Publication status: Published - 9 May 2016
Event: 10th International Conference on Informatics and Systems, INFOS 2016 - Cairo, Egypt
Duration: 9 May 2016 - 11 May 2016

Conference

Conference: 10th International Conference on Informatics and Systems, INFOS 2016
Country/Territory: Egypt
City: Cairo
Period: 9/05/16 - 11/05/16

Keywords

  • Arabic Wikipedia
  • Entity
  • Information extraction
  • N-gram
  • Natural language processing
