An ontology-based web crawling approach for the retrieval of materials in the educational domain

Mohammed Essmat Ibrahim, Linda Yang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The web remains a huge source of information for many domains, and the volume of available information is increasing rapidly. Most of this information is stored in unstructured databases, so searching for relevant information is a complex task: the search for pertinent information within a specific domain is time-consuming and is likely to retrieve irrelevant results. Crawling and downloading only those pages that relate to the user’s enquiries is a tedious activity. In particular, crawlers focus on converting unstructured data and sorting it into a structured database. In this paper, among the various kinds of crawling, we focus on techniques that extract the content of a web page based on the relations between ontology concepts. Ontology is a promising technique for accessing and crawling only the related data within specific web pages or a domain. The proposed methodology is a Web Crawler approach based on Ontology (WCO), which defines several relevance computation strategies with increased efficiency, thereby reducing both the number of extracted items and the crawling time. It seeks to select and retrieve web pages in the education domain that match the user’s requirements. In WCO, data is structured according to the hierarchical relationships among the concepts that are adapted in the ontology domain. The approach can be applied flexibly to crawl items in different domains by adapting the user requirements when defining the relevance computation strategies, with promising results.
Original language: English
Title of host publication: Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2
Subtitle of host publication: ICAART 2019
Publisher: SciTePress
Pages: 900-906
Number of pages: 7
ISBN (Print): 978-989-758-350-6
Publication status: Published - 14 Mar 2019
Event: 11th International Conference on Agents and Artificial Intelligence - Prague, Czech Republic
Duration: 19 Feb 2019 - 21 Feb 2019
http://www.icaart.org/Home.aspx

Conference

Conference: 11th International Conference on Agents and Artificial Intelligence
Abbreviated title: ICAART 2019
Country/Territory: Czech Republic
City: Prague
Period: 19/02/19 - 21/02/19

Keywords

  • web crawling
  • ontology
  • education domain
