Abstract
The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity. Current approaches in Extractive Text Summarization (ETS) leverage the modeling of inter-sentence relationships, a task of paramount importance in producing coherent summaries. This study introduces an innovative model that integrates Graph Attention Networks (GATs) with Transformer-based Bidirectional Encoder Representations from Transformers (BERT) and Latent Dirichlet Allocation (LDA), further enhanced by Term Frequency-Inverse Document Frequency (TF-IDF) values, to improve sentence selection by capturing comprehensive topical information. Our approach constructs a graph with nodes representing sentences, words, and topics, thereby elevating the interconnectivity and enabling a more refined understanding of text structures. This model is stretched to Multi-Document Summarization (MDS) from Single-Document Summarization, offering significant improvements over existing models such as THGS-GMM and Topic-GraphSum, as demonstrated by empirical evaluations on benchmark news datasets like Cable News Network (CNN)/Daily Mail (DM) and Multi-News. The results consistently demonstrate superior performance, showcasing the model’s robustness in handling complex summarization tasks across single and multi-document contexts. This research not only advances the integration of BERT and LDA within a GATs but also emphasizes our model’s capacity to effectively manage global information and adapt to diverse summarization challenges.
Original language | English |
---|---|
Pages (from-to) | 3221-3242 |
Number of pages | 22 |
Journal | Computers, Materials and Continua |
Volume | 80 |
Issue number | 2 |
Early online date | 6 Aug 2024 |
DOIs | |
Publication status | Published - 15 Aug 2024 |
Keywords
- Summarization
- graph attention network
- bidirectional encoder representations from transformers
- Latent Dirichlet Allocation
- term frequency-inverse document frequency