Fuzzy Text Segmentation Using Syntactic Features for Rhetorical Structure Theory

  • Omar Ali

Student thesis: Doctoral Thesis


Text segmentation is a task often overlooked in NLP. Pre-digesting text so that NLP and machine-learning models can use it efficiently is a nuanced and delicate process. If segmented spans are too long, they may contain information that will hinder or mislead the model processing them; however, if the segments are too short, one may lose the important context that glues those particular words together.
Moreover, different models may require text to be segmented in different ways to properly carry out their tasks. For example, if an analyst wants to boil a large piece of text down to its constituent sentences, the segmentation needs to reflect the sentences we intend to return. Alternatively, if we want to determine topical shifts in large textual inputs, our segments will need to be of paragraph length.
FuzzySeg is presented as a solution to the problem of text segmentation. FuzzySeg adopts fuzzy systems to improve upon rule-based and heuristic approaches to text segmentation. Segments of text are retrieved by means of boundary insertions carried out by the model. Inputs, derived from syntax parse trees generated from the text, determine the levels of cohesion at particular intervals. The fuzzy system takes these inputs to determine the probability of a boundary.
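As a rough illustration of the idea of fuzzy boundary insertion, the following is a minimal sketch and not the FuzzySeg implementation. It assumes two hypothetical inputs per gap between adjacent tokens, a `cohesion` value and a normalised parse-tree `distance`, both in [0, 1], and a hand-written four-rule base with zero-order Sugeno-style outputs. The function names, membership functions, rules, and threshold are all assumptions made for illustration; the thesis derives its inputs and rules from syntactic data.

```python
def low(x):
    """Membership in the fuzzy set 'low' for x in [0, 1] (assumed linear)."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x):
    """Membership in the fuzzy set 'high' for x in [0, 1] (assumed linear)."""
    return max(0.0, min(1.0, x))


def boundary_score(cohesion, distance):
    """Return a boundary score in [0, 1] from two hypothetical inputs.

    Illustrative rule base (an assumption, not the thesis's rules):
      low cohesion  AND high distance -> boundary (1.0)
      high cohesion AND low distance  -> no boundary (0.0)
      the two mixed cases             -> undecided (0.5)
    AND is modelled with min; the output is the weighted average of
    the rule consequents.
    """
    rules = [
        (min(low(cohesion), high(distance)), 1.0),
        (min(high(cohesion), low(distance)), 0.0),
        (min(low(cohesion), low(distance)), 0.5),
        (min(high(cohesion), high(distance)), 0.5),
    ]
    total_weight = sum(weight for weight, _ in rules)
    if total_weight == 0.0:
        return 0.5  # no rule fired; fall back to 'undecided'
    return sum(weight * out for weight, out in rules) / total_weight


def segment(tokens, gaps, threshold=0.5):
    """Split tokens wherever the boundary score of a gap exceeds threshold.

    gaps[i] holds the (cohesion, distance) pair for the gap between
    tokens[i] and tokens[i + 1].
    """
    if not tokens:
        return []
    segments = []
    current = [tokens[0]]
    for token, (cohesion, distance) in zip(tokens[1:], gaps):
        if boundary_score(cohesion, distance) > threshold:
            segments.append(current)
            current = [token]
        else:
            current.append(token)
    segments.append(current)
    return segments


if __name__ == "__main__":
    words = ["it", "rained", "so", "we", "left"]
    # Made-up (cohesion, distance) values for the four gaps.
    features = [(0.9, 0.1), (0.2, 0.9), (0.8, 0.2), (0.9, 0.1)]
    print(segment(words, features))
```

With the made-up values above, only the gap between "rained" and "so" scores above the threshold, so the sketch yields two segments. In a real system the input features and rule base would come from the parsed text rather than being written by hand.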
Furthermore, the aim is to build on this method in future work, with the goal of presenting a multifaceted segmentation approach that is applicable across the various domains that require segmentation, e.g. text summarisation and rhetorical structure theory.
This is made possible by fuzzy rule generation, in which rules are created from the syntactic data retrieved from the input text. This data is subsequently used to train the model, allowing us to produce the segmentation outputs.
This work focuses mainly on extending the work surrounding the applications of rhetorical structure theory for smartly weighting sentiment-carrying text. FuzzySeg aims to better segment the text within a key stage of rhetorical structure theory, with the aim of improving the accuracy of conventional sentiment analysis methods.
In brief, this work contributes first and foremost to the fields of text segmentation and fuzzy systems through the introduction of fuzzy text segmentation, a novel union of both fields. Furthermore, it contributes to the field of rhetorical structure theory by introducing the presented fuzzy segmentation as the first stage of rhetorical structure theory parsing. Finally, the goal is to apply the work in a novel practical capacity by using our fuzzy-segmentation-enhanced model for sentiment analysis. A novel theoretical application to microaggression detection is also explored.
This thesis is structured as follows. First, the previously established concepts that were used to develop our understanding of segmentation and fuzzy systems are outlined. Second, the motivations are outlined, together with our overall position and the reasons for the proposed method. The model, its components, and the validation metrics are then described, followed by descriptions of how the inputs and architecture for the fuzzy system are derived. Finally, the applications of the segmentation model are discussed, together with future work to be carried out on top of this model and alternative applications for this or similar models, such as microaggression detection, before the thesis concludes.
Date of Award: 14 Feb 2023
Original language: English
Supervisors: Alexander Gegov, Ella Haig & Rinat Khusainov
