Abstract
Text steganography, the ancient technique of hiding information in text, has attracted much research in recent years owing to the rising popularity of social media platforms and the abundant availability of online literature and other text as cover media. Whilst the majority of research approaches have focused on manipulating or replacing text in some form to embed secret information, the use of the structure of the document itself for such embedding has rarely been researched. Consequently, research in the field of written English paragraph structure and the related error analysis is outdated. We therefore propose a new approach to embedding secret messages in textual documents based on the splitting, merging, and resizing of paragraph text. The size comparison between adjacent paragraphs embeds one bit of information. We outline a basic and an advanced model, and define the syntax and semantics of the embedding language. We also propose and apply two analysis techniques. The first uses Machine Learning to classify texts based on their attributes, such as words per paragraph, paragraph proportion based on sentences, and other written English data points. The second uses the Chi-squared method: the distribution of paragraph sizes is analysed to see whether clean and embedded text can be statistically distinguished when the manipulation is applied.
The embedding model proved resilient against the analysis techniques, which detected around 50% of the bad corpus. We therefore concluded that it is extremely difficult to detect an embedding model that manipulates the paragraphs and structure of novel texts.
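To make the idea of one bit per adjacent-paragraph comparison concrete, the following is a minimal sketch in Python and not the thesis's actual embedding language. Several details are assumptions made purely for illustration, since the abstract does not specify them: paragraph size is measured in words, paragraphs are separated by blank lines, a larger first paragraph encodes 1 and a smaller one encodes 0, and pairs of equal size carry no bit. The function names `paragraph_sizes` and `decode_bits` are hypothetical.

```python
def paragraph_sizes(text: str) -> list[int]:
    """Split text on blank lines and return each paragraph's word count.

    Assumption: size is measured in words; the thesis may use another measure.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [len(p.split()) for p in paragraphs]


def decode_bits(sizes: list[int]) -> list[int]:
    """Read one bit from each pair of adjacent paragraph sizes.

    Assumed convention (not taken from the thesis): if the first paragraph
    of a pair is larger the bit is 1, if it is smaller the bit is 0, and
    pairs of equal size are skipped because they carry no information.
    """
    bits = []
    for left, right in zip(sizes, sizes[1:]):
        if left == right:
            continue  # a tie encodes nothing under this convention
        bits.append(1 if left > right else 0)
    return bits


# Usage: four paragraphs of 3, 2, 4 and 4 words.
sample = "one two three\n\none two\n\none two three four\n\none two three four"
print(decode_bits(paragraph_sizes(sample)))  # [1, 0]
```

Under these assumptions, an embedder would split, merge, or resize paragraphs until the sequence of comparisons spells out the secret message. This sketch covers only the reading side.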
|Date of Award||14 Jun 2023|
|Supervisor||Benjamin Yowell Yousif Aziz (Supervisor), Alaa Mohasseb (Supervisor) & Rinat Khusainov (Supervisor)|