Evidence identification in heterogeneous data using clustering

Hussam Mohammed, Nathan Clarke, Fudong Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



Digital forensics faces several challenges in examining and analyzing data because of the increasing range of technologies at people's disposal. In a single case, investigators may have to process and analyze many systems manually (e.g. PC, laptop, smartphone). Unfortunately, current tools such as FTK and EnCase have only a limited ability to automate the search for evidence. As a result, a heavy burden is placed on the investigator to both find and analyze evidential artifacts in a heterogeneous environment. This paper proposes a clustering approach based on the Fuzzy C-Means (FCM) and K-means algorithms that identifies evidential files and isolates non-related files based on their metadata. The approach is evaluated through a series of experiments on heterogeneous, real-life forensic cases. Within each case, several metadata categories were created based on file systems and applications. Clustering based on file-system metadata gave the best results, grouping all of the evidential artifacts within only five clusters. Using small configurations of both FCM and K-means, these five clusters contained 100% of the evidence and less than 16% of the non-evidential artifacts across all cases, representing an 84% reduction in the benign files the investigator has to analyze. With application metadata, more than 97% of the evidence was captured, but the proportion of benign files was also relatively high under small configurations; with a large configuration, however, the proportion of benign files dropped below 10%. Successfully prioritizing a large proportion of the evidence and reducing the volume of benign files to be analyzed reduces both the time taken and the cognitive load on the investigator.
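The abstract does not specify how metadata is encoded or which parameters were used, so the following is only a rough sketch under stated assumptions. It implements standard Fuzzy C-Means in plain Python over numeric metadata vectors, hardens the fuzzy memberships into clusters, and computes the two quantities the abstract reports: the share of evidential files and the share of benign files that fall inside a chosen set of clusters. The feature values, the function names `fuzzy_c_means`, `harden` and `evidence_coverage`, the fuzzifier m = 2 and the cluster-selection rule are all hypothetical and are not the authors' implementation. K-means differs from FCM in assigning each file to exactly one cluster during training instead of using fractional memberships.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data: List[Vector], c: int, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-6,
                  seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard Fuzzy C-Means (m must be > 1). Returns (centres, memberships)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    u = []
    for _ in range(n):  # random initial memberships, each row normalised to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centres: List[List[float]] = []
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre j is the mean of all points weighted by u_ij^m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w) or 1.0
            centres.append([sum(w[i] * data[i][k] for i in range(n)) / wsum
                            for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            d = [_dist(data[i], v) for v in centres]
            hits = [j for j in range(c) if d[j] == 0.0]
            if hits:  # point sits exactly on a centre: split membership among those
                new_u.append([1.0 / len(hits) if j in hits else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((d[j] / d[k]) ** p for k in range(c))
                              for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:  # memberships have stabilised
            break
    return centres, u


def harden(memberships: List[List[float]]) -> List[int]:
    """Assign each file to the cluster in which its membership is highest."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def evidence_coverage(labels: List[int], is_evidence: List[bool],
                      selected: Set[int]) -> Tuple[float, float]:
    """Share of all evidential files, and of all benign files, inside the selected clusters."""
    n_ev = sum(is_evidence)
    n_benign = len(is_evidence) - n_ev
    ev_in = sum(1 for lab, ev in zip(labels, is_evidence) if ev and lab in selected)
    benign_in = sum(1 for lab, ev in zip(labels, is_evidence) if not ev and lab in selected)
    return ev_in / max(n_ev, 1), benign_in / max(n_benign, 1)


# Toy example: six files described by two made-up metadata features, pre-scaled
# to [0, 1] (hypothetically, a normalised timestamp and a normalised file size).
features = [[0.10, 0.20], [0.12, 0.18], [0.09, 0.21],
            [0.90, 0.80], [0.88, 0.82], [0.91, 0.79]]
is_evidence = [True, True, True, False, False, False]

centres, memberships = fuzzy_c_means(features, c=2)
labels = harden(memberships)
# Assumed selection rule: review the cluster that holds a known evidential file.
selected = {labels[0]}
recall, benign_share = evidence_coverage(labels, is_evidence, selected)
print(f"evidence captured: {recall:.0%}, benign files still to review: {benign_share:.0%}")
```

On this deliberately well-separated toy data, the three evidential files should land in one cluster and the three benign files in the other. In the abstract's terms, "less than 16% of the non-evidential artifacts" corresponds to `benign_share < 0.16`, and the 84% reduction is `1 - benign_share`.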
Original language: English
Title of host publication: ARES 2018 Proceedings of the 13th International Conference on Availability, Reliability and Security
Publisher: Association for Computing Machinery (ACM)
Number of pages: 8
ISBN (Print): 978-1-4503-6448-5
Publication status: Published 30 Aug 2018
Event: ARES 2018 International Conference on Availability, Reliability and Security, Hamburg, Germany
Duration: 27 Aug 2018 to 30 Aug 2018


Conference: ARES 2018 International Conference on Availability, Reliability and Security


  • Digital forensics
  • heterogeneous data
  • clustering algorithms
  • FCM
  • K-means

