Scaling up data mining techniques to large datasets using parallel and distributed processing

F. Stahl, M. Gaber, Max Bramer

Research output: Chapter in Book/Report/Conference proceeding; Chapter (peer-reviewed)


Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data to plan future advertising campaigns; and finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes poses a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Original language: English
Title of host publication: Business Intelligence and Performance Management: Theory, Systems, and Industrial Applications
Editors: P. Rausch, A. Sheta, A. Ayesh
Place of Publication: Berlin
Number of pages: 20
ISBN (Print): 9781447148654
Publication status: Published - 2013

