With the onset of massive cosmological data collection through surveys such as the Sloan Digital Sky Survey (SDSS), galaxy classification has been accomplished for the most part with the help of citizen science communities like Galaxy Zoo. However, an analysis of one of the Galaxy Zoo morphological classification data sets shows that a significant majority of all classified galaxies are, in fact, labelled as "Uncertain". This motivated experiments with the k-means clustering algorithm on data retrieved from the SDSS database using each galaxy's right ascension and declination values, together with its Galaxy Zoo morphology class label. This paper identifies the best attributes for clustering using a heuristic approach and then applies an unsupervised learning technique in order to improve the classification of galaxies labelled as "Uncertain" and to increase the overall accuracy of such clustering processes. With this heuristic approach, selecting the best combination of attributes via information gain further improves the accuracy of classes-to-clusters evaluation by approximately 10-15%. An accuracy of 82.627% was also achieved after conducting various experiments on the galaxies labelled as "Uncertain" and reinserting them into the original data set. It is concluded that a vast majority of these galaxies are, in fact, of spiral morphology, with a small subset potentially consisting of stars, elliptical galaxies or galaxies of other morphological variants.
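As a rough illustration of what a classes-to-clusters evaluation of k-means output involves, the following minimal Python sketch clusters a handful of points and scores the clusters against known class labels. It is not the authors' implementation: the function names, the two-attribute toy data and the majority-label mapping from clusters to classes are all assumptions made for the example, and the toy points do not come from SDSS or Galaxy Zoo.

```python
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    assignment = []
    for _ in range(iters):
        assignment = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                # New centroid is the per-attribute mean of the cluster members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return assignment


def classes_to_clusters_accuracy(assignment, labels):
    """Map each cluster to its majority class, then report the fraction of
    points whose own class matches their cluster's majority class.
    (Assumption: a simplified stand-in for classes-to-clusters evaluation.)"""
    by_cluster = {}
    for cluster, label in zip(assignment, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(labels)


# Hypothetical toy data: two attributes per object, two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["spiral"] * 3 + ["elliptical"] * 3

assignment = kmeans(points, k=2)
accuracy = classes_to_clusters_accuracy(assignment, labels)
print(f"classes-to-clusters accuracy: {accuracy:.3f}")
```

The class labels are used only for scoring, never for clustering, which is what allows objects labelled "Uncertain" to be assigned to a cluster and then compared against the classes of the objects that share it.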
|Publication status||Published - 9 Jun 2013|
|Event||The 12th International Conference on Artificial Intelligence and Soft Computing ICAISC 2013 - Zakopane, Poland|
Duration: 9 Jun 2013 → 13 Jun 2013