Journal ID : TRKU-29-07-2020-10949
[This article belongs to Volume - 62, Issue - 07]

Title : An Efficient Algorithm with Insensitive Seed Selection for Categorical Clustering

Abstract :

A central weakness of k-means-type algorithms is their sensitivity to seed selection: clustering quality typically depends on how well the initial seeds are chosen. This study provides supporting evidence that the recent k-approximate modal haplotype (k-AMH)-type algorithm is insensitive to seed selection when clustering categorical data, in contrast to its counterpart, the fuzzy k-modes-type algorithm. On six real-world datasets, the k-AMH algorithm obtained higher minimum, maximum, and median scores than the fuzzy k-modes algorithms, as verified using analysis of variance and t-tests: the differences between the scores of the two algorithm types were statistically significant at the 5% significance level. By contrast, t-tests showed no significant difference in the scores of the k-AMH-type algorithm between randomized and k-means++ seeding. Therefore, with its insensitivity to seed selection, the k-AMH-type algorithm could be used to develop a categorical clustering tool.
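
For readers unfamiliar with the two seeding strategies named in the abstract, the following is a minimal sketch of how each might pick initial seeds for categorical records. It is not the authors' implementation: the function names are hypothetical, and the use of Hamming distance to adapt k-means++-style weighting to categorical data is an assumption made for illustration.

```python
import random


def hamming(a, b):
    """Number of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def random_seeds(data, k, rng):
    """Randomized seeding: pick k records at distinct positions, uniformly."""
    return rng.sample(data, k)


def kmeanspp_style_seeds(data, k, rng):
    """k-means++-style seeding, adapted here to categorical data (assumption):
    each new seed is drawn with probability proportional to its squared
    Hamming distance to the nearest seed chosen so far."""
    seeds = [rng.choice(data)]
    while len(seeds) < k:
        weights = [min(hamming(x, s) for s in seeds) ** 2 for x in data]
        if sum(weights) == 0:
            # Every record coincides with an existing seed; fall back to uniform.
            seeds.append(rng.choice(data))
        else:
            seeds.append(rng.choices(data, weights=weights, k=1)[0])
    return seeds


# Hypothetical toy records (tuples of categorical attribute values).
data = [("A", "C", "G"), ("A", "C", "T"), ("T", "G", "G"), ("T", "G", "A")]
rng = random.Random(0)
print(random_seeds(data, 2, rng))
print(kmeanspp_style_seeds(data, 2, rng))
```

An algorithm that is insensitive to seed selection, in the sense the abstract describes, would be expected to produce statistically similar clustering scores whichever of these two functions supplies its seeds.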
