Abstract:
The subspace clustering algorithm CLIQUE finds all subspace clusters, including overlapping ones, in high-dimensional datasets. CLIQUE consists of three main steps: (1) identification of the subspaces that contain clusters, (2) identification of the clusters, and (3) generation of a minimal description for the clusters obtained in step two. In this paper, we present a method for speeding up the first step of the CLIQUE algorithm. The proposed method accesses the data by columns instead of by rows. It is particularly efficient when a high-dimensional dataset given in tabular form has many missing values. We also propose a depth-first method for finding the maximal dense units, which further improves the performance of the first step.
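
As an illustration only, the two ideas above (column-wise access and a depth-first search for maximal dense units) might be sketched as follows. This is not the implementation described in the paper; it is a minimal sketch under stated assumptions. The table is assumed to be a dictionary mapping each column name to a list of values, with `None` marking a missing value. The names `dense_units_1d`, `maximal_dense_units`, `xi` (number of equal-width intervals per dimension), and `tau` (density threshold as a fraction of all rows) are hypothetical. Each column is scanned once, missing values are skipped, and higher-dimensional units are formed by intersecting the row-id sets of one-dimensional dense units.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Key = Tuple[str, int]        # (dimension name, interval index)
Unit = FrozenSet[Key]        # a unit is one interval in each of its dimensions
Table = Dict[str, List[Optional[float]]]


def dense_units_1d(columns: Table, xi: int, tau: float) -> Dict[Key, Set[int]]:
    """Scan each column once and return the row ids of every dense 1-D unit."""
    n_rows = len(next(iter(columns.values())))
    min_count = tau * n_rows
    units: Dict[Key, Set[int]] = {}
    for dim, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # a column with only missing values yields no units
        lo, hi = min(present), max(present)
        width = (hi - lo) / xi or 1.0  # assumption: constant column -> one interval
        buckets: Dict[int, Set[int]] = {}
        for row, v in enumerate(values):
            if v is None:
                continue  # missing values cost nothing beyond this check
            idx = min(int((v - lo) / width), xi - 1)
            buckets.setdefault(idx, set()).add(row)
        for idx, rows in buckets.items():
            if len(rows) >= min_count:
                units[(dim, idx)] = rows
    return units


def maximal_dense_units(columns: Table, xi: int, tau: float) -> List[Unit]:
    """Depth-first extension of dense units; keeps only the maximal ones."""
    n_rows = len(next(iter(columns.values())))
    min_count = tau * n_rows
    order = {dim: i for i, dim in enumerate(columns)}
    base = sorted(
        dense_units_1d(columns, xi, tau).items(),
        key=lambda item: (order[item[0][0]], item[0][1]),
    )
    found: List[Unit] = []

    def extend(unit: List[Key], rows: Set[int], start: int) -> None:
        grew = False
        last_dim = unit[-1][0]
        for pos in range(start, len(base)):
            (dim, idx), dim_rows = base[pos]
            if dim == last_dim:
                continue  # at most one interval per dimension in a unit
            common = rows & dim_rows
            if len(common) >= min_count:
                grew = True
                extend(unit + [(dim, idx)], common, pos + 1)
        if not grew:
            found.append(frozenset(unit))

    for pos, (key, rows) in enumerate(base):
        extend([key], rows, pos + 1)
    # A unit that could not grew along later dimensions may still be contained
    # in a dense unit that starts from an earlier dimension, so filter those out.
    return [u for u in found if not any(u < other for other in found)]
```

Because every subset of a dense unit is itself dense, each maximal dense unit is reached along its dimension-ordered prefixes, and the final subset filter removes the non-maximal candidates. Storing row-id sets per column is one possible design choice here; the paper may use a different representation.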