Scalable clustering of lines
Journal of Decision Making and Healthcare, Volume 1, Issue 1, June 2024, Pages: 45–56
AKSHAY SHARMA
Department of Mathematical Sciences, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh 221005, India
DEBDAS GHOSH
Department of Mathematical Sciences, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh 221005, India
WITOLD PEDRYCZ
Department of Electrical and Computer Engineering, University of Alberta, Alberta T6G 1H9, Canada
Abstract
We introduce a scalable algorithm for clustering lines and affine subspaces of fixed dimensions via data reduction using a coreset approximating scheme. A key characteristic of the proposed algorithm is exploiting a scatter-then-aggregate scheme for data partitioning, which allows to maintain a constant coreset approximation size. This key feature enables a novel algorithm which streams as well as distributes workload. We study how the proposed algorithm scales in a practical setting and compare it with other clustering methods. Results of experiments executed on the PARAM supercomputer confirm that the time complexity of algorithm is log-linear in problem size and decreases linearly as the number of processors increase. Source code for the algorithm, the experiments, and other potential applications is provided.
Cite this Article as
Akshay Sharma, Debdas Ghosh and Witold Pedrycz, Scalable clustering of lines, Journal of Decision Making and Healthcare, 1 (2024), 45–56