Title: AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets

Date: 23 August 2000

Authors: Vladimir Estivill-Castro and Ickjai Lee

Link: http://www.geocomputation.org/2000/GC024/Gc024.htm


Widely used clustering methods require user-specified arguments and prior knowledge to produce their best results. This demands pre-processing and/or several trial-and-error steps, both of which are extremely expensive and inefficient for massive data sets. The need to find best-fit arguments in semi-automatic clustering is not the only concern: manipulating the data to find those arguments also opposes the philosophy of "let the data speak for themselves" that underpins exploratory data analysis. Our new approach consists of effective and efficient methods for discovering cluster boundaries in point-data sets. It automatically extracts boundaries based on Voronoi modelling and Delaunay Diagrams. In our automatic clustering, parameters are not specified by users. Instead, their values are revealed by the proximity structures of the Voronoi modelling, and an algorithm, AUTOCLUST, calculates them from the Delaunay Diagram. This not only removes human-generated bias but also reduces exploration time. The effectiveness of our approach allows us to detect not only clusters of different densities but also sparse clusters near high-density clusters. Multiple bridges linking clusters are identified and removed. All of this is performed in O(n log n) expected time, where n is the number of data points. We evaluate AUTOCLUST's time efficiency and clustering quality, and we compare and contrast AUTOCLUST with other algorithms for clustering large geo-referenced sets of points. A series of detailed performance comparisons on both synthetic and real data sets confirms the virtues of our approach.
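The abstract does not spell out how parameter values are derived from the Delaunay Diagram. The Python sketch below illustrates the general idea of letting Delaunay edge lengths set the thresholds instead of a user, under stated assumptions: the names `circumcircle`, `delaunay_edges` and `cluster` are hypothetical; the Delaunay Diagram is built by brute force (O(n^4), for readability only, not the O(n log n) construction the abstract refers to); and the edge-removal rule (an edge is treated as too long for an endpoint p if it exceeds p's mean incident edge length plus the average, over all points, of each point's incident-edge standard deviation) is an illustrative assumption, not the authors' full AUTOCLUST procedure. Clusters are taken to be the connected components of the surviving edges.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) for triangle abc, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # degenerate (collinear) triple has no circumcircle
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay edges: keep every triangle whose circumcircle has no other
    point strictly inside it. O(n^4); assumes general position (no four cocircular)."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        ux, uy, r2 = cc
        # Small absolute tolerance so points on the circle do not count as inside.
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            edges.update({(i, j), (i, k), (j, k)})  # i < j < k, so each edge is stored once
    return edges


def cluster(points):
    """Label points by connected components after removing locally long Delaunay edges."""
    n = len(points)
    length = {e: math.dist(points[e[0]], points[e[1]]) for e in delaunay_edges(points)}
    incident = [[] for _ in range(n)]
    for (i, j), d in length.items():
        incident[i].append(d)
        incident[j].append(d)
    # Per-point statistics come from the data, not from a user-supplied threshold.
    local_mean = [mean(ls) if ls else 0.0 for ls in incident]
    local_std = [pstdev(ls) if ls else 0.0 for ls in incident]
    mean_std = mean(local_std) if local_std else 0.0  # one global spread value

    parent = list(range(n))  # union-find over point indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in length.items():
        # Assumed rule: drop the edge if it is "long" from either endpoint's viewpoint.
        if d <= local_mean[i] + mean_std and d <= local_mean[j] + mean_std:
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (0, 3), (5, 5), (1000, 1)]
    print(cluster(pts))  # [0, 0, 0, 0, 1]
```

On this five-point example the four nearby points stay connected while both Delaunay edges to the distant point are removed, so it ends up in a cluster of its own without any user-chosen distance cut-off. For realistic data sizes, a proper O(n log n) triangulation such as `scipy.spatial.Delaunay` would replace the brute-force helper.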

Reference: Proceedings of the Fifth International Conference on GeoComputation, University of Greenwich's School of Earth and Environmental Sciences, Kent, UK, 23-25 August 2000. Papers published on CD-ROM. Produced by: R.J. Abrahart and B.H. Carlisle. Publisher: "GeoComputation CD-ROM". ISBN 0-9533477-2-9

