DOI: 10.3724/SP.J.1047.2013.00854

Journal of Geo-information Science (地球信息科学学报), 2013, 15(6), pp. 854-861

Fuzzy C-means Clustering for GIS Data Based on Spatial Weighted Distance

Ordinary Euclidean distance is commonly used to measure similarity in fuzzy C-means (FCM) clustering, but in the distance formula different attribute features should carry different weights according to their importance. Moreover, for geospatial objects, clustering should consider not only the similarity of attribute features but also the spatial proximity of the objects. Building on ordinary Euclidean distance, this paper proposes several forms of spatial weighted distance. Each form assigns weights to the two coordinate directions and to each attribute feature. The weight vector measures the effect sizes of the spatial location features and the attribute features in similarity-based clustering, and also the degree of isotropy or anisotropy along the X and Y coordinate directions. A fuzzy evaluation function derived from the similarity matrix of the spatial objects serves as the optimization objective, and the weight vector is learned by a gradient-descent algorithm with a dynamic learning rate. The spatial weighted distance then replaces ordinary Euclidean distance in fuzzy C-means clustering. As an application example, the Meuse spatial dataset is analyzed by FCM clustering with the number of clusters set from 2 to 10. The clustering results are evaluated and compared using cluster validity indices including PC (partition coefficient), PE (partition entropy) and Xie-Beni. The analysis indicates that clustering based on spatial weighted distance performs better than clustering based on ordinary Euclidean distance or spatial common distance. Furthermore, the spatial distribution of the clustering results shows that, besides attribute features, spatial features such as location also play an important role in spatial data clustering.
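To make the idea concrete, the following is a minimal sketch of FCM with a weighted distance, together with the PC and PE validity indices. It is not the paper's implementation. Several things are assumptions made for illustration: the weighted distance takes the squared-weight form d² = Σ_k w_k² (x_k − v_k)², the first two columns of the data matrix hold the X and Y coordinates and the remaining columns hold attribute features, and the weight vector `w` is supplied by the caller rather than learned by gradient descent. The function names are hypothetical.

```python
import numpy as np


def weighted_sq_dist(X, V, w):
    """Squared weighted Euclidean distance between rows of X (n, d) and V (c, d).

    Assumed form: sum_k w[k]**2 * (x_k - v_k)**2, one weight per column.
    """
    diff = X[:, None, :] - V[None, :, :]           # shape (n, c, d)
    return np.sum((w ** 2) * diff ** 2, axis=2)    # shape (n, c)


def fcm(X, w, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM alternating updates, using the weighted distance above.

    Returns the membership matrix U (n, c) and the cluster centers V (c, d)
    from which U was last computed.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row sums to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted means
        d2 = np.maximum(weighted_sq_dist(X, V, w), 1e-12)  # avoid division by 0
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def partition_coefficient(U):
    """PC: mean of squared memberships; closer to 1 means a crisper partition."""
    return float(np.sum(U ** 2) / U.shape[0])


def partition_entropy(U):
    """PE: mean membership entropy; closer to 0 means a crisper partition."""
    return float(-np.sum(U * np.log(np.maximum(U, 1e-12))) / U.shape[0])
```

With a data matrix whose columns are `[x, y, attr_1, ..., attr_p]`, a call such as `fcm(X, w, c=3)` with `w = np.ones(X.shape[1])` reduces to ordinary Euclidean FCM, while setting the first two entries of `w` to different values weights the X and Y directions unequally. Features would normally be standardized first so that the weights are comparable across columns.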

Key words: spatial weighted distance, GIS data, fuzzy C-means clustering, gradient-descent learning algorithm

ReleaseDate:2015-04-17 13:34:29
