Authors: LI Fuxiang (李福祥), ZHOU Ming (周明), YANG Tianhao (杨天浩)
Abstract: The density peak clustering (DPC) algorithm performs poorly on unevenly distributed datasets and cannot determine cluster centers automatically. To address both problems, a density peak clustering algorithm based on shared neighborhood (DPC-SN) is proposed. First, the local density is redefined in terms of the shared neighborhood, taking into account the local neighborhood information of each data point and the correlation between data points. Second, a new decision threshold is given as the critical value that separates cluster centers from non-cluster centers, so that the cluster centers are obtained automatically. Finally, the algorithm is validated experimentally on synthetic datasets with different distribution characteristics and on UCI datasets. The results show that the clustering accuracy and overall performance of DPC-SN are better than those of four algorithms: density peak clustering based on K-nearest neighbors (DPC-KNN), the original density peak clustering (DPC), K-means clustering (K-means), and density-based clustering (DBSCAN).
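The abstract gives no formulas, so the following is only a minimal sketch of the general idea (a shared-neighbor local density plus a threshold on the decision value), not the DPC-SN definitions from the paper. The density formula in `shared_neighbour_density`, the threshold rule `mean + n_std * std` in `pick_centres`, and all function and parameter names are assumptions made for illustration; only the distance `delta` follows the standard DPC definition.

```python
import math

def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result

def shared_neighbour_density(points, k):
    """ASSUMED density (not the paper's formula): for every neighbour j of i,
    add (number of neighbours shared by i and j) / (1 + distance(i, j))."""
    nn = knn_sets(points, k)
    return [
        sum(len(nn[i] & nn[j]) / (1.0 + math.dist(points[i], points[j]))
            for j in nn[i])
        for i in range(len(points))
    ]

def delta_distance(points, rho):
    """Standard DPC delta: distance to the nearest point of higher density;
    a point with no denser point gets its largest distance to any point."""
    n = len(points)
    delta = []
    for i in range(n):
        higher = [math.dist(points[i], points[j])
                  for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return delta

def pick_centres(rho, delta, n_std=1.0):
    """ASSUMED threshold rule (not the paper's): a point is a centre when
    gamma = rho * delta exceeds mean(gamma) + n_std * std(gamma)."""
    gamma = [r * d for r, d in zip(rho, delta)]
    mean = sum(gamma) / len(gamma)
    std = math.sqrt(sum((g - mean) ** 2 for g in gamma) / len(gamma))
    threshold = mean + n_std * std
    return [i for i, g in enumerate(gamma) if g > threshold]

# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho = shared_neighbour_density(points, k=3)
delta = delta_distance(points, rho)
centres = pick_centres(rho, delta)
print(centres)
```

Counting shared neighbors makes the density depend on how tightly a point's neighborhood overlaps with those of its neighbors rather than on a global cutoff distance, which is the motivation the abstract gives for handling unevenly distributed data. The remaining points would then be assigned to clusters as in ordinary DPC, each following its nearest neighbor of higher density.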