Cite this article: LIU Feng, DAI Jia-jia, HU Yang. The k-means Algorithm Based on Local Density Outlier Detection[J]. [journal name], 2021, 38(4): 30-35
基于局部密度离群点检测k-means算法
刘凤, 戴家佳, 胡阳
贵州大学 数学与统计学院,贵阳 550025
摘要:
针对数据集的聚类过程容易受到离群值的影响这一问题,提出了局部密度离群值检测k-means算法,即先对数据集使用局部密度离群值检测方法检测离群值,先把离群值去除,再进行k-means聚类,算法的有效性通过Davies-Bouldin指标(DB)、Dunn指标和Silhouette指标进行评价,在人工生成的数据集与UCI数据集上验证,去除离群值,再使用k-means算法得到的聚类结果相比原始数据集进行k-means算法聚类结果较好,并且用在疫情数据分析上,对安徽省、北京市、福建省、广东省等24个省、市、自治区2020年2月18日新型冠状病毒肺炎确诊人数进行聚类分析,得到的去除离群值在使用k-means算法相比原始数据集进行k-means算法聚类结果较好,该结果能帮助更好地在实际中怎么去做决策以及更好地降低经济损失。
关键词:  k-means  离群点  LOF  评价指标
DOI:
分类号:
基金项目:
The k-means Algorithm Based on Local Density Outlier Detection
LIU Feng, DAI Jia-jia, HU Yang
School of Mathematics and Statistics, Guizhou University, Guiyang 550025, China
Abstract:
In view of that the clustering process of data set is easily affected by outliers, the local density outlier detection k-means algorithm is proposed. The proposed method firstly detects the outliers of the data set by using local density outlier detection method, removes the outliers at first and then conducts k-means clustering. The validity of the algorithm is evaluated by Davies-Bouldin index, Dunn index and Silhouette index and is verified by artificial data set and UCI data set, and the outliers are removed. The obtained clustering results by using k-means algorithm are better than original data set k-means algorithm clustering results, this method is used for COVID-19 epidemic data analysis and the clustering analysis of the method is conducted on the confirmed infected number of COVID-19 in 24 provinces, municipalities and autonomous regions such as Anhui, Beijing, Fujian, Guangdong and so on on February 18, 2020. The clustering results using k-means algorithm by removing outliers are better than the clustering results of original data set using k-means algorithm, and the results can be conducive to how to make decision in practical work and better reduce economic cost.
Key words:  k-means  outliers  LOF  evaluation index
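The two-stage procedure described in the abstract (detect outliers by local density, remove them, then cluster with k-means) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it uses the standard local outlier factor (LOF) definition with ties ignored, and the neighbourhood size `k=3`, the score threshold `1.5`, the farthest-point seeding, and the toy data are all assumptions chosen for the example. It uses only the Python standard library (Python 3.8+ for `math.dist`).

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point (standard definition, ties ignored)."""
    # k nearest neighbours of each point as sorted (distance, index) pairs
    neigh = [sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
             for i, p in enumerate(points)]
    k_dist = [n[-1][0] for n in neigh]          # distance to the k-th neighbour
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for n in neigh:
        reach = [max(d, k_dist[j]) for d, j in n]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for _, j in n) / (len(n) * lrd[i]) for i, n in enumerate(neigh)]

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at points[0],
    then repeatedly add the point farthest from the centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iteration; returns (labels, centers)."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        # assignment step: index of the nearest centre for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # update step: mean of each cluster (an empty cluster keeps its centre)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy data: two tight 3x3 grids plus one distant point.
blob_a = [(x * 0.5, y * 0.5) for x in range(3) for y in range(3)]
blob_b = [(10 + x * 0.5, 10 + y * 0.5) for x in range(3) for y in range(3)]
outlier = (30.0, -20.0)
data = blob_a + blob_b + [outlier]

# k-means on the raw data: the distant point takes a cluster of its own.
raw_labels, _ = kmeans(data, k=2)

# LOF filtering (threshold 1.5 is an assumed cut-off), then k-means.
scores = lof_scores(data, k=3)
cleaned = [p for p, s in zip(data, scores) if s <= 1.5]
clean_labels, clean_centers = kmeans(cleaned, k=2)

print("raw cluster sizes:  ", [raw_labels.count(c) for c in range(2)])
print("clean cluster sizes:", [clean_labels.count(c) for c in range(2)])
```

On this toy data the raw run spends one of its two clusters on the single distant point, while the filtered run separates the two grids. The evaluation indices named in the abstract (Davies-Bouldin, Dunn, Silhouette) are not computed here; they would be applied to both label sets to compare the two runs.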
Copyright: 重庆工商大学学报(自然科学版) (Journal of Chongqing Technology and Business University, Natural Science Edition)