Abstract: Because k-means clustering is easily distorted by outliers, a k-means algorithm with local density outlier detection is proposed. The method first detects the outliers in the data set using a local density outlier detection method, removes them, and then runs k-means clustering on the remaining data. The validity of the algorithm is evaluated with the Davies-Bouldin index, the Dunn index and the Silhouette index, and is verified on an artificial data set and a UCI data set. With the outliers removed, the clustering results of k-means are better than those of k-means on the original data set. The method is then applied to COVID-19 epidemic data: a clustering analysis is conducted on the number of confirmed COVID-19 infections on February 18, 2020 in 24 provinces, municipalities and autonomous regions, including Anhui, Beijing, Fujian and Guangdong. Here too, k-means with outliers removed gives better clustering results than k-means on the original data set. The results can support decision-making in practical work and help reduce economic cost.
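The pipeline summarized in the abstract (score points by local density, drop the outliers, run k-means, compare a validity index) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the local density method is the standard local outlier factor (LOF), and the function names, the LOF threshold of 1.5, the neighbourhood size of 3, the farthest-first centroid initialisation and the toy data are all hypothetical choices made for illustration. Only the Davies-Bouldin index is computed here.

```python
import math

def lof_scores(points, k):
    """Brute-force local outlier factor; assumes no duplicate points."""
    n = len(points)
    # k nearest neighbours of each point as (distance, index) pairs
    # (simplification: distance ties beyond the k-th neighbour are dropped)
    nbrs = [sorted((math.dist(points[i], points[j]), j)
                   for j in range(n) if j != i)[:k] for i in range(n)]
    k_dist = [nb[-1][0] for nb in nbrs]
    # local reachability density = inverse of the mean reachability distance
    lrd = [len(nb) / sum(max(k_dist[j], d) for d, j in nb) for nb in nbrs]
    # LOF = mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
            for i, nb in enumerate(nbrs)]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c)
                                                       for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is better); assumes no empty cluster."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members))
    return sum(max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def cluster_without_outliers(points, k_clusters, k_neighbors=3, threshold=1.5):
    """Drop points whose LOF exceeds the (assumed) threshold, then run k-means."""
    scores = lof_scores(points, k_neighbors)
    kept = [p for p, s in zip(points, scores) if s <= threshold]
    labels, centroids = kmeans(kept, k_clusters)
    return kept, labels, centroids

# Toy data (made up): two tight groups plus one distant point
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
        (5.0, 30.0)]

raw_labels, raw_centroids = kmeans(data, 2)
kept, labels, centroids = cluster_without_outliers(data, 2)
db_raw = davies_bouldin(data, raw_labels, raw_centroids)
db_clean = davies_bouldin(kept, labels, centroids)
print("removed:", [p for p in data if p not in kept])
print("Davies-Bouldin raw vs. cleaned:", db_raw, db_clean)
```

On this toy data the distant point takes a cluster of its own when k-means runs on the raw data, so the two real groups are merged. After the LOF filter the two groups are separated and the Davies-Bouldin index drops. The threshold and neighbourhood size would need tuning on real data.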