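Abstract:
Gene expression data are noisy and highly redundant, and the traditional NMF algorithm clusters them poorly. To address this, a clustering algorithm is proposed that combines β-divergence matrix factorization under a smooth l0-norm constraint with K-means, and it is applied to gene expression data. The smooth l0-norm constraint is introduced into the objective function of the β-divergence-based matrix factorization so that useful feature information is extracted for clustering. In experimental comparisons, the improved algorithm reaches an average clustering accuracy of 70%, which is 11% higher than that of the traditional NMF clustering algorithm, and its clustering results are markedly better than those of the other methods.
Keywords: gene expression data; β-divergence; clustering; matrix factorization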
Smooth l0 Norm Constrained β-NMF and Its Application to Clustering of Gene Expression Data
CUI Jian, YOU Chun-zhi
Abstract:
Gene expression data are noisy and highly redundant, and the traditional NMF algorithm is inefficient at clustering them. To address this, a new clustering method is presented that combines beta-divergence matrix decomposition under a smooth l0-norm constraint with K-means, and the method is applied to gene expression data. The smooth l0 norm is introduced into the objective function of the beta-divergence-based matrix decomposition so as to extract useful feature information for clustering. In experimental comparisons, the average clustering accuracy of the improved algorithm reaches 70 percent, which is 11 percent higher than that of the traditional NMF clustering algorithm, and its clustering results are better than those of the other methods compared.
Key words: gene expression data; beta divergence; clustering; matrix decomposition
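
The pipeline described in the abstract (factorize with a sparsity-penalized beta-divergence NMF, then run K-means on the extracted features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' update rules: the function names `smooth_l0_beta_nmf` and `kmeans`, the parameters `lam` and `sigma`, the Gaussian smoothing of the l0 norm, the heuristic placement of the penalty gradient in the denominator, and the random toy matrix are all hypothetical choices made for the example.

```python
import numpy as np


def smooth_l0_beta_nmf(V, rank, beta=1.0, lam=0.1, sigma=0.5,
                       n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (features x samples) as W @ H.

    Heuristic multiplicative updates for the beta-divergence, with the
    gradient of a smoothed l0 penalty on H added to the denominator.
    Assumption for illustration only; not the paper's derivation.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Smoothed l0 penalty: sum(1 - exp(-h^2 / (2 sigma^2))).
        # Its gradient is non-negative for h >= 0, so H stays non-negative.
        grad_pen = (H / sigma**2) * np.exp(-H**2 / (2 * sigma**2))
        H *= (W.T @ (V * WH**(beta - 2))) / (
            W.T @ WH**(beta - 1) + lam * grad_pen + eps)
        WH = W @ H + eps
        W *= ((V * WH**(beta - 2)) @ H.T) / (WH**(beta - 1) @ H.T + eps)
        # Normalize the columns of W so the penalty on H cannot be
        # dodged by rescaling; the product W @ H is unchanged.
        scale = W.sum(axis=0) + eps
        W /= scale
        H *= scale[:, None]
    return W, H


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of X; returns integer labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Toy non-negative "genes x samples" matrix (random, for shape only).
rng = np.random.default_rng(1)
V = rng.random((50, 30))
W, H = smooth_l0_beta_nmf(V, rank=3)
labels = kmeans(H.T, k=3)  # cluster samples by their coefficient vectors
```

With `beta=1.0` the data-fit part of the updates reduces to the standard Kullback-Leibler multiplicative rules; `lam=0` switches the sparsity penalty off, which gives a plain beta-NMF baseline to compare against.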