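Abstract:
For high-dimensional data sets, we propose a method that improves sparse logistic regression by exploiting the graph structure among predictor variables. The method fits the logistic regression model using high-dimensional graph-structured data or an overlapping group structure, and it remains applicable even when the graph structure of the predictors is unknown. For certain special forms of the graph, currently popular methods such as the Adaptive Lasso, the (Overlapping) Group Lasso and ridge regression can all be regarded as special cases of the proposed method. Numerical simulations and a real-data application show that the method makes effective use of the graph structure among predictors, improves the model's performance in estimation, prediction and variable selection, and is effective with finite samples. By exploiting the graph structure of high-dimensional data, the method copes with the high dimensionality of the data set, improves the performance of sparse logistic regression, and can be widely applied to disease classification with high-throughput gene data sets.
Key words: logistic regression; high-dimensional data; graph structure; Lasso; sparsity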
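The abstract states the special cases but does not give the penalty itself. The following is a minimal sketch under an assumption: the predictor graph is encoded by a neighborhood-based latent overlapping group penalty. This is one common way to encode a predictor graph, and the authors' exact formulation and weights may differ. Here $G=(V,E)$ is the predictor graph, $N_j=\{j\}\cup\{k:(j,k)\in E\}$ is the neighborhood of predictor $j$, $y_i\in\{0,1\}$, and the weights $\tau_j>0$ are hypothetical.

```latex
\begin{aligned}
&\min_{\beta_0,\,V^{(1)},\dots,V^{(p)}}\;
 -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\,(\beta_0 + x_i^\top\beta)
 - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big]
 + \lambda\sum_{j=1}^{p}\tau_j\,\big\|V^{(j)}\big\|_2,\\
&\text{subject to}\quad \beta = \sum_{j=1}^{p} V^{(j)},
 \qquad \operatorname{supp}\big(V^{(j)}\big)\subseteq N_j .
\end{aligned}

% Induced penalty on beta (after minimising over the latent V^{(j)}):
\begin{aligned}
E=\varnothing &:\; N_j=\{j\}\;\Rightarrow\;
  \lambda\sum_{j}\tau_j\,|\beta_j|
  \quad(\text{Adaptive Lasso when } \tau_j = 1/|\tilde\beta_j|^{\gamma}),\\
G=\textstyle\bigcup_k G_k\ \text{(disjoint cliques)} &:\;
  \lambda\sum_{k}\Big(\min_{j\in G_k}\tau_j\Big)\,\|\beta_{G_k}\|_2
  \quad(\text{Group Lasso}),\\
\text{overlapping } N_j &:\; \text{latent Overlapping Group Lasso},\\
G\ \text{complete} &:\;
  \lambda\Big(\min_{j}\tau_j\Big)\,\|\beta\|_2
  \quad(\text{ridge-type shrinkage}).
\end{aligned}
```

The clique case follows from the triangle inequality: the cheapest split of $\beta_{G_k}$ places all of it on the neighborhood with the smallest $\tau_j$. In the complete-graph case, the unsquared penalty $\lambda\|\beta\|_2$ and the ridge penalty $\lambda'\|\beta\|_2^2$ have matching stationarity conditions when $\lambda'=\lambda/(2\|\beta\|_2)$. Every nonzero solution of one is therefore a ridge solution for some tuning parameter.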
|
High-dimensional Logistic Regression Model Based on the Graph Structure of Predictor Variables
HUANG Wen-jing, DENG Dan, DU Jie-lin, WU Ming-yue
|
School of Public Health and Management, Chongqing Medical University, Chongqing 400016, China
|
Abstract:
For high-dimensional data sets, we propose a method that improves sparse logistic regression by exploiting the graph structure among predictor variables. The method builds the logistic regression model from high-dimensional graph-structured data or an overlapping group structure, and it remains applicable even when the graph structure of the predictors is unknown. For certain special forms of the graph, currently popular methods such as the Adaptive Lasso, the (Overlapping) Group Lasso and ridge regression can all be regarded as special cases of this method. Numerical simulations and a real-data analysis show that the proposed method effectively uses the graph structure among predictors to improve the model's performance in estimation, prediction and variable selection. The model is also effective with limited samples. By exploiting the graph structure of high-dimensional data, it overcomes the dimensionality problem of the data set and improves the performance of sparse logistic regression, and it can be widely used for disease classification with high-throughput gene data sets.
Key words: logistic regression; high-dimensional data; graph structure; Lasso; sparsity
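The abstract describes the model only in words, so the following is a minimal sketch under an assumption. It assumes a latent overlapping group lasso formulation in which $\beta=\sum_j V^{(j)}$, each $V^{(j)}$ is supported on the neighborhood $N_j$ (predictor $j$ plus its graph neighbors), and the penalty is $\lambda\sum_j\tau_j\|V^{(j)}\|_2$. This is one standard way to encode a predictor graph and is not necessarily the authors' estimator. The function names `neighborhoods`, `group_soft_threshold` and `fit_graph_logistic` are hypothetical. So are the weights `tau`, the plain proximal-gradient (ISTA) solver and the simulated data. The sketch uses the usual column-duplication trick: each neighborhood gets its own copy of its columns, which turns the problem into an ordinary, non-overlapping group lasso on the expanded design.

```python
import numpy as np


def neighborhoods(adj):
    """N_j = {j} plus the neighbours of j in a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=bool)
    p = adj.shape[0]
    return [np.flatnonzero(adj[j] | (np.arange(p) == j)) for j in range(p)]


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def fit_graph_logistic(X, y, adj, lam, tau=None, n_iter=2000):
    """Hypothetical sketch: latent overlapping group lasso logistic regression.

    Assumed objective (not necessarily the paper's):
        mean logistic loss(b0 + X @ beta) + lam * sum_j tau_j * ||V_j||_2,
        beta = sum_j V_j,  supp(V_j) inside N_j,  intercept b0 unpenalised.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    groups = neighborhoods(adj)
    tau = np.ones(p) if tau is None else np.asarray(tau, dtype=float)
    # Duplicate columns so each latent vector V_j has its own coefficients.
    Xt = np.hstack([X[:, g] for g in groups])
    bounds = np.cumsum([0] + [len(g) for g in groups])
    # Step size 1/L, where L = ||[1, Xt]||_2^2 / (4n) bounds the curvature
    # of the mean logistic loss in (b0, w).
    Z = np.column_stack([np.ones(n), Xt])
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    w = np.zeros(Xt.shape[1])
    b0 = 0.0
    for _ in range(n_iter):
        z = b0 + Xt @ w
        prob = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
        resid = prob - y
        # Gradient step on the smooth logistic loss.
        b0 -= step * resid.mean()
        w -= step * (Xt.T @ resid) / n
        # Proximal step: group soft-thresholding of each latent block.
        for j in range(p):
            s = slice(bounds[j], bounds[j + 1])
            w[s] = group_soft_threshold(w[s], step * lam * tau[j])
    # Fold the latent blocks back onto the original p coordinates.
    beta = np.zeros(p)
    for j, g in enumerate(groups):
        beta[g] += w[bounds[j]:bounds[j + 1]]
    return b0, beta


# Usage on simulated data (hypothetical): predictors 0, 1, 2 form a clique
# in the graph and are the only true signals.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.array([1.5, 1.5, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
adj = np.zeros((p, p), dtype=int)
for a, b in [(0, 1), (1, 2), (0, 2)]:
    adj[a, b] = adj[b, a] = 1
b0, beta = fit_graph_logistic(X, y, adj, lam=0.05)
print(np.round(beta, 2))
```

Duplicating columns keeps the proximal step separable by group, at the cost of an expanded design whose width is $\sum_j |N_j|$. With an empty graph every group is a singleton, so the sketch reduces to a weighted Lasso. Passing `tau = 1/|initial estimate|**gamma` then gives an adaptive-Lasso-style fit.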