| Cite this article: | LI Meng, YING Hao. Neural network model for classification prediction of drug properties based on data width processing[J]. [Journal name], 2025, 42(6): 86-96 |
|
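| Abstract: |
| Objective To address the low classification accuracy that results from conventional data processing, an
Optuna-MLP-LightGBM combined model is proposed for classifying the properties of anticancer candidate drugs.
Methods For the 1,974 collected compounds (729 molecular descriptors each), a multi-layer perceptron (MLP) first
aggregates the high-dimensional data. A skip connection then performs width processing: the MLP output is merged
with the input data to form a wide dataset, which improves feature recognition and avoids losing useful
information, thereby improving information flow. Next, LightGBM replaces the classification layer of the MLP
network to classify more effectively and to reduce overfitting. Finally, an Optuna-optimized MLP-LightGBM
classification model is built to classify the small-intestinal epithelial cell permeability (Caco-2) of candidate
drugs. Results The model's accuracy, AUC, and F1 score reach 91.03%, 97.31%, and 90.48%, respectively. Ablation
experiments show that, with width processing and classification through MLP-LightGBM, performance improves over
the MLP model by 0.51%, 1.22%, and 0.7% on the three metrics. Compared with conventional models such as logistic
regression (LR), Attentive FP, and MLP, the model integrates data information better, with average gains over the
baseline models of 5.94%, 5.65%, and 6.56%, respectively. Conclusion Skip-connection processing lets the MLP
network both extract features effectively and expand the dataset, and introducing machine learning further
improves classification accuracy, so the model can serve as an important auxiliary tool in high-throughput drug
screening. |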
| Keywords: MLP neural network; LightGBM; Optuna automatic parameter tuning; anticancer candidate drugs; classification prediction |
| DOI: |
| Classification number: |
| Funding: |
|
| Neural Network Model for Classification Prediction of Drug Properties Based on Data Width Processing |
|
LI Meng1,2, YING Hao1,2
|
|
1. School of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing 400067, China
2. Chongqing Key Laboratory of Social Economic and Applied Statistics, Chongqing 400067, China
|
| Abstract: |
| Objective To address the low classification accuracy that results from conventional data processing, an
Optuna-MLP-LightGBM combined model was proposed for classifying the properties of anticancer candidate drugs.
Methods A total of 1,974 compounds (729 molecular descriptors for each compound) were collected. First, a
multi-layer perceptron (MLP) was used to aggregate the high-dimensional data. A skip connection was then used to
realize width processing of the data: the output data and input data were merged to form a wide dataset. This
enhanced feature recognition and prevented the loss of useful information, thereby improving information flow.
Then, LightGBM replaced the classification layer of the MLP neural network to classify more effectively and to
reduce overfitting. Finally, the MLP-LightGBM classification prediction model based on Optuna optimization was
constructed to classify the small-intestinal epithelial cell permeability (Caco-2) of candidate drugs. Results The
accuracy, AUC, and F1 values of the model reached 91.03%, 97.31%, and 90.48%, respectively. Ablation experiments
showed that the model's classification performance improved over the MLP model after data width processing and
classification with MLP-LightGBM, with increases of 0.51%, 1.22%, and 0.7% in the three metrics, respectively.
Compared with conventional models such as logistic regression (LR), Attentive FP, and MLP, this model integrates
data information better, with average gains over the baseline models of 5.94%, 5.65%, and 6.56%, respectively.
Conclusion Skip-connection processing enables the MLP network to extract features effectively and to expand the
dataset, and introducing machine learning further improves classification accuracy. The model can therefore serve
as an important auxiliary tool in high-throughput drug screening. |
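The width-processing step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the hidden-layer sizes, the random (untrained) weights, the synthetic compound, and the function names `init_layers`, `mlp_features`, and `widen` are all assumptions made for illustration. Only the 729-descriptor input width comes from the abstract. The sketch shows the shape of the idea: the MLP's output features are concatenated with the original descriptors, so a downstream classifier (LightGBM in the paper, omitted here along with Optuna tuning) would see both.

```python
import math
import random


def init_layers(sizes, rng):
    """Create random, untrained dense-layer weights (illustration only)."""
    weights, biases = [], []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights.append([[rng.uniform(-scale, scale) for _ in range(n_in)]
                        for _ in range(n_out)])
        biases.append([0.0] * n_out)
    return weights, biases


def mlp_features(x, weights, biases):
    """Forward pass through the dense layers; returns the last activations."""
    h = list(x)
    for layer_w, layer_b in zip(weights, biases):
        # Dense layer followed by ReLU.
        h = [max(0.0, sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(layer_w, layer_b)]
    return h


def widen(x, weights, biases):
    """Skip connection by concatenation: original descriptors + MLP features."""
    return list(x) + mlp_features(x, weights, biases)


rng = random.Random(0)
n_descriptors = 729        # descriptor count stated in the abstract
hidden_sizes = [64, 16]    # hypothetical; the abstract gives no layer sizes
weights, biases = init_layers([n_descriptors] + hidden_sizes, rng)

x = [rng.random() for _ in range(n_descriptors)]  # one synthetic compound
wide_x = widen(x, weights, biases)
print(len(x), "->", len(wide_x))
```

Under these assumed sizes, each wide row keeps all original descriptors unchanged and appends the 16 aggregated features, which is how concatenation avoids discarding input information.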
| Key words: MLP neural network; LightGBM; Optuna automatic parameter tuning; anticancer candidate drugs; classification prediction |
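For readers unfamiliar with the three metrics reported in the Results (accuracy, AUC, and F1), the following sketch computes them for a binary classifier. The labels and scores are toy values invented for illustration, not data from the study, and the 0.5 decision threshold is an assumption. AUC is computed here directly from its pairwise definition, which is simple but quadratic in the number of samples.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Requires at least one sample of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (invented for illustration).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"f1={f1_score(y_true, y_pred):.4f}")
print(f"auc={roc_auc(y_true, scores):.4f}")
```

Accuracy and F1 depend on the chosen threshold, while AUC depends only on how the scores rank positives against negatives, which is why the three values can differ substantially for the same model.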