Neural Network Model for Classification Prediction of Drug Properties Based on Data Width Processing
    Abstract:

    Objective: Conventional data processing often yields low accuracy in classification prediction. To address this, an Optuna-MLP-LightGBM combined model is proposed for predicting the property classes of anticancer candidate drugs. Methods: A total of 1,974 compounds were collected, each described by 729 molecular descriptors. First, a multilayer perceptron (MLP) was used to aggregate the high-dimensional data, and a skip (jump) connection was used to perform width processing of the data: the MLP output was merged with the input data to form a wide dataset, which strengthens feature recognition and avoids losing useful information, thereby improving information flow. Next, LightGBM replaced the classification layer of the MLP network, for better classification and to reduce overfitting. Finally, an MLP-LightGBM classification model optimized with Optuna was built to classify candidate drugs by their permeability across small intestinal epithelial cells (Caco-2). Results: The model reached an accuracy of 91.03%, an AUC of 97.31%, and an F1 score of 90.48%. Ablation experiments showed that, with data width processing and classification through MLP-LightGBM, performance improved over the MLP model by 0.51%, 1.22%, and 0.7% on the three metrics, respectively. Compared with conventional models such as logistic regression (LR), Attentive FP, and MLP, the model integrates data information better, with average gains over the base models of 5.94%, 5.65%, and 6.56%, respectively. Conclusion: Because the skip connection lets the MLP network both extract features effectively and expand the dataset, and because introducing a machine learning classifier further improves classification accuracy, the model can serve as an important auxiliary tool in high-throughput drug screening.
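    The width-processing step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes (64 and 16), the random untrained weights, and the helper names `make_layer`, `mlp_features`, and `widen` are all assumptions made for illustration. It shows only the skip connection by concatenation, where each compound's 729 descriptors are kept and the MLP output is appended to them.

```python
import random

def relu(v):
    # Element-wise ReLU activation.
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    # Fully connected layer: one weight row per output unit.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def make_layer(n_in, n_out, rng):
    # Hypothetical random initialisation; a real model would train these weights.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def mlp_features(x, layers):
    # Forward pass through the hidden layers only (no classification layer).
    h = x
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h

def widen(x, layers):
    # Skip connection by concatenation: original descriptors + MLP output.
    return list(x) + mlp_features(x, layers)

rng = random.Random(0)
n_descriptors, hidden, out = 729, 64, 16  # 729 is from the abstract; 64 and 16 are assumed
layers = [make_layer(n_descriptors, hidden, rng),
          make_layer(hidden, out, rng)]

# Five synthetic "compounds" stand in for the 1,974 real ones.
X = [[rng.random() for _ in range(n_descriptors)] for _ in range(5)]
X_wide = [widen(x, layers) for x in X]
```

    With these assumed sizes, each row of `X_wide` has 729 + 16 = 745 columns. In the pipeline the abstract describes, a matrix of this wide form would then be passed to the LightGBM classifier in place of the MLP's own classification layer, with hyperparameters tuned by Optuna.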

Cite this article

LI Meng, YING Hao. Neural Network Model for Classification Prediction of Drug Properties Based on Data Width Processing[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, 42(6): 86-96

  • Published online: 2025-11-19