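| Cite this article: | ZHANG Tuyi, LIU Sanmin, CHEN Yanfei, YU Wentao, ZHU Jian. Research on Resampling Ensemble Classification Method for Imbalanced Data Streams[J]. [Journal name], 2025, 42(3): 34-43 |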
|
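| Abstract: |
| Objective Class imbalance and concept drift are the two main challenges in data stream classification, and when they
occur together they significantly degrade the performance of data stream classification algorithms. Because traditional
data stream classification algorithms struggle when class imbalance and concept drift coexist, a resampling ensemble
model dedicated to imbalanced data streams is proposed. Methods First, a boundary oversampling method suited to data
streams is designed. It uses the properties of the triangle centroid to synthesize new samples on the inner side of
boundary samples, so that the minority class in each block is strengthened while the original data distribution is
preserved as far as possible and no new concepts are introduced, which effectively alleviates class imbalance within the
data block. On this basis, a time decay strategy and a weighted ensemble strategy are combined to design a dynamic
weighted ensemble model that uses the Matthews correlation coefficient as the weight, which handles concept drift and
strengthens the adaptability and robustness of the classification model. Results Simulation experiments on 3 real data
streams and 6 synthetic data streams show that, in imbalanced data stream scenarios, the proposed method identifies both
majority and minority classes effectively, detects and adapts to both abrupt and incremental concept drift better, and
outperforms the compared algorithms in overall classification performance. Conclusion The experiments verify that the
proposed method yields a robust classification model for imbalanced data streams, with advantages in handling imbalanced
data streams and in adapting to both types of concept drift. |
| Keywords: imbalanced data stream; concept drift; ensemble learning; Matthews correlation coefficient |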
| DOI: |
| Classification number: |
| Funding: |
|
| Research on Resampling Ensemble Classification Method for Imbalanced Data Streams |
|
ZHANG Tuyi, LIU Sanmin, CHEN Yanfei, YU Wentao, ZHU Jian
|
|
School of Computer and Information, Anhui University of Technology, Wuhu, Anhui 241000, China
|
| Abstract: |
| Objective Class imbalance and concept drift are two main challenges in data stream classification tasks. When
they occur simultaneously, they significantly degrade the performance of data stream classification algorithms. Therefore, to
address the difficulty that traditional data stream classification algorithms have in handling simultaneous class
imbalance and concept drift, a resampling ensemble model focused on imbalanced data streams was proposed. Methods
First, a boundary oversampling method tailored for data streams was designed. By leveraging the properties of the
triangle centroid, new samples were synthesized on the inner side of boundary samples to strengthen the minority class within
the block while striving to maintain the original data distribution and avoid introducing new concepts. This effectively
alleviated the class imbalance in the data block. On this basis, a dynamic weighted ensemble model using the Matthews
correlation coefficient as the weight was designed by integrating a time decay strategy and a weighted ensemble strategy. This
model addressed concept drift and enhanced the adaptability and robustness of the classification model. |
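The boundary oversampling step in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything concrete here is an assumption: the function name `centroid_oversample`, the Borderline-SMOTE-style boundary test, and the choice of two nearby minority neighbours to complete each triangle are all hypothetical. The only idea taken from the text is that a synthetic sample is placed at the centroid of a triangle anchored at a boundary sample, which keeps it inside the minority region.

```python
import numpy as np

def centroid_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """Synthesize minority samples as centroids of triangles anchored at
    boundary minority samples (illustrative sketch, not the paper's method)."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), bool), np.ones(len(X_maj), bool)]

    # Assumed boundary test: at least half, but not all, of the k nearest
    # neighbours of a minority sample belong to the majority class.
    boundary = []
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        d[i] = np.inf  # exclude the sample itself
        nearest = np.argsort(d)[:k]
        n_majority = is_maj[nearest].sum()
        if k / 2 <= n_majority < k:
            boundary.append(i)

    # A triangle needs the boundary sample plus two other minority samples.
    if not boundary or len(X_min) < 3:
        return np.empty((0, X_min.shape[1]))

    synthetic = []
    for _ in range(n_new):
        i = rng.choice(boundary)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf
        candidates = np.argsort(d)[:min(k, len(X_min) - 1)]
        j, l = rng.choice(candidates, size=2, replace=False)
        # The centroid is a convex combination of minority samples, so it
        # lies on the minority side of the boundary sample.
        synthetic.append((X_min[i] + X_min[j] + X_min[l]) / 3.0)
    return np.asarray(synthetic)
```

Because each synthetic point is an average of three existing minority samples, it cannot fall outside the convex hull of the minority class in the block, which is one way to read the stated goal of not introducing new concepts.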
| Key words: imbalanced data stream; concept drift; ensemble learning; Matthews correlation coefficient |
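The dynamic weighting idea in the abstract, where ensemble members are weighted by the Matthews correlation coefficient and discounted over time, can be sketched as follows. The MCC formula is the standard binary definition. How the paper combines MCC with time decay is not stated, so the product of a clipped MCC and an exponential decay in `member_weight`, the `decay` parameter, and the helper names are assumptions made for illustration only.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def member_weight(mcc_value, age, decay=0.1):
    """Assumed weighting: non-negative MCC discounted by the member's age
    (number of blocks since it was trained)."""
    return max(mcc_value, 0.0) * math.exp(-decay * age)

def weighted_vote(predictions, weights):
    """Combine the members' 0/1 predictions for one instance."""
    score_1 = sum(w for p, w in zip(predictions, weights) if p == 1)
    score_0 = sum(w for p, w in zip(predictions, weights) if p == 0)
    return 1 if score_1 > score_0 else 0
```

MCC uses all four confusion-matrix cells, so a member that ignores the minority class scores near zero even when its accuracy is high. This is a plausible reason to prefer it over accuracy as a weight on imbalanced blocks.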