Journal of Chongqing Technology and Business University (Natural Science Edition)

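Cite this article: CHEN Miaomiao. Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, 42(3): 110-117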

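Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features
CHEN Miaomiao
School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China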

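Abstract:

Objective: Object tracking is widely used in video surveillance, autonomous driving, UAV aerial photography, and other fields. The self-attention operation in existing Transformer-based models converts 2D features into 1D sequences, which discards the spatial prior knowledge of the target object and degrades tracking performance. To address this problem, a tracker named Pool-Swin Transformer Tracker (PSTransT) is proposed. Methods: PSTransT combines each stage of the Swin Transformer with a pooling layer, so that features are fully extracted at different scales while spatial position information is preserved. Specifically, PSTransT uses a Swin Transformer model based on cross-scale fusion for contextual modeling; each stage of the model is connected in parallel with a pooling layer, which captures spatial features at different scales while maintaining feature richness. In addition, the method adopts a Transformer-based feature fusion network that uses the Transformer's self-attention mechanism to jointly learn the correlations between template features and search features for feature fusion, in order to better capture the dynamic changes of the tracked target and local contextual information. Results: PSTransT was extensively evaluated on multiple benchmark datasets. It achieved a success rate of 69.2% on LaSOT, 19.6 percentage points higher than SiamRPN++, and a mean average overlap (mAO) of 72.2% on GOT-10k, 20.5 percentage points higher than SiamRPN++. Conclusion: The experimental results show that preserving spatial prior information while retaining contextual information benefits object tracking performance, and PSTransT outperforms the other compared methods.

Keywords: object tracking; Swin Transformer; pooling layer; contextual modeling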

DOI:

CLC number:

Funding:

Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features

CHEN Miaomiao

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China

Abstract:

Objective: Object tracking has extensive applications in fields such as video surveillance, autonomous driving, and UAV aerial photography. The self-attention operations of existing Transformer-based models transform 2D features into 1D sequences, which ignores the spatial priors of target objects and leads to poor tracking performance. To address this limitation, this paper proposes a tracker named Pool-Swin Transformer Tracker (PSTransT). Methods: PSTransT integrates each stage of the Swin Transformer with a pooling layer, enabling effective extraction of features across different scales while preserving spatial positional information. Specifically, PSTransT uses a cross-scale fusion-based Swin Transformer model for contextual modeling, in which each stage runs in parallel with a pooling layer to capture spatial features at various scales while maintaining feature richness. Furthermore, the method uses a Transformer-based feature fusion network that leverages the self-attention mechanism of the Transformer to jointly learn the correlations between template features and search features for feature fusion, aiming to better capture the dynamic changes of the tracked target and local contextual information. Results: PSTransT was extensively evaluated on multiple benchmark datasets. It achieved a success rate of 69.2% on LaSOT, 19.6 percentage points higher than SiamRPN++, and reached a mean average overlap (mAO) of 72.2% on GOT-10k, surpassing SiamRPN++ by 20.5 percentage points. Conclusion: The experimental results demonstrate that preserving spatial prior information alongside contextual information benefits object tracking performance, and PSTransT outperforms the other compared methods.

Key words: object tracking; Swin Transformer; pooling layer; contextual modeling
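
Illustrative sketch: the Methods part of the abstract describes each backbone stage running in parallel with a pooling layer, so that features are produced at several scales while the 2D layout is kept. The dependency-free Python below is only a minimal sketch of that idea under stated assumptions, not the PSTransT implementation. `placeholder_stage` is a hypothetical stand-in for a Swin Transformer stage (it only mimics the 2x downsampling), average pooling is an assumed pooling type, and pairing the two branch outputs per location is an assumed fusion rule; the abstract specifies none of these details. Even grid sizes are assumed.

```python
def avg_pool2d(fmap, k=2):
    """Non-overlapping k x k average pooling on a 2D grid (list of rows)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [sum(fmap[i + a][j + b] for a in range(k) for b in range(k)) / (k * k)
         for j in range(0, w - k + 1, k)]
        for i in range(0, h - k + 1, k)
    ]


def placeholder_stage(fmap):
    """Hypothetical stand-in for one Swin Transformer stage.

    A real stage applies patch merging and windowed self-attention. Here we
    only mimic the 2x downsampling by keeping the top-left value of every
    2 x 2 patch.
    """
    return [row[::2] for row in fmap[::2]]


def pooled_stage(fmap):
    """Run the stage and a pooling layer in parallel on the same input.

    Assumption: the two branches are fused by pairing their values at each
    spatial location, giving (context, pooled) tuples on a 2D grid.
    """
    context = placeholder_stage(fmap)
    spatial = avg_pool2d(fmap, k=2)
    fused = [list(zip(c_row, s_row)) for c_row, s_row in zip(context, spatial)]
    return context, fused


def multi_scale_features(fmap, num_stages=2):
    """Collect one fused feature grid per stage, each at half the resolution."""
    features = []
    current = fmap
    for _ in range(num_stages):
        current, fused = pooled_stage(current)
        features.append(fused)
    return features


if __name__ == "__main__":
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]
    for level, fused in enumerate(multi_scale_features(grid), start=1):
        print(f"stage {level}: {len(fused)} x {len(fused[0])} grid -> {fused}")
```

Every fused output stays indexed by row and column, so the neighbourhood structure that a flattened 1D token sequence no longer encodes remains available to later layers. In a real tracker the values would be multi-channel tensors and the fusion would more likely be a learned projection.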