Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features
Author: CHEN Miaomiao
    Abstract:

    Objective: Object tracking is widely used in video surveillance, autonomous driving, and UAV aerial photography. In existing Transformer-based trackers, self-attention flattens 2D feature maps into 1D token sequences, which discards the spatial priors of the target object and degrades tracking performance. To address this limitation, this paper proposes a tracker named Pool-Swin Transformer Tracker (PSTransT).
    Methods: PSTransT pairs each stage of the Swin Transformer with a pooling layer, so that features are extracted at multiple scales while spatial position information is preserved. Specifically, PSTransT uses a cross-scale fusion Swin Transformer for context modeling in which each stage runs in parallel with a pooling layer, capturing spatial features at different scales without sacrificing feature richness. In addition, a Transformer-based feature fusion network uses self-attention to jointly learn the correlations between template features and search features, so as to better capture the dynamic changes of the tracked target and its local context.
    Results: PSTransT was extensively evaluated on multiple benchmark datasets. It achieved a success rate of 69.2% on LaSOT, 19.6 percentage points higher than SiamRPN++, and a mean average overlap (mAO) of 72.2% on GOT-10k, 20.5 percentage points higher than SiamRPN++.
    Conclusion: The experimental results show that preserving spatial prior information alongside contextual information benefits object tracking, and that PSTransT outperforms the other compared methods.
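    The abstract does not give the window size, the pooling operator, how the pooling branch is merged with each Swin stage, or the exact form of the fusion network. The NumPy sketch below therefore only illustrates the two ideas it names, with every detail assumed: `window_self_attention` is a simplified single-head, non-shifted window attention without learned projections or patch merging (not a full Swin stage); `avg_pool_same` is a 3x3 average pooling that keeps the (H, W) grid; `pooled_stage` sums the two branches; and `joint_fusion` applies self-attention to the concatenated template and search tokens. All function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: branch design, merge rule, and fusion form are
# assumptions, not PSTransT's actual implementation.
import numpy as np


def softmax(scores):
    """Row-wise softmax over the last axis, shifted for numerical stability."""
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)


def window_self_attention(x, window):
    """Single-head self-attention inside non-overlapping windows (no shift).

    x has shape (H, W, C); H and W must be divisible by `window`.
    Tokens are flattened to 1D only within each window.
    """
    H, W, C = x.shape
    hn, wn = H // window, W // window
    # Partition into windows: (hn * wn, window * window, C)
    xw = x.reshape(hn, window, wn, window, C).transpose(0, 2, 1, 3, 4)
    xw = xw.reshape(hn * wn, window * window, C)
    attn = softmax(xw @ xw.transpose(0, 2, 1) / np.sqrt(C))
    out = attn @ xw
    # Reverse the partition back to the (H, W, C) grid
    out = out.reshape(hn, wn, window, window, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)


def avg_pool_same(x, k=3):
    """k x k average pooling, stride 1, edge padding: the (H, W) grid is kept."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    H, W, _ = x.shape
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + H, dx:dx + W]
    return out / (k * k)


def pooled_stage(x, window=4):
    """Hypothetical stage: window attention plus a parallel pooling branch, summed."""
    return window_self_attention(x, window) + avg_pool_same(x)


def joint_fusion(z, s):
    """Joint self-attention over concatenated template z (Nz, C) and search s (Ns, C)."""
    tokens = np.concatenate([z, s], axis=0)
    attn = softmax(tokens @ tokens.T / np.sqrt(tokens.shape[1]))
    fused = attn @ tokens
    # Split the fused tokens back into template and search parts
    return fused[: len(z)], fused[len(z):]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 16))
    print(pooled_stage(feat).shape)  # (8, 8, 16)
    z_out, s_out = joint_fusion(rng.standard_normal((4, 16)), rng.standard_normal((16, 16)))
    print(z_out.shape, s_out.shape)  # (4, 16) (16, 16)
```

    Summing the branches is only one possible merge; concatenation followed by a projection would be an equally plausible reading of "in parallel". The pooling branch here operates directly on the 2D grid, so each output location mixes only its spatial neighbours, whereas the attention branch, which has no positional encoding in this sketch, treats each window as a flattened set of tokens.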

Cite this article:

CHEN Miaomiao. Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, 42(3): 110-117

History
  • Published online: 2025-05-14