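| Abstract: |
| Objective: Object tracking is widely used in video surveillance, autonomous driving, UAV aerial photography, and other fields.
The self-attention operation in existing Transformer-based models converts 2D features into 1D sequences, which discards the
spatial prior knowledge of the target object and degrades tracking performance. To address this problem, a tracker named
Pool-Swin Transformer Tracker (PSTransT) is proposed. Methods: PSTransT combines each stage of the Swin Transformer with a
pooling layer, so that features are extracted thoroughly at different scales while spatial position information is preserved.
Specifically, PSTransT uses a Swin Transformer model based on cross-scale fusion for context modeling. Each stage of this model
runs in parallel with a pooling layer, which captures spatial features at different scales while keeping the features rich.
In addition, the method adopts a Transformer-based feature fusion network that uses the Transformer's self-attention mechanism
to jointly learn the correlations between template features and search features for feature fusion, so as to better capture
the dynamic changes of the tracked target and its local context. Results: PSTransT was evaluated extensively on multiple
benchmark datasets. It achieved a success rate of 69.2% on LaSOT, 19.6 percentage points higher than SiamRPN++, and a mean
average overlap (mAO) of 72.2% on GOT-10k, 20.5 percentage points higher than SiamRPN++. Conclusion: The experimental results
show that preserving spatial prior information while retaining contextual information benefits object tracking performance,
and that PSTransT outperforms the other compared methods. |
| Keywords: object tracking; Swin Transformer; pooling layer; context modeling |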
| DOI: |
| Classification number: |
| Funding: |
|
| Research on Object Tracking Algorithm Integrating Swin Transformer Multi-scale Features and Pooling Spatial Features |
|
CHEN Miaomiao
|
|
School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China
|
| Abstract: |
| Objective: Object tracking is widely used in fields such as video surveillance, autonomous
driving, and UAV aerial photography. The self-attention operations of existing Transformer-based models transform 2D features
into 1D sequences, which ignores the spatial priors of target objects and leads to poor tracking performance. To address this
limitation, this paper proposes a tracker named Pool-Swin Transformer Tracker (PSTransT). Methods: PSTransT
integrates each stage of the Swin Transformer with a pooling layer, enabling effective feature extraction across
different scales while preserving spatial position information. Specifically, PSTransT uses a cross-scale fusion-based
Swin Transformer model for contextual modeling, in which each stage runs in parallel with a pooling layer to
capture spatial features at various scales while maintaining feature richness. Furthermore, the method uses a
Transformer-based feature fusion network that leverages the self-attention mechanism of the Transformer to jointly learn the
correlations between template features and search features for feature fusion, aiming to better capture the dynamic changes
of the tracked target and local contextual information. Results: PSTransT was extensively evaluated on
multiple benchmark datasets. It achieved a success rate of 69.2% on LaSOT, which is 19.6 percentage points higher than
SiamRPN++, and a mean average overlap (mAO) of 72.2% on GOT-10k, surpassing SiamRPN++ by 20.5 percentage points.
Conclusion: The experimental results demonstrate that preserving spatial prior information alongside contextual information
benefits object tracking performance, and that PSTransT outperforms the other compared methods. |
| Key words: object tracking; Swin Transformer; pooling layer; contextual modeling |
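
The abstract reports a success rate on LaSOT and an average overlap on GOT-10k without defining either metric. The sketch below shows one common way such overlap-based tracking metrics are computed, under stated assumptions: boxes are given as (x, y, width, height), average overlap is the mean per-frame intersection over union (IoU), and the success score is the area under the success curve sampled at evenly spaced IoU thresholds. The function names are hypothetical and are not taken from the paper or from any benchmark toolkit, and the official evaluation protocols may differ in detail.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def per_frame_overlaps(preds: Sequence[Box], truths: Sequence[Box]) -> List[float]:
    """IoU between the predicted and ground-truth box of every frame."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need the same non-zero number of predictions and ground truths")
    return [iou(p, t) for p, t in zip(preds, truths)]


def average_overlap(overlaps: Sequence[float]) -> float:
    """Mean IoU over all frames."""
    return sum(overlaps) / len(overlaps)


def success_rate(overlaps: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of frames whose IoU exceeds the given threshold."""
    return sum(o > threshold for o in overlaps) / len(overlaps)


def success_auc(overlaps: Sequence[float], steps: int = 21) -> float:
    """Area under the success curve, approximated as the mean success
    rate over `steps` evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(overlaps, t) for t in thresholds) / steps


if __name__ == "__main__":
    # Toy three-frame sequence: perfect hit, partial overlap, complete miss.
    predictions = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
    ious = per_frame_overlaps(predictions, ground_truth)
    print("average overlap:", round(average_overlap(ious), 4))
    print("success rate @0.5:", round(success_rate(ious), 4))
    print("success AUC:", round(success_auc(ious), 4))
```

On a real benchmark the per-frame overlaps would come from the tracker's output on each sequence, and the sequence-level scores would then be aggregated as the benchmark specifies.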