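| Abstract: |
| Objective: Traditional remote sensing image object detection models are difficult to deploy in low-compute settings
such as UAVs and satellites. To reduce the parameter count while preserving detection accuracy, a lightweight remote
sensing image object detection algorithm based on DETR is proposed. Methods: The model first adopts the EfficientViT
feature extraction module as a lightweight backbone for image feature extraction and selection. A lightweight, efficient
hybrid encoder is then designed to reduce the parameter count and computational cost while maintaining detection accuracy.
The encoder consists of an S-AIFI module and an MSFM module: the S-AIFI module focuses on deep features to strengthen
the aggregation of feature information, while the MSFM module uses multi-scale feature fusion to improve the detection of
objects of different sizes in remote sensing images. Finally, the shape-IoU loss function is introduced to further improve
detection accuracy. Results: In experiments on the DOTA-v1 and SIMD datasets, the model reaches an mAP of 75.5% and
81.9%, respectively, with its parameter count reduced to 10.3 M. Conclusion: The trained model has a small memory
footprint and parameter count and is suited to remote sensing image processing applications with limited computing resources. |
| Key words: remote sensing image; lightweight network; EfficientViT; multi-feature fusion |
| DOI: |
| Classification number: |
| Funding: |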
|
| Lightweight Remote Sensing Image Object Detection Algorithm Based on DETR |
|
ZHOU Mengran a, WANG Ao b
|
|
a. School of Electrical and Information Engineering; b. School of Computer Science and Engineering, Anhui University of
Science and Technology, Huainan 232000, Anhui, China
|
| Abstract: |
| Objective: To address the deployment challenge of traditional object detection models in low-computation
scenarios (e.g., drones and satellites), this study proposes a lightweight remote sensing image object detection algorithm
based on DETR (Detection Transformer), which reduces model complexity while preserving detection accuracy. Methods:
First, the proposed model employed an EfficientViT feature extraction module as a lightweight backbone for image feature
extraction and selection. Then, a lightweight and efficient hybrid encoder was designed to reduce the number of
parameters and computational cost of the model while maintaining detection accuracy. This encoder comprised two key
components: the S-AIFI (Slim-Attention-based Intrascale Feature Interaction) module, which focused on processing deep
features to enhance contextual aggregation of feature information, and the MSFM (Multi-Scale Feature Fusion Module),
which improved detection capability for objects of varying sizes in remote sensing images through effective multi-scale
fusion. Furthermore, a shape-IoU loss function was incorporated to refine the detection precision of the model.
Results: Experiments on the DOTA-v1 and SIMD datasets showed that the model achieved mean average precision (mAP)
scores of 75.5% and 81.9%, respectively, with its parameter count reduced to 10.3 M. Conclusion: The trained model
exhibits a small memory footprint and low parameter count, making it suitable for remote sensing image processing
applications with limited computational resources. |
| Key words: remote sensing image; lightweight network; EfficientViT; multi-feature fusion |
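
The abstract names a shape-IoU loss but does not give its form. The following is a minimal sketch of a Shape-IoU-style loss for a single pair of axis-aligned boxes, written from the generally published Shape-IoU formulation as we understand it, not from this paper's implementation. The function name `shape_iou_loss`, the `(x1, y1, x2, y2)` box layout, and the defaults `scale=0.0`, `theta=4.0`, and the 0.5 weight on the shape term are all assumptions made for illustration.

```python
import math


def shape_iou_loss(pred, gt, scale=0.0, theta=4.0, eps=1e-9):
    """Illustrative Shape-IoU-style loss for two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height. The defaults
    (scale, theta, 0.5 shape weight) are assumptions, not the paper's settings.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU between the predicted and ground-truth boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Shape weights derived from the ground-truth width and height.
    denom = gw ** scale + gh ** scale
    ww = 2.0 * gw ** scale / denom
    hh = 2.0 * gh ** scale / denom

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Shape-weighted, normalized distance between box centers.
    dx = (px1 + px2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (py1 + py2) / 2.0 - (gy1 + gy2) / 2.0
    distance = hh * dx ** 2 / c2 + ww * dy ** 2 / c2

    # Shape-weighted penalty on width and height mismatch.
    omega_w = hh * abs(pw - gw) / max(pw, gw)
    omega_h = ww * abs(ph - gh) / max(ph, gh)
    shape_cost = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + distance + 0.5 * shape_cost
```

Under these assumptions the loss is near zero for identical boxes and exceeds 1 for disjoint boxes, because the center-distance term remains positive when the IoU is zero. A training implementation would apply the same arithmetic to batched tensors.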