Continuous Frame Depth Estimation Based on Multi-scale Feature Mixed Attention Mechanism

Abstract:

Objective: In monocular visual SLAM, depth estimation recovers the distance between a photographed object and the camera. To address the insufficient accuracy and large errors of unsupervised monocular depth estimation algorithms, a continuous-frame depth estimation network is proposed that uses a hybrid attention mechanism with multi-scale feature fusion.
Methods: Two encoder-decoder structures, one for depth estimation and one for pose estimation, produce depth information and 6-degree-of-freedom pose information respectively. The depth and pose are used to reconstruct the image, and the loss between the reconstructed image and the original image is computed to produce the depth output. The depth estimation encoder and decoder form a U-shaped network. The pose estimation network shares its encoder with the depth estimation network, and the pose is output by the pose estimation decoder. In the encoder, the CBAM hybrid attention mechanism is combined with a ResNet network to extract feature maps at four different scales. To sharpen the contour details of the estimated depth, learnable weight coefficients are assigned within the features at each scale to extract local and global features, which are then fused with the original features.
Results: The network was trained on the KITTI dataset and evaluated there for error and accuracy, followed by additional testing. Compared with the classical monodepth2 monocular method, the relative error, root mean square error, and log root mean square error were reduced by 0.034, 0.129, and 0.002 respectively. Tests on self-made images demonstrated the generalizability of the network.
Conclusion: Extracting multi-scale features with a ResNet network combined with a hybrid attention mechanism, and applying multi-scale feature fusion to the extracted features, improves depth estimation and contour detail.
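
The abstract names three error metrics (relative error, root mean square error, log root mean square error) without defining them. The sketch below assumes the definitions commonly used for KITTI depth evaluation; the function name `depth_errors` is hypothetical, and any depth cap, cropping, or median scaling the paper may apply is not reflected here.

```python
import math
from typing import Dict, Sequence


def depth_errors(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute abs_rel, rmse and rmse_log for paired, strictly positive depths.

    Assumed definitions (not taken from the abstract):
      abs_rel  = mean(|pred - gt| / gt)
      rmse     = sqrt(mean((pred - gt)^2))
      rmse_log = sqrt(mean((log pred - log gt)^2))
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 for p in pred) or any(g <= 0 for g in gt):
        raise ValueError("depths must be strictly positive")

    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)
    return {"abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Two valid pixels: one predicted exactly, one predicted 1 m too near.
    print(depth_errors([2.0, 4.0], [2.0, 5.0]))
```

Lower is better for all three values, so the reductions reported against monodepth2 correspond to smaller outputs from a function of this kind.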

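The image reconstruction step described in the Methods can be written in the form usually adopted in unsupervised monocular depth work. The abstract gives no formula, so the expression below is an assumed, standard formulation rather than the authors' exact loss; the symbols $K$ (camera intrinsics), $D_t$ (predicted depth of the target frame), $T_{t \to s}$ (predicted 6-degree-of-freedom pose from target to source frame), and $V$ (set of valid pixels) are introduced here for illustration.

```latex
% Project a target-frame pixel p_t (homogeneous coordinates) into the source frame
p_s \sim K \, T_{t \to s} \, D_t(p_t) \, K^{-1} p_t

% Sample the source image at p_s to reconstruct the target frame,
% then compare the reconstruction with the original target image
\hat{I}_{s \to t}(p_t) = I_s(p_s), \qquad
L_{\mathrm{photo}} = \frac{1}{|V|} \sum_{p_t \in V} \left| I_t(p_t) - \hat{I}_{s \to t}(p_t) \right|
```

Methods in the monodepth2 family typically combine a term of this kind with a structural similarity term and a smoothness term; whether the paper does so is not stated in the abstract.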

Cite this article:

ZHENG Yuhang (郑宇航), CAO Chuqing (曹雏清). Continuous Frame Depth Estimation Based on Multi-scale Feature Mixed Attention Mechanism[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2024, 41(4): 104-111

Published online: 2024-07-05
