An Efficient Semantic Segmentation Algorithm for Indoor RGBD Scenes Based on SwiftNet

Authors: WANG Bo, XU Gang, SU Shilin
School of Electrical Engineering, Anhui Polytechnic University, Wuhu, Anhui 241000, China



Abstract:

Objective: Indoor scenes involve complex lighting, diverse materials, and varied spatial structures. Existing RGBD semantic segmentation algorithms do not fully exploit the shape information provided by depth images, and their computational cost is high. To address these problems, an efficient semantic segmentation method for indoor RGBD scenes based on SwiftNet is proposed.

Methods: First, depth images are introduced into SwiftNet, a lightweight multi-scale semantic segmentation algorithm for RGB road scenes. By exploiting the color stability of depth images and the per-pixel distance to the camera that they provide, the method reduces the influence of lighting, color, and distance on the segmentation results. Second, geometric shape features are extracted specifically from the depth images: depth features are decomposed into a position component and a shape component, and two learnable weights are introduced, each acting on one component independently. A convolution is then applied to the reweighted combination of the two components to capture the geometric shape information inherent in the depth data, without adding computation or memory cost at inference. Finally, to capture richer contextual information more quickly, the deep aggregation pyramid pooling module is modified to extract context in parallel; the modified module is called the Fast Aggregation Pyramid Pooling Module (FAPPM).

Results: Evaluation on the public indoor datasets NYUv2 and SUNRGBD shows that, compared with ESANet, a currently well-performing model, the proposed method improves mean intersection over union (MIoU) by 2.21% and 3.2%, respectively, while reaching 33.36 frames per second (FPS).

Conclusion: The results confirm that the algorithm is both efficient and accurate for semantic segmentation in complex indoor environments, and that it provides good support for subsequent intelligent robot tasks in indoor applications.

Key words: RGBD semantic segmentation; shape-aware convolution; indoor scene; feature fusion; deep learning
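The decomposition described in the Methods part of the abstract can be illustrated on a single depth patch. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the per-patch formulation, and the weight shapes (a scalar `w_pos` and an n×n matrix `w_shape` shared across channels) are assumptions chosen for illustration. It also shows one way the two learnable weights could be folded into the convolution kernel ahead of time, which would be consistent with the statement that inference cost does not increase.

```python
import numpy as np

def decompose_patch(patch):
    """Split a depth patch of shape (C, n) into a position part (per-channel
    mean) and a shape part (residual around that mean)."""
    position = patch.mean(axis=1, keepdims=True)   # (C, 1)
    shape = patch - position                       # (C, n)
    return position, shape

def shape_aware_response(patch, kernel, w_pos, w_shape):
    """Reweight the two components, recombine them, then apply the kernel.
    w_pos is a scalar; w_shape is an (n, n) matrix (hypothetical shapes)."""
    position, shape = decompose_patch(patch)
    reweighted = w_pos * position + shape @ w_shape.T   # broadcasts to (C, n)
    return float(np.sum(kernel * reweighted))

def fuse_kernel(kernel, w_pos, w_shape):
    """Fold both weights into the kernel so that a plain convolution on the
    raw patch gives the same response as shape_aware_response."""
    k_mean = kernel.mean(axis=1, keepdims=True)
    projected = kernel @ w_shape
    return w_pos * k_mean + (projected - projected.mean(axis=1, keepdims=True))

# Illustrative values only: a 3-channel 3x3 patch flattened to (3, 9).
rng = np.random.default_rng(0)
patch = rng.normal(size=(3, 9))
kernel = rng.normal(size=(3, 9))
w_pos, w_shape = 0.5, rng.normal(size=(9, 9))

explicit = shape_aware_response(patch, kernel, w_pos, w_shape)
fused = float(np.sum(fuse_kernel(kernel, w_pos, w_shape) * patch))
print(explicit, fused)
```

In this sketch the explicit and fused responses agree up to floating-point error, because the mean and residual operations are linear and can be moved from the patch onto the kernel. Setting `w_pos` to 0 makes the response depend only on the shape component, so adding a constant depth offset to the patch leaves it unchanged.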

Cite this article

WANG Bo, XU Gang, SU Shilin. An Efficient Semantic Segmentation Algorithm for Indoor RGBD Scenes Based on SwiftNet[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, 42(3): 84-93

History

Published online: 2025-05-14
