CN116433904A

CN116433904A - Cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution

Info

Publication number: CN116433904A
Application number: CN202310347813.7A
Authority: CN
Inventors: 葛斌; 陆一鸣; 夏晨星; 朱序; 卢洋; 郭婷
Original assignee: Anhui University of Science and Technology
Current assignee: Anhui University of Science and Technology
Priority date: 2023-03-31
Filing date: 2023-03-31
Publication date: 2023-07-14

Abstract

The inventionThe invention belongs to the field of computer vision, and provides a cross-mode RGB-D semantic segmentation method based on shape perception, which comprises the following steps: 1) Acquiring an RGB-D data set for training and testing the task, and defining an algorithm target of the invention; 2) Constructing a shape perception and pixel convolution based RGB-D semantic segmentation network model by using a deep learning technology and a double encoder-decoder structure; 3) Constructing a cross-modal feature fusion network for generating multi-modal features; 4) The cross-modal characteristics are fused by a cross fusion method, so that the high-level semantic information of the multi-modal characteristics is enhanced; 5) In the deep labv3+ decoder, the output of the encoder is up-sampled to match the resolution with the features of the low level. The feature layer connection is convolved once by 3 multiplied by 3, and then activated by a sigmoid function to obtain a predicted semantic graph P _est The method comprises the steps of carrying out a first treatment on the surface of the 6) Predicted saliency map P _est Semantic segmentation map P with artificial annotation _GT Calculating loss; 7) Testing the test data set to generate a saliency map P _test And performance evaluation is performed using the evaluation index.

Description

A Cross-Modal RGB-D Semantic Segmentation Method Based on Shape Awareness and Pixel Convolution

技术领域：Technical field:

本发明涉及计算机视觉和图像处理领域，特别地涉及一种基于形状感知和像素卷积的跨模态RGB-D语义分割方法。The invention relates to the fields of computer vision and image processing, in particular to a cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution.

背景技术：Background technique:

语义分割涉及将一些原始数据作为输入并将它们转换为具有突出显示的感兴趣区域的掩膜，其中图像中的每个像素根据其所属的对象被分配类别ID。语义分割将属于同一目标的图像部分聚集在一起解决这个问题，从而扩展了其应用领域。与其他的基于图像的任务相比，语义分割是完全不同且先进的。简而言之，在计算机视觉领域，语义分割是基于全卷积的像素分类任务。Semantic segmentation involves taking some raw data as input and converting them into masks with highlighted regions of interest, where each pixel in the image is assigned a class ID according to the object it belongs to. Semantic segmentation solves this problem by clustering image parts belonging to the same object, thus expanding its application domain. Semantic segmentation is completely different and advanced compared to other image-based tasks. In short, in the field of computer vision, semantic segmentation is a pixel classification task based on full convolution.

单一的模态的RGB语义分割在面临复杂场景等挑战性因素，难以明确目标的轮廓从而精准的进行语义分割。并且难以准确和完整地从背景中准确定位出所有目标并分类。因此，为了解决这个问题，将深度(Depth)图像引入到语义分割，通过联合RGB图像和Depth图像相结合构成了RGB-D进行语义分割。The RGB semantic segmentation of a single modality is faced with challenging factors such as complex scenes, and it is difficult to clarify the outline of the target so as to accurately perform semantic segmentation. And it is difficult to accurately and completely locate and classify all objects from the background. Therefore, in order to solve this problem, depth (Depth) images are introduced into semantic segmentation, and RGB-D is formed by combining RGB images and Depth images for semantic segmentation.

由于Depth Map主要能够提供目标边缘等信息。将Depth图引入到语义分割任务中，RGB图提供了全局信息，而深度图提供轮廓信息更完备，表达几何结构和距离信息。因此，将RGB图像与深度图相结合用于语义分割任务是一种合理的选择。Since the Depth Map can mainly provide information such as the edge of the target. The Depth map is introduced into the semantic segmentation task, the RGB map provides global information, and the depth map provides more complete contour information, expressing geometric structure and distance information. Therefore, it is a reasonable choice to combine RGB images with depth maps for semantic segmentation tasks.

此前的RGB-D语义分割方法大多将Depth Map作为独立于RGB图像的数据流，单独提取特征，或者将Depth图像作为RGB图像的第四个通道，该类方法无差别地对待RGB图像和Depth图像，并未考虑到RGB图和深度图信息本质上是不同的，所以广泛应用于RGB的卷积操作不适用于深度图的信息处理。Most of the previous RGB-D semantic segmentation methods use the Depth Map as a data stream independent of the RGB image to extract features separately, or use the Depth image as the fourth channel of the RGB image. This type of method treats the RGB image and the Depth image indiscriminately. , does not consider that the RGB image and the depth image information are essentially different, so the convolution operation widely used in RGB is not suitable for the information processing of the depth image.

考虑到RGB图像数据和Depth图像数据之间存在跨模态数据的二义性问题，本发明尝试探索基于形状感知和像素卷积的跨模态特征融合方法。本发明通过对深度特征的局部形状以及联系进一步挖掘特征在跨模态特征融合方面的作用，帮助语义分割模型更加准确地像素分类。Considering the ambiguity of cross-modal data between RGB image data and Depth image data, the present invention attempts to explore a cross-modal feature fusion method based on shape perception and pixel convolution. The present invention helps the semantic segmentation model to more accurately classify pixels by further mining the role of features in cross-modal feature fusion through local shapes and connections of deep features.

发明内容:Invention content:

针对以上提出的问题，本发明提供一种基于形状感知的跨模态RGB-D语义分割方法，具体采用的技术方案如下：In view of the problems raised above, the present invention provides a cross-modal RGB-D semantic segmentation method based on shape perception, and the specific technical scheme adopted is as follows:

1.获取训练和测试该任务的RGB-D数据集。1. Obtain the RGB-D dataset for training and testing the task.

1.1)将NYU-Depth-V2(NYUDv2-13 and-40)数据集作为训练集，SUN RGB-D数据集作为测试集。1.1) The NYU-Depth-V2 (NYUDv2-13 and-40) dataset is used as the training set, and the SUN RGB-D dataset is used as the test set.

1.2)RGB-D图像数据集，每份数据标注了场景种类(scene category)、二维分割(2D segmentation)、三维房间布局(3D room layout)、三维物体边框(3D object box)、三维物体方向(3D object prientation)。1.2) RGB-D image data set, each data is marked with scene category, 2D segmentation, 3D room layout, 3D object box, and 3D object direction (3D object prientation).

2.利用深度学习技术，基于形状感知和像素卷积并通过双编码器-解码器结构构建RGB-D语义分割网络模型：2. Using deep learning technology, a RGB-D semantic segmentation network model is constructed based on shape perception and pixel convolution and through a dual encoder-decoder structure:

2.1)利用编码器-解码器架构作为本发明的模型的基本架构，用于提取RGB图像特征和因对的Depth图像特征，分别为

和/>

2.1) Utilize the encoder-decoder architecture as the basic architecture of the model of the present invention, for extracting RGB image features and corresponding Depth image features, respectively

and />

2.2)本发明采用NYU-Depth-V2数据集预训练构建双编码器-解码器架构的网络模型。2.2) The present invention adopts NYU-Depth-V2 data set pre-training to build a network model of dual encoder-decoder architecture.

3.基于步骤2所提取到的RGB图像特征

和对应的Depth图像特征

进行跨模态特征融合，并利用该融合构建一个跨模态特征融合网络用于生成多模态特征。3. Based on the RGB image features extracted in step 2

And the corresponding Depth image features

Perform cross-modal feature fusion, and use the fusion to construct a cross-modal feature fusion network for generating multi-modal features.

3.1)跨模态特征融合模块由5个层次的FCF模块整合5个层次的RGB图像特征

和对应的Depth图像特征/>

构成，更新出5个层次的特征/>

和/>

3.1) The cross-modal feature fusion module integrates 5 levels of RGB image features by 5 levels of FCF modules

And the corresponding Depth image features />

Composition, update 5 levels of features/>

and />

3.2)第i层次的FCF模块的输入数据为

和/>

构成，并通过交互注意力机制更新出5个层次的特征/>

和/>

3.2) The input data of the FCF module at the i-th level is

and />

Composition, and update the features of 5 levels through the interactive attention mechanism />

and />

3.3)FCF模块通过特征交叉融合生成多模态特征具体过程如下：3.3) The FCF module generates multimodal features through feature cross fusion. The specific process is as follows:

3.3.1)首先本发明构建一个交叉像素卷积模块用于获取RGB和像素差异的特征，进一步增强RGB图像特征。同时对于深度图构建形状感知卷积用于获取较为准确地局部形状边缘信息，进一步增强Depth图像特征。3.3.1) Firstly, the present invention constructs a cross-pixel convolution module to obtain features of RGB and pixel differences, and further enhance RGB image features. At the same time, the shape-aware convolution is constructed for the depth image to obtain more accurate local shape edge information, and further enhance the depth image features.

3.3.2)进一步利用元素感知的矩阵相加操作融合RGB图像特征和对应的Depth图像特征，其中通过像素卷积进行判断像素是否可用，利用元素感知的矩阵相加操作确定最后计算值。然后利用softmax激活函数将融合后的特征转化为RGB特征更新权重W_r和深度特征更新权重W_d：3.3.2) Further use the element-aware matrix addition operation to fuse the RGB image features and the corresponding Depth image features, wherein whether the pixel is available is judged by pixel convolution, and the element-aware matrix addition operation is used to determine the final calculation value. Then use the softmax activation function to convert the fused features into RGB feature update weight W _r and depth feature update weight W _d :

其中，conv表示卷积模块，表示元素感知的矩阵乘操作，add表示元素感知的矩阵加操作，GAP表示全局平均池化，softmax表示softmax激活函数。

为像素卷积值，

为RGB卷积值。Among them, conv represents the convolution module, represents the element-aware matrix multiplication operation, add represents the element-aware matrix addition operation, GAP represents the global average pooling, and softmax represents the softmax activation function.

is the pixel convolution value,

is the RGB convolution value.

3.3.3)在获得RGB特征更新权重W_r和深度特征更新权重W_d之后，我们将W_r和W_d分别与增强之后的RGB图像特征和对应的Depth图像特征相结合，得到新的RGB特征和深度特征。3.3.3) After obtaining the RGB feature update weight W _r and the depth feature update weight W _d , we combine W _r and W _d with the enhanced RGB image feature and the corresponding Depth image feature respectively to obtain a new RGB feature and deep features.

3.3.4)通过上述操作，更新出5个层次的特征

和/>

并将每个层次更新的特征对应输入下一个像素卷积模块和形状感知模块，通过多层级的操作增强特征感受野信息和高级语义信息。3.3.4) Through the above operations, update the features of 5 levels

and />

And the updated features of each level are correspondingly input to the next pixel convolution module and shape perception module, and the feature receptive field information and advanced semantic information are enhanced through multi-level operations.

4)通过交叉融合方法，融合跨模态特征，RGB图像特征

和对应的Depth图像特征

最后得到融合特征/>

4) Through the cross-fusion method, the cross-modal features and RGB image features are fused

And the corresponding Depth image features

Finally get the fusion feature />

其中，i∈{1,2,3,4,5}表示特征所在模型的层次，conv5表示卷积核大小为5×5的卷积操作，cat表示特征连接操作。Among them, i∈{1,2,3,4,5} represents the level of the model where the feature is located, conv5 represents the convolution operation with a convolution kernel size of 5×5, and cat represents the feature connection operation.

4.1)将更新后的特征

经过有效特征层利用像素卷积结构特征提取：4.1) The updated features

Through the effective feature layer, the pixel convolution structure feature extraction is used:

P_i＝Conv(P，K_i) 公式(3)P _i =Conv(P, K _i ) formula (3)

D_i＝Conv(R，K_i) 公式(4)D _i =Conv(R, K _i ) formula (4)

R_i＝Conv(D_i+P_i,K₁) 公式(5)R _i ＝Conv(D _i +P _i ,K ₁ ) formula (5)

其中，i∈{1,2,3,4,5}表示特征所在的层次，Conv()代表所执行的卷积操作，K_i为各层次不同的卷积核，D_i为RGB特征的提取结果，P_i为像素信息的提取，并令K₁为1×1卷积核。R_i为最终产生的RGB图像特征

Among them, i∈{1,2,3,4,5} represents the level of the feature, Conv() represents the convolution operation performed, K _i is the convolution kernel of each level, D _i is the extraction of RGB features As a result, P _i is the extraction of pixel information, and let K ₁ be a 1×1 convolution kernel. R _i is the final RGB image feature

4.2)将上述步骤所生成的RGB图像特征和模态感知模块中的深度特征信息输入特征交叉融合模块，融合不同感受野的多模态特征。4.2) Input the RGB image features generated in the above steps and the depth feature information in the modality perception module into the feature cross fusion module to fuse the multimodal features of different receptive fields.

5)将步骤4所获取更新的第5个层次RGB图像特征和深度图像特征，输入到DeepLabV3+的解码器中，将编码器的输出上采样4倍，使其分辨率和低层级的feature一致。将特征层连接后，再进行一次3×3的卷积(细化作用)，得到最终的融合特征，在经过sigmoid函数激活，得到预测的语义图P_est。5) Input the updated fifth-level RGB image features and depth image features obtained in step 4 into the decoder of DeepLabV3+, and upsample the output of the encoder by 4 times to make its resolution consistent with low-level features. After the feature layers are connected, a 3×3 convolution (refinement) is performed to obtain the final fusion feature, which is activated by the sigmoid function to obtain the predicted semantic map P _est .

6)通过本发明预测出来的语义图P_est与人工标注的语义分割图P_GT进行损失函数的计算，并通过反向传播算法逐步更新本发明提出的模型的参数权重，最终确定RGB-D语义分割算法的结构和参数权重。6) Calculate the loss function through the semantic map P _est predicted by the present invention and the semantic segmentation map _PGT manually marked, and gradually update the parameter weights of the model proposed by the present invention through the back propagation algorithm, and finally determine the RGB-D semantics The structure and parameter weights of the segmentation algorithm.

7)在步骤6确定模型的结构和参数权重的基础上，对测试集上的RGB-D图像对进行测试，生成显著图P_test，并使用MAE、S-measure、F-measure、E-measure评价指标进行评估。7) On the basis of determining the structure and parameter weights of the model in step 6, test the RGB-D image pair on the test set to generate a saliency map P _test , and use MAE, S-measure, F-measure, E-measure Evaluation indicators are evaluated.

本发明基于深度卷积神经网络、形状感知和像素卷积实现的RGB-D语义分割，提取Depth图像中的丰富的空间结构边缘信息，并与RGB图像提取的全局信息进行交叉跨模态特征融合，能够适应不同场景下的语义分割的要求，特别在一些具有挑战性场景下(复杂背景、低对比度、透明物体等)。相比较之前的语义分割方法，本发明具有以下收益：The invention is based on RGB-D semantic segmentation realized by deep convolutional neural network, shape perception and pixel convolution, extracts rich spatial structure edge information in Depth image, and performs cross-cross-modal feature fusion with global information extracted from RGB image , can adapt to the requirements of semantic segmentation in different scenarios, especially in some challenging scenarios (complex background, low contrast, transparent objects, etc.). Compared with previous semantic segmentation methods, the present invention has the following benefits:

首先，将深度图的引入，并且深度图不作为RGB图的额外通道，同时不将两种模态作为相同贡献价值进行特征提取和融合。利用深度学习技术，通过双编码器-解码器结构构建RGB-D图像对和实类之间的关系，并通过跨模态特征的提取和融合，得到分割特征。First of all, the depth map is introduced, and the depth map is not used as an additional channel of the RGB map, and the two modalities are not used as the same contribution value for feature extraction and fusion. Using deep learning technology, the relationship between RGB-D image pairs and real classes is constructed through a dual encoder-decoder structure, and segmentation features are obtained through cross-modal feature extraction and fusion.

其次，通过一种交叉融合的方式，有效调制Depth图像特征对于RGB图像特征的补充边缘信息，而对RGB图像的全局信息无影响，并利用其本身具备的深度分布信息指导跨模态的特征融合，排除RGB图像中的背景信息的干扰，为下一阶段的像素分割打好基础。Secondly, through a cross-fusion method, the Depth image feature is effectively modulated to supplement the edge information of the RGB image feature, but has no effect on the global information of the RGB image, and uses its own depth distribution information to guide cross-modal feature fusion. , eliminate the interference of background information in the RGB image, and lay a solid foundation for the next stage of pixel segmentation.

最后，通过语义解码器，预测最终的语义分割像素图。Finally, through the semantic decoder, the final semantic segmentation pixmap is predicted.

附图说明Description of drawings

图1为本发明的模型结构示意图Fig. 1 is the model structure schematic diagram of the present invention

图2为跨模态特征融合模块示意图Figure 2 is a schematic diagram of the cross-modal feature fusion module

图3为交叉像素卷积模块示意图Figure 3 is a schematic diagram of the cross-pixel convolution module

图4为分割解码器示意图Figure 4 is a schematic diagram of the segmentation decoder

图5为模型训练和测试示意图Figure 5 is a schematic diagram of model training and testing

具体实施方式Detailed ways

下面将结合本发明实例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，此外，所叙述的实例仅仅是本发明一部分实例，而不是所有的实例。基于本发明中的实例，本研究方向普通技术人员在没有付出创造性劳动前提下所获得的所有其他实例，都属于本发明保护范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the examples of the present invention. In addition, the described examples are only some examples of the present invention, not all examples. Based on the examples in the present invention, all other examples obtained by those of ordinary skill in this research direction without paying creative work, all belong to the protection scope of the present invention.

参考附图1，一种基于形状感知和像素卷积的跨模态RGB-D语义分割方法主要包含以下步骤：Referring to Figure 1, a cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution mainly includes the following steps:

1.获取训练和测试该任务的RGB-D数据集，并定义本发明的算法目标，并确定用于训练和测试算法的训练集和测试集。将NYU-Depth-V2(NYUDv2-13 and-40)数据集作为训练集，SUN RGB-D数据集作为测试集。1. Obtain the RGB-D data set for training and testing the task, and define the algorithm target of the present invention, and determine the training set and test set for training and testing the algorithm. The NYU-Depth-V2 (NYUDv2-13 and-40) dataset is used as the training set, and the SUN RGB-D dataset is used as the test set.

2.利用交叉像素卷积网络提取RGB图像特征，利用形状感知卷积网络提取Depth图像特征，并以此为基础构建双编码器-解码器的语义分割模型网络，包括用于提取RGB图像特征的RGB编码器和用于提取Depth图像特征的Depth编码器：2. Use the cross-pixel convolutional network to extract RGB image features, and use the shape-aware convolutional network to extract Depth image features, and build a dual encoder-decoder semantic segmentation model network based on this, including for extracting RGB image features RGB encoder and Depth encoder for extracting Depth image features:

2.1.将带有三通道的RGB图像输入到RGB编码器，生成5个层次的RGB图像特征，为

2.1. Input the RGB image with three channels to the RGB encoder to generate 5 levels of RGB image features, as

2.2.将三通道的Depth图像输入到Depth编码器中，生成5个层次的Depth图像特征，为

2.2. Input the three-channel Depth image into the Depth encoder to generate five levels of Depth image features, as

3.基于步骤2所提取到的RGB图像特征

和对应的Depth图像特征

And the corresponding Depth image features

和对应的Depth图像特征/>

构成，更新出5个层次的特征

和/>

And the corresponding Depth image features />

Composition, update 5 levels of features

and />

3.2)第i层次的FCF模块的输入数据为

和/>

构成，并通过交互注意力机制更新出5个层次的特征/>

和/>

3.2) The input data of the FCF module at the i-th level is

and />

3.3.2)进一步利用元素感知的矩阵相加操作融合RGB图像特征和对应的Depth图像特征，其中通过像素卷积进行判断像素是否可用，利用元素感知的矩阵相加操作确定最后计算值。然后利用softmax激活函数将融合后的特征转化为RGB特征更新权重Wr和深度特征更新权重W_d：3.3.2) Further use the element-aware matrix addition operation to fuse the RGB image features and the corresponding Depth image features, wherein whether the pixel is available is judged by pixel convolution, and the element-aware matrix addition operation is used to determine the final calculation value. Then use the softmax activation function to convert the fused features into RGB feature update weight Wr and depth feature update weight W _d :

为像素卷积值，

is the pixel convolution value,

is the RGB convolution value.

3.3.4)通过上述操作，更新出5个层次的特征

和/>

and />

4)通过交叉融合方法，融合跨模态特征，RGB图像特征

和对应的Depth图像特征

最后得到融合特征/>

And the corresponding Depth image features

Finally get the fusion feature />

4.1)将更新后的特征

D_i＝Conv(R，K_i) 公式(3)D _i =Conv(R, K _i ) formula (3)

P_i＝Conv(P，K_i) 公式(4)P _i =Conv(P, K _i ) formula (4)

R_i＝Conv(D_i+P_i,K₁) 公式(5)R _i ＝Conv(D _i +P _i ,K ₁ ) formula (5)

Claims

1. A cross-modal RGB-D semantic segmentation method based on shape perception, characterized in that the method comprises the following steps:

1) obtain the RGB-D dataset of training and testing this task, and define the algorithm target of the present invention;

2) Using deep learning technology, build a RGB-D semantic segmentation network model based on shape perception and pixel convolution and through a dual encoder-decoder structure;

3) Construct a cross-modal feature fusion network for generating multi-modal features;

4) The structure fuses cross-modal features through the cross-fusion method to enhance the high-level semantic information of multi-modal features;

5) In the decoder of DeepLabV3+, the output of the encoder is up-sampled to make the resolution consistent with the low-level feature. Connect the feature layer to perform a 3×3 convolution, and activate the sigmoid function to obtain the predicted semantic map P _est ;

6) Calculate the loss of the predicted saliency map P _est and the manually labeled semantic segmentation map P _GT ;

7) Test the test data set, generate a saliency map P _test , and use the evaluation index to perform performance evaluation.

2. a kind of cross-modal RGB-D semantic segmentation method based on shape perception according to claim 1, is characterized in that: described step 2) concrete method is:

2.1) The NYU-Depth-V2 (NYUDv2-13 and-40) dataset is used as the training set, and the SUN RGB-D dataset is used as the test set.

2.2) RGB-D image data set, each data marked scene category (scene category), two-dimensional segmentation (2Dsegmentation), three-dimensional room layout (3D room layout), three-dimensional object frame (3D object box), three-dimensional object direction ( 3D object prientation).

3. a kind of cross-modal RGB-D semantic segmentation method based on shape perception according to claim 1, is characterized in that: described step 3) concrete method is:

3.1) Use the encoder-decoder architecture as the basic architecture of the model of the present invention, for extracting RGB image features and the corresponding Depth image features, respectively

and />

3.2) The present invention adopts NYU-Depth-V2 data set pre-training to build a network model of dual encoder-decoder architecture.

4. a kind of cross-modal RGB-D semantic segmentation method based on shape perception according to claim 1, is characterized in that: described step 4) concrete method is:

4.1) The cross-modal feature fusion module integrates 5 levels of RGB image features by 5 levels of FCF modules

And the corresponding Depth image features />

Composition, update 5 levels of features/>

and

4.2) The input data of the FCF module at the i-th level is

and />

and />

5. a kind of cross-modal RGB-D semantic segmentation method based on shape perception according to claim 1, is characterized in that: described step 5) concrete method is:

5.1) The updated features

P _i =Conv(P, K _i ) formula (1)

D _i =Conv(R, K _i ) formula (2)

B _i =Conv(D _i +P _i , K ₁ ) formula (3)

Among them, i∈{1, 2, 3, 4, 5} represents the level of the feature, Conv() represents the convolution operation performed, K _i is the convolution kernel of each level, D _i is the extraction of RGB features As a result, P _i is the extraction of pixel information, and let K ₁ be a 1×1 convolution kernel. R _i is the final RGB image feature

5.2) Input the RGB image features generated by the above steps and the depth feature information in the modality perception module into the feature cross fusion module to fuse the multimodal features of different receptive fields.

6) Input the updated fifth-level RGB image features and depth image features obtained in step 4 into the decoder of DeepLabV3+, and upsample the output of the encoder by 4 times to make its resolution consistent with the low-level features. After the feature layers are connected, a 3×3 convolution (refinement) is performed to obtain the final fusion feature, which is activated by the sigmoid function to obtain the predicted semantic map P _est .

7) Calculate the loss function through the semantic map P _est predicted by the present invention and the semantic segmentation map _PGT manually marked, and gradually update the parameter weights of the model proposed by the present invention through the back propagation algorithm, and finally determine the RGB-D semantics The structure and parameter weights of the segmentation algorithm.