CN112861911A - RGB-D semantic segmentation method based on depth feature selection fusion - Google Patents

RGB-D semantic segmentation method based on depth feature selection fusion

Info

Publication number
CN112861911A
Authority
CN
China
Prior art keywords
layer
image
rgb
semantic segmentation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110027615.3A
Other languages
Chinese (zh)
Other versions
CN112861911B (en)
Inventor
袁媛
苏月皎
姜志宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110027615.3A priority Critical patent/CN112861911B/en
Publication of CN112861911A publication Critical patent/CN112861911A/en
Application granted granted Critical
Publication of CN112861911B publication Critical patent/CN112861911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an RGB-D semantic segmentation method based on depth feature selection fusion, which adopts a dual-stream convolutional neural network as the semantic segmentation model. The RGB-D image and the corresponding label image are first preprocessed; the encoders then extract the per-layer feature maps of the visible light image and the depth image, and these feature maps are fused to obtain the fusion feature of each layer; the fusion features of the encoder are then selected with spatial attention and the selected result is upsampled, finally yielding the segmentation result. The method segments small objects and contours more accurately, is more robust to illumination changes and to objects with similar appearance, and achieves higher accuracy and mean intersection-over-union of the segmentation results.

Description

RGB-D semantic segmentation method based on depth feature selection fusion
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an RGB-D semantic segmentation method.
Background
Semantic segmentation refers to classifying an image at the pixel level according to semantic information. With the development of depth sensors, depth information is increasingly regarded, alongside the widely used visual information, as supplementary information that improves scene parsing performance. Depth information contains 3D geometric information that is insensitive to illumination changes and can distinguish objects with similar appearance. Therefore, depth cues can, to some extent, make up for the deficiencies of semantic segmentation that relies only on visual cues. RGB-D semantic segmentation is very important for many applications, such as autonomous driving, robot vision and indoor navigation.
With the development of deep learning, dual-stream networks have achieved excellent performance in RGB-D semantic segmentation. However, how to effectively fuse visible light and depth information into a unified representation remains a basic but difficult problem in RGB-D semantic segmentation. Hazirbas et al., in the document "C. Hazirbas, L. Ma, C. Domokos, and D. Cremers. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture. In Asian Conference on Computer Vision, 2016, pp. 213-228", directly add the depth feature map to the visible light feature map to fuse the information of both. Deng, in the document "L. Deng, M. Yang, T. Li, Y. He, and C. Wang. RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation. Eprint Arxiv, 2019", proposes a residual fusion block to achieve bottom-up interaction and fusion between the two modalities. Hu et al., in the document "X. Hu, K. Yang, L. Fei, and K. Wang. ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation. In IEEE International Conference on Image Processing, 2019, pp. 1440-1444", propose an attention complementary module that assigns different weights to the different modalities to achieve better integration. Lee et al., in the document "S. Lee, S. J. Park, and K. S. Hong. RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation. In IEEE International Conference on Computer Vision, 2017, pp. 4990-4999", use modality-specific residual connections to learn RGB and depth features and their combination in order to exploit complementary features. Although these methods provide structured models that integrate the two types of information, how to ensure that the network takes full advantage of the information in both modalities for fine semantic segmentation remains an open question.
In addition, most current RGB-D semantic segmentation methods are based on an encoder-decoder architecture: the successive downsampling in the encoder causes part of the detail information to be lost, and reducing the resolution of the feature map to a very small size through cascaded pooling layers is not conducive to producing accurate segmentation results. To further exploit the spatial information lost in the encoding stage, Ronneberger et al. propose skip connections in the document "O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234-241" to reuse the information lost in the encoder to assist the upsampling learning and finally obtain a fine segmentation result. Although this enables the reuse of some lost features, it lacks pertinence and does not explicitly model the recovery of important detail information.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an RGB-D semantic segmentation method based on depth feature selection fusion, which adopts a dual-stream convolutional neural network as the semantic segmentation model. The RGB-D image and the corresponding label image are first preprocessed; the encoders then extract the per-layer feature maps of the visible light image and the depth image, and these feature maps are fused to obtain the fusion feature of each layer; the fusion features of the encoder are then selected with spatial attention and the selected result is upsampled, finally yielding the segmentation result. The method segments small objects and contours more accurately, is more robust to illumination changes and to objects with similar appearance, and achieves higher accuracy and mean intersection-over-union of the segmentation results.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: constructing a semantic segmentation model:
a dual-stream convolutional neural network is adopted as the semantic segmentation model, comprising two encoders and one decoder;
step 2: fusing RGB-D information;
step 2-1: preprocessing an image;
taking the RGB-D images and the corresponding label images from the public training set as the training sample set, and uniformly resizing the images in the training sample set to A × A; each RGB-D image consists of a visible light image and a depth image;
encoding the single-channel depth image into a three-channel HHA image (horizontal disparity, height above ground, and angle with gravity);
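To make step 2-1 concrete, the following is a minimal preprocessing sketch in PyTorch (the framework used in the embodiment below). The interpolation modes and the assumption that the HHA image has already been computed from the depth map (HHA encoding needs camera intrinsics and is not described here) are illustrative choices, not requirements of the patent.

```python
import torch
import torch.nn.functional as F

A = 480  # preferred input size A x A

def preprocess(rgb, hha, label, size=A):
    """Resize one training sample to size x size.
    rgb, hha: float tensors (3, H, W); label: long tensor (H, W).
    The HHA image is assumed to be precomputed from the single-channel depth image."""
    rgb = F.interpolate(rgb.unsqueeze(0), size=(size, size),
                        mode='bilinear', align_corners=False).squeeze(0)
    hha = F.interpolate(hha.unsqueeze(0), size=(size, size),
                        mode='bilinear', align_corners=False).squeeze(0)
    # Labels are categorical, so nearest-neighbour interpolation avoids mixing classes.
    label = F.interpolate(label[None, None].float(), size=(size, size),
                          mode='nearest').squeeze().long()
    return rgb, hha, label
```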
step 2-2: respectively inputting the visible light image and the HHA image into the two encoders for feature extraction to obtain two feature sequences, each containing m layers of feature maps, expressed as formula (1):

$\{F_j^{rgb},\,F_j^{d}\}_{j=1}^{m},\qquad F_j^{rgb},\,F_j^{d}\in\mathbb{R}^{H_j\times W_j\times C_j}$  (1)

wherein F_j^rgb and F_j^d are the j-th layer feature maps of the visible light image and the depth image respectively, and H, W and C respectively represent the height, width and number of channels of the feature maps;
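The patent does not name a backbone for the two encoders; the sketch below uses a torchvision ResNet-34 as a stand-in merely to show how each stream yields the m = 4 per-layer feature maps of formula (1).

```python
import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """One encoder stream returning m = 4 feature maps (formula (1)).
    The ResNet-34 backbone is an assumption; the patent does not specify one."""
    def __init__(self):
        super().__init__()
        r = models.resnet34(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)       # F_1 ... F_m, with progressively smaller H and W
        return feats

rgb_encoder, hha_encoder = Encoder(), Encoder()
F_rgb = rgb_encoder(torch.randn(1, 3, 480, 480))   # [F_j^rgb], j = 1..4
F_d = hha_encoder(torch.randn(1, 3, 480, 480))     # [F_j^d],   j = 1..4
```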
step 2-3: will be provided with
Figure BDA0002890914190000031
And
Figure BDA0002890914190000032
obtaining selected characteristic graphs through convolution of 1 x 1 respectively, wherein the selected characteristic graphs are expressed as a formula (2):
Figure BDA0002890914190000033
wherein
Figure BDA0002890914190000034
And
Figure BDA0002890914190000035
selected feature maps for the j-th layer of the visible image and the depth image respectively,
Figure BDA0002890914190000036
and
Figure BDA0002890914190000037
respectively representing the process of selection through convolution of 1 x 1;
step 2-4: calculating the fusion feature of the j-th layer of the two feature sequences by formula (3):

$F_j^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_j^{rgb},\,\hat{F}_j^{d},\,f_{down}(F_{j-1}^{fuse})\right)\right)$  (3)

wherein F_j^fuse and F_{j-1}^fuse respectively represent the fusion features of the j-th and (j−1)-th layers of the two feature sequences, f_conv(·) represents convolution, f_down(·) represents downsampling, and f_concat(·) represents concatenation; the fusion feature of layer 1 of the two feature sequences, F_1^fuse, is:

$F_1^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_1^{rgb},\,\hat{F}_1^{d}\right)\right)$  (4)
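A sketch of the selection and fusion of steps 2-3 and 2-4, following formulas (2)-(4) as reconstructed above; apart from the 1 × 1 selection convolutions stated in the patent, the 3 × 3 fusion convolution, the channel widths and the pooling used for f_down are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectFuse(nn.Module):
    """Fuses the j-th RGB and depth feature maps (formulas (2)-(4))."""
    def __init__(self, c_rgb, c_d, c_prev, c_out):
        super().__init__()
        self.sel_rgb = nn.Conv2d(c_rgb, c_out, kernel_size=1)  # f_sel^rgb: 1x1 conv
        self.sel_d = nn.Conv2d(c_d, c_out, kernel_size=1)      # f_sel^d:   1x1 conv
        self.fuse = nn.Sequential(                              # f_conv
            nn.Conv2d(2 * c_out + c_prev, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, f_rgb, f_d, f_prev=None):
        sel = [self.sel_rgb(f_rgb), self.sel_d(f_d)]            # formula (2)
        if f_prev is not None:
            # f_down: pool the previous fusion feature to the current resolution
            sel.append(F.adaptive_avg_pool2d(f_prev, f_rgb.shape[-2:]))
        return self.fuse(torch.cat(sel, dim=1))                 # f_concat then f_conv
```

Layer by layer this gives F_j^fuse = SelectFuse(...)(F_rgb[j], F_d[j], F_fuse[j-1]); for the first layer f_prev is omitted and c_prev is set to 0, as in formula (4).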
step 3: reusing the detail information;
step 3-1: inputting the fusion feature of the m-th layer of the two feature sequences into the decoder as the first layer of the decoder; up-sampling the fusion feature of the m-th layer of the two feature sequences to obtain the second layer of the decoder;
step 3-2: selecting the fusion features of the 2nd to (m−1)-th layers of the two feature sequences with spatial attention, re-fusing the selected result with the corresponding decoder layer according to formula (5), and up-sampling the re-fused result to obtain the third to m-th layers of the decoder:

$R_{m-i+1}=f_{concat}\left(f_{att}(F_i^{fuse}),\,D_{m-i+1}\right)$  (5)
$D_{m-i+2}=f_{up}(R_{m-i+1})$

wherein R_{m−i+1} represents the re-fused result at the (m−i+1)-th layer, F_i^fuse represents the fusion feature of the i-th layer of the two feature sequences, D_{m−i+1} represents the (m−i+1)-th layer of the decoder, D_{m−i+2} represents the (m−i+2)-th layer of the decoder, f_att(·) represents the spatial-attention selection, and f_up(·) represents up-sampling;
when i = 2, the m-th layer of the decoder D_m is obtained as the segmentation result, whose image size is A × A;
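A sketch of step 3 under the reconstruction of formula (5) above. The published text does not give the exact form of the spatial attention f_att; a single-channel convolutional attention map passed through a sigmoid and multiplied onto the fusion feature is used here as a common stand-in, and the upsampling factor and convolution sizes are likewise assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelect(nn.Module):
    """f_att: spatial-attention selection of an encoder fusion feature (assumed form)."""
    def __init__(self, channels):
        super().__init__()
        self.att = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, f_fuse):
        return f_fuse * torch.sigmoid(self.att(f_fuse))  # re-weight each spatial position

class DecoderStep(nn.Module):
    """One decoder step: re-fuse (formula (5)) and then upsample (f_up)."""
    def __init__(self, c_fuse, c_dec, c_out):
        super().__init__()
        self.select = SpatialSelect(c_fuse)
        self.refuse = nn.Conv2d(c_fuse + c_dec, c_out, kernel_size=3, padding=1)

    def forward(self, f_fuse_i, d_prev):
        # R_{m-i+1}: concatenate the attention-selected encoder feature with the decoder layer
        r = self.refuse(torch.cat([self.select(f_fuse_i), d_prev], dim=1))
        # D_{m-i+2} = f_up(R_{m-i+1})
        return F.interpolate(r, scale_factor=2, mode='bilinear', align_corners=False)
```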
step 4: training the semantic segmentation model;
training the semantic segmentation model with the training sample set: inputting the RGB-D images of the training sample set into the semantic segmentation model and segmenting them through steps 2 and 3;
sequentially down-sampling the label images of the training sample set m−2 times, the image obtained at each down-sampling having the same size as the feature maps of the (m−1)-th to 2nd layers of the feature sequences;
respectively taking the label images of the training sample set and the m−2 sequentially down-sampled images as the supervision information of the m-th to 2nd layers of the semantic segmentation model, to train the semantic segmentation model;
the objective function of the training is:

$L=\sum_i \lambda_i\,l_i,\qquad l_i=-\,weight[class]\,\log\frac{\exp(x[class])}{\sum_j \exp(x[j])}$

wherein l_i and λ_i respectively represent the weighted cross-entropy loss function of the i-th layer and its weight, class is the class index, weight[class] is the frequency with which the pixels of each class appear in the training set, and x[·] is the predicted segmentation result;
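A sketch of the training objective in step 4 under the reconstruction above: one weighted cross-entropy term per supervised decoder layer, combined with the layer weights λ_i. The class weights follow the stated weight[class]; the names, shapes and the mapping of the preferred λ values onto layers are illustrative.

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(predictions, labels, class_freq, lambdas):
    """predictions: list of logits (N, num_classes, H_i, W_i) for the supervised layers,
    from layer m down to layer 2; labels: matching list of (N, H_i, W_i) long tensors
    (the label image and its m-2 downsampled versions); class_freq: weight[class];
    lambdas: the per-layer weights lambda_i."""
    total = 0.0
    for pred, lab, lam in zip(predictions, labels, lambdas):
        total = total + lam * F.cross_entropy(pred, lab, weight=class_freq)
    return total

# With the preferred m = 4 there are three supervised layers; the patent gives
# lambda_1 = 0.1, lambda_2 = 0.1, lambda_3 = 0.2 (their assignment to layers is assumed).
```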
step 5: inputting the RGB-D image to be segmented into the trained semantic segmentation model to obtain the image segmentation result.
Preferably, A = 480, m = 4, λ_1 = 0.1, λ_2 = 0.1, λ_3 = 0.2.
Preferably, in the step 4, when the label images of the training sample set are sequentially subjected to m-2 times of downsampling, nearest neighbor interpolation downsampling is adopted.
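As the preference above states, the label images are downsampled with nearest-neighbour interpolation; a small sketch of building such a label pyramid follows (the target sizes are illustrative).

```python
import torch
import torch.nn.functional as F

def label_pyramid(label, sizes):
    """Nearest-neighbour downsampling of an integer label map of shape (N, H, W)
    to each spatial size in `sizes` (matching the layer m-1 .. 2 feature maps)."""
    pyramid = []
    for h, w in sizes:
        small = F.interpolate(label[:, None].float(), size=(h, w), mode='nearest')
        pyramid.append(small[:, 0].long())   # back to integer class indices
    return pyramid

# e.g. pyr = label_pyramid(labels, [(240, 240), (120, 120)])  # sizes are an assumption
```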
The invention has the following beneficial effects:
1. More accurate segmentation of small objects and contours. The proposed spatial attention mechanism attends to and reuses the features that are easily lost during downsampling, so that these features contribute to recovering the segmentation mask, which is particularly helpful for segmenting small objects and contours.
2. Stronger robustness to illumination changes and to objects with similar appearance. The method takes the two modalities, visible light image and depth image, as input and fuses them with an information fusion module into a unified, highly discriminative representation. This unified representation compensates for the deficiency of visible-light-only semantic segmentation: it reduces the influence of illumination changes on model performance and enables objects with similar appearance in a scene to be separated.
3. Higher accuracy and mean intersection-over-union of the segmentation results. The method helps existing model architectures perform better scene parsing.
4. Stronger generalization ability. The proposed RGB-D information fusion module can be generalized to other multi-modal information fusion tasks, such as the fusion of visible light and infrared information.
5. Greater practical and industrial value. The invention is applicable to vehicle driver-assistance systems and indoor robot navigation, and therefore has high practical value.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a graph of semantic segmentation results generated by the method of the present invention and the comparison method.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The method mainly addresses two problems in RGB-D semantic segmentation: multi-modal feature fusion is not thorough enough, and important features are lost during downsampling. It aims at the following issues:
1. Existing algorithms do not explicitly model the fusion of visible light and depth information.
2. Existing algorithms lose some important features during downsampling, which in turn degrades the segmentation of small objects and contours.
As shown in FIG. 1, the RGB-D semantic segmentation method based on depth feature selection fusion comprises the following steps:
step 1: constructing a semantic segmentation model:
a dual-stream convolutional neural network is adopted as the semantic segmentation model, comprising two encoders and one decoder;
step 2: fusing RGB-D information;
step 2-1: preprocessing an image;
taking the RGB-D images and the corresponding label images from the public training set as the training sample set, and uniformly resizing the images in the training sample set to A × A; each RGB-D image consists of a visible light image and a depth image;
encoding the single-channel depth image into a three-channel HHA image (horizontal disparity, height above ground, and angle with gravity);
step 2-2: respectively inputting the visible light image and the HHA image into the two encoders for feature extraction to obtain two feature sequences, each containing m layers of feature maps, expressed as formula (1):

$\{F_j^{rgb},\,F_j^{d}\}_{j=1}^{m},\qquad F_j^{rgb},\,F_j^{d}\in\mathbb{R}^{H_j\times W_j\times C_j}$  (1)

wherein F_j^rgb and F_j^d are the j-th layer feature maps of the visible light image and the depth image respectively, and H, W and C respectively represent the height, width and number of channels of the feature maps;
step 2-3: will be provided with
Figure BDA0002890914190000054
And
Figure BDA0002890914190000055
obtaining selected characteristic graphs through convolution of 1 x 1 respectively, wherein the selected characteristic graphs are expressed as a formula (2):
Figure BDA0002890914190000056
wherein
Figure BDA0002890914190000057
And
Figure BDA0002890914190000058
selected feature maps for the j-th layer of the visible image and the depth image respectively,
Figure BDA0002890914190000059
and
Figure BDA00028909141900000510
respectively representing the process of selection through convolution of 1 x 1;
step 2-4: calculating the fusion feature of the j-th layer of the two feature sequences by formula (3):

$F_j^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_j^{rgb},\,\hat{F}_j^{d},\,f_{down}(F_{j-1}^{fuse})\right)\right)$  (3)

wherein F_j^fuse and F_{j-1}^fuse respectively represent the fusion features of the j-th and (j−1)-th layers of the two feature sequences, f_conv(·) represents convolution, f_down(·) represents downsampling, and f_concat(·) represents concatenation; the fusion feature of layer 1 of the two feature sequences, F_1^fuse, is:

$F_1^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_1^{rgb},\,\hat{F}_1^{d}\right)\right)$  (4)
step 3: reusing the detail information;
step 3-1: inputting the fusion feature of the m-th layer of the two feature sequences into the decoder as the first layer of the decoder; up-sampling the fusion feature of the m-th layer of the two feature sequences to obtain the second layer of the decoder;
step 3-2: selecting the fusion features of the 2nd to (m−1)-th layers of the two feature sequences with spatial attention, re-fusing the selected result with the corresponding decoder layer according to formula (5), and up-sampling the re-fused result to obtain the third to m-th layers of the decoder:

$R_{m-i+1}=f_{concat}\left(f_{att}(F_i^{fuse}),\,D_{m-i+1}\right)$  (5)
$D_{m-i+2}=f_{up}(R_{m-i+1})$

wherein R_{m−i+1} represents the re-fused result at the (m−i+1)-th layer, F_i^fuse represents the fusion feature of the i-th layer of the two feature sequences, D_{m−i+1} represents the (m−i+1)-th layer of the decoder, D_{m−i+2} represents the (m−i+2)-th layer of the decoder, f_att(·) represents the spatial-attention selection, and f_up(·) represents up-sampling;
when i = 2, the m-th layer of the decoder D_m is obtained as the segmentation result, whose image size is A × A;
step 4: training the semantic segmentation model;
training the semantic segmentation model with the training sample set: inputting the RGB-D images of the training sample set into the semantic segmentation model and segmenting them through steps 2 and 3;
sequentially down-sampling the label images of the training sample set m−2 times, the image obtained at each down-sampling having the same size as the feature maps of the (m−1)-th to 2nd layers of the feature sequences;
respectively taking the label images of the training sample set and the m−2 sequentially down-sampled images as the supervision information of the m-th to 2nd layers of the semantic segmentation model, to train the semantic segmentation model;
the objective function of the training is:

$L=\sum_i \lambda_i\,l_i,\qquad l_i=-\,weight[class]\,\log\frac{\exp(x[class])}{\sum_j \exp(x[j])}$

wherein l_i and λ_i respectively represent the weighted cross-entropy loss function of the i-th layer and its weight, class is the class index, weight[class] is the frequency with which the pixels of each class appear in the training set, and x[·] is the predicted segmentation result;
step 5: inputting the RGB-D image to be segmented into the trained semantic segmentation model to obtain the image segmentation result.
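To illustrate step 5, a minimal inference sketch; the checkpoint name, the dual-input call signature and the preprocessing are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn.functional as F

def segment(model, rgb, hha, size=480):
    """Run the trained semantic segmentation model on one RGB-D sample.
    rgb, hha: float tensors (3, H, W); returns a (size, size) map of class indices."""
    model.eval()
    rgb = F.interpolate(rgb[None], size=(size, size), mode='bilinear', align_corners=False)
    hha = F.interpolate(hha[None], size=(size, size), mode='bilinear', align_corners=False)
    with torch.no_grad():
        logits = model(rgb, hha)       # (1, num_classes, size, size)
    return logits.argmax(dim=1)[0]     # per-pixel class indices

# model = torch.load('fsfnet.pth')     # hypothetical checkpoint of the trained model
# prediction = segment(model, rgb, hha)
```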
The specific embodiment is as follows:
In this embodiment, the simulation is performed with PyTorch on a Linux operating system with an Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz and 60 GB of memory. The data used in the simulation are public datasets.
The data used in the simulation come from the NYUDv2 and SUN RGB-D datasets. NYUDv2 contains 1449 densely labeled RGB-D image pairs captured by a Microsoft Kinect, with 795 pairs for training and 654 pairs for testing. SUN RGB-D is currently the largest RGB-D semantic segmentation dataset, with 10,335 densely annotated RGB-D images taken from 20 different scenes. It was captured by four different sensors: Kinect V1, Kinect V2, Xtion and RealSense. The officially divided training set includes 5285 pairs of RGB-D images and labels, and the remaining 5050 pairs are used for testing. The number of categories in both datasets is 40.
To demonstrate the effectiveness of the method, 3DGNN, RedNet, CFN, ACNet, PAP and SA-Gate were selected as comparison algorithms on both datasets. 3DGNN is the method proposed in the document "X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun. 3D Graph Neural Networks for RGBD Semantic Segmentation. In IEEE International Conference on Computer Vision, 2017, pp. 5209-5218"; RedNet is proposed in the document "J. Jiang, L. Zheng, F. Luo, and Z. Zhang. RedNet: Residual Encoder-Decoder Network for Indoor RGB-D Semantic Segmentation. Eprint Arxiv, 2018"; CFN is described in the document "D. Lin, G. Chen, D. Cohen-Or, P. A. Heng, and H. Huang. Cascaded Feature Network for Semantic Segmentation of RGB-D Images. In International Conference on Computer Vision, 2017, pp. 1320-1328"; PAP is proposed in the document "Z. Zhang, Z. Cui, C. Xu, Y. Yan, N. Sebe, and J. Yang. Pattern-Affinitive Propagation across Depth, Surface Normal and Semantic Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4106-4115"; SA-Gate is proposed in the document "X. Chen, K. Y. Lin, J. Wang, W. Wu, C. Qian, H. Li, and G. Zeng. Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation. In European Conference on Computer Vision, 2020". FSFNet is the method proposed in the invention; mIoU (mean intersection-over-union) and Pixel Acc. (pixel accuracy) are the evaluation indexes of RGB-D semantic segmentation quality. The comparison results are shown in Table 1:
TABLE 1
[Table 1 compares the mIoU and Pixel Acc. of 3DGNN, RedNet, CFN, ACNet, PAP, SA-Gate and FSFNet on the NYUDv2 and SUN RGB-D datasets; the numerical values appear only as an image in the original publication.]
As can be seen from Table 1, on the NYUDv2 dataset the invention is comparable to the current best algorithm in Pixel Acc., and on the SUN RGB-D dataset it outperforms the other algorithms in the mIoU index.
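For reference, the two evaluation indexes used here, Pixel Acc. and mIoU, are conventionally computed from a confusion matrix as in the sketch below; the ignore index and the 40-class setting follow common practice for these datasets rather than the patent text.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes=40, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix from flat index arrays."""
    mask = label != ignore_index
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_acc_and_miou(conf):
    pixel_acc = np.diag(conf).sum() / conf.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        iou = np.diag(conf) / (conf.sum(0) + conf.sum(1) - np.diag(conf))
    return pixel_acc, np.nanmean(iou)   # mIoU averages the per-class IoU
```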
FIG. 2 shows the semantic segmentation results generated by the invention and the comparison algorithms. As seen from the figure, compared with the comparison algorithms, the invention segments objects of different classes, such as ceilings and tables, better, which demonstrates that it effectively combines the features of the RGB and depth information. In addition, the invention can distinguish small objects and produce more accurate contour segmentation, which demonstrates that it makes better use of the detail information lost during downsampling.

Claims (3)

1. A depth feature selection fusion-based RGB-D semantic segmentation method is characterized by comprising the following steps:
step 1: constructing a semantic segmentation model:
a dual-stream convolutional neural network is adopted as the semantic segmentation model, comprising two encoders and one decoder;
step 2: fusing RGB-D information;
step 2-1: preprocessing an image;
taking the RGB-D images and the corresponding label images from the public training set as the training sample set, and uniformly resizing the images in the training sample set to A × A; each RGB-D image consists of a visible light image and a depth image;
encoding the single-channel depth image into a three-channel HHA image;
step 2-2: respectively inputting the visible light image and the HHA image into the two encoders for feature extraction to obtain two feature sequences, each containing m layers of feature maps, expressed as formula (1):

$\{F_j^{rgb},\,F_j^{d}\}_{j=1}^{m},\qquad F_j^{rgb},\,F_j^{d}\in\mathbb{R}^{H_j\times W_j\times C_j}$  (1)

wherein F_j^rgb and F_j^d are the j-th layer feature maps of the visible light image and the depth image respectively, and H, W and C respectively represent the height, width and number of channels of the feature maps;
step 2-3: will be provided with
Figure FDA0002890914180000014
And
Figure FDA0002890914180000015
respectively convolving by 1 x 1 to obtain selected characteristic diagram represented by formula (2)):
Figure FDA0002890914180000016
Wherein
Figure FDA0002890914180000017
And
Figure FDA0002890914180000018
selected feature maps for the j-th layer of the visible image and the depth image respectively,
Figure FDA0002890914180000019
and
Figure FDA00028909141800000110
respectively representing the process of selection through convolution of 1 x 1;
step 2-4: calculating the fusion feature of the j-th layer of the two feature sequences by formula (3):

$F_j^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_j^{rgb},\,\hat{F}_j^{d},\,f_{down}(F_{j-1}^{fuse})\right)\right)$  (3)

wherein F_j^fuse and F_{j-1}^fuse respectively represent the fusion features of the j-th and (j−1)-th layers of the two feature sequences, f_conv(·) represents convolution, f_down(·) represents downsampling, and f_concat(·) represents concatenation; the fusion feature of layer 1 of the two feature sequences, F_1^fuse, is:

$F_1^{fuse}=f_{conv}\left(f_{concat}\left(\hat{F}_1^{rgb},\,\hat{F}_1^{d}\right)\right)$  (4)
step 3: reusing the detail information;
step 3-1: inputting the fusion feature of the m-th layer of the two feature sequences into the decoder as the first layer of the decoder; up-sampling the fusion feature of the m-th layer of the two feature sequences to obtain the second layer of the decoder;
step 3-2: selecting the fusion features of the 2nd to (m−1)-th layers of the two feature sequences with spatial attention, re-fusing the selected result with the corresponding decoder layer according to formula (5), and up-sampling the re-fused result to obtain the third to m-th layers of the decoder:

$R_{m-i+1}=f_{concat}\left(f_{att}(F_i^{fuse}),\,D_{m-i+1}\right)$  (5)
$D_{m-i+2}=f_{up}(R_{m-i+1})$

wherein R_{m−i+1} represents the re-fused result at the (m−i+1)-th layer, F_i^fuse represents the fusion feature of the i-th layer of the two feature sequences, D_{m−i+1} represents the (m−i+1)-th layer of the decoder, D_{m−i+2} represents the (m−i+2)-th layer of the decoder, f_att(·) represents the spatial-attention selection, and f_up(·) represents up-sampling;
when i = 2, the m-th layer of the decoder D_m is obtained as the segmentation result, whose image size is A × A;
step 4: training the semantic segmentation model;
training the semantic segmentation model with the training sample set: inputting the RGB-D images of the training sample set into the semantic segmentation model and segmenting them through steps 2 and 3;
sequentially down-sampling the label images of the training sample set m−2 times, the image obtained at each down-sampling having the same size as the feature maps of the (m−1)-th to 2nd layers of the feature sequences;
respectively taking the label images of the training sample set and the m−2 sequentially down-sampled images as the supervision information of the m-th to 2nd layers of the semantic segmentation model, to train the semantic segmentation model;
the objective function of the training is:

$L=\sum_i \lambda_i\,l_i,\qquad l_i=-\,weight[class]\,\log\frac{\exp(x[class])}{\sum_j \exp(x[j])}$

wherein l_i and λ_i respectively represent the weighted cross-entropy loss function of the i-th layer and its weight, class is the class index, weight[class] is the frequency with which the pixels of each class appear in the training set, and x[·] is the predicted segmentation result;
step 5: inputting the RGB-D image to be segmented into the trained semantic segmentation model to obtain the image segmentation result.
2. The RGB-D semantic segmentation method based on depth feature selection fusion as claimed in claim 1, wherein A = 480, m = 4, λ_1 = 0.1, λ_2 = 0.1, λ_3 = 0.2.
3. The RGB-D semantic segmentation method based on depth feature selection fusion as claimed in claim 1, wherein in step 4, nearest neighbor interpolation downsampling is adopted when downsampling is performed m-2 times on the label images of the training sample set in sequence.
CN202110027615.3A 2021-01-10 2021-01-10 RGB-D semantic segmentation method based on depth feature selection fusion Active CN112861911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110027615.3A CN112861911B (en) 2021-01-10 2021-01-10 RGB-D semantic segmentation method based on depth feature selection fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110027615.3A CN112861911B (en) 2021-01-10 2021-01-10 RGB-D semantic segmentation method based on depth feature selection fusion

Publications (2)

Publication Number Publication Date
CN112861911A true CN112861911A (en) 2021-05-28
CN112861911B CN112861911B (en) 2024-05-28

Family

ID=76002060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110027615.3A Active CN112861911B (en) 2021-01-10 2021-01-10 RGB-D semantic segmentation method based on depth feature selection fusion

Country Status (1)

Country Link
CN (1) CN112861911B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762263A (en) * 2021-08-17 2021-12-07 慧影医疗科技(北京)有限公司 Semantic segmentation method and system for small-scale similar structure
CN113920317A (en) * 2021-11-15 2022-01-11 西北工业大学 Semantic segmentation method based on visible light image and low-resolution depth image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271990A (en) * 2018-09-03 2019-01-25 北京邮电大学 A kind of semantic segmentation method and device for RGB-D image
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
CN110490884A (en) * 2019-08-23 2019-11-22 北京工业大学 A kind of lightweight network semantic segmentation method based on confrontation
US20200134375A1 (en) * 2017-08-01 2020-04-30 Beijing Sensetime Technology Development Co., Ltd. Semantic segmentation model training methods and apparatuses, electronic devices, and storage media
US20200402300A1 (en) * 2019-06-21 2020-12-24 Harbin Institute Of Technology Terrain modeling method that fuses geometric characteristics and mechanical charateristics, computer readable storage medium, and terrain modeling system thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134375A1 (en) * 2017-08-01 2020-04-30 Beijing Sensetime Technology Development Co., Ltd. Semantic segmentation model training methods and apparatuses, electronic devices, and storage media
CN109271990A (en) * 2018-09-03 2019-01-25 北京邮电大学 A kind of semantic segmentation method and device for RGB-D image
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
US20200402300A1 (en) * 2019-06-21 2020-12-24 Harbin Institute Of Technology Terrain modeling method that fuses geometric characteristics and mechanical charateristics, computer readable storage medium, and terrain modeling system thereof
CN110490884A (en) * 2019-08-23 2019-11-22 北京工业大学 A kind of lightweight network semantic segmentation method based on confrontation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Qian Zhengfang et al.: "A Brief Analysis of the Application of Deep Learning in Future Unmanned Surface Vehicle Platforms", Shipbuilding of China (《中国造船》), 30 August 2020 (2020-08-30) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762263A (en) * 2021-08-17 2021-12-07 慧影医疗科技(北京)有限公司 Semantic segmentation method and system for small-scale similar structure
CN113920317A (en) * 2021-11-15 2022-01-11 西北工业大学 Semantic segmentation method based on visible light image and low-resolution depth image
CN113920317B (en) * 2021-11-15 2024-02-27 西北工业大学 Semantic segmentation method based on visible light image and low-resolution depth image

Also Published As

Publication number Publication date
CN112861911B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
Zhou et al. Salient object detection in stereoscopic 3D images using a deep convolutional residual autoencoder
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN110751111B (en) Road extraction method and system based on high-order spatial information global automatic perception
CN111563909A (en) Semantic segmentation method for complex street view image
CN113657388A (en) Image semantic segmentation method fusing image super-resolution reconstruction
CN114820579A (en) Semantic segmentation based image composite defect detection method and system
CN112861911A (en) RGB-D semantic segmentation method based on depth feature selection fusion
CN111488884A (en) Real-time semantic segmentation method with low calculation amount and high feature fusion
CN115631513B (en) Transformer-based multi-scale pedestrian re-identification method
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
WO2024040973A1 (en) Multi-scale fused dehazing method based on stacked hourglass network
CN115527096A (en) Small target detection method based on improved YOLOv5
CN114913493A (en) Lane line detection method based on deep learning
CN111860116A (en) Scene identification method based on deep learning and privilege information
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN114743027B (en) Weak supervision learning-guided cooperative significance detection method
CN115905838A (en) Audio-visual auxiliary fine-grained tactile signal reconstruction method
Sun et al. TSINIT: a two-stage Inpainting network for incomplete text
CN117079237A (en) Self-supervision monocular vehicle distance detection method
CN113920317B (en) Semantic segmentation method based on visible light image and low-resolution depth image
CN112733934B (en) Multi-mode feature fusion road scene semantic segmentation method in complex environment
CN114220098A (en) Improved multi-scale full-convolution network semantic segmentation method
Gao et al. RGBD semantic segmentation based on global convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant