CN112115951A - RGB-D image semantic segmentation method based on spatial relationship - Google Patents
- Publication number
- CN112115951A (application CN202011301588.6A)
- Authority
- CN
- China
- Prior art keywords
- rgb
- semantic segmentation
- module
- feature
- spatial relationship
- Prior art date
- 2020-11-19
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses an RGB-D image semantic segmentation method based on spatial relationship, which constructs a semantic segmentation network with Deeplab-v3 as the base model, comprising a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module. Semantic segmentation is performed on RGB-D images of indoor scenes: RGB and depth information are effectively fused by a deep learning network, and a spatial relationship similarity is introduced into the backbone network. On top of the parallel network design, the fusion of depth and RGB information is further improved by computing regional feature values and a similarity measure between the two modalities. The method depends only on sensor equipment that provides RGB and depth data, and is simple and convenient to apply to Kinect, Xtion and other motion-sensing devices.
Description
Technical Field
The invention belongs to the field of computer image processing, and particularly relates to an RGB-D image semantic segmentation method based on a spatial relationship.
Background
Semantic segmentation is an important task in computer vision, widely applied in robotics, autonomous driving, security monitoring and other fields.
Compared with conventional RGB solutions, RGB-D sensors provide multi-modal information including color and depth. In scenes with indistinct color boundaries, weak texture features, or inconsistent object depths, depth information provides strong guidance for semantic segmentation. On this basis, semantic segmentation methods that exploit RGB-D information can achieve segmentation results superior to conventional methods.
Existing RGB-D fusion schemes fall mainly into three categories: 2D multi-modal semantic fusion, parallel network-structure design, and 3D point-cloud space mapping. The first two guide the fusion of depth and RGB information through hand-crafted and network-extracted features respectively, and their fusion effect is limited; 3D point-cloud space mapping incurs a large computational overhead.
Disclosure of Invention
The invention aims to provide an RGB-D image semantic segmentation method based on spatial relationship that addresses the above deficiencies of the prior art.
The purpose of the invention is realized by the following technical scheme: an RGB-D image semantic segmentation method based on spatial relationship comprises the following steps:
(1) constructing a semantic segmentation network with Deeplab-v3 as the base model, comprising a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module; the network takes an RGB-D image as input and outputs a semantic classification score map;
(2) training the semantic segmentation network constructed in step (1);
(3) inputting the RGB-D image to be tested into the semantic segmentation network trained in step (2), and taking the highest-scoring category in the output semantic classification score map as the category of each pixel to obtain the semantic segmentation image.
Further, the feature extraction module is: Resnet101 is used as the backbone network of the feature extraction module, and parallel RGB and depth branches with identical structure are constructed.
Further, in the training process of step (2), data enhancement is performed by random flipping, cropping and gamma transformation; ImageNet pre-training parameters are loaded into the backbone networks of the RGB and depth branches; and the model is trained with the back-propagation algorithm.
Further, the construction of the spatial relationship similarity loss module comprises the following sub-steps:
(a1) respectively extracting the output features of b sub-modules in the RGB and depth branch networks, and constructing groups of pairwise relations $f_i$:

$$f_i = \{f_{i,rgb},\ f_{i,dep}\}$$

where $b$ denotes the number of selected sub-modules, $f_{i,rgb}$ is the output feature of the $i$-th module of the RGB branch, and $f_{i,dep}$ is the output feature of the $i$-th module of the depth branch;

(a2) converting the RGB and depth features within each group $f_i$ into feature regions:

$$r_{i,rgb} = p(f_{i,rgb}),\qquad r_{i,dep} = p(f_{i,dep})$$

where the function $p(\cdot)$ denotes a global pooling operation that downsamples the original feature scale, and $r_{i,rgb}$, $r_{i,dep}$ are the feature regions corresponding to $f_{i,rgb}$, $f_{i,dep}$;

(a3) computing the autocorrelation spatial features of the paired feature regions:

$$d_{i,rgb} = D(r_{i,rgb}),\qquad d_{i,dep} = D(r_{i,dep}),\qquad D(r)_{m,n} = \mathrm{dst}(r^{(m)},\ r^{(n)})$$

where $d_{i,rgb}$, $d_{i,dep}$ are the autocorrelation spatial features corresponding to $r_{i,rgb}$, $r_{i,dep}$; $D(\cdot)$ denotes the autocorrelation spatial matrix; $r^{(m)}$, $r^{(n)}$ denote any two regions $m$, $n$ of a feature region; and the $\mathrm{dst}(x, y)$ function denotes the distance operation;

(a4) calculating the distance between the RGB and depth autocorrelation spatial features and generating the spatial relationship similarity loss:

$$L_{srs} = \frac{1}{b}\sum_{i=1}^{b} \mathrm{dst}\!\left(d_{i,rgb},\ d_{i,dep}\right)$$
Further, the dst(x, y) function is $\mathrm{dst}(x, y) = \cos(\mathrm{norm}(x), \mathrm{norm}(y))$, where $\mathrm{norm}$ denotes vector normalization.
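By way of illustration, a minimal PyTorch sketch of this distance function (the function name `dst` mirrors the patent's notation; PyTorch is the framework named later in the description):

```python
import torch
import torch.nn.functional as F

def dst(x, y):
    """dst(x, y) = cos(norm(x), norm(y)).

    cosine_similarity L2-normalises its inputs internally, so it computes
    exactly the cosine of the normalised vectors."""
    return F.cosine_similarity(x, y, dim=-1)

# e.g. dst(torch.randn(8, 256), torch.randn(8, 256)) -> 8 cosine values
```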
Further, the decoder module is configured to: fuse the final group of feature maps output by the RGB and depth branches, $f_{b,rgb}$ and $f_{b,dep}$, through a feature weighting module; the fused feature $f_{cat}$ is passed through a multi-scale atrous convolution module to generate a feature map, which is stacked with $f_{cat}$ along the channel dimension to finally obtain the semantic classification score map.
Further, the construction of the decoder module comprises the sub-steps of:
(b1) inputting $f_{b,rgb}$ and $f_{b,dep}$ respectively into a global average pooling layer, followed by two fully connected layers that compress and expand the channels at the same ratio and an activation function, outputting the weights $w_{rgb}$ and $w_{dep}$;
(b2) adding the weighted RGB and depth branch features to obtain the fused feature map $f_{cat} = w_{rgb} \cdot f_{b,rgb} + w_{dep} \cdot f_{b,dep}$;
(b3) inputting the fused feature map $f_{cat}$ obtained in step (b2) into a multi-scale atrous convolution module, passing in parallel through 4 atrous convolution layers of different scales and 1 mean-pooling layer, stacking the 5 outputs along the channel dimension, compressing them with a 1 × 1 convolution, and outputting $f_{aspp}$;
(b4) stacking $f_{cat}$ and $f_{aspp}$ along the channel dimension, inputting a 3 × 3 convolution layer and a 1 × 1 convolution layer, and finally outputting the semantic classification score map.
Further, the loss function module is: cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and stochastic gradient descent is used as the optimization method.
The invention has the beneficial effects that: the invention provides an image fusion method based on an RGB-D sensor, which performs semantic segmentation on RGB-D images of indoor scenes, effectively fuses RGB and depth information through a deep learning network, and introduces a spatial relationship similarity into the backbone network. On top of the parallel network design, the fusion of depth and RGB information is further improved by computing regional feature values and a similarity measure between the two modalities. The method depends only on sensor equipment that provides RGB and depth data, and is simple and convenient to apply to Kinect, Xtion and other motion-sensing devices.
Drawings
FIG. 1 is a diagram of the overall architecture of a network;
FIG. 2 is a block diagram of spatial relationship similarity loss;
FIG. 3 is a schematic diagram illustrating the effect of the present invention, where a is the RGB-D image of an indoor scene to be tested and b is the semantic classification score map.
Detailed Description
The invention relates to an RGB-D image semantic segmentation method based on spatial relationship, which, as shown in FIG. 1, comprises the following steps:
step one, constructing a semantic segmentation network:
the overall network architecture design is based on an open-source deep learning framework pytorch, and is transformed on the basis of the public Deeplab-v3 network architecture, so that three parts, namely a feature extraction module, a spatial relationship similarity loss module and a decoder module, are realized.
(1) Building feature extraction module
This module uses a Resnet101 backbone network as the basic framework of the feature extraction module, and two parallel branches, RGB and depth (Depth), are constructed synchronously.
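A minimal PyTorch sketch of such a dual-branch backbone, assuming torchvision's ResNet-101 and taking the four residual stages (layer1–layer4) as the four sub-modules tapped later; how the 1-channel depth map is fed in is not specified in the patent, so replicating it to 3 channels is an assumption:

```python
import torch.nn as nn
from torchvision.models import resnet101

class DualBranchBackbone(nn.Module):
    """Parallel RGB and depth branches with identical ResNet-101 structure;
    the four residual stages serve as the four sub-modules whose output
    features f_{i,rgb} and f_{i,dep} are tapped."""

    def __init__(self):
        super().__init__()
        # weights="IMAGENET1K_V1" would load the ImageNet pre-training
        # parameters mentioned in the patent.
        self.rgb = resnet101(weights=None)
        self.dep = resnet101(weights=None)

    @staticmethod
    def _stages(net, x):
        x = net.maxpool(net.relu(net.bn1(net.conv1(x))))
        feats = []
        for stage in (net.layer1, net.layer2, net.layer3, net.layer4):
            x = stage(x)
            feats.append(x)          # f_i for i = 1..4
        return feats

    def forward(self, rgb, depth):
        # Replicating depth to 3 channels is an assumption (see above).
        return (self._stages(self.rgb, rgb),
                self._stages(self.dep, depth.repeat(1, 3, 1, 1)))
```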
(2) Building spatial relationship similarity loss module
The RGB and depth branches have identical structure; the output features of four sub-modules in the two branch networks are extracted, and four groups of pairwise relations $f_i$ are constructed, denoted as:

$$f_i = \{f_{i,rgb},\ f_{i,dep}\}$$

where $i \in \{1, 2, 3, 4\}$, corresponding to the 4 groups of features; $f_{i,rgb} \in \mathbb{R}^{w \times h \times c}$ is the output feature of the $i$-th module of the RGB branch, $f_{i,dep} \in \mathbb{R}^{w \times h \times c}$ is the output feature of the $i$-th module of the depth branch, and $w$, $h$, $c$ denote the feature map dimensions.

For each group of pairwise relations $f_i$, the RGB and depth features within the group are converted into feature regions $r_{i,rgb}$, $r_{i,dep}$, denoted as:

$$r_{i,rgb} = p(f_{i,rgb}),\qquad r_{i,dep} = p(f_{i,dep})$$

where the function $p(x) = \mathrm{maxpooling}(x, 5)$ denotes a global maximum pooling operation that downsamples the original feature scale by a factor of 5; correspondingly, $r_{i,rgb}, r_{i,dep} \in \mathbb{R}^{w' \times h' \times c}$ with $h' = h/5$ and $w' = w/5$.

The autocorrelation spatial features $d_{i,rgb} = D(r_{i,rgb})$ and $d_{i,dep} = D(r_{i,dep})$ corresponding to the paired feature regions are then computed. The autocorrelation captures the distances between different regions in the same feature map and is expressed as:

$$D(r)_{m,n} = \mathrm{dst}(r^{(m)},\ r^{(n)})$$

where $d_{i,rgb}$, $d_{i,dep}$ are the autocorrelation spatial features of RGB and depth, and $D(\cdot)$ denotes the autocorrelation spatial matrix; $r^{(m)}$, $r^{(n)}$ denote any two regions $m$, $n$ of the feature region, a region being the set of elements across all channels in the third dimension at one point of the first two dimensions; the distance function $\mathrm{dst}$ is chosen as the cosine formula $\mathrm{dst}(x, y) = \cos(\mathrm{norm}(x), \mathrm{norm}(y))$, where the function $\mathrm{norm}(x)$ denotes vector normalization.

As shown in FIG. 2, the distance between each group of RGB and depth autocorrelation spatial features is calculated, generating the spatial relationship similarity loss:

$$L_{srs} = \frac{1}{b}\sum_{i=1}^{b} \mathrm{dst}\!\left(d_{i,rgb},\ d_{i,dep}\right)$$

where $b = 4$ denotes the 4 groups of paired feature maps output by the RGB and depth branches.
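A sketch of this loss in PyTorch under the definitions above; the L1 distance between the two autocorrelation matrices is an assumed concrete choice, since the text only specifies "distance":

```python
import torch
import torch.nn.functional as F

def feature_region(f):
    """p(x) = maxpooling(x, 5): max pooling that downsamples the feature
    map 5x, giving an (h/5) x (w/5) grid of region vectors."""
    return F.max_pool2d(f, kernel_size=5, stride=5)

def autocorrelation(r):
    """D(r): pairwise cosine between region vectors.
    r: (B, C, h', w') -> (B, N, N) with N = h' * w'; entry (m, n) is
    dst(r_m, r_n) = cos(norm(r_m), norm(r_n))."""
    v = F.normalize(r.flatten(2), dim=1)     # L2-normalise each region vector
    return torch.bmm(v.transpose(1, 2), v)

def spatial_relation_loss(f_rgb, f_dep):
    """Distance between the RGB and depth autocorrelation matrices,
    averaged over the b = 4 feature pairs."""
    total = 0.0
    for fr, fd in zip(f_rgb, f_dep):
        d_rgb = autocorrelation(feature_region(fr))
        d_dep = autocorrelation(feature_region(fd))
        # L1 distance between the relation matrices is an assumption;
        # the patent only states "distance".
        total = total + (d_rgb - d_dep).abs().mean()
    return total / len(f_rgb)
```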
(3) Building decoder modules
The final group of feature maps output by the RGB and depth branches, $f_{4,rgb}$ and $f_{4,dep}$, are input into a feature weighting module that completes the feature fusion; the fused feature $f_{cat}$ passes through a multi-scale atrous convolution (ASPP) module to generate a new feature map, which is stacked with $f_{cat}$ along the channel dimension, and the decoder module finally generates a semantic classification score map with 40 channels. The feature weighting module uses a channel compression/expansion ratio of 16 with sigmoid(x) as the activation function; the multi-scale atrous convolution (ASPP) module uses the dilation rates (1, 6, 12, 18).
(3.1) The output feature maps of the last module of the RGB and depth branches are fused by feature weighting followed by summation, as follows:
a) Feature weighting: $f_{4,rgb}$ and $f_{4,dep}$ are each input into a global average pooling layer, yielding two tensors of size B × C × 1 × 1 (B and C denote the training batch size and the number of feature map channels, respectively); these then pass through two fully connected layers that compress and expand the channels at the same ratio, and after the activation function the weights $w_{rgb}$ and $w_{dep}$ are output.
b) Feature summation: the weighted RGB and depth branch features are added, giving the fused feature map $f_{cat} = w_{rgb} \cdot f_{4,rgb} + w_{dep} \cdot f_{4,dep}$.
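A sketch of the feature weighting and summation; the 16× ratio and sigmoid come from the description, while the ReLU between the two fully connected layers is an assumed SE-style detail:

```python
import torch.nn as nn

class FeatureWeighting(nn.Module):
    """Channel weighting with the 16x compression/expansion ratio."""

    def __init__(self, channels, ratio=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // ratio),    # compress channels
            nn.ReLU(inplace=True),                     # assumed SE-style detail
            nn.Linear(channels // ratio, channels),    # expand channels
            nn.Sigmoid(),                              # sigmoid activation
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        return self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)

def fuse(f_rgb, f_dep, w_rgb, w_dep):
    """Feature summation: the weighted RGB and depth features are added."""
    return w_rgb(f_rgb) * f_rgb + w_dep(f_dep) * f_dep

# w_rgb = FeatureWeighting(2048); w_dep = FeatureWeighting(2048)
# f_cat = fuse(f4_rgb, f4_dep, w_rgb, w_dep)   # names are illustrative
```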
(3.2) The fused feature map $f_{cat}$ is input into the decoder network corresponding to Deeplab-v3, which finally outputs the semantic classification score map, as follows:
a) The feature map $f_{cat}$ is input into the multi-scale atrous convolution (ASPP) module, passing in parallel through 4 atrous convolution layers of different scales and 1 mean-pooling branch. The 5 outputs are stacked along the channel dimension, compressed with a 1 × 1 convolution, and output as $f_{aspp}$.
b) $f_{cat}$ and $f_{aspp}$ are stacked along the channel dimension, then input into a standard 3 × 3 convolution layer and a standard 1 × 1 convolution layer, which output the final semantic classification score map.
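A sketch of the ASPP module and decoder head under the stated configuration (dilation rates (1, 6, 12, 18), five branches stacked and compressed by a 1 × 1 convolution, then 3 × 3 and 1 × 1 convolutions to 40 channels); the intermediate width of 256 channels is an assumption, and whether the rate-1 branch is a 1 × 1 or 3 × 3 convolution is not specified, so a 3 × 3 is used here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Four parallel atrous 3x3 convolutions plus a mean-pooling branch;
    the five outputs are stacked on the channel axis and compressed 1x1."""

    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d((len(rates) + 1) * out_ch, out_ch, 1)

    def forward(self, x):
        ys = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(ys + [pooled], dim=1))

class DecoderHead(nn.Module):
    """Stack f_cat with the ASPP output, then 3x3 and 1x1 convolutions
    produce the 40-channel semantic classification score map."""

    def __init__(self, in_ch, num_classes=40):
        super().__init__()
        self.aspp = ASPP(in_ch)
        self.conv3 = nn.Conv2d(in_ch + 256, 256, 3, padding=1)
        self.conv1 = nn.Conv2d(256, num_classes, 1)

    def forward(self, f_cat):
        x = torch.cat([f_cat, self.aspp(f_cat)], dim=1)
        return self.conv1(F.relu(self.conv3(x)))
```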
(4) Loss function module
Cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and mini-batch stochastic gradient descent (SGD) is used as the optimization method to back-propagate through the whole semantic segmentation network, completing the construction of the model framework.
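A sketch of one training update combining the two losses; the balancing weight `lam` and the model returning the similarity loss alongside the scores are assumptions, since the patent does not state how the losses are combined:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, rgb, depth, labels, lam=1.0):
    """One mini-batch SGD update: cross-entropy on the score map plus the
    spatial relationship similarity loss."""
    model.train()
    scores, srs_loss = model(rgb, depth)       # assumed model interface
    loss = F.cross_entropy(scores, labels) + lam * srs_loss
    optimizer.zero_grad()
    loss.backward()                            # back-propagation through both branches
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```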
Step two, the open-source NYU-Depth v2 semantic segmentation dataset is selected as the task sample; the dataset contains 1449 labeled RGB-D images in total, of which 795 are divided into the training set and 654 into the test set. During training, data enhancement is performed on-line by random flipping, cropping and gamma transformation. ImageNet pre-training parameters are loaded into the backbone networks of the RGB and depth branches, and the model is trained with the back-propagation algorithm.
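A sketch of the on-line augmentation on numpy arrays; the crop size and gamma range are assumptions, as the patent does not give concrete parameters:

```python
import random
import numpy as np

def augment(rgb, depth, label, crop=(480, 480)):
    """Random horizontal flip, random crop, and random gamma transform."""
    if random.random() < 0.5:                       # random flip
        rgb, depth, label = rgb[:, ::-1], depth[:, ::-1], label[:, ::-1]
    h, w = label.shape
    ch, cw = min(crop[0], h), min(crop[1], w)       # random crop, applied
    y = random.randint(0, h - ch)                   # identically to all maps
    x = random.randint(0, w - cw)
    rgb, depth, label = (a[y:y + ch, x:x + cw] for a in (rgb, depth, label))
    gamma = random.uniform(0.7, 1.5)                # gamma transform, RGB only
    rgb = np.clip((rgb / 255.0) ** gamma * 255.0, 0, 255).astype(np.uint8)
    return rgb, depth, label
```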
Step three, in the task verification process, as shown in FIG. 3, an RGB-D image of an indoor scene to be tested (a in FIG. 3) is input; from the final output semantic classification score map, the highest-scoring category is taken as the category of each pixel and the semantic segmentation image (b in FIG. 3) is output, completing the visualization process.
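A sketch of this inference step, taking the per-pixel argmax of the score map:

```python
import torch

@torch.no_grad()
def predict(model, rgb, depth):
    """Per-pixel argmax over the 40-channel score map gives the semantic
    segmentation image (the visualization step in FIG. 3)."""
    model.eval()
    scores, _ = model(rgb, depth)      # B x 40 x H x W, as in train_step
    return scores.argmax(dim=1)        # B x H x W class indices
```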
Claims (8)
1. An RGB-D image semantic segmentation method based on spatial relationship, characterized by comprising the following steps:
(1) constructing a semantic segmentation network with Deeplab-v3 as the base model, comprising a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module; the network takes an RGB-D image as input and outputs a semantic classification score map;
(2) training the semantic segmentation network constructed in step (1);
(3) inputting the RGB-D image to be tested into the semantic segmentation network trained in step (2), and taking the highest-scoring category in the output semantic classification score map as the category of each pixel to obtain the semantic segmentation image.
2. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 1, wherein the feature extraction module is: Resnet101 is used as the backbone network of the feature extraction module, and parallel RGB and depth branches with identical structure are constructed.
3. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 2, wherein in the training process of step (2), data enhancement is performed by random flipping, cropping and gamma transformation; ImageNet pre-training parameters are loaded into the backbone networks of the RGB and depth branches; and the model is trained with the back-propagation algorithm.
4. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 2, wherein the construction of the spatial relationship similarity loss module comprises the following sub-steps:
(a1) respectively extracting the output features of b sub-modules in the RGB and depth branch networks, and constructing groups of pairwise relations $f_i$:

$$f_i = \{f_{i,rgb},\ f_{i,dep}\}$$

where $b$ denotes the number of selected sub-modules, $f_{i,rgb}$ is the output feature of the $i$-th module of the RGB branch, and $f_{i,dep}$ is the output feature of the $i$-th module of the depth branch;

(a2) converting the RGB and depth features within each group $f_i$ into feature regions:

$$r_{i,rgb} = p(f_{i,rgb}),\qquad r_{i,dep} = p(f_{i,dep})$$

where the function $p(\cdot)$ denotes a global pooling operation that downsamples the original feature scale, and $r_{i,rgb}$, $r_{i,dep}$ are the feature regions corresponding to $f_{i,rgb}$, $f_{i,dep}$;

(a3) computing the autocorrelation spatial features of the paired feature regions:

$$d_{i,rgb} = D(r_{i,rgb}),\qquad d_{i,dep} = D(r_{i,dep}),\qquad D(r)_{m,n} = \mathrm{dst}(r^{(m)},\ r^{(n)})$$

where $d_{i,rgb}$, $d_{i,dep}$ are the autocorrelation spatial features corresponding to $r_{i,rgb}$, $r_{i,dep}$; $D(\cdot)$ denotes the autocorrelation spatial matrix; $r^{(m)}$, $r^{(n)}$ denote any two regions $m$, $n$ of a feature region; and the $\mathrm{dst}(x, y)$ function denotes the distance operation;

(a4) calculating the distance between the RGB and depth autocorrelation spatial features and generating the spatial relationship similarity loss:

$$L_{srs} = \frac{1}{b}\sum_{i=1}^{b} \mathrm{dst}\!\left(d_{i,rgb},\ d_{i,dep}\right)$$
5. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 4, wherein the dst(x, y) function is $\mathrm{dst}(x, y) = \cos(\mathrm{norm}(x), \mathrm{norm}(y))$, where norm denotes vector normalization.
6. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 4, wherein the decoder module is configured to: fuse the final group of feature maps output by the RGB and depth branches, $f_{b,rgb}$ and $f_{b,dep}$, through a feature weighting module; the fused feature $f_{cat}$ is passed through a multi-scale atrous convolution module to generate a feature map, which is stacked with $f_{cat}$ along the channel dimension to finally obtain the semantic classification score map.
7. The method for semantic segmentation of RGB-D images based on spatial relationships according to claim 6, wherein the construction of the decoder module includes the sub-steps of:
(b1) inputting $f_{b,rgb}$ and $f_{b,dep}$ respectively into a global average pooling layer, followed by two fully connected layers that compress and expand the channels at the same ratio and an activation function, outputting the weights $w_{rgb}$ and $w_{dep}$;
(b2) adding the weighted RGB and depth branch features to obtain the fused feature map $f_{cat}$;
(b3) inputting the fused feature map $f_{cat}$ obtained in step (b2) into a multi-scale atrous convolution module, passing in parallel through 4 atrous convolution layers of different scales and 1 mean-pooling layer, stacking the 5 outputs along the channel dimension, compressing them with a 1 × 1 convolution, and outputting $f_{aspp}$;
(b4) stacking $f_{cat}$ and $f_{aspp}$ along the channel dimension, inputting a 3 × 3 convolution layer and a 1 × 1 convolution layer, and finally outputting the semantic classification score map.
8. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 1, wherein the loss function module is: cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and stochastic gradient descent is used as the optimization method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011301588.6A (CN112115951B) | 2020-11-19 | 2020-11-19 | RGB-D image semantic segmentation method based on spatial relationship
Publications (2)
Publication Number | Publication Date
---|---
CN112115951A | 2020-12-22
CN112115951B | 2021-03-09
Family
ID=73794969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011301588.6A (CN112115951B, Active) | RGB-D image semantic segmentation method based on spatial relationship | 2020-11-19 | 2020-11-19
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112115951B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113011427B (en) * | 2021-03-17 | 2022-06-21 | 中南大学 | Remote sensing image semantic segmentation method based on self-supervision contrast learning |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635882A (en) * | 2019-01-23 | 2019-04-16 | 福州大学 | Salient object detection method based on multi-scale convolution feature extraction and fusion |
CN110458939A (en) * | 2019-07-24 | 2019-11-15 | 大连理工大学 | The indoor scene modeling method generated based on visual angle |
Non-Patent Citations (2)
Title |
---|
LIN-ZHUO CHEN, ZHENG LIN, ZIQIN WANG, YONG-LIANG YANG, MING-MING CHENG: "Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation", ResearchGate *
JIANG JINDONG (江锦东): "Indoor RGB-D image semantic segmentation method based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112801105A (en) * | 2021-01-22 | 2021-05-14 | 之江实验室 | Two-stage zero sample image semantic segmentation method |
CN113205520A (en) * | 2021-04-22 | 2021-08-03 | 华中科技大学 | Method and system for semantic segmentation of image |
CN113205520B (en) * | 2021-04-22 | 2022-08-05 | 华中科技大学 | Method and system for semantic segmentation of image |
CN113255678A (en) * | 2021-06-17 | 2021-08-13 | 云南航天工程物探检测股份有限公司 | Road crack automatic identification method based on semantic segmentation |
CN116051830A (en) * | 2022-12-20 | 2023-05-02 | 中国科学院空天信息创新研究院 | Cross-modal data fusion-oriented contrast semantic segmentation method |
CN116051830B (en) * | 2022-12-20 | 2023-06-20 | 中国科学院空天信息创新研究院 | Cross-modal data fusion-oriented contrast semantic segmentation method |
Also Published As
Publication number | Publication date |
---|---|
CN112115951B (en) | 2021-03-09 |
Similar Documents
Publication | Title | Publication Date
---|---|---
CN112115951B (en) | RGB-D image semantic segmentation method based on spatial relationship | |
CN111080629B (en) | Method for detecting image splicing tampering | |
CN111126202B (en) | Optical remote sensing image target detection method based on void feature pyramid network | |
CN109712105B (en) | Image salient object detection method combining color and depth information | |
CN108090902A (en) | A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network | |
CN109685135A (en) | A kind of few sample image classification method based on modified metric learning | |
CN111563418A (en) | Asymmetric multi-mode fusion significance detection method based on attention mechanism | |
CN114511710A (en) | Image target detection method based on convolutional neural network | |
CN114387512B (en) | Remote sensing image building extraction method based on multi-scale feature fusion and enhancement | |
CN116206133A (en) | RGB-D significance target detection method | |
CN111739037B (en) | Semantic segmentation method for indoor scene RGB-D image | |
CN113963170A (en) | RGBD image saliency detection method based on interactive feature fusion | |
CN113177559A (en) | Image recognition method, system, device and medium combining breadth and dense convolutional neural network | |
CN116051977A (en) | Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm | |
CN113689382B (en) | Tumor postoperative survival prediction method and system based on medical images and pathological images | |
CN111428650A (en) | Pedestrian re-identification method based on SP-PGGAN style migration | |
CN113066074A (en) | Visual saliency prediction method based on binocular parallax offset fusion | |
CN107909565A (en) | Stereo-picture Comfort Evaluation method based on convolutional neural networks | |
CN115311186B (en) | Cross-scale attention confrontation fusion method and terminal for infrared and visible light images | |
CN116433904A (en) | Cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution | |
CN113744205B (en) | End-to-end road crack detection system | |
CN115311117A (en) | Image watermarking system and method for style migration depth editing | |
CN115147727A (en) | Method and system for extracting impervious surface of remote sensing image | |
CN115035408A (en) | Unmanned aerial vehicle image tree species classification method based on transfer learning and attention mechanism | |
CN113111906A (en) | Method for generating confrontation network model based on condition of single pair image training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |