CN112115951A - RGB-D image semantic segmentation method based on spatial relationship - Google Patents

RGB-D image semantic segmentation method based on spatial relationship

Info

Publication number
CN112115951A
Authority
CN
China
Prior art keywords
rgb
semantic segmentation
module
feature
spatial relationship
Prior art date
Legal status
Granted
Application number
CN202011301588.6A
Other languages
Chinese (zh)
Other versions
CN112115951B (en)
Inventor
张健
费哲遥
李月华
谢天
朱世强
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202011301588.6A
Publication of CN112115951A
Application granted
Publication of CN112115951B
Legal status: Active (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses an RGB-D image semantic segmentation method based on spatial relationship. A semantic segmentation network is constructed with Deeplab-v3 as the base model and comprises a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module. Semantic segmentation is performed on RGB-D images of indoor scenes: RGB and depth information are effectively fused by the deep learning network, and a spatial relationship similarity is introduced into the backbone network. On top of the parallel network structure design, regional feature values and a similarity measure between the depth and RGB information are computed to help improve the fusion of the two modalities. The method depends only on sensor devices that can provide RGB and depth data, is simple and convenient, and is an effective image-based method for applications of Kinect, Xtion and other depth-sensing devices.

Description

RGB-D image semantic segmentation method based on spatial relationship
Technical Field
The invention belongs to the field of computer image processing, and particularly relates to an RGB-D image semantic segmentation method based on a spatial relationship.
Background
Semantic segmentation is an important application in computer vision, and is widely applied to the fields of robots, automatic driving, security monitoring and the like.
Compared with conventional RGB solutions, RGB-D sensors provide multi-modal information including color and depth. In scenes with indistinct color boundaries, weak texture features, inconsistent target depths and the like, depth information offers strong guidance for semantic segmentation. For this reason, semantic segmentation methods that exploit RGB-D information can achieve better segmentation results than traditional methods.
Existing RGB-D fusion schemes fall mainly into three categories: 2D multi-modal semantic fusion, parallel network structure design, and 3D point cloud space mapping. 2D multi-modal semantic fusion and parallel network structure design guide the fusion of depth and RGB information by hand-crafted mining and by network extraction, respectively, and their fusion effect is limited; 3D point cloud space mapping incurs a large computational overhead.
Disclosure of Invention
The invention aims to provide an RGB-D image semantic segmentation method based on spatial relationship that addresses the defects of the prior art.
The purpose of the invention is achieved by the following technical scheme: an RGB-D image semantic segmentation method based on spatial relationship, comprising the following steps:
(1) constructing a semantic segmentation network by taking Deeplab-v3 as a basic model, wherein the semantic segmentation network comprises a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module; inputting an RGB-D image and outputting a semantic classification score map;
(2) training the semantic segmentation network constructed in the step (1);
(3) inputting the RGB-D image to be tested into the semantic segmentation network trained in step (2), and taking the category with the maximum score in the output semantic classification score map as the category of each pixel to obtain the semantic segmentation image.
Further, in the feature extraction module, Resnet101 is used as the backbone network, and parallel RGB and depth branches with identical structures are constructed.
Further, in the training process of step (2), data augmentation is performed by random flipping, cropping and gamma transformation; the backbone networks corresponding to the RGB and depth branches in the model are loaded with ImageNet pre-training parameters; and the model is trained using the back-propagation algorithm.
Further, the construction of the spatial relationship similarity loss module comprises the following sub-steps:
(a1) extracting the output features of b sub-modules in the RGB and depth branch networks respectively, and constructing multiple groups of pairwise relations f_i:
f_i = {f_i,rgb, f_i,dep}
where f_i,rgb, f_i,dep ∈ R^(w×h×c) and i ∈ {1, ..., b}; b denotes the number of selected sub-modules; f_i,rgb is the output feature of the i-th module of the RGB branch and f_i,dep is the output feature of the i-th module of the depth branch;
(a2) converting the RGB and depth features within each group f_i into feature regions r_i,rgb, r_i,dep:
r_i,rgb = p(f_i,rgb), r_i,dep = p(f_i,dep)
where the function p(x) denotes a global pooling operation that downsamples the original feature scale; r_i,rgb, r_i,dep are the feature regions corresponding to f_i,rgb, f_i,dep;
(a3) computing the autocorrelation spatial features A_i,rgb, A_i,dep corresponding to the paired feature regions r_i,rgb, r_i,dep:
A_i,rgb = D(r_i,rgb), A_i,dep = D(r_i,dep)
D(r)_(m,n) = dst(r^m, r^n)
where A_i,rgb, A_i,dep are the autocorrelation spatial features corresponding to r_i,rgb, r_i,dep; D(·) denotes the autocorrelation spatial matrix; r^m_i,rgb, r^n_i,rgb denote any two regions m, n of r_i,rgb, and r^m_i,dep, r^n_i,dep denote any two regions m, n of r_i,dep; the dst(x, y) function denotes a distance operation;
(a4) calculating the distance between the RGB and depth autocorrelation spatial features and generating the spatial relationship similarity loss L_sim:
L_sim = Σ_{i=1..b} dst(A_i,rgb, A_i,dep)
Further, the dst(x, y) function is dst(x, y) = cos(norm(x), norm(y)), where norm(·) denotes normalization.
Further, in the decoder module, the final group of feature maps {f_b,rgb, f_b,dep} output by the RGB and depth branches is feature-spliced through a feature weighting module; the spliced feature F is passed through a multi-scale atrous convolution module to generate a feature map, which is superposed with the original feature F along the channel dimension to finally obtain the semantic classification score map.
Further, the construction of the decoder module comprises the following sub-steps:
(b1) inputting f_b,rgb and f_b,dep respectively into a global average pooling layer, followed by two fully connected layers that compress and then expand the channels by the same ratio; after the activation function, outputting the weighted features F_rgb, F_dep;
(b2) adding the outputs F_rgb and F_dep of step (b1) to obtain the spliced feature map F;
(b3) inputting the spliced feature map F of step (b2) into the multi-scale atrous convolution module, passing it in parallel through 4 atrous convolution layers of different scales and 1 mean pooling layer, superposing the 5 outputs along the channel dimension, compressing them with a 1×1 convolution, and outputting F_aspp;
(b4) superposing F and F_aspp along the channel dimension, then inputting the result into a 3×3 convolution layer and a 1×1 convolution layer, and finally outputting the semantic classification score map.
Further, in the loss function module, cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and stochastic gradient descent is used as the optimization method.
The invention has the following beneficial effects: the invention is an image fusion method based on an RGB-D sensor, which performs semantic segmentation on RGB-D images of indoor scenes, effectively fuses RGB and depth information through a deep learning network, and introduces a spatial relationship similarity into the backbone network. On top of the parallel network structure design, regional feature values and a similarity measure between the depth and RGB information are computed to help improve the fusion of the two modalities. The method depends only on sensor devices that can provide RGB and depth data, is simple and convenient, and is an effective image-based method for applications of Kinect, Xtion and other depth-sensing devices.
Drawings
FIG. 1 is a diagram of the overall architecture of a network;
FIG. 2 is a block diagram of spatial relationship similarity loss;
FIG. 3 is a schematic diagram illustrating the effect of the present invention, wherein a is an RGB-D image of an indoor scene to be tested and b is the resulting semantic classification score map.
Detailed Description
The invention relates to an RGB-D image semantic segmentation method based on spatial relationship, which, as shown in FIG. 1, comprises the following steps:
step one, constructing a semantic segmentation network:
the overall network architecture design is based on an open-source deep learning framework pytorch, and is transformed on the basis of the public Deeplab-v3 network architecture, so that three parts, namely a feature extraction module, a spatial relationship similarity loss module and a decoder module, are realized.
(1) Building feature extraction module
This module uses the Resnet101 backbone network as the basic framework of the feature extraction module, and two parallel branches, RGB and depth, are constructed synchronously with identical structures.
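The following is a minimal PyTorch sketch of such a parallel two-branch encoder, assuming torchvision's ResNet-101 implementation for both branches and a 3-channel depth input; class and function names are illustrative, not taken from the patent.

import torch
import torch.nn as nn
import torchvision

class ParallelRGBDEncoder(nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        # Two structurally identical ResNet-101 backbones, one per modality.
        self.rgb = torchvision.models.resnet101(pretrained=pretrained)
        self.dep = torchvision.models.resnet101(pretrained=pretrained)

    def _branch_features(self, backbone, x):
        # Collect the outputs of the four residual stages (layer1..layer4).
        x = backbone.conv1(x)
        x = backbone.bn1(x)
        x = backbone.relu(x)
        x = backbone.maxpool(x)
        feats = []
        for stage in (backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4):
            x = stage(x)
            feats.append(x)
        return feats

    def forward(self, rgb, depth):
        # depth is assumed to be a 3-channel tensor (e.g. replicated depth or HHA).
        f_rgb = self._branch_features(self.rgb, rgb)
        f_dep = self._branch_features(self.dep, depth)
        return f_rgb, f_dep  # two lists of 4 stage feature maps each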
(2) Building spatial relationship similarity loss module
The RGB and depth branches keep the same structure. The output features of four sub-modules in the RGB and depth branch networks are extracted, and four groups of pairwise relations f_i are constructed and recorded as:
f_i = {f_i,rgb, f_i,dep}
where f_i,rgb, f_i,dep ∈ R^(w×h×c) and i ∈ {1, 2, 3, 4}, corresponding to the 4 groups of features; f_i,rgb is the output feature of the i-th module of the RGB branch, f_i,dep is the output feature of the i-th module of the depth branch; w, h, c refer to the feature map dimensions.
For each group of pairwise relations f_i, the RGB and depth features within the group are converted into feature regions r_i,rgb, r_i,dep, recorded as:
r_i,rgb = p(f_i,rgb), r_i,dep = p(f_i,dep)
where the function p(x) = maxpooling(x, 5) denotes a global max pooling operation that downsamples the original feature scale by a factor of 5; correspondingly r_i,rgb, r_i,dep ∈ R^(w'×h'×c), with h' = h/5 and w' = w/5.
The autocorrelation spatial features A_i,rgb, A_i,dep corresponding to the paired feature regions r_i,rgb, r_i,dep are then computed. The autocorrelation reflects the distances between different regions of the same feature map and is expressed as:
A_i,rgb = D(r_i,rgb), A_i,dep = D(r_i,dep)
D(r)_(m,n) = dst(r^m, r^n)
where A_i,rgb, A_i,dep are the autocorrelation spatial features of RGB and depth, and D(·) denotes the autocorrelation spatial matrix; r^m_i,rgb, r^n_i,rgb denote any two regions m, n of r_i,rgb, and r^m_i,dep, r^n_i,dep denote any two regions m, n of r_i,dep. For example, the region m corresponding to r^m_i,rgb is the set of elements over all channels of the third dimension at point position m of the first two dimensions of r_i,rgb. The distance function dst uses the cosine distance formula dst(x, y) = cos(norm(x), norm(y)), where the function norm(x) denotes normalization.
As shown in FIG. 2, the distance between each group of RGB and depth autocorrelation spatial features is calculated to generate the spatial relationship similarity loss L_sim:
L_sim = Σ_{i=1..b} dst(A_i,rgb, A_i,dep)
where b = 4 denotes the 4 groups of paired feature maps output by the RGB and depth branches.
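A minimal sketch of this loss, assuming PyTorch tensors of shape (B, C, H, W) for each stage feature; using 1 − cosine similarity between the flattened autocorrelation matrices as the final distance is an assumption, since the text only specifies a cosine-based distance operation.

import torch
import torch.nn.functional as F

def region_features(x, factor=5):
    # Global max pooling that downsamples the spatial scale by `factor`:
    # (B, C, H, W) -> (B, C, H//factor, W//factor) -> (B, N, C) region vectors.
    r = F.max_pool2d(x, kernel_size=factor)
    return r.flatten(2).transpose(1, 2)  # N = (H // factor) * (W // factor)

def autocorrelation(regions):
    # Cosine similarity between every pair of region vectors: (B, N, N).
    r = F.normalize(regions, dim=-1)
    return torch.bmm(r, r.transpose(1, 2))

def spatial_relation_loss(f_rgb_list, f_dep_list, factor=5):
    # Sum over the selected stages of a distance between the RGB and depth
    # autocorrelation matrices; 1 - cosine similarity of the flattened
    # matrices is used here (an assumption).
    loss = 0.0
    for f_rgb, f_dep in zip(f_rgb_list, f_dep_list):
        a_rgb = autocorrelation(region_features(f_rgb, factor)).flatten(1)
        a_dep = autocorrelation(region_features(f_dep, factor)).flatten(1)
        loss = loss + (1.0 - F.cosine_similarity(a_rgb, a_dep, dim=1)).mean()
    return loss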
(3) Building decoder modules
The final group of feature maps {f_4,rgb, f_4,dep} output by the RGB and depth branches is input into a feature weighting module, which completes the feature splicing; the spliced feature F passes through a multi-scale atrous convolution (ASPP) module to generate a new feature map, which is superposed with the original feature F along the channel dimension; the decoder module finally generates a semantic classification score map with 40 channels. The feature weighting module uses a channel compression and expansion ratio of 16, with sigmoid(x) as the activation function; the multi-scale atrous convolution (ASPP) module uses the different dilation rates (1, 6, 12, 18).
(3.1) The output feature maps of the last module in the RGB and depth branches are feature-spliced by feature weighting followed by summation; the specific process is as follows:
a) feature weighting: f_4,rgb and f_4,dep are each input into a global average pooling layer, yielding two tensors of scale B×C×1×1 (B and C denote the training batch size and the number of channels of the feature maps f_4,rgb, f_4,dep, respectively); these then pass through two fully connected layers that compress and expand the channels by the same ratio, and after the activation function the weighted features F_rgb, F_dep are output;
b) feature summation: the weighted RGB and depth branch feature values are added, and the feature map after feature splicing is computed as F = F_rgb + F_dep.
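A minimal sketch of this feature weighting and summation step, with 1×1 convolutions standing in for the fully connected layers; the reduction ratio of 16 and the sigmoid activation follow the text above, while class and variable names are illustrative.

import torch
import torch.nn as nn

class FeatureWeightedFusion(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                         # B x C x 1 x 1
                nn.Conv2d(channels, channels // reduction, 1),   # compress channels
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),   # expand channels
                nn.Sigmoid(),
            )
        self.gate_rgb = gate()
        self.gate_dep = gate()

    def forward(self, f_rgb, f_dep):
        w_rgb = self.gate_rgb(f_rgb)
        w_dep = self.gate_dep(f_dep)
        # Weighted features of the two branches are summed to give the spliced map F.
        return f_rgb * w_rgb + f_dep * w_dep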
(3.2) The spliced feature map F is input into the decoder network corresponding to Deeplab-v3, and the semantic classification score map is finally output; the specific flow is as follows:
a) the feature map F is input into the multi-scale atrous convolution module (ASPP), where it passes in parallel through 4 atrous convolution layers of different scales and 1 mean pooling layer; the 5 outputs are superposed along the channel dimension, compressed with a 1×1 convolution, and output as F_aspp;
b) F and F_aspp are superposed along the channel dimension, then passed through a standard 3×3 convolution layer and a standard 1×1 convolution layer, and the final semantic classification score map is output.
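A minimal sketch of this decoder, assuming 2048 input channels from ResNet-101 and 256 intermediate channels (both assumptions); the dilation rates (1, 6, 12, 18), the image-level pooling branch, the 1×1 compression and the 3×3 / 1×1 head follow the text above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPDecoder(nn.Module):
    def __init__(self, in_ch=2048, mid_ch=256, num_classes=40):
        super().__init__()
        rates = (1, 6, 12, 18)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, mid_ch, 1))
        self.project = nn.Conv2d(mid_ch * 5, mid_ch, 1)           # 1x1 compression
        self.head = nn.Sequential(
            nn.Conv2d(in_ch + mid_ch, mid_ch, 3, padding=1),      # 3x3 convolution
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, num_classes, 1),                    # 1x1 convolution -> scores
        )

    def forward(self, feat):
        h, w = feat.shape[-2:]
        outs = [branch(feat) for branch in self.branches]
        pooled = F.interpolate(self.pool(feat), size=(h, w),
                               mode="bilinear", align_corners=False)
        aspp = self.project(torch.cat(outs + [pooled], dim=1))    # F_aspp
        # Superpose F and F_aspp on the channel dimension, then apply the head.
        return self.head(torch.cat([feat, aspp], dim=1))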
(4) Loss function module
Cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and mini-batch stochastic gradient descent (mini-batch SGD) is used as the optimization method to back-propagate through the whole semantic segmentation network; the construction of the whole model framework is thus completed.
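A minimal sketch of one training step combining the two losses, assuming the model returns the score map together with the per-stage RGB and depth features consumed by spatial_relation_loss (defined in the sketch above); the weight lambda_sim of the auxiliary loss is an assumption, since the text does not specify how the two losses are combined.

import torch
import torch.nn as nn

def training_step(model, optimizer, rgb, depth, labels, lambda_sim=1.0):
    # `model` is assumed to return (score map, RGB stage features, depth stage features).
    optimizer.zero_grad()
    scores, f_rgb_list, f_dep_list = model(rgb, depth)
    ce = nn.functional.cross_entropy(scores, labels)              # pixel-wise cross entropy
    sim = spatial_relation_loss(f_rgb_list, f_dep_list)           # spatial relationship loss
    loss = ce + lambda_sim * sim
    loss.backward()
    optimizer.step()
    return loss.item()

# Example optimizer (hyper-parameters are assumptions):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)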
Step two, the open-source NYU-Depth v2 semantic segmentation dataset is selected as the task sample; the dataset contains 1449 annotated RGB-D images in total, of which 795 are used as the training set and 654 as the test set. In the training process, data augmentation is carried out on line by random flipping, cropping and gamma transformation. The backbone networks corresponding to the RGB and depth branches in the model are loaded with ImageNet pre-training parameters, and the model is trained using the back-propagation algorithm.
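A minimal sketch of the on-line augmentations mentioned above for a paired RGB-D sample; the crop size and gamma range are assumptions.

import random
import torch

def augment(rgb, depth, label, crop=(480, 480), gamma_range=(0.7, 1.5)):
    # rgb: (3, H, W) in [0, 1], depth: (1, H, W), label: (H, W)
    if random.random() < 0.5:                            # random horizontal flip
        rgb, depth, label = rgb.flip(-1), depth.flip(-1), label.flip(-1)
    _, h, w = rgb.shape                                  # random crop
    top = random.randint(0, h - crop[0])
    left = random.randint(0, w - crop[1])
    rgb = rgb[:, top:top + crop[0], left:left + crop[1]]
    depth = depth[:, top:top + crop[0], left:left + crop[1]]
    label = label[top:top + crop[0], left:left + crop[1]]
    rgb = rgb.clamp(min=1e-6) ** random.uniform(*gamma_range)   # gamma transform on RGB
    return rgb, depth, label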
Step three, in the task verification process, as shown in FIG. 3, an RGB-D image of an indoor scene to be tested (a in FIG. 3) is input; according to the final output semantic classification score map (b in FIG. 3), the category with the maximum score is taken as the category of each pixel, and the semantic segmentation image is output, completing the visualization process.
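A minimal sketch of this inference step: the class with the maximum score at each pixel of the output score map becomes the predicted label of that pixel (the model signature follows the training sketch above).

import torch

@torch.no_grad()
def predict(model, rgb, depth):
    scores, _, _ = model(rgb, depth)     # (B, 40, H, W) semantic classification score map
    return scores.argmax(dim=1)          # (B, H, W) per-pixel class indices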

Claims (8)

1. An RGB-D image semantic segmentation method based on spatial relationship, characterized by comprising the following steps:
(1) constructing a semantic segmentation network by taking Deeplab-v3 as a basic model, wherein the semantic segmentation network comprises a feature extraction module, a spatial relationship similarity loss module, a decoder module and a loss function module; inputting an RGB-D image and outputting a semantic classification score map;
(2) training the semantic segmentation network constructed in the step (1);
(3) inputting the RGB-D image to be tested into the semantic segmentation network trained in step (2), and taking the category with the maximum score in the output semantic classification score map as the category of each pixel to obtain the semantic segmentation image.
2. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 1, wherein in the feature extraction module, Resnet101 is used as the backbone network, and parallel RGB and depth branches with identical structures are constructed.
3. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 2, wherein in the training process of step (2), data augmentation is performed by random flipping, cropping and gamma transformation; the backbone networks corresponding to the RGB and depth branches in the model are loaded with ImageNet pre-training parameters; and the model is trained using the back-propagation algorithm.
4. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 2, wherein the construction of the spatial relationship similarity loss module comprises the following sub-steps:
(a1) extracting the output features of b sub-modules in the RGB and depth branch networks respectively, and constructing multiple groups of pairwise relations f_i:
f_i = {f_i,rgb, f_i,dep}
wherein f_i,rgb, f_i,dep ∈ R^(w×h×c) and i ∈ {1, ..., b}; b denotes the number of selected sub-modules; f_i,rgb is the output feature of the i-th module of the RGB branch and f_i,dep is the output feature of the i-th module of the depth branch;
(a2) converting the RGB and depth features within each group f_i into feature regions r_i,rgb, r_i,dep:
r_i,rgb = p(f_i,rgb), r_i,dep = p(f_i,dep)
wherein the function p(x) denotes a global pooling operation that downsamples the original feature scale; r_i,rgb, r_i,dep are the feature regions corresponding to f_i,rgb, f_i,dep;
(a3) computing the autocorrelation spatial features A_i,rgb, A_i,dep corresponding to the paired feature regions r_i,rgb, r_i,dep:
A_i,rgb = D(r_i,rgb), A_i,dep = D(r_i,dep)
D(r)_(m,n) = dst(r^m, r^n)
wherein A_i,rgb, A_i,dep are the autocorrelation spatial features corresponding to r_i,rgb, r_i,dep; D(·) denotes the autocorrelation spatial matrix; r^m_i,rgb, r^n_i,rgb denote any two regions m, n of r_i,rgb, and r^m_i,dep, r^n_i,dep denote any two regions m, n of r_i,dep; the dst(x, y) function denotes a distance operation;
(a4) calculating the distance between the RGB and depth autocorrelation spatial features and generating the spatial relationship similarity loss L_sim:
L_sim = Σ_{i=1..b} dst(A_i,rgb, A_i,dep).
5. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 4, wherein the dst(x, y) function is dst(x, y) = cos(norm(x), norm(y)), where norm(·) denotes normalization.
6. The method for semantic segmentation of RGB-D images based on spatial relationships according to claim 4, wherein in the decoder module, the final group of feature maps {f_b,rgb, f_b,dep} output by the RGB and depth branches is feature-spliced through a feature weighting module; the spliced feature F is passed through a multi-scale atrous convolution module to generate a feature map, which is superposed with the original feature F along the channel dimension to finally obtain the semantic classification score map.
7. The method for semantic segmentation of RGB-D images based on spatial relationships according to claim 6, wherein the construction of the decoder module comprises the following sub-steps:
(b1) inputting f_b,rgb and f_b,dep respectively into a global average pooling layer, followed by two fully connected layers that compress and then expand the channels by the same ratio; after the activation function, outputting the weighted features F_rgb, F_dep;
(b2) adding the outputs F_rgb and F_dep of step (b1) to obtain the spliced feature map F;
(b3) inputting the spliced feature map F of step (b2) into the multi-scale atrous convolution module, passing it in parallel through 4 atrous convolution layers of different scales and 1 mean pooling layer, superposing the 5 outputs along the channel dimension, compressing them with a 1×1 convolution, and outputting F_aspp;
(b4) superposing F and F_aspp along the channel dimension, then inputting the result into a 3×3 convolution layer and a 1×1 convolution layer, and finally outputting the semantic classification score map.
8. The RGB-D image semantic segmentation method based on spatial relationship as claimed in claim 1, wherein in the loss function module, cross-entropy loss is used as the loss function to fit the semantic classification score map to the ground-truth labels, and stochastic gradient descent is used as the optimization method.
CN202011301588.6A 2020-11-19 2020-11-19 RGB-D image semantic segmentation method based on spatial relationship Active CN112115951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011301588.6A CN112115951B (en) 2020-11-19 2020-11-19 RGB-D image semantic segmentation method based on spatial relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011301588.6A CN112115951B (en) 2020-11-19 2020-11-19 RGB-D image semantic segmentation method based on spatial relationship

Publications (2)

Publication Number Publication Date
CN112115951A (en) 2020-12-22
CN112115951B CN112115951B (en) 2021-03-09

Family

ID=73794969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011301588.6A Active CN112115951B (en) 2020-11-19 2020-11-19 RGB-D image semantic segmentation method based on spatial relationship

Country Status (1)

Country Link
CN (1) CN112115951B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801105A (en) * 2021-01-22 2021-05-14 之江实验室 Two-stage zero sample image semantic segmentation method
CN113205520A (en) * 2021-04-22 2021-08-03 华中科技大学 Method and system for semantic segmentation of image
CN113255678A (en) * 2021-06-17 2021-08-13 云南航天工程物探检测股份有限公司 Road crack automatic identification method based on semantic segmentation
CN116051830A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011427B (en) * 2021-03-17 2022-06-21 中南大学 Remote sensing image semantic segmentation method based on self-supervision contrast learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635882A (en) * 2019-01-23 2019-04-16 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635882A (en) * 2019-01-23 2019-04-16 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN110458939A (en) * 2019-07-24 2019-11-15 大连理工大学 The indoor scene modeling method generated based on visual angle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIN-ZHUO CHEN, ZHENG LIN, ZIQIN WANG, YONG-LIANG YANG, AND MING-: "Spatial Information Guided Convolution for", 《RESEARCHGATE》 *
江锦东: "Indoor RGB-D image semantic segmentation method based on convolutional neural network", China Masters' Theses Full-text Database, Information Science and Technology series *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801105A (en) * 2021-01-22 2021-05-14 之江实验室 Two-stage zero sample image semantic segmentation method
CN113205520A (en) * 2021-04-22 2021-08-03 华中科技大学 Method and system for semantic segmentation of image
CN113205520B (en) * 2021-04-22 2022-08-05 华中科技大学 Method and system for semantic segmentation of image
CN113255678A (en) * 2021-06-17 2021-08-13 云南航天工程物探检测股份有限公司 Road crack automatic identification method based on semantic segmentation
CN116051830A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method
CN116051830B (en) * 2022-12-20 2023-06-20 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method

Also Published As

Publication number Publication date
CN112115951B (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112115951B (en) RGB-D image semantic segmentation method based on spatial relationship
CN111080629B (en) Method for detecting image splicing tampering
CN111126202B (en) Optical remote sensing image target detection method based on void feature pyramid network
CN109712105B (en) Image salient object detection method combining color and depth information
CN108090902A (en) A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network
CN109685135A (en) A kind of few sample image classification method based on modified metric learning
CN111563418A (en) Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN114511710A (en) Image target detection method based on convolutional neural network
CN114387512B (en) Remote sensing image building extraction method based on multi-scale feature fusion and enhancement
CN116206133A (en) RGB-D significance target detection method
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN113963170A (en) RGBD image saliency detection method based on interactive feature fusion
CN113177559A (en) Image recognition method, system, device and medium combining breadth and dense convolutional neural network
CN116051977A (en) Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm
CN113689382B (en) Tumor postoperative survival prediction method and system based on medical images and pathological images
CN111428650A (en) Pedestrian re-identification method based on SP-PGGAN style migration
CN113066074A (en) Visual saliency prediction method based on binocular parallax offset fusion
CN107909565A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN116433904A (en) Cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution
CN113744205B (en) End-to-end road crack detection system
CN115311117A (en) Image watermarking system and method for style migration depth editing
CN115147727A (en) Method and system for extracting impervious surface of remote sensing image
CN115035408A (en) Unmanned aerial vehicle image tree species classification method based on transfer learning and attention mechanism
CN113111906A (en) Method for generating confrontation network model based on condition of single pair image training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant