CN114419323A - Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method - Google Patents
- Publication number
- CN114419323A (application CN202210328137.4A)
- Authority
- CN
- China
- Prior art keywords
- semantic segmentation
- image
- domain
- rgbd
- cross
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The method performs RGBD image semantic segmentation based on cross-modal learning and domain adaptation. Data of two modalities, namely RGB and depth images, are adopted as input to construct a cross-modal image semantic segmentation network, and Jensen-Shannon divergence is adopted to keep the semantic segmentation results of the network branches as consistent as possible. The method designs a set of adversarial generative domain adaptation techniques: the semantic segmentation network serves as the generator and yields three semantic segmentation results, and three discriminators are designed, each taking one of the three semantic segmentation results as input. The generator tries to make the semantic segmentation distributions of the source domain and the target domain as consistent as possible, while each discriminator aims to correctly distinguish which domain a semantic segmentation result comes from. Because these purposes are opposed, the generator and the discriminators improve each other in a continuing game, finally aligning the different domains at the output level, i.e., realizing high-precision cross-domain labelling of RGBD data.
Description
Technical Field
The invention relates to a cross-modal learning and domain-adaptive RGBD image semantic segmentation method, belonging to the technical field of image semantic segmentation.
Background
In recent years, image semantic segmentation based on deep convolutional networks has advanced significantly. Training a segmentation network requires a large amount of annotated data, yet acquiring large pixel-level semantic segmentation annotation datasets by manual labelling is quite difficult because it is time-consuming and labor-intensive. Accurately predicting the class of every pixel in an image remains a challenging problem, especially when the model is trained on one dataset (the source domain) and makes predictions on another dataset (the target domain). The difference between the source domain and the target domain causes the accuracy of a model trained on the source domain to drop to some extent on the target domain. Safety-critical intelligent systems, such as autonomous vehicles, are required to perform robustly in a variety of test environments even when no training data from those environments is available. For such systems, a semantic segmentation model trained on images of a sunny day in Beijing should also give good predictions on images of a foggy day in Kunming.
Although autonomous vehicles and robots are equipped with various sensors providing both RGB images and depth images, most existing image semantic segmentation techniques focus on semantic segmentation of RGB images alone, and few domain-adaptation-based schemes attend to data of both the RGB and depth modalities. To address the differences between domains, current mainstream methods adopt domain adaptation techniques to align source-domain data carrying manual annotations with target-domain data lacking them.
Many semantic segmentation methods based on domain adaptation use adversarial learning to narrow the gap between the source domain and the target domain. In adversarial-learning-based domain adaptation, the purposes of the generator and the discriminator are mutually contradictory; the two improve each other in a continuing game until, even though the discriminator's discrimination ability is reliable, it still cannot tell whether an input comes from the source domain or the target domain, thereby reducing the difference between the two domains. However, few current methods combine cross-modal learning with domain adaptation to achieve high-precision cross-dataset semantic segmentation of image scenes.
Previous research and practice have shown that fully mining the complementarity of multi-modal data can enhance semantic understanding of a scene. Therefore, it is necessary to sufficiently mine the two modalities of information in the RGB image and the depth image, and to improve the accuracy of image scene semantic segmentation through adversarial-learning-based domain adaptation.
Disclosure of Invention
The invention aims to solve the technical problem of realizing cross-dataset semantic segmentation of image scenes by fully mining the two modalities of information in the RGB image and the depth image and combining them with domain adaptation based on adversarial generative learning.
The technical scheme of the invention is a cross-modal learning and domain-adaptive RGBD image semantic segmentation method. Two modalities of data, an RGB image and a depth image, are used as input to construct a cross-modal image semantic segmentation network, and the semantic segmentation network is trained supervised on the source domain $S$. In order to fully utilize the data of the two modalities, JS (Jensen-Shannon) divergence is adopted to measure the difference between different probability distributions. The method designs an adversarial generative domain adaptation scheme: the semantic segmentation network serves as the generator, yielding three semantic segmentation probability outputs for an RGBD image; three convolutional-neural-network-based discriminators are designed, and the weight information maps derived from the three semantic segmentation probability outputs of the network are used as the inputs of the discriminators. The purpose of the generator is to make the source domain and the target domain as similar as possible in the distribution of the outputs; the purpose of each discriminator is to distinguish as far as possible whether the corresponding semantic segmentation probability output comes from a target-domain or a source-domain sample, i.e., which domain the input sample comes from. The purposes of the generator and the discriminators are opposed, and they improve each other in a continuing game until, even though the discriminators' discrimination ability is reliable, it still cannot be distinguished whether an input sample comes from the source domain or the target domain, so that the source domain and the target domain are aligned at the output level, i.e., the difference between the source domain and the target domain with respect to image semantic segmentation is reduced.
The method adopts two deep neural networks (semantic segmentation networks such as DeepLab) to extract the 256-dimensional RGB image features $f_{rgb}$ and the 256-dimensional depth image features $f_d$ respectively; the RGB image features and the depth image features are directly fused to form the 512-dimensional fused features:

$$f_{rgbd} = [f_{rgb}, f_d]$$

where $[\cdot, \cdot]$ represents a feature concatenation operation. $f_{rgb}$, $f_d$ and $f_{rgbd}$ are each subjected to convolution and up-sampling operations to obtain the semantic segmentation probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$ respectively. Assuming that the height and width of the image input to the semantic segmentation network are H and W respectively and that the predefined number of semantic categories is K, then $P_{rgb}$, $P_d$ and $P_{rgbd}$ each have dimension $(H, W, K)$, and the elements of each matrix represent the probabilities the model assigns to pixel prediction categories at the corresponding spatial positions of the RGBD image.
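To make the construction concrete, the following is a minimal PyTorch sketch of the cross-modal network described above: two encoders, feature concatenation, and three prediction heads. The simple strided-convolution encoders, the head layout, and the softmax placement are illustrative assumptions; the text above only fixes the 256-dimensional per-modality features, the 512-dimensional fusion, and the convolution plus up-sampling per branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalSegNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Stand-ins for the two deep feature extractors (e.g. DeepLab-style
        # backbones), each yielding a 256-channel map at reduced resolution.
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 256, 3, stride=4, padding=1), nn.ReLU())
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 256, 3, stride=4, padding=1), nn.ReLU())
        # One 1x1 classifier per branch: RGB, depth, and the 512-dim fusion.
        self.head_rgb = nn.Conv2d(256, num_classes, 1)
        self.head_d = nn.Conv2d(256, num_classes, 1)
        self.head_rgbd = nn.Conv2d(512, num_classes, 1)

    def forward(self, x_rgb, x_d):
        h, w = x_rgb.shape[-2:]
        f_rgb = self.rgb_encoder(x_rgb)          # 256-dim RGB features
        f_d = self.depth_encoder(x_d)            # 256-dim depth features
        f_rgbd = torch.cat([f_rgb, f_d], dim=1)  # 512-dim fused features
        probs = []
        for head, feat in [(self.head_rgb, f_rgb), (self.head_d, f_d),
                           (self.head_rgbd, f_rgbd)]:
            # Convolve, up-sample to (H, W), and normalize over classes.
            logits = F.interpolate(head(feat), size=(h, w), mode='bilinear',
                                   align_corners=False)
            probs.append(torch.softmax(logits, dim=1))
        return probs  # [P_rgb, P_d, P_rgbd], each (N, K, H, W)
```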
The image semantic segmentation method trains the semantic segmentation network supervised on the source domain $S$:

Assume that a pair of labelled RGB and depth images in the source domain $S$ is denoted by $(x_{rgb}, x_d, y)$, where $x_{rgb}$ represents an RGB image, $x_d$ represents a depth image, and $y$ represents a manually annotated ground-truth label.

The supervised segmentation loss of the outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$ of the RGBD image semantic segmentation model with respect to a sample $(x_{rgb}, x_d, y)$ can be expressed as (taking $P_{rgb}$ as an example):

$$\mathcal{L}_{seg}(P_{rgb}) = -\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{K} y^{(h,w,c)} \log P_{rgb}^{(h,w,c)}$$

where H and W respectively represent the height and width of the RGBD image and K represents the number of semantic label categories; $P_{rgb}^{(h,w,c)}$, $P_d^{(h,w,c)}$ and $P_{rgbd}^{(h,w,c)}$ represent the probabilities the matrices $P_{rgb}$, $P_d$ and $P_{rgbd}$ assign to semantic class $c$ at image-space position $(h, w)$; $y^{(h,w,c)}$ represents the value of the label at image-space position $(h, w)$ for semantic class $c$. The losses $\mathcal{L}_{seg}(P_d)$ and $\mathcal{L}_{seg}(P_{rgbd})$ are defined in the same way.
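A minimal sketch of this supervised loss in PyTorch, assuming one-hot labels and the (N, K, H, W) tensor layout used by the network sketch above:

```python
import torch

def seg_loss(p: torch.Tensor, y_onehot: torch.Tensor,
             eps: float = 1e-8) -> torch.Tensor:
    """Pixel-wise cross-entropy between probabilities p and one-hot labels
    y_onehot, both (N, K, H, W): the mean over positions of -sum_c y log p."""
    ce = -(y_onehot * torch.log(p + eps)).sum(dim=1)  # sum over classes
    return ce.mean()                                  # mean over pixels
```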
The JS divergence loss between two probability outputs, for example $P_{rgb}$ and $P_{rgbd}$, is

$$\mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) = \frac{1}{2} KL(P_{rgb} \,\|\, M) + \frac{1}{2} KL(P_{rgbd} \,\|\, M), \qquad M = \frac{1}{2}(P_{rgb} + P_{rgbd})$$

where $KL(\cdot \,\|\, \cdot)$ represents the KL divergence measure used to quantify the degree of difference between two probability outputs:

$$KL(P \,\|\, Q) = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{K} P^{(h,w,c)} \log \frac{P^{(h,w,c)}}{Q^{(h,w,c)}}$$

where H is the height of the RGBD image, W is the width of the RGBD image, and K is the number of semantic categories; $P^{(h,w,c)}$ and $Q^{(h,w,c)}$ represent the probability values the respective matrices assign to semantic class $c$ at image-space position $(h, w)$.
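The JS term can be sketched directly from these definitions; the small epsilon for numerical stability is an implementation assumption:

```python
import torch

def kl_div(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8):
    # KL(P || Q) summed over the class dimension, averaged over positions.
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

def js_loss(p: torch.Tensor, q: torch.Tensor):
    # JS(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M), with M the mixture.
    m = 0.5 * (p + q)
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
```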
The input of the cross-modal image semantic segmentation network is an RGB image and a depth image.

In the training phase, the outputs of the network are the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, each of dimension $(H, W, K)$; the three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input samples.

In the testing phase, the network takes a weighted sum of the corresponding elements of the three probability outputs to obtain the final semantic segmentation result of the RGBD image.
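A sketch of this test-time fusion follows; the equal weights are an assumption, since the text only specifies a weighted sum of corresponding elements:

```python
import torch

def predict(p_rgb, p_d, p_rgbd, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Weighted element-wise sum of the three probability outputs,
    # followed by an argmax over the class dimension.
    fused = weights[0] * p_rgb + weights[1] * p_d + weights[2] * p_rgbd
    return fused.argmax(dim=1)  # (N, H, W) map of predicted class labels
```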
The three convolutional-neural-network-based discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$ are convolutional neural networks with the same network structure; the input has dimension $(H, W, K)$ and the output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively.

From the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, the weight information maps are respectively calculated element-wise as:

$$I^{(h,w,c)} = -P^{(h,w,c)} \log P^{(h,w,c)}$$

where $I_{rgb}^{(h,w,c)}$, $I_d^{(h,w,c)}$ and $I_{rgbd}^{(h,w,c)}$ represent the values of the prediction weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ at the corresponding position $(h, w, c)$. The RGBD images of the source domain and the target domain are input into the semantic segmentation network, and the weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ are obtained according to this formula.
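A sketch of the weight information map and of one discriminator follows. The map is computed here as the weighted self-information -P log P, which matches the reconstruction above but remains an assumption, and the exact discriminator layer stack is likewise illustrative; only the K input channels and the binary source/target output are fixed by the text.

```python
import torch
import torch.nn as nn

def weight_info_map(p: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Element-wise -P log P over a (N, K, H, W) probability output.
    return -p * torch.log(p + eps)

class Discriminator(nn.Module):
    """Fully-convolutional domain classifier with K input channels."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=2, padding=1))  # domain logit map

    def forward(self, info_map):
        # Positive logits lean toward "source" (1), negative toward
        # "target" (0); a sigmoid recovers the stated 0/1 outputs.
        return self.net(info_map)
```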
The domain cross-entropy losses of the three discriminators can be expressed as (taking $D_{rgb}$ as an example):

$$\mathcal{L}_{D_{rgb}}(\theta_{rgb}) = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log D_{rgb}(I_{rgb}^{s,i}) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log\left(1 - D_{rgb}(I_{rgb}^{t,j})\right)$$

where $\theta_{rgb}$, $\theta_d$ and $\theta_{rgbd}$ are respectively the parameters to be solved of the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ denote the domain cross-entropy losses corresponding to the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$; $N_s$ and $N_t$ respectively represent the numbers of RGB-depth image pairs used for training in the source domain and the target domain; $(x_{rgb}^{s,i}, x_d^{s,i})$ represents an RGBD image pair in the source domain and $(x_{rgb}^{t,j}, x_d^{t,j})$ an RGBD image pair in the target domain; $I_{rgb}^{s,i}$, $I_d^{s,i}$ and $I_{rgbd}^{s,i}$ represent the weight information maps of the source pair corresponding to $P_{rgb}$, $P_d$ and $P_{rgbd}$, and $I_{rgb}^{t,j}$, $I_d^{t,j}$ and $I_{rgbd}^{t,j}$ those of the target pair. The losses $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ are defined analogously.
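Using the weight_info_map helper and Discriminator class from the sketch above, one discriminator's loss can be sketched as follows, with source labelled 1 and target labelled 0 as stated:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, p_src, p_tgt):
    # p_src, p_tgt: probability outputs of the segmentation network for a
    # source-domain and a target-domain RGBD pair respectively.
    logit_s = disc(weight_info_map(p_src))
    logit_t = disc(weight_info_map(p_tgt))
    loss_s = F.binary_cross_entropy_with_logits(
        logit_s, torch.ones_like(logit_s))   # source domain -> label 1
    loss_t = F.binary_cross_entropy_with_logits(
        logit_t, torch.zeros_like(logit_t))  # target domain -> label 0
    return loss_s + loss_t
```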
The objective function of the cross-modal image semantic segmentation network can be expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain:

$$\min_{\theta_G} \; \mathcal{L}(\theta_G) = \mathcal{L}_{src} + \lambda_{adv} \, \mathcal{L}_{adv}$$

where $\theta_G$ are the network parameters of the semantic segmentation model to be solved; $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$, typically set manually by cross-validation; and $\mathcal{L}_{src}$ is the supervised loss of the RGBD image semantic segmentation network on the source domain.
The supervised loss of the RGBD image semantic segmentation network on the source domain is expressed as the sum of the cross-entropy losses and the cross-modal losses:

$$\mathcal{L}_{src} = \mathcal{L}_{seg}(P_{rgb}) + \mathcal{L}_{seg}(P_d) + \mathcal{L}_{seg}(P_{rgbd}) + \lambda_1 \, \mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) + \lambda_2 \, \mathcal{L}_{JS}(P_d, P_{rgbd})$$

where $\lambda_1$ and $\lambda_2$ represent the weights of the corresponding losses, typically set manually by cross-validation; $\mathcal{L}_{JS}(P_{rgb}, P_{rgbd})$ represents the JS divergence loss between the probability outputs $P_{rgb}$ and $P_{rgbd}$, and $\mathcal{L}_{JS}(P_d, P_{rgbd})$ the JS divergence loss between $P_d$ and $P_{rgbd}$; $\mathcal{L}_{seg}(P_{rgb})$, $\mathcal{L}_{seg}(P_d)$ and $\mathcal{L}_{seg}(P_{rgbd})$ are respectively the supervised segmentation losses of $P_{rgb}$, $P_d$ and $P_{rgbd}$ with respect to the sample $(x_{rgb}, x_d, y)$.
The adversarial loss on the target domain is expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t} \sum_{j=1}^{N_t} \left[ \log D_{rgb}(I_{rgb}^{t,j}) + \log D_d(I_d^{t,j}) + \log D_{rgbd}(I_{rgbd}^{t,j}) \right]$$

where $I_{rgb}^{t,j}$, $I_d^{t,j}$ and $I_{rgbd}^{t,j}$ represent the weight information maps of the target-domain pair $(x_{rgb}^{t,j}, x_d^{t,j})$ corresponding to $P_{rgb}$, $P_d$ and $P_{rgbd}$; $D_{rgb}$, $D_d$ and $D_{rgbd}$ are the discriminators with corresponding domain cross-entropy losses $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$; $N_t$ represents the number of RGB-depth image pairs used for target-domain training.
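Reusing seg_loss, js_loss and weight_info_map from the earlier sketches, the full generator objective can be sketched as below; the lambda values are placeholders to be set by cross-validation:

```python
import torch
import torch.nn.functional as F

def generator_objective(probs_src, y_onehot, probs_tgt, discs,
                        lam_js=0.1, lam_adv=0.001):
    p_rgb, p_d, p_rgbd = probs_src
    # Supervised source loss: three cross-entropies plus two JS terms.
    loss_src = (seg_loss(p_rgb, y_onehot) + seg_loss(p_d, y_onehot)
                + seg_loss(p_rgbd, y_onehot)
                + lam_js * (js_loss(p_rgb, p_rgbd) + js_loss(p_d, p_rgbd)))
    # Adversarial loss: push the discriminators to call target outputs
    # "source" (label 1), aligning the two domains at the output level.
    loss_adv = 0.0
    for disc, p_t in zip(discs, probs_tgt):
        logit = disc(weight_info_map(p_t))
        loss_adv = loss_adv + F.binary_cross_entropy_with_logits(
            logit, torch.ones_like(logit))
    return loss_src + lam_adv * loss_adv
```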
The domain-adaptive RGBD-oriented image semantic segmentation method comprises the following training steps (a training-loop sketch in code follows the steps):
S1: on the labelled source domain, train the semantic segmentation network using the supervised source-domain loss formula of the RGBD image semantic segmentation network until convergence or until a preset number of iterations is reached;

S2: on the source domain and the target domain, train the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$ using their domain cross-entropy loss formulas;

S3: on the source domain and the target domain, train the semantic segmentation network using the objective function of the RGBD image semantic segmentation network;

S4: repeat steps S2 and S3 until a preset number of iterations is reached, or until the semantic segmentation network converges and the discriminators cannot correctly distinguish which domain the training data comes from.
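A compact sketch of this S1-S4 schedule, reusing the helpers defined in the sketches above; optimizers, learning rates and iteration budgets are assumptions, and the loaders are assumed to yield one-hot labels in the (N, K, H, W) layout:

```python
import torch

def train(model, discs, src_loader, tgt_loader,
          pretrain_iters=10000, adv_iters=40000):
    opt_g = torch.optim.SGD(model.parameters(), lr=2.5e-4, momentum=0.9)
    opt_d = torch.optim.Adam([p for d in discs for p in d.parameters()],
                             lr=1e-4)
    # S1: supervised pretraining on the labelled source domain.
    # (Fresh iterators keep the sketch short; a real loop would iterate
    # the shuffling DataLoaders directly.)
    for _ in range(pretrain_iters):
        x_rgb, x_d, y = next(iter(src_loader))
        loss = sum(seg_loss(p, y) for p in model(x_rgb, x_d))
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    # S2/S3 alternation; S4's stop condition is the iteration budget here.
    for _ in range(adv_iters):
        xs_rgb, xs_d, y = next(iter(src_loader))
        xt_rgb, xt_d = next(iter(tgt_loader))
        # S2: update the three discriminators with the generator frozen.
        probs_s = [p.detach() for p in model(xs_rgb, xs_d)]
        probs_t = [p.detach() for p in model(xt_rgb, xt_d)]
        loss_d = sum(discriminator_loss(d, ps, pt)
                     for d, ps, pt in zip(discs, probs_s, probs_t))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # S3: update the generator; only opt_g steps, so the discriminators
        # stay fixed during this half of the game.
        loss_g = generator_objective(model(xs_rgb, xs_d), y,
                                     model(xt_rgb, xt_d), discs)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```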
The invention has the following advantages: the cross-modal image semantic segmentation network based on the RGB image and the depth image better fuses these two different modalities of data; because the designed image semantic segmentation method is based on domain adaptation, images in the target domain need no manually annotated semantic segmentation labels, which reduces the workload of manual image annotation; the designed segmentation method takes the two different modalities of RGB and depth data as input in both the training and testing stages and adopts cross-modal learning to fully mine the correlation between them, which can improve the accuracy of image semantic segmentation; and by combining adversarial learning with cross-modal learning, the method can reduce the difference between the source domain and the target domain, which is beneficial to improving the generalization of the model across different datasets.
Drawings
FIG. 1 is a schematic diagram of an RGB image;
FIG. 2 is a diagram illustrating image semantic segmentation results of an RGB image;
FIG. 3 is a schematic diagram of a cross-modality based RGBD image semantic segmentation network according to the present invention;
FIG. 4 is a diagram illustrating the adversarial generative domain adaptation and the associated loss functions according to the present invention.
Detailed Description
The task of image semantic segmentation is to predict the class of each pixel in an image. As shown in FIG. 1 and FIG. 2, according to a certain segmentation method or rule, the RGB image in FIG. 1 is semantically segmented to obtain the segmentation result shown in FIG. 2, where the pixels of each category are rendered in a corresponding gray level; for example, the pixels corresponding to the car are rendered in black.
The model designed in this embodiment inputs RGBD multi-modal data in the training and testing stages, that is, one RGB image and one corresponding depth image. The output is the result of semantic segmentation of the RGBD image, i.e. the class label of each pixel on each image. It is assumed here that the height and width of the RGBD image are H and W, respectively, and the preset number of semantic categories is K.
As shown in FIG. 3, the invention adopts two deep neural networks to extract the 256-dimensional RGB image features $f_{rgb}$ and the 256-dimensional depth image features $f_d$ respectively. Here, each deep neural network may be a common convolutional neural network, such as DeepLabV3. The features of the RGB image and the depth image are directly fused to form the 512-dimensional fused features $f_{rgbd} = [f_{rgb}, f_d]$, where $[\cdot, \cdot]$ indicates a feature concatenation operation. $f_{rgb}$, $f_d$ and $f_{rgbd}$ are each convolved and up-sampled to obtain the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, each a matrix of dimension $(H, W, K)$ whose elements represent the probabilities the model assigns to pixel prediction categories at the corresponding spatial positions of the RGBD image.
The semantic segmentation method designed in this embodiment trains the semantic segmentation network supervised on the source domain $S$. A pair of labelled RGB and depth images in the source domain $S$ is denoted by $(x_{rgb}, x_d, y)$, where $x_{rgb}$ represents an RGB image, $x_d$ represents a depth image, and $y$ represents a manually annotated ground-truth label.
The supervised segmentation loss of the outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$ of the RGBD image semantic segmentation model with respect to a sample $(x_{rgb}, x_d, y)$ can be expressed as (taking $P_{rgb}$ as an example):

$$\mathcal{L}_{seg}(P_{rgb}) = -\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{K} y^{(h,w,c)} \log P_{rgb}^{(h,w,c)}$$

where H and W respectively represent the height and width of the RGBD image and K represents the number of semantic label categories; $P_{rgb}^{(h,w,c)}$, $P_d^{(h,w,c)}$ and $P_{rgbd}^{(h,w,c)}$ represent the probabilities the matrices $P_{rgb}$, $P_d$ and $P_{rgbd}$ assign to semantic class $c$ at image-space position $(h, w)$; $y^{(h,w,c)}$ represents the value of the label at image-space position $(h, w)$ for semantic class $c$.
In the RGBD segmentation network designed in this embodiment, the input of the network is an RGB image and a depth image, and in the training phase the outputs of the network are the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, each of dimension $(H, W, K)$. The three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input samples. Therefore, for the same RGBD image, $P_{rgb}$, $P_d$ and $P_{rgbd}$ should have similar probability distributions.
This embodiment uses the JS divergence (Jensen-Shannon divergence) as the cross-modal loss measure to quantify the difference between different probability distributions. The JS divergence loss between $P_{rgb}$ and $P_{rgbd}$ can be expressed as:

$$\mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) = \frac{1}{2} KL(P_{rgb} \,\|\, M) + \frac{1}{2} KL(P_{rgbd} \,\|\, M), \qquad M = \frac{1}{2}(P_{rgb} + P_{rgbd})$$

where $KL(\cdot \,\|\, \cdot)$ represents the KL divergence measure used to quantify the degree of difference between two probability outputs, and $M^{(h,w,c)}$ represents the probability value the matrix $M$ assigns to semantic class $c$ at image-space position $(h, w)$. The loss $\mathcal{L}_{JS}(P_d, P_{rgbd})$ between $P_d$ and $P_{rgbd}$ is defined analogously.
Therefore, the supervised loss of the RGBD image semantic segmentation network on the source domain can be expressed as the sum of the cross-entropy losses and the cross-modal losses:

$$\mathcal{L}_{src} = \mathcal{L}_{seg}(P_{rgb}) + \mathcal{L}_{seg}(P_d) + \mathcal{L}_{seg}(P_{rgbd}) + \lambda_1 \, \mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) + \lambda_2 \, \mathcal{L}_{JS}(P_d, P_{rgbd}) \tag{1}$$

where $\lambda_1$ and $\lambda_2$ represent the weights of the corresponding losses, typically set manually by cross-validation.
FIG. 4 illustrates the adversarial generative domain adaptation and the associated loss functions.
This embodiment adopts adversarial generative learning to realize domain adaptation for RGBD image semantic segmentation.

The adversarial generative learning framework designed in this embodiment comprises one generator and three convolutional-neural-network-based discriminators, where the generator is the RGBD semantic segmentation network.

The core idea of the adversarial generative scheme of this embodiment is to train the generator, i.e., the RGBD semantic segmentation network, so that the distributions of the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$ it generates for RGBD images of the target domain are as similar as possible to the probability maps of the source domain. The purpose of each discriminator is to distinguish as far as possible whether the corresponding weight information map comes from a target-domain or a source-domain sample, i.e., which domain the input sample comes from. The purposes of the generator and the discriminators are opposed, and they improve each other in a continuing game until, even though the discriminators' discrimination ability is reliable, it still cannot be distinguished whether an input sample comes from the source domain or the target domain.
For the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, this embodiment designs three convolutional-neural-network-based discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$ respectively. The three discriminators are convolutional neural networks with the same network structure; the number of input channels equals the number of semantic labels, and the output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively. First, the weight information maps are calculated element-wise from $P_{rgb}$, $P_d$ and $P_{rgbd}$ respectively:

$$I^{(h,w,c)} = -P^{(h,w,c)} \log P^{(h,w,c)}$$

where $I_{rgb}^{(h,w,c)}$, $I_d^{(h,w,c)}$ and $I_{rgbd}^{(h,w,c)}$ represent the values of the prediction weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ at the corresponding position $(h, w, c)$. As described above, the RGBD images of the source domain and the target domain are input into the semantic segmentation network, and the weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ can be obtained.
The domain cross-entropy losses of the three discriminators are:

$$\mathcal{L}_{D_{rgb}}(\theta_{rgb}) = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log D_{rgb}(I_{rgb}^{s,i}) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log\left(1 - D_{rgb}(I_{rgb}^{t,j})\right) \tag{2}$$

$$\mathcal{L}_{D_d}(\theta_d) = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log D_d(I_d^{s,i}) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log\left(1 - D_d(I_d^{t,j})\right) \tag{3}$$

$$\mathcal{L}_{D_{rgbd}}(\theta_{rgbd}) = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log D_{rgbd}(I_{rgbd}^{s,i}) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log\left(1 - D_{rgbd}(I_{rgbd}^{t,j})\right) \tag{4}$$

where $\theta_{rgb}$, $\theta_d$ and $\theta_{rgbd}$ are respectively the parameters to be solved of the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ denote the corresponding domain cross-entropy losses; $N_s$ and $N_t$ respectively represent the numbers of RGB-depth image pairs used for training in the source domain and the target domain; $(x_{rgb}^{s,i}, x_d^{s,i})$ represents an RGBD image pair in the source domain and $(x_{rgb}^{t,j}, x_d^{t,j})$ an RGBD image pair in the target domain; $I_{rgb}^{s,i}$, $I_d^{s,i}$ and $I_{rgbd}^{s,i}$ represent the weight information maps of the source pair corresponding to $P_{rgb}$, $P_d$ and $P_{rgbd}$, and $I_{rgb}^{t,j}$, $I_d^{t,j}$ and $I_{rgbd}^{t,j}$ those of the target pair.

The adversarial loss of the semantic segmentation network of this embodiment can be expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t} \sum_{j=1}^{N_t} \left[ \log D_{rgb}(I_{rgb}^{t,j}) + \log D_d(I_d^{t,j}) + \log D_{rgbd}(I_{rgbd}^{t,j}) \right]$$
The objective function of the RGBD image semantic segmentation network can be expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain, i.e.:

$$\min_{\theta_G} \; \mathcal{L}(\theta_G) = \mathcal{L}_{src} + \lambda_{adv} \, \mathcal{L}_{adv} \tag{5}$$

where $\theta_G$ are the network parameters of the semantic segmentation model that need to be solved, and $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$; the weight is typically set manually by cross-validation.
The invention designs a set of domain-adaptive RGBD-oriented image semantic segmentation methods, and the training steps of the method mainly comprise:
S1: on the labelled source domain, train the semantic segmentation network using formula (1) until convergence or until a preset number of iterations is reached;

S2: on the source domain and the target domain, train the discriminators using formulas (2), (3) and (4);

S3: on the source domain and the target domain, train the semantic segmentation network using formula (5);
S4: repeat steps S2 and S3 until a preset number of iterations is reached, or until the semantic segmentation network converges and the discriminators cannot correctly distinguish which domain the training data comes from.
Claims (9)
1. A cross-modal learning and domain-adaptive RGBD image semantic segmentation method, characterized in that data of two different modalities, namely an RGB image and a depth image, are used as input to construct a cross-modal image semantic segmentation network, and the image semantic segmentation algorithm trains the semantic segmentation network supervised on the source domain $S$; in order to fully utilize the data of the two modalities, JS divergence is adopted to measure the difference between different probability distributions, so that the outputs of the different modalities are as consistent as possible; the method designs an adversarial generative domain adaptation algorithm in which the semantic segmentation network serves as the generator to obtain three semantic segmentation probability maps; three convolutional-neural-network-based discriminators are designed, and the information maps generated from the three semantic segmentation probability maps of the semantic segmentation network are used as the inputs of the discriminators; the purposes of the generator and the discriminators are mutually contradictory, and the two improve each other in a continuing game until, even though the discriminators' discrimination ability is reliable, it still cannot be distinguished whether an input sample comes from the source domain or the target domain, so that the source domain and the target domain are aligned at the output level, realizing high-precision cross-domain labelling of RGBD data.
2. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the method adopts two deep neural networks to respectively extract the 256-dimensional RGB image features $f_{rgb}$ and the 256-dimensional depth image features $f_d$; the RGB image features and the depth image features are directly fused to form the 512-dimensional fused features:

$$f_{rgbd} = [f_{rgb}, f_d]$$

where $[\cdot, \cdot]$ represents a feature concatenation operation; $f_{rgb}$, $f_d$ and $f_{rgbd}$ are each subjected to convolution and up-sampling operations to obtain the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$; assuming that the height and width of the image input to the semantic segmentation network are H and W respectively and that the predefined number of semantic categories is K, then $P_{rgb}$, $P_d$ and $P_{rgbd}$ each have dimension $(H, W, K)$, and the elements of each matrix represent the probabilities the model assigns to pixel prediction categories at the corresponding spatial positions of the RGBD image.
3. The method of claim 1, characterized in that the image semantic segmentation algorithm trains the semantic segmentation network supervised on the source domain $S$:

assume that a pair of labelled RGB and depth images in the source domain $S$ is denoted by $(x_{rgb}, x_d, y)$, where $x_{rgb}$ represents an RGB image, $x_d$ represents a depth image, and $y$ represents a manually annotated ground-truth label;

the supervised segmentation loss of the outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$ of the RGBD image semantic segmentation model with respect to a sample $(x_{rgb}, x_d, y)$ can be expressed as (taking $P_{rgb}$ as an example):

$$\mathcal{L}_{seg}(P_{rgb}) = -\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{K} y^{(h,w,c)} \log P_{rgb}^{(h,w,c)}$$

where H and W respectively represent the height and width of the RGBD image and K represents the number of semantic label categories; $P_{rgb}^{(h,w,c)}$, $P_d^{(h,w,c)}$ and $P_{rgbd}^{(h,w,c)}$ represent the probabilities the matrices $P_{rgb}$, $P_d$ and $P_{rgbd}$ assign to semantic class $c$ at image-space position $(h, w)$; $y^{(h,w,c)}$ represents the value of the label at image-space position $(h, w)$ for semantic class $c$.
4. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 2, characterized in that the JS divergence loss $\mathcal{L}_{JS}(P_{rgb}, P_{rgbd})$ between the probability outputs $P_{rgb}$ and $P_{rgbd}$ is expressed as:

$$\mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) = \frac{1}{2} KL(P_{rgb} \,\|\, M) + \frac{1}{2} KL(P_{rgbd} \,\|\, M), \qquad M = \frac{1}{2}(P_{rgb} + P_{rgbd})$$
5. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 2, characterized in that the JS divergence loss $\mathcal{L}_{JS}(P_d, P_{rgbd})$ between the probability outputs $P_d$ and $P_{rgbd}$ is expressed as:

$$\mathcal{L}_{JS}(P_d, P_{rgbd}) = \frac{1}{2} KL(P_d \,\|\, M') + \frac{1}{2} KL(P_{rgbd} \,\|\, M'), \qquad M' = \frac{1}{2}(P_d + P_{rgbd})$$

where the KL divergence measure used to quantify the degree of difference between two probability outputs $P$ and $Q$ is

$$KL(P \,\|\, Q) = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{K} P^{(h,w,c)} \log \frac{P^{(h,w,c)}}{Q^{(h,w,c)}}$$

where H and W are respectively the height and width of the RGBD image and K is the number of semantic categories; $P^{(h,w,c)}$ and $Q^{(h,w,c)}$ represent the probabilities the respective matrices assign to semantic class $c$ at image-space position $(h, w)$.
6. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the network input of the cross-modal image semantic segmentation is an RGB image and a depth image; the outputs of the network during the training phase are the probability outputs $P_{rgb}$, $P_d$ and $P_{rgbd}$, each of dimension $(H, W, K)$; the three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input samples; in the testing phase, the network takes a weighted sum of the corresponding elements of the three probability outputs to obtain the final semantic segmentation result of the RGBD image.
7. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the three convolutional-neural-network-based discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$ are convolutional neural networks with the same network structure; the input has dimension $(H, W, K)$ and the output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively;

the weight information maps are calculated element-wise from the probability outputs as

$$I^{(h,w,c)} = -P^{(h,w,c)} \log P^{(h,w,c)}$$

where $I_{rgb}^{(h,w,c)}$, $I_d^{(h,w,c)}$ and $I_{rgbd}^{(h,w,c)}$ represent the values of the prediction weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ at the corresponding position $(h, w, c)$; the RGBD images of the source domain and the target domain are input into the semantic segmentation network, and the weight information maps $I_{rgb}$, $I_d$ and $I_{rgbd}$ are obtained according to this formula;

the domain cross-entropy losses of the discriminators are (taking $D_{rgb}$ as an example):

$$\mathcal{L}_{D_{rgb}}(\theta_{rgb}) = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log D_{rgb}(I_{rgb}^{s,i}) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log\left(1 - D_{rgb}(I_{rgb}^{t,j})\right)$$

where $\theta_{rgb}$, $\theta_d$ and $\theta_{rgbd}$ are respectively the parameters to be solved of the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ denote the corresponding domain cross-entropy losses; $N_s$ and $N_t$ respectively represent the numbers of RGB-depth image pairs used for source-domain and target-domain training; $(x_{rgb}^{s,i}, x_d^{s,i})$ represents an RGBD image pair in the source domain and $(x_{rgb}^{t,j}, x_d^{t,j})$ an RGBD image pair in the target domain; $I_{rgb}^{s,i}$, $I_d^{s,i}$ and $I_{rgbd}^{s,i}$ represent the weight information maps of the source pair corresponding to $P_{rgb}$, $P_d$ and $P_{rgbd}$, and $I_{rgb}^{t,j}$, $I_d^{t,j}$ and $I_{rgbd}^{t,j}$ those of the target pair; the losses $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ are defined analogously.
8. The method according to claim 6, characterized in that the objective function of the cross-modal image semantic segmentation network is expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain:

$$\min_{\theta_G} \; \mathcal{L}(\theta_G) = \mathcal{L}_{src} + \lambda_{adv} \, \mathcal{L}_{adv}$$

where $\theta_G$ are the network parameters of the semantic segmentation model to be solved; $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$, typically set manually by cross-validation; and $\mathcal{L}_{src}$ is the supervised loss of the RGBD image semantic segmentation network on the source domain;

the supervised loss of the RGBD image semantic segmentation network on the source domain is expressed as the sum of the cross-entropy losses and the cross-modal losses:

$$\mathcal{L}_{src} = \mathcal{L}_{seg}(P_{rgb}) + \mathcal{L}_{seg}(P_d) + \mathcal{L}_{seg}(P_{rgbd}) + \lambda_1 \, \mathcal{L}_{JS}(P_{rgb}, P_{rgbd}) + \lambda_2 \, \mathcal{L}_{JS}(P_d, P_{rgbd})$$

where $\lambda_1$ and $\lambda_2$ represent the weights of the corresponding losses, typically set manually by cross-validation; $\mathcal{L}_{JS}(P_{rgb}, P_{rgbd})$ represents the JS divergence loss between the probability outputs $P_{rgb}$ and $P_{rgbd}$, and $\mathcal{L}_{JS}(P_d, P_{rgbd})$ that between $P_d$ and $P_{rgbd}$; $\mathcal{L}_{seg}(P_{rgb})$, $\mathcal{L}_{seg}(P_d)$ and $\mathcal{L}_{seg}(P_{rgbd})$ are respectively the supervised segmentation losses of $P_{rgb}$, $P_d$ and $P_{rgbd}$ with respect to the sample $(x_{rgb}, x_d, y)$;

the adversarial loss on the target domain is expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t} \sum_{j=1}^{N_t} \left[ \log D_{rgb}(I_{rgb}^{t,j}) + \log D_d(I_d^{t,j}) + \log D_{rgbd}(I_{rgbd}^{t,j}) \right]$$

where $I_{rgb}^{t,j}$, $I_d^{t,j}$ and $I_{rgbd}^{t,j}$ represent the weight information maps of the target-domain pair $(x_{rgb}^{t,j}, x_d^{t,j})$ corresponding to $P_{rgb}$, $P_d$ and $P_{rgbd}$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_d}$ and $\mathcal{L}_{D_{rgbd}}$ denote the domain cross-entropy losses of the corresponding discriminators; $N_t$ represents the number of RGB-depth image pairs used for target-domain training.
9. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the training step comprises:

S1: on the labelled source domain, training the semantic segmentation network using the supervised source-domain loss formula of the RGBD image semantic segmentation network until convergence or until a preset number of iterations is reached;

S2: on the source domain and the target domain, training the discriminators $D_{rgb}$, $D_d$ and $D_{rgbd}$ using their domain cross-entropy loss formulas;

S3: on the source domain and the target domain, training the semantic segmentation network using the objective function of the RGBD image semantic segmentation network;

S4: repeating steps S2 and S3 until a preset number of iterations is reached, or until the semantic segmentation network converges and the discriminators cannot correctly distinguish which domain the training data comes from.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210328137.4A CN114419323B (en) | 2022-03-31 | 2022-03-31 | Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210328137.4A CN114419323B (en) | 2022-03-31 | 2022-03-31 | Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114419323A | 2022-04-29
CN114419323B | 2022-06-24
Family
ID=81262781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210328137.4A Active CN114419323B (en) | 2022-03-31 | 2022-03-31 | Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114419323B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272681A (en) * | 2022-09-22 | 2022-11-01 | 中国海洋大学 | Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling |
CN115797642A (en) * | 2023-02-13 | 2023-03-14 | 华东交通大学 | Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field |
CN116051830A (en) * | 2022-12-20 | 2023-05-02 | 中国科学院空天信息创新研究院 | Cross-modal data fusion-oriented contrast semantic segmentation method |
CN117036891A (en) * | 2023-08-22 | 2023-11-10 | 睿尔曼智能科技(北京)有限公司 | Cross-modal feature fusion-based image recognition method and system |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809187A (en) * | 2015-04-20 | 2015-07-29 | 南京邮电大学 | Indoor scene semantic annotation method based on RGB-D data |
WO2019137915A1 (en) * | 2018-01-09 | 2019-07-18 | Connaught Electronics Ltd. | Generating input data for a convolutional neuronal network |
CN109712105A (en) * | 2018-12-24 | 2019-05-03 | 浙江大学 | A kind of image well-marked target detection method of combination colour and depth information |
CN111832592A (en) * | 2019-04-20 | 2020-10-27 | 南开大学 | RGBD significance detection method and related device |
CN111340814A (en) * | 2020-03-03 | 2020-06-26 | 北京工业大学 | Multi-mode adaptive convolution-based RGB-D image semantic segmentation method |
CN112233124A (en) * | 2020-10-14 | 2021-01-15 | 华东交通大学 | Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning |
CN113627433A (en) * | 2021-06-18 | 2021-11-09 | 中国科学院自动化研究所 | Cross-domain self-adaptive semantic segmentation method and device based on data disturbance |
Non-Patent Citations (2)
Title |
---|
ZIQIANG ZHENG等: "Instance Map Based Image Synthesis With a Denoising Generative Adversarial network", 《IEEE ACCESS》 * |
LI Xiaoyang et al.: "An RGBD image co-segmentation algorithm combining saliency detection and graph cut", Journal of System Simulation *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272681A (en) * | 2022-09-22 | 2022-11-01 | 中国海洋大学 | Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling |
CN115272681B (en) * | 2022-09-22 | 2022-12-20 | 中国海洋大学 | Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling |
CN116051830A (en) * | 2022-12-20 | 2023-05-02 | 中国科学院空天信息创新研究院 | Cross-modal data fusion-oriented contrast semantic segmentation method |
CN116051830B (en) * | 2022-12-20 | 2023-06-20 | 中国科学院空天信息创新研究院 | Cross-modal data fusion-oriented contrast semantic segmentation method |
CN115797642A (en) * | 2023-02-13 | 2023-03-14 | 华东交通大学 | Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field |
CN117036891A (en) * | 2023-08-22 | 2023-11-10 | 睿尔曼智能科技(北京)有限公司 | Cross-modal feature fusion-based image recognition method and system |
CN117036891B (en) * | 2023-08-22 | 2024-03-29 | 睿尔曼智能科技(北京)有限公司 | Cross-modal feature fusion-based image recognition method and system |
Also Published As
Publication number | Publication date |
---|---|
CN114419323B (en) | 2022-06-24 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |