CN114419323A - Cross-modal learning and domain-adaptive RGBD image semantic segmentation method

Info

Publication number
CN114419323A
Authority
CN
China
Prior art keywords
semantic segmentation, image, domain, RGBD, cross
Prior art date
Legal status
Granted
Application number
CN202210328137.4A
Other languages
Chinese (zh)
Other versions
CN114419323B (en)
Inventor
刘伟
郭永发
余晓霞
刘家伟
张苗辉
Current Assignee
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to CN202210328137.4A
Publication of CN114419323A
Application granted
Publication of CN114419323B
Legal status: Active

Classifications

    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253 Fusion techniques of extracted features
    • G06F 18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F 18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The method performs RGBD image semantic segmentation based on cross-modal learning and domain adaptation. It takes data of two modalities, RGB images and depth images, as input and constructs a cross-modal image semantic segmentation network, and it adopts the Jensen-Shannon divergence to keep the semantic segmentation results of the network's branches as consistent as possible. The method further designs an adversarial generative domain adaptation scheme: taking the semantic segmentation network as the generator, three semantic segmentation results are obtained, and three discriminators are designed, each taking one of the three semantic segmentation results as input. The generator tries to make the source-domain and target-domain semantic segmentations as consistent as possible in distribution, while each discriminator aims to correctly distinguish which domain a semantic segmentation result comes from. Since these goals oppose each other, the generator and the discriminators improve one another through a continual game, finally aligning the different domains at the output level, i.e., achieving high-precision cross-domain labeling of RGBD data.

Description

Cross-modal learning and domain-adaptive RGBD image semantic segmentation method
Technical Field
The invention relates to a cross-modal learning and domain-adaptive RGBD image semantic segmentation method, and belongs to the technical field of image semantic segmentation.
Background
In recent years, image semantic segmentation based on deep convolutional networks has advanced significantly. Training a segmentation network requires a large amount of annotated data, yet manually producing large pixel-level semantic segmentation annotation sets is quite difficult in terms of time and labor. Accurately predicting the class of every pixel in an image therefore remains a challenging problem, especially when a model is trained on one dataset (the source domain) and applied to another (the target domain): the difference between the source domain and the target domain degrades the accuracy of a source-trained model on the target domain. Moreover, safety-critical intelligent systems, such as autonomous driving systems, require that autonomous vehicles perform robustly in a variety of test environments even where no training data exist. For these systems, a semantic segmentation model trained on sunny-day images of Beijing should also predict well on foggy-day images of Kunming.
Although autonomous vehicles and robots are equipped with various sensors, including RGB and depth cameras, most existing image semantic segmentation techniques focus on semantic segmentation of RGB images alone, and few domain-adaptation-based schemes attend to both the RGB and depth modalities. To bridge the gap between different domains, the current mainstream approach applies domain adaptation to align source-domain data carrying manual annotations with target-domain data lacking them.
Many semantic segmentation methods based on domain adaptation use adversarial learning to narrow the gap between the source domain and the target domain. In adversarial domain-adaptive learning, the objectives of the generator and the discriminator contradict each other; the two improve one another through a continual game until, even with a reliable discriminator, it can no longer be told whether an input comes from the source or the target domain, thereby reducing the difference between the two domains. Few current methods, however, combine cross-modal learning with domain adaptation to achieve high-precision cross-dataset semantic segmentation of image scenes.
Previous research and practice have shown that fully mining the complementarity of multi-modal data can enhance semantic understanding of a scene. It is therefore worthwhile to fully exploit the two modalities, RGB image and depth image, and to improve the accuracy of image scene semantic segmentation with domain adaptation based on adversarial learning.
Disclosure of Invention
The invention aims to solve the technical problem of achieving cross-dataset semantic segmentation of image scenes by fully mining the two modalities of RGB image and depth image and combining this with domain adaptation based on adversarial generative learning.
The technical scheme of the invention is a cross-modal learning and domain-adaptive RGBD image semantic segmentation method. Data of two modalities, an RGB image and a depth image, serve as input to construct a cross-modal image semantic segmentation network, and the method trains the semantic segmentation network supervisedly on the source domain $\mathcal{D}_s$. To make full use of the data of the two modalities, the Jensen-Shannon (JS) divergence is adopted to measure the difference between different probability distributions. The method designs an adversarial generative domain adaptation scheme: with the semantic segmentation network as the generator, three semantic segmentation probability outputs of the RGBD image are obtained; three discriminators based on convolutional neural networks are designed, and the weight information maps derived from the three semantic segmentation probability outputs of the semantic segmentation network serve as the discriminators' inputs. The generator aims to make the source and target domains as similar as possible in the distribution of the outputs; each discriminator aims to tell, as well as possible, whether the corresponding semantic segmentation probability output comes from a target-domain or a source-domain sample, i.e., which domain the input sample comes from. These goals oppose each other; the generator and the discriminators improve one another through a continual game until, even with a reliable discriminator, the domain of an input sample can no longer be distinguished. The source and target domains are thereby aligned at the output level, i.e., their difference with respect to image semantic segmentation is reduced.
The method adopts two deep neural networks (semantic segmentation networks such as DeepLab) to respectively extract 256-dimensional RGB image features $F_{rgb}$ and 256-dimensional depth image features $F_{d}$; the RGB image features and the depth image features are directly fused into the 512-dimensional fused feature $F_{fuse}$:

$$F_{fuse} = [F_{rgb}, F_{d}]$$

where $[\cdot,\cdot]$ denotes the feature concatenation operation. $F_{rgb}$, $F_{d}$ and $F_{fuse}$ each pass through convolution, softmax and up-sampling operations to produce the semantic segmentation probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ respectively. Assuming the image input to the semantic segmentation network has height $H$ and width $W$, and the predefined number of semantic categories is $K$, then $P_{rgb}$, $P_{d}$ and $P_{fuse}$ are matrices of dimension $(H, W, K)$ whose elements represent the model's predicted class probabilities for the pixels at the corresponding spatial positions of the RGBD image.
The image semantic segmentation method trains the semantic segmentation network supervisedly on the source domain $\mathcal{D}_s$:
Assume a sample of the source domain $\mathcal{D}_s$ is a labelled pair of RGB and depth images denoted $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$, where $x^{s}_{rgb}$ represents the RGB image, $x^{s}_{d}$ represents the depth image, and $y^{s}$ represents the manually annotated ground-truth label.
The supervised segmentation losses of the RGBD image semantic segmentation model outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ on the sample $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$ can be expressed as:

$$\mathcal{L}^{m}_{seg} = -\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{K} y^{(h,w,c)} \log P^{(h,w,c)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $H$ and $W$ respectively represent the height and width of the RGBD image, and $K$ represents the number of semantic label categories; $P^{(h,w,c)}_{rgb}$, $P^{(h,w,c)}_{d}$ and $P^{(h,w,c)}_{fuse}$ denote the probabilities that the matrices $P_{rgb}$, $P_{d}$ and $P_{fuse}$ assign to semantic class $c$ at image-space position $(h,w)$; $y^{(h,w,c)}$ denotes the value of the label at image-space position $(h,w)$ for semantic class $c$.
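A hedged sketch of the supervised term above: the per-pixel cross-entropy between a probability output and the one-hot label, applied to each of the three outputs. The tensor layout (B, K, H, W) and the averaging over pixels (rather than a plain sum) are assumptions of this illustration:

```python
import torch

def seg_loss(p, y_onehot, eps=1e-8):
    # -sum_c y * log P at every pixel, then averaged over pixels and batch
    return -(y_onehot * torch.log(p + eps)).sum(dim=1).mean()

def supervised_loss(p_rgb, p_d, p_fuse, y_onehot):
    # one supervised segmentation term per probability output
    return (seg_loss(p_rgb, y_onehot)
            + seg_loss(p_d, y_onehot)
            + seg_loss(p_fuse, y_onehot))
```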
The JS divergence loss $\mathcal{L}^{rgb,d}_{js}$ between the probability outputs $P_{rgb}$ and $P_{d}$ is expressed as:

$$\mathcal{L}^{rgb,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{rgb} \,\|\, M\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M\right), \qquad M = \frac{1}{2}\left(P_{rgb} + P_{d}\right)$$

where $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence, which measures the degree of difference between the two probability outputs $P_{d}$ and $P_{rgb}$; the divergence is computed over the $K$ class probabilities $P^{(h,w,c)}_{rgb}$ and $P^{(h,w,c)}_{d}$ at each of the $H \times W$ image-space positions.
Similarly, the JS divergence loss $\mathcal{L}^{fuse,d}_{js}$ between the probability outputs $P_{fuse}$ and $P_{d}$ is expressed as:

$$\mathcal{L}^{fuse,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{fuse} \,\|\, M'\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M'\right), \qquad M' = \frac{1}{2}\left(P_{fuse} + P_{d}\right)$$

where $H$ is the height of the RGBD image, $W$ is its width, and $K$ is the number of semantic categories; $P^{(h,w,c)}_{fuse}$ and $P^{(h,w,c)}_{d}$ are the probabilities that the matrices $P_{fuse}$ and $P_{d}$ assign to semantic class $c$ at image-space position $(h,w)$.
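Both consistency terms can be sketched with one helper that follows the JS definition above, i.e., the KL divergence of each distribution against their equal mixture. Shapes (B, K, H, W) and the function names are assumptions:

```python
import torch

def kl_div(p, q, eps=1e-8):
    # KL(p || q), summed over the class dimension, averaged over pixels
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

def js_loss(p, q):
    # JS(p, q) = KL(p || m)/2 + KL(q || m)/2 with m the equal mixture
    m = 0.5 * (p + q)
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
```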
The input of the cross-modal image semantic segmentation network is an RGB image and a depth image.
In the training phase, the outputs of the network are the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$, each of dimension $(H, W, K)$; these three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input sample.
In the testing stage, the network takes the weighted sum of the corresponding elements of the three probability outputs to obtain the final semantic segmentation result of the RGBD image.
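As a small illustration of this test-stage fusion, one might weight and sum the three outputs and take the per-pixel argmax; the equal weights below are an assumption, since the text only specifies a weighted sum of corresponding elements:

```python
def predict(p_rgb, p_d, p_fuse, w=(1.0 / 3, 1.0 / 3, 1.0 / 3)):
    # weighted sum of corresponding elements, then per-pixel argmax
    p = w[0] * p_rgb + w[1] * p_d + w[2] * p_fuse   # (B, K, H, W)
    return p.argmax(dim=1)                          # (B, H, W) class labels
```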
The three discriminators based on convolutional neural networks, $D_{rgb}$, $D_{d}$ and $D_{fuse}$, are convolutional neural networks with the same network structure; their input size is $(H, W, K)$ and their output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively.
From the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$, the weight information maps are respectively calculated as follows:

$$I^{(h,w,k)}_{m} = -P^{(h,w,k)}_{m} \log P^{(h,w,k)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $I^{(h,w,k)}_{rgb}$, $I^{(h,w,k)}_{d}$ and $I^{(h,w,k)}_{fuse}$ respectively represent the values of the prediction weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$ at the corresponding position $(h,w,k)$. Inputting the RGBD images of the source domain and the target domain into the semantic segmentation network and applying this formula yields the weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$.
The losses of the corresponding discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ can be expressed as:

$$\mathcal{L}_{D_{m}}(\theta_{m}) = -\frac{1}{N_s}\sum_{n=1}^{N_s} \log D_{m}\!\left(I^{s}_{m}\right) - \frac{1}{N_t}\sum_{n=1}^{N_t} \log\!\left(1 - D_{m}\!\left(I^{t}_{m}\right)\right), \qquad m \in \{rgb,\, d,\, fuse\}$$

where $\theta_{rgb}$, $\theta_{d}$ and $\theta_{fuse}$ are respectively the parameters of the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ to be solved; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_{d}}$ and $\mathcal{L}_{D_{fuse}}$ denote the domain cross-entropy losses corresponding to $D_{rgb}$, $D_{d}$ and $D_{fuse}$; $N_s$ and $N_t$ respectively represent the numbers of RGB and depth image pairs used for training in the source domain and the target domain; $(x^{s}_{rgb}, x^{s}_{d})$ represents an RGBD image pair in the source domain and $(x^{t}_{rgb}, x^{t}_{d})$ an RGBD image pair in the target domain; $I^{s}_{rgb}$, $I^{s}_{d}$ and $I^{s}_{fuse}$ respectively represent the $I_{rgb}$, $I_{d}$ and $I_{fuse}$ corresponding to $(x^{s}_{rgb}, x^{s}_{d})$, and $I^{t}_{rgb}$, $I^{t}_{d}$ and $I^{t}_{fuse}$ those corresponding to $(x^{t}_{rgb}, x^{t}_{d})$.
The objective function of the cross-modal image semantic segmentation network can be expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain:

$$\min_{\theta}\; \mathcal{L}(\theta) = \mathcal{L}_{sup}(\theta) + \lambda_{adv}\,\mathcal{L}_{adv}(\theta)$$

where $\theta$ denotes the network parameters of the semantic segmentation model to be solved; $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$, typically set manually by cross-validation; and $\mathcal{L}_{sup}$ is the supervised loss of the RGBD image semantic segmentation network on the source domain.
The supervised loss of the RGBD image semantic segmentation network on the source domain is expressed as the sum of cross-entropy losses and cross-modal losses:

$$\mathcal{L}_{sup} = \mathcal{L}^{rgb}_{seg} + \mathcal{L}^{d}_{seg} + \mathcal{L}^{fuse}_{seg} + \lambda_{1}\,\mathcal{L}^{rgb,d}_{js} + \lambda_{2}\,\mathcal{L}^{fuse,d}_{js}$$

where $\lambda_{1}$ and $\lambda_{2}$ are the weights of the corresponding losses, typically set manually by cross-validation; $\mathcal{L}^{rgb,d}_{js}$ denotes the JS divergence loss between the probability outputs $P_{rgb}$ and $P_{d}$; $\mathcal{L}^{fuse,d}_{js}$ denotes the JS divergence loss between the probability outputs $P_{fuse}$ and $P_{d}$; and $\mathcal{L}^{rgb}_{seg}$, $\mathcal{L}^{d}_{seg}$ and $\mathcal{L}^{fuse}_{seg}$ are respectively the supervised segmentation losses of $P_{rgb}$, $P_{d}$ and $P_{fuse}$ on the sample $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$.
The adversarial loss $\mathcal{L}_{adv}$ of the semantic segmentation network is expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t}\sum_{n=1}^{N_t}\left[\log D_{rgb}\!\left(I^{t}_{rgb}\right) + \log D_{d}\!\left(I^{t}_{d}\right) + \log D_{fuse}\!\left(I^{t}_{fuse}\right)\right]$$

where $I^{s}_{rgb}$, $I^{s}_{d}$ and $I^{s}_{fuse}$ respectively represent the weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$ corresponding to $(x^{s}_{rgb}, x^{s}_{d})$, and $I^{t}_{rgb}$, $I^{t}_{d}$ and $I^{t}_{fuse}$ those corresponding to $(x^{t}_{rgb}, x^{t}_{d})$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_{d}}$ and $\mathcal{L}_{D_{fuse}}$ denote the domain cross-entropy losses corresponding to the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$; and $N_t$ represents the number of RGB and depth image pairs used for target-domain training.
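A sketch of this generator-side term: the segmentation network is rewarded when the discriminators classify target-domain weight maps as source (label 1). Function and argument names are assumptions:

```python
import torch
import torch.nn.functional as F

def adversarial_loss(d_rgb, d_d, d_fuse, i_rgb_t, i_d_t, i_fuse_t):
    # push each discriminator to read the target maps as source (label 1)
    loss = 0.0
    for D, i_t in [(d_rgb, i_rgb_t), (d_d, i_d_t), (d_fuse, i_fuse_t)]:
        out = D(i_t)
        loss = loss + F.binary_cross_entropy_with_logits(out, torch.ones_like(out))
    return loss
```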
The training steps of this domain-adaptive RGBD image semantic segmentation method are as follows (a condensed sketch follows the list):
S1: on the labelled source domain, train the semantic segmentation network with the supervised source-domain loss of the RGBD image semantic segmentation network until convergence or until a fixed number of iterations is reached;
S2: on the source and target domains, train the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ with their loss formulas;
S3: on the source and target domains, train the semantic segmentation network with the objective function of the RGBD image semantic segmentation network;
S4: repeat steps S2 and S3 until a fixed number of iterations is reached, or until the semantic segmentation network converges and the discriminators can no longer correctly distinguish which domain the training data came from.
The beneficial effects of the invention are as follows: the cross-modal image semantic segmentation network based on the RGB image and the depth image fuses the two different modalities of data more effectively; because the designed image semantic segmentation method rests on domain adaptation, images in the target domain need no manually annotated semantic segmentation labels, which reduces the workload of manual image annotation; the designed segmentation method takes the two different modalities, RGB image and depth image, as input in both the training and testing stages and uses cross-modal learning to fully mine the correlation between them, which can improve the accuracy of image semantic segmentation; and by combining adversarial learning with cross-modal learning, the method can reduce the difference between the source domain and the target domain, which helps improve the model's generalization across different datasets.
Drawings
FIG. 1 is a schematic diagram of an RGB image;
FIG. 2 is a diagram illustrating image semantic segmentation results of an RGB image;
FIG. 3 is a schematic diagram of a cross-modality based RGBD image semantic segmentation network according to the present invention;
FIG. 4 is a diagram illustrating the adversarial generative domain adaptation and the associated loss functions according to the present invention.
Detailed Description
The task of image semantic segmentation is to predict the class of each pixel in an image. As shown in fig. 1 and fig. 2, according to a certain segmentation method or rule, the RGB image in fig. 1 is semantically segmented to obtain the segmentation result shown in fig. 2, where the pixels of each category are replaced by a corresponding gray level; for example, the pixels belonging to the car are rendered black.
The model designed in this embodiment inputs RGBD multi-modal data in the training and testing stages, that is, one RGB image and one corresponding depth image. The output is the result of semantic segmentation of the RGBD image, i.e. the class label of each pixel on each image. It is assumed here that the height and width of the RGBD image are H and W, respectively, and the preset number of semantic categories is K.
As shown in FIG. 3, the invention adopts two deep neural networks to respectively extract 256-dimensional RGB image features $F_{rgb}$ and 256-dimensional depth image features $F_{d}$. Here, the deep neural network may use a common convolutional neural network, such as DeepLabV3. The features of the RGB image and the depth image are directly fused into the 512-dimensional fused feature $F_{fuse}$:

$$F_{fuse} = [F_{rgb}, F_{d}]$$

where $[\cdot,\cdot]$ denotes the feature concatenation operation. After $F_{rgb}$, $F_{d}$ and $F_{fuse}$ pass through convolution, softmax and up-sampling operations, the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ are obtained. Each is a matrix of dimension $(H, W, K)$ whose elements represent the model's predicted class probabilities for the pixels at the corresponding spatial positions of the RGBD image.
The semantic segmentation method designed in this embodiment trains the semantic segmentation network supervisedly on the source domain $\mathcal{D}_s$. Assume a sample of the source domain $\mathcal{D}_s$ is a labelled pair of RGB and depth images denoted $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$, where $x^{s}_{rgb}$ represents the RGB image, $x^{s}_{d}$ represents the depth image, and $y^{s}$ represents the manually annotated ground-truth label.
The supervised segmentation losses of the RGBD image semantic segmentation model outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ on the sample $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$ can be expressed as:

$$\mathcal{L}^{m}_{seg} = -\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{K} y^{(h,w,c)} \log P^{(h,w,c)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $H$ and $W$ respectively represent the height and width of the RGBD image, $K$ represents the number of semantic label categories, $P^{(h,w,c)}_{m}$ denotes the probability that the matrix $P_{m}$ assigns to semantic class $c$ at image-space position $(h,w)$, and $y^{(h,w,c)}$ denotes the value of the label at image-space position $(h,w)$ for semantic class $c$.
In the RGBD segmentation network designed in this embodiment, the input of the network is an RGB image and a depth image. In the training stage, the outputs of the network are the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$, each of dimension $(H, W, K)$. The three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input sample; therefore, for the same RGBD image, $P_{rgb}$, $P_{d}$ and $P_{fuse}$ should have similar probability distributions.
This embodiment uses the JS divergence (Jensen-Shannon divergence) as the cross-modal loss measure of the difference between different probability distributions. The JS divergence loss between $P_{rgb}$ and $P_{d}$ can be expressed as:

$$\mathcal{L}^{rgb,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{rgb} \,\|\, M\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M\right), \qquad M = \frac{1}{2}\left(P_{rgb} + P_{d}\right)$$

where $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence, which measures the degree of difference between the two probability outputs $P_{d}$ and $P_{rgb}$, computed from the probability values $P^{(h,w,c)}_{rgb}$ and $P^{(h,w,c)}_{d}$ at each image-space position $(h,w)$ for each semantic class $c$.
Similarly, the JS divergence loss between the probability outputs $P_{fuse}$ and $P_{d}$ can be expressed as:

$$\mathcal{L}^{fuse,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{fuse} \,\|\, M'\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M'\right), \qquad M' = \frac{1}{2}\left(P_{fuse} + P_{d}\right)$$
Therefore, the supervised loss of the RGBD image semantic segmentation network on the source domain can be expressed as the sum of cross-entropy losses and cross-modal losses:

$$\mathcal{L}_{sup} = \mathcal{L}^{rgb}_{seg} + \mathcal{L}^{d}_{seg} + \mathcal{L}^{fuse}_{seg} + \lambda_{1}\,\mathcal{L}^{rgb,d}_{js} + \lambda_{2}\,\mathcal{L}^{fuse,d}_{js} \tag{1}$$

where $\lambda_{1}$ and $\lambda_{2}$ represent the weights of the corresponding losses, typically set manually by cross-validation.
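Formula (1) can be assembled in a few lines from the helper sketches given earlier in this document; supervised_loss and js_loss refer to those illustrative functions, and lam1 and lam2 stand for the cross-validated weights:

```python
def source_supervised_objective(p_rgb, p_d, p_fuse, y_onehot, lam1, lam2):
    # formula (1): three cross-entropy terms plus two JS consistency terms
    return (supervised_loss(p_rgb, p_d, p_fuse, y_onehot)
            + lam1 * js_loss(p_rgb, p_d)
            + lam2 * js_loss(p_fuse, p_d))
```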
FIG. 4 illustrates the adversarial generative domain adaptation and the associated loss functions.
This embodiment adopts an adversarial generative approach to realize domain adaptation for RGBD image semantic segmentation.
The adversarial generative learning framework designed in this embodiment comprises one generator and three discriminators based on convolutional neural networks, where the generator is the RGBD semantic segmentation network.
The core idea of this embodiment's adversarial generative scheme is to train the generator, i.e., the RGBD semantic segmentation network. The generator aims to make the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ of a target-domain RGBD image as similar in distribution as possible to the probability maps of the source domain. Each discriminator aims to tell, as well as possible, whether the corresponding weight information map comes from a target-domain or a source-domain sample, i.e., which domain the input sample comes from. These goals oppose each other; the generator and the discriminators improve one another through a continual game until, even with a reliable discriminator, the domain of an input sample can no longer be distinguished.
For the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$, this embodiment designs three discriminators based on convolutional neural networks. The three discriminators are convolutional neural networks with the same network structure; the number of input channels equals the number of semantic labels, and the output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively. First, the weight information maps are respectively calculated from $P_{rgb}$, $P_{d}$ and $P_{fuse}$:

$$I^{(h,w,k)}_{m} = -P^{(h,w,k)}_{m} \log P^{(h,w,k)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $I^{(h,w,k)}_{rgb}$, $I^{(h,w,k)}_{d}$ and $I^{(h,w,k)}_{fuse}$ respectively represent the values of the prediction weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$ at the corresponding position $(h,w,k)$. As described above, inputting the RGBD images of the source domain and the target domain into the semantic segmentation network yields the weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$. The losses of the corresponding discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ can be expressed as:

$$\mathcal{L}_{D_{rgb}}(\theta_{rgb}) = -\frac{1}{N_s}\sum_{n=1}^{N_s} \log D_{rgb}\!\left(I^{s}_{rgb}\right) - \frac{1}{N_t}\sum_{n=1}^{N_t} \log\!\left(1 - D_{rgb}\!\left(I^{t}_{rgb}\right)\right) \tag{2}$$

$$\mathcal{L}_{D_{d}}(\theta_{d}) = -\frac{1}{N_s}\sum_{n=1}^{N_s} \log D_{d}\!\left(I^{s}_{d}\right) - \frac{1}{N_t}\sum_{n=1}^{N_t} \log\!\left(1 - D_{d}\!\left(I^{t}_{d}\right)\right) \tag{3}$$

$$\mathcal{L}_{D_{fuse}}(\theta_{fuse}) = -\frac{1}{N_s}\sum_{n=1}^{N_s} \log D_{fuse}\!\left(I^{s}_{fuse}\right) - \frac{1}{N_t}\sum_{n=1}^{N_t} \log\!\left(1 - D_{fuse}\!\left(I^{t}_{fuse}\right)\right) \tag{4}$$
where $\theta_{rgb}$, $\theta_{d}$ and $\theta_{fuse}$ are respectively the parameters of the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ to be solved; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_{d}}$ and $\mathcal{L}_{D_{fuse}}$ denote the domain cross-entropy losses corresponding to $D_{rgb}$, $D_{d}$ and $D_{fuse}$; $N_s$ and $N_t$ respectively represent the numbers of RGB and depth image pairs used for training in the source domain and the target domain; $(x^{s}_{rgb}, x^{s}_{d})$ represents an RGBD image pair in the source domain and $(x^{t}_{rgb}, x^{t}_{d})$ an RGBD image pair in the target domain; $I^{s}_{rgb}$, $I^{s}_{d}$ and $I^{s}_{fuse}$ respectively represent the $I_{rgb}$, $I_{d}$ and $I_{fuse}$ corresponding to $(x^{s}_{rgb}, x^{s}_{d})$, and $I^{t}_{rgb}$, $I^{t}_{d}$ and $I^{t}_{fuse}$ those corresponding to $(x^{t}_{rgb}, x^{t}_{d})$.
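A minimal fully convolutional discriminator consistent with this description (input channels equal to the number of semantic classes K, binary source/target output) might look as follows; the channel widths and depth are assumptions, since the patent does not fix them:

```python
import torch.nn as nn

def make_discriminator(num_classes, ch=64):
    # input channels = number of semantic classes K; output = source/target logit
    return nn.Sequential(
        nn.Conv2d(num_classes, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(ch * 4, 1, 4, stride=2, padding=1),
    )
```

With binary cross-entropy on logits, the per-location outputs of this fully convolutional design train against constant 0/1 maps, matching the 0 = target, 1 = source convention above.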
The adversarial loss of this embodiment's semantic segmentation network can be expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t}\sum_{n=1}^{N_t}\left[\log D_{rgb}\!\left(I^{t}_{rgb}\right) + \log D_{d}\!\left(I^{t}_{d}\right) + \log D_{fuse}\!\left(I^{t}_{fuse}\right)\right]$$

The objective function of the RGBD image semantic segmentation network can be expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain, i.e.:

$$\min_{\theta}\; \mathcal{L}(\theta) = \mathcal{L}_{sup}(\theta) + \lambda_{adv}\,\mathcal{L}_{adv}(\theta) \tag{5}$$

where $\theta$ denotes the network parameters of the semantic segmentation model that need to be solved, and $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$; the weight is typically set manually by cross-validation.
The invention designs a domain-adaptive RGBD image semantic segmentation method whose training steps mainly comprise:
S1: train the semantic segmentation network on the labelled source domain with formula (1) until convergence or until a fixed number of iterations is reached;
S2: train the discriminators on the source and target domains with formulas (2), (3) and (4);
S3: train the semantic segmentation network on the source and target domains with formula (5);
S4: repeat steps S2 and S3 until a fixed number of iterations is reached, or until the semantic segmentation network converges and the discriminators can no longer correctly distinguish which domain the training data came from.

Claims (9)

1. A cross-modal learning and domain-adaptive RGBD image semantic segmentation method, characterized in that data of two different modalities, an RGB image and a depth image, are used as input to construct a cross-modal image semantic segmentation network, and the image semantic segmentation algorithm trains the semantic segmentation network supervisedly on the source domain $\mathcal{D}_s$; in order to make full use of the data of the two modalities, the JS divergence is adopted to measure the difference between different probability distributions, so that the outputs of the different modalities are as consistent as possible; the method designs a domain-adaptive algorithm based on adversarial generation, obtaining three semantic segmentation probability maps with the semantic segmentation network as the generator; three discriminators based on convolutional neural networks are designed, and the information maps generated from the three semantic segmentation probability maps of the semantic segmentation network are used as the inputs of the discriminators; the goals of the generator and the discriminators contradict each other, and the two improve one another through a continual game until, even with a reliable discriminator, it cannot be distinguished whether an input sample comes from the source domain or the target domain, so that the source domain and the target domain are aligned at the output level and high-precision cross-domain labeling of RGBD data is realized.
2. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the method adopts two deep neural networks to respectively extract 256-dimensional RGB image features $F_{rgb}$ and 256-dimensional depth image features $F_{d}$; the RGB image features and the depth image features are directly fused into the 512-dimensional fused feature $F_{fuse}$:

$$F_{fuse} = [F_{rgb}, F_{d}]$$

where $[\cdot,\cdot]$ denotes the feature concatenation operation; after $F_{rgb}$, $F_{d}$ and $F_{fuse}$ pass through convolution, softmax and up-sampling operations, the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ are obtained; assuming the image input to the semantic segmentation network has height $H$ and width $W$, and the predefined number of semantic categories is $K$, then $P_{rgb}$, $P_{d}$ and $P_{fuse}$ are matrices of dimension $(H, W, K)$ whose elements represent the model's predicted class probabilities for the pixels at the corresponding spatial positions of the RGBD image.
3. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the image semantic segmentation algorithm trains the semantic segmentation network supervisedly on the source domain $\mathcal{D}_s$:
assume a sample of the source domain $\mathcal{D}_s$ is a labelled pair of RGB and depth images denoted $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$, where $x^{s}_{rgb}$ represents the RGB image, $x^{s}_{d}$ represents the depth image, and $y^{s}$ represents the manually annotated ground-truth label;
the supervised segmentation losses of the RGBD image semantic segmentation model outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$ on the sample $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$ can be expressed as:

$$\mathcal{L}^{m}_{seg} = -\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{K} y^{(h,w,c)} \log P^{(h,w,c)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $H$ and $W$ respectively represent the height and width of the RGBD image, and $K$ represents the number of semantic label categories; $P^{(h,w,c)}_{rgb}$, $P^{(h,w,c)}_{d}$ and $P^{(h,w,c)}_{fuse}$ denote the probabilities that the matrices $P_{rgb}$, $P_{d}$ and $P_{fuse}$ assign to semantic class $c$ at image-space position $(h,w)$; $y^{(h,w,c)}$ denotes the value of the label at image-space position $(h,w)$ for semantic class $c$.
4. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 2, characterized in that the JS divergence loss $\mathcal{L}^{rgb,d}_{js}$ between the probability outputs $P_{rgb}$ and $P_{d}$ is expressed as:

$$\mathcal{L}^{rgb,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{rgb} \,\|\, M\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M\right), \qquad M = \frac{1}{2}\left(P_{rgb} + P_{d}\right)$$

where $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence, which measures the degree of difference between the two probability outputs $P_{d}$ and $P_{rgb}$, and $P^{(h,w,c)}_{rgb}$ denotes the probability value that the matrix $P_{rgb}$ assigns to semantic class $c$ at image-space position $(h,w)$.
5. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 2, characterized in that the JS divergence loss $\mathcal{L}^{fuse,d}_{js}$ between the probability outputs $P_{fuse}$ and $P_{d}$ is expressed as:

$$\mathcal{L}^{fuse,d}_{js} = \frac{1}{2}\, D_{KL}\!\left(P_{fuse} \,\|\, M'\right) + \frac{1}{2}\, D_{KL}\!\left(P_{d} \,\|\, M'\right), \qquad M' = \frac{1}{2}\left(P_{fuse} + P_{d}\right)$$

where $H$ and $W$ are respectively the height and width of the RGBD image; $K$ is the number of semantic categories; $P^{(h,w,c)}_{rgb}$ and $P^{(h,w,c)}_{d}$ denote the probabilities that the matrices $P_{rgb}$ and $P_{d}$ assign to semantic class $c$ at image-space position $(h,w)$; and $D_{KL}(\cdot\,\|\,\cdot)$ denotes the KL divergence measuring the degree of difference between two probability outputs.
6. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the input of the cross-modal image semantic segmentation network is an RGB image and a depth image; in the training phase, the outputs of the network are the probability outputs $P_{rgb}$, $P_{d}$ and $P_{fuse}$, each of dimension $(H, W, K)$; the three probability outputs are essentially the distributions over semantic classes that the segmentation network currently predicts for the cross-modal input sample; in the testing stage, the network takes the weighted sum of the corresponding elements of the three probability outputs to obtain the final semantic segmentation result of the RGBD image.
7. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the three discriminators based on convolutional neural networks, $D_{rgb}$, $D_{d}$ and $D_{fuse}$, are convolutional neural networks with the same network structure, their input size is $(H, W, K)$, and their output values are 0 and 1, where 0 and 1 correspond to the target domain and the source domain respectively;
the weight information maps are calculated from the probability maps $P_{rgb}$, $P_{d}$ and $P_{fuse}$ as follows:

$$I^{(h,w,k)}_{m} = -P^{(h,w,k)}_{m} \log P^{(h,w,k)}_{m}, \qquad m \in \{rgb,\, d,\, fuse\}$$

where $I^{(h,w,k)}_{rgb}$, $I^{(h,w,k)}_{d}$ and $I^{(h,w,k)}_{fuse}$ respectively represent the values of the prediction weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$ at the corresponding position $(h,w,k)$; inputting the RGBD images of the source domain and the target domain into the semantic segmentation network and applying this formula yields the weight information maps $I_{rgb}$, $I_{d}$ and $I_{fuse}$;
the losses of the corresponding discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ can be expressed as:

$$\mathcal{L}_{D_{m}}(\theta_{m}) = -\frac{1}{N_s}\sum_{n=1}^{N_s} \log D_{m}\!\left(I^{s}_{m}\right) - \frac{1}{N_t}\sum_{n=1}^{N_t} \log\!\left(1 - D_{m}\!\left(I^{t}_{m}\right)\right), \qquad m \in \{rgb,\, d,\, fuse\}$$

where $\theta_{rgb}$, $\theta_{d}$ and $\theta_{fuse}$ are respectively the parameters of the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ to be solved; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_{d}}$ and $\mathcal{L}_{D_{fuse}}$ denote the domain cross-entropy losses corresponding to $D_{rgb}$, $D_{d}$ and $D_{fuse}$; $N_s$ and $N_t$ respectively represent the numbers of RGB and depth image pairs used for source-domain and target-domain training; $(x^{s}_{rgb}, x^{s}_{d})$ represents an RGBD image pair in the source domain and $(x^{t}_{rgb}, x^{t}_{d})$ an RGBD image pair in the target domain; $I^{s}_{rgb}$, $I^{s}_{d}$ and $I^{s}_{fuse}$ respectively represent the $I_{rgb}$, $I_{d}$ and $I_{fuse}$ corresponding to $(x^{s}_{rgb}, x^{s}_{d})$, and $I^{t}_{rgb}$, $I^{t}_{d}$ and $I^{t}_{fuse}$ those corresponding to $(x^{t}_{rgb}, x^{t}_{d})$.
8. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 6, characterized in that the objective function of the cross-modal image semantic segmentation network is expressed as the sum of the supervised loss on the source domain and the adversarial loss on the target domain:

$$\min_{\theta}\; \mathcal{L}(\theta) = \mathcal{L}_{sup}(\theta) + \lambda_{adv}\,\mathcal{L}_{adv}(\theta)$$

where $\theta$ denotes the network parameters of the semantic segmentation model to be solved; $\lambda_{adv}$ is the weight corresponding to the adversarial loss $\mathcal{L}_{adv}$, typically set manually by cross-validation; and $\mathcal{L}_{sup}$ is the supervised loss of the RGBD image semantic segmentation network on the source domain;
the supervised loss of the RGBD image semantic segmentation network on the source domain is expressed as the sum of cross-entropy losses and cross-modal losses:

$$\mathcal{L}_{sup} = \mathcal{L}^{rgb}_{seg} + \mathcal{L}^{d}_{seg} + \mathcal{L}^{fuse}_{seg} + \lambda_{1}\,\mathcal{L}^{rgb,d}_{js} + \lambda_{2}\,\mathcal{L}^{fuse,d}_{js}$$

where $\lambda_{1}$ and $\lambda_{2}$ represent the weights of the corresponding losses, typically set manually by cross-validation; $\mathcal{L}^{rgb,d}_{js}$ denotes the JS divergence loss between the probability outputs $P_{rgb}$ and $P_{d}$; $\mathcal{L}^{fuse,d}_{js}$ denotes the JS divergence loss between the probability outputs $P_{fuse}$ and $P_{d}$; and $\mathcal{L}^{rgb}_{seg}$, $\mathcal{L}^{d}_{seg}$ and $\mathcal{L}^{fuse}_{seg}$ are respectively the supervised segmentation losses of $P_{rgb}$, $P_{d}$ and $P_{fuse}$ on the sample $(x^{s}_{rgb}, x^{s}_{d}, y^{s})$;
the adversarial loss $\mathcal{L}_{adv}$ of the semantic segmentation network is expressed as:

$$\mathcal{L}_{adv} = -\frac{1}{N_t}\sum_{n=1}^{N_t}\left[\log D_{rgb}\!\left(I^{t}_{rgb}\right) + \log D_{d}\!\left(I^{t}_{d}\right) + \log D_{fuse}\!\left(I^{t}_{fuse}\right)\right]$$

where $I^{s}_{rgb}$, $I^{s}_{d}$ and $I^{s}_{fuse}$ respectively represent the weight information maps corresponding to $(x^{s}_{rgb}, x^{s}_{d})$, and $I^{t}_{rgb}$, $I^{t}_{d}$ and $I^{t}_{fuse}$ those corresponding to $(x^{t}_{rgb}, x^{t}_{d})$; $\mathcal{L}_{D_{rgb}}$, $\mathcal{L}_{D_{d}}$ and $\mathcal{L}_{D_{fuse}}$ denote the domain cross-entropy losses corresponding to the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$; and $N_t$ represents the number of RGB and depth image pairs used for target-domain training.
9. The cross-modal learning and domain-adaptive RGBD image semantic segmentation method according to claim 1, characterized in that the training steps comprise:
S1: on the labelled source domain, train the semantic segmentation network with the supervised source-domain loss of the RGBD image semantic segmentation network until convergence or until a fixed number of iterations is reached;
S2: on the source and target domains, train the discriminators $D_{rgb}$, $D_{d}$ and $D_{fuse}$ with their loss formulas;
S3: on the source and target domains, train the semantic segmentation network with the objective function of the RGBD image semantic segmentation network;
S4: repeat steps S2 and S3 until a fixed number of iterations is reached, or until the semantic segmentation network converges and the discriminators can no longer correctly distinguish which domain the training data came from.
CN202210328137.4A 2022-03-31 2022-03-31 Cross-modal learning and domain-adaptive RGBD image semantic segmentation method Active CN114419323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210328137.4A CN114419323B (en) Cross-modal learning and domain-adaptive RGBD image semantic segmentation method


Publications (2)

Publication Number Publication Date
CN114419323A (en) 2022-04-29
CN114419323B (en) 2022-06-24

Family

ID=81262781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210328137.4A Active CN114419323B (en) Cross-modal learning and domain-adaptive RGBD image semantic segmentation method

Country Status (1)

Country Link
CN (1) CN114419323B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
WO2019137915A1 (en) * 2018-01-09 2019-07-18 Connaught Electronics Ltd. Generating input data for a convolutional neuronal network
CN109712105A (en) * 2018-12-24 2019-05-03 浙江大学 A kind of image well-marked target detection method of combination colour and depth information
CN111832592A (en) * 2019-04-20 2020-10-27 南开大学 RGBD significance detection method and related device
CN111340814A (en) * 2020-03-03 2020-06-26 北京工业大学 Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN112233124A (en) * 2020-10-14 2021-01-15 华东交通大学 Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning
CN113627433A (en) * 2021-06-18 2021-11-09 中国科学院自动化研究所 Cross-domain self-adaptive semantic segmentation method and device based on data disturbance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZIQIANG ZHENG et al.: "Instance Map Based Image Synthesis With a Denoising Generative Adversarial Network", IEEE Access *
LI Xiaoyang et al.: "RGBD image co-segmentation algorithm combining saliency detection and graph cut", Journal of System Simulation *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272681A (en) * 2022-09-22 2022-11-01 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN115272681B (en) * 2022-09-22 2022-12-20 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN116051830A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method
CN116051830B (en) * 2022-12-20 2023-06-20 中国科学院空天信息创新研究院 Cross-modal data fusion-oriented contrast semantic segmentation method
CN115797642A (en) * 2023-02-13 2023-03-14 华东交通大学 Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field
CN117036891A (en) * 2023-08-22 2023-11-10 睿尔曼智能科技(北京)有限公司 Cross-modal feature fusion-based image recognition method and system
CN117036891B (en) * 2023-08-22 2024-03-29 睿尔曼智能科技(北京)有限公司 Cross-modal feature fusion-based image recognition method and system

Also Published As

Publication number Publication date
CN114419323B (en) 2022-06-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant