CN113221902B

CN113221902B - Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion

Info

Publication number: CN113221902B
Application number: CN202110511220.0A
Authority: CN
Inventors: 张兆翔; 宋纯锋; 王玉玺
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2021-10-15
Anticipated expiration: 2041-05-11
Also published as: CN113221902A

Abstract

The invention relates to a cross-domain adaptive semantic segmentation method and system based on data distribution expansion. The method comprises the following steps: acquiring training data sets from different domains; performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image; performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information; obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information; performing data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image; determining a first semantic segmentation loss model, a first adversarial loss function, a second adversarial loss function and a semantic consistency loss function; and determining a second semantic segmentation loss model. Based on the second semantic segmentation loss model, the image to be processed can be segmented accurately, which improves the segmentation precision.

Description

Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion

Technical Field

The invention relates to the technical field of computer vision and pattern recognition, in particular to a cross-domain self-adaptive semantic segmentation method and a system based on data distribution expansion.

Background

Domain adaptation, a branch of transfer learning, is an important and challenging task in machine learning and is widely applied in image recognition, object detection, image semantic segmentation and other fields. In the era of big data, a large amount of data is generated every day, but labeled data usable for machine learning are difficult to obtain. Some labels require time-consuming fine annotation, such as pixel-level semantic segmentation labels; some require annotators with sufficient professional knowledge and experience, such as labels for medical images; and some data are hard to label accurately simply because the volume of data is extremely large. Therefore, using existing labeled samples to transfer the learned knowledge to new data is a task of great practical value.

Disclosure of Invention

In order to solve the above problem in the prior art, namely to improve semantic segmentation precision, the present invention provides a cross-domain adaptive semantic segmentation method and system based on data distribution expansion.

In order to solve the technical problems, the invention provides the following scheme:

A cross-domain adaptive semantic segmentation method based on data distribution expansion comprises the following steps:

acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;

performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;

performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;

obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;

performing data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;

determining a first semantic segmentation loss model of the source domain according to each source domain image and the corresponding label;

constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;

constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;

determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;

and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.

Optionally, obtaining the converted image through the inverse Fourier transform according to the high-frequency information and the low-frequency information specifically includes:

combining the high-frequency information with the low-frequency information to obtain combined information;

and performing an inverse Fourier transform on the combined information to obtain the converted image.

Optionally, performing data enhancement on the source domain image and the target domain image based on the converted image to obtain the source domain expanded image and the target domain expanded image specifically includes:

determining a data enhancement sequence through the converted image according to a set amplitude;

and expanding the source domain image and the target domain image respectively through the data enhancement sequence to obtain the corresponding source domain expanded image and target domain expanded image.

Optionally, the first semantic segmentation loss model L_seg(x_s, y_s) of the source domain is determined according to the following formula:

wherein H represents the length of the source domain image, W represents the width of the source domain image, and C represents the number of categories of the source domain image; h, w denote the pixel position and c denotes the pixel class; x_s represents a source domain image and y_s represents the label data corresponding to the source domain image x_s; the label term represents the value of class c at position (h, w), and the prediction term represents the prediction result at position (h, w).
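The formula for L_seg is not reproduced in this text. A minimal sketch, assuming the standard pixel-wise cross-entropy, is given below. The symbols y_s^{(h,w,c)} (label value of class c at position (h, w)) and P_s^{(h,w,c)} (predicted probability of class c at position (h, w)) are assumed notation chosen to match the definitions above, and any normalization factor is left out because the text does not state one.

```latex
L_{seg}(x_s, y_s) = -\sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{C}
    y_s^{(h,w,c)} \, \log P_s^{(h,w,c)}
```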

Optionally, the first adversarial loss function L_adv(x_s, x_t) is determined according to the following formula:

wherein D represents the domain discriminator function, x_s represents a source domain image, and x_t represents a target domain image; the remaining terms represent, respectively, the expectation over the source domain images x_s, the expectation over the target domain images x_t, the prediction result for the source domain image x_s, and the prediction result for the target domain image x_t.
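The formula for L_adv is likewise not reproduced in this text. One form consistent with the listed terms, assuming the usual GAN-style objective on segmentation outputs, is sketched below. P_s and P_t are assumed symbols for the predictions on x_s and x_t, and which domain is paired with log D versus log(1 - D) is an assumption.

```latex
L_{adv}(x_s, x_t) = \mathbb{E}_{x_s}\big[\log D(P_s)\big]
                  + \mathbb{E}_{x_t}\big[\log\big(1 - D(P_t)\big)\big]
```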

Optionally, the second adversarial loss function is determined according to the following formula:

wherein D represents the domain discriminator function; the remaining terms represent, respectively, the source domain expanded image, the target domain expanded image, the expectation over the source domain expanded images, the expectation over the target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.
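Under the same assumptions as for the first adversarial loss, the second adversarial loss might take the same form with the expanded images substituted. The hatted symbols below (expanded images and their predictions) are assumed notation, since the original symbols did not survive in this text.

```latex
L_{adv}(\hat{x}_s, \hat{x}_t) = \mathbb{E}_{\hat{x}_s}\big[\log D(\hat{P}_s)\big]
                              + \mathbb{E}_{\hat{x}_t}\big[\log\big(1 - D(\hat{P}_t)\big)\big]
```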

Optionally, the semantic consistency loss function is determined according to the following formula:

wherein x_t represents the target domain image; the remaining terms represent, respectively, the target domain expanded image, the prediction result for the target domain image x_t, and the prediction result for the target domain expanded image; and D_KL(.) represents the KL divergence.
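The consistency formula is also missing from this text. Given that it is defined through the KL divergence between the two predictions, a plausible sketch is the following. The name L_con, the symbols P_t and \hat{P}_t, and the direction of the divergence are all assumptions.

```latex
L_{con}(x_t, \hat{x}_t) = D_{KL}\big(P_t \,\|\, \hat{P}_t\big)
```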

In order to solve the technical problems, the invention also provides the following scheme:

A cross-domain adaptive semantic segmentation system based on data distribution expansion, the cross-domain adaptive semantic segmentation system comprising:

an acquisition unit, configured to acquire training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;

a frequency domain transformation unit, configured to perform a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;

a Gaussian filtering unit, configured to perform high-frequency filtering on the source frequency domain image to obtain high-frequency information, and to perform low-frequency filtering on the target frequency domain image to obtain low-frequency information;

a spatial domain transformation unit, configured to obtain a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;

a data enhancement unit, configured to perform data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;

a first modeling unit, configured to determine a first semantic segmentation loss model of the source domain according to each source domain image and the corresponding label;

an adversarial function establishing unit, configured to construct, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image, and to construct a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;

a semantic consistency loss function establishing unit, configured to construct a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;

a second modeling unit, configured to determine a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;

and a semantic segmentation unit, configured to perform semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.

a cross-domain adaptive semantic segmentation system based on data distribution expansion, comprising:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:

According to the embodiment of the invention, the invention discloses the following technical effects:

The invention performs distribution expansion on the images of the source domain and the target domain, and establishes a first adversarial loss function and a second adversarial loss function in adversarial learning; for the unlabeled target domain data, a semantic consistency loss is adopted so that the features of the target domain are better learned in an unsupervised manner. The invention addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, thereby improving the semantic segmentation precision for unlabeled images to be processed.

Drawings

FIG. 1 is a flow chart of a cross-domain adaptive semantic segmentation method based on data distribution expansion according to the present invention;

FIG. 2 is a block diagram of a cross-domain adaptive semantic segmentation system based on data distribution expansion;

Description of the reference numerals:

acquisition unit 1, frequency domain transformation unit 2, Gaussian filtering unit 3, spatial domain transformation unit 4, data enhancement unit 5, first modeling unit 6, adversarial function establishing unit 7, semantic consistency loss function establishing unit 8, second modeling unit 9, and semantic segmentation unit 10.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

The invention aims to provide a cross-domain adaptive semantic segmentation method based on data distribution expansion, which performs distribution expansion on the images of the source domain and the target domain and establishes a first adversarial loss function and a second adversarial loss function in adversarial learning; for the unlabeled target domain data, a semantic consistency loss is adopted so that the features of the target domain are better learned in an unsupervised manner. The invention addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, thereby improving the semantic segmentation precision for unlabeled images to be processed.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in fig. 1, the cross-domain adaptive semantic segmentation method based on data distribution expansion of the present invention includes:

Step 100: acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;

Step 200: performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;

Step 300: performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;

Step 400: obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;

Step 500: performing data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;

Step 600: determining a first semantic segmentation loss model of the source domain according to each source domain image and the corresponding label;

Step 700: constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;

Step 800: constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;

Step 900: determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;

Step 1000: performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.

Between step 100 and step 200, the cross-domain adaptive semantic segmentation method based on data distribution expansion further includes: normalizing the training data set to obtain a training data set of uniform size.

For example, in the present embodiment, the normalization processing results in image data of 512 × 1024 × 3 pixels.
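As an illustration of bringing images to a uniform size, the following minimal sketch resizes an image to 512 × 1024 × 3 with nearest-neighbour sampling. The interpolation method and the function name `resize_nearest` are assumptions; the text only states the target size.

```python
import numpy as np

def resize_nearest(img, out_h=512, out_w=1024):
    """Resize an (H, W, C) image to (out_h, out_w, C) by nearest-neighbour sampling."""
    h, w = img.shape[:2]
    # Map every output row/column index back to a source row/column index.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]
```

A practical pipeline would more likely use a library resize with bilinear interpolation for images and nearest-neighbour sampling for label maps, so that class ids are never blended.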

In the present invention, the data distributions of the source domain image and the target domain image are expanded at the data input stage. Specifically, the data distribution of the source domain is first translated towards the distribution of the target domain (step 400); random data expansion is then performed on the translated distribution (step 500), which enlarges the data distribution space of the source domain and the target domain so that the two distributions can be better aligned.

In step 400, obtaining the converted image through the inverse Fourier transform according to the high-frequency information and the low-frequency information specifically includes:

Step 410: combining the high-frequency information with the low-frequency information to obtain combined information;

Step 420: performing an inverse Fourier transform on the combined information to obtain the converted image.

Specifically, the converted image x′_st is obtained according to the following formula:

x′_st = F^-1([g^l(σ₁)·F(x_t) + g^h(σ₂)·F(x_s)]);

wherein F(·) represents the Fourier transform, F^-1(·) represents the inverse Fourier transform, g^l(σ₁) represents the low-frequency filter function, σ₁ represents the low-frequency filter reference coefficient, g^h(σ₂) represents the high-frequency filter function, σ₂ represents the high-frequency filter reference coefficient, x_s represents a source domain image, and x_t represents a target domain image.

The converted image x′_st retains the content information of the source domain and carries the style information of the target domain, which shifts the source domain image distribution towards the target domain in preparation for further aligning the source and target domain distributions.
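The frequency-domain mixing of steps 200 to 400 can be sketched with NumPy as follows. This is a minimal illustration rather than the patented implementation: the Gaussian shape of the masks is suggested by the Gaussian filtering unit but its exact form is an assumption, as are the function names, the default values of `sigma_low` and `sigma_high`, and the choice of the high-pass mask as one minus a Gaussian low-pass mask.

```python
import numpy as np

def gaussian_lowpass(shape, sigma):
    """Centered Gaussian low-pass mask on an (H, W) normalized frequency grid."""
    h, w = shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))

def convert_image(x_s, x_t, sigma_low=0.05, sigma_high=0.05):
    """Mix low frequencies of x_t with high frequencies of x_s, channel by channel.

    x_s, x_t: float arrays of identical shape (H, W, C).
    """
    g_low = gaussian_lowpass(x_s.shape[:2], sigma_low)            # g^l(sigma_1)
    g_high = 1.0 - gaussian_lowpass(x_s.shape[:2], sigma_high)    # g^h(sigma_2), assumed form
    out = np.empty_like(x_s, dtype=np.float64)
    for c in range(x_s.shape[2]):
        f_s = np.fft.fftshift(np.fft.fft2(x_s[..., c]))           # F(x_s)
        f_t = np.fft.fftshift(np.fft.fft2(x_t[..., c]))           # F(x_t)
        mixed = g_low * f_t + g_high * f_s                        # combined information
        out[..., c] = np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))  # F^-1
    return out
```

With these masks the zero-frequency component comes entirely from the target image, so the converted image inherits the per-channel mean intensity of x_t while edges and fine structure come from x_s.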

In step 500, performing data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image specifically includes:

Step 510: determining a data enhancement sequence through the converted image according to a set amplitude.

The data enhancement sequence T is:

T = {o₁(λ₁; p), o₂(λ₂; p), ..., o_N(λ_N; p)};

wherein N represents the dimension, o₁(·) denotes a data enhancement operation, λ₁ represents the amplitude of operation o₁(·), and p represents a parameter of operation o₁(·).

Step 520: expanding the source domain image and the target domain image respectively through the data enhancement sequence to obtain the corresponding source domain expanded image and target domain expanded image.
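A data enhancement sequence of this kind can be sketched as a list of (operation, amplitude) pairs drawn from a bounded space. Everything specific below is an assumption made for illustration: the three photometric operations, the uniform sampling of amplitudes, and the reading of p as the probability of applying each operation. The text does not detail how the converted image informs the sequence, so the sketch simply samples at random.

```python
import numpy as np

# Hypothetical operations o_i; each takes an image in [0, 1] and an amplitude lam.
def brightness(img, lam):
    return np.clip(img + lam, 0.0, 1.0)

def contrast(img, lam):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + lam) + mean, 0.0, 1.0)

def gamma(img, lam):
    return np.clip(img, 0.0, 1.0) ** (1.0 + lam)

OPS = {"brightness": brightness, "contrast": contrast, "gamma": gamma}

def sample_sequence(n_ops, max_amplitude, rng):
    """Draw T = [(o_1, lam_1), ..., (o_N, lam_N)] with |lam_i| <= max_amplitude."""
    names = rng.choice(list(OPS), size=n_ops)
    lams = rng.uniform(-max_amplitude, max_amplitude, size=n_ops)
    return [(str(n), float(l)) for n, l in zip(names, lams)]

def apply_sequence(img, sequence, p, rng):
    """Apply the operations in order, each one with probability p (assumed meaning)."""
    out = img
    for name, lam in sequence:
        if rng.random() < p:
            out = OPS[name](out, lam)
    return out
```

Only photometric operations are used here, so the same sequence can be applied to a source image without touching its label map; geometric operations would require transforming the labels as well.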

Further, in step 600, the first semantic segmentation loss model L_seg(x_s, y_s) of the source domain may be determined according to the following formula:

wherein the label term represents the value of class c at position (h, w), and the prediction term represents the prediction result at position (h, w).
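Since the formula is not reproduced in this text, the sketch below assumes the standard pixel-wise cross-entropy between the one-hot label and the predicted class probabilities, summed over all positions and classes. The function name, the integer label encoding and the absence of a normalization factor are assumptions.

```python
import numpy as np

def segmentation_loss(probs, labels, eps=1e-12):
    """Pixel-wise cross-entropy summed over H, W and C.

    probs:  (H, W, C) predicted class probabilities (softmax output).
    labels: (H, W) integer class ids in [0, C).
    """
    num_classes = probs.shape[2]
    one_hot = np.eye(num_classes)[labels]          # (H, W, C) label values
    return float(-(one_hot * np.log(probs + eps)).sum())
```

For a uniform prediction over C classes the loss equals H·W·log C, which is a convenient sanity check at the start of training.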

The method is based on an adversarial learning framework. Discriminative learning is performed on the source domain image and the target domain image at the output layer so that the source and target domain distributions become more consistent, and adversarial learning is simultaneously performed on the enhanced source domain expanded image and target domain expanded image so that domain-invariant features are better learned.

Specifically, in step 700, the first adversarial loss function L_adv(x_s, x_t) may be determined according to the following formula:

wherein the terms of the formula include the expectation over the source domain images x_s and the prediction result for the target domain image x_t.

The second adversarial loss function is determined according to the following formula:

wherein D represents the domain discriminator function; the remaining terms represent, respectively, the source domain expanded image, the target domain expanded image, the expectation over the source domain expanded images, the expectation over the target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.
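Both adversarial losses share one structure and differ only in whether the discriminator sees predictions for the original or for the expanded images. The sketch below assumes the usual GAN-style objective, E[log D(P_s)] + E[log(1 - D(P_t))]; the exact form, including which domain is paired with which term, is an assumption because the formulas are missing from this text, and `adversarial_loss` is a hypothetical name.

```python
import numpy as np

def adversarial_loss(d_src, d_tgt, eps=1e-12):
    """Assumed form: E[log D(P_s)] + E[log(1 - D(P_t))].

    d_src: discriminator outputs in (0, 1) for source-domain predictions.
    d_tgt: discriminator outputs in (0, 1) for target-domain predictions.
    """
    d_src = np.asarray(d_src, dtype=float)
    d_tgt = np.asarray(d_tgt, dtype=float)
    return float(np.mean(np.log(d_src + eps)) + np.mean(np.log(1.0 - d_tgt + eps)))
```

Calling the function once with discriminator outputs for the original image pair and once with outputs for the expanded image pair would give the first and second adversarial losses respectively. Under this form, the discriminator is trained to increase the value while the segmentation network is trained to decrease it.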

For the unlabeled target domain data, the adopted data enhancement strategy only expands the space of the data distribution and does not change the distribution of the data content. The target domain expanded image therefore remains semantically consistent with the target domain image after the transformation, and the features of the target domain are better learned in an unsupervised manner.

Specifically, in step 800, the semantic consistency loss function may be determined according to the following formula:

wherein x_t represents the target domain image; the remaining terms represent, respectively, the target domain expanded image and the prediction result for the target domain expanded image; and D_KL(.) represents the KL divergence.
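A KL-divergence consistency term between the prediction for a target image and the prediction for its expanded version can be sketched as follows. The direction of the divergence, the averaging over pixels and the function name are assumptions, since the formula itself is not reproduced in this text.

```python
import numpy as np

def semantic_consistency_loss(p_t, p_t_aug, eps=1e-12):
    """Mean over pixels of D_KL(P_t || P_t_aug).

    p_t, p_t_aug: (H, W, C) class-probability maps for the target image
    and for its expanded (augmented) counterpart.
    """
    p = np.clip(p_t, eps, 1.0)
    q = np.clip(p_t_aug, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

The loss is zero when both predictions agree at every pixel and grows as the expanded image's prediction drifts away, which is what lets it act as a training signal without target labels.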

Preferably, the model can be trained through the gradient back-propagation algorithm, and the target domain data set is tested with the trained second semantic segmentation loss model of the target domain. First, the target domain test data are normalized to a uniform size (for example, 512 × 1024 × 3 pixels) and input into the trained semantic segmentation model to obtain the segmentation result of the target domain data; the segmentation result is then compared with the test data labels to determine the performance of the second semantic segmentation loss model of the target domain.

The method performs distribution expansion on the source domain data and the target domain data at the data input stage, which comprises two aspects: first, the source domain data distribution is moved towards the target domain through distribution translation to reduce the distribution difference between the two domains; second, a data enhancement sequence is randomly generated from a defined, constrained data enhancement space to enlarge the distribution space of the source domain and target domain data, so that the two distributions are better aligned. In addition, the source domain and the target domain are further aligned at the output level through an adversarial learning strategy, which covers both the alignment of the original images and the alignment of the enhanced images. Finally, for the unlabeled target domain data, a semantic consistency loss is adopted so that the features of the target domain are better learned in an unsupervised manner. The method thus addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, aligns the distributions of the two domains at the input layer through distribution translation and data enhancement, and achieves excellent adaptation performance under an adversarial learning framework.

In addition, the invention also provides a cross-domain self-adaptive semantic segmentation system based on data distribution expansion, which can improve the semantic segmentation precision.

As shown in fig. 2, the cross-domain adaptive semantic segmentation system based on data distribution expansion of the present invention includes an acquisition unit 1, a frequency domain transformation unit 2, a Gaussian filtering unit 3, a spatial domain transformation unit 4, a data enhancement unit 5, a first modeling unit 6, an adversarial function establishing unit 7, a semantic consistency loss function establishing unit 8, a second modeling unit 9, and a semantic segmentation unit 10.

Specifically, the acquisition unit 1 is configured to acquire training data sets from different domains, where the training data sets include a plurality of labeled source domain images and a plurality of unlabeled target domain images;

the frequency domain transformation unit 2 is configured to perform a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;

the Gaussian filtering unit 3 is configured to perform high-frequency filtering on the source frequency domain image to obtain high-frequency information, and to perform low-frequency filtering on the target frequency domain image to obtain low-frequency information;

the spatial domain transformation unit 4 is configured to obtain a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;

the data enhancement unit 5 is configured to perform data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;

the first modeling unit 6 is configured to determine a first semantic segmentation loss model of the source domain according to each source domain image and the corresponding label;

the adversarial function establishing unit 7 is configured to construct, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image, and to construct a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;

the semantic consistency loss function establishing unit 8 is configured to construct a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;

the second modeling unit 9 is configured to determine a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function, and the first semantic segmentation loss model of the source domain;

the semantic segmentation unit 10 is configured to perform semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.

In addition, the invention also provides the following scheme:

a processor; and

Further, the invention also provides the following scheme:

Compared with the prior art, the cross-domain adaptive semantic segmentation system based on data distribution expansion and the computer-readable storage medium have the same beneficial effects as the cross-domain adaptive semantic segmentation method based on data distribution expansion, and are not repeated herein.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A cross-domain adaptive semantic segmentation method based on data distribution expansion is characterized by comprising the following steps:

2. The method according to claim 1, wherein obtaining the converted image through the inverse Fourier transform according to the high-frequency information and the low-frequency information specifically comprises:

3. The method according to claim 1, wherein performing data enhancement on the source domain image and the target domain image based on the converted image to obtain the source domain expanded image and the target domain expanded image specifically comprises:

4. The method of claim 1, wherein the first semantic segmentation loss model L_seg(x_s, y_s) of the source domain is determined according to the following formula:

wherein the label term represents the value of class c at position (h, w), and the prediction term represents the prediction result at position (h, w).

5. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the first adversarial loss function L_adv(x_s, x_t) is determined according to the following formula:

wherein the terms of the formula include the expectation over the source domain images x_s and the prediction result for the target domain image x_t.

6. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the second adversarial loss function is determined according to the following formula:

wherein D represents the domain discriminator function; the remaining terms represent, respectively, the source domain expanded image, the target domain expanded image, the expectation over the source domain expanded images, the expectation over the target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.

7. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the semantic consistency loss function is determined according to the following formula:

wherein x_t represents the target domain image; the remaining terms represent, respectively, the target domain expanded image and the prediction result for the target domain expanded image; and D_KL(.) represents the KL divergence.

8. A cross-domain adaptive semantic segmentation system based on data distribution expansion, the cross-domain adaptive semantic segmentation system comprising:

9. A cross-domain adaptive semantic segmentation system based on data distribution expansion, comprising:

a processor; and

10. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to: