CN113221902B - Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion - Google Patents
- Publication number
- CN113221902B (application CN202110511220.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- domain
- source
- target
- semantic segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/168—Segmentation; Edge detection involving transform domain methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion. The method comprises: acquiring training data sets from different domains; performing a Fourier transform on the source domain image and the target domain image respectively to obtain a corresponding source frequency domain image and a corresponding target frequency domain image; performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information; obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information; based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image; determining a first semantic segmentation loss model, a first adversarial loss function, a second adversarial loss function and a semantic consistency loss function; and determining a second semantic segmentation loss model. Based on the second semantic segmentation loss model, accurate semantic segmentation can be performed on the image to be processed, and the segmentation precision is improved.
Description
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, in particular to a cross-domain self-adaptive semantic segmentation method and a system based on data distribution expansion.
Background
Domain adaptation, a branch of transfer learning, is an important and challenging task in machine learning and is widely applied in image recognition, object detection, image semantic segmentation and related fields. In the big-data era a large amount of data is generated every day, but labeled data usable for machine learning is difficult to obtain: some labels require time-consuming fine annotation, such as pixel-level semantic segmentation labels; some require annotators with sufficient professional knowledge and experience, such as labels for medical images; and some data sets are so large that they are difficult to label and the resulting labels have low precision. How to use existing labeled samples to transfer the knowledge learned from that data to new data is therefore a highly practical task.
Disclosure of Invention
In order to solve the above problems in the prior art, i.e. to improve the semantic segmentation precision, the present invention aims to provide a cross-domain adaptive semantic segmentation method and system based on data distribution expansion.
In order to solve the technical problems, the invention provides the following scheme:
a cross-domain adaptive semantic segmentation method based on data distribution expansion comprises the following steps:
acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of a source domain image and a target domain image; constructing a second adversarial loss function according to each pair of a source domain expanded image and a target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
Optionally, the obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information specifically includes:
combining the high-frequency information with the low-frequency information to obtain combined information;
and performing an inverse Fourier transform on the combined information to obtain the converted image.
Optionally, the performing, based on the converted image, data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image specifically includes:
determining a data enhancement sequence through the converted image according to a set amplitude value;
and respectively expanding the source domain image and the target domain image through the data enhancement sequence to obtain the corresponding source domain expanded image and target domain expanded image.
Optionally, the first semantic segmentation loss model $L_{seg}(x_s, y_s)$ of the source domain is determined by a formula in which H represents the height of the source domain image, W represents the width of the source domain image, and C represents the number of categories of the source domain image; (h, w) denotes a pixel position, c denotes a pixel class, $x_s$ represents a source domain image, and $y_s$ represents the label data corresponding to the source domain image $x_s$; the label term represents the value of class c at position (h, w), and the prediction term represents the prediction result at position (h, w).
Optionally, the first adversarial loss function $L_{adv}(x_s, x_t)$ is determined by a formula in which D represents a domain discriminator function, $x_s$ represents a source domain image, and $x_t$ represents a target domain image; the remaining terms represent, respectively, the expectation over the source domain images $x_s$, the expectation over the target domain images $x_t$, the prediction result for the source domain image $x_s$, and the prediction result for the target domain image $x_t$.
Optionally, the second adversarial loss function is determined by a formula in which D represents a domain discriminator function; the remaining terms represent, respectively, the source domain expanded image, the target domain expanded image, the expectation over the source domain expanded images, the expectation over the target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.
Optionally, the semantic consistency loss function is determined by a formula in which $x_t$ represents a target domain image and $D_{KL}(\cdot)$ represents the KL divergence; the remaining terms represent, respectively, the target domain expanded image, the prediction result for the target domain image $x_t$, and the prediction result for the target domain expanded image.
In order to solve the technical problems, the invention also provides the following scheme:
a cross-domain adaptive semantic segmentation system based on data distribution augmentation, the cross-domain adaptive semantic segmentation system comprising:
an acquisition unit, used for acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
a frequency domain transformation unit, used for performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
a Gaussian filtering unit, used for performing high-frequency filtering on the source frequency domain image to obtain high-frequency information, and performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
a spatial domain transformation unit, used for obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
a data enhancement unit, used for performing data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;
a first modeling unit, used for determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
an adversarial function establishing unit, used for constructing, based on a domain discriminator, a first adversarial loss function according to each pair of a source domain image and a target domain image, and constructing a second adversarial loss function according to each pair of a source domain expanded image and a target domain expanded image;
a semantic consistency loss function establishing unit, used for constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
a second modeling unit, used for determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and a semantic segmentation unit, used for performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
In order to solve the technical problems, the invention also provides the following scheme:
a cross-domain adaptive semantic segmentation system based on data distribution expansion, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of a source domain image and a target domain image; constructing a second adversarial loss function according to each pair of a source domain expanded image and a target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
In order to solve the technical problems, the invention also provides the following scheme:
a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of a source domain image and a target domain image; constructing a second adversarial loss function according to each pair of a source domain expanded image and a target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
According to the embodiment of the invention, the invention discloses the following technical effects:
The invention performs distribution expansion on the images of the source domain and the target domain, and establishes a first adversarial loss function and a second adversarial loss function in adversarial learning; for the unlabeled target domain data, a semantic consistency loss is adopted, so that the features of the target domain are better learned in an unsupervised manner. The invention addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, thereby improving the semantic segmentation precision for unlabeled images to be processed.
Drawings
FIG. 1 is a flow chart of a cross-domain adaptive semantic segmentation method based on data distribution expansion according to the present invention;
FIG. 2 is a block diagram of a cross-domain adaptive semantic segmentation system based on data distribution expansion;
Description of the symbols:
acquisition unit-1, frequency domain transformation unit-2, Gaussian filtering unit-3, spatial domain transformation unit-4, data enhancement unit-5, first modeling unit-6, adversarial function establishing unit-7, semantic consistency loss function establishing unit-8, second modeling unit-9 and semantic segmentation unit-10.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The invention aims to provide a cross-domain self-adaptive semantic segmentation method based on data distribution expansion, which performs distribution expansion on the images of the source domain and the target domain and establishes a first adversarial loss function and a second adversarial loss function in adversarial learning; for the unlabeled target domain data, a semantic consistency loss is adopted, so that the features of the target domain are better learned in an unsupervised manner. The invention addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, thereby improving the semantic segmentation precision for unlabeled images to be processed.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the cross-domain adaptive semantic segmentation method based on data distribution expansion of the present invention includes:
step 100: acquiring training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
step 200: performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
step 300: performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
step 400: obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
step 500: based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
step 600: determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
step 700: constructing, based on a domain discriminator, a first adversarial loss function according to each pair of a source domain image and a target domain image; constructing a second adversarial loss function according to each pair of a source domain expanded image and a target domain expanded image;
step 800: constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
step 900: determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
step 1000: performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
After step 100 and before step 200, the cross-domain adaptive semantic segmentation method based on data distribution expansion further includes: performing normalization processing on the training data set to obtain a training data set of uniform size.
For example, in the present embodiment, the normalization processing results in image data of 512 × 1024 × 3 pixels.
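The size normalization can be illustrated with a minimal sketch. The text does not state which interpolation is used, so nearest-neighbour sampling is an assumption here, and the function name `resize_nearest` is hypothetical.

```python
import numpy as np

def resize_nearest(image: np.ndarray, out_h: int = 512, out_w: int = 1024) -> np.ndarray:
    """Resize an H x W x C image to out_h x out_w x C by nearest-neighbour sampling.

    The interpolation method is an assumption; the source only specifies
    the uniform output size (512 x 1024 x 3 in the described embodiment).
    """
    in_h, in_w = image.shape[:2]
    # Map every output row/column index back to an input row/column index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]
```

Applying the same function to every source and target image gives a training set of one common shape.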
In the present invention, in the data input stage, the data distribution of the source domain image and the target domain image is expanded, and the specific method includes firstly translating the data distribution of the source domain toward the distribution direction of the target domain (as in step 400), and secondly performing random data expansion on the translated distribution, so as to expand the data distribution space of the source domain and the target domain, thereby enabling the distribution of the data of the source domain and the target domain to be better aligned (as in step 500).
In step 400, obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information specifically includes:
step 410: combining the high-frequency information with the low-frequency information to obtain combined information;
step 420: performing an inverse Fourier transform on the combined information to obtain the converted image.
Specifically, the converted image $x'_{st}$ is obtained according to the following formula:
$x'_{st} = F^{-1}\left(g_l(\sigma_1) \cdot F(x_t) + g_h(\sigma_2) \cdot F(x_s)\right)$;
Wherein $F(\cdot)$ represents the Fourier transform, $F^{-1}(\cdot)$ represents the inverse Fourier transform, $g_l(\sigma_1)$ represents a low-frequency filter function, $\sigma_1$ represents the low-frequency filter reference coefficient, $g_h(\sigma_2)$ represents a high-frequency filter function, $\sigma_2$ represents the high-frequency filter reference coefficient, $x_s$ represents a source domain image, and $x_t$ represents a target domain image.
The converted image $x'_{st}$ retains the content information of the source domain and has the style information of the target domain, thereby migrating the source domain image distribution toward the target domain in preparation for further aligning the source and target domain distributions.
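Steps 200 to 420 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: the low-frequency filter is taken to be a centered Gaussian mask, the high-frequency filter is taken to be one minus a Gaussian mask, and the function names and default values of `sigma1` and `sigma2` are hypothetical.

```python
import numpy as np

def gaussian_lowpass(h: int, w: int, sigma: float) -> np.ndarray:
    """Centered Gaussian low-pass mask of shape (h, w) over normalized frequencies."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))

def convert_image(x_s: np.ndarray, x_t: np.ndarray,
                  sigma1: float = 0.05, sigma2: float = 0.05) -> np.ndarray:
    """Combine high frequencies of x_s with low frequencies of x_t (both H x W x C)."""
    h, w = x_s.shape[:2]
    g_l = gaussian_lowpass(h, w, sigma1)[..., None]        # low-frequency filter (assumed Gaussian)
    g_h = 1.0 - gaussian_lowpass(h, w, sigma2)[..., None]  # high-frequency filter (assumed 1 - Gaussian)
    # Step 200: Fourier transform of both images, shifted so zero frequency is centered.
    f_s = np.fft.fftshift(np.fft.fft2(x_s, axes=(0, 1)), axes=(0, 1))
    f_t = np.fft.fftshift(np.fft.fft2(x_t, axes=(0, 1)), axes=(0, 1))
    # Steps 300 and 410: filter each spectrum and combine the results.
    combined = g_l * f_t + g_h * f_s
    # Step 420: inverse Fourier transform back to the spatial domain.
    x_st = np.fft.ifft2(np.fft.ifftshift(combined, axes=(0, 1)), axes=(0, 1))
    return np.real(x_st)
```

With these assumed filters the zero-frequency component comes entirely from the target image, so the converted image takes on the per-channel mean intensity of the target domain while keeping the fine structure of the source image.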
In step 500, the data enhancement is performed on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image, and the method specifically includes:
step 510: determining a data enhancement sequence through the converted image according to the set amplitude.
Wherein, the data enhancement sequence T is:
$T = \{o_1(\lambda_1; p), o_2(\lambda_2; p), \ldots, o_N(\lambda_N; p)\}$;
wherein N represents the dimension (the number of operations in the sequence), $o_1(\cdot)$ denotes a data enhancement operation, $\lambda_1$ represents the amplitude of operation $o_1(\cdot)$, and p represents a further parameter of operation $o_1(\cdot)$.
step 520: expanding the source domain image and the target domain image respectively through the data enhancement sequence to obtain the corresponding source domain expanded image and target domain expanded image.
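A data enhancement sequence of this shape can be sketched as below. Several points are assumptions made only for illustration: the three photometric operations, the treatment of p as the probability of applying each operation, and all function names. Images are taken to be float arrays in [0, 1].

```python
import numpy as np

def brightness(img, lam, rng):
    # Shift all pixel values by a random offset of at most lam.
    return np.clip(img + rng.uniform(-lam, lam), 0.0, 1.0)

def contrast(img, lam, rng):
    # Scale deviations from the mean by a random factor in [1 - lam, 1 + lam].
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + rng.uniform(-lam, lam)) + mean, 0.0, 1.0)

def gaussian_noise(img, lam, rng):
    # Add zero-mean Gaussian noise with standard deviation lam.
    return np.clip(img + rng.normal(0.0, lam, size=img.shape), 0.0, 1.0)

# Hypothetical constrained enhancement space; the source does not list its operations.
OPERATIONS = [brightness, contrast, gaussian_noise]

def sample_sequence(n, max_amplitude, p, rng):
    """Draw T = {o_1(lam_1; p), ..., o_N(lam_N; p)} with amplitudes in [0, max_amplitude]."""
    ops = rng.choice(len(OPERATIONS), size=n)
    lams = rng.uniform(0.0, max_amplitude, size=n)
    return [(OPERATIONS[i], float(lam), p) for i, lam in zip(ops, lams)]

def apply_sequence(img, sequence, rng):
    """Apply each operation in order; p is treated as an application probability (assumption)."""
    out = img
    for op, lam, p in sequence:
        if rng.random() < p:
            out = op(out, lam, rng)
    return out
```

Only photometric operations are used in this sketch so that pixel positions, and therefore the source domain labels, remain valid after enhancement.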
Further, in step 600, the first semantic segmentation loss model $L_{seg}(x_s, y_s)$ of the source domain may be determined by a formula in which H represents the height of the source domain image, W represents the width of the source domain image, and C represents the number of categories of the source domain image; (h, w) denotes a pixel position, c denotes a pixel class, $x_s$ represents a source domain image, and $y_s$ represents the label data corresponding to the source domain image $x_s$; the label term represents the value of class c at position (h, w), and the prediction term represents the prediction result at position (h, w).
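As an illustration only, the quantities defined for step 600 match the usual pixel-wise cross-entropy, $-\frac{1}{HW}\sum_{h,w}\sum_{c} y_s^{(h,w,c)} \log P_s^{(h,w,c)}$. That form, the $1/(HW)$ normalization, and the name $P_s$ for the prediction are assumptions, not taken from the source. A minimal NumPy sketch:

```python
import numpy as np

def segmentation_loss(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Pixel-wise cross-entropy (assumed form of L_seg).

    probs:  H x W x C array of per-pixel class probabilities (e.g. softmax output).
    labels: H x W integer array of ground-truth class indices.
    """
    h, w, c = probs.shape
    one_hot = np.eye(c)[labels]  # H x W x C, value of class c at position (h, w)
    # Sum over positions and classes, then average over the H * W pixels (assumed normalization).
    return float(-(one_hot * np.log(probs + eps)).sum() / (h * w))
```

For a prediction that is uniform over C classes, this sketch returns log C regardless of the labels.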
The method is based on an adversarial learning framework: discriminative learning is performed on the source domain image and the target domain image at the output layer so that the source domain and target domain distributions become more consistent, and adversarial learning is performed at the same time on the enhanced source domain expanded image and the enhanced target domain expanded image so that domain-invariant features can be better learned.
Specifically, in step 700, the first adversarial loss function $L_{adv}(x_s, x_t)$ may be determined by a formula in which D represents a domain discriminator function, $x_s$ represents a source domain image, and $x_t$ represents a target domain image; the remaining terms represent, respectively, the expectation over the source domain images $x_s$, the expectation over the target domain images $x_t$, the prediction result for the source domain image $x_s$, and the prediction result for the target domain image $x_t$.
The second adversarial loss function may be determined by a formula in which D represents a domain discriminator function; the remaining terms represent, respectively, the source domain expanded image, the target domain expanded image, the expectation over the source domain expanded images, the expectation over the target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.
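As an illustration only, both adversarial losses can be sketched with the standard discriminator objective, $\mathbb{E}[\log D(P_s)] + \mathbb{E}[\log(1 - D(P_t))]$. This form, its sign convention, and the names below are assumptions. The discriminator outputs are passed in directly as arrays of values in (0, 1).

```python
import numpy as np

def adversarial_loss(d_source: np.ndarray, d_target: np.ndarray, eps: float = 1e-12) -> float:
    """Standard GAN discriminator objective (assumed form of the adversarial loss).

    d_source: discriminator outputs D(P_s) for source-side predictions.
    d_target: discriminator outputs D(P_t) for target-side predictions.
    """
    source_term = np.mean(np.log(d_source + eps))        # expectation over source images
    target_term = np.mean(np.log(1.0 - d_target + eps))  # expectation over target images
    return float(source_term + target_term)
```

The same function would serve for the first loss (original source and target images) and the second loss (source domain expanded and target domain expanded images); only the inputs differ.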
Aiming at the unsupervised target domain data, the adopted data enhancement strategy only expands the space of data distribution and does not change the distribution of data content, so that the semantic consistency between the target domain expanded image and the target domain image is kept after the conversion, and the characteristics of the target domain are better learned through an unsupervised method.
Specifically, in step 800, the semantic consistency loss function may be determined by a formula in which $x_t$ represents a target domain image and $D_{KL}(\cdot)$ represents the KL divergence; the remaining terms represent, respectively, the target domain expanded image, the prediction result for the target domain image $x_t$, and the prediction result for the target domain expanded image.
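A minimal sketch of a KL-based consistency term follows. The direction of the divergence (original prediction relative to expanded prediction) and the averaging over pixels are assumptions, and the function name is hypothetical.

```python
import numpy as np

def semantic_consistency_loss(p_t: np.ndarray, p_t_aug: np.ndarray, eps: float = 1e-12) -> float:
    """Mean per-pixel KL divergence KL(p_t || p_t_aug) between two H x W x C probability maps.

    p_t:     prediction for the target domain image.
    p_t_aug: prediction for the corresponding target domain expanded image.
    """
    # KL divergence over the class axis at every pixel.
    kl = (p_t * (np.log(p_t + eps) - np.log(p_t_aug + eps))).sum(axis=-1)
    # Average over all pixel positions (assumed normalization).
    return float(kl.mean())
```

The loss is zero when the two predictions agree at every pixel and grows as the prediction for the expanded image drifts from the prediction for the original image.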
Preferably, the model can be trained with a gradient back-propagation algorithm, and the trained second semantic segmentation loss model of the target domain is used to test the target domain data set. First, the target domain test data is normalized to a uniform size (such as 512 × 1024 × 3 pixels) and input into the trained semantic segmentation model to obtain a segmentation result for the target domain data; the segmentation result is then compared with the test data labels to determine the performance of the second semantic segmentation loss model of the target domain.
The method mainly comprises two aspects. On the one hand, it expands the distributions of the source domain data and the target domain data at the data input stage: the distribution of the source domain data is moved toward the target domain through distribution translation to reduce the distribution difference between the two domains, and a data enhancement sequence is randomly generated from a defined, constrained data enhancement space to enlarge the distribution space of the source domain data and the target domain data, so that the distributions of the source domain and the target domain are better aligned. On the other hand, the source domain and the target domain are further aligned at the output level through an adversarial learning strategy, which specifically includes alignment between the original images and alignment between the enhanced images. Finally, for the unlabeled target domain data, a semantic consistency loss is adopted so that the features of the target domain are better learned in an unsupervised manner. The method thus addresses the domain inconsistency between the source domain and the target domain from the perspective of data distribution expansion, aligns the distributions of the two domains at the input level through distribution migration and data enhancement, and achieves excellent adaptability under an adversarial learning framework.
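The distribution translation step can be sketched in a few lines of NumPy. The sketch keeps the high-frequency part of the source spectrum, takes the low-frequency part from the target spectrum, and applies an inverse Fourier transform. The Gaussian mask shape, the use of its complement as the high-pass filter, the value of `sigma`, and the function names are assumptions for this example rather than details taken from the text.

```python
import numpy as np

def gaussian_lowpass_mask(h, w, sigma):
    """Gaussian low-pass mask on the unshifted FFT grid (DC component at index [0, 0])."""
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequency in cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequency in cycles per pixel
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))

def fourier_shift(source, target, sigma=0.05):
    """Combine source high frequencies with target low frequencies.

    source, target: float arrays of identical shape (H, W, C) with values in [0, 1].
    Returns the converted image, which keeps source structure and takes on target appearance.
    """
    h, w = source.shape[:2]
    low = gaussian_lowpass_mask(h, w, sigma)[..., None]   # broadcast over channels
    f_src = np.fft.fft2(source, axes=(0, 1))
    f_tgt = np.fft.fft2(target, axes=(0, 1))
    mixed = f_src * (1.0 - low) + f_tgt * low              # high-pass(source) + low-pass(target)
    converted = np.fft.ifft2(mixed, axes=(0, 1)).real
    return np.clip(converted, 0.0, 1.0)

rng = np.random.default_rng(0)
src = rng.random((16, 32, 3))
tgt = rng.random((16, 32, 3))
converted = fourier_shift(src, tgt)
```

Because the mask is symmetric in frequency, the mixed spectrum of two real images still corresponds to a real image, so taking the real part only discards numerical noise. A larger `sigma` transfers more of the target appearance at the cost of source detail.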
In addition, the invention also provides a cross-domain self-adaptive semantic segmentation system based on data distribution expansion, which can improve the semantic segmentation precision.
As shown in Fig. 2, the cross-domain adaptive semantic segmentation system based on data distribution expansion of the present invention includes an acquisition unit 1, a frequency domain transformation unit 2, a Gaussian filtering unit 3, a spatial domain transformation unit 4, a data enhancement unit 5, a first modeling unit 6, an adversarial function establishing unit 7, a semantic consistency loss function establishing unit 8, a second modeling unit 9, and a semantic segmentation unit 10.
Specifically, the acquisition unit 1 is configured to acquire training data sets from different domains, where the training data sets include a plurality of labeled source domain images and a plurality of unlabeled target domain images;
the frequency domain transformation unit 2 is configured to perform a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
the Gaussian filtering unit 3 is configured to perform high-frequency filtering on the source frequency domain image to obtain high-frequency information, and to perform low-frequency filtering on the target frequency domain image to obtain low-frequency information;
the spatial domain transformation unit 4 is configured to obtain a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
the data enhancement unit 5 is configured to perform data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;
the first modeling unit 6 is configured to determine a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
the adversarial function establishing unit 7 is configured to construct, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image, and to construct a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
the semantic consistency loss function establishing unit 8 is configured to construct a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
the second modeling unit 9 is configured to determine a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function, and the first semantic segmentation loss model of the source domain;
the semantic segmentation unit 10 is configured to perform semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
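How the second modeling unit might combine the individual losses can be sketched as a weighted sum. The pixel-wise cross-entropy form of the source segmentation loss, the weighted-sum combination, and the weights `lambda_adv` and `lambda_con` are assumptions for this example; the text above only states which losses the second model is based on.

```python
import numpy as np

def segmentation_ce(prob, label, eps=1e-8):
    """Mean pixel-wise cross-entropy.

    prob: (H, W, C) class probabilities; label: (H, W) integer class ids.
    """
    h, w, _ = prob.shape
    # Pick the predicted probability of the labeled class at every pixel.
    picked = prob[np.arange(h)[:, None], np.arange(w)[None, :], label]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

def total_loss(l_seg, l_adv, l_adv_aug, l_con, lambda_adv=0.001, lambda_con=1.0):
    """Hypothetical overall objective: supervised loss plus weighted alignment terms."""
    return l_seg + lambda_adv * (l_adv + l_adv_aug) + lambda_con * l_con

prob = np.full((2, 3, 4), 0.25)            # uniform prediction over 4 classes
label = np.zeros((2, 3), dtype=np.int64)
l_seg = segmentation_ce(prob, label)       # equals log(4) for a uniform prediction
print(total_loss(l_seg, l_adv=2.0, l_adv_aug=3.0, l_con=0.5))
```

The adversarial terms are typically given a small weight relative to the supervised term so that alignment does not overwhelm the segmentation objective; the specific values here are placeholders.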
In addition, the invention also provides the following scheme:
a cross-domain adaptive semantic segmentation system based on data distribution expansion, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring different-domain training data sets, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
Further, the invention also provides the following scheme:
a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring different-domain training data sets, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
Compared with the prior art, the cross-domain adaptive semantic segmentation system based on data distribution expansion and the computer-readable storage medium have the same beneficial effects as the cross-domain adaptive semantic segmentation method based on data distribution expansion, and are not repeated herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (10)
1. A cross-domain adaptive semantic segmentation method based on data distribution expansion is characterized by comprising the following steps:
acquiring different-domain training data sets, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
2. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein obtaining the converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information specifically comprises:
combining the high-frequency information with the low-frequency information to obtain combined information;
and performing an inverse Fourier transform on the combined information to obtain the converted image.
3. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein performing data enhancement on the source domain image and the target domain image based on the converted image to obtain the source domain expanded image and the target domain expanded image specifically comprises:
determining a data enhancement sequence through the converted image according to a set amplitude value;
and expanding the source domain image and the target domain image respectively through the data enhancement sequence to obtain the corresponding source domain expanded image and target domain expanded image.
4. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the first semantic segmentation loss model $L_{seg}(x_s, y_s)$ of the source domain is determined according to the following formula:
Here, $H$ denotes the height of the source domain image, $W$ denotes the width of the source domain image, and $C$ denotes the number of categories of the source domain image; $h, w$ denote the pixel position, $c$ denotes the pixel class, $x_s$ denotes a source domain image, and $y_s$ denotes the label data corresponding to the source domain image $x_s$; the remaining two symbols denote the label value of class $c$ at position $(h, w)$ and the prediction result at position $(h, w)$, respectively.
5. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the first adversarial loss function $L_{adv}(x_s, x_t)$ is determined according to the following formula:
Here, $D$ denotes the domain discriminator function, $x_s$ denotes a source domain image, $x_t$ denotes a target domain image, $\mathbb{E}_{x_s}$ denotes the expectation over source domain images $x_s$, $\mathbb{E}_{x_t}$ denotes the expectation over target domain images $x_t$, and the remaining two symbols denote the prediction result for the source domain image $x_s$ and the prediction result for the target domain image $x_t$, respectively.
6. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the second adversarial loss function is determined according to the following formula:
Here, $D$ denotes the domain discriminator function; the remaining symbols denote, respectively, the source domain expanded image, the target domain expanded image, the expectation over source domain expanded images, the expectation over target domain expanded images, the prediction result for the source domain expanded image, and the prediction result for the target domain expanded image.
7. The data distribution expansion-based cross-domain adaptive semantic segmentation method according to claim 1, wherein the semantic consistency loss function is determined according to the following formula:
Here, $x_t$ denotes a target domain image; the remaining symbols denote the target domain expanded image, the prediction result for the target domain image $x_t$, and the prediction result for the target domain expanded image; $D_{KL}(\cdot)$ denotes the KL divergence.
8. A cross-domain adaptive semantic segmentation system based on data distribution expansion, the cross-domain adaptive semantic segmentation system comprising:
an acquisition unit, configured to acquire training data sets from different domains, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
a frequency domain transformation unit, configured to perform a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
a Gaussian filtering unit, configured to perform high-frequency filtering on the source frequency domain image to obtain high-frequency information, and to perform low-frequency filtering on the target frequency domain image to obtain low-frequency information;
a spatial domain transformation unit, configured to obtain a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
a data enhancement unit, configured to perform data enhancement on the source domain image and the target domain image based on the converted image to obtain a source domain expanded image and a target domain expanded image;
a first modeling unit, configured to determine a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
an adversarial function establishing unit, configured to construct, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image, and to construct a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
a semantic consistency loss function establishing unit, configured to construct a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
a second modeling unit, configured to determine a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and a semantic segmentation unit, configured to perform semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
9. A cross-domain adaptive semantic segmentation system based on data distribution expansion, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring different-domain training data sets, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
10. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring different-domain training data sets, wherein the training data sets comprise a plurality of labeled source domain images and a plurality of unlabeled target domain images;
performing a Fourier transform on each source domain image and each target domain image to obtain a corresponding source frequency domain image and a corresponding target frequency domain image;
performing high-frequency filtering on the source frequency domain image to obtain high-frequency information; performing low-frequency filtering on the target frequency domain image to obtain low-frequency information;
obtaining a converted image through an inverse Fourier transform according to the high-frequency information and the low-frequency information;
based on the converted image, performing data enhancement on the source domain image and the target domain image to obtain a source domain expanded image and a target domain expanded image;
determining a first semantic segmentation loss model of the source domain according to each source domain image and its corresponding label;
constructing, based on a domain discriminator, a first adversarial loss function according to each pair of source domain image and target domain image; constructing a second adversarial loss function according to each pair of source domain expanded image and target domain expanded image;
constructing a semantic consistency loss function according to each target domain image and the corresponding target domain expanded image;
determining a second semantic segmentation loss model of the target domain based on the first adversarial loss function, the second adversarial loss function, the semantic consistency loss function and the first semantic segmentation loss model of the source domain;
and performing semantic segmentation on the image to be processed based on the second semantic segmentation loss model of the target domain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110511220.0A CN113221902B (en) | 2021-05-11 | 2021-05-11 | Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113221902A (en) | 2021-08-06 |
CN113221902B (en) | 2021-10-15 |
Family
ID=77094685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110511220.0A Active CN113221902B (en) | 2021-05-11 | 2021-05-11 | Cross-domain self-adaptive semantic segmentation method and system based on data distribution expansion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113221902B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113674277B (en) * | 2021-10-22 | 2022-02-22 | 北京矩视智能科技有限公司 | Unsupervised domain adaptive surface defect region segmentation method and device and electronic equipment |
CN114492599B (en) * | 2022-01-07 | 2024-09-13 | 北京邮电大学 | Medical image preprocessing method and device based on Fourier domain self-adaption |
CN116206108B (en) * | 2023-02-16 | 2024-09-20 | 苏州大学 | OCT image choroid segmentation system and method based on domain self-adaption |
CN116468744B (en) * | 2023-06-19 | 2023-09-05 | 武汉大水云科技有限公司 | Double-distribution matching multi-domain adaptive segmentation method and system for water area scene |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190707A (en) * | 2018-09-12 | 2019-01-11 | 深圳市唯特视科技有限公司 | A kind of domain adapting to image semantic segmentation method based on confrontation study |
CN110197229A (en) * | 2019-05-31 | 2019-09-03 | 腾讯科技(深圳)有限公司 | Training method, device and the storage medium of image processing model |
WO2019221965A1 (en) * | 2018-05-16 | 2019-11-21 | Nec Laboratories America, Inc. | Unsupervised cross-domain distance metric adaptation with feature transfer network |
CN110533044A (en) * | 2019-05-29 | 2019-12-03 | 广东工业大学 | A kind of domain adaptation image, semantic dividing method based on GAN |
CN111275713A (en) * | 2020-02-03 | 2020-06-12 | 武汉大学 | Cross-domain semantic segmentation method based on countermeasure self-integration network |
CN111723813A (en) * | 2020-06-05 | 2020-09-29 | 中国科学院自动化研究所 | Weak supervision image semantic segmentation method, system and device based on intra-class discriminator |
CN111832511A (en) * | 2020-07-21 | 2020-10-27 | 中国石油大学(华东) | Unsupervised pedestrian re-identification method for enhancing sample data |
US10825219B2 (en) * | 2018-03-22 | 2020-11-03 | Northeastern University | Segmentation guided image generation with adversarial networks |
CN112308862A (en) * | 2020-06-04 | 2021-02-02 | 北京京东尚科信息技术有限公司 | Image semantic segmentation model training method, image semantic segmentation model training device, image semantic segmentation model segmentation method, image semantic segmentation model segmentation device and storage medium |
CN112330625A (en) * | 2020-11-03 | 2021-02-05 | 杭州迪英加科技有限公司 | Immunohistochemical nuclear staining section cell positioning multi-domain co-adaptation training method |
CN112598003A (en) * | 2020-12-18 | 2021-04-02 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN112668594A (en) * | 2021-01-26 | 2021-04-16 | 华南理工大学 | Unsupervised image target detection method based on antagonism domain adaptation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200075344A (en) * | 2018-12-18 | 2020-06-26 | 삼성전자주식회사 | Detector, method of object detection, learning apparatus, and learning method for domain transformation |
Non-Patent Citations (4)
Title |
---|
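Attention Guided Multiple Source and Target Domain Adaptation; Yuxi Wang et al.; IEEE Transactions on Image Processing; IEEE; 2020-10-28; vol. 30; pp. 892-906 * |
Unsupervised domain adaptation with adversarial distribution adaptation network; Qiang Zhou et al.; Neural Computing and Applications; 2020-11-24; vol. 33, no. 3; pp. 7709-7721 * |
Unsupervised domain adaptation based on dual weight bias modeling; Ma Chuang et al.; Computer Science (计算机科学); 2021-02-05; vol. 48, no. 2; pp. 217-223 * |
Research on image classification methods based on multi-kernel domain-adaptive sparse representation and deep convolutional neural networks; Wang Xilong; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2021-02-15; I138-2340 * |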
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||