CN114882220B - Domain-adaptive prior-knowledge-guided GAN image generation method and system - Google Patents


Info

Publication number
CN114882220B
CN114882220B (application CN202210548444.3A)
Authority
CN
China
Prior art keywords
image
domain
translated
adaptive
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210548444.3A
Other languages
Chinese (zh)
Other versions
CN114882220A (en)
Inventor
张凯
史洋
聂秀山
逯天斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Liju Robot Technology Co ltd
Original Assignee
Shandong Liju Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Liju Robot Technology Co ltd filed Critical Shandong Liju Robot Technology Co ltd
Priority to CN202210548444.3A priority Critical patent/CN114882220B/en
Publication of CN114882220A publication Critical patent/CN114882220A/en
Application granted granted Critical
Publication of CN114882220B publication Critical patent/CN114882220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image generation method and system based on a generative adversarial network (GAN) guided by domain-adaptive prior knowledge. The method comprises the following steps: data set preparation, data set preprocessing, training the source-domain generator in the source-domain network model, training the target-domain generator in the target-domain network model, image augmentation, and training the source-domain discriminator in the source-domain network model and the target-domain discriminator in the target-domain network model. In the proposed GAN, the generator comprises a source-domain branch and a target-domain branch. The source-domain branch learns content information from a large amount of data similar to the target domain, and the affine-parameter migration of the batch-normalization (BN) layers, together with a domain-mixing technique, transfers source-domain knowledge into the target domain, alleviating the problem of limited target-domain data. To further improve the quality of the generated images, a spatially-adaptive normalization module is introduced into the target-domain branch, injecting prior knowledge of the main target during target-domain image generation and thereby improving the accuracy of the target in the generated image.

Description

Domain-adaptive prior-knowledge-guided GAN image generation method and system
Technical Field
The invention relates to image generation technology, belongs to the fields of computer vision and artificial intelligence, and particularly relates to an image generation method and system based on a generative adversarial network guided by domain-adaptive prior knowledge.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Since the generative adversarial network (GAN) was proposed, image generation has become a research hotspot, and GAN-based image generation models have achieved satisfactory results in tasks such as style transfer, image restoration, super-resolution, and image translation.
In general, a GAN model consists of two sub-networks: a generator sub-network that produces images, and a discriminator sub-network that ensures the generated images are consistent with the target images. Training is a process in which the two sub-networks play an adversarial game and are jointly optimized. The complex structure of a GAN gives it a large number of parameters, so training a GAN usually requires a large amount of data. If the amount of data is insufficient, the quality of the generated images is low and mode collapse may occur. However, in certain tasks (e.g., medical image generation) it is difficult to collect large amounts of data, which degrades model performance.
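The adversarial game between the two sub-networks can be sketched as alternating updates. The following minimal 1-D illustration in plain NumPy uses a linear generator and a logistic discriminator; the network forms, learning rate, and target distribution are stand-ins for illustration only, not the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 1-D "networks" (illustrative, not the patent's models):
g_w = np.array([0.1, 0.0])   # generator:     x = g_w[0]*z + g_w[1]
d_w = np.array([0.1, 0.0])   # discriminator: P(real) = sigmoid(d_w[0]*x + d_w[1])

def generate(z):
    return g_w[0] * z + g_w[1]

def discriminate(x):
    return 1.0 / (1.0 + np.exp(-(d_w[0] * x + d_w[1])))

lr = 0.05
for _ in range(200):
    z = rng.standard_normal(64)
    real = rng.normal(3.0, 0.5, size=64)      # target distribution to imitate
    fake = generate(z)
    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    dr, df = discriminate(real), discriminate(fake)
    grad0 = np.mean((dr - 1.0) * real) + np.mean(df * fake)
    grad1 = np.mean(dr - 1.0) + np.mean(df)
    d_w -= lr * np.array([grad0, grad1])
    # Generator step: minimize -log D(fake) (non-saturating loss).
    df = discriminate(generate(z))
    coef = (df - 1.0) * d_w[0]                # dL/dfake by the chain rule
    g_w -= lr * np.array([np.mean(coef * z), np.mean(coef)])
```

After training, the generator's output distribution has been pushed toward the real data around 3.0 by the discriminator's feedback, which is the joint-optimization dynamic the paragraph above describes.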
When data are limited, transfer learning is an effective way to improve network performance. Within transfer learning, domain adaptation techniques have attracted wide attention: they align the feature representations of source-domain training data and target-domain data in a latent space. The network can then extract the same or similar features from data of both domains, so features learned from abundant source-domain data can effectively assist training on target-domain data, improving the performance of models trained with limited data.
Although a GAN can, under a suitable training strategy, generate data that follows the distribution of the training set, the quality of the generated images is hard to guarantee, and problems such as blurred content often occur. This is frequently caused by an inappropriate normalization scheme in the network. Spatially-Adaptive Normalization (SPADE) alleviates this problem to some extent: it obtains the affine parameters of the normalization layer by convolving an additional semantic segmentation label. Regions containing instances in the semantic label become more salient in the feature maps extracted by the network, which strengthens the semantics of the feature maps and makes the generated images more realistic.
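The SPADE computation just described can be sketched as follows. This is a minimal NumPy version in which the 1×1 convolutions of SPADE are written as channel-mixing tensor products; the channel counts and weight shapes are illustrative assumptions:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (C, H, W) over its spatial dimensions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def spade(x, seg, w_shared, w_gamma, w_beta):
    """Spatially-adaptive normalization (sketch).

    x:   (C, H, W) feature map
    seg: (K, H, W) one-hot semantic segmentation map
    w_shared: (Ch, K), w_gamma/w_beta: (C, Ch) -- 1x1-conv weights.
    The segmentation map yields per-pixel gamma/beta maps that modulate
    the normalized features, making labeled regions more salient.
    """
    hidden = np.maximum(0.0, np.tensordot(w_shared, seg, axes=([1], [0])))  # (Ch, H, W), ReLU
    gamma = np.tensordot(w_gamma, hidden, axes=([1], [0]))                  # (C, H, W)
    beta = np.tensordot(w_beta, hidden, axes=([1], [0]))                    # (C, H, W)
    return gamma * instance_norm(x) + beta
```

Because gamma and beta vary per pixel according to the segmentation map, the affine modulation differs between labeled and background regions, which is what distinguishes SPADE from an ordinary normalization layer with per-channel affine parameters.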
Disclosure of Invention
To solve the problem that paired data and data labels are difficult to obtain, the invention provides a domain-adaptive prior-knowledge-guided GAN image generation method and system. In the proposed GAN, the generator comprises two branches: a source-domain branch and a target-domain branch. The source-domain branch learns content information from a large amount of data similar to the target domain, and the affine-parameter migration of the batch-normalization layers, together with a domain-mixing technique, transfers source-domain knowledge into the target domain, alleviating the problem of limited target-domain data. To further improve the quality of the generated images, a spatially-adaptive normalization module is introduced into the target-domain branch, injecting prior knowledge of the main target during target-domain image generation and improving the accuracy of the target in the generated image. To increase the weight of important target regions during discrimination, a spatially-adaptive normalization module is also introduced into the discriminator so that it can focus on the target region.
To realize the above, the invention adopts the following technical scheme.
The invention provides an image generation method based on a generative adversarial network guided by domain-adaptive prior knowledge, comprising the following steps:
S1, data set preparation: collecting paired images and their corresponding semantic segmentation labels according to the task requirements, to be used as target-domain data during training; collecting unlabeled images similar or related to the translated images in the target domain from the Internet, to be used as source-domain data during training;
s2, preprocessing a data set: unifying the sizes of all image data in the target domain data and the source domain data;
S3, training the source-domain generator in the source-domain network model: when the source-domain data are used to train the model, the input is a noise vector, which is processed by a fully connected layer and reshaped into a map of uniform size; at this stage the model uses batch normalization as its normalization layer.
The goal is to enable the network to generate images similar to the translated images in the target domain. When the network has this capability, it can be said to hold the content information needed to generate the translated images.
S4, training the target-domain generator in the target-domain network model: when the target-domain data are used to train the model, the inputs are the image to be translated and its semantic segmentation label; the segmentation label conditions the spatially-adaptive normalization layer and strengthens the constraint that the image to be translated imposes on the generated translated image;
S5, image augmentation: augmentation is applied on the discriminator side; images are randomly augmented before being fed into the adaptive discriminator, and the discriminator only judges the augmented images. This broadens the distribution of images and provides a larger gradient to aid training;
S6, training the source-domain discriminator in the source-domain network model and the target-domain discriminator in the target-domain network model: the two discriminators do not share normalization layers; when the target-domain discriminator is trained, it receives a real target-domain image or a synthesized image together with the semantic segmentation label of the real image, and the label is likewise used to condition the spatially-adaptive normalization layer, so that the discriminator focuses more on local targets.
Preferably, in the data set preparation step, the collected images are divided into images to be translated, semantic segmentation labels of the images to be translated, translated images, and semantic segmentation labels of the translated images, which are placed correspondingly in four folders as target-domain data; images related to the translated images are collected as source-domain images, either from public data sets related to the translated images or from the Internet, and are placed separately in a folder.
Preferably, in the data set preprocessing step, a naming rule is set for each group of four items in the target-domain data to facilitate grouping.
Preferably, in the step of training the source-domain generator, the noise vector is expanded by a fully connected layer into a 65,536-dimensional vector, which is reshaped into a 256 × 256 matrix and fed into the convolutional layers; batch normalization is applied after each convolution. The source-domain generator produces a 256 × 256 fake source-domain image through downsampling followed by upsampling. The source-domain discriminator receives either a real or a fake source-domain image, in both cases after augmentation.
Preferably, in the step of training the target-domain generator, the images before and after translation do not follow the same distribution, so the spatially-adaptive normalization layer does not constrain the last upsampling layers of the target-domain generator, only the earlier downsampling and feature-extraction layers. The last layers can thus be guided more effectively by the target-domain discriminator, so that the generated result is closer to the translated image while retaining characteristics of the image before translation. Because the source domain has a large amount of data and its batch-normalization layers learn content-invariant information of the image domain, during training the affine parameters of the corresponding batch-normalization layers are migrated into the spatially-adaptive normalization layers, helping to strengthen the connection between the source and target domains.
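The affine-parameter migration described here amounts to copying the learned batch-norm gamma/beta of each source-domain layer into the matching target-domain SPADE layer. A minimal sketch, in which the layer names and dictionary layout are illustrative assumptions:

```python
import numpy as np

def migrate_bn_affine(source_bn, target_spade):
    """Copy learned batch-norm affine parameters (gamma, beta) from
    source-domain layers into the matching SPADE layers of the
    target-domain branch (layer names and dict layout are illustrative)."""
    for name, params in source_bn.items():
        if name in target_spade:
            target_spade[name]["bn_gamma"] = params["gamma"].copy()
            target_spade[name]["bn_beta"] = params["beta"].copy()
    return target_spade

# Hypothetical parameter stores for one matching layer ("down1"):
source_bn = {"down1": {"gamma": np.full(64, 1.2), "beta": np.zeros(64)}}
target_spade = {"down1": {}}
migrate_bn_affine(source_bn, target_spade)
```

Copies (rather than shared references) keep the target branch free to fine-tune the migrated parameters without disturbing the source branch.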
Preferably, in the step of training the target-domain generator, the generator receives the image to be translated (256 × 256 pixels) and its semantic segmentation label; the image enters the convolutional network directly, without the fully connected layer used when training the source-domain network. The convolved feature maps use spatially-adaptive normalization, whose base normalization is computed in the instance-normalization manner; the affine transformation is driven by the additionally input segmentation label of the image to be translated. The input segmentation label first passes through one convolution, and the output then passes through two separate convolutions to obtain two tensors, used as the offset (beta) and scaling (gamma) of the affine transformation. The affine parameters of the source-domain batch-normalization layer are first applied to restore the feature-map distribution, and then gamma and beta multiply and add the feature map element-wise to produce the final output. The spatially-adaptive normalization layers are used only in the downsampling and intermediate convolution blocks, while the upsampling layers share batch-normalization layers with the source-domain network model, finally yielding the fake target-domain translated image.
Preferably, in the image augmentation step, augmentation is applied on the adaptive-discriminator side: the augmentation happens before the input to the discriminator, not before the generator. Before an image is sent to the discriminator, it randomly undergoes colour change or random occlusion, with a fixed random probability of 0.8 chosen as a safe value; the judged images are therefore augmented with probability 0.8.
Preferably, in the step of training the source-domain and target-domain discriminators, the adaptive discriminator receives a real or fake translated image together with the semantic segmentation label of the real translated image; the received image enters the convolutional layers for feature extraction; the segmentation label of the real translated image conditions the spatially-adaptive normalization layer; using spatially-adaptive normalization in the discriminator enables it to focus on key target regions; finally, the discrimination result is obtained.
The invention also provides an image generation system based on a generative adversarial network guided by domain-adaptive prior knowledge, comprising a data set preparation module, a data set preprocessing module, a source-domain network training module, a target-domain generation-network training module, an image augmentation module, and a target-domain discrimination-network training module. The data set preparation module collects paired images and their corresponding semantic segmentation labels according to the task requirements as target-domain data for training, and collects unlabeled images similar or related to the translated images in the target domain from the Internet as source-domain data. The data set preprocessing module unifies the sizes of all image data in the target-domain and source-domain data. The source-domain network training module trains the model with the source-domain data: the input is a noise vector, processed by a fully connected layer and reshaped to a uniform image size, and the model uses batch-normalization layers. The target-domain generation-network training module trains the model with the target-domain data: the inputs are the image to be translated and its semantic segmentation label, which conditions the spatially-adaptive normalization layer and strengthens the constraint of the image to be translated on the generated translated image. The image augmentation module randomly augments images before they are input into the adaptive discriminator, which only judges the augmented images. The target-domain discrimination-network training module employs an adaptive discriminator that receives a real or fake translated image together with the semantic segmentation label of the real translated image; the received real or synthesized image enters the convolutional layers for feature extraction; the segmentation label conditions the spatially-adaptive normalization layer, enabling the discriminator to focus on key target regions.
Compared with the prior art, the invention has the following beneficial effects:
The image generation model is built on a GAN and combines domain adaptation to help image generation with small samples. The model is trained along two branches simultaneously. The source-domain generation branch produces source-domain data, which is similar to the translated images; its task is to generate realistic images resembling the translated images, so this branch holds a large amount of the information needed to generate them. The other branch performs target-domain image translation: the model receives an image to be translated as input, and its semantic segmentation label information is injected through the spatially-adaptive normalization layer. This additional information helps establish the relationship with the translated image. Because the first branch can generate images similar to the translated images, the affine parameters of its batch-normalization layers store information about the distribution of the translated images, and migrating these affine parameters helps the second branch approach that distribution. The invention introduces domain adaptation into the GAN and improves the ability to train the network with small-sample data; in experiments, realistic images were generated using only 160 images.
Drawings
The drawings in the following description are intended to aid understanding of the embodiments and technical aspects of the invention; it should be noted that the exemplary embodiments and their descriptions only explain the invention and do not constitute an undue limitation of it.
FIG. 1 is a flow chart of a batch of data training models in the present invention.
Fig. 2 is a schematic structural diagram of a generator according to the present invention.
Fig. 3 is a schematic structural diagram of the adaptive discriminator according to the present invention.
Detailed Description
It is to be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
The invention provides an image generation method based on a generative adversarial network guided by domain-adaptive prior knowledge, comprising the following steps:
s1, data set preparation:
Image data related to the task are collected and divided into images to be translated (images_B), their semantic segmentation labels (Labels_B), translated images (images_A), and semantic segmentation labels of the translated images (Labels_A); these are placed correspondingly in four folders and used as target-domain data. Images related to the translated images images_A are collected as source-domain images (source_images) in a separate folder, either from a public data set related to images_A or from the Internet.
S2, preprocessing a data set:
The main image region is cropped out, the image data are then resized to 256 × 256, and finally the images are normalized with all normalization parameters set to 0.5.
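A minimal sketch of this preprocessing step follows; the nearest-neighbour resize stands in for whatever interpolation the implementation actually uses, and the cropping of the main region is task-specific and omitted:

```python
import numpy as np

def nn_resize(img, size=256):
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def preprocess(img):
    """Resize to 256x256 and normalize with mean=0.5, std=0.5 per channel,
    mapping [0, 1] pixel values to [-1, 1]."""
    resized = nn_resize(img.astype(np.float32), 256)
    return (resized - 0.5) / 0.5
```

Normalizing with mean and std both 0.5 centres pixel values on zero, matching the tanh-style output range commonly used by GAN generators.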
S3, training a source domain generator in the source domain network model:
when training the model on the source domain data, the model is set to the source domain mode. The source domain generator receives as input a noise vector that is scaled up through a full convolution layer to a 65536 dimensional vector, and then converts the 65536 dimensional vector to a 256 x 256 dimensional matrix, which is then input to the convolution layer, where batch regularization is used for the regularization after convolution. The generation process needs to go through a downsampling layer, a convolution layer with a constant size, and finally upsampling to generate 256 × 256 false source domain images (fake source _ images). The image received by the source domain decider is a real source domain image (real source _ images) or a synthesized fake source domain image (fake source _ images), but the image is an enhanced image, and the enhancing manner is described in detail in S5. The source domain decider tries to distinguish whether the input image is a real image or a synthesized false image. The parameters of the source domain generator and the source domain decider are alternately updated by the feedback.
S4, training a target domain generator in the target domain network model:
when the network is trained on the target domain data, the target domain generator receives images to be translated _ B and semantic segmentation labels _ B of the images to be translated, and the model is set to be in a target domain mode. The size of the image to be translated _ B is 256 pixels by 256 pixels, and the image to be translated _ B directly enters the convolutional layer network through the full connection layer without the need of training the source domain network. The feature graph after convolution uses space self-adaptive normalization, basic regularization used in the space self-adaptive normalization is a calculation mode of example regularization, affine transformation is carried out through additionally input image semantic segmentation labels _ B to be translated, firstly, the input image semantic segmentation labels _ B to be translated are output after being convoluted for one time, two tensors are obtained through two convolutions respectively, the two tensors are used as a scaling value (gamma) and an offset value (beta) in affine transformation parameters, then, the feature graph distribution is restored through affine parameters of a batch regularization layer in a source domain model, and then, element-level multiplication and addition are carried out on the feature graph through the gamma and the beta to obtain final output. The spatial adaptive Normalization layer is only used in the downsampling and intermediate volume blocks, the upsampling and source domain share a BN layer (Batch Normalization), and finally a translated image (fake images _ A) of a false target domain is obtained.
S5, image augmentation:
Conventional augmentation is not applicable to GAN training, so the invention adopts an adaptive-discriminator augmentation scheme designed specifically for GANs: the augmentation is applied before the input of the adaptive discriminator, not before the generator. Before an image is sent to the discriminator, it randomly undergoes augmentations such as colour change or random occlusion. In the original scheme the random probability is adjusted automatically according to the degree of overfitting, but that overfitting criterion does not suit this method, so a fixed safe value of 0.8 is used; the judged images are therefore augmented with probability 0.8.
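A minimal sketch of this discriminator-side augmentation follows; the jitter range and occlusion-patch size are illustrative choices, and only the fixed probability of 0.8 comes from the text:

```python
import numpy as np

def augment_for_discriminator(img, p=0.8, rng=None):
    """Randomly apply one augmentation (colour change or random occlusion)
    with probability p before the image reaches the discriminator; the
    generator's input is never augmented."""
    if rng is None:
        rng = np.random.default_rng()
    out = img.copy()                                   # img: (C, H, W) in [0, 1]
    if rng.random() < p:
        if rng.random() < 0.5:
            # Colour change: per-channel brightness scaling (range assumed).
            scale = rng.uniform(0.8, 1.2, size=(img.shape[0], 1, 1))
            out = np.clip(out * scale, 0.0, 1.0)
        else:
            # Random occlusion: zero out a random patch (size assumed).
            c, h, w = out.shape
            ph, pw = h // 4, w // 4
            top = rng.integers(0, h - ph + 1)
            left = rng.integers(0, w - pw + 1)
            out[:, top:top + ph, left:left + pw] = 0.0
    return out
```

Because the same transform is applied to both real and fake images right before discrimination, the discriminator never sees an un-augmented distribution it could overfit to, which is the rationale for placing the augmentation on the discriminator side only.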
S6, training a source domain judger in the source domain network model and a target domain judger in the target domain network model:
the decider receives a real translated image (real images _ a) or a fake translated image (fake images _ B), and a semantic segmentation label Labels _ a of the real translated image. The model is also set to the target domain mode. And the received real image or false image enters the convolutional layer to extract features. Labels _ A is used as a spatial adaptive normalization layer for conditional regularization, and the calculation process is as described in S4. The use of spatially adaptive normalization in the decider may enable the decider to focus on critical target regions. And finally, whether the input image is a real translated image or a false translated image is obtained.
The invention also provides an image generation system based on a generative adversarial network guided by domain-adaptive prior knowledge, comprising a data set preparation module, a data set preprocessing module, a source-domain network training module, a target-domain generation-network training module, an image augmentation module, and a target-domain discrimination-network training module. The data set preparation module collects paired images and their corresponding semantic segmentation labels according to the task requirements as target-domain data for training, and collects unlabeled images similar or related to the translated images in the target domain from the Internet as source-domain data. The data set preprocessing module unifies the sizes of all image data in the target-domain and source-domain data. The source-domain network training module trains the model with the source-domain data: the input is a noise vector, processed by a fully connected layer and reshaped to a uniform image size, and the model uses batch normalization as its normalization layer. The target-domain generation-network training module trains the model with the target-domain data: the inputs are the image to be translated and its semantic segmentation label, which conditions the spatially-adaptive normalization layer and strengthens the constraint of the image to be translated on the generated translated image. The image augmentation module randomly augments images before they are input into the adaptive discriminator, which only judges the augmented images. The target-domain discrimination-network training module employs an adaptive discriminator that receives a real or fake translated image together with the semantic segmentation label of the real translated image; the received real or fake image enters the convolutional layers for feature extraction; the segmentation label conditions the spatially-adaptive normalization layer, enabling the discriminator to focus on key target regions.
The above is only a preferred embodiment of the present invention and is not intended to limit it; various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in its protection scope.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations made, without inventive effort, on the basis of the technical solution of the present invention still fall within its protection scope.

Claims (8)

1. An image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network, characterized by comprising the following steps:
S1, data set preparation: collecting, according to task requirements, paired images and the semantic segmentation labels corresponding to the images as target domain data for training; collecting unlabeled images similar or related to the translated images in the target domain from the Internet as source domain data for training;
S2, data set preprocessing: unifying the sizes of all image data in the target domain data and the source domain data;
S3, training a source domain generator in the source domain network model: when the model is trained with the source domain data, its input is a noise vector; after the noise vector is processed by a fully connected layer, the new vector is recombined into the unified image size;
in the step of training the source domain generator in the source domain network model, the noise vector is raised by the fully connected layer to a 65536-dimensional vector, which is reshaped into a 256 x 256 matrix and input into the convolutional layers, with batch regularization layers applied after each convolution; to generate an image, the source domain generator first downsamples and then upsamples to produce a 256 x 256 fake source domain image; the image received by the source domain decider is a real or fake source domain image, but always in augmented form;
S4, training a target domain generator in the target domain network model: when the model is trained with the target domain data, it receives as input the image to be translated and the semantic segmentation label of the image to be translated; the semantic segmentation label conditions the spatially adaptive normalization layer, strengthening the constraint of the image to be translated on the generated translated image;
S5, image augmentation: enhancement is performed with an adaptive decider; images are randomly augmented before being input to the adaptive decider, and the adaptive decider judges only augmented images;
S6, training a source domain decider in the source domain network model and a target domain decider in the target domain network model: the source domain decider and the target domain decider do not share regularization layers; when training the target domain decision network, the target domain decider receives a target domain real image or a synthetic image together with the semantic segmentation label of the real image, and the semantic segmentation label likewise conditions the spatially adaptive normalization layer; a judgment result is finally obtained.
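The input stem of the source domain generator in step S3 can be sketched as follows. The dimensions (65536 = 256 × 256, the 256 × 256 reshape, batch regularization after convolution) follow the claim; the noise dimensionality, channel count, and layer names are illustrative assumptions.

```python
# Minimal sketch of the source domain generator's input stem from S3:
# noise vector -> fully connected layer -> 65536-dim vector ->
# 256 x 256 single-channel map -> convolution -> batch regularization.
# noise_dim=128 and the 64-channel width are assumptions for illustration.
import torch
import torch.nn as nn

class SourceGeneratorStem(nn.Module):
    def __init__(self, noise_dim=128):
        super().__init__()
        self.fc = nn.Linear(noise_dim, 65536)       # 65536 = 256 * 256
        self.conv = nn.Conv2d(1, 64, 3, padding=1)  # first convolutional layer
        self.bn = nn.BatchNorm2d(64)                # batch regularization layer

    def forward(self, z):
        # Recombine the 65536-dim vector into the unified 256 x 256 image size.
        x = self.fc(z).view(-1, 1, 256, 256)
        return torch.relu(self.bn(self.conv(x)))

stem = SourceGeneratorStem()
feat = stem(torch.randn(2, 128))   # feat has shape (2, 64, 256, 256)
```

The subsequent downsampling/upsampling path that yields the 256 × 256 fake source domain image is omitted here; the sketch only shows how a 1-D noise vector becomes a 2-D feature map suitable for convolution.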
2. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 1, wherein in the data set preparation step the collected images are divided into images to be translated, semantic segmentation labels of images to be translated, translated images and semantic segmentation labels of translated images, placed correspondingly in four folders as target domain data; images associated with the translated images, either from public data sets or from the Internet, are collected as source domain images and placed separately in a folder.
3. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 1, wherein in the step of training the target domain generator, the image before translation and the image after translation do not follow the same distribution, so spatially adaptive normalization does not constrain the last upsampling layer of the target domain generator and constrains only the other, non-upsampling layers.
4. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 3, wherein in the step of training the target domain generator, the affine parameters of the corresponding batch regularization layer are migrated into the spatially adaptive normalization to help strengthen the relationship between the source domain and the target domain.
5. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 1, wherein in the step of training the target domain generator, the target domain generator receives an image to be translated, sized 256 x 256 pixels, together with its semantic segmentation label; the image enters the convolutional network directly, without passing through a fully connected layer as when training the source domain network; the convolved feature map is normalized by spatially adaptive normalization, which computes an instance regularization and applies an affine transformation derived from the additionally input semantic segmentation label: the label is first passed through one convolution, then through two separate convolutions yielding two tensors used as the scaling and offset in the affine transformation parameters; the feature map distribution is then restored with the affine parameters of the batch regularization layer in the source domain network model, and the scaling and offset are applied to the feature map by element-level multiplication and addition to obtain the final output; the spatially adaptive normalization layer is used only in the downsampling and intermediate convolution blocks, while the upsampling layers share batch regularization layers with the source domain, finally yielding a fake target domain translated image.
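The computation in claim 5 can be written out as a short function. This is a hedged sketch: the argument names, channel counts, and the exact order of the distribution-restoring affine step are assumptions; what follows the claim is the sequence instance-normalize, restore with the source network's batch-regularization affine parameters, then apply the label-derived scaling and offset element-wise.

```python
# Sketch of the spatially adaptive normalization computation of claim 5.
# All argument names are illustrative assumptions.
import torch
import torch.nn.functional as F

def spatial_adaptive_norm(feat, label, src_gamma, src_beta,
                          conv_shared, conv_scale, conv_shift):
    # Instance regularization of the convolved feature map (no learned affine).
    x = F.instance_norm(feat)
    # Restore the feature distribution using affine parameters migrated from
    # the source domain network's batch regularization layer (claim 4).
    x = x * src_gamma.view(1, -1, 1, 1) + src_beta.view(1, -1, 1, 1)
    # One shared convolution on the label, then two convolutions producing
    # the scaling and offset tensors of the affine transformation.
    h = F.relu(conv_shared(label))
    scale, shift = conv_scale(h), conv_shift(h)
    # Element-level multiplication and addition give the final output.
    return x * scale + shift
```

Note the label-derived `scale` and `shift` are full tensors, not scalars, so the modulation varies per spatial location and per channel, which is what makes the normalization "spatially adaptive."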
6. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 1, wherein in the image augmentation step, enhancement is performed with an adaptive decider, and the augmentation of the image occurs before input to the adaptive decider, not before the adaptive generator; before an image is sent to the adaptive decider, it randomly undergoes an augmentation of color change or random occlusion, with the random probability set to a safe value of 0.8; the judged images are therefore augmented with probability 0.8.
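The augmentation policy of claim 6 amounts to a simple pre-decider transform. The sketch below is illustrative: the specific color-jitter range, the occlusion-patch size, and the even split between the two augmentation modes are assumptions; only the fixed 0.8 probability and the color-change/random-occlusion choice come from the claim.

```python
# Illustrative pre-decider augmentation from claim 6: with probability 0.8
# the image receives either a color change or a random occlusion before it
# is judged; the generator's output itself is never augmented.
import random
import torch

def augment_before_decider(img, p=0.8):
    """img: (C, H, W) tensor in [0, 1]; returns a possibly-augmented copy."""
    if random.random() >= p:
        return img                         # pass through unchanged (prob 0.2)
    if random.random() < 0.5:              # color change: per-channel gain
        gain = torch.empty(img.shape[0], 1, 1).uniform_(0.8, 1.2)
        return (img * gain).clamp(0.0, 1.0)
    out = img.clone()                      # random occlusion: zero a patch
    c, h, w = out.shape
    y, x = random.randrange(h // 2), random.randrange(w // 2)
    out[:, y:y + h // 4, x:x + w // 4] = 0.0
    return out
```

Applying the transform only on the decider's input path (to both real and fake images) is what lets the decider see a consistent, augmented distribution without the augmentation leaking into the generated images themselves.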
7. The image generation method based on a domain-adaptive prior-knowledge-guided generative adversarial network according to claim 1, wherein in the step of training the source domain decider and the target domain decider, the adaptive decider receives a real translated image or a fake translated image together with the semantic segmentation label of the real translated image; the received real or fake image enters the convolutional layers for feature extraction; the semantic segmentation label of the real translated image conditions the spatially adaptive normalization layer; using spatially adaptive normalization in the adaptive decider allows it to focus on the critical target regions; a judgment result is finally obtained.
8. An image generation system based on a domain-adaptive prior-knowledge-guided generative adversarial network, characterized by comprising a data set preparation module, a data set preprocessing module, a source domain network training module, a target domain generation network training module, an image augmentation module and a target domain decision network training module, wherein the data set preparation module collects, according to task requirements, paired images and the semantic segmentation labels corresponding to the images as target domain data for training, and collects unlabeled images similar or related to the translated images in the target domain from the Internet as source domain data for training; the data set preprocessing module unifies the sizes of all image data in the target domain data and the source domain data; the source domain network training module trains the model with the source domain data, the model's input being a noise vector which, after processing by a fully connected layer, is recombined into the unified image size; the noise vector is raised by the fully connected layer to a 65536-dimensional vector, which is reshaped into a 256 x 256 matrix and input into the convolutional layers, with batch regularization layers applied after each convolution; to generate an image, the source domain generator first downsamples and then upsamples to produce a 256 x 256 fake source domain image; the image received by the source domain decider is a real or fake source domain image, but always in augmented form; the target domain generation network training module trains the model with the target domain data, the model receiving as input the image to be translated and the semantic segmentation label of the image to be translated, the semantic segmentation label conditioning the spatially adaptive normalization layer and strengthening the constraint of the image to be translated on the generated translated image; the image augmentation module performs enhancement with an adaptive decider, images being randomly augmented before input to the adaptive decider, which judges only augmented images; the target domain decision network training module employs an adaptive decider that receives a real translated image or a fake translated image together with the semantic segmentation label of the real translated image; the received real or fake image enters the convolutional layers for feature extraction; the semantic segmentation label of the real translated image conditions the spatially adaptive normalization layer; using spatially adaptive normalization in the adaptive decider allows it to focus on the critical target regions.
CN202210548444.3A 2022-05-20 2022-05-20 Domain-adaptive priori knowledge-based GAN (generic object model) image generation method and system Active CN114882220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210548444.3A CN114882220B (en) 2022-05-20 2022-05-20 Domain-adaptive priori knowledge-based GAN (generic object model) image generation method and system


Publications (2)

Publication Number Publication Date
CN114882220A CN114882220A (en) 2022-08-09
CN114882220B true CN114882220B (en) 2023-02-28

Family

ID=82678479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210548444.3A Active CN114882220B (en) 2022-05-20 2022-05-20 Domain-adaptive priori knowledge-based GAN (generic object model) image generation method and system

Country Status (1)

Country Link
CN (1) CN114882220B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385330B (en) * 2023-06-06 2023-09-15 之江实验室 Multi-mode medical image generation method and device guided by graph knowledge

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062753A (en) * 2017-12-29 2018-05-22 Chongqing University of Technology Unsupervised domain-adaptive brain tumor semantic segmentation method based on deep adversarial learning
CN110310221A (en) * 2019-06-14 2019-10-08 Dalian University of Technology Multi-domain image style transfer method based on generative adversarial networks
CN110570433A (en) * 2019-08-30 2019-12-13 Beijing Moviebook Technology Co., Ltd. Image semantic segmentation model construction method and device based on generative adversarial networks
CN111242157A (en) * 2019-11-22 2020-06-05 Beijing Institute of Technology Unsupervised domain adaptation method combining deep attention features and conditional adversarial learning
CN111597946A (en) * 2020-05-11 2020-08-28 Tencent Technology (Shenzhen) Co., Ltd. Image generator processing method, image generation method and device
CN113836330A (en) * 2021-09-13 2021-12-24 Tsinghua Shenzhen International Graduate School Image retrieval method and device based on a generative adversarial automatic augmentation network
CN113837290A (en) * 2021-09-27 2021-12-24 Shanghai University Unsupervised unpaired image translation method based on an attention generator network
CN113888547A (en) * 2021-09-27 2022-01-04 Taiyuan University of Technology Unsupervised domain-adaptive remote sensing road semantic segmentation method based on GAN networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190707A (en) * 2018-09-12 2019-01-11 Shenzhen Weiteshi Technology Co., Ltd. Domain-adaptive image semantic segmentation method based on adversarial learning
CN111723780B (en) * 2020-07-22 2023-04-18 Zhejiang University Directional migration method and system for cross-domain data based on high-resolution remote sensing images
CN112150469B (en) * 2020-09-18 2022-05-27 Shanghai Jiao Tong University Laser speckle contrast image segmentation method based on unsupervised domain adaptation
CN112308158B (en) * 2020-11-05 2021-09-24 University of Electronic Science and Technology of China Multi-source domain adaptation model and method based on partial feature alignment
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN113850813B (en) * 2021-09-16 2024-05-28 Taiyuan University of Technology Unsupervised remote sensing image semantic segmentation method based on spatial-resolution domain adaptation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Context-related video anomaly detection via generative adversarial network; Daoheng L. et al.; Pattern Recognition Letters; 2022-04-30; pp. 183-189 *
Low-illumination water-surface image enhancement based on local generative adversarial networks; Liu Wen et al.; Computer Engineering; 2021-02-08; pp. 1-10 *


Similar Documents

Publication Publication Date Title
CN111311518B (en) Image denoising method and device based on multi-scale mixed attention residual error network
Bulat et al. To learn image super-resolution, use a gan to learn how to do image degradation first
CN109919209B (en) Domain self-adaptive deep learning method and readable storage medium
CN111583210B (en) Automatic breast cancer image identification method based on convolutional neural network model integration
Ye et al. Underwater image enhancement using stacked generative adversarial networks
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
Cong et al. Discrete haze level dehazing network
CN110415176A (en) Text image super-resolution method
CN114882220B (en) Domain-adaptive priori knowledge-based GAN (generic object model) image generation method and system
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN117635771A (en) Scene text editing method and device based on semi-supervised contrast learning
CN113763300B (en) Multi-focusing image fusion method combining depth context and convolution conditional random field
Zhang et al. Dynamic multi-scale network for dual-pixel images defocus deblurring with transformer
Liu et al. Facial image inpainting using multi-level generative network
CN107729885B (en) Face enhancement method based on multiple residual error learning
CN116958766B (en) Image processing method and computer readable storage medium
CN115731214A (en) Medical image segmentation method and device based on artificial intelligence
CN116703719A (en) Face super-resolution reconstruction device and method based on face 3D priori information
CN116342385A (en) Training method and device for text image super-resolution network and storage medium
CN113421212B (en) Medical image enhancement method, device, equipment and medium
Deng et al. UCT‐GAN: underwater image colour transfer generative adversarial network
Wang et al. Automated segmentation of intervertebral disc using fully dilated separable deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 1409, Floor 14, Building 1, High tech Zone Entrepreneurship Center, No. 177, Gaoxin 6th Road, Rizhao, Shandong 276801

Patentee after: Shandong Liju Robot Technology Co.,Ltd.

Address before: 276808 No.99, Yuquan 2nd Road, antonwei street, Lanshan District, Rizhao City, Shandong Province

Patentee before: Shandong Liju Robot Technology Co.,Ltd.