CN113160101A - Method for synthesizing high-simulation image - Google Patents

Method for synthesizing high-simulation image Download PDF

Info

Publication number
CN113160101A
CN113160101A (application CN202110401470.9A / CN202110401470A)
Authority
CN
China
Prior art keywords
image
data set
original image
intermediate image
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110401470.9A
Other languages
Chinese (zh)
Other versions
CN113160101B (en)
Inventor
金枝
张欢荣
齐银鹤
庞雨贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110401470.9A priority Critical patent/CN113160101B/en
Publication of CN113160101A publication Critical patent/CN113160101A/en
Application granted granted Critical
Publication of CN113160101B publication Critical patent/CN113160101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method for synthesizing a high-simulation image, which comprises the following steps: constructing an original image data set and a target data set, and training on the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model; acquiring an image to be processed in the original image data set, and generating a target image with the style of the target data set through the unpaired unsupervised style conversion network model. The model training step comprises: acquiring a first original image in the original image data set and converting it into a first intermediate image under the format of the target data set; and obtaining a second original image from the first intermediate image through information growth recovery or information reverse modification. The method saves a large amount of time and labor cost, makes the image data more usable, and can be widely applied in the technical field of image processing.

Description

Method for synthesizing high-simulation image
Technical Field
The invention relates to the technical field of image processing, in particular to a method for synthesizing a high-simulation image.
Background
In the prior art, supervised-learning-based convolutional neural networks (CNNs) have shown remarkable modeling capabilities in various high-level vision tasks, such as object detection and object segmentation. Most images in existing image data sets are clear, clean and bright, which can cause networks trained on these data sets to perform poorly under poor visual conditions, such as low scene brightness or rain and fog, because such conditions impair the visibility of the images and distort the structure, texture and color of objects in them. In order to improve the robustness of networks for object detection, recognition and similar tasks, researchers have begun to train networks on images captured under poor visual conditions, hoping to improve network performance by learning such scenes. In addition, attempts have also been made to enhance the images before training the network, for example by removing rain and fog, so as to improve the visibility of the images to be detected. Training a network on images under poor visual conditions requires a labeled image data set under poor visual conditions. Although collecting images under poor visual conditions is not difficult, labeling them is difficult and unreliable because of their poor visibility.
To address this, the related art proposes a way of manufacturing such a data set: a clear image is converted into an image under poor visual conditions, so that while the image under poor visual conditions is obtained, the original label of the clear image can be transferred to it. However, for this kind of image-to-image conversion learning, it is essential to obtain a sufficient number of paired images of the same scene under different visual conditions, and collecting enough image pairs, such as a high-brightness image and a low-brightness image of the same scene, or a rain-and-fog-free image and an image with rain and fog, takes a great deal of time and effort. In addition, since outdoor scenes change frequently over time, it is difficult to ensure that the content of photographs collected under different conditions, such as day and night, or with and without rain, is completely consistent, so acquiring paired images is very difficult and impractical.
Disclosure of Invention
In view of the above, to at least partially solve one of the above technical problems, an embodiment of the present invention provides a method for synthesizing a high-simulation image based on an unsupervised approach.
The technical scheme of the application provides a method for synthesizing a high-simulation image, which comprises the following steps: constructing an original image data set and a target data set, and training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model;
acquiring an image to be processed in an original image data set, and generating a target image with the style of the target data set through the unpaired unsupervised style conversion network model;
the step of training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model includes:
acquiring a first original image in the original image data set, and converting the first original image into a first intermediate image under the format of a target data set;
and obtaining a second original image by the first intermediate image through information growth recovery or information reverse modification.
In a possible embodiment of the present disclosure, the first intermediate image under the format of the target data set is a first intermediate image under poor visual conditions, and the step of converting the first original image into the first intermediate image under the format of the target data set includes removing the object texture information of the first original image through a UNet network and introducing visual interference information to generate the first intermediate image under poor visual conditions.
In a possible embodiment of the present disclosure, the step of obtaining the second original image by performing information growth restoration or information reverse modification on the first intermediate image includes extracting image mask information from the first original image, and obtaining the second original image by performing AttUNet network restoration on the basis of the image mask information and the first intermediate image.
In one possible embodiment of the solution of the present application, the UNet network comprises an encoder and a decoder;
the encoder is used for extracting the multi-scale features of the first original image and down-sampling the multi-scale features to obtain a feature map;
the decoder is used for performing up-sampling on the feature map and performing feature fusion on the up-sampled feature map to obtain the first intermediate image under the poor visual condition.
In one possible embodiment of the solution of the present application, the AttUNet network comprises an attention block and a residual block;
the attention block is used for generating an edge map according to the image mask information, generating a scale map and a displacement map through parallel inference according to the edge map, and carrying out scaling displacement according to the scale map and the displacement map to obtain a first feature map;
and the residual block is used for carrying out deconvolution according to the first feature mapping to obtain the second original image.
In a possible embodiment of the solution of the present application, the first intermediate image under the target dataset format comprises at least one of: a first intermediate image in poor vision conditions and a first intermediate image in non-poor vision conditions; the step of training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model comprises at least one of the following steps:
determining, by a discriminator, a probability that the first intermediate image is a first intermediate image under poor visual conditions;
or
Determining, by a discriminator, a probability that the first intermediate image is a first intermediate image in non-poor visual conditions.
In a possible embodiment of the scheme of the application, the adversarial loss, the cyclic pixel loss and the cyclic perceptual loss of the unpaired unsupervised style conversion network model are obtained;
and the unpaired unsupervised style conversion network model is updated according to the sum of the adversarial loss, the cyclic pixel loss and the cyclic perceptual loss.
In a possible embodiment of the solution of the present application, the adversarial loss is obtained by:
determining the adversarial loss based on a difference between the first intermediate image and the first original image.
In a possible embodiment of the solution of the present application, the cyclic pixel loss is obtained by:
and determining the cyclic pixel loss according to the difference of the first original image and the second original image in the spatial domain.
In a possible embodiment of the solution of the present application, the cyclic perceptual loss is obtained by:
and obtaining the similarity of the characteristic domains of the first original image and the second original image by pre-training a VGG network model to obtain the cycle perception loss.
Advantages and benefits of the present invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention:
according to the technical scheme, the image data sets are collected, the images under a part of specified conditions are collected as the target data sets, the two data sets are formed by being separated, the unpaired style conversion network model is trained on the two data sets, the trained model can be synthesized on any input image to obtain the image with a specific style, an image acquisition link and a manual labeling link which are required by a severe environment are avoided, a large amount of time cost and labor cost are saved, and the usability of the image data is higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of a method for synthesizing a high-simulation image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an unpaired style conversion network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a UNet network architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an AttUNet network architecture in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a discriminator according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
First, the terms referred to in the technical section of the present application will be explained:
UNet networks are neural networks improved on the basis of Fully Convolutional Networks (FCNs).
The AttUNet network is a UNet network after an Attention (Attention) mechanism is introduced.
The VGG network is a deep convolutional neural network developed jointly by the computer vision group (Visual Geometry Group) of the University of Oxford and researchers from Google DeepMind.
The technical scheme of the application provides a method for synthesizing a high-simulation data set based on unsupervised means; target effects can be synthesized onto an image, such as synthesizing rain or fog in a scene, or switching a scene from day to night.
In a first aspect, as shown in fig. 1, the technical solution of the present application provides an embodiment of a method for synthesizing a high simulation image, wherein the method includes steps S100-S200:
s100, constructing an original image data set and a target data set, and training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model;
illustratively, the original image dataset collects a sorted clear image (clearimages) in advance as an image dataset, and the target dataset collects a part of images (targetimages) under a specified condition, which may be a relatively poor visual condition such as rainy, foggy, or nighttime, as a target dataset while collecting the image dataset. In particular, the collected image dataset and the image in the target dataset do not require a one-to-one correspondence, i.e. the two datasets may be collected separately.
S200, acquiring an image to be processed in an original image data set, and generating a target image with a target data set style through an unpaired unsupervised style conversion network model;
specifically, the unpaired style conversion network is trained on the two data sets collected in this step S100, and the trained network can synthesize a specified effect on an arbitrary input image. Taking the example of the conversion of the images in the day and at night, firstly, some images shot in the day of any scene are collected, and then some images shot at night are collected to train the network. The tagged daytime image dataset is input into the trained network, and the corresponding night image in the same scene can be obtained, so that a paired 'day-night' image dataset is obtained.
Referring to fig. 2, the unpaired conversion network is composed of two image-to-image conversion cycle networks: Cycle C-T-C and Cycle T-C-T. Both cycle networks contain a UNet network and an AttUNet network; the difference is that UNet comes first in one cycle network and last in the other.
In an embodiment, the first intermediate image under the target data set format may be the first intermediate image under poor visual conditions, and step S200 may further comprise the following refining step: S210, acquiring a first original image in the image data set, and converting the first original image into a first intermediate image under poor visual conditions; and obtaining a second original image from the first intermediate image through information growth recovery or information reverse modification.
Referring to fig. 2, the first original image is the original image c in the image data set, the first intermediate image is the high-simulation composite image, denoted t̂ below, obtained by passing the original image through the UNet network, and the second original image is the image ĉ restored from t̂ by the AttUNet network.
Illustratively, in Cycle C-T-C, the original image c is first converted by the UNet network into an image t̂ under poor visual conditions; the image t̂ and the mask are then restored into the original image ĉ by the AttUNet network. When a simulation-effect image is synthesized from an original image, for example a rainy, foggy or night effect, the original object texture information is lost and visual interference information is introduced, so that the visual condition of the image becomes worse; this is an information-loss process, whereas restoring the image is an information-growth process. Similarly to Cycle C-T-C, Cycle T-C-T first converts t through the AttUNet network to obtain a synthesized source-domain image; compared with t, this synthesized image has better visual conditions, and the target-domain image is then recovered from it through the UNet network. The difference is that the mask used in Cycle T-C-T is obtained from the image under poor visual conditions, so that image has to be preprocessed to obtain the mask; taking the synthesized night effect as an example, the brightness of the low-brightness image must first be increased before the edge map is extracted. Finally, the style of the extracted target-domain image includes, but is not limited to, image features such as the brightness, saturation, white balance and color temperature of the picture; according to this style, the image features extracted from the original image are adjusted or synthesized to obtain a composite image with the corresponding style; for example, an original daytime image is converted into a night image.
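A minimal sketch of how the two cycles could be wired together, assuming `unet` plays the role of G_C-T, `att_unet` the role of G_T-C, and `extract_mask`/`preprocess` stand in for the application-specific mask extraction and preprocessing (all interface details are assumptions):

```python
# Forward passes of the two conversion cycles described above.
def cycle_c_t_c(unet, att_unet, c, extract_mask):
    mask_c = extract_mask(c)            # mask taken from the clear source image
    t_hat = unet(c)                     # information-loss step: clear -> poor visual condition
    c_hat = att_unet(t_hat, mask_c)     # information-growth step: restore the clear image
    return t_hat, c_hat

def cycle_t_c_t(unet, att_unet, t, extract_mask, preprocess):
    # In Cycle T-C-T the mask comes from the poor-visual-condition image, which is
    # preprocessed first (e.g. brightened before edge extraction for the night effect).
    mask_t = extract_mask(preprocess(t))
    c_tilde = att_unet(t, mask_t)       # synthesized source-domain (clear) image
    t_tilde = unet(c_tilde)             # recovered target-domain image
    return c_tilde, t_tilde
```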
In some alternative embodiments, the UNet network in the cyclic network of image-to-image conversion is composed of an encoder and a decoder;
in particular, the cyclic network framework in an embodiment comprises two generators (G)C-TAnd GT-C). Wherein G isC-TAnd the method is responsible for converting a clear and bright image into an image under poor visual conditions by using UNet. UNet is 2015 "U-net: the network architecture proposed in the document "public networks for biological image segmentation" has a strong capability in semantic segmentation, image restoration and image enhancement applications, and the structure of the UNet is shown in fig. 3, and the UNet network structure is symmetrical and is called UNet because it looks like the english letter U. Wherein the box part represents a feature map; the dotted pattern arrows represent the 3x3 convolution and activation function ReLU for feature extraction; the slash pattern arrow represents the upsampling process for recovering the dimensionality; the grid arrows represent a 1 × 1 convolution for the output result.
More specifically, the UNet network in the embodiment is an Encoder-Decoder (Encoder-Decoder) structure; as shown in fig. 3, the left half of the illustrated network architecture of fig. 3 is an Encoder (Encoder) for extracting multi-scale features, which is composed of two 3 × 3 convolutional layers (ReLU) and 2 × 2 maximum pooling layers (stride 2) repeated four times, and the number of channels is doubled each time downsampling is performed; in the right half of the illustrated network architecture of fig. 3 is a Decoder (Decoder) for image synthesis, which is composed of a 2x2 upsampled convolutional layer (ReLU), a connection layer for cropping the output feature map of the corresponding Encoder layer and then adding the upsampled result of the Decoder layer, and two 3x3 convolutional layers (ReLU) repeated four times; the structure fuses the local features and the overall features, enlarges the influence of the overall style on the local, and adds the influence of global scenes, lighting conditions, texture information and the like into the local features. Then, an UNet expansion path is executed on the fused feature maps, that is, upsampling is continuously performed, the upsampled feature maps obtained in each level are fused with the feature maps of the corresponding compression path levels, and finally, a poor visual condition image with the same size as the original image is obtained through a 1 × 1 convolutional layer. It will be appreciated that the UNet network architecture in the two image-to-image conversion loop networks is the same, and the functions and steps performed by the UNet network architecture are the same.
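A compact PyTorch sketch of such an encoder-decoder UNet, with illustrative channel widths and bilinear upsampling (the embodiment does not fix these choices):

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class UNet(nn.Module):
    """Four downsampling stages (channels doubled each time), symmetric decoder with
    skip connections, final 1x1 convolution. Assumes input H and W divisible by 16."""
    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        widths = [base, base * 2, base * 4, base * 8]
        self.encoders = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.encoders.append(double_conv(ch, w))
            ch = w
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(widths[-1], widths[-1] * 2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.decoders = nn.ModuleList(
            [double_conv(w * 2 + w, w) for w in reversed(widths)])  # upsampled + skip channels
        self.head = nn.Conv2d(base, out_ch, 1)   # 1x1 conv producing the styled image

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = torch.cat([self.up(x), skip], dim=1)   # fuse local and global features
            x = dec(x)
        return torch.sigmoid(self.head(x))
```

In the terminology above, an instance of this module would serve as G_C-T inside Cycle C-T-C and as the recovery branch inside Cycle T-C-T.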
In some alternative embodiments, the AttUNet network in the image-to-image conversion cycle network is composed of several attention blocks and residual blocks;
Specifically, of the two generators (G_C-T and G_T-C) in the embodiment, the other generator G_T-C uses AttUNet to convert the image under poor visual conditions and the associated mask, taken as input, into a clear and bright image. The AttUNet architecture is modified from the UNet architecture by introducing an attention mechanism on the basis of the UNet network; its specific structure is shown in fig. 4. In the overall AttUNet architecture of the embodiment, the input is an image t under severe visual conditions together with the extracted mask, and the output is a clear and bright image c. AttUNet consists of a series of attention blocks (Att Block) and residual blocks (Rec Block), which are used to process features on different scales. Taking the n-th attention block and the n-th residual block as an example, f_n denotes the input of the n-th attention block or the n-th residual block, and mask_n denotes the additional input of the n-th attention block, i.e. the edge map. Through four parallel inference branches, the attention block generates two scale maps and two shift maps from mask_n; each branch consists of a 1×1 convolutional layer, an activation function (ReLU) and a 1×1 convolutional layer. Using the first scale map and the first shift map, the n-th attention block scales and shifts the input f_n to obtain a new set of feature maps; a 3×3 convolutional layer, a batch normalization layer and an activation function (ReLU) then generate from these feature maps an output feature map representing the new style. Similarly, the next network layer uses the second scale map and the second shift map to scale and shift this output, and passing the result through a 3×3 convolutional layer, a batch normalization layer and an activation function (ReLU) yields the final output of the n-th layer. A 4×4 convolutional layer with a stride of 2 is connected after each attention block to halve the size of the feature map. Each residual block simply consists of two units, each composed of a 3×3 convolutional layer, a batch normalization layer and an activation function (ReLU).
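An illustrative PyTorch sketch of one attention block and one residual block as described above; the channel widths, the assumption that mask_n has already been projected to the feature channel count, and the residual skip connection are assumptions of this sketch:

```python
import torch.nn as nn

def branch(ch):
    # One parallel inference branch: 1x1 conv -> ReLU -> 1x1 conv.
    return nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 1))

def conv_bn_relu(ch):
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

class AttentionBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.scale1, self.shift1 = branch(ch), branch(ch)   # first scale/shift pair
        self.scale2, self.shift2 = branch(ch), branch(ch)   # second scale/shift pair
        self.conv1, self.conv2 = conv_bn_relu(ch), conv_bn_relu(ch)
        self.down = nn.Conv2d(ch, ch, 4, stride=2, padding=1)  # 4x4, stride 2: halves resolution

    def forward(self, f_n, mask_n):
        # mask_n is assumed to have the same channel count and resolution as f_n.
        f = self.conv1(f_n * self.scale1(mask_n) + self.shift1(mask_n))
        f = self.conv2(f * self.scale2(mask_n) + self.shift2(mask_n))
        return self.down(f)

class ResidualBlock(nn.Module):
    """Two conv(3x3)-BN-ReLU units; the skip connection is an assumption of this sketch."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(conv_bn_relu(ch), conv_bn_relu(ch))

    def forward(self, x):
        return x + self.body(x)
```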
In some possible embodiments, the first intermediate image under the target data set format includes: at least one of the first intermediate image under poor visual condition and the first intermediate image under non-poor visual condition, so in step S200, the process of obtaining the unpaired style conversion network model through training the image data set and the target data set includes:
s220, determining the probability that the first intermediate image is the first intermediate image under the poor visual condition through a discriminator; or
Determining, by a discriminator, a probability that the first intermediate image is a first intermediate image under non-poor visual conditions;
illustratively, the recurrent network framework in an embodiment further comprises two discriminators (D)CAnd DT) Wherein D isCThe probability that the sample is a sharp bright image, D, is estimatedTThe probability that the sample is an image of poor visual conditions is estimated and the structure of the two discriminators is shown in fig. 5, where,
Figure BDA0003020500780000071
denotes the convolution layer k × k with a step size of s, NiRepresenting a normalization layer and a an activation function ReLU. The arbiter uses a CNN network that performs the generation of the picture type decision through 5 hierarchies.
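A minimal sketch of such a 5-level convolutional discriminator; the kernel sizes, strides, channel widths and the choice of instance normalization are illustrative assumptions, since the exact values are fixed only in the figure:

```python
import torch.nn as nn

def discriminator(in_ch=3, base=64):
    def level(cin, cout, stride, norm=True):
        layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
        if norm:
            layers.append(nn.InstanceNorm2d(cout))
        layers.append(nn.ReLU(inplace=True))
        return layers

    return nn.Sequential(
        *level(in_ch, base, 2, norm=False),   # level 1
        *level(base, base * 2, 2),            # level 2
        *level(base * 2, base * 4, 2),        # level 3
        *level(base * 4, base * 8, 1),        # level 4
        nn.Conv2d(base * 8, 1, 4, stride=1, padding=1))  # level 5: per-patch real/fake score map
```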
In some possible embodiments, the process of training the image dataset and the target dataset to obtain the unpaired style conversion network model in step S200 may further include steps S230-S240:
s230, acquiring the countermeasure loss, the cyclic pixel loss and the cyclic perception loss of the unpaired style conversion network model;
and S240, optimizing the unpaired style conversion network model according to the countermeasure loss, the cyclic pixel loss and the cyclic perception loss.
Specifically, in order to learn the proposed network stably, the embodiment employs multiple losses in the training process, including the adversarial loss, the cyclic pixel loss and the cyclic perceptual loss. The left network architecture Cycle C-T-C in fig. 2 is taken as the example below; Cycle T-C-T is treated in the same way. The total loss of the generators and discriminators takes into account the authenticity of the generated simulation image t̂ as well as the difference between the input clear bright image c and the reconstructed clear bright image ĉ in both the spatial domain and the feature domain, and can be written as a weighted sum of the loss components:

L_total(G, D) = α·L_adv(G_C-T, D_T) + β·L_adv(G_T-C, D_C) + γ·L_cycle_pix(G) + δ·L_cycle_per(G)

where G represents a generator and D represents a discriminator; L_total represents the total loss, L_adv the adversarial loss, L_cycle_pix the cyclic pixel loss and L_cycle_per the cyclic perceptual loss; α, β, γ and δ are the balance coefficients of the corresponding loss components.
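A one-line sketch of this weighted combination; the default weight values are placeholders, not values given in the embodiment:

```python
# Weighted total loss; the individual loss terms are computed elsewhere.
def total_loss(l_adv_t, l_adv_c, l_cycle_pix, l_cycle_per,
               alpha=1.0, beta=1.0, gamma=10.0, delta=1.0):
    # Default weights are illustrative placeholders only.
    return alpha * l_adv_t + beta * l_adv_c + gamma * l_cycle_pix + delta * l_cycle_per
```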
In some possible embodiments, the adversarial loss may be derived from the difference between the first intermediate image and the first original image;
Specifically, the discriminator D_T needs to learn to judge whether a sample is a simulated image; therefore, an adversarial loss based on D_T is adopted to reduce the difference between the generated simulation image t̂ and a real image t under poor visual conditions. The adversarial loss of the generator G_C-T and the discriminator D_T is as follows:

L_adv(G_C-T, D_T) = E_t[log D_T(t)] + E_c[log(1 - D_T(G_C-T(c)))]

where E denotes the expectation.
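A sketch of this adversarial loss in the binary cross-entropy form written above (a least-squares variant would be an equally plausible reading); D_T is assumed to output raw logits:

```python
import torch
import torch.nn.functional as F

def adversarial_loss_d(d_t, t_real, t_fake):
    # D_T should score real poor-visual-condition images as 1 and synthesized ones as 0.
    real_logits = d_t(t_real)
    fake_logits = d_t(t_fake.detach())   # detach: do not backpropagate into the generator here
    real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return real + fake

def adversarial_loss_g(d_t, t_fake):
    # G_C-T is rewarded when D_T mistakes its output t_hat for a real image.
    logits = d_t(t_fake)
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
```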
In some possible embodiments, the cyclic pixel loss may be determined based on the difference between the first original image and the second original image in the spatial domain;
Specifically, in Cycle C-T-C, the reconstructed bright image ĉ should be as close as possible to the input original clear bright image c. Therefore, the embodiment applies an L1 loss function to constrain the similarity of the two images in the spatial domain. Letting H, W and C denote the height, width and number of channels of c and ĉ, the cyclic pixel loss is as follows:

L_cycle_pix = (1 / (H·W·C)) · Σ_{h,w,c} | c_{h,w,c} - ĉ_{h,w,c} |

where c_{h,w,c} denotes the pixel intensity at the corresponding row, column and channel of image c, and likewise for ĉ_{h,w,c}.
In some possible embodiments, the cyclic perceptual loss is determined by measuring the similarity of the first original image and the second original image in the feature domain through a pre-trained VGG network model;
Specifically, the perceptual loss, also known as the feature loss, is used to constrain the similarity of the features of two images. In the network architecture provided by the embodiment, it is applied in Cycle C-T-C, i.e. to the images c and ĉ. Unlike the cyclic pixel loss, the cyclic perceptual loss constrains the similarity of c and ĉ in the feature domain obtained from a pre-trained VGG network. The cyclic perceptual loss function is defined as follows:

L_cycle_per = || φ_{i,j}(c) - φ_{i,j}(ĉ) ||_2^2

where φ(c) denotes a feature map extracted from c by a pre-trained VGG-19 network without batch normalization, and φ_{i,j}(c) denotes the feature map obtained from the i-th convolutional layer before the j-th max pooling layer.
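A sketch of the cyclic perceptual loss using a fixed, pre-trained VGG-19 from torchvision; the layer cut-off and the squared-L2 distance are illustrative choices of (i, j) and norm, and inputs are assumed to be normalized as the VGG network expects:

```python
import torch
import torch.nn as nn
from torchvision import models

class CyclePerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)  # older torchvision: pretrained=True
        self.phi = vgg.features[:36].eval()          # up to the last ReLU before the 5th max-pool (example choice)
        for p in self.phi.parameters():
            p.requires_grad_(False)                  # the VGG feature extractor stays fixed

    def forward(self, c: torch.Tensor, c_hat: torch.Tensor) -> torch.Tensor:
        # Mean squared difference between the two feature maps.
        return torch.mean((self.phi(c) - self.phi(c_hat)) ** 2)
```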
In a second aspect, the present disclosure may further provide a system for synthesizing a high simulation image, including at least one processor; at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, the at least one processor is caused to execute a method of synthesizing a high simulation image as in the first aspect.
An embodiment of the present invention further provides a storage medium storing a program, where the program is executed by a processor to implement the method in the first aspect.
From the above specific implementation process, it can be concluded that, compared with the prior art, the technical solution provided by the present invention has the following advantages:
1) According to the technical scheme, a high-simulation data set is synthesized by learning from a small data set, and during synthesis the labels carried by the real data set serving as the source domain can be transferred to the synthesized high-simulation data set of images under severe conditions, so that the step of collecting large-scale multi-scene image pairs and the step of manually labeling large-scale image data sets are both saved, which saves time, manpower and material resources;
2) While converting clear images into degraded images, the technical scheme of the application can also convert degraded images into clear images, which can be used to improve image quality.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more of the functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for synthesizing a high-simulation image, characterized by comprising the following steps:
constructing an original image data set and a target data set, and training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model;
acquiring an image to be processed in an original image data set, and generating a target image with the style of the target data set through the unpaired unsupervised style conversion network model;
the step of training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model includes:
acquiring a first original image in the original image data set, and converting the first original image into a first intermediate image under the format of a target data set;
and obtaining a second original image by the first intermediate image through information growth recovery or information reverse modification.
2. The method for synthesizing a high-simulation image according to claim 1, wherein the first intermediate image under the format of the target data set is a first intermediate image under poor visual conditions, and the step of converting the first original image into the first intermediate image under the format of the target data set comprises:
and removing the object texture information of the first original image through a UNet network and introducing visual interference information to generate a first intermediate image under the poor visual condition.
3. The method for synthesizing a high-simulation image according to claim 1, wherein the step of obtaining the second original image from the first intermediate image through information growth recovery or information reverse modification comprises:
and extracting image mask information from the first original image, and recovering the first intermediate image through an AttUNet network according to the image mask information and the first intermediate image to obtain the second original image.
4. The method for synthesizing a high-simulation image according to claim 2, wherein the UNet network comprises an encoder and a decoder;
the encoder is used for extracting the multi-scale features of the first original image and down-sampling the multi-scale features to obtain a feature map;
the decoder is used for performing up-sampling on the feature map and performing feature fusion on the up-sampled feature map to obtain the first intermediate image under the poor visual condition.
5. The method for synthesizing a high-simulation image according to claim 3, wherein the AttUNet network comprises an attention block and a residual block;
the attention block is used for generating an edge map according to the image mask information, generating a scale map and a displacement map through parallel inference according to the edge map, and carrying out scaling displacement according to the scale map and the displacement map to obtain a first feature map;
and the residual block is used for carrying out deconvolution according to the first feature mapping to obtain the second original image.
6. The method for synthesizing a high-simulation image according to any one of claims 1-5, wherein the first intermediate image under the target data set format comprises at least one of: a first intermediate image under poor visual conditions and a first intermediate image under non-poor visual conditions;
the step of training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model comprises at least one of the following steps:
determining, by a discriminator, a probability that the first intermediate image is a first intermediate image under poor visual conditions;
or
Determining, by a discriminator, a probability that the first intermediate image is a first intermediate image in non-poor visual conditions.
7. The method for synthesizing a high-simulation image according to claim 6, wherein the step of training the original image data set and the target data set to obtain an unpaired unsupervised style conversion network model further comprises:
acquiring the adversarial loss, the cyclic pixel loss and the cyclic perceptual loss of the unpaired unsupervised style conversion network model; and updating the unpaired unsupervised style conversion network model according to the sum of the adversarial loss, the cyclic pixel loss and the cyclic perceptual loss.
8. The method for synthesizing a high-simulation image according to claim 7, wherein the adversarial loss is obtained by the following steps:
determining the adversarial loss based on a difference between the first intermediate image and the first original image.
9. The method for synthesizing a high-simulation image according to claim 7, wherein the cyclic pixel loss is obtained by the following steps:
and determining the cyclic pixel loss according to the difference of the first original image and the second original image in the spatial domain.
10. The method for synthesizing a high-simulation image according to claim 7, wherein the cyclic perceptual loss is obtained by the following steps:
determining the similarity of the first original image and the second original image in the feature domain through a pre-trained VGG network model to obtain the cyclic perceptual loss.
CN202110401470.9A 2021-04-14 2021-04-14 Method for synthesizing high-simulation image Active CN113160101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110401470.9A CN113160101B (en) 2021-04-14 2021-04-14 Method for synthesizing high-simulation image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110401470.9A CN113160101B (en) 2021-04-14 2021-04-14 Method for synthesizing high-simulation image

Publications (2)

Publication Number Publication Date
CN113160101A true CN113160101A (en) 2021-07-23
CN113160101B CN113160101B (en) 2023-08-01

Family

ID=76890421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110401470.9A Active CN113160101B (en) 2021-04-14 2021-04-14 Method for synthesizing high-simulation image

Country Status (1)

Country Link
CN (1) CN113160101B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20190147582A1 (en) * 2017-11-15 2019-05-16 Toyota Research Institute, Inc. Adversarial learning of photorealistic post-processing of simulation with privileged information
CN112270648A (en) * 2020-09-24 2021-01-26 清华大学 Unsupervised image transformation method and unsupervised image transformation device based on loop countermeasure network
CN112365556A (en) * 2020-11-10 2021-02-12 成都信息工程大学 Image extension method based on perception loss and style loss

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20190147582A1 (en) * 2017-11-15 2019-05-16 Toyota Research Institute, Inc. Adversarial learning of photorealistic post-processing of simulation with privileged information
CN112270648A (en) * 2020-09-24 2021-01-26 清华大学 Unsupervised image transformation method and unsupervised image transformation device based on loop countermeasure network
CN112365556A (en) * 2020-11-10 2021-02-12 成都信息工程大学 Image extension method based on perception loss and style loss

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Junyi; Yao Xuejuan; Li Hailin: "Research on image style transfer method based on perceptual adversarial networks", Journal of Hefei University of Technology (Natural Science), no. 05, pages 54-58

Also Published As

Publication number Publication date
CN113160101B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN113902915B (en) Semantic segmentation method and system based on low-light complex road scene
CN113343789A (en) High-resolution remote sensing image land cover classification method based on local detail enhancement and edge constraint
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN113486897A (en) Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN116665176B (en) Multi-task network road target detection method for vehicle automatic driving
CN110443763B (en) Convolutional neural network-based image shadow removing method
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN112084859B (en) Building segmentation method based on dense boundary blocks and attention mechanism
CN111275638B (en) Face repairing method for generating confrontation network based on multichannel attention selection
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114373094A (en) Gate control characteristic attention equal-variation segmentation method based on weak supervised learning
CN115205672A (en) Remote sensing building semantic segmentation method and system based on multi-scale regional attention
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN115272438A (en) High-precision monocular depth estimation system and method for three-dimensional scene reconstruction
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN112767277B (en) Depth feature sequencing deblurring method based on reference image
CN114529832A (en) Method and device for training preset remote sensing image overlapping shadow segmentation model
CN114331931A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism
CN117952883A (en) Backlight image enhancement method based on bilateral grid and significance guidance
CN114155165A (en) Image defogging method based on semi-supervision
CN117058392A (en) Multi-scale Transformer image semantic segmentation method based on convolution local enhancement
Wang et al. A multi-scale attentive recurrent network for image dehazing
CN115035402B (en) Multistage feature aggregation system and method for land cover classification problem
CN113160101B (en) Method for synthesizing high-simulation image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant