CN114399668A - Natural image generation method and device based on hand-drawn sketch and image sample constraint - Google Patents

Natural image generation method and device based on hand-drawn sketch and image sample constraint

Info

Publication number: CN114399668A
Application number: CN202111617371.0A
Authority: CN (China)
Prior art keywords: image, content, natural, natural image, training data
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 高成英, 袁梦丽, 许琦
Current assignee: Sun Yat-sen University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Sun Yat-sen University
Application filed by Sun Yat-sen University
Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The invention discloses a natural image generation method and apparatus based on hand-drawn sketch and image sample constraints. The method comprises the following steps: first, acquiring original natural images and category information and constructing a training data set; then constructing a natural image generation model comprising a generator and a multi-task discriminator, wherein the generator generates natural images during training, and the multi-task discriminator judges both whether a generated natural image is real and which category it belongs to; then training the natural image generation model on the training data set and adjusting its parameters to obtain a trained target model; and finally, inputting a target image sample and a target hand-drawn sketch into the target model to generate a natural image constrained by the target hand-drawn sketch and the image sample. The invention improves convenience and controllability, and can be widely applied in the technical field of image processing.

Description

Natural image generation method and device based on hand-drawn sketch and image sample constraint
Technical Field
The invention relates to the technical field of image processing, in particular to a natural image generation method and device based on hand-drawn sketches and image sample constraints.
Background
With the rapid development of conditional generative adversarial networks, a variety of such networks based on different constraints have emerged, for example networks using edge maps, natural images, hand-drawn sketches, or semantic segmentation maps as constraints. However, these networks can only control content attributes of the generated image such as pose and shape, and the controllability of the generated image is not high enough.
More recently, methods have appeared that use key points, edge maps, or natural images alone to control the generated image content. However, key points are too abstract to express user intent well, and with an edge map or a natural image the user often cannot find a suitable input to express their intent, so these approaches are not convenient enough.
Disclosure of Invention
In view of this, embodiments of the present invention provide a natural image generation method and apparatus based on hand-drawn sketch and image sample constraints, which are highly convenient and controllable.
The invention provides a natural image generation method based on hand-drawn sketch and image sample constraint, which comprises the following steps:
acquiring an original natural image and category information, and constructing a training data set; wherein the training data set comprises content images and image samples;
constructing a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images;
training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model;
and inputting the target image sample and the target hand-drawn sketch into the target model, and generating a natural image based on the target hand-drawn sketch and the image sample constraint.
Optionally, the acquiring the original natural image and the category information to construct a training data set includes:
acquiring an edge map of the original natural image through an edge map extraction algorithm;
pairing the edge map with its corresponding natural image to form one kind of training data pair in the data set, and also pairing the edge map with a random natural image to form another kind of training data pair in the data set;
and constructing the training data set from the two kinds of training data pairs, wherein the edge map serves as the content image and the natural image serves as the image sample.
Optionally, the constructing a natural image generation model includes:
constructing a content encoder, wherein the content encoder comprises five convolution modules and two residual modules, and is used for extracting content characteristics in input data;
constructing a style encoder, wherein the style encoder is used for extracting style characteristics of an image sample in input data;
and constructing a content decoder, wherein the content decoder is used for acquiring affine transformation parameters according to the style characteristics and generating pictures according to the content characteristics and the affine transformation parameters.
Optionally, the content feature extraction formula is:
Z_content = E_content(X_edge or sketch)
where Z_content is the content feature, E_content is the content encoder, and X_edge or sketch is the edge map or hand-drawn sketch serving as the content image.
Optionally, the method further includes a step of performing style migration using adaptive instance normalization (AdaIN), which specifically includes:
processing the style features through three fully connected layers to obtain affine transformation parameters;
injecting the style features, according to the affine transformation parameters, into the AdaIN Resblock modules of the content decoder, which perform style migration using adaptive instance normalization;
wherein adaptive instance normalization is calculated as:
AdaIN(z_content, z_reference) = σ(z_reference) * (z_content - μ(z_content)) / σ(z_content) + μ(z_reference)
where AdaIN(z_content, z_reference) is the result of adaptive instance normalization, z_content is the content feature, z_reference is the style feature, μ denotes the mean, and σ denotes the standard deviation.
Optionally, in the step of training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model,
the reconstruction loss function of the training process is:
L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1
where L_rec(G) is the reconstruction loss; X_edge is the edge map in a training data pair; Y_image is the natural image in the training data pair; G(·) is the generator, which during training takes an edge map and a natural image as input and generates the target picture.
the multi-tasking discriminant loss function is:
L_GAN(D, G) = E_X[-log D_{c-1}(Y_{c-1})] + E_{X,Y}[log(1 - D_{c-1}(G(X_edge, Y_{c-1})))]
where L_GAN(D, G) is the multi-task discrimination loss, G is the generator network, and D is the discriminator network; E_X denotes the real data distribution, and D_{c-1}(Y_{c-1}) is the output of the discriminator when a real sample is input; E_{X,Y} denotes the generated sample distribution, and D_{c-1}(G(X_edge, Y_{c-1})) is the output of the discriminator when a generated sample is input; G(X_edge, Y_{c-1}) is a sample generated by the generator network; the subscript c-1 denotes the category.
The knowledge distillation loss function is:
L_Distill(G_S) = Σ_{i=1}^{N} ||G_T^i(x) - G_S^i(x)||
where L_Distill(G_S) is the knowledge distillation loss; N is the number of selected intermediate layers of the generator network; G_T^i(x) is the activation output of the i-th intermediate layer of the teacher generator network; and G_S^i(x) is the activation output of the i-th intermediate layer of the student generator network.
Another aspect of the embodiments of the present invention provides a natural image generation apparatus based on hand-drawn sketches and image sample constraints, including:
a first module, configured to acquire original natural images and category information and construct a training data set, wherein the training data set comprises content images and image samples;
a second module, configured to construct a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator generates natural images during training, and the multi-task discriminator judges whether a generated natural image is real and which category it belongs to;
a third module, configured to train the natural image generation model on the training data set and adjust its parameters to obtain a trained target model;
and a fourth module, configured to input a target image sample and a target hand-drawn sketch into the target model and generate a natural image based on the target hand-drawn sketch and the image sample constraint.
Another aspect of the embodiments of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a program, the program being executed by a processor to implement the method as described above.
Another aspect of embodiments of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
In the method, original natural images and category information are first acquired to construct a training data set; a natural image generation model is then constructed, comprising a generator that generates natural images during training and a multi-task discriminator that judges whether a generated natural image is real and which category it belongs to; the model is then trained on the training data set, and its parameters are adjusted to obtain a trained target model; finally, a target image sample and a target hand-drawn sketch are input into the target model to generate a natural image constrained by the target hand-drawn sketch and the image sample. The invention thereby improves convenience and controllability.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating the overall steps provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of edge map extraction from a natural image according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating two training data pairs according to an embodiment of the present invention;
FIG. 4 is a diagram of a natural image generation model according to an embodiment of the present invention;
FIG. 5 is a diagram of natural image generation results based on hand-drawn sketch and image sample constraints according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To address the problems in the prior art, the invention combines the advantages of using both a hand-drawn sketch and an image sample as input: the hand-drawn sketch offers the user convenience and controls the content of the generated image, while the image sample controls styles such as the texture of the generated image. The invention provides a fine-grained generation framework that is based on a hand-drawn sketch and uses an image sample as an additional constraint. Within this framework, to address the difficulty of constructing a mapping between hand-drawn sketches and natural images, a content encoder is designed that extracts general semantic features from cross-domain images; that is, it extracts correct semantic features whether an edge map or a hand-drawn sketch is given as input. In addition, drawing on the idea of knowledge distillation, a knowledge distillation loss is introduced into the image generation process, improving image generation quality without changing the network architecture.
The following detailed description of the specific implementation principles of the present invention is made with reference to the accompanying drawings:
As shown in FIG. 1, the invention discloses a natural image generation method based on hand-drawn sketch and image sample constraints. First, natural images and hand-drawn sketch data are collected to make a data set; then a natural image generation model is constructed and trained using the loss functions; finally, the user provides a hand-drawn sketch and an image sample as input, and the generator in the natural image generation model produces a natural image whose pose follows the hand-drawn sketch and whose texture follows the image sample, yielding the final result. Specifically, the method mainly comprises the following steps:
Step 1: collect natural images and hand-drawn sketches to make the data sets, where the edge maps and sketches are the content images and the natural images are the image samples. This comprises the following steps 1-1 and 1-2:
Step 1-1: collect natural images and their corresponding category information. First obtain an edge map of each natural image using an edge map extraction algorithm; the result of extracting an edge map from a natural image is shown in FIG. 2. Pair each edge map with its corresponding natural image, and also pair the edge map with a random natural image; these two kinds of training data pairs are used to construct the training data set, as shown in FIG. 3.
Step 1-2: collect hand-drawn sketches and randomly assign 10 natural images to each sketch as image samples to construct a test data set.
Step 2: the method comprises the following steps of constructing a natural image generation model, wherein the model comprises a generator and a multi-task discriminator, the generator consists of a content encoder which is responsible for extracting input content characteristics, a style encoder which is responsible for extracting input image sample characteristics and a decoder, and the multi-task discriminator consists of a plurality of two-classification discriminators and comprises the following steps of 2-1 to 2-5:
Step 2-1: construct the content encoder, which is responsible for extracting the content features of the input content image. It is designed as a seven-layer convolutional network comprising five convolution modules Conv-64, Conv-128, Conv-256, Conv-512 and two residual modules Resblock-512 and Resblock-512. The process of extracting the content features of the content image is expressed as:
Z_content = E_content(X_edge or sketch)
where Z_content is the content feature, E_content is the content encoder, and X_edge or sketch is the content image (an edge map or a sketch); in the network structure, "Conv-" denotes a convolution block, "Resblock-" denotes a residual block, and the number denotes the number of output feature channels.
Step 2-2: construct the style encoder, which is responsible for extracting the style features of the input image sample. It is designed as a seven-layer network comprising Conv-64, Conv-128, Conv-256, Conv-512, Conv-1024, AvgPooling and Conv-8. The process of extracting the style features of the image sample is expressed as:
Z_reference = E_reference(Y_{c-1})
where Z_reference is the style feature, E_reference is the style encoder, and Y_{c-1} is the input image sample, with the subscript c-1 denoting the class of the input image sample; in the network structure, "Conv-" denotes a convolution block, AvgPooling denotes an average pooling layer, and the number denotes the number of output feature channels.
Step 2-3: construct the content decoder, which takes the content features as input and performs style migration with the style features using adaptive instance normalization (AdaIN) to obtain the final generated picture. The content decoder is designed as two AdaIN Resblock-512 modules and five convolution modules Conv-512, Conv-256, Conv-128, Conv-64 and Conv-3. The process by which the content decoder obtains the final generated picture from the content features and style features is expressed as:
Ŷ_{c-1} = Decoder(Z_content, Z_reference)
where Z_content is the content feature, Z_reference is the style feature, Decoder is the content decoder, and Ŷ_{c-1} is the final generated picture, with the subscript c-1 denoting the category of the generated picture; in the network structure, "Conv-" denotes a convolution block, AdaIN Resblock denotes an AdaIN residual block, and the number denotes the number of output feature channels.
The specific steps of the content decoder for style migration using adaptive instance normalization (AdaIN) are as follows:
step 2-4: obtaining affine transformation parameters required by the ith AdaIN reblock module after the style characteristics pass through the first three full-connection layers
Figure BDA0003436958740000083
The specific calculation is as follows
Figure BDA0003436958740000084
Wherein, WTAnd b is the offset of the full connection layer, and the full connection layer converts the output into a vector form to realize feature transformation.
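The mapping in step 2-4 is an affine (fully connected) transform of the style feature into per-channel scale and shift parameters. A minimal NumPy sketch follows; the style dimension and channel count are illustrative assumptions, and a single layer stands in for the three fully connected layers:

```python
import numpy as np

def affine_params(z_reference, W, b):
    """Predict the AdaIN affine parameters (sigma_i, mu_i) from a style feature.

    z_reference : (d,) style feature vector
    W           : (d, 2*c) fully connected weight
    b           : (2*c,) fully connected bias
    Returns (sigma_i, mu_i), each of shape (c,): one scale and one shift
    per feature channel of the i-th AdaIN Resblock.
    """
    out = W.T @ z_reference + b   # the W^T z_reference + b transform of step 2-4
    c = out.shape[0] // 2
    return out[:c], out[c:]

rng = np.random.default_rng(0)
d, c = 8, 512                     # assumed style dim (Conv-8 output) and channel count
W = rng.normal(size=(d, 2 * c))
b = np.zeros(2 * c)
sigma_i, mu_i = affine_params(rng.normal(size=d), W, b)
```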
Step 2-5: using affine transformation parameters
Figure BDA0003436958740000085
And (3) injecting the style characteristics into an AdaIN Resblock module of a decoder model, wherein the injection method is specifically calculated as follows
Figure BDA0003436958740000091
Wherein σi(zreference) And ui(zreference) Respectively representing predicted affine transformation parameters according to style characteristics
Figure BDA0003436958740000092
x is a content characteristic Zcontent,zreferenceFor the style characteristics, μ represents the mean and σ represents the variance.
Step 3: train the natural image generation model on the training data, adjusting the parameters of the natural image generation model with the loss functions in each training round. The specific loss functions are as follows:
Reconstruction loss function:
L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1
where X_edge is the content image, Y_image is the image sample, and G is the generator network.
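The reconstruction loss is a plain L1 distance between the picture generated from the pair (X_edge, Y_image) and Y_image itself. A NumPy check with toy arrays standing in for G's output and the target; averaging the norm over pixels is an implementation choice the text leaves open:

```python
import numpy as np

def l1_reconstruction_loss(generated, target):
    """L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1, taken here as mean absolute error."""
    return np.abs(generated - target).mean()

target = np.ones((3, 4, 4))                 # toy stand-in for Y_image
generated = np.full((3, 4, 4), 0.75)        # toy stand-in for G(X_edge, Y_image)
loss = l1_reconstruction_loss(generated, target)
```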
Multi-task discrimination loss function:
L_GAN(D, G) = E_X[-log D_{c-1}(Y_{c-1})] + E_{X,Y}[log(1 - D_{c-1}(G(X_edge, Y_{c-1})))]
where G is the generator network and D is the discriminator network; E_X denotes the real data distribution, and D_{c-1}(Y_{c-1}) is the output of the discriminator when a real sample is input; E_{X,Y} denotes the generated sample distribution, and D_{c-1}(G(X_edge, Y_{c-1})) is the output of the discriminator when a generated sample is input; G(X_edge, Y_{c-1}) is a sample generated by the generator network; the subscript c-1 denotes the category.
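For a single class c-1, the multi-task discrimination loss reduces to two scalar terms computed from the discriminator's probability outputs for a real sample and a generated sample. The sketch below evaluates the formula term by term; the probability values are arbitrary illustrative inputs:

```python
import math

def multitask_d_loss(d_real, d_fake):
    """One-class term of L_GAN(D, G):
    -log D_{c-1}(Y_{c-1}) + log(1 - D_{c-1}(G(X_edge, Y_{c-1}))).

    d_real, d_fake are discriminator probability outputs in (0, 1).
    """
    return -math.log(d_real) + math.log(1.0 - d_fake)

# Illustrative values: the discriminator is fairly sure the real sample is
# real (0.8) and fairly sure the generated sample is fake (0.3).
loss = multitask_d_loss(d_real=0.8, d_fake=0.3)
```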
Knowledge distillation loss function:
L_Distill(G_S) = Σ_{i=1}^{N} ||G_T^i(x) - G_S^i(x)||
where G_T^i(x) and G_S^i(x) are the activation outputs of the i-th intermediate layers of the teacher generator network and the student generator network respectively; both networks share the generator network structure; N is the total number of selected intermediate layers, here N = 6.
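The knowledge distillation loss sums a per-layer distance between matching intermediate activations of the teacher and student generators over the N = 6 selected layers. A NumPy sketch; the choice of an elementwise L1 norm is an assumption, since the text does not specify which norm is used:

```python
import numpy as np

def distillation_loss(teacher_acts, student_acts):
    """L_Distill(G_S) = sum_i ||G_T^i(x) - G_S^i(x)|| over N intermediate layers."""
    assert len(teacher_acts) == len(student_acts)
    return sum(np.abs(t - s).sum() for t, s in zip(teacher_acts, student_acts))

rng = np.random.default_rng(0)
teacher = [rng.normal(size=(8, 4, 4)) for _ in range(6)]   # N = 6 intermediate layers
student = [t + 0.1 for t in teacher]                        # student off by 0.1 everywhere
loss = distillation_loss(teacher, student)
```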
Step 4: input a sketch from the test data set together with any natural image into the trained natural image generation model to realize natural image generation based on the hand-drawn sketch and image sample constraints; the generation result is shown in FIG. 5.
In summary, the invention combines the advantages of using a hand-drawn sketch and an image sample as input: the hand-drawn sketch provides convenience to the user and controls the generated image content, while the image sample controls styles such as the texture of the generated image.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. The natural image generation method based on the hand-drawn sketch and the image sample constraint is characterized by comprising the following steps of:
acquiring an original natural image and category information, and constructing a training data set; wherein the training data set comprises content images and image samples;
constructing a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images;
training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model;
and inputting the target image sample and the target hand-drawn sketch into the target model, and generating a natural image based on the target hand-drawn sketch and the image sample constraint.
2. The method for generating a natural image based on a hand-drawn sketch and image example constraint according to claim 1, wherein the obtaining of the original natural image and the category information and the construction of the training data set comprise:
acquiring an edge map of the original natural image through an edge map extraction algorithm;
pairing the edge map with its corresponding natural image to form a training data pair in a data set, and also pairing the edge map with a random natural image to form a training data pair in the data set;
and constructing the training data set from the two kinds of training data pairs, wherein the edge map serves as the content image and the natural image serves as the image sample.
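The pairing procedure of claim 2 can be sketched as follows; the gradient-magnitude edge extractor is only a stand-in for the patent's unspecified edge map extraction algorithm, and the helper names are illustrative:

```python
import numpy as np

def edge_map(img):
    """Toy edge extractor: thresholded gradient magnitude of a 2-D
    grayscale image (a stand-in for the real edge extraction algorithm)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > mag.mean()).astype(np.uint8)

def build_training_pairs(images, rng=None):
    """Pair each edge map with its own image and with a random image,
    yielding the two kinds of training data pairs described in claim 2."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pairs = []
    for img in images:
        e = edge_map(img)
        pairs.append((e, img))                                 # matched pair
        pairs.append((e, images[rng.integers(len(images))]))   # random pair
    return pairs
```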
3. The method for generating natural images based on sketching and image sample constraints as claimed in claim 1, wherein the constructing a natural image generation model comprises:
constructing a content encoder, wherein the content encoder comprises five convolution modules and two residual modules, and is used for extracting content characteristics in input data;
constructing a style encoder, wherein the style encoder is used for extracting style characteristics of an image sample in input data;
and constructing a content decoder, wherein the content decoder is used for acquiring affine transformation parameters according to the style characteristics and generating pictures according to the content characteristics and the affine transformation parameters.
4. The method of generating natural images based on sketching and image sample constraints as recited in claim 3,
the extraction formula of the content features is as follows:
$$Z_{content} = E_{content}(X_{edge\ or\ sketch})$$
wherein $Z_{content}$ is the content feature, $E_{content}$ is the content encoder, and $X_{edge\ or\ sketch}$ is the edge map or hand-drawn sketch serving as the content image.
5. The method for generating natural images based on sketching and image sample constraints as claimed in claim 3, wherein said method further comprises a step of performing style migration using adaptive instance normalization, said step comprising:
processing the style characteristics through three full connection layers to obtain affine transformation parameters;
inputting the style characteristics into an AdaIN Resblock module of the content decoder according to the affine transformation parameters, and performing style migration by using self-adaptive example normalization through the AdaIN Resblock module;
wherein the calculation expression of the adaptive instance normalization is as follows:
$$\mathrm{AdaIN}(z_{content}, z_{reference}) = \sigma(z_{reference})\,\frac{z_{content} - \mu(z_{content})}{\sigma(z_{content})} + \mu(z_{reference})$$
wherein $\mathrm{AdaIN}(z_{content}, z_{reference})$ represents the result of the adaptive instance normalization, $z_{content}$ represents the content feature, $z_{reference}$ represents the style feature, $\mu(\cdot)$ denotes the mean, and $\sigma(\cdot)$ denotes the standard deviation.
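Adaptive instance normalization as used in claim 5 can be sketched in NumPy; the (C, H, W) layout and the per-channel spatial statistics are conventional assumptions rather than details stated in the patent:

```python
import numpy as np

def adain(z_content, z_reference, eps=1e-5):
    """AdaIN(z_c, z_r) = sigma(z_r) * (z_c - mu(z_c)) / sigma(z_c) + mu(z_r).

    Statistics are taken per channel over the spatial axes; inputs are
    (C, H, W) feature maps, and eps keeps the division stable.
    """
    mu_c = z_content.mean(axis=(1, 2), keepdims=True)
    sigma_c = z_content.std(axis=(1, 2), keepdims=True) + eps
    mu_r = z_reference.mean(axis=(1, 2), keepdims=True)
    sigma_r = z_reference.std(axis=(1, 2), keepdims=True) + eps
    return sigma_r * (z_content - mu_c) / sigma_c + mu_r
```

The content feature keeps its spatial structure while adopting the style feature's channel-wise mean and standard deviation, which is how the AdaIN Resblock performs style migration.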
6. The method according to claim 1, wherein, in the step of training the natural image generation model on the training data set and adjusting parameters of the natural image generation model to obtain the trained target model,
the reconstruction loss function of the training process is:
$$L_{rec}(G) = \left\| G(X_{edge}, Y_{image}) - Y_{image} \right\|_1$$
wherein $L_{rec}(G)$ represents the reconstruction loss; $X_{edge}$ represents the edge map in a training data pair; $Y_{image}$ represents the natural image in the training data pair; $G(\cdot)$ represents the generator model, which during training takes an edge map and a natural image as input and generates a target picture;
the multi-tasking discriminant loss function is:
$$L_{GAN}(D, G) = \mathbb{E}_{X}\left[-\log D_{c-1}(Y_{c-1})\right] + \mathbb{E}_{X,Y}\left[\log\left(1 - D_{c-1}(G(X_{edge}, Y_{c-1}))\right)\right]$$
wherein $L_{GAN}(D, G)$ represents the multi-task discrimination loss, $G$ represents the generator network, and $D$ represents the discriminator network; $\mathbb{E}_X$ represents the expectation over the real data distribution, and $D_{c-1}(Y_{c-1})$ represents the output of the discriminator when a real sample is input; $\mathbb{E}_{X,Y}$ represents the expectation over the generated sample distribution, and $D_{c-1}(G(X_{edge}, Y_{c-1}))$ represents the output of the discriminator when a generated sample is input; $G(X_{edge}, Y_{c-1})$ represents a sample generated by the generator network; the subscript $c-1$ indicates the category.
The knowledge distillation loss function is:
$$L_{Distill}(G_S) = \sum_{i=1}^{N} \left\| a_T^{(i)} - a_S^{(i)} \right\|_1$$
wherein $L_{Distill}(G_S)$ represents the knowledge distillation loss; $N$ represents the number of selected intermediate layers of the generator network; $a_T^{(i)}$ represents the activation output of the $i$-th intermediate layer of the teacher generator network; $a_S^{(i)}$ represents the activation output of the $i$-th intermediate layer of the student generator network.
7. A natural image generating apparatus based on hand-drawn sketches and image sample constraints, comprising:
a first module, configured to acquire an original natural image and category information and to construct a training data set, wherein the training data set comprises content images and image samples;
a second module, configured to construct a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator being used for generating natural images in the training process, and the multi-task discriminator being used for judging whether a generated natural image is real and for judging the category of the generated natural image;
a third module, configured to train the natural image generation model through the training data set, adjusting parameters of the natural image generation model to obtain a trained target model;
and a fourth module, configured to input a target image sample and a target hand-drawn sketch into the target model and to generate a natural image based on the target hand-drawn sketch and the image sample constraint.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 6 when executed by a processor.
CN202111617371.0A 2021-12-27 2021-12-27 Natural image generation method and device based on hand-drawn sketch and image sample constraint Pending CN114399668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111617371.0A CN114399668A (en) 2021-12-27 2021-12-27 Natural image generation method and device based on hand-drawn sketch and image sample constraint


Publications (1)

Publication Number Publication Date
CN114399668A true CN114399668A (en) 2022-04-26

Family

ID=81228403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111617371.0A Pending CN114399668A (en) 2021-12-27 2021-12-27 Natural image generation method and device based on hand-drawn sketch and image sample constraint

Country Status (1)

Country Link
CN (1) CN114399668A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115358917A (en) * 2022-07-14 2022-11-18 北京汉仪创新科技股份有限公司 Method, device, medium and system for transferring non-aligned faces in hand-drawing style
CN115358917B (en) * 2022-07-14 2024-05-07 北京汉仪创新科技股份有限公司 Method, equipment, medium and system for migrating non-aligned faces of hand-painted styles
CN115496824A (en) * 2022-09-27 2022-12-20 北京航空航天大学 Multi-class object-level natural image generation method based on hand drawing
CN115496824B (en) * 2022-09-27 2023-08-18 北京航空航天大学 Multi-class object-level natural image generation method based on hand drawing
CN116542321A (en) * 2023-07-06 2023-08-04 中科南京人工智能创新研究院 Image generation model compression and acceleration method and system based on diffusion model
CN116542321B (en) * 2023-07-06 2023-09-01 中科南京人工智能创新研究院 Image generation model compression and acceleration method and system based on diffusion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination