CN111767774A - Target image generation method and device and computer-readable storage medium

Target image generation method and device and computer-readable storage medium

Info

Publication number
CN111767774A
Authority
CN
China
Prior art keywords
image
feature map
additional condition
target image
similarity
Prior art date
Legal status
Pending
Application number
CN201911226438.0A
Other languages
Chinese (zh)
Inventor
林嘉
刘偲
高晨
朱德发
包仁达
李波
翁志
纪鸿焘
陈宇
Current Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911226438.0A
Publication of CN111767774A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T3/04
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178 Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Abstract

The disclosure relates to a target image generation method and device and a computer-readable storage medium, and relates to the field of computer technology. The method comprises the following steps: determining a first feature map and an attention feature map of an image to be processed by using a generative model according to a residual learning method; determining a second feature map of the image to be processed by using the generative model according to a convolution processing method; determining a third feature map by using the generative model according to the second feature map and the attention feature map; and generating a first target image which meets a first additional condition by using the generative model according to the third feature map, the first feature map and feature information of the first additional condition.

Description

Target image generation method and device and computer-readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a target image, and a computer-readable storage medium.
Background
With the development of computer vision technology, a target image that meets an additional condition can be generated from an existing image according to business requirements. For example, the additional condition may be changing a person in the existing image to a specified race, gender or age; it may also be changing the scene in the existing image to a specified period, season or weather.
In recent years, the face aging generation task has attracted increasing attention. This task has a very wide range of application scenarios. For example, in the film and television industry, when the same character must be portrayed at different ages, a face aging algorithm can greatly improve production efficiency and reduce production costs; in the field of identity verification, a face aging algorithm can also be used for cross-age face verification, which helps, for example, to locate children who have been missing for many years.
In the related art, variations in the additional condition may be simulated using sample sets that satisfy different additional conditions. For example, age variation patterns are simulated from the differences between different age groups in order to generate an aged image.
Disclosure of Invention
The inventors of the present disclosure found that the related art described above has the following problem: the image is processed indiscriminately, so the generated target image is of poor quality and cannot satisfy the additional condition well.
In view of this, the present disclosure provides a technical solution for generating a target image, which can improve the effect of the target image and better satisfy additional conditions.
According to some embodiments of the present disclosure, there is provided a target image generation method including: determining a first feature map and an attention feature map of an image to be processed by using a generative model according to a residual learning method; determining a second feature map of the image to be processed by using the generative model according to a convolution processing method; determining a third feature map by using the generative model according to the second feature map and the attention feature map; and generating a first target image which meets a first additional condition by using the generative model according to the third feature map, the first feature map and feature information of the first additional condition.
In some embodiments, determining the first feature map and the attention feature map of the image to be processed using the generative model comprises: determining the first feature map and the attention feature map according to the first additional condition. Determining the second feature map of the image to be processed using the generative model comprises: determining the second feature map according to the first additional condition.
In some embodiments, determining the first feature map and the attention feature map according to the first additional condition comprises: adding the first additional condition to the residual learning method by means of conditional instance normalization. Determining the second feature map according to the first additional condition comprises: adding the first additional condition to the convolution processing method by means of conditional instance normalization.
In some embodiments, determining the second feature map of the image to be processed using the generative model comprises: performing convolution processing on each image area of the image to be processed respectively to determine the second feature map.
In some embodiments, determining the second feature map of the image to be processed using the generative model comprises: adding the feature information of the first additional condition to each image area of the image to be processed respectively by means of conditional instance normalization; and performing convolution processing on each image area to which the feature information of the first additional condition has been added, to determine the second feature map.
In some embodiments, the feature information of the first additional condition is a feature map of the first additional condition; generating a first target image that meets the first additional condition using the generative model includes: fusing the third feature map, the first feature map and the feature map of the first additional condition into data to be processed; and generating a first target image according to the data to be processed.
In some embodiments, the generative model is trained using a loss function determined by: generating a second target image meeting a second additional condition according to the image to be processed by using the generation model according to the second additional condition; according to the third additional condition, generating a third target image meeting the third additional condition according to the image to be processed by using the generation model; according to the third additional condition, generating a fourth target image meeting the third additional condition according to the second target image by using the generation model; and determining a loss function according to the difference between the fourth target image and the third target image and the difference between the image to be processed and the first target image.
In some embodiments, the generative model is trained with an objective function until the objective function reaches a maximum value. The objective function takes the first target image as a variable, is positively correlated with the target similarity, is negatively correlated with the sum of each real similarity, and is negatively correlated with the sum of each generated similarity.
In some embodiments, the target similarity is a similarity of the first target image and the real image satisfying the first additional condition. The true similarity is a similarity between the first target image and a true image satisfying other additional conditions. The generated similarity is a similarity between the first target image and the generated image satisfying the first additional condition and the other additional conditions. The generated image is an image generated by the generation model according to the additional condition.
In some embodiments, the target similarity is a similarity of the feature information of the first real data distribution and the feature information of the first target image. The first real data distribution is a data distribution of a real image that satisfies the first additional condition.
In some embodiments, the true similarity is a similarity between feature information of other true data distributions and feature information of the first target image. The other real data distribution is a data distribution of the real image that satisfies other additional conditions.
In some embodiments, the generation similarity is a similarity of the feature information of the generation data distribution and the feature information of the first target image. The generated data distribution is a data distribution of the generated image that satisfies the first additional condition and the other additional conditions.
In some embodiments, the generative model is trained according to the result of a discriminative model discriminating between the first target image and a real image satisfying the first additional condition.
According to still further embodiments of the present disclosure, there is provided a target image generating apparatus including: the residual error learning unit is used for determining a first feature map and an attention feature map of the image to be processed by using the generation model according to a residual error learning method; the convolution processing unit is used for determining a second feature map of the image to be processed by using the generation model according to a convolution processing method; a determination unit, configured to determine a third feature map by using the generative model according to the second feature map and the attention feature map; and a generating unit, which is used for generating a first target image which accords with the first additional condition by using the generating model according to the third feature map, the first feature map and the feature information of the first additional condition.
In some embodiments, the residual learning unit determines a first feature map and an attention feature map according to a first additional condition; the convolution processing unit determines a second feature map based on the first additional condition.
In some embodiments, the residual learning unit adds the first additional condition to the residual learning method by means of conditional instance normalization; the convolution processing unit adds the first additional condition to the convolution processing method by means of conditional instance normalization.
In some embodiments, the convolution processing unit performs convolution processing on each image area of the image to be processed respectively to determine the second feature map.
In some embodiments, the convolution processing unit adds the feature information of the first additional condition to each image area of the image to be processed respectively by means of conditional instance normalization, and performs convolution processing on each image area to which the feature information of the first additional condition has been added, to determine the second feature map.
In some embodiments, the feature information of the first additional condition is a feature map of the first additional condition; the generating unit fuses the third feature map, the first feature map and the feature map of the first additional condition into data to be processed; and generating a first target image according to the data to be processed.
In some embodiments, the generative model is trained using a loss function determined by: generating a second target image meeting a second additional condition according to the image to be processed by using the generation model according to the second additional condition; according to the third additional condition, generating a third target image meeting the third additional condition according to the image to be processed by using the generation model; according to the third additional condition, generating a fourth target image meeting the third additional condition according to the second target image by using the generation model; and determining a loss function according to the difference between the fourth target image and the third target image and the difference between the image to be processed and the first target image.
In some embodiments, the generative model is trained with an objective function until the objective function reaches a maximum value. The objective function takes the first target image as a variable, is positively correlated with the target similarity, is negatively correlated with the sum of each real similarity, and is negatively correlated with the sum of each generated similarity.
In some embodiments, the target similarity is a similarity between the first target image and a real image satisfying the first additional condition, the real similarity is a similarity between the first target image and a real image satisfying the other additional condition, the generated similarity is a similarity between the first target image and a generated image satisfying the first additional condition and the other additional condition, and the generated image is an image generated by the generation model according to the additional condition.
In some embodiments, the target similarity is a similarity of the feature information of the first real data distribution and the feature information of the first target image. The first real data distribution is a data distribution of a real image that satisfies the first additional condition.
In some embodiments, the true similarity is a similarity between feature information of other true data distributions and feature information of the first target image. The other real data distribution is a data distribution of the real image that satisfies other additional conditions.
In some embodiments, the generated similarity is a similarity of feature information of the generated data distribution and feature information of the first target image, and the generated data distribution is a data distribution of the generated image satisfying the first additional condition and other additional conditions.
According to still further embodiments of the present disclosure, there is provided a target image generation apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the method of generating a target image in any of the above embodiments based on instructions stored in the memory device.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the generation method of the target image in any of the above embodiments.
In the above embodiment, the convolution processing result is weighted by using the attention feature map output by the residual learning method, so that the generation method can focus on the key region in the image to be processed. Thus, the target image which can better meet the additional condition can be generated, and the effect of the target image is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of a method of generating a target image of the present disclosure;
FIG. 2 illustrates a schematic diagram of some embodiments of a method of generating a target image of the present disclosure;
FIG. 3 shows a schematic diagram of further embodiments of a method of generating a target image of the present disclosure;
FIG. 4 illustrates a block diagram of some embodiments of a target image generation apparatus of the present disclosure;
FIG. 5 shows a block diagram of further embodiments of an apparatus for generating a target image of the present disclosure;
FIG. 6 illustrates a block diagram of still further embodiments of an apparatus for generating a target image of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 illustrates a flow diagram of some embodiments of a method of generating a target image of the present disclosure.
As shown in fig. 1, the method includes: step 110, residual error processing is carried out; step 120, performing convolution processing; step 130, determining a third feature map; and step 140, generating a first target image.
In step 110, a first feature map and an attention feature map of the image to be processed are determined using the generative model according to a residual learning method. For example, the image to be processed is a portrait image, and the user needs to perform an aging process on the image to be processed.
In step 120, a second feature map of the image to be processed is determined using the generative model according to a convolution processing method.
In some embodiments, the first feature map and the attention feature map are determined according to the first additional condition, and the second feature map is determined according to the first additional condition. For example, the first additional condition may be added to the residual learning method by means of CIN (Conditional Instance Normalization), and added to the convolution processing method by means of CIN as well.
In some embodiments, convolution processing is performed on each image area of the image to be processed separately to determine the second feature map. For example, the feature information of the first additional condition is added to each image area of the image to be processed by means of conditional instance normalization, and convolution processing is then performed on each image area to which the feature information has been added, to determine the second feature map.
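To illustrate how conditional instance normalization (CIN) can embed an additional condition into feature maps, a minimal PyTorch sketch is given below. The condition (for example, an age-class index) selects per-class scale and shift parameters that re-modulate instance-normalized features. The module name, parameter shapes and initialization are illustrative assumptions, not the patent's exact layer.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    """Instance normalization whose affine parameters are selected by a condition label.

    A generic CIN sketch (assumed structure), not the patent's exact implementation.
    """
    def __init__(self, num_features: int, num_conditions: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # One (gamma, beta) pair per condition class, e.g. per age group.
        self.embed = nn.Embedding(num_conditions, num_features * 2)
        self.embed.weight.data[:, :num_features].fill_(1.0)  # gamma init
        self.embed.weight.data[:, num_features:].zero_()     # beta init

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); condition: (N,) integer class indices
        gamma, beta = self.embed(condition).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * self.norm(x) + beta

# Usage sketch: modulate a feature map with a hypothetical age-group index.
feat = torch.randn(2, 64, 32, 32)
age_class = torch.tensor([0, 3])
cin = ConditionalInstanceNorm2d(64, num_conditions=5)
out = cin(feat, age_class)  # same shape as feat
```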
In step 130, a third feature map is determined using the generative model based on the second feature map and the attention feature map.
In step 140, a first target image conforming to the first additional condition is generated using the generative model based on the third feature map, the first feature map, and the feature information of the first additional condition. For example, the first additional condition is specified age information for aging or rejuvenation.
In some embodiments, the feature information of the first additional condition is a feature map of the first additional condition. Fusing the third feature map, the first feature map and the feature map of the first additional condition into data to be processed; and generating a first target image according to the data to be processed.
In some embodiments, the generative model may be utilized to generate the target image by the embodiment in FIG. 2.
Fig. 2 illustrates a schematic diagram of some embodiments of a method of generating a target image of the present disclosure.
As shown in fig. 2, the generator (generative model) may include a deep generation network, a sampler group, and a fusion generation network. For example, the generator takes a face picture as the input image and generates an aged or rejuvenated image as the output image.
In some embodiments, the deep generation network is built from a series of residual modules. For example, the residual modules may include a down-sampling residual module, residual processing modules and an up-sampling residual module. The down-sampling residual module performs down-sampling and residual processing; the residual processing modules comprise several layers and perform residual processing; the up-sampling residual module performs up-sampling and residual processing. The deep generation network pairs the up-sampling residual module with the down-sampling residual module so that the output feature size is consistent with that of the input.
For example, the input image of the deep generation network may be the original face picture, and its outputs are deep features used to generate the face and an attention feature map used to guide the sampler group.
For example, age condition information (specified age information) may be embedded in each residual module of the deep generation network by means of CIN. In this way, the age condition information can be better encoded into the deep features, so that facial features of the specified age can be generated.
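As an illustration of this structure, the sketch below builds a conditional residual block around a CIN layer (the ConditionalInstanceNorm2d sketch above) and stacks down-sampling, residual-processing and up-sampling blocks so that the output feature size matches the input, with a small attention head producing the attention feature map. All module names, channel sizes and layer counts are assumptions for illustration, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondResBlock(nn.Module):
    """Residual block with conditional instance normalization (assumed design)."""
    def __init__(self, in_ch, out_ch, num_conditions, scale=None):
        super().__init__()
        self.scale = scale  # "down", "up", or None
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.cin1 = ConditionalInstanceNorm2d(out_ch, num_conditions)
        self.cin2 = ConditionalInstanceNorm2d(out_ch, num_conditions)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x, cond):
        if self.scale == "down":
            x = F.avg_pool2d(x, 2)
        elif self.scale == "up":
            x = F.interpolate(x, scale_factor=2, mode="nearest")
        h = F.relu(self.cin1(self.conv1(x), cond))
        h = self.cin2(self.conv2(h), cond)
        return F.relu(h + self.skip(x))

class DeepGenerationNetwork(nn.Module):
    """Down-sampling, residual processing and up-sampling blocks, plus an attention head."""
    def __init__(self, num_conditions, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 7, padding=3)
        self.down = CondResBlock(ch, ch * 2, num_conditions, scale="down")
        self.mid = nn.ModuleList(
            [CondResBlock(ch * 2, ch * 2, num_conditions) for _ in range(4)])
        self.up = CondResBlock(ch * 2, ch, num_conditions, scale="up")
        self.attn_head = nn.Conv2d(ch, 1, 1)  # attention feature map (assumed single channel)

    def forward(self, img, cond):
        h = self.stem(img)
        h = self.down(h, cond)
        for blk in self.mid:
            h = blk(h, cond)
        deep_feat = self.up(h, cond)                     # same spatial size as the input
        attn = torch.sigmoid(self.attn_head(deep_feat))  # weights in (0, 1)
        return deep_feat, attn
```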
In some embodiments, the sampler group may adopt a multi-branch parallel structure, where each branch comprises a CIN module for embedding the age condition information and an intermediate convolution module for extracting shallow features. After the input image has been processed by a pre-convolution module, the result can be fed into each branch so that different areas of the input image are processed.
For example, each intermediate convolution module may be a shallow network composed of dilated (atrous) convolutions with different dilation rates. Each intermediate convolution module captures pixel information at different positions in the input picture.
For example, the input of the sampler group is the original face picture and the output is a series of shallow features used to generate the face. The shallow features output by each branch are directed at different areas of the face.
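A minimal sketch of such a multi-branch sampler group, assuming three branches whose dilated convolutions use different dilation rates and reusing the CIN layer sketched earlier, is shown below; the branch count and channel sizes are illustrative assumptions.

```python
import torch.nn as nn

class SamplerBranch(nn.Module):
    """One branch: CIN conditioning followed by dilated convolutions (assumed layout)."""
    def __init__(self, ch, num_conditions, dilation):
        super().__init__()
        self.cin = ConditionalInstanceNorm2d(ch, num_conditions)
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x, cond):
        return self.conv(self.cin(x, cond))

class SamplerGroup(nn.Module):
    """Pre-convolution followed by parallel branches with different dilation rates."""
    def __init__(self, num_conditions, ch=64, dilations=(1, 2, 4)):
        super().__init__()
        self.pre = nn.Conv2d(3, ch, 7, padding=3)
        self.branches = nn.ModuleList(
            [SamplerBranch(ch, num_conditions, d) for d in dilations])

    def forward(self, img, cond):
        h = self.pre(img)
        # Each branch captures pixel information at a different receptive-field scale.
        return [branch(h, cond) for branch in self.branches]
```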
In some embodiments, the attention feature map output by the deep generation network may be used to weight the shallow features output by each branch, obtaining weighted shallow features, so that specific regions can be focused on.
For example, when changing a middle-aged man with a mustache into a child, generating the image area around the mouth may depend more on face information from distant regions. In such cases, the attention feature map can improve the effect of aging or rejuvenation across a large age span.
In some embodiments, the fusion generation network obtains the data to be processed from the deep features, the weighted shallow features and the feature map of the age condition information, and feeds the data to be processed into subsequent convolutional layers to obtain an aged or rejuvenated picture of the specified age. For example, the network may concatenate the feature maps of the deep features, the weighted shallow features and the age condition information to obtain the data to be processed.
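The sketch below shows one way the attention weighting and fusion could be wired together: shallow features from each branch are weighted by the attention map, concatenated with the deep features and a feature map of the age condition, and passed through final convolutions. This composition is an assumption drawn from the description above, not a definitive implementation of the fusion generation network.

```python
import torch
import torch.nn as nn

class FusionGenerationNetwork(nn.Module):
    """Concatenate deep features, attention-weighted shallow features and a condition map."""
    def __init__(self, num_conditions, ch=64, num_branches=3, cond_ch=8):
        super().__init__()
        self.cond_embed = nn.Embedding(num_conditions, cond_ch)
        in_ch = ch + num_branches * ch + cond_ch
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, deep_feat, shallow_feats, attn, cond):
        n, _, h, w = deep_feat.shape
        weighted = [attn * s for s in shallow_feats]  # attention weighting of shallow features
        cond_map = self.cond_embed(cond).view(n, -1, 1, 1).expand(n, -1, h, w)
        fused = torch.cat([deep_feat, *weighted, cond_map], dim=1)
        return self.head(fused)

# End-to-end usage sketch (shapes assumed consistent at the input resolution):
# deep_feat, attn = DeepGenerationNetwork(5)(img, age_class)
# shallow_feats = SamplerGroup(5)(img, age_class)
# out = FusionGenerationNetwork(5)(deep_feat, shallow_feats, attn, age_class)
```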
In some embodiments, the generative model is trained using a loss function determined by: generating a second target image meeting a second additional condition according to the image to be processed by using the generation model according to the second additional condition; according to the third additional condition, generating a third target image meeting the third additional condition according to the image to be processed by using the generation model; according to the third additional condition, generating a fourth target image meeting the third additional condition according to the second target image by using the generation model; and determining a loss function according to the difference between the fourth target image and the third target image and the difference between the image to be processed and the first target image.
In some embodiments, for face aging or rejuvenation tasks, training the generator not only constrains it to generate pictures that match the target data distribution and satisfy the input condition, but also constrains identity preservation; that is, training the generator also constrains the input face and the output face to be recognizable as the same person.
For example, the generator needs to generate, from an input image x, an output image G(x, y_1) that satisfies an age condition y_1. In this case, the loss function L can be determined as follows:

L = \mathbb{E}_{(x, y_1) \sim q(x, y)}\left[\lVert G(x, y_1) - x \rVert_1\right] + \mathbb{E}_{x \sim q(x),\, y_2 \sim q(y),\, y_3 \sim q(y)}\left[\lVert G(G(x, y_2), y_3) - G(x, y_3) \rVert_1\right]

where E(·) is the mathematical expectation under the indicated probability distributions, q(X, Y) is the real joint data distribution of real pictures X and age conditions Y, q(X) is the real data distribution of real pictures X, q(Y) is the distribution of age conditions Y, and ‖·‖_1 is the 1-norm.
y_2 and y_3 are two age conditions distinct from y_1. The generator generates from the input image x an output image G(x, y_3) that satisfies the age condition y_3; it also generates an image G(x, y_2) that satisfies the age condition y_2 and then generates, from the image G(x, y_2), an output image G(G(x, y_2), y_3) that satisfies the age condition y_3.
In the above formula, the first term is used to evaluate whether the generator can maintain the consistency (e.g., identity consistency) of the input and output in the case of reconstruction based on the input image; the second term is used to evaluate whether the generator can maintain consistency of direct and indirect processing in the case of performing direct and indirect processing based on the input image.
In this way, the consistency (e.g., identity consistency) of the input and output images can be constrained across a whole sequence of additional conditions (e.g., age conditions). Moreover, the loss function achieves this without requiring an additional network, which makes it flexible and convenient.
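A sketch of this loss, under the reconstruction given above (y1 taken as the true age condition of x, y2 and y3 sampled as other age conditions), might look as follows; the generator interface is assumed to match the earlier sketches.

```python
import torch
import torch.nn.functional as F

def identity_consistency_loss(generator, x, y1, y2, y3):
    """L = ||G(x, y1) - x||_1 + ||G(G(x, y2), y3) - G(x, y3)||_1  (batch mean).

    y1 is assumed to be the true age condition of x (reconstruction term);
    y2 and y3 are other sampled age conditions (direct vs. indirect consistency term).
    """
    recon = F.l1_loss(generator(x, y1), x)
    direct = generator(x, y3)
    indirect = generator(generator(x, y2), y3)
    consistency = F.l1_loss(indirect, direct)
    return recon + consistency
```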
In some embodiments, the generator is not only used to generate the target image; adversarial training may also be performed on the generator using an objective function. For example, the generative model is trained using the objective function until the objective function reaches its maximum value. The objective function takes the first target image as a variable, is positively correlated with the target similarity, negatively correlated with the sum of the real similarities, and negatively correlated with the sum of the generated similarities.
In some embodiments, the target similarity is a similarity of the first target image and the real image satisfying the first additional condition. For example, the target similarity is a similarity between the feature information of the first real data distribution and the feature information of the first target image, and the first real data distribution is a data distribution of the real image that satisfies the first additional condition.
In some embodiments, the true similarity is a similarity of the first target image and a true image satisfying other additional conditions. For example, the true similarity is a similarity between feature information of other true data distributions and feature information of the first target image. The other real data distribution is a data distribution of the real image that satisfies other additional conditions.
In some embodiments, the generated similarity is a similarity of the first target image and a generated image satisfying the first additional condition and the other additional conditions, and the generated image is an image generated by the generation model according to the additional conditions. For example, the generated similarity is a similarity between the feature information of the generated data distribution and the feature information of the first target image, and the generated data distribution is a data distribution of the generated image satisfying the first additional condition and the other additional conditions.
In some embodiments, the generative model is trained according to the result of a discriminative model discriminating between the first target image and a real image satisfying the first additional condition.
In some embodiments, the generator may be countertrained using the discriminators and cross classifiers of FIG. 3.
FIG. 3 shows schematic diagrams of further embodiments of methods of generating a target image of the present disclosure.
As shown in fig. 3, after the input picture is processed by the generator, an output picture satisfying the age condition is generated; the discriminator distinguishes the output picture from real pictures satisfying the corresponding age condition; and adversarial training is performed on the generator by combining the discrimination result of the discriminator with the objective function of the cross classifier.
In some embodiments, on the face aging task, since the age condition of the generated picture needs to be specified, this can be implemented with a CGAN (Conditional Generative Adversarial Network).
A CGAN is characterized in that both the generator and the discriminator receive additional condition information, so that the data generated by the generator not only matches the distribution of the real data but also satisfies the condition information (e.g., the age condition). The discriminator therefore needs to discriminate the joint distribution of the real data and the condition information. That is, the discriminator not only discriminates between generated pictures and real pictures, but also judges whether the picture generated by the generator satisfies the specified age condition.
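As an illustration of conditioning the discriminator on the age label, a minimal projection-style conditional discriminator is sketched below; the projection formulation is one common CGAN variant chosen for brevity and is an assumption, not the patent's discriminator.

```python
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    """Projection-style conditional discriminator (one common CGAN formulation)."""
    def __init__(self, num_conditions, ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(ch * 4, 1)                    # unconditional realism score
        self.cond_embed = nn.Embedding(num_conditions, ch * 4)   # condition projection

    def forward(self, img, cond):
        f = self.features(img)
        # Score depends on both realism and agreement with the age condition.
        return self.real_fake(f) + (self.cond_embed(cond) * f).sum(dim=1, keepdim=True)
```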
In some embodiments, when the generator is to generate an output image G(x, y=k) from the input image x, the modeling equation of the generator may be a log-linear model in the feature space:

p_\Phi(y = k \mid G(x, y=k)) \propto \exp\!\left(\bar{v}_k^{g\,T} f_\Phi(G(x, y=k))\right)

where f_\Phi(G(x, y=k)) is the feature vector of the image G(x, y=k), generated by the generator from the input image x and satisfying the condition information (e.g., age condition) y=k; K is the total number of classes of the condition information; and p_\Phi(y=k | G(x, y=k)) is a conditional probability. The k-th class of generated data (generated pictures satisfying y=k) all follow the same generated data distribution, i.e., they lie on the same classification hyperplane; \bar{v}_k^{g} is the normal vector of the hyperplane on which the k-th class of generated data is classified, and T denotes transposition. The larger the value of \bar{v}_k^{g\,T} f_\Phi(G(x, y=k)), the closer the generated picture G(x, y=k) is to the generator's k-th class generated data distribution.
When this modeling equation reaches its maximum value, the generator completes the generation of G(x, y=k).
On the basis of the modeling equation, the k-th class of real data (real pictures satisfying y=k) all follow the same real data distribution, i.e., they lie on the same classification hyperplane; \bar{v}_k^{r} denotes the normal vector of the hyperplane on which the k-th class of real data is classified.
In some embodiments, the cross classifier of the present disclosure enables f_\Phi(G(x, y=k)) to fit the feature vectors of the k-th class of real data as closely as possible, while staying far from the feature vectors of the other classes of real data and from the feature vectors of the generated data of other classes.
In some embodiments, \bar{v}_k^{r} is selected as the fitting direction of the generator, i.e., f_\Phi(G(x, y=k)) is fitted toward the direction of \bar{v}_k^{r}. Using \bar{v}_k^{r} as a gradient and fitting directly toward its direction improves training and learning efficiency.
For example, the objective function used to train the generator may be configured in the cross classifier with push terms that force f_\Phi(G(x, y=k)) away from the normal vectors of the other classification hyperplanes, and a pull term that forces f_\Phi(G(x, y=k)) close to \bar{v}_k^{r}.
With \bar{v}_k^{r} selected as the fitting direction of the generator, when the generator maximizes the modeling equation it is constrained so that the generated data G follows P(G | y=k) while staying far from the classification hyperplanes \bar{v}_i^{r} (i ≠ k) of the other classes of real data.
In some embodiments, the age categories are mutually exclusive on the face aging task, so the classification hyperplanes of the other categories' data can be used as push terms to improve the performance of the generator; this is the cross classifier. For example, each category is divided into two sub-categories, real data and generated data, giving 2 x K categories in the cross classifier. The cross classifier can then be constructed as a model with 1 pull term and 2K-1 push terms. Therefore, when training the generator, which is required to make the generated data classified into the corresponding class, the objective function for training the generator is as follows:

J(G) = \bar{v}_k^{r\,T} f_\Phi(G(x, y=k)) - \sum_{i \neq k} \bar{v}_i^{r\,T} f_\Phi(G(x, y=k)) - \sum_{i=1}^{K} \bar{v}_i^{g\,T} f_\Phi(G(x, y=k))

where \bar{v}_i^{D} is the normal vector of the classification hyperplane of the i-th class under distribution D, and D stands for the real data distribution q(X, Y=i) or the generated data distribution p(X, Y=i) of pictures X satisfying the age condition Y=i.
The larger \bar{v}_k^{r\,T} f_\Phi(G(x, y=k)) is, the closer G(x, y=k) is to the data distribution of the k-th class of real data; the larger \bar{v}_i^{D\,T} f_\Phi(G(x, y=k)) is (e.g., for i ≠ k, or for generated data), the closer G(x, y=k) is to the data distribution of the i-th class of real data or the i-th class of generated data.
Thus, by training the generator with the above objective function, f_\Phi(G(x, y=k)) not only fits the target class of real data \bar{v}_k^{r}, but is also constrained away from the other classes in the real data distribution q(X, Y) and from all classes in the generated data distribution p(X, Y). The cross classifier expands the number of push terms, thereby using more classification hyperplanes of mutually exclusive categories to compress the search space and improve the performance of the model.
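A rough sketch of this generator objective, under the reconstruction above (one pull term toward the real hyperplane of the target class and 2K-1 push terms away from the remaining real and generated hyperplanes), might look as follows; the feature vectors and hyperplane normal vectors are assumed to come from the discrimination network / cross classifier.

```python
import torch

def cross_classifier_generator_objective(feat, v_real, v_gen, k):
    """J = v_real[k]·feat - sum_{i != k} v_real[i]·feat - sum_i v_gen[i]·feat.

    feat:   (N, D) features f_Phi(G(x, y=k)) of generated images
    v_real: (K, D) normal vectors of the real-data classification hyperplanes
    v_gen:  (K, D) normal vectors of the generated-data classification hyperplanes
    k:      target condition class index
    (Reconstruction of the objective described above; maximized w.r.t. the generator.)
    """
    pull = feat @ v_real[k]                              # (N,) pull toward target real class
    push_real = feat @ v_real.T                          # (N, K)
    push_real = push_real.sum(dim=1) - push_real[:, k]   # exclude the target class
    push_gen = (feat @ v_gen.T).sum(dim=1)               # all generated classes are push terms
    return (pull - push_real - push_gen).mean()
```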
In the above embodiments, the deep generation network, with age information embedded throughout the processing pipeline, makes the generated deep features contain as much condition information as possible to guide picture generation.
In the above embodiments, the shallow features generated by the sampler group complement the deep features and are weighted by the attention feature map from the deep generation network. The difficult problem of changes across a large age span (i.e., large changes in the additional condition) is handled well by the multi-branch dilated-convolution sampler.
In the above embodiments, the discrimination network based on the cross classifier increases, under the guidance of mathematical theory, the number of push terms used for network convergence. More classification hyperplanes of mutually exclusive categories can therefore be used effectively to compress the search space, improving the performance of the model.
In the above embodiment, the loss function constrains identity consistency across the entire age series (additional condition series), and no additional network is required, which is very flexible and convenient.
In the above embodiment, the convolution processing result is weighted by using the attention feature map output by the residual learning method, so that the generation method can focus on the key region in the image to be processed. Thus, the target image which can better meet the additional condition can be generated, and the effect of the target image is improved.
Fig. 4 illustrates a block diagram of some embodiments of a generation apparatus of a target image of the present disclosure.
As shown in fig. 4, the target image generation device 4 includes a residual learning unit 41, a convolution processing unit 42, a determination unit 43, and a generation unit 44.
The residual learning unit 41 determines the first feature map and the attention feature map of the image to be processed using the generation model according to the residual learning method.
In some embodiments, the residual learning unit 41 determines the first feature map and the attention feature map according to a first additional condition; the convolution processing unit determines a second feature map based on the first additional condition.
In some embodiments, the residual learning unit 41 adds the first additional condition to the residual learning method by means of conditional instance normalization; the convolution processing unit adds the first additional condition to the convolution processing method by means of conditional instance normalization.
The convolution processing unit 42 determines the second feature map of the image to be processed using the generation model according to the convolution processing method.
In some embodiments, the convolution processing unit 42 performs convolution processing on each image area of the image to be processed, respectively, to determine the second feature map.
In some embodiments, the convolution processing unit 42 adds the feature information of the first additional condition to each image area of the image to be processed respectively by means of conditional instance normalization, and performs convolution processing on each image area to which the feature information of the first additional condition has been added, to determine the second feature map.
The determination unit 43 determines the third feature map using the generative model from the second feature map and the attention feature map.
The generation unit 44 generates a first target image that meets the first additional condition using the generation model based on the third feature map, the first feature map, and the feature information of the first additional condition.
In some embodiments, the feature information of the first additional condition is a feature map of the first additional condition; the generating unit 44 fuses the third feature map, the first feature map and the feature map of the first additional condition into the data to be processed; and generating a first target image according to the data to be processed.
In some embodiments, the generative model is trained using a loss function determined by: generating a second target image meeting a second additional condition according to the image to be processed by using the generation model according to the second additional condition; according to the third additional condition, generating a third target image meeting the third additional condition according to the image to be processed by using the generation model; according to the third additional condition, generating a fourth target image meeting the third additional condition according to the second target image by using the generation model; and determining a loss function according to the difference between the fourth target image and the third target image and the difference between the image to be processed and the first target image.
In some embodiments, the generative model is trained with an objective function until the objective function reaches a maximum value. The objective function takes the first target image as a variable, is positively correlated with the target similarity, is negatively correlated with the sum of each real similarity, and is negatively correlated with the sum of each generated similarity.
In some embodiments, the target similarity is a similarity between the first target image and a real image satisfying the first additional condition, the real similarity is a similarity between the first target image and a real image satisfying the other additional condition, the generated similarity is a similarity between the first target image and a generated image satisfying the first additional condition and the other additional condition, and the generated image is an image generated by the generation model according to the additional condition.
In some embodiments, the target similarity is a similarity of the feature information of the first real data distribution and the feature information of the first target image. The first real data distribution is a data distribution of a real image that satisfies the first additional condition.
In some embodiments, the true similarity is a similarity between feature information of other true data distributions and feature information of the first target image. The other real data distribution is a data distribution of the real image that satisfies other additional conditions.
In some embodiments, the generated similarity is a similarity of feature information of the generated data distribution and feature information of the first target image, and the generated data distribution is a data distribution of the generated image satisfying the first additional condition and other additional conditions.
In the above embodiment, the convolution processing result is weighted by using the attention feature map output by the residual learning method, so that the generation method can focus on the key region in the image to be processed. Thus, the target image which can better meet the additional condition can be generated, and the effect of the target image is improved.
FIG. 5 illustrates a block diagram of further embodiments of an apparatus for generating a target image of the present disclosure.
As shown in fig. 5, the generation apparatus 5 of the target image of the embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute the method for generating a target image in any one of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 illustrates a block diagram of still further embodiments of an apparatus for generating a target image of the present disclosure.
As shown in fig. 6, the generation device 6 of the target image of the embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to execute the method for generating a target image in any of the above embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The target image generation apparatus 6 may further include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650 and the connections between the memory 610 and the processor 620 may be through a bus 660, for example. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a usb disk.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein.
Up to this point, a generation method of a target image, a generation apparatus of a target image, and a computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (13)

1. A method of generating a target image, comprising:
determining a first feature map and an attention feature map of an image to be processed by using a generated model according to a residual learning method;
determining a second feature map of the image to be processed by using the generated model according to a convolution processing method;
determining a third feature map by using the generative model according to the second feature map and the attention feature map;
and generating a first target image meeting the first additional condition by using the generative model according to the third feature map, the first feature map and the feature information of the first additional condition.
2. The generation method according to claim 1, wherein the determining the first feature map and the attention feature map of the image to be processed using the generation model includes:
determining the first feature map and the attention feature map according to the first additional condition;
the determining the second feature map of the image to be processed by using the generative model comprises:
and determining the second feature map according to the first additional condition.
3. The generation method according to claim 2, wherein the determining the first feature map and the attention feature map according to the first additional condition includes:
adding the first additional condition to the residual learning method by means of conditional instance normalization;
the determining the second feature map according to the first additional condition includes:
and adding the first additional condition to the convolution processing method by means of conditional instance normalization.
4. The generation method according to claim 1, wherein the determining a second feature map of the image to be processed using the generative model comprises:
and performing convolution processing on each image area of the image to be processed respectively to determine the second feature map.
5. The generation method according to claim 4, wherein the determining a second feature map of the image to be processed using the generative model comprises:
respectively adding the feature information of the first additional condition to each image area of the image to be processed by means of conditional instance normalization;
performing convolution processing on the image areas to which the feature information of the first additional condition is added to determine the second feature map.
6. The generation method according to claim 4,
the feature information of the first additional condition is a feature map of the first additional condition;
the generating, with the generative model, a first target image that meets the first additional condition includes:
fusing the third feature map, the first feature map and the feature map of the first additional condition into data to be processed;
and generating the first target image according to the data to be processed.
7. The generation method of claim 1, wherein the generative model is trained with a loss function determined by:
according to a second additional condition, generating a second target image meeting the second additional condition according to the image to be processed by using the generation model;
according to a third additional condition, generating a third target image meeting the third additional condition according to the image to be processed by using the generation model;
according to a third additional condition, generating a fourth target image meeting the third additional condition according to the second target image by using the generation model;
and determining the loss function according to the difference between the fourth target image and the third target image and the difference between the image to be processed and the first target image.
8. The generation method according to any one of claims 1 to 7,
the generative model is trained by using an objective function until the objective function reaches a maximum value, the objective function takes the first objective image as a variable, is positively correlated with the objective similarity, is negatively correlated with the sum of each real similarity, and is negatively correlated with the sum of each generative similarity,
the target similarity is a similarity of the first target image and a real image satisfying the first additional condition,
the true similarity is a similarity of the first target image and a true image satisfying other additional conditions,
the generated similarity is the similarity between the first target image and a generated image satisfying the first additional condition and the other additional conditions, and the generated image is an image generated by the generation model according to additional conditions.
9. The generation method according to claim 8, wherein
the target similarity is a similarity between feature information of a first real data distribution and feature information of the first target image, the first real data distribution being a data distribution of real images that satisfy the first additional condition,
the real similarity is a similarity between feature information of another real data distribution and the feature information of the first target image, the other real data distribution being a data distribution of real images that satisfy the other additional conditions, and
the generated similarity is a similarity between feature information of a generated data distribution and the feature information of the first target image, the generated data distribution being a data distribution of generated images that satisfy the first additional condition and the other additional conditions.
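One way to realize the objective of claims 8 and 9 (an assumption, not the patent's formula) is to represent the feature information of each data distribution by the mean feature of images drawn from it, use cosine similarity as the similarity, and maximize the target similarity while subtracting the sums of the real and generated similarities.

```python
import torch
import torch.nn.functional as F

def claims_8_9_objective(feat, first_target_image,
                         real_images_first_cond, real_images_other_conds,
                         generated_images_per_cond):
    """feat maps a batch of images to (N, D) feature vectors; each argument
    after first_target_image is a batch (or a list of batches) drawn from one
    data distribution, as in claim 9."""
    target_feature = feat(first_target_image)                     # (1, D)

    def distribution_similarity(images):
        # Feature information of a data distribution: here, its mean feature.
        mean_feature = feat(images).mean(dim=0, keepdim=True)     # (1, D)
        return F.cosine_similarity(target_feature, mean_feature).mean()

    target_similarity = distribution_similarity(real_images_first_cond)
    real_similarities = sum(distribution_similarity(x) for x in real_images_other_conds)
    generated_similarities = sum(distribution_similarity(x) for x in generated_images_per_cond)

    # To be maximized: positively correlated with the target similarity,
    # negatively correlated with the sums of real and generated similarities.
    return target_similarity - real_similarities - generated_similarities
```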
10. The generation method according to claim 8, wherein
the generation model is trained according to a discrimination result, obtained by a discrimination model, of distinguishing between the first target image and a real image that satisfies the first additional condition.
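Claim 10 only requires that the generation model be trained on a discrimination model's result of telling the first target image apart from a real image meeting the first additional condition; a standard binary adversarial loss is one plausible reading, and the loss form below is an assumption.

```python
import torch
import torch.nn.functional as F

def discrimination_model_step(D, real_image, first_target_image):
    # The discrimination model learns to separate real images from generated ones.
    real_logit = D(real_image)
    fake_logit = D(first_target_image.detach())
    return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
            + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))

def generation_model_step(D, first_target_image):
    # The generation model is trained according to the discrimination result (claim 10).
    fake_logit = D(first_target_image)
    return F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))
```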
11. An apparatus for generating a target image, comprising:
a residual learning unit configured to determine a first feature map and an attention feature map of an image to be processed by using a generation model according to a residual learning method;
a convolution processing unit configured to determine a second feature map of the image to be processed by using the generation model according to a convolution processing method;
a determination unit configured to determine a third feature map by using the generation model according to the second feature map and the attention feature map; and
a generating unit configured to generate, by using the generation model, a first target image that meets a first additional condition according to the third feature map, the first feature map and feature information of the first additional condition.
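The units of claim 11 can be traced as a single forward pass, sketched below under assumed layer shapes: a residual-learning branch yields the first feature map and the attention feature map, a convolution branch yields the second feature map, the attention feature map weights the second feature map into the third feature map, and the generating unit decodes the fusion of the third feature map, the first feature map and the condition features. Every layer choice here is illustrative, not the claimed structure.

```python
import torch
import torch.nn as nn

class TargetImageGenerator(nn.Module):
    """Illustrative composition of the units recited in claim 11."""
    def __init__(self, channels: int = 64, cond_dim: int = 16):
        super().__init__()
        self.residual_branch = nn.Conv2d(3, channels, 3, padding=1)   # residual learning unit (stand-in)
        self.attention_head = nn.Conv2d(channels, 1, 1)               # produces the attention feature map
        self.conv_branch = nn.Conv2d(3, channels, 3, padding=1)       # convolution processing unit (stand-in)
        self.cond_proj = nn.Linear(cond_dim, channels)                # feature information of the condition
        self.decoder = nn.Conv2d(3 * channels, 3, 3, padding=1)       # generating unit (stand-in)

    def forward(self, image: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        first_feature_map = self.residual_branch(image)
        attention_map = torch.sigmoid(self.attention_head(first_feature_map))
        second_feature_map = self.conv_branch(image)
        third_feature_map = attention_map * second_feature_map        # determination unit
        h, w = image.size(2), image.size(3)
        condition_map = self.cond_proj(condition).unsqueeze(-1).unsqueeze(-1).expand(-1, -1, h, w)
        fused = torch.cat([third_feature_map, first_feature_map, condition_map], dim=1)
        return torch.tanh(self.decoder(fused))

# Hypothetical usage: a 3-channel image and a 16-dimensional additional condition.
generator = TargetImageGenerator()
first_target_image = generator(torch.randn(1, 3, 64, 64), torch.randn(1, 16))
```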
12. An apparatus for generating a target image, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of generating a target image of any of claims 1-10 based on instructions stored in the memory.
13. A computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the method of generating a target image of any one of claims 1-10.
CN201911226438.0A 2019-12-04 2019-12-04 Target image generation method and device and computer-readable storage medium Pending CN111767774A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911226438.0A CN111767774A (en) 2019-12-04 2019-12-04 Target image generation method and device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111767774A 2020-10-13

Family

ID=72718483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911226438.0A Pending CN111767774A (en) 2019-12-04 2019-12-04 Target image generation method and device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111767774A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008846A * 2019-03-13 2019-07-12 南京邮电大学 Image processing method
CN110188765A * 2019-06-05 2019-08-30 京东方科技集团股份有限公司 Image semantic segmentation model generation method, device, equipment and storage medium
CN110222588A * 2019-05-15 2019-09-10 合肥进毅智能技术有限公司 Face sketch image aging synthesis method, device and storage medium
CN110287846A * 2019-06-19 2019-09-27 南京云智控产业技术研究院有限公司 Face key point detection method based on attention mechanism
CN110322394A * 2019-06-18 2019-10-11 中国科学院自动化研究所 Attribute-guided adversarial generation method and device for face age progression images
CN110348352A * 2019-07-01 2019-10-18 深圳前海达闼云端智能科技有限公司 Training method, terminal and storage medium for face image age migration network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination