CN112861592A - Training method of image generation model, image processing method and device - Google Patents

Training method of image generation model, image processing method and device

Info

Publication number
CN112861592A
Authority
CN
China
Prior art keywords
image
feature map
target
generation model
image generation
Prior art date
Legal status
Granted
Application number
CN201911193533.5A
Other languages
Chinese (zh)
Other versions
CN112861592B (en)
Inventor
黄星
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201911193533.5A
Publication of CN112861592A
Application granted
Publication of CN112861592B
Legal status: Active

Classifications

    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Neural networks; architecture; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06V 40/168 — Recognition of human faces; feature extraction; face representation


Abstract

The disclosure relates to a training method for an image generation model, an image processing method, and an image processing apparatus, in the technical field of image processing. The feature layers of a feature map are grouped according to the feature map's number of channels to obtain a target feature map containing a target number of feature-layer groups; learnable parameters of a preset normalization conversion function are set according to a random vector input to the model, the number of channels of the feature map, and the target number, to obtain a target normalization conversion function; and the feature map is normalized according to the target normalization conversion function, completing the training of the image generation model. The normalization operation takes into account both the input of the image generation model and the characteristics of the object being normalized, so the parameters of the normalization operation are correlated and differences are prevented from vanishing in the normalization process; the training effect of the model for image generation is improved, the quality of generated images is improved, and the training time of the model is shortened.

Description

Training method of image generation model, image processing method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method for an image generation model, an image processing method, and an image processing apparatus.
Background
With the development of technology, deep learning has found ever wider application. Deep learning extracts the internal rules and representation levels of sample data, giving a network model human-like abilities to analyze and learn, so that it can recognize data such as text, images, and sound.
During the training of a deep learning network model, the feature map processed by a network layer of the model is typically normalized to obtain normalized features, so that the feature-map data follows a distribution with mean 0 and standard deviation 1 (or lies in the range 0-1); this shortens model convergence time and improves the training effect. The current training method uses the standard-score method, which normalizes the raw data by its mean and standard deviation so that the processed data conforms to a standard normal distribution (mean 0, standard deviation 1), i.e. the formula
y(x) = γ · (x − E(x)) / √Var(x) + β
is used to perform the normalization, where x is the raw data, E(x) is the mean of x, Var(x) is the variance of x, y(x) is the normalization result, and γ and β are learnable parameters. Because γ and β are independent values, optimized on their own through loss back-propagation during training and unconnected to the model input, differences can vanish in the normalization process, reducing image quality.
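As a minimal illustrative sketch (not the disclosed method; the small constant `eps` is a hypothetical addition for numerical stability, not part of the formula above), the standard-score normalization with free learnable parameters γ and β can be written in Python as:

```python
import numpy as np

def standard_score_norm(x, gamma, beta, eps=1e-5):
    """Standard-score normalization followed by a learnable affine
    transform: gamma scales the data, beta shifts it.  Note that gamma
    and beta are free values with no connection to any model input."""
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

x = np.array([1.0, 2.0, 3.0, 4.0])
y = standard_score_norm(x, gamma=1.0, beta=0.0)
# With gamma = 1 and beta = 0, y has mean ~0 and standard deviation ~1.
```

This independence of γ and β from the model input is exactly the deficiency the disclosure addresses.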
Disclosure of Invention
The present disclosure provides a training method of an image generation model, an image processing method and an image processing apparatus, so as to improve the training effect of the model for image generation and improve the quality of generated images.
The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, the present disclosure provides a training method for an image generation model, including:
acquiring a random vector, wherein the random vector is a vector with a preset dimension;
inputting the random vector into an image generation model to be trained for processing to obtain a feature map comprising a plurality of channels;
grouping the feature layers of the feature map to obtain a target feature map containing a target number of feature layer groups, wherein the target number is a divisor of a channel number of the feature map;
setting learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function;
normalizing the feature map according to the target normalization conversion function to obtain an intermediate feature map;
processing the intermediate feature map by using the image generation model to be trained to obtain a predicted image;
and adjusting parameters of the image generation model to be trained based on the predicted image and a sample image to obtain the trained image generation model, wherein the sample image is an image with specified facial features.
Optionally, the image generation model is a generative adversarial network or a variational auto-encoder.
Optionally, the setting learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map, and the target number to obtain a target normalization conversion function includes:
generating a first matrix and a second matrix respectively according to the random vector, the number of channels of the feature map, and the target number, wherein the number of columns of the first matrix and the number of columns of the second matrix are the same as the dimension of the random vector; and generating a first vector and a second vector respectively according to the number of channels of the feature map and the target number, wherein the number of rows of the first matrix, the number of rows of the second matrix, the number of rows of the first vector, and the number of rows of the second vector are all the same;
and setting learnable parameters of a preset normalization conversion function according to the first matrix, the random vector, the first vector, the second matrix and the second vector to obtain a target normalization conversion function.
Optionally, the preset normalization conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
Wherein x is the target feature map, E (x) is the mean of x, Var (x) is the variance of x, y (x) is the intermediate feature map, and γ and β are learnable parameters.
Optionally, the target normalized conversion function is:
y(x) = (A_γZ + B_γ) · (x − E(x)) / √Var(x) + (A_βZ + B_β)
wherein Z represents the random vector, A_γ represents the first matrix, B_γ represents the first vector, A_β represents the second matrix, and B_β represents the second vector; Z is an m-dimensional column vector, A_γ and A_β are each matrices of g rows and m columns consisting of g × m numbers, and B_γ and B_β are g-dimensional column vectors, where g is set according to the number of channels of the feature map:
g = C / G
wherein C represents the number of channels of the feature map, G is the target number, and g is an integer.
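Using the figures of the worked example later in the description (C = 512 channels, target number G = 16, a 512-dimensional random vector Z), the input-dependent parameters γ = A_γZ + B_γ and β = A_βZ + B_β can be sketched as follows. This is illustrative only: the initialization scales and the `eps` stability constant are assumptions, not part of the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

C, G, m = 512, 16, 512    # channels, target number, dimension of Z
g = C // G                # g = C / G = 32 feature layers per group

Z = rng.normal(size=m)                    # random vector input to the model
A_gamma = rng.normal(size=(g, m)) * 0.01  # first matrix: g rows, m columns
B_gamma = np.ones(g)                      # first vector: g-dimensional
A_beta = rng.normal(size=(g, m)) * 0.01   # second matrix
B_beta = np.zeros(g)                      # second vector

# The learnable parameters now depend on the model input Z:
gamma = A_gamma @ Z + B_gamma  # per-layer scale, shape (g,)
beta = A_beta @ Z + B_beta     # per-layer shift, shape (g,)

def target_norm(x, gamma, beta, eps=1e-5):
    """Apply the target normalization to one feature-layer group x of
    shape (g, H, W): normalize by the group's mean and variance, then
    scale/shift each of its g layers with the Z-dependent parameters."""
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma[:, None, None] * x_hat + beta[:, None, None]

x = rng.normal(size=(g, 4, 4))   # one 32-layer group of a 512 × 4 × 4 map
y = target_norm(x, gamma, beta)
```

Because γ and β are affine functions of Z, back-propagation optimizes A_γ, B_γ, A_β, and B_β, tying the normalization parameters to the model input as the disclosure describes.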
According to a second aspect of embodiments of the present disclosure, there is provided an image processing method including:
acquiring an image to be processed;
inputting the image to be processed into a trained image generation model for processing to obtain an image with specified facial features, wherein the trained image generation model is obtained by training according to the method of any one of claims 1 to 5.
Optionally, the image with the specified facial features is an image with doll face features.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for an image generation model, including:
the acquisition module is configured to acquire a random vector, and the random vector is a vector with a preset dimensionality;
the processing module is configured to input the random vector into an image generation model to be trained for processing to obtain a feature map comprising a plurality of channels;
the grouping module is configured to group the feature layers of the feature map to obtain a target feature map comprising a target number of feature layer groups, wherein the target number is a divisor of a channel number of the feature map;
the setting module is configured to set learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function;
the normalization module is configured to perform normalization processing on the target feature map according to the target normalization conversion function to obtain an intermediate feature map;
the processing module is configured to process the intermediate feature map by using the image generation model to be trained to obtain a predicted image;
and the training module is configured to adjust the parameters of the image generation model to be trained based on the predicted image and the sample image to obtain the trained image generation model, wherein the sample image is an image with specified facial features.
Optionally, the image generation model is a generative adversarial network or a variational auto-encoder.
Optionally, the setting module is specifically configured to:
generating a first matrix and a second matrix respectively according to the random vector, the number of channels of the feature map, and the target number, wherein the number of columns of the first matrix and the number of columns of the second matrix are the same as the dimension of the random vector; and generating a first vector and a second vector respectively according to the number of channels of the feature map and the target number, wherein the number of rows of the first matrix, the number of rows of the second matrix, the number of rows of the first vector, and the number of rows of the second vector are all the same;
and setting learnable parameters of a preset normalization conversion function according to the first matrix, the random vector, the first vector, the second matrix and the second vector to obtain a target normalization conversion function.
Optionally, the preset normalization conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
Wherein x is the target feature map, E (x) is the mean of x, Var (x) is the variance of x, y (x) is the intermediate feature map, and γ and β are learnable parameters.
Optionally, the target normalized conversion function is:
y(x) = (A_γZ + B_γ) · (x − E(x)) / √Var(x) + (A_βZ + B_β)
wherein Z represents the random vector, A_γ represents the first matrix, B_γ represents the first vector, A_β represents the second matrix, and B_β represents the second vector; Z is an m-dimensional column vector, A_γ and A_β are each matrices of g rows and m columns consisting of g × m numbers, and B_γ and B_β are g-dimensional column vectors, where g is set according to the number of channels of the feature map:
g = C / G
wherein C represents the number of channels of the feature map, G is the target number, and g is an integer.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image processing apparatus comprising:
an acquisition module configured to acquire an image to be processed;
an image processing module configured to input the image to be processed into a trained image generation model for processing, so as to obtain an image with specified facial features, wherein the trained image generation model is trained according to the method of any one of the above first aspects.
Optionally, the image with the specified facial features is an image with doll face features.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus, comprising: a processor;
a memory for storing processor-executable instructions, wherein the processor is configured to execute the instructions to implement the method for training an image generation model according to any of the above first aspects or the method for processing an image according to any of the above second aspects.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a storage medium having stored therein a computer program, which when executed by a processor, implements the method for training an image generation model according to any one of the above first aspects or the method for processing an image according to any one of the above second aspects.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for training an image generation model according to any of the above first aspects or the method for processing an image according to any of the above second aspects.
The training method, the image processing method and the device for the image generation model provided by the embodiment of the disclosure at least have the following beneficial effects:
By inputting the random vector into the image generation model to be trained for processing, a feature map comprising a plurality of channels is obtained. Grouping the feature layers of the feature map according to the number of channels of the feature map yields a target feature map comprising a target number of feature-layer groups; setting the learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map, and the target number yields a target normalization conversion function; normalizing the feature map according to the target normalization conversion function yields an intermediate feature map; and processing the intermediate feature map with the image generation model to be trained yields a predicted image. The normalization operation thus takes into account both the input of the image generation model and the characteristics of the object being normalized, so that the parameters of the normalization operation are correlated with one another. Finally, the parameters of the image generation model to be trained are adjusted based on the predicted image and the sample image, completing the training of the image generation model; the training effect of the model for image generation is improved, and the quality of the generated images is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating a method of training an image generation model in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating an image processing method according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating an apparatus for training an image generation model in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a first type of electronic device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a second type of electronic device, according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The embodiment of the disclosure discloses a training method of an image generation model, an image processing method and an image processing device, which are respectively described below.
FIG. 1 is a schematic diagram illustrating a method of training an image generation model according to an exemplary embodiment, as shown in FIG. 1, including the steps of:
in step S110, a random vector is obtained, where the random vector is a vector with a preset dimension.
The training method for the image generation model according to the embodiment of the present disclosure may be implemented by an electronic device, and specifically, the electronic device may be a server.
The image generation model of the present disclosure is a generative model based on deep learning; specifically, the image generation model may be a GAN (Generative Adversarial Network) model, a VAE (Variational Auto-Encoder) model, or the like. A random vector is obtained and input into the image generation model for processing, to generate an image having certain features, for example an image with doll-face features, an image with elderly-face features, or an image with female-face features; which features are used is set according to actual needs. The random vector is a vector of a preset dimension: for example, when the image generation model is used to generate images with doll-face features, the random vector may be a column vector Z of dimension 512, and Z is input into the image generation model to generate an image with doll-face features. The details of that processing are not described here. During the training of the image generation model, the feature map processed by a network layer of the model is generally normalized to obtain normalized features, so that the feature-map data follows a distribution with mean 0 and standard deviation 1 (or lies in the range 0-1); in this way, model convergence time can be shortened and the model training effect improved.
In one possible embodiment, the image generation model is a generative adversarial network or a variational auto-encoder.
A GAN frames modeling as (at least) two modules: a generative model and a discriminative model, whose mutual adversarial game-playing produces reasonably good output. A VAE aims to learn the underlying probability distribution of the training data so that new data can easily be sampled from the learned distribution in order to generate an image with the target features. Generating images with GAN and VAE models is prior art and is not described in detail here.
In step S120, the random vector is input into an image generation model to be trained for processing, so as to obtain a feature map including a plurality of channels.
The image generation model is a deep learning model and may include a plurality of network layers; specifically, the network layers may be convolutional layers, connection layers, upsampling layers, and activation layers. The random vector is input into the image generation model to be trained for processing; features are extracted by each network layer, and a feature map output by the network layers of the image generation model is obtained, the feature map comprising a plurality of channels. For example, suppose the image generation model is a GAN model whose generative model includes a plurality of network layers such as convolutional layers, pooling layers, connection layers, activation layers, and upsampling layers, and the random vector is a column vector Z of dimension 512. Z is input into the image generation model and, after processing by each network layer, a C × H × W image is finally generated, where C is the number of channels, H is the image feature height, and W is the image feature width; for example, the image output by the generative model of the GAN is 3 × 256 × 256.
For example, a feature map P output by an activation layer of the image generation model is obtained; the size of P is 512 × 4 × 4, so it has 512 channels. The feature map P is then processed by further network layers of the image generation model, such as convolutional layers and upsampling layers, and the final generated image is 3 × 256 × 256.
In step S130, the feature layers of the feature map are grouped to obtain a target feature map including a target number of feature layer groups, where the target number is a divisor of a number of channels of the feature map.
The feature layers of the feature map are grouped to obtain a target feature map comprising a target number of feature-layer groups, where the target number is a divisor of the number of channels of the feature map. For example, the feature map P is obtained and its feature layers are grouped according to its 512 channels; since the target number is a divisor of that channel count, 16 target feature maps may be obtained, each comprising 32 channels of feature layers.
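Continuing the example (a 512 × 4 × 4 feature map P grouped into 16 groups of 32 feature layers), the grouping step amounts to a reshape along the channel axis; this sketch is illustrative only:

```python
import numpy as np

C, H, W = 512, 4, 4   # shape of the feature map P in the example
G = 16                # target number, a divisor of C

P = np.zeros((C, H, W))
# Group the C feature layers into G groups of C // G layers each:
groups = P.reshape(G, C // G, H, W)
# groups[i] is one target feature map of 32 feature layers.
```

Each of the G groups is then normalized with its own group statistics, as described in the following steps.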
In step S140, a learnable parameter of a preset normalization conversion function is set according to the random vector, the number of channels of the feature map, and the target number, so as to obtain a target normalization conversion function.
The normalization conversion function carries learnable parameters that indicate the degree of data scaling and the amount of data offset; for example, the preset normalization conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
where x is the target feature map, E(x) is the mean of x, Var(x) is the variance of x, y(x) is the normalization result of the target feature map, and γ and β are learnable parameters: γ represents the degree of data scaling and β represents the data offset. Because γ and β are learnable, the normalized data better suits the needs of model training. So that the normalization operation takes into account both the input of the image generation model and the characteristics of the object being normalized, so that the parameters of the normalization operation are correlated, and so that differences are guaranteed not to vanish in the normalization process, γ and β can be correlated with each other and with the input of the image generation model; γ and β are therefore set according to the random vector, the number of channels of the feature map, and the target number. For example, if the dimension of the random vector Z is 512, the number of channels of the feature map P is 512, and the target number is 16, then each target feature map includes 32 channels of feature layers; denoting the target feature map by x, its mean E(x) and variance Var(x) can then be calculated.
Based on the 512-dimensional random vector, the 512 channels of the feature map, and the target number 16 (each target feature map thus containing 512/16 = 32 channels of feature layers), a first matrix and a second matrix are each generated at random: the first matrix is a 32 × 512 matrix denoted A_γ, and the second matrix is also a 32 × 512 matrix, denoted A_β. A first vector and a second vector are respectively generated according to the number of channels of the feature map and the target number; the first vector is denoted B_γ and the second vector B_β, and both are 32-dimensional column vectors. The learnable parameters of the preset normalization conversion function are then set according to the random vector, the number of channels of the feature map, and the target number to obtain the target normalization conversion function
y(x) = (A_γZ + B_γ) · (x − E(x)) / √Var(x) + (A_βZ + B_β)
Setting the learnable parameters of the preset normalization conversion function according to the random vector, the number of channels of the feature map, and the target number yields the target normalization conversion function; normalizing the target feature map according to this function produces an intermediate feature map. The normalization operation thus takes into account both the input of the image generation model and the characteristics of the object being normalized, the parameters of the normalization operation are correlated, and differences are prevented from vanishing in the normalization process; the training effect of the model for image generation is improved, and the quality of the generated image is improved.
In step S150, the target feature map is normalized according to the target normalization transformation function, so as to obtain an intermediate feature map.
And after the target normalization conversion function is determined, normalizing the target characteristic diagram according to the target normalization conversion function to obtain an intermediate characteristic diagram. And then the intermediate feature map can be input into other network layers in the image generation model to be trained to perform feature extraction, so that the difference is prevented from disappearing due to the normalization process, the training effect of the model for image generation is improved, and the quality of the generated image is improved.
In step S160, the intermediate feature map is processed by the image generation model to be trained, so as to obtain a predicted image.
The intermediate feature map, obtained after processing by the target normalization conversion function, can be input into the other network layers of the image generation model to be trained for feature extraction, so that the image generation model to be trained finally generates a predicted image.
In step S170, parameters of the image generation model to be trained are adjusted based on the predicted image and a sample image, so as to obtain a trained image generation model, where the sample image is an image with a specified facial feature.
The sample image is an image having specified facial features; for example, the sample image is an image with doll-face features. A difference between the sample image and the predicted image is calculated, and the parameters of the image generation model to be trained are adjusted based on a loss function, to obtain the trained image generation model. Adjusting the parameters of the image generation model to be trained based on the sample image and the predicted image is prior art and is not described again here.
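The parameter-adjustment step can be illustrated with a deliberately tiny stand-in: a single hypothetical linear "generator" trained by gradient descent on a squared-error loss between the predicted and sample images. This is only a sketch of the idea; the actual model and loss (e.g. a GAN's adversarial loss) are far richer.

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 8, 16                       # latent dimension, flattened image size
W = rng.normal(size=(d, m)) * 0.1  # hypothetical generator parameter

z = rng.normal(size=m)             # random vector (step S110)
sample = rng.normal(size=d)        # stand-in for the sample image
lr = 0.5 / np.dot(z, z)            # step size chosen for stable convergence

for _ in range(200):
    pred = W @ z                       # predicted image (step S160)
    grad = np.outer(pred - sample, z)  # d/dW of 0.5 * ||pred - sample||^2
    W -= lr * grad                     # parameter adjustment (step S170)
```

After enough iterations, the generator's output for this z matches the sample image; in the disclosed method the same loop additionally optimizes the matrices and vectors of the target normalization conversion function.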
A feature map comprising a plurality of channels is obtained by inputting the random vector into the image generation model to be trained for processing; a target feature map comprising a target number of feature layer groups is obtained by grouping the feature layers of the feature map according to its number of channels; the learnable parameters of a preset normalization conversion function are set according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function; the target feature map is normalized according to the target normalization conversion function to obtain an intermediate feature map; and the intermediate feature map is processed by the image generation model to be trained to obtain a predicted image. Because the input of the image generation model and the characteristics of the normalization operation object are both taken into account, the parameters used in the normalization operation are correlated with each other. Finally, the parameters of the image generation model to be trained are adjusted based on the predicted image and the sample image to complete the training of the image generation model, thereby improving the training effect of the model for image generation and the quality of the generated image.
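Scaled down to toy dimensions, the order of these steps can be sketched as follows; every layer here is a random linear map standing in for the real network, and the MSE loss is only a placeholder for whatever loss the model actually uses:

```python
import numpy as np

rng = np.random.default_rng(42)
Z_DIM, C, G, H, W = 64, 32, 4, 8, 8   # toy sizes; the patent's example uses 512 channels
g = C // G                            # channels per feature layer group

# stand-ins for the model's layers (assumption: the real model is a GAN or VAE generator)
head = rng.standard_normal((C * H * W, Z_DIM)) * 0.1
tail = rng.standard_normal((C * H * W, C * H * W)) * 0.01
A_gamma = rng.standard_normal((g, Z_DIM)) * 0.1
A_beta = rng.standard_normal((g, Z_DIM)) * 0.1
B_gamma, B_beta = np.ones(g), np.zeros(g)

z = rng.standard_normal(Z_DIM)                      # random vector of a preset dimension
feature_map = (head @ z).reshape(C, H, W)           # feature map with C channels
grouped = feature_map.reshape(G, g, H, W)           # target feature map: G groups
gamma = A_gamma @ z + B_gamma                       # learnable params set from z
beta = A_beta @ z + B_beta
mean = grouped.mean(axis=(1, 2, 3), keepdims=True)
var = grouped.var(axis=(1, 2, 3), keepdims=True)
intermediate = gamma[None, :, None, None] * (grouped - mean) / np.sqrt(var + 1e-5) \
    + beta[None, :, None, None]                     # normalized intermediate feature map
predicted = (tail @ intermediate.reshape(-1)).reshape(C, H, W)  # predicted image
sample = rng.standard_normal((C, H, W))             # stand-in sample image
loss = float(np.mean((predicted - sample) ** 2))    # placeholder loss driving updates
```

The loss value would then be backpropagated to adjust the layer weights, including `A_gamma`, `A_beta`, `B_gamma`, and `B_beta`, which are themselves learnable.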
In a possible implementation manner, the setting learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map, and the target number to obtain a target normalization conversion function includes:
generating a first matrix and a second matrix respectively according to the random vector, the number of channels of the feature map and the target number, wherein the number of columns of the first matrix and the number of columns of the second matrix are the same as the dimension of the random vector, generating a first vector and a second vector respectively according to the number of channels of the feature map and the target number, and the number of rows of the first matrix, the number of rows of the second matrix, the number of rows of the first vector and the number of rows of the second vector are the same;
and setting learnable parameters of a preset normalization conversion function according to the first matrix, the random vector, the first vector, the second matrix and the second vector to obtain a target normalization conversion function.
For example, suppose the dimension of the random vector Z is 512, the number of channels of the feature map P is 512, and the target number is 16, so that each feature layer group contains 512/16 = 32 channels of feature layers; the target feature map is denoted x, from which the mean E(x) and the variance Var(x) are further calculated. According to the random vector dimension 512, the channel number 512 of the feature map and the target number 16, a first matrix and a second matrix are generated at random: the first matrix is a 32 × 512 matrix, denoted Aγ, and the second matrix is also a 32 × 512 matrix, denoted Aβ. A first vector and a second vector are generated according to the number of channels of the feature map and the target number: the first vector is denoted Bγ and the second vector Bβ, where Bγ and Bβ are 32-dimensional column vectors. The learnable parameters of the preset normalization conversion function are then set according to the random vector, the number of channels of the feature map and the target number, to obtain the target normalization conversion function
y(x) = (AγZ + Bγ) · (x − E(x)) / √Var(x) + (AβZ + Bβ)
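The worked example in this paragraph (Z of dimension 512, a 512-channel feature map, target number 16) can be sketched as follows; drawing the matrices at random and initializing the first vector to ones and the second to zeros are assumptions for illustration, since the patent does not fix the initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
m, C, G = 512, 512, 16      # dimension of Z, channels of P, target number
g = C // G                  # 512 / 16 = 32 feature layers per group

Z = rng.standard_normal(m)                     # random vector Z
A_gamma = rng.standard_normal((g, m)) * 0.01   # first matrix, 32 x 512
A_beta = rng.standard_normal((g, m)) * 0.01    # second matrix, 32 x 512
B_gamma = np.ones(g)                           # first vector, 32-dimensional
B_beta = np.zeros(g)                           # second vector, 32-dimensional

gamma = A_gamma @ Z + B_gamma   # scale parameter, one entry per channel in a group
beta = A_beta @ Z + B_beta      # shift parameter, same shape
```

Because γ and β are both produced from the same Z through learned matrices, they are correlated with each other and with the model input, which is the point of this construction.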
The target feature map is then normalized according to the random vector, the number of channels of the feature map and the target number of groups to obtain an intermediate feature map. In this way, the input of the image generation model and the features of the normalization operation object are both taken into account during normalization, the parameters of the normalization operation are correlated with each other, the differences between features are prevented from vanishing during normalization, the training effect of the model for image generation is improved, and the quality of the generated image is improved.
In a possible embodiment, the predetermined normalized conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
Wherein x is the target feature map, E(x) is the mean of x, Var(x) is the variance of x, y(x) is the above-mentioned intermediate feature map, and γ and β are learnable parameters.
Using the normalization conversion function
y(x) = γ · (x − E(x)) / √Var(x) + β
the feature map processed by a network layer in a deep learning network model is normalized to obtain normalized features, so that the data of the feature map follows a distribution with mean 0 and standard deviation 1 (or falls into the range 0–1), which shortens the model convergence time and improves the model training effect.
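A tiny numeric check of what this normalization does; the input values and the small constant `eps` (added for numerical stability) are assumptions for the example:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # stand-in feature values
eps = 1e-5                            # assumed small stabilizing constant
y = (x - x.mean()) / np.sqrt(x.var() + eps)   # (x - E(x)) / sqrt(Var(x))

# after normalization the data has mean 0 and standard deviation ~1
```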
In a possible embodiment, the target normalized conversion function is:
y(x) = (AγZ + Bγ) · (x − E(x)) / √Var(x) + (AβZ + Bβ)
wherein Z represents the random vector, Aγ represents the first matrix, Bγ represents the first vector, Aβ represents the second matrix, and Bβ represents the second vector; Z is an m-dimensional column vector, Aγ and Aβ are each matrices of g rows and m columns consisting of g × m numbers, and Bγ and Bβ are g-dimensional column vectors, where g is set according to the number of channels of the above feature map, where
g = C / G
where C represents the number of channels of the feature map, G is the target number, and g is an integer.
In order to correlate γ and β with each other and with the input of the image generation model, suppose again that the dimension of the random vector Z is 512, the number of channels of the feature map P is 512, and the target number is 16, so that each feature layer group contains 512/16 = 32 channels of feature layers; the target feature map is denoted x, from which the mean E(x) and the variance Var(x) are further calculated. According to the random vector dimension 512, the channel number 512 of the feature map and the target number 16, a first matrix and a second matrix are generated at random: the first matrix is a 32 × 512 matrix, denoted Aγ, and the second matrix is also a 32 × 512 matrix, denoted Aβ. A first vector and a second vector are generated according to the number of channels of the feature map and the target number: the first vector is denoted Bγ and the second vector Bβ, where Bγ and Bβ are 32-dimensional column vectors. The learnable parameters of the preset normalization conversion function are then set accordingly to obtain the target normalization conversion function
y(x) = (AγZ + Bγ) · (x − E(x)) / √Var(x) + (AβZ + Bβ)
The target feature map is then normalized according to the random vector, the number of channels of the feature map and the target number of groups to obtain an intermediate feature map. In this way, the input of the image generation model and the features of the normalization operation object are both taken into account during normalization, the parameters of the normalization operation are correlated with each other, the differences between features are prevented from vanishing during normalization, the training effect of the model for image generation is improved, and the quality of the generated image is improved.
Fig. 2 is a schematic diagram illustrating an image processing method according to an exemplary embodiment, as shown in fig. 2, including the steps of:
in step S210, an image to be processed or a random vector to be processed is acquired.
The image processing method of the embodiment of the present disclosure may be implemented by an electronic device, and specifically, the electronic device may be a server.
After the trained image generation model is obtained by training according to any of the training methods for image generation models disclosed in the embodiments, an image to be processed or a random vector to be processed is obtained, wherein the image to be processed can be represented as a vector of a certain dimension.
In step S220, the image to be processed or the random vector to be processed is input into a trained image generation model for processing to obtain an image with a specified facial feature, where the trained image generation model is trained according to any one of the training methods for image generation models disclosed in the above embodiments.
The image to be processed or the random vector to be processed is input into the trained image generation model for processing to obtain an image with a specified facial feature. The image with the specified facial feature may be, for example, an image with doll face features, an image with elderly face features, or an image with female face features; which feature is used is set according to actual needs. For example, if the image generation model trained by any of the training methods disclosed in the above embodiments is a model for generating female facial features, the image to be processed may be input into the trained image generation model and processed to generate an image having female facial features.
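A minimal sketch of this inference path; `generator` below is a stand-in callable rather than the patent's actual model, and the 512-dimensional input size is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def generator(z):
    """Stand-in for a trained image generation model: vector -> 64x64 RGB image."""
    w = np.ones((64 * 64 * 3, z.size)) / z.size   # fixed toy weights
    return (w @ z).reshape(64, 64, 3)

# case 1: a random vector to be processed
z = rng.standard_normal(512)
image_from_vector = generator(z)

# case 2: an image to be processed, represented as a vector of the required dimension
image_to_process = rng.random((16, 32))           # 16 * 32 = 512 values
image_from_image = generator(image_to_process.reshape(-1))
```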
In one possible embodiment, the image with the specified facial features is an image with doll face features.
The image with the specified facial feature may be an image with doll face features: if the image generation model trained by any training method disclosed in the above embodiments is a model for generating doll face features, inputting the image to be processed into the trained image generation model for processing generates an image with doll face features.
Fig. 3 is a block diagram illustrating an image generation model training apparatus according to an exemplary embodiment, referring to fig. 3, the apparatus including: the system comprises an acquisition module 310, a processing module 320, a grouping module 330, a setting module 340, a normalization module 350, a processing module 360 and a training module 370.
An acquisition module 310 configured to acquire a random vector, where the random vector is a vector of a preset dimension;
the processing module 320 is configured to input the random vector into an image generation model to be trained for processing, so as to obtain a feature map comprising a plurality of channels;
A grouping module 330 configured to group the feature layers of the feature map to obtain a target feature map including a target number of feature layer groups, where the target number is a divisor of a channel number of the feature map;
a setting module 340 configured to set learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map, and the target number, so as to obtain a target normalization conversion function;
a normalization module 350, configured to perform normalization processing on the target feature map according to the target normalization transformation function, so as to obtain an intermediate feature map;
the processing module 360 is configured to process the intermediate feature map by using the image generation model to be trained to obtain a predicted image;
and a training module 370, configured to adjust parameters of the image generation model to be trained based on the predicted image and a sample image, so as to obtain a trained image generation model, where the sample image is an image with a specified facial feature.
In one possible embodiment, the image generation model is a generative adversarial network or a variational auto-encoder.
In a possible implementation manner, the setting module 340 is specifically configured to:
generating a first matrix and a second matrix respectively according to the random vector, the number of channels of the feature map and the target number, wherein the number of columns of the first matrix and the number of columns of the second matrix are the same as the dimension of the random vector, generating a first vector and a second vector respectively according to the number of channels of the feature map and the target number, and the number of rows of the first matrix, the number of rows of the second matrix, the number of rows of the first vector and the number of rows of the second vector are the same;
and setting learnable parameters of a preset normalization conversion function according to the first matrix, the random vector, the first vector, the second matrix and the second vector to obtain a target normalization conversion function.
In a possible embodiment, the predetermined normalized conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
Wherein x is the target feature map, E (x) is the mean of x, Var (x) is the variance of x, y (x) is the intermediate feature map, and γ and β are learnable parameters.
In a possible embodiment, the target normalized conversion function is:
y(x) = (AγZ + Bγ) · (x − E(x)) / √Var(x) + (AβZ + Bβ)
wherein Z represents the random vector, Aγ represents the first matrix, Bγ represents the first vector, Aβ represents the second matrix, and Bβ represents the second vector; Z is an m-dimensional column vector, Aγ and Aβ are each matrices of g rows and m columns consisting of g × m numbers, and Bγ and Bβ are g-dimensional column vectors, where g is set according to the number of channels of the above feature map, where
g = C / G
where C represents the number of channels of the feature map, G is the target number, and g is an integer.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment, referring to fig. 4, the apparatus including: an acquisition module 410 and an image processing module 420.
An obtaining module 410 configured to obtain an image to be processed or a random vector to be processed;
an image processing module 420 configured to input the to-be-processed image or the to-be-processed random vector into a trained image generation model for processing, so as to obtain an image with a specified facial feature, where the trained image generation model is trained according to the method of any one of the first aspect.
Optionally, the image with the specified facial feature is an image with a doll face feature.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram of a first electronic device illustrated in accordance with an exemplary embodiment of the present disclosure, referring to fig. 5, for example, electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user as described above. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of the touch or slide action but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the above-mentioned communication component 816 further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing any of the above-described image generation model training methods or any of the above-described image processing methods in any of the above-described embodiments.
FIG. 6 is a schematic diagram of a second type of electronic device shown in accordance with an example embodiment of the present disclosure. For example, the electronic device 900 may be provided as a server. Referring to fig. 6, electronic device 900 includes a processing component 922, which further includes one or more processors, and memory resources, represented by memory 932, for storing instructions, such as applications, that are executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 922 is configured to execute instructions to perform a training method of an image generation model as described above in any of the above embodiments or an image processing method as described above in any of the above embodiments.
The electronic device 900 may also include a power component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 may operate based on an operating system stored in the memory 932, such as a Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or similar operating system.
In an embodiment of the present disclosure, there is also provided a storage medium, in which instructions are stored, and when the instructions are executed on a computer, the instructions cause the computer to execute the training method for the image generation model in any one of the above embodiments or the image processing method in any one of the above embodiments. In an exemplary embodiment, a storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. Alternatively, the storage medium may be, for example, a non-transitory computer-readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an embodiment of the present disclosure, there is also provided a computer program product including instructions, which when run on a computer, cause the computer to perform the training method of the image generation model in any of the above embodiments or the image processing method in any of the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A training method of an image generation model is characterized by comprising the following steps:
acquiring a random vector, wherein the random vector is a vector with a preset dimension;
inputting the random vector into an image generation model to be trained for processing to obtain a characteristic diagram comprising a plurality of channels;
grouping the feature layers of the feature map to obtain a target feature map containing a target number of feature layer groups, wherein the target number is a divisor of a channel number of the feature map;
setting learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function;
normalizing the feature map according to the target normalization conversion function to obtain an intermediate feature map;
processing the intermediate characteristic graph by using the image generation model to be trained to obtain a predicted image;
and adjusting parameters of the image generation model to be trained based on the predicted image and the sample image to obtain the trained image generation model, wherein the sample image is an image with specified facial features.
2. The method of claim 1, wherein the image generation model is a generative adversarial network or a variational auto-encoder.
3. The method according to claim 1 or 2, wherein the setting learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function comprises:
respectively generating a first matrix and a second matrix according to the random vector and the number of channels and the target number of the feature map, wherein the number of columns of the first matrix and the number of columns of the second matrix are the same as the dimension of the random vector, respectively generating a first vector and a second vector according to the number of channels and the target number of the feature map, and the number of rows of the first matrix, the number of rows of the second matrix, the number of rows of the first vector and the number of rows of the second vector are the same;
and setting learnable parameters of a preset normalization conversion function according to the first matrix, the random vector, the first vector, the second matrix and the second vector to obtain a target normalization conversion function.
4. The method of claim 3, wherein the predetermined normalized conversion function is
y(x) = γ · (x − E(x)) / √Var(x) + β
Wherein x is the target feature map, E (x) is the mean of x, Var (x) is the variance of x, y (x) is the intermediate feature map, and γ and β are learnable parameters.
5. The method of claim 4, wherein the target normalized conversion function is:
y(x) = (AγZ + Bγ) · (x − E(x)) / √Var(x) + (AβZ + Bβ)
wherein Z represents the random vector, Aγ represents the first matrix, Bγ represents the first vector, Aβ represents the second matrix, and Bβ represents the second vector; Z is an m-dimensional column vector, Aγ and Aβ are each matrices of g rows and m columns consisting of g × m numbers, and Bγ and Bβ are g-dimensional column vectors, wherein g is set according to the number of channels of the feature map, wherein
g = C / G
wherein C represents the number of channels of the feature map, G is the target number, and g is an integer.
6. An image processing method, comprising:
acquiring an image to be processed or a random vector to be processed;
inputting the image to be processed or the random vector to be processed into a trained image generation model for processing to obtain an image with specified facial features, wherein the trained image generation model is obtained by training according to the method of any one of claims 1 to 5.
7. An apparatus for training an image generation model, comprising:
the acquisition module is configured to acquire a random vector, and the random vector is a vector with a preset dimensionality;
the processing module is configured to input the random vector into an image generation model to be trained for processing to obtain a feature map comprising a plurality of channels;
the grouping module is configured to group the feature layers of the feature map to obtain a target feature map comprising a target number of feature layer groups, wherein the target number is a divisor of a channel number of the feature map;
the setting module is configured to set learnable parameters of a preset normalization conversion function according to the random vector, the number of channels of the feature map and the target number to obtain a target normalization conversion function;
the normalization module is configured to perform normalization processing on the feature map according to the target normalization conversion function to obtain an intermediate feature map;
the processing module is configured to process the intermediate feature map by using the image generation model to be trained to obtain a predicted image;
and the training module is configured to adjust the parameters of the image generation model to be trained based on the predicted image and the sample image to obtain the trained image generation model, wherein the sample image is an image with specified facial features.
8. An image processing apparatus characterized by comprising:
the acquisition module is configured to acquire an image to be processed or a random vector to be processed;
an image processing module configured to input the image to be processed or the random vector to be processed into a trained image generation model for processing, so as to obtain an image with specified facial features, wherein the trained image generation model is obtained by training according to the method of any one of claims 1 to 5.
9. An electronic device, comprising: a processor; memory for storing processor-executable instructions, wherein the processor is configured to execute the instructions to implement the method of training an image generation model according to any of claims 1-5 or the method of image processing according to claim 6.
10. A storage medium having stored therein a computer program which, when executed by a processor, implements a method of training an image generation model according to any one of claims 1 to 5 or a method of image processing according to claim 6.
CN201911193533.5A 2019-11-28 2019-11-28 Training method of image generation model, image processing method and device Active CN112861592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911193533.5A CN112861592B (en) 2019-11-28 2019-11-28 Training method of image generation model, image processing method and device

Publications (2)

Publication Number Publication Date
CN112861592A true CN112861592A (en) 2021-05-28
CN112861592B CN112861592B (en) 2023-12-29

Family

ID=75995742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911193533.5A Active CN112861592B (en) 2019-11-28 2019-11-28 Training method of image generation model, image processing method and device

Country Status (1)

Country Link
CN (1) CN112861592B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609598A * 2017-09-27 2018-01-19 Wuhan Douyu Network Technology Co Ltd Image authentication model training method and device, and readable storage medium
CN110009059A * 2019-04-16 2019-07-12 Beijing ByteDance Network Technology Co Ltd Method and apparatus for generating a model
CN110135349A * 2019-05-16 2019-08-16 Beijing Xiaomi Intelligent Technology Co Ltd Recognition method, apparatus, device and storage medium
CN110163267A * 2019-05-09 2019-08-23 Xiamen Meitu Technology Co Ltd Training method for an image generation model and method for generating an image
CN110163077A * 2019-03-11 2019-08-23 Chongqing University of Posts and Telecommunications Lane recognition method based on a fully convolutional neural network
CN110390394A * 2019-07-19 2019-10-29 Shenzhen SenseTime Technology Co Ltd Batch normalization data processing method and apparatus, electronic device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378911A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Image classification model training method, image classification method and related device
CN113378911B (en) * 2021-06-08 2022-08-26 北京百度网讯科技有限公司 Image classification model training method, image classification method and related device

Also Published As

Publication number Publication date
CN112861592B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN109858524B (en) Gesture recognition method and device, electronic equipment and storage medium
CN108510987B (en) Voice processing method and device
CN109871896B (en) Data classification method and device, electronic equipment and storage medium
CN107945133B (en) Image processing method and device
JP6134446B2 (en) Image division method, image division apparatus, image division device, program, and recording medium
CN105069073B (en) Contact information recommendation method and device
CN111047507B (en) Training method of image generation model, image generation method and device
CN107967459B (en) Convolution processing method, convolution processing device and storage medium
CN108470322B (en) Method and device for processing face image and readable storage medium
CN107133354B (en) Method and device for acquiring image description information
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
CN107220614B (en) Image recognition method, image recognition device and computer-readable storage medium
CN111078170B (en) Display control method, display control device, and computer-readable storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
CN108154093B (en) Face information identification method and device, electronic equipment and machine-readable storage medium
CN107424130B (en) Picture beautifying method and device
CN107239758B (en) Method and device for positioning key points of human face
US9665925B2 (en) Method and terminal device for retargeting images
CN110533006B (en) Target tracking method, device and medium
CN112861592B (en) Training method of image generation model, image processing method and device
CN110148424B (en) Voice processing method and device, electronic equipment and storage medium
CN112308588A (en) Advertisement putting method and device and storage medium
US20220335250A1 (en) Methods and apparatuses for fine-grained style-based generative neural networks
CN115374256A (en) Question and answer data processing method, device, equipment, storage medium and product
CN113159206A (en) Image comparison method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant