CN115953821A - Virtual face image generation method and device and electronic equipment - Google Patents

Virtual face image generation method and device and electronic equipment

Info

Publication number
CN115953821A
CN115953821A (application CN202310174544.9A)
Authority
CN
China
Prior art keywords
image
face
virtual
virtual face
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310174544.9A
Other languages
Chinese (zh)
Other versions
CN115953821B (en)
Inventor
余镇滔
杨颖
王慎纳
赵祥
杨帅
陈粤洋
王宝元
彭爽
李笛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongmian Xiaoice Technology Co Ltd
Original Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongmian Xiaoice Technology Co Ltd filed Critical Beijing Hongmian Xiaoice Technology Co Ltd
Priority to CN202310174544.9A priority Critical patent/CN115953821B/en
Publication of CN115953821A publication Critical patent/CN115953821A/en
Application granted granted Critical
Publication of CN115953821B publication Critical patent/CN115953821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a virtual face image generation method and device and electronic equipment, which can obtain random noise and a real face image of a target style type; perform typed face feature mapping on the random noise to obtain virtual face feature data of the target style type; extract features of the real face image to obtain real face feature data; mix the virtual face feature data and the real face feature data to obtain mixed data; and input the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator. The virtual face generator is obtained by training a pre-training model with first random noise and real face images of a first style type. The method can effectively improve the image yield of virtual face images of the target style type.

Description

Virtual face image generation method and device and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a virtual face image generation method and device and electronic equipment.
Background
At present, scenarios that involve character modeling, such as social networking, live streaming, and games, require virtual face images, and demand for such images continues to grow. In the prior art, a virtual face generation model may be used to generate a desired virtual face image.
However, the image yield of virtual face images generated in the prior art is low.
Disclosure of Invention
The invention provides a virtual face image generation method, a virtual face image generation device and electronic equipment, which are used for overcoming the defect of low image yield of a virtual face image generated in the prior art and effectively improving the image yield of a virtual face image of a target style type.
The invention provides a virtual face image generation method, which comprises the following steps:
acquiring random noise and a real face image of a target style type;
performing typed face feature mapping on the random noise to obtain virtual face feature data of the target style type;
extracting the features of the real face image to obtain real face feature data;
mixing the virtual face feature data and the real face feature data to obtain mixed data;
inputting the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
Optionally, the mixing the virtual face feature data and the real face feature data to obtain mixed data includes:
determining a feature mixing weight to be used for mixing processing;
and mixing the virtual face feature data and the real face feature data based on the feature mixing weight to obtain mixed data.
Optionally, the virtual face generator includes a self-attention module, where the self-attention module is configured to optimize a face detail texture in a virtual face image to be generated; the inputting the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator includes:
and inputting the mixed data into the virtual face generator to obtain the virtual face image generated by the virtual face generator based on the self-attention module to perform face detail texture optimization processing.
Optionally, the virtual face generator is obtained by adding the self-attention module to a virtual face generation network of a virtual face generation model; wherein, in the virtual face generator, starting from the fifth complex convolution stack layer, each complex convolution stack layer comprises the self-attention module.
Optionally, the real face image of the first style type is obtained by the following method, including:
obtaining a first description text and a first real face image; the first description text is an image style description text of the first style type;
inputting the first description text and the first real face image into a trained image style classifier for image style classification, and obtaining an image classification result output by the image style classifier; the image classification result is the classification result of whether the first real face image is the real face image of the first style type;
when the image classification result is yes, determining the first real face image as the real face image of the first style type;
and the image style classifier is obtained by training based on the second description text and the real face image of the second style type.
Optionally, the image style classifier includes an image feature extraction layer, a graph-text association feature extraction layer, and a similarity processing layer, and the image classification result is obtained in the following manner, including:
inputting the first description text into the image-text associated feature extraction layer for text feature extraction, and obtaining a text feature vector output by the image-text associated feature extraction layer;
inputting the first real face image into the image feature extraction layer for image feature extraction, and obtaining an image feature vector output by the image feature extraction layer;
inputting the text feature vector and the image feature vector into the similarity processing layer to obtain the image classification result output by the similarity processing layer; the similarity processing layer is used for determining cosine similarity of the text feature vector and the image feature vector, and generating and outputting the image classification result based on a comparison result of the cosine similarity and a preset similarity threshold.
Optionally, the obtaining of the random noise and the real face image of the target style type includes:
obtaining a plurality of real face images;
obtaining one random noise through random sampling;
the performing typed face feature mapping on the random noise to obtain virtual face feature data of the target style type includes:
performing typed face feature mapping on the random noise obtained by random sampling to obtain the virtual face feature data;
the extracting the features of the real face image to obtain the real face feature data comprises:
randomly selecting one real face image from a plurality of real face images to determine the real face image to be processed;
and extracting the characteristics of the real face image to be processed to obtain the real face characteristic data.
Optionally, after obtaining the virtual face image of the target style type generated by the virtual face generator, the virtual face image generation method further includes:
and returning to the step of obtaining one random noise through random sampling until the first number of virtual face images of the target style type generated by the virtual face generator are obtained.
The invention provides a virtual face image generation device, comprising: a first obtaining unit, a second obtaining unit, a third obtaining unit, a fourth obtaining unit and a fifth obtaining unit; wherein:
the first obtaining unit is used for obtaining random noise and a real face image of a target style type;
the second obtaining unit is configured to perform typed face feature mapping on the random noise to obtain virtual face feature data of the target style type;
the third obtaining unit is used for extracting the characteristics of the real face image to obtain real face characteristic data;
the fourth obtaining unit is configured to perform mixing processing on the virtual face feature data and the real face feature data to obtain mixed data;
the fifth obtaining unit is configured to input the mixed data to a trained virtual face generator, and obtain a virtual face image of the target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the virtual human face image generation method.
The virtual face image generation method and device and the electronic equipment provided by the invention can generate mixed data that simultaneously contain virtual face feature data of the target style type and real face feature data of the target style type. After the mixed data are input into the virtual face generator, the virtual face image generated by the virtual face generator can be close to a real face image while still differing from any real face image in the real world, and the style type of the generated virtual face image becomes more controllable. This effectively increases the probability that the virtual face generator produces a virtual face image of the target style type, effectively improves the image yield of virtual face images of the target style type, and can meet the personalized requirement of a user for virtual face images of the target style type.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow diagram of a virtual face image generation method provided by the present invention;
FIG. 2 is a schematic view of the image style classification process provided by the present invention;
FIG. 3 is a second schematic flow chart of the method for generating a virtual face image according to the present invention;
FIG. 4 is a schematic diagram of a network architecture of a self-attention module provided by the present invention;
FIG. 5 is a schematic diagram of a network structure of a virtual face generator provided by the present invention;
FIG. 6 is a schematic diagram of a data cleansing process provided by the present invention;
FIG. 7 is a third schematic flow chart of a virtual human face image generation method according to the present invention;
FIG. 8 is a schematic structural diagram of a virtual face image generation apparatus provided in the present invention;
fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The virtual face image generation method of the present invention is described below with reference to fig. 1 to 7.
As shown in fig. 1, the present invention provides a first virtual face image generation method, which may include the following steps:
s101, acquiring random noise and a real face image of a target style type;
the random noise may be noise obtained through random sampling. Specifically, the random noise may be a random vector of a certain dimension, for example 512 dimensions, that satisfies a Gaussian distribution, i.e., a normal distribution.
Specifically, the invention can first obtain the probability expression of the standard normal distribution and randomly generate the random noise according to that probability. It will be appreciated that the random noise generated at each sampling may not be the same.
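For illustration only (this sketch is not part of the patent text), the noise sampling described above can be expressed roughly as follows, assuming a 512-dimensional standard normal vector and PyTorch as the framework:

```python
import torch

def sample_random_noise(batch_size: int = 1, dim: int = 512) -> torch.Tensor:
    # Each component is drawn from a standard normal (Gaussian) distribution,
    # so every call yields a different noise vector with overwhelming probability.
    return torch.randn(batch_size, dim)

z = sample_random_noise()  # shape (1, 512)
```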
The target style type may be a specific image style type, such as a girl-next-door style. The target style type may be specified by a technician.
It should be noted that the virtual face generation model can be trained by using random noise as sample data, so that the virtual face generation model has the capability of generating a virtual face image based on the random noise. The virtual face generation model may include a feature mapper and a virtual face generation network; the feature mapper may be a network built from multiple fully-connected layers and configured to perform feature mapping on random noise to generate corresponding virtual face feature data, and the virtual face generation network may be configured to generate and output a virtual face image based on the virtual face feature data. When different random noise is input into the virtual face generation model, the virtual face images generated by the model also differ, and the differences are reflected in face attributes such as gender, pose and expression. It should be noted that the model structure of the virtual face generation model can be implemented by any model structure with virtual face generation capability, such as the model structure of the StyleGAN series of models.
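As a hedged illustration of the feature mapper described above, the following is a minimal sketch of a StyleGAN-style mapping network built from stacked fully-connected layers; the layer count, width, and activation are assumptions for illustration, not values from the patent:

```python
import torch
import torch.nn as nn

class FeatureMapper(nn.Module):
    """Maps random noise z to face feature data w via stacked fully-connected layers."""
    def __init__(self, dim: int = 512, num_layers: int = 8):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

w1 = FeatureMapper()(torch.randn(1, 512))  # virtual face feature data (untrained example)
```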
It should be noted that the virtual face generation model can be improved and a real face image of a specific style type is introduced on the basis of generating a virtual face image based on the virtual face generation model and random noise, so that the virtual face generation model can generate a virtual face image which is closer to a real face and has the specific style type.
S102, performing typed face feature mapping on random noise to obtain virtual face feature data of a target style type;
it should be noted that the present invention can train and improve the feature mapper in the virtual face generation model, so that the feature mapper can map out the virtual face feature data of the target style type.
The virtual face feature data of the target style type can be face feature data of a virtual face of the target style type.
Specifically, the invention can input random noise into a trained and improved feature mapper to obtain virtual human face feature data of a target style type output by the feature mapper through feature mapping on the random noise.
It should be noted that, the present invention may input random noise into the feature mapper by using the random noise as sample data, obtain mapping data output by the feature mapper, and update the network structure and parameters of the feature mapper based on the difference between the mapping data and the virtual face feature data of the target style until the feature mapper is trained to have a better capability of mapping the random noise into the virtual face feature data of the target style.
S103, extracting the features of the real face image to obtain real face feature data;
the real face feature data is image feature data of a real face image.
Specifically, the invention can input the real face image into the image feature extractor to obtain the real face feature data output by the image feature extractor.
S104, mixing the virtual face feature data and the real face feature data to obtain mixed data;
the mixed data is generated by mixing the virtual face feature data and the real face feature data.
Specifically, the virtual face feature data and the real face feature data can be mixed after the virtual face feature data and the real face feature data are obtained, and mixed data are obtained.
Optionally, on the basis of the network structure of the virtual face generation model, which includes the feature mapper and the virtual face generation network, an image feature extractor and a mixing processor may further be added to extract image feature data of the real face image and generate the mixed data, and the virtual face generation network may be improved and trained to obtain the trained virtual face generation network, that is, the trained virtual face generator of the present invention, so that the virtual face generator can generate a virtual face image of a specific style type based on the mixed data.
S105, inputting the mixed data into a trained virtual face generator to obtain a virtual face image of a target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
The pre-training model may be a model capable of generating a virtual face image based on random noise, such as a virtual face generation network in a virtual face generation model.
The first random noise and the real face image of the first style type may be training data for training a pre-training model.
Optionally, the invention may first train a basic model (such as the above virtual face generation network) with virtual face image generation capability to obtain a pre-training model with good model performance. Specifically, the method can obtain random noise and a real face image, determine the real face image as a positive sample, input the random noise into the basic model, obtain a virtual face image output by the basic model based on the random noise, use that virtual face image as a negative sample, and determine whether the positive sample and the negative sample differ based on a discriminator, such as a convolutional network trained online to distinguish positive samples from negative samples. It can be understood that, when the model performance of the basic model meets the training requirement, the invention can determine the basic model whose performance meets the training requirement as the pre-training model.
Specifically, after the pre-training model is obtained, the trained virtual face generator can be obtained by adding the feature mapper, the image feature extractor and the mixing processor on the basis of the network structure of the pre-training model and performing fine tuning on the pre-training model, namely performing further model training. Specifically, the method can obtain first virtual face feature data and first real face feature data corresponding to first random noise and a first style real face image respectively based on a feature mapper and an image feature extractor, input the first virtual face feature data and the first real face feature data into a mixing processor for mixing processing to generate first mixed data, input the first mixed data into a pre-training model to obtain a virtual face image output by the pre-training model, and update model parameters of the pre-training model based on a difference between the virtual face image and the first style real face image, so that the pre-training model can have model performance of generating the first style virtual face image based on the first random noise and the first style real face image.
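For illustration only, one possible form of the fine-tuning step described above is sketched below; the module names, the fixed mixing ratio, and the non-saturating adversarial loss are assumptions made for the example and are not the patent's exact recipe:

```python
import torch

def finetune_step(mapper, extractor, generator, discriminator, optimizer,
                  noise, real_image, ratio: float = 0.5):
    # Hypothetical single training step for the pre-training model.
    w1 = mapper(noise)                         # first virtual face feature data
    w2 = extractor(real_image)                 # first real face feature data
    mixed = ratio * w1 + (1.0 - ratio) * w2    # first mixed data
    fake_image = generator(mixed)              # virtual face image output by the model
    # Update the generator so its output is judged close to the styled real image.
    loss = torch.nn.functional.softplus(-discriminator(fake_image)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```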
Wherein the first random noise may include a plurality of random noises, and the first style type may include one or more image style types. Optionally, the first style type may or may not include the target style type. The real face images of the first style type may include a plurality of real face images of the respective style types. After the pre-training model is trained multiple times using the first random noise and the real face images of the first style type, the pre-training model whose model performance meets the requirements can be determined as the trained virtual face generator.
Specifically, after the mixed data is obtained, the mixed data can be input into a trained virtual face generator, and the virtual face generator generates a virtual face image of a target style type based on the mixed data. It should be noted that the mixed data simultaneously includes the virtual face feature data of the target style type and the real face feature data of the target style type, so that the virtual face image generated by the virtual face generator can be different from the real face image of the real world while approaching to the real face image, the style type of the virtual face image generated by the virtual face generator can be more controllable, the generation probability of the virtual face image of the target style type by the virtual face generator is effectively improved, the image yield of the virtual face image of the target style type is improved, and the personalized requirement of the user on the virtual face image of the target style type can be met.
The virtual face image generation method provided by the invention can obtain random noise and a real face image of a target style type; perform typed face feature mapping on the random noise to obtain virtual face feature data of the target style type; extract features of the real face image to obtain real face feature data; mix the virtual face feature data and the real face feature data to obtain mixed data; and input the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator. The virtual face generator is obtained by training a pre-training model with first random noise and real face images of a first style type. The generated mixed data simultaneously contain the virtual face feature data of the target style type and the real face feature data of the target style type, so after the mixed data are input into the virtual face generator, the generated virtual face image can be close to a real face image while differing from any real face image in the real world, the style type of the generated virtual face image becomes more controllable, the probability that the virtual face generator produces a virtual face image of the target style type is effectively increased, the image yield of virtual face images of the target style type is effectively improved, and the personalized requirement of a user for virtual face images of the target style type can be met.
Based on fig. 1, the present invention provides a second virtual face image generation method, which may include the following steps:
s201, obtaining a plurality of real face images;
wherein, the plurality of real face images may all be real face images of a target style type.
Optionally, the plurality of real face images may be different face images in the target style type.
Optionally, the plurality of real face images may include face images of the same person in the target style type, or face images of different persons in the target style type, for example, a plurality of face images of Zhang San and a plurality of face images of Li Si;
it should be noted that the plurality of real face images may be obtained by image screening performed manually by a technician.
S202, obtaining random noise through random sampling;
specifically, the random noise may be obtained by random sampling. It will be appreciated that the random noise obtained for each random sampling may not be the same.
Specifically, steps S201 and S202 may be a specific implementation of step S101.
It should be noted that the present invention can generate a plurality of virtual face images of the target style type in batch based on a plurality of random noises and a plurality of real face images. In this way, the invention can establish a production line for generating virtual face images of a specific style type in batches and generate, in a short time, a large number of virtual face images of the specific style type that do not exist in the real world, thereby effectively improving the generation efficiency of virtual face images.
S203, performing typed face feature mapping on random noise obtained by random sampling to obtain virtual face feature data;
specifically, after the random noise is obtained, the random noise is input into the improved feature mapper, and the virtual human face feature data of the target style type output by the feature mapper is obtained.
It should be noted that step S203 may be a specific implementation of step S102.
S204, selecting a real face image from a plurality of real face images to determine the real face image to be processed;
specifically, the method can randomly select one real face image from a plurality of real face images with target style types to determine the real face image to be processed.
S205, extracting the features of the real face image to be processed to obtain real face feature data;
it should be noted that steps S204 and S205 may be a specific implementation of step S103.
S206, mixing the virtual face feature data and the real face feature data to obtain mixed data;
step S206 corresponds to the content of step S104 described above.
S207, inputting the mixed data into a trained virtual face generator to obtain a virtual face image of a target style type generated by the virtual face generator;
step S207 corresponds to the content of step S105 described above.
S208, returning to step S202 until a first number of virtual face images of the target style type generated by the virtual face generator are obtained.
Specifically, after obtaining a virtual face image of the target style type output by the virtual face generator, the present invention returns to step S202, obtains new random noise again through random sampling, obtains a real face image of the target style type again, and generates another virtual face image of the target style type based on the random noise and the real face image again.
It can be understood that the real face image of the target style type obtained by the present invention may be the same as or different from the previously obtained real face image.
It should be noted that the random noise obtained at each random sampling is, with high probability, different from previously sampled noise. Therefore, regardless of whether the real face image of the target style type obtained by the present invention is a repeated image, it can be effectively ensured that the input data provided to the virtual face generator are never completely identical across iterations, so that the virtual face images generated by the virtual face generator can differ from one another, thereby effectively improving the diversity of the virtual face images.
The first number may be a value specified by a technician, and it may be set smaller or larger as needed to meet the production requirement of generating virtual face images in bulk within a short time.
Optionally, the method of the invention may also be configured to extract image feature data of each real face image after obtaining the plurality of real face images of the target style type. Then, each time the virtual face generator is used to generate a virtual face image, image feature data of one real face image is randomly selected from the image feature data of the real face images, one random noise is obtained through random sampling, virtual face feature data of the target style type are obtained through feature mapping of the random noise, the randomly selected image feature data and the virtual face feature data are mixed to obtain mixed data, the mixed data are input to the virtual face generator, and the virtual face image generated and output by the virtual face generator is obtained.
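For illustration only, a minimal sketch of the batch-generation loop described in this embodiment is shown below; the function and parameter names are hypothetical, and the fixed mixing ratio is an assumption:

```python
import random
import torch

def generate_batch(mapper, extractor, generator, real_images, first_number: int,
                   ratio: float = 0.5):
    # Pre-extract features of every styled real face image, as suggested above.
    real_features = [extractor(img) for img in real_images]
    outputs = []
    while len(outputs) < first_number:
        z = torch.randn(1, 512)              # new random noise each iteration
        w1 = mapper(z)                        # virtual face feature data
        w2 = random.choice(real_features)     # randomly selected real face feature data
        mixed = ratio * w1 + (1.0 - ratio) * w2
        outputs.append(generator(mixed))      # one virtual face image of the target style type
    return outputs
```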
The virtual face image generation method provided by the invention can realize a production line for generating virtual face images of specific style types in batches, can generate a large amount of virtual face images of specific style types which do not exist in the real world in a short time, and effectively improves the generation efficiency of the virtual face images.
Based on fig. 1, the invention provides a third virtual face image generation method, and a real face image of a first style type can be obtained by the following method:
obtaining a first description text and a first real face image; the first description text is an image style description text of a first style type;
inputting the first description text and the first real face image into a trained image style classifier for image style classification, and obtaining an image classification result output by the image style classifier; the image classification result is the classification result of whether the first real face image is the real face image of the first style type;
when the image classification result is yes, determining the first real face image as a real face image of a first style type;
and the image style classifier is obtained by training based on the second description text and the real face image of the second style type.
The image style description text is text related to an image style; for example, it may include the name of the image style.
In particular, the first style type may include a specific style type, such as the target style type. It can be understood that, when the first style type includes the specific style type, the invention can effectively enhance the model performance of the virtual face generator in generating virtual face images of the specific style type based on random noise and real face images of the specific style type, and improve the image yield of virtual face images of the specific style type.
Wherein the second description text may include image style description texts of certain image style types, and the second style type may include certain image style types.
It can be understood that, for an image style description text and a real face image, if the image style type described by the image style description text is identical to the image style type to which the real face image belongs, the image style description text and the real face image are corresponding; otherwise, the image style description text and the real face image are not corresponding.
Specifically, in the second description text and the real face image of the second style type, the corresponding image style description text and the real face image are used as positive samples, the image style description text and the real face image which do not correspond are used as negative samples, and the positive samples and the negative samples are used for training the image style classifier to be trained, so that the image style classifier to be trained has the model performance of determining whether the real face image meets the style requirements based on the image style description text. When the performance of the model meets the performance requirement, the image style classifier to be trained can be determined as the trained image style classifier. The image style classifier to be trained can be a basic model with image style classification capability.
It can be understood that, in the case that the image classification result output by the image style classifier is no, that is, in the case that the first real face image is not the real face image of the first style type, the present invention may prohibit the first real face image from being determined as the real face image of the first style type.
It should be noted that, the image style filtering method and device can perform image style filtering on the first real face image through the setting of the image style classifier, determine the first real face image meeting the image style requirement as the real face image of the first style type, and effectively ensure the acquisition of the real face image of the first style type, thereby effectively improving the subsequent training efficiency of the virtual face generator and improving the image yield of the virtual face generator when the virtual face generator is used for generating the virtual face image of the specific style type.
Optionally, the image style classifier includes an image feature extraction layer, a graph-text association feature extraction layer, and a similarity processing layer, and the image classification result may be obtained as follows:
inputting the first description text into a picture-text associated feature extraction layer for text feature extraction, and obtaining a text feature vector output by the picture-text associated feature extraction layer;
inputting the first real face image into an image feature extraction layer for image feature extraction, and obtaining an image feature vector output by the image feature extraction layer;
inputting the text feature vector and the image feature vector into a similarity processing layer to obtain an image classification result output by the similarity processing layer; the similarity processing layer is used for determining cosine similarity of the text feature vector and the image feature vector, and generating and outputting an image classification result based on a comparison result of the cosine similarity and a preset similarity threshold.
The image-text associated feature extraction layer can be an existing model that associates images and texts, such as a CLIP model.
The preset similarity threshold may be determined by a technician according to actual needs, which is not limited in the present invention.
Specifically, when the cosine similarity between the text feature vector and the image feature vector is smaller than a preset similarity threshold, the similarity processing layer can determine that the image style type of the first real face image does not meet the style requirement, namely the image style of the first real face image is not the first style type, can generate and output a corresponding image classification result, namely the first real face image is not the face image of the first style type, and at the moment, the invention can prohibit the first real face image from being determined as the face image of the first style type;
specifically, when the cosine similarity between the text feature vector and the image feature vector is not less than the preset similarity threshold, the similarity processing layer may determine that the image style type of the first real face image meets the style requirement, that is, the image style of the first real face image is the first style type, and may generate and output a corresponding image classification result, that is, the first real face image is a face image of the first style type.
As shown in the schematic diagram of the image style classification processing flow shown in fig. 2, the present invention may input a first real face image to an image feature extraction layer for image feature extraction, to obtain an image feature vector output by the image feature extraction layer, input a first description text to an image-text associated feature extraction layer for text feature extraction, to obtain a text feature vector output by the image-text associated feature extraction layer, and input the image feature vector and the text feature vector to a similarity processing layer; then, the similarity processing layer can calculate cosine similarity of the image feature vector and the text feature vector, compare the calculated cosine similarity with a preset similarity threshold, and determine whether the first real face image meets the image style requirement based on the comparison result.
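For illustration only, the classification flow of fig. 2 can be sketched as follows; the encoder objects stand in for the image feature extraction layer and the image-text associated feature extraction layer (e.g. a CLIP-style text tower), and the threshold value is a placeholder, not a value from the patent:

```python
import torch
import torch.nn.functional as F

def classify_style(image_encoder, text_encoder, face_image, style_text,
                   threshold: float = 0.25) -> bool:
    # Embed the image and the style description text, then compare by cosine similarity.
    img_vec = F.normalize(image_encoder(face_image), dim=-1)   # image feature vector
    txt_vec = F.normalize(text_encoder(style_text), dim=-1)    # text feature vector
    cosine_sim = (img_vec * txt_vec).sum(dim=-1)               # cosine similarity (batch of 1)
    # True: the image is classified as the described style type; False otherwise.
    return bool(cosine_sim >= threshold)
```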
It should be noted that, the invention effectively guarantees the function realization of the image style classifier through the arrangement of the image feature extraction layer, the image-text associated feature extraction layer and the similarity processing layer, thereby effectively guaranteeing the acquisition of the real face image of the first style type, effectively improving the subsequent training efficiency of the virtual face generator, and improving the image yield of the virtual face generator when the virtual face generator is used for generating the virtual face image of the specific style type.
The virtual face image generation method provided by the invention can be used for filtering the image style of the first real face image through the setting of the image style classifier, determining the first real face image meeting the image style requirement as the real face image of the first style type, and effectively ensuring the acquisition of the real face image of the first style type, thereby effectively improving the subsequent training efficiency of the virtual face generator and improving the image yield of the virtual face generator when the virtual face generator is used for generating the virtual face image of the specific style type.
Based on fig. 1, the present invention provides a fourth virtual face image generation method, in which the step S104 may include the following steps:
s1041, determining a feature mixing weight to be used for mixing;
the feature mixing weight may be a weight that is set by a skilled person according to actual conditions, and the present invention is not limited thereto.
Specifically, the feature mixing weight may include a feature mixing weight of the virtual face feature data and a feature mixing weight of the real face feature data. The sum of the feature mixing weight of the virtual face feature data and the feature mixing weight of the real face feature data may be 1.
S1042, based on the feature mixed weight, carrying out mixed processing on the virtual face feature data and the real face feature data to obtain mixed data.
Specifically, the invention can perform weighting mixing processing on the virtual face feature data and the real face feature data based on the virtual face feature data, the real face feature data and the feature mixing weight to obtain corresponding mixed data.
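Expressed as a short sketch (illustrative only), the weighted mixing of steps S1041 and S1042, with the two weights summing to 1 and Ratio denoting the weight of the virtual face feature data as in fig. 3:

```python
def mix_features(w1, w2, ratio: float):
    # w1: virtual face feature data; w2: real face feature data.
    # The weight of w2 is 1 - ratio, so the two feature mixing weights sum to 1.
    return ratio * w1 + (1.0 - ratio) * w2
```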
In order to better explain the processing procedure of the fourth virtual face image generation method, the present invention is proposed and described with reference to fig. 3.
As shown in fig. 3, the present invention may first obtain a plurality of real face images of a target style type, respectively input the plurality of real face images to the image feature extractor, obtain each real face feature data output by the image feature extractor, and randomly select one real face feature data w2 from each real face feature data; obtaining random noise through random sampling, inputting the random noise into a feature mapper, and obtaining virtual human face feature data w1 output by the feature mapper; determining the feature mixing weight Ratio of the virtual face feature data w1, and determining the feature mixing weight of the real face feature data w2 as 1-Ratio; inputting the virtual face feature data w1 and the real face feature data w2 into a mixing processor for weighting and mixing processing to obtain corresponding mixed data, inputting the mixed data into a trained virtual face generator, and obtaining a virtual face image of a target style type generated and output by the virtual face generator.
The virtual face image generation method provided by the invention can effectively obtain the virtual face feature data and the real face feature data, and can effectively generate the mixed data by performing weighted mixing processing on the virtual face feature data and the real face feature data, thereby effectively guaranteeing the image yield of virtual face images of the target style type.
Based on fig. 1, the present invention provides a fifth virtual face image generation method, in which the virtual face generator includes a self-attention module, and the self-attention module is configured to optimize face detail texture in a virtual face image to be generated; at this time, step S105 may include:
and inputting the mixed data into a trained virtual face generator to obtain a virtual face image generated by the virtual face generator based on face detail texture optimization processing of the self-attention module.
It should be noted that, on the basis of the network structure of the virtual face generation network, the invention can add a self-attention module in the middle layer to obtain the virtual face generator, so that the middle layer in the virtual face generator can learn the detail texture of the face more adaptively according to the output information of the previous layer, and the virtual face generator can generate a virtual face image which optimizes the expression of the face detail texture based on the mixed data.
As shown in the network structure of the self-attention module in fig. 4, the network structure of the self-attention module may be the same as in the prior art. In FIG. 4, non-local denotes an upsampling operation when the block needs to perform image upsampling, and no operation otherwise; Conv2d represents a 2-dimensional convolution; LeakyReLU is a common activation function.
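As a hedged illustration, the following is one common form of self-attention over a convolutional feature map (a SAGAN-style block); the exact structure in fig. 4 may differ, and this sketch is not taken from the patent:

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Self-attention over a feature map; assumes the channel count is at least 8."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)     # (b, hw, c//8)
        k = self.key(x).flatten(2)                       # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)              # (b, hw, hw) attention weights
        v = self.value(x).flatten(2)                     # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                      # residual connection
```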
Optionally, the virtual face generator is obtained by adding a self-attention module to a virtual face generation network of the virtual face generation model; and in the virtual face generator, starting from the fifth complex convolution stack layer, each complex convolution stack layer comprises a self-attention module.
As shown in fig. 5, the network structure of the virtual face generator may be formed by adding a Self-attention module to the virtual face generation network in the virtual face generation model. It should be noted that, in fig. 5, except for the Self-attention modules, the other network structures may be the network structures of the virtual face generation model.
In fig. 5, Mapping may be the feature mapper used for performing feature mapping on random noise in the virtual face generation model; A may be a Fourier transformer, which may be used to transform input data into corresponding Fourier features; Conv1x1 represents a convolution with a kernel size of 1; the L0-L13 structures are basically identical and are all formed by stacking complex convolutions, i.e., complex convolution stack layers, which upsample features layer by layer and generate the virtual face image; toRGB is a simple convolutional layer whose main purpose is to convert the input data into the final RGB color image.
It should be noted that, through experiments performed by the inventors, it was found that the information output by layers L0 to L4 of the virtual face generation model (as shown in fig. 5) focuses more on the overall information of the face, including the face shape, the positions of the facial features, and the like, while layers L5 to L13 focus more on face detail textures, such as hairlines and skin texture. In order to optimize the face detail texture of the generated image, the invention can, starting from the fifth complex convolution stack layer, namely the L4 layer, arrange a Self-attention module after each complex convolution stack layer, so that the Self-attention modules in the virtual face generator can more adaptively learn the detail texture information of the face according to the output information of the preceding layer, enhance the face detail texture expression of the generated image, reduce artifacts, and improve the generation quality of the virtual face image.
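For illustration only, the placement described above can be sketched as a small helper, assuming the generator exposes its complex convolution stack layers L0-L13 as a list; attention_cls could be, for example, the SelfAttention2d sketch shown after fig. 4:

```python
import torch.nn as nn

def attach_self_attention(conv_stack_layers, channels_per_layer, attention_cls,
                          start_index: int = 4):
    # Hypothetical helper: follow every complex convolution stack layer from the
    # fifth one (index 4, i.e. the L4 layer) onward with a self-attention module.
    wrapped = []
    for i, (layer, ch) in enumerate(zip(conv_stack_layers, channels_per_layer)):
        block = nn.Sequential(layer, attention_cls(ch)) if i >= start_index else layer
        wrapped.append(block)
    return nn.ModuleList(wrapped)
```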
Alternatively, the virtual face generation model may be implemented by StyleGAN. It is understood that, in addition to the StyleGAN, the virtual face generation model described above may also be implemented by other models having similar or same network structures (such as network structures including Mapping and virtual face generation networks) as the StyleGAN.
The virtual face image generation method provided by the invention can effectively enhance the image generation quality of the virtual face generator through the setting of the self-attention module.
As shown in the data cleaning process in fig. 6, the present invention may collect potential image materials by means of web crawlers, and then perform data cleaning on the collected images to obtain the real face images of the first style type. As shown in fig. 6, the data cleaning process may include face detection, image quality filtering, facial attractiveness filtering, style filtering, gender and age filtering, and face alignment.
Specifically, after a plurality of images are collected, a face detection algorithm is used to perform face detection on each image, and images that do not contain a face are filtered out. Meanwhile, images of low face quality, such as blurry or cartoon images, can be filtered out by controlling the confidence threshold in face detection, and images with incomplete faces can be filtered out by controlling an IoU threshold, so that face images containing a complete face are obtained.
Specifically, after face detection, image quality filtering can be performed on the remaining face images. The image quality of each face image can be scored by an image quality filtering algorithm, a scoring threshold is set empirically, and face images scoring below the threshold are filtered out;
Specifically, after image quality filtering, facial attractiveness filtering can be performed on the remaining face images. Face images with low attractiveness scores can be filtered out using an existing facial attractiveness evaluation algorithm;
specifically, the style of the remaining face image can be filtered after the color value is filtered. The image style classifier can be used for carrying out style filtering on the rest face images. The invention can input the face image and the image style description text of the appointed style type into the image style classifier, determine whether the input face image is the face image of the appointed style type, and filter out the face image which is not the face image of the appointed style type;
specifically, the gender and age filtering can be performed on the remaining face images after the style filtering is performed. The invention can adopt the existing gender and age classification model to classify the remaining face images and screen out the face images with both gender and age meeting the requirements;
specifically, the invention can align the faces of the remaining face images after gender and age filtering. The method can predict five key points in the face image by using the face key point model in advance, then calculate a transformation matrix based on the set template face key points, and obtain the face image after face alignment through radiation transformation.
Specifically, after face alignment, face images of the target style type can be manually screened from the aligned face images and labeled with their style type, such as the girl-next-door style. The present invention may then determine the face images labeled with a style type as the real face images of the first style type.
The virtual face image generation method provided by the invention can obtain the real face image of the first style type meeting the relevant requirements through image acquisition and data cleaning, and effectively ensure the training effect of using the real face image of the first style type for training the virtual face generator, thereby effectively ensuring the training efficiency of the virtual face generator and improving the image yield of the virtual face generator when the virtual face generator is used for generating the virtual face image of the specific style type.
As shown in fig. 7, the sixth virtual face image generation method provided by the present invention may include the steps of image collection, data cleaning, model training, virtual face image generation, and the like. The method can collect images and perform data cleaning on them to obtain training data for training the virtual face generator; the invention can then use the training data to train the virtual face generator and obtain the trained virtual face generator; the trained virtual face generator can then be used to generate virtual face images of a specific style type, such as the target style type.
The virtual face image generation method provided by the invention can obtain, through image collection and data cleaning, real face images of the first style type that meet the relevant requirements, and can effectively guarantee the training effect when these real face images are used to train the virtual face generator, thereby effectively guaranteeing the image yield of virtual face images of the target style type and improving the image yield of the virtual face generator when it is used to generate virtual face images of a specific style type.
The following describes the virtual face image generation apparatus provided by the present invention, and the virtual face image generation apparatus described below and the virtual face image generation method described above may be referred to in correspondence with each other.
As shown in fig. 8, the present invention provides a virtual face image generating device, including: a first obtaining unit 801, a second obtaining unit 802, a third obtaining unit 803, a fourth obtaining unit 804, and a fifth obtaining unit 805; wherein:
a first obtaining unit 801, configured to obtain random noise and a real face image of a target style type;
a second obtaining unit 802, configured to perform typed face feature mapping on the random noise to obtain virtual face feature data of a target style type;
a third obtaining unit 803, configured to perform feature extraction on the real face image to obtain real face feature data;
a fourth obtaining unit 804, configured to perform mixing processing on the virtual face feature data and the real face feature data to obtain mixed data;
a fifth obtaining unit 805, configured to input the mixed data into a trained virtual face generator, and obtain a virtual face image of a target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
It should be noted that, the processing procedures of the first obtaining unit 801, the second obtaining unit 802, the third obtaining unit 803, the fourth obtaining unit 804 and the fifth obtaining unit 805 and the technical effects brought by the processing procedures may refer to the related descriptions of steps S101 to S105 in fig. 1, respectively, and are not repeated.
Optionally, the fourth obtaining unit 804 includes: a first determination unit, a mixing processing unit and a sixth obtaining unit;
a first determination unit configured to determine a feature mixing weight to be used for mixing processing;
the mixed processing unit is used for mixing the virtual human face feature data and the real human face feature data based on the feature mixed weight;
a sixth obtaining unit configured to obtain the mixed data.
Optionally, the virtual face generator includes a self-attention module, where the self-attention module is used to optimize face detail texture in the virtual face image to be generated; a fifth obtaining unit 805, configured to input the mixed data to a virtual face generator, and obtain a virtual face image generated by the virtual face generator based on face detail texture optimization processing performed by the self-attention module.
Optionally, the virtual face generator is obtained by adding a self-attention module to a virtual face generation network of the virtual face generation model; in the virtual face generator, starting from the fifth complex convolution stack layer, a self-attention module is included after each complex convolution stack layer.
Optionally, the real face image of the first style type is obtained by:
obtaining a first description text and a first real face image; the first description text is an image style description text of a first style type;
inputting the first description text and the first real face image into a trained image style classifier for image style classification, and obtaining an image classification result output by the image style classifier; the image classification result is the classification result of whether the first real face image is the real face image of the first style type;
when the image classification result is yes, determining the first real face image as a real face image of a first style type;
and the image style classifier is obtained by training based on the second description text and the real face image of the second style type.
Optionally, the image classification result is obtained as follows:
inputting the first description text into a picture-text associated feature extraction layer for text feature extraction, and obtaining a text feature vector output by the picture-text associated feature extraction layer;
inputting the first real face image into an image feature extraction layer for image feature extraction, and obtaining an image feature vector output by the image feature extraction layer;
inputting the text feature vector and the image feature vector into a similarity processing layer to obtain an image classification result output by the similarity processing layer; the similarity processing layer is used for determining cosine similarity between the text feature vector and the image feature vector, and generating and outputting an image classification result based on a comparison result of the cosine similarity and a preset similarity threshold.
Optionally, the first obtaining unit 801 includes: a seventh obtaining unit and an eighth obtaining unit;
a seventh obtaining unit configured to obtain a plurality of real face images;
an eighth obtaining unit, configured to obtain a random noise through random sampling;
a second obtaining unit 802, configured to perform typed face feature mapping on random noise obtained by random sampling to obtain virtual face feature data;
a third obtaining unit 803, which includes a random selecting unit and a ninth obtaining unit;
the random selection unit is used for randomly selecting one real face image from the plurality of real face images and determining it as the real face image to be processed;
and the ninth obtaining unit is used for extracting the characteristics of the real face image to be processed to obtain the real face characteristic data.
Optionally, the ninth obtaining unit is further configured to trigger the eighth obtaining unit after performing feature extraction on the real face image to be processed to obtain the real face feature data, until a first number of virtual face images of the target style type are obtained.
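Putting the units above together, the sampling-and-generation loop they describe might look like the following sketch. The mapping_net, feature_extractor and generator objects, their interfaces, and the default parameters are hypothetical stand-ins, since the embodiment does not disclose its implementation.

```python
import random
import torch

def generate_styled_faces(mapping_net, feature_extractor, generator,
                          real_face_images, target_count: int,
                          noise_dim: int = 512, mix_weight: float = 0.5):
    """Repeat: sample noise, map it to typed virtual face features, pick a
    real face image of the target style, extract its features, mix the two,
    and generate — until the desired number of target-style virtual faces
    has been produced."""
    results = []
    while len(results) < target_count:
        noise = torch.randn(1, noise_dim)                 # random sampling
        virtual_feat = mapping_net(noise)                 # typed face feature mapping
        real_image = random.choice(real_face_images)      # image to be processed
        real_feat = feature_extractor(real_image)         # real face feature data
        mixed = mix_weight * virtual_feat + (1 - mix_weight) * real_feat
        results.append(generator(mixed))                  # target-style virtual face
    return results
```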
The virtual face image generation device provided by the invention can obtain random noise and a real face image of a target style type; perform typed face feature mapping on the random noise to obtain virtual face feature data of the target style type; perform feature extraction on the real face image to obtain real face feature data; mix the virtual face feature data and the real face feature data to obtain mixed data; and input the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator, the virtual face generator being obtained by training a pre-training model with first random noise and a real face image of a first style type. Because the mixed data contains both the virtual face feature data of the target style type and the real face feature data of the target style type, the virtual face image generated from the mixed data can be close to a real face image while still differing from any real face in the real world. The style type of the generated virtual face image therefore becomes more controllable, the probability that the virtual face generator produces a virtual face image of the target style type is effectively improved, the image yield of virtual face images of the target style type is effectively improved, and the individual requirements of users for virtual face images of the target style type can be met.
Fig. 9 illustrates a physical structure diagram of an electronic device, and as shown in fig. 9, the electronic device may include: a processor (processor) 910, a communication Interface (Communications Interface) 920, a memory (memory) 930, and a communication bus 940, wherein the processor 910, the communication Interface 920, and the memory 930 are coupled for communication via the communication bus 940. Processor 910 may invoke logic instructions in memory 930 to perform a virtual face image generation method comprising:
acquiring random noise and a real face image of a target style type;
carrying out typed face feature mapping on random noise to obtain virtual face feature data of a target style type;
extracting the features of the real face image to obtain real face feature data;
mixing the virtual face feature data and the real face feature data to obtain mixed data;
inputting the mixed data into a trained virtual face generator to obtain a virtual face image of a target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
Furthermore, the logic instructions in the memory 930 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the virtual face image generation method provided by the above methods, the method including:
acquiring random noise and a real face image of a target style type;
performing typed face feature mapping on random noise to obtain virtual face feature data of a target style type;
extracting the features of the real face image to obtain real face feature data;
mixing the virtual face feature data and the real face feature data to obtain mixed data;
inputting the mixed data into a trained virtual face generator to obtain a virtual face image of a target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features thereof may be equivalently replaced, and such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention.

Claims (10)

1. A virtual face image generation method, characterized by comprising the following steps:
acquiring random noise and a real face image of a target style type;
performing typed face feature mapping on the random noise to obtain virtual face feature data of the target style type;
extracting the features of the real face image to obtain real face feature data;
mixing the virtual face feature data and the real face feature data to obtain mixed data;
inputting the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
2. The virtual face image generation method according to claim 1, wherein the mixing the virtual face feature data and the real face feature data to obtain mixed data includes:
determining a feature mixing weight to be used for mixing processing;
and mixing the virtual face feature data and the real face feature data based on the feature mixing weight to obtain mixed data.
3. The virtual face image generation method according to claim 1, wherein the virtual face generator comprises a self-attention module, and the self-attention module is used for optimizing the face detail texture in the virtual face image to be generated; the inputting the mixed data into a trained virtual face generator to obtain a virtual face image of the target style type generated by the virtual face generator includes:
and inputting the mixed data into the virtual face generator to obtain the virtual face image generated by the virtual face generator based on the self-attention module to perform face detail texture optimization processing.
4. The virtual face image generation method according to claim 3, wherein the virtual face generator is obtained by adding the self-attention module to a virtual face generation network of a virtual face generation model; in the virtual face generator, starting from the fifth complex convolution stack layer, each complex convolution stack layer includes the self-attention module.
5. The virtual face image generation method according to claim 1, wherein the real face image of the first style type is obtained by:
obtaining a first description text and a first real face image; the first description text is an image style description text of the first style type;
inputting the first description text and the first real face image into a trained image style classifier for image style classification, and obtaining an image classification result output by the image style classifier; the image classification result is the classification result of whether the first real face image is the real face image of the first style type;
when the image classification result is yes, determining the first real face image as the real face image of the first style type;
and the image style classifier is obtained by training based on the second description text and the real face image of the second style type.
6. The virtual face image generation method according to claim 5, wherein the image style classifier comprises an image feature extraction layer, an image-text associated feature extraction layer and a similarity processing layer, and the image classification result is obtained by:
inputting the first description text into the image-text associated feature extraction layer for text feature extraction, and obtaining a text feature vector output by the image-text associated feature extraction layer;
inputting the first real face image into the image feature extraction layer for image feature extraction, and obtaining an image feature vector output by the image feature extraction layer;
inputting the text feature vector and the image feature vector into the similarity processing layer to obtain the image classification result output by the similarity processing layer; the similarity processing layer is used for determining cosine similarity of the text feature vector and the image feature vector, and generating and outputting the image classification result based on a comparison result of the cosine similarity and a preset similarity threshold.
7. The virtual face image generation method according to claim 1, wherein the acquiring of the random noise and the real face image of the target style type comprises:
obtaining a plurality of said real face images;
obtaining one random noise through random sampling;
the performing typed face feature mapping on the random noise to obtain virtual face feature data of the target style type includes:
performing typed face feature mapping on the random noise obtained by random sampling to obtain the virtual face feature data;
the extracting the features of the real face image to obtain the real face feature data comprises:
randomly selecting one real face image from the plurality of real face images and determining it as the real face image to be processed;
and extracting the characteristics of the real face image to be processed to obtain the real face characteristic data.
8. The virtual face image generation method according to claim 7, wherein after said obtaining the virtual face image of the target style type generated by the virtual face generator, the virtual face image generation method further comprises:
and returning to the step of obtaining one random noise through random sampling until the first number of virtual face images of the target style type generated by the virtual face generator are obtained.
9. A virtual face image generation apparatus, comprising: a first obtaining unit, a second obtaining unit, a third obtaining unit, a fourth obtaining unit and a fifth obtaining unit; wherein:
the first obtaining unit is used for obtaining random noise and a real face image of a target style type;
the second obtaining unit is configured to perform typed face feature mapping on the random noise to obtain virtual face feature data of the target style type;
the third obtaining unit is used for extracting the characteristics of the real face image to obtain real face characteristic data;
the fourth obtaining unit is configured to perform mixing processing on the virtual face feature data and the real face feature data to obtain mixed data;
the fifth obtaining unit is configured to input the mixed data to a trained virtual face generator, and obtain a virtual face image of the target style type generated by the virtual face generator;
the virtual face generator is obtained by training a pre-training model by using first random noise and a real face image of a first style type.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the virtual face image generation method according to any one of claims 1 to 8 when executing the program.
CN202310174544.9A 2023-02-28 2023-02-28 Virtual face image generation method and device and electronic equipment Active CN115953821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310174544.9A CN115953821B (en) 2023-02-28 2023-02-28 Virtual face image generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310174544.9A CN115953821B (en) 2023-02-28 2023-02-28 Virtual face image generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN115953821A (en) 2023-04-11
CN115953821B CN115953821B (en) 2023-06-30

Family

ID=85891134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310174544.9A Active CN115953821B (en) 2023-02-28 2023-02-28 Virtual face image generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115953821B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111001A1 (en) * 2020-11-25 2022-06-02 上海商汤智能科技有限公司 Face image processing method and apparatus, and electronic device and storage medium
CN112418139A (en) * 2020-12-04 2021-02-26 上海依图网络科技有限公司 Method and device for generating portrait image, readable medium and electronic equipment
CN112613445A (en) * 2020-12-29 2021-04-06 深圳威富优房客科技有限公司 Face image generation method and device, computer equipment and storage medium
CN115187450A (en) * 2022-06-27 2022-10-14 北京奇艺世纪科技有限公司 Image generation method, image generation device and related equipment
CN115018996A (en) * 2022-08-08 2022-09-06 海马云(天津)信息技术有限公司 Method and device for generating 3D face model according to real person face photo

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977794A (en) * 2023-08-28 2023-10-31 杭州一知智能科技有限公司 Digital human video identification model training method and system based on reinforcement learning
CN116977794B (en) * 2023-08-28 2024-04-02 杭州一知智能科技有限公司 Digital human video identification model training method and system based on reinforcement learning

Also Published As

Publication number Publication date
CN115953821B (en) 2023-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant