WO2021169556A1 - Method and apparatus for compositing face image - Google Patents


Info

Publication number
WO2021169556A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
information
neural network
attribute
vector
Application number
PCT/CN2020/140440
Other languages
French (fr)
Chinese (zh)
Inventor
马骁勇 (Ma Xiaoyong)
申皓全 (Shen Haoquan)
王铭学 (Wang Mingxue)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021169556A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T 2200/00: Indexing scheme for image data processing or generation, in general
    • G06T 2200/24: Indexing scheme for image data processing or generation, in general, involving graphical user interfaces [GUIs]

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a method and device for synthesizing a face image.
  • Face image synthesis technology is widely used in fields such as photo entertainment and medical cosmetic surgery.
  • In photo entertainment, users can modify several attributes of a photo, such as adding double eyelids, enlarging the eyes, or slimming the face.
  • In medical cosmetic surgery, doctors can modify the current user's photo based on the user's description to generate an image previewing the postoperative effect.
  • At present, face image synthesis methods mainly include the following two.
  • The first is the traditional image processing method.
  • In this method, a template library containing a variety of partial images of facial features is usually built.
  • A sketch artist selects facial features from the template library according to the eyewitness's description, splices them together, and finally smooths the edges of the spliced image to generate a face image.
  • However, it is difficult to guarantee the realism of a face image synthesized by simply splicing partial images of facial features.
  • In addition, because both the artist and the eyewitness introduce subjective bias, there may be a gap between the synthesized face image and the actual face image required.
  • The second is the deep learning method, which uses massive face image data to train a deep neural network by means of adversarial generation; the trained neural network is then used to generate face images. However, such a trained neural network cannot synthesize face images containing user-specified attribute information.
  • The face image synthesis method provided in the present application can perform face image synthesis based on real face images, obtain more realistic face images that meet the requirements, and achieve high synthesis efficiency.
  • In a first aspect, the present application provides a face image synthesis method, which may include: acquiring first attribute information, where the first attribute information is the attribute information contained in the face image to be synthesized.
  • A first face image is searched for in a real face image database according to the first attribute information, where the first face image contains second attribute information, and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement.
  • Attribute difference information is obtained according to the first attribute information and the second attribute information, where the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized.
  • Facial feature extraction is performed on the first face image to obtain first facial feature information, and a second face image is synthesized according to the first facial feature information and the attribute difference information.
  • In practice, the first attribute information may be collected from the user and input into the face image synthesis device, and the face image synthesis device synthesizes the face image based on this attribute information.
  • The process of collecting the first attribute information from the user may include: the face image synthesis device creates a face image attribute information questionnaire covering all the attribute information required to synthesize the face image and sends it to the terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information.
  • In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized across multiple dimensions, so that the final synthesized face image is closer to the desired face image.
  • Since the attribute difference information indicates the difference between the attribute information contained in the found first face image and the first attribute information, the attribute information that needs to be corrected in the first face image can be determined, and the first face image can then be corrected according to the attribute difference information.
  • the face image synthesis method provided by the embodiment of the present application may not require the participation of professionals, has high efficiency, and is convenient for promotion. And based on real face images for face image synthesis, a more realistic face image can be obtained.
  • obtaining the attribute difference information according to the first attribute information and the second attribute information includes: obtaining the first attribute vector according to the first attribute information.
  • The second attribute vector is obtained according to the second attribute information; the first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector; the first vector is used to represent attribute difference information.
  • If the values at the corresponding positions of the first attribute vector and the second attribute vector differ, the value at that position in the output first vector is set to the value at that position in the first attribute vector. If the values at the corresponding position are the same, the attribute information corresponding to that position does not need to be modified, and the value at that position in the output first vector can be set to a meaningless symbol. In this way, after the first vector representing the attribute difference information is obtained, the attribute information of the first face image can subsequently be corrected according to the values at the meaningful positions in the first vector.
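  • As an illustration of this bitwise comparison, the following Python sketch assumes attribute vectors are equal-length integer lists and uses 'X' as the meaningless symbol (the function name and encoding are illustrative, not from the patent):

```python
# Sketch of the bitwise attribute comparison described above. Each position
# of an attribute vector encodes one type of attribute information; 'X'
# marks positions where no modification is needed.

def attribute_difference(first_attr_vec, second_attr_vec, no_change='X'):
    assert len(first_attr_vec) == len(second_attr_vec)
    # Where the vectors differ, carry the target value from the first
    # attribute vector; where they agree, emit the meaningless symbol.
    return [a if a != b else no_change
            for a, b in zip(first_attr_vec, second_attr_vec)]

# Example: position 1 matches, positions 2 and 3 differ.
print(attribute_difference([1, 2, 3], [1, 5, 7]))  # ['X', 2, 3]
```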
  • Performing facial feature extraction on the first face image to obtain first facial feature information of the first face image includes: inputting the first face image into the first neural network to obtain the second vector, where the first neural network is used to extract the facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
  • the face image may include multiple facial feature information that cannot be exhaustively listed.
  • the facial feature information may be used to represent the corresponding face image.
  • The attribute information contained in the first face image may be part of the facial feature information of the first face image.
  • Each person's facial feature information is different, so facial feature information can be used to distinguish different people. For example, Zhang San's facial feature information is different from Li Si's facial feature information, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Further, facial feature extraction is performed to obtain facial feature information that can represent the face image.
  • the first neural network is used to perform facial feature extraction on the first face image to obtain first facial feature information representing the first face image.
  • synthesizing the second face image according to the first facial feature information and the attribute difference information includes: obtaining a first vector and a second vector; wherein the first vector is used to represent the attribute difference information ; The second vector is used to represent the first facial feature information of the first face image.
  • the first vector and the second vector are spliced and input into the second neural network to obtain the second face image; wherein, the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  • the splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the part representing the attribute information in the second vector according to the first vector.
  • the length of the part representing the attribute information in the first vector and the second vector is the same, and each of them corresponds to a type of facial image attribute information.
  • the second neural network can directly correct the value of the corresponding position in the second vector according to the value of the first vector, thereby realizing the correction of the first face image according to the attribute difference information.
  • In this way, apart from the attribute information that needs to be corrected, the attribute information of the first face image is changed as little as possible.
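  • The splicing-and-correction step can be pictured with the following PyTorch sketch; the decoder architecture, dimensions, and variable names are illustrative assumptions, not the patent's actual network:

```python
import torch
import torch.nn as nn

# Sketch: splice the attribute-difference vector (first vector) directly
# after the facial-feature vector (second vector) and feed the result to
# the second neural network, which decodes a corrected face image.
# All dimensions and the decoder architecture are illustrative assumptions.

FEATURE_DIM, ATTR_DIM, IMG_PIXELS = 256, 16, 64 * 64 * 3

second_network = nn.Sequential(      # stands in for the "second neural network"
    nn.Linear(FEATURE_DIM + ATTR_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, IMG_PIXELS),
    nn.Tanh(),                       # pixel values in [-1, 1]
)

second_vector = torch.randn(1, FEATURE_DIM)  # first facial feature information
first_vector = torch.randn(1, ATTR_DIM)      # attribute difference information

spliced = torch.cat([second_vector, first_vector], dim=1)  # splicing
second_face = second_network(spliced).view(1, 3, 64, 64)   # corrected image
```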
  • Before performing facial feature extraction on the first face image to obtain the first facial feature information of the first face image, the method further includes: initializing a second neural network; obtaining a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, where the vector corresponding to the first real face image is used to represent the facial feature information of the first real face image; and inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network to output a synthetic face image containing the attribute difference information.
  • The third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the second neural network is iteratively trained according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted. If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training the second neural network to obtain a second neural network that can be used to synthesize a face image.
  • In practice, the second neural network can be trained using the generative adversarial network (GAN) method.
  • The fourth neural network is used to determine the second probability that the synthetic face image output by the second neural network contains the attribute difference information.
  • During training, the weights of the parameters of the second neural network are adjusted so that the difference between the face image synthesized by the second neural network and the corresponding real face image is as small as possible; that is, the synthesized image is more realistic and contains the specified attribute information, such as the attribute difference information.
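  • A compressed training-loop sketch is given below, with the second neural network as the generator, the third as a real/fake discriminator, and the fourth as an attribute judge. Losses, optimizers, dimensions, and thresholds are illustrative assumptions, and the alternating discriminator updates of a full GAN are omitted:

```python
import torch
import torch.nn as nn

# Illustrative GAN-style loop. G = second neural network (generator);
# D_real = third neural network (first probability: image is real);
# D_attr = fourth neural network (second probability: image contains the
# attribute difference information). Only the generator update is shown.

FEAT, ATTR, IMG = 256, 16, 64 * 64 * 3
G = nn.Sequential(nn.Linear(FEAT + ATTR, 512), nn.ReLU(), nn.Linear(512, IMG))
D_real = nn.Sequential(nn.Linear(IMG, 128), nn.ReLU(),
                       nn.Linear(128, 1), nn.Sigmoid())
D_attr = nn.Sequential(nn.Linear(IMG + ATTR, 128), nn.ReLU(),
                       nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

FIRST_THRESHOLD, SECOND_THRESHOLD = 0.9, 0.9  # assumed stopping thresholds

for step in range(10000):
    feat = torch.randn(8, FEAT)   # facial feature vectors
    diff = torch.randn(8, ATTR)   # attribute difference vectors
    fake = G(torch.cat([feat, diff], dim=1))

    p_real = D_real(fake)                            # first probability
    p_attr = D_attr(torch.cat([fake, diff], dim=1))  # second probability

    # Adjust the generator's weights so the synthesized image both looks
    # real and is judged to contain the attribute difference information.
    loss = -(torch.log(p_real + 1e-8) + torch.log(p_attr + 1e-8)).mean()
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()

    if p_real.mean() > FIRST_THRESHOLD and p_attr.mean() > SECOND_THRESHOLD:
        break  # stop training; G can now be used to synthesize face images
```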
  • The method may further include: obtaining attribute adjustment information fed back by the user, and performing facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the third face image is synthesized according to the second facial feature information and the attribute adjustment information.
  • The attribute adjustment information fed back by the user represents the adjustments the user wants to make to the attributes of the synthesized face image.
  • In this way, the face image synthesis device adjusts the attribute information contained in the face image through an interaction process with the user, for example, detailed features of the face image that are difficult to describe fully. Likewise, attribute information that was not gathered in the initial collection of user attribute information can be adjusted by means of the attribute adjustment information. For example, the adjustment made by the user by drawing directly on the synthesized second face image can be collected, the adjusted attribute information can be identified, and the adjusted attribute information can be used to correct the second face image directly. In turn, a more realistic face image that is closer to the user's needs is obtained.
  • Synthesizing the third face image according to the second facial feature information and the attribute adjustment information includes: obtaining the third vector according to the attribute adjustment information fed back by the user, where the third vector is used to represent the attribute information that needs to be adjusted in the second face image.
  • The second face image is input into the fifth neural network to obtain the fourth vector; the fifth neural network is used to extract the facial feature information of the input face image; the fourth vector is used to represent the second facial feature information of the second face image.
  • the third vector and the fourth vector are spliced and input into the sixth neural network to obtain the third face image; wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • In this way, the attribute adjustment problem can be converted directly into a vector operation, and the neural network is used to synthesize the face image, so that the synthesized face image changes as little as possible outside the parts that need to be adjusted.
  • Before synthesizing the third face image according to the second facial feature information and the attribute adjustment information, the method further includes: initializing a sixth neural network; obtaining a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, where the vector corresponding to the second real face image is used to represent the facial feature information of the second real face image; and inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network to output a synthetic face image containing the attribute adjustment information.
  • The seventh neural network is used to determine the third probability that the input face image is a real face image; the eighth neural network is used to determine the fourth probability that the segmentation map of the second real face image and the segmentation map of the corresponding synthetic face image are consistent.
  • the sixth neural network is iteratively trained according to the third probability output by the seventh neural network and the fourth probability output from the eighth neural network. In the iterative training process, the weights of the parameters of the sixth neural network are adjusted. If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped, and a sixth neural network that can be used to synthesize a face image is obtained.
  • the GAN method can be used to train the sixth neural network.
  • During training, the weights of the parameters of the sixth neural network are adjusted so that the difference between the face image synthesized by the sixth neural network and the corresponding real face image is as small as possible; that is, the synthesized image is more realistic and has a similar segmentation map.
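  • The segmentation-consistency judgment can be sketched as follows; the convolutional judge, the number of segmentation classes, and the image sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Sketch of the segmentation-consistency check attributed to the eighth
# neural network: it receives the segmentation maps of the real face image
# and of the synthesized face image and outputs the fourth probability that
# the two maps are consistent. Shapes and architecture are assumptions.

SEG_CLASSES, H, W = 8, 64, 64   # e.g. background, skin, hair, eyes, ...

eighth_network = nn.Sequential(
    nn.Conv2d(2 * SEG_CLASSES, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
    nn.Sigmoid(),
)

real_seg = torch.rand(1, SEG_CLASSES, H, W)  # segmentation of real image
fake_seg = torch.rand(1, SEG_CLASSES, H, W)  # segmentation of synthesized image
fourth_probability = eighth_network(torch.cat([real_seg, fake_seg], dim=1))
```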
  • the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, makeup information.
  • the face shape information may include information such as the shape of the human face and the height of the cheekbones.
  • the skin condition information may include wrinkles, spots, beards, scars and other information on the skin.
  • the wearing accessories information may include wearing glasses, masks, hats and other information.
  • In this way, a real face image similar to the face image to be synthesized can be found across multiple dimensions, and the face image can be synthesized from that real face image by changing only a small amount of attribute information, making the synthesized face image more realistic.
  • the method before acquiring the first attribute information, further includes: establishing a real face image database, the real face image database containing real face images and attribute information contained in the real face images.
  • the real face image library includes attribute information contained in the real face image. Furthermore, in the process of searching for the first face image, the corresponding real face image can be directly determined according to the repetition rate of the attribute information.
  • In a second aspect, the present application provides a face image synthesis device, which may include an acquisition unit and a processing unit.
  • the obtaining unit is used to obtain first attribute information; the first attribute information is the attribute information contained in the face image to be synthesized.
  • The processing unit is used to search for the first face image in the real face image database according to the first attribute information; the first face image contains the second attribute information, and the repetition rate between the second attribute information and the first attribute information meets the threshold requirement.
  • the processing unit is further configured to obtain attribute difference information according to the first attribute information and the second attribute information; the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized.
  • the processing unit is further configured to perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
  • the processing unit is further configured to synthesize a second face image according to the first facial feature information and the attribute difference information.
  • the processing unit is specifically configured to obtain the first attribute vector according to the first attribute information.
  • The second attribute vector is obtained according to the second attribute information; the first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector; the first vector is used to represent attribute difference information.
  • The processing unit is specifically configured to input the first face image into the first neural network to obtain the second vector, where the first neural network is used to extract the facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
  • The processing unit is specifically configured to obtain the first vector and the second vector, where the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image.
  • the first vector and the second vector are spliced and input into the second neural network to obtain the second face image; wherein, the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  • the processing unit is also used to initialize the second neural network.
  • The vector corresponding to the first real face image is used to represent the facial feature information of the first real face image; the vector corresponding to the first real face image and the vector corresponding to the attribute difference information are input into the second neural network, and a synthetic face image that contains the attribute difference information and corresponds to the first real face image is output.
  • The third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the second neural network is iteratively trained according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted. If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training the second neural network to obtain a second neural network that can be used to synthesize a face image.
  • the obtaining unit is also used to obtain attribute adjustment information fed back by the user.
  • the processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the processing unit is further configured to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  • the processing unit is specifically configured to obtain the third vector according to the attribute adjustment information fed back by the user; wherein, the third vector is used to represent the attribute information that needs to be adjusted in the second face image.
  • The second face image is input into the fifth neural network to obtain the fourth vector; the fifth neural network is used to extract the facial feature information of the input face image; the fourth vector is used to represent the second facial feature information of the second face image.
  • the third vector and the fourth vector are spliced and input into the sixth neural network to obtain the third face image; wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • the processing unit is also used to initialize the sixth neural network.
  • The vector corresponding to the second real face image is used to represent the facial feature information of the second real face image; the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information are input into the sixth neural network, and a synthetic face image that contains the attribute adjustment information and corresponds to the second real face image is output.
  • The seventh neural network is used to determine the third probability that the input face image is a real face image; the eighth neural network is used to determine the fourth probability that the segmentation map of the second real face image and the segmentation map of the corresponding synthetic face image are consistent.
  • the sixth neural network is iteratively trained according to the third probability output by the seventh neural network and the fourth probability output from the eighth neural network. In the iterative training process, the weights of the parameters of the sixth neural network are adjusted. If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped, and a sixth neural network that can be used to synthesize a face image is obtained.
  • the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, makeup information.
  • the processing unit is also used to establish a real face image database, and the real face image database contains real face images and attribute information contained in the real face images.
  • In a third aspect, the present application provides a face image synthesis device.
  • The face image synthesis device may include one or more processors, a memory, and one or more instructions, where the one or more instructions are stored in the memory. When the instructions are executed by the one or more processors, the face image synthesis apparatus is caused to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • In a fourth aspect, the present application provides a device that has the function of implementing the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • In a fifth aspect, the present application provides a computer-readable storage medium, including computer instructions, which, when executed on a computer, cause the processor to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • In a sixth aspect, the present application provides a computer program product, which, when run on a server, causes the face image synthesis device to execute the face image synthesis method described in the first aspect and any one of its possible implementations.
  • In a seventh aspect, the present application provides a circuit system, including a processing circuit configured to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • FIG. 1 is a schematic diagram of an application scenario of a face image synthesis method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the present application.
  • FIG. 4 is a first schematic flowchart of a method for synthesizing a face image provided by an embodiment of the present application
  • FIG. 5 is a second schematic flowchart of a method for synthesizing a face image provided by an embodiment of the present application
  • FIG. 6 is a schematic diagram 1 of the training process of a facial image synthesis neural network provided by an embodiment of the application;
  • FIG. 7 is a third schematic flowchart of a method for synthesizing a face image provided by an embodiment of the application.
  • FIG. 8 is a fourth schematic flowchart of a method for synthesizing a face image according to an embodiment of the application.
  • FIG. 9 is a second schematic diagram of a training process of a facial image synthesis neural network provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a face image synthesis device provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of the hardware structure of a face image synthesis device provided by an embodiment of the application.
  • FIG. 1 shows a face image synthesis system.
  • the face image synthesis system includes a face image synthesis device 110 and a terminal device 120.
  • the facial image synthesis device 110 and the terminal device 120 may be connected through a wired network or a wireless network.
  • the embodiment of the present application does not specifically limit the connection mode between devices.
  • the aforementioned terminal device 120 provides a related human-computer interaction interface so that the user can input related parameters required in the process of synthesizing the face image, such as attribute information of the face image to be synthesized, attribute adjustment information, and the like.
  • The attribute information of the face image to be synthesized may include gender information, age information, face shape information, and other attribute information used to describe facial features.
  • The terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a personal digital assistant (PDA), a netbook, a desktop computer, a laptop computer, a handheld computer, an artificial intelligence (AI) terminal, or another terminal device.
  • the embodiment of the present application does not impose special restrictions on the specific form of the terminal device 120.
  • the aforementioned facial image synthesis device 110 may be a device or server with image search and image synthesis functions, such as a cloud server or a network server.
  • the face image synthesis device 110 receives the attribute information and attribute adjustment information sent from the terminal device 120 through an interactive interface, and then searches for real face images through a processor based on the real face image library stored in the memory. After that, the processor uses the searched real face image and the obtained attribute information to synthesize the face image, and the processor can use the obtained attribute adjustment information to further adjust the attributes of the synthesized face image.
  • Finally, the synthesized face image is sent to the corresponding terminal device 120.
  • The memory in the face image synthesis device 110 may be a general term that includes local storage and a database storing historical face images; the database may reside on the face image synthesis device or on another cloud server.
  • the face image synthesis device 110 may be a server, a server cluster composed of multiple servers, or a cloud computing service center.
  • the server as the face image synthesis device 110 can execute the face image synthesis method of the embodiment of the present application.
  • In another application scenario, the terminal device 120 directly acts as the face image synthesis device: it receives the attribute information and/or attribute adjustment information of the face image to be synthesized from the user's input and performs the face image synthesis task itself to synthesize the desired face image. That is, the terminal device 120 itself can execute the face image synthesis method of the embodiment of the present application.
  • Fig. 2 exemplarily shows a system architecture provided by an embodiment of the present application.
  • the face image synthesis device 110 is equipped with a transceiver interface 211 for data interaction with external devices.
  • the face image synthesis device 110 receives input data transmitted by the terminal device 120 through the transceiver interface 211.
  • the input data in the embodiment of the present application may include: attribute information of the face image to be synthesized and attribute adjustment information.
  • the face image collection module 240 is used to collect real face images.
  • the real face image in the embodiment of the present application may be a collected face image of a local resident population.
  • the face image collection module 240 stores these real face images in the database 230.
  • The database 230 may also include a real face image library 231, which is used to store the real face images collected by the face image collection module 240.
  • the database 230 may also store face images used for training the face image synthesis device 110.
  • The real face images maintained in the database 230 may not all come from the face image collection module 240; they may also be received from other devices, for example, images sent by the terminal device 120 to expand the real face image library 231.
  • the attribute information collection module 212 is used to collect attribute information 201.
  • the attribute information 201 may include, for example, attribute information and attribute adjustment information of the face image to be synthesized.
  • the attribute information collection module 212 collects the attribute information of the face image to be synthesized that is required for synthesizing the face image through the transceiver interface 211. And in the process of synthesizing the face image, the attribute adjustment information input by the user through the terminal device 120 is collected.
  • The search module 213 is configured to search the real face image library 231, based on the attribute information 201, for a real face image 202 whose attribute repetition rate with the face image to be synthesized meets the threshold requirement, that is, to find a real face image that is closer to the face image to be synthesized.
  • The generating module 214 is used to process the real face image 202 based on the attribute information 201 to obtain the synthesized face image 203. For example, based on the attribute information of the face image to be synthesized, some attribute information is added, removed, or corrected in the real face image; for instance, the hairstyle attribute in the real face image 202 is corrected from short hair to long hair. The synthesized face image 203 is output to the terminal device 120 through the transceiver interface 211.
  • The face image attribute adjustment module 215 is used to adjust the details of the attributes in the synthesized face image based on the face image generated by the generating module 214 and the attribute adjustment information collected by the attribute information collection module 212, so as to obtain a face image closer to what the user requires.
  • FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the database 230 is an external memory relative to the face image synthesis device 110. In other cases, the database 230 may also be placed in the face image synthesis device 110.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes inputs $x_s$ and an intercept of 1. The output of the arithmetic unit can be:

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

  • Here $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1; $W_s$ is the weight of $x_s$; $b$ is the bias of the neural unit; and $f$ is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
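  • As a concrete numeric illustration of this formula (values chosen arbitrarily):

```python
import numpy as np

# A single neural unit as defined above: output = f(sum_s W_s * x_s + b),
# with a sigmoid activation introducing the nonlinearity.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, W, b):
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x_s
W = np.array([0.4, 0.1, -0.6])   # weights W_s
b = 0.2                          # bias b
print(neural_unit(x, W, b))
```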
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) is also known as a multi-layer neural network.
  • The DNN is divided according to the positions of different layers.
  • The layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • All the layers in the middle are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • Although the DNN looks complicated, the work of each layer is not complicated. Simply put, each layer performs the following nonlinear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function.
  • Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because the DNN has a large number of layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the $k$-th neuron in the $(L-1)$-th layer to the $j$-th neuron in the $L$-th layer is defined as $W^L_{jk}$.
  • In the training process, the neural network can use the back propagation (BP) algorithm to modify the sizes of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation motion dominated by error loss, and aims to obtain better neural network model parameters, such as a weight matrix.
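  • A toy one-layer illustration of this forward-then-backward update (arbitrary numbers, sigmoid activation):

```python
import numpy as np

# Minimal illustration of one forward pass y = alpha(W x + b) followed by
# a backpropagation update of W and b, using a squared reconstruction error
# and gradient descent (a toy one-layer case of the BP algorithm above).

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
x = np.array([1.0, 0.5, -0.5])
target = np.array([0.2, 0.7])
lr = 0.1

def alpha(z):  # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(100):
    y = alpha(W @ x + b)                # forward: input signal -> output
    err = y - target                    # reconstruction error
    grad_z = err * y * (1.0 - y)        # backpropagate through activation
    W -= lr * np.outer(grad_z, x)       # update weight matrix
    b -= lr * grad_z                    # update offset vector

print(0.5 * np.sum((alpha(W @ x + b) - target) ** 2))  # error loss shrinks
```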
  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application.
  • the chip includes a neural-network processing unit (NPU) 300.
  • The chip can be set in the face image synthesis device 110 shown in FIG. 2 to complete all or part of the work of the attribute information collection module 212 in FIG. 2, all or part of the work of the search module 213, all or part of the work of the generating module 214 (for example, generating the synthetic face image 203), and all or part of the work of the face image attribute adjustment module 215 (for example, adjusting the attribute information of the synthetic face image 203 generated by the generating module 214).
  • the neural network processor NPU 300 is mounted as a co-processor to a main central processing unit (central processing unit, CPU) (host CPU) 320, and the main CPU 320 allocates tasks.
  • the core part of the NPU 300 is the arithmetic circuit 303.
  • the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the data corresponding to the matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 takes the matrix A data and the matrix B from the input memory 301 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 308 (accumulator).
  • the vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • In some implementations, the vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example, for use in a subsequent layer in a neural network.
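  • The data flow described above (cached weights, streamed inputs, partial results gathered in an accumulator, then vector post-processing) can be mimicked in software as follows; this is an analogy for illustration, not a model of the actual circuit:

```python
import numpy as np

# Software analogy of the NPU data flow: the weight matrix B is fetched
# once and held (as in the weight memory / PE cache), matrix A is streamed
# in tile by tile, and partial products are summed in an accumulator
# before a vector post-processing step (here, ReLU).

def npu_style_matmul(A, B, tile=4):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))        # plays the role of the accumulator 308
    for k0 in range(0, K, tile):  # stream A tile by tile
        acc += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]  # partial results
    return acc

A = np.random.rand(8, 16)
B = np.random.rand(16, 8)                        # cached "weights"
out = np.maximum(npu_style_matmul(A, B), 0.0)    # vector unit applies ReLU
```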
  • the unified memory 306 is used to store input data and output data.
  • The direct memory access controller (DMAC) 305 is used to move the input data in the external memory 330 into the input memory 301 and/or the unified memory 306, and to move the weight data in the external memory 330 into the weight memory 302.
  • The bus interface unit (BIU) 310 is used to implement interaction between the main CPU 320, the DMAC, and the instruction fetch memory 309 through the bus.
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • The unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch memory 309 are all on-chip memories.
  • The external memory 330 is a memory external to the NPU 300; it may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • the face image synthesis method according to the embodiment of the present application will be described in detail below with reference to FIG. 4.
  • the face image synthesis method of the embodiment of the present application may be executed by devices such as the face image synthesis device 110 in FIG. 1 and the face image synthesis device 110 in FIG. 2.
  • the method may include S101-S105:
  • S101: Acquire first attribute information, where the first attribute information is attribute information contained in the face image to be synthesized.
  • The attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, and makeup information.
  • the face shape information may include information such as the shape of the human face and the height of the cheekbones.
  • the skin condition information may include wrinkles, spots, beards, scars and other information on the skin.
  • the wearing accessories information may include wearing glasses, masks, hats and other information.
  • When synthesizing a face image, the attribute information of the face image to be synthesized needs to be collected first, and the synthesized face image then needs to contain this attribute information.
  • The method of collecting the attribute information includes: the face image synthesis device creates a face image attribute information questionnaire covering all the attribute information needed to synthesize the face image and sends it to the terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information.
  • the face image synthesis device can collect the attribute information of the face image to be synthesized from multiple dimensions, so that the final synthesized face image is closer to the required face image.
  • Table 1 exemplarily gives the content of a face image attribute information questionnaire.
  • The first attribute information required to synthesize the face image this time includes: male, aged 30-40, sharp chin, big eyes, high nose bridge, thin lips, short hair, and oblique bangs.
  • The face image attribute information questionnaire may also take other forms of expression.
  • list all possible attribute information in the face image attribute information questionnaire and determine the attribute information by collecting the check results of the user.
  • For another example, an adjustable progress bar is provided for the degree to which each attribute is present, and the specific attribute information is obtained from the position to which the user adjusts the progress bar.
  • For example, the progress bar at 20% may mean short hair, and at 80% long hair.
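  • One way such a progress bar could be mapped onto the nine-level attribute encoding used later is sketched below; the binning rule is an assumption, not specified by the patent:

```python
# Map an attribute progress bar (0-100%) to a nine-level encoding
# (assumed convention: 1 = shortest/lightest, 9 = longest/darkest,
# 0 = attribute not collected).

def progress_to_level(percent):
    if percent is None:
        return 0  # attribute not collected
    return min(9, max(1, int(percent // (100 / 9)) + 1))

print(progress_to_level(20))   # low level, e.g. short hair
print(progress_to_level(80))   # high level, e.g. long hair
```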
  • S102: According to the first attribute information, search for a first face image in a real face image database, where the first face image contains second attribute information, and the repetition rate between the second attribute information and the first attribute information meets the threshold requirement.
  • the real face image library is used to store real face images.
  • a real face image database is established by collecting real face images of the local resident population.
  • the real face image library also includes attribute information contained in the real face image.
  • the corresponding real face image can be directly determined according to the repetition rate of the attribute information.
  • a large number of real face images can be collected in advance to establish a real face image database, and the real face image database can be updated and expanded periodically.
  • the first face image is a certain face image found in the real face image library, therefore, the first face image is a real face image.
  • the repetition rate of the second attribute information and the first attribute information contained in the first face image also needs to meet certain threshold requirements. In this way, a real face image that is closer to the face image to be synthesized can be found.
  • the value of the threshold can be set according to an empirical value, for example, the repetition rate of the first attribute information and the second attribute information is greater than or equal to 80%. Further, if multiple first face images that meet the threshold requirement are found, the face image with the highest repetition rate may be used as the first face image for subsequent synthesis of the face image.
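  • A minimal sketch of this threshold search, assuming each database record stores an image path and an attribute dictionary (the data layout is illustrative):

```python
# Sketch of searching the real face image database for the first face
# image: the repetition rate is the fraction of requested attributes that
# a stored image matches; records at or above the threshold qualify, and
# the best match is kept.

THRESHOLD = 0.8  # e.g. repetition rate >= 80%

def repetition_rate(first_attrs, second_attrs):
    matched = sum(1 for k, v in first_attrs.items()
                  if second_attrs.get(k) == v)
    return matched / len(first_attrs)

def search_first_face(first_attrs, database):
    candidates = [(repetition_rate(first_attrs, rec['attrs']), rec)
                  for rec in database]
    rate, best = max(candidates, key=lambda c: c[0])
    return best if rate >= THRESHOLD else None

database = [
    {'image': 'face_001.png',
     'attrs': {'gender': 'male', 'age': '30-40', 'chin': 'sharp',
               'eyes': 'big', 'hair': 'short'}},
]
query = {'gender': 'male', 'age': '30-40', 'chin': 'sharp',
         'eyes': 'big', 'hair': 'short'}
print(search_first_face(query, database))
```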
  • Table 2 exemplarily shows the second attribute information contained in the first face image found based on the first attribute information shown in Table 1 above.
  • The second attribute information contained in the first face image includes: a 35-year-old male, a sharp chin, big eyes, a high nose bridge, thin lips, and short hair.
  • In this way, the face image is synthesized on the basis of the found real first face image, so that the synthesized face image is also closer to a real face image.
  • S103: Obtain attribute difference information according to the first attribute information and the second attribute information, where the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized, that is, the attribute information that needs to be added, deleted, or corrected in the first face image.
  • the first attribute vector is obtained according to the first attribute information.
  • The first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector.
  • the first vector is used to represent the attribute difference information.
  • Each bit of an attribute vector corresponds sequentially to one type of attribute information.
  • For example, suppose the face image attribute information questionnaire contains three types of attribute information: gender, hairstyle, and skin color.
  • Then the first attribute vector and the second attribute vector used to represent the attribute information both have length 3, and each bit corresponds to one type of attribute information.
  • 0 can be preset to indicate a female, and 1 to indicate a male.
  • The hair length can be divided into 9 intervals from short to long, corresponding to the numbers 1 to 9, with 0 meaning that the attribute information was not collected.
  • The skin color attribute information can be divided into 9 intervals from light to dark, corresponding to the numbers 1 to 9, with 0 meaning that the attribute information was not collected.
  • characters with no meaning may be used to indicate that the first attribute information and the second attribute information corresponding to the position are the same, such as X.
  • When the attribute difference information represented by the first vector is used to modify the attribute information of the first face image, the attribute information at the positions corresponding to the meaningless characters does not need to be modified.
  • For example, suppose the collected first attribute information is: male, short hair.
  • The first attribute vector obtained according to the first attribute information can be expressed accordingly. As shown in Table 4 below, the second attribute information contained in the found first face image is: male, medium-length hair, white skin.
  • The second attribute vector corresponding to the second attribute information can be expressed accordingly. The values at the 2nd and 3rd positions of the first attribute vector and the second attribute vector differ. In this way, the attribute difference information can be obtained from the difference between the two attribute vectors and expressed as the first vector.
  • the first attribute vector 511 is obtained according to the obtained first attribute information 51
  • the second attribute vector 521 is obtained according to the obtained second attribute information 52
  • the first vector 53 is obtained according to the first attribute vector 511 and the second attribute vector 521.
  • Taking the first attribute information contained in Table 1 above and the second attribute information contained in Table 2 above as an example, the first attribute vector and the second attribute vector are obtained according to the first attribute information and the second attribute information respectively, and are then compared by bitwise subtraction to obtain the first vector; the first vector indicates that the attribute difference information is oblique bangs.
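  • A worked version of the male/short-hair example, with assumed encodings, since the patent's concrete vector values appear only in figures not reproduced here:

```python
# Assumed encodings: gender (0 = female, 1 = male), hair length level 1-9,
# skin tone level 1-9, 0 = attribute not collected, 'X' = no change needed.

first_attribute_vector = [1, 2, 0]    # male, short hair, skin not collected
second_attribute_vector = [1, 5, 2]   # male, medium-length hair, white skin

first_vector = [a if a != b else 'X'
                for a, b in zip(first_attribute_vector,
                                second_attribute_vector)]
print(first_vector)  # ['X', 2, 0]: positions 2 and 3 carry correction values
```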
  • the second attribute information contained in the first face image can be compared with the collected first attribute information to determine the attribute information that needs to be added/deleted/corrected on the first face image.
  • S104 is executed, that is, the first face image is adjusted according to the attribute difference information to obtain a second face image that is closer to the face image to be synthesized.
  • S104: Perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
  • the face image may include multiple pieces of facial feature information that cannot be exhaustively listed, and the facial feature information may be used to represent the corresponding face image.
  • the above-mentioned attribute information contained in the first face image may be part of the content of the facial feature information of the first face image, and this part of the content can be easily observed, memorized, and described by the user.
  • the facial feature information of each person is different, and the similarity of the facial feature information can be used to distinguish different people. For example, Zhang San’s facial feature information is different from Li Si’s facial feature information, and Zhang San’s facial feature information can be used to quickly determine who is Zhang San. Further, facial feature extraction is to obtain facial feature information that can represent the face image.
  • a neural network is used to implement facial feature extraction of the first face image, and the first face image is converted into a second vector representing the first facial feature information.
  • the second vector contains content representing attribute information.
  • the first face image is input to the input layer of the neural network, and the operator of the input layer extracts the facial feature information contained in the first face image to form a high-dimensional matrix representing the facial feature information.
  • the high-dimensional matrix is input into the hidden layer of the neural network, and the high-dimensional matrix is processed for dimensionality reduction through the operation of each layer of the operator in the hidden layer.
  • the neural network output layer outputs the second vector.
  • the first face image 54 is input to the first neural network 55 to obtain the second vector 56.
  • the first neural network 55 is used to extract the facial feature information of the input face image
  • the second vector 56 is used to represent the first facial feature information of the first face image 54.
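  • the following is a minimal PyTorch sketch of such a first neural network, with an input layer, dimension-reducing hidden layers, and an output layer emitting the second vector; all layer shapes, the 64x64 RGB input, and the 256-dimensional output are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the first neural network: maps a face image to the
    'second vector' of facial feature information. All layer shapes are
    assumptions; a 64x64 RGB input is assumed."""

    def __init__(self, feature_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            # input layer: extract facial features into a high-dimensional representation
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            # hidden layers: progressive dimensionality reduction
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            # output layer: emit the second vector
            nn.Linear(128 * 8 * 8, feature_dim),
        )

    def forward(self, face_image):          # face_image: (N, 3, 64, 64)
        return self.encoder(face_image)     # second vector: (N, 256)

second_vector = FeatureExtractor()(torch.randn(1, 3, 64, 64))
```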
  • S105 Synthesize a second face image according to the first facial feature information and the attribute difference information.
  • the attribute difference information between the first face image and the face image to be synthesized, and the facial feature information of the first face image, are obtained; the facial feature information of the first face image is then corrected according to the attribute difference information, synthesizing a face image that is closer to the requirement and contains the attribute difference information.
  • the first vector 53 representing the attribute difference information and the second vector 56 representing the first facial feature information of the first face image are spliced and then input into the second neural network 57.
  • the second neural network 57 is used to correct the first facial feature information of the first face image 54 according to the attribute difference information.
  • the attribute difference information obtained according to the above Table 1 and Table 2 is oblique bangs
  • the second neural network 57 can synthesize the oblique bangs in the first face image 54 to obtain the second face image 58 in FIG. 5.
  • the second face image 58 contains the attribute difference information (such as oblique bangs).
  • the splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the second vector according to the first vector.
  • the positions corresponding to attribute information that is the same in the first attribute information and the second attribute information can be set to a meaningless symbol such as X, and then the second neural network can modify the second vector directly based on the content of the meaningful positions in the first vector.
  • the second neural network can directly modify the second vector based on the content of the meaningful positions in the first vector. For example, the second neural network reads that the value of a bit in the attribute-information part of the spliced vector is meaningful, and obtains the attribute information corresponding to that bit value, such as the oblique-bangs attribute information.
  • the vector representing the facial feature information is corrected accordingly, and the corrected vector representing the facial feature information is then subjected to dimension-raising processing to obtain a synthetic face image containing oblique bangs.
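  • a hedged PyTorch sketch of this splicing and correction step follows; the decoder architecture and the vector lengths (feature_dim=256, attr_dim=40) are assumptions, not the embodiment's specified design:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the second neural network: corrects the facial-feature
    vector according to the attribute-difference vector and raises the
    dimension back to an image. The architecture is an assumption."""

    def __init__(self, feature_dim=256, attr_dim=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + attr_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, second_vector, first_vector):
        # splice the attribute-difference vector directly after the
        # facial-feature vector, as described above
        spliced = torch.cat([second_vector, first_vector], dim=1)
        return self.net(spliced)    # second face image: (N, 3, 64, 64)
```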
  • the second neural network synthesizing the face image can correct the attribute information contained in the found real face image based on the attribute difference information, thereby obtaining a synthetic face image containing specific attribute information that is closer to the real face image.
  • the face image synthesis method provided by the embodiment of the present application can obtain, through the collected face image attribute information, a real face image closer to the face image to be synthesized; obtain the attribute difference information according to the attribute information contained in the real face image and the collected face image attribute information; and then perform face image synthesis based on the real face image according to the attribute difference information, to obtain a more realistic and satisfactory face image.
  • the face image synthesis method provided in the embodiments of the present application may not require the participation of professionals, has high efficiency, and is convenient for promotion. And based on real face images for face image synthesis, a more realistic face image can be obtained.
  • before the second neural network for synthesizing the face image is applied, it needs to be trained so that the face images it synthesizes are more realistic and reliable.
  • the embodiment of the application adopts a generative adversarial network (GAN) based method to train the second neural network.
  • the adversarial network used for training the second neural network (which is also referred to as the generative network) may include a third neural network and a fourth neural network.
  • the third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the specific training process is as follows.
  • Step 1 Initialize the second neural network.
  • a second neural network that can be used for face image synthesis is constructed, and the weight matrix corresponding to each parameter contained in it is set to an initial value; a training process then needs to be performed, during which the weight matrix corresponding to each parameter is learned.
  • training the second neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained second neural network.
  • Step 2 Obtain the first real face image, the vector corresponding to the first real face image, and the vector corresponding to the attribute difference information.
  • the real face image data set includes one or more first real face images
  • the attribute difference information data set includes one or more attribute difference information.
  • the vector corresponding to the first real face image is obtained through the first neural network, and this vector represents the facial feature information of the first real face image.
  • the vector corresponding to the attribute difference information can be directly constructed.
  • a training data set containing multiple sets of training data can be obtained through the combination of the first real face image and the attribute difference information.
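  • a minimal sketch of building such a training data set by combination (the variable names and placeholder entries are assumptions):

```python
import itertools

real_faces = ["face_0001.png", "face_0002.png"]   # placeholder entries
attr_diffs = [(0, 1, 0), (1, 0, 0)]               # placeholder difference vectors

# each (real face, attribute difference) pair is one set of training data
training_set = list(itertools.product(real_faces, attr_diffs))
```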
  • Step 3 Input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information.
  • a vector of a first real face image can be combined with a vector of attribute difference information as input, and the synthetic face image is then output according to the face image synthesis method of step S105.
  • the vector 62 corresponding to the first real face image and the vector 63 corresponding to the attribute difference information are input to the second neural network 64, which synthesizes the face image and outputs a synthetic face image 65 containing the attribute difference information.
  • Step 4 Input the first real face image and the synthetic face image corresponding to the first real face image into the third neural network and the fourth neural network.
  • the first real face image and its corresponding synthetic face image are input as a set of data into the third neural network and the fourth neural network, respectively.
  • the third neural network realizes true/false discrimination, determining the probability that the synthetic face image output by the second neural network is a real face image. The higher this probability, the more likely the image is a real face image, indicating a stronger ability of the second neural network to synthesize realistic face images.
  • the fourth neural network can determine the probability of certain specific attribute information contained in the input face image, for example, the probability of determining the attribute difference information contained in the input face image.
  • the difference between the synthetic face image and the corresponding first real face image can be determined according to the probability that the input face image is a real face image and the probability that it contains the specific attribute information, so as to ensure that the difference between the synthetic face image and the corresponding first real face image, apart from the attribute difference information, is as small as possible.
  • the first real face image 61 and the synthesized face image 65 are input to the third neural network 66 and the fourth neural network 67.
  • Step 5 Perform iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted.
  • the output result of the third neural network is the first probability of judging that the input face image is a real face image
  • the output result of the fourth neural network is the second probability of judging that the input face image contains attribute difference information. That is to say, the output results of the third neural network and the fourth neural network are used to measure the difference between the synthetic face image output by the second neural network and the input real face image, using a loss function or objective function.
  • the objective function indicates that the higher the output value (loss) of the loss function, the greater the difference; training the second neural network thus becomes a process of reducing this loss as much as possible.
  • the objective function contains three parts of constraints.
  • first, the difference between the synthesized face image output by the second neural network and the corresponding real face image is required to be as small as possible.
  • the smaller the difference, the smaller the objective function value.
  • second, the synthetic face image output by the second neural network should fool the third neural network as much as possible, that is, the third neural network judges the synthetic face image as a real face image.
  • the higher the probability of being judged a real image, the smaller the objective function value.
  • third, the synthetic face image output by the second neural network needs to contain the specific attribute information (the attribute difference information). The greater the probability of containing the specific attribute information, the smaller the objective function value. In this way, the training of the second neural network can be achieved by minimizing the objective function.
  • the output results of the third neural network and the fourth neural network are input into the loss function, and the result of the loss function is fed into the second neural network for a back-propagation operation.
  • the back-propagation operation performs a gradient update, modifying the weight of each parameter of the second neural network, and finally a better parameter weight matrix is obtained.
  • the first real face image 61 and the corresponding synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67.
  • the first probability output by the third neural network 66 and the second probability output by the fourth neural network 67 are input to the objective function, and the second neural network 64 is trained by minimizing the objective function.
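  • the three-part objective can be sketched as follows; the particular loss terms (L1 reconstruction, log-probability adversarial term, binary cross-entropy attribute term) and the weights are assumptions standing in for the embodiment's unspecified loss functions:

```python
import torch
import torch.nn.functional as F

def second_net_objective(real_img, fake_img, d_fake_prob, attr_pred, attr_target,
                         w_rec=10.0, w_adv=1.0, w_attr=1.0):
    """Sketch of the three constraints: (1) keep the synthetic image close
    to the real one, (2) push the third network to judge it real, and
    (3) make it contain the attribute difference information."""
    rec = F.l1_loss(fake_img, real_img)                    # smaller difference, smaller value
    adv = -torch.log(d_fake_prob + 1e-8).mean()            # higher 'real' probability, smaller value
    attr = F.binary_cross_entropy(attr_pred, attr_target)  # higher attribute probability, smaller value
    return w_rec * rec + w_adv * adv + w_attr * attr
```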
  • Step 6 If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training to obtain a second neural network that can be used to synthesize a face image.
  • the first threshold and the second threshold may be determined based on empirical values.
  • if the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, the training is stopped.
  • the stopping timing may also be determined according to the output result of the loss function; for example, training is stopped when the output of the loss function changes relatively smoothly, or when a preset number of training iterations is reached. Taking the loss function as an example, the higher its output value (loss), the greater the difference; once the output of the loss function is smooth and no longer decreases, training of the second neural network is complete and the goal of face image synthesis can be met.
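  • a small sketch of these stopping conditions follows; the thresholds, the patience window, and the flatness tolerance are assumed empirical values:

```python
def should_stop(first_prob, second_prob, loss_history,
                first_threshold=0.9, second_threshold=0.9,
                patience=10, eps=1e-4):
    """Stop when both probabilities exceed their (assumed empirical)
    thresholds, or when the loss curve has flattened over the last
    `patience` iterations; patience/eps model 'smooth and no longer
    decreasing'."""
    if first_prob > first_threshold and second_prob > second_threshold:
        return True
    recent = loss_history[-patience:]
    return len(recent) == patience and max(recent) - min(recent) < eps
```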
  • before applying the third neural network and the fourth neural network to train the second neural network, it is also necessary to train the third neural network and the fourth neural network themselves and adjust their weight matrices.
  • under the premise that it is known whether the input face image is a real face image or a synthetic face image, the third neural network and the fourth neural network are trained by minimizing an objective function, where the objective function contains a part representing the classification error. For example, the objective function corresponding to the third neural network includes a part representing the classification error: when the input face image is judged to be a real face image, the greater the output probability, the smaller the corresponding objective function value.
  • the objective function corresponding to the fourth neural network includes multiple parts representing classification errors, where each part represents the classification error of one attribute in the attribute information. For example, when a face image is input, each bit of the output attribute vector represents the probability that the input face image contains the corresponding attribute information; when the input face image is determined to contain specific attribute information, the greater the output probability, the smaller the corresponding objective function value.
  • in this way, the training process of the third neural network and the fourth neural network is completed.
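  • the classification-error objectives for the third and fourth neural networks might be sketched as follows; the binary-cross-entropy form, with one term per attribute bit for the fourth network, is an assumption:

```python
import torch
import torch.nn.functional as F

def discriminator_objectives(d3_real_prob, d3_fake_prob, d4_attr_pred, attr_target):
    """Sketch of the classification-error objectives for the third and
    fourth neural networks under known real/synthetic inputs."""
    # third network: the greater the probability on real images (and the
    # smaller on synthetic ones), the smaller the objective value
    loss_d3 = (F.binary_cross_entropy(d3_real_prob, torch.ones_like(d3_real_prob))
               + F.binary_cross_entropy(d3_fake_prob, torch.zeros_like(d3_fake_prob)))
    # fourth network: one classification-error part per attribute bit
    loss_d4 = F.binary_cross_entropy(d4_attr_pred, attr_target)
    return loss_d3, loss_d4
```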
  • the first neural network and the second neural network form a generating network
  • the third neural network and the fourth neural network form a discriminant network.
  • the face image synthesis method provided by the embodiment of the present application can train the second neural network so that the difference between the second face image synthesized by the second neural network and the corresponding first face image, apart from the attribute difference information that needs to be corrected, is as small as possible; the second face image is as realistic as possible; and the second face image contains the specific attribute information, such as the attribute difference information.
  • the face image synthesis method provided in the embodiment of the present application only needs to perform a small amount of attribute correction (also called attribute transfer) based on the real face image to synthesize a face image containing specific attribute information.
  • the method is more convenient, and the synthesized face image is more realistic.
  • the user can also modify the synthesized face image, and the face image synthesis device adjusts the attribute information contained in the face image through interaction with the user, for example, detailed features of the face image that are difficult to describe completely, or attribute information that was not collected when the user attribute information was first collected; such information can be adjusted by the following methods.
  • if the attribute information to be modified changes greatly, for example the user filled in the wrong gender in the form, the above steps S101 to S105 need to be re-executed. If the attribute information to be modified does not change significantly, then after step S105 the second face image can be modified directly through the following steps S106 to S108 to obtain a face image that better meets the requirements.
  • as shown in FIG. 7, after the above step S105, the face image synthesis method of the embodiment of the present application may further include S106-S108.
  • S106 Obtain attribute adjustment information fed back by the user.
  • the face image synthesis device can still collect the attribute adjustment information in the form of a table, or the attribute information can be changed through simple drawing performed by the user directly on the second face image. For example, if the user judges that a scar is missing from the second face image, the user can directly draw a rough shape of the scar on the face in the second face image. After that, the face image synthesis device extracts information from the changed part of the content to obtain the attribute adjustment information.
  • the facial image synthesis device extracts information from the changed part of the facial image fed back by the user, which can be completed through a neural network. For example, a face image containing attribute adjustment information is input to a neural network for data reading, and a vector representing the attribute adjustment information is obtained.
  • the third vector 82 is obtained according to the attribute adjustment information 81 fed back by the user. Among them, the third vector 82 is used to represent the attribute information in the second face image 83 that needs to be adjusted. As shown in FIG. 8, the third vector 82 may represent the scar information that the user needs to add to the second face image.
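  • one possible sketch of turning the user's drawn change into the third vector is given below; the pixel-difference masking, the threshold value, and the reuse of an encoder network are all assumptions rather than the embodiment's stated method:

```python
import torch

def extract_adjustment(edited_img, second_img, encoder, threshold=0.05):
    """Sketch: isolate the user's drawn change (e.g. a scar) as the pixel
    difference between the edited and the original second face image,
    then encode it into the third vector representing the attribute
    adjustment information."""
    changed = (edited_img - second_img).abs()                      # where the user drew
    mask = (changed.mean(dim=1, keepdim=True) > threshold).float() # assumed threshold
    return encoder(edited_img * mask)                              # third vector
```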
  • S107 Perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the function implemented by the fifth neural network is the same as that of the first neural network: it is used to extract facial feature information from a face image, obtaining the second facial feature information.
  • the fifth neural network and the first neural network may be the same or different, which is not specifically limited in the embodiment of the present application.
  • the second face image 83 is input to the fifth neural network 84 to obtain the fourth vector 85.
  • the fifth neural network 84 is used to extract facial feature information of the input face image.
  • the fourth vector 85 is used to represent the second facial feature information of the second face image 83.
  • S108 Synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  • the third vector 82 representing the attribute adjustment information obtained through the above step S106 is obtained, and the fourth vector 85 representing the facial feature information of the second face image is obtained through the above step S107.
  • the third vector 82 and the fourth vector 85 are spliced and then input to the sixth neural network 86 to obtain the third face image 87.
  • the sixth neural network 86 is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • the face image synthesis method provided by the embodiment of the present application can realize interaction with the user, and in the interaction process the modification of attribute information can be realized in a simpler way.
  • through a simple interaction process with the user, the face image synthesis method provided in the embodiments of the present application can realize the correction of face image attribute information, making the final face image closer to the desired face image.
  • before the sixth neural network for synthesizing the face image is applied, it needs to be trained so that the face images it synthesizes are more realistic.
  • the embodiment of the present application adopts a GAN-based method to train the sixth neural network.
  • the adversarial network used for training the generative network may include a seventh neural network and an eighth neural network.
  • the seventh neural network is used to determine the third probability that the input face image is a real face image.
  • the eighth neural network is used to determine the fourth probability that the segmentation maps of the second real face image and of the synthetic face image corresponding to the second real face image are consistent.
  • the specific training process is as follows.
  • Step 1 Initialize the sixth neural network.
  • a sixth neural network that can be used for face image synthesis is constructed, and the weight matrix corresponding to each parameter contained in it is set to an initial value; a training process then needs to be performed, during which the weight matrix corresponding to each parameter is learned.
  • training the sixth neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained sixth neural network.
  • Step 2 Obtain a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information.
  • the real face image data set includes one or more second real face images
  • the attribute adjustment information data set includes one or more attribute adjustment information.
  • the vector corresponding to the second real face image is obtained through the fifth neural network, and this vector represents the facial feature information of the second real face image.
  • the vector corresponding to the attribute adjustment information can be directly constructed.
  • a training data set containing multiple sets of training data can be obtained through the combination of the second real face image and the attribute adjustment information.
  • Step 3 Input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information to the sixth neural network, and output a synthetic face image corresponding to the second real face image containing the attribute adjustment information.
  • a vector corresponding to the second real face image can be combined with a vector corresponding to the attribute adjustment information as input, and the synthetic face image corresponding to the second real face image is then output according to the face image synthesis method of the above steps S107 and S108.
  • the vector 92 corresponding to the second real face image and the vector 93 corresponding to the attribute adjustment information are input to the sixth neural network 94, which synthesizes the face image and outputs a synthetic face image 95 containing the attribute adjustment information.
  • Step 4 Input the second real face image and the synthetic face image corresponding to the second real face image into the seventh neural network and the eighth neural network.
  • the second real face image and its corresponding synthetic face image are input as a set of data into the seventh neural network and the eighth neural network, respectively.
  • the seventh neural network realizes true/false discrimination, determining the probability that the synthesized face image output by the sixth neural network is a real face image. The higher this probability, the more likely the image is a real face image, indicating a stronger ability of the sixth neural network to synthesize realistic face images.
  • the eighth neural network can determine the probability that the second real face image and the segmentation map of the synthetic face image corresponding to the second real face image are consistent.
  • the difference between the synthetic face image and the corresponding second real face image can be judged according to the probability that the input face image is a real face image and the probability that the segmentation maps are consistent, so as to ensure that the difference between the synthetic face image and the corresponding second real face image, apart from the attribute adjustment information, is as small as possible.
  • the edge-based segmentation method is one of the image segmentation methods, and the segmented face image can be represented by a segmentation map.
  • the segmentation map includes dividing the face image from the background image along the outer contour of the face image, and only retains the part of the face image, so that the interference of the background information on the process of judging the face image can be ignored.
  • in this way, the face image synthesis ability of the sixth neural network can be judged: the synthesized face image will not exhibit large abnormal changes, and the neural network will only modify the attribute details that need to be adjusted.
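  • a rough sketch of such a segmentation map and a naive consistency score follows; the edge-based segmentation step itself is not shown, and the binary mask representation is an assumption:

```python
import numpy as np

def face_segmentation_map(image, face_mask):
    """Keep only the face region along its outer contour and drop the
    background so it cannot interfere with the comparison. face_mask
    (1 inside the contour, 0 outside) is assumed to come from an
    edge-based segmentation step not shown here."""
    return image * face_mask[..., np.newaxis]   # image: (H, W, 3), mask: (H, W)

def segmentation_consistency(map_a, map_b):
    """Crude consistency score between two segmentation maps; in the
    embodiment the eighth neural network learns this judgment."""
    return 1.0 - np.abs(map_a - map_b).mean()
```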
  • the second real face image 91 and the synthesized face image 95 are input to the seventh neural network 96 and the eighth neural network 97.
  • the seventh neural network has the same function as the third neural network applied in the process of training the second neural network; both are used to determine the probability that the input face image is a real face image.
  • the seventh neural network and the third neural network may be the same or different.
  • Step 5 Perform iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; during the iterative training process, adjust the weights of the parameters of the sixth neural network.
  • the output result of the seventh neural network is the third probability of judging that the input face image is a real face image
  • the output result of the eighth neural network is the fourth probability of judging that the segmentation maps of the second real face image and of the corresponding synthetic face image are consistent.
  • the output results of the seventh neural network and the eighth neural network are used to measure the difference between the synthesized face image output by the sixth neural network and the input real face image, using a loss function or objective function: the higher the output value (loss) of the loss function, the greater the difference. Training the sixth neural network then becomes a process of reducing this loss as much as possible.
  • the objective function contains three parts of constraints. First, the difference between the synthesized face image output by the sixth neural network and the corresponding real face image is required to be as small as possible; the smaller the difference, the smaller the objective function value. Second, the synthesized face image output by the sixth neural network should fool the seventh neural network as much as possible, that is, the seventh neural network judges the synthesized face image as a real face image; the higher the probability of being judged a real image, the smaller the objective function value. Third, the segmentation map of the synthesized face image output by the sixth neural network and that of the corresponding real face image should be as similar as possible; the more similar the segmentation maps, the smaller the objective function value. In this way, the training of the sixth neural network can be achieved by minimizing the objective function.
  • the output results of the seventh neural network and the eighth neural network are input into the loss function, and the result of the loss function is fed into the sixth neural network for a back-propagation operation.
  • the back-propagation operation performs a gradient update, modifying the weight of each parameter of the sixth neural network, and finally a better parameter weight matrix is obtained.
  • the greater the probability output by the seventh neural network that the synthetic face image is a real face image, the smaller the loss function value.
  • the smaller the difference between the synthetic face image and the second real face image, apart from the attribute information that needs to be modified, the smaller the loss function value.
  • the smaller the difference the eighth neural network judges between the segmentation maps of the synthetic face image and the second real face image, the smaller the loss function value. In this way, the training of the sixth neural network can be achieved by minimizing the objective function.
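  • the three constraints for the sixth neural network can be sketched analogously to the earlier objective, with a segmentation-consistency term; the L1 form of that term (standing in for the eighth network's learned judgment) and the weights are assumptions:

```python
import torch
import torch.nn.functional as F

def sixth_net_objective(real_img, fake_img, d7_fake_prob, seg_fake, seg_real,
                        w_rec=10.0, w_adv=1.0, w_seg=1.0):
    """Sketch of the three constraints: small image difference, a high
    'real' probability from the seventh network, and similar
    segmentation maps."""
    rec = F.l1_loss(fake_img, real_img)             # constraint 1: small difference
    adv = -torch.log(d7_fake_prob + 1e-8).mean()    # constraint 2: judged real
    seg = F.l1_loss(seg_fake, seg_real)             # constraint 3: consistent segmentation maps
    return w_rec * rec + w_adv * adv + w_seg * seg
```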
  • the third probability output by the seventh neural network 96 and the fourth probability output by the eighth neural network 97 are input to the objective function, and the sixth neural network 94 is trained by minimizing the objective function.
  • Step 6 If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, stop training, and obtain a sixth neural network that can be used to synthesize a face image.
  • the third threshold and the fourth threshold can be determined according to empirical values.
  • if the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped.
  • the stopping timing may also be determined according to the output result of the loss function; for example, training is stopped when the output of the loss function changes relatively smoothly, or when a preset number of training iterations is reached. Taking the loss function as an example, the higher its output value (loss), the greater the difference; once the output of the loss function is smooth and no longer decreases, training of the sixth neural network is complete and the goal of face image synthesis can be met.
  • the seventh neural network and the eighth neural network are trained by minimizing the objective function.
  • the objective function contains part of the content representing the classification error.
  • the objective function corresponding to the seventh neural network includes a part representing the classification error. If the input face image is judged to be a real face image, the greater the output probability, the smaller the corresponding objective function value.
  • in this way, the training process of the seventh neural network is completed.
  • the training process of the eighth neural network is similar to the training process of the seventh neural network.
  • the objective function contains a part representing the classification error: the smaller the difference the eighth neural network judges between the segmentation maps of the real face image and the synthetic face image, the smaller the objective function value; conversely, the greater the difference between the segmentation maps, the greater the objective function value. In this way, by continuously inputting real face images and synthetic face images and minimizing the objective function value, the training process of the seventh neural network and the eighth neural network is completed.
  • the face image synthesis method provided by the embodiment of the present application can train the sixth neural network so that the difference between the third face image synthesized by the sixth neural network and the corresponding second face image, apart from the attribute adjustment information that needs to be corrected, is as small as possible; the third face image is as realistic as possible; and the segmentation map of the third face image is as close as possible to that of the corresponding second face image.
  • the face image synthesis method provided in the embodiments of the present application can interact with the user, so that face image attribute information that is difficult to describe is adjusted through recognition of simple input, thereby synthesizing the face image the user needs more comprehensively.
  • FIG. 10 shows a schematic diagram of a possible structure of the face image synthesizing device 1000 involved in the foregoing embodiment.
  • the face image synthesis device 1000 includes: an acquisition unit 1001 and a processing unit 1002.
  • the acquiring unit 1001 is used to support the face image synthesis apparatus 1000 to perform step S101 in FIG. 4, step S101 and step S106 in FIG. 7, and/or other processes used in the technology described herein.
  • the processing unit 1002 is used to support the face image synthesis device 1000 in performing steps S102-S105 in FIG. 4, steps S102-S105 and S107-S108 in FIG. 7, and/or other processes of the technology described herein.
  • FIG. 11 is a schematic diagram of the hardware structure of the device provided by an embodiment of the application.
  • the device includes at least one processor 1101, a communication line 1102, a memory 1103, and at least one communication interface 1104.
  • the memory 1103 may also be included in the processor 1101.
  • the processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of this application.
  • the communication line 1102 may include a path to transmit information between the aforementioned components.
  • the communication interface 1104 is used to communicate with other devices.
  • the communication interface may be a module, a circuit, a bus, an interface, a transceiver, or other device that can realize a communication function, and is used to communicate with other devices.
  • the transceiver may be an independently provided transmitter, which can be used to send information to other devices; the transceiver may also be an independently provided receiver, which is used to receive information from other devices.
  • the transceiver may also be a component that integrates the functions of sending and receiving information, and the embodiment of the present application does not limit the specific implementation of the transceiver.
  • the memory 1103 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently, and is connected to the processor 1101 through a communication line 1102.
  • the memory 1103 may also be integrated with the processor 1101.
  • the memory 1103 is used to store computer-executed instructions used to implement the solution of the present application, and the processor 1101 controls the execution.
  • the processor 1101 is configured to execute the computer-executable instructions stored in the memory 1103, so as to implement the face image synthesis method provided in the embodiments of the present application.
  • the computer execution instructions in the embodiments of the present application may also be referred to as application program codes, instructions, computer programs, or other names, which are not specifically limited in the embodiments of the present application.
  • the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 11.
  • the device may include multiple processors, such as the processor 1101 and the processor 1105 in FIG. 11. Each of these processors can be a single-core processor or a multi-core processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the above-mentioned device may be a general-purpose device or a special-purpose device, and the embodiment of the present application does not limit the type of the device.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the device.
  • the device may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the embodiment of the present application also provides a computer-readable storage medium that stores computer instructions; when the computer instructions are run on a server, the server executes the above-mentioned related method steps to realize the face image synthesis method in the above-mentioned embodiments.
  • the embodiments of the present application also provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps, so as to realize the face image synthesis method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device, which may specifically be a component or a module.
  • the device may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions.
  • when the device runs, the processor can execute the computer-executable instructions stored in the memory, so that the device executes the face image synthesis method in the foregoing method embodiments.
  • the device, computer-readable storage medium, computer program product, or chip provided in the embodiments of the present application are all used to execute the corresponding method provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation, there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, modules or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for compositing a face image, relating to the field of artificial intelligence. The face image can be composited by performing attribute information correction on the basis of a real face image, the authenticity of a composited face image is improved, and the compositing efficiency is improved. The method comprises: obtaining first attribute information, the first attribute information being attribute information comprised in a face image to be composited; according to the first attribute information, searching a real face image library for a first face image, the first face image comprising second attribute information, and a repetitive rate of the second attribute information and the first attribute information satisfying a threshold requirement; obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used for representing an attribute difference between the first face image and the face image to be composited; performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and compositing a second face image according to the first facial feature information and the attribute difference information.

Description

Face image synthesis method and device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on February 29, 2020, with application number 202010132570.1 and invention title "Face image synthesis method and device", the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence (AI), and in particular to a face image synthesis method and device.
Background
Face image synthesis technology is widely used in fields such as photographing entertainment and medical plastic surgery. For example, in the field of photographing entertainment, users can change several attributes of a photo, such as double eyelids, big eyes, and face slimming. For another example, in the field of medical plastic surgery, a doctor can modify the current user's photo based on the user's description to generate a postoperative effect image.
At present, face image synthesis methods mainly include the following two. The first is the traditional image processing method: the usual practice is to build a template library containing a variety of partial images of facial features. A painter then selects facial features from the template library according to the witness's description and splices them, and finally smooths the edges of the spliced image to generate a face image. However, simple splicing of partial facial-feature images can hardly guarantee the authenticity of the synthesized face image. Moreover, subject to the two subjective influences of the painter and the witness, there may be a gap between the synthesized face image and the actually required face image.
The second is a deep learning method, which uses massive face image data to train a deep neural network by an adversarial generation method. The trained neural network is subsequently used to generate face images. However, the trained neural network cannot synthesize a face image containing user-specified attribute information.
Summary of the invention
The face image synthesis method provided in this application can realize face image synthesis based on a real face image, can obtain a face image that meets the requirements and is more realistic, and has high synthesis efficiency.
In order to achieve the above objectives, this application adopts the following technical solutions:
In a first aspect, this application provides a face image synthesis method, which may include: acquiring first attribute information, the first attribute information being attribute information contained in a face image to be synthesized; searching a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, and the repetition rate of the second attribute information and the first attribute information meeting a threshold requirement; obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to indicate the attribute difference between the first face image and the face image to be synthesized; performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and synthesizing a second face image according to the first facial feature information and the attribute difference information.
The first attribute information may be collected by the user and input into the face image synthesis device, and the face image synthesis device performs face image synthesis based on the attribute information. Exemplarily, the process of collecting the first attribute information includes: the face image synthesis device creates a face image attribute information questionnaire according to all the attribute information required to synthesize the face image, and sends the questionnaire to a terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, whereby the face image synthesis device obtains the attribute information. In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized from multiple dimensions, so that the finally synthesized face image is closer to the desired face image.
The attribute difference information is used to indicate the difference between the attribute information contained in the found first face image and the first attribute information, so that the attribute information that needs to be corrected in the first face image can be determined, and the first face image can then be corrected according to the attribute difference information.
In this way, the face image synthesis method provided by the embodiment of the present application may not require the participation of professionals, has high efficiency, and is convenient to promote. Moreover, performing face image synthesis based on a real face image can obtain a more realistic face image.
In a possible implementation, obtaining the attribute difference information according to the first attribute information and the second attribute information includes: obtaining a first attribute vector according to the first attribute information; obtaining a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and obtaining a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
Exemplarily, if the values of the first attribute vector and the second attribute vector at a corresponding position are different, indicating that the attribute information corresponding to that position needs to be modified, then in the output first vector the value at that position is set to the value at that position in the first attribute vector. If the values of the first attribute vector and the second attribute vector at a corresponding position are the same, indicating that the attribute information corresponding to that position does not need to be modified, then the value at that position in the output first vector can be set to a meaningless symbol. In this way, after the first vector representing the attribute difference information is obtained, the attribute information of the first face image can subsequently be corrected according to the values at the meaningful positions in the first vector.
In a possible implementation, performing facial feature extraction on the first face image to obtain the first facial feature information of the first face image includes: inputting the first face image into a first neural network to obtain a second vector, where the first neural network is used to extract facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
The face image may include multiple pieces of facial feature information that cannot be exhaustively listed; this facial feature information can be used to represent the corresponding face image, and the above attribute information contained in the first face image may be part of the facial feature information of the first face image. Each person's facial feature information is different, and facial feature information can be used to specifically distinguish different people; for example, Zhang San's facial feature information is different from Li Si's facial feature information, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Further, facial feature extraction is performed to obtain facial feature information that can represent the face image.
In this way, the first neural network is used to perform facial feature extraction on the first face image to obtain the first facial feature information representing the first face image.
In a possible implementation, synthesizing the second face image according to the first facial feature information and the attribute difference information includes: obtaining a first vector and a second vector, where the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and splicing the first vector and the second vector and inputting them into a second neural network to obtain the second face image, where the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
The splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the part of the second vector that represents the attribute information according to the first vector. For example, the parts of the first vector and the second vector that represent attribute information have the same length, and each bit corresponds to one type of face image attribute information. In this way, the second neural network can directly correct the value at the corresponding position in the second vector according to the value in the first vector, thereby correcting the first face image according to the attribute difference information. Moreover, the attribute information of the first face image other than the attribute information that needs to be corrected can be changed as little as possible.
In a possible implementation, before the facial feature extraction is performed on the first face image to obtain the first facial feature information, the method further includes: initializing the second neural network; obtaining a first real face image, a vector corresponding to the first real face image, and a vector corresponding to the attribute difference information, where the vector corresponding to the first real face image represents the facial feature information of the first real face image; inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, which outputs a synthesized face image containing the attribute difference information; inputting the first real face image and the corresponding synthesized face image into a third neural network and a fourth neural network, where the third neural network determines a first probability that an input face image is a real face image, and the fourth neural network determines a second probability that an input face image contains the attribute difference information; and iteratively training the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network, adjusting the weights of the parameters of the second neural network during the iterative training. If the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, training stops, yielding a second neural network that can be used to synthesize face images.
Exemplarily, the second neural network can be trained with a generative adversarial network (GAN) method. The third neural network determines the first probability that the synthesized face image output by the second neural network is a real face image, and the fourth neural network determines the second probability that this synthesized face image contains the attribute difference information. The weights of the parameters of the second neural network are adjusted according to these two probabilities, so that the face image synthesized by the second neural network differs as little as possible from the corresponding real face image, looks more realistic, and contains the specified attribute information such as the attribute difference information.
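A minimal sketch of one generator update under this two-discriminator GAN scheme follows. The module definitions, optimizer, thresholds, and the assumption that both discriminators emit probabilities in (0, 1) are illustrative; the patent fixes none of these details:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def generator_step(G, D_real, D_attr, opt_g, feat_vec, diff_vec,
                   p1_thresh=0.9, p2_thresh=0.9):
    """One update of the second neural network G. The third network D_real
    scores realness; the fourth network D_attr scores whether the attribute
    differences are present. Returns True when both thresholds are met."""
    fake = G(torch.cat([feat_vec, diff_vec], dim=1))  # synthesized face image
    p1 = D_real(fake)   # first probability: image judged real
    p2 = D_attr(fake)   # second probability: contains attribute differences
    # Push both probabilities toward 1 so the output is realistic and
    # carries the requested attribute changes.
    loss = bce(p1, torch.ones_like(p1)) + bce(p2, torch.ones_like(p2))
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return bool(p1.mean() > p1_thresh and p2.mean() > p2_thresh)
```

In a full training loop the third and fourth networks would be updated in alternation with the second network; that half of the loop is omitted here for brevity.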
In a possible implementation, the method further includes: obtaining attribute adjustment information fed back by the user; performing facial feature extraction on the second face image to obtain second facial feature information of the second face image; and synthesizing a third face image according to the second facial feature information and the attribute adjustment information.
The attribute adjustment information fed back by the user indicates how the user wants to adjust the attributes of the synthesized face image; through an interaction process with the user, the face image synthesis device adjusts the attribute information contained in the face image. This covers, for example, detail features of a face that are difficult to describe completely, or attribute information that was not captured when the user's attribute information was first collected and that can be adjusted with the attribute adjustment information. For example, adjustments the user draws directly on the synthesized second face image can be captured, the adjusted attribute information can be recognized, and the second face image can be corrected directly with that information, thereby obtaining a realistic face image that is closer to the user's needs.
In a possible implementation, synthesizing the third face image according to the second facial feature information and the attribute adjustment information includes: obtaining a third vector according to the attribute adjustment information fed back by the user, where the third vector represents the attribute information of the second face image that needs to be adjusted; inputting the second face image into a fifth neural network to obtain a fourth vector, where the fifth neural network is used to extract the facial feature information of an input face image and the fourth vector represents the second facial feature information of the second face image; and concatenating the third vector and the fourth vector and inputting the result into a sixth neural network to obtain the third face image, where the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
In this way, the attribute information can be converted directly into a vector problem, and a neural network can be used to synthesize the face image so that the synthesized face image changes as little as possible except for the parts that need to be adjusted.
In a possible implementation, before the third face image is synthesized according to the second facial feature information and the attribute adjustment information, the method further includes: initializing the sixth neural network; obtaining a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information, where the vector corresponding to the second real face image represents the facial feature information of the second real face image; inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, which outputs a synthesized face image containing the attribute adjustment information; inputting the second real face image and the corresponding synthesized face image into a seventh neural network and an eighth neural network, where the seventh neural network determines a third probability that an input face image is a real face image, and the eighth neural network determines a fourth probability that the segmentation maps of the second real face image and the corresponding synthesized face image are consistent; and iteratively training the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network, adjusting the weights of the parameters of the sixth neural network during the iterative training. If the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, training stops, yielding a sixth neural network that can be used to synthesize face images.
Exemplarily, the sixth neural network can also be trained with the GAN method. The seventh neural network determines the probability that the synthesized face image output by the sixth neural network is a real face image, and the eighth neural network determines the probability that the segmentation map of this synthesized face image is consistent with that of the corresponding real face image. The weights of the parameters of the sixth neural network are adjusted according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network, so that the face image synthesized by the sixth neural network differs as little as possible from the corresponding real face image, looks more realistic, and has a similar segmentation map.
In a possible implementation, the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information.
The face shape information may include the shape of the face, the height of the cheekbones, and the like. The skin condition information may include wrinkles, spots, beards, scars, and other features of the skin. The accessory information may include glasses, masks, hats, and other worn items.
In this way, a real face image similar to the face image to be synthesized can be found across multiple dimensions, so that a face image can be synthesized from a real face image by changing only a small amount of attribute information, making the synthesized face image more realistic.
In a possible implementation, before the first attribute information is obtained, the method further includes: establishing a real face image library that contains real face images and the attribute information contained in those images.
Exemplarily, the real face image library includes the attribute information contained in each real face image, so that when searching for the first face image, the corresponding real face image can be determined directly from the repetition rate of the attribute information.
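One plausible shape for such a library entry is sketched below; the class and field names are chosen for illustration and are not specified by the text:

```python
from dataclasses import dataclass, field

@dataclass
class FaceRecord:
    """One entry of the real face image library: the stored image plus the
    attribute information extracted from it (field names are illustrative)."""
    image_path: str
    attributes: dict = field(default_factory=dict)

record = FaceRecord(
    image_path="faces/000123.png",
    attributes={"gender": "male", "age": 35, "face_shape": "pointed chin",
                "hairstyle": "short hair"},
)
```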
In a second aspect, this application provides a face image synthesis apparatus, which may include an acquisition unit and a processing unit. The acquisition unit is configured to obtain first attribute information, the attribute information contained in the face image to be synthesized. The processing unit is configured to: search a real face image library for a first face image according to the first attribute information, where the first face image contains second attribute information and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement; obtain attribute difference information according to the first attribute information and the second attribute information, the attribute difference information representing the attribute difference between the first face image and the face image to be synthesized; perform facial feature extraction on the first face image to obtain first facial feature information of the first face image; and synthesize a second face image according to the first facial feature information and the attribute difference information.
In a possible implementation, the processing unit is specifically configured to obtain a first attribute vector according to the first attribute information and a second attribute vector according to the second attribute information, where the first and second attribute vectors have the same length and each position corresponds to one type of attribute information, and to obtain a first vector from the difference between the first attribute vector and the second attribute vector, the first vector representing the attribute difference information.
In a possible implementation, the processing unit is specifically configured to input the first face image into a first neural network to obtain a second vector, where the first neural network is used to extract the facial feature information of an input face image and the second vector represents the first facial feature information of the first face image.
In a possible implementation, the processing unit is specifically configured to obtain the first vector, which represents the attribute difference information, and the second vector, which represents the first facial feature information of the first face image, and to concatenate the two vectors and input the result into the second neural network to obtain the second face image, where the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
In a possible implementation, the processing unit is further configured to: initialize the second neural network; obtain a first real face image, a vector corresponding to the first real face image, and a vector corresponding to the attribute difference information, where the vector corresponding to the first real face image represents the facial feature information of the first real face image; input these two vectors into the second neural network, which outputs a synthesized face image, corresponding to the first real face image, that contains the attribute difference information; input the first real face image and the corresponding synthesized face image into the third and fourth neural networks, where the third neural network determines the first probability that an input face image is a real face image and the fourth neural network determines the second probability that an input face image contains the attribute difference information; and iteratively train the second neural network according to these two probabilities, adjusting the weights of its parameters, stopping once the first probability exceeds the first threshold and the second probability exceeds the second threshold, yielding a second neural network that can be used to synthesize face images.
In a possible implementation, the acquisition unit is further configured to obtain attribute adjustment information fed back by the user. The processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image, and to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
In a possible implementation, the processing unit is specifically configured to: obtain a third vector according to the attribute adjustment information fed back by the user, where the third vector represents the attribute information of the second face image that needs to be adjusted; input the second face image into the fifth neural network to obtain a fourth vector, where the fifth neural network is used to extract the facial feature information of an input face image and the fourth vector represents the second facial feature information of the second face image; and concatenate the third and fourth vectors and input the result into the sixth neural network to obtain the third face image, where the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
In a possible implementation, the processing unit is further configured to: initialize the sixth neural network; obtain a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information, where the vector corresponding to the second real face image represents its facial feature information; input these two vectors into the sixth neural network, which outputs a synthesized face image, corresponding to the second real face image, that contains the attribute adjustment information; input the second real face image and the corresponding synthesized face image into the seventh and eighth neural networks, where the seventh neural network determines the third probability that an input face image is a real face image and the eighth neural network determines the fourth probability that the segmentation maps of the second real face image and the corresponding synthesized face image are consistent; and iteratively train the sixth neural network according to these two probabilities, adjusting the weights of its parameters, stopping once the third probability exceeds the third threshold and the fourth probability exceeds the fourth threshold, yielding a sixth neural network that can be used to synthesize face images.
In a possible implementation, the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information.
In a possible implementation, the processing unit is further configured to establish a real face image library containing real face images and the attribute information contained in those images.
In a third aspect, this application provides a face image synthesis apparatus that may include one or more processors, a memory, and one or more instructions, where the one or more instructions are stored in the memory and, when executed by the one or more processors, cause the face image synthesis apparatus to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a fourth aspect, this application provides an apparatus having the function of implementing the face image synthesis method described in the first aspect or any of its possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a fifth aspect, this application provides a computer-readable storage medium including computer instructions that, when run on a computer, cause a processor to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a sixth aspect, this application provides a computer program product that, when run on a server, causes a face image synthesis apparatus to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a seventh aspect, a circuit system is provided, including a processing circuit configured to perform the face image synthesis method described in the first aspect or any of its possible implementations.
Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of a face image synthesis method according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 3 is a schematic diagram of the hardware structure of a chip according to an embodiment of this application;
FIG. 4 is a first schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 5 is a second schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 6 is a first schematic diagram of the training procedure of a face image synthesis neural network according to an embodiment of this application;
FIG. 7 is a third schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 8 is a fourth schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 9 is a second schematic diagram of the training procedure of a face image synthesis neural network according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a face image synthesis apparatus according to an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware structure of a face image synthesis apparatus according to an embodiment of this application.
Detailed Description
The face image synthesis method and apparatus provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a face image synthesis system that includes a face image synthesis device 110 and a terminal device 120, which may be connected through a wired or wireless network. The embodiments of this application do not specifically limit the connection mode between the devices.
The terminal device 120 provides a human-computer interaction interface through which the user inputs the parameters required for synthesizing a face image, such as the attribute information of the face image to be synthesized and the attribute adjustment information. Exemplarily, the attribute information of the face image to be synthesized may include gender information, age information, face shape information, and other attribute information that can describe facial features. Exemplarily, the terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a personal digital assistant (PDA), a netbook, a desktop computer, a laptop, a handheld computer, a notebook computer, an artificial intelligence (AI) terminal, or another terminal device. The embodiments of this application place no special restriction on the specific form of the terminal device 120.
The face image synthesis device 110 may be a device or server with image search and image synthesis functions, such as a cloud server or a network server. The face image synthesis device 110 receives the attribute information, attribute adjustment information, and other information sent by the terminal device 120 through an interaction interface, and its processor searches for real face images based on the real face image library stored in the memory. The processor then synthesizes a face image from the found real face image and the obtained attribute information, and may use the obtained attribute adjustment information to further adjust the attributes of the synthesized face image. The final synthesized face image is sent to the corresponding terminal device 120. The memory in the face image synthesis device 110 may be a general term that includes local storage and a database storing historical face images; the database may reside on the face image synthesis device or on another cloud server.
It should be noted that the face image synthesis device 110 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
For example, in FIG. 1, a server acting as the face image synthesis device 110 can execute the face image synthesis method of the embodiments of this application.
For another example, in FIG. 1, the terminal device 120 may itself act as the face image synthesis device: it receives from the user the attribute information and/or attribute adjustment information of the face image to be synthesized, performs the synthesis task itself, and produces the required face image. That is, the terminal device 120 itself can execute the face image synthesis method of the embodiments of this application.
FIG. 2 shows an example of a system architecture provided by an embodiment of this application.
As shown in the system architecture of FIG. 2, the face image synthesis device 110 is equipped with a transceiver interface 211 for data exchange with external devices. Through the transceiver interface 211, the face image synthesis device 110 receives input data transmitted by the terminal device 120, which in the embodiments of this application may include the attribute information of the face image to be synthesized and the attribute adjustment information.
The face image collection module 240 is used to collect real face images; for example, in the embodiments of this application the real face images may be collected face images of the local resident population. After collecting the real face images, the face image collection module 240 stores them in the database 230. The database 230 may also include a real face image library 231 for storing the real face images gathered by the face image collection module 240, and may further store face images used to train the face image synthesis device 110.
It should be noted that in practical applications, the real face images maintained in the database 230 do not necessarily all come from the face image collection module 240; they may also be received from other devices, for example information sent by the terminal device 120 to expand the real face image library 231.
The attribute information collection module 212 is used to collect attribute information 201, which may include, for example, the attribute information of the face image to be synthesized and the attribute adjustment information. Specifically, the attribute information collection module 212 collects, through the transceiver interface 211, the attribute information of the face image to be synthesized that is required for synthesis, and, during the synthesis process, the attribute adjustment information that the user inputs through the terminal device 120.
The search module 213 is used to search the real face image library 231, based on the attribute information 201, for a real face image 202 whose repetition rate with the attribute information of the face image to be synthesized meets the threshold requirement, that is, a real face image that is close to the face image to be synthesized.
The generation module 214 is used to process the real face image 202 based on the attribute information 201 to obtain a synthesized face image 203, for example by adding, removing, or correcting certain attribute information in the real face image according to the attribute information of the face image to be synthesized, such as correcting the hairstyle attribute of the real face image 202 from short hair to long hair. The synthesized face image 203 is output to the terminal device 120 through the transceiver interface 211.
The face image attribute adjustment module 215 is used to fine-tune the attributes of the synthesized face image based on the face image generated by the generation module 214 and the attribute adjustment information collected by the attribute information collection module 212, thereby obtaining a face image that better matches the user's needs.
It should also be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of this application; the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation. For example, in FIG. 2 the database 230 is external memory relative to the face image synthesis device 110, but in other cases the database 230 may also be placed inside the face image synthesis device 110.
Since the embodiments of this application involve extensive use of neural networks, the related terms and concepts that may be involved are introduced first to aid understanding.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be

$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce a nonlinearity into the neural network so as to convert the input signal of the unit into an output signal; the output of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract its features; a local receptive field may be a region composed of several neural units.
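A direct transcription of this unit into code, taking sigmoid as the activation $f$ mentioned above (the input and weight values are arbitrary):

```python
import numpy as np

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b) with f = sigmoid."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([0.5, -1.2, 3.0])        # inputs x_s
w = np.array([0.8, 0.1, -0.4])        # weights W_s
print(neural_unit(x, w, b=0.2))       # a value in (0, 1)
```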
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN by the position of its layers, the layers can be classified into three types: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in layer i is connected to every neuron in layer i+1.
Although a DNN looks complicated, the work of each layer is in fact not complicated; simply put, it is the nonlinear expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 denotes the layer of the coefficient $W$, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$.
It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers allow the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained network (the weight matrices formed by the vectors W of the many layers).
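As an illustration (not part of the disclosure), the per-layer computation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be stacked into a forward pass as follows, with arbitrary layer sizes and ReLU standing in for the activation $\alpha$:

```python
import numpy as np

def forward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Fully connected forward pass: each layer computes a(W x + b).
    Row j, column k of each W corresponds to the coefficient W^L_jk above."""
    a = lambda z: np.maximum(z, 0.0)   # ReLU as an example activation
    for W, b in zip(weights, biases):
        x = a(W @ x + b)
    return x

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]                   # input, two hidden layers, output
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.standard_normal(4), Ws, bs))
```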
(3) Loss function
In the process of training a deep neural network, because the output of the network should be as close as possible to the value that is actually desired, the predicted value of the current network can be compared with the truly desired target value, and the weight vector of each layer can then be updated according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the network can predict the truly desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value; this is the purpose of the loss function or objective function, important equations used to measure that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(4) Back propagation algorithm
A neural network can use the error back propagation (BP) algorithm to correct the parameter values of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error loss information so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, aimed at obtaining better parameters for the neural network model, for example, the weight matrices.
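A minimal numeric illustration of this loss-driven update, using a single linear unit and a squared-error loss (the model, data, and learning rate are arbitrary choices for the example):

```python
import numpy as np

# Forward: prediction y_hat = W x + b; loss = 0.5 * (y_hat - y)^2.
x, y = np.array([1.0, 2.0]), 3.0
W, b, lr = np.array([0.5, -0.3]), 0.1, 0.05

for step in range(100):
    y_hat = W @ x + b
    err = y_hat - y        # dLoss/dy_hat for the squared-error loss
    # Back propagation for this single layer: the chain rule gives the
    # gradients, and the weights move against them to shrink the loss.
    W -= lr * err * x      # dLoss/dW = err * x
    b -= lr * err          # dLoss/db = err
print(W @ x + b)           # converges toward the target 3.0
```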
FIG. 3 shows the hardware structure of a chip provided by an embodiment of this application; the chip includes a neural-network processing unit (NPU) 300. The chip can be provided in the face image synthesis device 110 shown in FIG. 2 to complete all or part of the work of the attribute information collection module 212 (for example, collecting the attribute information 201), of the search module 213 (for example, searching for the real face image 202), of the generation module 214 (for example, generating the synthesized face image 203), or of the face image attribute adjustment module 215 (for example, adjusting the attribute information of the synthesized face image 203 generated by the generation module 214).
The neural-network processing unit NPU 300 is mounted as a coprocessor on a host central processing unit (host CPU) 320, which allocates tasks. The core of the NPU 300 is the operation circuit 303; the controller 304 controls the operation circuit 303 to fetch data from memory (the weight memory or the input memory) and perform computations.
In some implementations, the operation circuit 303 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 303 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 303 is a general-purpose matrix processor.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the circuit. It then fetches the matrix A data from the input memory 301 and performs the matrix operation with matrix B; partial or final results of the resulting matrix are stored in the accumulator 308.
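The role of the accumulator can be pictured in software as a blocked matrix multiply whose partial tile products are summed into the output; this is only an analogy, since the actual PE array operates on hardware tiles:

```python
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, tile: int = 2) -> np.ndarray:
    """C = A @ B computed tile by tile; partial results accumulate in C,
    mirroring the role of the accumulator 308 in the NPU."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))                              # the "accumulator"
    for t in range(0, k, tile):
        C += A[:, t:t + tile] @ B[t:t + tile, :]      # partial result per tile
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(blocked_matmul(A, B), A @ B)
```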
The vector calculation unit 307 can further process the output of the operation circuit 303, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for the non-convolutional/non-FC layer computations of a neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 307 stores the processed output vector in the unified memory 306. For example, the vector calculation unit 307 may apply a nonlinear function to the output of the operation circuit 303, such as to a vector of accumulated values, to generate activation values.
In some implementations, the vector calculation unit 307 generates normalized values, combined values, or both.
In some implementations, the processed output vector can be used as the activation input to the operation circuit 303, for example for use in a subsequent layer of the neural network.
The unified memory 306 is used to store input data and output data. The direct memory access controller (DMAC) 305 moves input data in the external memory 330 into the input memory 301 and/or the unified memory 306, moves weight data in the external memory 330 into the weight memory 302, and moves data in the unified memory 306 into the external memory 330.
The bus interface unit (BIU) 310 is used for interaction among the host CPU 320, the DMAC, and the instruction fetch buffer 309 over the bus.
The instruction fetch buffer 309 connected to the controller 304 stores the instructions used by the controller 304; the controller 304 invokes the instructions cached in the instruction fetch buffer 309 to control the working process of the computation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch buffer 309 are all on-chip memories, while the external memory 330 is memory external to the NPU 300; the external memory 330 may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
The face image synthesis method of the embodiments of this application is described in detail below with reference to FIG. 4. The method may be executed by devices such as the face image synthesis device 110 in FIG. 1 or FIG. 2.
As shown in FIG. 4, an embodiment of this application provides a schematic flowchart of a face image synthesis method; the method may include S101-S105:
S101. Acquire first attribute information.
The first attribute information is the attribute information contained in the face image to be synthesized. The attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information. For example, the face shape information may include the shape of the face and the height of the cheekbones; the skin condition information may include wrinkles, spots, beards, and scars; the accessory information may include glasses, masks, and hats.
To synthesize a face image, the attribute information of the face image to be synthesized must be collected first, and the synthesized face image must then contain this attribute information.
Exemplarily, the attribute information may be collected as follows: the face image synthesis device creates a face image attribute information questionnaire from all the attribute information required to synthesize the face image and sends it to the terminal device; an eyewitness fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information. In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized across multiple dimensions, so that the final synthesized face image is closer to the required face image.
Table 1 below gives an example of the content of a face image attribute information questionnaire; here it is confirmed that the first attribute information of the face image to be synthesized includes: male aged 30-40, pointed chin, big eyes, high nose bridge, thin lips, short hair, slanted bangs.
Table 1
No.  Attribute          Details
1    Age                30-40
2    Gender             Male
3    Race               --
4    Skin color         --
5    Face shape         Pointed chin
6    Facial features    Big eyes, high nose bridge, thin lips
7    Skin condition     --
8    Accessories        --
9    Hairstyle          Short hair, slanted bangs
10   Makeup             --
It should be noted that Table 1 is only one possible implementation of the face image attribute information questionnaire; the attribute information can also be obtained through questionnaires of other forms. For example, all possible attribute information can be listed in the questionnaire and the attribute information determined from the options the user ticks. As another example, an adjustable progress bar indicating the degree of each attribute can be provided in the questionnaire, and the specific value of the attribute obtained from how far the user moves the bar; for the hairstyle attribute, for instance, a progress bar at 20% may indicate short hair and at 80% long hair.
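One way such a progress bar could be mapped onto a discrete attribute code is sketched below; the nine-interval encoding follows the example given later in step S103, and the function name is illustrative:

```python
def slider_to_code(percent: float) -> int:
    """Map a 0-100% progress-bar position to an attribute code in 1-9,
    e.g. hair length from shortest to longest."""
    percent = min(max(percent, 0.0), 100.0)
    return min(int(percent / 100.0 * 9) + 1, 9)

print(slider_to_code(20.0))   # 2: fairly short hair
print(slider_to_code(80.0))   # 8: long hair
```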
S102. According to the first attribute information, search the real face image library for a first face image, where the first face image contains second attribute information and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement.
The real face image library is used to store real face images. Exemplarily, the library is established by collecting real face images of the local resident population. Optionally, the library also includes the attribute information contained in each real face image, so that when searching for the first face image, the corresponding real face image can be determined directly from the repetition rate of the attribute information. Optionally, a large number of real face images can be collected in advance to establish the library, which is then periodically updated and expanded.
The first face image is a face image found in the real face image library and is therefore a real face image. The repetition rate between the second attribute information contained in the first face image and the first attribute information must also meet a threshold requirement, so that a real face image closer to the face image to be synthesized can be found. The threshold can be set from an empirical value, for example requiring the repetition rate between the first and second attribute information to be at least 80%. Further, if multiple first face images meeting the threshold requirement are found, the one with the highest repetition rate can be taken as the first face image for the subsequent synthesis.
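This search step might be sketched as follows; the repetition rate is computed here as the fraction of requested attributes that the candidate matches, which is one plausible metric since the text does not fix one:

```python
def repetition_rate(wanted: dict, candidate: dict) -> float:
    """Fraction of the requested attribute entries the candidate matches."""
    if not wanted:
        return 0.0
    hits = sum(1 for k, v in wanted.items() if candidate.get(k) == v)
    return hits / len(wanted)

def find_first_face(wanted: dict, library: list, threshold: float = 0.8):
    """Return the library record (a dict with an 'attributes' field) that has
    the highest repetition rate and meets the threshold, else None."""
    best = max(library,
               key=lambda rec: repetition_rate(wanted, rec["attributes"]),
               default=None)
    if best and repetition_rate(wanted, best["attributes"]) >= threshold:
        return best
    return None
```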
Exemplarily, Table 2 below shows the second attribute information contained in the first face image found based on the first attribute information in Table 1; here it is confirmed that the second attribute information includes: 35-year-old male, pointed chin, big eyes, high nose bridge, thin lips, short hair.
Table 2
No.  Attribute          Details
1    Age                35
2    Gender             Male
3    Race               --
4    Skin color         --
5    Face shape         Pointed chin
6    Facial features    Big eyes, high nose bridge, thin lips
7    Skin condition     --
8    Accessories        --
9    Hairstyle          Short hair
10   Makeup             --
In this way, the face image is synthesized on the basis of a real first face image that has been found, so the synthesized face image can also be closer to a real face image.
S103. Obtain attribute difference information according to the first attribute information and the second attribute information.
The attribute difference information represents the attribute difference between the first face image and the face image to be synthesized, that is, the attribute information that needs to be added, deleted, or corrected on the first face image.
Optionally, a first attribute vector is obtained from the first attribute information and a second attribute vector from the second attribute information, where the two attribute vectors have the same length and each position corresponds to one type of attribute information. A first vector is obtained from the difference between the first attribute vector and the second attribute vector; the first vector represents the attribute difference information.
For example, the correspondence between the attribute information and each position of the attribute vector can be configured in advance. For instance, according to the face image attribute information questionnaire of step S101, each position of the attribute vector corresponds, in order, to one type of attribute information. If the questionnaire contains three types of attribute information, namely gender, hairstyle, and skin color, then the first attribute vector and the second attribute vector used to represent the attribute information both have length 3, and each position corresponds to one attribute. For the gender attribute, 0 may represent female and 1 male. For the hairstyle attribute, hair length may be divided into nine intervals from short to long, corresponding to the numbers 1-9, with 0 indicating that the attribute was not collected. For the skin color attribute, skin tone may likewise be divided into nine intervals from light to dark, corresponding to 1-9, again with 0 indicating that the attribute was not collected. In the first vector, a character with no numeric meaning, such as X, may indicate that the first attribute information and the second attribute information are the same at that position; then, when the attribute difference information represented by the first vector is later used to modify the attribute information of the first face image, the attribute information of the first face image at the positions of the meaningless characters does not need to be modified. Exemplarily, as shown in Table 3 below, the collected first attribute information is male with short hair, giving the first attribute vector (1, 2, 0). As shown in Table 4 below, the second attribute information contained in the found first face image is male, medium-length hair, fair skin, giving the second attribute vector (1, 7, 3). The two attribute vectors differ at the 2nd and 3rd positions, so the attribute difference information can be obtained from the difference between the two vectors: the first vector carries the placeholder X at the 1st position and records the required change at the 2nd and 3rd positions.
Table 3

No.   Attribute    Details
1     Gender       1
2     Hairstyle    2
3     Skin color   0
Table 4

No.   Attribute    Details
1     Gender       1
2     Hairstyle    7
3     Skin color   3
Exemplarily, as shown in FIG. 5, a first attribute vector 511 is obtained according to the acquired first attribute information 51, and a second attribute vector 521 is obtained according to the acquired second attribute information 52. A first vector 53 is then obtained from the first attribute vector 511 and the second attribute vector 521. For example, the first attribute information contained in Table 1 above and the second attribute information contained in Table 2 above are obtained; the first attribute vector and the second attribute vector are obtained from them respectively; and the first vector is obtained by comparing the two attribute vectors through bitwise subtraction. In this example, the first vector indicates that the attribute difference information is oblique bangs.
In this way, the second attribute information contained in the first face image can be compared with the collected first attribute information to determine the attribute information that needs to be added, deleted, or corrected on the first face image; a small sketch of this comparison follows. After that, S104 is executed, that is, the first face image is adjusted according to the attribute difference information to obtain a second face image that is closer to the face image to be synthesized.
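As an illustration only, the following sketch shows one way the comparison in S103 could be implemented in code. The three-position encoding follows the gender/hairstyle/skin-color example above; the function name, the use of Python, and the choice of "X" as the meaningless marker are assumptions for this sketch, not part of the claimed method.

```python
# A sketch of the S103 comparison: build the first vector from the first
# and second attribute vectors, marking unchanged positions with "X".

def attribute_difference(first_attr, second_attr, same_marker="X"):
    """Return the first vector: the collected target value where the two
    attribute vectors differ, a meaningless marker where they agree."""
    assert len(first_attr) == len(second_attr)
    return [a if a != b else same_marker
            for a, b in zip(first_attr, second_attr)]

first_attribute_vector = [1, 2, 0]   # male, short hair, skin not collected
second_attribute_vector = [1, 7, 3]  # male, medium-length hair, fair skin

first_vector = attribute_difference(first_attribute_vector,
                                    second_attribute_vector)
print(first_vector)  # ['X', 2, 0]: hairstyle and skin color need correction
```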
S104: Perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
A face image can contain more items of facial feature information than can be exhaustively listed, and this facial feature information can represent the corresponding face image. The attribute information of the first face image described above may be the part of the first face image's facial feature information that is easy for a user to observe, remember, and describe. Each person's facial feature information is different, so the similarity of facial feature information can be used to distinguish different people. For example, Zhang San's facial feature information differs from Li Si's, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Facial feature extraction, then, is performed to obtain facial feature information that can represent the face image.
Optionally, a neural network is used to extract the facial features of the first face image and convert the first face image into a second vector representing the first facial feature information; the second vector contains content representing attribute information. For example, the first face image is fed to the input layer of the neural network, whose operators extract the facial feature information contained in the image and form a high-dimensional matrix representing that information. The matrix is passed to the hidden layers, where the operators of each layer progressively reduce its dimensionality. Finally, the output layer of the neural network outputs the second vector.
Exemplarily, as shown in FIG. 5, the first face image 54 is input to the first neural network 55 to obtain the second vector 56. The first neural network 55 extracts the facial feature information of an input face image, and the second vector 56 represents the first facial feature information of the first face image 54. A sketch of such an encoder follows.
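As a hedged illustration of S104, the sketch below implements a small convolutional encoder of the kind described: an input layer that extracts feature maps, hidden layers that reduce dimensionality, and an output layer that emits the second vector. The architecture, dimensions, and PyTorch framing are assumptions for this example; the patent does not fix a specific network.

```python
# A sketch of the first neural network in S104: a convolutional encoder
# mapping a face image to the second vector (its facial feature information).
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    def __init__(self, feature_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 32 -> 16
            nn.AdaptiveAvgPool2d(1),   # hidden layers reduce dimensionality
        )
        self.fc = nn.Linear(128, feature_dim)  # output layer

    def forward(self, image):
        h = self.features(image).flatten(1)
        return self.fc(h)  # the second vector

encoder = FaceEncoder()
first_face_image = torch.randn(1, 3, 128, 128)  # placeholder input image
second_vector = encoder(first_face_image)       # shape: (1, 256)
```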
S105: Synthesize a second face image according to the first facial feature information and the attribute difference information.
Optionally, steps S103 and S104 above yield the attribute difference information between the first face image and the face image to be synthesized, as well as the facial feature information of the first face image. The facial feature information of the first face image is then corrected according to the attribute difference information, synthesizing a face image that contains the attribute difference information and is closer to what is required.
For example, as shown in FIG. 5, the first vector 53 representing the attribute difference information and the second vector 56 representing the first facial feature information of the first face image are spliced together and input to the second neural network 57 to obtain the second face image 58. The second neural network 57 corrects the first facial feature information of the first face image 54 according to the attribute difference information. For example, if the attribute difference information obtained from Tables 1 and 2 above is oblique bangs, the second neural network 57 can synthesize oblique bangs onto the first face image 54 to obtain the second face image 58 in FIG. 5, which contains the attribute difference information (the oblique bangs).
The splicing of the first vector and the second vector may consist of appending the first vector directly after the second vector; the second neural network then corrects the second vector according to the first vector.
Exemplarily, in the first vector, the positions at which the first attribute information and the second attribute information are the same can be set to a meaningless symbol such as X, so that the second neural network can correct the second vector directly according to the content of the meaningful positions in the first vector. For example, if the second vector is expressed as the facial feature vector (f1, f2, ..., fn) and, following the example above, the first vector is expressed as (X, 2, 0), then the spliced vector is expressed as (f1, f2, ..., fn, X, 2, 0). In this way, the second neural network can correct the second vector directly according to the content of the meaningful positions in the first vector. For example, when the second neural network reads, in the attribute part of the spliced vector, a position whose value is meaningful, it obtains the attribute information corresponding to that value, such as oblique bangs. It then corrects the vector representing the facial feature information according to the obtained attribute information, and raises the corrected vector back to a higher dimension to obtain a synthesized face image containing oblique bangs. In this way, when synthesizing a face image, the second neural network can correct, on the basis of the attribute difference information, the attribute information contained in the found real face image, and thereby obtain a synthesized face image that contains the specific attribute information and is closer to a real face image. A sketch of this splicing-and-decoding step follows.
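The splicing-and-correction step of S105 could look roughly like the following sketch, in which the first vector is appended to the second vector and a small decoder (standing in for the second neural network) raises the result back to image dimensions. All dimensions, layer choices, and the encoding of "X" as 0 are illustrative assumptions.

```python
# A sketch of S105: splice the first vector (attribute differences) onto
# the second vector (facial features) and decode with a generator that
# stands in for the second neural network.
import torch
import torch.nn as nn

class FaceGenerator(nn.Module):
    def __init__(self, feature_dim=256, attr_dim=3):
        super().__init__()
        self.fc = nn.Linear(feature_dim + attr_dim, 128 * 16 * 16)
        self.deconv = nn.Sequential(  # raise the corrected vector back up
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, second_vector, first_vector):
        spliced = torch.cat([second_vector, first_vector], dim=1)  # splicing
        h = self.fc(spliced).view(-1, 128, 16, 16)
        return self.deconv(h)  # the second face image

generator = FaceGenerator()
second_vector = torch.randn(1, 256)
first_vector = torch.tensor([[0.0, 2.0, 0.0]])  # "X" encoded as 0 here
second_face_image = generator(second_vector, first_vector)  # (1, 3, 128, 128)
```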
Thus, with the face image synthesis method provided in the embodiments of this application, a real face image closer to the face image to be synthesized can be obtained from the collected face image attribute information; attribute difference information is obtained from the attribute information contained in that real face image and the collected face image attribute information; and a face image is then synthesized on the basis of the real face image according to the attribute difference information, yielding a face image that is more realistic and better meets the requirements. Compared with the prior-art method of having an artist paint a face image and then adjust it, the method provided here requires no professional participation, is more efficient, and is easy to popularize. Moreover, because the synthesis is based on a real face image, a more realistic face image can be obtained.
In some embodiments, before the second neural network used to synthesize face images is applied, it needs to be trained so that the face images it synthesizes are more realistic and reliable. The embodiments of this application train the second neural network with a method based on generative adversarial networks (GAN). The adversarial network used to train the second neural network (which can also be called the generation network) may include a third neural network and a fourth neural network. The third neural network judges the first probability that an input face image is a real face image. The fourth neural network judges the second probability that an input face image contains the attribute difference information. The specific training process is as follows.
Step 1: Initialize the second neural network.
Optionally, a second neural network usable for face image synthesis is constructed, with the weight matrices of its parameters set to initial values; a training process is then executed in which the weight matrix of each parameter is learned. As in the introduction to deep neural networks above, training the second neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained second neural network.
Step 2: Obtain a first real face image, the vector corresponding to the first real face image, and the vector corresponding to the attribute difference information.
Optionally, a real face image data set and an attribute difference information data set are first obtained; the former contains one or more first real face images, the latter one or more items of attribute difference information. Then, according to the methods introduced in steps S103 and S104 above, the vector corresponding to the first real face image, representing its first facial feature information, is obtained through the first neural network; the vector corresponding to the attribute difference information can be constructed directly. Further, a training data set containing multiple groups of training data can be obtained by pairing first real face images with attribute difference information.
Step 3: Input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information.
Optionally, in the process of training the second neural network, each input vector of a first real face image can be paired with an input vector of attribute difference information, and the synthetic face image corresponding to that first real face image is then output according to the face image synthesis method of step S105 above.
Exemplarily, as shown in FIG. 6, the vector 62 corresponding to the first real face image and the vector 63 corresponding to the attribute difference information are input to the second neural network 64, which synthesizes a face image and outputs a synthetic face image 65 containing the attribute difference information.
Step 4: Input the first real face image and its corresponding synthetic face image into the third neural network and the fourth neural network.
Optionally, the first real face image and its corresponding synthetic face image are input as one group of data into the third neural network and the fourth neural network respectively. The third neural network performs real/fake discrimination, judging the probability that the synthetic face image output by the second neural network is a real face image: the higher this probability, the stronger the second neural network's ability to synthesize realistic face images. The fourth neural network judges the probability that an input face image contains certain specific attribute information, such as the attribute difference information: the higher this probability, the stronger the second neural network's ability to synthesize the specific attribute information, i.e., to correct the attribute difference information on the first face image. In addition, from the probability that the input face image is real and the probability that it contains the specific attribute information, the difference between the synthetic face image and the corresponding first real face image can be judged, ensuring that, apart from the attribute difference information, the synthetic face image differs from the corresponding first face image as little as possible.
Exemplarily, referring to FIG. 6, the first real face image 61 and the synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67.
Step 5: Iteratively train the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network; during the iterative training, adjust the weights of the parameters of the second neural network.
Optionally, the output of the third neural network is the first probability that the input face image is judged to be a real face image, and the output of the fourth neural network is the second probability that the input face image is judged to contain the attribute difference information. That is, the outputs of the third and fourth neural networks measure the difference between the synthetic face image output by the second neural network and the input real face image, expressed through a loss function (or objective function): the higher the output value (loss) of the loss function, the larger the difference, and training the second neural network becomes the process of reducing this loss as far as possible.
The objective function contains three constraints. First, the synthetic face image output by the second neural network should differ from the corresponding real face image as little as possible: the smaller the difference, the smaller the objective function value. Second, the synthetic face image output by the second neural network should deceive the third neural network as far as possible, i.e., lead the third neural network to judge the synthetic face image to be a real face image: the higher the probability of being judged real, the smaller the objective function value. Third, the synthetic face image output by the second neural network needs to contain the specific attribute information (the attribute difference information): the higher the probability that it does, the smaller the objective function value. Training the second neural network can thus be achieved by minimizing the objective function.
For example, the outputs of the third and fourth neural networks are fed as operands into the loss function, and the result of the loss function is fed back into the second neural network for back-propagation. During back-propagation, gradient updates correct the weights of each parameter of the second neural network, finally yielding a better parameter weight matrix.
Exemplarily, as shown in FIG. 6, after the first real face image 61 and its corresponding synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67, the first probability output by the third neural network 66 and the second probability output by the fourth neural network 67 are fed into the objective function, and the second neural network 64 is trained by minimizing the objective function, roughly as sketched below.
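A minimal sketch of the three-part objective described above follows, assuming an L1 reconstruction term and negative log-likelihood terms for the two discriminator probabilities; the specific losses and weights are assumptions, since the patent only states the three constraints qualitatively.

```python
# A sketch of the three-constraint objective for the second neural network.
import torch
import torch.nn.functional as F

def second_network_objective(real_image, synthetic_image,
                             first_prob, second_prob,
                             w_rec=10.0, w_real=1.0, w_attr=1.0):
    # 1) differ from the corresponding real image as little as possible
    rec_loss = F.l1_loss(synthetic_image, real_image)
    # 2) be judged real by the third network: higher first_prob, lower loss
    real_loss = -torch.log(first_prob + 1e-8).mean()
    # 3) contain the attribute difference information (as judged by the
    #    fourth network): higher second_prob, lower loss
    attr_loss = -torch.log(second_prob + 1e-8).mean()
    return w_rec * rec_loss + w_real * real_loss + w_attr * attr_loss

# Illustrative training step:
# loss = second_network_objective(real, fake, p_real, p_attr)
# loss.backward()  # back-propagation, then a gradient update of the weights
```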
Step 6: If the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, stop training; the result is a second neural network usable for synthesizing face images.
Optionally, the first threshold and the second threshold can be determined from empirical values; training stops when the first probability output by the third neural network exceeds the first threshold and the second probability output by the fourth neural network exceeds the second threshold. Exemplarily, in a specific training run, the stopping point can be determined from the output of the loss function, e.g., training stops when that output flattens out; alternatively, training can stop when a preset number of training iterations is reached. Taking the loss function as an example: a higher output value (loss) indicates a larger difference, so once the loss flattens and no longer falls, training of the second neural network is complete and it can meet the goal of face image synthesis.
In some embodiments, before the third and fourth neural networks are applied to train the second neural network, they too need to be trained, adjusting their weight matrices. On the premise that each input face image is known to be a real face image or a synthetic face image, the third and fourth neural networks are trained by minimizing objective functions that contain terms representing classification error. For example, the objective function of the third neural network contains one term representing classification error: when the input face image is real, the higher the output probability, the smaller the objective function value; conversely, when the input face image is synthetic, the higher the probability output for it being real, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the third neural network is completed. As another example, the objective function of the fourth neural network contains multiple terms representing classification errors, each corresponding to one attribute in the attribute information. Given an input face image, each position of the output attribute vector represents the probability that the input image contains the corresponding attribute information: when the input image does contain a specific attribute, the higher the output probability, the smaller the objective function value; conversely, when it does not contain that attribute, the higher the output probability, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the fourth neural network is completed. A sketch of these discriminator-side losses follows.
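For the discriminator side, the classification-error objectives just described might be sketched as follows; the binary cross-entropy formulation is an assumption consistent with the text rather than the patent's exact loss.

```python
# A sketch of the discriminator-side objectives: a binary real/synthetic
# classification error for the third network, and one classification-error
# term per attribute position for the fourth network.
import torch
import torch.nn.functional as F

def third_network_loss(prob_on_real, prob_on_fake):
    # correct judgments on real and synthetic images both lower the loss
    return (-torch.log(prob_on_real + 1e-8)
            - torch.log(1.0 - prob_on_fake + 1e-8)).mean()

def fourth_network_loss(predicted_attr_probs, target_attrs):
    # target_attrs: 1.0 where the image contains the attribute, else 0.0
    return F.binary_cross_entropy(predicted_attr_probs, target_attrs)
```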
In still other embodiments, the first neural network and the second neural network form the generation network, and the third neural network and the fourth neural network form the discrimination network. Likewise using the GAN approach, given the inputs and outputs of the first, second, third, and fourth neural networks described above, the first, second, third, and fourth neural networks are trained through objective functions, and the weights of their parameters are adjusted during the adversarial process between the generation network and the discrimination network.
Thus, by training the second neural network, the face image synthesis method provided in the embodiments of this application ensures that, apart from the attribute difference information that needs to be corrected, the second face image synthesized by the second neural network differs as little as possible from the corresponding first face image; that the second face image is as realistic as possible; and that the second face image contains the specific attribute information, such as the attribute difference information. Compared with prior-art face image synthesis methods that require a professional artist or cannot synthesize face images containing specific attribute information, the method provided here only needs to perform a small amount of attribute correction (also called attribute transfer) on the basis of a real face image to synthesize a face image containing specific attribute information; the method is more convenient, and the synthesized face image is more realistic.
With the face image synthesis method provided in the embodiments of this application, the user can also modify the synthesized face image: through interaction with the user, the face image synthesis device adjusts the attribute information contained in the face image, for example detailed features of the face that are hard to describe completely, or attribute information that was not captured when the user's attribute information was first collected; these can be adjusted by the following method. If the attribute information to be modified changes substantially, for example the user entered the wrong gender on the form, steps S101-S105 above need to be re-executed. If the change is small, then after step S105 the second face image can be modified directly through the following steps S106-S108 to obtain a face image that better meets the requirements. As shown in FIG. 7, after step S105 above, the face image synthesis method of the embodiments of this application may further include S106-S108:
S106: Acquire attribute adjustment information fed back by the user.
After the user obtains the second face image synthesized in step S105, the user can intuitively recall which attribute information of the second face image still needs adjustment, and can propose the attribute information to be adjusted on the basis of the second face image. The face image synthesis device can still collect this attribute adjustment information through a form, or the user can change the attribute information by drawing simply and directly on the second face image. For example, if the user judges that a scar is missing from the second face image, the user can draw the rough shape of the scar directly on the face in the image. The face image synthesis device then extracts information from the changed content to obtain the attribute adjustment information. In this way, the user can adjust the attribute information of the synthesized face image in a fairly simple way. The extraction of information from the changed parts of the face image fed back by the user can be completed by a neural network: for example, the face image containing the attribute adjustment information is input to a neural network for reading, and a vector representing the attribute adjustment information is obtained.
Exemplarily, as shown in FIG. 8, a third vector 82 is obtained according to the attribute adjustment information 81 fed back by the user. The third vector 82 represents the attribute information of the second face image 83 that needs to be adjusted; in FIG. 8 it represents the scar information the user wants added to the second face image. One possible extraction is sketched below.
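Purely as an illustration of S106, the sketch below derives the user's edit from the difference between the annotated image and the original second face image and encodes it into a third vector; the thresholding rule and the reuse of an image encoder are assumptions for this example.

```python
# A sketch of S106: locate the user's direct edit (e.g. a drawn scar) by
# differencing the annotated image against the second face image, then
# encode the edited region into the third vector with any image encoder
# (for instance the FaceEncoder sketched earlier).
import torch

def extract_adjustment(second_face_image, annotated_image, encoder,
                       threshold=0.05):
    # pixels the user changed on the second face image
    edit_mask = ((annotated_image - second_face_image).abs()
                 .mean(dim=1, keepdim=True) > threshold).float()
    edited_region = annotated_image * edit_mask  # keep only the edit
    return encoder(edited_region)  # the third vector
```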
S107: Perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
For the facial feature extraction of the second face image, refer to the description of the facial feature extraction of the first face image in step S104, which is not repeated here. The fifth neural network performs the same function as the first neural network: it extracts facial feature information from the facial features of a face to obtain the second facial feature information. The fifth neural network and the first neural network may be the same or different; the embodiments of this application do not specifically limit this.
Exemplarily, referring to FIG. 8, the second face image 83 is input to the fifth neural network 84 to obtain a fourth vector 85. The fifth neural network 84 extracts the facial feature information of the input face image, and the fourth vector 85 represents the second facial feature information of the second face image 83.
S108: Synthesize a third face image according to the second facial feature information and the attribute adjustment information.
Exemplarily, as shown in FIG. 8, the third vector 82 representing the attribute adjustment information is obtained through step S106 above, and the fourth vector 85 representing the facial feature information of the second face image is obtained through step S107 above. The third vector 82 and the fourth vector 85 are spliced and input to the sixth neural network 86 to obtain the third face image 87. The sixth neural network 86 corrects the second facial feature information of the second face image according to the attribute adjustment information.
For the specific vector splicing and attribute information correction process, refer to the related content in step S105 above, which is not repeated here.
Thus, the face image synthesis method provided in the embodiments of this application enables interaction with the user, during which attribute information can be modified by a much simpler method. Compared with prior-art face image synthesis methods that require a professional artist or cannot interact with the user, the method provided here can correct the attribute information of the face image through a simple interaction with the user, so that the final face image is closer to the desired face image.
Likewise, in some embodiments, before the sixth neural network used to synthesize face images is applied, it needs to be trained so that the face images it synthesizes are more realistic. The embodiments of this application train the sixth neural network in a GAN-based manner. The adversarial network used to train the generation network (the sixth neural network) may include a seventh neural network and an eighth neural network. The seventh neural network judges the third probability that an input face image is a real face image. The eighth neural network judges the fourth probability that the segmentation maps of a second real face image and of the synthetic face image corresponding to it are consistent. The specific training process is as follows.
Step 1: Initialize the sixth neural network.
Optionally, a sixth neural network usable for face image synthesis is constructed, with the weight matrices of its parameters set to initial values; a training process is then executed in which the weight matrix of each parameter is learned. As in the introduction to deep neural networks above, training the sixth neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained sixth neural network.
Step 2: Obtain a second real face image, the vector corresponding to the second real face image, and the vector corresponding to the attribute adjustment information.
Optionally, a real face image data set and an attribute adjustment information data set are first obtained; the former contains one or more second real face images, the latter one or more items of attribute adjustment information. Then, by the methods introduced in steps S106 and S107 above, the vector corresponding to the second real face image, representing its second facial feature information, is obtained through the fifth neural network; the vector corresponding to the attribute adjustment information can be constructed directly. Further, a training data set containing multiple groups of training data can be obtained by pairing second real face images with attribute adjustment information.
Step 3: Input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and output the synthetic face image, containing the attribute adjustment information, that corresponds to the second real face image.
Optionally, in the process of training the sixth neural network, each input vector of a second real face image can be paired with an input vector of attribute adjustment information, and the synthetic face image corresponding to that second real face image is then output according to the face image synthesis method of steps S107 and S108 above.
Exemplarily, as shown in FIG. 9, the vector 92 corresponding to the second real face image and the vector 93 corresponding to the attribute adjustment information are input to the sixth neural network 94, which synthesizes a face image and outputs a synthetic face image 95 containing the attribute adjustment information.
Step 4: Input the second real face image and its corresponding synthetic face image into the seventh neural network and the eighth neural network.
Optionally, the second real face image and its corresponding synthetic face image are input as one group of data into the seventh neural network and the eighth neural network respectively. The seventh neural network performs real/fake discrimination, judging the probability that the synthetic face image output by the sixth neural network is a real face image: the higher this probability, the stronger the sixth neural network's ability to synthesize realistic face images. The eighth neural network judges the probability that the segmentation maps of the second real face image and of its corresponding synthetic face image are consistent: the higher this probability, the more consistent the two segmentation maps, and the stronger the sixth neural network's ability to synthesize face images, i.e., to correct the more detailed attribute information on the second face image according to the attribute adjustment information. In addition, from the probability that the input face image is real and the probability that the segmentation maps are consistent, the difference between the synthetic face image and the corresponding second real face image can be judged, ensuring that, apart from the attribute adjustment information, the synthetic face image differs from the corresponding second face image as little as possible.
Edge-based segmentation is one image segmentation method, and a segmented face image can be represented by a segmentation map. The segmentation map separates the face image from the background image along the outer contour of the face and keeps only the face region, so that interference from background information in judging the face image can be ignored. Using the consistency between the segmentation maps of a real face image and the corresponding synthetic face image, the face image synthesis ability of the sixth neural network can be assessed: the synthesized face image will not exhibit large abnormal changes, and the sixth neural network will only correct the attribute information whose details need adjustment. A simplified sketch of such a segmentation-map comparison follows.
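A toy stand-in for the segmentation-map comparison might look like the following; a real system would use a proper face-parsing or edge-detection model, so the threshold rule here is only a placeholder assumption.

```python
# A toy segmentation-map comparison: 1 inside the face contour, 0 in the
# background, then the fraction of pixels on which two maps agree.
import torch

def segmentation_map(image, eps=1e-3):
    # placeholder rule: treat near-zero pixels as background
    return (image.abs().mean(dim=1, keepdim=True) > eps).float()

def segmentation_consistency(real_image, synthetic_image):
    m_real = segmentation_map(real_image)
    m_fake = segmentation_map(synthetic_image)
    return (m_real == m_fake).float().mean()  # 1.0 means identical maps
```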
Exemplarily, referring to FIG. 9, the second real face image 91 and the synthetic face image 95 are input to the seventh neural network 96 and the eighth neural network 97.
It should be noted that the seventh neural network has the same function as the third neural network applied in the process of training the second neural network: both judge the probability that an input face image is a real face image. The seventh neural network and the third neural network may be the same or different.
Step 5: Iteratively train the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; during the iterative training, adjust the weights of the parameters of the sixth neural network.
Optionally, the output of the seventh neural network is the third probability that the input face image is judged to be a real face image, and the output of the eighth neural network is the fourth probability that the segmentation maps of the second real face image and of its corresponding synthetic face image are judged consistent. That is, the outputs of the seventh and eighth neural networks measure the difference between the synthetic face image output by the sixth neural network and the input real face image, expressed through a loss function (or objective function): the higher the output value (loss) of the loss function, the larger the difference, and training the sixth neural network becomes the process of reducing this loss as far as possible.
The objective function contains three constraints. First, the synthetic face image output by the sixth neural network should differ from the corresponding real face image as little as possible: the smaller the difference, the smaller the objective function value. Second, the synthetic face image output by the sixth neural network should deceive the seventh neural network as far as possible, i.e., lead the seventh neural network to judge the synthetic face image to be a real face image: the higher the probability of being judged real, the smaller the objective function value. Third, the segmentation map of the synthetic face image output by the sixth neural network should be as similar as possible to that of the corresponding real face image: the more similar the segmentation maps, the smaller the objective function value. Training the sixth neural network can thus be achieved by minimizing the objective function.
For example, the outputs of the seventh and eighth neural networks are fed as operands into the loss function, and the result of the loss function is fed back into the sixth neural network for back-propagation. During back-propagation, gradient updates correct the weights of each parameter of the sixth neural network, finally yielding a better parameter weight matrix.
The greater the probability output by the seventh neural network that the synthetic face image is real, the smaller the loss function value. The smaller the difference between the synthetic face image and the second real face image, apart from the attribute information to be modified, the smaller the loss function value. And the smaller the difference the eighth neural network finds between the segmentation maps of the synthetic face image and the second real face image, the smaller the loss function value. Training the sixth neural network can thus be achieved by minimizing the objective function, for instance as sketched below.
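Mirroring the earlier sketch for the second neural network, the sixth network's objective could be assembled as follows, with the segmentation-consistency probability from the eighth network replacing the attribute term; the losses and weights are again assumptions.

```python
# A sketch of the three-constraint objective for the sixth neural network.
import torch
import torch.nn.functional as F

def sixth_network_objective(real_image, synthetic_image,
                            third_prob, fourth_prob,
                            w_rec=10.0, w_real=1.0, w_seg=1.0):
    rec_loss = F.l1_loss(synthetic_image, real_image)   # stay close
    real_loss = -torch.log(third_prob + 1e-8).mean()    # fool the 7th network
    seg_loss = -torch.log(fourth_prob + 1e-8).mean()    # consistent maps
    return w_rec * rec_loss + w_real * real_loss + w_seg * seg_loss
```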
Exemplarily, as shown in FIG. 9, after the second real face image 91 and its corresponding synthetic face image 95 are input to the seventh neural network 96 and the eighth neural network 97, the third probability output by the seventh neural network 96 and the fourth probability output by the eighth neural network 97 are fed into the objective function, and the sixth neural network 94 is trained by minimizing the objective function.
Step 6: If the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, stop training; the result is a sixth neural network usable for synthesizing face images.
The third threshold and the fourth threshold can be determined from empirical values; training stops when the third probability output by the seventh neural network exceeds the third threshold and the fourth probability output by the eighth neural network exceeds the fourth threshold. Exemplarily, in a specific training run, the stopping point can be determined from the output of the loss function, e.g., training stops when that output flattens out; alternatively, training can stop when a preset number of training iterations is reached. Taking the loss function as an example: a higher output value (loss) indicates a larger difference, so once the loss flattens and no longer falls, training of the sixth neural network is complete and it can meet the goal of face image synthesis.
In addition, before the seventh and eighth neural networks are applied to train the sixth neural network, they too need to be trained, adjusting their weight matrices. On the premise that each input face image is known to be a real face image or a synthetic face image, the seventh and eighth neural networks are trained by minimizing objective functions containing terms representing classification error. For example, the objective function of the seventh neural network contains one term representing classification error: when the input face image is real, the higher the output probability, the smaller the objective function value; conversely, when the input face image is synthetic, the higher the probability output for it being real, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the seventh neural network is completed. The training process of the eighth neural network is similar to that of the seventh neural network: its objective function contains terms representing classification error, and the smaller the difference the eighth neural network finds between the segmentation maps of the real face image and the synthetic face image, the smaller the objective function value; conversely, the larger the difference between the segmentation maps, the larger the objective function value. In this way, by continuously inputting real face images and synthetic face images and minimizing the objective function values, the training of the seventh and eighth neural networks is completed.
Thus, by training the sixth neural network, the face image synthesis method provided in the embodiments of this application ensures that, apart from the attribute adjustment information that needs to be corrected, the third face image synthesized by the sixth neural network differs as little as possible from the corresponding second face image; that the third face image is as realistic as possible; and that the segmentation map of the third face image is as similar as possible to that of the corresponding second face image. Compared with prior-art face image synthesis methods that require a professional artist or cannot interact with the user, the method provided here can, through interaction with the user, adjust face image attribute information that is hard to describe via simple input recognition, and thereby synthesize the face image the user needs more comprehensively.
FIG. 10 shows a schematic diagram of a possible structure of the face image synthesis apparatus 1000 involved in the foregoing embodiments. The face image synthesis apparatus 1000 includes an acquisition unit 1001 and a processing unit 1002.
The acquisition unit 1001 is configured to support the face image synthesis apparatus 1000 in performing step S101 in FIG. 4, steps S101 and S106 in FIG. 7, and/or other processes of the techniques described herein.
The processing unit 1002 is configured to support the face image synthesis apparatus 1000 in performing steps S102-S105 in FIG. 4, steps S102-S105 and steps S107-S108 in FIG. 7, and/or other processes of the techniques described herein.
All relevant content of the steps involved in the foregoing method embodiments can be cited in the functional descriptions of the corresponding functional units, and is not repeated here.
图11所示为本申请实施例提供的装置的硬件结构示意图。该装置包括至少一个处理器1101,通信线路1102,存储器1103以及至少一个通信接口1104。其中,存储器1103还可以包括于处理器1101中。FIG. 11 is a schematic diagram of the hardware structure of the device provided by an embodiment of the application. The device includes at least one processor 1101, a communication line 1102, a memory 1103, and at least one communication interface 1104. The memory 1103 may also be included in the processor 1101.
The processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of this application.
The communication line 1102 may include a path for transferring information between the above components.
The communication interface 1104 is used to communicate with other devices. In the embodiments of this application, the communication interface may be a module, a circuit, a bus, an interface, a transceiver, or another device capable of implementing a communication function. Optionally, when the communication interface is a transceiver, the transceiver may be an independently arranged transmitter used to send information to other devices, or an independently arranged receiver used to receive information from other devices. The transceiver may also be a component that integrates the sending and receiving functions; the embodiments of this application do not limit the specific implementation of the transceiver.
The memory 1103 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM), or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor 1101 through the communication line 1102, or the memory 1103 may be integrated with the processor 1101.
The memory 1103 is used to store the computer-executable instructions for implementing the solution of this application, and execution is controlled by the processor 1101. The processor 1101 is configured to execute the computer-executable instructions stored in the memory 1103, thereby implementing the face image synthesis method provided by the embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code, instructions, a computer program, or by other names, which is not specifically limited in the embodiments of this application.
In a specific implementation, as an embodiment, the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 11.
In a specific implementation, as an embodiment, the apparatus may include multiple processors, such as the processor 1101 and the processor 1105 in FIG. 11. Each of these processors may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
It should be noted that the above apparatus may be a general-purpose device or a special-purpose device; the embodiments of this application do not limit the type of the apparatus. The structure illustrated in the embodiments of this application does not constitute a specific limitation on the apparatus. In other embodiments of this application, the apparatus may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
An embodiment of this application further provides a computer-readable storage medium storing computer instructions that, when run on a server, cause the server to perform the above related method steps to implement the face image synthesis method of the foregoing embodiments.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the above related steps to implement the face image synthesis method of the foregoing embodiments.
In addition, an embodiment of this application further provides an apparatus, which may specifically be a component or a module. The apparatus may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions; when the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the apparatus performs the face image synthesis method in each of the foregoing method embodiments.
The apparatus, computer-readable storage medium, computer program product, and chip provided by the embodiments of this application are all used to perform the corresponding method provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
From the description of the above implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into modules or units is merely a division by logical function; in actual implementation there may be other ways of division. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, modules, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program instructions, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (23)

  1. A face image synthesis method, characterized in that the method comprises:
    acquiring first attribute information, the first attribute information being attribute information contained in a face image to be synthesized;
    searching a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, wherein the repetition rate between the second attribute information and the first attribute information meets a threshold requirement;
    obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to represent the attribute difference between the first face image and the face image to be synthesized;
    performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and
    synthesizing a second face image according to the first facial feature information and the attribute difference information.
  2. The method according to claim 1, characterized in that the obtaining attribute difference information according to the first attribute information and the second attribute information comprises:
    obtaining a first attribute vector according to the first attribute information;
    obtaining a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and
    obtaining a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
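As an aside for readability: one simple realization of claim 2 (illustrative only; the binary encoding, attribute names, and the fixed catalogue below are assumptions, since the claim fixes only equal-length vectors with one attribute per position) is an element-wise difference of two indicator vectors:

```python
import torch

# Hypothetical fixed attribute catalogue; the claim only requires that both
# vectors share a length and that each position maps to one attribute.
ATTRIBUTES = ["male", "young", "glasses", "beard", "smiling"]

def attribute_vector(present: set[str]) -> torch.Tensor:
    """Encode a set of attributes as a 0/1 vector over the catalogue."""
    return torch.tensor([1.0 if a in present else 0.0 for a in ATTRIBUTES])

first_attr = attribute_vector({"male", "glasses"})   # face to be synthesized
second_attr = attribute_vector({"male", "beard"})    # retrieved first face image

# First vector: per-position attribute difference (claim 2).
# +1 means the attribute must be added, -1 means it must be removed.
first_vector = first_attr - second_attr
print(first_vector)  # tensor([ 0.,  0.,  1., -1.,  0.])
```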
  3. The method according to claim 1, characterized in that the performing facial feature extraction on the first face image to obtain first facial feature information of the first face image comprises:
    inputting the first face image into a first neural network to obtain a second vector, wherein the first neural network is used to extract facial feature information of an input face image, and the second vector is used to represent the first facial feature information of the first face image.
  4. The method according to claim 2 or 3, characterized in that the synthesizing a second face image according to the first facial feature information and the attribute difference information comprises:
    acquiring a first vector and a second vector, wherein the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and
    concatenating the first vector and the second vector and inputting the result into a second neural network to obtain the second face image, wherein the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
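Read together, claims 3 and 4 describe an encoder-generator pair: a first network maps the first face image to a feature vector, which is concatenated with the difference vector and decoded into the second face image. A minimal sketch under assumed shapes and module names (the claims fix the data flow, not the layers):

```python
import torch
import torch.nn as nn

FEAT_DIM, ATTR_DIM = 128, 5  # assumed dimensions

# First neural network (claim 3): extracts facial feature information.
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, FEAT_DIM),  # yields the second vector
)

# Second neural network (claim 4): corrects the features according to
# the attribute difference information and decodes an image.
generator = nn.Sequential(
    nn.Linear(FEAT_DIM + ATTR_DIM, 3 * 64 * 64),
    nn.Tanh(),
    nn.Unflatten(1, (3, 64, 64)),
)

first_face = torch.randn(1, 3, 64, 64)   # retrieved library image
first_vector = torch.randn(1, ATTR_DIM)  # attribute difference information

second_vector = encoder(first_face)
second_face = generator(torch.cat([second_vector, first_vector], dim=1))
print(second_face.shape)  # torch.Size([1, 3, 64, 64])
```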
  5. The method according to claim 4, characterized in that, before the performing facial feature extraction on the first face image to obtain first facial feature information of the first face image, the method further comprises:
    initializing the second neural network;
    acquiring a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, wherein the vector corresponding to the first real face image is used to represent facial feature information of the first real face image;
    inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and outputting a synthetic face image containing the attribute difference information;
    inputting the first real face image and the synthetic face image corresponding to the first real face image into a third neural network and a fourth neural network, wherein the third neural network is used to determine a first probability that an input face image is a real face image, and the fourth neural network is used to determine a second probability that an input face image contains the attribute difference information;
    performing iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network;
    adjusting the weights of the parameters of the second neural network during the iterative training; and
    stopping the training of the second neural network if the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, to obtain a second neural network usable for synthesizing face images.
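Claim 5 amounts to an adversarial training loop with an early-stopping test on two discriminator outputs. A hedged sketch follows, in which the generator, both discriminators, the data loader, and the 0.9 thresholds are placeholders that the claim does not prescribe:

```python
import torch

FIRST_THRESHOLD, SECOND_THRESHOLD = 0.9, 0.9  # assumed stopping thresholds

def train_second_network(generator, third_nn, fourth_nn, optimizer, loader,
                         max_iters=10_000):
    """Iteratively adjust the generator's weights until the third network
    rates its outputs as real and the fourth network finds the requested
    attribute difference in them (claim 5's stopping condition)."""
    for step, (real_vec, diff_vec) in zip(range(max_iters), loader):
        fake = generator(torch.cat([real_vec, diff_vec], dim=1))
        p_real = third_nn(fake).mean()   # first probability
        p_attr = fourth_nn(fake).mean()  # second probability
        # Maximizing both probabilities == minimizing their negative logs.
        loss = -(torch.log(p_real + 1e-8) + torch.log(p_attr + 1e-8))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if p_real > FIRST_THRESHOLD and p_attr > SECOND_THRESHOLD:
            break  # generator is now usable for synthesizing face images
    return generator
```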
  6. The method according to any one of claims 1-5, characterized in that the method further comprises:
    acquiring attribute adjustment information fed back by a user;
    performing facial feature extraction on the second face image to obtain second facial feature information of the second face image; and
    synthesizing a third face image according to the second facial feature information and the attribute adjustment information.
  7. The method according to claim 6, characterized in that the synthesizing a third face image according to the second facial feature information and the attribute adjustment information comprises:
    obtaining a third vector according to the attribute adjustment information fed back by the user, wherein the third vector is used to represent the attribute information that needs to be adjusted in the second face image;
    inputting the second face image into a fifth neural network to obtain a fourth vector, wherein the fifth neural network is used to extract facial feature information of an input face image, and the fourth vector is used to represent the second facial feature information of the second face image; and
    concatenating the third vector and the fourth vector and inputting the result into a sixth neural network to obtain the third face image, wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  8. The method according to claim 7, characterized in that, before the synthesizing a third face image according to the second facial feature information and the attribute adjustment information, the method further comprises:
    initializing the sixth neural network;
    acquiring a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, wherein the vector corresponding to the second real face image is used to represent facial feature information of the second real face image;
    inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and outputting a synthetic face image containing the attribute adjustment information;
    inputting the second real face image and the synthetic face image corresponding to the second real face image into a seventh neural network and an eighth neural network, wherein the seventh neural network is used to determine a third probability that an input face image is a real face image, and the eighth neural network is used to determine a fourth probability that the segmentation map of the second real face image and that of the synthetic face image corresponding to the second real face image are consistent;
    performing iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network;
    adjusting the weights of the parameters of the sixth neural network during the iterative training; and
    stopping the training if the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, to obtain a sixth neural network usable for synthesizing face images.
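The eighth neural network of claim 8 differs from an ordinary real/fake discriminator in that it scores the consistency of two segmentation maps rather than raw pixels. A sketch under the same assumptions as above (the number of face-parsing classes and the map size are placeholders):

```python
import torch
import torch.nn as nn

NUM_CLASSES, H, W = 19, 64, 64  # assumed face-parsing classes and map size

# Hypothetical stand-in for the eighth neural network: scores how likely
# the segmentation maps of a real face and a synthetic face are consistent.
class SegmentationConsistencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * NUM_CLASSES * H * W, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid(),  # fourth probability
        )

    def forward(self, seg_real, seg_fake):
        # Stack the two maps along the channel axis and score the pair.
        return self.net(torch.cat([seg_real, seg_fake], dim=1))

seg_real = torch.softmax(torch.randn(1, NUM_CLASSES, H, W), dim=1)
seg_fake = torch.softmax(torch.randn(1, NUM_CLASSES, H, W), dim=1)
p_consistent = SegmentationConsistencyNet()(seg_real, seg_fake)
```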
  9. The method according to any one of claims 1-8, characterized in that the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, worn accessory information, hairstyle information, and makeup information.
  10. The method according to any one of claims 1-9, characterized in that, before the acquiring first attribute information, the method further comprises:
    establishing a real face image library, the real face image library containing real face images and the attribute information contained in the real face images.
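Claims 1 and 10 together presuppose a library that can be queried by attribute overlap. One plausible reading of the "repetition rate" test (the data layout and the 0.8 threshold below are illustrative assumptions; the claims require only that some threshold be met) is the fraction of requested attributes already present in a stored image:

```python
# Hypothetical in-memory library for claims 1 and 10; a production system
# would likely index the attribute sets rather than scan them linearly.
LIBRARY = [
    {"path": "faces/0001.png", "attrs": {"male", "young", "glasses"}},
    {"path": "faces/0002.png", "attrs": {"female", "young", "smiling"}},
]

REPETITION_THRESHOLD = 0.8  # assumed value

def find_first_face(first_attrs: set[str]):
    """Return the stored image whose attributes repeat the requested
    ones at the highest rate, if that rate meets the threshold."""
    best = max(
        LIBRARY,
        key=lambda e: len(e["attrs"] & first_attrs) / len(first_attrs),
    )
    rate = len(best["attrs"] & first_attrs) / len(first_attrs)
    return best if rate >= REPETITION_THRESHOLD else None

print(find_first_face({"male", "young", "glasses"}))  # entry 0001 matches fully
```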
  11. A face image synthesis apparatus, characterized in that the apparatus comprises an acquisition unit and a processing unit, wherein:
    the acquisition unit is configured to acquire first attribute information, the first attribute information being attribute information contained in a face image to be synthesized;
    the processing unit is configured to search a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, wherein the repetition rate between the second attribute information and the first attribute information meets a threshold requirement;
    the processing unit is further configured to obtain attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to represent the attribute difference between the first face image and the face image to be synthesized;
    the processing unit is further configured to perform facial feature extraction on the first face image to obtain first facial feature information of the first face image; and
    the processing unit is further configured to synthesize a second face image according to the first facial feature information and the attribute difference information.
  12. The apparatus according to claim 11, characterized in that
    the processing unit is specifically configured to: obtain a first attribute vector according to the first attribute information; obtain a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and obtain a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
  13. The apparatus according to claim 11, characterized in that
    the processing unit is specifically configured to input the first face image into a first neural network to obtain a second vector, wherein the first neural network is used to extract facial feature information of an input face image, and the second vector is used to represent the first facial feature information of the first face image.
  14. The apparatus according to claim 12 or 13, characterized in that
    the processing unit is specifically configured to: acquire a first vector and a second vector, wherein the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and concatenate the first vector and the second vector and input the result into a second neural network to obtain the second face image, wherein the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  15. The apparatus according to claim 14, characterized in that
    the processing unit is further configured to: initialize the second neural network; acquire a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, wherein the vector corresponding to the first real face image is used to represent facial feature information of the first real face image; input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information; input the first real face image and the synthetic face image corresponding to the first real face image into a third neural network and a fourth neural network, wherein the third neural network is used to determine a first probability that an input face image is a real face image, and the fourth neural network is used to determine a second probability that an input face image contains the attribute difference information; perform iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network; adjust the weights of the parameters of the second neural network during the iterative training; and stop training the second neural network if the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, to obtain a second neural network usable for synthesizing face images.
  16. The apparatus according to any one of claims 11-15, characterized in that
    the acquisition unit is further configured to acquire attribute adjustment information fed back by a user;
    the processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image; and
    the processing unit is further configured to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  17. The apparatus according to claim 16, characterized in that
    the processing unit is specifically configured to: obtain a third vector according to the attribute adjustment information fed back by the user, wherein the third vector is used to represent the attribute information that needs to be adjusted in the second face image; input the second face image into a fifth neural network to obtain a fourth vector, wherein the fifth neural network is used to extract facial feature information of an input face image, and the fourth vector is used to represent the second facial feature information of the second face image; and concatenate the third vector and the fourth vector and input the result into a sixth neural network to obtain the third face image, wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  18. The apparatus according to claim 17, characterized in that
    the processing unit is further configured to: initialize the sixth neural network; acquire a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, wherein the vector corresponding to the second real face image is used to represent facial feature information of the second real face image; input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and output a synthetic face image containing the attribute adjustment information; input the second real face image and the synthetic face image corresponding to the second real face image into a seventh neural network and an eighth neural network, wherein the seventh neural network is used to determine a third probability that an input face image is a real face image, and the eighth neural network is used to determine a fourth probability that the segmentation map of the second real face image and that of the synthetic face image corresponding to the second real face image are consistent; perform iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; adjust the weights of the parameters of the sixth neural network during the iterative training; and stop training if the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, to obtain a sixth neural network usable for synthesizing face images.
  19. The apparatus according to any one of claims 11-18, characterized in that the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, worn accessory information, hairstyle information, and makeup information.
  20. The apparatus according to any one of claims 11-19, characterized in that
    the processing unit is further configured to establish a real face image library, the real face image library containing real face images and the attribute information contained in the real face images.
  21. A face image synthesis apparatus, characterized by comprising:
    one or more processors;
    a memory; and
    one or more instructions, wherein the one or more instructions are stored in the memory, and when the instructions are executed by the one or more processors, the face image synthesis apparatus is caused to perform the face image synthesis method according to any one of claims 1-10.
  22. A computer-readable storage medium, characterized by comprising computer instructions that, when run on a computer, cause a processor to perform the face image synthesis method according to any one of claims 1-10.
  23. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the face image synthesis method according to any one of claims 1-10.
PCT/CN2020/140440 2020-02-29 2020-12-28 Method and apparatus for compositing face image WO2021169556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010132570.1 2020-02-29
CN202010132570.1A CN113327191B (en) 2020-02-29 2020-02-29 Face image synthesis method and device

Publications (1)

Publication Number Publication Date
WO2021169556A1 true WO2021169556A1 (en) 2021-09-02

Family

ID=77413045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140440 WO2021169556A1 (en) 2020-02-29 2020-12-28 Method and apparatus for compositing face image

Country Status (2)

Country Link
CN (1) CN113327191B (en)
WO (1) WO2021169556A1 (en)


Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113850890A (en) * 2021-09-29 2021-12-28 北京字跳网络技术有限公司 Method, device, equipment and storage medium for generating animal image
CN117078974B (en) * 2023-09-22 2024-01-05 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium


Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20170116498A1 (en) * 2013-12-04 2017-04-27 J Tech Solutions, Inc. Computer device and method executed by the computer device
CN110097606B (en) * 2018-01-29 2023-07-07 微软技术许可有限责任公司 Face synthesis
CN108537790B (en) * 2018-04-13 2021-09-03 西安电子科技大学 Different-source image change detection method based on coupling translation network
CN110532871B (en) * 2019-07-24 2022-05-10 华为技术有限公司 Image processing method and device

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US20070229498A1 (en) * 2006-03-29 2007-10-04 Wojciech Matusik Statistical modeling for synthesis of detailed facial geometry
CN104715447A (en) * 2015-03-02 2015-06-17 百度在线网络技术(北京)有限公司 Image synthesis method and device
CN107967463A (en) * 2017-12-12 2018-04-27 武汉科技大学 A kind of conjecture face recognition methods based on composograph and deep learning
CN108319932A (en) * 2018-03-12 2018-07-24 中山大学 A kind of method and device for the more image faces alignment fighting network based on production
CN109376582A (en) * 2018-09-04 2019-02-22 电子科技大学 A kind of interactive human face cartoon method based on generation confrontation network
CN110322394A (en) * 2019-06-18 2019-10-11 中国科学院自动化研究所 Face age ageing image confrontation generation method and device based on attribute guidance

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114283051A (en) * 2021-12-09 2022-04-05 湖南大学 Face image processing method and device, computer equipment and storage medium
CN114373033A (en) * 2022-01-10 2022-04-19 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, image processing device, storage medium, and computer program
CN115065863A (en) * 2022-06-14 2022-09-16 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN115065863B (en) * 2022-06-14 2024-04-12 北京达佳互联信息技术有限公司 Video generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113327191B (en) 2024-06-21
CN113327191A (en) 2021-08-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20921333; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20921333; Country of ref document: EP; Kind code of ref document: A1)