WO2021169556A1 - Method and apparatus for compositing face image - Google Patents


Info

Publication number
WO2021169556A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
information
neural network
attribute
vector
Application number
PCT/CN2020/140440
Other languages
French (fr)
Chinese (zh)
Inventor
马骁勇 (Ma Xiaoyong)
申皓全 (Shen Haoquan)
王铭学 (Wang Mingxue)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021169556A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T 2200/00: Indexing scheme for image data processing or generation, in general
    • G06T 2200/24: Indexing scheme for image data processing or generation, in general, involving graphical user interfaces [GUIs]

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a method and device for synthesizing a face image.
  • Face image synthesis technology is widely used in fields such as photo entertainment and medical cosmetic surgery.
  • In photo entertainment, users can modify several attributes of a photo, such as adding double eyelids, enlarging the eyes, or slimming the face.
  • In medical cosmetic surgery, doctors can modify the current user's photo based on the user's description to generate an image previewing the postoperative effect.
  • At present, face image synthesis methods mainly include the following two.
  • The first is the traditional image processing method.
  • In this method, a template library containing a variety of partial images of facial features is usually built.
  • A sketch artist selects facial features from the template library according to the eyewitness's description, splices them together, and finally smooths the edges of the spliced image to generate a face image.
  • However, it is difficult to guarantee the realism of a face image synthesized by simply splicing partial images of facial features.
  • In addition, because both the artist and the eyewitness introduce subjective bias, there may be a gap between the synthesized face image and the actual face image required.
  • The second is the deep learning method, which uses massive face image data to train a deep neural network by means of adversarial generation; the trained neural network is then used to generate face images. However, such a trained neural network cannot synthesize face images containing user-specified attribute information.
  • The face image synthesis method provided in the present application can perform face image synthesis based on real face images, obtain more realistic face images that meet the requirements, and achieve high synthesis efficiency.
  • In a first aspect, the present application provides a face image synthesis method, which may include: acquiring first attribute information, where the first attribute information is the attribute information contained in the face image to be synthesized.
  • A first face image is searched for in a real face image database according to the first attribute information, where the first face image contains second attribute information, and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement.
  • Attribute difference information is obtained according to the first attribute information and the second attribute information, where the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized.
  • Facial feature extraction is performed on the first face image to obtain first facial feature information, and a second face image is synthesized according to the first facial feature information and the attribute difference information.
  • In practice, the first attribute information may be collected from the user and input into the face image synthesis device, and the face image synthesis device synthesizes the face image based on this attribute information.
  • The process of collecting the first attribute information from the user may include: the face image synthesis device creates a face image attribute information questionnaire covering all the attribute information required to synthesize the face image and sends it to the terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information.
  • In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized across multiple dimensions, so that the final synthesized face image is closer to the desired face image.
  • Since the attribute difference information indicates the difference between the attribute information contained in the found first face image and the first attribute information, the attribute information that needs to be corrected in the first face image can be determined, and the first face image can then be corrected according to the attribute difference information.
  • the face image synthesis method provided by the embodiment of the present application may not require the participation of professionals, has high efficiency, and is convenient for promotion. And based on real face images for face image synthesis, a more realistic face image can be obtained.
  • obtaining the attribute difference information according to the first attribute information and the second attribute information includes: obtaining the first attribute vector according to the first attribute information.
  • The second attribute vector is obtained according to the second attribute information; the first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector; the first vector is used to represent attribute difference information.
  • If the values at the corresponding positions of the first attribute vector and the second attribute vector differ, the value at that position in the output first vector is set to the value at that position in the first attribute vector. If the values at the corresponding position are the same, the attribute information corresponding to that position does not need to be modified, and the value at that position in the output first vector can be set to a meaningless symbol. In this way, after the first vector representing the attribute difference information is obtained, the attribute information of the first face image can subsequently be corrected according to the values at the meaningful positions in the first vector.
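  • As an illustration of this bitwise comparison, the following Python sketch assumes attribute vectors are equal-length integer lists and uses 'X' as the meaningless symbol (the function name and encoding are illustrative, not from the patent):

```python
# Sketch of the bitwise attribute comparison described above. Each position
# of an attribute vector encodes one type of attribute information; 'X'
# marks positions where no modification is needed.

def attribute_difference(first_attr_vec, second_attr_vec, no_change='X'):
    assert len(first_attr_vec) == len(second_attr_vec)
    # Where the vectors differ, carry the target value from the first
    # attribute vector; where they agree, emit the meaningless symbol.
    return [a if a != b else no_change
            for a, b in zip(first_attr_vec, second_attr_vec)]

# Example: position 1 matches, positions 2 and 3 differ.
print(attribute_difference([1, 2, 3], [1, 5, 7]))  # ['X', 2, 3]
```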
  • Performing facial feature extraction on the first face image to obtain first facial feature information of the first face image includes: inputting the first face image into the first neural network to obtain the second vector, where the first neural network is used to extract the facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
  • the face image may include multiple facial feature information that cannot be exhaustively listed.
  • the facial feature information may be used to represent the corresponding face image.
  • The attribute information contained in the first face image may be part of the facial feature information of the first face image.
  • Each person's facial feature information is different, so facial feature information can be used to distinguish different people. For example, Zhang San's facial feature information is different from Li Si's facial feature information, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Further, facial feature extraction is performed to obtain facial feature information that can represent the face image.
  • the first neural network is used to perform facial feature extraction on the first face image to obtain first facial feature information representing the first face image.
  • synthesizing the second face image according to the first facial feature information and the attribute difference information includes: obtaining a first vector and a second vector; wherein the first vector is used to represent the attribute difference information ; The second vector is used to represent the first facial feature information of the first face image.
  • the first vector and the second vector are spliced and input into the second neural network to obtain the second face image; wherein, the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  • the splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the part representing the attribute information in the second vector according to the first vector.
  • the length of the part representing the attribute information in the first vector and the second vector is the same, and each of them corresponds to a type of facial image attribute information.
  • the second neural network can directly correct the value of the corresponding position in the second vector according to the value of the first vector, thereby realizing the correction of the first face image according to the attribute difference information.
  • In this way, apart from the attribute information that needs to be corrected, the attribute information of the first face image is changed as little as possible.
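  • The splicing-and-correction step can be pictured with the following PyTorch sketch; the decoder architecture, dimensions, and variable names are illustrative assumptions, not the patent's actual network:

```python
import torch
import torch.nn as nn

# Sketch: splice the attribute-difference vector (first vector) directly
# after the facial-feature vector (second vector) and feed the result to
# the second neural network, which decodes a corrected face image.
# All dimensions and the decoder architecture are illustrative assumptions.

FEATURE_DIM, ATTR_DIM, IMG_PIXELS = 256, 16, 64 * 64 * 3

second_network = nn.Sequential(      # stands in for the "second neural network"
    nn.Linear(FEATURE_DIM + ATTR_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, IMG_PIXELS),
    nn.Tanh(),                       # pixel values in [-1, 1]
)

second_vector = torch.randn(1, FEATURE_DIM)  # first facial feature information
first_vector = torch.randn(1, ATTR_DIM)      # attribute difference information

spliced = torch.cat([second_vector, first_vector], dim=1)  # splicing
second_face = second_network(spliced).view(1, 3, 64, 64)   # corrected image
```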
  • Before performing facial feature extraction on the first face image to obtain the first facial feature information of the first face image, the method further includes: initializing a second neural network; obtaining a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, where the vector corresponding to the first real face image is used to represent the facial feature information of the first real face image; and inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network to output a synthetic face image containing the attribute difference information.
  • The third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the second neural network is iteratively trained according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted. If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training the second neural network to obtain a second neural network that can be used to synthesize a face image.
  • In practice, the second neural network can be trained using the generative adversarial network (GAN) method.
  • The fourth neural network is used to determine the second probability that the synthetic face image output by the second neural network contains the attribute difference information.
  • During training, the weights of the parameters of the second neural network are adjusted so that the difference between the face image synthesized by the second neural network and the corresponding real face image is as small as possible; that is, the synthesized image is more realistic and contains the specified attribute information, such as the attribute difference information.
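  • A compressed training-loop sketch is given below, with the second neural network as the generator, the third as a real/fake discriminator, and the fourth as an attribute judge. Losses, optimizers, dimensions, and thresholds are illustrative assumptions, and the alternating discriminator updates of a full GAN are omitted:

```python
import torch
import torch.nn as nn

# Illustrative GAN-style loop. G = second neural network (generator);
# D_real = third neural network (first probability: image is real);
# D_attr = fourth neural network (second probability: image contains the
# attribute difference information). Only the generator update is shown.

FEAT, ATTR, IMG = 256, 16, 64 * 64 * 3
G = nn.Sequential(nn.Linear(FEAT + ATTR, 512), nn.ReLU(), nn.Linear(512, IMG))
D_real = nn.Sequential(nn.Linear(IMG, 128), nn.ReLU(),
                       nn.Linear(128, 1), nn.Sigmoid())
D_attr = nn.Sequential(nn.Linear(IMG + ATTR, 128), nn.ReLU(),
                       nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

FIRST_THRESHOLD, SECOND_THRESHOLD = 0.9, 0.9  # assumed stopping thresholds

for step in range(10000):
    feat = torch.randn(8, FEAT)   # facial feature vectors
    diff = torch.randn(8, ATTR)   # attribute difference vectors
    fake = G(torch.cat([feat, diff], dim=1))

    p_real = D_real(fake)                            # first probability
    p_attr = D_attr(torch.cat([fake, diff], dim=1))  # second probability

    # Adjust the generator's weights so the synthesized image both looks
    # real and is judged to contain the attribute difference information.
    loss = -(torch.log(p_real + 1e-8) + torch.log(p_attr + 1e-8)).mean()
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()

    if p_real.mean() > FIRST_THRESHOLD and p_attr.mean() > SECOND_THRESHOLD:
        break  # stop training; G can now be used to synthesize face images
```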
  • The method may further include: obtaining attribute adjustment information fed back by the user, and performing facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the third face image is synthesized according to the second facial feature information and the attribute adjustment information.
  • The attribute adjustment information fed back by the user represents the adjustments the user wants to make to the attributes of the synthesized face image.
  • In this way, the face image synthesis device adjusts the attribute information contained in the face image through an interaction process with the user, for example, detailed features of the face image that are difficult to describe fully. Likewise, attribute information that was not gathered in the initial collection of user attribute information can be adjusted by means of the attribute adjustment information. For example, the adjustment made by the user by drawing directly on the synthesized second face image can be collected, the adjusted attribute information can be identified, and the adjusted attribute information can be used to correct the second face image directly. In turn, a more realistic face image that is closer to the user's needs is obtained.
  • Synthesizing the third face image according to the second facial feature information and the attribute adjustment information includes: obtaining the third vector according to the attribute adjustment information fed back by the user, where the third vector is used to represent the attribute information that needs to be adjusted in the second face image.
  • The second face image is input into the fifth neural network to obtain the fourth vector; the fifth neural network is used to extract the facial feature information of the input face image; the fourth vector is used to represent the second facial feature information of the second face image.
  • the third vector and the fourth vector are spliced and input into the sixth neural network to obtain the third face image; wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • In this way, the attribute adjustment problem can be converted directly into a vector operation, and the neural network is used to synthesize the face image, so that the synthesized face image changes as little as possible outside the parts that need to be adjusted.
  • Before synthesizing the third face image according to the second facial feature information and the attribute adjustment information, the method further includes: initializing a sixth neural network; obtaining a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, where the vector corresponding to the second real face image is used to represent the facial feature information of the second real face image; and inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network to output a synthetic face image containing the attribute adjustment information.
  • The seventh neural network is used to determine the third probability that the input face image is a real face image; the eighth neural network is used to determine the fourth probability that the segmentation map of the second real face image and the segmentation map of the corresponding synthetic face image are consistent.
  • the sixth neural network is iteratively trained according to the third probability output by the seventh neural network and the fourth probability output from the eighth neural network. In the iterative training process, the weights of the parameters of the sixth neural network are adjusted. If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped, and a sixth neural network that can be used to synthesize a face image is obtained.
  • the GAN method can be used to train the sixth neural network.
  • During training, the weights of the parameters of the sixth neural network are adjusted so that the difference between the face image synthesized by the sixth neural network and the corresponding real face image is as small as possible; that is, the synthesized image is more realistic and has a similar segmentation map.
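  • The segmentation-consistency judgment can be sketched as follows; the convolutional judge, the number of segmentation classes, and the image sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Sketch of the segmentation-consistency check attributed to the eighth
# neural network: it receives the segmentation maps of the real face image
# and of the synthesized face image and outputs the fourth probability that
# the two maps are consistent. Shapes and architecture are assumptions.

SEG_CLASSES, H, W = 8, 64, 64   # e.g. background, skin, hair, eyes, ...

eighth_network = nn.Sequential(
    nn.Conv2d(2 * SEG_CLASSES, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
    nn.Sigmoid(),
)

real_seg = torch.rand(1, SEG_CLASSES, H, W)  # segmentation of real image
fake_seg = torch.rand(1, SEG_CLASSES, H, W)  # segmentation of synthesized image
fourth_probability = eighth_network(torch.cat([real_seg, fake_seg], dim=1))
```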
  • the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, makeup information.
  • the face shape information may include information such as the shape of the human face and the height of the cheekbones.
  • the skin condition information may include wrinkles, spots, beards, scars and other information on the skin.
  • the wearing accessories information may include wearing glasses, masks, hats and other information.
  • In this way, a real face image similar to the face image to be synthesized can be found across multiple dimensions, and the face image can be synthesized from that real face image by changing only a small amount of attribute information, making the synthesized face image more realistic.
  • the method before acquiring the first attribute information, further includes: establishing a real face image database, the real face image database containing real face images and attribute information contained in the real face images.
  • the real face image library includes attribute information contained in the real face image. Furthermore, in the process of searching for the first face image, the corresponding real face image can be directly determined according to the repetition rate of the attribute information.
  • In a second aspect, the present application provides a face image synthesis device, which may include an acquisition unit and a processing unit.
  • the obtaining unit is used to obtain first attribute information; the first attribute information is the attribute information contained in the face image to be synthesized.
  • The processing unit is used to search for the first face image in the real face image database according to the first attribute information; the first face image contains the second attribute information, and the repetition rate between the second attribute information and the first attribute information meets the threshold requirement.
  • the processing unit is further configured to obtain attribute difference information according to the first attribute information and the second attribute information; the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized.
  • the processing unit is further configured to perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
  • the processing unit is further configured to synthesize a second face image according to the first facial feature information and the attribute difference information.
  • the processing unit is specifically configured to obtain the first attribute vector according to the first attribute information.
  • The second attribute vector is obtained according to the second attribute information; the first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector; the first vector is used to represent attribute difference information.
  • The processing unit is specifically configured to input the first face image into the first neural network to obtain the second vector, where the first neural network is used to extract the facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
  • The processing unit is specifically configured to obtain the first vector and the second vector, where the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image.
  • the first vector and the second vector are spliced and input into the second neural network to obtain the second face image; wherein, the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  • the processing unit is also used to initialize the second neural network.
  • The vector corresponding to the first real face image is used to represent the facial feature information of the first real face image; the vector corresponding to the first real face image and the vector corresponding to the attribute difference information are input into the second neural network, and a synthetic face image that contains the attribute difference information and corresponds to the first real face image is output.
  • The third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the second neural network is iteratively trained according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted. If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training the second neural network to obtain a second neural network that can be used to synthesize a face image.
  • the obtaining unit is also used to obtain attribute adjustment information fed back by the user.
  • the processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the processing unit is further configured to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  • the processing unit is specifically configured to obtain the third vector according to the attribute adjustment information fed back by the user; wherein, the third vector is used to represent the attribute information that needs to be adjusted in the second face image.
  • The second face image is input into the fifth neural network to obtain the fourth vector; the fifth neural network is used to extract the facial feature information of the input face image; the fourth vector is used to represent the second facial feature information of the second face image.
  • the third vector and the fourth vector are spliced and input into the sixth neural network to obtain the third face image; wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • the processing unit is also used to initialize the sixth neural network.
  • The vector corresponding to the second real face image is used to represent the facial feature information of the second real face image; the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information are input into the sixth neural network, and a synthetic face image that contains the attribute adjustment information and corresponds to the second real face image is output.
  • The seventh neural network is used to determine the third probability that the input face image is a real face image; the eighth neural network is used to determine the fourth probability that the segmentation map of the second real face image and the segmentation map of the corresponding synthetic face image are consistent.
  • the sixth neural network is iteratively trained according to the third probability output by the seventh neural network and the fourth probability output from the eighth neural network. In the iterative training process, the weights of the parameters of the sixth neural network are adjusted. If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped, and a sixth neural network that can be used to synthesize a face image is obtained.
  • the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, makeup information.
  • the processing unit is also used to establish a real face image database, and the real face image database contains real face images and attribute information contained in the real face images.
  • In a third aspect, the present application provides a face image synthesis device.
  • The face image synthesis device may include one or more processors, a memory, and one or more instructions, where the one or more instructions are stored in the memory. When the instructions are executed by the one or more processors, the face image synthesis apparatus is caused to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • In a fourth aspect, the present application provides a device that has the function of implementing the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • In a fifth aspect, the present application provides a computer-readable storage medium, including computer instructions, which, when executed on a computer, cause the processor to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • In a sixth aspect, the present application provides a computer program product, which, when run on a server, causes the face image synthesis device to execute the face image synthesis method described in the first aspect and any one of its possible implementations.
  • In a seventh aspect, the present application provides a circuit system, including a processing circuit configured to execute the face image synthesis method described in the first aspect and any one of its possible implementation manners.
  • FIG. 1 is a schematic diagram of an application scenario of a face image synthesis method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the present application.
  • FIG. 4 is a first schematic flowchart of a method for synthesizing a face image provided by an embodiment of the present application
  • FIG. 5 is a second schematic flowchart of a method for synthesizing a face image provided by an embodiment of the present application
  • FIG. 6 is a schematic diagram 1 of the training process of a facial image synthesis neural network provided by an embodiment of the application;
  • FIG. 7 is a third schematic flowchart of a method for synthesizing a face image provided by an embodiment of the application.
  • FIG. 8 is a fourth schematic flowchart of a method for synthesizing a face image according to an embodiment of the application.
  • FIG. 9 is a second schematic diagram of a training process of a facial image synthesis neural network provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a face image synthesis device provided by an embodiment of the application.
  • FIG. 11 is a schematic diagram of the hardware structure of a face image synthesis device provided by an embodiment of the application.
  • FIG. 1 shows a face image synthesis system.
  • the face image synthesis system includes a face image synthesis device 110 and a terminal device 120.
  • the facial image synthesis device 110 and the terminal device 120 may be connected through a wired network or a wireless network.
  • the embodiment of the present application does not specifically limit the connection mode between devices.
  • the aforementioned terminal device 120 provides a related human-computer interaction interface so that the user can input related parameters required in the process of synthesizing the face image, such as attribute information of the face image to be synthesized, attribute adjustment information, and the like.
  • The attribute information of the face image to be synthesized may include gender information, age information, face shape information, and other attribute information used to describe facial features.
  • The terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a personal digital assistant (PDA), a netbook, a desktop computer, a laptop computer, a handheld computer, an artificial intelligence (AI) terminal, or another terminal device.
  • the embodiment of the present application does not impose special restrictions on the specific form of the terminal device 120.
  • the aforementioned facial image synthesis device 110 may be a device or server with image search and image synthesis functions, such as a cloud server or a network server.
  • the face image synthesis device 110 receives the attribute information and attribute adjustment information sent from the terminal device 120 through an interactive interface, and then searches for real face images through a processor based on the real face image library stored in the memory. After that, the processor uses the searched real face image and the obtained attribute information to synthesize the face image, and the processor can use the obtained attribute adjustment information to further adjust the attributes of the synthesized face image.
  • Finally, the synthesized face image is sent to the corresponding terminal device 120.
  • The memory in the face image synthesis device 110 may be a general term that includes local storage and a database storing historical face images; the database may reside on the face image synthesis device or on another cloud server.
  • the face image synthesis device 110 may be a server, a server cluster composed of multiple servers, or a cloud computing service center.
  • the server as the face image synthesis device 110 can execute the face image synthesis method of the embodiment of the present application.
  • In another application scenario, the terminal device 120 directly acts as the face image synthesis device: it receives the attribute information and/or attribute adjustment information of the face image to be synthesized from the user's input and performs the face image synthesis task itself to synthesize the desired face image. That is, the terminal device 120 itself can execute the face image synthesis method of the embodiment of the present application.
  • Fig. 2 exemplarily shows a system architecture provided by an embodiment of the present application.
  • the face image synthesis device 110 is equipped with a transceiver interface 211 for data interaction with external devices.
  • the face image synthesis device 110 receives input data transmitted by the terminal device 120 through the transceiver interface 211.
  • the input data in the embodiment of the present application may include: attribute information of the face image to be synthesized and attribute adjustment information.
  • the face image collection module 240 is used to collect real face images.
  • the real face image in the embodiment of the present application may be a collected face image of a local resident population.
  • the face image collection module 240 stores these real face images in the database 230.
  • The database 230 may also include a real face image library 231, which is used to store the real face images collected by the face image collection module 240.
  • the database 230 may also store face images used for training the face image synthesis device 110.
  • The real face images maintained in the database 230 may not all come from the face image collection module 240; they may also be received from other devices, for example, images sent by the terminal device 120 to expand the real face image library 231.
  • the attribute information collection module 212 is used to collect attribute information 201.
  • the attribute information 201 may include, for example, attribute information and attribute adjustment information of the face image to be synthesized.
  • the attribute information collection module 212 collects the attribute information of the face image to be synthesized that is required for synthesizing the face image through the transceiver interface 211. And in the process of synthesizing the face image, the attribute adjustment information input by the user through the terminal device 120 is collected.
  • The search module 213 is configured to search the real face image library 231, based on the attribute information 201, for a real face image 202 whose attribute repetition rate with the face image to be synthesized meets the threshold requirement, that is, to find a real face image that is closer to the face image to be synthesized.
  • The generating module 214 is used to process the real face image 202 based on the attribute information 201 to obtain the synthesized face image 203. For example, based on the attribute information of the face image to be synthesized, some attribute information is added, removed, or corrected in the real face image; for instance, the hairstyle attribute in the real face image 202 is corrected from short hair to long hair. The synthesized face image 203 is output to the terminal device 120 through the transceiver interface 211.
  • The face image attribute adjustment module 215 is used to adjust the details of the attributes in the synthesized face image based on the face image generated by the generating module 214 and the attribute adjustment information collected by the attribute information collection module 212, so as to obtain a face image closer to what the user requires.
  • FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the database 230 is an external memory relative to the face image synthesis device 110. In other cases, the database 230 may also be placed in the face image synthesis device 110.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes inputs $x_s$ and an intercept of 1. The output of the arithmetic unit can be:

$$h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

  • Here $s = 1, 2, \ldots, n$, where $n$ is a natural number greater than 1; $W_s$ is the weight of $x_s$; $b$ is the bias of the neural unit; and $f$ is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
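  • As a concrete numeric illustration of this formula (values chosen arbitrarily):

```python
import numpy as np

# A single neural unit as defined above: output = f(sum_s W_s * x_s + b),
# with a sigmoid activation introducing the nonlinearity.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, W, b):
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x_s
W = np.array([0.4, 0.1, -0.6])   # weights W_s
b = 0.2                          # bias b
print(neural_unit(x, W, b))
```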
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) is also known as a multi-layer neural network.
  • The DNN is divided according to the positions of different layers.
  • The layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • All the layers in the middle are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • Although the DNN looks complicated, the work of each layer is not complicated. Simply put, each layer performs the following nonlinear relational expression:

$$\vec{y} = \alpha(W \vec{x} + \vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function.
  • Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because the DNN has a large number of layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example. Suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the $k$-th neuron in the $(L-1)$-th layer to the $j$-th neuron in the $L$-th layer is defined as $W^L_{jk}$.
  • In the training process, the neural network can use the back propagation (BP) algorithm to modify the sizes of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation motion dominated by error loss, and aims to obtain better neural network model parameters, such as a weight matrix.
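  • A toy one-layer illustration of this forward-then-backward update (arbitrary numbers, sigmoid activation):

```python
import numpy as np

# Minimal illustration of one forward pass y = alpha(W x + b) followed by
# a backpropagation update of W and b, using a squared reconstruction error
# and gradient descent (a toy one-layer case of the BP algorithm above).

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
x = np.array([1.0, 0.5, -0.5])
target = np.array([0.2, 0.7])
lr = 0.1

def alpha(z):  # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(100):
    y = alpha(W @ x + b)                # forward: input signal -> output
    err = y - target                    # reconstruction error
    grad_z = err * y * (1.0 - y)        # backpropagate through activation
    W -= lr * np.outer(grad_z, x)       # update weight matrix
    b -= lr * grad_z                    # update offset vector

print(0.5 * np.sum((alpha(W @ x + b) - target) ** 2))  # error loss shrinks
```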
  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application.
  • the chip includes a neural-network processing unit (NPU) 300.
  • The chip can be set in the face image synthesis device 110 shown in FIG. 2 to complete all or part of the work of the attribute information collection module 212 in FIG. 2, all or part of the work of the search module 213, all or part of the work of the generating module 214 (for example, generating the synthetic face image 203), and all or part of the work of the face image attribute adjustment module 215 (for example, adjusting the attribute information of the synthetic face image 203 generated by the generating module 214).
  • the neural network processor NPU 300 is mounted as a co-processor to a main central processing unit (central processing unit, CPU) (host CPU) 320, and the main CPU 320 allocates tasks.
  • the core part of the NPU 300 is the arithmetic circuit 303.
  • the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the data corresponding to the matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 takes the matrix A data and the matrix B from the input memory 301 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 308 (accumulator).
  • the vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • In some implementations, the vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example, for use in a subsequent layer in a neural network.
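  • The data flow described above (cached weights, streamed inputs, partial results gathered in an accumulator, then vector post-processing) can be mimicked in software as follows; this is an analogy for illustration, not a model of the actual circuit:

```python
import numpy as np

# Software analogy of the NPU data flow: the weight matrix B is fetched
# once and held (as in the weight memory / PE cache), matrix A is streamed
# in tile by tile, and partial products are summed in an accumulator
# before a vector post-processing step (here, ReLU).

def npu_style_matmul(A, B, tile=4):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))        # plays the role of the accumulator 308
    for k0 in range(0, K, tile):  # stream A tile by tile
        acc += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]  # partial results
    return acc

A = np.random.rand(8, 16)
B = np.random.rand(16, 8)                        # cached "weights"
out = np.maximum(npu_style_matmul(A, B), 0.0)    # vector unit applies ReLU
```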
  • the unified memory 306 is used to store input data and output data.
  • The direct memory access controller (DMAC) 305 is used to move the input data in the external memory 330 into the input memory 301 and/or the unified memory 306, and to move the weight data in the external memory 330 into the weight memory 302.
  • The bus interface unit (BIU) 310 is used to implement interaction between the main CPU 320, the DMAC, and the instruction fetch memory 309 through the bus.
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • The unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch memory 309 are all on-chip memories.
  • The external memory 330 is a memory external to the NPU 300; it may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • the face image synthesis method according to the embodiment of the present application will be described in detail below with reference to FIG. 4.
  • the face image synthesis method of the embodiment of the present application may be executed by devices such as the face image synthesis device 110 in FIG. 1 and the face image synthesis device 110 in FIG. 2.
  • the method may include S101-S105:
  • S101: Acquire first attribute information, where the first attribute information is attribute information contained in the face image to be synthesized.
  • The attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial features information, skin condition information, wearing accessories information, hairstyle information, and makeup information.
  • the face shape information may include information such as the shape of the human face and the height of the cheekbones.
  • the skin condition information may include wrinkles, spots, beards, scars and other information on the skin.
  • the wearing accessories information may include wearing glasses, masks, hats and other information.
  • When synthesizing a face image, the attribute information of the face image to be synthesized needs to be collected first, and the synthesized face image then needs to contain this attribute information.
  • The method of collecting the attribute information includes: the face image synthesis device creates a face image attribute information questionnaire covering all the attribute information needed to synthesize the face image and sends it to the terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information.
  • the face image synthesis device can collect the attribute information of the face image to be synthesized from multiple dimensions, so that the final synthesized face image is closer to the required face image.
  • Table 1 exemplarily gives the content of a face image attribute information questionnaire.
  • The first attribute information required to synthesize the face image this time includes: male, aged 30-40, sharp chin, big eyes, high nose bridge, thin lips, short hair, and oblique bangs.
  • The face image attribute information questionnaire may also take other forms of expression.
  • list all possible attribute information in the face image attribute information questionnaire and determine the attribute information by collecting the check results of the user.
  • For another example, an adjustable progress bar is provided for the degree to which each attribute is present, and the specific attribute information is obtained from the position to which the user adjusts the progress bar.
  • For example, the progress bar at 20% may mean short hair, and at 80% long hair.
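  • One way such a progress bar could be mapped onto the nine-level attribute encoding used later is sketched below; the binning rule is an assumption, not specified by the patent:

```python
# Map an attribute progress bar (0-100%) to a nine-level encoding
# (assumed convention: 1 = shortest/lightest, 9 = longest/darkest,
# 0 = attribute not collected).

def progress_to_level(percent):
    if percent is None:
        return 0  # attribute not collected
    return min(9, max(1, int(percent // (100 / 9)) + 1))

print(progress_to_level(20))   # low level, e.g. short hair
print(progress_to_level(80))   # high level, e.g. long hair
```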
  • S102: According to the first attribute information, search for a first face image in a real face image database, where the first face image contains second attribute information, and the repetition rate between the second attribute information and the first attribute information meets the threshold requirement.
  • the real face image library is used to store real face images.
  • a real face image database is established by collecting real face images of the local resident population.
  • the real face image library also includes attribute information contained in the real face image.
  • the corresponding real face image can be directly determined according to the repetition rate of the attribute information.
  • a large number of real face images can be collected in advance to establish a real face image database, and the real face image database can be updated and expanded periodically.
  • the first face image is a certain face image found in the real face image library, therefore, the first face image is a real face image.
  • the repetition rate of the second attribute information and the first attribute information contained in the first face image also needs to meet certain threshold requirements. In this way, a real face image that is closer to the face image to be synthesized can be found.
  • the value of the threshold can be set according to an empirical value, for example, the repetition rate of the first attribute information and the second attribute information is greater than or equal to 80%. Further, if multiple first face images that meet the threshold requirement are found, the face image with the highest repetition rate may be used as the first face image for subsequent synthesis of the face image.
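  • A minimal sketch of this threshold search, assuming each database record stores an image path and an attribute dictionary (the data layout is illustrative):

```python
# Sketch of searching the real face image database for the first face
# image: the repetition rate is the fraction of requested attributes that
# a stored image matches; records at or above the threshold qualify, and
# the best match is kept.

THRESHOLD = 0.8  # e.g. repetition rate >= 80%

def repetition_rate(first_attrs, second_attrs):
    matched = sum(1 for k, v in first_attrs.items()
                  if second_attrs.get(k) == v)
    return matched / len(first_attrs)

def search_first_face(first_attrs, database):
    candidates = [(repetition_rate(first_attrs, rec['attrs']), rec)
                  for rec in database]
    rate, best = max(candidates, key=lambda c: c[0])
    return best if rate >= THRESHOLD else None

database = [
    {'image': 'face_001.png',
     'attrs': {'gender': 'male', 'age': '30-40', 'chin': 'sharp',
               'eyes': 'big', 'hair': 'short'}},
]
query = {'gender': 'male', 'age': '30-40', 'chin': 'sharp',
         'eyes': 'big', 'hair': 'short'}
print(search_first_face(query, database))
```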
  • Table 2 exemplarily shows the second attribute information contained in the first face image found based on the first attribute information shown in Table 1 above.
  • The second attribute information contained in the first face image includes: a 35-year-old male, a sharp chin, big eyes, a high nose bridge, thin lips, and short hair.
  • In this way, the face image is synthesized on the basis of the found real first face image, so that the synthesized face image is also closer to a real face image.
  • S103: Obtain attribute difference information according to the first attribute information and the second attribute information, where the attribute difference information is used to indicate the attribute difference between the first face image and the face image to be synthesized, that is, the attribute information that needs to be added, deleted, or corrected in the first face image.
  • the first attribute vector is obtained according to the first attribute information.
  • The first attribute vector and the second attribute vector have the same length, and each bit corresponds to one type of attribute information.
  • the first vector is obtained according to the difference between the first attribute vector and the second attribute vector.
  • the first vector is used to represent the attribute difference information.
  • Each bit of an attribute vector corresponds sequentially to one type of attribute information.
  • For example, suppose the face image attribute information questionnaire contains three types of attribute information: gender, hairstyle, and skin color.
  • Then the first attribute vector and the second attribute vector used to represent the attribute information both have length 3, and each bit corresponds to one type of attribute information.
  • 0 can be preset to indicate a female, and 1 to indicate a male.
  • The hair length can be divided into 9 intervals from short to long, corresponding to the numbers 1 to 9, with 0 meaning that the attribute information was not collected.
  • The skin color attribute information can be divided into 9 intervals from light to dark, corresponding to the numbers 1 to 9, with 0 meaning that the attribute information was not collected.
  • characters with no meaning may be used to indicate that the first attribute information and the second attribute information corresponding to the position are the same, such as X.
  • When the attribute difference information represented by the first vector is used to modify the attribute information of the first face image, the attribute information at the positions corresponding to the meaningless characters does not need to be modified.
  • For example, suppose the collected first attribute information is: male, short hair.
  • The first attribute vector obtained according to the first attribute information can be expressed accordingly. As shown in Table 4 below, the second attribute information contained in the found first face image is: male, medium-length hair, white skin.
  • The second attribute vector corresponding to the second attribute information can be expressed accordingly. The values at the 2nd and 3rd positions of the first attribute vector and the second attribute vector differ. In this way, the attribute difference information can be obtained from the difference between the two attribute vectors and expressed as the first vector.
  • the first attribute vector 511 is obtained according to the obtained first attribute information 51
  • the second attribute vector 521 is obtained according to the obtained second attribute information 52
  • the first vector 53 is obtained according to the first attribute vector 511 and the second attribute vector 521.
  • Taking the first attribute information contained in Table 1 above and the second attribute information contained in Table 2 above as an example, the first attribute vector and the second attribute vector are obtained according to the first attribute information and the second attribute information respectively, and are then compared by bitwise subtraction to obtain the first vector; the first vector indicates that the attribute difference information is oblique bangs.
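  • A worked version of the male/short-hair example, with assumed encodings, since the patent's concrete vector values appear only in figures not reproduced here:

```python
# Assumed encodings: gender (0 = female, 1 = male), hair length level 1-9,
# skin tone level 1-9, 0 = attribute not collected, 'X' = no change needed.

first_attribute_vector = [1, 2, 0]    # male, short hair, skin not collected
second_attribute_vector = [1, 5, 2]   # male, medium-length hair, white skin

first_vector = [a if a != b else 'X'
                for a, b in zip(first_attribute_vector,
                                second_attribute_vector)]
print(first_vector)  # ['X', 2, 0]: positions 2 and 3 carry correction values
```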
  • the second attribute information contained in the first face image can be compared with the collected first attribute information to determine the attribute information that needs to be added/deleted/corrected on the first face image.
  • S104 is executed, that is, the first face image is adjusted according to the attribute difference information to obtain a second face image that is closer to the face image to be synthesized.
  • S104: Perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
  • the face image may include multiple pieces of facial feature information that cannot be exhaustively listed, and the facial feature information may be used to represent the corresponding face image.
  • the above-mentioned attribute information contained in the first face image may be part of the content of the facial feature information of the first face image, and this part of the content can be easily observed, memorized, and described by the user.
  • the facial feature information of each person is different, and the similarity of the facial feature information can be used to distinguish different people. For example, Zhang San’s facial feature information is different from Li Si’s facial feature information, and Zhang San’s facial feature information can be used to quickly determine who is Zhang San. Further, facial feature extraction is to obtain facial feature information that can represent the face image.
  • a neural network is used to implement facial feature extraction of the first face image, and the first face image is converted into a second vector representing the first facial feature information.
  • the second vector contains content representing attribute information.
  • the first face image is input to the input layer of the neural network, and the operator of the input layer extracts the facial feature information contained in the first face image to form a high-dimensional matrix representing the facial feature information.
  • the high-dimensional matrix is input into the hidden layer of the neural network, and the high-dimensional matrix is processed for dimensionality reduction through the operation of each layer of the operator in the hidden layer.
  • the neural network output layer outputs the second vector.
  • the first face image 54 is input to the first neural network 55 to obtain the second vector 56.
  • the first neural network 55 is used to extract the facial feature information of the input face image
  • the second vector 56 is used to represent the first facial feature information of the first face image 54.
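  • the following is a minimal PyTorch sketch of such a first neural network, with an input layer, dimension-reducing hidden layers, and an output layer emitting the second vector; all layer shapes, the 64x64 RGB input, and the 256-dimensional output are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the first neural network: maps a face image to the
    'second vector' of facial feature information. All layer shapes are
    assumptions; a 64x64 RGB input is assumed."""

    def __init__(self, feature_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            # input layer: extract facial features into a high-dimensional representation
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            # hidden layers: progressive dimensionality reduction
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            # output layer: emit the second vector
            nn.Linear(128 * 8 * 8, feature_dim),
        )

    def forward(self, face_image):          # face_image: (N, 3, 64, 64)
        return self.encoder(face_image)     # second vector: (N, 256)

second_vector = FeatureExtractor()(torch.randn(1, 3, 64, 64))
```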
  • S105 Synthesize a second face image according to the first facial feature information and the attribute difference information.
  • the attribute difference information between the first face image and the face image to be synthesized, and the facial feature information of the first face image, are obtained; the facial feature information of the first face image is then corrected according to the attribute difference information, synthesizing a face image that is closer to the requirement and contains the attribute difference information.
  • the first vector 53 representing the attribute difference information and the second vector 56 representing the first facial feature information of the first face image are spliced and then input into the second neural network 57.
  • the second neural network 57 is used to correct the first facial feature information of the first face image 54 according to the attribute difference information.
  • the attribute difference information obtained according to the above Table 1 and Table 2 is oblique bangs
  • the second neural network 57 can synthesize the oblique bangs in the first face image 54 to obtain the second face image 58 in FIG. 5.
  • the second face image 58 contains the attribute difference information (such as oblique bangs).
  • the splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the second vector according to the first vector.
  • the positions corresponding to attribute information that is the same in the first attribute information and the second attribute information can be set to a meaningless symbol such as X, and then the second neural network can modify the second vector directly based on the content of the meaningful positions in the first vector.
  • the second neural network can directly modify the second vector based on the content of the meaningful positions in the first vector. For example, the second neural network reads that the value of a bit in the attribute-information part of the spliced vector is meaningful, and obtains the attribute information corresponding to that bit value, such as the oblique-bangs attribute information.
  • the vector representing the facial feature information is corrected accordingly, and the corrected vector representing the facial feature information is then subjected to dimension-raising processing to obtain a synthetic face image containing oblique bangs.
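  • a hedged PyTorch sketch of this splicing and correction step follows; the decoder architecture and the vector lengths (feature_dim=256, attr_dim=40) are assumptions, not the embodiment's specified design:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of the second neural network: corrects the facial-feature
    vector according to the attribute-difference vector and raises the
    dimension back to an image. The architecture is an assumption."""

    def __init__(self, feature_dim=256, attr_dim=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + attr_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, second_vector, first_vector):
        # splice the attribute-difference vector directly after the
        # facial-feature vector, as described above
        spliced = torch.cat([second_vector, first_vector], dim=1)
        return self.net(spliced)    # second face image: (N, 3, 64, 64)
```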
  • the second neural network synthesizing the face image can correct the attribute information contained in the found real face image based on the attribute difference information, thereby obtaining a synthetic face image containing specific attribute information that is closer to the real face image.
  • the face image synthesis method provided by the embodiment of the present application can obtain, through the collected face image attribute information, a real face image closer to the face image to be synthesized; obtain the attribute difference information according to the attribute information contained in the real face image and the collected face image attribute information; and then perform face image synthesis based on the real face image according to the attribute difference information, to obtain a more realistic and satisfactory face image.
  • the face image synthesis method provided in the embodiments of the present application may not require the participation of professionals, has high efficiency, and is convenient for promotion. And based on real face images for face image synthesis, a more realistic face image can be obtained.
  • before the second neural network for synthesizing the face image is applied, it needs to be trained so that the face images it synthesizes are more realistic and reliable.
  • the embodiment of the application adopts a generative adversarial network (GAN) based method to train the second neural network.
  • the adversarial network used for training the second neural network (which is also referred to as the generative network) may include a third neural network and a fourth neural network.
  • the third neural network is used to determine the first probability that the input face image is a real face image.
  • the fourth neural network is used to determine the second probability that the input face image contains attribute difference information.
  • the specific training process is as follows.
  • Step 1 Initialize the second neural network.
  • a second neural network that can be used for face image synthesis is constructed, and the weight matrix corresponding to each parameter contained in it is set to an initial value; a training process then needs to be performed, during which the weight matrix corresponding to each parameter is learned.
  • training the second neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained second neural network.
  • Step 2 Obtain the first real face image, the vector corresponding to the first real face image, and the vector corresponding to the attribute difference information.
  • the real face image data set includes one or more first real face images
  • the attribute difference information data set includes one or more attribute difference information.
  • the vector corresponding to the first real face image is obtained through the first neural network, and this vector represents the facial feature information of the first real face image.
  • the vector corresponding to the attribute difference information can be directly constructed.
  • a training data set containing multiple sets of training data can be obtained through the combination of the first real face image and the attribute difference information.
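  • a minimal sketch of building such a training data set by combination (the variable names and placeholder entries are assumptions):

```python
import itertools

real_faces = ["face_0001.png", "face_0002.png"]   # placeholder entries
attr_diffs = [(0, 1, 0), (1, 0, 0)]               # placeholder difference vectors

# each (real face, attribute difference) pair is one set of training data
training_set = list(itertools.product(real_faces, attr_diffs))
```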
  • Step 3 Input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information.
  • a vector of a first real face image can be combined with a vector of attribute difference information as input, and the synthetic face image is then output according to the face image synthesis method of step S105.
  • the vector 62 corresponding to the first real face image and the vector 63 corresponding to the attribute difference information are input to the second neural network 64, which synthesizes the face image and outputs a synthetic face image 65 containing the attribute difference information.
  • Step 4 Input the first real face image and the synthetic face image corresponding to the first real face image into the third neural network and the fourth neural network.
  • the first real face image and its corresponding synthetic face image are input as a set of data into the third neural network and the fourth neural network, respectively.
  • the third neural network realizes true/false discrimination, determining the probability that the synthetic face image output by the second neural network is a real face image. The higher this probability, the more likely the image is a real face image, indicating a stronger ability of the second neural network to synthesize realistic face images.
  • the fourth neural network can determine the probability of certain specific attribute information contained in the input face image, for example, the probability of determining the attribute difference information contained in the input face image.
  • the difference between the synthetic face image and the corresponding first real face image can be determined according to the probability that the input face image is a real face image and the probability that it contains the specific attribute information, so as to ensure that the difference between the synthetic face image and the corresponding first real face image, apart from the attribute difference information, is as small as possible.
  • the first real face image 61 and the synthesized face image 65 are input to the third neural network 66 and the fourth neural network 67.
  • Step 5 Perform iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network. In the iterative training process, the weights of the parameters of the second neural network are adjusted.
  • the output result of the third neural network is the first probability of judging that the input face image is a real face image
  • the output result of the fourth neural network is the second probability of judging that the input face image contains attribute difference information. That is to say, the output results of the third neural network and the fourth neural network are used to measure the difference between the synthetic face image output by the second neural network and the input real face image, using a loss function or objective function.
  • the objective function indicates that the higher the output value (loss) of the loss function, the greater the difference; training the second neural network thus becomes a process of reducing this loss as much as possible.
  • the objective function contains three parts of constraints.
  • first, the difference between the synthesized face image output by the second neural network and the corresponding real face image is required to be as small as possible.
  • the smaller the difference, the smaller the objective function value.
  • second, the synthetic face image output by the second neural network should fool the third neural network as much as possible, that is, the third neural network judges the synthetic face image as a real face image.
  • the higher the probability of being judged a real image, the smaller the objective function value.
  • third, the synthetic face image output by the second neural network needs to contain the specific attribute information (the attribute difference information). The greater the probability of containing the specific attribute information, the smaller the objective function value. In this way, the training of the second neural network can be achieved by minimizing the objective function.
  • the output results of the third neural network and the fourth neural network are input into the loss function, and the result of the loss function is fed into the second neural network for a back-propagation operation.
  • the back-propagation operation performs a gradient update, modifying the weight of each parameter of the second neural network, and finally a better parameter weight matrix is obtained.
  • the first real face image 61 and the corresponding synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67.
  • the first probability output by the third neural network 66 and the second probability output by the fourth neural network 67 are input to the objective function, and the second neural network 64 is trained by minimizing the objective function.
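  • the three-part objective can be sketched as follows; the particular loss terms (L1 reconstruction, log-probability adversarial term, binary cross-entropy attribute term) and the weights are assumptions standing in for the embodiment's unspecified loss functions:

```python
import torch
import torch.nn.functional as F

def second_net_objective(real_img, fake_img, d_fake_prob, attr_pred, attr_target,
                         w_rec=10.0, w_adv=1.0, w_attr=1.0):
    """Sketch of the three constraints: (1) keep the synthetic image close
    to the real one, (2) push the third network to judge it real, and
    (3) make it contain the attribute difference information."""
    rec = F.l1_loss(fake_img, real_img)                    # smaller difference, smaller value
    adv = -torch.log(d_fake_prob + 1e-8).mean()            # higher 'real' probability, smaller value
    attr = F.binary_cross_entropy(attr_pred, attr_target)  # higher attribute probability, smaller value
    return w_rec * rec + w_adv * adv + w_attr * attr
```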
  • Step 6 If the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, stop training to obtain a second neural network that can be used to synthesize a face image.
  • the first threshold and the second threshold may be determined based on empirical values.
  • if the first probability output by the third neural network is greater than the first threshold and the second probability output by the fourth neural network is greater than the second threshold, the training is stopped.
  • the stopping timing may also be determined according to the output result of the loss function; for example, training is stopped when the output of the loss function changes relatively smoothly, or when a preset number of training iterations is reached. Taking the loss function as an example, the higher its output value (loss), the greater the difference; once the output of the loss function is smooth and no longer decreases, training of the second neural network is complete and the goal of face image synthesis can be met.
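  • a small sketch of these stopping conditions follows; the thresholds, the patience window, and the flatness tolerance are assumed empirical values:

```python
def should_stop(first_prob, second_prob, loss_history,
                first_threshold=0.9, second_threshold=0.9,
                patience=10, eps=1e-4):
    """Stop when both probabilities exceed their (assumed empirical)
    thresholds, or when the loss curve has flattened over the last
    `patience` iterations; patience/eps model 'smooth and no longer
    decreasing'."""
    if first_prob > first_threshold and second_prob > second_threshold:
        return True
    recent = loss_history[-patience:]
    return len(recent) == patience and max(recent) - min(recent) < eps
```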
  • before applying the third neural network and the fourth neural network to train the second neural network, it is also necessary to train the third neural network and the fourth neural network themselves and adjust their weight matrices.
  • under the premise that it is known whether the input face image is a real face image or a synthetic face image, the third neural network and the fourth neural network are trained by minimizing an objective function, where the objective function contains a part representing the classification error. For example, the objective function corresponding to the third neural network includes a part representing the classification error: when the input face image is judged to be a real face image, the greater the output probability, the smaller the corresponding objective function value.
  • the objective function corresponding to the fourth neural network includes multiple parts representing classification errors, where each part represents the classification error of one attribute in the attribute information. For example, when a face image is input, each bit of the output attribute vector represents the probability that the input face image contains the corresponding attribute information; when the input face image is determined to contain specific attribute information, the greater the output probability, the smaller the corresponding objective function value.
  • in this way, the training process of the third neural network and the fourth neural network is completed.
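  • the classification-error objectives for the third and fourth neural networks might be sketched as follows; the binary-cross-entropy form, with one term per attribute bit for the fourth network, is an assumption:

```python
import torch
import torch.nn.functional as F

def discriminator_objectives(d3_real_prob, d3_fake_prob, d4_attr_pred, attr_target):
    """Sketch of the classification-error objectives for the third and
    fourth neural networks under known real/synthetic inputs."""
    # third network: the greater the probability on real images (and the
    # smaller on synthetic ones), the smaller the objective value
    loss_d3 = (F.binary_cross_entropy(d3_real_prob, torch.ones_like(d3_real_prob))
               + F.binary_cross_entropy(d3_fake_prob, torch.zeros_like(d3_fake_prob)))
    # fourth network: one classification-error part per attribute bit
    loss_d4 = F.binary_cross_entropy(d4_attr_pred, attr_target)
    return loss_d3, loss_d4
```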
  • the first neural network and the second neural network form a generating network
  • the third neural network and the fourth neural network form a discriminant network.
  • the face image synthesis method provided by the embodiment of the present application can train the second neural network so that the difference between the second face image synthesized by the second neural network and the corresponding first face image, apart from the attribute difference information that needs to be corrected, is as small as possible; the second face image is as realistic as possible; and the second face image contains the specific attribute information, such as the attribute difference information.
  • the face image synthesis method provided in the embodiment of the present application only needs to perform a small amount of attribute correction (also called attribute transfer) based on the real face image to synthesize a face image containing specific attribute information.
  • the method is more convenient, and the synthesized face image is more realistic.
  • the user can also modify the synthesized face image, and the face image synthesis device adjusts the attribute information contained in the face image through interaction with the user, for example, detailed features of the face image that are difficult to describe completely, or attribute information that was not collected when the user attribute information was first collected; such information can be adjusted by the following methods.
  • if the attribute information to be modified changes greatly, for example the user filled in the wrong gender in the form, the above steps S101 to S105 need to be re-executed. If the attribute information to be modified does not change significantly, then after step S105 the second face image can be modified directly through the following steps S106 to S108 to obtain a face image that better meets the requirements.
  • as shown in FIG. 7, after the above step S105, the face image synthesis method of the embodiment of the present application may further include S106-S108.
  • S106 Obtain attribute adjustment information fed back by the user.
  • the face image synthesis device can still collect the attribute adjustment information in the form of a table, or the attribute information can be changed through simple drawing performed by the user directly on the second face image. For example, if the user judges that a scar is missing from the second face image, the user can directly draw a rough shape of the scar on the face in the second face image. After that, the face image synthesis device extracts information from the changed part of the content to obtain the attribute adjustment information.
  • the facial image synthesis device extracts information from the changed part of the facial image fed back by the user, which can be completed through a neural network. For example, a face image containing attribute adjustment information is input to a neural network for data reading, and a vector representing the attribute adjustment information is obtained.
  • the third vector 82 is obtained according to the attribute adjustment information 81 fed back by the user. Among them, the third vector 82 is used to represent the attribute information in the second face image 83 that needs to be adjusted. As shown in FIG. 8, the third vector 82 may represent the scar information that the user needs to add to the second face image.
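  • one possible sketch of turning the user's drawn change into the third vector is given below; the pixel-difference masking, the threshold value, and the reuse of an encoder network are all assumptions rather than the embodiment's stated method:

```python
import torch

def extract_adjustment(edited_img, second_img, encoder, threshold=0.05):
    """Sketch: isolate the user's drawn change (e.g. a scar) as the pixel
    difference between the edited and the original second face image,
    then encode it into the third vector representing the attribute
    adjustment information."""
    changed = (edited_img - second_img).abs()                      # where the user drew
    mask = (changed.mean(dim=1, keepdim=True) > threshold).float() # assumed threshold
    return encoder(edited_img * mask)                              # third vector
```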
  • S107 Perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
  • the function implemented by the fifth neural network is the same as that of the first neural network: it is used to extract facial feature information from a face image, obtaining the second facial feature information.
  • the fifth neural network and the first neural network may be the same or different, which is not specifically limited in the embodiment of the present application.
  • the second face image 83 is input to the fifth neural network 84 to obtain the fourth vector 85.
  • the fifth neural network 84 is used to extract facial feature information of the input face image.
  • the fourth vector 85 is used to represent the second facial feature information of the second face image 83.
  • S108 Synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  • the third vector 82 representing the attribute adjustment information obtained through the above step S106 is obtained, and the fourth vector 85 representing the facial feature information of the second face image is obtained through the above step S107.
  • the third vector 82 and the fourth vector 85 are spliced and then input to the sixth neural network 86 to obtain the third face image 87.
  • the sixth neural network 86 is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  • the face image synthesis method provided by the embodiment of the present application can realize interaction with the user, and in the interaction process the modification of attribute information can be realized in a simpler way.
  • through a simple interaction process with the user, the face image synthesis method provided in the embodiments of the present application can realize the correction of face image attribute information, making the final face image closer to the desired face image.
  • before the sixth neural network for synthesizing the face image is applied, it needs to be trained so that the face images it synthesizes are more realistic.
  • the embodiment of the present application adopts a GAN-based method to train the sixth neural network.
  • the adversarial network used for training the generative network may include a seventh neural network and an eighth neural network.
  • the seventh neural network is used to determine the third probability that the input face image is a real face image.
  • the eighth neural network is used to determine the fourth probability that the segmentation maps of the second real face image and of the synthetic face image corresponding to the second real face image are consistent.
  • the specific training process is as follows.
  • Step 1 Initialize the sixth neural network.
  • a sixth neural network that can be used for face image synthesis is constructed, and the weight matrix corresponding to each parameter contained in it is set to an initial value; a training process then needs to be performed, during which the weight matrix corresponding to each parameter is learned.
  • training the sixth neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained sixth neural network.
  • Step 2 Obtain a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information.
  • the real face image data set includes one or more second real face images
  • the attribute adjustment information data set includes one or more attribute adjustment information.
  • the vector corresponding to the second real face image is obtained through the fifth neural network, and this vector represents the facial feature information of the second real face image.
  • the vector corresponding to the attribute adjustment information can be directly constructed.
  • a training data set containing multiple sets of training data can be obtained through the combination of the second real face image and the attribute adjustment information.
  • Step 3 Input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information to the sixth neural network, and output a synthetic face image corresponding to the second real face image containing the attribute adjustment information.
  • a vector corresponding to the second real face image can be combined with a vector corresponding to the attribute adjustment information as input, and the synthetic face image corresponding to the second real face image is then output according to the face image synthesis method of the above steps S107 and S108.
  • the vector 92 corresponding to the second real face image and the vector 93 corresponding to the attribute adjustment information are input to the sixth neural network 94, which synthesizes the face image and outputs a synthetic face image 95 containing the attribute adjustment information.
  • Step 4 Input the second real face image and the synthetic face image corresponding to the second real face image into the seventh neural network and the eighth neural network.
  • the second real face image and its corresponding synthetic face image are input as a set of data into the seventh neural network and the eighth neural network, respectively.
  • the seventh neural network realizes true/false discrimination, determining the probability that the synthesized face image output by the sixth neural network is a real face image. The higher this probability, the more likely the image is a real face image, indicating a stronger ability of the sixth neural network to synthesize realistic face images.
  • the eighth neural network can determine the probability that the second real face image and the segmentation map of the synthetic face image corresponding to the second real face image are consistent.
  • the difference between the synthetic face image and the corresponding second real face image can be judged according to the probability that the input face image is a real face image and the probability that the segmentation maps are consistent, so as to ensure that the difference between the synthetic face image and the corresponding second real face image, apart from the attribute adjustment information, is as small as possible.
  • the edge-based segmentation method is one of the image segmentation methods, and the segmented face image can be represented by a segmentation map.
  • the segmentation map includes dividing the face image from the background image along the outer contour of the face image, and only retains the part of the face image, so that the interference of the background information on the process of judging the face image can be ignored.
  • in this way, the face image synthesis ability of the sixth neural network can be judged: the synthesized face image will not exhibit large abnormal changes, and the neural network will only modify the attribute details that need to be adjusted.
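  • a rough sketch of such a segmentation map and a naive consistency score follows; the edge-based segmentation step itself is not shown, and the binary mask representation is an assumption:

```python
import numpy as np

def face_segmentation_map(image, face_mask):
    """Keep only the face region along its outer contour and drop the
    background so it cannot interfere with the comparison. face_mask
    (1 inside the contour, 0 outside) is assumed to come from an
    edge-based segmentation step not shown here."""
    return image * face_mask[..., np.newaxis]   # image: (H, W, 3), mask: (H, W)

def segmentation_consistency(map_a, map_b):
    """Crude consistency score between two segmentation maps; in the
    embodiment the eighth neural network learns this judgment."""
    return 1.0 - np.abs(map_a - map_b).mean()
```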
  • the second real face image 91 and the synthesized face image 95 are input to the seventh neural network 96 and the eighth neural network 97.
  • the seventh neural network has the same function as the third neural network applied in the process of training the second neural network; both are used to determine the probability that the input face image is a real face image.
  • the seventh neural network and the third neural network may be the same or different.
  • Step 5 Perform iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; during the iterative training process, adjust the weights of the parameters of the sixth neural network.
  • the output result of the seventh neural network is the third probability of judging that the input face image is a real face image
  • the output result of the eighth neural network is the fourth probability of judging that the segmentation maps of the second real face image and of the corresponding synthetic face image are consistent.
  • the output results of the seventh neural network and the eighth neural network are used to measure the difference between the synthesized face image output by the sixth neural network and the input real face image, using a loss function or objective function: the higher the output value (loss) of the loss function, the greater the difference. Training the sixth neural network then becomes a process of reducing this loss as much as possible.
  • the objective function contains three parts of constraints. First, the difference between the synthesized face image output by the sixth neural network and the corresponding real face image is required to be as small as possible; the smaller the difference, the smaller the objective function value. Second, the synthesized face image output by the sixth neural network should fool the seventh neural network as much as possible, that is, the seventh neural network judges the synthesized face image as a real face image; the higher the probability of being judged a real image, the smaller the objective function value. Third, the segmentation map of the synthesized face image output by the sixth neural network and that of the corresponding real face image should be as similar as possible; the more similar the segmentation maps, the smaller the objective function value. In this way, the training of the sixth neural network can be achieved by minimizing the objective function.
  • the output results of the seventh neural network and the eighth neural network are input into the loss function, and the result of the loss function is fed into the sixth neural network for a back-propagation operation.
  • the back-propagation operation performs a gradient update, modifying the weight of each parameter of the sixth neural network, and finally a better parameter weight matrix is obtained.
  • the greater the probability output by the seventh neural network that the synthetic face image is a real face image, the smaller the loss function value.
  • the smaller the difference between the synthetic face image and the second real face image, apart from the attribute information that needs to be modified, the smaller the loss function value.
  • the smaller the difference the eighth neural network judges between the segmentation maps of the synthetic face image and the second real face image, the smaller the loss function value. In this way, the training of the sixth neural network can be achieved by minimizing the objective function.
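  • the three constraints for the sixth neural network can be sketched analogously to the earlier objective, with a segmentation-consistency term; the L1 form of that term (standing in for the eighth network's learned judgment) and the weights are assumptions:

```python
import torch
import torch.nn.functional as F

def sixth_net_objective(real_img, fake_img, d7_fake_prob, seg_fake, seg_real,
                        w_rec=10.0, w_adv=1.0, w_seg=1.0):
    """Sketch of the three constraints: small image difference, a high
    'real' probability from the seventh network, and similar
    segmentation maps."""
    rec = F.l1_loss(fake_img, real_img)             # constraint 1: small difference
    adv = -torch.log(d7_fake_prob + 1e-8).mean()    # constraint 2: judged real
    seg = F.l1_loss(seg_fake, seg_real)             # constraint 3: consistent segmentation maps
    return w_rec * rec + w_adv * adv + w_seg * seg
```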
  • the third probability output by the seventh neural network 96 and the fourth probability output by the eighth neural network 97 are input to the objective function, and the sixth neural network 94 is trained by minimizing the objective function.
  • Step 6 If the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, stop training, and obtain a sixth neural network that can be used to synthesize a face image.
  • the third threshold and the fourth threshold can be determined according to empirical values.
  • if the third probability output by the seventh neural network is greater than the third threshold and the fourth probability output by the eighth neural network is greater than the fourth threshold, the training is stopped.
  • the stopping timing may also be determined according to the output result of the loss function; for example, training is stopped when the output of the loss function changes relatively smoothly, or when a preset number of training iterations is reached. Taking the loss function as an example, the higher its output value (loss), the greater the difference; once the output of the loss function is smooth and no longer decreases, training of the sixth neural network is complete and the goal of face image synthesis can be met.
  • the seventh neural network and the eighth neural network are trained by minimizing the objective function.
  • the objective function contains part of the content representing the classification error.
  • the objective function corresponding to the seventh neural network includes a part representing the classification error. If the input face image is judged to be a real face image, the greater the output probability, the smaller the corresponding objective function value.
  • in this way, the training process of the seventh neural network is completed.
  • the training process of the eighth neural network is similar to the training process of the seventh neural network.
  • the objective function contains a part representing the classification error: the smaller the difference the eighth neural network judges between the segmentation maps of the real face image and the synthetic face image, the smaller the objective function value; conversely, the greater the difference between the segmentation maps, the greater the objective function value. In this way, by continuously inputting real face images and synthetic face images and minimizing the objective function value, the training process of the seventh neural network and the eighth neural network is completed.
  • the face image synthesis method provided by the embodiment of the present application can train the sixth neural network so that the difference between the third face image synthesized by the sixth neural network and the corresponding second face image, apart from the attribute adjustment information that needs to be corrected, is as small as possible; the third face image is as realistic as possible; and the segmentation map of the third face image is as close as possible to that of the corresponding second face image.
  • the face image synthesis method provided in the embodiments of the present application can interact with the user, so that face image attribute information that is difficult to describe is adjusted through recognition of simple input, thereby synthesizing the face image the user needs more comprehensively.
  • FIG. 10 shows a schematic diagram of a possible structure of the face image synthesizing device 1000 involved in the foregoing embodiment.
  • the face image synthesis device 1000 includes: an acquisition unit 1001 and a processing unit 1002.
  • the acquiring unit 1001 is used to support the face image synthesis apparatus 1000 to perform step S101 in FIG. 4, step S101 and step S106 in FIG. 7, and/or other processes used in the technology described herein.
  • the processing unit 1002 is used to support the face image synthesis device 1000 in performing steps S102-S105 in FIG. 4, steps S102-S105 and S107-S108 in FIG. 7, and/or other processes of the technology described herein.
  • FIG. 11 is a schematic diagram of the hardware structure of the device provided by an embodiment of the application.
  • the device includes at least one processor 1101, a communication line 1102, a memory 1103, and at least one communication interface 1104.
  • the memory 1103 may also be included in the processor 1101.
  • the processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of this application.
  • the communication line 1102 may include a path to transmit information between the aforementioned components.
  • the communication interface 1104 is used to communicate with other devices.
  • the communication interface may be a module, a circuit, a bus, an interface, a transceiver, or other device that can realize a communication function, and is used to communicate with other devices.
  • the transceiver may be an independently provided transmitter, which can be used to send information to other devices; the transceiver may also be an independently provided receiver, which is used to receive information from other devices.
  • the transceiver may also be a component that integrates the functions of sending and receiving information, and the embodiment of the present application does not limit the specific implementation of the transceiver.
  • the memory 1103 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently, and is connected to the processor 1101 through a communication line 1102.
  • the memory 1103 may also be integrated with the processor 1101.
  • the memory 1103 is used to store computer-executed instructions used to implement the solution of the present application, and the processor 1101 controls the execution.
  • the processor 1101 is configured to execute the computer-executable instructions stored in the memory 1103, so as to implement the face image synthesis method provided in the embodiments of the present application.
  • the computer execution instructions in the embodiments of the present application may also be referred to as application program codes, instructions, computer programs, or other names, which are not specifically limited in the embodiments of the present application.
  • the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 11.
  • the device may include multiple processors, such as the processor 1101 and the processor 1105 in FIG. 11. Each of these processors can be a single-core processor or a multi-core processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the above-mentioned device may be a general-purpose device or a special-purpose device, and the embodiment of the present application does not limit the type of the device.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the device.
  • the device may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the embodiment of the present application also provides a computer-readable storage medium that stores computer instructions; when the computer instructions are run on a server, the server executes the above-mentioned related method steps to realize the face image synthesis method in the above-mentioned embodiments.
  • the embodiments of the present application also provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps, so as to realize the face image synthesis method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device, which may specifically be a component or a module.
  • the device may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions.
  • when the device runs, the processor can execute the computer-executable instructions stored in the memory, so that the device executes the face image synthesis method in the foregoing method embodiments.
  • the device, computer-readable storage medium, computer program product, or chip provided in the embodiments of the present application are all used to execute the corresponding method provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation, there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, modules or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for compositing a face image, relating to the field of artificial intelligence. The face image can be composited by performing attribute information correction on the basis of a real face image, the authenticity of a composited face image is improved, and the compositing efficiency is improved. The method comprises: obtaining first attribute information, the first attribute information being attribute information comprised in a face image to be composited; according to the first attribute information, searching a real face image library for a first face image, the first face image comprising second attribute information, and a repetitive rate of the second attribute information and the first attribute information satisfying a threshold requirement; obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used for representing an attribute difference between the first face image and the face image to be composited; performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and compositing a second face image according to the first facial feature information and the attribute difference information.

Description

Face image synthesis method and device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office on February 29, 2020, with application number 202010132570.1 and invention title "Face image synthesis method and device", the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence (AI), and in particular to a face image synthesis method and device.
Background
Face image synthesis technology is widely used in fields such as photographing entertainment and medical plastic surgery. For example, in the field of photographing entertainment, users can change several attributes of a photo, such as double eyelids, big eyes, and face slimming. For another example, in the field of medical plastic surgery, a doctor can modify the current user's photo based on the user's description to generate a postoperative effect image.
At present, face image synthesis methods mainly include the following two. The first is the traditional image processing method: the usual practice is to build a template library containing a variety of partial images of facial features. A painter then selects facial features from the template library according to the witness's description and splices them, and finally smooths the edges of the spliced image to generate a face image. However, simple splicing of partial facial-feature images can hardly guarantee the authenticity of the synthesized face image. Moreover, subject to the two subjective influences of the painter and the witness, there may be a gap between the synthesized face image and the actually required face image.
The second is a deep learning method, which uses massive face image data to train a deep neural network by an adversarial generation method. The trained neural network is subsequently used to generate face images. However, the trained neural network cannot synthesize a face image containing user-specified attribute information.
Summary of the invention
The face image synthesis method provided in this application can realize face image synthesis based on a real face image, can obtain a face image that meets the requirements and is more realistic, and has high synthesis efficiency.
In order to achieve the above objectives, this application adopts the following technical solutions:
In a first aspect, this application provides a face image synthesis method, which may include: acquiring first attribute information, the first attribute information being attribute information contained in a face image to be synthesized; searching a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, and the repetition rate of the second attribute information and the first attribute information meeting a threshold requirement; obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to indicate the attribute difference between the first face image and the face image to be synthesized; performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and synthesizing a second face image according to the first facial feature information and the attribute difference information.
The first attribute information may be collected by the user and input into the face image synthesis device, and the face image synthesis device performs face image synthesis based on the attribute information. Exemplarily, the process of collecting the first attribute information includes: the face image synthesis device creates a face image attribute information questionnaire according to all the attribute information required to synthesize the face image, and sends the questionnaire to a terminal device; the user fills it in on the terminal device and returns it to the face image synthesis device, whereby the face image synthesis device obtains the attribute information. In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized from multiple dimensions, so that the finally synthesized face image is closer to the desired face image.
The attribute difference information is used to indicate the difference between the attribute information contained in the found first face image and the first attribute information, so that the attribute information that needs to be corrected in the first face image can be determined, and the first face image can then be corrected according to the attribute difference information.
In this way, the face image synthesis method provided by the embodiment of the present application may not require the participation of professionals, has high efficiency, and is convenient to promote. Moreover, performing face image synthesis based on a real face image can obtain a more realistic face image.
In a possible implementation, obtaining the attribute difference information according to the first attribute information and the second attribute information includes: obtaining a first attribute vector according to the first attribute information; obtaining a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and obtaining a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
Exemplarily, if the values of the first attribute vector and the second attribute vector at a corresponding position are different, indicating that the attribute information corresponding to that position needs to be modified, then in the output first vector the value at that position is set to the value at that position in the first attribute vector. If the values of the first attribute vector and the second attribute vector at a corresponding position are the same, indicating that the attribute information corresponding to that position does not need to be modified, then the value at that position in the output first vector can be set to a meaningless symbol. In this way, after the first vector representing the attribute difference information is obtained, the attribute information of the first face image can subsequently be corrected according to the values at the meaningful positions in the first vector.
In a possible implementation, performing facial feature extraction on the first face image to obtain the first facial feature information of the first face image includes: inputting the first face image into a first neural network to obtain a second vector, where the first neural network is used to extract facial feature information of the input face image, and the second vector is used to represent the first facial feature information of the first face image.
The face image may include multiple pieces of facial feature information that cannot be exhaustively listed; this facial feature information can be used to represent the corresponding face image, and the above attribute information contained in the first face image may be part of the facial feature information of the first face image. Each person's facial feature information is different, and facial feature information can be used to specifically distinguish different people; for example, Zhang San's facial feature information is different from Li Si's facial feature information, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Further, facial feature extraction is performed to obtain facial feature information that can represent the face image.
In this way, the first neural network is used to perform facial feature extraction on the first face image to obtain the first facial feature information representing the first face image.
In a possible implementation, synthesizing the second face image according to the first facial feature information and the attribute difference information includes: obtaining a first vector and a second vector, where the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and splicing the first vector and the second vector and inputting them into a second neural network to obtain the second face image, where the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
The splicing of the first vector and the second vector may include splicing the first vector directly after the second vector, and the second neural network corrects the part of the second vector that represents the attribute information according to the first vector. For example, the parts of the first vector and the second vector that represent attribute information have the same length, and each bit corresponds to one type of face image attribute information. In this way, the second neural network can directly correct the value at the corresponding position in the second vector according to the value in the first vector, thereby correcting the first face image according to the attribute difference information. Moreover, the attribute information of the first face image other than the attribute information that needs to be corrected can be changed as little as possible.
In a possible implementation, before the facial feature extraction is performed on the first face image to obtain the first facial feature information, the method further includes: initializing the second neural network; obtaining a first real face image, a vector corresponding to the first real face image, and a vector corresponding to the attribute difference information, where the vector corresponding to the first real face image represents the facial feature information of the first real face image; inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, which outputs a synthesized face image containing the attribute difference information; inputting the first real face image and the corresponding synthesized face image into a third neural network and a fourth neural network, where the third neural network determines a first probability that an input face image is a real face image, and the fourth neural network determines a second probability that an input face image contains the attribute difference information; and iteratively training the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network, adjusting the weights of the parameters of the second neural network during the iterative training. If the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, training stops, yielding a second neural network that can be used to synthesize face images.
Exemplarily, the second neural network can be trained with a generative adversarial network (GAN) method. The third neural network determines the first probability that the synthesized face image output by the second neural network is a real face image, and the fourth neural network determines the second probability that this synthesized face image contains the attribute difference information. The weights of the parameters of the second neural network are adjusted according to these two probabilities, so that the face image synthesized by the second neural network differs as little as possible from the corresponding real face image, looks more realistic, and contains the specified attribute information such as the attribute difference information.
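A minimal sketch of one generator update under this two-discriminator GAN scheme follows. The module definitions, optimizer, thresholds, and the assumption that both discriminators emit probabilities in (0, 1) are illustrative; the patent fixes none of these details:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def generator_step(G, D_real, D_attr, opt_g, feat_vec, diff_vec,
                   p1_thresh=0.9, p2_thresh=0.9):
    """One update of the second neural network G. The third network D_real
    scores realness; the fourth network D_attr scores whether the attribute
    differences are present. Returns True when both thresholds are met."""
    fake = G(torch.cat([feat_vec, diff_vec], dim=1))  # synthesized face image
    p1 = D_real(fake)   # first probability: image judged real
    p2 = D_attr(fake)   # second probability: contains attribute differences
    # Push both probabilities toward 1 so the output is realistic and
    # carries the requested attribute changes.
    loss = bce(p1, torch.ones_like(p1)) + bce(p2, torch.ones_like(p2))
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return bool(p1.mean() > p1_thresh and p2.mean() > p2_thresh)
```

In a full training loop the third and fourth networks would be updated in alternation with the second network; that half of the loop is omitted here for brevity.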
In a possible implementation, the method further includes: obtaining attribute adjustment information fed back by the user; performing facial feature extraction on the second face image to obtain second facial feature information of the second face image; and synthesizing a third face image according to the second facial feature information and the attribute adjustment information.
The attribute adjustment information fed back by the user indicates how the user wants to adjust the attributes of the synthesized face image; through an interaction process with the user, the face image synthesis device adjusts the attribute information contained in the face image. This covers, for example, detail features of a face that are difficult to describe completely, or attribute information that was not captured when the user's attribute information was first collected and that can be adjusted with the attribute adjustment information. For example, adjustments the user draws directly on the synthesized second face image can be captured, the adjusted attribute information can be recognized, and the second face image can be corrected directly with that information, thereby obtaining a realistic face image that is closer to the user's needs.
In a possible implementation, synthesizing the third face image according to the second facial feature information and the attribute adjustment information includes: obtaining a third vector according to the attribute adjustment information fed back by the user, where the third vector represents the attribute information of the second face image that needs to be adjusted; inputting the second face image into a fifth neural network to obtain a fourth vector, where the fifth neural network is used to extract the facial feature information of an input face image and the fourth vector represents the second facial feature information of the second face image; and concatenating the third vector and the fourth vector and inputting the result into a sixth neural network to obtain the third face image, where the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
In this way, the attribute information can be converted directly into a vector problem, and a neural network can be used to synthesize the face image so that the synthesized face image changes as little as possible except for the parts that need to be adjusted.
In a possible implementation, before the third face image is synthesized according to the second facial feature information and the attribute adjustment information, the method further includes: initializing the sixth neural network; obtaining a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information, where the vector corresponding to the second real face image represents the facial feature information of the second real face image; inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, which outputs a synthesized face image containing the attribute adjustment information; inputting the second real face image and the corresponding synthesized face image into a seventh neural network and an eighth neural network, where the seventh neural network determines a third probability that an input face image is a real face image, and the eighth neural network determines a fourth probability that the segmentation maps of the second real face image and the corresponding synthesized face image are consistent; and iteratively training the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network, adjusting the weights of the parameters of the sixth neural network during the iterative training. If the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, training stops, yielding a sixth neural network that can be used to synthesize face images.
Exemplarily, the sixth neural network can also be trained with the GAN method. The seventh neural network determines the probability that the synthesized face image output by the sixth neural network is a real face image, and the eighth neural network determines the probability that the segmentation map of this synthesized face image is consistent with that of the corresponding real face image. The weights of the parameters of the sixth neural network are adjusted according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network, so that the face image synthesized by the sixth neural network differs as little as possible from the corresponding real face image, looks more realistic, and has a similar segmentation map.
In a possible implementation, the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information.
The face shape information may include the shape of the face, the height of the cheekbones, and the like. The skin condition information may include wrinkles, spots, beards, scars, and other features of the skin. The accessory information may include glasses, masks, hats, and other worn items.
In this way, a real face image similar to the face image to be synthesized can be found across multiple dimensions, so that a face image can be synthesized from a real face image by changing only a small amount of attribute information, making the synthesized face image more realistic.
In a possible implementation, before the first attribute information is obtained, the method further includes: establishing a real face image library that contains real face images and the attribute information contained in those images.
Exemplarily, the real face image library includes the attribute information contained in each real face image, so that when searching for the first face image, the corresponding real face image can be determined directly from the repetition rate of the attribute information.
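One plausible shape for such a library entry is sketched below; the class and field names are chosen for illustration and are not specified by the text:

```python
from dataclasses import dataclass, field

@dataclass
class FaceRecord:
    """One entry of the real face image library: the stored image plus the
    attribute information extracted from it (field names are illustrative)."""
    image_path: str
    attributes: dict = field(default_factory=dict)

record = FaceRecord(
    image_path="faces/000123.png",
    attributes={"gender": "male", "age": 35, "face_shape": "pointed chin",
                "hairstyle": "short hair"},
)
```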
In a second aspect, this application provides a face image synthesis apparatus, which may include an acquisition unit and a processing unit. The acquisition unit is configured to obtain first attribute information, the attribute information contained in the face image to be synthesized. The processing unit is configured to: search a real face image library for a first face image according to the first attribute information, where the first face image contains second attribute information and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement; obtain attribute difference information according to the first attribute information and the second attribute information, the attribute difference information representing the attribute difference between the first face image and the face image to be synthesized; perform facial feature extraction on the first face image to obtain first facial feature information of the first face image; and synthesize a second face image according to the first facial feature information and the attribute difference information.
In a possible implementation, the processing unit is specifically configured to obtain a first attribute vector according to the first attribute information and a second attribute vector according to the second attribute information, where the first and second attribute vectors have the same length and each position corresponds to one type of attribute information, and to obtain a first vector from the difference between the first attribute vector and the second attribute vector, the first vector representing the attribute difference information.
In a possible implementation, the processing unit is specifically configured to input the first face image into a first neural network to obtain a second vector, where the first neural network is used to extract the facial feature information of an input face image and the second vector represents the first facial feature information of the first face image.
In a possible implementation, the processing unit is specifically configured to obtain the first vector, which represents the attribute difference information, and the second vector, which represents the first facial feature information of the first face image, and to concatenate the two vectors and input the result into the second neural network to obtain the second face image, where the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
In a possible implementation, the processing unit is further configured to: initialize the second neural network; obtain a first real face image, a vector corresponding to the first real face image, and a vector corresponding to the attribute difference information, where the vector corresponding to the first real face image represents the facial feature information of the first real face image; input these two vectors into the second neural network, which outputs a synthesized face image, corresponding to the first real face image, that contains the attribute difference information; input the first real face image and the corresponding synthesized face image into the third and fourth neural networks, where the third neural network determines the first probability that an input face image is a real face image and the fourth neural network determines the second probability that an input face image contains the attribute difference information; and iteratively train the second neural network according to these two probabilities, adjusting the weights of its parameters, stopping once the first probability exceeds the first threshold and the second probability exceeds the second threshold, yielding a second neural network that can be used to synthesize face images.
In a possible implementation, the acquisition unit is further configured to obtain attribute adjustment information fed back by the user. The processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image, and to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
In a possible implementation, the processing unit is specifically configured to: obtain a third vector according to the attribute adjustment information fed back by the user, where the third vector represents the attribute information of the second face image that needs to be adjusted; input the second face image into the fifth neural network to obtain a fourth vector, where the fifth neural network is used to extract the facial feature information of an input face image and the fourth vector represents the second facial feature information of the second face image; and concatenate the third and fourth vectors and input the result into the sixth neural network to obtain the third face image, where the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
In a possible implementation, the processing unit is further configured to: initialize the sixth neural network; obtain a second real face image, a vector corresponding to the second real face image, and a vector corresponding to the attribute adjustment information, where the vector corresponding to the second real face image represents its facial feature information; input these two vectors into the sixth neural network, which outputs a synthesized face image, corresponding to the second real face image, that contains the attribute adjustment information; input the second real face image and the corresponding synthesized face image into the seventh and eighth neural networks, where the seventh neural network determines the third probability that an input face image is a real face image and the eighth neural network determines the fourth probability that the segmentation maps of the second real face image and the corresponding synthesized face image are consistent; and iteratively train the sixth neural network according to these two probabilities, adjusting the weights of its parameters, stopping once the third probability exceeds the third threshold and the fourth probability exceeds the fourth threshold, yielding a sixth neural network that can be used to synthesize face images.
In a possible implementation, the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information.
In a possible implementation, the processing unit is further configured to establish a real face image library containing real face images and the attribute information contained in those images.
In a third aspect, this application provides a face image synthesis apparatus that may include one or more processors, a memory, and one or more instructions, where the one or more instructions are stored in the memory and, when executed by the one or more processors, cause the face image synthesis apparatus to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a fourth aspect, this application provides an apparatus having the function of implementing the face image synthesis method described in the first aspect or any of its possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a fifth aspect, this application provides a computer-readable storage medium including computer instructions that, when run on a computer, cause a processor to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a sixth aspect, this application provides a computer program product that, when run on a server, causes a face image synthesis apparatus to perform the face image synthesis method described in the first aspect or any of its possible implementations.
In a seventh aspect, a circuit system is provided, including a processing circuit configured to perform the face image synthesis method described in the first aspect or any of its possible implementations.
Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of a face image synthesis method according to an embodiment of this application;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 3 is a schematic diagram of the hardware structure of a chip according to an embodiment of this application;
FIG. 4 is a first schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 5 is a second schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 6 is a first schematic diagram of the training procedure of a face image synthesis neural network according to an embodiment of this application;
FIG. 7 is a third schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 8 is a fourth schematic flowchart of a face image synthesis method according to an embodiment of this application;
FIG. 9 is a second schematic diagram of the training procedure of a face image synthesis neural network according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a face image synthesis apparatus according to an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware structure of a face image synthesis apparatus according to an embodiment of this application.
Detailed Description
The face image synthesis method and apparatus provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a face image synthesis system that includes a face image synthesis device 110 and a terminal device 120, which may be connected through a wired or wireless network. The embodiments of this application do not specifically limit the connection mode between the devices.
The terminal device 120 provides a human-computer interaction interface through which the user inputs the parameters required for synthesizing a face image, such as the attribute information of the face image to be synthesized and the attribute adjustment information. Exemplarily, the attribute information of the face image to be synthesized may include gender information, age information, face shape information, and other attribute information that can describe facial features. Exemplarily, the terminal device may be a mobile phone, a tablet computer (pad), a computer with a wireless transceiver function, a personal digital assistant (PDA), a netbook, a desktop computer, a laptop, a handheld computer, a notebook computer, an artificial intelligence (AI) terminal, or another terminal device. The embodiments of this application place no special restriction on the specific form of the terminal device 120.
The face image synthesis device 110 may be a device or server with image search and image synthesis functions, such as a cloud server or a network server. The face image synthesis device 110 receives the attribute information, attribute adjustment information, and other information sent by the terminal device 120 through an interaction interface, and its processor searches for real face images based on the real face image library stored in the memory. The processor then synthesizes a face image from the found real face image and the obtained attribute information, and may use the obtained attribute adjustment information to further adjust the attributes of the synthesized face image. The final synthesized face image is sent to the corresponding terminal device 120. The memory in the face image synthesis device 110 may be a general term that includes local storage and a database storing historical face images; the database may reside on the face image synthesis device or on another cloud server.
It should be noted that the face image synthesis device 110 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
For example, in FIG. 1, a server acting as the face image synthesis device 110 can execute the face image synthesis method of the embodiments of this application.
For another example, in FIG. 1, the terminal device 120 may itself act as the face image synthesis device: it receives from the user the attribute information and/or attribute adjustment information of the face image to be synthesized, performs the synthesis task itself, and produces the required face image. That is, the terminal device 120 itself can execute the face image synthesis method of the embodiments of this application.
FIG. 2 shows an example of a system architecture provided by an embodiment of this application.
As shown in the system architecture of FIG. 2, the face image synthesis device 110 is equipped with a transceiver interface 211 for data exchange with external devices. Through the transceiver interface 211, the face image synthesis device 110 receives input data transmitted by the terminal device 120, which in the embodiments of this application may include the attribute information of the face image to be synthesized and the attribute adjustment information.
The face image collection module 240 is used to collect real face images; for example, in the embodiments of this application the real face images may be collected face images of the local resident population. After collecting the real face images, the face image collection module 240 stores them in the database 230. The database 230 may also include a real face image library 231 for storing the real face images gathered by the face image collection module 240, and may further store face images used to train the face image synthesis device 110.
It should be noted that in practical applications, the real face images maintained in the database 230 do not necessarily all come from the face image collection module 240; they may also be received from other devices, for example information sent by the terminal device 120 to expand the real face image library 231.
The attribute information collection module 212 is used to collect attribute information 201, which may include, for example, the attribute information of the face image to be synthesized and the attribute adjustment information. Specifically, the attribute information collection module 212 collects, through the transceiver interface 211, the attribute information of the face image to be synthesized that is required for synthesis, and, during the synthesis process, the attribute adjustment information that the user inputs through the terminal device 120.
The search module 213 is used to search the real face image library 231, based on the attribute information 201, for a real face image 202 whose repetition rate with the attribute information of the face image to be synthesized meets the threshold requirement, that is, a real face image that is close to the face image to be synthesized.
The generation module 214 is used to process the real face image 202 based on the attribute information 201 to obtain a synthesized face image 203, for example by adding, removing, or correcting certain attribute information in the real face image according to the attribute information of the face image to be synthesized, such as correcting the hairstyle attribute of the real face image 202 from short hair to long hair. The synthesized face image 203 is output to the terminal device 120 through the transceiver interface 211.
The face image attribute adjustment module 215 is used to fine-tune the attributes of the synthesized face image based on the face image generated by the generation module 214 and the attribute adjustment information collected by the attribute information collection module 212, thereby obtaining a face image that better matches the user's needs.
It should also be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of this application; the positional relationships among the devices, components, and modules shown in the figure do not constitute any limitation. For example, in FIG. 2 the database 230 is external memory relative to the face image synthesis device 110, but in other cases the database 230 may also be placed inside the face image synthesis device 110.
Since the embodiments of this application involve extensive use of neural networks, the related terms and concepts that may be involved are introduced first to aid understanding.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be

$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce a nonlinearity into the neural network so as to convert the input signal of the unit into an output signal; the output of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract its features; a local receptive field may be a region composed of several neural units.
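A direct transcription of this unit into code, taking sigmoid as the activation $f$ mentioned above (the input and weight values are arbitrary):

```python
import numpy as np

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b) with f = sigmoid."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([0.5, -1.2, 3.0])        # inputs x_s
w = np.array([0.8, 0.1, -0.4])        # weights W_s
print(neural_unit(x, w, b=0.2))       # a value in (0, 1)
```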
(2) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN by the position of its layers, the layers can be classified into three types: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in layer i is connected to every neuron in layer i+1.
Although a DNN looks complicated, the work of each layer is in fact not complicated; simply put, it is the nonlinear expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 denotes the layer of the coefficient $W$, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$.
It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers allow the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained network (the weight matrices formed by the vectors W of the many layers).
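As an illustration (not part of the disclosure), the per-layer computation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be stacked into a forward pass as follows, with arbitrary layer sizes and ReLU standing in for the activation $\alpha$:

```python
import numpy as np

def forward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Fully connected forward pass: each layer computes a(W x + b).
    Row j, column k of each W corresponds to the coefficient W^L_jk above."""
    a = lambda z: np.maximum(z, 0.0)   # ReLU as an example activation
    for W, b in zip(weights, biases):
        x = a(W @ x + b)
    return x

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]                   # input, two hidden layers, output
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.standard_normal(4), Ws, bs))
```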
(3) Loss function
In the process of training a deep neural network, because the output of the network should be as close as possible to the value that is actually desired, the predicted value of the current network can be compared with the truly desired target value, and the weight vector of each layer can then be updated according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the network can predict the truly desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value; this is the purpose of the loss function or objective function, important equations used to measure that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(4) Back propagation algorithm
A neural network can use the error back propagation (BP) algorithm to correct the parameter values of the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error loss information so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, aimed at obtaining better parameters for the neural network model, for example, the weight matrices.
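A minimal numeric illustration of this loss-driven update, using a single linear unit and a squared-error loss (the model, data, and learning rate are arbitrary choices for the example):

```python
import numpy as np

# Forward: prediction y_hat = W x + b; loss = 0.5 * (y_hat - y)^2.
x, y = np.array([1.0, 2.0]), 3.0
W, b, lr = np.array([0.5, -0.3]), 0.1, 0.05

for step in range(100):
    y_hat = W @ x + b
    err = y_hat - y        # dLoss/dy_hat for the squared-error loss
    # Back propagation for this single layer: the chain rule gives the
    # gradients, and the weights move against them to shrink the loss.
    W -= lr * err * x      # dLoss/dW = err * x
    b -= lr * err          # dLoss/db = err
print(W @ x + b)           # converges toward the target 3.0
```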
FIG. 3 shows the hardware structure of a chip provided by an embodiment of this application; the chip includes a neural-network processing unit (NPU) 300. The chip can be provided in the face image synthesis device 110 shown in FIG. 2 to complete all or part of the work of the attribute information collection module 212 (for example, collecting the attribute information 201), of the search module 213 (for example, searching for the real face image 202), of the generation module 214 (for example, generating the synthesized face image 203), or of the face image attribute adjustment module 215 (for example, adjusting the attribute information of the synthesized face image 203 generated by the generation module 214).
The neural-network processing unit NPU 300 is mounted as a coprocessor on a host central processing unit (host CPU) 320, which allocates tasks. The core of the NPU 300 is the operation circuit 303; the controller 304 controls the operation circuit 303 to fetch data from memory (the weight memory or the input memory) and perform computations.
In some implementations, the operation circuit 303 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 303 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 303 is a general-purpose matrix processor.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the circuit. It then fetches the matrix A data from the input memory 301 and performs the matrix operation with matrix B; partial or final results of the resulting matrix are stored in the accumulator 308.
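The role of the accumulator can be pictured in software as a blocked matrix multiply whose partial tile products are summed into the output; this is only an analogy, since the actual PE array operates on hardware tiles:

```python
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, tile: int = 2) -> np.ndarray:
    """C = A @ B computed tile by tile; partial results accumulate in C,
    mirroring the role of the accumulator 308 in the NPU."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))                              # the "accumulator"
    for t in range(0, k, tile):
        C += A[:, t:t + tile] @ B[t:t + tile, :]      # partial result per tile
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(blocked_matmul(A, B), A @ B)
```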
The vector calculation unit 307 can further process the output of the operation circuit 303, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for the non-convolutional/non-FC layer computations of a neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 307 stores the processed output vector in the unified memory 306. For example, the vector calculation unit 307 may apply a nonlinear function to the output of the operation circuit 303, such as to a vector of accumulated values, to generate activation values.
In some implementations, the vector calculation unit 307 generates normalized values, combined values, or both.
In some implementations, the processed output vector can be used as the activation input to the operation circuit 303, for example for use in a subsequent layer of the neural network.
The unified memory 306 is used to store input data and output data. The direct memory access controller (DMAC) 305 moves input data in the external memory 330 into the input memory 301 and/or the unified memory 306, moves weight data in the external memory 330 into the weight memory 302, and moves data in the unified memory 306 into the external memory 330.
The bus interface unit (BIU) 310 is used for interaction among the host CPU 320, the DMAC, and the instruction fetch buffer 309 over the bus.
The instruction fetch buffer 309 connected to the controller 304 stores the instructions used by the controller 304; the controller 304 invokes the instructions cached in the instruction fetch buffer 309 to control the working process of the computation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch buffer 309 are all on-chip memories, while the external memory 330 is memory external to the NPU 300; the external memory 330 may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
The face image synthesis method of the embodiments of this application is described in detail below with reference to FIG. 4. The method may be executed by devices such as the face image synthesis device 110 in FIG. 1 or FIG. 2.
As shown in FIG. 4, an embodiment of this application provides a schematic flowchart of a face image synthesis method; the method may include S101-S105:
S101. Acquire first attribute information.
The first attribute information is the attribute information contained in the face image to be synthesized. The attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, accessory information, hairstyle information, and makeup information. For example, the face shape information may include the shape of the face and the height of the cheekbones; the skin condition information may include wrinkles, spots, beards, and scars; the accessory information may include glasses, masks, and hats.
To synthesize a face image, the attribute information of the face image to be synthesized must be collected first, and the synthesized face image must then contain this attribute information.
Exemplarily, the attribute information may be collected as follows: the face image synthesis device creates a face image attribute information questionnaire from all the attribute information required to synthesize the face image and sends it to the terminal device; an eyewitness fills it in on the terminal device and returns it to the face image synthesis device, which thereby obtains the attribute information. In this way, the face image synthesis device can collect the attribute information of the face image to be synthesized across multiple dimensions, so that the final synthesized face image is closer to the required face image.
Table 1 below gives an example of the content of a face image attribute information questionnaire; here it is confirmed that the first attribute information of the face image to be synthesized includes: male aged 30-40, pointed chin, big eyes, high nose bridge, thin lips, short hair, slanted bangs.
Table 1
No.  Attribute          Details
1    Age                30-40
2    Gender             Male
3    Race               --
4    Skin color         --
5    Face shape         Pointed chin
6    Facial features    Big eyes, high nose bridge, thin lips
7    Skin condition     --
8    Accessories        --
9    Hairstyle          Short hair, slanted bangs
10   Makeup             --
It should be noted that Table 1 is only one possible implementation of the face image attribute information questionnaire; the attribute information can also be obtained through questionnaires of other forms. For example, all possible attribute information can be listed in the questionnaire and the attribute information determined from the options the user ticks. As another example, an adjustable progress bar indicating the degree of each attribute can be provided in the questionnaire, and the specific value of the attribute obtained from how far the user moves the bar; for the hairstyle attribute, for instance, a progress bar at 20% may indicate short hair and at 80% long hair.
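One way such a progress bar could be mapped onto a discrete attribute code is sketched below; the nine-interval encoding follows the example given later in step S103, and the function name is illustrative:

```python
def slider_to_code(percent: float) -> int:
    """Map a 0-100% progress-bar position to an attribute code in 1-9,
    e.g. hair length from shortest to longest."""
    percent = min(max(percent, 0.0), 100.0)
    return min(int(percent / 100.0 * 9) + 1, 9)

print(slider_to_code(20.0))   # 2: fairly short hair
print(slider_to_code(80.0))   # 8: long hair
```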
S102. According to the first attribute information, search the real face image library for a first face image, where the first face image contains second attribute information and the repetition rate between the second attribute information and the first attribute information meets a threshold requirement.
The real face image library is used to store real face images. Exemplarily, the library is established by collecting real face images of the local resident population. Optionally, the library also includes the attribute information contained in each real face image, so that when searching for the first face image, the corresponding real face image can be determined directly from the repetition rate of the attribute information. Optionally, a large number of real face images can be collected in advance to establish the library, which is then periodically updated and expanded.
The first face image is a face image found in the real face image library and is therefore a real face image. The repetition rate between the second attribute information contained in the first face image and the first attribute information must also meet a threshold requirement, so that a real face image closer to the face image to be synthesized can be found. The threshold can be set from an empirical value, for example requiring the repetition rate between the first and second attribute information to be at least 80%. Further, if multiple first face images meeting the threshold requirement are found, the one with the highest repetition rate can be taken as the first face image for the subsequent synthesis.
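This search step might be sketched as follows; the repetition rate is computed here as the fraction of requested attributes that the candidate matches, which is one plausible metric since the text does not fix one:

```python
def repetition_rate(wanted: dict, candidate: dict) -> float:
    """Fraction of the requested attribute entries the candidate matches."""
    if not wanted:
        return 0.0
    hits = sum(1 for k, v in wanted.items() if candidate.get(k) == v)
    return hits / len(wanted)

def find_first_face(wanted: dict, library: list, threshold: float = 0.8):
    """Return the library record (a dict with an 'attributes' field) that has
    the highest repetition rate and meets the threshold, else None."""
    best = max(library,
               key=lambda rec: repetition_rate(wanted, rec["attributes"]),
               default=None)
    if best and repetition_rate(wanted, best["attributes"]) >= threshold:
        return best
    return None
```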
Exemplarily, Table 2 below shows the second attribute information contained in the first face image found based on the first attribute information in Table 1; here it is confirmed that the second attribute information includes: 35-year-old male, pointed chin, big eyes, high nose bridge, thin lips, short hair.
Table 2
No.  Attribute          Details
1    Age                35
2    Gender             Male
3    Race               --
4    Skin color         --
5    Face shape         Pointed chin
6    Facial features    Big eyes, high nose bridge, thin lips
7    Skin condition     --
8    Accessories        --
9    Hairstyle          Short hair
10   Makeup             --
In this way, the face image is synthesized on the basis of a real first face image that has been found, so the synthesized face image can also be closer to a real face image.
S103. Obtain attribute difference information according to the first attribute information and the second attribute information.
The attribute difference information represents the attribute difference between the first face image and the face image to be synthesized, that is, the attribute information that needs to be added, deleted, or corrected on the first face image.
Optionally, a first attribute vector is obtained from the first attribute information and a second attribute vector from the second attribute information, where the two attribute vectors have the same length and each position corresponds to one type of attribute information. A first vector is obtained from the difference between the first attribute vector and the second attribute vector; the first vector represents the attribute difference information.
For example, the correspondence between the attribute information and each position of the attribute vector can be configured in advance. For instance, according to the face image attribute information questionnaire of step S101, each position of the attribute vector corresponds, in order, to one type of attribute information. If the questionnaire contains three types of attribute information, namely gender, hairstyle, and skin color, then the first attribute vector and the second attribute vector used to represent the attribute information both have length 3, and each position corresponds to one attribute. For the gender attribute, 0 may represent female and 1 male. For the hairstyle attribute, hair length may be divided into nine intervals from short to long, corresponding to the numbers 1-9, with 0 indicating that the attribute was not collected. For the skin color attribute, skin tone may likewise be divided into nine intervals from light to dark, corresponding to 1-9, again with 0 indicating that the attribute was not collected. In the first vector, a character with no numeric meaning, such as X, may indicate that the first attribute information and the second attribute information are the same at that position; then, when the attribute difference information represented by the first vector is later used to modify the attribute information of the first face image, the attribute information of the first face image at the positions of the meaningless characters does not need to be modified. Exemplarily, as shown in Table 3 below, the collected first attribute information is male with short hair, giving the first attribute vector (1, 2, 0). As shown in Table 4 below, the second attribute information contained in the found first face image is male, medium-length hair, fair skin, giving the second attribute vector (1, 7, 3). The two attribute vectors differ at the 2nd and 3rd positions, so the attribute difference information can be obtained from the difference between the two vectors: the first vector carries the placeholder X at the 1st position and records the required change at the 2nd and 3rd positions.
Table 3

No.   Attribute    Details
1     Gender       1
2     Hairstyle    2
3     Skin color   0
Table 4

No.   Attribute    Details
1     Gender       1
2     Hairstyle    7
3     Skin color   3
Exemplarily, as shown in FIG. 5, a first attribute vector 511 is obtained according to the acquired first attribute information 51, and a second attribute vector 521 is obtained according to the acquired second attribute information 52. A first vector 53 is then obtained from the first attribute vector 511 and the second attribute vector 521. For example, the first attribute information contained in Table 1 above and the second attribute information contained in Table 2 above are obtained; the first attribute vector and the second attribute vector are obtained from them respectively; and the first vector is obtained by comparing the two attribute vectors through bitwise subtraction. In this example, the first vector indicates that the attribute difference information is oblique bangs.
In this way, the second attribute information contained in the first face image can be compared with the collected first attribute information to determine the attribute information that needs to be added, deleted, or corrected on the first face image; a small sketch of this comparison follows. After that, S104 is executed, that is, the first face image is adjusted according to the attribute difference information to obtain a second face image that is closer to the face image to be synthesized.
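As an illustration only, the following sketch shows one way the comparison in S103 could be implemented in code. The three-position encoding follows the gender/hairstyle/skin-color example above; the function name, the use of Python, and the choice of "X" as the meaningless marker are assumptions for this sketch, not part of the claimed method.

```python
# A sketch of the S103 comparison: build the first vector from the first
# and second attribute vectors, marking unchanged positions with "X".

def attribute_difference(first_attr, second_attr, same_marker="X"):
    """Return the first vector: the collected target value where the two
    attribute vectors differ, a meaningless marker where they agree."""
    assert len(first_attr) == len(second_attr)
    return [a if a != b else same_marker
            for a, b in zip(first_attr, second_attr)]

first_attribute_vector = [1, 2, 0]   # male, short hair, skin not collected
second_attribute_vector = [1, 7, 3]  # male, medium-length hair, fair skin

first_vector = attribute_difference(first_attribute_vector,
                                    second_attribute_vector)
print(first_vector)  # ['X', 2, 0]: hairstyle and skin color need correction
```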
S104: Perform facial feature extraction on the first face image to obtain first facial feature information of the first face image.
A face image can contain more items of facial feature information than can be exhaustively listed, and this facial feature information can represent the corresponding face image. The attribute information of the first face image described above may be the part of the first face image's facial feature information that is easy for a user to observe, remember, and describe. Each person's facial feature information is different, so the similarity of facial feature information can be used to distinguish different people. For example, Zhang San's facial feature information differs from Li Si's, and Zhang San's facial feature information can be used to quickly determine who Zhang San is. Facial feature extraction, then, is performed to obtain facial feature information that can represent the face image.
Optionally, a neural network is used to extract the facial features of the first face image and convert the first face image into a second vector representing the first facial feature information; the second vector contains content representing attribute information. For example, the first face image is fed to the input layer of the neural network, whose operators extract the facial feature information contained in the image and form a high-dimensional matrix representing that information. The matrix is passed to the hidden layers, where the operators of each layer progressively reduce its dimensionality. Finally, the output layer of the neural network outputs the second vector.
Exemplarily, as shown in FIG. 5, the first face image 54 is input to the first neural network 55 to obtain the second vector 56. The first neural network 55 extracts the facial feature information of an input face image, and the second vector 56 represents the first facial feature information of the first face image 54. A sketch of such an encoder follows.
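As a hedged illustration of S104, the sketch below implements a small convolutional encoder of the kind described: an input layer that extracts feature maps, hidden layers that reduce dimensionality, and an output layer that emits the second vector. The architecture, dimensions, and PyTorch framing are assumptions for this example; the patent does not fix a specific network.

```python
# A sketch of the first neural network in S104: a convolutional encoder
# mapping a face image to the second vector (its facial feature information).
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    def __init__(self, feature_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 32 -> 16
            nn.AdaptiveAvgPool2d(1),   # hidden layers reduce dimensionality
        )
        self.fc = nn.Linear(128, feature_dim)  # output layer

    def forward(self, image):
        h = self.features(image).flatten(1)
        return self.fc(h)  # the second vector

encoder = FaceEncoder()
first_face_image = torch.randn(1, 3, 128, 128)  # placeholder input image
second_vector = encoder(first_face_image)       # shape: (1, 256)
```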
S105: Synthesize a second face image according to the first facial feature information and the attribute difference information.
Optionally, steps S103 and S104 above yield the attribute difference information between the first face image and the face image to be synthesized, as well as the facial feature information of the first face image. The facial feature information of the first face image is then corrected according to the attribute difference information, synthesizing a face image that contains the attribute difference information and is closer to what is required.
For example, as shown in FIG. 5, the first vector 53 representing the attribute difference information and the second vector 56 representing the first facial feature information of the first face image are spliced together and input to the second neural network 57 to obtain the second face image 58. The second neural network 57 corrects the first facial feature information of the first face image 54 according to the attribute difference information. For example, if the attribute difference information obtained from Tables 1 and 2 above is oblique bangs, the second neural network 57 can synthesize oblique bangs onto the first face image 54 to obtain the second face image 58 in FIG. 5, which contains the attribute difference information (the oblique bangs).
The splicing of the first vector and the second vector may consist of appending the first vector directly after the second vector; the second neural network then corrects the second vector according to the first vector.
Exemplarily, in the first vector, the positions at which the first attribute information and the second attribute information are the same can be set to a meaningless symbol such as X, so that the second neural network can correct the second vector directly according to the content of the meaningful positions in the first vector. For example, if the second vector is expressed as the facial feature vector (f1, f2, ..., fn) and, following the example above, the first vector is expressed as (X, 2, 0), then the spliced vector is expressed as (f1, f2, ..., fn, X, 2, 0). In this way, the second neural network can correct the second vector directly according to the content of the meaningful positions in the first vector. For example, when the second neural network reads, in the attribute part of the spliced vector, a position whose value is meaningful, it obtains the attribute information corresponding to that value, such as oblique bangs. It then corrects the vector representing the facial feature information according to the obtained attribute information, and raises the corrected vector back to a higher dimension to obtain a synthesized face image containing oblique bangs. In this way, when synthesizing a face image, the second neural network can correct, on the basis of the attribute difference information, the attribute information contained in the found real face image, and thereby obtain a synthesized face image that contains the specific attribute information and is closer to a real face image. A sketch of this splicing-and-decoding step follows.
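The splicing-and-correction step of S105 could look roughly like the following sketch, in which the first vector is appended to the second vector and a small decoder (standing in for the second neural network) raises the result back to image dimensions. All dimensions, layer choices, and the encoding of "X" as 0 are illustrative assumptions.

```python
# A sketch of S105: splice the first vector (attribute differences) onto
# the second vector (facial features) and decode with a generator that
# stands in for the second neural network.
import torch
import torch.nn as nn

class FaceGenerator(nn.Module):
    def __init__(self, feature_dim=256, attr_dim=3):
        super().__init__()
        self.fc = nn.Linear(feature_dim + attr_dim, 128 * 16 * 16)
        self.deconv = nn.Sequential(  # raise the corrected vector back up
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, second_vector, first_vector):
        spliced = torch.cat([second_vector, first_vector], dim=1)  # splicing
        h = self.fc(spliced).view(-1, 128, 16, 16)
        return self.deconv(h)  # the second face image

generator = FaceGenerator()
second_vector = torch.randn(1, 256)
first_vector = torch.tensor([[0.0, 2.0, 0.0]])  # "X" encoded as 0 here
second_face_image = generator(second_vector, first_vector)  # (1, 3, 128, 128)
```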
Thus, with the face image synthesis method provided in the embodiments of this application, a real face image closer to the face image to be synthesized can be obtained from the collected face image attribute information; attribute difference information is obtained from the attribute information contained in that real face image and the collected face image attribute information; and a face image is then synthesized on the basis of the real face image according to the attribute difference information, yielding a face image that is more realistic and better meets the requirements. Compared with the prior-art method of having an artist paint a face image and then adjust it, the method provided here requires no professional participation, is more efficient, and is easy to popularize. Moreover, because the synthesis is based on a real face image, a more realistic face image can be obtained.
In some embodiments, before the second neural network used to synthesize face images is applied, it needs to be trained so that the face images it synthesizes are more realistic and reliable. The embodiments of this application train the second neural network with a method based on generative adversarial networks (GAN). The adversarial network used to train the second neural network (which can also be called the generation network) may include a third neural network and a fourth neural network. The third neural network judges the first probability that an input face image is a real face image. The fourth neural network judges the second probability that an input face image contains the attribute difference information. The specific training process is as follows.
Step 1: Initialize the second neural network.
Optionally, a second neural network usable for face image synthesis is constructed, with the weight matrices of its parameters set to initial values; a training process is then executed in which the weight matrix of each parameter is learned. As in the introduction to deep neural networks above, training the second neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained second neural network.
Step 2: Obtain a first real face image, the vector corresponding to the first real face image, and the vector corresponding to the attribute difference information.
Optionally, a real face image data set and an attribute difference information data set are first obtained; the former contains one or more first real face images, the latter one or more items of attribute difference information. Then, according to the methods introduced in steps S103 and S104 above, the vector corresponding to the first real face image, representing its first facial feature information, is obtained through the first neural network; the vector corresponding to the attribute difference information can be constructed directly. Further, a training data set containing multiple groups of training data can be obtained by pairing first real face images with attribute difference information.
Step 3: Input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information.
Optionally, in the process of training the second neural network, each input vector of a first real face image can be paired with an input vector of attribute difference information, and the synthetic face image corresponding to that first real face image is then output according to the face image synthesis method of step S105 above.
Exemplarily, as shown in FIG. 6, the vector 62 corresponding to the first real face image and the vector 63 corresponding to the attribute difference information are input to the second neural network 64, which synthesizes a face image and outputs a synthetic face image 65 containing the attribute difference information.
Step 4: Input the first real face image and its corresponding synthetic face image into the third neural network and the fourth neural network.
Optionally, the first real face image and its corresponding synthetic face image are input as one group of data into the third neural network and the fourth neural network respectively. The third neural network performs real/fake discrimination, judging the probability that the synthetic face image output by the second neural network is a real face image: the higher this probability, the stronger the second neural network's ability to synthesize realistic face images. The fourth neural network judges the probability that an input face image contains certain specific attribute information, such as the attribute difference information: the higher this probability, the stronger the second neural network's ability to synthesize the specific attribute information, i.e., to correct the attribute difference information on the first face image. In addition, from the probability that the input face image is real and the probability that it contains the specific attribute information, the difference between the synthetic face image and the corresponding first real face image can be judged, ensuring that, apart from the attribute difference information, the synthetic face image differs from the corresponding first face image as little as possible.
Exemplarily, referring to FIG. 6, the first real face image 61 and the synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67.
Step 5: Iteratively train the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network; during the iterative training, adjust the weights of the parameters of the second neural network.
Optionally, the output of the third neural network is the first probability that the input face image is judged to be a real face image, and the output of the fourth neural network is the second probability that the input face image is judged to contain the attribute difference information. That is, the outputs of the third and fourth neural networks measure the difference between the synthetic face image output by the second neural network and the input real face image, expressed through a loss function (or objective function): the higher the output value (loss) of the loss function, the larger the difference, and training the second neural network becomes the process of reducing this loss as far as possible.
The objective function contains three constraints. First, the synthetic face image output by the second neural network should differ from the corresponding real face image as little as possible: the smaller the difference, the smaller the objective function value. Second, the synthetic face image output by the second neural network should deceive the third neural network as far as possible, i.e., lead the third neural network to judge the synthetic face image to be a real face image: the higher the probability of being judged real, the smaller the objective function value. Third, the synthetic face image output by the second neural network needs to contain the specific attribute information (the attribute difference information): the higher the probability that it does, the smaller the objective function value. Training the second neural network can thus be achieved by minimizing the objective function.
For example, the outputs of the third and fourth neural networks are fed as operands into the loss function, and the result of the loss function is fed back into the second neural network for back-propagation. During back-propagation, gradient updates correct the weights of each parameter of the second neural network, finally yielding a better parameter weight matrix.
Exemplarily, as shown in FIG. 6, after the first real face image 61 and its corresponding synthetic face image 65 are input to the third neural network 66 and the fourth neural network 67, the first probability output by the third neural network 66 and the second probability output by the fourth neural network 67 are fed into the objective function, and the second neural network 64 is trained by minimizing the objective function, roughly as sketched below.
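A minimal sketch of the three-part objective described above follows, assuming an L1 reconstruction term and negative log-likelihood terms for the two discriminator probabilities; the specific losses and weights are assumptions, since the patent only states the three constraints qualitatively.

```python
# A sketch of the three-constraint objective for the second neural network.
import torch
import torch.nn.functional as F

def second_network_objective(real_image, synthetic_image,
                             first_prob, second_prob,
                             w_rec=10.0, w_real=1.0, w_attr=1.0):
    # 1) differ from the corresponding real image as little as possible
    rec_loss = F.l1_loss(synthetic_image, real_image)
    # 2) be judged real by the third network: higher first_prob, lower loss
    real_loss = -torch.log(first_prob + 1e-8).mean()
    # 3) contain the attribute difference information (as judged by the
    #    fourth network): higher second_prob, lower loss
    attr_loss = -torch.log(second_prob + 1e-8).mean()
    return w_rec * rec_loss + w_real * real_loss + w_attr * attr_loss

# Illustrative training step:
# loss = second_network_objective(real, fake, p_real, p_attr)
# loss.backward()  # back-propagation, then a gradient update of the weights
```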
Step 6: If the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, stop training; the result is a second neural network usable for synthesizing face images.
Optionally, the first threshold and the second threshold can be determined from empirical values; training stops when the first probability output by the third neural network exceeds the first threshold and the second probability output by the fourth neural network exceeds the second threshold. Exemplarily, in a specific training run, the stopping point can be determined from the output of the loss function, e.g., training stops when that output flattens out; alternatively, training can stop when a preset number of training iterations is reached. Taking the loss function as an example: a higher output value (loss) indicates a larger difference, so once the loss flattens and no longer falls, training of the second neural network is complete and it can meet the goal of face image synthesis.
In some embodiments, before the third and fourth neural networks are applied to train the second neural network, they too need to be trained, adjusting their weight matrices. On the premise that each input face image is known to be a real face image or a synthetic face image, the third and fourth neural networks are trained by minimizing objective functions that contain terms representing classification error. For example, the objective function of the third neural network contains one term representing classification error: when the input face image is real, the higher the output probability, the smaller the objective function value; conversely, when the input face image is synthetic, the higher the probability output for it being real, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the third neural network is completed. As another example, the objective function of the fourth neural network contains multiple terms representing classification errors, each corresponding to one attribute in the attribute information. Given an input face image, each position of the output attribute vector represents the probability that the input image contains the corresponding attribute information: when the input image does contain a specific attribute, the higher the output probability, the smaller the objective function value; conversely, when it does not contain that attribute, the higher the output probability, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the fourth neural network is completed. A sketch of these discriminator-side losses follows.
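For the discriminator side, the classification-error objectives just described might be sketched as follows; the binary cross-entropy formulation is an assumption consistent with the text rather than the patent's exact loss.

```python
# A sketch of the discriminator-side objectives: a binary real/synthetic
# classification error for the third network, and one classification-error
# term per attribute position for the fourth network.
import torch
import torch.nn.functional as F

def third_network_loss(prob_on_real, prob_on_fake):
    # correct judgments on real and synthetic images both lower the loss
    return (-torch.log(prob_on_real + 1e-8)
            - torch.log(1.0 - prob_on_fake + 1e-8)).mean()

def fourth_network_loss(predicted_attr_probs, target_attrs):
    # target_attrs: 1.0 where the image contains the attribute, else 0.0
    return F.binary_cross_entropy(predicted_attr_probs, target_attrs)
```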
In still other embodiments, the first neural network and the second neural network form the generation network, and the third neural network and the fourth neural network form the discrimination network. Likewise using the GAN approach, given the inputs and outputs of the first, second, third, and fourth neural networks described above, the first, second, third, and fourth neural networks are trained through objective functions, and the weights of their parameters are adjusted during the adversarial process between the generation network and the discrimination network.
Thus, by training the second neural network, the face image synthesis method provided in the embodiments of this application ensures that, apart from the attribute difference information that needs to be corrected, the second face image synthesized by the second neural network differs as little as possible from the corresponding first face image; that the second face image is as realistic as possible; and that the second face image contains the specific attribute information, such as the attribute difference information. Compared with prior-art face image synthesis methods that require a professional artist or cannot synthesize face images containing specific attribute information, the method provided here only needs to perform a small amount of attribute correction (also called attribute transfer) on the basis of a real face image to synthesize a face image containing specific attribute information; the method is more convenient, and the synthesized face image is more realistic.
With the face image synthesis method provided in the embodiments of this application, the user can also modify the synthesized face image: through interaction with the user, the face image synthesis device adjusts the attribute information contained in the face image, for example detailed features of the face that are hard to describe completely, or attribute information that was not captured when the user's attribute information was first collected; these can be adjusted by the following method. If the attribute information to be modified changes substantially, for example the user entered the wrong gender on the form, steps S101-S105 above need to be re-executed. If the change is small, then after step S105 the second face image can be modified directly through the following steps S106-S108 to obtain a face image that better meets the requirements. As shown in FIG. 7, after step S105 above, the face image synthesis method of the embodiments of this application may further include S106-S108:
S106: Acquire attribute adjustment information fed back by the user.
After the user obtains the second face image synthesized in step S105, the user can intuitively recall which attribute information of the second face image still needs adjustment, and can propose the attribute information to be adjusted on the basis of the second face image. The face image synthesis device can still collect this attribute adjustment information through a form, or the user can change the attribute information by drawing simply and directly on the second face image. For example, if the user judges that a scar is missing from the second face image, the user can draw the rough shape of the scar directly on the face in the image. The face image synthesis device then extracts information from the changed content to obtain the attribute adjustment information. In this way, the user can adjust the attribute information of the synthesized face image in a fairly simple way. The extraction of information from the changed parts of the face image fed back by the user can be completed by a neural network: for example, the face image containing the attribute adjustment information is input to a neural network for reading, and a vector representing the attribute adjustment information is obtained.
Exemplarily, as shown in FIG. 8, a third vector 82 is obtained according to the attribute adjustment information 81 fed back by the user. The third vector 82 represents the attribute information of the second face image 83 that needs to be adjusted; in FIG. 8 it represents the scar information the user wants added to the second face image. One possible extraction is sketched below.
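Purely as an illustration of S106, the sketch below derives the user's edit from the difference between the annotated image and the original second face image and encodes it into a third vector; the thresholding rule and the reuse of an image encoder are assumptions for this example.

```python
# A sketch of S106: locate the user's direct edit (e.g. a drawn scar) by
# differencing the annotated image against the second face image, then
# encode the edited region into the third vector with any image encoder
# (for instance the FaceEncoder sketched earlier).
import torch

def extract_adjustment(second_face_image, annotated_image, encoder,
                       threshold=0.05):
    # pixels the user changed on the second face image
    edit_mask = ((annotated_image - second_face_image).abs()
                 .mean(dim=1, keepdim=True) > threshold).float()
    edited_region = annotated_image * edit_mask  # keep only the edit
    return encoder(edited_region)  # the third vector
```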
S107: Perform facial feature extraction on the second face image to obtain second facial feature information of the second face image.
For the facial feature extraction of the second face image, refer to the description of the facial feature extraction of the first face image in step S104, which is not repeated here. The fifth neural network performs the same function as the first neural network: it extracts facial feature information from the facial features of a face to obtain the second facial feature information. The fifth neural network and the first neural network may be the same or different; the embodiments of this application do not specifically limit this.
Exemplarily, referring to FIG. 8, the second face image 83 is input to the fifth neural network 84 to obtain a fourth vector 85. The fifth neural network 84 extracts the facial feature information of the input face image, and the fourth vector 85 represents the second facial feature information of the second face image 83.
S108: Synthesize a third face image according to the second facial feature information and the attribute adjustment information.
Exemplarily, as shown in FIG. 8, the third vector 82 representing the attribute adjustment information is obtained through step S106 above, and the fourth vector 85 representing the facial feature information of the second face image is obtained through step S107 above. The third vector 82 and the fourth vector 85 are spliced and input to the sixth neural network 86 to obtain the third face image 87. The sixth neural network 86 corrects the second facial feature information of the second face image according to the attribute adjustment information.
For the specific vector splicing and attribute information correction process, refer to the related content in step S105 above, which is not repeated here.
Thus, the face image synthesis method provided in the embodiments of this application enables interaction with the user, during which attribute information can be modified by a much simpler method. Compared with prior-art face image synthesis methods that require a professional artist or cannot interact with the user, the method provided here can correct the attribute information of the face image through a simple interaction with the user, so that the final face image is closer to the desired face image.
Likewise, in some embodiments, before the sixth neural network used to synthesize face images is applied, it needs to be trained so that the face images it synthesizes are more realistic. The embodiments of this application train the sixth neural network in a GAN-based manner. The adversarial network used to train the generation network (the sixth neural network) may include a seventh neural network and an eighth neural network. The seventh neural network judges the third probability that an input face image is a real face image. The eighth neural network judges the fourth probability that the segmentation maps of a second real face image and of the synthetic face image corresponding to it are consistent. The specific training process is as follows.
Step 1: Initialize the sixth neural network.
Optionally, a sixth neural network usable for face image synthesis is constructed, with the weight matrices of its parameters set to initial values; a training process is then executed in which the weight matrix of each parameter is learned. As in the introduction to deep neural networks above, training the sixth neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained sixth neural network.
Step 2: Obtain a second real face image, the vector corresponding to the second real face image, and the vector corresponding to the attribute adjustment information.
Optionally, a real face image data set and an attribute adjustment information data set are first obtained; the former contains one or more second real face images, the latter one or more items of attribute adjustment information. Then, by the methods introduced in steps S106 and S107 above, the vector corresponding to the second real face image, representing its second facial feature information, is obtained through the fifth neural network; the vector corresponding to the attribute adjustment information can be constructed directly. Further, a training data set containing multiple groups of training data can be obtained by pairing second real face images with attribute adjustment information.
Step 3: Input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and output the synthetic face image, containing the attribute adjustment information, that corresponds to the second real face image.
Optionally, in the process of training the sixth neural network, each input vector of a second real face image can be paired with an input vector of attribute adjustment information, and the synthetic face image corresponding to that second real face image is then output according to the face image synthesis method of steps S107 and S108 above.
Exemplarily, as shown in FIG. 9, the vector 92 corresponding to the second real face image and the vector 93 corresponding to the attribute adjustment information are input to the sixth neural network 94, which synthesizes a face image and outputs a synthetic face image 95 containing the attribute adjustment information.
Step 4: Input the second real face image and its corresponding synthetic face image into the seventh neural network and the eighth neural network.
Optionally, the second real face image and its corresponding synthetic face image are input as one group of data into the seventh neural network and the eighth neural network respectively. The seventh neural network performs real/fake discrimination, judging the probability that the synthetic face image output by the sixth neural network is a real face image: the higher this probability, the stronger the sixth neural network's ability to synthesize realistic face images. The eighth neural network judges the probability that the segmentation maps of the second real face image and of its corresponding synthetic face image are consistent: the higher this probability, the more consistent the two segmentation maps, and the stronger the sixth neural network's ability to synthesize face images, i.e., to correct the more detailed attribute information on the second face image according to the attribute adjustment information. In addition, from the probability that the input face image is real and the probability that the segmentation maps are consistent, the difference between the synthetic face image and the corresponding second real face image can be judged, ensuring that, apart from the attribute adjustment information, the synthetic face image differs from the corresponding second face image as little as possible.
Edge-based segmentation is one image segmentation method, and a segmented face image can be represented by a segmentation map. The segmentation map separates the face image from the background image along the outer contour of the face and keeps only the face region, so that interference from background information in judging the face image can be ignored. Using the consistency between the segmentation maps of a real face image and the corresponding synthetic face image, the face image synthesis ability of the sixth neural network can be assessed: the synthesized face image will not exhibit large abnormal changes, and the sixth neural network will only correct the attribute information whose details need adjustment. A simplified sketch of such a segmentation-map comparison follows.
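A toy stand-in for the segmentation-map comparison might look like the following; a real system would use a proper face-parsing or edge-detection model, so the threshold rule here is only a placeholder assumption.

```python
# A toy segmentation-map comparison: 1 inside the face contour, 0 in the
# background, then the fraction of pixels on which two maps agree.
import torch

def segmentation_map(image, eps=1e-3):
    # placeholder rule: treat near-zero pixels as background
    return (image.abs().mean(dim=1, keepdim=True) > eps).float()

def segmentation_consistency(real_image, synthetic_image):
    m_real = segmentation_map(real_image)
    m_fake = segmentation_map(synthetic_image)
    return (m_real == m_fake).float().mean()  # 1.0 means identical maps
```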
Exemplarily, referring to FIG. 9, the second real face image 91 and the synthetic face image 95 are input to the seventh neural network 96 and the eighth neural network 97.
It should be noted that the seventh neural network has the same function as the third neural network applied in the process of training the second neural network: both judge the probability that an input face image is a real face image. The seventh neural network and the third neural network may be the same or different.
Step 5: Iteratively train the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; during the iterative training, adjust the weights of the parameters of the sixth neural network.
Optionally, the output of the seventh neural network is the third probability that the input face image is judged to be a real face image, and the output of the eighth neural network is the fourth probability that the segmentation maps of the second real face image and of its corresponding synthetic face image are judged consistent. That is, the outputs of the seventh and eighth neural networks measure the difference between the synthetic face image output by the sixth neural network and the input real face image, expressed through a loss function (or objective function): the higher the output value (loss) of the loss function, the larger the difference, and training the sixth neural network becomes the process of reducing this loss as far as possible.
The objective function contains three constraints. First, the synthetic face image output by the sixth neural network should differ from the corresponding real face image as little as possible: the smaller the difference, the smaller the objective function value. Second, the synthetic face image output by the sixth neural network should deceive the seventh neural network as far as possible, i.e., lead the seventh neural network to judge the synthetic face image to be a real face image: the higher the probability of being judged real, the smaller the objective function value. Third, the segmentation map of the synthetic face image output by the sixth neural network should be as similar as possible to that of the corresponding real face image: the more similar the segmentation maps, the smaller the objective function value. Training the sixth neural network can thus be achieved by minimizing the objective function.
For example, the outputs of the seventh and eighth neural networks are fed as operands into the loss function, and the result of the loss function is fed back into the sixth neural network for back-propagation. During back-propagation, gradient updates correct the weights of each parameter of the sixth neural network, finally yielding a better parameter weight matrix.
The greater the probability output by the seventh neural network that the synthetic face image is real, the smaller the loss function value. The smaller the difference between the synthetic face image and the second real face image, apart from the attribute information to be modified, the smaller the loss function value. And the smaller the difference the eighth neural network finds between the segmentation maps of the synthetic face image and the second real face image, the smaller the loss function value. Training the sixth neural network can thus be achieved by minimizing the objective function, for instance as sketched below.
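Mirroring the earlier sketch for the second neural network, the sixth network's objective could be assembled as follows, with the segmentation-consistency probability from the eighth network replacing the attribute term; the losses and weights are again assumptions.

```python
# A sketch of the three-constraint objective for the sixth neural network.
import torch
import torch.nn.functional as F

def sixth_network_objective(real_image, synthetic_image,
                            third_prob, fourth_prob,
                            w_rec=10.0, w_real=1.0, w_seg=1.0):
    rec_loss = F.l1_loss(synthetic_image, real_image)   # stay close
    real_loss = -torch.log(third_prob + 1e-8).mean()    # fool the 7th network
    seg_loss = -torch.log(fourth_prob + 1e-8).mean()    # consistent maps
    return w_rec * rec_loss + w_real * real_loss + w_seg * seg_loss
```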
Exemplarily, as shown in FIG. 9, after the second real face image 91 and its corresponding synthetic face image 95 are input to the seventh neural network 96 and the eighth neural network 97, the third probability output by the seventh neural network 96 and the fourth probability output by the eighth neural network 97 are fed into the objective function, and the sixth neural network 94 is trained by minimizing the objective function.
Step 6: If the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, stop training; the result is a sixth neural network usable for synthesizing face images.
The third threshold and the fourth threshold can be determined from empirical values; training stops when the third probability output by the seventh neural network exceeds the third threshold and the fourth probability output by the eighth neural network exceeds the fourth threshold. Exemplarily, in a specific training run, the stopping point can be determined from the output of the loss function, e.g., training stops when that output flattens out; alternatively, training can stop when a preset number of training iterations is reached. Taking the loss function as an example: a higher output value (loss) indicates a larger difference, so once the loss flattens and no longer falls, training of the sixth neural network is complete and it can meet the goal of face image synthesis.
In addition, before the seventh and eighth neural networks are applied to train the sixth neural network, they too need to be trained, adjusting their weight matrices. On the premise that each input face image is known to be a real face image or a synthetic face image, the seventh and eighth neural networks are trained by minimizing objective functions containing terms representing classification error. For example, the objective function of the seventh neural network contains one term representing classification error: when the input face image is real, the higher the output probability, the smaller the objective function value; conversely, when the input face image is synthetic, the higher the probability output for it being real, the larger the objective function value. By continuously inputting real face images and synthetic face images and minimizing the objective function value, the training of the seventh neural network is completed. The training process of the eighth neural network is similar to that of the seventh neural network: its objective function contains terms representing classification error, and the smaller the difference the eighth neural network finds between the segmentation maps of the real face image and the synthetic face image, the smaller the objective function value; conversely, the larger the difference between the segmentation maps, the larger the objective function value. In this way, by continuously inputting real face images and synthetic face images and minimizing the objective function values, the training of the seventh and eighth neural networks is completed.
Thus, by training the sixth neural network, the face image synthesis method provided in the embodiments of this application ensures that, apart from the attribute adjustment information that needs to be corrected, the third face image synthesized by the sixth neural network differs as little as possible from the corresponding second face image; that the third face image is as realistic as possible; and that the segmentation map of the third face image is as similar as possible to that of the corresponding second face image. Compared with prior-art face image synthesis methods that require a professional artist or cannot interact with the user, the method provided here can, through interaction with the user, adjust face image attribute information that is hard to describe via simple input recognition, and thereby synthesize the face image the user needs more comprehensively.
FIG. 10 shows a schematic diagram of a possible structure of the face image synthesis apparatus 1000 involved in the foregoing embodiments. The face image synthesis apparatus 1000 includes an acquisition unit 1001 and a processing unit 1002.
The acquisition unit 1001 is configured to support the face image synthesis apparatus 1000 in performing step S101 in FIG. 4, steps S101 and S106 in FIG. 7, and/or other processes of the techniques described herein.
The processing unit 1002 is configured to support the face image synthesis apparatus 1000 in performing steps S102-S105 in FIG. 4, steps S102-S105 and steps S107-S108 in FIG. 7, and/or other processes of the techniques described herein.
All relevant content of the steps involved in the foregoing method embodiments can be cited in the functional descriptions of the corresponding functional units, and is not repeated here.
图11所示为本申请实施例提供的装置的硬件结构示意图。该装置包括至少一个处理器1101,通信线路1102,存储器1103以及至少一个通信接口1104。其中,存储器1103还可以包括于处理器1101中。FIG. 11 is a schematic diagram of the hardware structure of the device provided by an embodiment of the application. The device includes at least one processor 1101, a communication line 1102, a memory 1103, and at least one communication interface 1104. The memory 1103 may also be included in the processor 1101.
The processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of this application.
The communication line 1102 may include a path for transferring information between the above components.
The communication interface 1104 is used to communicate with other devices. In the embodiments of this application, the communication interface may be a module, a circuit, a bus, an interface, a transceiver, or another device capable of implementing a communication function. Optionally, when the communication interface is a transceiver, the transceiver may be an independently arranged transmitter used to send information to other devices, or an independently arranged receiver used to receive information from other devices. The transceiver may also be a component that integrates the sending and receiving functions; the embodiments of this application do not limit the specific implementation of the transceiver.
The memory 1103 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions; a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions; an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM), or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor 1101 through the communication line 1102, or the memory 1103 may be integrated with the processor 1101.
The memory 1103 is used to store the computer-executable instructions for implementing the solution of this application, and execution is controlled by the processor 1101. The processor 1101 is configured to execute the computer-executable instructions stored in the memory 1103, thereby implementing the face image synthesis method provided by the embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code, instructions, a computer program, or by other names, which is not specifically limited in the embodiments of this application.
In a specific implementation, as an embodiment, the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 11.
In a specific implementation, as an embodiment, the apparatus may include multiple processors, such as the processor 1101 and the processor 1105 in FIG. 11. Each of these processors may be a single-core processor or a multi-core processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
It should be noted that the above apparatus may be a general-purpose device or a special-purpose device; the embodiments of this application do not limit the type of the apparatus. The structure illustrated in the embodiments of this application does not constitute a specific limitation on the apparatus. In other embodiments of this application, the apparatus may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
An embodiment of this application further provides a computer-readable storage medium storing computer instructions that, when run on a server, cause the server to perform the above related method steps to implement the face image synthesis method of the foregoing embodiments.
An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the above related steps to implement the face image synthesis method of the foregoing embodiments.
In addition, an embodiment of this application further provides an apparatus, which may specifically be a component or a module. The apparatus may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions; when the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the apparatus performs the face image synthesis method in each of the foregoing method embodiments.
The apparatus, computer-readable storage medium, computer program product, and chip provided by the embodiments of this application are all used to perform the corresponding method provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
From the description of the above implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into modules or units is merely a division by logical function; in actual implementation there may be other ways of division. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, modules, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program instructions, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (23)

  1. A face image synthesis method, characterized in that the method comprises:
    acquiring first attribute information, the first attribute information being attribute information contained in a face image to be synthesized;
    searching a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, wherein the repetition rate between the second attribute information and the first attribute information meets a threshold requirement;
    obtaining attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to represent the attribute difference between the first face image and the face image to be synthesized;
    performing facial feature extraction on the first face image to obtain first facial feature information of the first face image; and
    synthesizing a second face image according to the first facial feature information and the attribute difference information.
  2. The method according to claim 1, characterized in that the obtaining attribute difference information according to the first attribute information and the second attribute information comprises:
    obtaining a first attribute vector according to the first attribute information;
    obtaining a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and
    obtaining a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
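As an aside for readability: one simple realization of claim 2 (illustrative only; the binary encoding, attribute names, and the fixed catalogue below are assumptions, since the claim fixes only equal-length vectors with one attribute per position) is an element-wise difference of two indicator vectors:

```python
import torch

# Hypothetical fixed attribute catalogue; the claim only requires that both
# vectors share a length and that each position maps to one attribute.
ATTRIBUTES = ["male", "young", "glasses", "beard", "smiling"]

def attribute_vector(present: set[str]) -> torch.Tensor:
    """Encode a set of attributes as a 0/1 vector over the catalogue."""
    return torch.tensor([1.0 if a in present else 0.0 for a in ATTRIBUTES])

first_attr = attribute_vector({"male", "glasses"})   # face to be synthesized
second_attr = attribute_vector({"male", "beard"})    # retrieved first face image

# First vector: per-position attribute difference (claim 2).
# +1 means the attribute must be added, -1 means it must be removed.
first_vector = first_attr - second_attr
print(first_vector)  # tensor([ 0.,  0.,  1., -1.,  0.])
```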
  3. The method according to claim 1, characterized in that the performing facial feature extraction on the first face image to obtain first facial feature information of the first face image comprises:
    inputting the first face image into a first neural network to obtain a second vector, wherein the first neural network is used to extract facial feature information of an input face image, and the second vector is used to represent the first facial feature information of the first face image.
  4. The method according to claim 2 or 3, characterized in that the synthesizing a second face image according to the first facial feature information and the attribute difference information comprises:
    acquiring a first vector and a second vector, wherein the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and
    concatenating the first vector and the second vector and inputting the result into a second neural network to obtain the second face image, wherein the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
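Read together, claims 3 and 4 describe an encoder-generator pair: a first network maps the first face image to a feature vector, which is concatenated with the difference vector and decoded into the second face image. A minimal sketch under assumed shapes and module names (the claims fix the data flow, not the layers):

```python
import torch
import torch.nn as nn

FEAT_DIM, ATTR_DIM = 128, 5  # assumed dimensions

# First neural network (claim 3): extracts facial feature information.
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, FEAT_DIM),  # yields the second vector
)

# Second neural network (claim 4): corrects the features according to
# the attribute difference information and decodes an image.
generator = nn.Sequential(
    nn.Linear(FEAT_DIM + ATTR_DIM, 3 * 64 * 64),
    nn.Tanh(),
    nn.Unflatten(1, (3, 64, 64)),
)

first_face = torch.randn(1, 3, 64, 64)   # retrieved library image
first_vector = torch.randn(1, ATTR_DIM)  # attribute difference information

second_vector = encoder(first_face)
second_face = generator(torch.cat([second_vector, first_vector], dim=1))
print(second_face.shape)  # torch.Size([1, 3, 64, 64])
```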
  5. The method according to claim 4, characterized in that, before the performing facial feature extraction on the first face image to obtain first facial feature information of the first face image, the method further comprises:
    initializing the second neural network;
    acquiring a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, wherein the vector corresponding to the first real face image is used to represent facial feature information of the first real face image;
    inputting the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and outputting a synthetic face image containing the attribute difference information;
    inputting the first real face image and the synthetic face image corresponding to the first real face image into a third neural network and a fourth neural network, wherein the third neural network is used to determine a first probability that an input face image is a real face image, and the fourth neural network is used to determine a second probability that an input face image contains the attribute difference information;
    performing iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network;
    adjusting the weights of the parameters of the second neural network during the iterative training; and
    stopping the training of the second neural network if the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, to obtain a second neural network usable for synthesizing face images.
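Claim 5 amounts to an adversarial training loop with an early-stopping test on two discriminator outputs. A hedged sketch follows, in which the generator, both discriminators, the data loader, and the 0.9 thresholds are placeholders that the claim does not prescribe:

```python
import torch

FIRST_THRESHOLD, SECOND_THRESHOLD = 0.9, 0.9  # assumed stopping thresholds

def train_second_network(generator, third_nn, fourth_nn, optimizer, loader,
                         max_iters=10_000):
    """Iteratively adjust the generator's weights until the third network
    rates its outputs as real and the fourth network finds the requested
    attribute difference in them (claim 5's stopping condition)."""
    for step, (real_vec, diff_vec) in zip(range(max_iters), loader):
        fake = generator(torch.cat([real_vec, diff_vec], dim=1))
        p_real = third_nn(fake).mean()   # first probability
        p_attr = fourth_nn(fake).mean()  # second probability
        # Maximizing both probabilities == minimizing their negative logs.
        loss = -(torch.log(p_real + 1e-8) + torch.log(p_attr + 1e-8))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if p_real > FIRST_THRESHOLD and p_attr > SECOND_THRESHOLD:
            break  # generator is now usable for synthesizing face images
    return generator
```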
  6. The method according to any one of claims 1-5, characterized in that the method further comprises:
    acquiring attribute adjustment information fed back by a user;
    performing facial feature extraction on the second face image to obtain second facial feature information of the second face image; and
    synthesizing a third face image according to the second facial feature information and the attribute adjustment information.
  7. The method according to claim 6, characterized in that the synthesizing a third face image according to the second facial feature information and the attribute adjustment information comprises:
    obtaining a third vector according to the attribute adjustment information fed back by the user, wherein the third vector is used to represent the attribute information that needs to be adjusted in the second face image;
    inputting the second face image into a fifth neural network to obtain a fourth vector, wherein the fifth neural network is used to extract facial feature information of an input face image, and the fourth vector is used to represent the second facial feature information of the second face image; and
    concatenating the third vector and the fourth vector and inputting the result into a sixth neural network to obtain the third face image, wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  8. The method according to claim 7, characterized in that, before the synthesizing a third face image according to the second facial feature information and the attribute adjustment information, the method further comprises:
    initializing the sixth neural network;
    acquiring a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, wherein the vector corresponding to the second real face image is used to represent facial feature information of the second real face image;
    inputting the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and outputting a synthetic face image containing the attribute adjustment information;
    inputting the second real face image and the synthetic face image corresponding to the second real face image into a seventh neural network and an eighth neural network, wherein the seventh neural network is used to determine a third probability that an input face image is a real face image, and the eighth neural network is used to determine a fourth probability that the segmentation map of the second real face image and that of the synthetic face image corresponding to the second real face image are consistent;
    performing iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network;
    adjusting the weights of the parameters of the sixth neural network during the iterative training; and
    stopping the training if the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, to obtain a sixth neural network usable for synthesizing face images.
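The eighth neural network of claim 8 differs from an ordinary real/fake discriminator in that it scores the consistency of two segmentation maps rather than raw pixels. A sketch under the same assumptions as above (the number of face-parsing classes and the map size are placeholders):

```python
import torch
import torch.nn as nn

NUM_CLASSES, H, W = 19, 64, 64  # assumed face-parsing classes and map size

# Hypothetical stand-in for the eighth neural network: scores how likely
# the segmentation maps of a real face and a synthetic face are consistent.
class SegmentationConsistencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * NUM_CLASSES * H * W, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid(),  # fourth probability
        )

    def forward(self, seg_real, seg_fake):
        # Stack the two maps along the channel axis and score the pair.
        return self.net(torch.cat([seg_real, seg_fake], dim=1))

seg_real = torch.softmax(torch.randn(1, NUM_CLASSES, H, W), dim=1)
seg_fake = torch.softmax(torch.randn(1, NUM_CLASSES, H, W), dim=1)
p_consistent = SegmentationConsistencyNet()(seg_real, seg_fake)
```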
  9. The method according to any one of claims 1-8, characterized in that the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, worn accessory information, hairstyle information, and makeup information.
  10. The method according to any one of claims 1-9, characterized in that, before the acquiring first attribute information, the method further comprises:
    establishing a real face image library, the real face image library containing real face images and the attribute information contained in the real face images.
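Claims 1 and 10 together presuppose a library that can be queried by attribute overlap. One plausible reading of the "repetition rate" test (the data layout and the 0.8 threshold below are illustrative assumptions; the claims require only that some threshold be met) is the fraction of requested attributes already present in a stored image:

```python
# Hypothetical in-memory library for claims 1 and 10; a production system
# would likely index the attribute sets rather than scan them linearly.
LIBRARY = [
    {"path": "faces/0001.png", "attrs": {"male", "young", "glasses"}},
    {"path": "faces/0002.png", "attrs": {"female", "young", "smiling"}},
]

REPETITION_THRESHOLD = 0.8  # assumed value

def find_first_face(first_attrs: set[str]):
    """Return the stored image whose attributes repeat the requested
    ones at the highest rate, if that rate meets the threshold."""
    best = max(
        LIBRARY,
        key=lambda e: len(e["attrs"] & first_attrs) / len(first_attrs),
    )
    rate = len(best["attrs"] & first_attrs) / len(first_attrs)
    return best if rate >= REPETITION_THRESHOLD else None

print(find_first_face({"male", "young", "glasses"}))  # entry 0001 matches fully
```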
  11. A face image synthesis apparatus, characterized in that the apparatus comprises an acquisition unit and a processing unit, wherein:
    the acquisition unit is configured to acquire first attribute information, the first attribute information being attribute information contained in a face image to be synthesized;
    the processing unit is configured to search a real face image library for a first face image according to the first attribute information, the first face image containing second attribute information, wherein the repetition rate between the second attribute information and the first attribute information meets a threshold requirement;
    the processing unit is further configured to obtain attribute difference information according to the first attribute information and the second attribute information, the attribute difference information being used to represent the attribute difference between the first face image and the face image to be synthesized;
    the processing unit is further configured to perform facial feature extraction on the first face image to obtain first facial feature information of the first face image; and
    the processing unit is further configured to synthesize a second face image according to the first facial feature information and the attribute difference information.
  12. The apparatus according to claim 11, characterized in that
    the processing unit is specifically configured to: obtain a first attribute vector according to the first attribute information; obtain a second attribute vector according to the second attribute information, the first attribute vector and the second attribute vector having the same length, with each bit corresponding to one type of attribute information; and obtain a first vector according to the difference between the first attribute vector and the second attribute vector, the first vector being used to represent the attribute difference information.
  13. The apparatus according to claim 11, characterized in that
    the processing unit is specifically configured to input the first face image into a first neural network to obtain a second vector, wherein the first neural network is used to extract facial feature information of an input face image, and the second vector is used to represent the first facial feature information of the first face image.
  14. The apparatus according to claim 12 or 13, characterized in that
    the processing unit is specifically configured to: acquire a first vector and a second vector, wherein the first vector is used to represent the attribute difference information, and the second vector is used to represent the first facial feature information of the first face image; and concatenate the first vector and the second vector and input the result into a second neural network to obtain the second face image, wherein the second neural network is used to correct the first facial feature information of the first face image according to the attribute difference information.
  15. The apparatus according to claim 14, characterized in that
    the processing unit is further configured to: initialize the second neural network; acquire a first real face image, a vector corresponding to the first real face image, and a vector corresponding to attribute difference information, wherein the vector corresponding to the first real face image is used to represent facial feature information of the first real face image; input the vector corresponding to the first real face image and the vector corresponding to the attribute difference information into the second neural network, and output a synthetic face image containing the attribute difference information; input the first real face image and the synthetic face image corresponding to the first real face image into a third neural network and a fourth neural network, wherein the third neural network is used to determine a first probability that an input face image is a real face image, and the fourth neural network is used to determine a second probability that an input face image contains the attribute difference information; perform iterative training on the second neural network according to the first probability output by the third neural network and the second probability output by the fourth neural network; adjust the weights of the parameters of the second neural network during the iterative training; and stop training the second neural network if the first probability output by the third neural network is greater than a first threshold and the second probability output by the fourth neural network is greater than a second threshold, to obtain a second neural network usable for synthesizing face images.
  16. The apparatus according to any one of claims 11-15, characterized in that
    the acquisition unit is further configured to acquire attribute adjustment information fed back by a user;
    the processing unit is further configured to perform facial feature extraction on the second face image to obtain second facial feature information of the second face image; and
    the processing unit is further configured to synthesize a third face image according to the second facial feature information and the attribute adjustment information.
  17. The apparatus according to claim 16, characterized in that
    the processing unit is specifically configured to: obtain a third vector according to the attribute adjustment information fed back by the user, wherein the third vector is used to represent the attribute information that needs to be adjusted in the second face image; input the second face image into a fifth neural network to obtain a fourth vector, wherein the fifth neural network is used to extract facial feature information of an input face image, and the fourth vector is used to represent the second facial feature information of the second face image; and concatenate the third vector and the fourth vector and input the result into a sixth neural network to obtain the third face image, wherein the sixth neural network is used to correct the second facial feature information of the second face image according to the attribute adjustment information.
  18. The apparatus according to claim 17, characterized in that
    the processing unit is further configured to: initialize the sixth neural network; acquire a second real face image, a vector corresponding to the second real face image, and a vector corresponding to attribute adjustment information, wherein the vector corresponding to the second real face image is used to represent facial feature information of the second real face image; input the vector corresponding to the second real face image and the vector corresponding to the attribute adjustment information into the sixth neural network, and output a synthetic face image containing the attribute adjustment information; input the second real face image and the synthetic face image corresponding to the second real face image into a seventh neural network and an eighth neural network, wherein the seventh neural network is used to determine a third probability that an input face image is a real face image, and the eighth neural network is used to determine a fourth probability that the segmentation map of the second real face image and that of the synthetic face image corresponding to the second real face image are consistent; perform iterative training on the sixth neural network according to the third probability output by the seventh neural network and the fourth probability output by the eighth neural network; adjust the weights of the parameters of the sixth neural network during the iterative training; and stop training if the third probability output by the seventh neural network is greater than a third threshold and the fourth probability output by the eighth neural network is greater than a fourth threshold, to obtain a sixth neural network usable for synthesizing face images.
  19. The apparatus according to any one of claims 11-18, characterized in that the attribute information includes any one or more of the following: age information, gender information, race information, skin color information, face shape information, facial feature information, skin condition information, worn accessory information, hairstyle information, and makeup information.
  20. The apparatus according to any one of claims 11-19, characterized in that
    the processing unit is further configured to establish a real face image library, the real face image library containing real face images and the attribute information contained in the real face images.
  21. A face image synthesis apparatus, characterized by comprising:
    one or more processors;
    a memory; and
    one or more instructions, wherein the one or more instructions are stored in the memory, and when the instructions are executed by the one or more processors, the face image synthesis apparatus is caused to perform the face image synthesis method according to any one of claims 1-10.
  22. A computer-readable storage medium, characterized by comprising computer instructions that, when run on a computer, cause a processor to perform the face image synthesis method according to any one of claims 1-10.
  23. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the face image synthesis method according to any one of claims 1-10.
PCT/CN2020/140440 2020-02-29 2020-12-28 Method and apparatus for compositing face image WO2021169556A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010132570.1 2020-02-29
CN202010132570.1A CN113327191B (en) 2020-02-29 2020-02-29 Face image synthesis method and device

Publications (1)

Publication Number Publication Date
WO2021169556A1 true WO2021169556A1 (en) 2021-09-02

Family

ID=77413045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140440 WO2021169556A1 (en) 2020-02-29 2020-12-28 Method and apparatus for compositing face image

Country Status (2)

Country Link
CN (1) CN113327191B (en)
WO (1) WO2021169556A1 (en)


Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113850890A (en) * 2021-09-29 2021-12-28 北京字跳网络技术有限公司 Method, device, equipment and storage medium for generating animal image
CN117078974B (en) * 2023-09-22 2024-01-05 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium


Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20170116498A1 (en) * 2013-12-04 2017-04-27 J Tech Solutions, Inc. Computer device and method executed by the computer device
CN110097606B (en) * 2018-01-29 2023-07-07 微软技术许可有限责任公司 Face synthesis
CN108537790B (en) * 2018-04-13 2021-09-03 西安电子科技大学 Different-source image change detection method based on coupling translation network
CN110532871B (en) * 2019-07-24 2022-05-10 华为技术有限公司 Image processing method and device

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US20070229498A1 (en) * 2006-03-29 2007-10-04 Wojciech Matusik Statistical modeling for synthesis of detailed facial geometry
CN104715447A (en) * 2015-03-02 2015-06-17 百度在线网络技术(北京)有限公司 Image synthesis method and device
CN107967463A (en) * 2017-12-12 2018-04-27 武汉科技大学 A kind of conjecture face recognition methods based on composograph and deep learning
CN108319932A (en) * 2018-03-12 2018-07-24 中山大学 A kind of method and device for the more image faces alignment fighting network based on production
CN109376582A (en) * 2018-09-04 2019-02-22 电子科技大学 A kind of interactive human face cartoon method based on generation confrontation network
CN110322394A (en) * 2019-06-18 2019-10-11 中国科学院自动化研究所 Face age ageing image confrontation generation method and device based on attribute guidance

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114283051A (en) * 2021-12-09 2022-04-05 湖南大学 Face image processing method and device, computer equipment and storage medium
CN114373033A (en) * 2022-01-10 2022-04-19 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, image processing device, storage medium, and computer program
CN115065863A (en) * 2022-06-14 2022-09-16 北京达佳互联信息技术有限公司 Video generation method and device, electronic equipment and storage medium
CN115065863B (en) * 2022-06-14 2024-04-12 北京达佳互联信息技术有限公司 Video generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113327191B (en) 2024-06-21
CN113327191A (en) 2021-08-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20921333; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20921333; Country of ref document: EP; Kind code of ref document: A1)