WO2022052530A1 - Training method and apparatus for a face correction model, electronic device, and storage medium - Google Patents

Training method and apparatus for a face correction model, electronic device, and storage medium

Info

Publication number
WO2022052530A1
WO2022052530A1 · PCT/CN2021/098646 (application CN2021098646W)
Authority
WO
WIPO (PCT)
Prior art keywords
face
model
corrected
face image
loss function
Prior art date
Application number
PCT/CN2021/098646
Other languages
English (en)
French (fr)
Inventor
朱振文
吴泽衡
周古月
徐倩
杨强
Original Assignee
深圳前海微众银行股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022052530A1

Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T 5/80 — Geometric correction (G06T — Image data processing or generation, in general; G06T 5/00 — Image enhancement or restoration)
    • G06T 5/70 — Denoising; Smoothing (G06T 5/00 — Image enhancement or restoration)
    • G06N 3/045 — Combinations of networks (G06N — Computing arrangements based on specific computational models; G06N 3/00 — Biological models; G06N 3/02 — Neural networks; G06N 3/04 — Architecture, e.g. interconnection topology)
    • G06N 3/084 — Backpropagation, e.g. using gradient descent (G06N 3/08 — Learning methods)
    • G06V 40/168 — Feature extraction; Face representation (G06V — Image or video recognition or understanding; G06V 40/00 — Recognition of biometric, human-related or animal-related patterns; G06V 40/16 — Human faces, e.g. facial parts, sketches or expressions)
    • G06V 40/172 — Classification, e.g. identification (G06V 40/16 — Human faces)
    • G06T 2207/10004 — Still image; Photographic image (G06T 2207/00 — Indexing scheme for image analysis or image enhancement; G06T 2207/10 — Image acquisition modality)
    • G06T 2207/30201 — Face (G06T 2207/30 — Subject of image; G06T 2207/30196 — Human being; Person)

Definitions

  • the present application relates to the technical field of machine learning, and in particular, to a training method, device, electronic device and storage medium for a face correction model.
  • Image correction is an important direction in the field of computer science and artificial intelligence. It can correct image distortion caused by aberration and limited bandwidth, geometric distortion caused by the imaging device's shooting attitude and scanning nonlinearity, and image degradation caused by motion blur, radiometric distortion, the introduction of noise, and so on. Face correction technology is a branch of the image correction field and plays an increasingly important role in many application scenarios of daily life today.
  • At present, the method usually adopted is to use a machine model for the learning of cross-pose face images: the input of the model is the face images in various postures to be corrected, and the output of the model is the image of the required face posture after correction.
  • However, this method may produce image mapping ambiguity, so that the model learns changes other than pose changes, resulting in the generated image losing the information of the original face image and finally generating a completely different face.
  • the embodiments of the present application provide a training method, device, electronic device, and computer-readable storage medium for a face correction model, so that the trained face correction model can achieve cross-pose correction of the face without losing the information about the face image.
  • An embodiment of the present application provides a training method for a face correction model, where the method is executed by an electronic device, including:
  • the input face image is subjected to face posture correction by using the face correction model to obtain a corrected face image of a standard face posture; wherein, the face image has a face attribute of at least one dimension;
  • the authenticity of the corrected face image is predicted by using a discriminant model to obtain a prediction result representing the authenticity of the corrected face image compared to the target face image, and a first loss function is constructed based on the prediction result;
  • face attribute recognition is performed on the corrected face image for the face attributes of the at least one dimension through a face attribute recognition model to obtain a recognition result including the face attributes possessed by the corrected face image, and a second loss function is constructed based on the recognition result;
  • based on the first loss function and the second loss function, the model parameters of the face correction model are updated.
  • In some embodiments, the updating of the model parameters of the face correction model based on the first loss function and the second loss function includes: respectively determining the weight of the first loss function and the weight of the second loss function; performing weighted summation on the first loss function and the second loss function based on the weight of the first loss function and the weight of the second loss function to obtain the target loss function; and updating the model parameters of the face correction model based on the target loss function.
  • In some embodiments, updating the model parameters of the face correction model based on the target loss function includes: determining the value of the first loss function based on the prediction result; determining the value of the second loss function based on the difference between the face attributes and the recognition result; determining the value of the target loss function based on the value of the first loss function and the value of the second loss function; and updating the model parameters of the face correction model based on the value of the target loss function.
  • In some embodiments, updating the model parameters of the face correction model based on the value of the target loss function includes: when the value of the target loss function reaches a first threshold, determining the corresponding first error signal based on the target loss function; and, starting from the output layer of the discriminant model, back-propagating the first error signal in the discriminant model and the face correction model, and updating the model parameters of the discriminant model and the face correction model in the process of propagation.
  • the embodiment of the present application provides a training device for a face correction model, including:
  • a face posture correction module configured to perform face posture correction on the input face image through the face correction model to obtain the corrected face image of the standard face posture, wherein the face image has a face attribute of at least one dimension;
  • a prediction module configured to predict the authenticity of the corrected face image through a discriminant model, obtain a prediction result that characterizes the authenticity of the corrected face image compared to the target face image, and construct a first loss function based on the prediction result;
  • an attribute recognition module configured to perform face attribute recognition on the corrected face image with respect to the face attributes of the at least one dimension through a face attribute recognition model, obtain a recognition result including the face attributes possessed by the corrected face image, and construct a second loss function based on the recognition result;
  • a parameter updating module configured to update model parameters of the face correction model based on the first loss function and the second loss function.
  • The embodiment of the present application provides an electronic device, including: a memory configured to store executable instructions; and a processor configured to implement the training method of the face correction model provided by the embodiment of the present application when executing the executable instructions stored in the memory.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to execute the training method of the face correction model provided by the embodiments of the present application.
  • The embodiment of the present application also provides a face correction method, the method being performed by an electronic device and including: acquiring a face image to be corrected; inputting the face image to be corrected into a face correction model; and performing face posture correction on the face image to be corrected through the face correction model to obtain a target corrected face image of the standard face posture; wherein the face correction model is obtained by training based on the training method of the face correction model provided in the embodiment of the present application.
  • In some embodiments, the face correction model includes an encoding layer, a correction layer and a decoding layer. Performing face posture correction on the to-be-corrected face image through the face correction model to obtain the target corrected face image of the standard face posture includes: encoding the to-be-corrected face image through the encoding layer to obtain an initial encoding; correcting the initial encoding through the correction layer based on the deviation between the face posture in the to-be-corrected face image and the standard face posture to obtain a target encoding; and decoding the target encoding through the decoding layer to obtain the target corrected face image of the standard face posture.
  • The parameters of the encoding layer, the correction layer and the decoding layer are obtained by updating based on the first loss function constructed from the prediction result of the discriminant model and the second loss function constructed from the face attribute recognition result of the face attribute recognition model; the prediction result is obtained by the discriminant model predicting the authenticity of the corrected face image output by the face correction model, and the face attribute recognition result is obtained by the face attribute recognition model performing face attribute recognition on the corrected face image output by the face correction model.
  • The embodiment of the present application also provides a face correction device, comprising:
  • an acquisition module configured to acquire the face image to be corrected
  • an input module configured to input the face image to be corrected into a face correction model
  • a correction module configured to perform face posture correction on the to-be-corrected face image through the face correction model to obtain a target corrected face image with a standard face posture
  • the face correction model is obtained by training based on the training method of the face correction model provided in the embodiment of the present application.
  • The embodiment of the present application provides an electronic device, including: a memory configured to store executable instructions; and a processor configured to implement the face correction method provided by the embodiment of the present application when executing the executable instructions stored in the memory.
  • Embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to execute the face correction method provided by the embodiments of the present application.
  • By applying the training method, device, electronic device and computer-readable storage medium for the face correction model provided in the embodiments of the present application, a face attribute recognition model is introduced as a training guide on top of the basic training framework of the generative adversarial network composed of the face correction model and the discriminant model, so that during training the face correction model learns both cross-pose face correction and the face attributes of the face image. This overcomes the defect of the model training methods in the related art that the information of the face image is easily lost: the trained face correction model has the capability of cross-pose face correction, and at the same time the corrected face image does not lose the information of the original input face image.
  • FIG. 1 is a schematic diagram of the principle of a generative adversarial network model provided by the related art
  • Fig. 2 is an optional structural schematic diagram of a generative adversarial network model provided by the related art
  • FIG. 3 is an optional schematic diagram of a training system for a face correction model provided by an embodiment of the present application;
  • FIG. 4 is an optional schematic diagram of the structure of an electronic device provided by an embodiment of the present application;
  • FIG. 5 is an optional structural schematic diagram of a face correction model provided by an embodiment of the present application;
  • FIG. 6 is an optional schematic diagram of a model architecture for model training provided by an embodiment of the present application.
  • FIG. 7 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application.
  • FIG. 8 is an optional schematic diagram of a training sample provided by an embodiment of the present application.
  • FIG. 9 is an optional schematic diagram of a model architecture for model training provided by an embodiment of the present application.
  • FIG. 10 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application
  • FIG. 11 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application.
  • FIG. 12 is an optional schematic diagram of a training sample provided by an embodiment of the present application.
  • FIG. 13 is an optional schematic flowchart of a face correction method provided by an embodiment of the present application.
  • FIG. 14 is an optional schematic diagram of the structure of the face correction device provided by the embodiment of the present application.
  • The terms "first", "second" and "third" are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understood that, where permitted, the specific order or sequence may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
  • Latent space: the sample space where the noise z is located, which is a vector space.
  • Cross entropy: used to measure the similarity between two distributions. For example, in logistic regression, the real distribution of the data set is p, and the distribution corresponding to the results predicted by the logistic regression model is q; the cross entropy measures the difference between the predicted result q and the real result p, and is called the cross-entropy loss function.
  • GAN: Generative Adversarial Network.
  • Convergence: refers to approaching a certain value; the convergence of a model refers to the convergence of the loss function of the model.
  • Figure 1 is a schematic diagram of the principle of a generative adversarial network GAN model provided by related technologies.
  • the generative adversarial network GAN model includes a generative model G and a discriminant model D.
  • The generative model G is a generative network that receives a random noise z from the latent space and generates an image G(z) from this noise.
  • the discriminant model D is a discriminative network that determines whether a picture is "real". For example, its input parameter is x, x represents a picture, and the output D(x) represents the probability that x is a real picture.
  • FIG. 2 is an optional schematic diagram of the structure of the generative adversarial network GAN model provided in the related art.
  • the goal of generating the model G is to try to generate a real picture and input it into the discriminant model D to deceive the discriminant model D.
  • the goal of the discriminative model D is to try to separate the pictures generated by the generative model G from the pictures in the real world.
  • The generative model G and the discriminant model D constitute a dynamic "game process".
  • the generative model G learns the distribution of data. If it is used for image generation, after the training is completed, the generative model G can generate realistic images from a random number.
  • the applicant adopts a generative adversarial network composed of a face correction model and a discriminant model to perform face correction learning.
  • the face correction model is used as a generative model, and is specifically used to perform cross-pose face image correction through the face correction model.
  • the input of the face correction model is the face images in various poses to be corrected, and the output of the face correction model is the corrected face images of standard face poses, such as frontal face images.
  • the rectified face image and another frontal face image are then input to the discriminant model.
  • the other front face image may correspond to the same person as the face image input to the face correction model, or may correspond to two different persons.
  • The discriminative model is used to decide which image is real and which image is generated; finally, learning proceeds through the confrontation between the face correction model and the discriminative model.
  • However, this method may generate image mapping ambiguity, so that the generative model learns changes other than pose changes, resulting in the generated image losing the identity information of the original face image and finally generating a completely different face; therefore, the face correction model needs to be further optimized.
  • In view of this, the embodiments of the present application provide a training method, device, electronic device, and computer-readable storage medium for a face correction model, which can obtain a face correction model that realizes cross-pose face correction without losing the information of the face image.
  • FIG. 3 is an optional schematic diagram of the training system 100 of the face correction model provided by the embodiment of the present application.
  • The terminal 400 is connected to the server 200 through the network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two, and uses wireless links to realize data transmission.
  • the terminal 400 may be a notebook computer, a tablet computer, a desktop computer, a smart phone, a dedicated messaging device, a portable gaming device, a smart speaker, a smart watch, etc., but is not limited thereto.
  • The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the terminal 400 and the server 200 may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
  • the terminal 400 is configured to send the face image used for training the face correction model to the server 200;
  • The server 200 is configured to: perform face posture correction on an input face image through a face correction model to obtain a corrected face image with a standard face posture, wherein the face image has a face attribute of at least one dimension; predict the authenticity of the corrected face image through a discriminant model to obtain a prediction result representing the authenticity of the corrected face image compared to the target face image, and construct the first loss function based on the prediction result; perform face attribute recognition on the corrected face image for the face attributes of at least one dimension to obtain a recognition result including the face attributes possessed by the corrected face image, and construct a second loss function based on the recognition result; construct the target loss function based on the first loss function and the second loss function and obtain the value of the target loss function; and update the model parameters of the face correction model based on the value of the target loss function. In this way, the training of the face correction model is realized.
  • The terminal 400 is further configured to send an image correction request carrying the face image to be corrected to the server 200, so that the server 200 obtains the face image to be corrected after parsing the image correction request, performs face posture correction on it through the trained face correction model to obtain a corrected face image of the standard face posture, and returns the result to the terminal 400.
  • FIG. 4 is an optional schematic diagram of the structure of the electronic device 500 provided by the embodiment of the present application.
  • the electronic device 500 may be implemented as the terminal 400 or the server 200 in FIG. 3 .
  • the electronic device implementing the training method of the face correction model of the embodiment of the present application will be described.
  • the electronic device 500 shown in FIG. 4 includes: at least one processor 510 , a memory 550 , at least one network interface 520 and a user interface 530 .
  • the various components in electronic device 500 are coupled together by bus system 540 .
  • The bus system 540 is used to implement connection and communication between these components.
  • the bus system 540 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as bus system 540 in FIG. 4 .
  • The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor or the like.
  • User interface 530 includes one or more output devices 531 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
  • User interface 530 also includes one or more input devices 532, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
  • Memory 550 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 550 optionally includes one or more storage devices that are physically remote from processor 510 .
  • Memory 550 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read Only Memory, ROM), and the volatile memory may be a random access memory (Random Access Memory, RAM).
  • ROM Read Only Memory
  • RAM Random Access Memory
  • memory 550 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
  • the operating system 551 includes system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • a presentation module 553 for enabling presentation of information (eg, a user interface for operating peripherals and displaying content and information) via one or more output devices 531 associated with the user interface 530 (eg, a display screen, speakers, etc.) );
  • An input processing module 554 for detecting one or more user inputs or interactions from one of the one or more input devices 532 and translating the detected inputs or interactions.
  • the training apparatus for the face correction model provided by the embodiments of the present application may be implemented in software.
  • FIG. 4 shows the training apparatus 555 for the face correction model stored in the memory 550, which may be software in the form of programs and plug-ins, including the following software modules: a face posture correction module 5551, a prediction module 5552, an attribute recognition module 5553, and a parameter update module 5554. These modules are logical, so they can be arbitrarily combined or further split according to the functions realized. The function of each module will be explained below.
  • In some other embodiments, the training apparatus for the face correction model provided by the embodiments of the present application may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to execute the training method of the face correction model provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may adopt one or more Application-Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
  • FIG. 5 is an optional schematic diagram of the structure of the face correction model provided by the embodiment of the present application.
  • the face correction model provided by the embodiments of the present application includes an encoder and a decoder.
  • Before implementing the training method of the face correction model provided by the embodiment of the present application, the server first constructs a face correction model consisting of an encoder and a decoder.
  • The encoder is used to encode the input image and output the image encoding of the image. The image encoding can be represented by a multi-element one-dimensional vector or a multi-element multi-dimensional vector; for example, an image can be encoded as a one-dimensional vector with 256 elements, or as a multi-dimensional vector with 256 dimensions.
  • the decoder is used to decode the input noise to generate an image and output it.
  • Generally, the noise is a one-dimensional vector, which is transformed by the reshape function into a two-dimensional representation, and several deconvolution layers are then used to learn upsampling.
  • random noise and selected sample vectors can be input into the decoder at the same time to jointly constrain the decoder to generate images.
  • random noise and the image code generated by the encoder are input into the decoder, so that the decoder decodes and generates a corresponding face image.
  • In the embodiment of the present application, the image encoding obtained by the encoder is also corrected, so as to change the face posture of the face image that the image encoding maps to: the corrected target image encoding is mapped to a face image in the standard posture, and the corrected target image encoding is input into the decoder to generate the corrected face image.
  • During actual implementation, the encoder can be constructed by using the first 5 layers of the AlexNet network plus a fully connected layer. The fully connected layer is fully connected to the neurons in the preceding and following layers for feature mapping and dimensionality reduction, and the activation function of the AlexNet network is changed from the linear rectification function (Rectified Linear Unit, ReLU) to the ELU activation function.
  • The generative adversarial network involved in the embodiments of the present application may adopt a deep convolutional generative adversarial network (Deep Convolutional Generative Adversarial Network, DCGAN).
  • FIG. 6 is an optional schematic diagram of the model architecture of the training process provided by the embodiment of the present application.
  • the model architecture includes:
  • the face correction model 61 is used to correct the face posture of the input face image, and obtain the corrected face image of the standard face posture; wherein, the face image has a face attribute of at least one dimension;
  • the discriminant model 62 is used to predict the authenticity of the corrected face image, and obtain a prediction result representing the authenticity of the corrected face image compared to the target face image;
  • the face attribute recognition model 63 is used for performing face attribute recognition on the face attributes of at least one dimension of the corrected face image to obtain a recognition result including the face attributes possessed by the corrected face image.
  • The output of each model in the model architecture is used to update the model parameters of the face correction model 61 and the model parameters of the discriminant model 62, so as to realize the adversarial training of the face correction model 61 and the discriminant model 62; the face correction model 61 obtained by training then realizes face correction while retaining the attributes of the face.
  • the training method of the face correction model provided by the embodiment of the present application may be implemented by the terminal alone, or implemented by the server alone, or implemented jointly by the server and the terminal.
  • FIG. 7 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application, which will be described with reference to the steps shown in FIG. 7 .
  • FIG. 8 is an optional schematic diagram of a training sample provided by an embodiment of the present application.
  • the training samples include the face image input to the face correction model, the face attributes of the face image (not shown in the figure), and the target face image.
  • the face image is denoted as A
  • the target face image is denoted as B
  • the face attribute possessed by the face image is denoted as C
  • a set of training data can be denoted as (A, B, C).
  • the face image A and the target face image B are both real-world face images.
  • the face pose of the face image A can be any pose, such as a profile face pose.
  • the target face image B may be a face image of a standard face pose, which may correspond to the same person as the face image, or may correspond to two different persons.
  • the embodiment of the present application defines at least one dimension for a face attribute.
  • Here, a face attribute may be defined to include at least one of the following face attribute labels: gender, age, expression, hair length, whether there are wearing objects, and the like.
  • the length of the hair can be further divided into multiple dimensions such as whether it is long hair, whether it is short hair, whether it is bald or not.
  • Whether there are wearing objects can be further divided into whether to wear glasses, whether to wear a hat, whether to wear earrings and other dimensions.
  • For example, for a given face image, the corresponding face attributes may include: male, 22 years old, expressionless, short hair, and wearing glasses.
  • the specific definition of the face attribute is not specifically limited in this embodiment of the present application.
  • the training samples consist of multiple sets of face images, face attributes of the face images, and target face images.
  • The training samples may be general samples that have already been constructed, which the server obtains by accessing a target device; or they may be uploaded by a user through a client, in which case the server receives the training samples sent by the client.
  • the target device may be the server itself, the training samples are pre-stored locally on the server, and the server obtains by accessing the storage address of the training samples.
  • the target device may also be an external device that is communicatively connected to the server, such as a terminal or a database server. The server accesses the target device through the communication connection, and obtains training samples from the target device.
  • the training samples can also be constructed and obtained by the server. Based on FIG. 7, before step 701, the following steps can be performed:
  • the server obtains the face image of the target user in any posture, the target face image of the target user in the standard face posture, and the face attribute of at least one dimension possessed by the face image;
  • During actual implementation, the server can collect from web pages the face image of the same target user in any posture, the target face image in the standard face posture having the same face attributes as the face image, and the face attributes of at least one dimension possessed by the face image.
  • the server may also use a camera connected to the server to photograph the target user to obtain a face image of the target user in any posture and a target face image of the target user in a standard posture.
  • As for the face attributes, the model training personnel can identify them manually based on the face images and then input them into the server.
  • the server collects the face image of the same target user in any posture from the web page, and sends the face image to the client for output.
  • The model trainers manually identify the face attributes based on the output face image.
  • the client user inputs the face attribute to the client, and the client sends the face attribute input by the user to the server.
  • the server obtains the face attributes input by the client, and saves them mapped with the face image.
  • a training sample for training the face correction model is constructed.
  • the server takes the face image, the target face image and the face attributes possessed by the face image as a set of training data, and obtains multiple sets of training data through the above method.
  • the server takes multiple sets of training data as training samples.
  • Here, two different groups of training data may correspond to the same target user, or may correspond to two different users.
  • the face image and the target face image in the same set of training data correspond to the same target user.
  • the face image and the target face image in the same set of training data may also correspond to two different users respectively.
  • After acquiring the training data, the server further preprocesses the images in the acquired training data (that is, the face image A and the target face image B). During actual implementation, the server may process the images as follows: adjust the size of each frame of image, for example to 286×386; denoise the image and normalize the image pixel values, for example to between -1 and 1; and then randomly crop the image (for example, randomly cropping to a size of 250×350). The server can also randomly flip the image, such as left-right flipping, and can adjust the brightness or grayscale of the image to realize data enhancement. The server then constructs training samples based on each set of training data after preprocessing.
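  • As an illustrative sketch only, the preprocessing pipeline just described could be expressed with torchvision transforms; the denoising step is omitted here, the (height, width) ordering is an assumption, and `preprocess` is a hypothetical name:

```python
from torchvision import transforms

# Preprocessing sketch for face image A and target face image B.
# Sizes follow the examples above; (height, width) ordering is assumed.
preprocess = transforms.Compose([
    transforms.Resize((386, 286)),           # adjust each image to 286x386
    transforms.RandomCrop((350, 250)),       # randomly crop to 250x350
    transforms.RandomHorizontalFlip(),       # random left-right flip
    transforms.ColorJitter(brightness=0.2),  # brightness adjustment (data enhancement)
    transforms.ToTensor(),                   # pixel values to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5]),  # normalize to [-1, 1]
])
```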
  • the face image, the target face image and the face attributes of the face image are used to construct a training sample for training the face correction model, which can provide a reliable and effective training sample for the training of the face correction model.
  • After obtaining the training samples, the server performs step 701 to continue training the face correction model, as described below.
  • Step 701: The server performs face posture correction on the input face image through the face correction model to obtain a corrected face image with a standard face posture, wherein the face image has a face attribute of at least one dimension.
  • Here, the face correction model can correct a face image of any pose across poses to obtain the corrected face image in the standard pose, and over the course of continuous training it can generate corrected face images that are ever closer to the standard pose and to real images.
  • step 701 shown in FIG. 7 can be implemented in the following manner, which will be described in conjunction with each step.
  • the server inputs the face image in any posture to the face correction model; encodes the face image through the face correction model to obtain the initial image code;
  • the server inputs the face image in any pose to the encoder of the face correction model.
  • A convolution operation is performed on the face image through the five convolutional layers of the encoder, in which the first and second convolutional layers perform Local Response Normalization (LRN) processing on the face image, and the first, second and fifth convolutional layers all perform a max pooling operation (MaxPooling) after the convolution operation.
  • the activation functions used in the convolutional layers are all ReLU functions.
  • the fully connected layer is used to perform feature mapping and dimension reduction processing on the output of the convolution layer to obtain the initial image encoding of the face image.
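  • For illustration only, a minimal PyTorch sketch of this encoder is given below; the channel sizes follow the standard AlexNet configuration and the 256-element code length follows the earlier example, both of which are assumptions rather than values fixed by this application:

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Sketch of the encoder: five AlexNet-style convolutional layers
    (LRN after conv1/conv2, max pooling after conv1/conv2/conv5) plus a
    fully connected layer for feature mapping and dimension reduction."""

    def __init__(self, code_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(size=5),            # LRN after conv1
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv1
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(size=5),            # LRN after conv2
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv2
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv5
        )
        # fully connected layer: feature mapping and dimension reduction
        self.fc = nn.LazyLinear(code_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))  # initial image code
```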
  • the initial image encoding is corrected to obtain the target image encoding
  • Here, the server corrects the initial image encoding based on the deviation between the face pose in the face image and the standard face pose, so that the target image encoding obtained after the correction can be mapped to a face image in the standard face pose.
  • During actual implementation, the server may use an affine transformation, that is, an RST (rotation–scaling–translation) transformation, a polynomial model (Polynomial), or a local triangulation (Triangulation) algorithm, to correct the initial image encoding.
  • The target image encoding obtained by correcting the initial encoding as described above can be mapped to a face image in the standard face pose.
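  • The following is a hedged sketch of this correction step as a learned transform on the initial encoding; the pose-deviation representation (a small vector of offsets) and the single linear layer are hypothetical simplifications standing in for the RST, polynomial, or triangulation corrections named above:

```python
import torch
import torch.nn as nn

class CodeCorrection(nn.Module):
    """Sketch: corrects the initial image code so that it maps to a
    face image in the standard pose, conditioned on the deviation
    between the input face pose and the standard pose."""

    def __init__(self, code_dim: int = 256, pose_dim: int = 3):
        super().__init__()
        self.transform = nn.Linear(code_dim + pose_dim, code_dim)

    def forward(self, initial_code: torch.Tensor,
                pose_deviation: torch.Tensor) -> torch.Tensor:
        # concatenate the code with the pose deviation and map it
        # to the target image code
        return self.transform(torch.cat([initial_code, pose_deviation], dim=1))
```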
  • Next, the server decodes the target image encoding through the face correction model to obtain the corrected face image of the standard face pose.
  • In the decoding process, the decoder of the face correction model passes the input target image encoding through a fully connected layer, computes a three-dimensional tensor via the reshape function, and upsamples this tensor through four deconvolution networks to generate a two-dimensional corrected face image. For example, if the target image encoding is a 1×100 vector, it can be reshaped into a 4×4×1024 three-dimensional tensor, after which the four upsampling deconvolution networks generate a 64×64 two-dimensional image, that is, the corrected face image.
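  • A minimal PyTorch sketch of this decoder, assuming the 1×100 code and 64×64 output from the example above; the intermediate channel widths are typical DCGAN choices, not values fixed by this application:

```python
import torch
import torch.nn as nn

class FaceDecoder(nn.Module):
    """Sketch of the decoder: a fully connected layer reshapes the
    target image code into a 4x4x1024 tensor, and four upsampling
    deconvolution (transposed convolution) layers generate a 64x64
    two-dimensional corrected face image."""

    def __init__(self, code_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(code_dim, 4 * 4 * 1024)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(1024, 512, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh(),     # 64x64
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        h = self.fc(code).view(-1, 1024, 4, 4)  # reshape to a 3-D tensor
        return self.deconv(h)                   # corrected face image
```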
  • Step 702: Predict the authenticity of the corrected face image by using the discriminant model to obtain a prediction result representing the authenticity of the corrected face image compared to the target face image, and construct a first loss function based on the prediction result.
  • Here, the discriminant model is a Convolutional Neural Network (CNN) classifier; the discriminant model in DCGAN has 4 convolutional layers and implements authenticity classification of the input samples.
  • The server inputs the real-world target face image and the corrected face image generated by the face correction model into the discriminant model, which performs authenticity classification and outputs a prediction result characterizing the authenticity probability of the corrected face image relative to the target face image. If the authenticity probability represented by the output prediction result is 1, the corrected face image is a real image; if it is 0, the corrected face image is not a real image; and if it is 0.5, the discriminant model cannot judge whether the corrected face image is a real image.
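  • For illustration, a DCGAN-style discriminant model with four convolutional layers might be sketched as follows, assuming a 64×64 input; the channel widths and LeakyReLU slopes are assumptions:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of the discriminant model: a CNN classifier with four
    strided convolutional layers that outputs the authenticity
    probability D(x) of the input image."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),    # 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 16x16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 8x8
            nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 4x4
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 1),
            nn.Sigmoid(),  # authenticity probability in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```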
  • The server also constructs a first loss function based on the prediction result.
  • the first loss function is used to update the decoder parameters of the face correction model and the model parameters of the discriminant model.
  • In some embodiments, the first loss function is constructed based on formula (1):
  • L_gan = min_G max_D ( log D(B) + log(1 − D(G(A))) )    (1)
  • where L_gan is the first loss function, D(B) is the prediction result of the discriminant model's authenticity prediction for the target face image B, G(A) is the corrected face image, and D(G(A)) is the prediction result of the discriminant model's authenticity prediction for the corrected face image G(A).
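  • A direct sketch of formula (1) in PyTorch; the eps term for numerical stability is an addition, and how the min/max is realized by alternating updates is sketched at the end of this section:

```python
import torch

def first_loss(d_real: torch.Tensor, d_fake: torch.Tensor,
               eps: float = 1e-8) -> torch.Tensor:
    """Value of formula (1): log D(B) + log(1 - D(G(A))).
    The discriminant model maximizes this quantity; the face
    correction model (the generator) minimizes it."""
    return torch.mean(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps))
```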
  • In some embodiments, step 702 shown in FIG. 7, "predicting the authenticity of the corrected face image by using the discriminant model to obtain a prediction result representing the authenticity of the corrected face image compared to the target face image", may be implemented in the following manner, which will be described in conjunction with each step.
  • The server inputs the corrected face image and the target face image into the discriminant model; feature extraction is performed on the corrected face image and the target face image respectively through the discriminant model to obtain the corrected face feature corresponding to the corrected face image and the target face feature corresponding to the target face image.
  • the server inputs the corrected face image G(A) and the target face image B to the discriminant model, and uses the discriminant model to perform feature extraction respectively.
  • Here, the discriminant model adopted in the embodiment of the present application uses strided convolution to realize the down-sampling operation, extracting specified features from the input image through mathematical operations with the convolution kernel. Specifically, the discriminant model performs convolution operations between the input corrected face image and the convolution kernel to obtain the corrected face feature corresponding to the corrected face image, and performs convolution operations between the input target face image and the convolution kernel to obtain the target face feature of the target face image.
  • the corrected face feature and the target face feature are represented by vectors.
  • During actual implementation, the discriminant model implements downsampling in the convolutional layers to obtain the corrected face features and target face features, and then uses a fully connected layer to process the corrected face features and target face features to obtain fixed-length feature vectors.
  • In some embodiments, the discriminant model can accept input images of any size, and uses a deconvolution layer to upsample the feature map of the last convolutional layer to restore it to the same size as the input image, so that a prediction is produced for each pixel of the corrected face image while preserving the spatial information in the original input image; finally, pixel-by-pixel classification is performed on the upsampled feature map, and the output is mapped through the softmax function to characterize the authenticity prediction of the corrected face image compared to the target face image.
  • The above-mentioned process of predicting the authenticity probability of the input corrected face image through the discriminant model can effectively predict the probability that the corrected face image is real, yielding a prediction result referenced against the authenticity of the target face image.
  • Step 703: Perform face attribute recognition on the corrected face image with respect to the face attributes of at least one dimension through the face attribute recognition model, obtain a recognition result including the face attributes possessed by the corrected face image, and construct the second loss function based on the recognition result.
  • Here, if the face attribute has a single dimension, the face attribute recognition model is a single-task classification model; if the face attribute has multiple dimensions, the face attribute recognition model is a multi-task, multi-class classification model including multiple linear discriminant functions, and softmax regression can be used to implement multi-class logistic regression.
  • the face attribute is denoted as C
  • n dimensions are defined for C
  • the face attribute C can have n values. Given an x, the conditional probability of the face attribute label belonging to the nth dimension predicted by softmax regression can be obtained based on formula (2):
  • p(y_n | x) = exp(w_n^T x) / Σ_i exp(w_i^T x)    (2)
  • where p(y_n | x) is the conditional probability that x belongs to the face attribute label of the n-th dimension, and w_n is the weight vector of the face attribute label of the n-th dimension.
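  • A sketch of such a multi-task softmax-regression model; the feature dimension and the number of classes per attribute dimension are hypothetical:

```python
import torch
import torch.nn as nn

class FaceAttributeHeads(nn.Module):
    """Sketch: one linear discriminant function per face attribute
    dimension; softmax regression turns the scores w_n^T x into the
    conditional probabilities p(y_n | x) of formula (2)."""

    def __init__(self, feat_dim, classes_per_dim):
        super().__init__()
        # e.g. classes_per_dim = [2, 2, 2] for gender / glasses / hat
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, c) for c in classes_per_dim
        )

    def forward(self, x: torch.Tensor):
        # one softmax distribution per attribute dimension
        return [head(x).softmax(dim=-1) for head in self.heads]
```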
  • The server obtains a recognition result including the face attributes possessed by the corrected face image by inputting the corrected face image into the face attribute recognition model.
  • the recognition result of the face attribute includes a face attribute label of at least one dimension.
  • the server also constructs a second loss function based on the recognition result.
  • the second loss function is used to update the parameters of the face correction model in combination with the first loss function.
  • The face attribute recognition model is denoted as FA. In some embodiments, the second loss function is constructed based on formula (3):
  • L_attr = CrossEntropy( FA(G(A)), C )    (3)
  • where FA(G(A)) is the recognition result of the face attribute recognition model FA for the corrected face image G(A), C is the face attribute of the face image, and L_attr is the second loss function, which represents the cross-entropy of FA(G(A)) and C.
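  • A sketch of formula (3) as a cross-entropy summed over the attribute dimensions; `attr_probs` is assumed to be the per-dimension probability list produced by a recognizer like the sketch after formula (2), and the names are hypothetical:

```python
import torch
import torch.nn.functional as F

def second_loss(attr_probs, attr_labels) -> torch.Tensor:
    """Formula (3): cross-entropy between the recognition result
    FA(G(A)) and the true face attributes C, summed over the
    attribute dimensions."""
    eps = 1e-8
    return sum(F.nll_loss(torch.log(p + eps), y)
               for p, y in zip(attr_probs, attr_labels))
```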
  • In some embodiments, step 703 shown in FIG. 7, "performing face attribute recognition on the corrected face image with respect to the face attributes of at least one dimension through the face attribute recognition model to obtain a recognition result including the face attributes possessed by the corrected face image", can be achieved in the following ways, which will be described in conjunction with each step.
  • the server inputs the corrected face image and the face attribute label corresponding to the face attribute of at least one dimension into the face attribute recognition model;
  • Here, the face attribute labels input into the face attribute recognition model are the face attribute labels actually corresponding to the face image, that is, the face attribute labels corresponding to the face attribute C in the training sample, which can be recognized manually and then input into the face attribute recognition model of the embodiment of the present application as part of the training sample.
  • the server downsamples the corrected face image and the face attribute labels of each dimension through the convolution layer of the face attribute recognition model, so as to realize feature extraction, and obtain the corrected face corresponding to the corrected face image. features, and the face attribute features corresponding to the face attribute labels of each dimension.
  • a recognition result including the face attributes possessed by the corrected face image is predicted to be obtained.
  • The server uses the deconvolution layer of the face attribute recognition model to upsample the feature map of the last convolutional layer, that is, the corrected face feature, restoring it to the same size as the input image, so that a prediction is generated for each pixel of the corrected face image while the spatial information in the original input image is retained. Finally, pixel-by-pixel classification is performed on the upsampled feature map, the outputs corresponding to the face attribute features are mapped through the softmax function, and the resulting face attribute labels of at least one dimension of the corrected face image are used as the predicted recognition result including the face attributes possessed by the corrected face image.
  • the above-mentioned process of recognizing the face attribute of the corrected face image through the face attribute identification model can effectively identify the face attribute of at least one dimension of the corrected face image.
  • Step 704: Based on the first loss function and the second loss function, update the model parameters of the face correction model.
  • The server combines the first loss function and the second loss function to jointly train the generative adversarial network composed of the face correction model and the discriminant model; when the generative adversarial network converges, the model training is complete, so that the face correction model obtained by training can maintain the original face attributes while realizing cross-pose face correction.
  • step 704 shown in FIG. 7 can be implemented in the following manner, which will be described in conjunction with each step.
  • The server uses the first loss function and the second loss function to construct a target loss function, and uses the target loss function to train the face correction model. Since the second loss function is constructed based on the face attribute recognition model, the target loss function for training the face correction model is constructed in combination with the face attribute recognition model, so that the trained face correction model can retain the original face attributes of the face image, and the face attributes of the corrected face image produced by the face correction model are closer to those of the original face image.
  • the server may determine the weight of the first loss function and the weight of the second loss function, respectively, based on the preset weight distribution of the first loss function and the second loss function.
  • Here, the weight distribution of the first loss function and the second loss function can be set based on the capability that ultimately needs to be emphasized: to give the face correction model a stronger correction effect, set the first loss function to a higher weight than the second loss function; to give the face correction model a stronger effect of preserving face attributes, set the second loss function to a higher weight than the first loss function. The higher the weight, the higher the proportion, that is, the higher the importance.
  • During actual implementation, the weight of the first loss function and the weight of the second loss function may be stored in the server in advance, or may be input by the user through the user interface of the client, with the client then sending the weights input by the user to the server; the server receives the weights sent by the client, thereby obtaining the weight of the first loss function and the weight of the second loss function.
  • the server performs weighted summation on the first loss function and the second loss function based on the weight of the first loss function and the weight of the second loss function to obtain the target loss function.
  • The objective loss function constructed by the server can refer to formula (4):
  • Loss = λ₁ · L_gan + λ₂ · L_attr    (4)
  • where Loss is the target loss function, λ₁ is the weight of the first loss function L_gan, and λ₂ is the weight of the second loss function L_attr.
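  • The weighted summation of formula (4) in code; the default weight values are illustrative only:

```python
def target_loss(l_gan, l_attr, w_gan: float = 1.0, w_attr: float = 1.0):
    """Formula (4): weighted summation of the first and second loss.
    Raise w_gan to emphasize correction ability, or w_attr to
    emphasize preservation of face attributes."""
    return w_gan * l_gan + w_attr * l_attr
```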
  • In this way, the loss function of the face attribute recognition model can be combined with the loss function of the generative adversarial network to construct a target loss function for training the generative adversarial network of the embodiment of the present application, so that the trained face correction model has the face correction function while the corrected face images it produces retain the same face attributes as the face images before correction.
  • In some embodiments, the above-mentioned updating of the model parameters of the face correction model based on the target loss function can be implemented as follows: the server determines the value of the first loss function based on the prediction result; determines the value of the second loss function based on the difference between the face attributes and the recognition result; determines the value of the target loss function based on the value of the first loss function and the value of the second loss function; and updates the model parameters of the face correction model based on the value of the target loss function.
  • Here, the prediction result is the probability that the corrected face image is similar to the target face image. When the target face image and the face image correspond to the same target user, a larger prediction result, that is, a greater probability that the corrected face image is similar to the target face image, means that the corrected face image is more successful. In some embodiments, the target face image may instead correspond to a different user from the face image; in that case, a smaller prediction result, that is, a smaller probability that the corrected face image is similar to the target face image, means that the corrected face image is more successful.
  • the server may calculate and obtain the value of the first loss function by using formula (1) based on the prediction result.
  • the recognition result of the attribute recognition of the corrected face image by the face attribute recognition model has a face attribute label of at least one dimension.
  • The server uses the cross-entropy between the face attributes of the face image and the recognition result to characterize the difference between the face attributes of the face image and the recognition result, and calculates the cross-entropy of the face attributes and the recognition result by using formula (3) to obtain the value of the second loss function.
  • Based on the value of the first loss function and the value of the second loss function, the server may further determine the value of the target loss function: the server first determines the weight of the first loss function and the weight of the second loss function, and after weighting and summing the value of the first loss function and the value of the second loss function, obtains the value of the target loss function.
  • The server fixes the model parameters of the face attribute recognition model and, based on the value of the target loss function, updates the model parameters of the generative adversarial network provided by the embodiment of the present application, so as to realize the training of the face correction model.
  • the above-mentioned updating of the model parameters of the face correction model based on the value of the target loss function can be implemented in the following manner, which will be described in conjunction with each step.
  • The server determines the corresponding first error signal based on the target loss function; starting from the output layer of the discriminant model, the first error signal is back-propagated in the discriminant model and the face correction model, and the model parameters of the discriminant model and the face correction model are updated in the process of propagation.
  • the server may implement the training of the face correction model in the following manner:
  • the server fixes the model parameters of the face attribute recognition model during the training process of the face correction model.
  • in actual implementation, when the value of the target loss function reaches the first threshold, the server determines the corresponding first error signal based on the target loss function, back-propagates the first error signal in the face correction model and the discriminant model, and updates the model parameters of each layer of the face correction model and the model parameters of each layer of the discriminant model in the process of propagation.
  • the backpropagation is explained here.
  • the training samples are input to the input layer of the neural network model, pass through the hidden layer, and finally reach the output layer and output the results.
  • this is the forward propagation process of the neural network model. Since there may be an error between the output of the neural network model and the actual result, the error between the output result and the actual value is calculated, and the error is propagated back from the output layer toward the hidden layers until it reaches the input layer; in the process of back propagation, the values of the model parameters are adjusted according to the error, and the above process is iterated until convergence.
  • in actual implementation, the server determines the first error signal based on the target loss function; the first error signal is back-propagated layer by layer starting from the output layer of the discriminant model and continuing into the face correction model; when the first error signal reaches each layer, the gradient, that is, the partial derivative of the loss function with respect to the parameters of that layer, is solved, and the parameters of that layer are updated according to the corresponding gradient values.
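  • for illustration, the following minimal sketch (not taken from the present disclosure) shows forward propagation, the backward propagation of the error signal, and the layer-by-layer gradient update on a toy two-layer network:

```python
import torch

# Toy two-layer network: forward pass, loss, and gradient-based update.
torch.manual_seed(0)
w1 = torch.randn(8, 4, requires_grad=True)   # hidden-layer parameters
w2 = torch.randn(1, 8, requires_grad=True)   # output-layer parameters
x, y = torch.randn(4), torch.tensor([1.0])

hidden = torch.tanh(w1 @ x)                  # forward propagation
output = torch.sigmoid(w2 @ hidden)
loss = (output - y).pow(2).sum()             # error vs. the actual value

loss.backward()                              # error signal propagates backward
lr = 0.1
with torch.no_grad():                        # each layer's parameters move
    w2 -= lr * w2.grad                       # against its gradient (the partial
    w1 -= lr * w1.grad                       # derivative of the loss w.r.t. it)
```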
  • the server inputs a set of face images with an appropriate probability distribution into the face correction model to obtain a batch of generated corrected face images, fixes the model parameters of the face attribute recognition model, and inputs the corrected face images into the face attribute recognition model to obtain the corrected face attributes; these corrected face images are then used as negative examples, and real-world target face images as positive examples, to train the discriminant model. After this round of training, the ability of the discriminant model is improved: it learns to give high scores to real pictures and to corrected face images whose face attributes are close to those of real-world face images, and to give low scores to non-real pictures and to pictures whose face attributes are far from those of real-world face images. After that, the server fixes the model parameters of the discriminant model.
  • the server inputs a face image to the face correction model, and then sends the corrected face image generated by it into the discriminant model, and will get a feedback score output by the discriminant model.
  • this feedback score can be used as the loss to train the face correction model. After this round of training, the ability of the face correction model is also improved, and it can generate more realistic images.
  • the server continues to repeat the above process, strengthening the discriminant model and the face correction model in turn. It can be expected that, after multiple rounds of iteration, the abilities of both the discriminant model and the face correction model become stronger, and the resulting face correction model can retain more face attributes of the input face image while realizing cross-pose face correction.
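  • the alternating scheme described above might be sketched as follows, assuming PyTorch-style models and optimizers; the generator loss shown is the common non-saturating form and stands in for the document's formula (1):

```python
import torch

def train_adversarial_round(G, D, optim_G, optim_D, faces, targets, gan_loss):
    """Hedged sketch of one alternating round of adversarial training.
    G, D, the optimizers, and gan_loss are placeholders for the face
    correction model, the discriminant model, and formula (1)."""
    # Phase 1: fix G, strengthen the discriminant model D.
    corrected = G(faces).detach()            # negative examples (generated)
    optim_D.zero_grad()
    d_loss = gan_loss(D(targets), D(corrected))
    d_loss.backward()
    optim_D.step()

    # Phase 2: fix D (its optimizer is not stepped) and strengthen the face
    # correction model G, using D's feedback score on the corrected images.
    optim_G.zero_grad()
    feedback = D(G(faces))                   # higher means "more real"
    g_loss = -feedback.clamp_min(1e-7).log().mean()
    g_loss.backward()
    optim_G.step()
```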
  • FIG. 9 is an optional schematic diagram of a model architecture for model training provided by an embodiment of the present application. Based on FIG. 6 , the model architecture for model training may further include:
  • the face recognition model 64 performs feature extraction on the corrected face image and the standard face image, respectively, to obtain the corrected face feature corresponding to the corrected face image and the standard face feature corresponding to the standard face image.
  • the face recognition model can recognize the face from the semantic dimension of the face image, wherein the semantic dimension includes the texture, color, and shape of the image.
  • the feature extraction of the face image based on the face recognition model can extract the information of the semantic dimension of the face in the face image.
  • prior to step 704, the following may also be performed:
  • the server performs feature extraction on the corrected face image and the standard face image respectively, obtains the corrected face features corresponding to the corrected face image and the standard face features corresponding to the standard face image, and constructs a third loss function based on the corrected face features and the standard face features.
  • the standard face image is the face image of the target user in the standard face pose, which has completely consistent face attributes with the face image. Referring to FIG. 8 , here, the standard face image may be illustration B.
  • the embodiment of the present application also trains the face correction model in combination with the face recognition model, so that the corrected face image generated by the trained face correction model is closer to the facial features of the original input face image.
  • the face recognition model can be implemented by using the CNN model. For example, a face image is input into the face recognition model, and the user identity corresponding to the face image can be identified.
  • the embodiment of the present application does not need to identify the user identity corresponding to the face image; it only uses the face recognition model to perform feature extraction on the face image, so as to train the face correction model according to the extracted face features.
  • the server uses the face recognition model to perform feature extraction on the corrected face image and the standard face image respectively in its convolutional layers, to obtain the corrected face features characterizing the corrected face image and the standard face features characterizing the standard face image.
  • facial features can be represented by vectors, and the extracted facial features can be multi-dimensional vectors, such as 256-dimensional or 512-dimensional vectors.
  • after obtaining the corrected face features and the standard face features, a third loss function is constructed based on the two.
  • since the facial features are represented by vectors, the distance between the corrected face features and the standard face features can be used to determine whether the two are close. It can be understood that the smaller the distance between the two, the closer they are, that is, the closer the corrected face image is to the standard face image.
  • the server can construct a third loss function based on the distance between the corrected face features and the standard face features; the constructed third loss function refers to formula (5):
  • L_recog = L2(FR(G(A)), FR(B))    (5)
  • where L_recog is the third loss function, FR denotes the face recognition model, FR(G(A)) is the corrected face feature, FR(B) is the standard face feature, and L_recog represents the L2 distance between FR(G(A)) and FR(B).
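  • assuming the face features are batched embedding vectors, formula (5) could be sketched as follows (the averaging over the batch is an assumption):

```python
import torch

def recog_loss_value(fr_corrected: torch.Tensor, fr_standard: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of formula (5): the L2 distance between the corrected
    face features FR(G(A)) and the standard face features FR(B), both
    assumed to be fixed-dimensional embeddings (e.g. 256-d or 512-d),
    shape (batch, dim)."""
    return torch.norm(fr_corrected - fr_standard, p=2, dim=-1).mean()
```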
  • in some embodiments, step 704 shown in FIG. 7 may include: the server constructs the target loss function based on the first loss function, the second loss function and the third loss function.
  • the server respectively determines the weight of the first loss function, the weight of the second loss function and the weight of the third loss function, and then, based on these weights, performs a weighted summation of the first loss function, the second loss function and the third loss function to obtain the target loss function.
  • the target loss function constructed by the server may refer to formula (6):
  • Loss = λ1 · L_gan + λ2 · L_attr + λ3 · L_recog    (6)
  • where Loss is the target loss function, λ1 is the weight of the first loss function L_gan, λ2 is the weight of the second loss function L_attr, and λ3 is the weight of the third loss function L_recog.
  • the server may determine the values of the weights of the first loss function, the second loss function, and the third loss function respectively, based on a preset weight distribution of the three loss functions.
  • the weight distribution of the first loss function, the second loss function and the third loss function can be set based on the functions that need to be emphasized in the end. If the face correction model is expected to have a stronger correction effect, the first loss function is set to a higher weight than the second loss function. If the face correction model is expected to have a stronger effect of preserving face attributes, the second loss function is set to a higher weight than the first loss function. If the face correction model is expected to have a stronger effect of retaining the original facial features, the third loss function is set to a higher weight than the first loss function. The higher the weight, the higher the proportion, that is, the higher the importance.
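  • a minimal sketch of formula (6), with placeholder weight values chosen only for illustration:

```python
def target_loss_value(l_gan, l_attr, l_recog,
                      w_gan=1.0, w_attr=0.5, w_recog=0.5):
    """Hedged sketch of formula (6): a weighted sum of the three loss
    values. The weight values here are placeholders; in the document
    they are preset according to which function should be emphasized."""
    return w_gan * l_gan + w_attr * l_attr + w_recog * l_recog
```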
  • the loss functions of the face attribute recognition model and the face recognition model can be combined with the loss function of the generative adversarial network to construct a target loss function for training the generative adversarial network of the embodiment of the present application, so that the trained face correction model has the face correction function while the corrected face images it produces retain face attributes and facial features that are closer to those of the face image before correction.
  • in some embodiments, step 704 shown in FIG. 7 can also be implemented by the following steps: obtaining the value of the first loss function, the value of the second loss function and the value of the third loss function; determining the value of the target loss function based on the value of the first loss function, the value of the second loss function and the value of the third loss function; when the value of the target loss function reaches the second threshold, the server determines the corresponding second error signal based on the target loss function; starting from the output layer of the discriminant model, the second error signal is back-propagated in the discriminant model and the face correction model, and the model parameters of the discriminant model and the face correction model are updated in the process of propagation.
  • obtaining the value of the third loss function includes: obtaining the distance between the corrected face feature and the standard face feature; and determining the value of the third loss function based on the distance.
  • the server may calculate the distance between the corrected face feature and the standard face feature based on the corrected face feature and the standard face feature, and determine the distance as the value of the third loss function.
  • after obtaining the value of the first loss function, the value of the second loss function, and the value of the third loss function, the server further determines the value of the target loss function.
  • the server first determines the weight of the first loss function, the weight of the second loss function, and the weight of the third loss function, and obtains the value of the target loss function after performing a weighted summation of the values of the three loss functions.
  • after obtaining the value of the target loss function, the server updates the model parameters of the face correction model based on the value of the target loss function.
  • the server may implement the training of the face correction model in the following manner:
  • the server fixes the model parameters of the face attribute recognition model and the model parameters of the face recognition model during the training process of the face correction model.
  • when the value of the target loss function reaches the second threshold, the server determines the corresponding second error signal based on the target loss function, back-propagates the second error signal in the face correction model and the discriminant model, and updates the model parameters of each layer of the face correction model and the model parameters of each layer of the discriminant model in the process of propagation.
  • the server inputs a set of face images with an appropriate probability distribution into the face correction model to obtain a batch of generated corrected face images, and fixes the model parameters of the face attribute recognition model and the face recognition model. The corrected face images are input into the face attribute recognition model to obtain the corrected face attributes, and the corrected face images and the standard face images are input into the face recognition model to obtain the corrected face features and the standard face features; the value of the target loss function is then obtained by combining the difference between the face attributes of the face image and the corrected face attributes with the distance between the corrected face features and the standard face features.
  • these corrected face images are used as negative examples, and real-world target face images as positive examples, to train the discriminant model. After this round of training, the ability of the discriminant model is improved: it learns to give high scores to real pictures whose face attributes are close to those of the real world and whose facial features are close to those of the face image, and to give low scores to non-real pictures whose face attributes are far from those of real-world face images and whose facial features differ greatly from those of the face image.
  • the server fixes the model parameters of the discriminant model.
  • the server inputs a face image to the face correction model, and then sends the corrected face image generated by it into the discriminant model, and will get a feedback score output by the discriminant model.
  • this feedback score can be used as the loss to train the face correction model. After this round of training, the ability of the face correction model is also improved, and it can generate more realistic images.
  • the server continues to repeat the above process, strengthening the discriminant model and the face correction model in turn. It can be expected that, after multiple rounds of iteration, the abilities of both the discriminant model and the face correction model become stronger, and the resulting face correction model can retain more face attributes and facial features of the input face image while realizing cross-pose face correction.
  • to sum up, the input face image is subjected to face pose correction through the face correction model to obtain a corrected face image of the standard face pose; authenticity prediction is then performed on the corrected face image through the discriminant model to obtain a prediction result characterizing the authenticity of the corrected face image compared to the target face image, and a first loss function is constructed based on the prediction result; face attribute recognition is performed on the corrected face image for the face attributes of at least one dimension through the face attribute recognition model to obtain a recognition result containing the face attributes of the corrected face image, and a second loss function is constructed based on the recognition result; a target loss function is constructed based on the first loss function and the second loss function, and finally the model parameters of the face correction model are updated based on the value of the target loss function.
  • in this way, by combining the loss function of the face attribute recognition model when constructing the target loss function for training the face correction model, the trained face correction model retains the original face attributes of the face image, so that the corrected face image obtained by using the face correction model is closer to the face attributes of the original face image; thus the face correction model trained in the embodiment of the present application realizes cross-pose face correction without losing the information of the face image.
  • FIG. 10 is an optional schematic flowchart of the training method of the face correction model provided by the embodiment of the present application.
  • in some embodiments, the training method of the face correction model provided by the embodiment of the present application is implemented by the terminal and the server collaboratively.
  • Step 801 the terminal receives the uploaded face image and the target face image in response to the uploading operation for the face image and the target face image;
  • the face image is the face image of the target user in any face pose
  • the target face image is the face image of the target user under the standard face pose
  • Step 802 the terminal acquires the face attribute of at least one dimension of the face image in response to the input operation for the face attribute of the face image;
  • Step 803 The terminal sends the face image, the face attributes of at least one dimension of the face image, and the target face image to the server.
  • Step 804 the server constructs a training sample for training the face correction model based on the received face image, the face attributes of at least one dimension of the face image, and the target face image;
  • Step 805 the server performs face posture correction on the input face image through the face correction model to obtain a corrected face image with a standard face posture
  • Step 806 the server performs authenticity prediction on the corrected face image through the discriminant model, obtains a prediction result representing the authenticity of the corrected face image compared to the target face image, and constructs a first loss function based on the prediction result;
  • Step 807 the server performs face attribute recognition on the face attribute of at least one dimension of the corrected face image through the face attribute recognition model, and obtains a recognition result including the face attribute of the corrected face image, and based on the recognition result Build a second loss function;
  • Step 808 the server performs feature extraction on the corrected face image and the standard face image through the face recognition model, respectively, to obtain the corrected face feature corresponding to the corrected face image and the standard face feature corresponding to the standard face image, and based on the Correcting face features and standard face features to construct a third loss function;
  • Step 809 the server constructs a target loss function based on the first loss function, the second loss function and the third loss function;
  • Step 810 the server obtains the value of the first loss function, the value of the second loss function, and the value of the third loss function;
  • Step 811 the server determines the value of the target loss function based on the value of the first loss function, the value of the second loss function and the value of the third loss function;
  • Step 812 when the value of the target loss function reaches the second threshold, the server determines a corresponding second error signal based on the target loss function;
  • Step 813 Starting from the output layer of the discriminant model, the server backpropagates the second error signal in the discriminant model and the face correction model, and updates the model parameters of the discriminant model and the face correction model in the process of propagation.
  • in this way, the terminal receives the training samples and sends them to the server, and the server trains the face correction model according to the training samples, updating the model parameters of the face correction model and the discriminant model by combining the face attribute recognition model and the face recognition model; this realizes the model training of the generative adversarial network, so that the trained face correction model can not only achieve cross-pose face correction but also retain the face attributes and facial features of the original input face image, obtaining a corrected face image of the target user in the standard pose that is closer to the input face image.
  • Step 814 the terminal sends an image correction request carrying the face image to be corrected to the server;
  • the image correction request may be generated after the user performs a trigger operation through the input device of the terminal, with the terminal responding to the trigger operation.
  • the image correction request can also be automatically generated by the terminal based on certain conditions. For example, after the camera connected to the terminal collects the image of the face to be corrected, the image of the face to be corrected is sent to the terminal, and the terminal receives the image of the face to be corrected. Then generate an image correction request.
  • the image correction request may also be sent by other devices and received by the terminal.
  • Step 815 the server obtains the face image to be corrected after parsing the image correction request, and performs face posture correction on the to-be-corrected face image through the face correction model obtained by training, and obtains a corrected face image with a standard face posture;
  • Step 816 the server sends the corrected face image of the standard face posture to the terminal.
  • the terminal after receiving the corrected face image sent by the server, the terminal can present it on its user interface for the user to browse, and can also use the corrected face image for other processing, such as using the corrected face image for human face recognition, etc.
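  • steps 814 to 816 might be sketched on the server side as follows; the request format, the preprocessing, and the model interface are all assumptions rather than the concrete implementation of this disclosure:

```python
import io

import numpy as np
import torch
from PIL import Image

def handle_image_correction_request(request_body: bytes, model) -> bytes:
    """Hedged sketch of steps 814-816: parse an image correction request,
    run the trained face correction model, and return the corrected image.
    Assumes the request body is an encoded image and that `model` maps a
    (1, 3, H, W) tensor in [0, 1] to a corrected image tensor."""
    face = Image.open(io.BytesIO(request_body)).convert("RGB")
    tensor = torch.from_numpy(np.asarray(face).copy()).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():                       # inference only
        corrected = model(tensor.unsqueeze(0)).squeeze(0)
    array = (corrected.permute(1, 2, 0).clamp(0, 1) * 255).byte().numpy()
    buf = io.BytesIO()
    Image.fromarray(array).save(buf, format="PNG")  # standard-pose image
    return buf.getvalue()
```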
  • FIG. 11 is an optional schematic flowchart of the training method of the face correction model provided by the embodiment of the present application.
  • the training method of the face correction model provided by the embodiment of the present application may include the following operations :
  • Step 901 the server obtains training samples consisting of multiple sets of training data; wherein, a set of training data includes a face image of the first user in an arbitrary posture, and a first standard face image of the first user in a standard face posture , a face attribute of at least one dimension corresponding to the face image, and a second standard face image of the second user in a standard face pose.
  • FIG. 12 is an optional schematic diagram of a training sample provided by an embodiment of the present application, wherein the face image of the first user in any posture is denoted as A, the first standard face image of the first user in the standard face posture is denoted as B, and the second standard face image of the second user in the standard face posture is denoted as E.
  • the face image and the first standard face image also have the same face attribute, and the face attribute of at least one dimension corresponding to the face image is denoted as C.
  • a set of training data in the training samples can be represented as (A, B, C, E).
  • the at least one face attribute tag corresponding to the face attribute may be, for example, gender, age, hair length, whether to wear glasses, whether to wear a hat, and the like.
  • Step 902 Input the face image into the face correction model, and perform face posture correction on the face image through the face correction model to obtain a corrected face image with a standard face posture;
  • the face correction model is a generative network in the generative adversarial network, and the face image is corrected by the generative network.
  • the corrected face image is denoted as A' in this embodiment of the present application.
  • Step 903 input the corrected face image into the discriminant model, and predict the authenticity of the corrected face image through the discriminant model, and obtain a prediction result representing the authenticity of the corrected face image compared to the target face image;
  • the discriminant model is the discriminative network in the generative adversarial network.
  • the server inputs the corrected face image A' and the second standard face image E of the second user in the standard face pose into the discriminant model, and the discriminant model performs authenticity prediction on the corrected face image A' to generate the prediction result.
  • Step 904 determining the value of the first loss function based on the prediction result
  • the first loss function is the loss function corresponding to the generative adversarial network, that is, the loss function corresponding to the generative adversarial network composed of the face correction model and the discriminant model.
  • the first loss function L gan can be implemented by using the above formula (1).
  • Step 905 Input the corrected face image and the first standard face image into the face recognition model, and perform feature extraction on the corrected face image and the first standard face image respectively through the face recognition model, to obtain the corrected face features corresponding to the corrected face image and the standard face features corresponding to the first standard face image;
  • the face recognition model is implemented by a feature extraction model, which maps a face image into a fixed-dimensional feature representation, such as 256-dimensional or 512-dimensional, and then uses the distance between two such features to determine whether two face images are of the same person.
  • the server inputs the corrected face image A' and the first standard face image B into the face recognition model, and performs feature extraction on the corrected face image A' and the first standard face image B respectively through the face recognition model.
  • the corrected face feature corresponding to the corrected face image and the standard face feature corresponding to the standard face image are obtained.
  • Step 906 based on the corrected face feature and the standard face feature, calculate the third loss function, and obtain the value of the third loss function;
  • the third loss function is the loss function L_recog corresponding to the face recognition model, which is used, together with the first and second loss functions, to train the generative adversarial network composed of the face correction model and the discriminant model.
  • the third loss function L_recog can be implemented by using the above formula (5). If the corrected face features and the standard face features are represented as FR(G(A)) and FR(B) respectively, the third loss function L_recog represents the distance between FR(G(A)) and FR(B).
  • Step 907 Input the corrected face image and the face attribute of at least one dimension corresponding to the face image into the face attribute recognition model, and perform the correction of the face image for the face attribute of at least one dimension through the face attribute recognition model. Face attribute recognition, to obtain a recognition result including the face attribute of the corrected face image;
  • the face attribute recognition model is realized by a multi-task, multi-classification model, which can identify face attributes of at least one dimension of a face image; the corrected face image should keep the same face attributes as the original input face image.
  • the server performs face attribute recognition on the input corrected face image A', based on the face attributes C of at least one dimension corresponding to the face image, through the face attribute recognition model, and obtains a recognition result containing the face attributes of the corrected face image.
  • the recognition result is the corrected face attribute of at least one dimension corresponding to the face attribute of at least one dimension of the face image.
  • Step 908 determining the value of the second loss function based on the difference between the face attributes possessed by the face image and the recognition result;
  • the second loss function is the loss function L attr corresponding to the face attribute recognition model, which can be implemented by the above formula (3).
  • the value of the second loss function can be calculated based on the recognition result and the face attribute C.
  • Step 909 based on the first loss function, the second loss function and the third loss function, construct a target loss function, and determine the value of the target loss function;
  • the server assigns weights to the first loss function, the second loss function, and the third loss function respectively, and then performs a weighted summation of the three loss functions based on their respective weights to obtain the target loss function; the value of the target loss function is then calculated based on the weight and the value of each loss function.
  • Step 910 based on the value of the target loss function, update the model parameters of the face correction model and the model parameters of the discriminant model.
  • in actual implementation, the server fixes the model parameters of the face attribute recognition model and the face recognition model, and then uses the target loss function to update the model parameters of the face correction model and the model parameters of the discriminant model; the model parameters are updated iteratively in this way until the generative adversarial network converges and the training is completed.
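  • steps 902 to 910 might be sketched as a single training step as follows, assuming PyTorch-style models; the individual loss terms are stand-ins for formulas (1), (3), (5) and (6), and all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def training_step(G, D, FR, AttrNet, optimizer, batch, weights):
    """Hedged sketch of one pass over steps 902-910 for one set of training
    data (A, B, C, E). G, D, FR and AttrNet stand for the face correction
    model, the discriminant model, the (frozen) face recognition model and
    the (frozen) face attribute recognition model."""
    A, B, C, E = batch            # E would serve as the real example when
                                  # the discriminant model itself is trained
    corrected = G(A)                                   # step 902: A'
    prediction = D(corrected)                          # step 903
    l_gan = -prediction.clamp_min(1e-7).log().mean()   # step 904 (stand-in for formula (1))
    l_recog = torch.norm(FR(corrected) - FR(B),        # steps 905-906 (formula (5))
                         p=2, dim=-1).mean()
    recognized = AttrNet(corrected)                    # step 907
    l_attr = F.binary_cross_entropy(recognized, C)     # step 908 (formula (3))
    loss = (weights[0] * l_gan + weights[1] * l_attr   # step 909 (formula (6))
            + weights[2] * l_recog)
    optimizer.zero_grad()
    loss.backward()                                    # step 910: update parameters
    optimizer.step()
    return loss.item()
```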
  • continuing with the description of the training apparatus for the face correction model provided by the embodiments of the present application implemented as software modules, the software modules of the training apparatus 555 of the face correction model stored in the memory 550 may include:
  • the face posture correction module 5551 is configured to perform face posture correction on the input face image through the face correction model, so as to obtain the corrected face image of the standard face posture; wherein, the face image has at least one dimension of human face attribute;
  • the prediction module 5552 is configured to predict the authenticity of the corrected face image through a discriminant model, obtain a prediction result that characterizes the authenticity of the corrected face image compared to the target face image, and constructs based on the prediction result The first loss function;
  • the attribute identification module 5553 is configured to perform face attribute identification on the corrected face image with respect to the face attributes of the at least one dimension through a face attribute identification model, and obtain a face that includes the corrected face image. the identification result of the attribute, and construct a second loss function based on the identification result;
  • the parameter updating module 5554 is configured to update the model parameters of the face correction model based on the first loss function and the second loss function.
  • in some embodiments, the face pose correction module 5551 is further configured to input a face image in any pose into the face correction model; encode the face image through the face correction model to obtain an initial image encoding; correct the initial image encoding based on the deviation between the face pose in the face image and the standard face pose, to obtain a target image encoding; and decode the target image encoding to obtain the corrected face image of the standard face pose.
  • in some embodiments, the prediction module 5552 is further configured to input the corrected face image and the target face image into the discriminant model; perform feature extraction on the corrected face image and the target face image through the discriminant model, to obtain the corrected face features corresponding to the corrected face image and the target face features corresponding to the target face image; and predict, based on the corrected face features and the target face features, a prediction result characterizing the authenticity of the corrected face image compared to the target face image.
  • in some embodiments, the attribute recognition module 5553 is further configured to input the corrected face image and the face attribute labels corresponding to the face attributes of the at least one dimension into the face attribute recognition model; perform feature extraction on the corrected face image and the face attribute label of each dimension respectively through the face attribute recognition model, to obtain the corrected face features corresponding to the corrected face image and the face attribute features corresponding to the face attribute labels of each dimension; and predict, based on the obtained corrected face features and face attribute features, a recognition result containing the face attributes of the corrected face image.
  • the parameter update module 5554 is further configured to respectively determine the weight of the first loss function and the weight of the second loss function; based on the weight of the first loss function and the weights of the second loss function, weighted summation of the first loss function and the second loss function to obtain a target loss function; based on the target loss function, model parameters of the face correction model to update.
  • the parameter updating module 5554 is further configured to determine the value of the first loss function based on the prediction result; based on the difference between the face attribute of the face image and the recognition result , determine the value of the second loss function; based on the value of the first loss function and the value of the second loss function, determine the value of the target loss function; based on the value of the target loss function, The model parameters of the face correction model are updated.
  • the parameter update module 5555 is further configured to determine a corresponding first error signal based on the target loss function when the value of the target loss function reaches a first threshold; Starting from the output layer, the first error signal is back-propagated in the discriminant model and the face correction model, and the model parameters of the discriminant model and the face correction model are updated in the process of propagation.
  • in some embodiments, the software modules stored in the training apparatus 555 of the face correction model in the memory 550 may further include: a training sample construction module, configured to obtain a face image of the target user in any pose, the target face image of the target user in the standard face pose, and the face attributes of at least one dimension of the face image, and to construct, based on the acquired face image, the target face image and the face attributes of the face image, the training samples for training the face correction model.
  • in some embodiments, the software modules stored in the training apparatus 555 of the face correction model in the memory 550 may further include: a face recognition module, configured to perform feature extraction on the corrected face image and the standard face image through the face recognition model, to obtain the corrected face features corresponding to the corrected face image and the standard face features corresponding to the standard face image, and to construct a third loss function based on the corrected face features and the standard face features; correspondingly, the parameter update module 5554 is further configured to update the model parameters of the face correction model based on the first loss function, the second loss function and the third loss function.
  • the parameter update module 5554 is further configured to obtain the value of the first loss function, the value of the second loss function and the value of the third loss function; based on the first loss The value of the function, the value of the second loss function and the value of the third loss function determine the value of the target loss function; based on the value of the target loss function, the model parameters of the face correction model are to update.
  • the parameter update module 5554 is further configured to determine a corresponding second error signal based on the target loss function when the value of the target loss function reaches a second threshold; Starting from the output layer, the second error signal is back-propagated in the discriminant model and the face correction model, and the model parameters of the discriminant model and the face correction model are updated in the process of propagation.
  • the parameter updating module 5554 is further configured to obtain the distance between the corrected face feature and the standard face feature; and determine the value of the third loss function based on the distance.
  • FIG. 13 is an optional schematic flowchart of the face correction method provided by the embodiment of the present application, which will be described with reference to the steps shown in FIG. 13 .
  • Step 1001 the server obtains the face image to be corrected
  • Step 1002 input the face image to be corrected into the face correction model
  • the face image to be corrected can be uploaded by the user to the server, sent to the server by other devices connected to the server, or captured in real time by a device connected to the server, for example taken by a camera connected to the server.
  • in some embodiments, after obtaining the face image to be corrected, the server further preprocesses it, for example performing image cropping, denoising, image enhancement and other processing on the face image to be corrected. The server then inputs the preprocessed face image to be corrected into the face correction model, so that the face correction model corrects the face pose of the face image to be corrected.
  • Step 1003 performing face posture correction on the face image to be corrected by the face correction model, to obtain a target corrected face image of a standard face posture; wherein, the face correction model is based on the training of the face correction model provided in the embodiment of the present application method is trained.
  • the server uses the face correction model to correct the face pose of the input face image to be corrected, and obtains the target corrected face image of the standard face pose. Since the face correction model is trained based on the training method of the face correction model provided in the embodiment of the present application, in which the training of the generative adversarial network composed of the face correction model and the discriminant model is guided by the face attribute recognition model, the trained face correction model learns the data distribution of face attributes, so that the target corrected face image output by the model still retains the face attributes of the input face image to be corrected after the face pose transformation is realized.
  • the face correction model includes an encoding layer, a correction layer, and a decoding layer.
  • step 1003 shown in FIG. 13 can also be implemented in the following manner.
  • the server encodes the face image to be corrected through the encoding layer to obtain the initial encoding; corrects the initial encoding through the correction layer, based on the deviation between the face pose in the face image to be corrected and the standard face pose, to obtain the target encoding; and decodes the target encoding through the decoding layer to obtain the target corrected face image of the standard face pose. The parameters of the encoding layer, the parameters of the correction layer and the parameters of the decoding layer are obtained by updating the parameters based on the first loss function constructed from the prediction result of the discriminant model and the second loss function constructed from the face attribute recognition result of the face attribute recognition model; wherein the prediction result is obtained by the discriminant model performing authenticity prediction on the corrected face image output by the face correction model, and the face attribute recognition result is obtained by the face attribute recognition model performing face attribute recognition on the corrected face image output by the face correction model.
  • the coding layer of the face correction model is used to perform numerical coding on the input face image to be corrected, so as to obtain a data form that can be statistically calculated by the machine.
  • the server uses the coding layer of the face correction model to encode the face image to be corrected in the form of a vector matrix.
  • for example, the server encodes the face image based on the RGB (Red, Green, Blue) values of each pixel of the face image to be corrected, and then further performs feature extraction on the face elements to obtain an initial encoding that contains only face elements.
  • since the parameters of the encoding layer are obtained by training based on the training method of the face correction model provided by the embodiment of the present application, the face attributes of the face image to be corrected are retained during feature extraction.
  • the face pose of the image represented by the initial encoding obtained by the server through the encoding layer is still the initial face pose in the face image to be corrected; in order to convert it into the standard face pose, the encoding needs to be further processed.
  • the server uses the correction layer to correct the initial encoding.
  • the server uses the correction layer to determine the deviation between the face pose of the image represented by the initial encoding and the standard face pose, and based on the deviation, the initial encoding is modified to obtain the target encoding.
  • the face pose of the image represented by the target code is the standard face pose.
  • the server uses the decoding layer to convert the target code from a numerical feature vector to an image, and obtains the target corrected face image.
  • the face pose in the target corrected face image is the standard face pose; at this point, the face correction of the face image to be corrected is completed.
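  • the encode-correct-decode pipeline described above might be sketched as follows; the use of a single linear correction layer and the encoding dimension are assumptions, not the concrete architecture of this disclosure:

```python
import torch
import torch.nn as nn

class FaceCorrectionModel(nn.Module):
    """Hedged sketch of the encoding layer, correction layer, and
    decoding layer pipeline described above."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module, dim: int = 256):
        super().__init__()
        self.encoder = encoder                    # encoding layer
        self.correct = nn.Linear(dim, dim)        # correction layer: shifts the
                                                  # encoding toward the standard pose
        self.decoder = decoder                    # decoding layer

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        initial_code = self.encoder(face)         # initial encoding of face elements
        target_code = self.correct(initial_code)  # corrected (standard-pose) encoding
        return self.decoder(target_code)          # target corrected face image
```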
  • in some embodiments, the parameters of the encoding layer, the parameters of the correction layer, and the parameters of the decoding layer are obtained by updating the parameters based on the first loss function constructed from the prediction result of the discriminant model, the second loss function constructed from the face attribute recognition result of the face attribute recognition model, and the third loss function constructed from the corrected face features and the standard face features extracted by the face recognition model; wherein the prediction result is obtained by the discriminant model performing authenticity prediction on the corrected face image output by the face correction model; the face attribute recognition result is obtained by the face attribute recognition model performing face attribute recognition on the corrected face image output by the face correction model; the corrected face features are obtained by the face recognition model performing feature extraction on the corrected face image output by the face correction model; and the standard face features are obtained by the face recognition model performing feature extraction on the standard face image.
  • in this way, when the server uses the face correction model to perform face correction on the face image to be corrected and obtains the target corrected face image, the pose conversion also retains the face attributes and the semantic-dimension information of the face image to be corrected; for the specific processing process, reference is made to the above embodiments, and details are not repeated here.
  • FIG. 14 is an optional schematic diagram of the structure of the face correction device provided by the embodiment of the present application; as shown in FIG. 14, the face correction device 14 provided by the embodiment of the present application includes:
  • an obtaining module 1401, configured to obtain the face image to be corrected;
  • an input module 1402, configured to input the face image to be corrected into a face correction model;
  • a correction module 1403, configured to perform face pose correction on the face image to be corrected through the face correction model, to obtain the target corrected face image of the standard face pose; wherein the face correction model is obtained by training based on the training method of the face correction model provided in the embodiment of the present application.
  • in some embodiments, the above-mentioned correction module 1403 is further configured to encode the face image to be corrected through the encoding layer to obtain an initial encoding; correct the initial encoding through the correction layer, based on the deviation between the face pose in the face image to be corrected and the standard face pose, to obtain the target encoding; and decode the target encoding through the decoding layer to obtain the target corrected face image of the standard face pose; wherein the parameters of the encoding layer, the parameters of the correction layer and the parameters of the decoding layer are obtained by updating the parameters based on the first loss function constructed from the prediction result of the discriminant model and the second loss function constructed from the face attribute recognition result of the face attribute recognition model; wherein the prediction result is obtained by the discriminant model performing authenticity prediction on the corrected face image output by the face correction model, and the face attribute recognition result is obtained by the face attribute recognition model performing face attribute recognition on the corrected face image output by the face correction model.
  • Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the training method of the face correction model provided by the embodiment of the present application or the face correction method provided by the embodiment of the present application.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions; when the executable instructions are executed by a processor, the processor is caused to perform the training method of the face correction model provided by the embodiments of the present application or the face correction method provided by the embodiments of the present application, for example, the training method of the face correction model shown in FIG. 7, or the face correction method shown in FIG. 13.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be various devices including one of the foregoing memories or any combination thereof.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • as an example, executable instructions may, but need not, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subroutines, or code sections).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or alternatively, on multiple computing devices distributed across multiple sites and interconnected by a communication network.


Abstract

A training method and apparatus for a face correction model, an electronic device, and a computer-readable storage medium. The method includes: a server performs face pose correction on an input face image through a face correction model to obtain a corrected face image of a standard face pose (701); performs authenticity prediction on the corrected face image through a discriminant model to obtain a prediction result characterizing the authenticity of the corrected face image compared to a target face image, and constructs a first loss function based on the prediction result (702); performs, through a face attribute recognition model, face attribute recognition on the corrected face image for the face attributes of at least one dimension, obtains a recognition result containing the face attributes of the corrected face image, and constructs a second loss function based on the recognition result (703); and updates the model parameters of the face correction model based on the first loss function and the second loss function (704).

Description

Training method and apparatus for face correction model, electronic device, and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is filed based on Chinese patent application No. 202010946586.6, filed on September 10, 2020, and claims priority to that Chinese patent application, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the technical field of machine learning, and in particular to a training method and apparatus for a face correction model, an electronic device, and a storage medium.
BACKGROUND
Image correction is an important direction in the fields of computer science and artificial intelligence. It can correct image distortion caused by warping or limited bandwidth, geometric distortion caused by the shooting pose of the imaging device and scanning nonlinearity, and image distortion caused by motion blur, radiometric distortion, introduced noise, and the like. Face correction technology is a branch of the image correction field and plays an increasingly important role in many application scenarios of daily life.
In the related art, in order to correct the pose of face images in different poses and obtain an image of a face pose from which facial features can be recognized relatively well, the commonly adopted approach is to learn cross-pose face images through a machine model: the input of the model is face images in various poses to be corrected, and the output of the model is the corrected image of the desired face pose; by learning sample data containing images of each face pose and images of the desired face pose, the model is enabled to realize cross-pose face correction. However, this approach may produce image-mapping ambiguity, causing the model to learn changes other than the pose change, so that the generated picture loses the information of the original face image and finally generates a completely different face.
SUMMARY
Embodiments of the present application provide a training method and apparatus for a face correction model, an electronic device, and a computer-readable storage medium, which enable the trained face correction model to realize cross-pose correction of faces without losing the information of the face image.
An embodiment of the present application provides a training method for a face correction model, the method being executed by an electronic device and including:
performing face pose correction on an input face image through a face correction model to obtain a corrected face image of a standard face pose, wherein the face image has face attributes of at least one dimension;
performing authenticity prediction on the corrected face image through a discriminant model to obtain a prediction result characterizing the authenticity of the corrected face image compared to a target face image, and constructing a first loss function based on the prediction result;
performing, through a face attribute recognition model, face attribute recognition on the corrected face image for the face attributes of the at least one dimension, obtaining a recognition result containing the face attributes of the corrected face image, and constructing a second loss function based on the recognition result;
updating the model parameters of the face correction model based on the first loss function and the second loss function.
In the above solution, updating the model parameters of the face correction model based on the first loss function and the second loss function includes: respectively determining the weight of the first loss function and the weight of the second loss function; performing a weighted summation of the first loss function and the second loss function based on their weights to obtain a target loss function; and updating the model parameters of the face correction model based on the target loss function.
In the above solution, updating the model parameters of the face correction model based on the target loss function includes: determining the value of the first loss function based on the prediction result; determining the value of the second loss function based on the difference between the face attributes of the face image and the recognition result; determining the value of the target loss function based on the value of the first loss function and the value of the second loss function; and updating the model parameters of the face correction model based on the value of the target loss function.
In the above solution, updating the model parameters of the face correction model based on the value of the target loss function includes: when the value of the target loss function reaches a first threshold, determining a corresponding first error signal based on the target loss function; and starting from the output layer of the discriminant model, back-propagating the first error signal in the discriminant model and the face correction model, and updating the model parameters of the discriminant model and the face correction model in the process of propagation.
An embodiment of the present application provides a training apparatus for a face correction model, including:
a face pose correction module, configured to perform face pose correction on an input face image through a face correction model to obtain a corrected face image of a standard face pose, wherein the face image has face attributes of at least one dimension;
a prediction module, configured to perform authenticity prediction on the corrected face image through a discriminant model to obtain a prediction result characterizing the authenticity of the corrected face image compared to a target face image, and to construct a first loss function based on the prediction result;
an attribute recognition module, configured to perform, through a face attribute recognition model, face attribute recognition on the corrected face image for the face attributes of the at least one dimension, to obtain a recognition result containing the face attributes of the corrected face image, and to construct a second loss function based on the recognition result;
a parameter update module, configured to update the model parameters of the face correction model based on the first loss function and the second loss function.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
a processor for implementing the training method of the face correction model provided by the embodiments of the present application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the training method of the face correction model provided by the embodiments of the present application.
An embodiment of the present application further provides a face correction method, the method being executed by an electronic device and including:
obtaining a face image to be corrected;
inputting the face image to be corrected into a face correction model;
performing face pose correction on the face image to be corrected through the face correction model to obtain a target corrected face image of a standard face pose;
wherein the face correction model is trained based on the training method of the face correction model provided by the embodiments of the present application.
In the above solution, the face correction model includes an encoding layer, a correction layer, and a decoding layer; performing face pose correction on the face image to be corrected through the face correction model to obtain the target corrected face image of the standard face pose includes: encoding the face image to be corrected through the encoding layer to obtain an initial encoding; correcting the initial encoding through the correction layer, based on the deviation between the face pose in the face image to be corrected and the standard face pose, to obtain a target encoding; and decoding the target encoding through the decoding layer to obtain the target corrected face image of the standard face pose; wherein the parameters of the encoding layer, the parameters of the correction layer, and the parameters of the decoding layer are obtained by updating the parameters based on a first loss function constructed from the prediction result of a discriminant model and a second loss function constructed from the face attribute recognition result of a face attribute recognition model; wherein the prediction result is obtained by the discriminant model performing authenticity prediction on the corrected face image output by the face correction model, and the face attribute recognition result is obtained by the face attribute recognition model performing face attribute recognition on the corrected face image output by the face correction model.
An embodiment of the present application further provides a face correction apparatus, including:
an obtaining module, configured to obtain a face image to be corrected;
an input module, configured to input the face image to be corrected into a face correction model;
a correction module, configured to perform face pose correction on the face image to be corrected through the face correction model, to obtain a target corrected face image of a standard face pose;
wherein the face correction model is trained based on the training method of the face correction model provided by the embodiments of the present application.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
a processor for implementing the face correction method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the face correction method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects:
Compared with the training approach in the related art, which uses face images containing each face pose and real images of the desired face pose as sample data for model training, the training method and apparatus for a face correction model, the electronic device, and the computer-readable storage medium provided by the embodiments of the present application introduce a face attribute recognition model as training guidance on top of the basic training architecture of the generative adversarial network composed of the face correction model and the discriminant model, to realize the training of the face correction model. This enables the face correction model to learn cross-pose face correction during training while also learning the face attributes of the face image, thereby overcoming the defect that the model training approach in the related art easily loses face image information, so that the trained face correction model has the cross-pose face correction function while the corrected face image does not lose the information of the original input face image.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the principle of a generative adversarial network model provided by the related art;
FIG. 2 is an optional schematic structural diagram of a generative adversarial network model provided by the related art;
FIG. 3 is an optional schematic diagram of a training system for a face correction model provided by an embodiment of the present application;
FIG. 4 is an optional schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 5 is an optional schematic structural diagram of a face correction model provided by an embodiment of the present application;
FIG. 6 is an optional schematic diagram of a model architecture for model training provided by an embodiment of the present application;
FIG. 7 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application;
FIG. 8 is an optional schematic diagram of a training sample provided by an embodiment of the present application;
FIG. 9 is an optional schematic diagram of a model architecture for model training provided by an embodiment of the present application;
FIG. 10 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application;
FIG. 11 is an optional schematic flowchart of a training method for a face correction model provided by an embodiment of the present application;
FIG. 12 is an optional schematic diagram of a training sample provided by an embodiment of the present application;
FIG. 13 is an optional schematic flowchart of a face correction method provided by an embodiment of the present application;
FIG. 14 is an optional schematic diagram of the structure of a face correction apparatus provided by an embodiment of the present application.
DETAILED DESCRIPTION
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are merely used to distinguish similar objects and do not represent a specific ordering of objects; it is to be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which the present application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments of the present application are explained; the nouns and terms involved in the embodiments of the present application are applicable to the following explanations.
1) Latent space: the sample space in which the noise z is located; it is a vector space.
2) Cross-entropy: used to measure the similarity between two distributions. For example, in logistic regression, the true distribution of the data set is p, and the distribution corresponding to the results predicted by the logistic regression model is q; here, the cross-entropy measures the degree of difference between the prediction result q and the true result p, and is called the cross-entropy loss function.
3) Generative Adversarial Network (GAN): a deep learning model that produces good output through the mutual game learning of a generative model and a discriminative model.
4) Convergence: approaching a certain value. The convergence of a model refers to the convergence of the loss function of the model.
In order to realize cross-pose face correction, that is, to correct face images of various face poses into face images of a standard face pose, the related art provides a generative adversarial network model. FIG. 1 is a schematic diagram of the principle of the generative adversarial network (GAN) model provided by the related art. Referring to FIG. 1, the GAN model includes a generative model G and a discriminant model D. Taking processing pictures through the GAN model as an example, the generative model G is a generative network that receives a random noise z from the latent space and generates an image G(z) from this noise. The discriminant model D is a discriminative network that determines whether a picture is "real"; for example, its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture.
FIG. 2 is an optional schematic diagram of the structure of the GAN model provided in the related art. In the training process, the goal of the generative model G is to generate pictures as realistic as possible to input into the discriminant model D to deceive it, while the goal of the discriminant model D is to distinguish the pictures generated by the generative model G from real-world pictures as far as possible. In this way, the generative model G and the discriminant model D constitute a dynamic "game process". Through the continuous game between the generative model G and the discriminant model D, the generative model G learns the distribution of the data; if applied to picture generation, after training is completed, the generative model G can generate realistic images from a string of random numbers.
In the process of implementing the embodiments of the present application, the applicant uses a generative adversarial network composed of a face correction model and a discriminant model to perform face correction learning. The face correction model, as the generative model, is specifically used to perform cross-pose face image correction: the input of the face correction model is face images in various poses to be corrected, and the output of the face correction model is the corrected face image of the standard face pose, for example, a frontal face image. The corrected face image is then input into the discriminant model together with another frontal face image, where the other frontal face image may correspond to the same person as the face image input into the face correction model, or to a different person. The discriminant model is used to judge which image is real and which image is generated. Learning is finally performed through the adversarial process between the face correction model and the discriminant model.
In the process of implementing the embodiments of the present application, the applicant found that this approach may produce image-mapping ambiguity, causing the generative model to learn changes other than the pose change, so that the generated picture loses the identity information of the original face image and finally generates a completely different face; the face correction model therefore needs further optimization.
Based on this, the embodiments of the present application provide a training method and apparatus for a face correction model, an electronic device, and a computer-readable storage medium, which can obtain a face correction model that realizes cross-pose face correction without losing the information of the face image.
First, the training system for the face correction model provided by the embodiments of the present application is described. FIG. 3 is an optional schematic diagram of the training system 100 for the face correction model provided by an embodiment of the present application. A terminal 400 is connected to a server 200 through a network 300; the network 300 may be a wide area network or a local area network, or a combination of the two, and uses a wireless link to realize data transmission. In some embodiments, the terminal 400 may be, but is not limited to, a laptop computer, a tablet computer, a desktop computer, a smartphone, a dedicated messaging device, a portable gaming device, a smart speaker, a smart watch, and the like. The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal 400 and the server 200 may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present application.
The terminal 400 is configured to send face images for training the face correction model to the server 200.
The server 200 is configured to perform face pose correction on the input face image through the face correction model to obtain a corrected face image of the standard face pose, wherein the face image has face attributes of at least one dimension; perform authenticity prediction on the corrected face image through the discriminant model to obtain a prediction result characterizing the authenticity of the corrected face image compared to the target face image, and construct a first loss function based on the prediction result; perform, through the face attribute recognition model, face attribute recognition on the corrected face image for the face attributes of at least one dimension, obtain a recognition result containing the face attributes of the corrected face image, and construct a second loss function based on the recognition result; construct a target loss function based on the first loss function and the second loss function and obtain the value of the target loss function; and update the model parameters of the face correction model based on the value of the target loss function. In this way, the training of the face correction model is realized.
The terminal 400 is further configured to send an image correction request carrying a face image to be corrected to the server 200, so that the server 200 parses the image correction request to obtain the face image to be corrected, performs face pose correction on the face image to be corrected through the trained face correction model, obtains a corrected face image of the standard face pose, and returns it to the terminal 400.
接下来对本申请实施例提供的用于实施上述训练方法的电子设备进行说明,参见图4,图4是本申请实施例提供的电子设备500的结构的一个可选的示意图,在实际应用中,电子设备500可以实施为图3中的终端400或服务器200,以电子设备为图3所示的服务器200为例,对实施本申请实施例的人脸矫正模型的训练方法的电子设备进行说明。图4所示的电子设备500包括:至少一个处理器510、存储器550、至少一个网络接口520和用户接口530。电子设备500中的各个组件通过总线系统540耦合在一起。可理解,总线系统540用于实现这些组件之间的连接通信。总线系统540除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图4中将各种总线都标为总线系统540。
处理器510可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、 数字信号处理器(Digital Signal Processor,DSP),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
用户接口530包括使得能够呈现媒体内容的一个或多个输出装置531,包括一个或多个扬声器和/或一个或多个视觉显示屏。用户接口530还包括一个或多个输入装置532,包括有助于用户输入的用户接口部件,比如键盘、鼠标、麦克风、触屏显示屏、摄像头、其他输入按钮和控件。
存储器550可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存储器,硬盘驱动器,光盘驱动器等。存储器550可选地包括在物理位置上远离处理器510的一个或多个存储设备。
存储器550包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(Read Only Memory,ROM),易失性存储器可以是随机存取存储器(Random Access Memory,RAM)。本申请实施例描述的存储器550旨在包括任意适合类型的存储器。
在一些实施例中,存储器550能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。
操作系统551,包括用于处理各种基本系统服务和执行硬件相关任务的系统程序,例如框架层、核心库层、驱动层等,用于实现各种基础业务以及处理基于硬件的任务;
网络通信模块552,用于经由一个或多个(有线或无线)网络接口520到达其他计算设备,示例性的网络接口520包括:蓝牙、无线相容性认证(WiFi)、和通用串行总线(Universal Serial Bus,USB)等;
呈现模块553,用于经由一个或多个与用户接口530相关联的输出装置531(例如,显示屏、扬声器等)使得能够呈现信息(例如,用于操作外围设备和显示内容和信息的用户接口);
输入处理模块554,用于对一个或多个来自一个或多个输入装置532之一的一个或多个用户输入或互动进行检测以及翻译所检测的输入或互动。
在一些实施例中,本申请实施例提供的人脸矫正模型的训练装置可以采用软件方式实现,图4示出了存储在存储器550中的人脸矫正模型的训练装置555,其可以是程序和插件等形式的软件,包括以下软件模块:人脸姿态矫正模块5551、预测模块5552、属性识别模块5553和参数更新模块5554,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或进一步拆分。将在下文中说明各个模块的功能。
在另一些实施例中,本申请实施例提供的人脸矫正模型的训练装置可以采用硬件方式实现,作为示例,本申请实施例提供的人脸矫正模型的训练装置可以是采用硬件译码处理器形式的处理器,其被编程以执行本申请实施例提供的人脸矫正模型的训练方法,例如,硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(Application Specific Integrated Circuit,ASIC)、DSP、可编程逻辑器件(Programmable Logic Device,PLD)、复杂可编程逻辑器件(Complex Programmable Logic Device,CPLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或其他电子元件。
在对本申请实施例提供的人脸矫正模型的训练方法说明之前,先对本申请实施例提供的人脸矫正模型的结构进行说明,图5是本申请实施例提供的人脸矫正模型的一个可选的结构示意图。本申请实施例提供的人脸矫正模型包括一个编码器和一个解码 器。服务器在实施本申请实施例提供的人脸矫正模型的训练方法之前,还构造一个由编码器和一个解码器组成的人脸矫正模型。其中,编码器用于将输入的图像进行编码,输出图像的图像编码,图像编码可以为多元的一维向量表示,还可以是多元的多维向量表示,例如可以将一张图像编码为256元的一维向量,或者256元的256维向量。解码器则用于将输入的噪声进行解码后生成图像并输出,通常噪声就是一个一维的向量,经过reshape函数计算得到一个二维图像,然后利用若干个反卷积层来学习上采样。在实际实施时,可以将随机的噪声和选取的样本向量同时输入解码器中,共同约束解码器生成图像。本申请实施例的人脸矫正模型中,则是将随机的噪声和由编码器生成的图像编码输入解码器中,以使解码器解码生成对应的人脸图像。本申请实施例提供的人脸矫正模型,在利用编码器将输入的任意姿态的人脸图像进行编码后,还对编码得到的图像编码进行修正,以改变图像编码映射得到的人脸图像的人脸姿态,本申请实施例中则将修正得到的目标图像编码映射为标准姿态的人脸图像,将修正后的目标图像编码输入解码器中,则可以生成矫正人脸图像。
在一些实施例中,编码器可以采用AlexNet网络的前5层,外加一个全连接层构建,全连接层为前后层的神经元全连接,用于特征映射以及降维,并将AlexNet网络的激活函数由线性整流函数(Rectified Linear Unit,ReLU)改为ELU激活函数。本申请实施例涉及的生成对抗网络可以采用深度卷积生成对抗网络(Deep Convolutional Generative Adversarial Network,DCGAN)。
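按照上述描述,编码器的一种示意性实现如下(基于PyTorch;各层通道数沿用AlexNet的常见配置,编码维度取256,均为示意取值;为简洁起见省略了下文提到的LRN处理,并非本申请的具体实现):

```python
import torch.nn as nn

class Encoder(nn.Module):
    """AlexNet 前 5 个卷积层外加一个全连接层,激活函数由 ReLU 改为 ELU。"""
    def __init__(self, code_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ELU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 192, 5, padding=2), nn.ELU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(192, 384, 3, padding=1), nn.ELU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ELU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ELU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.fc = nn.LazyLinear(code_dim)   # 全连接层:特征映射与降维

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))  # 输出图像编码向量
```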
接下来,对本申请实施例提供的训练过程的模型架构进行说明,图6是本申请实施例提供的训练过程的模型架构的一个可选的示意图,参见图6,本申请实施例提供的训练过程的模型架构包括:
人脸矫正模型61,用于对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;其中,人脸图像具有至少一个维度的人脸属性;
判别模型62,用于对矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果;
人脸属性识别模型63,用于对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果。
基于上述模型架构,利用模型架构中各模型的输出,对人脸矫正模型61的模型参数和判别模型62的模型参数进行更新,实现人脸矫正模型61和判别模型62的对抗训练,进而可利用训练得到的人脸矫正模型61,实现在保留人脸属性下的人脸矫正。
基于上述对本申请实施例的人脸矫正模型的训练系统、电子设备及人脸矫正模型的结构的说明,接下来对本申请实施例提供的人脸矫正模型的训练方法进行说明。在一些实施例中,本申请实施例提供的人脸矫正模型的训练方法可以由终端单独实施,或由服务器单独实施,或由服务器及终端协同实施。
下面以服务器实施为例,结合本申请实施例提供的服务器的示例性应用和实施,说明本申请实施例提供的人脸矫正模型的训练方法。参见图7,图7是本申请实施例提供的人脸矫正模型的训练方法的一个可选的流程示意图,将结合图7示出的步骤进行说明。
在一些实施例中,在进行人脸矫正模型的训练之前,需要获取模型的训练样本。参照图8,图8是本申请实施例提供的训练样本的一个可选的示意图。训练样本包括输入人脸矫正模型的人脸图像、该人脸图像的人脸属性(图未示出)以及目标人脸图像。为了便于说明,本申请实施例将人脸图像记为A,将目标人脸图像记为B,将人脸图像所具有的人脸属性记为C,一组训练数据则可以记为(A,B,C)。其中,人脸图像A和目标人脸图像B均为真实世界的人脸图像。人脸图像A的人脸姿态可以是任意姿态,例如侧脸姿态。目标人脸图像B可以是标准人脸姿态的人脸图像,它可以与人脸图像对应为同一个人,还可以对应为两个不同的人。本申请实施例对人脸属性定义有至少一个维度,例如可以将人脸属性定义为包含以下人脸属性标签中的至少之一:性别、年龄、表情、头发长度、是否有佩戴物等。其中,头发长度还可以进一步划分为是否为长发、是否为短发、是否光头等多个维度。是否有佩戴物可以进一步划分为是否戴眼镜、是否戴帽子、是否戴耳饰等多个维度。例如图8示出的人脸图像A,其对应的人脸属性则可以包括:男、22岁、无表情、短发、戴眼镜。对于人脸属性的具体定义本申请实施例不作具体限定。
还需要说明的是,训练样本由多组人脸图像、人脸图像的人脸属性以及目标人脸图像构成。训练样本可以为已经构造好的通用样本,服务器通过访问目标设备,从目标设备中进行获取,还可以是用户基于客户端进行上传,服务器接收客户端发送的由用户上传的训练样本。其中,目标设备可以是服务器本身,训练样本预存于服务器本地,服务器通过访问训练样本的存储地址来获取。目标设备还可以是与服务器通信连接的外部设备,例如可以是终端,还可以是数据库服务器等,服务器通过通信连接访问目标设备,从目标设备内获取训练样本。
在一些实施例中,训练样本还可以由服务器构建得到,基于图7,在步骤701之前,还可以执行:
服务器获取目标用户在任意姿态下的人脸图像、目标用户在标准人脸姿态下的目标人脸图像,以及人脸图像所具有的至少一个维度的人脸属性;
在实际实施时,服务器可以从网页中采集同一目标用户在任意姿态下的人脸图像、与该人脸图像具有相同人脸属性的标准人脸姿态下的目标人脸图像、以及人脸图像所具有的至少一个维度的人脸属性。在一些实施例中,服务器还可以利用与其通信连接的摄像头,对目标用户进行拍摄,获得目标用户在任意姿态下的人脸图像,以及目标用户在标准姿态下的目标人脸图像。其中,关于人脸属性的采集,可以由模型训练人员基于人脸图像进行人为识别后输入至服务器中。在实际实施时,服务器从网页中采集得到同一目标用户在任意姿态下的人脸图像后,将人脸图像发送至客户端进行输出,模型训练人员基于输出的人脸图像,人为识别得到该人脸图像所具有的人脸属性。然后客户端用户基于客户端的输入设备,将人脸属性输入至客户端,客户端将用户输入的人脸属性发送至服务器。服务器则获取客户端输入的人脸属性,与该人脸图像相映射的保存。
基于获取的人脸图像、目标人脸图像及人脸图像所具有的人脸属性,构建用于训练人脸矫正模型的训练样本。
在实际实施时,服务器将人脸图像、目标人脸图像及人脸图像所具有的人脸属性作为一组训练数据,通过上述方式获得多组训练数据。服务器将多组训练数据作为训练样本。其中,不同的两组训练数据可以对应同一目标用户,还可以分别对应不同的两个用户。本申请实施例中,同一组训练数据内的人脸图像和目标人脸图像则对应同一目标用户。在一些实施例中,同一组训练数据内的人脸图像和目标人脸图像也可以分别对应不同的两个用户。
在一些实施例中,服务器在获取到训练数据之后,还对获取的训练数据中的图像(也即人脸图像A和目标人脸图像B)进行预处理,服务器可对图像进行如下处理:分别对每帧图像进行大小调整,如对每一帧图像调整图像大小为286×386,然后对图像进行去噪处理,将图像像素值进行归一化,如归一化至-1到1之间,然后,将图像进行随机剪裁(如随机裁剪出大小为250×350)。服务器还可以对图像进行随机翻转,例如上下翻转或左右翻转等,服务器还可以对图像进行亮度或灰度等的调整,以实现图像的数据增强。然后,服务器基于进行预处理后的各组训练数据,构建训练样本。
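上述预处理流程的一种示意性实现如下(基于PIL与NumPy;去噪步骤从略,裁剪与翻转概率等细节为假设,并非本申请的具体实现):

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image) -> np.ndarray:
    img = img.resize((286, 386))                  # 调整图像大小为 286×386
    arr = np.asarray(img).astype(np.float32)
    arr = arr / 127.5 - 1.0                       # 像素值归一化至 -1 到 1 之间
    top = np.random.randint(0, 386 - 350 + 1)     # 随机裁剪出 250×350
    left = np.random.randint(0, 286 - 250 + 1)
    arr = arr[top:top + 350, left:left + 250]
    if np.random.rand() < 0.5:                    # 随机左右翻转
        arr = arr[:, ::-1]
    return arr
```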
上述的步骤中,通过获取目标用户在任意姿态下的人脸图像、目标用户在标准人脸姿态下的目标人脸图像,以及人脸图像所具有的至少一个维度的人脸属性,基于获取的人脸图像、目标人脸图像及人脸图像所具有的人脸属性,构建用于训练人脸矫正模型的训练样本,能够为对人脸矫正模型的训练提供一个可靠有效的训练样本。
在获得训练样本后,服务器则实施步骤701继续对人脸矫正模型进行训练,下面进行说明。
步骤701,服务器通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;其中,人脸图像具有至少一个维度的人脸属性;
需要说明的是,人脸矫正模型可以将任意姿态的人脸图像进行跨姿态的人脸矫正,得到标准姿态下的矫正人脸图像,在不断训练过程中能够生成更接近标准姿态和真实图像的矫正人脸图像。
在一些实施例中,图7示出的步骤701可以通过如下方式实现,将结合各步骤进行说明。
服务器将任意姿态下的人脸图像输入至人脸矫正模型;通过人脸矫正模型对人脸图像进行编码,得到初始图像编码;
在实际实施时,服务器将任意姿态下的人脸图像输入至人脸矫正模型的编码器。通过编码器的五个卷积层对人脸图像进行卷积操作,其中第一个和第二个卷积层对人脸图像进行局部响应归一化(Local Response Normalization,LRN)处理,第一个、第二个和第五个卷积层在卷积操作之后都进行了最大池化操作(MaxPooling)。其中,卷积层使用的激活函数均为ReLU函数。经过卷积层的卷积操作之后,利用全连接层对卷积层的输出进行特征映射和降维处理,得到人脸图像的初始图像编码。
基于人脸图像中人脸姿态及标准人脸姿态的偏差,修正初始图像编码,得到目标图像编码;
在实际实施时,服务器基于人脸图像中的人脸姿态与标准人脸姿态之间的偏差,对初始图像编码进行修正,以使修正后得到的目标图像编码能够映射到标准人脸姿态的人脸图像。在一些实施例中,服务器可以采用仿射变换,即可以采用RST(旋转-变比-平移)变换、多项式模型(Polynomial)或者局部三角网(Triangulation)算法对初始图像编码进行修正,以对初始图像编码在向量层面实现人脸姿态的变换,得到对应于标准人脸姿态的目标图像编码。
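以最简单的可学习仿射变换为例,对初始图像编码在向量层面进行修正的一种极简示意如下(基于PyTorch;以线性层实现 z'=Wz+b 只是诸多可行修正方式中的一种假设性写法,并非本申请的具体实现):

```python
import torch.nn as nn

class CodeCorrection(nn.Module):
    """以可学习的仿射变换 z' = Wz + b 修正初始编码,W、b 在训练中学习得到。"""
    def __init__(self, code_dim=256):
        super().__init__()
        self.affine = nn.Linear(code_dim, code_dim)

    def forward(self, z):
        return self.affine(z)   # 输出对应标准人脸姿态的目标图像编码
```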
解码目标图像编码,得到标准人脸姿态的矫正人脸图像。
这里,基于上述对初始编码的修正得到的目标图像编码,则可以映射到标准人脸姿态的人脸图像,服务器在利用人脸矫正模型对目标图像编码进行解码之后,得到标准人脸姿态的矫正人脸图像。其中,解码过程为,服务器通过人脸矫正模型的解码器,将输入至解码器的目标图像编码经过全连接层,经过reshape函数计算得到一个三维张量,该三维张量经过4个反卷积网络进行上采样后,生成一张二维的矫正人脸图像。例如,若目标图像编码为一个1*100的向量,则经过一个全连接层学习,可以将目标图像编码reshape到一个4*4*1024的三维张量,再经过4个上采样的反卷积网络,生成64*64的二维图像,即矫正人脸图像。
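与上述例子对应的解码器示意如下(基于PyTorch;各反卷积层的通道数为示意取值,并非本申请的具体实现):

```python
import torch.nn as nn

class Decoder(nn.Module):
    """全连接层将编码 reshape 为 4×4×1024 的三维张量,再经 4 个反卷积层上采样为 64×64 图像。"""
    def __init__(self, code_dim=100):
        super().__init__()
        self.fc = nn.Linear(code_dim, 4 * 4 * 1024)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(1024, 512, 4, stride=2, padding=1), nn.ReLU(),  # 4→8
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),   # 8→16
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),   # 16→32
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh(),     # 32→64
        )

    def forward(self, z):
        h = self.fc(z).reshape(-1, 1024, 4, 4)   # 对应文中的 reshape 操作
        return self.deconv(h)                    # 输出 64×64 的矫正人脸图像
```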
步骤702,通过判别模型对矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于预测结果构建第一损失函数;
需要说明的是,判别模型是一个卷积神经网络(Convolutional Neural Networks, CNN)分类器,在DCGAN中判别模型有4个卷积层。它实现对输入的样本的真实性分类。在实际实施时,服务器将真实世界的目标人脸图像和由人脸矫正模型生成的矫正人脸图像输入到判别模型中,判别模型则以目标人脸图像为基准,实现对矫正人脸图像的真实性分类,输出矫正人脸图像基于目标人脸图像的真实性概率的预测结果。若输出的预测结果表征的真实性概率为1,则表示矫正人脸图像为真实的图像,若输出的预测结果表征的真实性概率为0,则表示矫正人脸图像不是真实的图像,若输出的预测结果表征的真实性概率为0.5,则表示判别模型无法判断矫正人脸图像是否为真实的图像。
在实际实施时,服务器还基于预测结果构建第一损失函数。该第一损失函数用于对人脸矫正模型的解码器参数和判别模型的模型参数进行更新。在一些实施例中,第一损失函数基于公式(1)构建:
$L_{gan}=\min_G \max_D \big(\log D(B)+\log(1-D(G(A)))\big)$       (1)
其中,$L_{gan}$为第一损失函数,$D(B)$为判别模型对目标人脸图像B进行真实性预测的预测结果,$G(A)$为矫正人脸图像,$D(G(A))$为判别模型对矫正人脸图像$G(A)$进行真实性预测的预测结果。
在一些实施例中,图7示出的步骤702中的“通过判别模型对矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果”可以通过如下方式实现,将结合各步骤进行说明。
服务器将矫正人脸图像和目标人脸图像输入至判别模型;通过判别模型分别对矫正人脸图像和目标人脸图像进行特征提取,得到矫正人脸图像对应的矫正人脸特征、及目标人脸图像对应的目标人脸特征;
在实际实施时,服务器将矫正人脸图像G(A)和目标人脸图像B输入至判别模型,利用判别模型分别进行特征提取。本申请实施例采用的判别模型使用带步长的卷积实现下采样操作,输入的图像通过与卷积核的数学运算,提取出图像的某些指定特征。在本申请实施例中,通过判别模型将输入的矫正人脸图像与卷积核进行数学运算,得到矫正人脸图像对应的矫正人脸特征,将输入的目标人脸图像与卷积核进行数学运算,得到目标人脸图像的目标人脸特征。其中,矫正人脸特征和目标人脸特征为向量表示。
基于矫正人脸特征及目标人脸特征,预测得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果。
在实际实施时,判别模型在卷积层实现下采样,得到矫正人脸特征和目标人脸特征后,则使用全连接层对矫正人脸特征和目标人脸特征进行处理后得到固定长度的特征向量。判别模型可以接受任意尺寸的输入图像,利用反卷积层对最后一个卷积层的特征图像(feature map)进行上采样,使它恢复到输入图像相同的尺寸,从而可以对矫正人脸图像的每个像素都产生了一个预测,同时保留了原始输入图像中的空间信息,最后在上采样的特征图上进行逐像素分类,通过softmax函数映射输出表征矫正人脸图像相较于目标人脸图像的真实性的预测结果。
上述的通过判别模型对输入的矫正人脸图像进行真实性概率预测的过程,能够有效地对矫正人脸图像进行真实概率预测,得到基于目标人脸图像的真实性的预测结果。
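与上述描述对应,一个DCGAN风格的判别模型可以示意性地实现如下(基于PyTorch;通道数等为示意取值,此处以sigmoid直接输出真实性概率,并非本申请的具体实现):

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """4 个带步长的卷积层实现下采样,最后映射为输入图像为真实图像的概率 D(x)。"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Sequential(nn.LazyLinear(1), nn.Sigmoid())

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # 真实性概率,取值 (0,1)
```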
步骤703,通过人脸属性识别模型,对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果,并基于识别结果构建第二损失函数;
需要说明的是,若人脸属性只有一个维度,则人脸属性识别模型为一个一对一的分类模型。若人脸属性具有多个维度,则人脸属性识别模型为一个多任务多分类的一对多的分类模型,包含多个线性判别函数,可以采用softmax回归来实现多类的Logistic回归。为了便于说明,本申请实施例将人脸属性记为C,并对C定义n个维度,将第n维度的人脸属性标签记为$c_n$,则$C=[c_1,c_2,\ldots,c_n]$。人脸属性C则可以有n个取值,给定一个x,softmax回归预测的属于第n个维度的人脸属性标签的条件概率则可以基于公式(2)获得:
$p(y=n\mid x)=\dfrac{\exp(w_n^{\top}x)}{\sum_{i=1}^{n}\exp(w_i^{\top}x)}$        (2)
其中,$p(y=n\mid x)$为x属于第n个维度的人脸属性标签的条件概率,$w_n$为第n个维度的人脸属性标签的权重向量。
在实际实施时,服务器通过将一张人脸图像输入人脸属性识别模型中,得到包含矫正人脸图像所具有的人脸属性的识别结果。人脸属性的识别结果包括至少一个维度的人脸属性标签。
此外,服务器还基于识别结果构建第二损失函数,本申请实施例中,第二损失函数用于结合第一损失函数对人脸矫正模型进行参数更新。将人脸属性识别模型记为FA,在一些实施例中,第二损失函数基于公式(3)构建:
$L_{attr}=-\sum_{i=1}^{n} c_i \log\big(FA(G(A))_i\big)$        (3)
其中,$FA(G(A))$为人脸属性识别模型FA对矫正人脸图像$G(A)$进行人脸属性识别的识别结果,C为人脸图像所具有的人脸属性,$L_{attr}$为第二损失函数,它表示$FA(G(A))$和C的交叉熵。
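第二损失函数的一种示意性计算方式如下(基于PyTorch;假设人脸属性识别模型对每个维度的属性分别输出分类logits,接口形式为假设,并非本申请的具体实现):

```python
import torch.nn.functional as F

def attr_loss(pred_attrs, true_attrs):
    """对每个维度的人脸属性分别计算交叉熵后求和,得到 L_attr。
    pred_attrs:各维度属性的预测 logits 列表;true_attrs:各维度真实属性标签(类别索引)列表。"""
    return sum(F.cross_entropy(p, c) for p, c in zip(pred_attrs, true_attrs))
```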
在一些实施例中,图7示出的步骤703中的“通过人脸属性识别模型,对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果”可以通过如下方式实现,将结合各步骤进行说明。
服务器将矫正人脸图像和至少一个维度的人脸属性所对应的人脸属性标签,输入至人脸属性识别模型;
在实际实施时,输入人脸属性识别模型的人脸属性标签为人脸图像所实际对应的人脸属性标签,也即训练样本中的人脸属性C所对应的人脸属性标签。它可以由人为进行识别后作为训练样本输入本申请实施例的人脸属性识别模型中。
通过人脸属性识别模型,分别对矫正人脸图像和各维度的人脸属性标签进行特征提取,得到矫正人脸图像对应的矫正人脸特征、及各维度的人脸属性标签所对应的人脸属性特征;
在实际实施时,服务器通过人脸属性识别模型的卷积层分别对矫正人脸图像和各维度的人脸属性标签进行下采样,以实现特征的提取,得到矫正人脸图像对应的矫正人脸特征,以及各维度的人脸属性标签所对应的人脸属性特征。
基于得到的矫正人脸特征及人脸属性特征,预测得到包含矫正人脸图像所具有的人脸属性的识别结果。
在实际实施时,服务器利用人脸属性识别模型的反卷积层对最后一个卷积层的feature map,即矫正人脸特征进行上采样,使它恢复到输入图像相同的尺寸,从而可以对矫正人脸图像的每个像素都产生了一个预测,同时保留了原始输入图像中的空间信息,最后在上采样的特征图上进行逐像素分类,通过softmax函数映射输出与人脸属性特征所对应的至少一个维度的矫正人脸属性标签,作为预测得到的包含矫正人脸图像所具有的人脸属性的识别结果。
上述的通过人脸属性识别模型对矫正人脸图像进行人脸属性识别的过程,能够有效识别出矫正人脸图像的至少一个维度的人脸属性。
步骤704,基于第一损失函数及第二损失函数,对人脸矫正模型的模型参数进行更新;
需要说明的是,服务器结合第一损失函数及第二损失函数共同训练由人脸矫正模型和判别模型构成的生成对抗网络,在不断的迭代训练中,使生成对抗网络达到收敛后完成模型的训练,从而使得训练得到的人脸矫正模型在实现人脸的跨姿态的矫正的同时还能够保留原始的人脸属性。
在一些实施例中,图7示出的步骤704可以通过如下方式实现,将结合各步骤进行说明。
分别确定第一损失函数的权值和第二损失函数的权值;基于第一损失函数的权值和第二损失函数的权值,对第一损失函数和第二损失函数进行加权求和,得到目标损失函数;基于目标损失函数,对人脸矫正模型的模型参数进行更新。
本申请实施例中,服务器利用第一损失函数和第二损失函数构建目标损失函数,利用目标损失函数来对人脸矫正模型进行训练,其中,第二损失函数基于人脸属性识别模型构建,通过结合人脸属性识别模型来构建对人脸矫正模型进行训练的目标损失函数,能够使得训练得到的人脸矫正模型保留有人脸图像的原始的人脸属性,从而使得利用人脸矫正模型矫正后的矫正人脸图像更加接近原始的人脸图像的人脸属性。
在实际实施时,服务器可以基于预先设置的第一损失函数和第二损失函数权重分配,分别确定第一损失函数的权值和第二损失函数的权值。其中,对第一损失函数和第二损失函数权重分配可以基于对最终需要侧重的功能来划分,例如若希望人脸矫正模型拥有更强的人脸姿态的矫正效果,则将第一损失函数设置为相较于第二损失函数更高的权值。若希望人脸矫正模型拥有更强的保留人脸属性的效果,则将第二损失函数设置为相较于第一损失函数更高的权值。其中,权值越高代表所占的比重越高,也即重要性越高。
在一些实施例中,第一损失函数的权值和第二损失函数的权值可以预先存储于服务器内,还可以由用户基于客户端的用户界面进行输入,然后客户端将用户输入的权值发送给服务器,服务器则接收客户端输入的权值,得到第一损失函数的权值和第二损失函数的权值。
接下来,服务器则基于第一损失函数的权值和第二损失函数的权值,对第一损失函数和第二损失函数进行加权求和以得到目标损失函数。在实际实施时,服务器构建得到的目标损失函数可以参照公式(4):
$Loss=\alpha L_{gan}+\beta L_{attr}$            (4)
其中,$Loss$为目标损失函数,$\alpha$为第一损失函数$L_{gan}$的权值,$\beta$为第二损失函数$L_{attr}$的权值。
通过上述的目标损失函数的构造过程,能够使得人脸属性识别模型的损失函数结合生成对抗网络的损失函数,最终构造出一个目标损失函数来对本申请实施例的生成对抗网络进行训练,使得训练得到的人脸矫正模型在拥有人脸矫正功能的同时,还能够使得训练得到的矫正人脸图像保留有与矫正前的人脸图像一致的人脸属性。
在一些实施例中,上述的基于目标损失函数,对人脸矫正模型的模型参数进行更新,可以通过如下方式实现:服务器基于预测结果确定第一损失函数的值;基于人脸图像所具有的人脸属性与识别结果之间的差异,确定第二损失函数的值;基于第一损失函数的值和第二损失函数的值,确定目标损失函数的值;基于目标损失函数的值,对人脸矫正模型的模型参数进行更新。
需要说明的是,预测结果即为矫正人脸图像与目标人脸图像相似的概率,本申请实施例的目标人脸图像与人脸图像对应为同一目标用户,则预测结果越大,也即矫正人脸图像与目标人脸图像相似的概率越大,则表示矫正得到的矫正人脸图像越成功。而在一些实施例中,目标人脸图像可以设置为与人脸图像对应不同的用户,则预测结果越小,也即矫正人脸图像与目标人脸图像相似的概率越小,则表示矫正得到的矫正人脸图像越成功。服务器可以基于预测结果,利用公式(1)计算得到第一损失函数的值。
在实际实施时,人脸属性识别模型对矫正人脸图像的属性识别的识别结果具有至少一个维度的人脸属性标签。服务器利用人脸图像所具有的人脸属性与识别结果的交叉熵来表征人脸图像所具有的人脸属性与识别结果之间的差异,通过利用公式(3)计算人脸图像所具有的人脸属性与识别结果的交叉熵,来获得第二损失函数的值。
在实际实施时,服务器在得到第一损失函数的值和第二损失函数的值之后,则可以进一步确定目标损失函数的值。在一些实施例中,服务器首先确定第一损失函数的权值和第二损失函数的权值,将第一损失函数的值与第二损失函数的值进行加权求和后,得到目标损失函数的值。
在对人脸矫正模型的模型参数进行更新时,服务器固定住人脸属性识别模型的模型参数,基于目标损失函数的值,对本申请实施例提供的生成对抗网络进行模型参数的更新,从而实现对人脸矫正模型的训练。
在一些实施例中,上述的基于目标损失函数的值,对人脸矫正模型的模型参数进行更新,可以通过如下方式实现,将结合各步骤进行说明。
当目标损失函数的值达到第一阈值时,服务器基于目标损失函数确定相应的第一误差信号;从判别模型的输出层开始,将第一误差信号在判别模型及人脸矫正模型中反向传播,并在传播的过程中更新判别模型及人脸矫正模型的模型参数。
在一些实施例中,服务器可通过如下方式实现对人脸矫正模型的训练:
服务器在对人脸矫正模型的训练过程中固定人脸属性识别模型的模型参数,当目标损失函数的值达到第一阈值时,基于目标损失函数确定相应的第一误差信号,将第一误差信号在人脸矫正模型和判别模型中反向传播,并在传播的过程中更新人脸矫正模型的各个层的模型参数以及判别模型的各个层的模型参数。
这里对反向传播进行说明,将训练样本输入到神经网络模型的输入层,经过隐藏层,最后达到输出层并输出结果,这是神经网络模型的前向传播过程,由于神经网络模型的输出结果与实际结果有误差,则计算输出结果与实际值之间的误差,并将该误差从输出层向隐藏层反向传播,直至传播到输入层,在反向传播的过程中,根据误差调整模型参数的值;不断迭代上述过程,直至收敛。
以目标损失函数为例,服务器基于目标损失函数确定第一误差信号,第一误差信号从人脸矫正模型或判别模型的输出层反向传播,逐层反向传播第一误差信号,在第一误差信号到达每一层时,结合传导的第一误差信号来求解梯度(也就是Loss函数对该层参数的偏导数),并根据相应的梯度值更新该层的参数。
通俗来说,就是服务器向人脸矫正模型输入一组适当概率分布的人脸图像,然后得到一堆生成的矫正人脸图像,固定住人脸属性识别模型的模型参数,将矫正人脸图像输入人脸属性识别模型,得到矫正人脸属性,然后将这些矫正人脸图像作为反例,同时结合人脸图像的人脸属性与矫正人脸属性之间的差异,用真实世界的目标人脸图像作为正例训练判别模型。这轮训练后,得到的判别模型的能力得到了提升,能够学会给一些真实的且矫正人脸属性接近于真实世界的人脸图像的人脸属性的图片打高分,给一些非真实且矫正人脸属性与真实世界的人脸图像的人脸属性相差较远的图片 打低分。这之后,服务器再固定判别模型的模型参数。此时服务器给人脸矫正模型输入一张人脸图像,再把它生成的矫正人脸图像送入判别模型中,将得到一个判别模型输出的反馈分数。这个反馈分数就可以作为LOSS,我们根据LOSS FUNCTION的梯度调整人脸矫正模型的参数,使得它尽可能生成可以骗过这个版本的判别模型,从它手下得到一个高分。这轮训练后,得到的人脸矫正模型的能力也得到了提升,能够生成一些更真实的图像了。然后服务器继续重复上面的过程,强化判别模型,判别模型强化后再强化人脸矫正模型,可以期望的是,多轮迭代后,得到的判别模型和人脸矫正模型的能力都可以变得更强,且得到的人脸矫正模型在实现人脸跨姿态的矫正的同时,还能够更多的保留输入的人脸图像的人脸属性。
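上述交替训练的一次迭代可以示意性地写成如下形式(基于PyTorch;G、D、FA及各优化器假设已构造好,FA的参数保持固定,attr_loss见前文的示意实现,α、β为示意取值,并非本申请的具体实现):

```python
import torch

def train_step(A, B, C, G, D, FA, opt_g, opt_d, alpha=1.0, beta=0.5):
    bce = torch.nn.BCELoss()
    ones = torch.ones(A.size(0), 1)
    zeros = torch.zeros(A.size(0), 1)
    fake = G(A)                                   # 矫正人脸图像 G(A)

    # 先更新判别模型:真实的目标人脸图像 B 为正例,G(A) 为反例
    opt_d.zero_grad()
    loss_d = bce(D(B), ones) + bce(D(fake.detach()), zeros)
    loss_d.backward()
    opt_d.step()

    # 再更新人脸矫正模型:对抗损失与人脸属性损失按公式(4)加权求和
    opt_g.zero_grad()
    l_gan = bce(D(fake), ones)
    l_attr = attr_loss(FA(fake), C)
    (alpha * l_gan + beta * l_attr).backward()
    opt_g.step()
```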
在一些实施例中,参照图9,图9是本申请实施例提供的用于模型训练的模型架构的一个可选的示意图,基于图6,用于模型训练的模型架构还可包括:
人脸识别模型64,分别对矫正人脸图像和标准人脸图像进行特征提取,得到矫正人脸图像对应的矫正人脸特征和标准人脸图像对应的标准人脸特征。
需要说明的是,人脸识别模型可以从人脸图像的语义维度识别人脸,其中,语义维度包括图像的纹理、色彩及形状等。基于人脸识别模型进行人脸图像的特征提取,能够提取得到人脸图像内人脸的语义维度的信息。
在一些实施例中,基于图7,在步骤704之前,还可以执行:
服务器通过人脸识别模型,分别对矫正人脸图像和标准人脸图像进行特征提取,得到矫正人脸图像对应的矫正人脸特征和标准人脸图像对应的标准人脸特征,以基于矫正人脸特征和标准人脸特征构建第三损失函数。需要说明的是,标准人脸图像为目标用户在标准人脸姿态下的人脸图像,它与人脸图像具有完全一致的人脸属性。参照图8,这里,标准人脸图像可以是图示B。
本申请实施例还结合人脸识别模型来对人脸矫正模型进行训练,使得训练得到的人脸矫正模型生成的矫正人脸图像更加接近于原始输入的人脸图像的面部特征。人脸识别模型可以采用CNN模型实现,例如将一张人脸图像输入至人脸识别模型内,可以识别得到该人脸图像所对应的用户身份。本申请实施例不需要对人脸图像进行身份识别,而仅仅利用人脸识别模型对人脸图像进行特征提取,以根据提取的人脸特征来对人脸矫正模型进行训练。
在实际实施时,服务器利用人脸识别模型,在其卷积层分别对矫正人脸图像和标准人脸图像进行特征提取,得到矫正人脸图像的矫正人脸特征,以及标准人脸图像的标准人脸特征。其中,人脸特征可以用向量表示,提取得到的人脸特征可以为多维度的向量,例如256维,或者512维等。
接下来,服务器在得到矫正人脸特征和标准人脸特征,则基于二者构建第三损失函数。其中,矫正人脸特征和标准人脸特征越接近,则表示矫正人脸图像和标准人脸图像越接近。本申请实施例中,人脸特征由向量表示,则可以用矫正人脸特征和标准人脸特征的距离来确定二者是否接近,可以理解,二者之间的距离越小则表示二者越接近,也即矫正人脸图像和标准人脸图像越接近。则服务器可以基于矫正人脸特征和标准人脸特征的距离来构造第三损失函数,构造的第三损失函数参照公式(5):
$L_{recog}=L_2\big(FR(G(A)),FR(B)\big)$         (5)
其中,$L_{recog}$为第三损失函数,$FR(G(A))$为矫正人脸特征,$FR(B)$为人脸识别模型FR对标准人脸图像B提取的标准人脸特征,$L_{recog}$表示$FR(G(A))$与$FR(B)$之间的距离。
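第三损失函数的一种示意性计算方式如下(基于PyTorch;以逐样本L2距离的均值作为特征距离,属于假设性写法,并非本申请的具体实现):

```python
def recog_loss(feat_corrected, feat_standard):
    """计算矫正人脸特征与标准人脸特征之间的 L2 距离,得到 L_recog。"""
    return (feat_corrected - feat_standard).pow(2).sum(dim=1).sqrt().mean()
```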
相应的,基于图7的步骤704,中的“基于第一损失函数及第二损失函数构建目标损失函数”,包括:服务器基于第一损失函数、第二损失函数及第三损失函数构建目标损失函数。
在实际实施时,服务器分别确定第一损失函数的权值、第二损失函数的权值和第三损失函数的权值,然后基于第一损失函数的权值、第二损失函数的权值和第三损失函数的权值,对第一损失函数、第二损失函数和第三损失函数进行加权求和,得到目标损失函数。在一些实施例中,服务器构建得到的目标损失函数可以参照公式(6):
$Loss=\alpha L_{gan}+\beta L_{attr}+\gamma L_{recog}$          (6)
其中,$Loss$为目标损失函数,$\alpha$为第一损失函数$L_{gan}$的权值,$\beta$为第二损失函数$L_{attr}$的权值,$\gamma$为第三损失函数$L_{recog}$的权值。
在实际实施时,服务器可以基于预先设置的第一损失函数、第二损失函数和第三损失函数的权重分配,分别确定第一损失函数的权值、第二损失函数和第三损失函数的权值。其中,对第一损失函数、第二损失函数和第三损失函数的权重分配可以基于对最终需要侧重的功能来划分,例如若希望人脸矫正模型拥有更强的人脸姿态的矫正效果,则将第一损失函数设置为相较于第二损失函数更高的权值。若希望人脸矫正模型拥有更强的保留人脸属性的效果,则将第二损失函数设置为相较于第一损失函数更高的权值。若希望人脸矫正模型拥有更强的保留原始的人脸面部特征的效果,则将第三损失函数设置为相较于第一损失函数更高的权值。其中,权值越高代表所占的比重越高,也即重要性越高。
通过上述的目标损失函数的构造过程,能够使得人脸属性识别模型和人脸识别模型的损失函数结合生成对抗网络的损失函数,最终构造出一个目标损失函数来对本申请实施例的生成对抗网络进行训练,使得训练得到的人脸矫正模型在拥有人脸矫正功能的同时,还能够使得训练得到的矫正人脸图像保留有与矫正前的人脸图像更接近的人脸属性以及更接近的面部特征。
在一些实施例中,图7示出的步骤704还可以通过以下步骤实现:获取第一损失函数的值、第二损失函数的值和第三损失函数的值;基于第一损失函数的值、第二损失函数的值和第三损失函数的值,确定目标损失函数的值;当目标损失函数的值达到第二阈值时,服务器基于目标损失函数确定相应的第二误差信号;从判别模型的输出层开始,将第二误差信号在判别模型及人脸矫正模型中反向传播,并在传播的过程中更新判别模型及人脸矫正模型的模型参数。
其中,服务器获取第一损失函数的值和获取第二损失函数的值的过程在此不再赘述。在一些实施例中,获取第三损失函数的值,包括:获取矫正人脸特征和标准人脸特征之间的距离;基于距离,确定第三损失函数的值。在实际实施时,服务器可以基于矫正人脸特征和标准人脸特征,计算矫正人脸特征和标准人脸特征之间的距离,将该距离确定为第三损失函数的值。
在实际实施时,服务器在得到第一损失函数的值、第二损失函数的值和第三损失函数的值之后,进一步确定目标损失函数的值。在一些实施例中,服务器首先确定第一损失函数的权值、第二损失函数的权值及第三损失函数的权值,将第一损失函数的值、第二损失函数的值及第三损失函数进行加权求和后,得到目标损失函数的值。
服务器在得到目标损失函数的值后,则基于目标损失函数的值更新人脸矫正模型的模型参数。在一些实施例中,服务器可通过如下方式实现对人脸矫正模型的训练:
服务器在对人脸矫正模型的训练过程中固定人脸属性识别模型的模型参数和人脸识别模型的模型参数,当目标损失函数的值达到第二阈值时,基于目标损失函数确定相应的第二误差信号,将第二误差信号在人脸矫正模型和判别模型中反向传播,并在传播的过程中更新人脸矫正模型的各个层的模型参数以及判别模型的各个层的模型参数。
通俗来说,就是服务器向人脸矫正模型输入一组适当概率分布的人脸图像,然后得到一堆生成的矫正人脸图像,固定住人脸属性识别模型和人脸识别模型的模型参数,将矫正人脸图像输入人脸属性识别模型,得到矫正人脸属性,将矫正人脸图像和标准人脸图像输入人脸识别模型,得到矫正人脸特征和标准人脸特征,然后将这些矫正人脸图像作为反例,同时结合人脸图像的人脸属性与矫正人脸属性之间的差异、以及矫正人脸特征和标准人脸特征的距离得到目标损失函数的值,基于该目标损失函数的值,用真实世界的目标人脸图像作为正例训练判别模型。这轮训练后,得到的判别模型的能力得到了提升,能够学会给一些真实的、人脸属性接近于真实世界且更接近于人脸图像的人脸面部特征的图片打高分,给一些非真实、矫正人脸属性与真实世界的人脸图像的人脸属性相差较远的、且人脸面部特征与人脸图像的人脸面部特征相差较远的图片打低分。这之后,服务器再固定判别模型的模型参数。此时服务器给人脸矫正模型输入一张人脸图像,再把它生成的矫正人脸图像送入判别模型中,将得到一个判别模型输出的反馈分数。这个反馈分数就可以作为LOSS,我们根据LOSS FUNCTION的梯度调整人脸矫正模型的参数,使得它尽可能生成可以骗过这个版本的判别模型,从它手下得到一个高分。这轮训练后,得到的人脸矫正模型的能力也得到了提升,能够生成一些更真实的图像了。然后服务器继续重复上面的过程,强化判别模型,判别模型强化后再强化人脸矫正模型,可以期望的是,多轮迭代后,得到的判别模型和人脸矫正模型的能力都可以变得更强,且得到的人脸矫正模型在实现人脸跨姿态的矫正的同时,还能够更多的保留输入的人脸图像的人脸属性和面部特征。
上述的步骤中,通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像,然后通过判别模型对所述矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于预测结果构建第一损失函数,通过人脸属性识别模型,对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果,并基于识别结果构建第二损失函数,并基于第一损失函数及第二损失函数构建目标损失函数,最后基于目标损失函数的值,对人脸矫正模型的模型参数进行更新;如此,通过结合人脸属性识别模型的损失函数来构建对人脸矫正模型进行训练的目标损失函数,能够使得训练得到的人脸矫正模型保留有人脸图像的原始的人脸属性,从而使得利用人脸矫正模型矫正后的矫正人脸图像更加接近原始的人脸图像的人脸属性,使得通过本申请实施例训练得到的人脸矫正模型在实现人脸的跨姿态矫正的同时不丢失人脸图像的信息。
接下来继续对本申请实施例提供的人脸矫正模型的训练方法进行介绍,图10是本申请实施例提供的人脸矫正模型的训练方法的一个可选的流程示意图,参见图10,本申请实施例提供的人脸矫正模型的训练方法由终端、服务器协同实施。
步骤801,终端响应于针对人脸图像和目标人脸图像的上传操作,接收上传的人脸图像和目标人脸图像;
其中,人脸图像为目标用户在任意人脸姿态下的人脸图像,目标人脸图像为目标用户在标准人脸姿态下的人脸图像。
步骤802,终端响应于针对人脸图像的人脸属性输入操作,获取人脸图像的至少一个维度的人脸属性;
步骤803,终端发送人脸图像、人脸图像的至少一个维度的人脸属性、以及目标人脸图像至服务器。
步骤804,服务器基于接收到的人脸图像、人脸图像的至少一个维度的人脸属性、以及目标人脸图像,构建用于训练人脸矫正模型的训练样本;
步骤805,服务器通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;
步骤806,服务器通过判别模型对矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于预测结果构建第一损失函数;
步骤807,服务器通过人脸属性识别模型,对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果,并基于识别结果构建第二损失函数;
步骤808,服务器通过人脸识别模型,分别对矫正人脸图像和标准人脸图像进行特征提取,得到矫正人脸图像对应的矫正人脸特征和标准人脸图像对应的标准人脸特征,并基于矫正人脸特征和标准人脸特征构建第三损失函数;
步骤809,服务器基于第一损失函数、第二损失函数及第三损失函数构建目标损失函数;
步骤810,服务器获取第一损失函数的值、第二损失函数的值和第三损失函数的值;
步骤811,服务器基于第一损失函数的值、第二损失函数的值和第三损失函数的值,确定目标损失函数的值;
步骤812,当目标损失函数的值达到第二阈值时,服务器基于目标损失函数确定相应的第二误差信号;
步骤813,服务器从判别模型的输出层开始,将第二误差信号在判别模型及人脸矫正模型中反向传播,并在传播的过程中更新判别模型及人脸矫正模型的模型参数。
上述的步骤中,终端接收训练样本发送给服务器,以使服务器根据该训练样本对人脸矫正模型进行训练,通过结合人脸属性识别模型以及人脸识别模型,对人脸矫正模型和判别模型进行模型参数的更新,实现对生成对抗网络的模型训练,使得训练得到的人脸矫正模型在实现人脸的跨姿态矫正的同时,还能够保留原始输入的人脸图像的人脸属性和面部特征,从而得到更接近于输入的人脸图像的目标用户的标准姿态下的矫正人脸图像。
步骤814,终端发送携带待矫正人脸图像的图像矫正请求至服务器;
在实际实施时,图像矫正请求可以是用户基于终端的输入设备发出触发操作后,由终端响应于该触发操作生成。图像矫正请求还可以是终端基于一定的条件后自动生成,例如与终端通信连接的摄像头采集到待矫正人脸图像后,将待矫正人脸图像发送给终端,终端在接收到待矫正人脸图像后生成图像矫正请求。此外,图像矫正请求还可以是终端接收到的由其他设备发出的。
步骤815,服务器解析图像矫正请求后得到待矫正人脸图像,并通过训练得到的人脸矫正模型对待矫正人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;
步骤816,服务器发送标准人脸姿态的矫正人脸图像至终端。
在实际实施时,终端在接收到服务器发送的矫正人脸图像后,可以在其用户界面进行呈现以供用户浏览,还可以利用矫正人脸图像进行其他处理,例如利用该矫正人脸图像进行人脸身份的识别等。
下面,将说明本申请实施例在一个实际的应用场景中的示例性应用。在实际实施时,参见图11,图11是本申请实施例提供的人脸矫正模型的训练方法的一个可选的流程示意图,本申请实施例提供的人脸矫正模型的训练方法可包括如下操作:
步骤901,服务器获取由多组训练数据构成的训练样本;其中,一组训练数据包括第一用户在任意姿态下的人脸图像、第一用户在标准人脸姿态下的第一标准人脸图像、人脸图像对应的至少一个维度的人脸属性、第二用户在标准人脸姿态下的第二标准人脸图像。
参照图12,图12是本申请实施例提供的训练样本的一个可选的示意图,其中,将第一用户在任意姿态下的人脸图像记为A、第一用户在标准人脸姿态下的第一标准人脸图像记为B、第二用户在标准人脸姿态下的第二标准人脸图像记为E。需要说明的是,人脸图像与第一标准人脸图像还具有相同的人脸属性,将人脸图像对应的至少一个维度的人脸属性记为C。则训练样本中的一组训练数据可以表示为(A,B,C,E)。其中,人脸属性所对应的至少一个人脸属性标签例如可以是性别、年龄、头发长度、是否戴眼镜、是否戴帽子等。
步骤902,将人脸图像输入至人脸矫正模型中,通过人脸矫正模型对人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;
这里,人脸矫正模型为生成对抗网络中的生成网络,通过生成网络对人脸图像进行人脸矫正。为了便于说明,本申请实施例将矫正人脸图像记为A’。
步骤903,将矫正人脸图像输入至判别模型中,通过判别模型对矫正人脸图像进行真实性预测,得到表征矫正人脸图像相较于目标人脸图像的真实性的预测结果;
其中,判别模型为生成对抗网络中的判别网络。在实际实施时,服务器将矫正人脸图像A’和第二用户在标准人脸姿态下的第二标准人脸图像E输入判别模型中,通过判别模型基于第二标准人脸图像E对矫正人脸图像A’进行真实性预测。
这里,由于人脸图像A和第二标准人脸图像E所对应的是分别两个不同的用户,因此,这里在通过判别模型对矫正人脸图像A’进行真实性预测时,生成的预测结果所表征的概率越低,则表示判别模型的预测越正确。
步骤904,基于预测结果确定第一损失函数的值;
这里,第一损失函数为生成对抗网络对应的损失函数,也即人脸矫正模型与判别模型构成的生成对抗网络所对应的损失函数。其中,第一损失函数$L_{gan}$可以采用上述公式(1)实现。
步骤905,将矫正人脸图像和第一标准人脸图像输入至人脸识别模型中,通过人脸识别模型,分别对矫正人脸图像和第一标准人脸图像进行特征提取,得到矫正人脸图像对应的矫正人脸特征和标准人脸图像对应的标准人脸特征;
在实际实施时,人脸识别模型是通过特征提取模型来实现的,它将人脸图像映射成一个固定维度的特征表示,比如256维,或512维等,然后通过两个特征之间的距离来确定两张人脸图像是否为同一个人。服务器将矫正人脸图像A’和第一标准人脸图像B输入至人脸识别模型中,通过人脸识别模型分别对矫正人脸图像A’和第一标准人脸图像B进行特征提取。得到矫正人脸图像对应的矫正人脸特征和标准人脸图像对应的标准人脸特征。
步骤906,基于矫正人脸特征和标准人脸特征,计算第三损失函数,得到第三损失函数的值;
这里,第三损失函数为基于人脸识别模型所提取的人脸特征构建的损失函数$L_{recog}$。其中,第三损失函数$L_{recog}$可以采用上述公式(5)实现。若将矫正人脸特征和标准人脸特征分别表示为$FR(G(A))$和$FR(B)$,则第三损失函数$L_{recog}$表示$FR(G(A))$与$FR(B)$之间的距离。
步骤907,将矫正人脸图像以及人脸图像对应的至少一个维度的人脸属性输入至人脸属性识别模型中,通过人脸属性识别模型对矫正人脸图像针对至少一个维度的人脸属性进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果;
在实际实施时,人脸属性识别模型是通过一个多任务多分类模型来实现的,它能够识别得到一张人脸图像的至少一个维度的人脸属性,矫正后的矫正人脸图像应该和原始输入的人脸图像保持人脸属性上的一致。服务器通过人脸属性识别模型基于人脸图像对应的至少一个维度的人脸属性C,对输入的矫正人脸图像A’进行人脸属性识别,得到包含矫正人脸图像所具有的人脸属性的识别结果。其中,识别结果则为人脸图像的至少一个维度的人脸属性所对应的至少一个维度的矫正人脸属性。
步骤908,基于人脸图像所具有的人脸属性与识别结果之间的差异,确定第二损失函数的值;
这里,第二损失函数为人脸属性识别模型所对应的损失函数$L_{attr}$,它可以采用上述公式(3)实现。在实际实施时,得到包含矫正人脸图像所具有的人脸属性的识别结果后,则可基于识别结果以及人脸属性C计算得到第二损失函数的值。
步骤909,基于第一损失函数、第二损失函数及第三损失函数,构建目标损失函数,并确定目标损失函数的值;
在实际实施时,服务器对第一损失函数、第二损失函数及第三损失函数分别分配权值,然后基于第一损失函数、第二损失函数及第三损失函数各自的权值,对第一损失函数、第二损失函数及第三损失函数进行加权求和,得到目标损失函数。然后基于各损失函数的权值以及各损失函数的值,计算得到目标损失函数的值。
步骤910,基于目标损失函数的值,对人脸矫正模型的模型参数和判别模型的模型参数进行更新。
在实际实施时,服务器将人脸属性识别模型和人脸识别模型的模型参数固定,然后利用目标损失函数对人脸矫正模型的模型参数和判别模型的模型参数进行更新,通过不断的迭代不断对模型进行训练以更新模型的参数,直至生成对抗网络达到收敛,完成训练。
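步骤909至步骤910所述的做法可以示意性地概括如下(基于PyTorch;FA、FR假设为已预训练好的人脸属性识别模型与人脸识别模型,α、β、γ为示意权值,并非本申请的具体实现):

```python
# 训练前固定人脸属性识别模型 FA 与人脸识别模型 FR 的模型参数
for m in (FA, FR):
    for p in m.parameters():
        p.requires_grad_(False)

def total_loss(l_gan, l_attr, l_recog, alpha=1.0, beta=0.5, gamma=0.5):
    """公式(6)的加权求和,实际可按侧重的功能分配各权值。"""
    return alpha * l_gan + beta * l_attr + gamma * l_recog
```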
下面继续说明本申请实施例提供的人脸矫正模型的训练装置555实施为软件模块的示例性结构,在一些实施例中,如图4所示,存储在存储器550的人脸矫正模型的训练装置555中的软件模块可以包括:
人脸姿态矫正模块5551,配置为通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;其中,所述人脸图像具有至少一个维度的人脸属性;
预测模块5552,配置为通过判别模型对所述矫正人脸图像进行真实性预测,得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于所述预测结果构建第一损失函数;
属性识别模块5553,配置为通过人脸属性识别模型,对所述矫正人脸图像针对所述至少一个维度的人脸属性进行人脸属性识别,得到包含所述矫正人脸图像所具有的人脸属性的识别结果,并基于所述识别结果构建第二损失函数;
参数更新模块5554,配置为基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新。
在一些实施例中,所述人脸姿态矫正模块5551,还配置为将任意姿态下的人脸图像输入至所述人脸矫正模型;通过所述人脸矫正模型对所述人脸图像进行编码,得到初始图像编码;基于所述人脸图像中人脸姿态及所述标准人脸姿态的偏差,修正所 述初始图像编码,得到目标图像编码;解码所述目标图像编码,得到所述标准人脸姿态的矫正人脸图像。在一些实施例中,所述预测模块5552,还配置为将所述矫正人脸图像和所述目标人脸图像输入至所述判别模型;通过判别模型分别对所述矫正人脸图像和目标人脸图像进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及所述目标人脸图像对应的目标人脸特征;基于所述矫正人脸特征及所述目标人脸特征,预测得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果。
在一些实施例中,所述属性识别模块5553,还配置为将所述矫正人脸图像和所述至少一个维度的人脸属性所对应的人脸属性标签,输入至所述人脸属性识别模型;通过所述人脸属性识别模型,分别对所述矫正人脸图像和各维度的人脸属性标签进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及各维度的人脸属性标签所对应的人脸属性特征;基于得到的所述矫正人脸特征及人脸属性特征,预测得到包含所述矫正人脸图像所具有的人脸属性的识别结果。在一些实施例中,所述参数更新模块5554,还配置为分别确定所述第一损失函数的权值和所述第二损失函数的权值;基于所述第一损失函数的权值和所述第二损失函数的权值,对所述第一损失函数和所述第二损失函数进行加权求和,得到目标损失函数;基于所述目标损失函数,对所述人脸矫正模型的模型参数进行更新。
在一些实施例中,所述参数更新模块5554,还配置为基于所述预测结果确定所述第一损失函数的值;基于所述人脸图像所具有的人脸属性与所述识别结果之间的差异,确定所述第二损失函数的值;基于所述第一损失函数的值和所述第二损失函数的值,确定所述目标损失函数的值;基于所述目标损失函数的值,对所述人脸矫正模型的模型参数进行更新。
在一些实施例中,所述参数更新模块5554,还配置为当所述目标损失函数的值达到第一阈值时,基于所述目标损失函数确定相应的第一误差信号;从所述判别模型的输出层开始,将所述第一误差信号在所述判别模型及所述人脸矫正模型中反向传播,并在传播的过程中更新所述判别模型及所述人脸矫正模型的模型参数。
在一些实施例中,存储在存储器550的人脸矫正模型的训练装置555中的软件模块还可以包括:训练样本构建模块,配置为获取目标用户在任意姿态下的人脸图像、所述目标用户在标准人脸姿态下的目标人脸图像,以及所述人脸图像所具有的至少一个维度的人脸属性;基于获取的所述人脸图像、所述目标人脸图像及所述人脸图像所具有的人脸属性,构建用于训练所述人脸矫正模型的训练样本。在一些实施例中,存储在存储器550的人脸矫正模型的训练装置555中的软件模块还可以包括:人脸识别模块,配置为通过人脸识别模型,分别对所述矫正人脸图像和标准人脸图像进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征和所述标准人脸图像对应的标准人脸特征,以基于所述矫正人脸特征和所述标准人脸特征构建第三损失函数;相应的,所述参数更新模块5554,还配置为基于所述第一损失函数、所述第二损失函数及所述第三损失函数,对所述人脸矫正模型的模型参数进行更新。
在一些实施例中,所述参数更新模块5554,还配置为获取所述第一损失函数的值、所述第二损失函数的值和所述第三损失函数的值;基于所述第一损失函数的值、所述第二损失函数的值和所述第三损失函数的值,确定所述目标损失函数的值;基于所述目标损失函数的值,对所述人脸矫正模型的模型参数进行更新。
在一些实施例中,所述参数更新模块5554,还配置为当所述目标损失函数的值达到第二阈值时,基于所述目标损失函数确定相应的第二误差信号;从所述判别模型的输出层开始,将所述第二误差信号在所述判别模型及所述人脸矫正模型中反向传播,并在传播的过程中更新所述判别模型及所述人脸矫正模型的模型参数。
在一些实施例中,所述参数更新模块5554,还配置为获取所述矫正人脸特征和所述标准人脸特征之间的距离;基于所述距离,确定所述第三损失函数的值。
需要说明的是,本申请装置实施例的描述,与本申请上述方法实施例的描述是类似的,具有同本申请方法实施例相似的有益效果,因此不做赘述。
本申请实施例还提供了一种人脸矫正方法,参见图13,图13是本申请实施例提供的人脸矫正方法的一个可选的流程示意图,将结合图13示出的步骤进行说明。
步骤1001,服务器获取待矫正人脸图像;
步骤1002,将待矫正人脸图像输入至人脸矫正模型;
需要说明的是,待矫正人脸图像可以是用户上传至服务器的,还可以是由与服务器连接的其他设备发送至服务器的,还可以是与服务器连接的其他设备实时检测得到的,例如与服务器连接的摄像头拍摄得到。服务器在获得待矫正人脸图像后,还对其进行预处理,例如对待矫正人脸图像进行图像切割、去噪、图像增强等处理。然后,服务器将经过预处理的待矫正人脸图像输入至人脸矫正模型中,以使人脸矫正模型对待矫正人脸图像进行人脸姿态的矫正。
步骤1003,通过人脸矫正模型对待矫正人脸图像进行人脸姿态矫正,得到标准人脸姿态的目标矫正人脸图像;其中,人脸矫正模型基于本申请实施例提供的人脸矫正模型的训练方法训练得到。
在实际实施时,服务器利用人脸矫正模型对输入的待矫正人脸图像进行人脸姿态的矫正,得到标准人脸姿态的目标矫正人脸图像。由于人脸矫正模型是基于本申请实施例提供的人脸矫正模型的训练方法训练得到,通过人脸属性识别模型指导由人脸矫正模型和判别模型构成的生成对抗网络的训练,使得训练得到的人脸矫正模型学习到了人脸属性的数据分布,从而使得通过该模型处理得到的目标矫正人脸图像在实现了人脸姿态的转换之后,还能够保留有输入的待矫正人脸图像的人脸属性。
在一些实施例中,人脸矫正模型包括编码层、修正层以及解码层。相应的,图13示出的步骤1003还可以通过如下方式实现。
服务器通过编码层,对待矫正人脸图像进行编码,得到初始编码;通过修正层,基于待矫正人脸图像中人脸姿态与标准人脸姿态的偏差,修正初始编码,得到目标编码;通过解码层,解码目标编码,得到标准人脸姿态的目标矫正人脸图像;其中,编码层的参数、修正层的参数和解码层的参数为,基于判别模型的预测结果所构建的第一损失函数、及人脸属性识别模型的人脸属性识别结果所构建的第二损失函数进行参数更新得到;其中,预测结果为,判别模型对人脸矫正模型输出的矫正人脸图像进行真实性预测得到;人脸属性识别结果为,人脸属性识别模型对人脸矫正模型输出的矫正人脸图像进行人脸属性识别得到。
需要说明的是,人脸矫正模型的编码层用于对输入的待矫正人脸图像进行数值型编码,以编码得到机器可以统计计算的数据形式。本申请实施例中,服务器利用人脸矫正模型的编码层将待矫正人脸图像编码为向量矩阵的形式。其中,服务器基于待矫正人脸图像各像素点的RGB(Red,Green,Blue,红绿蓝三色)值对其进行编码。实际应用中,服务器获得的待矫正人脸图像中除了人脸元素以外,往往还存在其他图像元素,例如人脸所处的背景图像等,在实际实施时,编码层基于待矫正人脸图像的RGB值进行编码后,还进一步进行人脸元素的特征提取,得到仅包含人脸元素的初始编码。而本申请实施例中,编码层的参数基于本申请实施例提供的人脸矫正模型的训练方法得到,在进行特征提取时还保留有待矫正人脸图像的人脸属性。
应当理解的是,服务器通过编码层编码得到的初始编码所表征的图像的人脸姿态 仍然是待矫正人脸图像中的初始的人脸姿态,为了将其转换为标准人脸姿态,还需进一步对其进行处理。接下来,服务器利用修正层对初始编码进行修正。在实际应用中,服务器利用修正层确定初始编码所表征的图像的人脸姿态与标准人脸姿态之间的偏差,基于该偏差对初始编码进行修正,得到目标编码。这里,目标编码所表征的图像的人脸姿态则为标准人脸姿态。
接下来,服务器利用解码层,将目标编码从数值型的特征向量转换为图像,得到目标矫正人脸图像,该目标矫正人脸图像中的人脸姿态即为标准人脸姿态,至此,完成对待矫正人脸图像的人脸矫正。
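上述“编码—修正—解码”的推理流程可以示意性地写成如下形式(基于PyTorch;model假设为训练好的人脸矫正模型,encoder、corrector、decoder等属性名为假设,并非本申请的具体实现):

```python
import torch

@torch.no_grad()
def correct_face(model, image):
    z0 = model.encoder(image)    # 编码层:得到初始编码
    z = model.corrector(z0)      # 修正层:基于姿态偏差修正,得到目标编码
    return model.decoder(z)      # 解码层:输出标准人脸姿态的目标矫正人脸图像
```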
在一些实施例中,编码层的参数、修正层的参数和解码层的参数为,基于判别模型的预测结果所构建的第一损失函数、人脸属性识别模型的人脸属性识别结果所构建的第二损失函数、及人脸识别模型提取的矫正人脸特征和标准人脸特征所构建的第三损失函数进行参数更新得到;其中,预测结果为,判别模型对人脸矫正模型输出的矫正人脸图像进行真实性预测得到;人脸属性识别结果为,人脸属性识别模型对人脸矫正模型输出的矫正人脸图像进行人脸属性识别得到;矫正人脸特征为人脸识别模型对人脸矫正模型输出的矫正人脸图像进行特征提取得到;标准人脸特征为人脸识别模型对标准人脸图像进行特征提取得到。
基于上述的编码层的参数、修正层的参数和解码层的参数,服务器利用人脸矫正模型对待矫正人脸图像进行人脸矫正后得到的目标矫正人脸图像在实现了人脸姿态的跨姿态转换的同时还保留有待矫正人脸图像的人脸属性以及人脸语义维度的信息,具体处理过程参照上述实施例,在此不再赘述。
下面继续说明本申请实施例提供的人脸矫正装置实施为软件模块的示例性结构,参见图14,图14是本申请实施例提供的人脸矫正装置的结构的一个可选的示意图,如图14所示,本申请实施例提供的人脸矫正装置14包括:
获取模块1401,配置为获取待矫正人脸图像;
输入模块1402,配置为将所述待矫正人脸图像输入至人脸矫正模型;
矫正模块1403,配置为通过所述人脸矫正模型对所述待矫正人脸图像进行人脸姿态矫正得到标准人脸姿态的目标矫正人脸图像;其中,所述人脸矫正模型基于本申请实施例提供的人脸矫正模型的训练方法训练得到。
在一些实施例中,上述矫正模块1403,还配置为通过所述编码层,对所述待矫正人脸图像进行编码,得到初始编码;通过所述修正层,基于所述待矫正人脸图像中人脸姿态与标准人脸姿态的偏差,修正所述初始编码,得到目标编码;通过所述解码层,解码所述目标编码,得到标准人脸姿态的目标矫正人脸图像;其中,所述编码层的参数、所述修正层的参数和所述解码层的参数为,基于判别模型的预测结果所构建的第一损失函数、及人脸属性识别模型的人脸属性识别结果所构建的第二损失函数进行参数更新得到;其中,所述预测结果为,所述判别模型对所述人脸矫正模型输出的矫正人脸图像进行真实性预测得到;所述人脸属性识别结果为,所述人脸属性识别模型对所述人脸矫正模型输出的矫正人脸图像进行人脸属性识别得到。
需要说明的是,本申请装置实施例的描述,与本申请上述方法实施例的描述是类似的,具有同本申请方法实施例相似的有益效果,因此不做赘述。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本申请实施例提供的人脸矫正模型的训练方法或者执行本申请实施例提供的人脸矫正方法。
本申请实施例提供一种存储有可执行指令的计算机可读存储介质,其中存储有可执行指令,当可执行指令被处理器执行时,将引起处理器执行本申请实施例提供的人脸矫正模型的训练方法或者执行本申请实施例提供的人脸矫正方法,例如,图7示出的人脸矫正模型的训练方法,或者图13示出的人脸矫正方法。
在一些实施例中,计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、闪存、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
作为示例,可执行指令可以但不一定对应于文件系统中的文件,可以被存储在保存其它程序或数据的文件的一部分中,例如,存储在超文本标记语言(HTML,Hyper Text Markup Language)文档中的一个或多个脚本中,存储在专用于所讨论的程序的单个文件中,或者,存储在多个协同文件(例如,存储一个或多个模块、子程序或代码部分的文件)中。
作为示例,可执行指令可被部署为在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执行。
综上所述,通过本申请实施例能够训练得到在实现人脸的跨姿态矫正的同时不丢失人脸图像的信息的人脸矫正模型。
以上所述,仅为本申请的实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。

Claims (20)

  1. 一种人脸矫正模型的训练方法,所述方法由电子设备执行,所述方法包括:
    通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;其中,所述人脸图像具有至少一个维度的人脸属性;
    通过判别模型对所述矫正人脸图像进行真实性预测,得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于所述预测结果构建第一损失函数;
    通过人脸属性识别模型,对所述矫正人脸图像针对所述至少一个维度的人脸属性进行人脸属性识别,得到包含所述矫正人脸图像所具有的人脸属性的识别结果,并基于所述识别结果构建第二损失函数;
    基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新。
  2. 根据权利要求1所述的方法,其中,所述通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像,包括:
    将任意姿态下的人脸图像输入至所述人脸矫正模型;
    通过所述人脸矫正模型对所述人脸图像进行编码,得到初始图像编码;
    基于所述人脸图像中人脸姿态及所述标准人脸姿态的偏差,修正所述初始图像编码,得到目标图像编码;
    解码所述目标图像编码,得到所述标准人脸姿态的矫正人脸图像。
  3. 根据权利要求1所述的方法,其中,所述通过判别模型对所述矫正人脸图像进行真实性预测,得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果,包括:
    将所述矫正人脸图像和所述目标人脸图像输入至所述判别模型;
    通过判别模型分别对所述矫正人脸图像和目标人脸图像进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及所述目标人脸图像对应的目标人脸特征;
    基于所述矫正人脸特征及所述目标人脸特征,预测得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果。
  4. 根据权利要求1所述的方法,其中,所述通过人脸属性识别模型,对所述矫正人脸图像针对所述至少一个维度的人脸属性进行人脸属性识别,得到包含所述矫正人脸图像所具有的人脸属性的识别结果,包括:
    将所述矫正人脸图像和所述至少一个维度的人脸属性所对应的人脸属性标签,输入至所述人脸属性识别模型;
    通过所述人脸属性识别模型,分别对所述矫正人脸图像和各维度的人脸属性标签进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及各维度的人脸属性标签所对应的人脸属性特征;
    基于得到的所述矫正人脸特征及人脸属性特征,预测得到包含所述矫正人脸图像所具有的人脸属性的识别结果。
  5. 根据权利要求1所述的方法,其中,所述基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新,包括:
    分别确定所述第一损失函数的权值和所述第二损失函数的权值;
    基于所述第一损失函数的权值和所述第二损失函数的权值,对所述第一损失函数和所述第二损失函数进行加权求和,得到目标损失函数;
    基于所述目标损失函数,对所述人脸矫正模型的模型参数进行更新。
  6. 根据权利要求5所述的方法,其中,所述基于所述目标损失函数,对所述人脸矫正模型的模型参数进行更新,包括:
    基于所述预测结果确定所述第一损失函数的值;
    基于所述人脸图像所具有的人脸属性与所述识别结果之间的差异,确定所述第二损失函数的值;
    基于所述第一损失函数的值和所述第二损失函数的值,确定所述目标损失函数的值;
    基于所述目标损失函数的值,对所述人脸矫正模型的模型参数进行更新。
  7. 根据权利要求6所述的方法,其中,所述基于所述目标损失函数的值,对所述人脸矫正模型的模型参数进行更新,包括:
    当所述目标损失函数的值达到第一阈值时,基于所述目标损失函数确定相应的第一误差信号;
    从所述判别模型的输出层开始,将所述第一误差信号在所述判别模型及所述人脸矫正模型中反向传播,并在传播的过程中更新所述判别模型及所述人脸矫正模型的模型参数。
  8. 根据权利要求1所述的方法,其中,所述通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正之前,所述方法还包括:
    获取目标用户在任意姿态下的人脸图像、所述目标用户在标准人脸姿态下的目标人脸图像,以及所述人脸图像所具有的至少一个维度的人脸属性;
    基于获取的所述人脸图像、所述目标人脸图像及所述人脸图像所具有的人脸属性,构建用于训练所述人脸矫正模型的训练样本。
  9. 根据权利要求1-8任一项所述的方法,其中,所述基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新之前,所述方法还包括:
    通过人脸识别模型,分别对所述矫正人脸图像和标准人脸图像进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征和所述标准人脸图像对应的标准人脸特征,以基于所述矫正人脸特征和所述标准人脸特征构建第三损失函数;
    所述基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新,包括:
    基于所述第一损失函数、所述第二损失函数及所述第三损失函数,对所述人脸矫正模型的模型参数进行更新。
  10. 一种人脸矫正方法,所述方法由电子设备执行,所述方法包括:
    获取待矫正人脸图像;
    将所述待矫正人脸图像输入至人脸矫正模型;
    通过所述人脸矫正模型对所述待矫正人脸图像进行人脸姿态矫正,得到标准人脸姿态的目标矫正人脸图像;
    其中,所述人脸矫正模型基于权利要求1-7任一项所述的人脸矫正模型的训练方法训练得到。
  11. 根据权利要求10所述的人脸矫正方法,其中,所述人脸矫正模型包括编码层、修正层以及解码层;所述通过所述人脸矫正模型对所述待矫正人脸图像进行人脸姿态矫正,得到标准人脸姿态的目标矫正人脸图像,包括:
    通过所述编码层,对所述待矫正人脸图像进行编码,得到初始编码;
    通过所述修正层,基于所述待矫正人脸图像中人脸姿态与标准人脸姿态的偏差,修正所述初始编码,得到目标编码;
    通过所述解码层,解码所述目标编码,得到标准人脸姿态的目标矫正人脸图像;
    其中,所述编码层的参数、所述修正层的参数和所述解码层的参数为,基于判别模型的预测结果所构建的第一损失函数、及人脸属性识别模型的人脸属性识别结果所构建的第二损失函数进行参数更新得到;
    其中,所述预测结果为,所述判别模型对所述人脸矫正模型输出的矫正人脸图像进行真实性预测得到;所述人脸属性识别结果为,所述人脸属性识别模型对所述人脸矫正模型输出的矫正人脸图像进行人脸属性识别得到。
  12. 一种人脸矫正模型的训练装置,所述装置包括:
    人脸姿态矫正模块,配置为通过人脸矫正模型对输入的人脸图像进行人脸姿态矫正,得到标准人脸姿态的矫正人脸图像;其中,所述人脸图像具有至少一个维度的人脸属性;
    预测模块,配置为通过判别模型对所述矫正人脸图像进行真实性预测,得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果,并基于所述预测结果构建第一损失函数;
    属性识别模块,配置为通过人脸属性识别模型,对所述矫正人脸图像针对所述至少一个维度的人脸属性进行人脸属性识别,得到包含所述矫正人脸图像所具有的人脸属性的识别结果,并基于所述识别结果构建第二损失函数;
    参数更新模块,配置为基于所述第一损失函数及所述第二损失函数,对所述人脸矫正模型的模型参数进行更新。
  13. 如权利要求12所述的装置,其中,所述人脸姿态矫正模块,还配置为将任意姿态下的人脸图像输入至所述人脸矫正模型;
    通过所述人脸矫正模型对所述人脸图像进行编码,得到初始图像编码;
    基于所述人脸图像中人脸姿态及所述标准人脸姿态的偏差,修正所述初始图像编码,得到目标图像编码;
    解码所述目标图像编码,得到所述标准人脸姿态的矫正人脸图像。
  14. 如权利要求12所述的装置,其中,所述预测模块,还配置为将所述矫正人脸图像和所述目标人脸图像输入至所述判别模型;
    通过判别模型分别对所述矫正人脸图像和目标人脸图像进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及所述目标人脸图像对应的目标人脸特征;
    基于所述矫正人脸特征及所述目标人脸特征,预测得到表征所述矫正人脸图像相较于目标人脸图像的真实性的预测结果。
  15. 如权利要求12所述的装置,其中,所述属性识别模块,还配置为将所述矫正人脸图像和所述至少一个维度的人脸属性所对应的人脸属性标签,输入至所述人脸属性识别模型;
    通过所述人脸属性识别模型,分别对所述矫正人脸图像和各维度的人脸属性标签进行特征提取,得到所述矫正人脸图像对应的矫正人脸特征、及各维度的人脸属性标签所对应的人脸属性特征;
    基于得到的所述矫正人脸特征及人脸属性特征,预测得到包含所述矫正人脸图像所具有的人脸属性的识别结果。
  16. 如权利要求12所述的装置,其中,所述参数更新模块,还配置为分别确定所述第一损失函数的权值和所述第二损失函数的权值;
    基于所述第一损失函数的权值和所述第二损失函数的权值,对所述第一损失函数和所述第二损失函数进行加权求和,得到目标损失函数;
    基于所述目标损失函数,对所述人脸矫正模型的模型参数进行更新。
  17. 如权利要求12所述的装置,其中,所述装置还包括:
    训练样本构建模块,配置为获取目标用户在任意姿态下的人脸图像、所述目标用户在标准人脸姿态下的目标人脸图像,以及所述人脸图像所具有的至少一个维度的人脸属性;
    基于获取的所述人脸图像、所述目标人脸图像及所述人脸图像所具有的人脸属性,构建用于训练所述人脸矫正模型的训练样本。
  18. 一种人脸矫正装置,所述装置包括:
    获取模块,配置为获取待矫正人脸图像;
    输入模块,配置为将所述待矫正人脸图像输入至人脸矫正模型;
    矫正模块,配置为通过所述人脸矫正模型对所述待矫正人脸图像进行人脸姿态矫正,得到标准人脸姿态的目标矫正人脸图像;
    其中,所述人脸矫正模型基于本申请实施例提供的人脸矫正模型的训练方法训练得到。
  19. 一种电子设备,所述电子设备包括:
    存储器,用于存储可执行指令;
    处理器,用于执行所述存储器中存储的可执行指令时,实现权利要求1至11任一项所述的方法。
  20. 一种计算机可读存储介质,存储有可执行指令,用于被处理器执行时,实现权利要求1至11任一项所述的方法。
PCT/CN2021/098646 2020-09-10 2021-06-07 人脸矫正模型的训练方法、装置、电子设备及存储介质 WO2022052530A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010946586.6A CN112164002B (zh) 2020-09-10 2020-09-10 人脸矫正模型的训练方法、装置、电子设备及存储介质
CN202010946586.6 2020-09-10

Publications (1)

Publication Number Publication Date
WO2022052530A1 true WO2022052530A1 (zh) 2022-03-17

Family

ID=73858430

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098646 WO2022052530A1 (zh) 2020-09-10 2021-06-07 人脸矫正模型的训练方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN112164002B (zh)
WO (1) WO2022052530A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164002B (zh) * 2020-09-10 2024-02-09 深圳前海微众银行股份有限公司 人脸矫正模型的训练方法、装置、电子设备及存储介质
CN112967798A (zh) * 2021-03-22 2021-06-15 平安国际智慧城市科技股份有限公司 基于人脸面容的辅诊方法、装置、电子设备及存储介质
CN113592696A (zh) * 2021-08-12 2021-11-02 支付宝(杭州)信息技术有限公司 加密模型训练、图像加密和加密人脸图像识别方法及装置
CN115115552B (zh) * 2022-08-25 2022-11-18 腾讯科技(深圳)有限公司 图像矫正模型训练及图像矫正方法、装置和计算机设备


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10388002B2 (en) * 2017-12-27 2019-08-20 Facebook, Inc. Automatic image correction using machine learning
CN109308450A (zh) * 2018-08-08 2019-02-05 杰创智能科技股份有限公司 一种基于生成对抗网络的脸部变化预测方法
CN109117801A (zh) * 2018-08-20 2019-01-01 深圳壹账通智能科技有限公司 人脸识别的方法、装置、终端及计算机可读存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046707A (zh) * 2018-10-15 2020-04-21 天津大学青岛海洋技术研究院 一种基于面部特征的任意姿态正脸还原网络
CN109284738A (zh) * 2018-10-25 2019-01-29 上海交通大学 不规则人脸矫正方法和系统
CN110363116A (zh) * 2019-06-28 2019-10-22 上海交通大学 基于gld-gan的不规则人脸矫正方法、系统及介质
CN110543846A (zh) * 2019-08-29 2019-12-06 华南理工大学 一种基于生成对抗网络的多姿态人脸图像正面化方法
CN110738161A (zh) * 2019-10-12 2020-01-31 电子科技大学 一种基于改进生成式对抗网络的人脸图像矫正方法
CN111428667A (zh) * 2020-03-31 2020-07-17 天津中科智能识别产业技术研究院有限公司 一种基于解耦表达学习生成对抗网络的人脸图像转正方法
CN112164002A (zh) * 2020-09-10 2021-01-01 深圳前海微众银行股份有限公司 人脸矫正模型的训练方法、装置、电子设备及存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114944002A (zh) * 2022-06-16 2022-08-26 中国科学技术大学 文本描述辅助的姿势感知的人脸表情识别方法
CN114944002B (zh) * 2022-06-16 2024-04-16 中国科学技术大学 文本描述辅助的姿势感知的人脸表情识别方法
CN116167922A (zh) * 2023-04-24 2023-05-26 广州趣丸网络科技有限公司 一种抠图方法、装置、存储介质及计算机设备

Also Published As

Publication number Publication date
CN112164002B (zh) 2024-02-09
CN112164002A (zh) 2021-01-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21865590

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21865590

Country of ref document: EP

Kind code of ref document: A1