CN111860362A - Method and device for generating human face image correction model and correcting human face image - Google Patents


Info

Publication number
CN111860362A
Authority
CN
China
Prior art keywords
face image
face
image
sample
adversarial network
Prior art date
Legal status
Pending
Application number
CN202010720935.2A
Other languages
Chinese (zh)
Inventor
王珂尧
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010720935.2A
Publication of CN111860362A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a method and a device for generating a face image correction model, and a method and a device for correcting a face image, and relates to the technical field of face recognition. The specific implementation scheme is as follows: acquiring a sample set, wherein each sample in the sample set includes a front face image and a side face image of the same person at an arbitrary pose angle; selecting samples from the sample set and performing the following training steps: inputting the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a facial feature loss value; and if it is determined from the facial feature loss value that training of the generative adversarial network is complete, using the trained generative adversarial network as the face image correction model. The embodiment can generate a face image correction model that converts face images at different angles into front face images before expression recognition, thereby improving the accuracy and robustness of expression recognition.

Description

Method and device for generating human face image correction model and correcting human face image
Technical Field
The application relates to the technical field of computers, in particular to the technical field of face recognition.
Background
Most laboratory samples share the limitations that the face squarely faces the camera and the head pose is upright, so pose variation is small. In a real scene, by contrast, facial expressions arise spontaneously and head-pose deviation is large, which increases the difficulty of recognition.
At present, facial expression recognition generally uses traditional methods or a single-model convolutional neural network: a facial expression image after face alignment is taken as input, expression features are extracted by the convolutional neural network or by hand-crafted descriptors, and a classifier then outputs the expression classification result. Robustness is poor when the face pose in a real scene is too large, and misrecognition easily occurs, reducing the accuracy of the algorithm.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a device and a storage medium for generating a face image correction model, and a method, an apparatus, a device and a storage medium for correcting a face image.
According to a first aspect of the present disclosure, there is provided a method for generating a face image correction model, including: acquiring a sample set, wherein each sample in the sample set includes a front face image and a side face image of the same person at an arbitrary pose angle; selecting samples from the sample set, and performing the following training steps: inputting the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a facial feature loss value; and if it is determined from the facial feature loss value that training of the generative adversarial network is complete, using the trained generative adversarial network as the face image correction model.
According to a second aspect of the present disclosure, there is provided a method of correcting a face image, comprising: inputting a head portrait to be recognized into a face detection model to obtain a face image; inputting a face image into a face key point detection model to obtain an aligned face comprising key points; inputting the aligned face into the face image correction model generated by the method in the first aspect, and obtaining a pose-corrected front face image.
According to a third aspect of the present disclosure, there is provided an apparatus for generating a face image correction model, comprising: an acquisition unit configured to acquire a sample set, wherein each sample in the sample set includes a front face image and a side face image of the same person at an arbitrary pose angle; a training unit configured to input the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyze the synthesized image and the corresponding front face image to determine a facial feature loss value; and, if it is determined from the facial feature loss value that training of the generative adversarial network is complete, use the trained generative adversarial network as the face image correction model.
According to a fourth aspect of the present disclosure, there is provided an apparatus for correcting a face image, comprising: the face detection unit is configured to input a head portrait to be recognized into the face detection model to obtain a face image; a key point detection unit configured to input a face image into a face key point detection model to obtain an aligned face including key points; a correction unit configured to input the aligned face into a face image correction model generated using the apparatus according to one of the first aspects, resulting in a pose-corrected frontal face image.
According to a fifth aspect of the present disclosure, there is provided an electronic apparatus, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first and second aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions, characterized in that the computer instructions are for causing a computer to perform the method of any one of the first and second aspects.
According to the technical scheme of the application, a generative adversarial network (GAN) can be used to convert a face image in an arbitrary pose into a facial expression image in the frontal pose, which can greatly improve the accuracy and robustness of large-pose facial expression recognition in complex environments.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating a correction model for a face image according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the method for generating a face image correction model according to the present application;
FIG. 4 is a flow chart of one embodiment of a method for correcting a face image according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating a face image correction model according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for correcting a face image according to the present application;
fig. 7 is a block diagram of an electronic device for implementing the method for generating a face image correction model and the method for correcting a face image according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of generating a face image correction model, an apparatus for generating a face image correction model, a method of correcting a face image, or an apparatus for correcting a face image according to an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a face detection and recognition application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
When the terminals 101, 102 are hardware, an image capturing device may be further mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user 110 may use the image capturing device on the terminal 101, 102 to capture the facial image of himself or another person.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. Wherein, the sample can comprise a front face image of the same person and a side face image of any posture angle. In this way, the user 110 may also select samples from a set of samples stored by the database server 104 via the terminals 101, 102.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send a training result (such as the generated face image correction model) to the terminals 101 and 102. In this way, the user can apply the generated face image correction model to perform face detection.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for generating a face image correction model or the method for correcting a face image provided in the embodiment of the present application is generally performed by the server 105. Accordingly, a device for generating a face image correction model or a device for correcting a face image is also generally provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram 200 of one embodiment of a method of generating a correction model for a face image in accordance with the present application is shown. The method for generating the face image correction model can comprise the following steps:
step 201, a sample set is obtained.
In the present embodiment, the implementation subject (e.g., the server 105 shown in fig. 1) of the method of generating a face image correction model may acquire a sample set in various ways. For example, the executing entity may obtain the existing sample set stored therein from a database server (e.g., database server 104 shown in fig. 1) via a wired connection or a wireless connection. As another example, a user may collect a sample via a terminal (e.g., terminals 101, 102 shown in FIG. 1). In this way, the executing entity may receive samples collected by the terminal and store the samples locally, thereby generating a sample set.
Here, the sample set may include at least one sample, where each sample may include a front face image and a side face image of the same person at an arbitrary pose angle. When training the GAN (Generative Adversarial Network), a facial expression image with a frontal pose angle of 0° and a facial expression image of the same person at an arbitrary pose angle are input, so that the GAN learns to generate the 0° frontal facial expression image from the arbitrary-pose facial expression image while ensuring that the person-identity features and the expression features do not change. Multi-PIE may be used as the GAN training set: taking the frontal face pose angle as 0°, it provides facial expression images at different pose angles (from −90° to 90°). In addition, facial expressions are divided, according to changes in facial muscles, into 7 basic expression types: Angry, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral. Meanwhile, the face is defined to contain 72 key points, denoted (x1, y1) … (x72, y72); the schematic diagram is shown in fig. 3.
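As an illustrative aid (not part of the patent text), the 7-class expression labeling described above can be sketched as a simple mapping; the names `EXPRESSION_CLASSES` and `expression_label` are hypothetical:

```python
# The 7 basic expression classes named above, used to label
# Multi-PIE style training data (the ordering here is illustrative).
EXPRESSION_CLASSES = ["Angry", "Disgust", "Fear", "Happiness",
                      "Sadness", "Surprise", "Neutral"]

def expression_label(name: str) -> int:
    """Map an expression name to its class index for training/evaluation."""
    return EXPRESSION_CLASSES.index(name)
```

Such a fixed mapping keeps class indices consistent between the training labels and the classifier output.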
The samples are generated as follows. Images acquired by the camera at different angles are preprocessed. First, an image containing a human face is obtained, and the face is detected by a detection model to obtain the approximate position region of the face; the detection model may be an existing face detection model capable of locating the face position. Second, face key points are detected in the detected face region by a face key point detection model to obtain the key point coordinate values of the face; the face key point detection model is an existing model that, given the image of the detected face, outputs the 72 face key point coordinates (x1, y1) … (x72, y72). Then, the target face is aligned according to the key point coordinate values; at the same time, the region containing only the face is cropped out by an affine transformation and resized to the same size, for example 224×224, and the face key point coordinates are remapped to new coordinates according to the affine transformation matrix. The front face image and the side face image in a sample both include the face key point coordinates.
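The remapping of key-point coordinates through the affine transformation matrix can be sketched as follows. This is a minimal illustration: in practice a library such as OpenCV would compute and apply the matrix, and the names `remap_keypoints` and `half_scale` are hypothetical.

```python
def remap_keypoints(keypoints, affine):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to a list of
    (x, y) face key points, mirroring how coordinates are remapped after
    the face region is cropped and resized (e.g. to 224x224)."""
    (a, b, tx), (c, d, ty) = affine
    return [(a * x + b * y + tx, c * x + d * y + ty) for (x, y) in keypoints]

# Example: a pure 2x down-scaling, as when a 448x448 crop becomes 224x224.
half_scale = [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0)]
```

Applying the same matrix to the image and to the key points keeps them aligned after cropping and resizing.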
Optionally, the obtained region containing the face image may be subjected to image normalization. In this embodiment, normalization is performed on each pixel of the image in turn and may proceed as follows: subtract 128 from each pixel value and divide by 256, so that each pixel value lies between −0.5 and 0.5. This reduces the complexity of image processing and improves processing speed.
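The normalization step above amounts to a one-line transform; a minimal sketch (the function name is illustrative):

```python
def normalize_pixels(pixels):
    """Normalize 8-bit pixel values as described: subtract 128 and divide
    by 256, mapping the range [0, 255] into roughly [-0.5, 0.5]."""
    return [(p - 128) / 256 for p in pixels]
```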
Optionally, the normalized image may be subjected to random data enhancement. Such as rotation, scaling, cropping, flipping, etc. This may increase the number of training samples, making the trained model more robust.
At step 202, a sample is selected from a sample set.
In this embodiment, the executing entity may select samples from the sample set obtained in step 201 and perform the training steps of steps 203 to 206. The manner of selection and the number of samples are not limited in the present application. For example, at least one sample may be selected randomly, or samples whose face images have better sharpness (i.e., higher resolution) may be selected.
Step 203, inputting the side face image of the selected sample into the generative adversarial network to obtain a synthesized image.
In this embodiment, the generative adversarial network may be a Deep Convolutional Generative Adversarial Network (DCGAN). The generative adversarial network may include a generation network and a discrimination network, where the generation network is configured to perform pose adjustment on an input image and output the adjusted image, and the discrimination network is configured to determine whether an input image was output by the generation network. The generation network may be a convolutional neural network for image processing (for example, various convolutional neural network structures containing convolutional layers, pooling layers, unpooling layers, and deconvolutional layers, which may perform down-sampling and then up-sampling); the discrimination network may be a convolutional neural network (for example, various convolutional neural network structures containing a fully-connected layer, where the fully-connected layer implements the classification function). The discrimination network may also be another model structure capable of classification, such as a Support Vector Machine (SVM). It should be noted that the image output by the generation network can be represented as a matrix of three RGB channels. If the discrimination network determines that the input image was output by the generation network (i.e., comes from the generated data), it may output 1; if it determines that the input image was not output by the generation network (i.e., comes from the real data, namely the front face image), it may output 0. The discrimination network may also output other preset values, not limited to 1 and 0.
Based on a machine learning method, the side face image in a sample is used as the input of the generation network, the image output by the generation network and the front face image in the sample are used as the inputs of the discrimination network, the generation network and the discrimination network are trained, and the trained generative adversarial network is determined as the face image correction model. Specifically, the parameters of either one of the generation network and the discrimination network (referred to as the first network) may be fixed first while the network with unfixed parameters (referred to as the second network) is optimized; then the parameters of the second network are fixed while the first network is improved. The iteration continues until final convergence, at which point the discrimination network cannot distinguish whether an input image was generated by the generation network. At this time, the image produced by the generation network is close to the real front face image, and the discrimination network cannot accurately distinguish real data from generated data (i.e., its accuracy is about 50%), so the generation network at this time can be determined as the face image correction model.
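The stopping intuition above (training has converged when the discriminator's accuracy falls to about 50%) can be sketched as a small check. This is an illustration under stated assumptions: `discriminator_converged` is a hypothetical name, and real scores would come from the trained discrimination network.

```python
def discriminator_converged(real_scores, fake_scores, tol=0.05):
    """Return True when the discriminator can no longer separate real
    front faces (ideal output 1) from generated ones (ideal output 0),
    i.e. its classification accuracy is within `tol` of 50%."""
    correct = (sum(s > 0.5 for s in real_scores)
               + sum(s <= 0.5 for s in fake_scores))
    accuracy = correct / (len(real_scores) + len(fake_scores))
    return abs(accuracy - 0.5) <= tol
```

In a training loop, a check like this would gate when the generation network is frozen and exported as the correction model.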
Step 204, analyzing the synthesized image and the corresponding front face image to determine a facial feature loss value.
In this embodiment, the execution subject may analyze the synthesized image generated from the side face image of a sample together with the front face image of that sample, so that a facial feature loss value can be determined. For example, the key point coordinates in the synthesized image and the key point coordinates in the front face image may be used as parameters and input into a specified loss function, so that the loss value between the two can be calculated.
In this embodiment, the loss function is generally used to measure the degree of disagreement between the predicted value of the model (e.g., the synthesized image) and the actual value (e.g., the front face image). It is a non-negative real-valued function; in general, the smaller the loss, the more robust the model. The loss function may be set according to actual requirements.
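The patent does not fix a particular loss function; as one hedged illustration, a mean squared distance over the 72 key points has the stated properties (non-negative, smaller when the synthesized face matches the front face):

```python
def keypoint_loss(pred_points, true_points):
    """Mean squared Euclidean distance between corresponding key points of
    the synthesized image and the real front face image. Illustrative only:
    the actual loss used by the model may differ."""
    assert len(pred_points) == len(true_points)
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred_points, true_points))
    return total / len(pred_points)
```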
Step 205, if it is determined from the facial feature loss value that training of the generative adversarial network is complete, the trained generative adversarial network is used as the face image correction model.
In this embodiment, if a plurality of samples were selected in step 202, the execution subject may determine that training of the generative adversarial network is complete when the facial feature loss value of each sample reaches the target value. Alternatively, the execution subject may count the proportion of selected samples whose facial feature loss values reach the target value; when this proportion reaches a preset sample ratio (e.g., 95%), it may be determined that training of the generative adversarial network is complete.
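The proportion-based stopping rule just described can be sketched directly (the helper name is hypothetical, and "reaches the target value" is read here as the loss falling to or below the target):

```python
def training_finished(loss_values, target, ratio=0.95):
    """Declare training complete when the fraction of selected samples whose
    facial-feature loss is at or below `target` reaches `ratio` (e.g. 95%)."""
    hits = sum(v <= target for v in loss_values)
    return hits / len(loss_values) >= ratio
```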
Step 206, if it is determined from the facial feature loss value that training of the generative adversarial network is not complete, the relevant parameters of the generative adversarial network are adjusted, and steps 202–206 continue to be executed.
In this embodiment, in response to determining that the facial feature loss value does not reach the target value, the electronic device may adjust the parameters of the generative adversarial network, i.e., update the generation network and/or the discrimination network, and then re-execute the training step using the updated networks. In this way, the parameters of the face image correction model obtained by generative adversarial training are derived from the training samples and can be determined by back-propagation through the discrimination network; training the generative model therefore does not depend on a large number of labeled samples, which reduces labor cost and further improves the flexibility of image processing.
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a face image correction model according to the present embodiment. In the application scenario of fig. 3, a terminal used by a user may have a model training application installed thereon. When a user opens the application and uploads a sample set or a storage path of the sample set, a server providing background support for the application can run a method for generating a face image correction model, and the method comprises the following steps:
first, a sample set may be obtained. The samples in the sample set may include a front face image and a side face image at any pose angle. Thereafter, samples may be selected from the sample set, and the following training steps performed: inputting the side face image of the selected sample to generate a confrontation network to obtain a composite image; analyzing the synthesized image and the corresponding frontal face image to determine a face characteristic loss value; and if the fact that the generation of the confrontation network is finished is determined according to the face characteristic loss value, the generated confrontation network is used as a face image correction model.
At this time, the server may also send prompt information indicating that the model training is completed to the terminal. The prompt message may be a voice and/or text message. In this way, the user can acquire the face image correction model at a preset storage position.
The method provided by the above embodiment of the present disclosure can generate a face image correction model, so that a side face image can be converted into a front face image, improving the accuracy of face recognition. By using a generative adversarial network (GAN), a face image in an arbitrary pose can be converted into a facial expression image in the frontal pose, which can greatly improve the accuracy and robustness of large-pose facial expression recognition in complex environments.
With further reference to fig. 4, a flow 400 of an embodiment of a method of correcting a face image is shown. The process 400 of the method for correcting a face image includes the following steps:
step 401, inputting a head portrait to be recognized into a face detection model to obtain a face image.
In the present embodiment, the execution subject (e.g., the server 105 shown in fig. 1) of the method of correcting a face image may acquire the head portrait of a detection target in various ways; in addition to the face, the head portrait may include other parts such as the neck. For example, the execution subject may obtain a stored image containing a human face from a database server (e.g., database server 104 shown in fig. 1) through a wired or wireless connection. As another example, the execution subject may receive a head portrait captured by a terminal (e.g., terminals 101, 102 shown in fig. 1) or another device.
In the present embodiment, the detection target may be any user, such as the user of the terminal or another person appearing within the image capture range. The head portrait may be a color image and/or a grayscale image, and its format is not limited in this application.
The head portrait to be recognized is detected by a detection model to obtain the approximate position region of the face; the detection model is an existing face detection model capable of detecting the face position.
Step 402, inputting the face image into the face key point detection model to obtain an aligned face including key points.
In this embodiment, face key points are detected in the detected face region by a face key point detection model to obtain the key point coordinate values of the face. The face key point detection model is an existing model that, given the image of the detected face, outputs the 72 face key point coordinates (x1, y1) … (x72, y72). Then, the target face is aligned according to the key point coordinate values; at the same time, the region containing only the face is cropped out by an affine transformation and resized to the same size, e.g., 224×224, and the face key point coordinates are remapped to new coordinates according to the affine transformation matrix.
Step 403, inputting the aligned face into the face image correction model to obtain a pose-corrected front face image.
In this embodiment, the aligned face is input into the GAN trained in steps 201–206, and a front face image whose pose has been corrected by the GAN is obtained.
Optionally, the front face image may be input into a pre-trained expression detection model to output expression information. The expression detection model is a convolutional neural network used to extract expression features; it contains 8 convolutional layers and 5 max-pooling layers, and finally a fully-connected layer outputs the 7-class facial expression information. According to changes in facial muscles, facial expressions are divided into 7 basic types: Angry, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral.
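As a quick sanity check on the architecture above (assuming the convolutional layers preserve spatial size, which the patent does not state explicitly), the 5 stride-2 max-pooling layers reduce a 224×224 input to 7×7 before the fully-connected layer:

```python
def feature_map_size(input_size=224, num_pools=5):
    """Spatial size after `num_pools` stride-2 max-pooling layers, assuming
    convolutions are size-preserving: 224 -> 112 -> 56 -> 28 -> 14 -> 7."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size
```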
Optionally, the method may further acquire scene information of the currently acquired head image, and then perform service quality evaluation according to the expression information and the scene information. Auxiliary functions such as information recommendation can also be realized according to the service quality evaluation result. The improvement in precision benefits the service quality of many applications. For example, in advertisement placement, it helps recommend search results that better match user needs and place advertisements accurately; in distance education, recognizing students' emotions helps improve teaching content and the quality of distance education; in a driver monitoring scene, the driver's emotion can be recognized and corresponding prompts given, thereby ensuring the driver's safety.
It should be noted that the method for correcting a face image according to the present embodiment can be used to test the face image correction model generated according to the above embodiments, and the face image correction model can then be continuously optimized according to the test results. The method may also be a practical application method of the face image correction model generated by the above embodiments. Correcting face images with the face image correction model generated by the embodiments is beneficial to improving face recognition performance: for example, more faces can be detected, and the detected face information is more accurate.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a face image correction model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a face image correction model according to the present embodiment includes: an acquisition unit 501 and a training unit 502. The acquisition unit 501 is configured to acquire a sample set, where each sample in the sample set includes a front face image and a side face image, at an arbitrary pose angle, of the same person; the training unit 502 is configured to select samples from the sample set and to perform the following training steps: inputting the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that the generative adversarial network has been trained, taking the generative adversarial network as the face image correction model.
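The face feature loss computation and the training-complete decision described above can be sketched as follows. This is a minimal illustration assuming the feature embeddings come from some pretrained face recognition network (not reproduced here); the cosine-based loss and the threshold value are illustrative assumptions, not the embodiment's specified loss:

```python
import numpy as np

def face_feature_loss(feat_synth, feat_front):
    """Face feature loss between the embedding of the synthesized
    image and the embedding of the real front face image, here taken
    as 1 - cosine similarity (an assumed formulation)."""
    a = np.asarray(feat_synth, dtype=float)
    b = np.asarray(feat_front, dtype=float)
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

def training_finished(loss_value, threshold=0.05):
    """Decide from the loss value whether GAN training is complete;
    otherwise parameters would be adjusted and training continued."""
    return loss_value < threshold
```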
In some optional implementations of this embodiment, the apparatus 500 further includes an adjusting unit 503 configured to: if it is determined according to the face feature loss value that the generative adversarial network has not been trained completely, adjust relevant parameters of the generative adversarial network, reselect a sample from the sample set, take the adjusted generative adversarial network as the generative adversarial network, and continue to perform the training steps.
In some optional implementations of the present embodiment, the front face image and the side face image in each sample in the sample set are subjected to normalization processing.
In some optional implementations of this embodiment, the samples in the sample set are subjected to random data augmentation by at least one of: rotation, scaling, cropping, and flipping.
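A minimal sketch of the random data augmentation named above (rotation, scaling, cropping, flipping). Probabilities and parameter ranges are illustrative assumptions, and rotation is restricted to 90-degree steps to keep the sketch dependency-free; a real pipeline would typically use small-angle rotations with interpolation:

```python
import numpy as np

def random_augment(img, rng=None):
    """Apply random combinations of the four augmentations:
    horizontal flip, rotation, and a random crop that also
    serves as a simple scale change."""
    rng = np.random.default_rng(rng)
    out = np.asarray(img)
    if rng.random() < 0.5:                 # flip left-right
        out = out[:, ::-1]
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by k*90 deg
    h, w = out.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)    # crop to 90% of each side
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return out[top:top + ch, left:left + cw]
```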
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for correcting a face image, which corresponds to the method embodiment shown in fig. 4, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for correcting a face image of the present embodiment includes: a face detection unit 601, a key point detection unit 602, and a correction unit 603. The face detection unit 601 is configured to input a head image to be recognized into a face detection model to obtain a face image; the key point detection unit 602 is configured to input the face image into a face key point detection model to obtain an aligned face including key points; the correction unit 603 is configured to input the aligned face into a face image correction model generated using the apparatus described above, to obtain a pose-corrected front face image.
In some optional implementations of this embodiment, the apparatus 600 further includes an expression detection unit 604 configured to: and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
In some optional implementations of this embodiment, the apparatus 600 further comprises an evaluation unit (not shown in the drawings) configured to: acquiring scene information; and performing service quality evaluation according to the expression information and the scene information.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device for the method of generating a face image correction model and the method of correcting a face image according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for generating a correction model of a facial image and the method for correcting a facial image provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of generating a correction model for a face image and the method of correcting a face image provided by the present application.
The memory 702 serves as a non-transitory computer readable storage medium and may be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for generating a face image correction model and the method for correcting a face image in the embodiments of the present application (e.g., the acquisition unit 501, the training unit 502, and the adjusting unit 503 shown in fig. 5). The processor 701 executes various functional applications of the server and data processing, namely, the method of generating a face image correction model and the method of correcting a face image in the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 702.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to use of the electronic device for the method of generating a face image correction model and the method of correcting a face image, and the like. Further, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, and such remote memory may be connected through a network to the electronic device for the method of generating a face image correction model and the method of correcting a face image. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of generating a face image correction model and the method of correcting a face image may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of generating a face image correction model and the method of correcting a face image; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, and the like. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the present application, a generative adversarial network (GAN) can be used to convert a face image in an arbitrary pose into a face image in the frontal pose, which can greatly improve the accuracy and robustness of large-pose facial expression recognition in complex environments.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method of generating a correction model for a face image, comprising:
acquiring a sample set, wherein each sample in the sample set comprises a front face image of the same person and a side face image at an arbitrary pose angle;
selecting samples from the sample set, and performing the following training steps: inputting the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that the generative adversarial network has been trained, taking the generative adversarial network as the face image correction model.
2. The method of claim 1, wherein the method further comprises:
and if it is determined according to the face feature loss value that the generative adversarial network has not been trained completely, adjusting relevant parameters of the generative adversarial network, reselecting a sample from the sample set, taking the adjusted generative adversarial network as the generative adversarial network, and continuing to perform the training steps.
3. The method of claim 1, wherein the front face image and the side face image in each sample in the sample set are normalized.
4. The method of any one of claims 1-3, wherein the samples in the sample set are subjected to random data augmentation by at least one of:
rotation, scaling, cropping and flipping.
5. A method of correcting a face image, comprising:
inputting a head image to be recognized into a face detection model to obtain a face image;
inputting the face image into a face key point detection model to obtain an aligned face comprising key points;
inputting the aligned face into a face image correction model generated by the method according to any one of claims 1 to 4, and obtaining a posture-corrected front face image.
6. The method of claim 5, wherein the method further comprises:
and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
7. The method of claim 6, wherein the method further comprises:
acquiring scene information;
and performing service quality evaluation according to the expression information and the scene information.
8. An apparatus for generating a correction model of a face image, comprising:
an acquisition unit configured to acquire a sample set, wherein each sample in the sample set includes a front face image of the same person and a side face image of an arbitrary pose angle;
a training unit configured to select samples from the sample set and to perform the following training steps: inputting the side face image of a selected sample into a generative adversarial network to obtain a synthesized image; analyzing the synthesized image and the corresponding front face image to determine a face feature loss value; and if it is determined according to the face feature loss value that the generative adversarial network has been trained, taking the generative adversarial network as the face image correction model.
9. The apparatus of claim 8, wherein the apparatus further comprises an adjustment unit configured to:
and if it is determined according to the face feature loss value that the generative adversarial network has not been trained completely, adjust relevant parameters of the generative adversarial network, reselect a sample from the sample set, take the adjusted generative adversarial network as the generative adversarial network, and continue to perform the training steps.
10. The apparatus of claim 8, wherein the front and side face images in each sample in the sample set are normalized.
11. The apparatus of any one of claims 8-10, wherein the samples in the sample set are subjected to random data augmentation by at least one of:
rotation, scaling, cropping and flipping.
12. An apparatus for correcting a face image, comprising:
a face detection unit configured to input a head image to be recognized into a face detection model to obtain a face image;
a key point detection unit configured to input the face image into a face key point detection model to obtain an aligned face including key points;
a correction unit configured to input the aligned face into a face image correction model generated using the apparatus according to any one of claims 8 to 11, to obtain a pose-corrected front face image.
13. The apparatus of claim 12, further comprising an expression detection unit configured to:
and inputting the front face image into a pre-trained expression detection model, and outputting expression information.
14. The apparatus of claim 13, wherein the apparatus further comprises an evaluation unit configured to:
acquiring scene information;
and performing service quality evaluation according to the expression information and the scene information.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202010720935.2A 2020-07-24 2020-07-24 Method and device for generating human face image correction model and correcting human face image Pending CN111860362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010720935.2A CN111860362A (en) 2020-07-24 2020-07-24 Method and device for generating human face image correction model and correcting human face image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010720935.2A CN111860362A (en) 2020-07-24 2020-07-24 Method and device for generating human face image correction model and correcting human face image

Publications (1)

Publication Number Publication Date
CN111860362A true CN111860362A (en) 2020-10-30

Family

ID=72951135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010720935.2A Pending CN111860362A (en) 2020-07-24 2020-07-24 Method and device for generating human face image correction model and correcting human face image

Country Status (1)

Country Link
CN (1) CN111860362A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183492A (en) * 2020-11-05 2021-01-05 厦门市美亚柏科信息股份有限公司 Face model precision correction method, device and storage medium
CN112330781A (en) * 2020-11-24 2021-02-05 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating model and generating human face animation
CN112395979A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Image-based health state identification method, device, equipment and storage medium
CN113033476A (en) * 2021-04-19 2021-06-25 清华大学 Cross-posture face recognition method
CN113033288A (en) * 2021-01-29 2021-06-25 浙江大学 Method for generating front face picture based on side face picture for generating confrontation network
CN113191197A (en) * 2021-04-01 2021-07-30 杭州海康威视系统技术有限公司 Image restoration method and device
CN113255788A (en) * 2021-05-31 2021-08-13 西安电子科技大学 Method and system for generating confrontation network face correction based on two-stage mask guidance
CN113343931A (en) * 2021-07-05 2021-09-03 Oppo广东移动通信有限公司 Training method for generating countermeasure network, image sight correction method and device
CN113781540A (en) * 2021-09-15 2021-12-10 京东鲲鹏(江苏)科技有限公司 Network generation method and device, electronic equipment and computer readable medium
CN113822790A (en) * 2021-06-03 2021-12-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446609A (en) * 2018-03-02 2018-08-24 南京邮电大学 A kind of multi-angle human facial expression recognition method based on generation confrontation network
CN110222668A (en) * 2019-06-17 2019-09-10 苏州大学 Based on the multi-pose human facial expression recognition method for generating confrontation network
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446609A (en) * 2018-03-02 2018-08-24 南京邮电大学 A kind of multi-angle human facial expression recognition method based on generation confrontation network
CN110222668A (en) * 2019-06-17 2019-09-10 苏州大学 Based on the multi-pose human facial expression recognition method for generating confrontation network
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RUI HUANG et al.: "Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis", ARXIV:1704.04086V2, 4 August 2017 (2017-08-04), pages 1 - 11 *
FAN Xue, YANG Hongbo, LI Yong: "Face Image Rectification Algorithm Based on Deep Learning", Information & Communication, 31 July 2017 (2017-07-31), pages 5 - 9 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183492A (en) * 2020-11-05 2021-01-05 厦门市美亚柏科信息股份有限公司 Face model precision correction method, device and storage medium
CN112183492B (en) * 2020-11-05 2022-07-15 厦门市美亚柏科信息股份有限公司 Face model precision correction method, device and storage medium
CN112395979A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Image-based health state identification method, device, equipment and storage medium
CN112395979B (en) * 2020-11-17 2024-05-10 平安科技(深圳)有限公司 Image-based health state identification method, device, equipment and storage medium
CN112330781A (en) * 2020-11-24 2021-02-05 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating model and generating human face animation
CN113033288B (en) * 2021-01-29 2022-06-24 浙江大学 Method for generating front face picture based on side face picture for generating confrontation network
CN113033288A (en) * 2021-01-29 2021-06-25 浙江大学 Method for generating front face picture based on side face picture for generating confrontation network
CN113191197A (en) * 2021-04-01 2021-07-30 杭州海康威视系统技术有限公司 Image restoration method and device
CN113191197B (en) * 2021-04-01 2024-02-09 杭州海康威视系统技术有限公司 Image restoration method and device
CN113033476B (en) * 2021-04-19 2022-08-12 清华大学 Cross-posture face recognition method
CN113033476A (en) * 2021-04-19 2021-06-25 清华大学 Cross-posture face recognition method
CN113255788A (en) * 2021-05-31 2021-08-13 西安电子科技大学 Method and system for generating confrontation network face correction based on two-stage mask guidance
CN113822790A (en) * 2021-06-03 2021-12-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
WO2022252372A1 (en) * 2021-06-03 2022-12-08 腾讯云计算(北京)有限责任公司 Image processing method, apparatus and device, and computer-readable storage medium
CN113822790B (en) * 2021-06-03 2023-04-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
CN113343931A (en) * 2021-07-05 2021-09-03 Oppo广东移动通信有限公司 Training method for generating countermeasure network, image sight correction method and device
CN113781540A (en) * 2021-09-15 2021-12-10 京东鲲鹏(江苏)科技有限公司 Network generation method and device, electronic equipment and computer readable medium

Similar Documents

Publication Publication Date Title
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN109313490B (en) Eye gaze tracking using neural networks
EP3467707B1 (en) System and method for deep learning based hand gesture recognition in first person view
US20190392587A1 (en) System for predicting articulated object feature location
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN111783620A (en) Expression recognition method, device, equipment and storage medium
CN111783622A (en) Method, device and equipment for recognizing facial expressions and computer-readable storage medium
WO2020081239A1 (en) Speaking classification using audio-visual data
CN112966742A (en) Model training method, target detection method and device and electronic equipment
US20220051004A1 (en) Image processing method, apparatus, device and storage medium
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
CN111783621A (en) Method, device, equipment and storage medium for facial expression recognition and model training
CN113221771B (en) Living body face recognition method, device, apparatus, storage medium and program product
WO2023098912A1 (en) Image processing method and apparatus, storage medium, and electronic device
EP3879454A2 (en) Method and apparatus for evaluating image relative definition, device and medium
CN111539897A (en) Method and apparatus for generating image conversion model
CN112561879A (en) Ambiguity evaluation model training method, image ambiguity evaluation method and device
CN111523467A (en) Face tracking method and device
CN112328088B (en) Image presentation method and device
CN112016523A (en) Cross-modal face recognition method, device, equipment and storage medium
US11556183B1 (en) Techniques for generating data for an intelligent gesture detector
CN112200169B (en) Method, apparatus, device and storage medium for training a model
CN115393488A (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN113128436B (en) Method and device for detecting key points
CN115205806A (en) Method and device for generating target detection model and automatic driving vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination