CN110796029B - Face correction and model training method and device, electronic equipment and storage medium - Google Patents

Face correction and model training method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN110796029B
CN110796029B
Authority
CN
China
Prior art keywords
face
image
detected
detection model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910963567.1A
Other languages
Chinese (zh)
Other versions
CN110796029A (en)
Inventor
Yang Fan (杨帆)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910963567.1A
Publication of CN110796029A
Application granted
Publication of CN110796029B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/242Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to a face correction and model training method and apparatus, an electronic device, and a storage medium, in the technical field of image processing, and aims to solve the problem of the large amount of calculation required for face detection. The method includes the following steps: acquiring an image to be detected containing a human face, and inputting the image to be detected into a face detection model; acquiring the face class feature output by the face detection model, where the face class feature represents the orientation angle range of the face in the image to be detected; determining, according to the face class feature, a rotation direction and a rotation angle for correcting the direction of the face in the image to be detected; and correcting the direction of the face in the image to be detected according to the rotation direction and the rotation angle. Because the output feature of the face detection model used in the method is a face class feature, the orientation angle range of the face in the image to be detected can be determined directly, the image does not need to be rotated during detection, and the calculation is simpler.

Description

Face correction and model training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for face correction and model training, an electronic device, and a storage medium.
Background
The face detection is the first step of all face-related tasks, and face recognition, face attribute analysis, face key point extraction and the like can be performed only after the face is detected.
For example, a short-video platform holds a large amount of face data that, in addition to forward-facing faces, also includes oblique, sideways, and inverted faces; in short, faces in any of the 360 degrees of orientation. In an actual image to be detected there are therefore sometimes several faces whose directions differ; the image shown in fig. 1, for instance, contains faces in eight directions. To correctly recognize the face target in each direction, a commonly used face correction method at the present stage is to train a model to detect forward faces, rotate the image into four directions, detect the image in each rotation direction with the model, and finally fuse the detection results of the four directions to remove repeatedly detected faces.
However, this means the image must be rotated and then detected four times; detection algorithms are generally computationally expensive, so performing four detections consumes a large amount of computing resources.
Disclosure of Invention
The present disclosure provides a method and an apparatus for face correction and model training, an electronic device, and a storage medium, so as to solve at least the problem in the related art that face detection requires rotating the image and detecting it four times, which involves a large amount of calculation. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a face correction method, including:
acquiring an image to be detected containing a human face, and inputting the image to be detected into a human face detection model;
acquiring face class characteristics output by the face detection model, wherein the face class characteristics are used for representing the orientation angle range of the face in the image to be detected;
determining a rotation direction and a rotation angle for performing direction correction on the face in the image to be detected according to the face type characteristics;
and correcting the direction of the face in the image to be detected according to the rotating direction and the rotating angle.
In an optional implementation manner, the step of determining, according to the face class characteristics, a rotation direction and a rotation angle for performing direction correction on the face in the image to be detected includes:
determining the rotation angle according to the orientation angle range corresponding to the face class characteristic; and
and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which a training image used in training the face detection model rotates.
In an optional implementation manner, the step of performing direction correction on the face in the image to be detected according to the rotation direction and the rotation angle includes:
and rotating the image to be detected by the rotation angle according to the rotation direction, and performing direction correction on the face in the image to be detected.
According to a second aspect of the embodiments of the present disclosure, there is provided a face detection model training method, including:
rotating a plurality of identical training images containing human faces, wherein the rotation angles of the training images are different;
determining face class characteristics corresponding to a face orientation angle range to which the face orientation belongs in the rotated training image;
and taking the rotated training image as an input feature of a face detection model, taking the determined face class feature as an output feature of the face detection model, and training the face detection model.
In an alternative embodiment, there is at least a set number of rotated training images per range of orientation angles.
According to a third aspect of the embodiments of the present disclosure, there is provided a face correction apparatus, including:
a first acquisition unit configured to perform acquisition of an image to be detected including a human face and input the image to be detected to a human face detection model;
a second obtaining unit configured to perform obtaining a face class feature output by the face detection model, wherein the face class feature is used for representing an orientation angle range of a face in the image to be detected;
a first determination unit configured to determine a rotation direction and a rotation angle for performing direction correction on a face in the image to be detected according to the face class characteristics;
and the human face correction unit is configured to perform direction correction on the human face in the image to be detected according to the rotation direction and the rotation angle.
In an optional implementation manner, the first determining unit is specifically configured to perform:
determining the rotation angle according to the orientation angle range corresponding to the face class characteristics; and
and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which a training image used in training the face detection model rotates.
In an alternative embodiment, the face correction unit is specifically configured to perform:
and rotating the image to be detected by the rotation angle according to the rotation direction, and performing direction correction on the face in the image to be detected.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a face detection model training apparatus, including:
an image rotation unit configured to perform rotation of a plurality of identical training images containing faces, each of which is rotated by a different angle;
a second determining unit configured to perform determining a face class feature corresponding to an orientation angle range to which the orientation of the face in the rotated training image belongs;
a model training unit configured to perform training of the face detection model using the rotated training image as an input feature of the face detection model and using the determined face class feature as an output feature of the face detection model.
In an alternative embodiment, there is at least a set number of rotated training images per range of orientation angles.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the face correction method according to any one of the first aspect of the embodiments of the present disclosure.
According to a sixth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the face detection model training method according to any one of the second aspect of the embodiments of the present disclosure.
According to a seventh aspect of the embodiments of the present disclosure, there is provided a non-volatile readable storage medium, where instructions when executed by a processor of an electronic device enable the electronic device to perform the face correction method according to any one of the first aspect of the embodiments of the present disclosure.
According to an eighth aspect of the embodiments of the present disclosure, there is provided a non-transitory readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the face detection model training method according to any one of the second aspects of the embodiments of the present disclosure.
According to a ninth aspect of the embodiments of the present disclosure, there is provided a computer program product which, when run on an electronic device, causes the electronic device to perform the method of the first aspect of the embodiments of the present disclosure and any of its optional implementations.
According to a tenth aspect of the embodiments of the present disclosure, there is provided a computer program product which, when run on an electronic device, causes the electronic device to perform the method of the second aspect of the embodiments of the present disclosure and any of its optional implementations.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
Because convolutional neural networks do not have rotation invariance, in order to detect faces in all directions in an image, the face orientation is divided into several orientation angle ranges, each corresponding to one category. The face detection model used in the embodiments of the present disclosure directly outputs the face class feature, so detecting the image to be detected with the model directly determines the orientation of the face, and the direction of the face in the image to be detected can then be corrected according to the determined rotation direction and rotation angle.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an image to be detected in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method for training a face detection model in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of face correction according to an exemplary embodiment;
FIG. 4A is a block diagram illustrating a face detection model in accordance with an exemplary embodiment;
FIG. 4B is a diagram illustrating a categorization by angular range according to an exemplary embodiment;
FIG. 5A is a schematic diagram illustrating a first type of image rotation, according to an exemplary embodiment;
FIG. 5B is a schematic diagram illustrating a second type of image rotation, according to an exemplary embodiment;
FIG. 5C is a schematic diagram illustrating a third image rotation according to an exemplary embodiment;
FIG. 5D is a schematic diagram illustrating a first type of image correction, according to an exemplary embodiment;
FIG. 5E is a schematic diagram illustrating a second type of image correction, according to an exemplary embodiment;
FIG. 5F is a schematic illustration of a third type of image correction shown in accordance with an exemplary embodiment;
FIG. 6 is a flow diagram illustrating a complete method of face detection in accordance with an exemplary embodiment;
FIG. 7 is a flow diagram illustrating a complete method of face correction according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating a face correction device according to an exemplary embodiment;
FIG. 9 is a block diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating a face detection model training apparatus in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Some of the words that appear in the text are explained below:
1. The term "and/or" in the embodiments of the present disclosure describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
2. The term "electronic equipment" in the embodiments of the present disclosure refers to equipment composed of electronic components such as integrated circuits, transistors, and electron tubes that functions by applying electronic technology (including software); it includes electronic computers, robots controlled by electronic computers, numerical control or program control systems, and the like.
3. The term "positive sample" in the embodiments of the present disclosure refers to a target object to be detected by a task, and the target object in face detection is a face in an image, such as faces of different ethnic ages, faces of different expressions, faces wearing different decorations, and the like, and may be a face of any scale.
4. The term "negative sample" in the embodiments of the present disclosure refers to the background in which the object to be detected is located. In face detection, for example, more than 99% of the image is non-face background, since a face may appear in different environments such as a street or a room.
5. The term "RetinaFace" in the disclosed embodiments is a powerful single-stage face detector that performs pixel-wise face localization on various face scales using joint supervised and self-supervised multitask learning.
6. The term "TensorFlow" in the embodiments of the present disclosure refers to a system that feeds complex data structures into artificial neural networks for analysis and processing. It can be used in deep-learning fields such as speech recognition and image recognition, and can run on devices ranging from a single smartphone to thousands of data center servers.
The application scenarios described in the embodiments of the present disclosure are intended to illustrate the technical solutions more clearly and do not limit them; as a person of ordinary skill in the art knows, the technical solutions provided in the embodiments of the present disclosure are equally applicable to similar technical problems arising in new application scenarios. In the description of the present disclosure, unless otherwise indicated, "plurality" means two or more.
Face detection is a key link in face recognition. Face recognition refers to computer technology for identity identification by analyzing and comparing visual face features; it is currently a focus of artificial intelligence and pattern recognition and is widely applied in fields such as national security, military security, identity recognition, bank and customs monitoring, access control systems, and video conferencing.
The face detection means that any given image is searched by adopting a certain strategy to determine whether the image contains a face, and if so, the position, size and posture of the face are returned.
In an actual image to be detected there are sometimes several faces whose directions often differ: some faces are frontal, while others are deflected by some angle towards the left or right shoulder. Correctly identifying face targets in all directions is therefore a popular and challenging algorithmic problem in the field of computer vision, and face detection is one of the most important business scenarios for artificial intelligence algorithms. In order to improve their algorithms and demonstrate their technical strength, many AI (Artificial Intelligence) companies validate their algorithms on open data sets. Among the many data sets, WIDER FACE is the largest-scale and most difficult publicly available face detection data set in the industry; it contains some 20,000 face images, including about 180,000 face instances, most of which are forward-facing.
At present, the most effective face detection algorithms are essentially based on deep learning, and deep-learning models are not rotation invariant to images, so faces in different rotation directions require the model to learn a specific pattern for each.
Fig. 2 is a flowchart illustrating a face detection model training method according to an exemplary embodiment, as shown in fig. 2, including the following steps.
In step S21, rotating a plurality of identical training images containing human faces, wherein each training image is rotated by a different angle;
In step S22, the face class feature corresponding to the orientation angle range to which the face orientation in the rotated training image belongs is determined;
in step S23, the rotated training image is used as an input feature of a face detection model, and the determined face type feature is used as an output feature of the face detection model, so as to train the face detection model.
Fig. 3 is a flow chart illustrating a method of face correction according to an exemplary embodiment, as shown in fig. 3, including the following steps.
In step S31, an image to be detected including a human face is acquired, and the image to be detected is input to a human face detection model;
the face detection model is obtained by training through the training method shown in fig. 2.
In step S32, a face class feature output by the face detection model is obtained, where the face class feature is used to represent an orientation angle range of a face in the image to be detected;
In step S33, a rotation direction and a rotation angle for performing direction correction on the face in the image to be detected are determined according to the face class feature;
in step S34, the direction of the face in the image to be detected is corrected according to the rotation direction and the rotation angle.
With this scheme, since convolutional neural networks have no rotation invariance, in order to detect faces in all directions in the image, the face orientation is divided into several orientation angle ranges, each corresponding to one category. Before the face detection model is trained, a plurality of identical training images containing faces are rotated by different angles; the rotated training images are used as input features during training and the face class features as output features, so that the trained model can directly output the orientation angle range of a face.
In an embodiment of the present disclosure, the face detection model is a neural network model.
In the disclosed embodiment, there are at least a set number of rotated training images in each orientation angle range.
For example, suppose there are 4 orientation angle ranges: -45 to 45 degrees, 45 to 135 degrees, 135 to 225 degrees, and 225 to 315 degrees. If the set number is 100, each of the 4 orientation angle ranges is guaranteed to have 100 rotated training images whose face orientations fall within the corresponding range; in the -45 to 45 degree range, for instance, there may be 10 faces oriented at 10 degrees, 20 at 20 degrees, 30 at 30 degrees, 30 at 0 degrees, 10 at -10 degrees, and so on.
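The mapping from a face orientation angle to one of the four ranges above can be sketched as follows (illustrative only; the half-open boundary convention is an assumption not stated in the text):

```python
def orientation_class(angle_deg):
    """Map an orientation angle in degrees to a class index:
    0: [-45, 45), 1: [45, 135), 2: [135, 225), 3: [225, 315)."""
    return int(((angle_deg + 45) % 360) // 90)

# The example faces at 10, 20, 30, 0, and -10 degrees all map to class 0,
# the -45..45 degree range.
```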
The output characteristic of the face detection model is a face type characteristic, and the face type characteristic represents the orientation angle range of the face in the image to be detected, so that the orientation of the face can be determined according to the face type characteristic, namely the rotation direction and the rotation angle for performing direction correction on the face in the image to be detected are determined, and further the direction correction can be performed on the face in the image to be detected.
In the embodiment of the present disclosure, in order to achieve a good detection effect, RetinaFace is used as the base model; its basic framework is shown in fig. 4A.
The backbone network is ResNet-50 (a Residual Neural Network); block1 (p2), block2 (p3), block3 (p4), and block4 (p5) are selected, and a convolution layer (c6/p6) with a stride (step length) of 2 is added on top of block4 to serve as an additional output feature layer.
In the embodiment of the disclosure, blocks 1 to 4 are 4 blocks included in the face detection model, and correspond to 4 feature layers, which are p2 to p5, where c2 to c5 represent convolutional layers, and c6/p6 is both a convolutional layer and a feature layer.
Optionally, assuming that the number of anchors corresponding to each feature layer is 3, the size of the anchors is selected according to the scale of each feature layer, as shown in the following table.
[Table 1 is provided as an image in the original publication; it lists the anchor sizes selected for each feature layer, e.g. 16, 20.16, and 25.40 for the P2 layer.]
Table 1 Anchor settings table
Wherein, the setting of the anchors in the table 1 is selected according to the dimension of each feature layer, and the larger the dimension of the feature layer is, the larger the anchors are.
An anchor represents the range of face scales detected by a feature layer; the larger the anchor, the larger the detectable face scale. For example, anchor = 16 means a face of about 16 pixels can be detected, i.e., the detectable face scale is about 16×16 and can be covered by a 16×16 anchor box; anchor = 25 means a face of about 25 pixels can be detected, i.e., the detectable face scale is about 25×25.
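The P2 anchors quoted later in the text (16, 20.16, 25.40) are consistent with the common pattern of three anchor scales per octave, base × 2^(i/3). Under that assumption (the pattern is inferred, not stated in the patent), the anchors for any layer could be generated as:

```python
def anchor_sizes(base, scales_per_octave=3):
    """Generate anchor sizes as base * 2**(i/scales_per_octave).

    With base 16 this reproduces the P2 anchors 16, 20.16, 25.40 quoted
    in the text; the same rule for other layers is an assumption.
    """
    return [round(base * 2 ** (i / scales_per_octave), 2)
            for i in range(scales_per_octave)]

p2_anchors = anchor_sizes(16)  # [16.0, 20.16, 25.4]
```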
In the embodiment of the present disclosure, the number of negative samples corresponding to each feature layer in the face detection model is determined by the layer's scale and approximately equals that scale: the larger the scale (and hence the area) of the feature layer, the more negative samples it is assigned. Taking the P2 feature layer as an example, its scale is 160×160 = 25600, so the P2 feature layer corresponds to 25600 negative samples, as shown in the following table:
Feature layer    Scale      Number of negative samples
P2               160×160    25600
P3               80×80      6400
P4               40×40      1600
P5               20×20      400
P6               10×10      100
Table 2 Feature layer scale table
Table 2 is an example of determining the number of negative samples corresponding to each feature layer according to the scale of each feature layer, which is enumerated in the embodiment of the present disclosure.
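The rule of Table 2, where the negative-sample count equals the feature layer's spatial scale, can be written directly (an illustrative sketch using the layer shapes from Table 2):

```python
# Feature-layer spatial shapes from Table 2 (for a 640x640 input).
feature_layer_scales = {"P2": (160, 160), "P3": (80, 80), "P4": (40, 40),
                        "P5": (20, 20), "P6": (10, 10)}

# Negative-sample budget per layer: height * width of the feature map.
negatives = {name: h * w for name, (h, w) in feature_layer_scales.items()}
```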
Each anchor corresponds to one classifier. Taking the P2 feature layer as an example, its three anchors (16, 20.16, 25.40) correspond to 3 classifiers, each with 25600 negative samples; the 3 anchors on the P3 feature layer each correspond to 6400 negative samples, and so on: 1600 on P4, 400 on P5, and 100 on P6. The ratio of the number of negative samples between the feature layers is therefore 25600:6400:1600:400:100, i.e., 256:64:16:4:1.
In the embodiment of the present disclosure, a negative sample is a non-face region in an image. Taking the 3 classifiers corresponding to the three anchors (16, 20.16, 25.40) on the P2 feature layer as an example, each classifier has 25600 negative samples: for the classifier corresponding to anchor = 16 these are 25600 non-face regions with a scale of about 16×16, and for the classifier corresponding to anchor = 20.16 they are 25600 non-face regions with a scale of about 20.16×20.16. Multiple negative samples can come from the same training image; the number of negative samples counts non-face regions, not training images.
In the embodiment of the present disclosure, in order to ensure that the classifier corresponding to each anchor is well trained, the positive and negative samples of each classifier need to be relatively balanced, so the ratio of the numbers of positive samples across the feature layers should also be 256:64:16:4:1.
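As a concrete illustration of the per-layer quotas above, the counts can be generated programmatically. This is a minimal sketch; the helper name and the factor-of-four-per-level rule are inferred from the numbers in Table 2, not stated by the embodiment:

```python
# Each pyramid level halves the spatial resolution, so the number of
# sampled negative (and positive) examples shrinks by a factor of 4
# per level: 25600, 6400, 1600, 400, 100 -- the 256:64:16:4:1 ratio.
def negative_sample_counts(base=25600, levels=("P2", "P3", "P4", "P5", "P6")):
    """Return the per-level sample quota, base // 4**i for level i."""
    return {name: base // (4 ** i) for i, name in enumerate(levels)}

counts = negative_sample_counts()
# {"P2": 25600, "P3": 6400, "P4": 1600, "P5": 400, "P6": 100}
```

The same helper reproduces the positive-sample ratio by passing the positive base count instead of 25600.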
In an alternative embodiment, for 640 × 640 images, the range of positive sample pixels detected by the P2 feature layer is 16 to 28, the range of positive sample pixels detected by the P3 feature layer is 28 to 57, the range of positive sample pixels detected by the P4 feature layer is 57 to 114, the range of positive sample pixels detected by the P5 feature layer is 114 to 227, and the range of positive sample pixels detected by the P6 feature layer is 227 to 640.
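The pixel ranges above amount to a lookup from face size to feature layer. A hedged sketch follows; the helper name is illustrative, and the handling of faces exactly at the 28/57/114/227-pixel boundaries is an assumption:

```python
def feature_layer_for_face(size_px):
    """Pick the pyramid level whose detection range covers a face of
    size_px pixels in a 640x640 image (ranges from the embodiment)."""
    ranges = [("P2", 16, 28), ("P3", 28, 57), ("P4", 57, 114),
              ("P5", 114, 227), ("P6", 227, 640)]
    for name, lo, hi in ranges:
        if lo <= size_px < hi:
            return name
    # faces at or beyond 640 px still fall to the coarsest layer
    return "P6" if size_px >= 640 else None

layer = feature_layer_for_face(20)  # a 20-px face falls to "P2"
```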
It should be noted that the above anchor and negative-sample settings enable the face detection model to realize multi-scale face detection, i.e. to detect faces of different scales: the sizes of the faces detected by different feature layers differ, with the P6 feature layer detecting the largest faces and the P2 feature layer the smallest.
For the above network structure and anchor settings, data enhancement is first performed before the face detection model is trained: an open-source face data set is collected, a plurality of identical training images containing faces are obtained by copying images in the data set, and these identical training images are rotated such that the rotation angle of each training image is different.
In an alternative embodiment, in order to detect faces in any direction over 360 degrees, the 360 degrees are divided into four directions, as shown in fig. 4B. The four directions correspond to four orientation angle ranges, the faces in each orientation angle range represent one category, and faces in different directions are treated as different "objects": a forward face corresponds to category-one data, a 90-degree face to category-two data, a 180-degree face to category-three data, and a 270-degree face to category-four data.
Before the face detection model is trained, a plurality of identical face images obtained after copying are rotated by different angles, for example, the face images are rotated by 90 degrees, 180 degrees and 270 degrees clockwise respectively, and then the change of the corresponding face position is calculated, so that face data of 360 degrees can be obtained.
For example, in the face images shown in figs. 5A to 5C, the face on the right side of fig. 5A is obtained by rotating the face on the left side of fig. 5A clockwise by 90 degrees; similarly, the face on the right side of fig. 5B is obtained by rotating the face on the left side clockwise by 180 degrees, and the face on the right side of fig. 5C by rotating the face on the left side clockwise by 270 degrees. In this manner, the rotation direction of the training images used in training the face detection model is clockwise.
Optionally, the face on the right side in fig. 5C may also be obtained by rotating the face on the left side in fig. 5C by 90 degrees in the counterclockwise direction, the face on the right side in fig. 5B may also be obtained by rotating the face on the left side in fig. 5B by 180 degrees in the counterclockwise direction, and the face on the right side in fig. 5A may also be obtained by rotating the face on the left side in fig. 5A by 270 degrees in the counterclockwise direction.
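The copy-and-rotate augmentation described above, together with the recalculation of the face position, can be sketched as follows. The box-transform formula is a reconstruction from the geometry of a clockwise quarter turn; the disclosure does not give it explicitly:

```python
import numpy as np

def rotate_image_and_box(img, box):
    """Rotate an image 90 degrees clockwise and update a face box.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates. Under a
    clockwise quarter turn of an image of height h, a point (x, y)
    maps to (h - y, x); taking min/max over the corners restores the
    corner ordering of the axis-aligned box.
    """
    h, w = img.shape[:2]
    rotated = np.rot90(img, k=-1)  # k=-1 -> 90 degrees clockwise
    x0, y0, x1, y1 = box
    new_box = (h - y1, x0, h - y0, x1)
    return rotated, new_box

img = np.zeros((100, 200, 3), dtype=np.uint8)   # 100 high, 200 wide
rot, box = rotate_image_and_box(img, (10, 20, 50, 60))
# rotated image is 200 high and 100 wide; box becomes (40, 10, 80, 50)
```

Applying the helper twice and three times yields the 180-degree and 270-degree variants, giving the four orientation classes.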
In an alternative embodiment, the rotated training images are used as input features of the face detection model and the determined face class features as its output features. The face detection model is trained with the TensorFlow framework, using a cosine (cos) function as the learning-rate schedule and, as loss functions, a categorical cross-entropy loss for classification and an L2 (squared) loss for position regression with a weight ratio of 1:1; training can be stopped when the loss no longer decreases.
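The cosine learning-rate variation mentioned here can be sketched in plain Python. The base rate, step count, and minimum rate are assumptions, since the embodiment only names the cos function:

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Cosine learning-rate schedule: decays from base_lr to min_lr
    following half a cosine period over total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

lr_start = cosine_lr(0, 1000)     # base_lr at the first step
lr_mid = cosine_lr(500, 1000)     # half of base_lr halfway through
lr_end = cosine_lr(1000, 1000)    # min_lr at the end
```

In a TensorFlow training loop this would feed the optimizer each step; Keras also ships an equivalent built-in schedule.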
In the embodiment of the present disclosure, the trained face detection model may detect the orientation of the face in the image or directly correct the face in the image.
In an optional implementation manner, the output feature of the face detection model is a face class feature, when the face detection model is used, an image to be detected may be input to the trained face detection model, and a face is further corrected according to the face class feature output by the face detection model.
In an optional implementation manner, the face is further corrected according to the face class features output by the face detection model as follows:
determining a rotation angle for performing direction correction on the face in the image to be detected according to the orientation angle range corresponding to the face class characteristics; and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which the training image used in the training of the face detection model is rotated.
For example, in the process of training the face detection model, when a plurality of identical training images containing faces are rotated clockwise, the target direction is clockwise, and thus the rotation direction in the correction is the opposite direction of the target direction, that is, the counterclockwise direction; on the contrary, if the training images containing faces are rotated counterclockwise in the training process of the face detection model, the target direction is counterclockwise, and thus the rotation direction in the correction is the opposite direction of the target direction, that is, clockwise.
When the rotation angle for face correction is determined: category one corresponds to the orientation angle range of -45 (i.e. 315) to 45 degrees, the rotation angle determined from this range is 0 degrees, and no correction is needed; category two corresponds to the orientation angle range of 45 to 135 degrees, the rotation angle is 90 degrees, and the face is rotated 90 degrees counterclockwise during correction; category three corresponds to the orientation angle range of 135 to 225 degrees, the rotation angle is 180 degrees, and the face is rotated 180 degrees counterclockwise during correction; category four corresponds to the orientation angle range of 225 to 315 degrees, the rotation angle is 270 degrees, and the face is rotated 270 degrees counterclockwise during correction.
Taking the target direction as the clockwise direction as an example, during face detection: if the face class features indicate category one, it can be determined that the face in the image to be detected faces within the range of -45 to 45 degrees clockwise, i.e. it is a forward face; the determined rotation angle is 0 degrees, so no correction is needed.
If the face class features of the detection result indicate category two, it can be determined that the face in the image to be detected faces within the range of 45 to 135 degrees clockwise, and correction of this face image is completed by rotating it 90 degrees counterclockwise.
If the face class features of the detection result indicate category three, it can be determined that the face in the image to be detected faces within the range of 135 to 225 degrees clockwise, and correction is completed by rotating the face image 180 degrees counterclockwise.
If the face class features of the detection result indicate category four, it can be determined that the face in the image to be detected faces within the range of 225 to 315 degrees clockwise, and correction is completed by rotating the face image 270 degrees counterclockwise.
As shown in fig. 5D, the face orientation is 90 degrees clockwise and belongs to the orientation angle range corresponding to category two, so the face is rotated 90 degrees counterclockwise when face correction is performed; similarly, as shown in fig. 5E, the face orientation is 180 degrees clockwise and belongs to the orientation angle range corresponding to category three, so the face is rotated 180 degrees counterclockwise; as shown in fig. 5F, the face orientation is 270 degrees clockwise and belongs to the orientation angle range corresponding to category four, so the face is rotated 270 degrees counterclockwise when face correction is performed.
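The category-to-angle mapping enumerated above reduces to multiplying by 90 degrees; a minimal sketch (the function name is illustrative, not from the disclosure):

```python
def correction_angle(face_class):
    """Counterclockwise rotation (degrees) that corrects a face of the
    given class, under the four 90-degree orientation ranges:
    class 1: -45..45 (forward), class 2: 45..135,
    class 3: 135..225, class 4: 225..315."""
    if face_class not in (1, 2, 3, 4):
        raise ValueError("face class must be 1..4")
    return (face_class - 1) * 90

angles = [correction_angle(c) for c in (1, 2, 3, 4)]  # 0, 90, 180, 270
```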
The embodiment completes omnidirectional face detection through one model, can directly judge the face orientation, and provides a good preprocessing mode for subsequent face-related tasks.
It should be noted that the correction method recited in the above embodiment is only an example, as is the method of dividing the orientation angle ranges. Optionally, when performing face correction, the face may also be rotated counterclockwise by its exact orientation angle; for example, if the orientation of the face is 135 degrees clockwise, the face may be rotated counterclockwise by 135 degrees during correction.
In another case, the face detection model directly outputs a corrected face image, that is, face position features, where the face position features are determined after performing direction correction on the face in the image to be detected according to the face category features corresponding to the image to be detected. In this manner, the face class features serve only as an intermediate result when the face detection model performs face correction.
Fig. 6 is a flowchart illustrating a complete method for training a face detection model according to an exemplary embodiment, which specifically includes the following steps:
s600, collecting the open-source face detection data set WIDER FACE;
s601, copying the face images in the face detection data set multiple times;
s602, rotating the copied multiple identical training images containing the human faces clockwise by 90 degrees, 180 degrees and 270 degrees respectively to perform data enhancement;
s603, calculating the position change of the face before and after rotation, and classifying the rotated training images according to the calculation result, wherein a forward face corresponds to category-one data, a 90-degree face to category-two data, a 180-degree face to category-three data, and a 270-degree face to category-four data;
s604, training the face detection model with the TensorFlow framework using the four classes of data prepared in step S603;
and S605, stopping training when the loss function is reduced to be not changed any more.
The face detection model in step S604 may have a structure as shown in fig. 4A.
It should be noted that, with the model trained in step S604, a face in any direction can be detected: if the detection result is category one, the face can be determined to be a forward face; if the detection result is category two, the face can be determined to face within the 90-degree clockwise range, and rotating it 90 degrees counterclockwise corrects the face.
Fig. 7 is a flowchart illustrating a complete method for face correction according to an exemplary embodiment, which specifically includes the following steps:
s700, acquiring an image to be detected containing a human face;
s701, inputting an image to be detected into a trained face detection model;
s702, determining the face type characteristics of the image to be detected through a face detection model;
s703, determining a rotation angle for performing direction correction on the face in the image to be detected according to the orientation angle range corresponding to the face class characteristics, and taking the opposite direction of a target direction as a rotation direction, wherein the target direction is the direction in which a training image used when a face detection model is trained rotates;
and S704, rotating the image to be detected by the rotation angle according to the rotation direction, so as to perform direction correction on the face in the image to be detected.
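Steps S702 to S704 can be sketched end-to-end, with the trained model elided and only the correction step shown; `np.rot90` stands in for the counterclockwise rotation, which is an assumption, since the disclosure does not name an implementation:

```python
import numpy as np

def correct_face_image(img, face_class):
    """Given the face class predicted in S702 (1..4), rotate the image
    counterclockwise by the matching multiple of 90 degrees (S703/S704).
    Class 1 (forward face) needs no rotation."""
    if face_class not in (1, 2, 3, 4):
        raise ValueError("face class must be 1..4")
    k = face_class - 1  # number of counterclockwise quarter turns
    return np.rot90(img, k=k)

img = np.arange(6).reshape(2, 3)
corrected = correct_face_image(img, 2)  # one CCW quarter turn
```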
Fig. 8 is a block diagram illustrating a face correction apparatus according to an exemplary embodiment. Referring to fig. 8, the apparatus includes a first acquisition unit 800, a second acquisition unit 801, a first determination unit 802, and a face correction unit 803.
A first acquisition unit 800 configured to perform acquisition of an image to be detected including a human face and input the image to be detected to a human face detection model;
a second obtaining unit 801 configured to perform obtaining face class features output by the face detection model for the detected face in the image to be detected, wherein the face class features are used for representing the orientation angle range of the face in the image to be detected;
a first determining unit 802 configured to determine a rotation direction and a rotation angle for performing direction correction on a face in the image to be detected according to the face class characteristics;
a face correction unit 803 configured to perform direction correction on the face in the image to be detected according to the rotation direction and the rotation angle.
In an optional implementation manner, the first determining unit 802 is specifically configured to perform:
determining the rotation angle according to the orientation angle range corresponding to the face class characteristics; and
and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which a training image used in training the face detection model rotates.
In an alternative embodiment, the face correction unit 803 is specifically configured to perform:
and rotating the image to be detected by the rotation angle according to the rotation direction, and performing direction correction on the face in the image to be detected.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device 900 according to an example embodiment, the apparatus comprising:
a processor 910;
a memory 920 for storing instructions executable by the processor 910;
wherein the processor 910 is configured to execute the instructions to implement the face correction method in any of the embodiments of the present disclosure.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 920 comprising instructions, executable by the processor 910 of the electronic device 900 to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiments of the present disclosure further provide a computer program product, which, when run on an electronic device, causes the electronic device to execute any one of the above-mentioned face correction methods or any one of the methods that may be involved in implementing any one of the face correction methods in the embodiments of the present disclosure.
FIG. 10 is a block diagram illustrating a face detection model training apparatus according to an exemplary embodiment. Referring to fig. 10, the apparatus includes an image rotation unit 1000, a second determination unit 1001, and a model training unit 1002:
the image rotating unit 1000 is configured to perform rotation on a plurality of identical training images containing human faces, wherein the rotation angle of each training image is different;
the second determining unit 1001 is configured to determine a face class feature corresponding to an orientation angle range to which the orientation of the face in the rotated training image belongs;
the model training unit 1002 is configured to perform training on the face detection model by using the rotated training image as an input feature of the face detection model and using the determined face class feature as an output feature of the face detection model.
In an alternative embodiment, there is at least a set number of rotated training images per range of orientation angles.
Fig. 11 is a block diagram illustrating an electronic device 1100 according to an example embodiment, the apparatus comprising:
a processor 1110;
a memory 1120 for storing instructions executable by the processor 1110;
wherein the processor 1110 is configured to execute the instructions to implement the face detection model training method according to any one of the embodiments of the present disclosure.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 1120 comprising instructions, executable by the processor 1110 of the electronic device 1100 to perform the method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiment of the present disclosure further provides a computer program product, which, when running on an electronic device, enables the electronic device to execute any one of the above methods for training a face detection model or any one of methods for training a face detection model, which may be involved in implementing the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A face correction method, comprising:
acquiring an image to be detected containing a human face, inputting the image to be detected into a human face detection model, and detecting the human face in the image to be detected based on a characteristic layer in the human face detection model; the face detection model comprises a feature layer for detecting faces with different scales and is used for realizing the detection of the faces with different scales;
acquiring face class characteristics output by the face detection model and aiming at the detected face in the image to be detected, wherein the face class characteristics are used for representing the orientation angle range of the face in the image to be detected;
determining a rotation direction and a rotation angle for performing direction correction on the face in the image to be detected according to the face class characteristics;
and correcting the direction of the detected face in the image to be detected according to the rotating direction and the rotating angle.
2. The method according to claim 1, wherein the step of determining a rotation direction and a rotation angle for performing direction correction on the detected face in the image to be detected according to the face classification features comprises:
determining the rotation angle according to the orientation angle range corresponding to the face class characteristics; and
and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which a training image used in training the face detection model rotates.
3. The method for correcting human face according to claim 2, wherein the step of correcting direction of the detected human face in the image to be detected according to the rotation direction and the rotation angle comprises:
and rotating the image to be detected by the rotation angle according to the rotation direction, and performing direction correction on the detected face in the image to be detected.
4. A face detection model training method is characterized by comprising the following steps:
rotating a plurality of identical training images containing human faces, wherein the rotation angles of the training images are different;
determining face class characteristics corresponding to a face orientation angle range to which the face orientation belongs in the rotated training image;
and taking the rotated training image as an input feature of a face detection model, taking the determined face class feature as an output feature of the face detection model, training the face detection model, and obtaining the trained face detection model, wherein the face detection model comprises a feature layer for detecting faces with different scales, and is used for carrying out direction correction on the detected face in the image to be detected based on the feature layer in the face detection model after detecting the face in the image to be detected.
5. The training method of the face detection model according to claim 4, wherein there are at least a set number of rotated training images in each orientation angle range.
6. A face correction apparatus, comprising:
the image processing device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is configured to acquire an image to be detected containing a human face, input the image to be detected into a human face detection model and detect the human face in the image to be detected based on a feature layer in the human face detection model; the face detection model comprises a feature layer for detecting faces with different scales and is used for realizing the detection of the faces with different scales;
a second obtaining unit configured to perform obtaining a face class feature output by the face detection model and aiming at the detected face in the image to be detected, wherein the face class feature is used for representing an orientation angle range of the face in the image to be detected;
a first determination unit configured to determine a rotation direction and a rotation angle for performing direction correction on a face in the image to be detected according to the face class characteristics;
and the human face correction unit is configured to perform direction correction on the detected human face in the image to be detected according to the rotation direction and the rotation angle.
7. The face correction device according to claim 6, characterized in that the first determination unit is specifically configured to perform:
determining the rotation angle according to the orientation angle range corresponding to the face class characteristics; and
and taking the opposite direction of the target direction as the rotation direction, wherein the target direction is the direction in which a training image used in training the face detection model rotates.
8. The face correction apparatus according to claim 6, characterized in that the face correction unit is specifically configured to perform:
and rotating the image to be detected by the rotation angle according to the rotation direction, and performing direction correction on the detected face in the image to be detected.
9. A face detection model training device is characterized by comprising:
an image rotation unit configured to perform rotation of a plurality of identical training images containing faces, each of which is rotated by a different angle;
a second determining unit configured to perform determining a face class feature corresponding to an orientation angle range to which the orientation of the face in the rotated training image belongs;
and the model training unit is configured to execute the operation of taking the rotated training image as the input characteristic of the face detection model and taking the determined face class characteristic as the output characteristic of the face detection model, and then the face detection model is trained to obtain the trained face detection model, wherein the face detection model comprises a characteristic layer for detecting faces with different scales, and the direction of the face in the image to be detected is corrected after the face in the image to be detected is detected based on the characteristic layer in the face detection model.
10. The training apparatus for face detection model according to claim 9, wherein there are at least a predetermined number of rotated training images in each orientation angle range.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the face correction method of any one of claims 1 to 3.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the face detection model training method as claimed in claim 4 or claim 5.
13. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the face correction method of any one of claims 1 to 3.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a face detection model training method as claimed in claim 4 or claim 5.
CN201910963567.1A 2019-10-11 2019-10-11 Face correction and model training method and device, electronic equipment and storage medium Active CN110796029B (en)

Publications (2)

Publication Number Publication Date
CN110796029A CN110796029A (en) 2020-02-14
CN110796029B true CN110796029B (en) 2022-11-11

Family

ID=69440228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910963567.1A Active CN110796029B (en) 2019-10-11 2019-10-11 Face correction and model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110796029B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950567B (en) * 2020-08-18 2024-04-09 创新奇智(成都)科技有限公司 Extractor training method and device, electronic equipment and storage medium
CN112633221B (en) * 2020-12-30 2024-08-09 深圳市捷顺科技实业股份有限公司 Face direction detection method and related device
CN112733849B (en) * 2021-01-11 2024-07-16 浙江智慧视频安防创新中心有限公司 Model training method, image rotation angle correction method and device
CN113542692A (en) * 2021-07-19 2021-10-22 临沂边锋自动化设备有限公司 Face recognition system and method based on monitoring video
CN115941859A (en) * 2021-08-16 2023-04-07 广州视源电子科技股份有限公司 Image direction adjusting method and device, storage medium and electronic equipment
CN114882489B (en) * 2022-07-07 2022-12-16 浙江智慧视频安防创新中心有限公司 Method, device, equipment and medium for horizontally correcting rotating license plate

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778684A (en) * 2017-01-12 2017-05-31 易视腾科技股份有限公司 deep neural network training method and face identification method
CN108038474A (en) * 2017-12-28 2018-05-15 深圳云天励飞技术有限公司 Method for detecting human face, the training method of convolutional neural networks parameter, device and medium
CN108090470A (en) * 2018-01-10 2018-05-29 浙江大华技术股份有限公司 A kind of face alignment method and device
JP2018101212A (en) * 2016-12-19 2018-06-28 矢崎総業株式会社 On-vehicle device and method for calculating degree of face directed to front side
CN109002769A (en) * 2018-06-22 2018-12-14 深源恒际科技有限公司 A kind of ox face alignment schemes and system based on deep neural network
CN109993137A (en) * 2019-04-09 2019-07-09 安徽大学 A kind of fast face antidote based on convolutional neural networks
CN110163114A (en) * 2019-04-25 2019-08-23 厦门瑞为信息技术有限公司 A kind of facial angle and face method for analyzing ambiguity, system and computer equipment
CN110188667A (en) * 2019-05-28 2019-08-30 复旦大学 It is a kind of based on tripartite fight generate network face ajust method
CN110276274A (en) * 2019-05-31 2019-09-24 东南大学 A kind of depth characteristic spatial attitude face identification method of multitask

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271515B (en) * 2007-03-21 2014-03-19 株式会社理光 Image detection device capable of recognizing multi-angle objective
US8396263B2 (en) * 2008-12-30 2013-03-12 Nokia Corporation Method, apparatus and computer program product for providing face pose estimation
CN102147851B (en) * 2010-02-08 2014-06-04 株式会社理光 Device and method for judging specific object in multi-angles
CN105760836A (en) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method based on deep learning and system thereof and photographing terminal
US10347218B2 (en) * 2016-07-12 2019-07-09 Qualcomm Incorporated Multiple orientation detection
CN106874861A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of face antidote and system
CN107368797A (en) * 2017-07-06 2017-11-21 湖南中云飞华信息技术有限公司 The parallel method for detecting human face of multi-angle, device and terminal device
CN107506702B (en) * 2017-08-08 2020-09-11 江西高创保安服务技术有限公司 Multi-angle-based face recognition model training and testing system and method
CN108256490A (en) * 2018-01-25 2018-07-06 上海交通大学 Method for detecting human face and listen to the teacher rate appraisal procedure, system based on Face datection
CN108563997B (en) * 2018-03-16 2021-10-12 新智认知数据服务有限公司 Method and device for establishing face detection model and face recognition
CN108596089A (en) * 2018-04-24 2018-09-28 北京达佳互联信息技术有限公司 Human face posture detection method, device, computer equipment and storage medium
CN109145765B (en) * 2018-07-27 2021-01-15 华南理工大学 Face detection method and device, computer equipment and storage medium
CN109800648B (en) * 2018-12-18 2021-09-28 北京英索科技发展有限公司 Face detection and recognition method and device based on face key point correction
CN109948559A (en) * 2019-03-25 2019-06-28 厦门美图之家科技有限公司 Face detection method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018101212A (en) * 2016-12-19 2018-06-28 矢崎総業株式会社 On-vehicle device and method for calculating the degree to which a face is directed toward the front
CN106778684A (en) * 2017-01-12 2017-05-31 易视腾科技股份有限公司 Deep neural network training method and face recognition method
CN108038474A (en) * 2017-12-28 2018-05-15 深圳云天励飞技术有限公司 Face detection method, training method for convolutional neural network parameters, device, and medium
CN108090470A (en) * 2018-01-10 2018-05-29 浙江大华技术股份有限公司 A face alignment method and device
CN109002769A (en) * 2018-06-22 2018-12-14 深源恒际科技有限公司 A cattle face alignment method and system based on a deep neural network
CN109993137A (en) * 2019-04-09 2019-07-09 安徽大学 A fast face correction method based on convolutional neural networks
CN110163114A (en) * 2019-04-25 2019-08-23 厦门瑞为信息技术有限公司 A face angle and face blur analysis method, system, and computer device
CN110188667A (en) * 2019-05-28 2019-08-30 复旦大学 A face alignment method based on a three-party adversarial generative network
CN110276274A (en) * 2019-05-31 2019-09-24 东南大学 A multi-task face recognition method based on deep-feature spatial pose

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Yoshihiro Shima et al.; "Detecting Orientation of In-Plane Rotated Face Images Based on Category Classification by Deep Learning"; Proc. of the 2017 IEEE Region 10 Conference (TENCON); Nov. 8, 2017; pp. 127-132 *
Miranti Indar Mandasari et al.; "Score Calibration in Face Recognition"; Wiley; Dec. 1, 2014; pp. 1-23 *
Wang Liping et al.; "Multi-angle Face Detection Based on a Two-stage Neural Network"; Computer Engineering and Applications; Sep. 21, 2004; pp. 46-47, 92 *
Zhang Heng et al.; "Facial Landmark Detection Based on Cascaded Convolutional Networks"; Journal of Nanjing University (Natural Science Edition); Jun. 27, 2019; Vol. 39, No. 3; pp. 104-110 *
Zhang Hongming et al.; "In-plane Rotated Face Detection Based on a Skin Color Model, Neural Networks, and a Face Structure Model"; Chinese Journal of Computers; Nov. 12, 2002; Vol. 25, No. 11; pp. 1250-1256 *

Also Published As

Publication number Publication date
CN110796029A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN110796029B (en) Face correction and model training method and device, electronic equipment and storage medium
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
US10991074B2 (en) Transforming source domain images into target domain images
CN111738231B (en) Target object detection method and device, computer equipment and storage medium
Kluger et al. Deep learning for vanishing point detection using an inverse gnomonic projection
CN111241989A (en) Image recognition method and device and electronic equipment
CN109584300A (en) A method and device for determining a vehicle head orientation angle
CN108960412B (en) Image recognition method, device and computer readable storage medium
Hu et al. Towards requirements specification for machine-learned perception based on human performance
Buoncompagni et al. Saliency-based keypoint selection for fast object detection and matching
Li et al. Visual slam in dynamic scenes based on object tracking and static points detection
CN104239843A (en) Positioning method and device for face feature points
Chidananda et al. Entropy-cum-Hough-transform-based ear detection using ellipsoid particle swarm optimization
CN115937596A (en) Target detection method, training method and device of model thereof, and storage medium
Madessa et al. A deep learning approach for specular highlight removal from transmissive materials
Warif et al. A comprehensive evaluation procedure for copy-move forgery detection methods: results from a systematic review
CN112560880A (en) Object classification method, object classification apparatus, and computer-readable storage medium
Kumar et al. Pose image generation for video content creation using controlled human pose image generation GAN
Berger et al. Visual tracking with vg-ram weightless neural networks
Vacavant A novel definition of robustness for image processing algorithms
CN115147469A (en) Registration method, device, equipment and storage medium
Pasqualino et al. Unsupervised multi-camera domain adaptation for object detection in cultural sites
CN115375618A (en) Defect detection and training method and device, storage medium and equipment
Tang et al. Learning Hough regression models via bridge partial least squares for object detection
CN110705479A (en) Model training method, target recognition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant