CN111079659A - Face feature point positioning method - Google Patents
- Publication number
- CN111079659A CN111079659A CN201911316378.1A CN201911316378A CN111079659A CN 111079659 A CN111079659 A CN 111079659A CN 201911316378 A CN201911316378 A CN 201911316378A CN 111079659 A CN111079659 A CN 111079659A
- Authority
- CN
- China
- Prior art keywords
- face
- neural network
- picture
- coordinates
- taking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/448—Execution paradigms, e.g. implementations of programming paradigms
- G06F9/4482—Procedural
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a face feature point positioning method comprising the following steps: (1) designing a neural network for detecting face feature points; (2) taking a number of face pictures and their corresponding feature point coordinates as training data and preprocessing them; (3) training the neural network with the preprocessed training data; (4) converting the format of the trained neural network with a neural network deployment framework; (5) packaging the format-converted neural network into a function interface using C++ code; (6) calling the function interface packaged in step (5) from a mobile application to detect the face in an image and compute the coordinates of its feature points. The invention addresses the problems that the existing Dlib face feature point detection algorithm is slow and insufficiently accurate on mobile devices and produces unstable, jittery results on video streams.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a face feature point positioning method.
Background
With the development of computer hardware and internet big-data technology, training deep neural networks has become practical, while the mobile internet has created a thriving market for mobile applications. Mobile face-makeup application software is popular with users. Existing makeup applications depend on accurate positioning of face feature points: from the located feature points, the coordinates, shapes and sizes of the facial features can be obtained, so that the software can achieve a makeup effect by adjusting brightness and color.
A C++ open-source machine learning library named Dlib implements a face feature point detection algorithm. Dlib uses a regression-tree-based face alignment algorithm that makes the face landmark positions regress step by step from the current estimate to the true positions by building gradient-boosted decision trees (GBDT). Each leaf node of each tree stores a residual regression amount; when an input falls into a node, that residual is added to the input to achieve regression, and finally all residuals are superposed to locate the face feature points. However, the existing Dlib face feature point detection algorithm is slow and insufficiently accurate on mobile devices, and since Dlib supports detection of only 68 feature points, it suffers from noticeable inter-frame jitter when processing video streams.
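For illustration only, the cascaded residual regression behind Dlib's GBDT-style alignment can be sketched as follows. The regressors here are stand-ins (a fixed fraction of the remaining error), not real learned trees; all names and parameters are illustrative assumptions, not taken from Dlib or the patent.

```python
# Toy sketch of cascaded residual regression: each stage predicts a
# residual that moves the current shape estimate toward the true shape.
# A real implementation evaluates a learned tree ensemble per stage;
# here the "learned" residual is faked as a fraction of the remaining error.
def align(initial_shape, true_shape, num_stages=10, step=0.5):
    shape = list(initial_shape)
    for _ in range(num_stages):
        residual = [step * (t - s) for s, t in zip(shape, true_shape)]
        shape = [s + r for s, r in zip(shape, residual)]  # superpose residual
    return shape

# After 10 stages the estimate has converged close to the target.
refined = align([0.0, 0.0], [10.0, 20.0])
```

Each stage shrinks the remaining error by a constant factor, which is why stacking many weak regressors yields an accurate final shape.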
Disclosure of Invention
The invention aims to provide a face feature point positioning method that solves the problems that the existing Dlib face feature point detection algorithm is slow and insufficiently accurate on mobile devices.
The invention is realized by the following steps:
the invention provides a face feature point positioning method, which comprises the following steps:
(1) designing a neural network for detecting the human face characteristic points;
(2) preprocessing training data by taking a plurality of face pictures and corresponding feature point coordinates as the training data;
(3) training a neural network by using the training data after preprocessing;
(4) carrying out format conversion on the trained neural network by using a neural network deployment framework;
(5) packaging the format-converted neural network into a function interface using C++ code;
(6) calling the function interface packaged in step (5) from the mobile application to detect the face in the image to be detected and compute the coordinates of the face feature points.
Further, the preprocessing the training data in the step (2) specifically includes:
(2.1) correcting all face pictures and corresponding feature point coordinates in the training data to positions close to the front face, enabling the roll angle of each face to be 0, obtaining a network input picture Fimg and a corresponding key point P1, and taking P1 as a labeled shape;
(2.2) taking a pair consisting of a face and its corresponding key points, and performing a certain random transformation on the key point data to obtain randomly transformed key points P2, so that each face has two pieces of corresponding key point information (P1, P2), wherein P1 corresponds to the originally labeled key points, P2 is P1 after random transformation, and P2 is taken as the initial shape;
further, the step (2) further comprises:
(2.3) normalizing the feature point coordinates (P1, P2) on the face picture by applying the following transformation to each feature point coordinate (x, y):
x'=x/w
y'=y/h
Wherein w is the width of the face picture, h is the height of the face picture, (x, y) is the coordinates of the picture feature points before transformation, and (x ', y') is the normalized coordinates after transformation.
Further, the step (3) specifically includes:
(3.1) taking original network input pictures Fimg and P2 as network inputs, taking P1 as a supervision signal of the network, and taking the Euclidean distance between the network output and P1 as a training error to calculate a network feedback gradient;
(3.2) setting training times, keeping the structure of the neural network unchanged, and performing multiple iterative training on the neural network;
and (3.3) testing different parameters on the test data set, measuring the test error through the Euclidean distance between the predicted characteristic point and the real characteristic point, and selecting a group of parameters with the minimum test error as final neural network parameters.
Further, the step (5) specifically includes:
(5.1) detecting the face in the face picture and a plurality of corresponding face key points by using a face detection method, and calculating the angle of the face according to the plurality of key points;
(5.2) correcting the face until the roll angle is 0 according to the face in the step (5.1) and the plurality of key points to obtain Fimg;
(5.3) if the input is a picture or the first frame of a video, taking the key points of the average face of the training set as P2; if it is not the first frame of the video, taking the face key points of the previous frame as P2; then rotating P2 so that its roll angle is 0, using the angle calculated in step (5.1), to obtain P2'; taking the face picture Fimg and P2' as network input;
(5.4) calling a neural network by using a function of the neural network deployment frame, and calculating the normalized coordinates of the face picture;
(5.5) transforming the calculated normalized coordinates of the face picture, and returning the transformed coordinates, wherein the transformation formula is as follows:
x=x'*w
y=y'*h
wherein, w is the width of the face picture, h is the height of the face picture, (x ', y') is the normalized coordinate before transformation, and (x, y) is the coordinate of the feature point of the transformed picture;
(5.6) performing an inverse transformation on all key points according to the roll angle calculated in step (5.1) and mapping them to the original image to obtain the key points corresponding to the original face.
Further, the neural network in step (1) is designed using the neural network framework Caffe, PyTorch, TensorFlow or MXNet.
Further, the neural network deployment framework used in step (4) is ncnn, TensorFlow Lite or TVM.
Compared with the prior art, the invention has the following beneficial effects:
according to the face feature point positioning method provided by the invention, the trained neural network is subjected to format conversion by using the neural network deployment framework, and the neural network after the format conversion is packaged into a function interface by using C + + codes, so that the problems of low speed and inaccuracy of the existing Dlib face feature point detection algorithm at a mobile end and instability and jitter in video stream processing can be solved; the method is faster than the Dlib method, higher in accuracy than the Dlib method, and smaller in Euclidean error between the predicted point and the real point.
Drawings
Fig. 1 is a flowchart of a method for locating face feature points according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for locating facial feature points, including the following steps:
(1) designing a neural network for detecting the human face characteristic points;
(2) preprocessing training data by taking a plurality of face pictures and corresponding feature point coordinates as the training data;
(3) training a neural network by using the training data after preprocessing;
(4) carrying out format conversion on the trained neural network by using a neural network deployment framework;
(5) packaging the format-converted neural network into a function interface using C++ code;
(6) calling the function interface packaged in step (5) from the mobile application to detect the face in the image to be detected and compute the coordinates of the face feature points.
According to the face feature point positioning method provided by this embodiment, the trained neural network is format-converted with a neural network deployment framework and packaged into a function interface with C++ code, which solves the problems that the existing Dlib face feature point detection algorithm is slow and insufficiently accurate on mobile devices and unstable and jittery when processing video streams; the method is faster and more accurate than the Dlib method, with a smaller Euclidean error between predicted and true points.
The above steps are explained in detail below.
In step (1), the neural network consists of a network structure and parameters. Since a neural network framework can be used both to design the structure and to train the parameters, the network structure is designed with the Caffe framework; Caffe may equally be replaced by PyTorch, TensorFlow or MXNet.
The training data is preprocessed in the step (2), and data enhancement is mainly performed on the training data, namely the human face feature point data; the face feature point data comprises two parts of a face picture and corresponding feature point coordinates.
The pretreatment process specifically comprises the following steps:
(2.1) correcting all face pictures and corresponding feature point coordinates in the training data to positions close to the frontal face, making the roll angle of each face 0, so as to reduce the network instability caused by large in-plane angles and greatly alleviate feature point jitter; this yields a network input picture Fimg and corresponding key points P1, with P1 taken as the labeled shape;
(2.2) taking a pair consisting of a face and its corresponding key points, and performing a certain random transformation (translation, rotation within ±5 degrees, scaling) on the key point data to obtain randomly transformed key points P2, so that each face has two pieces of corresponding key point information (P1, P2), wherein P1 corresponds to the originally labeled key points, P2 is P1 after random transformation, and P2 is taken as the initial shape;
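A minimal sketch of the random perturbation in step (2.2) is shown below. The parameter names and the translation/scale ranges are illustrative assumptions; the patent only specifies that the rotation stays within ±5 degrees.

```python
import math
import random

def random_transform(points, max_shift=0.05, max_angle_deg=5.0, max_scale=0.05):
    """Randomly translate, rotate (within +/-5 degrees) and scale a set of
    normalized keypoints around their centroid, producing an initial shape
    P2 from the labeled shape P1. Ranges other than the rotation bound are
    illustrative, not from the patent."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    angle = math.radians(random.uniform(-max_angle_deg, max_angle_deg))
    scale = 1.0 + random.uniform(-max_scale, max_scale)
    dx = random.uniform(-max_shift, max_shift)
    dy = random.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        rx, ry = x - cx, y - cy  # work in the centroid frame
        tx = scale * (cos_a * rx - sin_a * ry) + cx + dx
        ty = scale * (sin_a * rx + cos_a * ry) + cy + dy
        out.append((tx, ty))
    return out

p1 = [(0.3, 0.4), (0.7, 0.4), (0.5, 0.7)]  # labeled shape P1 (normalized)
p2 = random_transform(p1)                  # perturbed initial shape P2
```

Because the transform is small, P2 stays close to P1, which is exactly what the network is trained to correct.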
further, the step (2) further comprises:
(2.3) normalizing the feature point coordinates (P1, P2) on the face picture by applying the following transformation to each feature point coordinate (x, y):
x'=x/w
y'=y/h
Wherein w is the width of the face picture, h is the height of the face picture, (x, y) is the coordinates of the picture feature points before transformation, and (x ', y') is the normalized coordinates after transformation.
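The normalization in step (2.3) and its inverse from step (5.5) can be sketched directly; function names are illustrative.

```python
def normalize_points(points, w, h):
    """Map pixel coordinates (x, y) to [0, 1] via x' = x / w, y' = y / h."""
    return [(x / w, y / h) for x, y in points]

def denormalize_points(points, w, h):
    """Inverse mapping x = x' * w, y = y' * h used when returning results."""
    return [(x * w, y * h) for x, y in points]

pts = [(128.0, 96.0), (64.0, 192.0)]
norm = normalize_points(pts, w=256, h=256)  # -> [(0.5, 0.375), (0.25, 0.75)]
```

Normalizing by the picture size makes the network's targets independent of input resolution.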
The step (3) of training the neural network by using the preprocessed training data specifically includes:
(3.1) taking the network input picture Fimg and the initial shape P2 as network inputs, taking P1 as the supervision signal of the network, and using the Euclidean distance between the network output and P1 as the training error to compute the network's feedback gradient;
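The Euclidean-distance error used as the training (and later test) criterion can be sketched as a mean point-to-point distance; the function name is illustrative.

```python
import math

def euclidean_error(pred, target):
    """Mean Euclidean distance between predicted and ground-truth feature
    points; used as the training error and, later, the test error."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists)

# One point off by a 3-4-5 triangle, one exact: mean distance is 2.5.
err = euclidean_error([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
```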
(3.2) training the neural network: by setting the number of training iterations, the network is trained iteratively; during training the network structure remains unchanged while the parameters are continuously updated, so multiple training runs produce multiple sets of neural network parameters;
and (3.3) testing different parameters on the test data set, measuring the test error through the Euclidean distance between the predicted characteristic point and the real characteristic point, and selecting a group of parameters with the minimum test error as final neural network parameters.
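The parameter-selection step (3.3) amounts to evaluating each candidate parameter set on the test data and keeping the one with minimum test error. In this sketch the checkpoints are toy predict functions and all names are illustrative stand-ins.

```python
def select_best_parameters(checkpoints, test_set, error_fn):
    """Pick, from several trained parameter sets, the one with the minimum
    test error. `checkpoints` maps a name to a predict function; in a real
    pipeline these would be saved network snapshots."""
    best_name, best_err = None, float("inf")
    for name, predict in checkpoints.items():
        err = sum(error_fn(predict(x), y) for x, y in test_set) / len(test_set)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err

# Toy models scored by absolute error: the unbiased one wins.
test_set = [(1.0, 1.0), (2.0, 2.0)]
best, err = select_best_parameters(
    {"epoch10": lambda x: x + 0.5, "epoch20": lambda x: x},
    test_set,
    lambda p, y: abs(p - y),
)
```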
In step (4), the trained neural network is converted into the ncnn format using ncnn (a neural network deployment framework). Besides ncnn, the neural network may also be deployed with TensorFlow Lite or TVM.
The step (5) specifically comprises:
(5.1) detecting the face and several corresponding face key points in the face picture with a face detection method similar to MTCNN (Multi-task Cascaded Convolutional Networks), specifically 5 face key points (eyes, nose and mouth corners), and calculating the angles (pitch, yaw, roll) of the face from these key points;
(5.2) correcting the face until the roll angle is 0 according to the face in the step (5.1) and the plurality of key points to obtain Fimg;
(5.3) if the input is a picture or the first frame of a video, taking the key points of the average face of the training set as P2; if it is not the first frame of the video, taking the face key points of the previous frame as P2; then rotating P2 so that its roll angle is 0, using the angle calculated in step (5.1), to obtain P2', which serves as the initial shape of the face; taking the face picture Fimg obtained in step (5.2) and P2' as network input;
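The initial-shape selection in step (5.3) is a small branching rule; this is a hedged sketch with an illustrative function name. Reusing the previous frame's shape is what gives the method its temporal stability on video streams.

```python
def initial_shape(mean_face, prev_frame_points, is_first_frame):
    """Return the initial shape P2: the training-set mean face for a still
    picture or the first frame of a video, otherwise the previous frame's
    key points (which keeps tracking temporally stable)."""
    if is_first_frame or prev_frame_points is None:
        return list(mean_face)
    return list(prev_frame_points)

mean = [(0.5, 0.5)]
first = initial_shape(mean, None, True)            # first frame -> mean face
later = initial_shape(mean, [(0.4, 0.6)], False)   # later frame -> previous points
```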
(5.4) calling a neural network by using the ncnn function, and calculating the normalized coordinates of the face picture;
(5.5) transforming the calculated normalized coordinates of the face picture, and returning the transformed coordinates, wherein the transformation formula is as follows:
x=x'*w
y=y'*h
wherein, w is the width of the face picture, h is the height of the face picture, (x ', y') is the normalized coordinate before transformation, and (x, y) is the coordinate of the feature point of the transformed picture;
(5.6) performing an inverse transformation on all key points according to the roll angle calculated in step (5.1) and mapping them to the original image to obtain the key points corresponding to the original face.
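The inverse transformation of step (5.6) is a rotation by the negative of the roll-correction angle about the rotation center. A sketch, assuming pure rotation about a known center (any translation or crop applied during correction would also need to be undone):

```python
import math

def rotate_points(points, center, angle_deg):
    """Rotate keypoints by angle_deg around center. Applying the negative
    of the roll-correction angle maps the network's output back onto the
    original (uncorrected) image."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + cos_a * (x - cx) - sin_a * (y - cy),
             cy + sin_a * (x - cx) + cos_a * (y - cy)) for x, y in points]

# Correct by -30 degrees, then invert with +30 degrees: a round trip.
corrected = rotate_points([(2.0, 1.0)], (1.0, 1.0), -30.0)
restored = rotate_points(corrected, (1.0, 1.0), 30.0)
```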
In the step (6), the feature point coordinates can be calculated by calling the function interface packaged in the step (5) in the mobile terminal application program.
In summary, the face feature point positioning method provided by the embodiment of the invention uses a neural network with a small computational cost together with the efficient ncnn framework, so it is both fast and accurate and can locate feature points on mobile devices more quickly and precisely. Tests show the method is faster and more accurate than Dlib's, the face key points in a video stream are more stable with less jitter, and the Euclidean error between predicted and true points is smaller.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. A method for positioning face feature points is characterized by comprising the following steps:
(1) designing a neural network for detecting the human face characteristic points;
(2) preprocessing training data by taking a plurality of face pictures and corresponding feature point coordinates as the training data;
(3) training a neural network by using the training data after preprocessing;
(4) carrying out format conversion on the trained neural network by using a neural network deployment framework;
(5) packaging the format-converted neural network into a function interface using C++ code;
(6) calling the function interface packaged in step (5) from the mobile application to detect the face in the image to be detected and compute the coordinates of the face feature points.
2. The method for locating facial feature points according to claim 1, wherein the preprocessing the training data in the step (2) specifically comprises:
(2.1) correcting all face pictures and corresponding feature point coordinates in the training data to positions close to the front face, enabling the roll angle of each face to be 0, obtaining a network input picture Fimg and a corresponding key point P1, and taking P1 as a labeled shape;
(2.2) taking a pair consisting of a face and its corresponding key points, and performing a certain random transformation on the key point data to obtain randomly transformed key points P2, so that each face has two pieces of corresponding key point information (P1, P2), wherein P1 corresponds to the originally labeled key points, P2 is P1 after random transformation, and P2 is taken as the initial shape.
3. The method of claim 2, wherein the step (2) further comprises:
(2.3) normalizing the feature point coordinates (P1, P2) on the face picture by applying the following transformation to each feature point coordinate (x, y):
x'=x/w
y'=y/h
Wherein w is the width of the face picture, h is the height of the face picture, (x, y) is the coordinates of the picture feature points before transformation, and (x ', y') is the normalized coordinates after transformation.
4. The method for locating facial feature points as claimed in claim 3, wherein the step (3) specifically comprises:
(3.1) taking original network input pictures Fimg and P2 as network inputs, taking P1 as a supervision signal of the network, and taking the Euclidean distance between the network output and P1 as a training error to calculate a network feedback gradient;
(3.2) setting training times, keeping the structure of the neural network unchanged, and performing multiple iterative training on the neural network;
and (3.3) testing different parameters on the test data set, measuring the test error through the Euclidean distance between the predicted characteristic point and the real characteristic point, and selecting a group of parameters with the minimum test error as final neural network parameters.
5. The method for locating facial feature points as claimed in claim 4, wherein the step (5) specifically comprises:
(5.1) detecting the face in the face picture and a plurality of corresponding face key points by using a face detection method, and calculating the angle of the face according to the plurality of key points;
(5.2) correcting the face until the roll angle is 0 according to the face in the step (5.1) and the plurality of key points to obtain Fimg;
(5.3) if the input is a picture or the first frame of a video, taking the key points of the average face of the training set as P2; if it is not the first frame of the video, taking the face key points of the previous frame as P2; then rotating P2 so that its roll angle is 0, using the angle calculated in step (5.1), to obtain P2'; taking the face picture Fimg and P2' as network input;
(5.4) calling a neural network by using a function of the neural network deployment frame, and calculating the normalized coordinates of the face picture;
(5.5) transforming the calculated normalized coordinates of the face picture, and returning the transformed coordinates, wherein the transformation formula is as follows:
x=x'*w
y=y'*h
wherein, w is the width of the face picture, h is the height of the face picture, (x ', y') is the normalized coordinate before transformation, and (x, y) is the coordinate of the feature point of the transformed picture;
(5.6) performing an inverse transformation on all key points according to the roll angle calculated in step (5.1) and mapping them to the original image to obtain the key points corresponding to the original face.
6. The method for locating face feature points according to claim 1, characterized in that: the neural network in step (1) is designed using the neural network framework Caffe, PyTorch, TensorFlow or MXNet.
7. The method for locating face feature points according to claim 1, characterized in that: the neural network deployment framework used in step (4) is ncnn, TensorFlow Lite or TVM.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911316378.1A CN111079659A (en) | 2019-12-19 | 2019-12-19 | Face feature point positioning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911316378.1A CN111079659A (en) | 2019-12-19 | 2019-12-19 | Face feature point positioning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111079659A true CN111079659A (en) | 2020-04-28 |
Family
ID=70315612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911316378.1A Pending CN111079659A (en) | 2019-12-19 | 2019-12-19 | Face feature point positioning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079659A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111626246A (en) * | 2020-06-01 | 2020-09-04 | 浙江中正智能科技有限公司 | Face alignment method under mask shielding |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104182718A (en) * | 2013-05-21 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Human face feature point positioning method and device thereof |
US20160379041A1 (en) * | 2015-06-24 | 2016-12-29 | Samsung Electronics Co., Ltd. | Face recognition method and apparatus |
CN106909888A (en) * | 2017-01-22 | 2017-06-30 | 南京开为网络科技有限公司 | It is applied to the face key point tracking system and method for mobile device end |
CN107480640A (en) * | 2017-08-16 | 2017-12-15 | 上海荷福人工智能科技(集团)有限公司 | A kind of face alignment method based on two-value convolutional neural networks |
CN108197602A (en) * | 2018-01-30 | 2018-06-22 | 厦门美图之家科技有限公司 | A kind of convolutional neural networks generation method and expression recognition method |
CN108229276A (en) * | 2017-03-31 | 2018-06-29 | 北京市商汤科技开发有限公司 | Neural metwork training and image processing method, device and electronic equipment |
CN108986094A (en) * | 2018-07-20 | 2018-12-11 | 南京开为网络科技有限公司 | For the recognition of face data automatic update method in training image library |
US20190114824A1 (en) * | 2017-10-12 | 2019-04-18 | Ohio State Innovation Foundation | Fast and precise object alignment and 3d shape reconstruction from a single 2d image |
CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural networks training method, system and facial feature points detection method |
CN109800648A (en) * | 2018-12-18 | 2019-05-24 | 北京英索科技发展有限公司 | Face datection recognition methods and device based on the correction of face key point |
CN109977867A (en) * | 2019-03-26 | 2019-07-05 | 厦门瑞为信息技术有限公司 | A kind of infrared biopsy method based on machine learning multiple features fusion |
CN110399844A (en) * | 2019-07-29 | 2019-11-01 | 南京图玩智能科技有限公司 | It is a kind of to be identified and method for tracing and system applied to cross-platform face key point |
CN110414428A (en) * | 2019-07-26 | 2019-11-05 | 厦门美图之家科技有限公司 | A method of generating face character information identification model |
- 2019-12-19: application CN201911316378.1A filed in CN; patent CN111079659A, status Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104182718A (en) * | 2013-05-21 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Human face feature point positioning method and device thereof |
US20160379041A1 (en) * | 2015-06-24 | 2016-12-29 | Samsung Electronics Co., Ltd. | Face recognition method and apparatus |
CN106909888A (en) * | 2017-01-22 | 2017-06-30 | 南京开为网络科技有限公司 | It is applied to the face key point tracking system and method for mobile device end |
CN108229276A (en) * | 2017-03-31 | 2018-06-29 | 北京市商汤科技开发有限公司 | Neural metwork training and image processing method, device and electronic equipment |
CN107480640A (en) * | 2017-08-16 | 2017-12-15 | 上海荷福人工智能科技(集团)有限公司 | A kind of face alignment method based on two-value convolutional neural networks |
US20190114824A1 (en) * | 2017-10-12 | 2019-04-18 | Ohio State Innovation Foundation | Fast and precise object alignment and 3d shape reconstruction from a single 2d image |
CN108197602A (en) * | 2018-01-30 | 2018-06-22 | 厦门美图之家科技有限公司 | A kind of convolutional neural networks generation method and expression recognition method |
CN108986094A (en) * | 2018-07-20 | 2018-12-11 | 南京开为网络科技有限公司 | For the recognition of face data automatic update method in training image library |
CN109800648A (en) * | 2018-12-18 | 2019-05-24 | 北京英索科技发展有限公司 | Face datection recognition methods and device based on the correction of face key point |
CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural networks training method, system and facial feature points detection method |
CN109977867A (en) * | 2019-03-26 | 2019-07-05 | 厦门瑞为信息技术有限公司 | A kind of infrared biopsy method based on machine learning multiple features fusion |
CN110414428A (en) * | 2019-07-26 | 2019-11-05 | 厦门美图之家科技有限公司 | A method of generating face character information identification model |
CN110399844A (en) * | 2019-07-29 | 2019-11-01 | 南京图玩智能科技有限公司 | It is a kind of to be identified and method for tracing and system applied to cross-platform face key point |
Non-Patent Citations (4)
Title |
---|
LILYNOTHING: "Packaging a trained model and using it on different platforms (by framework, by platform)" * |
RAJEEV RANJAN: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 121-136 * |
刘治中: "Face feature point localization and its applications", China Master's Theses Full-text Database (Information Science and Technology), no. 10, pages 138-522 * |
詹力: "Research on 3D human head reconstruction and algorithms", China Master's Theses Full-text Database (Information Science and Technology), no. 12, pages 138-645 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111626246A (en) * | 2020-06-01 | 2020-09-04 | 浙江中正智能科技有限公司 | Face alignment method under mask occlusion |
CN111626246B (en) * | 2020-06-01 | 2022-07-15 | 浙江中正智能科技有限公司 | Face alignment method under mask occlusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107368845B (en) | Optimized candidate region-based Faster R-CNN target detection method | |
CN107993216B (en) | Image fusion method and equipment, storage medium and terminal thereof | |
Yeh et al. | Three-pronged compensation and hysteresis thresholding for moving object detection in real-time video surveillance | |
CN112966742A (en) | Model training method, target detection method and device and electronic equipment | |
CN109815881A (en) | Training method for a behavior recognition model, behavior recognition method, apparatus and device | |
WO2017167159A1 (en) | Image positioning method and device | |
CN111476710B (en) | Video face changing method and system based on mobile platform | |
CN111985281B (en) | Image generation model generation method and device and image generation method and device | |
WO2022178833A1 (en) | Target detection network training method, target detection method, and apparatus | |
WO2023035531A1 (en) | Super-resolution reconstruction method for text image and related device thereof | |
CN110246198B (en) | Method and device for generating character selection verification code, electronic equipment and storage medium | |
CN109241968B (en) | Image content inclination angle prediction network training method and correction method and system | |
US20230014448A1 (en) | Methods for handling occlusion in augmented reality applications using memory and device tracking and related apparatus | |
CN113971828B (en) | Virtual object lip driving method, model training method, related device and electronic equipment | |
WO2021223738A1 (en) | Method, apparatus and device for updating model parameter, and storage medium | |
WO2023273069A1 (en) | Saliency detection method and model training method and apparatus thereof, device, medium, and program | |
JP2022050666A (en) | Method for training cycle generation network model, method for establishing font library, apparatus, electronic device, storage medium and computer program | |
RU2697627C1 (en) | Method of correcting illumination of an object on an image in a sequence of images and a user's computing device which implements said method | |
CN111241924A (en) | Face detection and alignment method and device based on scale estimation and storage medium | |
WO2021082562A1 (en) | Spoofing detection method and apparatus, electronic device, storage medium and program product | |
US20230082715A1 (en) | Method for training image processing model, image processing method, apparatus, electronic device, and computer program product | |
CN111079659A (en) | Face feature point positioning method | |
CN104978583B (en) | Human body action recognition method and device | |
WO2022016996A1 (en) | Image processing method, device, electronic apparatus, and computer readable storage medium | |
Chuanjie et al. | Facial expression recognition integrating multiple CNN models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned |
Effective date of abandoning: 20240531 |