CN113792633A - Face tracking system and method based on neural network and optical flow method


Info

Publication number
CN113792633A
Authority
CN
China
Prior art keywords
face
frame
tracking
optical flow
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111036198.5A
Other languages
Chinese (zh)
Other versions
CN113792633B (en)
Inventor
侯堃
左敏
魏伟
任翰驰
胡怡
张青川
曹先哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN202111036198.5A
Publication of CN113792633A
Application granted
Publication of CN113792633B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a super-real-time, stable face tracking system and tracking method based on a deep neural network and an optical flow method. The face box in the first frame is determined by the deep neural network; the current picture frame is compared with the previous frame by the optical flow method to obtain the current-frame face box; every few frames, the deep neural network verifies whether the image obtained by the optical flow method is a face; and whenever tracking or verification fails, the face detection procedure is executed repeatedly until a face is found and tracking resumes. The tracking system achieves long-term, stable face tracking in super real time, improving the practical effect of face detection and tracking methods in many application fields.

Description

Face tracking system and method based on neural network and optical flow method
Technical Field
The invention belongs to the technical field of face tracking, and particularly relates to a super-real-time, stable face tracking system and method based on a deep neural network and an optical flow method.
Background
The rapid development of deep neural networks has greatly advanced face-related applications such as beauty cameras, security access control, and crime monitoring and tracking. For these tasks, accurate detection and tracking of faces is a critical step. For certain special tasks, such as facial biosignal extraction, static face liveness detection, and facial emotion analysis, highly stable detection and tracking of the face box is essential, because it greatly reduces the noise contained in the extracted signal.
Like other traditional computer vision methods, early face tracking work largely centered on feature engineering and color spaces. Some work applied general object tracking methods to face tracking; for example, Tracking-Learning-Detection (TLD) is a classic general object tracking method from which many face tracking methods have been derived. Another line of work combined an optical flow method with Viola-Jones face detection to obtain a likelihood map, on which the position of the face box in the current frame can be found. With the success and rapid development of deep neural networks, they have attracted wide attention, and many face tracking methods based on them have appeared, for example Face Tracking with Region-based CNN (FT-RCNN), which builds on the general object detector Faster R-CNN.
In principle, a face detection method can be applied to face tracking, but detection and tracking should not be conflated. Tracking focuses on the intrinsic association of the same face across consecutive frames of a video; in other words, a face tracking system is concerned with the relationship between instances of the same face in different picture frames, rather than treating each frame in isolation as a face detection method does.
Face detection and alignment have been very active research areas in recent years, and many deep-learning-based methods have been proposed and deployed commercially, but few of them address face tracking. Because edge computing devices are limited in computing power, much effort has been invested in algorithm efficiency; convolutional neural networks, however, are generally considered computationally expensive.
Disclosure of Invention
To address the low computational efficiency of convolutional neural networks in the field of face tracking, the invention provides a face tracking strategy that integrates a convolutional neural network with an optical flow method: a super-real-time, stable face tracking system and tracking method based on a deep neural network and an optical flow method. In addition, the change of the face box between any two adjacent frames produced by the method is very smooth and stable, which greatly benefits biosignal extraction, static face liveness detection, and facial emotion analysis.
The technical scheme of the invention is as follows:
1. a super real-time face stabilization tracking system based on a deep neural network and an optical flow method comprises three modules: the system comprises a face detection module, a face tracking module and a face verification module;
the face detection module is responsible for detecting a face in a current picture frame, and consists of three deep neural networks which respectively comprise a first subnetwork, a second subnetwork and a third subnetwork;
the detection steps of the face detection module are as follows:
1) the first subnetwork is responsible for finding out an interested area which is possibly a human face in the current input picture, and the interested area which partially covers more than a threshold value of 0.7 is filtered out through a non-maximum suppression algorithm and then enters a second subnetwork;
2) the second sub-network carries out secondary classification, and whether each region of interest is a human face or not is roughly judged; finally, the region of interest of the sub-network II classified as a positive sample is transmitted into a sub-network III;
3) performing secondary classification on the third sub-grid to select a final result;
4) if the face is not found in the current picture frame, repeating the steps 1) -3);
5) if the face of the current frame is found, starting a face tracking program from the second frame;
the face tracking module can determine the final position of the face frame of the current frame;
the face verification module can input the image tracked by the face tracking module into the face verification module, output the two classification results and judge the confidence coefficient that the current image is the face.
Preferably:
the first sub-network comprises: one input layer, four convolutional layers and one max pooling layer;
the second sub-network comprises: one input layer, three convolutional layers, two max pooling layers and two fully connected layers;
the third sub-network comprises: one input layer, three convolutional layers, two max pooling layers and two fully connected layers.
Preferably:
the face detection module comprises a sub-network, the sub-network comprising: one input layer, five convolutional layers and two max pooling layers.
Preferably:
the face tracking module determines the final position of the face box in the current frame through the following specific steps:
1) at the position of the face box in the previous frame, a uniform 10 × 10 grid of feature points is determined and recorded as P_l = {p_l^1, p_l^2, ..., p_l^N}, where N = 100 is the number of feature points;
2) the pyramidal Lucas-Kanade method is used to track the previous-frame feature points P_l forward to the current frame; the resulting optical-flow positions are recorded as P_f = {p_f^1, p_f^2, ..., p_f^N}; if this step fails to obtain the optical-flow motion of the feature points, tracking has failed, and the program returns to the face detection step to start detection anew;
3) from P_l and P_f, the positions of the previous-frame and current-frame face boxes are obtained; template matching is performed between the two face images and a similarity is computed for each pixel; the similarities at the feature points are looked up, and the indices of all feature points whose similarity exceeds the median are recorded;
4) the pyramidal Lucas-Kanade method is then used to track the current-frame feature points backward to the previous frame; the resulting positions are recorded as P_b = {p_b^1, p_b^2, ..., p_b^N};
5) P_l and P_b are both sets of feature points in the previous-frame image, obtained by different means; using the formula d_i = ||p_l^i - p_b^i||_2, the Euclidean distance between each pair of corresponding feature points is computed, and the indices of all feature points whose distance is below the median are recorded;
6) P_l and P_f are filtered by the indices obtained in steps 3) and 5) to obtain P̂_l and P̂_f; if either list has length 0, tracking has failed, and the program returns to the face detection step to start detection anew;
7) if neither P̂_l nor P̂_f has length 0, the formula δ_i = p̂_f^i - p̂_l^i is used to compute the one-to-one offsets from the previous-frame face feature points to the current-frame face feature points; the median of these offsets is the coordinate offset of all pixels from the previous-frame face box to the current-frame face box; adding this offset to the coordinates of the top-left corner of the previous-frame face box gives the coordinates of the top-left corner of the current-frame face box;
8) the Euclidean distances between all pairs of points within P̂_l and within P̂_f are computed; dividing each current-frame distance by the corresponding previous-frame distance gives the distance ratios of the optical-flow motion, s_ij = ||p̂_f^i - p̂_f^j||_2 / ||p̂_l^i - p̂_l^j||_2; the median of these ratios is the width-height scale change from the previous-frame face box to the current-frame face box; multiplying the width and height of the previous-frame face box by this scale change gives the width and height of the current-frame face box;
9) steps 7) and 8) together determine the final position of the face box in the current frame.
Preferably:
the verification steps of the face verification module are as follows:
the image tracked by the face tracking module is input into this module, which outputs a binary classification result giving the confidence that the current image is a face; the confidence threshold is set to 0.90; if the output is below this threshold, tracking is judged to have failed, and the program returns to the face detection step to start detection anew.
Preferably:
while tracking is succeeding, face verification is performed every 3 to 5 frames.
A tracking method of the super-real-time stable face tracking system based on a deep neural network and an optical flow method comprises the following steps:
1) the face detection module detects the position of the face box in the initial frame; when a face is present, it sends the face box coordinates to the face tracking module;
2) the face tracking module tracks the face box coordinates of the next frame based on the face box coordinates of the current frame;
3) during frame-by-frame tracking, the face verification module is run at a certain frequency to ensure that tracking yields an accurate face box position;
4) if the verification module finds that tracking has failed, the face detection module is run again and a new detection-tracking-verification cycle begins.
The super-real-time stable face tracking system based on the deep neural network and the optical flow method has the following advantages:
(1) The system is super real-time, long-term effective, and extremely stable. Its range of application is very wide, and it brings a clear performance improvement to any application system involving faces.
(2) A cascaded, lightweight face detection module composed of three deep neural networks is first applied to determine the position of the first-frame face box. In the subsequent tracking process, whenever face tracking fails, the face detection module is run again and the next detection-tracking-verification cycle begins.
(3) After the face detection module determines the face box position in the current frame, the face box position of the next frame is determined by the optical flow method and template matching. To guarantee the accuracy of the face box obtained this way, the second deep neural network of the face detection module is modified so that it can be used for face verification. If the face confidence returned by the verification module is below a certain threshold, tracking is judged to have failed, and the face detection module must be run again to enter the next cycle.
(4) The whole system uses lightweight deep neural networks together with a highly efficient optical flow method, which gives the invention its three advantages: super-real-time performance, long-term effectiveness, and extreme stability.
(5) Super-real-time performance: tested on a desktop computer with an Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, face detection takes 16 milliseconds to compute the first-frame face box, stable tracking takes 3 milliseconds per frame, and face verification takes 1 millisecond. Altogether, processing a 1-minute video at a frame rate of 30 takes 5173 milliseconds. In addition, on the RK3399 chip commonly used in edge computing devices, the average frame rate of the invention reaches about 50.
(6) Long-term effectiveness: the invention integrates face detection, face tracking and face verification, ensuring that face information is fed back in time throughout its use. Even if the face box is lost in some video frames, it is promptly re-detected and tracked when the face reappears in the picture.
(7) Stability: thanks to the optical flow method and template matching, face boxes in adjacent picture frames are tracked accurately with minimal translation, achieving a stable tracking effect. In practical use, the sequence of face boxes output by the method is more stable and smoother than that of the deep neural network methods currently common in industry.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below, so that the features and advantages of the present invention can be understood more clearly by referring to them. The drawings are schematic and should not be construed as limiting the present invention in any way; for a person skilled in the art, other drawings can be obtained from them without inventive effort. Wherein:
FIG. 1 is an overall flow chart of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
The invention provides a super-real-time stable face tracking system based on a deep neural network and an optical flow method, which realizes super-real-time, long-term, stable face tracking and thereby improves the practical effect of face detection and tracking methods in various fields.
The main content of the invention includes: determining the first-frame face box through a deep neural network; comparing the current picture frame with the previous frame by an optical flow method to obtain the current-frame face box; every few frames, using the deep neural network to verify whether the image obtained by the optical flow method is a face; and, whenever tracking or verification fails, repeatedly executing the face detection procedure until a face is found and tracking resumes.
The overall flow of the invention is shown in FIG. 1. The invention provides a super-real-time stable face tracking system based on a deep neural network and an optical flow method; the tracking system comprises three modules: a face detection module, a face tracking module and a face verification module.
The face detection module is responsible for detecting the position of the face box in the initial frame; when a face is present, it sends the face box coordinates to the face tracking module. The face tracking module tracks the face box coordinates of the next frame based on the face box coordinates of the current frame. During frame-by-frame tracking, the face verification module is run at a certain frequency to ensure that tracking yields an accurate face box position. If the verification module finds that tracking has failed, the face detection module is run again and a new detection-tracking-verification cycle begins.
The face detection module is responsible for detecting a face in the current picture frame and consists of three deep neural networks: a first sub-network, a second sub-network and a third sub-network. Their specific structures and parameters are given in Table 1.
[Table 1: Face detection module. The table of layer structures and parameters for the three sub-networks is rendered as an image in the original document and is not recoverable here.]
The detection steps of the face detection module are as follows:
1) The first sub-network finds regions of interest in the current input picture that may contain a face; regions of interest whose mutual overlap exceeds a threshold of 0.7 are filtered out by a non-maximum suppression algorithm, and the remainder are passed to the second sub-network (a sketch of such a filter follows this list).
2) The second sub-network performs binary classification, roughly judging whether each region of interest is a face; the regions of interest it classifies as positive samples are passed to the third sub-network.
3) The third sub-network performs binary classification to select the final result.
4) If no face is found in the current picture frame, steps 1)-3) are repeated.
5) If a face is found in the current frame, the face tracking procedure starts from the second frame.
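The patent specifies only the 0.7 overlap threshold for the suppression step; it does not give the box format or scoring details. The following is a minimal sketch of a standard greedy IoU-based non-maximum suppression, assuming (x1, y1, x2, y2) boxes scored by the first sub-network:

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.7):
    """Greedy non-maximum suppression: drop candidate regions whose overlap
    (IoU) with a higher-scoring kept region exceeds iou_thr.
    boxes: (N, 4) array of (x1, y1, x2, y2); returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]          # best-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the best box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]          # discard overlaps above 0.7
    return keep
```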
The face tracking module determines the final position of the face box in the current frame; the specific steps are as follows (a consolidated code sketch follows the list):
1) At the position of the face box in the previous frame, a uniform 10 × 10 grid of feature points is determined and recorded as P_l = {p_l^1, p_l^2, ..., p_l^N}, where N = 100 is the number of feature points.
2) The pyramidal Lucas-Kanade method is used to track the previous-frame feature points P_l forward to the current frame; the resulting optical-flow positions are recorded as P_f = {p_f^1, p_f^2, ..., p_f^N}. If this step fails to obtain the optical-flow motion of the feature points, the program returns to step one and face detection starts anew.
3) From P_l and P_f, the positions of the previous-frame and current-frame face boxes are obtained. Template matching is performed between the two face images and a similarity is computed for each pixel. The similarities at the feature points are looked up, and the indices of all feature points whose similarity exceeds the median are recorded.
4) The pyramidal Lucas-Kanade method is then used to track the current-frame feature points backward to the previous frame; the resulting positions are recorded as P_b = {p_b^1, p_b^2, ..., p_b^N}.
5) P_l and P_b are both sets of feature points in the previous-frame image, obtained by different means. Using the formula d_i = ||p_l^i - p_b^i||_2, the Euclidean distance between each pair of corresponding feature points is computed, and the indices of all feature points whose distance is below the median are recorded.
6) P_l and P_f are filtered by the indices obtained in steps 3) and 5) to obtain P̂_l and P̂_f. If either list has length 0, tracking has failed, and the program returns to step one to start face detection anew.
7) If neither P̂_l nor P̂_f has length 0, the formula δ_i = p̂_f^i - p̂_l^i is used to compute the one-to-one offsets from the previous-frame face feature points to the current-frame face feature points; the median of these offsets is the coordinate offset of all pixels from the previous-frame face box to the current-frame face box. Adding this offset to the coordinates of the top-left corner of the previous-frame face box gives the coordinates of the top-left corner of the current-frame face box.
8) The Euclidean distances between all pairs of points within P̂_l and within P̂_f are computed. Dividing each current-frame distance by the corresponding previous-frame distance gives the distance ratios of the optical-flow motion, s_ij = ||p̂_f^i - p̂_f^j||_2 / ||p̂_l^i - p̂_l^j||_2; the median of these ratios is the width-height scale change from the previous-frame face box to the current-frame face box. Multiplying the width and height of the previous-frame face box by this scale change gives the width and height of the current-frame face box.
9) Steps 7) and 8) together determine the final position of the face box in the current frame.
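To make steps 1)-9) concrete, here is a consolidated Python/OpenCV sketch of the tracking update, built on the cv2.calcOpticalFlowPyrLK pyramidal Lucas-Kanade implementation. It is a sketch under assumptions, not the patent's reference code: the (x, y, w, h) box format, the Lucas-Kanade window and pyramid parameters, and the 10 × 10 patch size used for the template-matching similarity are illustrative choices.

```python
import cv2
import numpy as np

def grid_points(box, n=10):
    """Step 1: a uniform n x n grid of feature points inside box = (x, y, w, h)."""
    x, y, w, h = box
    xs = np.linspace(x, x + w, n, dtype=np.float32)
    ys = np.linspace(y, y + h, n, dtype=np.float32)
    return np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 1, 2)

def patch_similarity(img_a, img_b, pts_a, pts_b, half=5):
    """Step 3 (sketch): normalized cross-correlation of small patches around
    corresponding points, computed via template matching."""
    sims = []
    for (xa, ya), (xb, yb) in zip(pts_a.reshape(-1, 2), pts_b.reshape(-1, 2)):
        pa = cv2.getRectSubPix(img_a, (2 * half, 2 * half), (float(xa), float(ya)))
        pb = cv2.getRectSubPix(img_b, (2 * half, 2 * half), (float(xb), float(yb)))
        sims.append(cv2.matchTemplate(pa, pb, cv2.TM_CCOEFF_NORMED)[0, 0])
    return np.nan_to_num(np.asarray(sims))   # guard against constant patches

def track_box(prev_gray, curr_gray, box):
    """Steps 1-9: median-flow style update of the face box.
    Returns the new box, or None when tracking fails (caller re-detects)."""
    lk = dict(winSize=(21, 21), maxLevel=3,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    p_l = grid_points(box)
    # Step 2: forward pyramidal Lucas-Kanade flow, previous -> current frame.
    p_f, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p_l, None, **lk)
    if p_f is None or not st_f.any():
        return None
    # Step 4: backward flow, current -> previous frame.
    p_b, st_b, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray, p_f, None, **lk)
    if p_b is None:
        return None
    # Step 3: keep points whose patch similarity exceeds the median.
    sim = patch_similarity(prev_gray, curr_gray, p_l, p_f)
    # Step 5: forward-backward error between P_l and P_b; keep below-median points.
    fb = np.linalg.norm(p_l.reshape(-1, 2) - p_b.reshape(-1, 2), axis=1)
    # Step 6: combine both filters (plus the flow status flags).
    keep = ((sim > np.median(sim)) & (fb < np.median(fb))
            & (st_f.ravel() == 1) & (st_b.ravel() == 1))
    pl, pf = p_l.reshape(-1, 2)[keep], p_f.reshape(-1, 2)[keep]
    if len(pl) == 0:
        return None
    # Step 7: median per-point offset gives the translation of the box corner.
    dx, dy = np.median(pf - pl, axis=0)
    # Step 8: median ratio of pairwise point distances gives the scale change.
    s = 1.0
    i, j = np.triu_indices(len(pl), k=1)
    if len(i):
        d_prev = np.linalg.norm(pl[i] - pl[j], axis=1)
        d_curr = np.linalg.norm(pf[i] - pf[j], axis=1)
        ok = d_prev > 1e-6
        if ok.any():
            s = float(np.median(d_curr[ok] / d_prev[ok]))
    x, y, w, h = box
    # Step 9: translated corner plus rescaled width/height form the new box.
    return (x + float(dx), y + float(dy), w * s, h * s)
```

The medians in steps 7) and 8) are what make the update robust: a minority of badly tracked points cannot drag the box, which is the source of the smooth, stable box motion the patent emphasizes.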
The face verification module takes the image tracked by the face tracking module as input, outputs a binary classification result, and gives the confidence that the current image is a face.
To ensure that the image content inside the box tracked by the face tracking module is a face, the invention modifies sub-network two of the face detection module so that it performs face verification. Its specific structure and parameters are given in Table 2 (a layer-by-layer sketch follows the table).
Input layer: color image
Convolutional layer: 28 kernels, 3 × 3, stride 1, padding 1
Max pooling layer: 3 × 3, stride 2
Convolutional layer: 48 kernels, 3 × 3, stride 1, padding 1
Max pooling layer: 3 × 3, stride 2
Convolutional layer: 64 kernels, 2 × 2, stride 1, padding 1
Convolutional layer: 128 kernels, 1 × 1, stride 1, padding 1
Convolutional layer: 2 kernels, 1 × 1, stride 1, padding 1
Table 2: face verification module
To balance the efficiency and the tracking accuracy of the invention, while tracking is succeeding face verification is performed only every 3 to 5 frames.
The verification steps of the face verification module are as follows:
The image tracked by the face tracking module is input into the module, which outputs a binary classification result giving the confidence that the current image is a face. The confidence threshold is set to 0.90; if the output is below this threshold, tracking is judged to have failed, and the program returns to step one to start face detection anew. A sketch of the overall detect-track-verify loop follows.
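Putting the three modules together, the control flow described above (and in FIG. 1) is a simple loop. The sketch below assumes hypothetical detect, track and verify callables standing in for the three modules (track corresponds to the track_box sketch earlier); verification every 4 frames is one choice from the stated 3-to-5-frame range, with the 0.90 threshold from this section.

```python
def run(frames, detect, track, verify, every=4, thr=0.90):
    """Detect -> track -> verify loop (a sketch). `detect`, `track` and
    `verify` are hypothetical stand-ins for the three modules; verification
    runs once every `every` tracked frames, and a confidence below `thr`
    sends the system back to detection."""
    prev, box, n = None, None, 0
    for frame in frames:
        if box is None:
            box = detect(frame)                 # deep-network face detection
            n = 0
        else:
            box = track(prev, frame, box)       # optical-flow tracking
            n += 1
            if box is not None and n % every == 0:
                if verify(frame, box) < thr:    # CNN face verification
                    box = None                  # failed: re-detect next frame
        prev = frame
        yield box                               # face box, or None this frame
```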
The above description is only an example of the present invention, and is not intended to limit the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A super-real-time stable face tracking system based on a deep neural network and an optical flow method, characterized by comprising three modules: a face detection module, a face tracking module and a face verification module;
the face detection module is responsible for detecting a face in the current picture frame and consists of three deep neural networks: a first sub-network, a second sub-network and a third sub-network;
the detection steps of the face detection module are as follows:
1) the first sub-network finds regions of interest in the current input picture that may contain a face; regions of interest whose mutual overlap exceeds a threshold of 0.7 are filtered out by a non-maximum suppression algorithm, and the remainder are passed to the second sub-network;
2) the second sub-network performs binary classification, roughly judging whether each region of interest is a face; the regions of interest it classifies as positive samples are passed to the third sub-network;
3) the third sub-network performs binary classification to select the final result;
4) if no face is found in the current picture frame, steps 1)-3) are repeated;
5) if a face is found in the current frame, the face tracking procedure starts from the second frame;
the face tracking module determines the final position of the face box in the current frame;
the face verification module takes the image tracked by the face tracking module as input, outputs a binary classification result, and gives the confidence that the current image is a face.
2. The super-real-time stable face tracking system based on a deep neural network and an optical flow method according to claim 1, wherein
the first sub-network comprises: one input layer, four convolutional layers and one max pooling layer;
the second sub-network comprises: one input layer, three convolutional layers, two max pooling layers and two fully connected layers;
the third sub-network comprises: one input layer, three convolutional layers, two max pooling layers and two fully connected layers.
3. The super-real-time stable face tracking system based on a deep neural network and an optical flow method according to any one of claims 1-2, wherein the face detection module comprises a sub-network comprising: one input layer, five convolutional layers and two max pooling layers.
4. The super-real-time stable face tracking system based on a deep neural network and an optical flow method according to any one of claims 1-3, wherein
the face tracking module determines the final position of the face box in the current frame through the following specific steps:
1) at the position of the face box in the previous frame, a uniform 10 × 10 grid of feature points is determined and recorded as P_l = {p_l^1, p_l^2, ..., p_l^N}, where N = 100 is the number of feature points;
2) the pyramidal Lucas-Kanade method is used to track the previous-frame feature points P_l forward to the current frame; the resulting optical-flow positions are recorded as P_f = {p_f^1, p_f^2, ..., p_f^N}; if this step fails to obtain the optical-flow motion of the feature points, tracking has failed, and the program returns to the face detection step to start detection anew;
3) from P_l and P_f, the positions of the previous-frame and current-frame face boxes are obtained; template matching is performed between the two face images and a similarity is computed for each pixel; the similarities at the feature points are looked up, and the indices of all feature points whose similarity exceeds the median are recorded;
4) the pyramidal Lucas-Kanade method is then used to track the current-frame feature points backward to the previous frame; the resulting positions are recorded as P_b = {p_b^1, p_b^2, ..., p_b^N};
5) P_l and P_b are both sets of feature points in the previous-frame image, obtained by different means; using the formula d_i = ||p_l^i - p_b^i||_2, the Euclidean distance between each pair of corresponding feature points is computed, and the indices of all feature points whose distance is below the median are recorded;
6) P_l and P_f are filtered by the indices obtained in steps 3) and 5) to obtain P̂_l and P̂_f; if either list has length 0, tracking has failed, and the program returns to the face detection step to start detection anew;
7) if neither P̂_l nor P̂_f has length 0, the formula δ_i = p̂_f^i - p̂_l^i is used to compute the one-to-one offsets from the previous-frame face feature points to the current-frame face feature points; the median of these offsets is the coordinate offset of all pixels from the previous-frame face box to the current-frame face box; adding this offset to the coordinates of the top-left corner of the previous-frame face box gives the coordinates of the top-left corner of the current-frame face box;
8) the Euclidean distances between all pairs of points within P̂_l and within P̂_f are computed; dividing each current-frame distance by the corresponding previous-frame distance gives the distance ratios of the optical-flow motion, s_ij = ||p̂_f^i - p̂_f^j||_2 / ||p̂_l^i - p̂_l^j||_2; the median of these ratios is the width-height scale change from the previous-frame face box to the current-frame face box; multiplying the width and height of the previous-frame face box by this scale change gives the width and height of the current-frame face box;
9) steps 7) and 8) together determine the final position of the face box in the current frame.
5. The super-real-time stable face tracking system based on a deep neural network and an optical flow method according to any one of claims 1-4, wherein
the verification steps of the face verification module are as follows:
the image tracked by the face tracking module is input into this module, which outputs a binary classification result giving the confidence that the current image is a face; the confidence threshold is set to 0.90; if the output is below this threshold, tracking is judged to have failed, and the program returns to the face detection step to start detection anew.
6. The super-real-time stable face tracking system based on a deep neural network and an optical flow method according to any one of claims 1-5, wherein
while tracking is succeeding, face verification is performed every 3 to 5 frames.
7. A tracking method of the super-real-time stable face tracking system based on a deep neural network and an optical flow method according to any one of claims 1-6, characterized in that the tracking method comprises the following steps:
1) the face detection module detects the position of the face box in the initial frame; when a face is present, it sends the face box coordinates to the face tracking module;
2) the face tracking module tracks the face box coordinates of the next frame based on the face box coordinates of the current frame;
3) during frame-by-frame tracking, the face verification module is run at a certain frequency to ensure that tracking yields an accurate face box position;
4) if the verification module finds that tracking has failed, the face detection module is run again and a new detection-tracking-verification cycle begins.
CN202111036198.5A 2021-09-06 2021-09-06 Face tracking system and method based on neural network and optical flow method Active CN113792633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111036198.5A CN113792633B (en) 2021-09-06 2021-09-06 Face tracking system and method based on neural network and optical flow method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111036198.5A CN113792633B (en) 2021-09-06 2021-09-06 Face tracking system and method based on neural network and optical flow method

Publications (2)

Publication Number Publication Date
CN113792633A true CN113792633A (en) 2021-12-14
CN113792633B CN113792633B (en) 2023-12-22

Family

ID=79182522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111036198.5A Active CN113792633B (en) 2021-09-06 2021-09-06 Face tracking system and method based on neural network and optical flow method

Country Status (1)

Country Link
CN (1) CN113792633B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101517359B1 (en) * 2013-12-27 2015-06-05 전남대학교산학협력단 Method for tracking of object and appaaratus using the same
WO2018153294A1 (en) * 2017-02-27 2018-08-30 腾讯科技(深圳)有限公司 Face tracking method, storage medium, and terminal device
WO2019101220A1 (en) * 2017-12-11 2019-05-31 珠海大横琴科技发展有限公司 Deep learning network and average drift-based automatic vessel tracking method and system
US20190392591A1 (en) * 2018-06-25 2019-12-26 Electronics And Telecommunications Research Institute Apparatus and method for detecting moving object using optical flow prediction
CN109063593A (en) * 2018-07-13 2018-12-21 北京智芯原动科技有限公司 A kind of face tracking method and device
US20200074642A1 (en) * 2018-08-29 2020-03-05 Qualcomm Incorporated Motion assisted image segmentation
CN109598211A (en) * 2018-11-16 2019-04-09 恒安嘉新(北京)科技股份公司 A kind of real-time dynamic human face recognition methods and system
CN109800635A (en) * 2018-12-11 2019-05-24 天津大学 A kind of limited local facial critical point detection and tracking based on optical flow method
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN111126223A (en) * 2019-12-16 2020-05-08 山西大学 Video pedestrian re-identification method based on optical flow guide features
CN111667504A (en) * 2020-04-23 2020-09-15 广州多益网络股份有限公司 Face tracking method, device and equipment
CN112613526A (en) * 2020-12-17 2021-04-06 厦门大学 Feature matching algorithm and system based on optical flow tracking
CN113076792A (en) * 2020-12-30 2021-07-06 无锡乐骐科技有限公司 Human face five sense organs correction method based on global optical flow and neural network
CN112801043A (en) * 2021-03-11 2021-05-14 河北工业大学 Real-time video face key point detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙锐, 阚俊松, 吴柳玮, 王鹏: "Rotation-invariant face detection with cascaded networks and pyramid optical flow" (级联网络和金字塔光流的旋转不变人脸检测), Opto-Electronic Engineering (光电工程), no. 01 *
王正来 et al.: "Optical flow detection method for moving targets based on deep convolutional neural networks" (基于深度卷积神经网络的运动目标光流检测方法), Opto-Electronic Engineering (光电工程) *

Also Published As

Publication number Publication date
CN113792633B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN111797716B (en) Single target tracking method based on Siamese network
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
US8131011B2 (en) Human detection and tracking system
US9317762B2 (en) Face recognition using depth based tracking
CN103207898B (en) A kind of similar face method for quickly retrieving based on local sensitivity Hash
CN107330371A (en) Acquisition methods, device and the storage device of the countenance of 3D facial models
CN112489081B (en) Visual target tracking method and device
CN115880784B (en) Scenic spot multi-person action behavior monitoring method based on artificial intelligence
CN108960076B (en) Ear recognition and tracking method based on convolutional neural network
CN107833239B (en) Optimization matching target tracking method based on weighting model constraint
CN104794439A (en) Real-time approximate frontal face image optimizing method and system based on several cameras
KR100777199B1 (en) Apparatus and method for tracking of moving target
CN111199556A (en) Indoor pedestrian detection and tracking method based on camera
KR20140067604A (en) Apparatus, method and computer readable recording medium for detecting, recognizing and tracking an object based on a situation recognition
CN110555867B (en) Multi-target object tracking method integrating object capturing and identifying technology
CN111353385A (en) Pedestrian re-identification method and device based on mask alignment and attention mechanism
Yin Object Detection Based on Deep Learning: A Brief Review
CN108320301B (en) Target tracking optimization method based on tracking learning detection
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN111950551B (en) Target detection method based on convolutional neural network
CN105930789A (en) Human body behavior recognition based on logarithmic Euclidean space BOW (bag of words) model
CN113095232A (en) Target real-time tracking method
CN113792633B (en) Face tracking system and method based on neural network and optical flow method
Lin et al. Real-time eye detection in video streams
US11995869B2 (en) System and method to improve object detection accuracy by focus bracketing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant