CN112215155B - Face tracking method and system based on multi-feature fusion - Google Patents


Info

Publication number
CN112215155B
Authority
CN
China
Prior art keywords
face
feature
track
matching
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011091950.1A
Other languages
Chinese (zh)
Other versions
CN112215155A (en)
Inventor
高珊珊
瞿洪柱
宋春晓
袁丽燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd filed Critical Beijing Sinonet Science and Technology Co Ltd
Priority to CN202011091950.1A priority Critical patent/CN112215155B/en
Publication of CN112215155A publication Critical patent/CN112215155A/en
Application granted granted Critical
Publication of CN112215155B publication Critical patent/CN112215155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of intelligent video image processing and discloses a face tracking method and system based on multi-feature fusion. The face tracking method comprises: obtaining a current frame target image; predicting the position coordinates of the face track frame of all stored face tracks in the current frame target image by a Kalman filtering method; judging whether the frame number of the current frame target image is a multiple of the detection interval; judging whether a human face is detected; extracting features from the face rectangular frames with a feature extraction algorithm to obtain the face features of all faces; performing face feature matching; performing secondary matching on the faces and face tracks that failed to match; updating the Kalman filters and track information; and judging whether face tracking is finished. By setting a face detection interval, the invention keeps the system processing speed fast; by fusing the depth feature and the HOG feature, the matching accuracy is high; feature matching and IOU matching are used together, so face ID switching is reduced, tracking continuity is guaranteed, and tracking accuracy is improved.

Description

Face tracking method and system based on multi-feature fusion
Technical Field
The invention relates to the field of intelligent video image processing, in particular to a face tracking method and system based on multi-feature fusion.
Background
Working together with a face recognition system, an intelligent monitoring system can complete tasks such as real-time screening of dangerous personnel and fast tracking of suspects. The system obtains face images through face tracking and snapshot and sends them to the face recognition module to complete face comparison. Because the monitoring scene is complex and changeable, the face moves continuously in the picture, and illumination changes, blurring, occlusion and face posture changes occur frequently, the face tracking track is difficult to maintain and the face ID of the same person gets switched; as a result, many duplicate face pictures are captured, which affects the processing efficiency of the system.
For example, Chinese patent publication CN110210285A discloses a "face tracking method, a face tracking device and a computer storage medium", which includes: acquiring a target image; analyzing the target image to acquire the feature information and position information of the face to be detected in the target image; determining a position search range according to the position information of the face to be detected; judging whether a face template whose position information falls within the position search range exists; if such a face template exists, detecting, within the position search range, a face template matching the face to be detected according to the feature information of the face to be detected; and taking the tracking ID of the matched face template as the tracking ID of the face to be detected. That invention matches faces by combining position information and feature information and can keep tracking accurate over a long time sequence. However, the method performs face detection and feature extraction on every frame of the image, which is time-consuming; and during matching it only uses depth features to search within a certain position range, so missed matches are likely, causing ID switching and affecting the tracking accuracy.
Therefore, how to improve the speed and the accuracy of the face tracking algorithm and keep the tracking track uninterrupted is an urgent problem to be solved in the field.
Disclosure of Invention
The invention provides a face tracking method and system based on multi-feature fusion, thereby solving the problems in the prior art.
In a first aspect, the present invention provides a face tracking method based on multi-feature fusion, which includes the following steps:
s1) acquiring a current frame target image of a video or picture stream;
s2) acquiring all stored face tracks, and predicting the position coordinates of a face track frame of all stored face tracks in a current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting the detection interval as d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9);
s4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering the step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9);
s5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, and performing feature extraction on the face rectangular frame by using a feature extraction algorithm to obtain face features of all faces in a current frame target image;
s6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result;
s7) carrying out secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result;
s8) updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9);
s9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, return to step S1).
Generally, a face target in a video is present for tens to hundreds of frames from entering the picture to leaving it; the face position and posture change very little between consecutive frames, the background is basically unchanged, and detecting and matching every frame consumes computing resources and slows down processing. Therefore, the invention performs face detection and the subsequent tracking steps once every d frames (that is, the detection interval is d), while the remaining frames only perform Kalman prediction of the face tracks (that is, the position coordinates of the face track frame are predicted by the Kalman filter). This can improve the processing speed several-fold without affecting the tracking effect, and at the same time avoids some cases of erroneous tracking and face ID switching. The size of the detection interval d can be adjusted according to the actual application requirements.
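For illustration, the sketch below (Python) outlines this per-frame control flow; detect_faces(), extract_features() and match_and_update() are hypothetical stand-ins for steps S4) to S8), and the default detection interval of 5 is only an example.

```python
def track_stream(frames, tracks, d=5):
    for frame_idx, frame in enumerate(frames):
        # S2) predict every stored track's face box in the current frame
        for trk in tracks:
            trk.predicted_box = trk.kalman_predict()

        # S3) only frames whose number is a multiple of d run detection and matching
        if frame_idx % d != 0:
            continue

        # S4)-S8) detect faces, extract fused features, then match and update tracks
        boxes = detect_faces(frame)                # hypothetical detector
        feats = extract_features(frame, boxes)     # hypothetical feature extractor
        match_and_update(tracks, boxes, feats)     # hypothetical matcher/updater
    return tracks
```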
For all stored face tracks, the method predicts the position coordinates of the face track frame in the current frame target image with a Kalman filtering method; that is, for the face track of an already tracked face, the Kalman filtering algorithm linearly predicts, from the coordinate information of the historical face track, where the face may appear in the current frame target image. The prediction assumes that the face moves linearly at a uniform speed. In general the face track prediction frame follows the real position of the face well, but when the direction or speed of the face's motion changes suddenly the prediction carries a certain error, so it cannot be used directly for face tracking; continuous tracking can only be maintained by correcting the coordinates with the detected face rectangular frame, and this correction is the Kalman filter update performed in step S8).
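A minimal sketch of such a per-track constant-velocity Kalman filter is given below, built on OpenCV's cv2.KalmanFilter; the state layout, noise covariances and box parameterisation are illustrative assumptions rather than values specified by the patent.

```python
import numpy as np
import cv2

def make_face_kalman(box_cxcywh):
    # constant-velocity model: state [cx, cy, w, h, vcx, vcy, vw, vh], measurement [cx, cy, w, h]
    kf = cv2.KalmanFilter(8, 4)
    kf.transitionMatrix = np.eye(8, dtype=np.float32)
    for i in range(4):
        kf.transitionMatrix[i, i + 4] = 1.0      # position advances by its velocity each step
    kf.measurementMatrix = np.eye(4, 8, dtype=np.float32)
    kf.processNoiseCov = np.eye(8, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    kf.statePost = np.array(list(box_cxcywh) + [0.0] * 4, dtype=np.float32).reshape(8, 1)
    return kf

# step S2): predicted = kf.predict()[:4].flatten() gives the face track prediction frame;
# step S8): kf.correct(np.float32(matched_box).reshape(4, 1)) corrects it with the detected box.
```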
Further, in step S4), a face detection algorithm is used to perform face detection on the current frame target image, a face rectangular frame coordinate corresponding to each face is output, a face confidence corresponding to each face is output, a confidence threshold is set, and a face rectangular frame with a face confidence lower than the confidence threshold is deleted.
The face detection algorithm outputs the coordinates of the face rectangular frame and also outputs the face confidence corresponding to the face rectangular frame, and the face confidence represents the probability that the face rectangular frame is a face. By setting the confidence threshold, the invention can filter out the face rectangular frames with lower face confidence, and the face rectangular frames with lower face confidence are false detection or pictures with poorer quality (fuzzy and large angle).
Further, in step S4), marking matching failures of all stored face tracks, acquiring track continuous matching failure frame numbers of all stored face tracks with matching failures, setting a matching failure threshold q, and respectively judging whether the track continuous matching failure frame numbers of the stored face tracks with matching failures are not less than the matching failure threshold q, if so, deleting the stored face tracks with the track continuous matching failure frame numbers not less than the matching failure threshold q in all stored face tracks with matching failures; and if not, adding 1 to the track continuous matching failure frame number of the stored face track with the track continuous matching failure frame number smaller than the matching failure threshold q in all the stored face tracks with the matching failure.
Further, in step S5), a face rectangular frame corresponding to each face is obtained according to the face rectangular frame coordinates, feature extraction is performed on the face rectangular frame by using a feature extraction algorithm, so as to obtain face features of each face in the current frame target image, and the total number of faces detected in the current frame target image is n, including the following steps:
s51) calculating the length and the width from the coordinates of the ith face rectangular frame, wherein i is less than or equal to n, taking the larger of the length and the width, and modifying the coordinates of the ith face rectangular frame according to this larger value to obtain a square face rectangular frame; taking the square face rectangular frame as the face rectangular frame corresponding to the ith face, and scaling the square face rectangular frame to a preset size to obtain the scaled ith face rectangular frame;
s52) establishing a trained MobileFaceNet network, inputting the scaled ith face rectangular frame into the trained MobileFaceNet network, outputting the depth feature of the ith face rectangular frame through the trained MobileFaceNet network, and normalizing the depth feature of the ith face rectangular frame to obtain the normalized depth feature of the ith face rectangular frame;
s53) obtaining the HOG feature of the ith face rectangular frame with an HOG feature extraction algorithm, and normalizing the HOG feature of the ith face rectangular frame to obtain the normalized HOG feature of the ith face rectangular frame;
s54) concatenating, in order, the normalized depth feature of the ith face rectangular frame from step S52) and the normalized HOG feature of the ith face rectangular frame from step S53) to obtain the face feature of the ith face;
s55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
The method cuts the face rectangular frame out of the original target image according to the face rectangular frame coordinates and extracts features after scaling it to the preset size. When the picture is cut, a square area is cut according to the long side of the face rectangular frame, with the face located at the center of the area, so the face is not deformed when the picture is scaled. The face feature of the invention consists of two parts: the depth feature output by the MobileFaceNet network and the HOG feature output by the HOG feature extraction algorithm are concatenated in order into a complete face feature. The HOG feature is formed by calculating and counting histograms of gradient directions over local areas of the image; it has geometric and optical invariance and captures the local shape information of the image well. The depth feature extraction network is obtained by removing the classification layer from the MobileFaceNet network; the depth feature represents deep abstract features of the image and has stronger expressive power than manually designed shallow features. Concatenating the two features in order into a complete face feature for comparison and matching allows the deep and shallow information of the image to be compared at the same time, combines the advantages of traditional features and depth features, and achieves more accurate comparison. MobileFaceNet is a small network whose convolution layers use depthwise separable convolution; compared with ordinary convolution, its parameter count and computation are greatly reduced, which keeps the feature extraction part fast. Denoting the depth feature output by the MobileFaceNet network as [x1, x2, x3, …] and the HOG feature extracted by the HOG feature extraction algorithm as [y1, y2, y3, …], the two are normalized separately and recombined, and the final face feature is [x1, x2, x3, …, y1, y2, y3, …]. The ranges of the depth feature output by the MobileFaceNet network and the HOG feature extracted by the HOG feature extraction algorithm are not consistent; if a range is too large, the range of the computed cosine similarity is large and it is hard to measure whether two features are similar, and inconsistent ranges also make the two features contribute very differently to the feature similarity, so the advantage of fusing the two features cannot be exploited.
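The sketch below illustrates steps S51) to S54) under stated assumptions: a 112 x 112 preset size, scikit-image's hog() for the HOG descriptor, and a hypothetical mobilefacenet_embed() callable standing in for the trained MobileFaceNet network with its classification layer removed.

```python
import numpy as np
import cv2
from skimage.feature import hog

def extract_face_feature(frame, box, mobilefacenet_embed, size=112):
    x1, y1, x2, y2 = box
    # s51) crop a square patch along the longer side so the face keeps its aspect ratio
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    sx, sy = max(cx - side // 2, 0), max(cy - side // 2, 0)   # clamped to the image for simplicity
    patch = cv2.resize(frame[sy:sy + side, sx:sx + side], (size, size))

    # s52) depth feature from the embedding network, then L2 normalisation
    deep = np.asarray(mobilefacenet_embed(patch), dtype=np.float32)
    deep /= (np.linalg.norm(deep) + 1e-12)

    # s53) HOG feature on the grayscale patch, then L2 normalisation
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2)).astype(np.float32)
    hog_feat /= (np.linalg.norm(hog_feat) + 1e-12)

    # s54) concatenate the two normalised parts into the fused face feature
    return np.concatenate([deep, hog_feat])
```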
Further, in step S6), acquiring a feature pool of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, performing face feature matching on the face features of all faces in step S5) with the feature pool of all stored face tracks respectively to obtain a first face feature matching result, including calculating feature similarity between the face features of each face and a plurality of historical face features in the feature pool of all stored face tracks, setting a feature similarity threshold, screening out feature similarity exceeding the feature similarity threshold, regarding the face with the highest feature similarity among the feature similarities exceeding the feature similarity threshold and the stored face tracks as the same face, and labeling the face with the highest feature similarity among the feature similarities exceeding the feature similarity threshold and the stored face tracks as successful matching; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
The feature pool of a stored face track contains a certain number of historical face features, all of which take part in the feature similarity calculation; this avoids a failure to match the face to the face track caused by a sudden change of face posture or illumination in the previous frame target image. The size of the feature pool determines how many recent frames of face features can be stored for a track, and it can be set according to actual needs.
Further, the feature similarity between the face feature of each face and the historical face features in the feature pools of all stored face tracks is calculated. The feature similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track is

sim(i, j, w) = cos(a, b_jw) / 2

where a represents the face feature of the ith face, b_jw represents the w-th historical face feature in the feature pool of the jth stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
The invention first obtains the cosine similarity between the face feature and the historical face feature; because the complete face feature is the concatenation of two normalized features (the depth feature and the HOG feature), the feature similarity is taken as half of the cosine similarity between the face feature and the historical face feature, so that the maximum value of the feature similarity is reduced to 1. The feature similarity threshold is used to filter out track/detected-face pairs whose feature similarity is too small: if the similarity is below the feature similarity threshold, the two features (the face feature and the historical face feature) most likely do not belong to the same person, and letting such pairs take part in match screening would cause mismatches and affect the matching accuracy. The value of the feature similarity threshold is obtained from actual tests.
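The sketch below illustrates this matching rule. It reads the "cosine similarity" of the fused features as the dot product of the two concatenated unit-normalised parts (so that half of it has a maximum of 1); that reading, the greedy assignment, and the threshold value of 0.5 are illustrative assumptions, since the patent only states that the threshold is obtained from actual tests.

```python
import numpy as np

def match_by_features(face_feats, tracks, sim_thresh=0.5):
    matches, unmatched_faces, used = [], [], set()
    for i, a in enumerate(face_feats):
        best_j, best_sim = -1, sim_thresh
        for j, trk in enumerate(tracks):
            if j in used or not trk.feature_pool:
                continue
            # best similarity of this face against every feature pooled on the track
            sim = max(float(np.dot(a, b)) / 2.0 for b in trk.feature_pool)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j >= 0:
            matches.append((i, best_j))        # face i and track best_j are taken as the same person
            used.add(best_j)
        else:
            unmatched_faces.append(i)          # goes on to the secondary IOU matching
    unmatched_tracks = [j for j in range(len(tracks)) if j not in used]
    return matches, unmatched_faces, unmatched_tracks
```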
Further, in step S7), performing secondary matching on the face with failed matching and the face track with failed matching in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result, including calculating an IOU value between a face rectangular frame corresponding to each face without matching to a corresponding face track and face track prediction frames of all stored face tracks, setting an IOU threshold value, screening out the IOU values exceeding the IOU threshold value, regarding the face with the highest IOU value and the stored face track in the IOU values exceeding the IOU threshold value as the same face, and labeling the face with the highest IOU value in the IOU values exceeding the IOU threshold value and the stored face track to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
For tracks and detected faces that were not successfully matched, the IOU value between the detected face frame and each track prediction frame obtained in step S2) is calculated, and among the pairs exceeding the IOU threshold, the track with the largest IOU value is considered successfully matched with the detected face. The invention performs secondary matching, through the Intersection over Union (IOU) value, on faces in the first face feature matching result that were not matched to a corresponding face track and on face tracks that were not matched to a corresponding face. The IOU value is the ratio of the intersection to the union of a predicted frame and a real frame: the predicted frame is the face track prediction frame of the stored face track obtained by the Kalman filter in step S2), and the real frame is the detected face rectangular frame. The IOU value between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track is

IOU(A_k, B_r) = |A_k ∩ B_r| / |A_k ∪ B_r|

where A_k ∩ B_r is the intersection between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track, and A_k ∪ B_r is the union between them. By calculating the IOU value, the overlap between the positions of the face track prediction frame and the face rectangular frame can be measured, and whether the face track matches the detected face can be judged. The invention adds IOU matching after face feature matching, so a face whose features have changed greatly can still be matched successfully through position information; the matching success rate increases, and new face track initializations caused by matching failures are correspondingly reduced, thereby reducing face ID switching.
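A minimal sketch of the IOU value and of this secondary matching step is given below, with boxes represented as (x1, y1, x2, y2) tuples; the IOU threshold value is an illustrative assumption.

```python
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)            # area of A ∩ B
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)            # area of A ∪ B
    return inter / union if union > 0 else 0.0

def match_by_iou(det_boxes, unmatched_faces, tracks, unmatched_tracks, iou_thresh=0.3):
    matches = []
    for i in list(unmatched_faces):
        # compare the detected box with each remaining track's Kalman-predicted box
        cand = [(iou(det_boxes[i], tracks[j].predicted_box), j) for j in unmatched_tracks]
        if not cand:
            break
        best_iou, best_j = max(cand)
        if best_iou > iou_thresh:
            matches.append((i, best_j))
            unmatched_faces.remove(i)
            unmatched_tracks.remove(best_j)
    return matches
```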
Further, in step S8), updating the kalman filter and the trajectory information according to the first face feature matching result and the second face feature matching result, including the following steps:
s81) acquiring track continuous matching failure frame numbers of stored face tracks successfully matched with the face in the current frame target image, and setting the track continuous matching failure frame numbers of the stored face tracks successfully matched with the face in the current frame target image to be 0; acquiring a face ID of a stored face track, and setting the face ID of the successfully matched face as the face ID of the stored face track corresponding to the successfully matched face; storing the face features of the successfully matched face into a corresponding feature pool of the stored face track; acquiring the coordinates of a face rectangular frame of the successfully matched face, and updating a Kalman filter of a stored face track corresponding to the successfully matched face according to the coordinates of the face rectangular frame of the successfully matched face;
s82) obtaining a face which fails to be matched, determining the face which fails to be matched as a face which newly appears in the video, initializing the face which fails to be matched as a new face track, and distributing a new face ID to the new face track; initializing a Kalman filter corresponding to the new face track by using the face rectangular frame coordinates of the face which fails to be matched, and storing the face features of the face which fails to be matched into a feature pool of the new face track; setting the number of the continuous matching failure frames of the new face track to be 0;
s83) adding 1 to the number of continuous matching failure frames of the track of the stored face track failed in matching; acquiring track continuous matching failure frame number x of the stored face track failed in matching, judging whether the track continuous matching failure frame number x is not less than a matching failure threshold q, and if so, deleting the stored face track failed in matching; if not, the process proceeds to step S9).
The method sets the face ID of a successfully matched detected face to the face ID of the corresponding face track, updates the Kalman filter of that face track, and adds the face features of the detected face to the feature pool of that face track. A face track that is not successfully matched is marked as a matching failure and is deleted after multiple consecutive matching failures. A face that is not successfully matched is considered by the system to be a face newly appearing in the video: it is initialized as a new face track, assigned a new face ID, its face rectangular frame coordinates are used to initialize the corresponding Kalman filter, and its face features are stored in the feature pool of the new face track. After the Kalman filters are updated, the method returns to step S1) to fetch the next frame target image. The Kalman filter is updated after the detected face and the face track are successfully matched; the update mainly uses the relatively accurate face rectangular frame coordinates to correct the face track prediction frame coordinates predicted by the Kalman filter from the historical track, so that the Kalman filter can follow the actual position of the face and make a more accurate prediction for the next frame target image.
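The sketch below illustrates this update step. It assumes a hypothetical Track class holding the fields described above (face_id, a Kalman filter exposed through kalman_correct(), feature_pool, and a consecutive-matching-failure count miss_count initialised to 0); the matching failure threshold of 5 frames is likewise an assumption.

```python
def update_tracks(tracks, matches, unmatched_faces, unmatched_tracks,
                  det_boxes, det_feats, next_id, miss_thresh=5):
    # S81) matched pairs: reset the failure count, pool the feature, correct the filter
    for face_i, track_j in matches:
        trk = tracks[track_j]
        trk.miss_count = 0                              # track matched in this frame
        trk.feature_pool.append(det_feats[face_i])      # pool the new face feature
        trk.kalman_correct(det_boxes[face_i])           # correct the filter with the detection
        # the detected face is reported under trk.face_id (no new ID is assigned)

    # S82) unmatched detections are new faces: start new tracks with fresh IDs
    for face_i in unmatched_faces:
        tracks.append(Track(face_id=next_id, box=det_boxes[face_i],
                            feature=det_feats[face_i]))  # new Track starts with miss_count = 0
        next_id += 1

    # S83) unmatched tracks accumulate failures and are removed once stale
    for track_j in unmatched_tracks:
        tracks[track_j].miss_count += 1
    tracks[:] = [t for t in tracks if t.miss_count < miss_thresh]
    return next_id
```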
In a second aspect, the invention provides a face tracking system based on multi-feature fusion, which comprises a picture acquisition module, a track prediction module, a judgment module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or an image stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d or not, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the characteristic extraction module is used for extracting the face characteristics of each face in the current frame target image;
the face matching module is used for calculating the feature similarity between the historical face features of the face tracks and the face features of the faces extracted by the feature extraction module, so as to perform feature matching; for face tracks and faces that were not successfully matched, secondary matching is performed by calculating the IOU value;
the track updating module is used for setting the face ID of the successfully matched face as the face ID corresponding to the face track, updating a Kalman filter of the face track and adding the face features into a feature pool of the face track; for the face track which is not matched with the corresponding face, marking the face track which is not matched with the corresponding face as matching failure, and deleting the face track which is not matched after continuous multiple matching failures; for the face which is not matched with the corresponding face track, initializing the face which is not matched with the corresponding face track into a new face track, distributing a new face ID, initializing a corresponding Kalman filter by using face rectangular frame coordinates, and storing the face features of the face which is not matched with the corresponding face track into a feature pool of the new face track;
the human face snapshot module is used for snapshot of the human face after the human face track is finished;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
The invention has the beneficial effects that: by setting a face detection interval, the invention keeps the system processing speed fast and filters out some false detections and track fragments, which improves the tracking accuracy and reduces face ID switching. When matching faces, feature matching is first performed with the concatenated depth feature and HOG feature, taking both deep and non-deep features of the image into account, so the matching accuracy is high and mismatches are avoided. Secondary matching is then performed on unmatched tracks and faces with the IOU value; IOU matching is based on the principle that the face track prediction frame and the face rectangular frame of the same target are close in position, so when partial occlusion, sudden light changes or angle changes of the detected face cause feature matching to fail, IOU matching can still match the face and face track successfully using position information, guaranteeing tracking continuity. Feature matching and IOU matching used together produce few face ID switches and improve the tracking accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the embodiments are briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flow chart of a face tracking method based on multi-feature fusion according to a first embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a face tracking system based on multi-feature fusion according to the first embodiment.
Fig. 3 is a schematic flowchart of the feature extraction module according to the first embodiment.
Fig. 4 is a schematic diagram of a calculation method of the IOU value according to the first embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, as shown in fig. 1, a face tracking method based on multi-feature fusion includes the following steps:
s1) acquiring a current frame target image of a video or picture stream.
In this embodiment, the video or picture stream includes multiple frames of continuous images. The images are taken frame by frame, starting from the initial frame, for processing; the frame-grabbing operation can be completed with an image encoding/decoding library such as OpenCV. The frame number f of the initial frame is 0, and the frame number of each subsequent frame is increased by 1.
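A minimal sketch of this frame acquisition using OpenCV's VideoCapture is shown below; the source path and the process_frame() pipeline function are placeholders, not part of the patent.

```python
import cv2

cap = cv2.VideoCapture("path_or_rtsp_url")   # placeholder video / stream source
frame_idx = 0                                 # frame number f of the initial frame is 0
while True:
    ok, frame = cap.read()
    if not ok:                                # stream exhausted: face tracking finishes
        break
    process_frame(frame, frame_idx)           # hypothetical per-frame pipeline (steps S2-S8)
    frame_idx += 1                            # each subsequent frame adds 1
cap.release()
```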
S2) acquiring all stored face tracks, and predicting the position coordinates of a face track frame of all stored face tracks in a current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting the detection interval as d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9).
Generally, a face target in a video is present for tens to hundreds of frames from entering the picture to leaving it; the face position and posture change very little between consecutive frames and the background is basically unchanged, so detecting and matching every frame of video is of little value, heavily consumes computing resources, and slows down processing. Therefore, the invention performs face detection and the subsequent tracking steps once every d frames (the detection interval is d), while the remaining frames only perform Kalman prediction of the face tracks (that is, only the position coordinates of the face track frame are predicted by the Kalman filter), which can improve the processing speed several-fold without affecting the tracking effect and at the same time avoids some cases of erroneous tracking and face ID switching. The size of the detection interval d can be adjusted according to the actual application requirements.
S4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9).
In step S4), a face detection algorithm is used to perform face detection on the current frame target image, the face rectangular frame coordinates corresponding to each face are output, the face confidence corresponding to each face is output, a confidence threshold is set, and the face rectangular frame with the face confidence lower than the confidence threshold is deleted.
The face detection algorithm outputs the coordinates of the face rectangular frame and also outputs the face confidence corresponding to the face rectangular frame, and the face confidence represents the probability that the face rectangular frame is a face. By setting the confidence threshold, the invention can filter out the face rectangular frames with lower face confidence, and the face rectangular frames with lower face confidence are false detection or pictures with poorer quality (fuzzy and large angle).
In the step S4), marking matching failures of all stored face tracks, acquiring track continuous matching failure frame numbers of the stored face tracks which fail to be matched, setting a matching failure threshold q, respectively judging whether the track continuous matching failure frame numbers of the stored face tracks which fail to be matched are not less than the matching failure threshold q, and if so, deleting the stored face tracks of which the track continuous matching failure frame numbers are not less than the matching failure threshold q in the stored face tracks which fail to be matched; and if not, adding 1 to the track continuous matching failure frame number of the stored face track of which the track continuous matching failure frame number is smaller than the matching failure threshold q in all the stored face tracks with matching failure.
S5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, performing feature extraction on the face rectangular frame by using a feature extraction algorithm to obtain face features of all faces in a current frame target image, wherein the total number of the faces detected when face detection is performed on the current frame target image is n, and the method comprises the following steps as shown in FIG. 3:
s51) calculating the length and the width from the coordinates of the ith face rectangular frame, wherein i is less than or equal to n, taking the larger of the length and the width, and modifying the coordinates of the ith face rectangular frame according to this larger value to obtain a square face rectangular frame (corresponding to the square face small image in FIG. 3); taking the square face rectangular frame as the face rectangular frame corresponding to the ith face, scaling the square face rectangular frame to a preset size, with the preset size set to 112 x 112, to obtain the scaled ith face rectangular frame;
s52) establishing a trained MobileFaceNet network, inputting the scaled ith face rectangular frame into the trained MobileFaceNet network, outputting the depth feature of the ith face rectangular frame through the trained MobileFaceNet network, and normalizing the depth feature of the ith face rectangular frame to obtain the normalized depth feature of the ith face rectangular frame;
s53) obtaining the HOG feature of the ith face rectangular frame with an HOG (Histogram of Oriented Gradients) feature extraction algorithm, and normalizing the HOG feature of the ith face rectangular frame to obtain the normalized HOG feature of the ith face rectangular frame;
s54) concatenating, in order, the normalized depth feature of the ith face rectangular frame from step S52) and the normalized HOG feature of the ith face rectangular frame from step S53) to obtain the face feature of the ith face;
s55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
The invention cuts the face rectangular frame out of the original target image according to the face rectangular frame coordinates and extracts features after scaling it to the preset size. When the picture is cut, a square area is cut according to the long side of the face rectangular frame, with the face located at the center of the area, so the face is not deformed when the picture is scaled. The face feature of the invention consists of two parts: the depth feature output by the MobileFaceNet network and the HOG feature output by the HOG feature extraction algorithm are concatenated in order into a complete face feature. The HOG feature is formed by calculating and counting histograms of gradient directions over local areas of the image; it has geometric and optical invariance and captures the local shape information of the image well. The depth feature extraction network is obtained by removing the classification layer from the MobileFaceNet network; the depth feature represents deep abstract features of the image and has stronger expressive power than manually designed shallow features. Concatenating the two features in order into a complete face feature for comparison and matching allows the deep and shallow information of the image to be compared at the same time, combines the advantages of traditional features and depth features, and achieves more accurate comparison. MobileFaceNet is a small network whose convolution layers use depthwise separable convolution; compared with ordinary convolution, its parameter count and computation are greatly reduced, which keeps the feature extraction part fast. Denoting the depth feature output by the MobileFaceNet network as [x1, x2, x3, …] and the HOG feature extracted by the HOG feature extraction algorithm as [y1, y2, y3, …], the two are normalized separately and recombined, and the final face feature is [x1, x2, x3, …, y1, y2, y3, …]. The ranges of the depth feature output by the MobileFaceNet network and the HOG feature extracted by the HOG feature extraction algorithm are not consistent; if a range is too large, the range of the computed cosine similarity is large and it is hard to measure whether two features are similar, and inconsistent ranges also make the two features contribute very differently to the feature similarity, so the advantage of fusing the two features cannot be exploited. Normalization avoids an overly large range of feature values and ensures the accuracy of subsequent matching.
S6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result.
In the step S6), acquiring a feature pool of all stored face tracks, wherein the feature pool of each stored face track comprises a plurality of historical face features, respectively performing face feature matching on the face features of all faces in the step S5) with the feature pool of all stored face tracks to obtain a first face feature matching result, wherein the first face feature matching result comprises calculating feature similarity between the face features of each face and the plurality of historical face features in the feature pool of all stored face tracks, setting a feature similarity threshold, screening out feature similarity exceeding the feature similarity threshold, regarding the face with the highest feature similarity in the feature similarity exceeding the feature similarity threshold and the stored face tracks as the same face, and labeling the face with the highest feature similarity in the feature similarity exceeding the feature similarity threshold and the stored face tracks to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
The feature pool of a stored face track contains a certain number of historical face features, all of which take part in the feature similarity calculation; this avoids a failure to match the face to the face track caused by a sudden change of face posture or illumination in the previous frame target image. The size of the feature pool determines how many recent frames of face features can be stored for a track, and it can be set according to actual needs.
The feature similarity between the face feature of each face and the historical face features in the feature pools of all stored face tracks is calculated. The feature similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track is

sim(i, j, w) = cos(a, b_jw) / 2

where a represents the face feature of the ith face, b_jw represents the w-th historical face feature in the feature pool of the jth stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
The invention first obtains the cosine similarity between the face feature and the historical face feature; because the complete face feature is the concatenation of two normalized features (the depth feature and the HOG feature), the feature similarity is taken as half of the cosine similarity between the face feature and the historical face feature, so that the maximum value of the feature similarity is reduced to 1. The feature similarity threshold is used to filter out track/detected-face pairs whose feature similarity is too small: if the similarity is below the feature similarity threshold, the two features (the face feature and the historical face feature) most likely do not belong to the same person, and letting such pairs take part in match screening would cause mismatches and affect the matching accuracy. The value of the feature similarity threshold is obtained from actual tests.
And S7) carrying out secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating the IOU value to obtain a second face feature matching result.
In step S7), performing secondary matching on the face with failed matching and the face track with failed matching in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result, including calculating an IOU value between a face rectangular frame corresponding to each face without matching to a corresponding face track and face track prediction frames of all stored face tracks, setting an IOU threshold value, screening out the IOU values exceeding the IOU threshold value, regarding the face with the highest IOU value among the IOU values exceeding the IOU threshold value and the stored face tracks as the same face, and labeling the face with the highest IOU value among the IOU values exceeding the IOU threshold value and the stored face tracks to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
For tracks and detected faces that were not successfully matched, the IOU value between the detected face frame and each track prediction frame obtained in step S2) is calculated, and among the pairs exceeding the IOU threshold, the track with the largest IOU value is considered successfully matched with the detected face. The invention performs secondary matching, through the Intersection over Union (IOU) value, on faces in the first face feature matching result that were not matched to a corresponding face track and on face tracks that were not matched to a corresponding face. The IOU value is the ratio of the intersection to the union of the "predicted frame" and the "real frame" (see FIG. 4): the predicted frame is the face track prediction frame of the stored face track obtained by the Kalman filter in step S2), and the real frame is the detected face rectangular frame. The IOU value between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track is

IOU(A_k, B_r) = |A_k ∩ B_r| / |A_k ∪ B_r|

where A_k ∩ B_r is the intersection between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track, and A_k ∪ B_r is the union between them. By calculating the IOU value, the overlap between the positions of the face track prediction frame and the face rectangular frame can be measured, and whether the face track matches the detected face can be judged. The invention adds IOU matching after face feature matching, so a face whose features have changed greatly can still be matched successfully through position information; the matching success rate increases, and new face track initializations caused by matching failures are correspondingly reduced, thereby reducing face ID switching.
S8) updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9).
In step S8), updating the kalman filter and the trajectory information according to the first face feature matching result and the second face feature matching result, including the steps of:
s81) acquiring track continuous matching failure frame numbers of stored face tracks successfully matched with the face in the current frame target image, and setting the track continuous matching failure frame numbers of the stored face tracks successfully matched with the face in the current frame target image to be 0; acquiring a face ID of a stored face track, and setting the face ID of a successfully matched face as the face ID of the stored face track corresponding to the successfully matched face; storing the face features of the successfully matched face into a corresponding feature pool of the stored face track; acquiring the coordinates of a face rectangular frame of a successfully matched face, and updating a Kalman filter of a stored face track corresponding to the successfully matched face according to the coordinates of the face rectangular frame of the successfully matched face;
s82) acquiring a face which fails to be matched, determining the face which fails to be matched as a face which newly appears in the video, initializing the face which fails to be matched into a new face track, and distributing a new face ID to the new face track; initializing a Kalman filter corresponding to the new face track by using the face rectangular frame coordinates of the face which fails to be matched, and storing the face features of the face which fails to be matched into a feature pool of the new face track; setting the number of the continuous matching failure frames of the new face track to be 0;
s83) adding 1 to the track continuous matching failure frame number of each stored face track that failed to match; acquiring the track continuous matching failure frame number x of the stored face track that failed to match, judging whether the track continuous matching failure frame number x is not less than the matching failure threshold q, and if so, deleting the stored face track that failed to match; if not, the process proceeds to step S9).
The method comprises the steps of setting a successfully matched detected face ID as a face ID corresponding to a face track, updating a Kalman filter corresponding to the face track, and adding the face characteristics of the detected face into a characteristic pool corresponding to the face track; for the face track which is not successfully matched, marking the face track which is not successfully matched as matching failure, and deleting the face track after continuous multiple matching failures; and for the faces which are not successfully matched, the system considers the faces newly appearing in the video, initializes the faces to a new face track, allocates a new face ID, initializes a corresponding Kalman filter by using the coordinates of the face rectangular frame of the faces which are not successfully matched, stores the face features into a feature pool of the new face track, and returns to the step S1) to extract the next frame of target image after the Kalman filter is updated. And updating the Kalman filter after the detected face and the face track are successfully matched, wherein the face track prediction frame coordinate obtained by the Kalman filter through historical track prediction is corrected by mainly utilizing a relatively accurate face rectangular frame coordinate, so that the Kalman filter can follow the actual position of the face to perform more accurate prediction on the next frame of target image.
S9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, returning to the step S1).
In a second aspect, the embodiment further provides a face tracking system based on multi-feature fusion, as shown in fig. 2, including an image acquisition module, a track prediction module, a judging module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or an image stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the feature extraction module is used for extracting the face features of each face in the current frame target image;
the face matching module is used for calculating the feature similarity between the historical face features of the face track and the face features of each face extracted by the feature extraction module so as to perform feature matching; for the face track and the face which are not successfully matched, secondary matching is carried out by calculating an IOU value;
the track updating module is used for setting the face ID of a successfully matched face to the face ID of the corresponding face track, updating the Kalman filter of that face track, and adding the face features into the feature pool of that face track; marking a face track that is not matched with a corresponding face as a matching failure, and deleting it after multiple consecutive matching failures; and initializing a face that is not matched with a corresponding face track as a new face track, assigning a new face ID, initializing a corresponding Kalman filter with the face rectangular frame coordinates, and storing the face features of that face into the feature pool of the new face track;
the face snapshot module is used for capturing a snapshot of the face after the face track ends;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
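Both the track prediction module and the track updating module above rely on one Kalman filter per face track. A minimal constant-velocity sketch of such a filter is given below; the state layout, noise values and class name are illustrative assumptions rather than the patent's own specification:

```python
import numpy as np

class FaceKalmanFilter:
    """Minimal constant-velocity Kalman filter over a face box (cx, cy, w, h).

    State: [cx, cy, w, h, vx, vy, vw, vh]. Noise settings are assumed values.
    """
    def __init__(self, bbox):
        self.x = np.array(list(bbox) + [0.0, 0.0, 0.0, 0.0], dtype=float)
        self.P = np.eye(8) * 10.0                 # state covariance
        self.F = np.eye(8)                        # constant-velocity transition
        self.F[:4, 4:] = np.eye(4)
        self.H = np.zeros((4, 8))                 # only the box itself is observed
        self.H[:4, :4] = np.eye(4)
        self.Q = np.eye(8) * 1e-2                 # process noise (assumed)
        self.R = np.eye(4) * 1.0                  # measurement noise (assumed)

    def predict(self):
        """Predict the face track frame position in the current image."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, bbox):
        """Correct the prediction with the detected face rectangle."""
        z = np.asarray(bbox, dtype=float)
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P
```

For example, one would call kf = FaceKalmanFilter((320.0, 240.0, 80.0, 80.0)), then kf.predict() before matching each frame and kf.update(detected_box) after a successful match.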
The Kalman filter is updated after a successful match: the face track prediction frame coordinates that the Kalman filter obtained from the historical track are corrected with the more accurate face rectangular frame coordinates of the current frame, so that the filter can follow the actual position of the face and make a more accurate prediction in the next frame. Meanwhile, when face features are added to the feature pool, the oldest face features are discarded if the pool is full; in general, the same face changes little over a short time, so temporally close features are closer in feature expression and easier to match successfully.
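The first-in-first-out behaviour of the feature pool described above can be obtained, for example, with a fixed-length deque; the pool size of 5 below is only an example value:

```python
from collections import deque

feature_pool = deque(maxlen=5)          # pool size is an assumed example value
for frame_idx in range(7):
    feature_pool.append(f"feature_{frame_idx}")

# After 7 appends only the 5 most recent features remain; the two oldest
# entries (feature_0 and feature_1) were discarded automatically.
print(list(feature_pool))               # ['feature_2', ..., 'feature_6']
```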
For a track whose matching fails, the number of consecutive matching failures is recorded, and the track still participates in face matching; if a later match succeeds, the failure count is cleared. If the number of consecutive matching failures exceeds the matching failure threshold, the face is considered to have disappeared from the picture, so the corresponding information of the face track is deleted and no subsequent matching is performed. Because of the matching failure threshold, short-term disappearances of the face, such as occlusion, lowering the head or a large-angle side face, do not interrupt the track; the face track can still match the same face in subsequent frames, continuity is maintained, and face ID switching is reduced.
A detected face that fails to be matched is regarded by the system as a face newly appearing in the video: it is initialized as a new track, a new face ID is assigned to it, and the Kalman filter is initialized with the coordinate information of its face rectangular frame. The initialized Kalman filter can then predict the position coordinates of the face track in the next frame of target image. The invention also sets an initialization threshold: a new face track starts in a pending state, and it is converted from the pending state to a confirmed state only if it is successfully matched in a number of consecutive subsequent frames exceeding the initialization threshold. If matching fails while the track is still pending (i.e. the new face track is not matched with a face before the number of tracked frames reaches the initialization threshold), the new face track is deleted directly, instead of being marked as a matching failure like an existing face track in the confirmed state. This avoids erroneous tracking caused by false detections, since a falsely detected target is usually not detected in multiple consecutive frames; setting the initialization threshold therefore reduces erroneous tracking and improves tracking accuracy.
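The pending-to-confirmed logic of a new face track described above can be sketched as a small state machine; the class name, attribute names and the initialization threshold of 3 below are illustrative assumptions, not values given by the patent:

```python
PENDING, CONFIRMED = "pending", "confirmed"

class NewTrackState:
    """Pending/confirmed state of a newly initialized face track."""
    def __init__(self, init_threshold=3):
        self.state = PENDING
        self.consecutive_hits = 0
        self.init_threshold = init_threshold     # assumed example value

    def on_match_success(self):
        self.consecutive_hits += 1
        if self.state == PENDING and self.consecutive_hits >= self.init_threshold:
            self.state = CONFIRMED               # enough consecutive matches: keep the track

    def on_match_failure(self):
        # A pending track that misses before reaching the threshold is treated
        # as a false detection and deleted outright; a confirmed track instead
        # starts counting consecutive failures (handled by the update step).
        if self.state == PENDING:
            return "delete"
        self.consecutive_hits = 0
        return "mark_failure"
```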
The present embodiment also provides a computer-readable storage medium. All or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a storage medium readable by a computer device and used to execute all or part of the steps of the methods described above. The computer device may be, for example, a personal computer, a server, network equipment, an intelligent mobile terminal, smart home equipment, a wearable intelligent device or a vehicle-mounted intelligent device; the storage medium may be, for example, a RAM, a ROM, a magnetic disk, a magnetic tape, an optical disk, a flash memory, a USB flash drive, a removable hard disk, a memory card, a memory stick, network server storage or network cloud storage.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
by setting a face detection interval, the invention makes the system fast while filtering out some false detections and track fragments, which improves tracking accuracy and reduces face ID switching. During face matching, feature matching is performed first with the spliced depth feature and HOG feature; because both deep and non-deep characteristics of the image are considered, the matching accuracy is high and mismatches are avoided. Tracks and faces that are not successfully matched undergo secondary matching using the IOU value. IOU matching relies on the fact that the face track prediction frame and the face rectangular frame of the same target are close in position, so when feature matching fails due to partial occlusion, sudden lighting changes, angle changes and the like, the face and the face track can still be matched successfully from position information, which keeps the tracking continuous. Used together, feature matching and IOU matching lead to little face ID switching and improved tracking accuracy.
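As a rough illustration of this two-stage matching, the helpers below assume that each track object exposes a feature_pool of L2-normalized spliced depth+HOG vectors and a cached predicted_box in [x1, y1, x2, y2] form; the similarity and IOU thresholds are example values only, and the greedy per-face loop merely stands in for whatever assignment strategy an implementation actually uses:

```python
import numpy as np

def pool_similarity(face_feature, feature_pool):
    """Highest cosine similarity between one face feature and a track's pool."""
    feats = np.stack(feature_pool)                      # (w, d) historical features
    face = np.asarray(face_feature, dtype=float)
    sims = feats @ face / (np.linalg.norm(feats, axis=1) * np.linalg.norm(face) + 1e-12)
    return float(sims.max())

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def match_face(face_feature, face_box, tracks, sim_thr=0.6, iou_thr=0.3):
    """Stage 1: feature matching; stage 2: IOU matching for the leftovers."""
    best, best_score = None, sim_thr
    for t in tracks:
        s = pool_similarity(face_feature, t.feature_pool)
        if s > best_score:
            best, best_score = t, s
    if best is not None:
        return best, "feature"
    best, best_iou = None, iou_thr
    for t in tracks:
        v = iou(face_box, t.predicted_box)              # cached face track prediction frame
        if v > best_iou:
            best, best_iou = t, v
    if best is not None:
        return best, "iou"
    return None, "unmatched"
```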
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (9)

1. A face tracking method based on multi-feature fusion is characterized by comprising the following steps:
s1) acquiring a current frame target image of a video or picture stream;
s2) acquiring all stored face tracks, and predicting the position coordinates of face track frames of all stored face tracks in the current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting a detection interval d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9);
s4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering the step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9);
s5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, and extracting the features of the face rectangular frame based on a depth feature extraction algorithm and an HOG feature extraction algorithm to obtain the face features of all the faces in the current frame target image;
s6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result;
s7) performing secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating the IOU value to obtain a second face feature matching result;
s8) updating a Kalman filter and track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9);
s9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, return to step S1).
2. The face tracking method based on multi-feature fusion according to claim 1, wherein in step S4), a face detection algorithm is used to perform face detection on the current frame target image, the face rectangular frame coordinates corresponding to each face are output together with the face confidence corresponding to each face, a confidence threshold is set, and face rectangular frames whose face confidence is lower than the confidence threshold are deleted.
3. The face tracking method based on multi-feature fusion according to claim 1, wherein in step S4), after all stored face tracks are marked as failing to match, the method further comprises acquiring the number of consecutive track matching failure frames of all stored face tracks that fail to match, setting a matching failure threshold q, and respectively judging whether the number of consecutive track matching failure frames of each stored face track that fails to match is not less than the matching failure threshold q; if so, deleting the stored face tracks whose number of consecutive track matching failure frames is not less than the matching failure threshold q; and if not, adding 1 to the number of consecutive track matching failure frames of those stored face tracks, among all stored face tracks that fail to match, whose number of consecutive track matching failure frames is smaller than the matching failure threshold q.
4. The face tracking method based on multi-feature fusion according to claim 1 or 3, wherein in step S5), the face rectangular frame corresponding to each face is obtained according to the face rectangular frame coordinates, a feature extraction algorithm is used to extract features of the face rectangular frames, and the face features of each face in the current frame target image are obtained, the total number of detected faces in the current frame target image being n, comprising the following steps:
S51) calculating the length and the width according to the coordinates of the i-th face rectangular frame, wherein i is less than or equal to n, modifying the coordinates of the i-th face rectangular frame according to the calculation result to obtain a square face rectangular frame, taking the square face rectangular frame as the face rectangular frame corresponding to the i-th face, and scaling the square face rectangular frame to a preset size to obtain a scaled i-th face rectangular frame;
S52) establishing a trained mobile face network, inputting the scaled i-th face rectangular frame into the trained mobile face network, outputting the depth feature of the i-th face rectangular frame through the trained mobile face network, and normalizing the depth feature of the i-th face rectangular frame to obtain the normalized depth feature of the i-th face rectangular frame;
S53) obtaining the HOG feature of the i-th face rectangular frame by using an HOG feature extraction algorithm, and normalizing the HOG feature of the i-th face rectangular frame to obtain the normalized HOG feature of the i-th face rectangular frame;
S54) sequentially splicing the normalized depth feature of the i-th face rectangular frame from step S52) and the normalized HOG feature of the i-th face rectangular frame from step S53) to obtain the face feature of the i-th face;
S55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
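A rough sketch of the S51)-S55) feature extraction follows; embedding_net is a hypothetical callable standing in for the trained mobile face network, the 112x112 input size and the HOG parameters are assumptions, and skimage's hog function is used only as a convenient stand-in HOG implementation:

```python
import cv2
import numpy as np
from skimage.feature import hog

def extract_face_feature(image, box, embedding_net, size=112):
    """Concatenate a normalized deep feature and a normalized HOG feature
    for one detected face (box = [x1, y1, x2, y2])."""
    # S51) make the rectangle square (longer side) and rescale it to a preset size.
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    x1, y1 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    crop = cv2.resize(image[y1:y1 + side, x1:x1 + side], (size, size))

    # S52) deep feature from the trained face network, L2-normalized.
    deep = np.asarray(embedding_net(crop), dtype=float).ravel()
    deep /= np.linalg.norm(deep) + 1e-12

    # S53) HOG feature of the grayscale crop, L2-normalized.
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2), feature_vector=True)
    hog_feat /= np.linalg.norm(hog_feat) + 1e-12

    # S54) splice the two normalized features into one face feature vector.
    return np.concatenate([deep, hog_feat])
```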
5. The face tracking method based on multi-feature fusion according to claim 4, wherein in step S6), the feature pools of all stored face tracks are acquired, the feature pool of each stored face track comprises a plurality of historical face features, and the face features of all faces in step S5) are respectively matched with the feature pools of all stored face tracks to obtain the first face feature matching result, which comprises: calculating the feature similarity between the face feature of each face and the plurality of historical face features in the feature pools of all stored face tracks; setting a feature similarity threshold and screening out the feature similarities exceeding the feature similarity threshold; regarding the face and the stored face track with the highest feature similarity among those exceeding the feature similarity threshold as the same face, and marking them as successfully matched; marking a stored face track that is not matched with any face as a matching failure; and marking a face that is not matched with any stored face track as a matching failure.
6. The face tracking method based on multi-feature fusion according to claim 5, wherein the feature similarity between the face feature of each face and the plurality of historical face features in the feature pools of all stored face tracks is calculated as a cosine similarity; the feature similarity between the face feature of the i-th face and the w-th historical face feature in the feature pool of the j-th stored face track is

cos(a, b_jw) = (a · b_jw) / (||a|| · ||b_jw||)

wherein a represents the face feature of the i-th face, b_jw represents the w-th historical face feature in the feature pool of the j-th stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the i-th face and the w-th historical face feature in the feature pool of the j-th stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
7. The face tracking method based on multi-feature fusion according to claim 6, wherein in step S7), obtaining the second face feature matching result by performing secondary matching, through calculation of the IOU value, on the faces and face tracks that fail to be matched in the first face feature matching result comprises: calculating the IOU value between the face rectangular frame of each face that is not matched to a corresponding face track and the face track prediction frames of all stored face tracks; setting an IOU threshold and screening out the IOU values exceeding the IOU threshold; regarding the face and the stored face track with the highest IOU value among those exceeding the IOU threshold as the same face, and marking them as successfully matched; marking a stored face track that is not matched with any face as a matching failure; and marking a face that is not matched with any stored face track as a matching failure.
8. The face tracking method based on multi-feature fusion according to claim 3 or 7, wherein in step S8), updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result comprises the following steps:
S81) for each stored face track successfully matched with a face in the current frame target image, acquiring its number of consecutive track matching failure frames and resetting that number to 0; acquiring the face ID of the stored face track, and setting the face ID of the successfully matched face to the face ID of the corresponding stored face track; storing the face features of the successfully matched face into the feature pool of that stored face track; and acquiring the face rectangular frame coordinates of the successfully matched face and updating the Kalman filter of the corresponding stored face track with those coordinates;
S82) acquiring each face that fails to be matched, determining it to be a face newly appearing in the video, initializing it as a new face track, and assigning a new face ID to the new face track; initializing the Kalman filter corresponding to the new face track with the face rectangular frame coordinates of the unmatched face, and storing the face features of the unmatched face into the feature pool of the new face track; and setting the number of consecutive track matching failure frames of the new face track to 0;
S83) adding 1 to the number of consecutive track matching failure frames of each stored face track that fails to be matched; acquiring the number x of consecutive track matching failure frames of the stored face track that fails to be matched, and judging whether x is not smaller than the matching failure threshold q; if yes, deleting that stored face track; if not, proceeding to step S9).
9. A face tracking system based on multi-feature fusion is applicable to the face tracking method based on multi-feature fusion as claimed in any one of claims 1 to 8, and is characterized by comprising an image acquisition module, a track prediction module, a judgment module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or picture stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the feature extraction module is used for extracting the face features of each face in the current frame target image based on a depth feature extraction algorithm and an HOG feature extraction algorithm;
the face matching module is used for calculating the feature similarity between the historical face features of the face track and the face features of each face extracted by the feature extraction module so as to perform feature matching; for the face track and the face which are not successfully matched, calculating an IOU value to carry out secondary matching;
the track updating module is used for updating the Kalman filter and the track information according to the face feature matching result;
the face snapshot module is used for capturing a snapshot of the face after the face track ends;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
CN202011091950.1A 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion Active CN112215155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011091950.1A CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011091950.1A CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Publications (2)

Publication Number Publication Date
CN112215155A CN112215155A (en) 2021-01-12
CN112215155B true CN112215155B (en) 2022-10-14

Family

ID=74053857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011091950.1A Active CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN112215155B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010333B (en) * 2021-03-24 2021-10-15 北京中电兴发科技有限公司 Multi-scene inter-process communication method suitable for Linux server cluster
CN112990072A (en) * 2021-03-31 2021-06-18 广州敏视数码科技有限公司 Target detection and tracking method based on high and low dual thresholds
CN112819863B (en) * 2021-04-16 2021-08-03 北京万里红科技股份有限公司 Snapshot target tracking method and computing device in remote iris recognition
CN113065523B (en) * 2021-04-26 2023-06-16 上海哔哩哔哩科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113205079B (en) * 2021-06-04 2023-09-05 北京奇艺世纪科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113379772B (en) * 2021-07-06 2022-10-11 新疆爱华盈通信息技术有限公司 Mobile temperature measurement method based on background elimination and tracking algorithm in complex environment
CN114897944B (en) * 2021-11-10 2022-10-25 北京中电兴发科技有限公司 Multi-target continuous tracking method based on DeepSORT
CN116152872A (en) * 2021-11-18 2023-05-23 北京眼神智能科技有限公司 Face tracking method, device, storage medium and equipment
CN113838100A (en) * 2021-11-24 2021-12-24 广东电网有限责任公司中山供电局 Target dynamic tracking method and system based on edge calculation
CN114663796A (en) * 2022-01-04 2022-06-24 北京航空航天大学 Target person continuous tracking method, device and system
CN114783043B (en) * 2022-06-24 2022-09-20 杭州安果儿智能科技有限公司 Child behavior track positioning method and system
CN117636045A (en) * 2023-12-07 2024-03-01 湖州练市漆宝木业有限公司 Wood defect detection system based on image processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN110751674A (en) * 2018-07-24 2020-02-04 北京深鉴智能科技有限公司 Multi-target tracking method and corresponding video analysis system
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10691925B2 (en) * 2017-10-28 2020-06-23 Altumview Systems Inc. Enhanced face-detection and face-tracking for resource-limited embedded vision systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751674A (en) * 2018-07-24 2020-02-04 北京深鉴智能科技有限公司 Multi-target tracking method and corresponding video analysis system
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video target tracking algorithm based on model fusion and feature association; Ji Lu et al.; Computer Technology and Development (计算机技术与发展); 2018-02-07 (No. 06); full text *

Also Published As

Publication number Publication date
CN112215155A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN112215155B (en) Face tracking method and system based on multi-feature fusion
CN109360226B (en) Multi-target tracking method based on time series multi-feature fusion
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN113284168A (en) Target tracking method and device, electronic equipment and storage medium
CN112215156B (en) Face snapshot method and system in video monitoring
CN113139521B (en) Pedestrian boundary crossing monitoring method for electric power monitoring
CN110852219A (en) Multi-pedestrian cross-camera online tracking system
CN104361327A (en) Pedestrian detection method and system
CN108564598B (en) Improved online Boosting target tracking method
CN115131821A (en) Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN111476160A (en) Loss function optimization method, model training method, target detection method, and medium
CN111027370A (en) Multi-target tracking and behavior analysis detection method
CN112132873A (en) Multi-lens pedestrian recognition and tracking based on computer vision
CN112465854A (en) Unmanned aerial vehicle tracking method based on anchor-free detection algorithm
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113537107A (en) Face recognition and tracking method, device and equipment based on deep learning
Basavaiah et al. Human activity detection and action recognition in videos using convolutional neural networks
Nodehi et al. Multi-metric re-identification for online multi-person tracking
Alagarsamy et al. Identifying the Missing People using Deep Learning Method
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
Yu et al. A unified transformer based tracker for anti-uav tracking
US20210097333A1 (en) Hierarchical sampling for object identification
CN114373203A (en) Picture archiving method and device, terminal equipment and computer readable storage medium
CN114140494A (en) Single-target tracking system and method in complex scene, electronic device and storage medium
Sujatha et al. An innovative moving object detection and tracking system by using modified region growing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant