CN112215155B - Face tracking method and system based on multi-feature fusion - Google Patents


Info

Publication number
CN112215155B
Authority
CN
China
Prior art keywords
face
feature
track
matching
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011091950.1A
Other languages
Chinese (zh)
Other versions
CN112215155A (en)
Inventor
高珊珊
瞿洪柱
宋春晓
袁丽燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd filed Critical Beijing Sinonet Science and Technology Co Ltd
Priority to CN202011091950.1A priority Critical patent/CN112215155B/en
Publication of CN112215155A publication Critical patent/CN112215155A/en
Application granted granted Critical
Publication of CN112215155B publication Critical patent/CN112215155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of intelligent video image processing and discloses a face tracking method and system based on multi-feature fusion. The face tracking method comprises: obtaining a current frame target image; predicting the position coordinates of the face track frame of all stored face tracks in the current frame target image by a Kalman filtering method; judging whether the frame number of the current frame target image is a multiple of the detection interval; judging whether a human face is detected; extracting features from the face rectangular frames with a feature extraction algorithm to obtain the face features of all faces; performing face feature matching; performing secondary matching on the faces and face tracks that failed to match; updating the Kalman filters and track information; and judging whether face tracking is finished. By setting a face detection interval, the invention keeps the system processing speed fast; by fusing the depth feature and the HOG feature, the matching accuracy is high; feature matching and IOU matching are used together, so face ID switching is reduced, tracking continuity is guaranteed, and tracking accuracy is improved.

Description

Face tracking method and system based on multi-feature fusion
Technical Field
The invention relates to the field of intelligent video image processing, in particular to a face tracking method and system based on multi-feature fusion.
Background
Working together with a face recognition system, an intelligent monitoring system can complete tasks such as real-time screening of dangerous personnel and fast tracking of suspects. The system obtains face images through face tracking and snapshot and sends them to the face recognition module to complete face comparison. Because the monitoring scene is complex and changeable, the face moves continuously in the picture, and illumination changes, blurring, occlusion and face posture changes occur frequently, the face tracking track is difficult to maintain and the face ID of the same person gets switched; as a result, many duplicate face pictures are captured, which affects the processing efficiency of the system.
For example, Chinese patent publication CN110210285A discloses a "face tracking method, a face tracking device and a computer storage medium", which includes: acquiring a target image; analyzing the target image to acquire the feature information and position information of the face to be detected in the target image; determining a position search range according to the position information of the face to be detected; judging whether a face template whose position information falls within the position search range exists; if such a face template exists, detecting, within the position search range, a face template matching the face to be detected according to the feature information of the face to be detected; and taking the tracking ID of the matched face template as the tracking ID of the face to be detected. That invention matches faces by combining position information and feature information and can keep tracking accurate over a long time sequence. However, the method performs face detection and feature extraction on every frame of the image, which is time-consuming; and during matching it only uses depth features to search within a certain position range, so missed matches are likely, causing ID switching and affecting the tracking accuracy.
Therefore, how to improve the speed and the accuracy of the face tracking algorithm and keep the tracking track uninterrupted is an urgent problem to be solved in the field.
Disclosure of Invention
The invention provides a face tracking method and system based on multi-feature fusion, thereby solving the problems in the prior art.
In a first aspect, the present invention provides a face tracking method based on multi-feature fusion, which includes the following steps:
s1) acquiring a current frame target image of a video or picture stream;
s2) acquiring all stored face tracks, and predicting the position coordinates of a face track frame of all stored face tracks in a current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting the detection interval as d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9);
s4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering the step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9);
s5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, and performing feature extraction on the face rectangular frame by using a feature extraction algorithm to obtain face features of all faces in a current frame target image;
s6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result;
s7) carrying out secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result;
s8) updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9);
s9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, return to step S1).
Generally, a face target in a video is present for tens to hundreds of frames from entering the picture to leaving it; the face position and posture change very little between consecutive frames, the background is basically unchanged, and detecting and matching every frame consumes computing resources and slows down processing. Therefore, the invention performs face detection and the subsequent tracking steps once every d frames (that is, the detection interval is d), while the remaining frames only perform Kalman prediction of the face tracks (that is, the position coordinates of the face track frame are predicted by the Kalman filter). This can improve the processing speed several-fold without affecting the tracking effect, and at the same time avoids some cases of erroneous tracking and face ID switching. The size of the detection interval d can be adjusted according to the actual application requirements.
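For illustration, the sketch below (Python) outlines this per-frame control flow; detect_faces(), extract_features() and match_and_update() are hypothetical stand-ins for steps S4) to S8), and the default detection interval of 5 is only an example.

```python
def track_stream(frames, tracks, d=5):
    for frame_idx, frame in enumerate(frames):
        # S2) predict every stored track's face box in the current frame
        for trk in tracks:
            trk.predicted_box = trk.kalman_predict()

        # S3) only frames whose number is a multiple of d run detection and matching
        if frame_idx % d != 0:
            continue

        # S4)-S8) detect faces, extract fused features, then match and update tracks
        boxes = detect_faces(frame)                # hypothetical detector
        feats = extract_features(frame, boxes)     # hypothetical feature extractor
        match_and_update(tracks, boxes, feats)     # hypothetical matcher/updater
    return tracks
```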
For all stored face tracks, the method predicts the position coordinates of the face track frame in the current frame target image with a Kalman filtering method; that is, for the face track of an already tracked face, the Kalman filtering algorithm linearly predicts, from the coordinate information of the historical face track, where the face may appear in the current frame target image. The prediction assumes that the face moves linearly at a uniform speed. In general the face track prediction frame follows the real position of the face well, but when the direction or speed of the face's motion changes suddenly the prediction carries a certain error, so it cannot be used directly for face tracking; continuous tracking can only be maintained by correcting the coordinates with the detected face rectangular frame, and this correction is the Kalman filter update performed in step S8).
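A minimal sketch of such a per-track constant-velocity Kalman filter is given below, built on OpenCV's cv2.KalmanFilter; the state layout, noise covariances and box parameterisation are illustrative assumptions rather than values specified by the patent.

```python
import numpy as np
import cv2

def make_face_kalman(box_cxcywh):
    # constant-velocity model: state [cx, cy, w, h, vcx, vcy, vw, vh], measurement [cx, cy, w, h]
    kf = cv2.KalmanFilter(8, 4)
    kf.transitionMatrix = np.eye(8, dtype=np.float32)
    for i in range(4):
        kf.transitionMatrix[i, i + 4] = 1.0      # position advances by its velocity each step
    kf.measurementMatrix = np.eye(4, 8, dtype=np.float32)
    kf.processNoiseCov = np.eye(8, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    kf.statePost = np.array(list(box_cxcywh) + [0.0] * 4, dtype=np.float32).reshape(8, 1)
    return kf

# step S2): predicted = kf.predict()[:4].flatten() gives the face track prediction frame;
# step S8): kf.correct(np.float32(matched_box).reshape(4, 1)) corrects it with the detected box.
```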
Further, in step S4), a face detection algorithm is used to perform face detection on the current frame target image, a face rectangular frame coordinate corresponding to each face is output, a face confidence corresponding to each face is output, a confidence threshold is set, and a face rectangular frame with a face confidence lower than the confidence threshold is deleted.
The face detection algorithm outputs the coordinates of the face rectangular frame and also outputs the face confidence corresponding to the face rectangular frame, and the face confidence represents the probability that the face rectangular frame is a face. By setting the confidence threshold, the invention can filter out the face rectangular frames with lower face confidence, and the face rectangular frames with lower face confidence are false detection or pictures with poorer quality (fuzzy and large angle).
Further, in step S4), marking matching failures of all stored face tracks, acquiring track continuous matching failure frame numbers of all stored face tracks with matching failures, setting a matching failure threshold q, and respectively judging whether the track continuous matching failure frame numbers of the stored face tracks with matching failures are not less than the matching failure threshold q, if so, deleting the stored face tracks with the track continuous matching failure frame numbers not less than the matching failure threshold q in all stored face tracks with matching failures; and if not, adding 1 to the track continuous matching failure frame number of the stored face track with the track continuous matching failure frame number smaller than the matching failure threshold q in all the stored face tracks with the matching failure.
Further, in step S5), a face rectangular frame corresponding to each face is obtained according to the face rectangular frame coordinates, feature extraction is performed on the face rectangular frame by using a feature extraction algorithm, so as to obtain face features of each face in the current frame target image, and the total number of faces detected in the current frame target image is n, including the following steps:
s51) calculating the length and the width from the coordinates of the ith face rectangular frame, wherein i is less than or equal to n, taking the larger of the length and the width, and modifying the coordinates of the ith face rectangular frame according to this larger value to obtain a square face rectangular frame; taking the square face rectangular frame as the face rectangular frame corresponding to the ith face, and scaling the square face rectangular frame to a preset size to obtain the scaled ith face rectangular frame;
s52) establishing a trained MobileFaceNet network, inputting the scaled ith face rectangular frame into the trained MobileFaceNet network, outputting the depth feature of the ith face rectangular frame through the trained MobileFaceNet network, and normalizing the depth feature of the ith face rectangular frame to obtain the normalized depth feature of the ith face rectangular frame;
s53) obtaining the HOG feature of the ith face rectangular frame with an HOG feature extraction algorithm, and normalizing the HOG feature of the ith face rectangular frame to obtain the normalized HOG feature of the ith face rectangular frame;
s54) concatenating, in order, the normalized depth feature of the ith face rectangular frame from step S52) and the normalized HOG feature of the ith face rectangular frame from step S53) to obtain the face feature of the ith face;
s55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
The method cuts the face rectangular frame out of the original target image according to the face rectangular frame coordinates and extracts features after scaling it to the preset size. When the picture is cut, a square area is cut according to the long side of the face rectangular frame, with the face located at the center of the area, so the face is not deformed when the picture is scaled. The face feature of the invention consists of two parts: the depth feature output by the MobileFaceNet network and the HOG feature output by the HOG feature extraction algorithm are concatenated in order into a complete face feature. The HOG feature is formed by calculating and counting histograms of gradient directions over local areas of the image; it has geometric and optical invariance and captures the local shape information of the image well. The depth feature extraction network is obtained by removing the classification layer from the MobileFaceNet network; the depth feature represents deep abstract features of the image and has stronger expressive power than manually designed shallow features. Concatenating the two features in order into a complete face feature for comparison and matching allows the deep and shallow information of the image to be compared at the same time, combines the advantages of traditional features and depth features, and achieves more accurate comparison. MobileFaceNet is a small network whose convolution layers use depthwise separable convolution; compared with ordinary convolution, its parameter count and computation are greatly reduced, which keeps the feature extraction part fast. Denoting the depth feature output by the MobileFaceNet network as [x1, x2, x3, …] and the HOG feature extracted by the HOG feature extraction algorithm as [y1, y2, y3, …], the two are normalized separately and recombined, and the final face feature is [x1, x2, x3, …, y1, y2, y3, …]. The ranges of the depth feature output by the MobileFaceNet network and the HOG feature extracted by the HOG feature extraction algorithm are not consistent; if a range is too large, the range of the computed cosine similarity is large and it is hard to measure whether two features are similar, and inconsistent ranges also make the two features contribute very differently to the feature similarity, so the advantage of fusing the two features cannot be exploited.
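The sketch below illustrates steps S51) to S54) under stated assumptions: a 112 x 112 preset size, scikit-image's hog() for the HOG descriptor, and a hypothetical mobilefacenet_embed() callable standing in for the trained MobileFaceNet network with its classification layer removed.

```python
import numpy as np
import cv2
from skimage.feature import hog

def extract_face_feature(frame, box, mobilefacenet_embed, size=112):
    x1, y1, x2, y2 = box
    # s51) crop a square patch along the longer side so the face keeps its aspect ratio
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    sx, sy = max(cx - side // 2, 0), max(cy - side // 2, 0)   # clamped to the image for simplicity
    patch = cv2.resize(frame[sy:sy + side, sx:sx + side], (size, size))

    # s52) depth feature from the embedding network, then L2 normalisation
    deep = np.asarray(mobilefacenet_embed(patch), dtype=np.float32)
    deep /= (np.linalg.norm(deep) + 1e-12)

    # s53) HOG feature on the grayscale patch, then L2 normalisation
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2)).astype(np.float32)
    hog_feat /= (np.linalg.norm(hog_feat) + 1e-12)

    # s54) concatenate the two normalised parts into the fused face feature
    return np.concatenate([deep, hog_feat])
```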
Further, in step S6), acquiring a feature pool of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, performing face feature matching on the face features of all faces in step S5) with the feature pool of all stored face tracks respectively to obtain a first face feature matching result, including calculating feature similarity between the face features of each face and a plurality of historical face features in the feature pool of all stored face tracks, setting a feature similarity threshold, screening out feature similarity exceeding the feature similarity threshold, regarding the face with the highest feature similarity among the feature similarities exceeding the feature similarity threshold and the stored face tracks as the same face, and labeling the face with the highest feature similarity among the feature similarities exceeding the feature similarity threshold and the stored face tracks as successful matching; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
The feature pool of a stored face track contains a certain number of historical face features, all of which take part in the feature similarity calculation; this avoids a failure to match the face to the face track caused by a sudden change of face posture or illumination in the previous frame target image. The size of the feature pool determines how many recent frames of face features can be stored for a track, and it can be set according to actual needs.
Further, the feature similarity between the face feature of each face and the historical face features in the feature pools of all stored face tracks is calculated. The feature similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track is

sim(i, j, w) = cos(a, b_jw) / 2

where a represents the face feature of the ith face, b_jw represents the w-th historical face feature in the feature pool of the jth stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
The invention first obtains the cosine similarity between the face feature and the historical face feature; because the complete face feature is the concatenation of two normalized features (the depth feature and the HOG feature), the feature similarity is taken as half of the cosine similarity between the face feature and the historical face feature, so that the maximum value of the feature similarity is reduced to 1. The feature similarity threshold is used to filter out track/detected-face pairs whose feature similarity is too small: if the similarity is below the feature similarity threshold, the two features (the face feature and the historical face feature) most likely do not belong to the same person, and letting such pairs take part in match screening would cause mismatches and affect the matching accuracy. The value of the feature similarity threshold is obtained from actual tests.
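The sketch below illustrates this matching rule. It reads the "cosine similarity" of the fused features as the dot product of the two concatenated unit-normalised parts (so that half of it has a maximum of 1); that reading, the greedy assignment, and the threshold value of 0.5 are illustrative assumptions, since the patent only states that the threshold is obtained from actual tests.

```python
import numpy as np

def match_by_features(face_feats, tracks, sim_thresh=0.5):
    matches, unmatched_faces, used = [], [], set()
    for i, a in enumerate(face_feats):
        best_j, best_sim = -1, sim_thresh
        for j, trk in enumerate(tracks):
            if j in used or not trk.feature_pool:
                continue
            # best similarity of this face against every feature pooled on the track
            sim = max(float(np.dot(a, b)) / 2.0 for b in trk.feature_pool)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j >= 0:
            matches.append((i, best_j))        # face i and track best_j are taken as the same person
            used.add(best_j)
        else:
            unmatched_faces.append(i)          # goes on to the secondary IOU matching
    unmatched_tracks = [j for j in range(len(tracks)) if j not in used]
    return matches, unmatched_faces, unmatched_tracks
```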
Further, in step S7), performing secondary matching on the face with failed matching and the face track with failed matching in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result, including calculating an IOU value between a face rectangular frame corresponding to each face without matching to a corresponding face track and face track prediction frames of all stored face tracks, setting an IOU threshold value, screening out the IOU values exceeding the IOU threshold value, regarding the face with the highest IOU value and the stored face track in the IOU values exceeding the IOU threshold value as the same face, and labeling the face with the highest IOU value in the IOU values exceeding the IOU threshold value and the stored face track to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
For tracks and detected faces that were not successfully matched, the IOU value between the detected face frame and each track prediction frame obtained in step S2) is calculated, and among the pairs exceeding the IOU threshold, the track with the largest IOU value is considered successfully matched with the detected face. The invention performs secondary matching, through the Intersection over Union (IOU) value, on faces in the first face feature matching result that were not matched to a corresponding face track and on face tracks that were not matched to a corresponding face. The IOU value is the ratio of the intersection to the union of a predicted frame and a real frame: the predicted frame is the face track prediction frame of the stored face track obtained by the Kalman filter in step S2), and the real frame is the detected face rectangular frame. The IOU value between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track is

IOU(A_k, B_r) = |A_k ∩ B_r| / |A_k ∪ B_r|

where A_k ∩ B_r is the intersection between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track, and A_k ∪ B_r is the union between them. By calculating the IOU value, the overlap between the positions of the face track prediction frame and the face rectangular frame can be measured, and whether the face track matches the detected face can be judged. The invention adds IOU matching after face feature matching, so a face whose features have changed greatly can still be matched successfully through position information; the matching success rate increases, and new face track initializations caused by matching failures are correspondingly reduced, thereby reducing face ID switching.
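A minimal sketch of the IOU value and of this secondary matching step is given below, with boxes represented as (x1, y1, x2, y2) tuples; the IOU threshold value is an illustrative assumption.

```python
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)            # area of A ∩ B
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)            # area of A ∪ B
    return inter / union if union > 0 else 0.0

def match_by_iou(det_boxes, unmatched_faces, tracks, unmatched_tracks, iou_thresh=0.3):
    matches = []
    for i in list(unmatched_faces):
        # compare the detected box with each remaining track's Kalman-predicted box
        cand = [(iou(det_boxes[i], tracks[j].predicted_box), j) for j in unmatched_tracks]
        if not cand:
            break
        best_iou, best_j = max(cand)
        if best_iou > iou_thresh:
            matches.append((i, best_j))
            unmatched_faces.remove(i)
            unmatched_tracks.remove(best_j)
    return matches
```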
Further, in step S8), updating the kalman filter and the trajectory information according to the first face feature matching result and the second face feature matching result, including the following steps:
s81) acquiring track continuous matching failure frame numbers of stored face tracks successfully matched with the face in the current frame target image, and setting the track continuous matching failure frame numbers of the stored face tracks successfully matched with the face in the current frame target image to be 0; acquiring a face ID of a stored face track, and setting the face ID of the successfully matched face as the face ID of the stored face track corresponding to the successfully matched face; storing the face features of the successfully matched face into a corresponding feature pool of the stored face track; acquiring the coordinates of a face rectangular frame of the successfully matched face, and updating a Kalman filter of a stored face track corresponding to the successfully matched face according to the coordinates of the face rectangular frame of the successfully matched face;
s82) obtaining a face which fails to be matched, determining the face which fails to be matched as a face which newly appears in the video, initializing the face which fails to be matched as a new face track, and distributing a new face ID to the new face track; initializing a Kalman filter corresponding to the new face track by using the face rectangular frame coordinates of the face which fails to be matched, and storing the face features of the face which fails to be matched into a feature pool of the new face track; setting the number of the continuous matching failure frames of the new face track to be 0;
s83) adding 1 to the number of continuous matching failure frames of the track of the stored face track failed in matching; acquiring track continuous matching failure frame number x of the stored face track failed in matching, judging whether the track continuous matching failure frame number x is not less than a matching failure threshold q, and if so, deleting the stored face track failed in matching; if not, the process proceeds to step S9).
The method sets the face ID of a successfully matched detected face to the face ID of the corresponding face track, updates the Kalman filter of that face track, and adds the face features of the detected face to the feature pool of that face track. A face track that is not successfully matched is marked as a matching failure and is deleted after multiple consecutive matching failures. A face that is not successfully matched is considered by the system to be a face newly appearing in the video: it is initialized as a new face track, assigned a new face ID, its face rectangular frame coordinates are used to initialize the corresponding Kalman filter, and its face features are stored in the feature pool of the new face track. After the Kalman filters are updated, the method returns to step S1) to fetch the next frame target image. The Kalman filter is updated after the detected face and the face track are successfully matched; the update mainly uses the relatively accurate face rectangular frame coordinates to correct the face track prediction frame coordinates predicted by the Kalman filter from the historical track, so that the Kalman filter can follow the actual position of the face and make a more accurate prediction for the next frame target image.
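The sketch below illustrates this update step. It assumes a hypothetical Track class holding the fields described above (face_id, a Kalman filter exposed through kalman_correct(), feature_pool, and a consecutive-matching-failure count miss_count initialised to 0); the matching failure threshold of 5 frames is likewise an assumption.

```python
def update_tracks(tracks, matches, unmatched_faces, unmatched_tracks,
                  det_boxes, det_feats, next_id, miss_thresh=5):
    # S81) matched pairs: reset the failure count, pool the feature, correct the filter
    for face_i, track_j in matches:
        trk = tracks[track_j]
        trk.miss_count = 0                              # track matched in this frame
        trk.feature_pool.append(det_feats[face_i])      # pool the new face feature
        trk.kalman_correct(det_boxes[face_i])           # correct the filter with the detection
        # the detected face is reported under trk.face_id (no new ID is assigned)

    # S82) unmatched detections are new faces: start new tracks with fresh IDs
    for face_i in unmatched_faces:
        tracks.append(Track(face_id=next_id, box=det_boxes[face_i],
                            feature=det_feats[face_i]))  # new Track starts with miss_count = 0
        next_id += 1

    # S83) unmatched tracks accumulate failures and are removed once stale
    for track_j in unmatched_tracks:
        tracks[track_j].miss_count += 1
    tracks[:] = [t for t in tracks if t.miss_count < miss_thresh]
    return next_id
```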
In a second aspect, the invention provides a face tracking system based on multi-feature fusion, which comprises a picture acquisition module, a track prediction module, a judgment module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or an image stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d or not, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the characteristic extraction module is used for extracting the face characteristics of each face in the current frame target image;
the face matching module is used for calculating the feature similarity between the historical face features of the face tracks and the face features of the faces extracted by the feature extraction module, so as to perform feature matching; for face tracks and faces that were not successfully matched, secondary matching is performed by calculating the IOU value;
the track updating module is used for setting the face ID of the successfully matched face as the face ID corresponding to the face track, updating a Kalman filter of the face track and adding the face features into a feature pool of the face track; for the face track which is not matched with the corresponding face, marking the face track which is not matched with the corresponding face as matching failure, and deleting the face track which is not matched after continuous multiple matching failures; for the face which is not matched with the corresponding face track, initializing the face which is not matched with the corresponding face track into a new face track, distributing a new face ID, initializing a corresponding Kalman filter by using face rectangular frame coordinates, and storing the face features of the face which is not matched with the corresponding face track into a feature pool of the new face track;
the human face snapshot module is used for snapshot of the human face after the human face track is finished;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
The invention has the beneficial effects that: by setting a face detection interval, the invention keeps the system processing speed fast and filters out some false detections and track fragments, which improves the tracking accuracy and reduces face ID switching. When matching faces, feature matching is first performed with the concatenated depth feature and HOG feature, taking both deep and non-deep features of the image into account, so the matching accuracy is high and mismatches are avoided. Secondary matching is then performed on unmatched tracks and faces with the IOU value; IOU matching is based on the principle that the face track prediction frame and the face rectangular frame of the same target are close in position, so when partial occlusion, sudden light changes or angle changes of the detected face cause feature matching to fail, IOU matching can still match the face and face track successfully using position information, guaranteeing tracking continuity. Feature matching and IOU matching used together produce few face ID switches and improve the tracking accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the embodiments are briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a schematic flow chart of a face tracking method based on multi-feature fusion according to a first embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a face tracking system based on multi-feature fusion according to the first embodiment.
Fig. 3 is a schematic flowchart of the feature extraction module according to the first embodiment.
Fig. 4 is a schematic diagram of a calculation method of the IOU value according to the first embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, as shown in fig. 1, a face tracking method based on multi-feature fusion includes the following steps:
s1) acquiring a current frame target image of a video or picture stream.
In this embodiment, the video or picture stream includes multiple frames of continuous images. The images are taken frame by frame, starting from the initial frame, for processing; the frame-grabbing operation can be completed with an image encoding/decoding library such as OpenCV. The frame number f of the initial frame is 0, and the frame number of each subsequent frame is increased by 1.
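A minimal sketch of this frame acquisition using OpenCV's VideoCapture is shown below; the source path and the process_frame() pipeline function are placeholders, not part of the patent.

```python
import cv2

cap = cv2.VideoCapture("path_or_rtsp_url")   # placeholder video / stream source
frame_idx = 0                                 # frame number f of the initial frame is 0
while True:
    ok, frame = cap.read()
    if not ok:                                # stream exhausted: face tracking finishes
        break
    process_frame(frame, frame_idx)           # hypothetical per-frame pipeline (steps S2-S8)
    frame_idx += 1                            # each subsequent frame adds 1
cap.release()
```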
S2) acquiring all stored face tracks, and predicting the position coordinates of a face track frame of all stored face tracks in a current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting the detection interval as d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9).
Generally, a face target in a video is present for tens to hundreds of frames from entering the picture to leaving it; the face position and posture change very little between consecutive frames and the background is basically unchanged, so detecting and matching every frame of video is of little value, heavily consumes computing resources, and slows down processing. Therefore, the invention performs face detection and the subsequent tracking steps once every d frames (the detection interval is d), while the remaining frames only perform Kalman prediction of the face tracks (that is, only the position coordinates of the face track frame are predicted by the Kalman filter), which can improve the processing speed several-fold without affecting the tracking effect and at the same time avoids some cases of erroneous tracking and face ID switching. The size of the detection interval d can be adjusted according to the actual application requirements.
S4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9).
In step S4), a face detection algorithm is used to perform face detection on the current frame target image, the face rectangular frame coordinates corresponding to each face are output, the face confidence corresponding to each face is output, a confidence threshold is set, and the face rectangular frame with the face confidence lower than the confidence threshold is deleted.
The face detection algorithm outputs the coordinates of the face rectangular frame and also outputs the face confidence corresponding to the face rectangular frame, and the face confidence represents the probability that the face rectangular frame is a face. By setting the confidence threshold, the invention can filter out the face rectangular frames with lower face confidence, and the face rectangular frames with lower face confidence are false detection or pictures with poorer quality (fuzzy and large angle).
In the step S4), marking matching failures of all stored face tracks, acquiring track continuous matching failure frame numbers of the stored face tracks which fail to be matched, setting a matching failure threshold q, respectively judging whether the track continuous matching failure frame numbers of the stored face tracks which fail to be matched are not less than the matching failure threshold q, and if so, deleting the stored face tracks of which the track continuous matching failure frame numbers are not less than the matching failure threshold q in the stored face tracks which fail to be matched; and if not, adding 1 to the track continuous matching failure frame number of the stored face track of which the track continuous matching failure frame number is smaller than the matching failure threshold q in all the stored face tracks with matching failure.
S5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, performing feature extraction on the face rectangular frame by using a feature extraction algorithm to obtain face features of all faces in a current frame target image, wherein the total number of the faces detected when face detection is performed on the current frame target image is n, and the method comprises the following steps as shown in FIG. 3:
s51) calculating the length and the width from the coordinates of the ith face rectangular frame, wherein i is less than or equal to n, taking the larger of the length and the width, and modifying the coordinates of the ith face rectangular frame according to this larger value to obtain a square face rectangular frame (corresponding to the square face small image in FIG. 3); taking the square face rectangular frame as the face rectangular frame corresponding to the ith face, scaling the square face rectangular frame to a preset size, with the preset size set to 112 x 112, to obtain the scaled ith face rectangular frame;
s52) establishing a trained MobileFaceNet network, inputting the scaled ith face rectangular frame into the trained MobileFaceNet network, outputting the depth feature of the ith face rectangular frame through the trained MobileFaceNet network, and normalizing the depth feature of the ith face rectangular frame to obtain the normalized depth feature of the ith face rectangular frame;
s53) obtaining the HOG feature of the ith face rectangular frame with an HOG (Histogram of Oriented Gradients) feature extraction algorithm, and normalizing the HOG feature of the ith face rectangular frame to obtain the normalized HOG feature of the ith face rectangular frame;
s54) concatenating, in order, the normalized depth feature of the ith face rectangular frame from step S52) and the normalized HOG feature of the ith face rectangular frame from step S53) to obtain the face feature of the ith face;
s55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
The invention cuts the face rectangular frame out of the original target image according to the face rectangular frame coordinates and extracts features after scaling it to the preset size. When the picture is cut, a square area is cut according to the long side of the face rectangular frame, with the face located at the center of the area, so the face is not deformed when the picture is scaled. The face feature of the invention consists of two parts: the depth feature output by the MobileFaceNet network and the HOG feature output by the HOG feature extraction algorithm are concatenated in order into a complete face feature. The HOG feature is formed by calculating and counting histograms of gradient directions over local areas of the image; it has geometric and optical invariance and captures the local shape information of the image well. The depth feature extraction network is obtained by removing the classification layer from the MobileFaceNet network; the depth feature represents deep abstract features of the image and has stronger expressive power than manually designed shallow features. Concatenating the two features in order into a complete face feature for comparison and matching allows the deep and shallow information of the image to be compared at the same time, combines the advantages of traditional features and depth features, and achieves more accurate comparison. MobileFaceNet is a small network whose convolution layers use depthwise separable convolution; compared with ordinary convolution, its parameter count and computation are greatly reduced, which keeps the feature extraction part fast. Denoting the depth feature output by the MobileFaceNet network as [x1, x2, x3, …] and the HOG feature extracted by the HOG feature extraction algorithm as [y1, y2, y3, …], the two are normalized separately and recombined, and the final face feature is [x1, x2, x3, …, y1, y2, y3, …]. The ranges of the depth feature output by the MobileFaceNet network and the HOG feature extracted by the HOG feature extraction algorithm are not consistent; if a range is too large, the range of the computed cosine similarity is large and it is hard to measure whether two features are similar, and inconsistent ranges also make the two features contribute very differently to the feature similarity, so the advantage of fusing the two features cannot be exploited. Normalization avoids an overly large range of feature values and ensures the accuracy of subsequent matching.
S6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result.
In the step S6), acquiring a feature pool of all stored face tracks, wherein the feature pool of each stored face track comprises a plurality of historical face features, respectively performing face feature matching on the face features of all faces in the step S5) with the feature pool of all stored face tracks to obtain a first face feature matching result, wherein the first face feature matching result comprises calculating feature similarity between the face features of each face and the plurality of historical face features in the feature pool of all stored face tracks, setting a feature similarity threshold, screening out feature similarity exceeding the feature similarity threshold, regarding the face with the highest feature similarity in the feature similarity exceeding the feature similarity threshold and the stored face tracks as the same face, and labeling the face with the highest feature similarity in the feature similarity exceeding the feature similarity threshold and the stored face tracks to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
The feature pool of a stored face track contains a certain number of historical face features, all of which take part in the feature similarity calculation; this avoids a failure to match the face to the face track caused by a sudden change of face posture or illumination in the previous frame target image. The size of the feature pool determines how many recent frames of face features can be stored for a track, and it can be set according to actual needs.
The feature similarity between the face feature of each face and the historical face features in the feature pools of all stored face tracks is calculated. The feature similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track is

sim(i, j, w) = cos(a, b_jw) / 2

where a represents the face feature of the ith face, b_jw represents the w-th historical face feature in the feature pool of the jth stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the ith face and the w-th historical face feature in the feature pool of the jth stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
The invention first obtains the cosine similarity between the face feature and the historical face feature; because the complete face feature is the concatenation of two normalized features (the depth feature and the HOG feature), the feature similarity is taken as half of the cosine similarity between the face feature and the historical face feature, so that the maximum value of the feature similarity is reduced to 1. The feature similarity threshold is used to filter out track/detected-face pairs whose feature similarity is too small: if the similarity is below the feature similarity threshold, the two features (the face feature and the historical face feature) most likely do not belong to the same person, and letting such pairs take part in match screening would cause mismatches and affect the matching accuracy. The value of the feature similarity threshold is obtained from actual tests.
And S7) carrying out secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating the IOU value to obtain a second face feature matching result.
In step S7), performing secondary matching on the face with failed matching and the face track with failed matching in the first face feature matching result by calculating an IOU value to obtain a second face feature matching result, including calculating an IOU value between a face rectangular frame corresponding to each face without matching to a corresponding face track and face track prediction frames of all stored face tracks, setting an IOU threshold value, screening out the IOU values exceeding the IOU threshold value, regarding the face with the highest IOU value among the IOU values exceeding the IOU threshold value and the stored face tracks as the same face, and labeling the face with the highest IOU value among the IOU values exceeding the IOU threshold value and the stored face tracks to be successfully matched; if the stored face track does not match the face, the stored face track which does not match the face is marked as matching failure, and if the face does not match the stored face track, the face which does not match the stored face track is marked as matching failure.
For tracks and detected faces that were not successfully matched, the IOU value between the detected face frame and each track prediction frame obtained in step S2) is calculated, and among the pairs exceeding the IOU threshold, the track with the largest IOU value is considered successfully matched with the detected face. The invention performs secondary matching, through the Intersection over Union (IOU) value, on faces in the first face feature matching result that were not matched to a corresponding face track and on face tracks that were not matched to a corresponding face. The IOU value is the ratio of the intersection to the union of the "predicted frame" and the "real frame" (see FIG. 4): the predicted frame is the face track prediction frame of the stored face track obtained by the Kalman filter in step S2), and the real frame is the detected face rectangular frame. The IOU value between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track is

IOU(A_k, B_r) = |A_k ∩ B_r| / |A_k ∪ B_r|

where A_k ∩ B_r is the intersection between the face rectangular frame A_k corresponding to the k-th face not matched to a corresponding face track and the face track prediction frame B_r of the r-th stored face track, and A_k ∪ B_r is the union between them. By calculating the IOU value, the overlap between the positions of the face track prediction frame and the face rectangular frame can be measured, and whether the face track matches the detected face can be judged. The invention adds IOU matching after face feature matching, so a face whose features have changed greatly can still be matched successfully through position information; the matching success rate increases, and new face track initializations caused by matching failures are correspondingly reduced, thereby reducing face ID switching.
S8) updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9).
In step S8), updating the kalman filter and the trajectory information according to the first face feature matching result and the second face feature matching result, including the steps of:
s81) acquiring track continuous matching failure frame numbers of stored face tracks successfully matched with the face in the current frame target image, and setting the track continuous matching failure frame numbers of the stored face tracks successfully matched with the face in the current frame target image to be 0; acquiring a face ID of a stored face track, and setting the face ID of a successfully matched face as the face ID of the stored face track corresponding to the successfully matched face; storing the face features of the successfully matched face into a corresponding feature pool of the stored face track; acquiring the coordinates of a face rectangular frame of a successfully matched face, and updating a Kalman filter of a stored face track corresponding to the successfully matched face according to the coordinates of the face rectangular frame of the successfully matched face;
s82) acquiring a face which fails to be matched, determining the face which fails to be matched as a face which newly appears in the video, initializing the face which fails to be matched into a new face track, and distributing a new face ID to the new face track; initializing a Kalman filter corresponding to the new face track by using the face rectangular frame coordinates of the face which fails to be matched, and storing the face features of the face which fails to be matched into a feature pool of the new face track; setting the number of the continuous matching failure frames of the new face track to be 0;
s83) adding 1 to the track continuous matching failure frame number of each stored face track that failed to match; acquiring the track continuous matching failure frame number x of the stored face track that failed to match, judging whether the track continuous matching failure frame number x is not less than the matching failure threshold q, and if so, deleting the stored face track that failed to match; if not, the process proceeds to step S9).
The method comprises the steps of setting a successfully matched detected face ID as a face ID corresponding to a face track, updating a Kalman filter corresponding to the face track, and adding the face characteristics of the detected face into a characteristic pool corresponding to the face track; for the face track which is not successfully matched, marking the face track which is not successfully matched as matching failure, and deleting the face track after continuous multiple matching failures; and for the faces which are not successfully matched, the system considers the faces newly appearing in the video, initializes the faces to a new face track, allocates a new face ID, initializes a corresponding Kalman filter by using the coordinates of the face rectangular frame of the faces which are not successfully matched, stores the face features into a feature pool of the new face track, and returns to the step S1) to extract the next frame of target image after the Kalman filter is updated. And updating the Kalman filter after the detected face and the face track are successfully matched, wherein the face track prediction frame coordinate obtained by the Kalman filter through historical track prediction is corrected by mainly utilizing a relatively accurate face rectangular frame coordinate, so that the Kalman filter can follow the actual position of the face to perform more accurate prediction on the next frame of target image.
S9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, returning to the step S1).
In a second aspect, the embodiment further provides a face tracking system based on multi-feature fusion, as shown in fig. 2, including an image acquisition module, a track prediction module, a judging module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or an image stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the feature extraction module is used for extracting the face features of each face in the current frame target image;
the face matching module is used for calculating the feature similarity between the historical face features of the face track and the face features of each face extracted by the feature extraction module so as to perform feature matching; for the face track and the face which are not successfully matched, secondary matching is carried out by calculating an IOU value;
the track updating module is used for setting the face ID of a successfully matched face to the face ID of the corresponding face track, updating the Kalman filter of that face track, and adding the face features into the feature pool of that face track; marking a face track that is not matched with a corresponding face as a matching failure, and deleting it after multiple consecutive matching failures; and initializing a face that is not matched with a corresponding face track as a new face track, assigning a new face ID, initializing a corresponding Kalman filter with the face rectangular frame coordinates, and storing the face features of that face into the feature pool of the new face track;
the face snapshot module is used for capturing a snapshot of the face after the face track ends;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
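Both the track prediction module and the track updating module above rely on one Kalman filter per face track. A minimal constant-velocity sketch of such a filter is given below; the state layout, noise values and class name are illustrative assumptions rather than the patent's own specification:

```python
import numpy as np

class FaceKalmanFilter:
    """Minimal constant-velocity Kalman filter over a face box (cx, cy, w, h).

    State: [cx, cy, w, h, vx, vy, vw, vh]. Noise settings are assumed values.
    """
    def __init__(self, bbox):
        self.x = np.array(list(bbox) + [0.0, 0.0, 0.0, 0.0], dtype=float)
        self.P = np.eye(8) * 10.0                 # state covariance
        self.F = np.eye(8)                        # constant-velocity transition
        self.F[:4, 4:] = np.eye(4)
        self.H = np.zeros((4, 8))                 # only the box itself is observed
        self.H[:4, :4] = np.eye(4)
        self.Q = np.eye(8) * 1e-2                 # process noise (assumed)
        self.R = np.eye(4) * 1.0                  # measurement noise (assumed)

    def predict(self):
        """Predict the face track frame position in the current image."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, bbox):
        """Correct the prediction with the detected face rectangle."""
        z = np.asarray(bbox, dtype=float)
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P
```

For example, one would call kf = FaceKalmanFilter((320.0, 240.0, 80.0, 80.0)), then kf.predict() before matching each frame and kf.update(detected_box) after a successful match.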
The Kalman filter is updated after a successful match: the face track prediction frame coordinates that the Kalman filter obtained from the historical track are corrected with the more accurate face rectangular frame coordinates of the current frame, so that the filter can follow the actual position of the face and make a more accurate prediction in the next frame. Meanwhile, when face features are added to the feature pool, the oldest face features are discarded if the pool is full; in general, the same face changes little over a short time, so temporally close features are closer in feature expression and easier to match successfully.
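The first-in-first-out behaviour of the feature pool described above can be obtained, for example, with a fixed-length deque; the pool size of 5 below is only an example value:

```python
from collections import deque

feature_pool = deque(maxlen=5)          # pool size is an assumed example value
for frame_idx in range(7):
    feature_pool.append(f"feature_{frame_idx}")

# After 7 appends only the 5 most recent features remain; the two oldest
# entries (feature_0 and feature_1) were discarded automatically.
print(list(feature_pool))               # ['feature_2', ..., 'feature_6']
```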
For a track whose matching fails, the number of consecutive matching failures is recorded, and the track still participates in face matching; if a later match succeeds, the failure count is cleared. If the number of consecutive matching failures exceeds the matching failure threshold, the face is considered to have disappeared from the picture, so the corresponding information of the face track is deleted and no subsequent matching is performed. Because of the matching failure threshold, short-term disappearances of the face, such as occlusion, lowering the head or a large-angle side face, do not interrupt the track; the face track can still match the same face in subsequent frames, continuity is maintained, and face ID switching is reduced.
A detected face that fails to be matched is regarded by the system as a face newly appearing in the video: it is initialized as a new track, a new face ID is assigned to it, and the Kalman filter is initialized with the coordinate information of its face rectangular frame. The initialized Kalman filter can then predict the position coordinates of the face track in the next frame of target image. The invention also sets an initialization threshold: a new face track starts in a pending state, and it is converted from the pending state to a confirmed state only if it is successfully matched in a number of consecutive subsequent frames exceeding the initialization threshold. If matching fails while the track is still pending (i.e. the new face track is not matched with a face before the number of tracked frames reaches the initialization threshold), the new face track is deleted directly, instead of being marked as a matching failure like an existing face track in the confirmed state. This avoids erroneous tracking caused by false detections, since a falsely detected target is usually not detected in multiple consecutive frames; setting the initialization threshold therefore reduces erroneous tracking and improves tracking accuracy.
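The pending-to-confirmed logic of a new face track described above can be sketched as a small state machine; the class name, attribute names and the initialization threshold of 3 below are illustrative assumptions, not values given by the patent:

```python
PENDING, CONFIRMED = "pending", "confirmed"

class NewTrackState:
    """Pending/confirmed state of a newly initialized face track."""
    def __init__(self, init_threshold=3):
        self.state = PENDING
        self.consecutive_hits = 0
        self.init_threshold = init_threshold     # assumed example value

    def on_match_success(self):
        self.consecutive_hits += 1
        if self.state == PENDING and self.consecutive_hits >= self.init_threshold:
            self.state = CONFIRMED               # enough consecutive matches: keep the track

    def on_match_failure(self):
        # A pending track that misses before reaching the threshold is treated
        # as a false detection and deleted outright; a confirmed track instead
        # starts counting consecutive failures (handled by the update step).
        if self.state == PENDING:
            return "delete"
        self.consecutive_hits = 0
        return "mark_failure"
```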
The present embodiment also provides a computer-readable storage medium. All or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a storage medium readable by a computer device and used to execute all or part of the steps of the methods described above. The computer device may be, for example, a personal computer, a server, network equipment, an intelligent mobile terminal, smart home equipment, a wearable intelligent device or a vehicle-mounted intelligent device; the storage medium may be, for example, a RAM, a ROM, a magnetic disk, a magnetic tape, an optical disk, a flash memory, a USB flash drive, a removable hard disk, a memory card, a memory stick, network server storage or network cloud storage.
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
by setting a face detection interval, the invention makes the system fast while filtering out some false detections and track fragments, which improves tracking accuracy and reduces face ID switching. During face matching, feature matching is performed first with the spliced depth feature and HOG feature; because both deep and non-deep characteristics of the image are considered, the matching accuracy is high and mismatches are avoided. Tracks and faces that are not successfully matched undergo secondary matching using the IOU value. IOU matching relies on the fact that the face track prediction frame and the face rectangular frame of the same target are close in position, so when feature matching fails due to partial occlusion, sudden lighting changes, angle changes and the like, the face and the face track can still be matched successfully from position information, which keeps the tracking continuous. Used together, feature matching and IOU matching lead to little face ID switching and improved tracking accuracy.
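As a rough illustration of this two-stage matching, the helpers below assume that each track object exposes a feature_pool of L2-normalized spliced depth+HOG vectors and a cached predicted_box in [x1, y1, x2, y2] form; the similarity and IOU thresholds are example values only, and the greedy per-face loop merely stands in for whatever assignment strategy an implementation actually uses:

```python
import numpy as np

def pool_similarity(face_feature, feature_pool):
    """Highest cosine similarity between one face feature and a track's pool."""
    feats = np.stack(feature_pool)                      # (w, d) historical features
    face = np.asarray(face_feature, dtype=float)
    sims = feats @ face / (np.linalg.norm(feats, axis=1) * np.linalg.norm(face) + 1e-12)
    return float(sims.max())

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def match_face(face_feature, face_box, tracks, sim_thr=0.6, iou_thr=0.3):
    """Stage 1: feature matching; stage 2: IOU matching for the leftovers."""
    best, best_score = None, sim_thr
    for t in tracks:
        s = pool_similarity(face_feature, t.feature_pool)
        if s > best_score:
            best, best_score = t, s
    if best is not None:
        return best, "feature"
    best, best_iou = None, iou_thr
    for t in tracks:
        v = iou(face_box, t.predicted_box)              # cached face track prediction frame
        if v > best_iou:
            best, best_iou = t, v
    if best is not None:
        return best, "iou"
    return None, "unmatched"
```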
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (9)

1. A face tracking method based on multi-feature fusion is characterized by comprising the following steps:
s1) acquiring a current frame target image of a video or picture stream;
s2) acquiring all stored face tracks, and predicting the position coordinates of face track frames of all stored face tracks in the current frame target image by using a Kalman filtering method, wherein each stored face track corresponds to a Kalman filter, and one Kalman filter correspondingly predicts the position coordinates of the face track frame of one stored face track;
s3) setting a detection interval d, judging whether the frame number of the current frame target image is a multiple of the detection interval d, and if so, entering the step S4); if not, adding 1 to the frame number of the current frame target image, and entering step S9);
s4) carrying out face detection on the current frame target image, judging whether a face is detected in the current frame target image, if so, outputting face rectangular frame coordinates corresponding to each face, and entering the step S5); if not, marking that the matching of all stored face tracks fails, adding 1 to the frame number of the current frame target image, and entering step S9);
s5) obtaining a face rectangular frame corresponding to each face according to the face rectangular frame coordinates, and extracting the features of the face rectangular frame based on a depth feature extraction algorithm and an HOG feature extraction algorithm to obtain the face features of all the faces in the current frame target image;
s6) acquiring feature pools of all stored face tracks, wherein each feature pool of the stored face tracks comprises a plurality of historical face features, and respectively performing face feature matching on the face features of all faces in the step S5) with the feature pools of all stored face tracks to acquire a first face feature matching result;
s7) performing secondary matching on the face which fails to be matched and the face track which fails to be matched in the first face feature matching result by calculating the IOU value to obtain a second face feature matching result;
s8) updating a Kalman filter and track information according to the first face feature matching result and the second face feature matching result, adding 1 to the frame number of the current frame target image, and entering the step S9);
s9) judging whether the face tracking is finished or not, if so, finishing the face tracking; if not, return to step S1).
2. The face tracking method based on multi-feature fusion according to claim 1, wherein in step S4), a face detection algorithm is used to perform face detection on the current frame target image, the face rectangular frame coordinates corresponding to each face are output together with the face confidence corresponding to each face, a confidence threshold is set, and face rectangular frames whose face confidence is lower than the confidence threshold are deleted.
3. The face tracking method based on multi-feature fusion according to claim 1, wherein in step S4), after all stored face tracks are marked as failing to match, the method further comprises acquiring the number of consecutive track matching failure frames of all stored face tracks that fail to match, setting a matching failure threshold q, and respectively judging whether the number of consecutive track matching failure frames of each stored face track that fails to match is not less than the matching failure threshold q; if so, deleting the stored face tracks whose number of consecutive track matching failure frames is not less than the matching failure threshold q; and if not, adding 1 to the number of consecutive track matching failure frames of those stored face tracks, among all stored face tracks that fail to match, whose number of consecutive track matching failure frames is smaller than the matching failure threshold q.
4. The face tracking method based on multi-feature fusion according to claim 1 or 3, wherein in step S5), the face rectangular frame corresponding to each face is obtained according to the face rectangular frame coordinates, a feature extraction algorithm is used to extract features of the face rectangular frames, and the face features of each face in the current frame target image are obtained, the total number of detected faces in the current frame target image being n, comprising the following steps:
S51) calculating the length and the width according to the coordinates of the i-th face rectangular frame, wherein i is less than or equal to n, modifying the coordinates of the i-th face rectangular frame according to the calculation result to obtain a square face rectangular frame, taking the square face rectangular frame as the face rectangular frame corresponding to the i-th face, and scaling the square face rectangular frame to a preset size to obtain a scaled i-th face rectangular frame;
S52) establishing a trained mobile face network, inputting the scaled i-th face rectangular frame into the trained mobile face network, outputting the depth feature of the i-th face rectangular frame through the trained mobile face network, and normalizing the depth feature of the i-th face rectangular frame to obtain the normalized depth feature of the i-th face rectangular frame;
S53) obtaining the HOG feature of the i-th face rectangular frame by using an HOG feature extraction algorithm, and normalizing the HOG feature of the i-th face rectangular frame to obtain the normalized HOG feature of the i-th face rectangular frame;
S54) sequentially splicing the normalized depth feature of the i-th face rectangular frame from step S52) and the normalized HOG feature of the i-th face rectangular frame from step S53) to obtain the face feature of the i-th face;
S55) repeating steps S51) to S54) in sequence to obtain the face features of all faces in the current frame target image.
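A rough sketch of the S51)-S55) feature extraction follows; embedding_net is a hypothetical callable standing in for the trained mobile face network, the 112x112 input size and the HOG parameters are assumptions, and skimage's hog function is used only as a convenient stand-in HOG implementation:

```python
import cv2
import numpy as np
from skimage.feature import hog

def extract_face_feature(image, box, embedding_net, size=112):
    """Concatenate a normalized deep feature and a normalized HOG feature
    for one detected face (box = [x1, y1, x2, y2])."""
    # S51) make the rectangle square (longer side) and rescale it to a preset size.
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    x1, y1 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    crop = cv2.resize(image[y1:y1 + side, x1:x1 + side], (size, size))

    # S52) deep feature from the trained face network, L2-normalized.
    deep = np.asarray(embedding_net(crop), dtype=float).ravel()
    deep /= np.linalg.norm(deep) + 1e-12

    # S53) HOG feature of the grayscale crop, L2-normalized.
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    hog_feat = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2), feature_vector=True)
    hog_feat /= np.linalg.norm(hog_feat) + 1e-12

    # S54) splice the two normalized features into one face feature vector.
    return np.concatenate([deep, hog_feat])
```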
5. The face tracking method based on multi-feature fusion according to claim 4, wherein in step S6), the feature pools of all stored face tracks are acquired, the feature pool of each stored face track comprises a plurality of historical face features, and the face features of all faces in step S5) are respectively matched with the feature pools of all stored face tracks to obtain the first face feature matching result, which comprises: calculating the feature similarity between the face feature of each face and the plurality of historical face features in the feature pools of all stored face tracks; setting a feature similarity threshold and screening out the feature similarities exceeding the feature similarity threshold; regarding the face and the stored face track with the highest feature similarity among those exceeding the feature similarity threshold as the same face, and marking them as successfully matched; marking a stored face track that is not matched with any face as a matching failure; and marking a face that is not matched with any stored face track as a matching failure.
6. The face tracking method based on multi-feature fusion according to claim 5, wherein the feature similarity between the face feature of each face and the plurality of historical face features in the feature pools of all stored face tracks is calculated as a cosine similarity; the feature similarity between the face feature of the i-th face and the w-th historical face feature in the feature pool of the j-th stored face track is

cos(a, b_jw) = (a · b_jw) / (||a|| · ||b_jw||)

wherein a represents the face feature of the i-th face, b_jw represents the w-th historical face feature in the feature pool of the j-th stored face track, cos(a, b_jw) represents the cosine similarity between the face feature of the i-th face and the w-th historical face feature in the feature pool of the j-th stored face track, 1 ≤ j ≤ e, and e is the total number of stored face tracks.
7. The face tracking method based on multi-feature fusion according to claim 6, wherein in step S7), obtaining the second face feature matching result by performing secondary matching, through calculation of the IOU value, on the faces and face tracks that fail to be matched in the first face feature matching result comprises: calculating the IOU value between the face rectangular frame of each face that is not matched to a corresponding face track and the face track prediction frames of all stored face tracks; setting an IOU threshold and screening out the IOU values exceeding the IOU threshold; regarding the face and the stored face track with the highest IOU value among those exceeding the IOU threshold as the same face, and marking them as successfully matched; marking a stored face track that is not matched with any face as a matching failure; and marking a face that is not matched with any stored face track as a matching failure.
8. The face tracking method based on multi-feature fusion according to claim 3 or 7, wherein in step S8), updating the Kalman filter and the track information according to the first face feature matching result and the second face feature matching result comprises the following steps:
S81) for each stored face track successfully matched with a face in the current frame target image, acquiring its number of consecutive track matching failure frames and resetting that number to 0; acquiring the face ID of the stored face track, and setting the face ID of the successfully matched face to the face ID of the corresponding stored face track; storing the face features of the successfully matched face into the feature pool of that stored face track; and acquiring the face rectangular frame coordinates of the successfully matched face and updating the Kalman filter of the corresponding stored face track with those coordinates;
S82) acquiring each face that fails to be matched, determining it to be a face newly appearing in the video, initializing it as a new face track, and assigning a new face ID to the new face track; initializing the Kalman filter corresponding to the new face track with the face rectangular frame coordinates of the unmatched face, and storing the face features of the unmatched face into the feature pool of the new face track; and setting the number of consecutive track matching failure frames of the new face track to 0;
S83) adding 1 to the number of consecutive track matching failure frames of each stored face track that fails to be matched; acquiring the number x of consecutive track matching failure frames of the stored face track that fails to be matched, and judging whether x is not smaller than the matching failure threshold q; if yes, deleting that stored face track; if not, proceeding to step S9).
9. A face tracking system based on multi-feature fusion is applicable to the face tracking method based on multi-feature fusion as claimed in any one of claims 1 to 8, and is characterized by comprising an image acquisition module, a track prediction module, a judgment module, a face detection module, a feature extraction module, a face matching module, a track updating module, a face snapshot module and a face recognition module;
the image acquisition module is used for acquiring a current frame target image in a video or picture stream;
the track prediction module is used for predicting the position coordinates of a human face track frame of the stored human face track in the current frame target image;
the judging module is used for judging whether the frame number f of the current frame target image is a multiple of the detection interval d, and if so, the face detection module is called; if not, returning to the image acquisition module;
the face detection module is used for detecting the face in the current frame target image and outputting the coordinates of a face rectangular frame, and if the face is not detected, the track updating module is directly called;
the feature extraction module is used for extracting the face features of each face in the current frame target image based on a depth feature extraction algorithm and an HOG feature extraction algorithm;
the face matching module is used for calculating the feature similarity between the historical face features of the face track and the face features of each face extracted by the feature extraction module so as to perform feature matching; for the face track and the face which are not successfully matched, calculating an IOU value to carry out secondary matching;
the track updating module is used for updating the Kalman filter and the track information according to the face feature matching result;
the face snapshot module is used for capturing a snapshot of the face after the face track ends;
and the face recognition module is used for comparing and recognizing the face image obtained by snapshot.
CN202011091950.1A 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion Active CN112215155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011091950.1A CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011091950.1A CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Publications (2)

Publication Number Publication Date
CN112215155A CN112215155A (en) 2021-01-12
CN112215155B true CN112215155B (en) 2022-10-14

Family

ID=74053857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011091950.1A Active CN112215155B (en) 2020-10-13 2020-10-13 Face tracking method and system based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN112215155B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010333B (en) * 2021-03-24 2021-10-15 北京中电兴发科技有限公司 Multi-scene inter-process communication method suitable for Linux server cluster
CN112990072A (en) * 2021-03-31 2021-06-18 广州敏视数码科技有限公司 Target detection and tracking method based on high and low dual thresholds
CN112819863B (en) * 2021-04-16 2021-08-03 北京万里红科技股份有限公司 Snapshot target tracking method and computing device in remote iris recognition
CN113065523B (en) * 2021-04-26 2023-06-16 上海哔哩哔哩科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113205079B (en) * 2021-06-04 2023-09-05 北京奇艺世纪科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113379772B (en) * 2021-07-06 2022-10-11 新疆爱华盈通信息技术有限公司 Mobile temperature measurement method based on background elimination and tracking algorithm in complex environment
CN114897944B (en) * 2021-11-10 2022-10-25 北京中电兴发科技有限公司 Multi-target continuous tracking method based on DeepSORT
CN116152872A (en) * 2021-11-18 2023-05-23 北京眼神智能科技有限公司 Face tracking method, device, storage medium and equipment
CN113838100A (en) * 2021-11-24 2021-12-24 广东电网有限责任公司中山供电局 Target dynamic tracking method and system based on edge calculation
CN114663796A (en) * 2022-01-04 2022-06-24 北京航空航天大学 Target person continuous tracking method, device and system
CN114783043B (en) * 2022-06-24 2022-09-20 杭州安果儿智能科技有限公司 Child behavior track positioning method and system
CN117636045A (en) * 2023-12-07 2024-03-01 湖州练市漆宝木业有限公司 Wood defect detection system based on image processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration
CN110751674A (en) * 2018-07-24 2020-02-04 北京深鉴智能科技有限公司 Multi-target tracking method and corresponding video analysis system
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10691925B2 (en) * 2017-10-28 2020-06-23 Altumview Systems Inc. Enhanced face-detection and face-tracking for resource-limited embedded vision systems

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751674A (en) * 2018-07-24 2020-02-04 北京深鉴智能科技有限公司 Multi-target tracking method and corresponding video analysis system
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN110490901A (en) * 2019-07-15 2019-11-22 武汉大学 The pedestrian detection tracking of anti-attitudes vibration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video target tracking algorithm based on model fusion and feature association; Ji Lu et al.; Computer Technology and Development (计算机技术与发展); 2018-02-07 (No. 06); full text *

Also Published As

Publication number Publication date
CN112215155A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN112215155B (en) Face tracking method and system based on multi-feature fusion
CN109360226B (en) Multi-target tracking method based on time series multi-feature fusion
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN113284168A (en) Target tracking method and device, electronic equipment and storage medium
CN112215156B (en) Face snapshot method and system in video monitoring
CN113139521B (en) Pedestrian boundary crossing monitoring method for electric power monitoring
CN110852219A (en) Multi-pedestrian cross-camera online tracking system
CN104361327A (en) Pedestrian detection method and system
CN108564598B (en) Improved online Boosting target tracking method
CN115131821A (en) Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN111476160A (en) Loss function optimization method, model training method, target detection method, and medium
CN111027370A (en) Multi-target tracking and behavior analysis detection method
CN112132873A (en) Multi-lens pedestrian recognition and tracking based on computer vision
CN112465854A (en) Unmanned aerial vehicle tracking method based on anchor-free detection algorithm
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113537107A (en) Face recognition and tracking method, device and equipment based on deep learning
Basavaiah et al. Human activity detection and action recognition in videos using convolutional neural networks
Nodehi et al. Multi-metric re-identification for online multi-person tracking
Alagarsamy et al. Identifying the Missing People using Deep Learning Method
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
Yu et al. A unified transformer based tracker for anti-uav tracking
US20210097333A1 (en) Hierarchical sampling for object identification
CN114373203A (en) Picture archiving method and device, terminal equipment and computer readable storage medium
CN114140494A (en) Single-target tracking system and method in complex scene, electronic device and storage medium
Sujatha et al. An innovative moving object detection and tracking system by using modified region growing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant