CN112581506A - Face tracking method, system and computer readable storage medium - Google Patents

Info

Publication number
CN112581506A
Authority
CN
China
Prior art keywords
face
tracking
frame
tracker
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011641171.4A
Other languages
Chinese (zh)
Inventor
罗伯特·罗恩思
马原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Pengsi Technology Co ltd
Original Assignee
Beijing Pengsi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pengsi Technology Co ltd filed Critical Beijing Pengsi Technology Co ltd
Priority to CN202011641171.4A priority Critical patent/CN112581506A/en
Publication of CN112581506A publication Critical patent/CN112581506A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30196: Human being; Person
    • G06T2207/30201: Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a face tracking method, a face tracking system, and a computer-readable storage medium. The method comprises the following steps: tracking a first face and a second face using a first tracker in a first tracking frame; performing face detection on a first detection frame to obtain a detection result, the detection result indicating that the first face is detected in the first detection frame and the second face is not; tracking the first face using the first tracker in a second tracking frame; and tracking the second face using a second tracker in the second tracking frame, wherein the first tracker is a tracker based on a deep learning model and the second tracker is a tracker based on a kernel correlation filter. When a tracked face is missed by the detector, its tracking result is not output directly; instead, a tracker of a different type attempts to continue tracking the face. This reduces the repeated output of tracking results for the same face caused by missed detections and improves the face tracking effect.

Description

Face tracking method, system and computer readable storage medium
Technical Field
The present application relates to the field of computer vision, and more particularly, to a face tracking method, system, and computer-readable storage medium.
Background
Object tracking is an important research direction in the field of computer vision. Face tracking is a major branch of object tracking and is widely applied in video surveillance, human-computer interaction, security, autonomous driving, and other fields.
During face tracking, multiple faces usually appear in an image frame, so multiple faces need to be tracked simultaneously. The prior art generally uses the same tracker for all the faces, and once a tracked face is not detected in a detection frame, its tracking result is output. When faces are missed by the detector, this multi-face tracking approach repeatedly outputs tracking results for the same face, resulting in a poor face tracking effect.
Disclosure of Invention
The application provides a face tracking method, a face tracking system, a computer-readable storage medium, and a computer program product, which serve to improve the face tracking effect.
In a first aspect, a face tracking method is provided, including: tracking a first face and a second face by using a first tracker in a first tracking frame; performing face detection on the first detection frame to obtain a detection result, wherein the detection result indicates that the first face is detected in the first detection frame and the second face is not detected in the first detection frame; tracking the first face using the first tracker at the second tracking frame; tracking a second face at a second tracking frame using a second tracker; the second tracking frame is any one tracking frame between the first detection frame and the second detection frame, and the second detection frame is the next detection frame of the first detection frame; wherein the first tracker is a deep learning model-based tracker, and the second tracker is a kernel correlation filter-based tracker.
In one embodiment, face detection is performed on the second detection frame to obtain a third face;
in a first stage of the Hungarian algorithm, matching a third face with a first face tracked by using a first tracker; and if the third face is not successfully matched with the first face, matching the third face with the second face in the second stage of the Hungarian algorithm.
In one embodiment, matching the third face to the first face tracked using the first tracker in the first stage of the Hungarian algorithm comprises: in the first stage of the Hungarian algorithm, calculating the intersection-over-union (IoU) of the bounding box of the image of the third face and the bounding box of the image of the first face; calculating a first matching cost according to this IoU; and if the first matching cost is less than a first threshold, determining that the third face matches the first face.
In one embodiment, if the third face does not match the first face, the first face is subsequently tracked using the second tracker.
In one embodiment, matching the third face with the second face in the second stage of the Hungarian algorithm comprises: in the second stage of the Hungarian algorithm, calculating the intersection-over-union (IoU) of the bounding box of the image of the third face and the bounding box of the image of the second face; calculating a second matching cost according to this IoU; and if the second matching cost is less than a second threshold, determining that the third face matches the second face.
In one embodiment, the first threshold is less than the second threshold.
In one embodiment, if the third face matches the second face, the second face is subsequently tracked using the first tracker.
In one embodiment, while the second face is tracked using the second tracker, the tracking duration of the second face is recorded; when the tracking duration of the second face exceeds a preset duration, tracking of the second face is stopped and the tracking result of the second face is output.
In a second aspect, there is provided a face tracking system comprising a memory, a processor and a computer program stored on the memory, the processor executing the computer program to implement the steps of the method of the first aspect.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the steps of the method of the first aspect.
When a certain face is not detected in a detection frame during tracking, its tracking result is not output directly; instead, another tracker of a different type attempts to continue tracking the face. This can reduce the probability that the tracking result of the same face is output repeatedly because of missed detection, thereby improving the face tracking effect.
Drawings
Fig. 1 is a schematic flow chart of a face tracking method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a video sequence provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a tracking frame according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a first detection frame provided in an embodiment of the present application.
Fig. 5 is a schematic diagram of a second tracking frame provided in an embodiment of the present application.
Fig. 6 is a schematic diagram of a second detection frame provided in an embodiment of the present application.
Fig. 7 is a neural network heat map generated during the tracking process provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of a face tracking system provided in an embodiment of the present application.
Detailed Description
Object tracking is an important research direction in the field of computer vision. Face tracking, i.e., target tracking applied to human faces, is a major research direction of target tracking and is widely applied in video surveillance, human-computer interaction, security, autonomous driving, and other fields.
For example, in the face tracking process, a face in an image frame may be framed to obtain a (tracked) bounding box of the face. The position of the face's bounding box, the image inside it, and other information reflect the tracking situation of the face in the current image frame. Because the bounding box may change continuously during tracking, tracking information (or a tracking sequence) must be maintained for the face. Each tracked face may have its own tracking information, which may include the image, position, quality, and other attributes of the face in the respective image frames. After tracking of a face ends, an image of relatively good quality is generally selected from its tracking information for output. When multiple faces are tracked simultaneously, their tracking information can form a dynamic tracking linked list: a face newly entering the scene is added to the list, and a face leaving the scene is deleted from it.
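As an illustration only, the per-face tracking record and the dynamic tracking list described above could be organized as in the following Python sketch; all names here (Observation, Track, active_tracks) are assumptions for illustration and do not come from the patent.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Observation:
    frame_index: int
    bbox: Tuple[int, int, int, int]  # (x, y, w, h) bounding box in this frame
    quality: float                   # image-quality score of the face crop

@dataclass
class Track:
    face_id: int
    tracker_type: str = "deep"       # "deep" = first tracker, "kcf" = second tracker
    history: List[Observation] = field(default_factory=list)

    def best_observation(self) -> Observation:
        # When tracking ends, the highest-quality image is selected for output.
        return max(self.history, key=lambda o: o.quality)

# The dynamic tracking list: a face entering the scene is appended,
# a face leaving the scene is removed.
active_tracks: List[Track] = []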
A face is typically tracked by some type of tracker (or tracking algorithm), for example a tracker based on a deep learning model or a Kernel Correlation Filter (KCF) tracker. Different trackers rest on different tracking principles, so their tracking performance differs under different conditions.
During face tracking, multiple faces usually appear in an image frame, so multiple faces need to be tracked simultaneously. In the prior art, the same tracker is generally used for all of them. However, because the states (e.g., angle, degree of occlusion, sharpness) of the individual faces differ within the same frame, a single tracker cannot adapt to every face's state: the tracking effect varies from face to face, some faces are tracked tightly, some are tracked poorly, and some cannot be tracked continuously at all.
For example, some of the faces may not be detected because of angle, occlusion, sharpness, or similar problems; that is, there are missed detections. When a face is missed, some prior-art trackers (such as trackers based on deep learning models) cannot continue tracking, so the tracking task for that face ends and its tracking result is output. However, when the face becomes detectable again in subsequent image frames (e.g., because its angle recovers, the occlusion disappears, or sharpness improves), it is tracked anew, and another tracking result is output when that tracking ends. Hence, with missed detections, such face tracking methods perform poorly and repeatedly output tracking results for the same face.
In view of this, the present application provides a face tracking method to improve the tracking effect. For example, a tracker based on a deep learning model usually places high demands on the image quality of the tracked face; otherwise the face is easily lost. A KCF tracker generally tracks based on image content and can keep tracking even when the face image quality is low. Therefore, when the face image quality is poor, the pose is unusual, or the face is occluded, so that the face cannot be detected or the detected face quality is insufficient, a KCF tracker can be used to track the face; once the image quality recovers, a tracker based on a deep learning model can be used again.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Fig. 1 is a schematic flow chart of a face tracking method according to an embodiment of the present application. The method includes steps S110 to S140, which are described below.
And S110, tracking the first face and the second face by using a first tracker in the first tracking frame.
And S120, performing face detection on the first detection frame to obtain a detection result, wherein the detection result indicates that the first face is detected in the first detection frame and the second face is not detected in the first detection frame.
For example, in a first tracking frame, a first face and a second face are tracked by a first tracker, and in a first detection frame, face detection is performed by a detector (or a detection algorithm). The detector may be a deep learning model based detector or other type of detector.
The interval between tracking frames and detection frames may be set according to actual needs; the present application does not limit it. Since face detection algorithms are more complex and time-consuming than face tracking algorithms, the number of tracking frames can be set much larger than the number of detection frames, for example 5-20 times as many, such as 10 times. On the other hand, to ensure tracking accuracy, one or several detection frames may be placed after every run of consecutive tracking frames. Optionally, each detection frame may be followed by a consecutive run of tracking frames (a tracking-frame sequence), i.e., tracking frames lie between adjacent detection frames, and the interval may be uniform. In other words, every two adjacent detection frames enclose a tracking-frame sequence of predetermined length, for example 9 frames.
For example, one detection frame may be set every 10 image frames, with the remaining image frames as tracking frames. Fig. 2 shows a video sequence in which every 10th image frame is a detection frame: the gray image frames (e.g., the 30th and 40th frames) are detection frames, and the remaining image frames (e.g., the 29th, 31st-39th, and 41st-44th frames) are tracking frames.
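A minimal sketch of this schedule, assuming a uniform interval of 10 frames as in Fig. 2 (the constant and function names are illustrative):

DETECTION_INTERVAL = 10  # one detection frame every 10 image frames

def is_detection_frame(frame_index: int) -> bool:
    return frame_index % DETECTION_INTERVAL == 0

# Frames 30 and 40 are detection frames; frames 31-39 form a tracking-frame sequence.
assert is_detection_frame(30) and is_detection_frame(40)
assert not any(is_detection_frame(i) for i in range(31, 40))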
Step S110 may include: tracking the first face and the second face using the first tracker over a first tracking-frame sequence. Step S120 may include: performing face detection at a first detection frame that follows and is adjacent to the first tracking-frame sequence, i.e., the first detection frame after that sequence. In Fig. 2, the first tracking-frame sequence may be the 21st to 29th frames, and the first detection frame may be the 30th frame.
Optionally, when the first tracker performs face tracking, a batch processing mode may be adopted to perform parallel tracking on a plurality of faces.
Next, taking the first tracker tracking the first face and the second face as an example, the tracking by the first tracker and the detection process of the first detection frame are illustrated with the video sequence of Fig. 2, with reference to Fig. 3 and Fig. 4.
Fig. 3 shows a tracking frame, which may be, for example, the 29th frame in Fig. 2. In the 29th frame, the first face and the second face are tracked using the first tracker; for example, the first face may be face1 (marked with box 1 in Fig. 3) and the second face may be face0 (marked with box 0 in Fig. 3).
Fig. 4 shows the first detection frame, which may be, for example, the 30th frame in Fig. 2. In the 30th frame, the first face, face1, is detected (marked with light gray box 1 in Fig. 4), but the second face, face0, is not detected (for a more intuitive representation, face0 is marked in Fig. 4 with dark gray box 0).
In step S130, the first face is tracked by using the first tracker in the second tracking frame.
In step S140, a second tracker is used to track a second face in a second tracking frame.
The second detection frame is a next detection frame of the first detection frame, and the second tracking frame is any tracking frame between the first detection frame and the second detection frame. Taking fig. 2 as an example, the first detection frame may be the 30 th frame, the second detection frame may be the 40 th frame, and the second tracking frame may be any one of the 31 st to 39 th frames.
Wherein the first tracker is a deep learning model-based tracker and the second tracker is a KCF tracker.
Step S130 may include: if the first face is detected in the first detection frame, continuing to track the first face using the first tracker in a second tracking-frame sequence following the first detection frame. It is understood that using the first tracker in the second tracking-frame sequence is similar to using it in the first tracking-frame sequence in S110, so the description is not repeated here.
Similarly, step S140 may include: if the second face is not detected in the first detection frame, tracking the second face using the second tracker in the second tracking-frame sequence following the first detection frame.
The second tracking-frame sequence may be the sequence of tracking frames following and adjacent to the first detection frame. In Fig. 2, the first detection frame may be the 30th frame and the second tracking-frame sequence may be the 31st to 39th frames.
Alternatively, since the second face is not detected, the second tracker may track it based on the face's bounding box in the last tracking frame. For example, in Fig. 2, if the second face, face0, is not detected in the 30th frame, frames 31-39 can continue tracking with the second tracker based on face0's bounding box in the 29th frame.
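As a sketch of this fallback, OpenCV's KCF implementation (from opencv-contrib-python) could be initialized from the face's bounding box in the last tracked frame. This is an assumption for illustration, not the patent's own implementation; note that in some OpenCV versions the factory is cv2.legacy.TrackerKCF_create.

import cv2

def start_kcf_from_last_box(frame, last_bbox):
    # last_bbox: (x, y, w, h) of the face in the last frame where it was
    # tracked (e.g. face0's box in frame 29).
    kcf = cv2.TrackerKCF_create()
    kcf.init(frame, last_bbox)
    return kcf

# In each subsequent tracking frame (e.g. frames 31-39):
#   ok, bbox = kcf.update(frame)  # ok is False if KCF itself loses the face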
The second tracker can track a single face or multiple faces. When the second tracker performs multi-face tracking, it may be a tracking algorithm running on the CPU, such as the KCF algorithm; when multiple faces need to be tracked, they may be tracked on the CPU in a for loop.
Continuing with the example of Fig. 4: for the first face, face1, detected in the 30th frame, tracking may continue using the first tracker. For example, the second tracking frame shown in Fig. 5 is the 39th frame of the 31st-39th tracking-frame sequence in Fig. 2; the location of face1 tracked using the first tracker is represented in Fig. 5 by light gray box 1. Since the second face, face0, is not detected in the 30th frame, continuing to track it with the first tracker while its pose deteriorates could make its bounding box jump arbitrarily. The present application therefore continues to track face0 with the more general-purpose second tracker.
From frame 31 to frame 39, face0 is tracked using a second tracker based on the bounding box of face0 at frame 29 (e.g., box 0 in FIG. 3). The location of the face0 tracked at frame 39 using the second tracker is represented, for example, in FIG. 5 by the dark gray box 0.
It will be appreciated that when a second face tracked with the second tracker is detected again in a subsequent detection frame, the first tracker may be reused to track it. For example, Fig. 6 shows the second detection frame, e.g., the 40th frame in Fig. 2, i.e., the detection frame following the 30th frame (the first detection frame) shown in Fig. 4. In the 40th frame, face0 is detected (marked with box 0 in Fig. 6), so the first tracker may again be used to track face0.
It should be noted that the present application does not limit the number of faces tracked in the same frame, or the number of faces in the dynamic tracking linked list. For example, in the embodiment shown in Figs. 3-5, a third face, face2, may also be tracked, and the tracker used for face2 can be chosen flexibly according to the detection situation: if face2 is detected in the 30th frame, the 31st-39th tracking-frame sequence continues to track face2 using the first tracker. Combining this with the tracking situations of face0 and face1, in any tracking frame of the 31st-39th sequence, such as the 39th frame, different faces are tracked with different trackers: the first tracker tracks face1 and face2, while the second tracker tracks face0.
It is understood that steps S130 and S140 may be performed simultaneously, that is: in response to the detection result in S120 indicating that the first face is detected in the first detection frame and the second face is not detected in the first detection frame, a second tracking frame following the first detection frame tracks the first face using the first tracker and tracks the second face using the second tracker.
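A hedged sketch of this per-frame dispatch, reusing the Track and Observation structures from the earlier sketch; deep_track and kcf_track are placeholder helpers assumed for illustration, not APIs from the patent.

def deep_track(frame, track):
    # Placeholder for one step of the deep-learning (first) tracker.
    return track.history[-1].bbox

def kcf_track(frame, track):
    # Placeholder for one step of the KCF (second) tracker.
    return track.history[-1].bbox

def track_one_frame(frame, frame_index, tracks, detected_ids):
    # detected_ids: faces matched in the most recent detection frame.
    for track in tracks:
        if track.face_id in detected_ids:
            track.tracker_type = "deep"   # detected: stay with the first tracker
            bbox = deep_track(frame, track)
        else:
            track.tracker_type = "kcf"    # missed: fall back to the second tracker
            bbox = kcf_track(frame, track)
        track.history.append(Observation(frame_index, bbox, quality=0.0))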
The first tracker is based on a deep learning model and is not a general-purpose tracker: it places high demands on the features in the face image, and when image features are sparse, tracking loss easily occurs. Therefore, when a face is missed by the detector, it is difficult for the first tracker to continue tracking it. The embodiment of the application does not end tracking and output the tracking result immediately after a missed detection (the output may be, for example, the best-quality image in the face's tracking information); instead, tracking continues with the second tracker. The second tracker is a KCF tracker, which is general-purpose and places low demands on face features, so it can handle faces that the first tracker cannot continue to track. Using the second tracker for assisted tracking establishes a connection with a face that may be re-detected later: the tracking process is not interrupted, the tracking information remains continuous, multiple duplicate tracking records for the same face are avoided, and repeated output caused by missed detection is reduced. In addition, compared with the prior art, using different trackers for different faces within the same tracking frame is more flexible and yields a better tracking effect.
Next, one possible implementation of the first tracker, i.e. one possible implementation of the deep learning model based tracker, is given.
For each tracked face, a Kalman model may be maintained. Before the deep-learning-model tracker is applied, the bounding box of the face in the next tracking frame can be predicted from the Kalman model. An enlarged search image is created centered on the Kalman-predicted box, and its size may be adjusted to 32×32 pixels. According to the Pnet topology, an output feature map of 11×11 pixels is obtained. Applying softmax to the classification branch yields a neural-network heat map of size 11×11 pixels, such as the Pnet-heatmap shown in Fig. 7. The position with the highest score on the Pnet-heatmap is the tracking result of the deep learning model. In the embodiment of Fig. 7, it can be seen that the face moves down and to the right. Optionally, for more accurate tracking, bounding-box regression may also be applied at the highest-scoring position.
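The heatmap read-out can be illustrated with the following numpy sketch, which maps the highest-scoring cell of the 11×11 heatmap back to image coordinates; the network itself is stubbed out, and the simple linear coordinate mapping is a simplifying assumption.

import numpy as np

def heatmap_peak_to_position(heatmap, search_origin, search_size):
    # heatmap: (11, 11) classification scores from the Pnet-style branch.
    # search_origin: (x, y) of the enlarged search window in the image;
    # search_size: side length of the (square) search window in pixels.
    iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    cell = search_size / heatmap.shape[0]   # pixels covered per heatmap cell
    return (search_origin[0] + ix * cell, search_origin[1] + iy * cell)

# Example: a peak in the lower-right of the heatmap means the face moved
# down and to the right, as in Fig. 7.
hm = np.zeros((11, 11)); hm[8, 9] = 1.0
print(heatmap_peak_to_position(hm, search_origin=(100, 80), search_size=64))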
When a new face is detected in the first detection frame, the following problem arises: how to judge whether a new detection in the first detection frame is a newly appeared face or a face already in the dynamic tracking linked list. In other words, the new face detected in the first detection frame (referred to as the third face for convenience of description) must be matched against the existing faces in the dynamic tracking list to determine whether each face in the list has been detected.
To address this problem, the Hungarian algorithm is used for matching. The Hungarian algorithm may comprise two stages: in the first stage, the faces detected in the detection frame are matched with the faces tracked by the first tracker; in the second stage, with the faces tracked by the second tracker. Understandably, since the first tracker tracks faces more accurately than the second tracker does, the matching requirement of the first stage is stricter than that of the second stage.
For example, in the first stage of the Hungarian algorithm, the third face is matched against the first face tracked using the first tracker.
In the first stage, it may further be determined whether the first face is detected. If the first face is detected, the first tracker continues to track it in the next tracking frame; if the first face tracked using the first tracker is not detected, the second tracker is used to track it in the next tracking frame.
If the third face does not successfully match the first face, the third face is matched against the second face in the second stage of the Hungarian algorithm. In the second stage, it may further be determined whether the second face tracked using the second tracker is detected. If the second face is detected, the first tracker is used to track it in the next tracking frame; if not, the second tracker continues to track it in the next tracking frame, or tracking of the second face may be stopped.
Optionally, if after the two matching stages the third face still has no successful match, the third face is a newly detected face that has entered the scene of the image frame. Tracking information may be newly created for it, it may be added to the dynamic tracking linked list, and a first tracker may be used to track it in subsequent tracking frames.
After the second stage of the Hungarian matching algorithm, a face in the dynamic tracking linked list that has a successful match can be regarded as detected, and a face that still has no successful match can be regarded as not detected.
Optionally, the threshold of the matching cost used in the first stage is a first threshold, the threshold used in the second stage is a second threshold, and the first threshold may be smaller than the second threshold. That is, the matching criterion of the first stage may be stricter than that of the second stage; the two stages require different matching similarities. Alternatively, the first-stage threshold may be smaller than the second-stage threshold because the accuracy of the second face tracked with the second tracker is poorer (e.g., the tracked position deviates from the actual position). Since the faces matched in the two stages are tracked by different trackers, using different matching criteria in the two stages can improve matching accuracy to some extent.
It should be noted that "the first threshold is smaller than the second threshold" in this application means that the first-stage matching is stricter than the second-stage matching. For some ways of computing the matching cost, a higher cost value may correspond to a stricter matching requirement; in that case the first threshold may instead be larger than the second threshold.
It should be noted that, the present application is not limited to specific values of the first threshold and the second threshold.
In the following, the two-stage matching process of the hungarian algorithm is explained in detail by an embodiment.
When the detection frame detects a third face, a matching cost is constructed based on the Intersection over Union (IoU) of two bounding boxes: one is the bounding box of the third face detected in the detection frame, and the other is the bounding box of the first face or of the second face. In the first stage of the Hungarian algorithm, the two bounding boxes may be those of the third face and the first face; in the second stage, those of the third face and the second face.
Similarly, when there are more than two newly detected faces (the third face being one of them) or more than two faces in the dynamic tracking linked list (the first and second faces being two of them), the multiple matching costs form a Hungarian cost matrix.
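A sketch of this two-stage matching with scipy's Hungarian solver follows; the cost 1 - IoU and the concrete threshold values are illustrative assumptions (the text above only requires the first-stage criterion to be stricter).

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # a, b: (x, y, w, h) bounding boxes.
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_stage(detections, tracked_boxes, threshold):
    # Returns {detection_index: track_index} for pairs whose cost beats threshold.
    if not detections or not tracked_boxes:
        return {}
    cost = np.array([[1.0 - iou(d, t) for t in tracked_boxes] for d in detections])
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return {r: c for r, c in zip(rows, cols) if cost[r, c] < threshold}

FIRST_THRESHOLD, SECOND_THRESHOLD = 0.3, 0.5  # assumed values; stage 1 is stricter
# Stage 1: detections vs. faces tracked by the first (deep) tracker.
# Stage 2: leftover detections vs. faces tracked by the second (KCF) tracker.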
In the first stage of the Hungarian algorithm, if the first matching cost is smaller than the first threshold, it is determined that the third face matches the first face. That is, the first face is detected: the position information in the first face's tracking information may be updated to that of the third face, and the first tracker continues to track the first face in subsequent tracking frames. If the first matching cost is greater than or equal to the first threshold, it is determined that the third face does not match the first face; the first face is then not detected, and the second tracker is used to track the first face in subsequent tracking frames.
If the third face does not match the first face, a second matching cost between the second face and the third face is calculated in the second stage of the Hungarian algorithm. If the second matching cost is smaller than the second threshold, it is determined that the third face matches the second face. That is, the second face is detected: the position information in the second face's tracking information may be updated to that of the third face, and the first tracker continues to track the second face in subsequent tracking frames. If the second matching cost is greater than or equal to the second threshold, it is determined that the third face does not match the second face; the second face is then not detected, and the second tracker is used to track the second face in subsequent tracking frames.
If the third face matches neither the first face nor the second face, the third face is a newly detected face. Tracking information can be newly created for the third face, the third face is added to the dynamic tracking linked list, and the third face is tracked.
The number of faces in the dynamic tracking linked list may be more than two, i.e., the list may contain faces besides the first face and the second face. For convenience of description, the faces tracked using the first tracker are referred to as a first group of faces (the first face being one of them), and the faces tracked using the second tracker as a second group of faces (the second face being one of them). An embodiment of this case is described in detail next.
In the first stage of the Hungarian matching algorithm, the new faces detected in the first detection frame are matched with the first group of faces. The first group of faces participates in computing the cost matrix, and from this matrix the Hungarian matching algorithm computes the optimal match between the detected faces and the first group. In the first stage, it may further be determined whether each face tracked using the first tracker is detected: if it is detected, the first tracker continues to track it in the next tracking frame; if not, the second tracker is used to track it in the next tracking frame.
After the first-stage matching of the Hungarian algorithm is completed, if there remain unmatched new faces detected in the first detection frame, the remaining faces are matched with the second group of faces in the second stage. The second-stage matching process is similar to the first, except that the second group of faces participates. If a face in the second group has a matching cost below the second threshold, the match succeeds, i.e., the face is detected again: the position information in its tracking information may be updated to that of the corresponding detected face, and the first tracker is used in subsequent tracking frames. If no face in the second group matches successfully, i.e., the face is still not detected in the detection frame, the second tracker may be used in subsequent tracking frames, or tracking of the face may be stopped.
In the first stage of the Hungarian algorithm, the first group of faces participating in matching may also include faces in the dynamic tracking linked list that are about to leave the scene. This avoids multiple repeated outputs for faces wandering near the edges of the image.
The embodiment of the application can keep tracking faces that were missed by the detector, but for faces that cannot be detected for a long time, continuing to track with the second tracker is not appropriate. For example, the face may actually have left the scene of the image frames, so subsequent detection frames will never detect it. Or the face's image quality may remain poor, making it difficult for the second tracker to track it accurately.
In view of this, while the second face is tracked with the second tracker, the tracking duration of the second tracker is recorded; when the tracking duration exceeds a preset duration, tracking of the second face is stopped and its tracking result is output. The tracking result may be obtained from the tracking information, e.g., the sequence of the face's position coordinates, the highest-quality image of the face, and so on.
Alternatively, the tracking duration of the second tracker may be counted in consecutive frames, with the preset duration a set number of consecutive frames, for example 50: when the face is not detected in 50 consecutive image frames (or, for the video sequence of Fig. 2, in 5 consecutive detection frames), i.e., the second tracker has been in use throughout that period, the tracking task for the face is ended and its tracking result is output.
Alternatively, the tracking duration of the second tracker may be counted as the number of consecutive detection frames in which the second face is not detected, which may also be called the second tracker's tracking age. For example, the preset duration may be 4: in the example of Fig. 2, if the face is detected in none of the 40th, 50th, 60th, and 70th frames while the second tracker is in use, the tracking duration exceeds the preset duration, tracking of the face is stopped, and its tracking result is output.
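A minimal sketch of this age-based cut-off, counting detection frames as in the example above and reusing the Track structure from the earlier sketch; the field and constant names are assumptions.

MAX_TRACKING_AGE = 4  # preset duration: consecutive detection frames without a match

def on_detection_frame(track, was_detected):
    # Called once per detection frame for a face tracked by the second tracker.
    if was_detected:
        track.age = 0                    # age resets once the face is re-detected
        return None
    track.age = getattr(track, "age", 0) + 1
    if track.age >= MAX_TRACKING_AGE:    # e.g. missed in frames 40, 50, 60, 70
        return track.best_observation()  # end the track, output its result
    return None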
In addition, the present application also provides a target tracking system, as shown in Fig. 8, comprising a memory, a processor, and a computer program stored on the memory. When the computer program is executed by the processor, the steps of the method described above with reference to Fig. 1 can be implemented.
The memory may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, a Random Access Memory (RAM), or the like.
The processor may be a general-purpose CPU, a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs to implement the methods of the embodiments of the present application.
The processor may also be an integrated circuit chip having signal processing capability. In implementation, the steps of the method of the present application may be carried out by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EEPROM, or a register. The storage medium is located in the memory; the processor reads the information in the memory and, in combination with its hardware, performs the functions required of the units included in the system of the embodiments of the present application, or the methods of the method embodiments of the present application.
Optionally, the system may further comprise a communication interface and a bus. The communication interface enables communication with other devices or networks using transceiving apparatus such as, but not limited to, a transceiver. For example, multiple images may be acquired from an image capture device through the communication interface, image processing results may be transmitted to external devices through it, and so on. The bus may include a pathway that transfers information between the components of the device (e.g., memory, processor, communication interface).
It is understood that the target tracking system in FIG. 8 may be a computer system, a computer device, or the like. In some embodiments, the object tracking system may be a mobile terminal, such as a handheld mobile terminal, which may be a cell phone, for example.
In addition, the embodiment of the invention also provides a computer storage medium on which a computer program is stored. When the computer program is executed by a computer or processor, the steps of the method described above with reference to Fig. 1 can be implemented. For example, the computer storage medium is a computer-readable storage medium.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: tracking a first face and a second face by using a first tracker in a first tracking frame; performing face detection on the first detection frame to obtain a detection result, wherein the detection result indicates that the first face is detected in the first detection frame and the second face is not detected in the first detection frame; tracking the first face using the first tracker at the second tracking frame; tracking a second face at a second tracking frame using a second tracker; the second tracking frame is any one tracking frame between the first detection frame and the second detection frame, and the second detection frame is the next detection frame of the first detection frame; wherein the first tracker is a deep learning model-based tracker, and the second tracker is a kernel correlation filter-based tracker.
The computer storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In addition, the present invention also provides a computer program product containing a computer program or instructions which, when executed by a computer or processor, can carry out the steps of the method described above with reference to Fig. 1.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A face tracking method, comprising:
tracking a first face and a second face by using a first tracker in a first tracking frame;
performing face detection on a first detection frame to obtain a detection result, wherein the detection result indicates that the first face is detected in the first detection frame and the second face is not detected in the first detection frame;
tracking the first face using the first tracker at a second tracking frame;
tracking the second face using a second tracker at the second tracking frame;
the second tracking frame is any tracking frame between the first detection frame and a second detection frame, and the second detection frame is a next detection frame of the first detection frame;
wherein the first tracker is a deep learning model-based tracker and the second tracker is a kernel correlation filter-based tracker.
2. The method of claim 1, further comprising:
performing face detection on the second detection frame to obtain a third face;
in a first stage of the Hungarian algorithm, matching the third face with the first face tracked by using the first tracker;
and if the third face is not successfully matched with the first face, matching the third face with the second face in a second stage of the Hungarian algorithm.
3. The method of claim 2, wherein matching the third face, in the first stage of the Hungarian algorithm, to the first face tracked using the first tracker comprises:
in the first stage of the Hungarian algorithm, calculating the intersection-over-union of the bounding box of the image of the third face and the bounding box of the image of the first face;
calculating a first matching cost according to the intersection-over-union of the bounding box of the image of the third face and the bounding box of the image of the first face;
and if the first matching cost is less than a first threshold value, determining that the third face is matched with the first face.
4. The method of claim 3, further comprising:
and if the third face does not match the first face, using the second tracker to perform subsequent tracking on the first face.
5. The method of claim 3, wherein matching the third face to the second face in the second stage of the Hungarian algorithm comprises:
in the second stage of the Hungarian algorithm, calculating the intersection-over-union of the bounding box of the image of the third face and the bounding box of the image of the second face;
calculating a second matching cost according to the intersection-over-union of the bounding box of the image of the third face and the bounding box of the image of the second face;
and if the second matching cost is smaller than a second threshold value, determining that the third face is matched with the second face.
6. The method of claim 5, wherein the first threshold is less than the second threshold.
7. The method of claim 5, further comprising:
and if the third face is matched with the second face, using the first tracker to perform subsequent tracking on the second face.
8. The method according to any one of claims 1-7, further comprising:
recording the tracking duration of the second face in the process of tracking the second face by using the second tracker;
and when the tracking duration of the second face is longer than the preset duration, stopping tracking the second face and outputting the tracking result of the second face.
9. A face tracking system comprising a memory, a processor and a computer program stored on the memory, characterised in that the processor executes the computer program to carry out the steps of the method of any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202011641171.4A 2020-12-31 2020-12-31 Face tracking method, system and computer readable storage medium Pending CN112581506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011641171.4A CN112581506A (en) 2020-12-31 2020-12-31 Face tracking method, system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011641171.4A CN112581506A (en) 2020-12-31 2020-12-31 Face tracking method, system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112581506A true CN112581506A (en) 2021-03-30

Family

ID=75144623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011641171.4A Pending CN112581506A (en) 2020-12-31 2020-12-31 Face tracking method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112581506A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229442A (en) * 2018-02-07 2018-06-29 西南科技大学 Face fast and stable detection method in image sequence based on MS-KCF
US20200065976A1 (en) * 2018-08-23 2020-02-27 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
CN109711318A (en) * 2018-12-24 2019-05-03 北京澎思智能科技有限公司 A kind of plurality of human faces detection and tracking based on video flowing
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN112016353A (en) * 2019-05-30 2020-12-01 普天信息技术有限公司 Method and device for carrying out identity recognition on face image based on video
CN110717403A (en) * 2019-09-16 2020-01-21 国网江西省电力有限公司电力科学研究院 Face multi-target tracking method
CN111428642A (en) * 2020-03-24 2020-07-17 厦门市美亚柏科信息股份有限公司 Multi-target tracking algorithm, electronic device and computer readable storage medium
CN111523424A (en) * 2020-04-15 2020-08-11 上海摩象网络科技有限公司 Face tracking method and face tracking equipment
CN111652909A (en) * 2020-04-21 2020-09-11 南京理工大学 Pedestrian multi-target tracking method based on deep hash characteristics

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
GONG JUN et al.: "Research on an Improved KCF Target Tracking Algorithm Based on CNN Feature Extraction", 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 1 September 2020 (2020-09-01), pages 538-543 *
JIACAI LIAO et al.: "MTCNN-KCF-deepSORT: Driver Face Detection and Tracking Algorithm Based on Cascaded Kernel Correlation Filtering and Deep SORT", SAE Technical Paper, 14 April 2020 (2020-04-14), pages 1-10 *
REN ZIHAN et al.: "A face tracking framework based on convolutional neural networks and Kalman filter", 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), 23 April 2018 (2018-04-23), pages 410-413 *
刘俊杰 (LIU Junjie): "Research on Aerial Target Tracking Algorithms Based on Multi-tracker Alternation", China Master's Theses Full-text Database (Engineering Science & Technology II), no. 03, 15 March 2020 (2020-03-15), pages 031-279 *
李叶鑫 (LI Yexin): "Research on Multi-object Tracking Algorithms Based on Convolutional Neural Networks", China Master's Theses Full-text Database (Information Science & Technology), no. 07, 15 July 2020 (2020-07-15), pages 138-904 *
樊慧慧 (FAN Huihui) et al.: "A Dynamic Multi-face Automatic Detection and Real-time Tracking Method in Complex Environments", Application Research of Computers, vol. 32, no. 10, 27 October 2015 (2015-10-27), pages 3183-3186 *
田雄 (TIAN Xiong): "Research on Video-based Multi-face Detection and Tracking Algorithms", China Master's Theses Full-text Database (Information Science & Technology), no. 01, 15 January 2020 (2020-01-15), pages 138-1277 *

Similar Documents

Publication Publication Date Title
KR102150776B1 (en) Face location tracking method, apparatus and electronic device
CN109086734B (en) Method and device for positioning pupil image in human eye image
CN109145771B (en) Face snapshot method and device
US11688078B2 (en) Video object detection
CN108647587B (en) People counting method, device, terminal and storage medium
CN111242128B (en) Object detection method, device, computer readable storage medium and computer equipment
TWI717834B (en) Comparison method, device and electronic equipment based on multi-frame facial images
CN111512317A (en) Multi-target real-time tracking method and device and electronic equipment
EP3772037A1 (en) Image processing apparatus, method of tracking a target object, and program
CN112700478A (en) Target tracking method, system, computer-readable storage medium, and program product
CN112966654B (en) Lip movement detection method, lip movement detection device, terminal equipment and computer readable storage medium
CN109102026B (en) Vehicle image detection method, device and system
CN110647818A (en) Identification method and device for shielding target object
CN112381071A (en) Behavior analysis method of target in video stream, terminal device and medium
JP6507843B2 (en) Image analysis method and image analysis apparatus
CN113657434A (en) Human face and human body association method and system and computer readable storage medium
CN115170851A (en) Image clustering method and device
CN113228105A (en) Image processing method and device and electronic equipment
CN112330618A (en) Image offset detection method, device and storage medium
CN112581506A (en) Face tracking method, system and computer readable storage medium
US11495006B2 (en) Object detection method for static scene and associated electronic device
CN115004245A (en) Target detection method, target detection device, electronic equipment and computer storage medium
CN112906495A (en) Target detection method and device, electronic equipment and storage medium
CN112686175A (en) Face snapshot method, system and computer readable storage medium
US20230222672A1 (en) Image processing apparatus, image processing method, and non-transitory computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination