CN113221841A - Face detection and tracking method and device, electronic equipment and storage medium


Info

Publication number
CN113221841A
CN113221841A
Authority
CN
China
Prior art keywords
face
tracking
coordinates
frame
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110616942.2A
Other languages
Chinese (zh)
Inventor
宁学成
胡炳然
刘青松
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Shanghai Intelligent Technology Co Ltd filed Critical Unisound Shanghai Intelligent Technology Co Ltd
Priority to CN202110616942.2A priority Critical patent/CN113221841A/en
Publication of CN113221841A publication Critical patent/CN113221841A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a face detection and tracking method and device, an electronic device and a storage medium. The method comprises: acquiring a video frame; judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames; if the condition is met, detecting the coordinates of a first face frame and the position coordinates of five key points of a first face; and if the condition is not met, tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of the second face frame. In the embodiment of the application, the five key points of the face are used as the tracking target, so tracking is not easily lost; the position coordinates of the five key points of the first face are re-detected once every five frames, tracking is then resumed, and the position coordinates of the five key points of the second face are tracked according to the position coordinates of the five key points of the first face to update the coordinates of the second face frame, so that the tracked second face frame coordinates are more stable and do not jitter.

Description

Face detection and tracking method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face detection and tracking, in particular to a face detection and tracking method, a face detection and tracking device, electronic equipment and a storage medium.
Background
At present, an existing face detection and tracking system consists of two main modules: a face detection module and a face tracking module. First, a face is detected by a face detection model, and the coordinates (x, y, w, h) of the face detection frame are initialized. A tracking target target(x, y, w, h) is then initialized from the detected face frame coordinates, the tracking module is called in subsequent frames, and the detection frame coordinates are updated to target(x', y', w', h').
Existing detection-based face tracking schemes usually take the two points on a diagonal of the face frame as the tracking targets. Because these two corner points are not distinctive relative to the surrounding image features, the tracking result is often inaccurate and easily lost. In schemes that track five key points of the face, the tracked face frame often jitters because the tracking results of the five key points are inconsistent, and in particular, when the tracking of an individual point fails, the tracked face frame is prone to deformation.
Disclosure of Invention
The invention provides a method and a device for detecting and tracking a human face, electronic equipment and a storage medium, which can solve the technical problem that a tracked human face frame is easy to lose.
The technical scheme for solving the technical problems is as follows:
in a first aspect, an embodiment of the present invention provides a face detection and tracking method, including:
acquiring a video frame;
judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames;
if the condition is met, detecting the coordinates of a first face frame and the position coordinates of five key points of a first face;
and if the condition is not met, tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of the second face frame.
In some embodiments, tracking the position coordinates of the five key points of the second face according to the position coordinates of the five key points of the first face and updating the coordinates of the second face frame comprises:
tracking the position coordinates of the five key points of the second face according to an L-K optical flow algorithm to obtain the tracked position coordinates of the five key points of the second face;
judging good points and bad points among the tracked coordinates of the five key points of the second face according to key point thresholds;
determining the average value of the differences between the tracked coordinates of the good points and their corresponding position coordinates before tracking; the average value is the offset before and after tracking;
and updating the coordinates of the bad points and the coordinates of the second face frame according to the offset.
In some embodiments, the above method further comprises:
and if the position coordinates of the five key points of the tracked face are all bad points, re-detecting the key points of the face.
In some embodiments, the determining of good points and bad points among the coordinates of the five key points of the tracked second face according to the key point thresholds in the above method includes:
among the coordinates of the five key points of the second face, a coordinate greater than the first key point threshold and smaller than the second key point threshold is a good point;
among the coordinates of the five key points of the second face, a coordinate smaller than the first key point threshold or greater than the second key point threshold is a bad point;
wherein the first key point threshold is less than the second key point threshold.
In some embodiments, the updating of the coordinates of the bad points and the coordinates of the second face frame according to the offset in the above method includes:
adding the offset to the coordinates of the bad points before tracking to obtain the updated bad point coordinates and the second face frame coordinates.
In some embodiments, the position coordinates of the five key points in the above method include: the position coordinates of the left eye, the position coordinates of the right eye, the position coordinates of the nose, the position coordinates of the left mouth corner, and the position coordinates of the right mouth corner.
In a second aspect, an embodiment of the present invention provides an apparatus for face detection and tracking, including:
an acquisition module: for acquiring a video frame;
a judging module: used for judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames;
a first detection module: used for detecting the coordinates of a first face frame and the position coordinates of five key points of a first face if the condition is met;
a second detection module and a tracking module: used for tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of the second face frame if the condition is not met.
In some embodiments, the tracking module in the apparatus is further configured to:
track the position coordinates of the five key points of the second face according to an L-K optical flow algorithm to obtain the tracked position coordinates of the five key points of the second face;
judge good points and bad points among the tracked coordinates of the five key points of the second face according to key point thresholds;
determine the average value of the differences between the tracked coordinates of the good points and their corresponding position coordinates before tracking, the average value being the offset before and after tracking;
and update the coordinates of the bad points and the coordinates of the second face frame according to the offset.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory;
the processor is configured to execute a method of face detection and tracking as described in any one of the above by calling a program or instructions stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a program or instructions, where the program or instructions cause a computer to execute a method for detecting and tracking a human face as described in any one of the above.
The invention has the beneficial effects that: a video frame is acquired; whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames is judged; if the condition is met, the coordinates of a first face frame and the position coordinates of five key points of a first face are detected; and if the condition is not met, the position coordinates of five key points of a second face are tracked according to the position coordinates of the five key points of the first face and the coordinates of the second face frame are updated. In the embodiment of the application, the five key points of the face are used as the tracking target, so tracking is not easily lost. Tracking is also not performed indefinitely, because the key point positions would drift after a certain number of tracked frames; instead, the position coordinates of the five key points of the first face are re-detected once every five frames and tracking then resumes. Tracking the position coordinates of the five key points of the second face according to the position coordinates of the five key points of the first face to update the coordinates of the second face frame makes the tracked face frame more stable and free of jitter.
Drawings
FIG. 1 is a diagram of a method for face detection and tracking according to an embodiment of the present invention;
FIG. 2 is a diagram of another face detection and tracking method provided by an embodiment of the invention;
FIG. 3 is a diagram of an apparatus for face detection and tracking according to an embodiment of the present invention;
fig. 4 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
In order that the above objects, features and advantages of the present application can be more clearly understood, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. The specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the application. All other embodiments that can be derived by one of ordinary skill in the art from the description of the embodiments are intended to be within the scope of the present disclosure.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Fig. 1 is a diagram of a face detection and tracking method according to an embodiment of the present invention.
In a first aspect, with reference to fig. 1, an embodiment of the present invention provides a method for detecting and tracking a human face, including steps S101, S102 and S103:
S101: a video frame is acquired.
Specifically, the video frames acquired in the embodiment of the present application are multiple consecutive pictures; each picture in the video is one frame.
S102: judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames.
Specifically, in the embodiment of the present application, continuously tracking four frames means tracking the four frames that follow a detection frame, that is, determining whether the current video frame is the first frame, the sixth frame, the eleventh frame, and so on of the video. It should be understood that the number of consecutively tracked frames (one frame, two frames, three frames, ...) is maintained by a counter.
S103: if the condition of the first frame of the video or of continuously tracking four frames is met, detecting the coordinates of the first face frame and the position coordinates of the five key points of the first face.
Specifically, if the judgment result is that the condition is met, that is, the current frame is the first frame, the sixth frame, the eleventh frame, and so on of the video, the coordinates of the face frame and the position coordinates of the five key points of the face are detected in that frame. The position coordinates of the five key points include: the position coordinates of the left eye, the position coordinates of the right eye, the position coordinates of the nose, the position coordinates of the left mouth corner, and the position coordinates of the right mouth corner.
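For the illustrative sketches later in this description, the five key points can be kept in a fixed order in a small array. This ordering, the sample coordinate values, and the (5, 1, 2) float32 layout (the shape expected by the OpenCV optical-flow call used below) are assumptions of the illustration, not values given by the patent:

    import numpy as np

    # Illustrative ordering of the five face key points used by the later sketches:
    # 0 left eye, 1 right eye, 2 nose, 3 left mouth corner, 4 right mouth corner.
    keypoints = np.array(
        [[[120.0, 150.0]],   # left eye (x, y)
         [[180.0, 150.0]],   # right eye
         [[150.0, 185.0]],   # nose
         [[128.0, 220.0]],   # left mouth corner
         [[172.0, 220.0]]],  # right mouth corner
        dtype=np.float32)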
If the condition of the first frame of the video or of continuously tracking four frames is not met, the position coordinates of the five key points of the second face are tracked according to the position coordinates of the five key points of the first face, and the coordinates of the second face frame are updated.
Specifically, if the judgment result is that the condition is not met, that is, the current frame is the second frame, the third frame, the fourth frame, the fifth frame, and so on, the position coordinates of the five face key points in the second, third, fourth, and fifth frames are tracked according to the position coordinates of the five key points detected in the first frame, and the face frame coordinates of those frames are updated accordingly.
It should be understood that, in the embodiment of the application, using the five key points of the face as the tracking target makes the tracking hard to lose. Tracking is not performed indefinitely, because the key point positions would drift after a certain number of tracked frames. The position coordinates of the five key points of the second face are tracked according to the position coordinates of the five key points of the first face to update the coordinates of the second face frame, so that the tracked face frame is more stable and does not jitter.
In the embodiment of the application, the coordinates of the face frame and the coordinates of the five face key points are re-detected after every four tracked frames (that is, once every five frames), which corrects the positions of the face frame and the key points. Practice comparing detection once every five frames with detection once every ten frames shows that the difference in time consumption is small, but detection every five frames keeps the tracking accurate, whereas with detection only once every ten frames the seventh and eighth frames are likely to lose tracking.
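The detection/tracking schedule described above can be illustrated with a minimal sketch, assuming OpenCV video input. Here detect_face is a hypothetical placeholder for any external detector that returns a face frame (x, y, w, h) together with the five key points, and track_keypoints is sketched after steps S201 to S204 below; both names are introduced only for this illustration and do not come from the patent:

    import cv2

    DETECT_INTERVAL = 5  # detect on frames 1, 6, 11, ...; track the four frames in between

    def detect_face(frame):
        """Placeholder for an external detector returning ((x, y, w, h), five key points)."""
        raise NotImplementedError

    def run(video_path):
        cap = cv2.VideoCapture(video_path)
        prev_gray, box, keypoints = None, None, None
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if frame_idx % DETECT_INTERVAL == 0:
                # First frame, or four frames have been tracked since the last detection.
                box, keypoints = detect_face(frame)
            else:
                # Intermediate frame: track the five key points and shift the face frame.
                box, keypoints = track_keypoints(prev_gray, gray, box, keypoints)
                if box is None:
                    # All five points were judged bad: fall back to re-detection.
                    box, keypoints = detect_face(frame)
            prev_gray = gray
            frame_idx += 1
        cap.release()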
Fig. 2 is a diagram of another face detection and tracking method provided in the embodiment of the present invention.
In some embodiments, in conjunction with fig. 2, tracking the position coordinates of five key points of the second face according to the position coordinates of five key points of the first face to update the coordinates of the second face frame includes four steps S201, S202, S203 and S204:
S201: tracking the position coordinates of the five key points of the second face according to the L-K optical flow algorithm to obtain the tracked position coordinates of the five key points of the second face.
Specifically, in the embodiment of the present application, the L-K (Lucas-Kanade) optical flow algorithm can track using only a small number of key points, has a small computational cost, converges quickly, and tracks slowly moving objects well. The L-K optical flow implementation called by the present application returns, for each point, an error value that measures the difference between the original image patch and the tracking result; suppose, for example, that the error values of the five tracked face key points are (0.6, 0.9, 0.2, 0.9, 0.3), respectively.
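As a concrete sketch of step S201, OpenCV's pyramidal Lucas-Kanade tracker (calcOpticalFlowPyrLK) returns exactly this kind of per-point error value; the window size and pyramid depth below are illustrative defaults rather than parameters taken from the patent:

    import cv2
    import numpy as np

    def lk_track(prev_gray, curr_gray, prev_pts):
        """Track the five key points from the previous frame into the current frame.

        prev_pts is a float32 array of shape (5, 1, 2). The returned err holds, for
        each point, a value comparing the original image patch with the tracked patch.
        """
        next_pts, status, err = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts.astype(np.float32), None,
            winSize=(21, 21), maxLevel=3)
        return next_pts, status.ravel(), err.ravel()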
S202: judging good points and bad points among the tracked coordinates of the five key points of the second face according to the key point thresholds.
Specifically, in the embodiment of the present application, judging good points and bad points among the tracked coordinates of the five key points of the second face according to the key point thresholds includes: a key point whose error value is greater than the first key point threshold and smaller than the second key point threshold is a good point; a key point whose error value is smaller than the first key point threshold or greater than the second key point threshold is a bad point; wherein the first key point threshold is less than the second key point threshold.
Specifically, in the embodiment of the present application, the two key point thresholds are set to min_track_dist_threshold = 0.0001 (the first key point threshold) and track_quality_threshold = 0.8 (the second key point threshold), and the good points and bad points among the five points are determined from these two thresholds. Taking the error values (0.6, 0.9, 0.2, 0.9, 0.3) of the five tracked face key points as an example, applying the condition 0.0001 < error < 0.8, the three points with error values 0.6, 0.2, and 0.3 are judged to be good points, while the two points with error value 0.9 are bad points whose tracking has been lost.
The specific values of the first key point threshold and the second key point threshold can be set flexibly; the embodiment of the present application does not limit them.
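A minimal sketch of this good/bad classification, using the example threshold values from this embodiment and the per-point error values returned by the L-K sketch above:

    import numpy as np

    MIN_TRACK_DIST_THRESHOLD = 0.0001  # first key point threshold
    TRACK_QUALITY_THRESHOLD = 0.8      # second key point threshold

    def classify_points(err):
        """Return a boolean mask over the five key points: True = good point, False = bad point."""
        err = np.asarray(err, dtype=np.float32)
        return (err > MIN_TRACK_DIST_THRESHOLD) & (err < TRACK_QUALITY_THRESHOLD)

    # Example from the text: error values (0.6, 0.9, 0.2, 0.9, 0.3)
    # -> [True, False, True, False, True]: three good points, two bad points.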
S203: determining the average value of the difference values of the coordinates corresponding to the good points and the position coordinates of the five key points of the second face; the average is the keypoint offset before and after tracking.
Specifically, in the embodiment of the present application, the coordinates of the corresponding positions before tracking are subtracted from the horizontal and vertical coordinates of the three good points, and then the subtracted result is divided by three to obtain the offset of the whole tracked key point.
S204: and updating the coordinates of the dead pixel and the coordinates of the second face frame according to the offset of the key point.
Specifically, in the embodiment of the application, the coordinate before dead pixel tracking and the offset are added to obtain the updated dead pixel coordinate and the second face frame coordinate.
And adding the offset of the key point to the coordinates of the two bad points before tracking to update the tracking coordinates of the two bad points, and updating the coordinates of the second face frame according to the good point coordinates and the updated bad point coordinates. Five key points based on the human face are adopted as tracking targets, so that tracking loss is not easy to occur; and updating the coordinates of the face frame by adopting the point with good tracking result and the offset mean value of the coordinates before tracking, so that the tracked face frame is more stable and cannot shake.
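Putting steps S201 to S204 together, the track_keypoints helper referenced in the earlier pipeline sketch might look as follows; it reuses lk_track and classify_points from the sketches above, and the variable names are illustrative (box is the face frame (x, y, w, h)):

    import numpy as np

    def track_keypoints(prev_gray, curr_gray, box, prev_pts):
        """Track five key points with L-K flow and shift the face frame by the mean offset."""
        next_pts, _status, err = lk_track(prev_gray, curr_gray, prev_pts)  # S201
        good = classify_points(err)                                        # S202
        if not good.any():
            return None, None  # all five points are bad: the caller should re-detect

        prev_xy = prev_pts.reshape(-1, 2)
        next_xy = next_pts.reshape(-1, 2)
        # S203: mean displacement of the well-tracked points before vs. after tracking.
        offset = (next_xy[good] - prev_xy[good]).mean(axis=0)
        # S204: bad points keep their pre-tracking position plus the common offset.
        next_xy[~good] = prev_xy[~good] + offset
        x, y, w, h = box
        new_box = (x + offset[0], y + offset[1], w, h)
        return new_box, next_xy.reshape(-1, 1, 2).astype(np.float32)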
In some embodiments, the above method further comprises:
If the position coordinates of the five tracked face key points are all bad points, the face key points are re-detected.
For example, if the error values of the five tracked face key points are (0.9, 0.9, 0.9, 0.9, 0.9), that is, all key points are bad points, the coordinates of the face frame and the position coordinates of the five face key points are detected again.
By contrast, in the earlier example with error values (0.6, 0.9, 0.2, 0.9, 0.3), only the two points with error value 0.9 are bad points, so their coordinates are corrected with the offset instead of triggering re-detection.
Fig. 3 is a diagram of an apparatus for face detection and tracking according to an embodiment of the present invention.
In a second aspect, in conjunction with fig. 3, an embodiment of the present invention provides an apparatus for detecting and tracking a human face, including:
the acquisition module 301: for acquiring video frames.
Specifically, in this embodiment of the present application, the video frames acquired by the acquiring module 301 are multiple consecutive pictures; each picture in the video is one frame.
The judging module 302: used for judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames.
Specifically, in this embodiment of the application, continuously tracking four frames means tracking the four frames that follow a detection frame, that is, the judging module 302 determines whether the current video frame is the first frame, the sixth frame, the eleventh frame, and so on of the video. It should be understood that the number of consecutively tracked frames (one frame, two frames, three frames, ...) is maintained by a counter.
It should be understood that, in the embodiment of the present application, using the five face key points as the tracking target makes the tracking hard to lose. Tracking is nevertheless not performed indefinitely, because the key point positions would drift after a certain number of tracked frames.
The detection module 303: configured to detect the coordinates of a first face frame and the position coordinates of five key points of a first face if the condition of the first frame of the video or of continuously tracking four frames is satisfied.
Specifically, if the current frame is the first frame, the sixth frame, the eleventh frame, and the like of the video, the detection module 303 detects coordinates of a face frame and position coordinates of five key points of the face in the first frame, the sixth frame, the eleventh frame, and the like.
The tracking module 304: configured to track the position coordinates of the five key points of a second face according to the position coordinates of the five key points of the first face and update the coordinates of the second face frame if the condition of the first frame of the video or of continuously tracking four frames is not met.
Specifically, if the condition is not met, that is, the current frame is the second frame, the third frame, the fourth frame, the fifth frame, and so on, the tracking module 304 tracks the position coordinates of the five face key points in those frames according to the position coordinates of the five key points detected in the first frame and updates the face frame coordinates of the second, third, fourth, and fifth frames accordingly.
In some embodiments, the tracking module 304 in the above apparatus is further configured to: track the position coordinates of the five key points of the second face according to the L-K optical flow algorithm to obtain the tracked position coordinates of the five key points of the second face;
judge good points and bad points among the tracked coordinates of the five key points of the second face according to the key point thresholds; determine the average value of the differences between the tracked coordinates of the good points and their corresponding position coordinates before tracking, the average value being the offset before and after tracking; and update the coordinates of the bad points and the coordinates of the second face frame according to the offset.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory;
the processor is configured to execute a method of face detection and tracking as described in any one of the above by calling a program or instructions stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a program or instructions, where the program or instructions cause a computer to execute a method for detecting and tracking a human face as described in any one of the above.
Fig. 4 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
As shown in fig. 4, the electronic device includes: at least one processor 401, at least one memory 402, and at least one communication interface 403. The various components in the electronic device are coupled together by a bus system 404. The communication interface 403 is used for information transmission with external devices. It is understood that the bus system 404 is used to enable communications among these components. In addition to a data bus, the bus system 404 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are all labeled as the bus system 404 in fig. 4.
It will be appreciated that the memory 402 in this embodiment can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
In some embodiments, memory 402 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system and an application program.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs include various applications, such as a media player and a browser, and are used for implementing various application services. A program implementing any one of the face detection and tracking methods provided by the embodiments of the present application may be included in an application program.
In this embodiment of the application, the processor 401 is configured to execute the steps of the embodiments of the face detection and tracking method provided by the embodiments of the application by calling a program or instructions stored in the memory 402, specifically a program or instructions stored in an application program, for example:
Acquiring a video frame;
judging whether the current video frame meets the condition of the first frame of the video or of continuously tracking four frames;
if the condition is met, detecting the coordinates of a first face frame and the position coordinates of five key points of a first face;
and if the condition is not met, tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of the second face frame.
Any one of the face detection and tracking methods provided by the embodiments of the present application may be applied to the processor 401 or implemented by the processor 401. The processor 401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 401 or by instructions in the form of software. The processor 401 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of any one of the face detection and tracking methods provided by the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in the decoding processor. The software units may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 402, and the processor 401 reads the information in the memory 402 and completes the steps of the face detection and tracking method in combination with its hardware.
Those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments instead of others, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments.
Those skilled in the art will appreciate that the description of each embodiment has a respective emphasis, and reference may be made to the related description of other embodiments for those parts of an embodiment that are not described in detail.
Although the embodiments of the present application have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the application, and such modifications and variations fall within the scope defined by the appended claims. The described embodiments do not limit the scope of the invention: any person skilled in the art can easily conceive of equivalent modifications and substitutions within the technical scope of the present disclosure, and these are also intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of face detection and tracking, comprising:
acquiring a video frame;
judging whether the current video frame meets the condition of the first frame of the video or continuously tracking four frames;
if the condition of the first frame or the continuous tracking of four frames of the video is met, detecting the coordinates of a first face frame and the position coordinates of five key points of a first face;
and if the condition of the first frame or the continuous tracking of four frames of the video is not met, tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of a frame of the second face.
2. The method for detecting and tracking a human face according to claim 1, wherein the tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of a second face frame comprises:
tracking the position coordinates of the five key points of the second face according to an L-K optical flow algorithm to obtain the position coordinates of the five key points of the tracked second face;
judging good points and bad points in the coordinates of the five key points of the tracked second face according to a key point threshold;
determining the average value of the difference values of the coordinates corresponding to the good points and the position coordinates of the five key points of the second face; the average is the offset before and after tracking;
and updating the coordinates of the bad points and the coordinates of the second face frame according to the offset.
3. The method of face detection and tracking according to claim 2, further comprising:
and if the position coordinates of the five key points of the tracked face are all bad points, re-detecting the key points of the face.
4. The method for detecting and tracking the human face according to claim 2, wherein the judging good points and bad points in the coordinates of the five key points of the tracked second human face according to the key point threshold comprises:
among the coordinates of the five key points of the second face, a coordinate greater than the first key point threshold and smaller than the second key point threshold is a good point;
among the coordinates of the five key points of the second face, a coordinate smaller than the first key point threshold or greater than the second key point threshold is a bad point;
wherein the first keypoint threshold is less than the second keypoint threshold.
5. The method of face detection and tracking according to claim 2, wherein said updating the coordinates of the bad points and the coordinates of the second face frame according to the offset comprises:
and adding the offset to the coordinates of the bad points before tracking to obtain the updated bad point coordinates and the second face frame coordinates.
6. The method of face detection and tracking according to any of claims 1 to 5, wherein the position coordinates of the five key points comprise: the position coordinates of the left eye, the position coordinates of the right eye, the position coordinates of the nose, the position coordinates of the left mouth corner, and the position coordinates of the right mouth corner.
7. An apparatus for face detection and tracking, comprising:
an acquisition module: for acquiring a video frame;
a judging module: used for judging whether the current video frame meets the condition of the first frame of the video or continuously tracking four frames;
a first detection module: used for detecting the coordinates of a first face frame and the position coordinates of five key points of a first face if the condition of the first frame or the continuous tracking of four frames of the video is met;
a second detection module and a tracking module: used for tracking the position coordinates of five key points of a second face according to the position coordinates of the five key points of the first face and updating the coordinates of a frame of the second face if the condition of the first frame or the continuous tracking of four frames of the video is not met.
8. The apparatus for face detection and tracking according to claim 7, wherein the tracking module is further used for:
tracking the position coordinates of the five key points of the second face according to an L-K optical flow algorithm to obtain the position coordinates of the five key points of the tracked second face;
judging good points and bad points in the coordinates of the five key points of the tracked second face according to a key point threshold;
determining the average value of the difference values of the coordinates corresponding to the good points and the position coordinates of the five key points of the second face; the average is the offset before and after tracking;
and updating the coordinates of the bad points and the coordinates of the second face frame according to the offset.
9. An electronic device, comprising: a processor and a memory;
the processor is configured to execute a method of face detection and tracking according to any one of claims 1 to 6 by calling a program or instructions stored in the memory.
10. A computer-readable storage medium storing a program or instructions for causing a computer to perform a method of face detection and tracking according to any one of claims 1 to 6.
CN202110616942.2A 2021-06-02 2021-06-02 Face detection and tracking method and device, electronic equipment and storage medium Pending CN113221841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110616942.2A CN113221841A (en) 2021-06-02 2021-06-02 Face detection and tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110616942.2A CN113221841A (en) 2021-06-02 2021-06-02 Face detection and tracking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113221841A 2021-08-06

Family

ID=77082475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110616942.2A Pending CN113221841A (en) 2021-06-02 2021-06-02 Face detection and tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113221841A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970579B1 (en) * 2002-04-15 2005-11-29 Sonic Foundry, Inc. Orientation invariant feature detection system and method for unstructured low-quality video
CN103218600A (en) * 2013-03-29 2013-07-24 四川长虹电器股份有限公司 Real-time face detection algorithm
CN110147708A (en) * 2018-10-30 2019-08-20 腾讯科技(深圳)有限公司 A kind of image processing method and relevant apparatus
CN110363748A (en) * 2019-06-19 2019-10-22 平安科技(深圳)有限公司 Dithering process method, apparatus, medium and the electronic equipment of key point
CN111083401A (en) * 2019-12-20 2020-04-28 成都费恩格尔微电子技术有限公司 CIS chip dynamic bad point processing method and system
CN111523467A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Face tracking method and device
CN111563490A (en) * 2020-07-14 2020-08-21 北京搜狐新媒体信息技术有限公司 Face key point tracking method and device and electronic equipment
CN112417985A (en) * 2020-10-30 2021-02-26 杭州魔点科技有限公司 Face feature point tracking method, system, electronic equipment and storage medium
CN112712044A (en) * 2021-01-05 2021-04-27 百果园技术(新加坡)有限公司 Face tracking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108492287B (en) Video jitter detection method, terminal equipment and storage medium
CN109145771B (en) Face snapshot method and device
CN109598744B (en) Video tracking method, device, equipment and storage medium
CN111553234B (en) Pedestrian tracking method and device integrating facial features and Re-ID feature ordering
US8995718B2 (en) System and method for low complexity change detection in a sequence of images through background estimation
US10410358B2 (en) Image processing with occlusion and error handling in motion fields
US10936876B2 (en) Content recognition method, system and storage medium thereof
CN111798483A (en) Anti-blocking pedestrian tracking method and device and storage medium
US10674178B2 (en) One-dimensional segmentation for coherent motion estimation
CN110691259A (en) Video playing method, system, device, electronic equipment and storage medium
CN110400271B (en) Stripe non-uniformity correction method and device, electronic equipment and storage medium
JP6507843B2 (en) Image analysis method and image analysis apparatus
CN113409353A (en) Motion foreground detection method and device, terminal equipment and storage medium
CN113221841A (en) Face detection and tracking method and device, electronic equipment and storage medium
JP2019021297A (en) Image processing device and method, and electronic apparatus
CN113052019A (en) Target tracking method and device, intelligent equipment and computer storage medium
CN111950517A (en) Target detection method, model training method, electronic device and storage medium
CN109993767B (en) Image processing method and system
CN113223083B (en) Position determining method and device, electronic equipment and storage medium
CN115984151A (en) Method and device for correcting shielding frame, electronic equipment and readable storage medium
WO2022205934A1 (en) Disparity map optimization method and apparatus, and electronic device and computer-readable storage medium
CN112837349A (en) Target tracking method, target tracking equipment and computer-readable storage medium
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment
EP3543903B1 (en) Image processing apparatus and method, and storage medium storing instruction
CN111199179B (en) Target object tracking method, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination