CN112085764A - Real-time face tracking method and system based on video - Google Patents

Real-time face tracking method and system based on video

Info

Publication number
CN112085764A
CN112085764A (Application CN202010956799.7A)
Authority
CN
China
Prior art keywords
face
histogram
video frame
frame image
density estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010956799.7A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (non-publication requested)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayan Mutual Entertainment Technology Co ltd
Original Assignee
Beijing Huayan Mutual Entertainment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayan Mutual Entertainment Technology Co ltd
Priority to CN202010956799.7A
Publication of CN112085764A
Legal status: Pending

Classifications

    • G06T7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/90: Image analysis; determination of colour characteristics
    • G06V10/44: Image or video recognition; local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/56: Image or video recognition; extraction of image or video features relating to colour
    • G06V40/168: Human faces; feature extraction, face representation
    • G06V40/172: Human faces; classification, e.g. identification
    • G06T2207/10016: Image acquisition modality; video, image sequence
    • G06T2207/10024: Image acquisition modality; colour image
    • G06T2207/30201: Subject of image; face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video-based real-time face tracking method and system, wherein the method comprises the following steps: acquiring a video frame image, and selecting a face template image from the video frame image; detecting faces in the current video frame image to obtain face candidate region images; generating a first histogram associated with the face template image and a second histogram associated with each face candidate region image; performing feature value density estimation on the first histogram and the second histogram, and obtaining suspected face regions on the current video frame image according to the feature value density estimation results; calculating the similarity between the first histogram density estimation result and the second histogram density estimation result associated with each suspected face region; and finally determining the suspected face region with the highest density estimation similarity as the real face region, and framing the real face region on the current video frame image to realize recognition and tracking of the face. The invention improves the accuracy of face detection and ensures the real-time performance of face tracking.

Description

Real-time face tracking method and system based on video
Technical Field
The invention relates to the technical field of face recognition and expression capture, in particular to a real-time face tracking method and system based on video.
Background
With the maturation of high-definition video technology, intelligent video surveillance analysis increasingly focuses on the monitored object itself, and the human face, as an important foreground object, has always been a key research subject of intelligent video analysis. Typical dynamic face tracking continuously determines the face target region in a video sequence, by a certain strategy, from the face target region in the starting frame of the video. However, when the face moves rapidly, is occluded, or frequently enters and leaves the camera's field of view, existing face tracking algorithms cannot judge the face target in time, so tracking fails, the real-time performance of face tracking is poor, and the face target cannot be tracked quickly and accurately.
Disclosure of Invention
The present invention is directed to a method and system for real-time video-based face tracking to solve the above-mentioned problems.
In order to achieve the purpose, the invention adopts the following technical scheme:
provided is a video-based real-time face tracking method, including:
step S1, acquiring video frame images, and selecting face template images from the video frame images;
step S2, detecting the face in the current video frame image to obtain a face candidate area image;
step S3, generating a first histogram associated with the face template image and a second histogram associated with the face candidate region image;
step S4, performing characteristic value density estimation on the first histogram and the second histogram, and obtaining a suspected face area on the current video frame image according to the characteristic value density estimation result;
step S5, calculating the similarity of the first histogram density estimation result and the second histogram density estimation result associated with each suspected face area;
step S6, the suspected face area with the highest density estimation similarity is finally determined as a real face area, and the real face area is framed on the current video frame image to realize the recognition and tracking of the face;
step S7, calculating the offset of the face area according to the positions of the face area in the current frame and the last video frame image of the current frame;
and step S8, predicting the position of the face region in the next video frame image according to the calculated offset, reducing the face region searching range according to the predicted face position, and then re-entering the steps S2-S7 to realize continuous recognition and tracking of the face.
As a preferred embodiment of the present invention, in step S4, the feature value density estimation of the first histogram is expressed by the following formula (1):

$$\hat{q}_u = C\sum_{i=1}^{n} k\left(\left\|x_i^{*}\right\|^{2}\right)\delta\left[b\left(x_i^{*}\right)-u\right] \tag{1}$$

In formula (1), $\hat{q}_u$ represents the density estimation result of the feature value u on the face template image;
i is the i-th pixel on the face template image;
n is the number of pixels on the face template image;
$x_i^{*}$ is the coordinate position of pixel i on the face template image;
u is a histogram feature value on the face template image;
$\delta\left[b\left(x_i^{*}\right)-u\right]$ judges whether the color value at $x_i^{*}$ is the feature value u, the function $b(\cdot)$ mapping a pixel to its histogram bin;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
C is a normalization constant.
As a preferable embodiment of the present invention, in step S4, the method for estimating the feature value density of the second histogram is expressed by the following formula (2):
$$\hat{p}_u(y) = C_h\sum_{i=1}^{n_h} k\left(\left\|\frac{y-x_i}{h}\right\|^{2}\right)\delta\left[b\left(x_i\right)-u\right] \tag{2}$$

In formula (2), $\hat{p}_u(y)$ represents the density estimation result of the feature value u in the search window used to detect the face candidate region;
y represents the center position of the search window;
$x_i$ represents the coordinate position of the i-th pixel in the search window, the sum running over the $n_h$ pixels inside the window;
h is the radius of the search window;
u is a histogram feature value in the search window;
$\delta\left[b\left(x_i\right)-u\right]$ judges whether the color value at $x_i$ is the feature value u;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
$C_h$ is a normalization constant.
As a preferable aspect of the present invention, in step S5, the similarity between the first histogram density estimation result and the second histogram density estimation result is calculated by the following formula (3):
$$\hat{\rho}(y) \equiv \rho\left[\hat{p}(y),\hat{q}\right] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{3}$$

In formula (3), $\hat{\rho}(y)$ represents the similarity between the first histogram density estimation result and the second histogram density estimation result;
u is a feature value in the face template image or the suspected face region;
m represents the number of classes of the feature values.
As a preferred embodiment of the present invention, the suspected face region associated with the second histogram for which $\hat{\rho}(y)$ takes the maximum value is regarded as the real face region in the current video frame image.
As a preferred scheme of the invention, the mean shift algorithm is adopted to calculate the offset of the position of the face area on the current video frame image compared with the face area on the previous video frame image of the current frame.
As a preferred embodiment of the present invention, the window radius $h_{new}$ of the search window is calculated by the following formula (4):

$$h_{new} = \gamma h_{now} + (1-\gamma)h_{up} \tag{4}$$

In formula (4), $h_{new}$ represents the corrected value of the window radius $h_{now}$ of the search window used to detect the face region on the current video frame image;
$h_{now}$ is the window radius of the search window used to detect the face region on the current video frame image;
$h_{up}$ is the window radius of the search window used to detect the face region on the previous video frame image of the current frame;
γ is a constant.
In a preferred embodiment of the present invention, γ is 0.5.
As a preferred embodiment of the present invention, $h_{now} = h_{up} \pm (0.1\sim0.5)h_{up}$.
The invention also provides a real-time face tracking system based on video, which can realize the real-time face tracking method, and the real-time face tracking system comprises:
the video frame image acquisition module is used for acquiring a video frame image;
the face template image selection module is connected with the video frame image acquisition module and used for selecting a face template image from the video frame image;
the face detection module is connected with the video frame image acquisition module and is used for detecting a face candidate area in the current video frame image;
the histogram generation module is respectively connected with the face template image selection module and the face detection module and is used for generating a first histogram related to the face template image and a second histogram related to a face candidate area image;
the histogram density estimation module is connected with the histogram generation module and is used for carrying out characteristic value density estimation on the first histogram and the second histogram and obtaining a suspected face area on the current video frame image according to a characteristic value density estimation result;
a density estimation result similarity calculation module connected with the histogram density estimation module and used for calculating the similarity between a first histogram density estimation result and a second histogram density estimation result associated with the suspected face area, and finally determining the suspected face area with the highest density estimation similarity as a real face area;
the face area framing module is connected with the density estimation result similarity calculation module and is used for framing a real face area on the current video frame image;
the face region position information storage module is connected with the face region framing module and is used for acquiring and storing the position information of the real face region framed and selected on the current video frame image;
the human face region offset calculation module is connected with the human face region position information storage module and used for calculating the offset of the position of the human face region according to the positions of the real human face region in the current frame and the last video frame image of the current frame;
the face region position prediction module is connected with the face region offset calculation module and used for predicting the position of the face on the next video frame image according to the calculated offset and the position of the face on the current video frame image;
the searching range determining module is connected with the face region position predicting module and used for determining the searching range of the searching window on the next video frame image according to the position of the predicted face region on the next video frame image;
and the face detection module is connected with the search range determination module and is used for detecting the face area of the next video frame image according to the determined search range.
By matching the color distribution of the face template image with that of the face candidate regions through their histograms, the invention preliminarily matches suspected face regions, and by measuring the density similarity of the histogram feature values of the face template image and each suspected face region, it accurately judges the real face region among the suspected face regions. In addition, the invention calculates the offset of the face region between its positions in the current frame and the previous video frame, and predicts the position of the face region on the next video frame image from this offset, thereby greatly reducing the search range for the face candidate region, improving the face detection speed and ensuring the real-time performance of face tracking.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a diagram of the steps of a video-based real-time face tracking method according to an embodiment of the invention;
fig. 2 is a schematic structural diagram of a video-based real-time face tracking system according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
The drawings are for illustration only, are schematic rather than drawn to actual form, and are not to be construed as limiting the present patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if the terms "upper", "lower", "left", "right", "inner", "outer", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not indicated or implied that the referred device or element must have a specific orientation, be constructed in a specific orientation and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and the specific meanings of the terms may be understood by those skilled in the art according to specific situations.
In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" or the like, if appearing to indicate a connection relationship between the components, is to be understood broadly, for example, as being fixed or detachable or integral; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through one or more other components or may be in an interactive relationship with one another. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Fig. 1 illustrates a video-based real-time face tracking method according to an embodiment of the present invention. As shown in fig. 1, the video-based real-time face tracking method includes the steps of:
step S1, acquiring video frame images, and selecting face template images from the video frame images; the face template image can be selected by a manual frame selection mode, and can also be obtained by adopting the existing image detection algorithm for detection;
step S2, detecting the face in the current frame image to obtain a face candidate area; detecting and obtaining a face candidate region image through the existing face detection algorithm;
step S3, generating a first histogram associated with the face template image and a second histogram associated with the face candidate region image; because the pose, size and shape of the face change during motion, while a histogram gives the probability of color occurrence and is not affected by changes in target shape and size, the histogram is selected as the image feature of the face template image and the face candidate region image (see the toy check below), and the color distributions of the first histogram and the second histogram are matched in order to match the real face region on the video frame image;
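As a quick toy check of this invariance (our illustration, not from the patent): a color histogram depends only on the multiset of pixel values, so rotating or mirroring a patch leaves it unchanged.

```python
# Toy check: a color histogram ignores spatial layout, so pose changes that
# merely rearrange pixels leave it untouched. The 12-wide bins over a 0-179
# hue range are an arbitrary illustrative choice.
import numpy as np

patch = np.random.randint(0, 180, size=(32, 32))
hist = lambda a: np.bincount((a // 12).ravel(), minlength=15)
assert np.array_equal(hist(patch), hist(np.rot90(patch)))
assert np.array_equal(hist(patch), hist(np.fliplr(patch)))
```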
step S4, performing feature value density estimation on the first histogram and the second histogram, and obtaining suspected face regions on the current video frame image according to the feature value density estimation results; the feature value density estimation of the first histogram can be expressed by the following formula (1):

$$\hat{q}_u = C\sum_{i=1}^{n} k\left(\left\|x_i^{*}\right\|^{2}\right)\delta\left[b\left(x_i^{*}\right)-u\right] \tag{1}$$

In formula (1), $\hat{q}_u$ represents the density estimation result of the feature value u (a color feature) on the face template image;
i is the i-th pixel on the face template image;
n is the number of pixels on the face template image;
$x_i^{*}$ is the coordinate position of pixel i on the face template image;
u is a histogram feature value on the face template image;
$\delta\left[b\left(x_i^{*}\right)-u\right]$ judges whether the color value at $x_i^{*}$ is the feature value u, the function $b(\cdot)$ mapping a pixel to its histogram bin;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
C is a normalization constant.
Through the above formula (1), the probability of occurrence of the feature value u on the face template image can be estimated, that is, the density of the feature value u on the face template image. The skin color of the face region is yellowish relative to other body regions, so the face region presents a large area of this color; according to this color feature, if the estimated occurrence probability of one or several color features in a face candidate region is close to the occurrence probability of the same color feature(s) on the face template image, the face candidate region is regarded as a suspected face region.
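Below is a minimal sketch of the estimate in formula (1), assuming an Epanechnikov kernel profile, a 16-bin hue histogram and OpenCV's 0-179 hue convention; kernel, bin count and color space are illustrative assumptions, not fixed by the text above.

```python
# A sketch of formula (1): a kernel-weighted color histogram of the face
# template, normalized so that the feature value densities sum to 1.
import numpy as np

def epanechnikov(r2):
    """Kernel profile k(||x||^2); zero outside the unit ball."""
    return np.where(r2 < 1.0, 1.0 - r2, 0.0)

def template_density(template_hsv, n_bins=16):
    """q_u: weighted hue histogram of the face template image."""
    h, w = template_hsv.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # normalized pixel coordinates x_i* relative to the template center
    r2 = ((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2
    weights = epanechnikov(r2)
    # b(x_i*): map each pixel's hue to its histogram bin
    bins = (template_hsv[..., 0].astype(np.float64) / 180.0 * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    q = np.bincount(bins.ravel(), weights=weights.ravel(), minlength=n_bins)
    return q / q.sum()  # C chosen so that sum_u q_u = 1
```

The Epanechnikov profile is a common choice because the corresponding mean-shift weights reduce to a plain weighted average; any other kernel k(·) drops in the same way.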
The method of performing the feature value density estimation on the second histogram is expressed by the following formula (2):
$$\hat{p}_u(y) = C_h\sum_{i=1}^{n_h} k\left(\left\|\frac{y-x_i}{h}\right\|^{2}\right)\delta\left[b\left(x_i\right)-u\right] \tag{2}$$

In formula (2), $\hat{p}_u(y)$ represents the density estimation result of the feature value u in the search window used to detect the face candidate region;
y represents the center position of the search window;
$x_i$ represents the coordinate position of the i-th pixel in the search window, the sum running over the $n_h$ pixels inside the window;
h is the radius of the search window;
u is a histogram feature value in the search window;
$\delta\left[b\left(x_i\right)-u\right]$ judges whether the color value at $x_i$ is the feature value u;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
$C_h$ is a normalization constant.
The first histogram and the second histogram are subjected to characteristic value density estimation, so that a suspected face area on a current video frame image can be preliminarily screened out, the time for subsequently judging whether the suspected face area is a real face area is reduced, the speed of face recognition and tracking is improved, and the real-time performance of face tracking is effectively ensured.
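A companion sketch for formula (2), under the same illustrative hue/kernel assumptions as above: the weighted histogram is accumulated over a circular search window of center y and radius h on the current frame.

```python
# A sketch of formula (2): the kernel-weighted hue histogram p_u(y) inside a
# circular search window of center y = (row, col) and radius h.
import numpy as np

def candidate_density(frame_hue, y, h, n_bins=16):
    """p_u(y): weighted hue histogram of the search window centered at y."""
    cy, cx = y
    rows, cols = frame_hue.shape
    ys, xs = np.mgrid[max(0, int(cy - h)):min(rows, int(cy + h + 1)),
                      max(0, int(cx - h)):min(cols, int(cx + h + 1))]
    r2 = ((ys - cy) ** 2 + (xs - cx) ** 2) / float(h * h)  # ||(y - x_i)/h||^2
    k = np.where(r2 < 1.0, 1.0 - r2, 0.0)                  # kernel weights
    bins = np.clip((frame_hue[ys, xs] / 180.0 * n_bins).astype(int),
                   0, n_bins - 1)                          # b(x_i)
    p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    return p / p.sum()  # C_h normalization
```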
Step S5, calculating the similarity between the first histogram density estimation result and the second histogram density estimation result associated with each suspected face region; the present embodiment calculates this similarity by the following formula (3):

$$\hat{\rho}(y) \equiv \rho\left[\hat{p}(y),\hat{q}\right] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{3}$$

In formula (3), $\hat{\rho}(y)$ represents the similarity between the first histogram density estimation result and the second histogram density estimation result;
u is a feature value in the face template image or the suspected face region;
m represents the number of categories of the feature values.
The larger the value of $\hat{\rho}(y)$, the higher the similarity between the first histogram and the second histogram, that is, the more similar the suspected face region is to the face template image. This embodiment regards the suspected face region associated with the second histogram for which $\hat{\rho}(y)$ takes the largest value as the real face region in the current video frame image, and identifies and tracks the real face region by framing it.
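Formula (3) is a one-liner in code; a direct transcription follows, with `p` and `q` the normalized histograms produced by the sketches above.

```python
# Formula (3): the Bhattacharyya coefficient between two normalized
# histograms; larger values mean a more face-like candidate.
import numpy as np

def bhattacharyya(p, q):
    """rho(y) = sum_u sqrt(p_u(y) * q_u)."""
    return float(np.sum(np.sqrt(p * q)))

# Illustrative use (names are ours): pick the window maximizing rho, e.g.
# best_y = max(centers, key=lambda y: bhattacharyya(candidate_density(hue, y, h), q))
```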
Step S6, finally determining the suspected face area with the highest density estimation similarity as a real face area, and framing the real face area on the current video frame image to realize the recognition and tracking of the face;
in order to predict the approximate position of the face region on the next video frame image in advance, so as to reduce the search range of the face region and increase the speed of face tracking detection, preferably, the video-based face tracking method provided by the invention further comprises:
step S7, calculating the offset of the face area according to the positions of the face area in the current frame and the last video frame image of the current frame;
and step S8, predicting the position of the face region in the next video frame image according to the calculated offset, reducing the face region searching range according to the predicted face position, and then re-entering the steps S2-S7 to realize continuous recognition and tracking of the face.
In step S7, the present invention preferably uses a mean shift algorithm to calculate the shift amount of the position of the face region on the current video frame image compared with the face region on the previous video frame image of the current frame. Since the specific process of calculating the position offset of the face region by using the mean shift algorithm is not within the scope of the claimed invention, the specific process of calculating the position offset of the face region is not described herein.
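The patent defers to the standard mean-shift algorithm here, so purely for orientation the sketch below shows one textbook mean-shift update under the same illustrative hue/kernel assumptions, together with the offset-based extrapolation of step S8; all function names are ours, not the patent's.

```python
# A hedged sketch of one mean-shift iteration: pixels in the window are
# weighted by sqrt(q_u / p_u(y)) and their weighted mean is the new center.
# With an Epanechnikov kernel this weighted mean is the exact update.
import numpy as np

def mean_shift_step(frame_hue, y, h, q, n_bins=16):
    """One mean-shift update of the window center y = (row, col)."""
    cy, cx = y
    rows, cols = frame_hue.shape
    ys, xs = np.mgrid[max(0, int(cy - h)):min(rows, int(cy + h + 1)),
                      max(0, int(cx - h)):min(cols, int(cx + h + 1))]
    r2 = ((ys - cy) ** 2 + (xs - cx) ** 2) / float(h * h)
    k = np.where(r2 < 1.0, 1.0 - r2, 0.0)                  # Epanechnikov kernel
    bins = np.clip((frame_hue[ys, xs] / 180.0 * n_bins).astype(int),
                   0, n_bins - 1)
    p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    p /= max(float(p.sum()), 1e-12)                        # p_u(y), formula (2)
    w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))      # mean-shift weights
    w = np.where(r2 < 1.0, w, 0.0)                         # restrict to the window
    s = max(float(w.sum()), 1e-12)
    return float((w * ys).sum() / s), float((w * xs).sum() / s)

def predict_next_center(prev_center, curr_center):
    """Step S8: extrapolate by the last inter-frame offset to shrink the search."""
    return (2 * curr_center[0] - prev_center[0],
            2 * curr_center[1] - prev_center[1])
```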
The size of the search window (namely the area of the face candidate region) influences the histogram feature value density estimation result and indirectly influences the accuracy and real-time performance of face recognition and tracking. The present invention therefore preferably updates the window radius $h_{new}$ of the search window by the following formula (4):

$$h_{new} = \gamma h_{now} + (1-\gamma)h_{up} \tag{4}$$

In formula (4), $h_{new}$ represents the corrected value of the window radius $h_{now}$ of the search window used to detect the face region on the current video frame image;
$h_{now}$ is the window radius of the search window used to detect the face region on the current video frame image;
$h_{up}$ is the window radius of the search window used to detect the face region on the previous video frame image of the current frame;
γ is a constant.
Preferably, γ is 0.5.
Preferably, $h_{now} = h_{up} \pm (0.1\sim0.5)h_{up}$.
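In code, formula (4) and the preferred radius perturbation are one-liners; reading the ± range as a small set of candidate radii to try on the current frame is our interpretation, not something the text states explicitly.

```python
# Formula (4) with the preferred gamma = 0.5, plus candidate radii realizing
# h_now = h_up ± (0.1~0.5) h_up; the particular step set is an illustrative
# assumption.
def update_radius(h_now, h_up, gamma=0.5):
    """h_new = gamma * h_now + (1 - gamma) * h_up."""
    return gamma * h_now + (1.0 - gamma) * h_up

def candidate_radii(h_up, steps=(0.1, 0.3, 0.5)):
    """Radii to evaluate on the current frame around the previous radius h_up."""
    return [h_up * (1.0 + s) for s in steps] + [h_up * (1.0 - s) for s in steps]
```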
As shown in fig. 2, the present invention further provides a video-based real-time face tracking system, which can implement the real-time face tracking method described above; the system includes the following modules (a structural sketch in code follows the list):
the video frame image acquisition module 1 is used for acquiring a video frame image;
the face template image selection module 2 is connected with the video frame image acquisition module 1 and is used for selecting a face template image from the video frame image; usually, the face template image is selected from the first frame image of the video by manual framing or automatic detection;
the face detection module 3 is connected with the video frame image acquisition module 1 and is used for detecting a face candidate area in the current video frame image;
the histogram generation module 4 is respectively connected with the face template image selection module 2 and the face detection module 3 and is used for generating a first histogram related to the face template image and a second histogram related to the face candidate region image;
the histogram density estimation module 5 is connected with the histogram generation module 4 and is used for performing characteristic value density estimation on the first histogram and the second histogram and obtaining a suspected face area on the current video frame image according to a characteristic value density estimation result; the histogram feature value density estimation method has been described in detail in the above real-time face tracking method, and is not described herein again;
a density estimation result similarity calculation module 6 connected to the histogram density estimation module 5 for calculating the similarity between the first histogram density estimation result and the second histogram density estimation result associated with the suspected face region, and finally determining the suspected face region with the highest density estimation similarity as the real face region; the density estimation result similarity calculation method has been described in the above real-time face tracking method, and is not described herein again;
the face region framing module 7 is connected with the density estimation result similarity calculation module 6 and is used for framing a real face region on the current video frame image;
the face region position information storage module 8 is connected with the face region framing module 7 and is used for acquiring and storing the position information of the real face region framed on the current video frame image;
the face region offset calculation module 9 is connected with the face region position information storage module 8 and used for calculating the offset of the position of the face region according to the positions of the real face region in the current frame and the previous video frame image of the current frame;
the face region position prediction module 10 is connected with the face region offset calculation module 9 and used for predicting the position of the face on the next video frame image according to the calculated offset and the position of the face on the current video frame image;
the searching range determining module 11 is connected to the face region position predicting module 10, and is configured to determine a searching range of the searching window on the next video frame image according to a position of the predicted face region on the next video frame image;
and the face detection module 3 is connected with the search range determination module 11 and is used for detecting the face region of the next video frame image according to the determined search range.
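For orientation only, the following is a minimal structural sketch of how the modules of fig. 2 could be chained in code; every name is our illustrative assumption, and the module internals would be filled in by the sketches given earlier.

```python
# A minimal, hypothetical wiring of the fig. 2 modules: detect candidates in a
# reduced search range, score them against the template histogram, frame the
# best one, and remember its position for the next-frame prediction.
class FaceTracker:
    def __init__(self, detector, histogrammer, matcher):
        self.detector = detector          # face detection module 3
        self.histogrammer = histogrammer  # histogram generation module 4
        self.matcher = matcher            # density estimation + similarity (5, 6)
        self.prev_box = None              # position information storage module 8

    def track(self, frame, template_hist):
        candidates = self.detector(frame, roi=self.prev_box)  # search range module 11
        scored = [(self.matcher(template_hist, self.histogrammer(frame, c)), c)
                  for c in candidates]
        best = max(scored, key=lambda sc: sc[0])[1]  # highest similarity wins
        self.prev_box = best              # feeds offset/prediction modules 9, 10
        return best                       # framed as the real face region (module 7)
```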
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (10)

1. A method for video-based real-time face tracking, comprising:
step S1, acquiring video frame images, and selecting face template images from the video frame images;
step S2, detecting the face in the current video frame image to obtain a face candidate area image;
step S3, generating a first histogram associated with the face template image and a second histogram associated with the face candidate region image;
step S4, performing characteristic value density estimation on the first histogram and the second histogram, and obtaining a suspected face area on the current video frame image according to the characteristic value density estimation result;
step S5, calculating the similarity of the first histogram density estimation result and the second histogram density estimation result associated with each suspected face area;
step S6, the suspected face area with the highest density estimation similarity is finally determined as a real face area, and the real face area is framed on the current video frame image to realize the recognition and tracking of the face;
step S7, calculating the offset of the face area according to the positions of the face area in the current frame and the last video frame image of the current frame;
and step S8, predicting the position of the face region in the next video frame image according to the calculated offset, reducing the face region searching range according to the predicted face position, and then re-entering the steps S2-S7 to realize continuous recognition and tracking of the face.
2. The video-based real-time face tracking method according to claim 1, wherein in the step S4, the method of performing feature value density estimation on the first histogram is expressed by the following formula (1):
$$\hat{q}_u = C\sum_{i=1}^{n} k\left(\left\|x_i^{*}\right\|^{2}\right)\delta\left[b\left(x_i^{*}\right)-u\right] \tag{1}$$

In formula (1), $\hat{q}_u$ represents the density estimation result of the feature value u on the face template image;
i is the i-th pixel on the face template image;
n is the number of pixels on the face template image;
$x_i^{*}$ is the coordinate position of pixel i on the face template image;
u is a histogram feature value on the face template image;
$\delta\left[b\left(x_i^{*}\right)-u\right]$ judges whether the color value at $x_i^{*}$ is the feature value u, the function $b(\cdot)$ mapping a pixel to its histogram bin;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
C is a normalization constant.
3. The video-based real-time face tracking method according to claim 2, wherein in step S4, the method of performing feature value density estimation on the second histogram is expressed by the following formula (2):
$$\hat{p}_u(y) = C_h\sum_{i=1}^{n_h} k\left(\left\|\frac{y-x_i}{h}\right\|^{2}\right)\delta\left[b\left(x_i\right)-u\right] \tag{2}$$

In formula (2), $\hat{p}_u(y)$ represents the density estimation result of the feature value u in the search window used to detect the face candidate region;
y represents the center position of the search window;
$x_i$ represents the coordinate position of the i-th pixel in the search window, the sum running over the $n_h$ pixels inside the window;
h is the radius of the search window;
u is a histogram feature value in the search window;
$\delta\left[b\left(x_i\right)-u\right]$ judges whether the color value at $x_i$ is the feature value u;
$k(\cdot)$ is the kernel function used in the histogram density estimation;
$C_h$ is a normalization constant.
4. The video-based real-time face tracking method according to claim 3, wherein in the step S5, the similarity between the first histogram density estimation result and the second histogram density estimation result is calculated by the following formula (3):
$$\hat{\rho}(y) \equiv \rho\left[\hat{p}(y),\hat{q}\right] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{3}$$

In formula (3), $\hat{\rho}(y)$ represents the similarity between the first histogram density estimation result and the second histogram density estimation result;
u is a feature value in the face template image or the suspected face region;
m represents the number of classes of the feature values.
5. The video-based real-time face tracking method of claim 4, wherein the suspected face region associated with the second histogram for which $\hat{\rho}(y)$ takes the largest value is regarded as the real face region in the current video frame image.
6. The video-based real-time face tracking method according to claim 1, wherein a mean shift algorithm is used to calculate the shift of the position of the face region on the current video frame image compared to the face region on the previous video frame image of the current frame.
7. The video-based real-time face tracking method according to claim 3, wherein the window radius $h_{new}$ of the search window is calculated by the following formula (4):

$$h_{new} = \gamma h_{now} + (1-\gamma)h_{up} \tag{4}$$

In formula (4), $h_{new}$ represents the corrected value of the window radius $h_{now}$ of the search window used to detect the face region on the current video frame image;
$h_{now}$ is the window radius of the search window used to detect the face region on the current video frame image;
$h_{up}$ is the window radius of the search window used to detect the face region on the previous video frame image of the current frame;
γ is a constant.
8. The video-based real-time face tracking method of claim 7, wherein γ is 0.5.
9. The video-based real-time face tracking method of claim 7, wherein $h_{now} = h_{up} \pm (0.1\sim0.5)h_{up}$.
10. A real-time video-based face tracking system capable of implementing the real-time face tracking method as claimed in any one of claims 1 to 9, comprising:
the video frame image acquisition module is used for acquiring a video frame image;
the face template image selection module is connected with the video frame image acquisition module and used for selecting a face template image from the video frame image;
the face detection module is connected with the video frame image acquisition module and is used for detecting a face candidate area in the current video frame image;
the histogram generation module is respectively connected with the face template image selection module and the face detection module and is used for generating a first histogram related to the face template image and a second histogram related to a face candidate area image;
the histogram density estimation module is connected with the histogram generation module and is used for carrying out characteristic value density estimation on the first histogram and the second histogram and obtaining a suspected face area on the current video frame image according to a characteristic value density estimation result;
a density estimation result similarity calculation module connected with the histogram density estimation module and used for calculating the similarity between a first histogram density estimation result and a second histogram density estimation result associated with the suspected face area, and finally determining the suspected face area with the highest density estimation similarity as a real face area;
the face area framing module is connected with the density estimation result similarity calculation module and is used for framing a real face area on the current video frame image;
the face region position information storage module is connected with the face region framing module and is used for acquiring and storing the position information of the real face region framed and selected on the current video frame image;
the human face region offset calculation module is connected with the human face region position information storage module and used for calculating the offset of the position of the human face region according to the positions of the real human face region in the current frame and the last video frame image of the current frame;
the face region position prediction module is connected with the face region offset calculation module and used for predicting the position of the face on the next video frame image according to the calculated offset and the position of the face on the current video frame image;
the searching range determining module is connected with the face region position predicting module and used for determining the searching range of the searching window on the next video frame image according to the position of the predicted face region on the next video frame image;
and the face detection module is connected with the search range determination module and is used for detecting the face area of the next video frame image according to the determined search range.
CN202010956799.7A, filed 2020-09-17 (priority 2020-09-17): Real-time face tracking method and system based on video. Status: Pending. Publication: CN112085764A.

Priority Applications (1)

CN202010956799.7A (CN112085764A), priority date 2020-09-17, filing date 2020-09-17: Real-time face tracking method and system based on video


Publications (1)

Publication Number: CN112085764A; Publication Date: 2020-12-15

Family

ID=73737710

Family Applications (1)

CN202010956799.7A (CN112085764A), Pending: Real-time face tracking method and system based on video

Country Status (1)

Country Link
CN (1) CN112085764A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824305A (en) * 2014-03-17 2014-05-28 天津工业大学 Improved Meanshift target tracking method
CN105825525A (en) * 2016-03-16 2016-08-03 中山大学 TLD target tracking method and device based on Mean-shift model optimization
CN110211160A (en) * 2019-05-30 2019-09-06 华南理工大学 A kind of face tracking method based on improvement Camshift algorithm



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-12-15)