CN113409354A - Face tracking method and device and terminal equipment

Info

Publication number
CN113409354A
Authority
CN
China
Prior art keywords
face
frame image
key point
current frame
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010180300.8A
Other languages
Chinese (zh)
Inventor
李禹源
胡文泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202010180300.8A
Publication of CN113409354A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Abstract

The application is applicable to the technical field of image processing, and provides a face tracking method, a face tracking apparatus and a terminal device. The face tracking method includes: performing face key point detection on a face detection area of a current frame image and determining target key points, wherein if the current frame image is a preset start frame image, the face detection area of the current frame image is an area obtained by performing face detection on the current frame image; and locating the face area of the current frame image according to the target key points, and determining the face detection area of the next frame image. The face tracking method addresses the problem in the prior art of how to improve face tracking efficiency.

Description

Face tracking method and device and terminal equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a face tracking method, a face tracking device and terminal equipment.
Background
In recent years, face tracking technology has been widely applied in human-computer interaction, community access control, subway security inspection, stability maintenance and anti-terrorism, and mobile personnel management and control. To locate and track a face across different video frames, common face tracking methods mainly rely on features of the face in the video frames, such as color and texture. However, these methods cannot adapt to environmental changes, have poor robustness, and cannot guarantee real-time performance. Therefore, the existing face tracking methods are inefficient.
Disclosure of Invention
In view of this, embodiments of the present application provide a face tracking method, apparatus and terminal device, so as to solve the problem in the prior art of how to improve face tracking efficiency.
A first aspect of an embodiment of the present application provides a face tracking method, including:
performing face key point detection on a face detection area of a current frame image, and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by performing face detection on the current frame image;
and positioning the face region of the current frame image according to the target key point, and determining the face detection region of the next frame image.
A second aspect of an embodiment of the present application provides a face tracking apparatus, including:
the face key point detection unit is used for carrying out face key point detection on a face detection area of a current frame image and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by carrying out face detection on the current frame image;
and the positioning unit is used for positioning the face area of the current frame image according to the target key point and determining the face detection area of the next frame image.
A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the terminal device implements the steps of the face tracking method according to the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes a terminal device to implement the steps of the face tracking method according to the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the face tracking method as described in the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages. In the embodiments of the present application, for each frame image, only the target key points within the previously determined face detection area need to be determined, so that the face area of the current frame image can be located according to the target key points; at the same time, the face detection area of the next frame image can be determined according to the target key points, so that target key point detection and face area location continue on the next frame image, and face tracking is realized through continuous face location. For images other than the preset start frame images, face tracking can be realized from the target key points alone, without performing complete face detection, so the time spent on detection operations is reduced and the real-time performance of face location and tracking is improved. Moreover, compared with existing methods that detect faces based on information such as color and texture, locating faces based on key point detection is less affected by the environment and adapts better to environmental changes, so the robustness of face tracking can be improved. In conclusion, the face tracking method of the embodiments of the present application can improve the real-time performance and robustness of face tracking, thereby improving face tracking efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flow chart illustrating an implementation process of a face tracking method according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a target key point provided by an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for tracking a face in a video image sequence according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a face tracking apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Embodiment one:
Fig. 1 shows a schematic flow chart of the implementation of a face tracking method provided in an embodiment of the present application. The execution subject of the face tracking method is a terminal device. Details are as follows:
the face tracking method in the embodiment of the application is a method for continuously positioning and tracking the face area of each frame of image in a specified video image sequence. The following steps S101-S102 are sequentially carried out on each frame of image in the video image sequence, so that the accurate positioning of the face area of the current frame of image is realized, and the face detection area of the next frame of image is determined, so that the face area can be rapidly positioned in the subsequent frame, and the efficient face tracking is realized.
In S101, performing face key point detection on a face detection region of a current frame image, and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection region of the current frame image is a region obtained by performing face detection on the current frame image.
In the embodiment of the present application, the terminal device performs face tracking processing on each frame image of the video image sequence in turn, and the current frame image is the frame image whose turn for face tracking processing has currently come in that sequence.
In the embodiment of the present application, the face detection area of the current frame image is a predetermined area in which face key point detection is to be performed, and within which the exact face area can then be located. Specifically, if the current frame image is a preset start frame image, the face detection area is an area obtained by performing face detection on the current frame image with a preset face detection algorithm before step S101; if the current frame image is not a preset start frame image, the face detection area is an area determined according to the target key points of the previous frame image. The preset face detection algorithm can generally determine the area where a face is located completely and accurately, but such algorithms usually involve a large amount of computation and are time-consuming, whereas a face detection area determined from the previous frame image locates the area where the face is located quickly. Therefore, once the face detection area has been accurately determined for a preset start frame image, subsequent frame images can determine their face detection areas from the previous frame image without performing full face detection on every frame, which improves face tracking efficiency while preserving accuracy. Specifically, the preset start frame images in the embodiment of the present application at least include the first frame image of the video image sequence; that is, the face detection area of the first frame image is a starting area determined by complete and accurate face detection, from which the face detection areas of the second, third and subsequent frame images can be determined in turn, each according to its previous frame image. Illustratively, the face detection algorithm preset in the embodiment of the present application includes, but is not limited to, a template matching algorithm or a neural-network-based algorithm.
Face key point detection is performed on the face detection area of the current frame image to determine the target key points, which are several preset face key points, for example key points on the five sense organs. Preferably, the number of target key points is less than or equal to 10, so as to reduce the complexity of face key point detection and the amount of computation needed to determine the target key points, thereby improving face tracking efficiency.
Optionally, the preset start frame image in the embodiment of the present application specifically includes a first frame image in a video image sequence and an image determined according to a preset number of interval frames.
In addition to the first frame image, which serves as the starting reference, the preset start frame images in the embodiment of the present application include images spaced apart by a preset interval frame number. For example, every 5th, 10th or 20th frame is selected as a preset start frame image.
Specifically, an image determined according to the preset interval frame number m is the nth frame image, wherein the remainder of n divided by m is 0, and m and n are both positive integers greater than 1.
That is, the frame number n of an image determined according to the preset interval frame number satisfies the following formula:

n % m == 0

where "%" denotes the remainder (modulo) operation and "==" denotes equality.
Illustratively, m has a value of 10. Since under normal conditions no great change occurs across 10 consecutive frames of images, determining a preset start frame image every 10 frames is sufficient to ensure the accuracy of face tracking.
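As an illustration, the rule for selecting preset start frame images can be expressed in a few lines of code. The following is a minimal sketch assuming 1-based frame numbering and the example value m = 10; the helper name is_preset_start_frame is hypothetical and not part of the embodiment.

```python
def is_preset_start_frame(n: int, m: int = 10) -> bool:
    """Hypothetical helper: True if frame n (1-based) is a preset start
    frame image, i.e. the first frame or a frame whose number satisfies
    n % m == 0."""
    return n == 1 or n % m == 0
```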
In the embodiment of the present application, the preset start frame images include not only the first frame image but also the images determined according to the preset interval frame number. Thus, during face tracking over the video image sequence, complete face detection is performed at intervals to accurately determine the face detection area, rather than always relying on the face area of the previous frame image; this reduces the accumulated deviation in the face tracking process and improves the robustness and accuracy of the face tracking method.
Optionally, before the step S101, the method further includes:
acquiring an input instruction of a user, and determining the preset interval frame number according to the input instruction;
and selecting one frame image as a preset initial frame image at intervals of the preset interval frame number from the first frame image of the video image sequence.
The preset interval frame number m in the embodiment of the present application can be set by a user according to actual conditions such as the frame rate and the flow of people. Specifically, an input instruction of the user is acquired, and the preset interval frame number m is determined from the numeric information carried by the input instruction. Alternatively, the input instruction of the user includes parameters such as the frame rate and the flow of people; after these parameters are acquired, the corresponding preset interval frame number m is calculated according to a formula fitted in advance from experiments.
After the preset interval frame number is determined, starting from a first frame image of the video image sequence, determining the first frame image and a frame image selected every the preset interval frame number m as a preset initial frame image.
In the embodiment of the present application, because the input instruction of the user can be acquired and the preset interval frame number determined according to the actual situation, a sufficient number of preset start frame images can be guaranteed, ensuring the accuracy of face tracking.
Optionally, the step S101 specifically includes:
s10101: performing face key point detection on a face detection area of the current frame image according to a preset key point target type;
S10102A: if the face key point of the key point target type is detected in the face detection area, determining the face key point of the key point target type as a target key point of the current frame image;
S10102B: otherwise, performing face detection on the current frame image, resetting the key point target type and/or re-determining the face detection area of the current frame image according to the face detection result, and returning to execute the step S10101.
The target key points in the embodiment of the present application are specifically face key points of a preset key point target type, where the key point target type is a set of key point types configured in advance. For example, the preset key point target type may include any one, or any combination, of key point types corresponding to the five sense organs, key point types corresponding to specific positions on the five sense organs, and key point types corresponding to the chin, the cheeks and the forehead.
In the embodiment of the present application, a corresponding face key point detection algorithm is set for each key point target type. Exemplarily, if the first key point target type is the combination of left eye corner, right eye corner, nose tip and mouth corner key points, the face key point detection algorithm corresponding to the first key point target type is an algorithm that specifically detects the left eye corners, right eye corners, nose tip and mouth corners of a face; if the second key point target type is the combination of left eye corner, left mouth corner and left ear key points, the corresponding algorithm specifically detects the left eye corner, left mouth corner and left ear of a face; and if the third key point target type is the combination of right eye corner, right mouth corner and right ear key points, the corresponding algorithm specifically detects the right eye corner, right mouth corner and right ear of a face.
In S10101, a corresponding face key point detection algorithm is selected according to a preset key point target type, and face key point detection is performed on a face detection region of the current frame image.
In S10102A, if a face keypoint of the keypoint target type is detected in the face detection region of the current frame image according to the face keypoint detection algorithm, the face keypoint of the detected keypoint target type is determined as the target keypoint. For example, if the preset key point target type is the first key point target type, after the detection is successful, target key points including a left-eye corner key point, a right-eye corner key point, a nose tip key point and a mouth corner key point are determined in the current frame image.
In S10102B, if the face key points of the key point target type cannot be detected in the face detection area of the current frame image by the face key point detection algorithm, this indicates that the face position has changed suddenly or the face pose has changed. Face detection then needs to be performed again on the current frame image, and the key point target type is reset and/or the face detection area of the current frame image is re-determined according to the face detection result. Specifically, if the face detection result indicates that the face pose has changed, the key point target type is reset according to the current face pose; for example, if the face pose in the current frame image has changed from facing front to facing left, the key point target type is set to the second key point target type. If the face detection result indicates that the face position has changed suddenly, the face detection area of the current frame image is re-determined according to the face position obtained by the current face detection. After the key point target type is reset and/or the face detection area of the current frame image is re-determined, the process returns to step S10101 and face key point detection is performed again.
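The branch logic of steps S10101-S10102B can be sketched as follows. This is an illustrative outline only: detect_keypoints, detect_face and select_target_type are hypothetical stand-ins for the type-specific face key point detection algorithm, the preset face detection algorithm and the pose-based type selection described above.

```python
def determine_target_keypoints(frame, detection_area, target_type):
    # S10101: run the detector matching the preset key point target type.
    # detect_keypoints is a hypothetical stand-in, not a named API.
    keypoints = detect_keypoints(frame, detection_area, target_type)
    while keypoints is None:
        # S10102B: detection failed -- the face position or pose changed,
        # so re-run full face detection on the current frame image.
        result = detect_face(frame)
        target_type = select_target_type(result.pose)  # reset the target type
        detection_area = result.box                    # and/or the detection area
        # Loop back to S10101, as in the embodiment.
        keypoints = detect_keypoints(frame, detection_area, target_type)
    # S10102A: the detected key points of the target type are the target key points.
    return keypoints, target_type, detection_area
```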
Optionally, the target key points in the embodiment of the present application include a left eye corner key point, a right eye corner key point, a nose tip key point, and a mouth corner key point.
As shown in fig. 2, the target key points consist of 7 key points in total: two eye corner key points of the left eye, two eye corner key points of the right eye, a nose tip key point, and two mouth corner key points. Because the features of these 7 key points are distinctive and easy to locate, the amount of computation and the complexity of the target key point determination process can be reduced, which improves the efficiency of face key point detection and hence of face tracking.
Optionally, the step S101 specifically includes:
and carrying out face key point detection on the face detection area of the current frame image through a pre-trained lightweight neural network to determine a target key point.
In the embodiment of the present application, a pre-trained lightweight neural network is specifically adopted to perform face key point detection on the face detection area of the current frame image, where the lightweight neural network can be a MobileNet (a convolutional neural network designed for mobile and embedded vision applications), a ShuffleNet (an efficient mobile convolutional neural network), or the like. Because the target key points are few in number and easy to detect, a lightweight neural network with few parameters and a small amount of computation can accurately perform face key point detection and determine the target key points; at the same time, the small amount of computation of the lightweight neural network further improves face tracking efficiency. Illustratively, in the embodiment of the present application, face key point detection is performed with MobileNetV2 using a width scaling factor of 0.25; MobileNetV2 incorporates an inverted residual structure, giving high accuracy with low latency, and the width scaling factor of 0.25 further reduces the amount of computation and improves detection efficiency.
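As a concrete illustration, the following sketch builds such a detector in PyTorch/torchvision, which is an assumption (the embodiment does not name a framework). The classifier head of MobileNetV2 with width multiplier 0.25 is replaced by a regression layer that outputs 14 values, i.e. (x, y) coordinates for the 7 target key points; the 112x112 input size is also an assumed choice.

```python
import torch
import torchvision

# Sketch only: framework and input size are assumptions, not part of the patent.
backbone = torchvision.models.MobileNetV2(width_mult=0.25)  # width scaling factor 0.25
backbone.classifier = torch.nn.Linear(backbone.last_channel, 14)  # 7 key points x (x, y)

face_crop = torch.randn(1, 3, 112, 112)    # a cropped face detection area
pred = backbone(face_crop).view(-1, 7, 2)  # predicted key point coordinates
```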
Optionally, the target key points in the embodiment of the present application are specifically key points on the five sense organs, and the pre-trained lightweight neural network in the embodiment of the present application is obtained by training with a preset loss function. The preset loss function is determined jointly by the total loss value over all target key points and the loss value of each individual facial feature of the face, so that the trained lightweight neural network not only fits the overall face structure formed by all the target key points but also further refines the fit to the structure of each facial feature. This improves the key point detection performance of the lightweight neural network, makes the determined target key points more accurate, and thereby improves the accuracy of face tracking.
Specifically, the target key points in the embodiment of the present application include seven key points, namely two eye corner key points of the left eye, two eye corner key points of the right eye, a nose tip key point, and two mouth corner key points; correspondingly, the preset loss function is:

loss = l_total + α(l_left_eye + l_right_eye + l_nose + l_mouth)

where loss is the total loss function value of the lightweight neural network, l_total is the total loss value over the 7 target key points, l_left_eye is the loss value corresponding to the two eye corner key points of the left eye, l_right_eye is the loss value corresponding to the two eye corner key points of the right eye, l_nose is the loss value corresponding to the nose tip key point, l_mouth is the loss value corresponding to the two mouth corner key points, and α is a preset weight parameter whose value ranges between 0 and 1 (for example, α = 0.3). Specifically:

l_total = ||(pred_1, ..., pred_7) - (gth_1, ..., gth_7)||_2
l_left_eye = ||(pred_1, pred_2) - (gth_1, gth_2)||_2
l_right_eye = ||(pred_3, pred_4) - (gth_3, gth_4)||_2
l_nose = ||pred_5 - gth_5||_2
l_mouth = ||(pred_6, pred_7) - (gth_6, gth_7)||_2

where pred_1 to pred_7 denote the predicted coordinate values of the 7 key points output by the lightweight neural network, and gth_1 to gth_7 denote the corresponding real (ground-truth) coordinate values input to the lightweight neural network. As shown above, l_total is the two-norm of the vector formed from the predicted and real coordinate values of all target key points (i.e., all 7 target key points); l_left_eye is the two-norm computed from the predicted coordinate values pred_1, pred_2 of the two eye corner key points of the left eye and the real coordinate values gth_1, gth_2; l_right_eye is the two-norm computed from the predicted coordinate values pred_3, pred_4 of the two eye corner key points of the right eye and the real coordinate values gth_3, gth_4; l_nose is the two-norm computed from the predicted coordinate value pred_5 of the nose tip key point and the real coordinate value gth_5; and l_mouth is the two-norm computed from the predicted coordinate values pred_6, pred_7 of the two mouth corner key points and the real coordinate values gth_6, gth_7.
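A minimal sketch of this loss follows, assuming PyTorch (the embodiment does not name a framework), with pred and gth as batches of predicted and ground-truth coordinates of shape (batch, 7, 2) and the key points ordered as above: indices 0-1 the left-eye corners, 2-3 the right-eye corners, 4 the nose tip, 5-6 the mouth corners.

```python
import torch

def keypoint_loss(pred: torch.Tensor, gth: torch.Tensor,
                  alpha: float = 0.3) -> torch.Tensor:
    """loss = l_total + alpha * (l_left_eye + l_right_eye + l_nose + l_mouth)."""
    def l2(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Two-norm of the flattened difference vector, averaged over the batch.
        return (a - b).flatten(1).norm(dim=1).mean()

    l_total = l2(pred, gth)                      # all 7 target key points
    l_left_eye = l2(pred[:, 0:2], gth[:, 0:2])   # two left-eye corner key points
    l_right_eye = l2(pred[:, 2:4], gth[:, 2:4])  # two right-eye corner key points
    l_nose = l2(pred[:, 4:5], gth[:, 4:5])       # nose tip key point
    l_mouth = l2(pred[:, 5:7], gth[:, 5:7])      # two mouth corner key points
    return l_total + alpha * (l_left_eye + l_right_eye + l_nose + l_mouth)
```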
In S102, the face region of the current frame image is located according to the target key point, and the face detection region of the next frame image is determined.
After the target key points of the current frame image are accurately determined, the face area of the current frame image can be accurately located according to the target key points, and the face detection area of the next frame image is determined so that the face area can be located in the next frame image.
Optionally, the step S102 includes:
determining a minimum circumscribed quadrilateral of the target key points, and taking the minimum circumscribed quadrilateral as the face area of the current frame image;
and expanding the face area of the current frame image by a preset multiple to obtain the face detection area of the next frame image.
In the embodiment of the present application, the minimum circumscribed quadrilateral that encloses all the target key points is determined according to the target key points, and this quadrilateral is taken as the face area finally located in the current frame image.
After the face area of the current frame image is accurately located according to the target key points, the face area is expanded by a preset multiple (for example, the side lengths of the face area are multiplied by the preset multiple), and the expanded area is taken as the face detection area of the next frame image. Because the position of a face does not move much between two consecutive frames of a video image sequence, the face detection area obtained by expanding the face area by the preset multiple can be guaranteed to contain the target key points of the next frame image, which ensures the accuracy of face key point detection in the next frame. Specifically, the preset multiple is a value determined in advance from experimental data; preferably, the preset multiple ensures that all target key points of the same face can be completely detected within the face detection area in the next frame image, while the face detection area does not contain the complete set of target key points of another face. Illustratively, the preset multiple is 2.5.
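As an illustration, the location and expansion of step S102 can be sketched as follows, assuming NumPy and taking the minimum circumscribed quadrilateral to be the axis-aligned bounding box of the target key points (the simplest quadrilateral satisfying the description); the function name is hypothetical, and 2.5 is the example preset multiple.

```python
import numpy as np

def locate_face_and_next_area(keypoints: np.ndarray, multiple: float = 2.5):
    """keypoints: (N, 2) array of (x, y) target key point coordinates."""
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    face_area = (x_min, y_min, x_max, y_max)  # face area of the current frame

    # Expand the face area about its centre by the preset multiple to obtain
    # the face detection area of the next frame image.
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    w, h = (x_max - x_min) * multiple, (y_max - y_min) * multiple
    next_area = (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
    return face_area, next_area
```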
Optionally, in the embodiment of the present application, each frame image contains multiple faces. After the initial face detection areas are obtained by performing face detection on a preset start frame, each initial face detection area is marked with a unique identifier; the face areas determined from these face detection areas, and the face detection areas of subsequent frame images, all carry the same unique identifier, so that a user can better distinguish the tracking of different faces.
For easier understanding, the embodiment of the present application further provides a specific flowchart of a method for performing face tracking in a video image sequence. As shown in fig. 3, the process includes:
S1: acquiring a first frame image of a video image sequence;
S2: performing face detection on the frame image to obtain an initial face detection area;
S3: performing face key point detection on the face detection area to determine the target key points;
S4: locating the face area of the current frame image according to the target key points;
S5: determining the face detection area of the next frame image according to the face area;
S7: if a next frame image exists, acquiring the next frame image and executing step S8; otherwise, ending;
S8: if the next frame image is a preset start frame image, returning to step S2; otherwise, executing step S3.
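Putting the flowchart together, the loop below is a sketch under the same assumptions as the earlier snippets: detect_face and detect_keypoints are hypothetical stand-ins for the preset face detection algorithm and the lightweight key point detector (key point target type handling is omitted for brevity), and locate_face_and_next_area is the helper sketched above; none of these names come from the patent.

```python
def track_faces(frames, m: int = 10, multiple: float = 2.5):
    """Yield the located face area for each frame of a video image sequence."""
    detection_area = None
    for n, frame in enumerate(frames, start=1):   # S1/S7: acquire frames in turn
        if n == 1 or n % m == 0:                  # S8: preset start frame image
            detection_area = detect_face(frame)   # S2: full face detection
        keypoints = detect_keypoints(frame, detection_area)  # S3
        face_area, detection_area = locate_face_and_next_area(
            keypoints, multiple)                  # S4 and S5
        yield face_area
```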
In the embodiments of the present application, for each frame image, only the target key points within the previously determined face detection area need to be determined, so that the face area of the current frame image can be located according to the target key points; at the same time, the face detection area of the next frame image can be determined according to the target key points, so that target key point detection and face area location continue on the next frame image, and face tracking is realized through continuous face location. For images other than the preset start frame images, face tracking can be realized from the target key points alone, without performing complete face detection, so the time spent on detection operations is reduced and the real-time performance of face location and tracking is improved. Moreover, compared with existing methods that detect faces based on information such as color and texture, locating faces based on key point detection is less affected by the environment and adapts better to environmental changes, so the robustness of face tracking can be improved. In conclusion, the face tracking method of the embodiments of the present application can improve the real-time performance and robustness of face tracking, thereby improving face tracking efficiency.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Embodiment two:
fig. 4 is a schematic structural diagram of a face tracking apparatus provided in an embodiment of the present application, and for convenience of explanation, only parts related to the embodiment of the present application are shown:
the face tracking apparatus includes: a face key point detection unit 41 and a positioning unit 42. Wherein:
a face key point detecting unit 41, configured to perform face key point detection on a face detection area of a current frame image, and determine a target key point, where if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by performing face detection on the current frame image.
And the positioning unit 42 is configured to position the face region of the current frame image according to the target key point, and determine a face detection region of the next frame image.
Optionally, the preset start frame image specifically includes a first frame image in the video image sequence and an image determined according to a preset interval frame number.
Optionally, the face tracking device further includes an instruction obtaining unit and a preset start frame image determining unit:
the instruction obtaining unit is used for obtaining an input instruction of a user and determining the preset interval frame number according to the input instruction;
and the preset start frame image determining unit is used for selecting, starting from the first frame image of the video image sequence, one frame image every preset interval frame number as a preset start frame image.
Optionally, the face key point detection unit 41 specifically includes a preset face key point detection module, a target key point determination module, and a resetting module:
the preset face key point detection module is used for detecting the face key points of the face detection area of the current frame image according to the preset key point target type;
a target key point determining module, configured to determine, if a face key point of the key point target type is detected in the face detection area, the face key point of the key point target type as a target key point of the current frame image;
and the resetting module is used for, when no face key point of the key point target type is detected in the face detection area, performing face detection on the current frame image, resetting the key point target type and/or re-determining the face detection area of the current frame image according to the face detection result, and instructing the preset face key point detection module to perform the step of performing face key point detection on the face detection area of the current frame image according to the preset key point target type.
Optionally, the target keypoints comprise left eye corner keypoints, right eye corner keypoints, nose tip keypoints, and mouth corner keypoints.
Optionally, the face key point detecting unit 41 is specifically configured to perform face key point detection on a face detection area of the current frame image through a pre-trained lightweight neural network, and determine a target key point.
Optionally, the positioning unit 42 specifically includes a face region determining module and a face detection region determining module:
the face area determining module is used for determining a minimum circumscribed quadrilateral of the target key points, the minimum circumscribed quadrilateral being used as the face area of the current frame image;
and the face detection area determining module is used for expanding the face area of the current frame image to a preset multiple to obtain the face detection area of the next frame image.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Embodiment three:
fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: a processor 50, a memory 51 and a computer program 52, such as a face tracking program, stored in said memory 51 and executable on said processor 50. The processor 50, when executing the computer program 52, implements the steps in the above-described embodiments of the face tracking method, such as the steps S101 to S102 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the units 41 to 42 shown in fig. 4.
Illustratively, the computer program 52 may be partitioned into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 52 in the terminal device 5. For example, the computer program 52 may be divided into a face key point detection unit and a positioning unit, and each unit specifically functions as follows:
and the face key point detection unit is used for carrying out face key point detection on a face detection area of the current frame image and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by carrying out face detection on the current frame image.
And the positioning unit is used for positioning the face area of the current frame image according to the target key point and determining the face detection area of the next frame image.
The terminal device 5 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 50, a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of a terminal device 5 and does not constitute a limitation of terminal device 5 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 50 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer program and other programs and data required by the terminal device. The memory 51 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A face tracking method, comprising:
performing face key point detection on a face detection area of a current frame image, and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by performing face detection on the current frame image;
and positioning the face region of the current frame image according to the target key point, and determining the face detection region of the next frame image.
2. The face tracking method according to claim 1, wherein the preset start frame image specifically comprises a first frame image in a video image sequence and an image determined according to a preset number of interval frames.
3. The face tracking method according to claim 2, wherein before the performing face key point detection on the face detection area of the current frame image and determining a target key point, the method further comprises:
acquiring an input instruction of a user, and determining the preset interval frame number according to the input instruction;
and selecting one frame image as a preset initial frame image at intervals of the preset interval frame number from the first frame image of the video image sequence.
4. The face tracking method according to claim 1, wherein the performing face key point detection on the face detection area of the current frame image and determining a target key point comprises:
performing face key point detection on a face detection area of the current frame image according to a preset key point target type;
if the face key point of the key point target type is detected in the face detection area, determining the face key point of the key point target type as a target key point of the current frame image;
otherwise, carrying out face detection on the current frame image, resetting the key point target type and/or re-determining the face detection area of the current frame image according to the face detection result, and returning to execute the step of carrying out face key point detection on the face detection area of the current frame image according to the preset key point target type.
5. The face tracking method according to claim 1, wherein the performing face key point detection on the face detection area of the current frame image and determining a target key point comprises:
and carrying out face key point detection on the face detection area of the current frame image through a pre-trained lightweight neural network to determine a target key point.
6. The face tracking method according to claim 1, wherein the locating the face region of the current frame image according to the target key point and determining the face detection region of the next frame image comprises:
determining a minimum circumscribed quadrilateral of the target key points, and taking the minimum circumscribed quadrilateral as the face area of the current frame image;
and expanding the face area of the current frame image by a preset multiple to obtain the face detection area of the next frame image.
7. A face tracking device, comprising:
the face key point detection unit is used for carrying out face key point detection on a face detection area of a current frame image and determining a target key point, wherein if the current frame image is a preset initial frame image, the face detection area of the current frame image is an area obtained by carrying out face detection on the current frame image;
and the positioning unit is used for positioning the face area of the current frame image according to the target key point and determining the face detection area of the next frame image.
8. The face tracking device of claim 7, wherein the predetermined start frame image specifically comprises a first frame image in the video image sequence and an image determined according to a predetermined number of spaced frames.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the computer program, when executed by the processor, causes the terminal device to carry out the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes a terminal device to carry out the steps of the method according to any one of claims 1 to 6.
CN202010180300.8A 2020-03-16 2020-03-16 Face tracking method and device and terminal equipment Pending CN113409354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010180300.8A CN113409354A (en) 2020-03-16 2020-03-16 Face tracking method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010180300.8A CN113409354A (en) 2020-03-16 2020-03-16 Face tracking method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN113409354A (en) 2021-09-17

Family

ID=77676099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010180300.8A Pending CN113409354A (en) 2020-03-16 2020-03-16 Face tracking method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN113409354A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874826A (en) * 2015-12-11 2017-06-20 腾讯科技(深圳)有限公司 Face key point-tracking method and device
WO2018153294A1 (en) * 2017-02-27 2018-08-30 腾讯科技(深圳)有限公司 Face tracking method, storage medium, and terminal device
CN110852254A (en) * 2019-11-08 2020-02-28 杭州网易云音乐科技有限公司 Face key point tracking method, medium, device and computing equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU WEIWEI et al. (徐威威等): "A Robust Real-Time Tracking Method for Face Key Points" (一种鲁棒的人脸关键点实时跟踪方法), Computer Engineering (计算机工程), vol. 44, no. 4, pages 281-286 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination