CN111523424A - Face tracking method and face tracking equipment


Info

Publication number
CN111523424A
Authority
CN
China
Prior art keywords
face
tracking
list
tracked
loss
Prior art date
Legal status
Pending
Application number
CN202010296068.4A
Other languages
Chinese (zh)
Inventor
刘利朋
梁峰
Current Assignee
Shanghai Moxiang Network Technology Co ltd
Original Assignee
Shanghai Moxiang Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Moxiang Network Technology Co ltd
Priority to CN202010296068.4A
Priority to PCT/CN2020/099828 (WO2021208251A1)
Publication of CN111523424A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/172 Classification, e.g. identification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face


Abstract

The embodiments of the present application provide a face tracking method and face tracking equipment. The method comprises the following steps: acquiring an image frame of a video to be tracked, and performing face detection on the image frame; determining at least one detected face in the image frame according to the face detection result; acquiring the intersection ratio between each tracked face in a tracking list and the at least one detected face; determining the matching relationship between each tracked face and the at least one detected face according to the intersection ratio; and tracking the face according to the matching relationship. The method improves the efficiency and accuracy of face tracking and thereby improves the user experience.

Description

Face tracking method and face tracking equipment
Technical Field
The embodiment of the application relates to the technical field of image recognition, in particular to a face tracking method and face tracking equipment.
Background
With the development of image recognition technology, face tracking technology is being applied ever more widely. Face tracking refers to tracking each face in video data; tracking the faces facilitates subsequent processing of the image frames in the video data, such as expression analysis, driver fatigue detection, and intelligent beautification.
When tracking a face, the goal can be achieved by determining the matching relationship between the faces in consecutive frames: face features are extracted from the preceding and following frames, and face matching is performed based on the similarity of those features. Although this scheme can determine the face matching relationship, it has two drawbacks. On the one hand, extracting face features usually consumes considerable time, which reduces the efficiency of face matching; on the other hand, the quality of the extracted face features depends on the quality of the image frames, and when the image quality is poor the extracted features may also be poor, which reduces the accuracy of face matching and harms the user experience.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a face tracking method and a face tracking device, so as to overcome the defects of low face tracking efficiency and low accuracy in the prior art.
The embodiment of the application provides a face tracking method, which comprises the following steps:
acquiring an image frame of a video to be tracked, and carrying out face detection on the image frame;
determining at least one detected face in the image frame according to the face detection result;
acquiring the intersection ratio of each tracked face and at least one detected face in a tracking list;
determining the matching relationship between each tracking face and at least one detection face according to the intersection ratio;
and tracking the face according to the matching relation.
In the embodiments of the present application, the intersection ratio indicates the degree of overlap between a tracked face and a detected face. Since the face of the same person overlaps substantially between consecutive frames, the intersection ratio between each tracked face in the tracking list and the at least one detected face is acquired, and the matching relationship between each tracked face and the at least one detected face is determined according to the intersection ratio, so that matched faces belong to the same person. The degree of overlap between faces is only weakly affected by the image quality of the frames: even when the image quality is poor (for example, when a face in the image frame is motion-blurred, turned to the side, unevenly illuminated, or partially occluded), the accuracy of the acquired intersection ratio is not greatly affected, so the matching relationship determined from the intersection ratio is largely insensitive to poor image quality, and face tracking performed according to that matching relationship remains accurate. In addition, compared with extracting face features, computing the intersection ratio is simpler and faster, so the technical scheme provided by the embodiments of the present application improves both the efficiency and the accuracy of face tracking, and thereby the user experience.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
fig. 1 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 5 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 6 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 7 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 8 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 9 is a schematic flow chart of a face tracking method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a face tracking device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a handheld gimbal provided in an embodiment of the present application;
fig. 12 is a schematic view of an application scenario of a handheld gimbal provided in an embodiment of the present application;
fig. 13 is a schematic structural diagram of a handheld gimbal provided in an embodiment of the present application.
Detailed Description
It should be understood that the terms "first," "second," and the like as used in the description and in the claims, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Also, the use of the terms "a" or "an" and the like do not denote a limitation of quantity, but rather denote the presence of at least one.
In recent years, face tracking technology has been rapidly developed. When face tracking is performed, face features of front and rear frames can be extracted, and face matching is performed based on face feature similarity.
Although this scheme can obtain the face matching relationship, it has two drawbacks. On the one hand, extracting face features usually consumes considerable time, which reduces the efficiency of face matching; on the other hand, the quality of the extracted face features depends on the quality of the image frames, and when the image quality is poor the extracted features may also be poor, which reduces the accuracy of face matching and harms the user experience.
To address these deficiencies, the technical scheme provided in the embodiments of the present application introduces the intersection ratio, which indicates the degree of overlap between a tracked face and a detected face. Since the face of the same person overlaps substantially between consecutive frames, the intersection ratio between each tracked face in the tracking list and the at least one detected face is acquired, and the matching relationship between each tracked face and the at least one detected face is determined according to the intersection ratio, so that matched faces belong to the same person. The degree of overlap between faces is only weakly affected by the image quality of the frames: even when the image quality is poor (for example, when a face is motion-blurred, turned to the side, unevenly illuminated, or occluded), the accuracy of the acquired intersection ratio is not greatly affected. The matching relationship between tracked faces and detected faces determined from the intersection ratio is therefore largely insensitive to poor image quality, and face tracking performed according to that matching relationship remains accurate. In addition, compared with extracting face features, computing the intersection ratio is simpler and faster, so the technical scheme provided by the embodiments of the present application improves both the efficiency and the accuracy of face tracking, and thereby the user experience.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
Example one
An embodiment of the present application provides a face tracking method, as shown in fig. 1, where fig. 1 is a schematic flow chart of the face tracking method provided in the embodiment of the present application. The face tracking method comprises the following steps:
101. Acquire an image frame of the video to be tracked, and perform face detection on the image frame.
Specifically, the image frame of the video to be tracked may be the current image frame of the video to be tracked, or a designated image frame in the video to be tracked. The video to be tracked may be captured by a shooting device, such as a camera, of the electronic device that executes the face tracking method provided in the embodiments of the present application, may be a video stored in advance on the electronic device, or may be obtained by the electronic device from another device or system. The image frame of the video to be tracked can be obtained by acquiring the current image frame of the video to be tracked, or by acquiring the designated image frame in the video to be tracked.
The face detection may be performed on the image frame using a template matching method, a face detection method based on a cascade classifier, a face detection method based on a Deformable Part Model (DPM), or a face detection method based on a Convolutional Neural Network (CNN), which is not limited in this application.
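As an illustrative sketch only (not part of the patent), face detection on a single frame could be done with OpenCV's cascade classifier; the cascade file and detection parameters below are assumptions.

    import cv2

    # Hedged sketch: detect faces in one frame with an OpenCV cascade classifier.
    # The cascade file and the detection parameters are illustrative assumptions.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(frame):
        """Return detected face boxes as (x, y, w, h) tuples."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return [tuple(map(int, b)) for b in boxes]

Any detector producing (x, y, w, h) boxes could be substituted; the later sketches rely only on that box format.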
102. Determine at least one detected face in the image frame according to the face detection result.
For example, the electronic device executing the face tracking method provided by the embodiments of the present application may store face feature point information or face template information acquired in advance, obtained from previously acquired pictures or videos. The image frame may then be detected with a recognition algorithm based on face feature points, using the stored face feature point information, and at least one detected face in the image frame determined from the detection result; alternatively, the image frame may be detected with a recognition algorithm based on the face template, using the stored face template information, and at least one detected face determined from that detection result.
103. Acquire the intersection ratio between each tracked face in the tracking list and the at least one detected face.
Specifically, the intersection ratio (Intersection over Union, IoU) indicates the degree of overlap between a tracked face and a detected face. Acquiring the intersection ratio between each tracked face in the tracking list and the at least one detected face may mean acquiring the intersection ratio between each tracked face and a target detected face among the at least one detected face, or acquiring the intersection ratio between each tracked face and every detected face; how the target detected face is determined can be set by a person skilled in the art as needed.
Illustratively, the intersection ratio of a tracked face and a detected face may be obtained by obtaining the area A occupied by the tracked face in the image frame, the area B occupied by the detected face in the image frame, and the area C of the overlapping portion of the tracked face and the detected face in the image frame, and then computing

IoU = C / (A + B - C)

that is, by finding the overlapping area of each pair of tracked face and detected face, determining the ratio of that overlapping area to the total area covered by the pair, and taking this ratio as the intersection ratio of that tracked face and detected face.
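A minimal sketch of this computation, assuming face boxes are given as (x, y, w, h) tuples; the helper name is illustrative.

    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
        ax, ay, aw, ah = box_a
        bx, by, bw, bh = box_b
        # Overlapping rectangle (area C in the text).
        ix1, iy1 = max(ax, bx), max(ay, by)
        ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = aw * ah + bw * bh - inter        # A + B - C
        return inter / union if union > 0 else 0.0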
104. Determine the matching relationship between each tracked face and the at least one detected face according to the intersection ratio.
Specifically, when determining the matching relationship between each tracked face and the at least one detected face according to the intersection ratio, the tracked face having the largest intersection ratio with a detected face may be determined as the tracked face matching that detected face; alternatively, a tracked face whose intersection ratio with a detected face is greater than or equal to an intersection ratio threshold may be determined as the tracked face matching that detected face, where the intersection ratio threshold is set by a person skilled in the art as needed.
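As a sketch of the simpler of the two variants just described (best-IoU matching with a threshold), reusing the iou() helper above; the 0.3 threshold is an illustrative assumption.

    def match_by_best_iou(tracked_boxes, detected_boxes, iou_threshold=0.3):
        """Greedy variant of step 104: match each detected face to the tracked
        face with the largest intersection ratio, if it reaches the threshold."""
        matches = {}
        for d_idx, d_box in enumerate(detected_boxes):
            best_t, best_score = None, 0.0
            for t_idx, t_box in enumerate(tracked_boxes):
                score = iou(t_box, d_box)
                if score > best_score:
                    best_t, best_score = t_idx, score
            if best_t is not None and best_score >= iou_threshold:
                matches[d_idx] = best_t
        return matches

The cost-matrix variant described in steps 1041-1044 below replaces this greedy pass with a globally optimal assignment.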
105. Track the face according to the matching relationship.
In the embodiments of the present application, the intersection ratio indicates the degree of overlap between a tracked face and a detected face. Since the face of the same person overlaps substantially between consecutive frames, the intersection ratio between each tracked face in the tracking list and the at least one detected face is acquired, and the matching relationship between each tracked face and the at least one detected face is determined according to the intersection ratio, so that matched faces belong to the same person. The degree of overlap between faces is only weakly affected by the image quality of the frames: even when the image quality is poor (for example, when a face in the image frame is motion-blurred, turned to the side, unevenly illuminated, or partially occluded), the accuracy of the acquired intersection ratio is not greatly affected, so the matching relationship determined from the intersection ratio is largely insensitive to poor image quality, and the accuracy of face tracking performed according to the matching relationship remains high. In addition, compared with extracting face features, computing the intersection ratio is simpler and faster, so the technical scheme provided by the embodiments of the present application improves both the efficiency and the accuracy of face tracking, and thereby the user experience.
Optionally, in an embodiment of the present application, as shown in fig. 2, fig. 2 is a schematic flowchart of a face tracking method provided in the embodiment of the present application. Step 104 may be implemented by steps 1041 to 1044:
1041. Generate a cost matrix according to the intersection ratios.
For example, when the tracking list includes m tracking faces and it is determined that the image frame includes n detection faces according to the face detection result, a cost matrix of m rows and n columns may be generated according to the intersection ratio of the tracking faces and the detection faces, where elements of the cost matrix may be the intersection ratio of the tracking faces and the detection faces.
1042. Determine the elements in the cost matrix whose value is less than or equal to the intersection ratio threshold as target elements, and remove the rows and columns where the target elements are located from the cost matrix to generate a filtering cost matrix.
The intersection ratio threshold is set by a person skilled in the art as needed.
Illustratively, the cost matrix may be as follows, where J1-J4 are tracked faces and W1-W4 are detected faces:
(The example cost matrix is shown as a figure in the original publication and is not reproduced here.)
Assuming an intersection ratio threshold of 0.3, the rows and columns of the elements whose value is less than or equal to 0.3 are removed from the cost matrix, and the filtering cost matrix is generated as follows:

      J1     J2     J3     J4
W1    0      0.83   0      0.92
W2    0      0.37   0      0.92
W3    0      0      0      0
W4    0      0      0      0
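A sketch of steps 1041-1042 under the same assumptions as above (NumPy, boxes as (x, y, w, h), and the iou() helper); note that, as in the worked example, sub-threshold entries are suppressed to zero rather than physically deleting whole rows and columns.

    import numpy as np

    def build_filtered_cost_matrix(detected_boxes, tracked_boxes, iou_threshold=0.3):
        """Steps 1041-1042 sketch: rows are detected faces (W), columns are
        tracked faces (J); entries at or below the threshold are zeroed out."""
        cost = np.zeros((len(detected_boxes), len(tracked_boxes)))
        for i, d_box in enumerate(detected_boxes):
            for j, t_box in enumerate(tracked_boxes):
                cost[i, j] = iou(t_box, d_box)
        filtered = np.where(cost > iou_threshold, cost, 0.0)
        return cost, filtered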
1043. and performing total cost minimum analysis on the filtering cost matrix.
Illustratively, the total cost minimum analysis is performed on the filtering cost matrix, which can be performed on the filtering cost matrix by using the hungarian algorithm.
An exemplary process of performing total cost minimum analysis on the filtering cost matrix by using the hungarian algorithm is as follows:
the filtering cost matrix may be as follows, wherein for convenience of calculation, the values of the elements in the filtering cost matrix are set as the inverse of the corresponding intersection ratio, for example, the values of the elements corresponding to W1 and J1 are the inverse of the intersection ratio of the detected face W1 and the tracked face J1, that is, 82:
J1 J2 J3 J4
W1 82 83 69 92
W2 77 37 49 92
W3 11 69 5 86
W4 8 9 98 23
the minimum value of a row is first subtracted from each row of the filtering cost matrix, and the resulting first matrix is as follows:
J1 J2 J3 J4
W1
13 14 0 23
W2 40 0 12 55
W3 6 64 0 81
W4 0 1 90 15
subtracting the minimum value of each column of the first matrix to obtain a second matrix:
Figure BDA0002452237010000071
the minimum number of rows/columns (horizontal rows and vertical columns) required to cover all 0-valued elements in the second matrix is determined to be 3, less than the total number of rows of the second matrix, and all elements in the covered area (shown shaded in the table) are identified as found elements, the value of the minimum of the undiscovered elements, i.e. 6, is subtracted from all undiscovered elements, and 6 is added to all twice-covered elements. The following third matrix is generated:
      J1    J2    J3    J4
W1     7     8     0     2
W2    40     0    18    40
W3     0    58     0    60
W4     0     1    96     0
the minimum number of rows required to cover the element with the value 0 in the third matrix is 4 rows, and since the number of rows (4) required is equal to the size of the matrix (n-4), there is an optimal assignment (match) between zeros in the matrix. Therefore, a total cost minimum resolution result can be obtained: W3J1, W2J2, W1J3 and W4J 4.
1044. Determine the matching relationship between each tracked face and the at least one detected face according to the total cost minimum analysis result.
Illustratively, according to the total cost minimum analysis result shown in step 1043, it may be determined that the tracked face J1 matches the detected face W3, the tracked face J2 matches the detected face W2, the tracked face J3 matches the detected face W1, and the tracked face J4 matches the detected face W4.
By generating a cost matrix from the intersection ratios, determining the elements whose value is less than or equal to the intersection ratio threshold as target elements, removing the rows and columns containing the target elements from the cost matrix to generate the filtering cost matrix, performing total cost minimum analysis on the filtering cost matrix, and determining the matching relationship between each tracked face and the at least one detected face from the analysis result, an optimal matching between each tracked face and the at least one detected face is obtained. This reduces the probability of mismatches when matching faces between consecutive frames and improves the success rate of tracking multiple faces.
Optionally, in an embodiment of the present application, as shown in fig. 3, fig. 3 is a schematic flowchart of a face tracking method provided in the embodiment of the present application. In an embodiment of the present application, step 1044 may be implemented by steps 1045 to 1047:
1045. Acquire the assignment relationship between the cost matrix and the filtering cost matrix.
Specifically, because the filtering cost matrix is generated by removing the rows and columns containing target elements from the cost matrix, the coordinates of some elements in the filtering cost matrix may differ from their coordinates in the cost matrix. By acquiring the assignment relationship between the cost matrix and the filtering cost matrix, the coordinates that each element of the filtering cost matrix had in the cost matrix before the filtering cost matrix was generated can be determined.
1046. Determine the total-cost-minimum face matching relationship according to the total cost minimum analysis result.
Specifically, the total-cost-minimum face matching relationship may be used to indicate the matching relationship between the cost-optimal matching elements and the rows and columns of the cost matrix, where exactly one cost-optimal matching element exists in each row and each column.
It should be noted that, the execution order of step 1045 and step 1046 may be reversed, and the present application is not limited to this.
1047. Correct the total-cost-minimum face matching relationship according to the assignment relationship, and determine the matching relationship between each tracked face and the at least one detected face.
Specifically, correcting the total-cost-minimum face matching relationship according to the assignment relationship can be understood as using the assignment relationship between the cost matrix and the filtering cost matrix to obtain the coordinates, in the cost matrix before filtering, of the cost-optimal matching elements indicated by the total-cost-minimum face matching relationship, and then determining the matching relationship between each tracked face and the at least one detected face according to those coordinates.
By acquiring the assignment relationship between the cost matrix and the filtering cost matrix, the position that each element of the filtering cost matrix occupied in the original cost matrix can be determined unambiguously. The total-cost-minimum face matching relationship is then determined from the total cost minimum analysis result and corrected according to the assignment relationship, so that the cost-optimal matching elements can be mapped simply and conveniently to the corresponding tracked faces and detected faces. This reduces the difficulty of determining the matching relationship between each tracked face and the at least one detected face and reduces the consumption of processing resources.
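A sketch of the index bookkeeping behind steps 1045-1047, assuming the filtering step recorded which original rows and columns were kept; the function and variable names are illustrative.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assign_with_index_mapping(cost, kept_rows, kept_cols):
        """Solve the assignment on the filtered sub-matrix, then map the result
        back to the row/column indices of the original cost matrix (the
        'assignment relationship' correction described above)."""
        sub = cost[np.ix_(kept_rows, kept_cols)]
        rows, cols = linear_sum_assignment(sub)
        # Translate sub-matrix coordinates back into original coordinates.
        return [(kept_rows[r], kept_cols[c]) for r, c in zip(rows, cols)]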
Optionally, as shown in fig. 4, fig. 4 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
106. and updating the lost times of the tracking faces which are not successfully matched with any detected face in the tracking list.
Specifically, after the matching relationship between each tracked face and at least one detected face is determined, the tracked face that is not successfully matched with any detected face is retrieved from the tracking list according to the matching relationship, and the number of missing times of the tracked face that is not successfully matched with any detected face in the tracking list is updated since the tracked face that is not successfully matched can be regarded as being unsuccessfully detected in the image frame, wherein the number of missing times of the tracked face can be updated to be the original value plus 1.
107. Move the tracked faces whose loss times are greater than or equal to a first loss time threshold out of the tracking list and into the loss list.
The first loss time threshold is set by a person skilled in the art as needed.
It should be noted that all faces moved into the loss list can be regarded as lost faces; therefore, after a tracked face is moved into the loss list, it is regarded as a lost face.
By updating the loss times of the tracked faces that have not been successfully matched with any detected face, and moving the tracked faces whose loss times are greater than or equal to the first loss time threshold out of the tracking list and into the loss list, faces that have been lost for too many consecutive frames (that is, faces with a low tracking success rate) are prevented from remaining in the tracking list, which improves the success rate of face tracking performed according to the tracking list.
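A minimal data-structure sketch of steps 106-107; the Track class, the matched_ids bookkeeping, and the threshold value of 5 are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class Track:
        face_id: int
        box: tuple              # (x, y, w, h) of the last matched detection
        loss_count: int = 0     # consecutive frames without a successful match
        feature: object = None  # optional stored face feature vector

    def update_unmatched_tracks(tracking_list, lost_list, matched_ids,
                                first_loss_threshold=5):
        """Steps 106-107 sketch: increment the loss times of unmatched tracked
        faces and move those at or above the threshold into the loss list."""
        still_tracked = []
        for track in tracking_list:
            if track.face_id in matched_ids:
                track.loss_count = 0
                still_tracked.append(track)
                continue
            track.loss_count += 1
            if track.loss_count >= first_loss_threshold:
                lost_list.append(track)     # now regarded as a lost face
            else:
                still_tracked.append(track)
        tracking_list[:] = still_tracked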
Optionally, as shown in fig. 5, fig. 5 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
108. and extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold value.
Specifically, the Face feature extraction may be performed on the detected Face whose cross-over ratio is less than or equal to the cross-over ratio threshold, and the Face feature of the detected Face in the image frame whose cross-over ratio is less than or equal to the cross-over ratio threshold may be extracted by using a pre-trained Convolutional Neural Network (CNN) model, where the Convolutional Neural Network model may include an Openface model, a Face _ recognition model, and an insight model.
109. When the extracted face features are successfully matched with a lost face in the loss list, move the matched lost face out of the loss list and into the tracking list.
When the intersection ratio of a detected face is low because the camera equipment or the face has moved, the probability that face matching based on the intersection ratio fails is high. In that case, if the detected faces corresponding to the target elements still need to be matched, a trained Convolutional Neural Network (CNN) can be used to extract their face features, and the detected faces can be matched against the lost faces in the loss list according to the extracted features. When the extracted face features are successfully matched with a lost face in the loss list, the matched lost face is removed from the loss list and moved into the tracking list, which improves the success rate of face tracking performed according to the tracking list.
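A hedged sketch of steps 108-109 that reuses the Track structure above and the face_recognition library named in the text for feature extraction; the 0.6 distance threshold and all function names are illustrative assumptions, not the patent's own method.

    import numpy as np
    import face_recognition   # one of the feature extractors mentioned above

    def try_reacquire(frame, det_box, lost_list, tracking_list,
                      distance_threshold=0.6):
        """Steps 108-109 sketch: extract features for a low-IoU detection and,
        if they match a lost face, move that face back into the tracking list."""
        x, y, w, h = det_box
        # face_recognition expects (top, right, bottom, left) locations.
        encodings = face_recognition.face_encodings(frame, [(y, x + w, y + h, x)])
        if not encodings:
            return None
        query = encodings[0]
        for lost in list(lost_list):
            if lost.feature is None:
                continue
            if np.linalg.norm(lost.feature - query) <= distance_threshold:
                lost_list.remove(lost)
                lost.loss_count = 0
                lost.box = det_box
                tracking_list.append(lost)   # re-acquired: back to tracking
                return lost
        return None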
Optionally, as shown in fig. 6, fig. 6 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
110. and extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold value.
Specifically, the face feature extraction is performed on the detected face with the cross-to-parallel ratio less than or equal to the cross-to-parallel ratio threshold, which may be performed by using a convolutional neural network model trained in advance. 111. And when the extracted face features are unsuccessfully matched with the lost faces in the lost list, generating a new face according to the extracted face features, and moving the new face into the tracking list.
It should be noted that all the faces moved into the tracking list can be regarded as tracking faces, so that the newly added faces are regarded as tracking faces after being moved into the tracking list.
Face features are extracted from detected faces whose intersection ratio is less than or equal to the intersection ratio threshold; when the extracted face features fail to match any lost face in the loss list, a new face is generated according to the extracted face features and moved into the tracking list. In this way the tracking list can be updated with faces that could not be tracked before, which improves the success rate of face tracking performed according to the tracking list.
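Continuing the same sketch for steps 110-111: when the extracted features match no lost face, a new track is registered. The ID generator and function name are illustrative assumptions.

    import itertools

    _new_ids = itertools.count(1)   # illustrative face ID generator

    def add_new_face(det_box, feature, tracking_list):
        """Steps 110-111 sketch: no lost face matched the extracted features,
        so register the detection as a newly tracked face."""
        new_track = Track(face_id=next(_new_ids), box=det_box, feature=feature)
        tracking_list.append(new_track)
        return new_track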
Optionally, as shown in fig. 7, fig. 7 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
112. and when determining that the video to be tracked does not comprise any face according to the face detection result, updating the lost times of the tracked face in the tracking list.
Specifically, when the face detection result determines that the video to be tracked does not include any face, it may be considered that all the tracked faces in the tracking list are not successfully detected in the image frame, and therefore, the loss times of the tracked faces in the tracking list are updated, where the loss times of all the tracked faces in the tracking list may be updated to the original value plus 1.
113. Move the tracked faces whose loss times are greater than or equal to the first loss time threshold out of the tracking list and into the loss list.
It should be noted that all faces moved into the loss list can be regarded as lost faces; therefore, after a tracked face is moved into the loss list, it is regarded as a lost face.
When it is determined from the face detection result that the image frame does not include any face, updating the loss times of the tracked faces in the tracking list and moving the tracked faces whose loss times are greater than or equal to the first loss time threshold out of the tracking list and into the loss list prevents faces that have been lost for too many consecutive frames (that is, faces with a low tracking success rate) from remaining in the tracking list, thereby improving the success rate of face tracking performed according to the tracking list.
Optionally, as shown in fig. 8, fig. 8 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
114. and when determining that the video to be tracked does not comprise any face according to the face detection result, updating the loss times of the lost face in the loss list.
Specifically, when the face detection result determines that the video to be tracked does not include any face, it may be considered that all lost faces in the loss list have not been successfully retrieved in the image frame, and therefore, the number of times of loss of all lost faces in the loss list is updated, where the number of times of loss of all lost faces in the loss list may be updated to the original value plus 1.
115. Move the lost faces whose loss times are greater than or equal to a second loss time threshold out of the loss list.
The second loss time threshold is set by a person skilled in the art as needed.
Specifically, when the loss times of a face in the loss list are greater than or equal to the second loss time threshold, the probability that this face will be detected again in later image frames of the video to be tracked is low, that is, the probability of retrieving this face is low. If such faces were kept in the loss list, repeatedly attempting to retrieve them would consume processing resources without return. Therefore, by moving the lost faces whose loss times are greater than or equal to the second loss time threshold out of the loss list, unnecessary consumption of processing resources in the subsequent face tracking process is reduced and the face tracking efficiency is improved.
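A short sketch of steps 112-115 under the same assumed Track structure; the two threshold values are illustrative assumptions.

    def handle_empty_frame(tracking_list, lost_list,
                           first_loss_threshold=5, second_loss_threshold=30):
        """Steps 112-115 sketch: no face was detected in this frame, so every
        tracked and lost face accrues one more loss; tracked faces over the first
        threshold move to the loss list, lost faces over the second are dropped."""
        previously_lost = list(lost_list)
        for track in list(tracking_list):
            track.loss_count += 1
            if track.loss_count >= first_loss_threshold:
                tracking_list.remove(track)
                lost_list.append(track)
        for lost in previously_lost:
            lost.loss_count += 1
            if lost.loss_count >= second_loss_threshold:
                lost_list.remove(lost)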
Optionally, as shown in fig. 9, fig. 9 is a schematic flowchart of a face tracking method provided in an embodiment of the present application. In one embodiment of the present application, the method further comprises:
116. when the tracking list is empty, at least one detected face is moved into the tracking list.
When the tracking list is empty, at least one detected face is moved into the tracking list, so that faces which can be used for face tracking can exist in the tracking list as soon as possible, and the starting speed of face tracking is increased.
Example two
Based on the face tracking method described in the foregoing embodiments, the present application provides a face tracking device configured to execute the face tracking method described in the foregoing embodiments. As shown in fig. 10, the face tracking device 30 includes: at least one processor 302, a memory 304, and a video collector 306.
Wherein:
The video collector 306 is configured to collect the video to be tracked in the target area.
The memory 304 is configured to store program code. The memory 304 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The one or more processors included in the face tracking device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
A processor 302 for invoking program code, which when executed, performs the following:
acquiring an image frame of a video to be tracked, and carrying out face detection on the image frame;
determining at least one detected face in the image frame according to the face detection result;
acquiring the intersection ratio of each tracked face and at least one detected face in a tracking list;
determining the matching relationship between each tracking face and at least one detection face according to the intersection ratio;
and tracking the face according to the matching relation.
In an optional implementation manner, the determining, according to the intersection ratio, a matching relationship between each tracked face and at least one detected face includes:
generating a cost matrix according to the intersection ratio;
determining the elements in the cost matrix whose value is less than or equal to the intersection ratio threshold as target elements, and removing the rows and columns where the target elements are located from the cost matrix to generate a filtering cost matrix;
performing total cost minimum analysis on the filtering cost matrix;
and determining the matching relation between each tracking face and at least one detection face according to the total cost minimum analysis result.
In an optional implementation manner, the determining, according to the total cost minimum analysis result, a matching relationship between each tracked face and at least one detected face includes:
acquiring an assignment relation between a cost matrix and a filtering cost matrix;
determining a face matching relation with the minimum total cost according to the minimum total cost analysis result;
and correcting the face matching relationship with the minimum total cost according to the assignment relationship, and determining the matching relationship between each tracked face and at least one detected face.
In an alternative embodiment, the operations further comprise:
updating the loss times of the tracking face which is not successfully matched with any detected face in the tracking list;
and moving the tracked faces with the loss times larger than or equal to the first loss time threshold value in the tracking list out of the tracking list and into the loss list.
In an alternative embodiment, the operations further comprise:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted facial features are successfully matched with the lost faces in the loss list, the matched lost faces are moved out of the loss list and moved into a tracking list.
In an alternative embodiment, the operations further comprise:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted face features are unsuccessfully matched with the lost faces in the lost list, generating a new face according to the extracted face features, and moving the new face into the tracking list.
In an alternative embodiment, the operations further comprise:
when determining that the video to be tracked does not comprise any face according to the face detection result, updating the loss times of the tracked face in the tracking list;
and moving the tracked faces with the loss times larger than or equal to the first loss time threshold value in the tracking list out of the tracking list and into the loss list.
In an alternative embodiment, the operations further comprise:
when determining that the video to be tracked does not comprise any face according to the face detection result, updating the loss times of the lost face in the loss list;
and moving the lost face with the loss times larger than or equal to the second loss time threshold value in the loss list out of the loss list.
In an alternative embodiment, the operations further comprise:
when the tracking list is empty, at least one detected face is moved into the tracking list.
The face tracking device of the embodiment of the present application exists in various forms, including but not limited to:
(1) a mobile communication device: such devices are characterized by mobile communications capabilities and are primarily targeted at providing voice, data communications. Such terminals include: smart phones (e.g., iphones), multimedia phones, functional phones, and low-end phones, among others.
(2) Ultra mobile personal computer device: the equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include: PDA, MID, and UMPC devices, etc., such as ipads.
(3) A portable entertainment device: such devices can display and play multimedia content. This type of device comprises: audio, video players (e.g., ipods), handheld game consoles, electronic books, and smart toys and portable car navigation devices.
(4) And other electronic equipment with data interaction function.
In one embodiment, the face tracking device includes a handheld gimbal camera.
The basic structure of the handheld gimbal camera is briefly described below. Fig. 11 is a schematic structural diagram of a handheld gimbal provided in an embodiment of the present application, fig. 12 is a schematic view of an application scenario of the handheld gimbal provided in the embodiment of the present application, and fig. 13 is a schematic structural diagram of the handheld gimbal provided in the embodiment of the present application. As shown in figs. 11 to 13, the handheld gimbal 1 according to the embodiment of the present invention includes: a handle 11 and a shooting device 12 mounted on the handle 11. In this embodiment, the shooting device 12 may include a three-axis gimbal camera; in other embodiments, the shooting device may include a gimbal camera with two axes or with more than three axes.
The handle 11 is provided with a display 13 for displaying the content captured by the shooting device 12. The present invention does not limit the type of the display 13.
By arranging the display 13 on the handle 11 of the handheld gimbal 1, the display can show the content captured by the shooting device 12, so that the user can quickly browse the pictures or videos taken by the shooting device 12 through the display 13. This improves the interactivity and enjoyment between the handheld gimbal 1 and the user and satisfies diverse user needs.
In one embodiment, the handle 11 is further provided with an operation function portion for controlling the shooting device 12. By operating the operation function portion, the operation of the shooting device 12 can be controlled, for example turning the shooting device 12 on and off, controlling shooting by the shooting device 12, and controlling posture changes of the gimbal portion of the shooting device 12, so that the user can operate the shooting device 12 quickly. The operation function portion may take the form of a key, a knob, or a touch screen.
In one embodiment, the operation function portion includes a shooting button 14 for controlling the shooting of the shooting device 12, a power/function button 15 for controlling the on/off state and other functions of the shooting device 12, and a universal key 16 for controlling the movement of the gimbal. Of course, the operation function portion may further include other control keys, such as an image storage key and an image playback control key, which may be set according to actual requirements.
In one embodiment, the operation function portion and the display 13 are arranged on the same surface of the handle 11, both on the front surface of the handle 11, so as to conform to ergonomics and make the overall appearance layout of the handheld gimbal 1 more reasonable and attractive.
Further, the side of the handle 11 is provided with a function operating key A, which allows the user to quickly and intelligently produce a finished clip with a single key press. When the camera is started, clicking the orange side key on the right side of the body activates this function: a video segment is automatically shot at intervals, for N segments in total (N is greater than or equal to 2). After a mobile device such as a mobile phone is connected and the one-key clip function is selected, the system intelligently screens the shot segments and matches them to a suitable template, quickly generating a polished work.
In an alternative embodiment, the handle 11 is also provided with a card slot 17 into which a memory element can be inserted. In this embodiment, the card slot 17 is provided on the side surface of the handle 11 adjacent to the display 13, and the images captured by the shooting device 12 can be stored on a memory card inserted into the card slot 17. In addition, arranging the card slot 17 on the side does not interfere with other functions, and the user experience is better.
In one embodiment, a power supply battery for supplying power to the handle 11 and the shooting device 12 may be arranged inside the handle 11. The power supply battery may be a lithium battery, which has a large capacity and a small volume, so as to achieve a miniaturized design of the handheld gimbal 1.
In one embodiment, the handle 11 is further provided with a charging/USB interface 18. In this embodiment, the charging/USB interface 18 is arranged at the bottom of the handle 11 to facilitate connection with an external power source or a storage device, so as to charge the power supply battery or perform data transmission.
In one embodiment, the handle 11 is further provided with a sound pickup hole 19 for receiving audio signals, with a microphone communicating with the interior of the sound pickup hole 19, and an indicator light 20 for displaying status. There may be one or more sound pickup holes 19. The user can interact with the display 13 by voice through the sound pickup hole 19. The indicator light 20 serves as a reminder: through it the user can learn the battery level and the currently executed function of the handheld gimbal 1. In addition, the sound pickup hole 19 and the indicator light 20 can be arranged on the front surface of the handle 11, which better suits user habits and operating convenience.
In one embodiment, the shooting device 12 includes a gimbal support and a camera mounted on the gimbal support. The camera may be a camera module, or an image pickup element composed of a lens and an image sensor (such as a CMOS or CCD sensor), selected as needed. The camera may be integrated on the gimbal support, so that the shooting device 12 is a gimbal camera; alternatively, the camera may be an external shooting device that can be detachably connected to or clamped onto the gimbal support.
In one embodiment, the gimbal support is a three-axis gimbal support, and the shooting device 12 is a three-axis gimbal camera. The three-axis gimbal support comprises a yaw shaft assembly 22, a roll shaft assembly 23 movably connected with the yaw shaft assembly 22, and a pitch shaft assembly 24 movably connected with the roll shaft assembly 23, and the camera is carried on the pitch shaft assembly 24. The yaw shaft assembly 22 drives the shooting device 12 to rotate in the yaw direction. Of course, in other examples, the gimbal may also be a two-axis gimbal, a four-axis gimbal, or the like, which may be selected as needed.
In one embodiment, a mounting portion is provided at the end of the connecting arm connected to the yaw shaft assembly, and a yaw shaft assembly may also be provided in the handle to drive the shooting device 12 to rotate in the yaw direction.
In an alternative embodiment, the handle 11 is provided with an adaptor 26 for coupling with a mobile device 2 (such as a mobile phone), and the adaptor 26 is detachably connected with the handle 11. The adaptor 26 protrudes from the side of the handle to connect with the mobile device 2; when the adaptor 26 is connected with the mobile device 2, the handheld gimbal 1 butts against the adaptor 26 and serves as a support at the end of the mobile device 2.
Arranging the adaptor 26 for connection with the mobile device 2 on the handle 11 allows the handle 11 and the mobile device 2 to be connected to each other, so that the handle 11 can serve as a base for the mobile device 2, and the user can hold the other end of the mobile device 2 to pick up and operate the handheld gimbal 1 together with it; the connection is convenient and fast, and the product is aesthetically pleasing. In addition, after the handle 11 is coupled with the mobile device 2 through the adaptor 26, a communication connection between the handheld gimbal 1 and the mobile device 2 can be established, and data can be transmitted between the shooting device 12 and the mobile device 2.
In one embodiment, the adaptor 26 is removably attached to the handle 11, i.e., mechanical connection or disconnection between the adaptor 26 and the handle 11 is possible. Further, the adaptor 26 is provided with an electrical contact, and the handle 11 is provided with an electrical contact mating portion that mates with the electrical contact.
In this way, the adaptor 26 can be removed from the handle 11 when the handheld gimbal 1 does not need to be connected to the mobile device 2. When the handheld gimbal 1 needs to be connected with the mobile device 2, the adaptor 26 is mounted on the handle 11 to complete the mechanical connection between the adaptor 26 and the handle 11, while the connection between the electrical contact and the electrical contact mating portion establishes the electrical connection, so that data transmission between the shooting device 12 and the mobile device 2 can be achieved through the adaptor 26.
In one embodiment, a receiving groove 27 is formed on a side portion of the handle 11, and the adaptor 26 is slidably engaged in the receiving groove 27. When the adaptor 26 is received in the receiving slot 27, a portion of the adaptor 26 protrudes from the receiving slot 27, and a portion of the adaptor 26 protruding from the receiving slot 27 is used for connecting with the mobile device 2.
In one embodiment, referring to figs. 11 to 13, when the adaptor 26 is inserted into the receiving groove 27 adaptor-end first, the adaptor sits flush with the receiving groove 27, and the adaptor 26 is received in the receiving groove 27 of the handle 11.
Therefore, when the handheld gimbal 1 needs to be connected with the mobile device 2, the adaptor 26 can be inserted into the receiving groove 27 from its adapting portion so that the adaptor 26 protrudes from the receiving groove 27, allowing the mobile device 2 and the handle 11 to be connected to each other.
After the mobile device 2 has been used, or when it needs to be detached, the adaptor 26 may be taken out of the receiving groove 27 of the handle 11 and then put back into the receiving groove 27 in the reverse direction, so that the adaptor 26 is received in the handle 11. Because the adaptor 26 sits flush with the receiving groove 27 of the handle 11, the surface of the handle 11 is smooth when the adaptor 26 is stowed, and the handheld gimbal is more convenient to carry with the adaptor 26 received in the handle 11.
In one embodiment, the receiving groove 27 is semi-open and is formed on one side surface of the handle 11, so that the adaptor 26 can be more easily slidably engaged with the receiving groove 27. Of course, in other examples, the adaptor 26 may be detachably connected to the receiving slot 27 of the handle 11 by a snap connection, a plug connection, or the like.
In one embodiment, the receiving groove 27 is formed on the side of the handle 11, and a cover 28 is snapped over the receiving groove 27 when this function is not in use, which is convenient for the user to operate and does not affect the overall appearance of the front and side of the handle.
In one embodiment, the electrical contact and the electrical contact mating portion may be electrically connected by contact. For example, the electrical contact may be selected as a pogo pin, an electrical plug interface, or an electrical contact. Of course, in other examples, the electrical contact portion and the electrical contact mating portion may be directly connected by surface-to-surface contact.
A1, a face tracking method, comprising:
acquiring an image frame of a video to be tracked, and carrying out face detection on the image frame;
determining at least one detected face in the image frame according to a face detection result;
acquiring the intersection ratio between each tracked face in the tracking list and the at least one detected face;
determining the matching relationship between each tracking face and the at least one detection face according to the intersection ratio;
and tracking the human face according to the matching relation.
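By way of illustration only, the following Python sketch shows one way the intersection ratio (intersection over union, IoU) between tracked and detected face boxes could be computed and arranged into a matrix; the box format and all function names are assumptions for illustration and are not taken from the application.

```python
# Illustrative sketch only: faces are assumed to be axis-aligned boxes
# (x1, y1, x2, y2); function and variable names are hypothetical.
import numpy as np

def intersection_ratio(box_a, box_b):
    # Intersection over union ("intersection ratio") of two boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def intersection_ratio_matrix(tracked_boxes, detected_boxes):
    # One row per tracked face in the tracking list,
    # one column per detected face in the current image frame.
    return np.array([[intersection_ratio(t, d) for d in detected_boxes]
                     for t in tracked_boxes])
```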
A2, the face tracking method according to claim A1, wherein the determining the matching relationship between the tracked faces and the at least one detected face according to the intersection ratio comprises:
generating a cost matrix according to the intersection ratio;
determining elements in the cost matrix whose values are less than or equal to an intersection ratio threshold as target elements, and removing rows and columns where the target elements are located from the cost matrix to generate a filtering cost matrix;
performing total cost minimum analysis on the filtering cost matrix;
and determining the matching relation between each tracking face and the at least one detection face according to the total cost minimum analysis result.
A3, the face tracking method according to claim A2, wherein the determining the matching relationship between the tracked faces and the at least one detected face according to the total cost minimum analysis result comprises:
obtaining an assignment relation between the cost matrix and the filtering cost matrix;
determining a face matching relationship with the minimum total cost according to the minimum total cost analysis result;
and correcting the face matching relationship with the minimum total cost according to the assignment relationship, and determining the matching relationship between each tracked face and the at least one detected face.
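The sketch below gives one plausible reading of the cost-matrix filtering, the "total cost minimum analysis", and the assignment relation back to the original matrix described in A2 and A3. Treating the analysis as a Hungarian (linear sum) assignment and the filtering rule as discarding rows and columns with no intersection ratio above the threshold are interpretations, not the claimed method, and all names and the threshold value are assumptions.

```python
# Hedged sketch of one interpretation of A2/A3, not the claimed method itself.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracked_to_detected(iou_mat, iou_threshold=0.3):
    """iou_mat[i, j]: intersection ratio of tracked face i and detected face j."""
    cost = 1.0 - iou_mat                      # higher intersection ratio -> lower cost
    # "Filtering cost matrix": drop tracked/detected faces whose every
    # intersection ratio is at or below the threshold (one interpretation).
    keep_rows = np.where((iou_mat > iou_threshold).any(axis=1))[0]
    keep_cols = np.where((iou_mat > iou_threshold).any(axis=0))[0]
    filtered = cost[np.ix_(keep_rows, keep_cols)]
    if filtered.size == 0:
        return []
    # "Total cost minimum analysis": Hungarian assignment on the filtered matrix.
    rows, cols = linear_sum_assignment(filtered)
    # "Assignment relation": map filtered indices back to the original matrix,
    # then keep only pairs whose intersection ratio exceeds the threshold.
    return [(int(keep_rows[r]), int(keep_cols[c]))
            for r, c in zip(rows, cols)
            if iou_mat[keep_rows[r], keep_cols[c]] > iou_threshold]
```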
A4, the face tracking method according to any of claims A1-A3, the method further comprising:
updating the loss times of the tracking face in the tracking list that is not successfully matched with any detected face;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
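A minimal sketch of the loss-count bookkeeping described in A4, assuming dict-based track records; the field names and the threshold value are assumptions, not taken from the application.

```python
# Illustrative only: tracked faces are dicts with "id" and "loss_count" fields.
FIRST_LOSS_THRESHOLD = 5  # assumed value, not specified in the application

def update_unmatched_tracked_faces(tracking_list, loss_list, matched_track_ids):
    still_tracked = []
    for face in tracking_list:
        if face["id"] not in matched_track_ids:
            face["loss_count"] += 1               # not matched to any detected face
        if face["loss_count"] >= FIRST_LOSS_THRESHOLD:
            loss_list.append(face)                # move out of the tracking list
        else:
            still_tracked.append(face)
    tracking_list[:] = still_tracked
```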
A5, the face tracking method according to any of claims A1-A3, the method further comprising:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted facial features are successfully matched with the lost faces in the loss list, removing the matched lost faces from the loss list and moving the matched lost faces into the tracking list.
A6, the face tracking method according to any of claims A1-A3, the method further comprising:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted face features are unsuccessfully matched with the lost faces in the lost list, generating a new face according to the extracted face features, and moving the new face into the tracking list.
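The sketch below illustrates one way the feature-based handling in A5 and A6 could be organized. Here extract_face_features and feature_distance are stand-ins for whatever face feature extractor and similarity measure is used, and the distance threshold is an assumed value.

```python
# Hedged sketch of A5/A6: all names and the threshold are assumptions.
FEATURE_DIST_THRESHOLD = 0.6  # assumed re-identification threshold

def handle_low_iou_detection(det_box, frame, loss_list, tracking_list,
                             extract_face_features, feature_distance, next_id):
    features = extract_face_features(frame, det_box)
    # Try to re-identify the detection against faces in the loss list.
    best, best_dist = None, float("inf")
    for lost in loss_list:
        dist = feature_distance(features, lost["features"])
        if dist < best_dist:
            best, best_dist = lost, dist
    if best is not None and best_dist <= FEATURE_DIST_THRESHOLD:
        loss_list.remove(best)                    # matched: move back into tracking
        best["box"], best["loss_count"] = det_box, 0
        tracking_list.append(best)
        return next_id
    # No match: generate a new face from the extracted features.
    tracking_list.append({"id": next_id, "box": det_box,
                          "features": features, "loss_count": 0})
    return next_id + 1
```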
A7, the face tracking method according to any of claims A1-A3, the method further comprising:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the tracking faces in the tracking list;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
A8, the face tracking method according to any of claims A1-A3, the method further comprising:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the lost faces in the loss list;
and moving the lost face with the loss times larger than or equal to a second loss time threshold value in the loss list out of the loss list.
A9, the face tracking method according to any one of claims A1-A3, the method further comprising:
when the tracking list is empty, moving the at least one detected face into the tracking list.
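A short sketch of the frame-level bookkeeping in A7-A9: when a frame contains no face, loss counts advance for both lists, and an empty tracking list is seeded directly from the detections. The threshold values and field names are assumptions.

```python
# Illustrative only; thresholds and record layout are assumed.
FIRST_LOSS_THRESHOLD = 5
SECOND_LOSS_THRESHOLD = 30

def on_frame_without_faces(tracking_list, loss_list):
    # A7: advance loss counts of tracked faces and move those over the limit.
    for face in list(tracking_list):
        face["loss_count"] += 1
        if face["loss_count"] >= FIRST_LOSS_THRESHOLD:
            tracking_list.remove(face)
            loss_list.append(face)
    # A8: advance loss counts of lost faces and drop those over the limit.
    surviving = []
    for face in loss_list:
        face["loss_count"] += 1
        if face["loss_count"] < SECOND_LOSS_THRESHOLD:
            surviving.append(face)
    loss_list[:] = surviving

def seed_tracking_list(tracking_list, detected_faces):
    # A9: when the tracking list is empty, take every detection as a new track.
    if not tracking_list:
        tracking_list.extend(detected_faces)
```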
A10, a face tracking device, comprising: a memory, a processor and a video collector, wherein the video collector is used for collecting a video to be tracked in a target area; the memory is used for storing program codes; and the processor is configured to call the program codes and, when the program codes are executed, to perform the following operations: acquiring an image frame of the video to be tracked, and carrying out face detection on the image frame; determining at least one detected face in the image frame according to a face detection result; acquiring the intersection ratio between each tracked face in the tracking list and the at least one detected face; determining the matching relationship between each tracking face and the at least one detection face according to the intersection ratio; and tracking the human face according to the matching relation.
A11, the face tracking device according to claim A10, wherein the determining the matching relationship between the tracked faces and the at least one detected face according to the intersection ratio comprises:
generating a cost matrix according to the intersection ratio;
determining elements in the cost matrix whose values are less than or equal to an intersection ratio threshold as target elements, and removing rows and columns where the target elements are located from the cost matrix to generate a filtering cost matrix;
performing total cost minimum analysis on the filtering cost matrix;
and determining the matching relation between each tracking face and the at least one detection face according to the total cost minimum analysis result.
A12, the face tracking device according to claim A11, wherein the determining the matching relationship between the tracked faces and the at least one detected face according to the total cost minimum analysis result includes:
obtaining an assignment relation between the cost matrix and the filtering cost matrix;
determining a face matching relationship with the minimum total cost according to the minimum total cost analysis result;
and correcting the face matching relationship with the minimum total cost according to the assignment relationship, and determining the matching relationship between each tracked face and the at least one detected face.
A13, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
updating the loss times of the tracking face in the tracking list that is not successfully matched with any detected face;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
A14, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted facial features are successfully matched with the lost faces in the loss list, removing the matched lost faces from the loss list and moving the matched lost faces into the tracking list.
A15, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted face features are unsuccessfully matched with the lost faces in the lost list, generating a new face according to the extracted face features, and moving the new face into the tracking list.
A16, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the tracking faces in the tracking list;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
A17, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the lost faces in the loss list;
and moving the lost face with the loss times larger than or equal to a second loss time threshold value in the loss list out of the loss list.
A18, the face tracking device of any one of claims A10-A12, wherein the operations further comprise:
when the tracking list is empty, moving the at least one detected face into the tracking list.
A19, the face tracking device of claim A10, further comprising: a carrier, wherein the carrier is fixedly connected with the video collector and is used for carrying the video collector.
A20, the face tracking device of claim A19, wherein the carrier includes, but is not limited to, a handheld pan-tilt head.
Thus, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A face tracking method, comprising:
acquiring an image frame of a video to be tracked, and carrying out face detection on the image frame;
determining at least one detected face in the image frame according to a face detection result;
acquiring the intersection ratio between each tracked face in the tracking list and the at least one detected face;
determining the matching relationship between each tracking face and the at least one detection face according to the intersection ratio;
and tracking the human face according to the matching relation.
2. The method according to claim 1, wherein said determining the matching relationship between each tracked face and the at least one detected face according to the intersection ratio comprises:
generating a cost matrix according to the intersection ratio;
determining elements in the cost matrix whose values are less than or equal to an intersection ratio threshold as target elements, and removing rows and columns where the target elements are located from the cost matrix to generate a filtering cost matrix;
performing total cost minimum analysis on the filtering cost matrix;
and determining the matching relation between each tracking face and the at least one detection face according to the total cost minimum analysis result.
3. The method of claim 2, wherein the determining the matching relationship between each tracked face and the at least one detected face according to the total cost minimum analysis result comprises:
obtaining an assignment relation between the cost matrix and the filtering cost matrix;
determining a face matching relationship with the minimum total cost according to the minimum total cost analysis result;
and correcting the face matching relationship with the minimum total cost according to the assignment relationship, and determining the matching relationship between each tracked face and the at least one detected face.
4. A face tracking method according to any of claims 1-3, characterized in that the method further comprises:
updating the loss times of the tracking face in the tracking list that is not successfully matched with any detected face;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
5. A face tracking method according to any of claims 1-3, characterized in that the method further comprises:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted facial features are successfully matched with the lost faces in the loss list, removing the matched lost faces from the loss list and moving the matched lost faces into the tracking list.
6. A face tracking method according to any of claims 1-3, characterized in that the method further comprises:
extracting the face features of the detected face with the intersection ratio smaller than or equal to the intersection ratio threshold;
and when the extracted face features are unsuccessfully matched with the lost faces in the lost list, generating a new face according to the extracted face features, and moving the new face into the tracking list.
7. A face tracking method according to any of claims 1-3, characterized in that the method further comprises:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the tracking faces in the tracking list;
and moving the tracking face with the loss times larger than or equal to a first loss time threshold value in the tracking list out of the tracking list and into a loss list.
8. A face tracking method according to any of claims 1-3, characterized in that the method further comprises:
when it is determined according to the face detection result that the video to be tracked does not include any face, updating the loss times of the lost faces in the loss list;
and moving the lost face with the loss times larger than or equal to a second loss time threshold value in the loss list out of the loss list.
9. The face tracking method according to any one of claims 1-3, wherein the method further comprises:
when the tracking list is empty, moving the at least one detected face into the tracking list.
10. A face tracking device, comprising: a memory, a processor and a video collector, wherein the video collector is used for collecting a video to be tracked in a target area; the memory is used for storing program codes; and the processor is configured to call the program codes and, when the program codes are executed, to perform the following operations: acquiring an image frame of the video to be tracked, and carrying out face detection on the image frame; determining at least one detected face in the image frame according to a face detection result; acquiring the intersection ratio between each tracked face in the tracking list and the at least one detected face; determining the matching relationship between each tracking face and the at least one detection face according to the intersection ratio; and tracking the human face according to the matching relation.
CN202010296068.4A 2020-04-15 2020-04-15 Face tracking method and face tracking equipment Pending CN111523424A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010296068.4A CN111523424A (en) 2020-04-15 2020-04-15 Face tracking method and face tracking equipment
PCT/CN2020/099828 WO2021208251A1 (en) 2020-04-15 2020-07-02 Face tracking method and face tracking device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010296068.4A CN111523424A (en) 2020-04-15 2020-04-15 Face tracking method and face tracking equipment

Publications (1)

Publication Number Publication Date
CN111523424A true CN111523424A (en) 2020-08-11

Family

ID=71904230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010296068.4A Pending CN111523424A (en) 2020-04-15 2020-04-15 Face tracking method and face tracking equipment

Country Status (2)

Country Link
CN (1) CN111523424A (en)
WO (1) WO2021208251A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230394697A1 (en) * 2022-06-07 2023-12-07 Hong Kong Applied Science and Technology Research Institute Company Limited Method, device, and system for detecting and tracking objects in captured video using convolutional neural network


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN109034247B (en) * 2018-07-27 2021-04-23 北京以萨技术股份有限公司 Tracking algorithm-based higher-purity face recognition sample extraction method
US11055854B2 (en) * 2018-08-23 2021-07-06 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
CN109360226B (en) * 2018-10-17 2021-09-24 武汉大学 Multi-target tracking method based on time series multi-feature fusion
CN110717403B (en) * 2019-09-16 2023-10-24 国网江西省电力有限公司电力科学研究院 Face multi-target tracking method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150552A (en) * 2013-02-06 2013-06-12 湖北微驾技术有限公司 Driving training management method based on people counting
CN108133028A (en) * 2017-12-28 2018-06-08 北京天睿空间科技股份有限公司 It is listed method based on the aircraft that video analysis is combined with location information
CN109063593A (en) * 2018-07-13 2018-12-21 北京智芯原动科技有限公司 A kind of face tracking method and device
CN109325463A (en) * 2018-10-16 2019-02-12 浙江中正智能科技有限公司 A kind of real time face tracking method
CN109344789A (en) * 2018-10-16 2019-02-15 北京旷视科技有限公司 Face tracking method and device
CN109635693A (en) * 2018-12-03 2019-04-16 武汉烽火众智数字技术有限责任公司 A kind of face image detection method and device
CN110796687A (en) * 2019-10-30 2020-02-14 电子科技大学 Sky background infrared imaging multi-target tracking method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581506A (en) * 2020-12-31 2021-03-30 北京澎思科技有限公司 Face tracking method, system and computer readable storage medium
CN114445736A (en) * 2021-12-29 2022-05-06 广州市玄武无线科技股份有限公司 Video background restoration method and system for shielding of moving foreground target

Also Published As

Publication number Publication date
WO2021208251A1 (en) 2021-10-21

Similar Documents

Publication Publication Date Title
CN111523424A (en) Face tracking method and face tracking equipment
CN108596976B (en) Method, device and equipment for relocating camera attitude tracking process and storage medium
CN111556278B (en) Video processing method, video display device and storage medium
CN110147805B (en) Image processing method, device, terminal and storage medium
CN112052713B (en) Video processing method and device and handheld camera
CN110445978A (en) A kind of image pickup method and equipment
CN109348135A (en) Photographic method, device, storage medium and terminal device
CN110572716B (en) Multimedia data playing method, device and storage medium
CN110113639A (en) Video playing control method, device, terminal, server and storage medium
CN109979413B (en) Screen-lighting control method, screen-lighting control device, electronic equipment and readable storage medium
CN108921941A (en) Image processing method, device, storage medium and electronic equipment
CN109086680A (en) Image processing method, device, storage medium and electronic equipment
CN110650294A (en) Video shooting method, mobile terminal and readable storage medium
CN112957732B (en) Searching method, device, terminal and storage medium
CN109302630A (en) Barrage generation method and relevant apparatus
CN108307110A (en) A kind of image weakening method and mobile terminal
CN112052357A (en) Video clip marking method and device and handheld camera
CN111134686A (en) Human body disease determination method and device, storage medium and terminal
CN110298277A (en) Fingerprint identification method and Related product
CN111539283B (en) Face tracking method and face tracking equipment
CN113546425A (en) Method, device, terminal and storage medium for processing virtual article in game
CN112764523B (en) Man-machine interaction method and device based on iris recognition and electronic equipment
CN209784962U (en) intelligent pen and teaching equipment
CN111563913B (en) Searching method and device based on tracking target and handheld camera thereof
CN111479061B (en) Tracking state determination method and device and handheld camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200811