CN110009662B - Face tracking method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN110009662B (application CN201910262510.9A)
- Authority
- CN
- China
- Prior art keywords
- face
- information
- tracking
- detection frame
- similarity matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application provide a face tracking method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of image processing. The method comprises: processing at least one frame image in a video stream to obtain detection frame information of at least one face; determining attribute information corresponding to the at least one face based on the detection frame information; and tracking the at least one face based on the detection frame information and the corresponding attribute information. The embodiments of the present application reduce the probability that the tracking trajectories of multiple faces are swapped during tracking, improve the accuracy of face tracking, and thereby improve the user experience.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for face tracking, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, target object tracking has developed as well; it follows the trajectory of a target object across the frame images of a video stream.
On many smart cameras and face capture machines, a target object is tracked by following its detection frame in each frame image; face tracking, for example, is achieved by tracking the face detection frame on each frame image. Existing face tracking techniques, however, are only suitable for relatively simple scenes, such as those in which each frame image contains only one target face to be tracked. In more complex scenes, for example the scene shown in fig. 1a, where the real trajectories of two faces cross in the video, the tracking trajectories produced by existing techniques may be erroneously swapped, as shown in fig. 1b. The accuracy of face tracking is therefore low, and the user experience is poor.
How to track faces more accurately has therefore become a key problem.
Disclosure of Invention
The present application provides a face tracking method and apparatus, an electronic device, and a computer-readable storage medium, intended to solve the technical problems of low target tracking accuracy and poor user experience.
In a first aspect, a method for face tracking is provided, where the method includes:
processing at least one frame image in a video stream to obtain detection frame information of at least one face;
determining attribute information corresponding to at least one face based on the detection frame information;
and tracking the at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face.
In a possible implementation manner, tracking at least one face based on detection frame information of the at least one face and attribute information corresponding to the at least one face includes:
matching with the existing tracking track information according to the detection frame information of at least one face and the attribute information of at least one face;
and updating the existing tracking track information based on the matching result so as to realize the tracking processing of at least one face.
In a possible implementation manner, matching with the existing tracking track information according to the detection frame information of the at least one face and the attribute information of the at least one face, and updating the existing tracking track information based on the matching result, includes:
calculating a similarity matrix according to the existing tracking track information and the detection frame information and attribute information of at least one face;
and updating the existing tracking track information according to the similarity matrix.
In a possible implementation manner, updating the existing tracking track information according to the similarity matrix includes:
determining elements which are not larger than a preset threshold value in the similarity matrix;
determining a set of matching edges based on elements which are not greater than a preset threshold value in the similarity matrix through a bipartite graph optimal matching algorithm, wherein any matching edge in the set of matching edges represents any group of matched tracking track information and detection frame information and attribute information of the human face;
and updating the existing tracking track information according to the matching edge set.
In one possible implementation, updating the existing tracking trajectory information includes at least one of:
if the face information corresponding to any frame image in the video stream does not contain the existing tracking track information, deleting the tracking track information which is not contained in the face information corresponding to any frame image in the existing tracking track information;
if the existing tracking track information does not contain the face information corresponding to any frame image in the video stream, adding the face information corresponding to any frame image in the existing tracking track information;
the face information includes: the detection frame information of the human face and the attribute information corresponding to the human face.
In a possible implementation manner, the attribute information corresponding to any face includes at least one of the following:
age information; gender information.
In a possible implementation manner, calculating a similarity matrix according to existing tracking trajectory information and detection frame information and attribute information of at least one human face includes:
calculating any element in the similarity matrix according to a specific formula;
determining a similarity matrix according to each element in the calculated similarity matrix;
the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; and T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame.
In a possible implementation manner, determining attribute information corresponding to at least one face based on the detection frame information includes:
outputting an attribute feature vector corresponding to the at least one face through the trained network model based on the detection frame information.
In a second aspect, an apparatus for face tracking is provided, the apparatus comprising:
the processing module is used for processing at least one frame image in the video stream to obtain detection frame information of at least one human face;
the determining module is used for determining attribute information corresponding to at least one face based on the detection frame information;
and the tracking module is used for tracking at least one face based on the detection frame information of at least one face and the attribute information corresponding to at least one face.
In one possible implementation, the tracking module includes: a matching unit and an updating unit, wherein,
the matching unit is used for matching the existing tracking track information according to the detection frame information of at least one face and the attribute information of at least one face;
and the updating unit is used for updating the existing tracking track information based on the matching result of the matching unit so as to realize the tracking processing of at least one face.
In a possible implementation manner, the matching unit is specifically configured to calculate a similarity matrix according to existing tracking trajectory information and detection frame information and attribute information of at least one human face;
and the updating unit is specifically used for updating the existing tracking track information according to the similarity matrix.
In a possible implementation manner, the updating unit is specifically configured to determine an element in the similarity matrix that is not greater than a preset threshold;
the updating unit is specifically used for determining a set of matching edges based on elements, not larger than a preset threshold value, in the similarity matrix through a bipartite graph optimal matching algorithm, wherein any matching edge in the set of matching edges represents any group of matched tracking track information and detection frame information and attribute information of a human face;
and the updating unit is specifically further used for updating the existing tracking track information according to the matching edge set.
In a possible implementation manner, the updating unit is specifically configured to delete, when face information corresponding to any frame image in the video stream does not include the existing tracking track information, the tracking track information that is not included in the face information corresponding to that frame image; and/or,
the updating unit is specifically used for adding the face information corresponding to any frame image in the existing tracking track information when the existing tracking track information does not contain the face information corresponding to any frame image in the video stream;
the face information includes: the detection frame information of the human face and the attribute information corresponding to the human face.
In a possible implementation manner, the attribute information corresponding to any face includes at least one of the following:
age information; gender information.
In a possible implementation manner, the matching unit is specifically configured to calculate any element in the similarity matrix according to a specific formula;
the matching unit is specifically used for determining a similarity matrix according to each element in the calculated similarity matrix;
the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; and T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame.
In a possible implementation manner, the determining module is specifically configured to output an attribute feature vector corresponding to the at least one face through the trained network model based on the detection frame information.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the operations corresponding to the face tracking method shown in the first aspect of the present application or any possible implementation manner of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of face tracking as shown in the first aspect or any one of the possible implementations of the first aspect.
The beneficial effects brought by the technical solutions provided in the present application are as follows:
compared with the prior art, the method, the device, the electronic equipment and the computer-readable storage medium for tracking the human face have the advantages that at least one frame of image in a video stream is processed to obtain detection frame information of at least one human face, attribute information corresponding to the at least one human face is determined based on the detection frame information, and the at least one human face is tracked based on the detection frame information of the at least one human face and the attribute information corresponding to the at least one human face. When the method and the device are used for tracking at least one face, not only the attribute information of the face in each detection frame needs to be detected according to the detection frame information in the frame image, but also the probability of the tracking track alternation of a plurality of faces when the plurality of faces are tracked can be reduced, the accuracy of tracking the faces is improved, and the user experience can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1a is a schematic diagram of real trajectory interlacing of two target objects in a video;
FIG. 1b is a schematic diagram of two target objects with tracking tracks that alternate erroneously;
fig. 1c is a schematic flowchart of a method for tracking a human face according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a face tracking apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device for face tracking according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an example of a representation of a detection box in an embodiment of the present application;
fig. 5 is a schematic flow chart of face tracking in a certain application scenario.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the present application provides a method for tracking a human face, as shown in fig. 1c, the method includes:
Step S101, processing at least one frame image in the video stream to obtain detection frame information of at least one face.
For the embodiment of the present application, the detection frame may be identification information indicating the area of the target object in the image, and may take an arbitrary shape, for example a rectangle or a square. As shown in fig. 4, for example, the square frame in the image is a detection frame indicating the face region of a gorilla in the image.
For the embodiment of the present application, step S101 may specifically include: passing at least one frame image in the video stream through a preset model (a detection network) to obtain the detection frame information of at least one face.
For the embodiment of the present application, the detection network may adopt any one of the following network structures:
a Single Shot MultiBox Detector (SSD) network; SSD and a Residual Network (ResNet-18); SSD and ResNet-50; SSD and ResNet-100; SSD and ShuffleNetV2; a Region-based Convolutional Neural Network (R-CNN); R-CNN and ResNet-18; R-CNN and ResNet-50; R-CNN and ResNet-100; R-CNN and ShuffleNetV2; Faster R-CNN; Faster R-CNN and ResNet-18; Faster R-CNN and ResNet-50; Faster R-CNN and ResNet-100; Faster R-CNN and ShuffleNetV2; YOLO-v1; YOLO-v1 and ResNet-18; YOLO-v1 and ResNet-50; YOLO-v1 and ResNet-100; YOLO-v1 and ShuffleNetV2; YOLO-v2; YOLO-v2 and ResNet-18; YOLO-v2 and ResNet-50; YOLO-v2 and ResNet-100; YOLO-v2 and ShuffleNetV2; YOLO-v3; YOLO-v3 and ResNet-18; YOLO-v3 and ResNet-50; YOLO-v3 and ResNet-100; YOLO-v3 and ShuffleNetV2.
In another possible implementation manner of the embodiment of the present application, the detection frame information includes at least one of: the position information of the detection frame in the frame image, and the size of the detection frame.
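As a concrete illustration of this kind of output, the following is a minimal Python sketch; it is not the patent's implementation, and the detector interface and all names are assumptions:

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class DetectionBox:
    """Detection frame info for one face: position in the frame image plus size."""
    x: float       # top-left corner, in pixels
    y: float
    width: float
    height: float

    @property
    def center(self) -> np.ndarray:
        """Center point of the box, used later for the Euclidean-distance term."""
        return np.array([self.x + self.width / 2.0, self.y + self.height / 2.0])

def detect_faces(frame: np.ndarray, detector) -> List[DetectionBox]:
    """Run a face detection network (e.g. an SSD- or YOLO-style model) on one frame.

    `detector` is a hypothetical callable returning an (N, 4) array of
    [x, y, w, h] rows; a real network's interface will differ by framework.
    """
    raw_boxes = detector(frame)
    return [DetectionBox(*row) for row in raw_boxes]
```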
Step S102, determining attribute information corresponding to the at least one face based on the detection frame information.
In another possible implementation manner of the embodiment of the present application, step S102 may specifically include: outputting an attribute feature vector corresponding to the at least one face through the trained network model based on the detection frame information.
For the embodiment of the present application, the face is recognized in any frame image based on the detection frame information, and the attribute feature vector corresponding to the at least one face is then determined from the recognized face image information through the trained network model.
For the embodiment of the present application, the attribute information corresponding to any face is a numerical value predicted by a preset model (i.e., the trained network model). In the embodiment of the present application, if the attribute is a discrete value (e.g., gender), the information predicted by the preset model is a probability that the attribute belongs to a certain type; if the attribute is a continuous value (e.g., age), a specific value of the attribute is obtained through a preset model.
For example, for the age attribute, an age value (in years) is obtained through the preset model, for example 5 years; for the gender attribute, the preset model may give a probability of 0.8 of being male and a probability of 0.2 of being female.
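To make the two kinds of output concrete, here is a minimal sketch, assuming a hypothetical attribute_net that returns a raw vector with a regressed age at index 0 followed by two gender logits; none of these names come from the patent:

```python
import numpy as np

def predict_attributes(face_crop: np.ndarray, attribute_net) -> dict:
    """Predict face attributes from a cropped face image.

    `attribute_net` is a hypothetical trained model returning a raw vector;
    index 0 is assumed to be a regressed age, indices 1-2 gender logits.
    """
    raw = attribute_net(face_crop)
    age = float(raw[0])                # continuous attribute: predicted directly
    logits = np.asarray(raw[1:3], dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()               # softmax over the two gender classes
    return {
        "age": age,                    # e.g. 5.0 (years)
        "p_male": float(probs[0]),     # e.g. 0.8
        "p_female": float(probs[1]),   # e.g. 0.2
    }
```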
Because any face may have multiple pieces of attribute information, the attribute information corresponding to any face comprises the attribute feature vector corresponding to that face.
For the embodiment of the present application, determining the attribute information corresponding to at least one face in any frame image through the preset model, based on the detection frame information, improves both the accuracy and the efficiency of the determined attribute information, and can thereby further improve the accuracy and efficiency of face tracking.
Step S103, tracking at least one face based on the detection frame information of at least one face and the attribute information corresponding to at least one face.
Compared with the prior art, the face tracking method of the embodiments of the present application processes at least one frame image in a video stream to obtain detection frame information of at least one face, determines attribute information corresponding to the at least one face based on the detection frame information, and tracks the at least one face based on both. When tracking at least one face, the embodiments of the present application use not only the detection frame information in the frame image but also the attribute information of the face in each detection frame, which reduces the probability that the tracking trajectories of multiple faces are swapped during tracking, improves the accuracy of face tracking, and further improves the user experience.
In another possible implementation manner of the embodiment of the present application, step S103 may specifically include: step S1031 (not shown in the figure) and step S1032 (not shown in the figure), wherein,
and step S1031, matching the information with the existing tracking track information according to the detection frame information of the at least one face and the attribute information of the at least one face.
And step S1032, updating the existing tracking track information based on the matching result so as to realize the tracking processing of at least one face.
In another possible implementation manner of the embodiment of the application, the attribute information corresponding to any face includes at least one of the following:
age information; gender information; skin color information; hair color information; iris color information; accessory information.
In another possible implementation manner of the embodiment of the present application, step S1031 may specifically include: calculating a similarity matrix according to the existing tracking track information and the detection frame information and attribute information of the at least one face; step S1032 may specifically include: updating the existing tracking track information according to the similarity matrix.
Another possible implementation manner of the embodiment of the present application is that, according to existing tracking trajectory information and detection frame information and attribute information of at least one human face, a similarity matrix is calculated, including: calculating any element in the similarity matrix according to a specific formula; determining a similarity matrix according to each element in the calculated similarity matrix;
wherein, the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; and T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame.
For example, assume that there are n original tracks (tracking track information corresponding to previously seen faces), denoted T_1, T_2, ..., T_n, and that there are m faces in the new frame, denoted f_1, f_2, ..., f_m.
The element in row i, column j of the similarity matrix A is computed as: the squared difference between the age attributes of track T_i and face f_j, multiplied by coefficient a; plus the squared difference between the predicted probabilities that track T_i and face f_j are female, multiplied by coefficient b; plus the squared Euclidean distance between the center point positions of track T_i and face f_j, multiplied by coefficient c; plus the squared feature distance between track T_i and face f_j, multiplied by coefficient d.
The above a, b, c and d are constants selected in advance, and the algorithm used for the face feature distance is usually determined by the specific face recognition algorithm. The features of each face are typically represented as a high-dimensional vector, and the distance between two features is the squared Euclidean distance between the two vectors, or the cosine of the angle between them in the vector space.
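Putting the formula into code, the following is a minimal numpy sketch; the dict representation of tracks and faces and the default weights are illustrative assumptions, not values from the patent, and the feature distance is taken to be the squared Euclidean distance mentioned above:

```python
import numpy as np

def similarity_matrix(tracks, faces, a=1.0, b=1.0, c=1.0, d=1.0):
    """Build the n x m matrix A of the specific formula (lower = more similar).

    Each track/face is a dict with keys 'age' (float), 'p_female' (float),
    'center' (np.ndarray of shape (2,)) and 'feature' (np.ndarray).
    """
    A = np.zeros((len(tracks), len(faces)))
    for i, T in enumerate(tracks):
        for j, f in enumerate(faces):
            age_term = (T["age"] - f["age"]) ** 2 * a
            gender_term = (T["p_female"] - f["p_female"]) ** 2 * b
            center_dist = np.linalg.norm(T["center"] - f["center"])    # Euclidean distance of center points
            feat_dist = np.sum((T["feature"] - f["feature"]) ** 2)     # squared Euclidean feature distance
            A[i, j] = age_term + gender_term + center_dist ** 2 * c + feat_dist ** 2 * d
    return A
```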
Another possible implementation manner of the embodiment of the present application, updating existing tracking track information according to the similarity matrix, includes: determining elements which are not larger than a preset threshold value in the similarity matrix; determining a set of matching edges based on elements which are not greater than a preset threshold value in the similarity matrix through a bipartite graph optimal matching algorithm, wherein any matching edge in the set of matching edges represents any group of matched tracking track information and detection frame information and attribute information of the human face; and updating the existing tracking track information according to the matching edge set.
Specifically, given a preselected threshold t, for every element A_ij of the similarity matrix that is greater than t, track T_i and face f_j certainly cannot match; for all possibly matching track-face pairs (where, as in the previous paragraph, a pair is possibly matching if A_ij <= t), a bipartite graph optimal matching algorithm is used to obtain an optimal matching solution.
The output of the bipartite graph matching algorithm is a set of matching edges; each matching edge represents one matched track-face pair, and it is guaranteed that any face is matched to at most one track and any track is matched to at most one face.
For the embodiment of the present application, a bipartite graph (also called a bigraph) is a special model in graph theory. Let G = (V, E) be an undirected graph; if the vertex set V can be partitioned into two mutually disjoint subsets (A, B) such that the two vertices i and j associated with each edge (i, j) of the graph belong to the two different subsets (i in A, j in B), then G is called a bipartite graph.
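One common choice of bipartite graph optimal matching algorithm is the Hungarian algorithm; the patent does not name a specific algorithm, so the use of scipy's linear_sum_assignment below is an assumption, as is the sentinel-cost trick for enforcing the threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks_to_faces(A: np.ndarray, threshold: float):
    """Return matched (track_index, face_index) pairs from similarity matrix A.

    Elements greater than `threshold` are impossible matches, as described
    above; the Hungarian algorithm then finds a minimum-cost assignment over
    the remaining candidates, so each track matches at most one face and
    each face matches at most one track.
    """
    BIG = 1e9  # sentinel cost that effectively forbids a pairing
    cost = np.where(A > threshold, BIG, A)
    rows, cols = linear_sum_assignment(cost)
    # Drop assignments that landed on forbidden (sentinel) entries.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < BIG]
```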
Another possible implementation manner of the embodiment of the present application, updating the existing tracking track information, includes at least one of step Sa (not shown in the figure) and step Sb (not shown in the figure), wherein,
step Sa, if the face information corresponding to any frame image in the video stream does not include the existing tracking track information, deleting the tracking track information that is not included in the face information corresponding to any frame image in the existing tracking track information.
Step Sb, if the existing tracking track information does not contain the face information corresponding to any frame image in the video stream, adding the face information corresponding to that frame image to the existing tracking track information.
Wherein, the face information includes: the detection frame information of the human face and the attribute information corresponding to the human face.
For the embodiment of the present application, tracks that are not matched to any face are considered to correspond to faces that have left the picture, and are deleted from the original tracking track information; faces that are not matched to any track are considered new faces in the current frame, and the corresponding face information is added to the existing tracking tracks.
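A minimal sketch of this update rule, representing each track as a dict and reusing the match output from the sketch above (the container layout is an assumption):

```python
def update_tracks(tracks: list, faces: list, matches: list) -> list:
    """Update existing tracking track information from a set of matching edges.

    `matches` is a list of (track_index, face_index) pairs, e.g. the output
    of match_tracks_to_faces above; `tracks` and `faces` are lists of dicts.
    """
    matched_tracks = {i for i, _ in matches}
    matched_faces = {j for _, j in matches}

    # A matched track absorbs the detection frame and attribute info of its face.
    for i, j in matches:
        tracks[i].update(faces[j])

    # Tracks with no matched face are treated as having left the picture.
    surviving = [t for k, t in enumerate(tracks) if k in matched_tracks]

    # Faces with no matched track are new; each starts a fresh track.
    surviving += [dict(faces[j]) for j in range(len(faces)) if j not in matched_faces]
    return surviving
```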
The face tracking method has been described in detail above; the following summarizes it through an application scenario, as shown in fig. 5:
After any frame image in the video stream is preprocessed, the preprocessed frame image is passed through the detection network, which outputs the face detection frame information in the frame image; the attribute information corresponding to each face is then obtained through a face attribute network based on the face detection frame information; finally, the face tracking module takes the face frame information in the frame image together with the attribute information corresponding to each face, and produces the face tracking information.
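Tying the stages of fig. 5 together, here is a minimal per-frame loop under the same assumptions as the earlier sketches (the zero feature vector is a placeholder for a real face feature network, and the threshold value is arbitrary):

```python
import numpy as np

def track_video(frames, detector, attribute_net, threshold=10.0):
    """Per-frame face tracking loop mirroring fig. 5: detect, attributes, match, update."""
    tracks = []  # existing tracking track information, one dict per face
    for frame in frames:
        faces = []
        for box in detect_faces(frame, detector):
            crop = frame[int(box.y):int(box.y + box.height),
                         int(box.x):int(box.x + box.width)]
            attrs = predict_attributes(crop, attribute_net)
            faces.append({"age": attrs["age"], "p_female": attrs["p_female"],
                          "center": box.center,
                          "feature": np.zeros(128)})  # placeholder feature vector
        if tracks and faces:
            matches = match_tracks_to_faces(similarity_matrix(tracks, faces), threshold)
        else:
            matches = []
        tracks = update_tracks(tracks, faces, matches)
    return tracks
```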
The above embodiments introduce the face tracking method from the perspective of the method flow; the following introduces the face tracking apparatus from the perspective of virtual modules, with reference to the accompanying drawings, as follows:
the embodiment of the present application provides an apparatus for face tracking, as shown in fig. 2, the apparatus 20 for face tracking may include a processing module 21, a determining module 22, and a tracking module 23, wherein,
the processing module 21 is configured to obtain detection frame information of at least one human face by processing at least one frame of image in the video stream.
The determining module 22 is configured to determine attribute information corresponding to the at least one face based on the detection frame information.
The tracking module 23 is configured to track the at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face.
In another possible implementation manner of the embodiment of the present application, the tracking module 23 includes: a matching unit and an updating unit, wherein,
and the matching unit is used for matching the existing tracking track information according to the detection frame information of at least one face and the attribute information of at least one face.
And the updating unit is used for updating the existing tracking track information based on the matching result of the matching unit so as to realize the tracking processing of at least one face.
In another possible implementation manner of the embodiment of the application, the matching unit is specifically configured to calculate the similarity matrix according to existing tracking track information and detection frame information and attribute information of at least one human face.
The updating unit is specifically configured to update the existing tracking track information according to the similarity matrix.
In another possible implementation manner of the embodiment of the present application, the updating unit is specifically configured to determine an element in the similarity matrix, where the element is not greater than a preset threshold.
The updating unit is specifically configured to determine a set of matching edges, through a bipartite graph optimal matching algorithm, based on the elements of the similarity matrix that are not greater than the preset threshold, where any matching edge in the set represents one matched group of tracking track information and a face's detection frame information and attribute information.
The updating unit is specifically further configured to update the existing tracking track information according to the set of matching edges.
In another possible implementation manner of the embodiment of the application, the updating unit is specifically configured to delete tracking track information that is not included in face information corresponding to any frame image in the existing tracking track information when the face information corresponding to any frame image in the video stream does not include the existing tracking track information; and/or the updating unit is specifically configured to add face information corresponding to any frame image in the existing tracking track information when the existing tracking track information does not include the face information corresponding to any frame image in the video stream.
Wherein, the face information includes: the detection frame information of the human face and the attribute information corresponding to the human face.
In another possible implementation manner of the embodiment of the present application, the attribute information corresponding to any face includes: at least one of age information and gender information.
In another possible implementation manner of the embodiment of the present application, the matching unit is specifically configured to calculate any element in the similarity matrix according to a specific formula.
The matching unit is specifically further configured to determine the similarity matrix according to each calculated element of the similarity matrix.
Wherein, the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; and T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame.
In another possible implementation manner of the embodiment of the present application, the determining module 22 is specifically configured to output an attribute feature vector corresponding to the at least one face through the trained network model based on the detection frame information.
Compared with the prior art, the embodiment of the present application provides a face tracking apparatus that processes at least one frame image in a video stream to obtain detection frame information of at least one face, determines attribute information corresponding to the at least one face based on the detection frame information, and then tracks the at least one face based on both. When tracking at least one face, the apparatus uses not only the detection frame information in the frame image but also the attribute information of the face in each detection frame, which reduces the probability that the tracking trajectories of multiple faces are swapped during tracking, improves the accuracy of face tracking, and further improves the user experience.
The face tracking apparatus of this embodiment may execute the face tracking method provided in the foregoing method embodiments, and the implementation principles thereof are similar, and are not described herein again.
The above embodiments describe a face tracking method from the perspective of a method flow and a face tracking device from the perspective of a virtual module, and an electronic device is described below with reference to the accompanying drawings from the perspective of a physical device to execute the face tracking method, which is specifically as follows:
an embodiment of the present application provides an electronic device, as shown in fig. 3, an electronic device 3000 shown in fig. 3 includes: a processor 3001 and a memory 3003. The processor 3001 is coupled to the memory 3003, such as via a bus 3002. Optionally, the electronic device 3000 may further comprise a transceiver 3004. It should be noted that the transceiver 3004 is not limited to one in practical applications, and the structure of the electronic device 3000 is not limited to the embodiment of the present application.
The processor 3001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 3001 may also be a combination of computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 3003 is used for storing application program codes for performing the present scheme, and is controlled to be executed by the processor 3001. The processor 3001 is configured to execute application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device, which includes a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program processes at least one frame image in the video stream to obtain detection frame information of at least one face, determines attribute information corresponding to the at least one face based on the detection frame information, and tracks the at least one face based on both. When tracking at least one face, the embodiment uses not only the detection frame information in the frame image but also the attribute information of the face in each detection frame, which reduces the probability that the tracking trajectories of multiple faces are swapped during tracking, improves the accuracy of face tracking, and further improves the user experience.
The electronic device of this embodiment may execute the method for face tracking provided by the above method embodiments, and the implementation principles thereof are similar, and are not described herein again.
The present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to execute the corresponding content of the foregoing method embodiments. Compared with the prior art, at least one frame image in the video stream is processed to obtain detection frame information of at least one face, attribute information corresponding to the at least one face is determined based on the detection frame information, and the at least one face is tracked based on both. When tracking at least one face, the embodiment uses not only the detection frame information in the frame image but also the attribute information of the face in each detection frame, which reduces the probability that the tracking trajectories of multiple faces are swapped during tracking, improves the accuracy of face tracking, and further improves the user experience.
The computer-readable storage medium of this embodiment is suitable for the method for face tracking provided in the foregoing method embodiments, and the implementation principles thereof are similar, and are not described herein again.
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and are not necessarily performed sequentially, but may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A method of face tracking, comprising:
processing at least one frame image in a video stream to obtain detection frame information of at least one face;
determining attribute information corresponding to at least one face based on the detection frame information;
tracking at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face;
tracking at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face, including:
calculating a similarity matrix according to the existing tracking track information and the detection frame information and attribute information of at least one face;
updating the existing tracking track information according to the similarity matrix so as to realize the tracking processing of the at least one face;
calculating a similarity matrix according to the existing tracking track information and the detection frame information and attribute information of at least one face, wherein the similarity matrix comprises:
calculating any element in the similarity matrix according to a specific formula;
determining a similarity matrix according to each element in the calculated similarity matrix;
the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame; and a, b, c and d are constants selected in advance.
2. The method of claim 1, wherein tracking at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face comprises:
matching with the existing tracking track information according to the detection frame information of the at least one face and the attribute information of the at least one face;
and updating the existing tracking track information based on the matching result so as to realize the tracking processing of the at least one face.
3. The method of claim 1, wherein updating the existing tracking trajectory information according to a similarity matrix comprises:
determining elements which are not larger than a preset threshold value in the similarity matrix;
determining a set of matching edges by a bipartite graph optimal matching algorithm based on elements not greater than a preset threshold in the similarity matrix, wherein any matching edge in the set of matching edges represents any group of matched tracking track information and detection frame information and attribute information of a human face;
and updating the existing tracking track information according to the matching edge set.
4. The method of claim 1, wherein the updating the existing tracking trajectory information comprises at least one of:
if the face information corresponding to any frame image in the video stream does not contain the existing tracking track information, deleting the tracking track information which is not contained in the face information corresponding to any frame image in the existing tracking track information;
if the existing tracking track information does not contain face information corresponding to any frame image in the video stream, adding the face information corresponding to any frame image in the existing tracking track information;
the face information includes: the detection frame information of the human face and the attribute information corresponding to the human face.
5. The method according to any one of claims 1 to 4, wherein the attribute information corresponding to any one face comprises at least one of the following:
age information; gender information.
6. The method according to any one of claims 1 to 4, wherein determining attribute information corresponding to at least one face based on the detection frame information comprises:
and outputting the attribute feature vector corresponding to the at least one face through the trained network model based on the detection frame information.
7. An apparatus for face tracking, comprising:
the processing module is used for processing at least one frame image in the video stream to obtain detection frame information of at least one human face;
the determining module is used for determining attribute information corresponding to at least one face based on the detection frame information;
the tracking module is used for tracking at least one face based on the detection frame information of the at least one face and the attribute information corresponding to the at least one face;
wherein, the tracking module includes: a matching unit and an updating unit, wherein,
the matching unit is specifically used for calculating a similarity matrix according to the existing tracking track information and the detection frame information and attribute information of at least one face;
the updating unit is specifically configured to update the existing tracking trajectory information according to the similarity matrix, so as to perform tracking processing on the at least one face;
the matching unit is specifically used for calculating any element in the similarity matrix according to a specific formula; determining a similarity matrix according to each element in the calculated similarity matrix;
the specific formula is:
A_ij = (T_i1 - f_j1)² × a + (T_i2 - f_j2)² × b + (T_i3 - f_j3)² × c + (T_i4 - f_j4)² × d, where T_i1 is the age information corresponding to the face in any one of the existing tracking tracks and f_j1 is the age information corresponding to the face detection frame; T_i2 is the probability information that the gender of the face in any one of the existing tracking tracks is male or female and f_j2 is the probability information that the gender corresponding to the face detection frame is male or female; T_i3 - f_j3 denotes the Euclidean distance between the center point positions of any one of the existing tracking tracks and the face detection frame; T_i4 - f_j4 denotes the feature distance between any one of the existing tracking tracks and the face frame; and a, b, c and d are constants selected in advance.
8. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the face tracking method according to any one of claims 1 to 6.
9. A computer readable storage medium storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of face tracking according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910262510.9A CN110009662B (en) | 2019-04-02 | 2019-04-02 | Face tracking method and device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910262510.9A CN110009662B (en) | 2019-04-02 | 2019-04-02 | Face tracking method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110009662A CN110009662A (en) | 2019-07-12 |
CN110009662B true CN110009662B (en) | 2021-09-17 |
Family
ID=67169613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910262510.9A Active CN110009662B (en) | 2019-04-02 | 2019-04-02 | Face tracking method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110009662B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427905B (en) * | 2019-08-08 | 2023-06-20 | 北京百度网讯科技有限公司 | Pedestrian tracking method, device and terminal |
CN111178217A (en) * | 2019-12-23 | 2020-05-19 | 上海眼控科技股份有限公司 | Method and equipment for detecting face image |
CN111862624B (en) * | 2020-07-29 | 2022-05-03 | 浙江大华技术股份有限公司 | Vehicle matching method and device, storage medium and electronic device |
CN113034548B (en) * | 2021-04-25 | 2023-05-26 | 安徽科大擎天科技有限公司 | Multi-target tracking method and system suitable for embedded terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103632126A (en) * | 2012-08-20 | 2014-03-12 | 华为技术有限公司 | Human face tracking method and device |
CN105488478A (en) * | 2015-12-02 | 2016-04-13 | 深圳市商汤科技有限公司 | Face recognition system and method |
CN107316322A (en) * | 2017-06-27 | 2017-11-03 | 上海智臻智能网络科技股份有限公司 | Video tracing method and device and object identifying method and device |
CN108230352A (en) * | 2017-01-24 | 2018-06-29 | 北京市商汤科技开发有限公司 | Detection method, device and the electronic equipment of target object |
CN108932456A (en) * | 2017-05-23 | 2018-12-04 | 北京旷视科技有限公司 | Face identification method, device and system and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102306290B (en) * | 2011-10-14 | 2013-10-30 | 刘伟华 | Face tracking recognition technique based on video |
KR102062310B1 (en) * | 2013-01-04 | 2020-02-11 | 삼성전자주식회사 | Method and apparatus for prividing control service using head tracking in an electronic device |
CN107851192B (en) * | 2015-05-13 | 2023-04-14 | 北京市商汤科技开发有限公司 | Apparatus and method for detecting face part and face |
US10997395B2 (en) * | 2017-08-14 | 2021-05-04 | Amazon Technologies, Inc. | Selective identity recognition utilizing object tracking |
CN108491832A (en) * | 2018-05-21 | 2018-09-04 | 广西师范大学 | A kind of embedded human face identification follow-up mechanism and method |
CN109522843B (en) * | 2018-11-16 | 2021-07-02 | 北京市商汤科技开发有限公司 | Multi-target tracking method, device, equipment and storage medium |
-
2019
- 2019-04-02 CN CN201910262510.9A patent/CN110009662B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103632126A (en) * | 2012-08-20 | 2014-03-12 | 华为技术有限公司 | Human face tracking method and device |
CN105488478A (en) * | 2015-12-02 | 2016-04-13 | 深圳市商汤科技有限公司 | Face recognition system and method |
CN108230352A (en) * | 2017-01-24 | 2018-06-29 | 北京市商汤科技开发有限公司 | Detection method, device and the electronic equipment of target object |
CN108932456A (en) * | 2017-05-23 | 2018-12-04 | 北京旷视科技有限公司 | Face identification method, device and system and storage medium |
CN107316322A (en) * | 2017-06-27 | 2017-11-03 | 上海智臻智能网络科技股份有限公司 | Video tracing method and device and object identifying method and device |
Non-Patent Citations (2)
Title |
---|
Ranganatha S et al.; "Color Based New Algorithm for Detection and Single/Multiple Person Face Tracking in Different Background Video Sequence"; I.J. Information Technology and Computer Science; 2018-11-08; pp. 39-48 *
Wang Rong et al.; "Implementation of face detection and tracking based on OpenCV" (in Chinese); Science Technology and Engineering; 2014-08-31; vol. 14, no. 24; pp. 115-118 *
Also Published As
Publication number | Publication date |
---|---|
CN110009662A (en) | 2019-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12062249B2 (en) | System and method for generating image landmarks | |
US11288838B2 (en) | Image processing method and apparatus | |
CN110009662B (en) | Face tracking method and device, electronic equipment and computer readable storage medium | |
Jampani et al. | Video propagation networks | |
US9542621B2 (en) | Spatial pyramid pooling networks for image processing | |
US10891465B2 (en) | Methods and apparatuses for searching for target person, devices, and media | |
JP7286013B2 (en) | Video content recognition method, apparatus, program and computer device | |
CN110910422A (en) | Target tracking method and device, electronic equipment and readable storage medium | |
CN110765860A (en) | Tumble determination method, tumble determination device, computer apparatus, and storage medium | |
CN108875487B (en) | Training of pedestrian re-recognition network and pedestrian re-recognition based on training | |
CN109492576B (en) | Image recognition method and device and electronic equipment | |
CN113313053B (en) | Image processing method, device, apparatus, medium, and program product | |
CN109063776B (en) | Image re-recognition network training method and device and image re-recognition method and device | |
KR20220076398A (en) | Object recognition processing apparatus and method for ar device | |
CN112381071A (en) | Behavior analysis method of target in video stream, terminal device and medium | |
CN114359564A (en) | Image recognition method, image recognition device, computer equipment, storage medium and product | |
WO2023109361A1 (en) | Video processing method and system, device, medium and product | |
KR101942646B1 (en) | Feature point-based real-time camera pose estimation method and apparatus therefor | |
CN109635749B (en) | Image processing method and device based on video stream | |
CN114241411B (en) | Counting model processing method and device based on target detection and computer equipment | |
CN115359492A (en) | Text image matching model training method, picture labeling method, device and equipment | |
JP2010257267A (en) | Device, method and program for detecting object area | |
CN112257689A (en) | Training and recognition method of face recognition model, storage medium and related equipment | |
CN109886185B (en) | Target identification method, device, electronic equipment and computer storage medium | |
CN110263881A (en) | A kind of multi-model approximating method of the asymmetric geometry in combination part |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |