CN111428607B - Tracking method and device and computer equipment

Tracking method and device and computer equipment

Info

Publication number
CN111428607B
CN111428607B (application CN202010195529.9A)
Authority
CN
China
Prior art keywords
information
face
frame
frame image
tracking target
Prior art date
Legal status
Active
Application number
CN202010195529.9A
Other languages
Chinese (zh)
Other versions
CN111428607A (en)
Inventor
俞旭锋
李平生
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202010195529.9A
Publication of CN111428607A
Application granted
Publication of CN111428607B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/48 - Matching video sequences
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation

Abstract

The invention discloses a tracking method, a tracking device and computer equipment, which are used to solve the problem of low face-tracking accuracy when multiple faces occlude one another. The method comprises the following steps: determining a tracking target and the face prediction box information of the tracking target, and performing an intersection-over-union (IoU) calculation between the face prediction box information and a plurality of pieces of head-shoulder box information in a first frame image to determine an occlusion region, wherein the tracking target represents face information in the first frame image that is not matched by association with the face information in a second frame image; determining whether unmatched face information appears within a predetermined range corresponding to the occlusion region; when unmatched face information appears, enlarging the detection area of the occlusion region by a predetermined multiple to obtain first occlusion region information, and calculating the IoU of the first occlusion region information and the unmatched face information; and if the IoU is greater than or equal to a first threshold, determining that the unmatched face information is the face information corresponding to the tracking target.

Description

Tracking method and device and computer equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a tracking method, a tracking device, and a computer device.
Background
Currently, with the maturation of deep learning algorithms, face recognition technology based on deep learning has achieved tremendous development and progress in recent years. Existing face tracking methods perform association matching on the face detection boxes of preceding and following frames, and determine the motion trajectory of a face in a video through the correlation of face detection boxes between consecutive frames.
However, under large-area occlusion, and especially when faces occlude one another, the accuracy of actively predicting the face position during tracking is not high, and relying only on association matching of face detection boxes easily loses the tracking target or tracks the wrong target. Therefore, in the prior art, face tracking accuracy is low when multiple faces occlude one another.
Disclosure of Invention
The embodiments of the invention provide a tracking method to solve the technical problem in the prior art that face-tracking accuracy is low when multiple faces occlude one another.
In a first aspect, there is provided a tracking method, the method comprising:
determining a tracking target and face prediction box information of the tracking target, and performing an intersection-over-union (IoU) calculation between the face prediction box information and a plurality of pieces of head-shoulder box information in a first frame image to determine an occlusion region, wherein the tracking target represents face information in the first frame image that is not matched by association with the face information in a second frame image, the first frame image and the second frame image belong to a first video stream, the first video stream comprises a plurality of frame images, and the second frame image is adjacent to the first frame image;
determining, from images within a predetermined number of frames after the second frame image, whether unmatched face information appears within a predetermined range corresponding to the occlusion region;
when unmatched face information appears, enlarging the detection area of the occlusion region by a predetermined multiple to obtain first occlusion region information, and calculating the IoU of the first occlusion region information and the unmatched face information;
and if the IoU is greater than or equal to a first threshold, determining that the unmatched face information is the face information corresponding to the tracking target.
In one possible embodiment, the method further comprises:
if the IoU is smaller than the first threshold, extracting features of the unmatched face information, and performing a similarity calculation between the features of the unmatched face information and features of first face information, wherein the first face information is the face information corresponding to the tracking target determined according to the first frame image;
and if the similarity between the features of the unmatched face information and the features of the first face information is greater than or equal to a second threshold, determining that the unmatched face information is the face information corresponding to the tracking target.
In one possible implementation, determining a tracking target and the face prediction box information of the tracking target includes:
receiving an input first frame image and second frame image, wherein the first frame image and the second frame image each comprise a plurality of pieces of face information and a plurality of pieces of head-shoulder region information;
and performing association matching on the first frame image and the second frame image to determine the tracking target, and determining the face prediction box information of the tracking target according to image information of a first predetermined number of frames before the second frame image in the first video stream.
In one possible implementation, performing association matching on the first frame image and the second frame image and determining the tracking target includes:
detecting the first frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information, and taking the pieces of face image information that have a first association matching relationship with the pieces of head-shoulder box information as a plurality of tracking targets;
detecting the second frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information;
performing IoU calculations between the tracking targets and the face image information in the second frame image, to determine whether the tracking targets are matched by association with the face image information in the second frame image;
and if the IoU between a first tracking target and every piece of face image information in the second frame image is smaller than a third threshold, determining the first tracking target as the tracking target, wherein the first tracking target represents face information that is not matched by association with any of the face image information in the second frame image.
In one possible implementation, determining the face prediction box information of the tracking target according to the image information of the first predetermined number of frames before the second frame image includes:
determining the image information of the first predetermined number of frames before the second frame image, and calculating coordinates of the center position of the face detection box of the tracking target in that image information;
and determining coordinates of the diagonal positions of the face prediction box of the tracking target in the second frame image according to the coordinates of the center positions of the face detection box, so as to obtain the face prediction box information of the tracking target from the coordinates of the diagonal positions.
In a second aspect, there is provided a tracking device, the device comprising:
a determining module, configured to determine a tracking target and face prediction box information of the tracking target, and to perform an IoU calculation between the face prediction box information and a plurality of pieces of head-shoulder box information in a first frame image to determine an occlusion region, wherein the tracking target represents face information in the first frame image that is not matched by association with the face information in a second frame image, the first frame image and the second frame image belong to a first video stream, the first video stream comprises multiple frame images, and the second frame image is adjacent to the first frame image;
a judging module, configured to determine, from images within a predetermined number of frames after the second frame image, whether unmatched face information appears within a predetermined range corresponding to the occlusion region;
a tracking module, configured to, when unmatched face information appears, enlarge the detection area of the occlusion region by a predetermined multiple to obtain first occlusion region information, and calculate the IoU of the first occlusion region information and the unmatched face information; and if the IoU is greater than or equal to a first threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In a possible implementation manner, the tracking module is configured to:
if the IoU is smaller than the first threshold, extract features of the unmatched face information, and perform a similarity calculation between the features of the unmatched face information and features of first face information, wherein the first face information is the face information corresponding to the tracking target determined according to the first frame image;
and if the similarity between the features of the unmatched face information and the features of the first face information is greater than or equal to a second threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In a possible implementation manner, the determining module is configured to:
receive an input first frame image and second frame image, wherein the first frame image and the second frame image each comprise a plurality of pieces of face information and a plurality of pieces of head-shoulder region information;
and perform association matching on the first frame image and the second frame image to determine the tracking target, and determine the face prediction box information of the tracking target according to image information of a first predetermined number of frames before the second frame image in the first video stream.
In a possible implementation manner, the determining module is configured to:
detect the first frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information, and take the pieces of face image information that have a first association matching relationship with the pieces of head-shoulder box information as a plurality of tracking targets;
detect the second frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information;
perform IoU calculations between the tracking targets and the face image information in the second frame image, to determine whether the tracking targets are matched by association with the face image information in the second frame image;
and if the IoU between a first tracking target and every piece of face image information in the second frame image is smaller than a third threshold, determine the first tracking target as the tracking target, wherein the first tracking target represents face information that is not matched by association with any of the face image information in the second frame image.
In a possible implementation manner, the determining module is configured to:
determine the image information of a first predetermined number of frames before the second frame image, and calculate coordinates of the center position of the face detection box of the tracking target in that image information;
and determine coordinates of the diagonal positions of the face prediction box of the tracking target in the second frame image according to the coordinates of the center positions of the face detection box, so as to obtain the face prediction box information of the tracking target from the coordinates of the diagonal positions.
In a third aspect, there is provided a computer device comprising:
a memory for storing program instructions;
and a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program instructions, the steps included in any one of the methods of the first aspect.
In a fourth aspect, there is provided a storage medium storing computer-executable instructions for causing a computer to perform the steps comprised in any one of the methods of the first aspect.
In a fifth aspect, there is provided a computer program product enabling a computer device to carry out the steps comprised by any of the methods of the first aspect, when said computer program product is run on a computer device.
The technical solutions provided by the embodiments of the invention have at least the following beneficial effects:
in the embodiments of the invention, the tracking target can be determined by performing association matching on the input first frame image and second frame image. Furthermore, the face prediction box information of the tracking target can be determined, and an IoU calculation can be performed between the face prediction box information and the pieces of head-shoulder box information in the first frame image to determine the occlusion region, that is, the region in which the face information of the tracking target may be occluded. It can then be determined, from images within a predetermined number of frames after the second frame image, whether unmatched face information appears within the predetermined range corresponding to the occlusion region.
Further, when unmatched face information appears, the detection area of the occlusion region is enlarged by a predetermined multiple to obtain the first occlusion region information, and the IoU of the first occlusion region information and the unmatched face information is calculated; if the IoU is greater than or equal to the first threshold, the unmatched face information is determined to be the face information corresponding to the tracking target.
That is, in the embodiments of the invention, when a tracking target appears, an IoU calculation can be performed between the face prediction box corresponding to the tracking target and the head-shoulder box information in the previous frame image, so that the occlusion region, i.e. the region in which the face information may be occluded, can be determined. In subsequent frame images, it is judged whether unmatched face information appears within the predetermined range corresponding to the occlusion region, and hence whether that face information corresponds to the tracking target, completing the matching of the tracking target. The accuracy of face tracking can thereby be improved when multiple faces occlude one another.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention.
FIG. 1 is a schematic view of an application scenario in an embodiment of the present invention;
FIG. 2 is a flowchart of a tracking method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of determining a tracking target according to an embodiment of the present invention;
FIG. 4 is another schematic diagram of determining a tracking target according to an embodiment of the present invention;
FIG. 5 is a block diagram of a tracking device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present invention;
fig. 7 is a schematic diagram of another structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. Embodiments of the invention and features of the embodiments may be combined with one another arbitrarily without conflict. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
The terms "first" and "second" in the description and claims of the invention and in the above figures are used to distinguish between different objects, not to describe a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, system, article, or apparatus.
In the embodiment of the present invention, the "plurality" may mean at least two, for example, two, three or more, and the embodiment of the present invention is not limited.
As described above, in the prior art, when face tracking is performed, a tracking target can be determined using association matching of face detection boxes. However, when faces occlude one another, the accuracy of actively predicting the face position during tracking is not high, and determining the tracking target by association matching alone easily causes the tracker to switch from one person to another, that is, the tracking target is easily lost or tracked in error. Therefore, in the prior art, face tracking accuracy is low when multiple faces occlude one another.
In view of this, the embodiments of the present invention provide a tracking method that tracks the tracking target by calculating the IoU of the face prediction box information and the head-shoulder box information. That is, the tracking method provided by the embodiments of the invention determines the tracking target by computing the association between face box information and head-shoulder box information, so that the tracking target can be tracked accurately in scenes where multiple faces occlude one another.
After the design idea of the embodiment of the present invention is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present invention, and it should be noted that the application scenarios described below are only used for illustrating the embodiment of the present invention and are not limiting. In the specific implementation process, the technical scheme provided by the embodiment of the invention can be flexibly applied according to actual needs.
Referring to FIG. 1, a schematic view of a scenario to which an embodiment of the present invention is applicable, the scenario includes a video capture device 101, a computer device 102, and a processing device 103; the tracking method of the embodiment of the present invention may be implemented by the cooperation of the video capture device 101, the processing device 103, and the computer device 102 in FIG. 1.
In a specific implementation, the video capture device 101 may be installed in scenes where people need to be tracked, such as airports, high-speed rail stations, subways, railway stations, campuses, and business districts, for example at an airport entrance or in a high-speed rail station, so as to obtain video stream data of a continuously changing flow of people. Alternatively, the video capture device 101 may be installed at intersections, where it can likewise capture video stream data of a continuously changing crowd.
Taking the application site where the video capturing apparatus 101 is installed in a high-speed rail station as an example, after the video capturing apparatus 101 captures video stream data, the video stream data may be sent to the computer apparatus 102 through the network 104.
The computer device 102 may include one or more processors 1021, memory 1022, I/O interfaces 1023 that interact with video capture devices, I/O interfaces 1024 that interact with processing devices, and so forth. In a specific implementation process, a plurality of computer devices may interact with a plurality of video acquisition devices, or one computer device may interact with one video acquisition device, which is not limited in the embodiment of the present invention. Specifically, the computer device may also receive processed video stream data sent from the processing device, and fig. 1 illustrates an example in which one computer device interacts with one video capturing device and one processing device.
In an embodiment of the present invention, the computer device 102 may receive the video stream data transmitted by the video capture device 101 through the I/O interface 1023, then process the video stream data with the processor 1021, and store the processed information in the memory 1022. Of course, the computer device may receive the video stream data transmitted by the processing device 103 through the I/O interface 1024, then process the video stream data using the processor 1021, and store the processed information in the memory 1022.
Specifically, the computer device 102 may use the same type of processor or different types of processors for the video stream data sent by the video capture device 101 and the video stream data sent by the processing device 103, which the embodiment of the present invention does not limit. In a specific implementation, the video capture device 101 may send the captured video stream data to the processing device 103, i.e. the video capture device 101 and the processing device 103 may interact; FIG. 1 illustrates the case in which the video stream data acquired by the video capture device 101 is not sent to the processing device 103.
The video capture device 101 and the computer device 102 may be communicatively coupled via one or more networks 104. The processing device 103 may also be communicatively coupled to the computer device 102 via one or more networks 104. The network 104 may be a wired network or a wireless network, for example a mobile cellular network or a Wireless Fidelity (Wi-Fi) network, or other possible networks, which the embodiments of the present invention do not limit.
In a specific implementation, after the computer device receives the video stream data, it can perform target tracking processing on the video stream data, so that the movement trajectory of a person can be accurately determined. The following describes a tracking method according to an exemplary embodiment of the present invention with reference to FIG. 2, in conjunction with the application scenario of FIG. 1. It should be noted that the above application scenario is shown only for the convenience of understanding the spirit and principle of the present invention; the embodiments of the present invention are not limited in this respect and may be applied in any applicable scenario.
As shown in fig. 2, a flowchart of a tracking method according to an embodiment of the present invention is shown, and the method may be performed by a computer device shown in fig. 1, for example, and the following describes a method flow according to an embodiment of the present invention.
Step 201: determine a tracking target and the face prediction box information of the tracking target, and perform an IoU calculation between the face prediction box information and a plurality of pieces of head-shoulder box information in the first frame image to determine an occlusion region, wherein the tracking target represents face information in the first frame image that is not matched by association with the face information in the second frame image.
In the embodiment of the present invention, a plurality of video streams input to the computer device by the video capture device may be received. The video streams may be input to the computer device when a preset condition is triggered, for example the captured video streams are input according to a preset time period, or a predetermined number of video streams are input; the embodiment of the present invention is not limited in this respect. Of course, the video streams may also be selected by the user according to actual requirements, in which case the input video stream may be understood as a video stream processed by the processing device in FIG. 1. Hereinafter, for convenience of description, the input video stream is referred to as the first video stream.
In the embodiment of the invention, the first video stream comprises multiple frames of images, and each frame of image comprises a plurality of pieces of face information and a plurality of pieces of head-shoulder region information. A first frame image and a second frame image belonging to the first video stream can be obtained, wherein the first frame image is adjacent to the second frame image.
For example, if the first video stream includes 60 frames of images ordered from 1 to 60, and the 45th frame is taken as the first frame image, then the 46th frame may be taken as the second frame image.
In the embodiment of the invention, the first frame image and the second frame image can be matched by association to determine the tracking target, where the tracking target represents the face information in the first frame image that is not matched by association with the face information in the second frame image.
In the embodiment of the invention, the first frame image can be detected to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information in the first frame image, and the pieces of face image information that have a first association matching relationship with the head-shoulder box regions are taken as a plurality of tracking targets. The second frame image can then be detected to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information in the second frame image. That is, the computer device may detect each image of the input first video stream to obtain the face image information and head-shoulder box region information in that image.
In a specific implementation, when the first frame image and the second frame image are matched by association, first association matching can be performed between the pieces of face image information and the pieces of head-shoulder box region information in the first frame image, i.e. it can be determined whether each piece of face image information is associated and matched with a piece of head-shoulder box region information, whereby the tracking targets can be determined.
In order to better understand the first association matching between face image information and head-shoulder region information, the IoU calculation is described below using one piece of face image information and one piece of head-shoulder region information as an example. Suppose the rectangular box T corresponding to the face image information has lower-left corner coordinates (X0, Y0) and upper-right corner coordinates (X1, Y1), and the rectangular box G corresponding to the head-shoulder region information has lower-left corner coordinates (M0, N0) and upper-right corner coordinates (M1, N1). The intersection-over-union ratio is then

$$\mathrm{IoU}(T,G)=\frac{\mathrm{Area}(T\cap G)}{\mathrm{Area}(T)+\mathrm{Area}(G)-\mathrm{Area}(T\cap G)}$$

with $\mathrm{Area}(T\cap G)=\max\!\left(0,\ \min(X_1,M_1)-\max(X_0,M_0)\right)\cdot\max\!\left(0,\ \min(Y_1,N_1)-\max(Y_0,N_0)\right)$, $\mathrm{Area}(T)=(X_1-X_0)(Y_1-Y_0)$ and $\mathrm{Area}(G)=(M_1-M_0)(N_1-N_0)$.

When the IoU is determined to be greater than the preset threshold, it can be determined that the face image information and the head-shoulder region information have the first association matching relationship, i.e. the face and the head-shoulder region can be determined to belong to the same person in the frame image.
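To make the calculation concrete, here is a minimal Python sketch of the IoU computation described above; the function name and the (x0, y0, x1, y1) box representation are illustrative choices, not part of the patent text:

```python
def iou(box_t, box_g):
    """Intersection-over-union of two axis-aligned boxes.

    Each box is (x0, y0, x1, y1): lower-left and upper-right corners,
    matching the rectangles T and G described above.
    """
    x0, y0, x1, y1 = box_t
    m0, n0, m1, n1 = box_g
    # Width/height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(x1, m1) - max(x0, m0))
    inter_h = max(0.0, min(y1, n1) - max(y0, n0))
    inter = inter_w * inter_h
    union = (x1 - x0) * (y1 - y0) + (m1 - m0) * (n1 - n0) - inter
    return inter / union if union > 0 else 0.0


# A face box inside a larger head-shoulder box: intersection 4, union 16.
print(iou((2, 2, 4, 4), (1, 1, 5, 5)))  # 0.25
```

Note that a face box lying entirely inside its head-shoulder box still yields a modest IoU, so the preset threshold for the first association matching must be chosen with this in mind.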
In the embodiment of the invention, IoU calculations can be performed between the tracking targets determined from the first frame image and the face image information in the second frame image, and it is determined from the calculated IoU whether each tracking target is matched by association with face image information in the second frame image. That is, after the tracking targets are determined, the IoU between the face image information corresponding to each tracking target and each piece of face image information in the second frame image may be calculated, so as to track the motion trajectory of the tracking target. In this way, the tracking target can be determined more accurately.
In the embodiment of the invention, if the IoU between the first tracking target and every piece of face image information in the second frame image is smaller than the third threshold, the first tracking target is determined to be the tracking target, where the first tracking target represents face information that is not matched by association with any face image information in the second frame image.
In a specific implementation, when it is determined that the IoU between a first tracking target among the tracking targets and every piece of face image information in the second frame image is smaller than the third threshold, it can be understood that the first tracking target is not matched by association in the second frame image, i.e. the first tracking target may be occluded by the face corresponding to some piece of face image information in the second frame image. For convenience of description, this first tracking target is referred to as the tracking target, so that the motion trajectory of the first tracking target can be tracked.
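As an illustration of how this association step could be realized, the sketch below greedily matches each tracking target to the second-frame face box with the highest IoU, reusing the iou helper from the sketch above; the greedy strategy and the 0.3 default for the third threshold are assumptions for illustration, not values prescribed by the patent:

```python
def associate(targets, faces, third_threshold=0.3):
    """Match first-frame tracking targets to second-frame face boxes.

    targets: dict mapping track_id -> face box in the first frame image
    faces:   list of face boxes detected in the second frame image
    Returns (matched, lost): matched maps track_id -> index into faces;
    lost lists track_ids whose IoU with every second-frame face is
    below the threshold, i.e. candidate occluded tracking targets.
    """
    matched, lost = {}, []
    for track_id, box in targets.items():
        scores = [iou(box, face) for face in faces]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else -1
        if best >= 0 and scores[best] >= third_threshold:
            matched[track_id] = best
        else:
            lost.append(track_id)  # the "first tracking target" case
    return matched, lost
```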
In the embodiment of the invention, the face prediction box information of the tracking target can be determined according to the image information of a predetermined number of frames before the second frame image in the first video stream, and an IoU calculation can be performed between the face prediction box information and the head-shoulder box information in the first frame image to determine the occlusion region, where the occlusion region represents the region in which the face information corresponding to the tracking target is occluded.
In the embodiment of the invention, after the tracking target is determined, the face prediction box information of the tracking target in the second frame image can be determined, and the region in which the tracking target is occluded can then be determined from the face prediction box information.
In the embodiment of the invention, after the tracking target is determined, the image information of a first predetermined number of frames before the second frame image in the first video stream can be determined, for example the image information of the 10 frames before the second frame image. The coordinates of the center position of the face detection box of the tracking target in that image information can then be calculated, so that the coordinates of the diagonal positions of the face prediction box of the tracking target in the second frame image can be determined from the coordinates of the center positions of the face detection box, and the face prediction box information of the tracking target obtained from the coordinates of the diagonal positions.
In a specific implementation, if the predetermined number of frames is 5, the image information of the 5 frames before the second frame image in the first video stream can be determined, the coordinates of the center position of the face detection box of the tracking target in those 5 frames can be computed, and the coordinates of the diagonal positions of the face prediction box of the tracking target in the second frame image can then be determined from them.
In practice, when faces in the images occlude one another, if only the image corresponding to the face detection box is used as the occlusion image, the face detection box is small and the occlusion region cannot be determined accurately, which can lead to wrong determination of the tracking target. The head-shoulder box is more stable to detect than the face, and the two complement each other, which greatly improves the stability of the detection algorithm adopted by the computer device. When the tracking target is not detected in the second frame image, the motion trajectory information of the tracking target's box over the adjacent 5 frames before the second frame image can be selected, so that the prediction box can be determined accurately.
Specifically, the coordinates of the diagonal positions of the face prediction box of the tracking target may be calculated with the following formula (reconstructed here from the variable definitions given in the original text):

$$x'_1 = x_1 + \frac{1}{n-1}\sum_{i=1}^{n-1}\left(x^{c}_{i+1}-x^{c}_{i}\right), \qquad y'_1 = y_1 + \frac{1}{n-1}\sum_{i=1}^{n-1}\left(y^{c}_{i+1}-y^{c}_{i}\right)$$

where X denotes the abscissa of a rectangular box and Y its ordinate; $x'_1$ and $y'_1$ are the abscissa and ordinate of the upper-left corner of the face prediction box; $x_1$ and $y_1$ are the abscissa and ordinate of the upper-left corner of the face detection box before the tracking target was occluded; $n$ is the predetermined number of frames of detection box information; $i$ is a counting index; $x^{c}_{i+1}-x^{c}_{i}$ is the difference between the abscissas of the center points of two adjacent frames; and $y^{c}_{i+1}-y^{c}_{i}$ is the difference between the ordinates of the center points of two adjacent frames. The other diagonal corner of the prediction box is obtained analogously, yielding the face prediction box information.
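A minimal sketch of this extrapolation, under the reconstruction above (the box is assumed to be stored as top-left and bottom-right corners, and history holds the face detection box centers of the n preceding frames):

```python
def predict_box(history, last_box):
    """Extrapolate the face prediction box for the current frame.

    history:  list of (cx, cy) face detection box centers over the
              n preceding frames, ordered oldest first
    last_box: (x1, y1, x2, y2) face detection box observed before the
              tracking target was occluded
    """
    n = len(history)
    if n < 2:
        return last_box  # no motion information to extrapolate from
    # Average displacement of the center between adjacent frames.
    dx = sum(history[i + 1][0] - history[i][0] for i in range(n - 1)) / (n - 1)
    dy = sum(history[i + 1][1] - history[i][1] for i in range(n - 1)) / (n - 1)
    x1, y1, x2, y2 = last_box
    # Shift both diagonal corners by the average motion.
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```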
Step 202: from the images within a predetermined number of frames after the second frame image in the first video stream, determine whether unmatched face information appears within the predetermined range corresponding to the occlusion region; if so, execute step 203, and if not, return to step 201.
Step 203: when unmatched face information appears, enlarge the detection area of the occlusion region by a predetermined multiple to obtain the first occlusion region information, and calculate the IoU of the first occlusion region information and the unmatched face information.
Step 204: if the IoU is greater than or equal to the first threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In the embodiment of the invention, after the occlusion region is determined, the tracking period of the tracking target can be determined. For example, 30 frames of images are tracked, or 20 frames, i.e. the predetermined number of frames may be 30 or 20, etc. Within the images of the predetermined number of frames after the second frame image, it is judged whether unmatched face information appears within the predetermined range corresponding to the occlusion region.
It should be noted that, in the embodiment of the present invention, the predetermined number of frames and the first predetermined number of frames are positive integers greater than 0, and they may be the same or different; the embodiment of the present invention is not limited in this respect.
In the embodiment of the present invention, when it is determined that unmatched face information appears within the predetermined range corresponding to the occlusion region in an image within the predetermined number of frames after the second frame image in the first video stream, the detection area of the occlusion region can be enlarged by a predetermined multiple; specifically, the predetermined multiple may be two times the length and width of the detection area of the occlusion region, or three times its length and width.
In a specific implementation, when the detection area of the occlusion region is enlarged by the predetermined multiple, the first occlusion region information is obtained. The IoU of the first occlusion region information and the unmatched face information can then be calculated, and when the IoU is greater than or equal to the first threshold, the unmatched face information is determined to be the face information corresponding to the tracking target.
For example, when the first threshold is 0.3 and the IoU of the first occlusion region information and the unmatched face information is 0.4, the unmatched face information can be determined to be the face information corresponding to the tracking target.
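For illustration, a sketch of the enlargement and IoU test follows, assuming the predetermined multiple is two and the first threshold is 0.3 (both taken from the examples in the text, not fixed by it); expand_box and reclaim_target are hypothetical helper names, and iou is the helper from the earlier sketch:

```python
def expand_box(box, factor=2.0):
    """Scale a box about its center, e.g. doubling its length and width."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * factor / 2.0, (y2 - y1) * factor / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def reclaim_target(occlusion_box, unmatched_face, first_threshold=0.3):
    """Decide whether an unmatched face appearing near the occlusion
    region belongs to the occluded tracking target (steps 203-204)."""
    first_occlusion_info = expand_box(occlusion_box)  # first occlusion region
    return iou(first_occlusion_info, unmatched_face) >= first_threshold
```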
In a specific implementation, referring to FIG. 3, panel 3.1 can be understood as the first frame image in the embodiment of the present invention, and the object with tracking ID 1 and a solid box in panel 3.1 can be taken as the first tracking target. Referring to panel 3.2, the dashed box is determined as the occlusion region described above, and panel 3.3 represents an image in which no unmatched face image information appears. When it is determined that unmatched face image information appears near the occlusion region, the occlusion region can be enlarged by the predetermined multiple and determined as the first occlusion region information, i.e. the dashed-box region of panel 3.4 in FIG. 3. The IoU of the face information corresponding to the solid box and the region information corresponding to the dashed box in panel 3.4 can then be calculated; when the IoU is determined to be greater than or equal to the first threshold, the face information corresponding to the solid box can be determined to be the tracking target, and the corresponding tracking ID, specifically 1, can be assigned to it.
Step 205: if the IoU is smaller than the first threshold, extract features of the unmatched face information, and perform a similarity calculation between the features of the unmatched face information and the features of the first face information, where the first face information is the face information corresponding to the tracking target determined according to the first frame image.
Step 206: if the similarity between the features of the unmatched face information and the features of the first face information is greater than or equal to the second threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In the embodiment of the invention, if the calculated IoU of the first occlusion region information and the unmatched face information is smaller than the first threshold, the features of the unmatched face information can be extracted, and a similarity calculation can be performed between the features of the unmatched face information and the features of the first face information, where the first face information represents the target face information in the first frame image that is not matched by association with the second frame image.
In a specific implementation, a deep network algorithm may be used to determine the feature vector of the unmatched face information and the feature vector of the first face information, so that the features of both can be extracted. When it is determined that the similarity between the features of the unmatched face information and the features of the first face information is greater than or equal to the second threshold, the unmatched face information can be determined to be the face information corresponding to the tracking target. Specifically, the second threshold may be 0.6, 0.7, or the like. That is, when unmatched face image information is detected within the predetermined number of frames after the second frame image, its features can be compared with those of the tracking target in the first frame image, i.e. it is determined whether that face image information is the tracking target.
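The patent does not name the comparison measure or the feature network; as one plausible reading, the sketch below compares the two feature vectors by cosine similarity, with the feature extraction itself left abstract and the 0.6 default taken from the example thresholds above:

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def is_same_face(unmatched_feature, first_face_feature, second_threshold=0.6):
    """Fallback check of steps 205-206: compare the unmatched face's
    features with the tracking target's features from the first frame."""
    return cosine_similarity(unmatched_feature, first_face_feature) >= second_threshold
```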
In the embodiment of the present invention, referring to FIG. 4, FIG. 4 is a schematic diagram of a tracking target determined by the tracking method of the embodiment of the present invention. In panel 4.1, the face detection box of the tracking target can be determined, i.e. the face image information of the tracking target. In panel 4.2, the occlusion box of the occlusion region corresponding to the tracking target can be determined, and the face image information corresponding to the tracking target in panel 4.3 can then be determined. The tracking method provided by the embodiment of the invention can thus accurately determine the tracking target in scenes where multiple faces occlude one another.
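Putting the pieces together, a highly simplified per-frame driver loop might look as follows; it reuses iou, associate, predict_box, and reclaim_target from the sketches above, and omits the feature-similarity fallback and the predetermined tracking period for brevity:

```python
def track_frame(targets, prev_heads, faces, histories, last_boxes):
    """One tracking step: associate, then try to reclaim lost targets.

    targets and prev_heads come from the previous (first) frame image;
    faces are the face boxes detected in the current (second) frame.
    """
    matched, lost = associate(targets, faces)
    for track_id in lost:
        pred = predict_box(histories[track_id], last_boxes[track_id])
        # Occlusion region: any head-shoulder box overlapping the prediction.
        occluders = [head for head in prev_heads if iou(pred, head) > 0.0]
        unmatched = [i for i in range(len(faces)) if i not in matched.values()]
        for i in unmatched:
            if any(reclaim_target(head, faces[i]) for head in occluders):
                matched[track_id] = i  # reclaimed: same tracking ID kept
                break
    return matched
```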
Based on the same inventive concept, the embodiment of the invention provides a tracking device which can realize the corresponding functions of the tracking method. The tracking means may be a hardware structure, a software module, or a hardware structure plus a software module. The tracking device may be implemented by a chip system, which may be formed by a chip, or may include a chip and other discrete devices. Referring to fig. 5, the tracking device includes a determining module 501, a judging module 502, and a tracking module 503. Wherein:
The determining module 501 is configured to determine a tracking target and the face prediction box information of the tracking target, and to perform an IoU calculation between the face prediction box information and a plurality of pieces of head-shoulder box information in the first frame image to determine the occlusion region, where the tracking target represents face information in the first frame image that is not matched by association with the face information in the second frame image, the first frame image and the second frame image belong to a first video stream, the first video stream comprises multiple frame images, and the second frame image is adjacent to the first frame image;
the judging module 502 is configured to determine, from images within a predetermined number of frames after the second frame image, whether unmatched face information appears within the predetermined range corresponding to the occlusion region;
the tracking module 503 is configured to, when unmatched face information appears, enlarge the detection area of the occlusion region by a predetermined multiple to obtain the first occlusion region information, and calculate the IoU of the first occlusion region information and the unmatched face information; and if the IoU is greater than or equal to the first threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In a possible implementation, the tracking module 503 is configured to:
if the IoU is smaller than the first threshold, extract features of the unmatched face information, and perform a similarity calculation between the features of the unmatched face information and features of first face information, where the first face information is the face information corresponding to the tracking target determined according to the first frame image;
and if the similarity between the features of the unmatched face information and the features of the first face information is greater than or equal to the second threshold, determine that the unmatched face information is the face information corresponding to the tracking target.
In a possible implementation manner, the determining module 501 is configured to:
receive an input first frame image and second frame image, wherein the first frame image and the second frame image each comprise a plurality of pieces of face information and a plurality of pieces of head-shoulder region information;
and perform association matching on the first frame image and the second frame image to determine the tracking target, and determine the face prediction box information of the tracking target according to image information of a first predetermined number of frames before the second frame image in the first video stream.
In a possible implementation manner, the determining module 501 is configured to:
detect the first frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information, and take the pieces of face image information that have a first association matching relationship with the pieces of head-shoulder box information as a plurality of tracking targets;
detect the second frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder box region information;
perform IoU calculations between the tracking targets and the face image information in the second frame image, to determine whether the tracking targets are matched by association with the face image information in the second frame image;
and if the IoU between a first tracking target and every piece of face image information in the second frame image is smaller than the third threshold, determine the first tracking target as the tracking target, where the first tracking target represents face information that is not matched by association with any of the face image information in the second frame image.
In a possible implementation manner, the determining module 501 is configured to:
determine the image information of a first predetermined number of frames before the second frame image, and calculate coordinates of the center position of the face detection box of the tracking target in that image information;
and determine coordinates of the diagonal positions of the face prediction box of the tracking target in the second frame image according to the coordinates of the center positions of the face detection box, so as to obtain the face prediction box information of the tracking target from the coordinates of the diagonal positions.
For all relevant content of the steps of the tracking method embodiment shown in FIG. 2, reference may be made to the functional descriptions of the corresponding functional modules of the tracking device in the embodiment of the present invention, and the details are not repeated here.
The division of the units in the embodiments of the present invention is schematic and merely a division by logical function; there may be other division manners in actual implementation. In addition, the functional units in the embodiments of the present invention may be integrated into one processor or may exist separately and physically, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or as software functional units.
Based on the same inventive concept, the embodiment of the present invention further provides a computer device. As shown in FIG. 6, the computer device in the embodiment of the present invention includes at least one processor 601, and a memory 602 and a communication interface 603 connected to the at least one processor 601, where the communication interface 603 can be understood as a collective term for the I/O interfaces in FIG. 1. The embodiment of the present invention does not limit the specific connection medium between the processor 601 and the memory 602; FIG. 6 takes as an example the connection between the processor 601 and the memory 602 through the bus 600, shown as a thick line, and the connection manner between other components is merely illustrative and not limiting. The bus 600 may be divided into an address bus, a data bus, a control bus, etc.; it is represented by only one thick line in FIG. 6 for convenience of representation, but this does not mean there is only one bus or one type of bus.
In the embodiment of the present invention, the memory 602 stores instructions executable by the at least one processor 601, and the at least one processor 601 may perform the steps included in the tracking method by executing the instructions stored in the memory 602.
The processor 601 is the control center of the computer device; it can use various interfaces and lines to connect the various parts of the entire computer device and, by running or executing the instructions stored in the memory 602 and calling the data stored in the memory 602, perform the various functions of the computer device and process its data, thereby monitoring the computer device as a whole. Optionally, the processor 601 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip; in other embodiments, they may be implemented separately on their own chips.
The processor 601 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 602, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 602 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. More generally, the memory 602 may be, without limitation, any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 602 in the embodiments of the present invention may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data. The communication interface 603 is a transmission interface that can be used for communication; data can be received or transmitted through the communication interface 603.
Referring to the further structural schematic diagram of the computer device shown in fig. 7, the computer device further includes a basic input/output system (I/O system) 701, which facilitates the transfer of information between the components within the computer device, and a mass storage device 705, which stores an operating system 702, application programs 703, and other program modules 704.
The basic input/output system 701 includes a display 706 for displaying information and an input device 707, such as a mouse or keyboard, through which a user inputs information. Both the display 706 and the input device 707 are connected to the processor 601 through the basic input/output system 701, which is connected to the system bus 600. The basic input/output system 701 may also include an input/output controller for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller also provides output to a display screen, a printer, or another type of output device.
The mass storage device 705 is connected to the processor 601 through a mass storage controller (not shown) connected to the system bus 600. The mass storage device 705 and its associated computer-readable media provide non-volatile storage for the computer device; that is, the mass storage device 705 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
According to various embodiments of the present invention, the computer device may also run by means of a remote computer connected through a network, such as the Internet. That is, the computer device may connect to the network 708 through the communication interface 603 connected to the system bus 600, or may connect to another type of network or remote computer system (not shown) using the communication interface 603.
In an exemplary embodiment, a storage medium including instructions is also provided, for example a memory 602 comprising instructions executable by the processor 601 of the apparatus to perform the above method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In some possible embodiments, aspects of the tracking method provided by the present invention may also be implemented in the form of a program product including program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps of the tracking method according to the various exemplary embodiments of the present invention described in this specification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
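Purely as a non-limiting illustration of the occlusion-handling logic described in this specification, the following Python sketch shows an intersection ratio test in which an occlusion region is expanded by a preset multiple and an unassociated face frame is re-assigned to the occluded tracking target when the intersection ratio reaches a threshold. The expansion factor, the threshold value, and the function names iou, expand, and reassociate are assumptions chosen for the example, not values or identifiers prescribed by the embodiments.

# Illustrative sketch only: intersection-ratio (IoU) based re-association of
# an occluded tracking target. Frames are (x1, y1, x2, y2) tuples; the
# default factor and threshold below are example values, not the "preset
# multiple" or "first threshold" of any particular embodiment.
def iou(a, b):
    """Intersection ratio of two frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def expand(frame, factor):
    """Scale a frame about its center by the given factor."""
    cx, cy = (frame[0] + frame[2]) / 2.0, (frame[1] + frame[3]) / 2.0
    hw = (frame[2] - frame[0]) * factor / 2.0
    hh = (frame[3] - frame[1]) * factor / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def reassociate(occlusion_region, unassociated_face, factor=1.5, threshold=0.1):
    """True if the unassociated face frame should be taken as the face
    corresponding to the occluded tracking target."""
    first_occlusion_region = expand(occlusion_region, factor)
    return iou(first_occlusion_region, unassociated_face) >= threshold

With these example values, reassociate((100, 100, 160, 160), (118, 112, 150, 148)) returns True: the face frame lies inside the expanded region (85, 85, 175, 175) and the intersection ratio is about 0.14, which clears the illustrative threshold of 0.1.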

Claims (9)

1. A method of tracking, the method comprising:
determining a tracking target and face prediction frame information of the tracking target, and performing intersection ratio calculation on the face prediction frame information and a plurality of pieces of head-shoulder frame information in a first frame image to determine an occlusion region, wherein the tracking target represents the information corresponding to the face image information in the first frame image that is not associated and matched with the face information in a second frame image, the first frame image and the second frame image belong to a first video stream, the first video stream comprises a plurality of frame images, and the second frame image is adjacent to the first frame image; the occlusion region represents the region in which the face information corresponding to the tracking target is occluded;
determining, from the images of a preset number of frames following the second frame image, whether unassociated face information appears within a preset range corresponding to the occlusion region;
when unassociated face information appears, expanding the detection area of the occlusion region by a preset multiple to obtain first occlusion region information, and calculating an intersection ratio of the first occlusion region information and the unassociated face information;
if the intersection ratio is greater than or equal to a first threshold value, determining that the unassociated face information is the face information corresponding to the tracking target;
wherein the step of determining the face prediction frame information of the tracking target comprises:
determining the images of a first preset number of frames preceding the second frame image in the first video stream, and calculating the coordinates of the center of the face detection frame of the tracking target in each of those images;
and determining the coordinates of the diagonal corners of the face prediction frame of the tracking target in the second frame image according to the coordinates of the centers of the face detection frames, so as to obtain the face prediction frame information of the tracking target from the coordinates of the diagonal corners.
2. The method of claim 1, wherein the method further comprises:
if the intersection ratio is smaller than the first threshold, extracting features of the unassociated face information, and performing intersection ratio calculation on the features of the unassociated face information and features of first face information, wherein the first face information is the face information corresponding to the tracking target determined according to the first frame image;
and if the intersection ratio of the features of the unassociated face information and the features of the first face information is greater than or equal to a second threshold, determining that the unassociated face information is the face information corresponding to the tracking target.
3. The method of claim 1, wherein determining a tracking target and the face prediction frame information of the tracking target comprises:
receiving an input first frame image and second frame image, wherein the first frame image and the second frame image each comprise a plurality of pieces of face information and a plurality of pieces of head-shoulder region information;
and performing association matching on the first frame image and the second frame image to determine a tracking target, and determining the face prediction frame information of the tracking target according to the images of the first preset number of frames preceding the second frame image in the first video stream.
4. The method of claim 3, wherein performing association matching on the first frame image and the second frame image to determine a tracking target comprises:
detecting the first frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder frame region information, and taking the pieces of face image information that have a first association matching relationship with the pieces of head-shoulder frame information as a plurality of tracking targets;
detecting the second frame image to obtain a plurality of pieces of face image information and a plurality of pieces of head-shoulder frame region information;
performing intersection ratio calculation on the tracking targets and the pieces of face image information in the second frame image to determine whether the tracking targets are associated and matched with the face image information in the second frame image;
and if the intersection ratios of a first tracking target and the pieces of face image information in the second frame image are all smaller than a third threshold, determining the first tracking target as a tracking target, wherein the first tracking target represents face information that is not associated and matched with the pieces of face image information in the second frame image.
5. A tracking device, the device comprising:
a determination module, configured to determine a tracking target and face prediction frame information of the tracking target, and to perform intersection ratio calculation on the face prediction frame information and a plurality of pieces of head-shoulder frame information in a first frame image to determine an occlusion region, wherein the tracking target represents the information corresponding to the face image information in the first frame image that is not associated and matched with the face information in a second frame image, the first frame image and the second frame image belong to a first video stream, the first video stream comprises a plurality of frame images, and the second frame image is adjacent to the first frame image; the occlusion region represents the region in which the face information corresponding to the tracking target is occluded;
a judging module, configured to determine, from the images of a preset number of frames following the second frame image, whether unassociated face information appears within a preset range corresponding to the occlusion region;
a tracking module, configured to: when unassociated face information appears, expand the detection area of the occlusion region by a preset multiple to obtain first occlusion region information, and calculate an intersection ratio of the first occlusion region information and the unassociated face information; and if the intersection ratio is greater than or equal to a first threshold, determine that the unassociated face information is the face information corresponding to the tracking target;
wherein the determination module is configured to:
determine the images of a first preset number of frames preceding the second frame image in the first video stream, and calculate the coordinates of the center of the face detection frame of the tracking target in each of those images;
and determine the coordinates of the diagonal corners of the face prediction frame of the tracking target in the second frame image according to the coordinates of the centers of the face detection frames, so as to obtain the face prediction frame information of the tracking target from the coordinates of the diagonal corners.
6. The apparatus of claim 5, wherein the tracking module is further configured to:
if the intersection ratio is smaller than the first threshold, extract features of the unassociated face information, and perform intersection ratio calculation on the features of the unassociated face information and features of first face information, wherein the first face information is the face information corresponding to the tracking target determined according to the first frame image;
and if the intersection ratio of the features of the unassociated face information and the features of the first face information is greater than or equal to a second threshold, determine that the unassociated face information is the face information corresponding to the tracking target.
7. The apparatus of claim 5, wherein the determination module is configured to:
receive an input first frame image and second frame image, wherein the first frame image and the second frame image each comprise a plurality of pieces of face information and a plurality of pieces of head-shoulder region information;
and perform association matching on the first frame image and the second frame image to determine a tracking target, and determine the face prediction frame information of the tracking target according to the images of the first preset number of frames preceding the second frame image in the first video stream.
8. A computer device, the computer device comprising:
a memory for storing program instructions;
a processor for invoking the program instructions stored in the memory and executing, according to the obtained program instructions, the steps included in the method of any one of claims 1-4.
9. A storage medium storing computer-executable instructions for causing a computer to perform the steps comprised by the method of any one of claims 1-4.
CN202010195529.9A 2020-03-19 2020-03-19 Tracking method and device and computer equipment Active CN111428607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010195529.9A CN111428607B (en) 2020-03-19 2020-03-19 Tracking method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111428607A CN111428607A (en) 2020-07-17
CN111428607B true CN111428607B (en) 2023-04-28

Family

ID=71548521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010195529.9A Active CN111428607B (en) 2020-03-19 2020-03-19 Tracking method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111428607B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037253A (en) * 2020-08-07 2020-12-04 浙江大华技术股份有限公司 Target tracking method and device thereof
CN112163568B (en) * 2020-10-28 2023-02-28 成都中科大旗软件股份有限公司 Scenic spot person searching system based on video detection
CN112507786B (en) * 2020-11-03 2022-04-08 浙江大华技术股份有限公司 Human body multi-part detection frame association method and device, electronic device and storage medium
CN112614154B (en) * 2020-12-08 2024-01-19 深圳市优必选科技股份有限公司 Target tracking track acquisition method and device and computer equipment
CN113096155B (en) * 2021-04-21 2023-01-17 青岛海信智慧生活科技股份有限公司 Community multi-feature fusion target tracking method and device
CN113177968A (en) * 2021-04-27 2021-07-27 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113516093A (en) * 2021-07-27 2021-10-19 浙江大华技术股份有限公司 Marking method and device of identification information, storage medium and electronic device
CN113486852B (en) * 2021-07-28 2023-04-18 浙江大华技术股份有限公司 Face and human body association method and device
CN113793365B (en) * 2021-11-17 2022-04-29 第六镜科技(成都)有限公司 Target tracking method and device, computer equipment and readable storage medium
CN114219832B (en) * 2021-11-29 2023-04-07 浙江大华技术股份有限公司 Face tracking method and device and computer readable storage medium
CN114882491B (en) * 2022-07-11 2022-10-25 浙江大华技术股份有限公司 Non-motor vehicle target tracking method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011060038A (en) * 2009-09-10 2011-03-24 Seiko Epson Corp Image processing apparatus
US10027883B1 (en) * 2014-06-18 2018-07-17 Amazon Technologies, Inc. Primary user selection for head tracking
CN110147717A (en) * 2019-04-03 2019-08-20 平安科技(深圳)有限公司 A kind of recognition methods and equipment of human action
CN110163889A (en) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 Method for tracking target, target tracker, target following equipment
CN110210285A (en) * 2019-04-16 2019-09-06 浙江大华技术股份有限公司 Face tracking method, face tracking device and computer storage medium

Similar Documents

Publication Publication Date Title
CN111428607B (en) Tracking method and device and computer equipment
CN107292949B (en) Three-dimensional reconstruction method and device of scene and terminal equipment
Shen et al. Exemplar-based human action pose correction and tagging
CN108762505B (en) Gesture-based virtual object control method and device, storage medium and equipment
CN109598744A (en) A kind of method, apparatus of video tracking, equipment and storage medium
CN107633208B (en) Electronic device, the method for face tracking and storage medium
CN104217417B (en) A kind of method and device of video multi-target tracking
US20150104067A1 (en) Method and apparatus for tracking object, and method for selecting tracking feature
CN112465855B (en) Passenger flow statistical method, device, storage medium and equipment
CN111553234B (en) Pedestrian tracking method and device integrating facial features and Re-ID feature ordering
CN109783680B (en) Image pushing method, image acquisition device and image processing system
CN112380960A (en) Crowd counting method, device, equipment and storage medium
CN109697385A (en) A kind of method for tracking target and device
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
Delussu et al. Scene-specific crowd counting using synthetic training images
CN109508575A (en) Face tracking method and device, electronic equipment and computer readable storage medium
Mahurkar Integrating yolo object detection with augmented reality for ios apps
CN113256683B (en) Target tracking method and related equipment
CN109523573A (en) The tracking and device of target object
CN113298852A (en) Target tracking method and device, electronic equipment and computer readable storage medium
CN112819889A (en) Method and device for determining position information, storage medium and electronic device
CN108874269A (en) A kind of method for tracking target, apparatus and system
CN113469993A (en) Method and device for detecting abnormal object in motion state and electronic equipment
Rimboux et al. Smart IoT cameras for crowd analysis based on augmentation for automatic pedestrian detection, simulation and annotation
CN110838134A (en) Target object statistical method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant