CN111709975A - Multi-target tracking method and device, electronic equipment and storage medium

Info

Publication number
CN111709975A
CN111709975A (application number CN202010573301.9A)
Authority
CN
China
Prior art keywords: target, determined, matched, frame, determined target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010573301.9A
Other languages
Chinese (zh)
Other versions
CN111709975B (en)
Inventor
苏军
鲁兴龙
吴昊
谢锴
刘晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Goldway Intelligent Transportation System Co Ltd
Original Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Goldway Intelligent Transportation System Co Ltd
Priority to CN202010573301.9A
Publication of CN111709975A
Application granted
Publication of CN111709975B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/246: Image analysis > Analysis of motion > Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/22: Pattern recognition > Analysing > Matching criteria, e.g. proximity measures
    • G06T 7/73: Image analysis > Determining position or orientation of objects or cameras > using feature-based methods
    • G06T 2207/10016: Image acquisition modality > Video; image sequence
    • G06T 2207/30241: Subject of image > Trajectory
    • G06V 2201/07: Image or video recognition or understanding > Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a multi-target tracking method and apparatus, an electronic device, and a storage medium. A prediction frame of each determined target in the current video frame is predicted from that target's track information; the prediction frame is then matched against the candidate frame group of each target to be matched, and an NMS operation is performed on each successfully matched candidate frame group. This yields, for each target to be matched that was successfully matched with a determined target, its target frame in the current video frame, i.e., the actual position of that target. Because the actual position of a target to be matched is determined by combining the track information of the determined targets with a target detection method, its accuracy is improved. In particular, for multi-target scenes in which the targets are highly similar, introducing the track information of the determined targets when determining the actual position of a target to be matched reduces target matching errors and improves the accuracy of multi-target tracking.

Description

Multi-target tracking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image analysis technologies, and in particular, to a multi-target tracking method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer vision technology, and in particular the emergence of deep learning networks, automatic target tracking based on video data has become possible.
In the related art, during multi-target tracking with video data, for a current video frame to be processed, target detection is first performed on the current video frame using computer vision technology to obtain the actual positions of the targets in the current video frame (hereinafter, second targets), and the visual features of each second target are extracted from the current video frame. Then, according to the historical track of each target in the previous video frames (hereinafter, first targets), the predicted position of each first target in the current video frame is predicted and the visual features of each first target are acquired. The position similarity between the predicted position of each first target and the actual position of each second target, and the visual similarity between the visual features of each first target and each second target, are then calculated. Combining the position similarity and the visual similarity, a first target and a second target are judged to be the same target, and the actual position of the second target is taken as the position of the corresponding first target in the current video frame, thereby completing target tracking.
However, this method depends too heavily on target detection. Because of changes in the target's posture, viewing angle, and the like during actual motion, target detection can be inaccurate, so the detected actual position of each second target is inaccurate. In particular, for multi-target scenes in which the targets are highly similar, target matching errors occur very easily, affecting the accuracy of multi-target tracking.
Disclosure of Invention
An object of the embodiments of the present application is to provide a multi-target tracking method, apparatus, electronic device and storage medium, so as to improve the accuracy of multi-target tracking. The specific technical solution is as follows:
in a first aspect, an embodiment of the present application provides a multi-target tracking method, where the method includes:
in the process of tracking the target in the video, acquiring track information of each determined target before the current video frame;
predicting the position area of each determined target in the current video frame according to the track information of each determined target to respectively obtain a prediction frame of each determined target;
performing target detection on the current video frame by using a computer vision technology to obtain a candidate frame group of each target to be matched in the current video frame, wherein, for any target to be matched, the candidate frame group of that target comprises a plurality of candidate frames of the target;
matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target;
respectively carrying out non-maximum suppression NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, to obtain a target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame;
and determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
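To make the flow of these steps concrete, the following minimal Python sketch outlines one possible per-frame loop. It is a structural illustration only, under the assumption that the individual steps are supplied as callables; none of the names below come from the disclosure itself.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an illustrative box format

def track_frame(
    frame,                                    # current video frame
    trajectories: Dict[int, List[Box]],       # determined target id -> boxes so far
    predict: Callable[[List[Box]], Box],      # track information -> prediction frame
    detect: Callable[..., List[List[Box]]],   # frame -> candidate frame groups
    match: Callable[..., Dict[int, int]],     # predictions x groups -> matches
    nms: Callable[[List[Box]], Box],          # candidate frame group -> target frame
) -> Dict[int, Box]:
    """Return the target frame of each successfully matched determined target."""
    predictions = {tid: predict(traj) for tid, traj in trajectories.items()}
    candidate_groups = detect(frame)
    matches = match(predictions, candidate_groups)        # target id -> group index
    target_boxes = {tid: nms(candidate_groups[g]) for tid, g in matches.items()}
    for tid, box in target_boxes.items():                 # extend each track
        trajectories[tid].append(box)
    return target_boxes
```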
In a possible implementation manner, the matching, according to the prediction frame of each determined target and the candidate frame group of each target to be matched, each determined target and each target to be matched, and determining the target to be matched for which the matching of each determined target is successful, respectively, includes:
aiming at any determined target, selecting each target to be matched with the same type as the determined target from the targets to be matched to obtain each target to be matched with the same type of the determined target;
selecting each candidate frame with the distance from the prediction frame of the determined target being smaller than a preset distance threshold value from each candidate frame group of the same type of targets to be matched of the determined target to obtain each target candidate frame of the determined target;
and respectively calculating the intersection over union (IoU) between the prediction frame of the determined target and each target candidate frame of the determined target, and selecting the target candidate frame whose IoU is largest and greater than a preset IoU threshold; the target to be matched to which that candidate frame belongs is the target successfully matched with the determined target, so that the successfully matched target to be matched is obtained for each determined target.
In a possible implementation manner, after the matching the determined targets and the objects to be matched according to the prediction frame of each determined target and the candidate frame group of each object to be matched, and determining the objects to be matched for which the determined targets are successfully matched, respectively, the method further includes:
and for any target to be matched that was not successfully matched with a determined target, performing an NMS operation on the candidate frame group of that target to obtain the detection frame of a new determined target.
In a possible implementation manner, the determining, according to the track information of each determined target and the target frame of each target to be matched in the current video frame, the track of each determined target in the current video frame includes:
acquiring true value characteristic information of each determined target according to the track information of each determined target; respectively extracting feature information in a target frame of each target to be matched, which is successfully matched with the determined target, from the current video frame to obtain feature information to be matched of each determined target;
for any determined target, carrying out feature matching on the true value feature information of the determined target and the feature information to be matched;
and if the true value characteristic information of the determined target is successfully matched with the characteristic information to be matched, adding a target frame corresponding to the determined target in the current video frame into the track information of the determined target.
In a possible embodiment, after the performing, for any determined target, feature matching between the true-value feature information of the determined target and the feature information to be matched, the method further includes:
and if the true-value feature information of the determined target is not successfully matched with the feature information to be matched, taking the target frame corresponding to the determined target as the detection frame of a new determined target.
In one possible embodiment, the method further comprises:
calculating the target similarity between any two determined targets which have the same type and are not overlapped in the time sequence of the track information;
and merging the track information of each determined target determined as the same target according to the similarity of each target.
In a possible embodiment, the calculating the object similarity between any two determined objects which are of the same type and have no coincidence in time sequence of the trajectory information includes:
selecting two determined targets which have the same type and are not overlapped in time sequence of the track information to obtain a first determined target and a second determined target;
predicting a predicted position of the first determined target in a designated video frame of the second determined target according to the track information of the first determined target; obtaining a true position of the second determined target in the designated video frame;
calculating the IoU of the true position and the predicted position to obtain a target IoU;
calculating normalized distances of the true position and the predicted position;
calculating a trajectory appearance similarity of the first determined target and the second determined target;
when the target IoU is greater than a preset IoU threshold, the normalized distance is less than a preset distance threshold, and the trajectory appearance similarity is greater than a preset first similarity threshold, integrating the target IoU, the normalized distance, and the trajectory appearance similarity and calculating the target similarity of the first determined target and the second determined target, thereby obtaining the target similarity between any two determined targets that are of the same type and whose track information has no temporal overlap.
In one possible embodiment, the length of the trajectory information of the first determined target is greater than the length of the trajectory information of the second determined target.
In a possible implementation manner, the merging trajectory information of the determined targets determined as the same target according to the similarity of the targets includes:
connecting all determined targets which are the same in type and have no superposition on the track information in the time sequence into a directed graph according to the time sequence;
for any two determined targets that are of the same type and whose track information has no temporal overlap, taking the negative of the target similarity between the two determined targets as the weight of their path on the directed graph, thereby obtaining the weight of each path in the directed graph, wherein the weight of the path between two determined targets with no target similarity is infinite;
solving the directed graph by using a network flow algorithm to obtain a plurality of paths, wherein each determined target in the same path is determined as the same target;
and merging the track information of each determined target in the same path according to a time sequence to obtain the tracking track of each target.
In a second aspect, an embodiment of the present application provides a multi-target tracking apparatus, including:
the track information acquisition module is used for acquiring track information of each determined target before the current video frame in the process of tracking the target in the video;
the target position prediction module is used for predicting the position area of each determined target in the current video frame according to the track information of each determined target to respectively obtain a prediction frame of each determined target;
the target position detection module is used for performing target detection on the current video frame by using a computer vision technology to obtain a candidate frame group of each target to be matched in the current video frame, wherein, for any target to be matched, the candidate frame group of that target comprises a plurality of candidate frames of the target;
the target position matching module is used for matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target;
the target position determining module is used for respectively carrying out non-maximum suppression NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, so as to obtain a target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame;
and the tracking track determining module is used for determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
In a possible implementation manner, the target location matching module is specifically configured to:
aiming at any determined target, selecting each target to be matched with the same type as the determined target from the targets to be matched to obtain each target to be matched with the same type of the determined target;
selecting each candidate frame with the distance from the prediction frame of the determined target being smaller than a preset distance threshold value from each candidate frame group of the same type of targets to be matched of the determined target to obtain each target candidate frame of the determined target;
and respectively calculating the intersection over union (IoU) between the prediction frame of the determined target and each target candidate frame of the determined target, and selecting the target candidate frame whose IoU is largest and greater than a preset IoU threshold; the target to be matched to which that candidate frame belongs is the target successfully matched with the determined target, so that the successfully matched target to be matched is obtained for each determined target.
In a possible embodiment, the apparatus further comprises:
and the first new target determining module is used for performing, for any target to be matched that was not successfully matched with a determined target, an NMS operation on the candidate frame group of that target to obtain the detection frame of a new determined target.
In one possible implementation, the tracking trajectory determination module includes:
the characteristic information acquisition submodule is used for acquiring true value characteristic information of each determined target according to the track information of each determined target; respectively extracting feature information in a target frame of each target to be matched, which is successfully matched with the determined target, from the current video frame to obtain feature information to be matched of each determined target;
the characteristic information matching submodule is used for carrying out characteristic matching on the true value characteristic information of the determined target and the characteristic information to be matched aiming at any determined target;
and the track information updating submodule is used for adding a target frame corresponding to the determined target in the current video frame into the track information of the determined target if the true value characteristic information of the determined target is successfully matched with the characteristic information to be matched.
In a possible embodiment, the apparatus further comprises:
and the second new target determining module is used for taking the target frame corresponding to the determined target as the detection frame of a new determined target if the true-value feature information of the determined target is not successfully matched with the feature information to be matched.
In a possible embodiment, the apparatus further comprises:
the target similarity calculation module is used for calculating the target similarity between any two determined targets which are the same in type and have no coincidence in time sequence of the track information;
and the track information merging module is used for merging the track information of each determined target determined as the same target according to the similarity of each target.
In a possible implementation manner, the target similarity calculation module is specifically configured to:
selecting two determined targets which have the same type and are not overlapped in time sequence of the track information to obtain a first determined target and a second determined target;
predicting a predicted position of the first determined target in a designated video frame of the second determined target according to the track information of the first determined target; obtaining a true position of the second determined target in the designated video frame;
calculating the IoU of the true position and the predicted position to obtain a target IoU;
calculating normalized distances of the true position and the predicted position;
calculating a trajectory appearance similarity of the first determined target and the second determined target;
when the target IoU is greater than a preset IoU threshold, the normalized distance is less than a preset distance threshold, and the trajectory appearance similarity is greater than a preset first similarity threshold, integrating the target IoU, the normalized distance, and the trajectory appearance similarity and calculating the target similarity of the first determined target and the second determined target, thereby obtaining the target similarity between any two determined targets that are of the same type and whose track information has no temporal overlap.
In one possible embodiment, the length of the trajectory information of the first determined target is greater than the length of the trajectory information of the second determined target.
In a possible implementation manner, the track information merging module is specifically configured to:
connecting all determined targets which are the same in type and have no superposition on the track information in the time sequence into a directed graph according to the time sequence;
for any two determined targets that are of the same type and whose track information has no temporal overlap, taking the negative of the target similarity between the two determined targets as the weight of their path on the directed graph, thereby obtaining the weight of each path in the directed graph, wherein the weight of the path between two determined targets with no target similarity is infinite;
solving the directed graph by using a network flow algorithm to obtain a plurality of paths, wherein each determined target in the same path is determined as the same target;
and merging the track information of each determined target in the same path according to a time sequence to obtain the tracking track of each target.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;
the memory is used for storing a computer program;
the processor is used for realizing any one of the multi-target tracking methods when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any one of the multi-target tracking methods described above.
In a fifth aspect, the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any one of the multi-target tracking methods described above.
In the multi-target tracking method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, during tracking of targets in a video, track information of each determined target before the current video frame is acquired; the position area of each determined target in the current video frame is predicted according to its track information, giving a prediction frame for each determined target; target detection is performed on the current video frame using a computer vision technology to obtain a candidate frame group of each target to be matched in the current video frame, where, for any target to be matched, the candidate frame group comprises a plurality of candidate frames of that target; each determined target and each target to be matched are matched according to the prediction frames and the candidate frame groups, and the target to be matched that each determined target successfully matches is determined; an NMS operation is performed on the candidate frame group of each successfully matched target to be matched, giving the target frame of that target in the current video frame; and the track of each determined target in the current video frame is determined according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
In the embodiment of the application, according to the track information of the target, a prediction frame of each determined target in the current video frame is predicted, then the prediction frame is used for matching with a candidate frame group of each target to be matched, and NMS operation is performed on the candidate frame group which is successfully matched, so that a target frame of each target to be matched which is successfully matched with the determined target in the current video frame is obtained, namely the actual position of the target to be matched which is successfully matched with the determined target. The method comprises the steps of determining the actual position of a target to be matched by combining track information of the determined target and a target detection method, and compared with the method of determining the actual position of the target to be matched only by using the target detection method, the method can improve the accuracy of the actual position of the target to be matched. Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a first schematic diagram of a multi-target tracking method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a target location matching method according to an embodiment of the present application;
FIG. 3 is a second schematic diagram of a multi-target tracking method according to an embodiment of the present application;
FIG. 4 is a first schematic diagram of a target track information updating method according to an embodiment of the present application;
FIG. 5 is a second schematic diagram of a target track information updating method according to an embodiment of the present application;
fig. 6 is a first schematic diagram of a target track information merging method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a target similarity calculation method according to an embodiment of the present application;
FIG. 8 is a schematic illustration of a first determined objective and a second determined objective in accordance with an embodiment of the present application;
fig. 9 is a second schematic diagram of a target track information merging method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a directed graph according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a multi-target tracking apparatus according to an embodiment of the present application;
fig. 12 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
Multi-target tracking: locating specified targets in each frame of a sequence of images, establishing correspondences among the specified targets across frames, and outputting the motion track of each target, where the specified targets are any targets of interest to the user, such as motor vehicles, non-motor vehicles, and pedestrians.
Network flow algorithm: a graph-theoretic algorithm that computes the maximum/minimum flow through a network according to certain rules and outputs the nodes that this flow passes through.
NMS (Non-Maximum Suppression): eliminates the redundant boxes among the candidate boxes according to a rule, suppressing the elements that are not local maxima so as to find the local optimum.
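For reference, a minimal greedy NMS over scored boxes might look as follows; the (x1, y1, x2, y2) box format and the 0.5 default threshold are illustrative assumptions, not values taken from the disclosure.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard every box that overlaps it too much, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep  # indices of the retained boxes
```

The iou helper defined here is reused by the matching sketch further below.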
In the related art, when multi-target tracking is performed, target detection is performed on the current video frame using computer vision technology, thereby obtaining the actual position of each second target in the current video frame. During detection, the actual position of each second target is chosen based on the confidence, but the confidence is not completely reliable: the position of the target frame with the maximum confidence may not be optimal with respect to the historical track. The actual position of a second target is therefore detected inaccurately, causing subsequent matching problems; in particular, for multi-target scenes in which the targets are highly similar, target matching errors occur very easily, affecting the accuracy of multi-target tracking.
In view of this, an embodiment of the present application provides a multi-target tracking method, where the method includes:
in the process of tracking the target in the video, acquiring track information of each determined target before the current video frame;
predicting the position area of each determined target in the current video frame according to the track information of each determined target, and respectively obtaining a prediction frame of each determined target;
performing target detection on the current video frame by using a computer vision technology to obtain a candidate frame group of each target to be matched in the current video frame, wherein, for any target to be matched, the candidate frame group of that target comprises a plurality of candidate frames of the target;
matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target;
respectively carrying out NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, to obtain a target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame;
and determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
In the embodiment of the application, according to the track information of the target, a prediction frame of each determined target in the current video frame is predicted, then the prediction frame is used for matching with a candidate frame group of each target to be matched, and NMS operation is performed on the candidate frame group which is successfully matched, so that a target frame of each target to be matched which is successfully matched with the determined target in the current video frame is obtained, namely the actual position of the target to be matched which is successfully matched with the determined target. The method comprises the steps of determining the actual position of a target to be matched by combining track information of the determined target and a target detection method, and compared with the method of determining the actual position of the target to be matched only by using the target detection method, the method can improve the accuracy of the actual position of the target to be matched.
Referring to fig. 1, fig. 1 is a schematic diagram of a multi-target tracking method according to an embodiment of the present application, including:
s101, in the process of tracking the target in the video, acquiring track information of each determined target before the current video frame.
The multi-target tracking method can be realized through electronic equipment, and specifically, the electronic equipment can be an intelligent camera, a hard disk video recorder, a server or a personal computer and the like.
Target refers to any target that a user wishes to track, such as a motor vehicle, non-motor vehicle, or pedestrian. A determined target is a target that has already been detected. If the current video frame, that is, the video frame currently to be processed, is the first frame of the video, no determined targets or track information exist before it, so when the first video frame is analyzed, the target frame of each target can be detected directly using a related-art target detection method. The determined targets here may specifically be the determined targets contained in the video frame previous to the current video frame.
And S102, predicting the position area of each determined target in the current video frame according to the track information of each determined target, and respectively obtaining the prediction frame of each determined target.
The position of the determined target in the current video frame can be predicted using any relevant trajectory prediction algorithm. For example, the motion trend of the determined target, including its motion direction and motion speed, can be obtained from its track information. From the time difference between the current video frame and the video frames in the track information, together with the motion trend, the predicted position of the determined target in the current video frame is obtained. The predicted position is generally represented as a box, giving the prediction frame of the determined target in the current video frame.
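As one illustration, a simple constant-velocity extrapolation over the last two boxes of a track could serve as such a prediction; the constant-velocity model is an assumption of this sketch, since the disclosure allows any relevant trajectory prediction algorithm.

```python
def predict_box(trajectory, frame_gap=1):
    """Constant-velocity extrapolation of a track's next (x1, y1, x2, y2) box.

    trajectory: boxes of the determined target, ordered by time.
    frame_gap:  number of frames between the last trajectory entry and the
                current video frame.
    """
    if len(trajectory) < 2:
        return trajectory[-1]                       # no motion history yet
    prev, last = trajectory[-2], trajectory[-1]
    velocity = [q - p for p, q in zip(prev, last)]  # per-frame change of each coordinate
    return tuple(c + frame_gap * v for c, v in zip(last, velocity))
```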
S103, performing target detection on the current video frame by using a computer vision technology to obtain a candidate frame group of each target to be matched in the current video frame, where, for any target to be matched, the candidate frame group of that target comprises a plurality of candidate frames of the target.
The result of target detection here is the candidate frame group of each target to be matched, that is, the set of possible candidate frames of each target to be matched. Unlike the related art, which directly determines the target frame of a target to be matched in the current video frame using the confidence, the embodiment of the present application determines the target frame of a target to be matched through the following steps, in combination with the prediction frames of the determined targets.
And S104, matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target.
Primary matching between each determined target and each target to be matched is performed based on the prediction frame of the determined target and the candidate frame group of the target to be matched, so that information about the determined targets is introduced into the process of determining the target frame of a target to be matched, improving the accuracy of that target frame.
And S105, respectively carrying out NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, so as to obtain the target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame.
An NMS operation is performed on the candidate frame group of each successfully matched target to be matched, yielding an optimal candidate frame, namely the target frame of that target to be matched.
And S106, determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
The target frame of a target to be matched in the current video frame can be regarded as the real position of that target in the current video frame. The position similarity and the visual similarity between the determined target and the target to be matched can then be calculated, so that the determined target and the target to be matched are matched again; if the matching succeeds, the track of the determined target is updated in the current video frame, completing multi-target tracking for the current video frame. The same operations as for the current video frame are performed for each video frame in the video (except the first frame), thereby realizing multi-target tracking for the video.
In the embodiment of the present application, the target detection result is the candidate frame group of each target to be matched, that is, the set of possible candidate frames of each target to be matched. Unlike the related art, which directly determines the target frame of a target to be matched in the current video frame using the confidence, determining the target frame in combination with the prediction frames of the determined targets improves the accuracy of the actual position of the target to be matched. In particular, for multi-target scenes in which the targets are highly similar, introducing the track information of the determined targets when determining the actual position of a target to be matched reduces target matching errors and improves the accuracy of multi-target tracking.
In a possible implementation manner, referring to fig. 2, the matching the determined targets and the targets to be matched according to the prediction frames of the determined targets and the candidate frame groups of the targets to be matched, and determining the targets to be matched, which are successfully matched with the determined targets, respectively, includes:
s1041, aiming at any determined target, selecting each target to be matched with the same type as the determined target from the targets to be matched to obtain each target to be matched with the same type of the determined target.
In the same video, tracking may be performed for only one type of target, for example only vehicles or only pedestrians. In some scenarios, multiple types of targets in the same video must be tracked simultaneously, such as tracking pedestrians while tracking vehicles. Target detection yields the type of each target to be matched, and the type of each determined target is likewise known; since targets of different types cannot be the same target, only targets of the same type are matched.
And S1042, selecting each candidate frame of which the distance from the prediction frame of the determined target is smaller than a preset distance threshold from each candidate frame group of the same type of targets to be matched of the determined target to obtain each target candidate frame of the determined target.
The distance between the prediction frame and a candidate frame may be the distance between their centers, or the average of the distances between their four corner points. The preset distance threshold may be set according to the actual situation; it is positively correlated with the resolution of the video and may be set, for example, to 5 pixels, 10 pixels, or 20 pixels.
S1043, separately calculating the IoU (Intersection over Union) between the prediction frame of the determined target and each target candidate frame of the determined target, and selecting the target candidate frame whose IoU is largest and greater than a preset IoU threshold; the target to be matched to which that candidate frame belongs is the target successfully matched with the determined target, so that the successfully matched target to be matched is obtained for each determined target.
For any determined target, the IoU between the prediction frame of the determined target and each of its target candidate frames is calculated, and the largest IoU among the results is selected. If the largest IoU of the determined target is greater than the preset IoU threshold, the target to be matched to which the corresponding target candidate frame belongs is considered successfully matched with the determined target; if the largest IoU is not greater than the preset IoU threshold, the matching of the determined target is considered to have failed. A matching failure indicates that the determined target is missing in the current video frame, i.e., the current video frame does not contain it. This operation is performed for each determined target, and the successfully matched target to be matched is obtained for each determined target whose matching succeeds.
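A sketch of this matching step (S1041 to S1043), combining the type filter, the center-distance gate, and the IoU arg-max, might read as follows. It reuses the iou helper from the NMS sketch above, and the concrete threshold values are placeholders.

```python
def box_center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def center_distance(box_a, box_b):
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def match_determined_target(pred_box, pred_type, candidates,
                            dist_threshold=20.0, iou_threshold=0.3):
    """Match one determined target against all candidate boxes.

    candidates: iterable of (group_id, target_type, box), one entry per
                candidate box of every target to be matched.
    Returns the group_id of the matched target to be matched, or None when
    the matching of this determined target fails.
    """
    best_group, best_iou = None, iou_threshold
    for group_id, target_type, box in candidates:
        if target_type != pred_type:                          # S1041: same type only
            continue
        if center_distance(pred_box, box) >= dist_threshold:  # S1042: distance gate
            continue
        overlap = iou(pred_box, box)                          # S1043: IoU arg-max
        if overlap > best_iou:
            best_group, best_iou = group_id, overlap
    return best_group
```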
In the embodiment of the application, the target candidate frame is determined by using the position relation between the prediction frame of the determined target and each candidate frame of the target to be matched, the matching result between the determined target and the target to be matched is calculated IoU, when the target frame of the target to be matched is determined, the accuracy of the actual position of the target to be matched can be improved by combining the prediction frame of the determined target, especially for a multi-target scene with high similarity of each target, the track information of the determined target is introduced when the actual position of the target to be matched is determined, the condition of target matching errors can be reduced, and the accuracy of multi-target tracking is improved.
In a possible implementation manner, referring to fig. 3, after the matching is performed on each of the determined targets and each of the targets to be matched according to the prediction frame of each of the determined targets and the candidate frame group of each of the targets to be matched, and the targets to be matched, which are successfully matched with each of the determined targets, are determined, respectively, the method further includes:
S107, for any target to be matched that was not successfully matched with a determined target, performing an NMS operation on the candidate frame group of that target to obtain the detection frame of a new determined target.
A target to be matched in the current video frame may be newly appeared, so a target to be matched that was not successfully matched is regarded as a new determined target, and an NMS operation is performed on its candidate frame group to obtain the detection frame of this new determined target, so that a new track is started.
In a possible implementation manner, referring to fig. 4, the determining, according to the track information of each determined object and the object frame of each object to be matched in the current video frame, each determined track in the current video frame includes:
s1061, acquiring true value characteristic information of each determined target according to the track information of each determined target; and respectively extracting the characteristic information in the target frame of the target to be matched, which is successfully matched with each determined target, from the current video frame to obtain the characteristic information to be matched of each determined target.
According to the track information of the determined target, its visual features can be extracted from one or more video frames before the current video frame; these visual features serve as the true-value feature information of the determined target. In the current video frame, the visual features within the target frame of the target to be matched that was successfully matched with the determined target are extracted, giving the feature information to be matched of the determined target.
S1062, for any determined target, performing feature matching on the true-value feature information of the determined target and the feature information to be matched.
S1063, if the true characteristic information of the determined target is successfully matched with the characteristic information to be matched, adding a target frame corresponding to the determined target in the current video frame to the trajectory information of the determined target.
The target frame corresponding to the determined target is the target frame of the target to be matched that was successfully matched with it. Since the predicted position of the determined target was already considered when determining that target frame, this second round of matching can be performed on the visual features alone, saving computing resources.
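Since this second round of matching operates on the visual features alone, a cosine-similarity check is one natural form it could take; the feature vectors and the 0.7 threshold below are assumptions for illustration.

```python
import numpy as np

def feature_match(truth_feature, candidate_feature, similarity_threshold=0.7):
    """Second-round matching on visual features only: cosine similarity between
    the true-value feature of a determined target and the feature extracted
    from its matched target frame."""
    a = np.asarray(truth_feature, dtype=np.float64)
    b = np.asarray(candidate_feature, dtype=np.float64)
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return cosine >= similarity_threshold
```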
In a possible implementation, referring to fig. 5, after the above-mentioned feature matching is performed on the true-value feature information of any determined target and the feature information to be matched, the method further includes:
S1064, if the true-value feature information of the determined target is not successfully matched with the feature information to be matched, taking the target frame corresponding to the determined target as the detection frame of a new determined target.
A target to be matched in the current video frame may be newly appeared, so a target frame whose feature information was not successfully matched is regarded as the detection frame of a new determined target, and a new track is started from it.
In one possible embodiment, referring to fig. 6, the method further comprises:
and S108, calculating the target similarity between any two determined targets which have the same type and no coincidence of the track information in time sequence.
And S109, merging the track information of the determined targets determined as the same target according to the similarity of the targets.
In practice, targets may be lost in some video frames because they are occluded or because target detection fails, so one target may have multiple tracks. Whether two determined targets are the same target is decided by calculating the target similarity between two determined targets that are of the same type and whose track information has no temporal overlap; if they are the same target, their track information is merged. The target similarity may be a similarity of trajectories, a similarity of appearances, or the like.
In order to increase the accuracy of the combination, in one possible embodiment, referring to fig. 7, the calculating the object similarity between any two determined objects which are of the same type and have no coincidence in the track information in time sequence includes:
s1081, two determined targets which are the same in type and have no coincidence in track information in time sequence are selected, and a first determined target and a second determined target are obtained.
That the track information of two determined targets has no temporal overlap means that the two determined targets never appear in the same video frame. The first determined target may be either of the two selected determined targets, and the second determined target is the other one.
S1082, predicting a predicted position of the first determined target in the designated video frame of the second determined target according to the track information of the first determined target; a true position of the second determined target in the designated video frame is obtained.
The designated video frame may be a first frame or a last frame of the second determined target, and specifically, in terms of time sequence, when the track information of the first determined target is before the track information of the second determined target, the designated video frame is the first frame of the second determined target; when the track information of the first determined target is subsequent to the track information of the second determined target, designating the video frame as the end frame of the second determined target.
In one possible embodiment, the length of the track information of the first determined target is greater than the length of the track information of the second determined target.
The length of the track information of the first determined target is greater than that of the track information of the second determined target, that is, the number of video frames corresponding to the track information of the first determined target is greater than that of the second determined target, for example, as shown in fig. 8, where a dashed box represents a predicted position.
In the related art, the prediction proceeds from the track at an earlier moment to the position of the target at a later moment, but the accuracy of the predicted position is low when the earlier track is short. In the embodiment of the present application, the length of the track information of the first determined target is greater than that of the second determined target, that is, prediction goes from the long track toward the short track, so the accuracy of the predicted position is improved.
S1083, IoU of the true position and the predicted position is calculated to obtain a target IoU.
S1084, calculating a normalized distance between the true position and the predicted position.
S1085, calculating a similarity of the track appearance of the first determined target and the second determined target.
Each target can compute an appearance attribute according to an appearance model. The trajectory appearance similarity can be calculated from the appearance detected in a single video frame, or PCA (Principal Component Analysis) dimensionality reduction or averaging can be applied over the time sequence of the most recent frames.
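One way to realize the averaging variant is to mean-pool the per-frame appearance embeddings over the most recent frames of each track and compare the pooled vectors by cosine similarity; this is an illustrative reading of the paragraph above, and PCA could be substituted for the pooling.

```python
import numpy as np

def track_appearance(per_frame_features, recent=10):
    """Mean-pool the appearance embeddings of a track's most recent frames."""
    feats = np.asarray(per_frame_features[-recent:], dtype=np.float64)
    return feats.mean(axis=0)

def trajectory_appearance_similarity(features_a, features_b):
    """Cosine similarity between the pooled appearances of two tracks."""
    a, b = track_appearance(features_a), track_appearance(features_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```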
S1086, when the target IoU is greater than a preset IoU threshold, the normalized distance is less than a preset distance threshold, and the trajectory appearance similarity is greater than a preset first similarity threshold, integrating the target IoU, the normalized distance, and the trajectory appearance similarity, and calculating the target similarity between the first determined target and the second determined target, thereby obtaining the target similarity between any two determined targets with the same type and without overlapping trajectory information in time sequence.
The preset IoU threshold, the preset distance threshold, and the preset first similarity threshold can be customized according to the actual situation; they serve to eliminate pairs in which the first determined target and the second determined target differ greatly. Specifically, the target similarity may be a weighted average of the target IoU, the normalized distance, and the trajectory appearance similarity.
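As a sketch, the gating and the weighted average described in S1086 could be combined as follows; the weights and thresholds are stand-in values, not figures from the disclosure, and the normalized distance, being a dissimilarity, enters the average as (1 - distance).

```python
def target_similarity(target_iou, norm_dist, appearance_sim,
                      iou_thr=0.3, dist_thr=0.5, sim_thr=0.5,
                      weights=(0.4, 0.2, 0.4)):
    """Gated, weighted combination of the three cues (illustrative values).

    Returns None when any gate fails, i.e. the first and second determined
    targets differ too much to be candidates for merging.
    """
    if target_iou <= iou_thr or norm_dist >= dist_thr or appearance_sim <= sim_thr:
        return None
    w_iou, w_dist, w_app = weights
    # The normalized distance is a dissimilarity, so it contributes as (1 - d).
    return w_iou * target_iou + w_dist * (1.0 - norm_dist) + w_app * appearance_sim
```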
In the embodiment of the application, the target similarity is calculated by integrating the target IoU, the normalized distance and the track appearance similarity, so that the representativeness of the target similarity can be increased, and the accuracy of track information combination is increased.
In a possible implementation manner, referring to fig. 9, the merging trajectory information of the determined targets determined as the same target according to the similarity of the targets includes:
s1091, connecting the determined targets that have the same type and whose track information does not overlap in time series as a directed graph in time series.
Specifically, each determined target may be represented by its video frames: the first frame and the last frame corresponding to the track information of a determined target represent that target. For example, as shown in fig. 10, all determined targets are of the same type; each pair of points ui -> vi represents the first and last frames of one determined target, and the solid line represents the internal association, which is already determined and needs no further processing. The dashed lines represent associations between determined targets, allowing a certain time period to be skipped when connecting to a later determined target. To facilitate solving, two kinds of virtual edges are added: s0 -> ui represents an entry edge and vi -> t0 an exit edge, resulting in a directed graph in which the dashed edges must be represented by weights.
S1092, for any two determined targets that are of the same type and whose track information has no temporal overlap, taking the negative of the target similarity between the two determined targets as the weight of their path on the directed graph, thereby obtaining the weight of each path in the directed graph, where the weight of the path between two determined targets with no target similarity is infinite.
Taking the opposite number of the target similarity between two determined targets as the weight of their path on the directed graph makes the problem convenient to solve with a minimum-cost network flow algorithm.
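For illustration, steps S1091-S1092 can be sketched with networkx, assuming tracklets are identified by ids and that target similarities have already been computed for the qualifying pairs. The virtual source s0 and sink t0, the zero-weight entry/exit edges, and the negated-similarity association edges follow the description above; a pair with no target similarity simply gets no edge, which behaves like an infinite weight in a shortest-path search.

```python
import networkx as nx

def build_association_graph(tracklet_ids, similarities):
    """tracklet_ids: iterable of tracklet identifiers.
    similarities: dict {(earlier_id, later_id): target_similarity} for
    same-type, temporally non-overlapping pairs that passed the gates.
    """
    g = nx.DiGraph()
    for tid in tracklet_ids:
        g.add_edge(("u", tid), ("v", tid), weight=0.0)  # internal edge ui -> vi
        g.add_edge("s0", ("u", tid), weight=0.0)        # virtual entry edge
        g.add_edge(("v", tid), "t0", weight=0.0)        # virtual exit edge
    for (a, b), sim in similarities.items():
        # opposite number of the similarity: minimizing path cost then
        # prefers chains of highly similar tracklets
        g.add_edge(("v", a), ("u", b), weight=-sim)
    return g
```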
S1093, solving the directed graph by using a network flow algorithm to obtain a plurality of paths, wherein each determined target in the same path is determined as the same target.
S1094, merging the trajectory information of the determined targets in the same path in time order to obtain the tracking trajectory of each target.
The network flow algorithm here may be any suitable network flow algorithm. After the directed graph is constructed, it is solved as a minimum-cost network flow problem; specifically, the KSP (Top-K Shortest Paths) algorithm is used, with the minimum-cost computation performed mainly by a min-heap algorithm whose main task is to maintain a heap of unknown length. The steps for solving the network flow graph are as follows:
(1) Find a shortest path in the directed graph using a dynamic programming algorithm.
(2) Reverse the shortest path, set its weights to zero, and construct a residual graph.
(3) Solve for the shortest path in the residual graph using the Dijkstra algorithm.
(4) Repeat steps (2) and (3) until the sum of all path costs obtained with the Dijkstra algorithm is larger than the cost of the path obtained by dynamic programming in step (1).
After the network flow graph is solved, a plurality of paths is obtained. Starting from the end point of the network flow graph, the paths are extracted one by one in reverse; these are the tracking trajectories of the targets.
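The following continues the networkx sketch above with a simplified, greedy stand-in for the KSP/residual-graph procedure in steps (1)-(4): it repeatedly extracts the minimum-cost s0 -> t0 path (Bellman-Ford, since the association edges carry negative weights) and removes its tracklets, stopping when no cost-reducing path remains. This is an illustrative approximation, not the exact min-heap KSP solver described above.

```python
def extract_tracks(g, source="s0", sink="t0"):
    """Greedily pull minimum-cost paths out of the association graph built
    by build_association_graph(); each extracted path lists the tracklets
    of one physical target, to be merged in time order afterwards."""
    g = g.copy()
    tracks = []
    while True:
        try:
            path = nx.bellman_ford_path(g, source, sink, weight="weight")
        except nx.NetworkXNoPath:
            break
        if nx.path_weight(g, path, weight="weight") >= 0.0:
            break  # no association gain left; remaining tracklets stand alone
        tracks.append([n[1] for n in path
                       if isinstance(n, tuple) and n[0] == "u"])
        g.remove_nodes_from([n for n in path if n not in (source, sink)])
    # every tracklet not absorbed into a multi-tracklet path is its own track
    leftovers = [n[1] for n in g.nodes if isinstance(n, tuple) and n[0] == "u"]
    tracks.extend([tid] for tid in leftovers)
    return tracks
```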
In the embodiment of the application, the path is solved by adopting a network flow algorithm, so that the automation degree of the algorithm can be improved, and the manual intervention is reduced.
An embodiment of the present application further provides a multi-target tracking apparatus, referring to fig. 11, the apparatus includes:
the track information acquiring module 11 is configured to acquire track information of each determined target before a current video frame in a process of tracking a target in a video;
a target position prediction module 12, configured to predict, according to the trajectory information of each determined target, a position area of each determined target in a current video frame, and obtain a prediction frame of each determined target;
a target position detection module 13, configured to perform target detection on a current video frame by using a computer vision technology, so as to obtain a candidate frame group of each target to be matched in a current video, where, for any target to be matched, the candidate frame group of the target to be matched includes multiple candidate frames of the target to be matched;
a target position matching module 14, configured to match each of the determined targets and each of the targets to be matched according to the prediction frame of each of the determined targets and the candidate frame group of each of the targets to be matched, and determine the target to be matched, for which each of the determined targets is successfully matched, respectively;
a target position determining module 15, configured to perform non-maximum suppression NMS operation on the candidate frame group of each target to be matched, where the determined target is successfully matched, to obtain a target frame of each target to be matched, where the determined target is successfully matched, in the current video frame;
a tracking track determining module 16, configured to determine, according to track information of each determined target and a target frame of each target to be matched in the current video frame, a track of each determined target in the current video frame.
In a possible implementation, the target position matching module 14 is specifically configured to: for any determined target, select, from the targets to be matched, the targets to be matched whose type is the same as that of the determined target, to obtain the same-type targets to be matched of the determined target; select, from the candidate frame groups of the same-type targets to be matched, the candidate frames whose distance from the prediction frame of the determined target is smaller than a preset distance threshold, to obtain the target candidate frames of the determined target; and respectively calculate the intersection over union IoU between the prediction frame of the determined target and each target candidate frame of the determined target, select the target candidate frame whose IoU is largest and greater than a preset IoU threshold, and take the target to be matched corresponding to that target candidate frame as the target to be matched successfully matched by the determined target, thereby respectively obtaining the target to be matched successfully matched by each determined target.
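A condensed sketch of the matching performed by the target position matching module 14, reusing the iou() and normalized_distance() helpers from the earlier snippet; the data layout, the use of the normalized distance as the distance measure, and the thresholds are assumptions for illustration.

```python
def match_determined_target(pred_box, det_type, to_match,
                            dist_thr=1.0, iou_thr=0.3):
    """to_match: list of (target_id, target_type, candidate_boxes), one
    entry per to-be-matched target. Returns the id of the to-be-matched
    target whose best candidate box has the largest IoU above the preset
    threshold with the prediction box, or None if nothing matches."""
    best_iou, best_id = iou_thr, None
    for tid, ttype, boxes in to_match:
        if ttype != det_type:                     # same-type filter
            continue
        for box in boxes:
            if normalized_distance(pred_box, box) >= dist_thr:
                continue                          # distance filter
            score = iou(pred_box, box)
            if score > best_iou:                  # keep largest IoU above threshold
                best_iou, best_id = score, tid
    return best_id
```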
In a possible embodiment, the above apparatus further comprises: a first new target determining module, configured to perform, for any target to be matched that is not successfully matched with any determined target, an NMS operation on the candidate frame group of that target to be matched, so as to obtain a detection frame of a new determined target.
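The NMS operation used by the target position determining module 15 and the new target determining module can be sketched as a standard greedy NMS (the application does not prescribe a particular variant); iou() is the helper defined earlier, and the confidence scores are assumed to accompany the candidate frames.

```python
def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over one candidate-frame group:
    keep the highest-scoring box, drop boxes overlapping it, repeat.
    For a single target's candidate group, the first survivor serves as
    the target frame (or the new detection frame)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(boxes[best])
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```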
In a possible implementation, the tracking trajectory determining module 16 includes:
a characteristic information obtaining submodule, configured to obtain true value characteristic information of each determined target according to trajectory information of each determined target; respectively extracting feature information in a target frame of each target to be matched, which is successfully matched with the determined target, from the current video frame to obtain feature information to be matched of each determined target;
the characteristic information matching submodule is used for carrying out characteristic matching on the true value characteristic information of the determined target and the characteristic information to be matched aiming at any determined target;
and the track information updating submodule is used for adding a target frame corresponding to the determined target in the current video frame into the track information of the determined target if the true value characteristic information of the determined target is successfully matched with the characteristic information to be matched.
In a possible embodiment, the above apparatus further comprises: a second new target determining module, configured to take the target frame corresponding to the determined target as the detection frame of a new determined target if the true-value characteristic information of the determined target is not successfully matched with the characteristic information to be matched.
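As a sketch of the feature information matching and track updating submodules, with a cosine-similarity test and threshold that are assumptions, since the application does not fix a matching criterion:

```python
import numpy as np

def try_update_track(track_boxes, true_feat, new_feat, target_box,
                     sim_thr=0.6):
    """Compare the track's true-value feature with the feature extracted
    from the matched target frame; on success, append the target frame to
    the trajectory information. On failure, the caller treats target_box
    as the detection frame of a new determined target."""
    a, b = np.asarray(true_feat), np.asarray(new_feat)
    sim = float(np.dot(a, b) /
                (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    if sim > sim_thr:
        track_boxes.append(target_box)
        return True
    return False
```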
In a possible embodiment, the above apparatus further comprises:
the target similarity calculation module is used for calculating the target similarity between any two determined targets which are the same in type and have no coincidence in time sequence of the track information;
and the track information merging module is used for merging the track information of each determined target determined as the same target according to the similarity of each target.
In a possible implementation manner, the target similarity calculation module is specifically configured to: selecting two determined targets which have the same type and are not overlapped in time sequence of the track information to obtain a first determined target and a second determined target; predicting the predicted position of the first determined target in the appointed video frame of the second determined target according to the track information of the first determined target; obtaining a true position of the second determined target in the designated video frame; IoU calculating the true position and the predicted position to obtain a target IoU; calculating the normalized distance between the true position and the predicted position; calculating the similarity of the track appearance of the first determined target and the second determined target; when the target IoU is greater than a preset IoU threshold, the normalized distance is less than a preset distance threshold, and the trajectory appearance similarity is greater than a preset first similarity threshold, the target IoU, the normalized distance, and the trajectory appearance similarity are combined to calculate the target similarity of the first determined target and the second determined target, so as to obtain the target similarity between any two determined targets which are of the same type and have no overlap in time sequence of trajectory information.
In one possible embodiment, the length of the trajectory information of the first determined target is greater than the length of the trajectory information of the second determined target.
In a possible implementation manner, the track information merging module is specifically configured to: connect the determined targets that are of the same type and whose trajectory information does not overlap in time sequence into a directed graph in time order; for any two determined targets that are of the same type and whose trajectory information does not overlap in time sequence, take the opposite number of the target similarity between the two determined targets as the weight of their path on the directed graph, thereby obtaining the weight of each path in the directed graph, where the weight of the path between two determined targets having no target similarity is infinite; solve the directed graph using a network flow algorithm to obtain a plurality of paths, where the determined targets in the same path are determined to be the same target; and merge the trajectory information of the determined targets in the same path in time order to obtain the tracking trajectory of each target.
An embodiment of the present application further provides an electronic device, including: a processor and a memory;
the memory is used for storing computer programs;
the processor is configured to implement any of the above multi-target tracking methods when executing the computer program stored in the memory.
Optionally, referring to fig. 12, the electronic device according to the embodiment of the present application further includes a communication interface 902 and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 complete communication with each other through the communication bus 904.
The communication bus mentioned for the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiment of the application also provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, any one of the multi-target tracking methods in the embodiments is realized.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the multi-target tracking methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, technical features of the various alternatives may be combined into a scheme as long as they are not contradictory, and such schemes fall within the scope of the disclosure of the present application. Relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (12)

1. A multi-target tracking method, characterized in that the method comprises:
in the process of tracking the target in the video, acquiring track information of each determined target before the current video frame;
predicting the position area of each determined target in the current video frame according to the track information of each determined target to respectively obtain a prediction frame of each determined target;
performing target detection on a current video frame by using a computer vision technology to obtain a candidate frame group of each target to be matched in a current video, wherein the candidate frame group of the target to be matched comprises a plurality of candidate frames of the target to be matched aiming at any target to be matched;
matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target;
respectively carrying out non-maximum suppression NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, to obtain a target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame;
and determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
2. The method according to claim 1, wherein the matching each of the determined objects and each of the objects to be matched according to the prediction frame of each of the determined objects and the candidate frame group of each of the objects to be matched, and respectively determining the object to be matched for which the matching of each of the determined objects is successful, comprises:
aiming at any determined target, selecting each target to be matched with the same type as the determined target from the targets to be matched to obtain each target to be matched with the same type of the determined target;
selecting each candidate frame with the distance from the prediction frame of the determined target being smaller than a preset distance threshold value from each candidate frame group of the same type of targets to be matched of the determined target to obtain each target candidate frame of the determined target;
and respectively calculating the intersection over union IoU between the prediction frame of the determined target and each target candidate frame of the determined target, selecting the target candidate frame whose IoU is largest and greater than a preset IoU threshold, and taking the target to be matched corresponding to that target candidate frame as the target to be matched successfully matched by the determined target, thereby respectively obtaining the target to be matched successfully matched by each determined target.
3. The method according to claim 1, wherein after the matching of each of the determined objects and each of the objects to be matched is performed according to the prediction frame of each of the determined objects and the candidate frame group of each of the objects to be matched, and the objects to be matched, which are successfully matched with each of the determined objects, are respectively determined, the method further comprises:
and for any target to be matched that is not successfully matched with any determined target, performing an NMS operation on the candidate frame group of the target to be matched to obtain a detection frame of a new determined target.
4. The method according to claim 1, wherein the determining the track of each determined object in the current video frame according to the track information of each determined object and the object frame of each object to be matched in the current video frame comprises:
acquiring true value characteristic information of each determined target according to the track information of each determined target; respectively extracting feature information in a target frame of each target to be matched, which is successfully matched with the determined target, from the current video frame to obtain feature information to be matched of each determined target;
for any determined target, carrying out feature matching on the true value feature information of the determined target and the feature information to be matched;
and if the true value characteristic information of the determined target is successfully matched with the characteristic information to be matched, adding a target frame corresponding to the determined target in the current video frame into the track information of the determined target.
5. The method according to claim 4, wherein after the feature matching is performed on the true-value feature information of any determined target and the feature information to be matched, the method further comprises:
and if the true-value characteristic information of the determined target is not successfully matched with the characteristic information to be matched, taking the target frame corresponding to the determined target as the detection frame of a new determined target.
6. The method according to any one of claims 1-5, further comprising:
calculating the target similarity between any two determined targets which have the same type and are not overlapped in the time sequence of the track information;
and merging the track information of each determined target determined as the same target according to the similarity of each target.
7. The method of claim 6, wherein calculating the object similarity between any two determined objects of the same type and without coincidence in trajectory information in time sequence comprises:
selecting two determined targets which have the same type and are not overlapped in time sequence of the track information to obtain a first determined target and a second determined target;
predicting a predicted position of the first determined target in a designated video frame of the second determined target according to the track information of the first determined target; obtaining a true position of the second determined target in the designated video frame;
IoU calculating the true position and the predicted position to obtain a target IoU;
calculating normalized distances of the true position and the predicted position;
calculating a trajectory appearance similarity of the first determined target and the second determined target;
when the target IoU is greater than a preset IoU threshold, the normalized distance is less than a preset distance threshold, and the track appearance similarity is greater than a preset first similarity threshold, the target IoU, the normalized distance, and the track appearance similarity are integrated, and the target similarity of the first determined target and the second determined target is calculated, so that the target similarity between any two determined targets which are the same in type and have no coincidence in time sequence of track information is obtained.
8. The method of claim 7, wherein the length of the track information of the first determined target is greater than the length of the track information of the second determined target.
9. The method according to claim 6, wherein said merging trajectory information of determined objects determined to be the same object according to the similarity of the objects comprises:
connecting the determined targets that are of the same type and whose trajectory information does not overlap in time sequence into a directed graph in time order;
for any two determined targets which are the same in type and have no coincidence in the time sequence of the track information, taking the opposite number of the target similarity between the two determined targets as the weight of the path of the two determined targets on the directed graph, thereby obtaining the weight of each path in the directed graph, wherein the weight of the path between the two determined targets which have no target similarity is infinite;
solving the directed graph by using a network flow algorithm to obtain a plurality of paths, wherein each determined target in the same path is determined as the same target;
and merging the track information of each determined target in the same path according to a time sequence to obtain the tracking track of each target.
10. A multi-target tracking apparatus, the apparatus comprising:
the track information acquisition module is used for acquiring track information of each determined target before the current video frame in the process of tracking the target in the video;
the target position prediction module is used for predicting the position area of each determined target in the current video frame according to the track information of each determined target to respectively obtain a prediction frame of each determined target;
the target position detection module is used for carrying out target detection on a current video frame by utilizing a computer vision technology to obtain a candidate frame group of each target to be matched in a current video, wherein the candidate frame group of the target to be matched comprises a plurality of candidate frames of the target to be matched aiming at any target to be matched;
the target position matching module is used for matching each determined target and each target to be matched according to the prediction frame of each determined target and the candidate frame group of each target to be matched, and respectively determining the target to be matched, which is successfully matched with each determined target;
the target position determining module is used for respectively carrying out non-maximum suppression NMS operation on the candidate frame group of the target to be matched, which is successfully matched with each determined target, so as to obtain a target frame of the target to be matched, which is successfully matched with each determined target, in the current video frame;
and the tracking track determining module is used for determining the track of each determined target in the current video frame according to the track information of each determined target and the target frame of each target to be matched in the current video frame.
11. An electronic device comprising a processor and a memory;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implements the multi-target tracking method according to any one of claims 1 to 9.
12. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the multi-target tracking method according to any one of claims 1 to 9.
CN202010573301.9A 2020-06-22 2020-06-22 Multi-target tracking method, device, electronic equipment and storage medium Active CN111709975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010573301.9A CN111709975B (en) 2020-06-22 2020-06-22 Multi-target tracking method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010573301.9A CN111709975B (en) 2020-06-22 2020-06-22 Multi-target tracking method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111709975A true CN111709975A (en) 2020-09-25
CN111709975B CN111709975B (en) 2023-11-03

Family

ID=72542163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010573301.9A Active CN111709975B (en) 2020-06-22 2020-06-22 Multi-target tracking method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111709975B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268869A (en) * 2018-02-13 2018-07-10 北京旷视科技有限公司 Object detection method, apparatus and system
CN110163889A (en) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 Method for tracking target, target tracker, target following equipment
CN109360226A (en) * 2018-10-17 2019-02-19 武汉大学 A kind of multi-object tracking method based on time series multiple features fusion
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN109903310A (en) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 Method for tracking target, device, computer installation and computer storage medium
CN110827325A (en) * 2019-11-13 2020-02-21 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489090A (en) * 2020-12-16 2021-03-12 影石创新科技股份有限公司 Target tracking method, computer-readable storage medium and computer device
CN112489090B (en) * 2020-12-16 2024-06-04 影石创新科技股份有限公司 Method for tracking target, computer readable storage medium and computer device
WO2022127180A1 (en) * 2020-12-17 2022-06-23 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, and electronic device and storage medium
CN112507949A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Target tracking method and device, road side equipment and cloud control platform
CN112907652A (en) * 2021-01-25 2021-06-04 脸萌有限公司 Camera pose acquisition method, video processing method, display device and storage medium
CN112907652B (en) * 2021-01-25 2024-02-02 脸萌有限公司 Camera pose acquisition method, video processing method, display device, and storage medium
CN113223051A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Trajectory optimization method, apparatus, device, storage medium, and program product
CN114022803B (en) * 2021-09-30 2023-11-14 苏州浪潮智能科技有限公司 Multi-target tracking method and device, storage medium and electronic equipment
CN114022803A (en) * 2021-09-30 2022-02-08 苏州浪潮智能科技有限公司 Multi-target tracking method and device, storage medium and electronic equipment
CN113992976B (en) * 2021-10-19 2023-10-20 咪咕视讯科技有限公司 Video playing method, device, equipment and computer storage medium
CN113992976A (en) * 2021-10-19 2022-01-28 咪咕视讯科技有限公司 Video playing method, device, equipment and computer storage medium
WO2024011852A1 (en) * 2022-07-12 2024-01-18 天翼云科技有限公司 Object tracking method and apparatus, and electronic device
CN115273479A (en) * 2022-09-19 2022-11-01 深圳市博科思智能股份有限公司 Operation and maintenance management method, device and equipment based on image processing and storage medium
CN117670939A (en) * 2024-01-31 2024-03-08 苏州元脑智能科技有限公司 Multi-camera multi-target tracking method and device, storage medium and electronic equipment
CN117670939B (en) * 2024-01-31 2024-04-19 苏州元脑智能科技有限公司 Multi-camera multi-target tracking method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111709975B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN111709975B (en) Multi-target tracking method, device, electronic equipment and storage medium
CN111127513B (en) Multi-target tracking method
CN109816701B (en) Target tracking method and device and storage medium
CN109344789B (en) Face tracking method and device
CN109035304B (en) Target tracking method, medium, computing device and apparatus
CN114049383B (en) Multi-target tracking method and device and readable storage medium
CN109902619B (en) Image closed loop detection method and system
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
US20150104067A1 (en) Method and apparatus for tracking object, and method for selecting tracking feature
CN111383246B (en) Scroll detection method, device and equipment
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
CN113743228B (en) Obstacle existence detection method and device based on multi-data fusion result
CN111476814B (en) Target tracking method, device, equipment and storage medium
CN115546705A (en) Target identification method, terminal device and storage medium
CN113776544A (en) Point cloud map updating method and device, electronic equipment and positioning system
CN112465869B (en) Track association method and device, electronic equipment and storage medium
WO2020237501A1 (en) Multi-source collaborative road vehicle monitoring system
CN116523972A (en) Two-stage multi-target tracking method and product based on sparse optical flow motion compensation
CN116309628A (en) Lane line recognition method and device, electronic equipment and computer readable storage medium
CN112561956B (en) Video target tracking method and device, electronic equipment and storage medium
CN117495916B (en) Multi-target track association method, device, communication equipment and storage medium
CN114387498A (en) Target detection method and device, electronic equipment and storage medium
CN113420688A (en) Adaptive face recognition processing method and device, electronic equipment and storage medium
CN112784691A (en) Target detection model training method, target detection method and device
CN111488771A (en) OCR (optical character recognition) hanging method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant