CN111462174B - Multi-target tracking method and device and electronic equipment - Google Patents


Info

Publication number
CN111462174B
CN111462174B (application CN202010154953.9A)
Authority
CN
China
Prior art keywords
picture
frame
frame information
detection
target frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010154953.9A
Other languages
Chinese (zh)
Other versions
CN111462174A (en)
Inventor
汤力遥
柳俊杰
陈明华
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010154953.9A
Publication of CN111462174A
Application granted
Publication of CN111462174B
Legal status: Active

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G06T7/20 Analysis of motion; G06T7/00 Image analysis)
    • G06F18/22 Matching criteria, e.g. proximity measures (G06F18/20 Analysing; G06F18/00 Pattern recognition)
    • G06T2207/10016 Video; Image sequence (G06T2207/10 Image acquisition modality)
    • G06T2207/20081 Training; Learning (G06T2207/20 Special algorithmic details)
    • G06T2207/20084 Artificial neural networks [ANN] (G06T2207/20 Special algorithmic details)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a multi-target tracking method, a multi-target tracking device, and an electronic device, relates to the technical field of image processing, and can be used for automatic driving. The specific implementation scheme is as follows: acquire a picture sequence to be processed; for each frame picture in the picture sequence, when the frame picture is a non-first frame picture, determine a candidate target frame information set corresponding to the non-first frame picture by combining the non-first frame picture, the previous frame picture of the non-first frame picture, and the target frame information set corresponding to the previous frame picture; acquire a detection frame information set corresponding to the non-first frame picture; for each detection object in the non-first frame picture, match the detection frame information corresponding to the detection object with the candidate target frame information in the candidate target frame information set, and determine the matched candidate target frame information as the target frame information corresponding to the detection object; and track each detection object according to the target frame information of each detection object in each frame of the picture sequence. In this way, the detection result of the current frame picture can be corrected and tracking accuracy is improved.

Description

Multi-target tracking method and device and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, in particular to the field of image processing technologies, and specifically to a multi-target tracking method and apparatus and an electronic device.
Background
The current multi-target tracking approach performs multi-target detection on each frame picture in a picture sequence to obtain a detection result, where the detection result includes detection frame information of a plurality of detection objects, and then associates the detection results of two adjacent frames to obtain tracking results for the plurality of detection objects. In this scheme, when the accuracy of the detection result is low, the accuracy of the tracking result is also low.
Disclosure of Invention
The application provides a multi-target tracking method, a multi-target tracking device, and an electronic device. A single-target tracker is adopted to predict candidate target frame information for the current frame picture by combining the target frame information of each detection object in the previous frame picture; the candidate target frame information is matched with the detection frame information of each detection object in the current frame picture, and the detection object associated with each piece of candidate target frame information is determined. Tracking of a plurality of detection objects is thereby realized, the detection result of the current frame picture can be corrected, and tracking accuracy is improved.
An embodiment of a first aspect of the present application provides a multi-target tracking method, including: acquiring a picture sequence to be processed;
for each frame of picture in the picture sequence, when the frame picture is a non-first frame picture, combining the non-first frame picture, a previous frame picture of the non-first frame picture and a target frame information set corresponding to the previous frame picture to determine a candidate target frame information set corresponding to the non-first frame picture;
Acquiring a detection frame information set corresponding to the non-first frame picture; the detection frame information set includes: detection frame information corresponding to each detection object in the non-first frame picture;
for each detection object in the non-first frame picture, matching the detection frame information corresponding to the detection object with the candidate target frame information in the candidate target frame information set to obtain matched candidate target frame information, and determining the matched candidate target frame information as the target frame information corresponding to the detection object;
and tracking each detection object according to the target frame information of each detection object in each frame of the picture sequence.
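The per-frame flow of the steps above can be sketched as a small loop. In the following Python sketch, `detect`, `predict_candidates`, and `match` are hypothetical stand-ins for the multi-target detection model, the single-target tracker, and the matching step; none of these names come from the patent.

```python
def track_sequence(frames, detect, predict_candidates, match):
    """Illustrative per-frame multi-target tracking loop."""
    results = []       # target frame info set for each frame
    prev_frame = None
    prev_targets = None
    for frame in frames:
        detections = detect(frame)  # detection frame information set
        if prev_targets is None:
            # First frame: detections become the initial target frames.
            targets = list(detections)
        else:
            # Predict candidate target frames from the previous frame's targets.
            candidates = predict_candidates(prev_frame, prev_targets, frame)
            targets = []
            for det in detections:
                cand = match(det, candidates)
                # A matched candidate corrects the raw detection; an
                # unmatched detection is kept as a newly appeared object.
                targets.append(cand if cand is not None else det)
        results.append(targets)
        prev_frame, prev_targets = frame, targets
    return results
```

Each detection object is then tracked by following its target frame information across the per-frame results.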
In one embodiment of the present application, for each frame of picture in the picture sequence, when the frame picture is a non-first frame picture, combining the non-first frame picture, a previous frame picture of the non-first frame picture, and a target frame information set corresponding to the previous frame picture, determining a candidate target frame information set corresponding to the non-first frame picture includes:
for each frame of picture in the picture sequence, when the frame of picture is a non-first frame of picture, acquiring a previous frame of picture of the non-first frame of picture and a target frame information set corresponding to the previous frame of picture; the target frame information set includes: target frame information corresponding to each detection object in the previous frame of picture;
for each detection object in the previous frame picture, combining the target frame information corresponding to the detection object in the previous frame picture, the previous frame picture, and the non-first frame picture to obtain candidate target frame information;
and generating a candidate target frame information set corresponding to the non-first frame picture according to the acquired candidate target frame information.
In one embodiment of the present application, the obtaining, for each detection object in the previous frame picture, candidate target frame information by combining the target frame information corresponding to the detection object in the previous frame picture, the previous frame picture, and the non-first frame picture includes:
for each detection object in the previous frame picture, cropping the previous frame picture according to the target frame information corresponding to the detection object to obtain a target frame picture;
inputting the target frame picture and the non-first frame picture into a single-target tracker to obtain a plurality of pieces of predicted target frame information and a confidence corresponding to each piece of predicted target frame information;
and determining candidate target frame information according to the confidence corresponding to each piece of predicted target frame information.
In one embodiment of the present application, the determining candidate target frame information according to the confidence corresponding to each piece of predicted target frame information includes:
determining the predicted target frame information with the highest confidence as the candidate target frame information; or,
matching the plurality of pieces of predicted target frame information with the detection target frame information corresponding to the detection object, and determining the candidate target frame information according to the matching result.
In one embodiment of the present application, for each detection object in the non-first frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set, to obtain matched candidate target frame information, including:
for each detection object in the non-first frame picture, determining a detection frame picture corresponding to the detection object in the non-first frame picture according to detection frame information corresponding to the detection object and the non-first frame picture;
determining a candidate target frame picture set according to the candidate target frame information set and the non-first frame picture;
for each candidate target frame picture to be compared in the candidate target frame picture set, acquiring feature distance information and intersection-over-union (IoU) data between the detection frame picture and the candidate target frame picture to be compared;
determining the matching degree of the detection frame picture and the candidate target frame picture to be compared according to the feature distance information and the IoU data;
Determining matched candidate target frame pictures according to the matching degree of the detection frame pictures and each candidate target frame picture to be compared in the candidate target frame picture set;
and determining the candidate target frame information corresponding to the matched candidate target frame picture as the matched candidate target frame information.
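The matching degree computed from feature distance and intersection-over-union can be illustrated as follows. The weighting `alpha` and the conversion of a feature distance into a similarity in [0, 1] are assumptions for illustration only; the patent does not fix a formula.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matching_degree(feat_dist, iou_value, alpha=0.5):
    """Combine feature distance and IoU into one matching degree.

    Maps the (non-negative) feature distance to a similarity in [0, 1]
    and blends it with IoU; alpha and the mapping are illustrative
    assumptions, not specified by the patent text.
    """
    feat_sim = 1.0 / (1.0 + feat_dist)
    return alpha * feat_sim + (1.0 - alpha) * iou_value
```

The candidate with the highest matching degree would then be taken as the matched candidate target frame picture.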
In one embodiment of the present application, the method further comprises: and for each detection object, if the matched candidate target frame information is not acquired, determining that the detection object is a newly added detection object, and determining the detection frame information of the detection object as the target frame information of the detection object in the non-first frame picture.
In one embodiment of the present application, the number of the single-target trackers is at least one, each single-target tracker is used for tracking a detection object, and outputting candidate target frame information of the non-first frame picture;
for each detection object in the non-first frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set, and after obtaining the matched candidate target frame information, further comprising:
judging whether first candidate target frame information exists in the candidate target frame information set; the first candidate target frame information is candidate target frame information which is not matched with the detection frame information;
Acquiring confidence corresponding to the first candidate target frame information;
when the confidence corresponding to the first candidate target frame information is greater than a first confidence threshold, changing the state of the single-target tracker that outputs the first candidate target frame information to a tracking-lost state;
and when the confidence corresponding to the first candidate target frame information is less than a second confidence threshold, changing the state of the single-target tracker that outputs the first candidate target frame information to a detection-object-disappeared state.
In one embodiment of the present application, the method further includes: judging whether a first single-target tracker exists, where the first single-target tracker is a single-target tracker whose state is that the detection object has disappeared and whose state duration is greater than a preset time threshold;
and deleting the first single-target tracker when the first single-target tracker exists.
In one embodiment of the present application, the acquiring the detection frame information set corresponding to the non-first frame picture includes:
performing multi-target detection on the non-first frame picture by adopting a preset multi-target detection model, and acquiring a detection frame information set corresponding to the non-first frame picture;
The multi-target detection model and the single-target tracker are trained as follows:
Acquiring an initial joint model; the joint model includes: the system comprises a backbone network, a multi-target detection head network and a single-target tracking head network, wherein the multi-target detection head network and the single-target tracking head network are respectively connected with the backbone network;
acquiring training data, the training data comprising: a current frame picture, a previous frame picture of the current frame picture and a corresponding target frame information set;
training the joint model by adopting the training data to obtain a trained joint model;
combining a main network in the joint model and a multi-target detection head network connected with the main network to obtain a preset multi-target detection model;
and combining a backbone network in the joint model and a single-target tracking head network connected with the backbone network to obtain the single-target tracker.
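The joint-model structure described above, one backbone feeding two heads that are split into the detection model and the tracker after training, can be sketched minimally. The callables below are placeholders standing in for real networks; only the structure follows the text.

```python
class JointModel:
    """Shared backbone with a detection head and a tracking head.

    The layer contents are placeholder callables; only the structure
    (one backbone, two heads, split into two single-head models after
    training) follows the patent text.
    """
    def __init__(self, backbone, detection_head, tracking_head):
        self.backbone = backbone
        self.detection_head = detection_head
        self.tracking_head = tracking_head

    def split(self):
        # Backbone + detection head -> multi-target detection model;
        # backbone + tracking head  -> single-target tracker.
        detector = lambda x: self.detection_head(self.backbone(x))
        tracker = lambda x: self.tracking_head(self.backbone(x))
        return detector, tracker
```

Because both heads share one backbone during training, the backbone's features only need to be learned once, which is the stated motivation for the joint model.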
According to the multi-target tracking method, a picture sequence to be processed is acquired; for each frame picture in the picture sequence, when the frame picture is a non-first frame picture, a candidate target frame information set corresponding to the non-first frame picture is determined by combining the non-first frame picture, the previous frame picture of the non-first frame picture, and the target frame information set corresponding to the previous frame picture; a detection frame information set corresponding to the non-first frame picture is acquired; for each detection object in the non-first frame picture, the detection frame information corresponding to the detection object is matched with the candidate target frame information in the candidate target frame information set to obtain matched candidate target frame information, and the matched candidate target frame information is determined as the target frame information corresponding to the detection object; each detection object is tracked according to the target frame information of each detection object in each frame of the picture sequence. In this way, the detection result of the current frame picture can be corrected and tracking accuracy is improved.
An embodiment of a second aspect of the present application provides a multi-target tracking apparatus, including:
the acquisition module is used for acquiring a picture sequence to be processed;
the determining module is used for determining a candidate target frame information set corresponding to the non-first frame picture according to the non-first frame picture, a previous frame picture of the non-first frame picture and a target frame information set corresponding to the previous frame picture when the frame picture is the non-first frame picture aiming at each frame picture in the picture sequence;
the acquisition module is further used for acquiring a detection frame information set corresponding to the non-first frame picture; the detection frame information set includes: detection frame information corresponding to each detection object in the non-first frame picture;
the matching module is used for matching, for each detection object in the non-first frame picture, the detection frame information corresponding to the detection object with the candidate target frame information in the candidate target frame information set to obtain matched candidate target frame information, and determining the matched candidate target frame information as the target frame information corresponding to the detection object;
and the tracking module is used for tracking each detection object according to the target frame information of each detection object in each frame of the picture sequence.
An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-target tracking method as described above.
A fourth aspect embodiment of the present application proposes a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the multi-target tracking method as described above.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is a schematic diagram of a first embodiment according to the present application;
FIG. 2 is a schematic diagram of a second embodiment according to the present application;
FIG. 3 is a schematic diagram of a single target tracker;
FIG. 4 is a state transition diagram of a single target tracker;
FIG. 5 is a schematic diagram of a third embodiment according to the present application;
FIG. 6 is a schematic diagram of a fourth embodiment according to the present application;
Fig. 7 is a block diagram of an electronic device for implementing a multi-target tracking method of an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a multi-target tracking method, a multi-target tracking device and an electronic device according to an embodiment of the application with reference to the accompanying drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. It should be noted that the execution body of the multi-target tracking method provided in this embodiment is a multi-target tracking device, which may be a hardware device or software in a hardware device. The hardware device is, for example, a terminal device or a server.
As shown in fig. 1, the specific implementation process of the multi-target tracking method is as follows:
step 101, a sequence of pictures to be processed is obtained.
Step 102, for each frame of picture in the picture sequence, when the frame picture is a non-first frame picture, combining the non-first frame picture, a previous frame picture of the non-first frame picture, and a target frame information set corresponding to the previous frame picture, to determine a candidate target frame information set corresponding to the non-first frame picture.
In an embodiment of the present application, the process by which the multi-target tracking device performs step 102 may refer to the embodiment shown in fig. 2. In fig. 2, step 102 may include the following steps:
step 1021, for each frame of picture in the picture sequence, when the frame of picture is a non-first frame of picture, acquiring a previous frame of picture of the non-first frame of picture and a target frame information set corresponding to the previous frame of picture; the target frame information set includes: and the target frame information corresponding to each detection object in the previous frame of picture.
In the embodiment of the application, the detection object is, for example, a person, an animal, or a vehicle. The target frame information corresponding to the detection object in the previous frame picture may be, for example, the coordinate information of the four vertices of the target frame. The target frame is generally rectangular, and its position can be determined from the coordinate information of the four vertices.
Step 1022, for each detection object in the previous frame of picture, combining the target frame information corresponding to the detection object in the previous frame of picture, and the non-first frame of picture, to obtain candidate target frame information.
In the embodiment of the present application, the multi-target tracking device may perform step 1022 as follows: for each detection object in the previous frame picture, crop the previous frame picture according to the target frame information corresponding to the detection object to obtain a target frame picture; input the target frame picture and the non-first frame picture into a single-target tracker to obtain a plurality of pieces of predicted target frame information and the confidence corresponding to each piece of predicted target frame information; and determine candidate target frame information according to the confidence corresponding to each piece of predicted target frame information. A schematic diagram of the single-target tracker is shown in fig. 3.
In the embodiment of the present application, in a first implementation scenario, the multi-target tracking device may determine candidate target frame information according to the confidence corresponding to each piece of predicted target frame information by determining the predicted target frame information with the highest confidence as the candidate target frame information. In a second implementation scenario, in order to improve the accuracy of the determined candidate target frame information, the multi-target tracking device may instead match the plurality of pieces of predicted target frame information against the detection target frame information corresponding to the detection object and determine the candidate target frame information according to the matching result.
Specifically, the plurality of pieces of predicted target frame information are matched with the detection target frame information corresponding to the detection object, and it is judged whether there is first predicted target frame information that matches the detection target frame information; if so, the predicted target frame information with the highest matching degree is selected as the candidate target frame information; if not, the predicted target frame information with the highest confidence is selected as the candidate target frame information.
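A minimal sketch of this candidate selection follows, assuming IoU as the matching criterion with a hypothetical threshold of 0.5; the patent specifies neither the criterion nor a threshold value.

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes; helper for the sketch below."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_candidate(predictions, detection_box, iou_threshold=0.5):
    """Pick candidate target frame info from single-tracker predictions.

    `predictions` is a list of (box, confidence). A prediction
    "matches" the detection box when their IoU exceeds the threshold;
    IoU and the 0.5 value are illustrative assumptions.
    """
    matched = [(box, _iou(box, detection_box))
               for box, _ in predictions
               if _iou(box, detection_box) > iou_threshold]
    if matched:
        # Prediction with the highest matching degree wins.
        return max(matched, key=lambda m: m[1])[0]
    # No match: fall back to the most confident prediction.
    return max(predictions, key=lambda p: p[1])[0]
```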
In the embodiment of the application, the single-target trackers may be in one-to-one correspondence with the detection objects: each single-target tracker tracks one detection object and outputs its candidate target frame information. The single-target tracker may be a neural network, obtained by training an initial single-target tracker with training data. In addition, so that the single-target tracker can correct target frame information of poor accuracy from the previous frame, in the embodiment of the application, deliberately inaccurate target frame information may be configured for the first frame picture during training, which further improves the tracking accuracy of the single-target tracker.
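One plausible way to configure inaccurate target frame information for training, uniform random jitter of the ground-truth box, is sketched below. The jitter model is an assumption: the patent only states that inaccurate target frame information is used, not how it is generated.

```python
import random

def jitter_box(box, max_shift=0.1, rng=None):
    """Degrade a ground-truth (x1, y1, x2, y2) box to simulate an
    inaccurate previous-frame target frame, so the tracker learns to
    correct it. Shift magnitude is proportional to box size; this
    particular scheme is an illustrative assumption.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx = rng.uniform(-max_shift, max_shift) * w
    dy = rng.uniform(-max_shift, max_shift) * h
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```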
Step 1023, generating a candidate target frame information set corresponding to the non-first frame picture according to the acquired candidate target frame information.
In addition, in the embodiment of the application, when the frame picture is the first frame picture, the detection frame information corresponding to each detection object in the first frame picture can be directly determined as the target frame information corresponding to each detection object.
Step 103, acquiring a detection frame information set corresponding to a non-first frame picture; the detection frame information set includes: and detecting frame information corresponding to each detection object in the non-first frame picture.
In the embodiment of the present application, the multi-target tracking device may perform step 103 by, for example, performing multi-target detection on the non-first frame picture with a preset multi-target detection model to acquire the detection frame information set corresponding to the non-first frame picture. The preset multi-target detection model is obtained by training an initial multi-target detection model with training data.
In addition, in the embodiment of the application, in order to reduce the training amount, reduce the training cost, shorten the training time and improve the training accuracy, the joint model can be generated according to the initial single-target tracker and the multi-target detection model. The joint model includes: the system comprises a backbone network, a multi-target detection head network and a single-target tracking head network, wherein the multi-target detection head network and the single-target tracking head network are respectively connected with the backbone network. The backbone network is a backbone network part with the same structure in the single-target tracker and the multi-target detection model.
The joint model may be trained as follows: acquire training data, the training data including a current frame picture, the previous frame picture of the current frame picture, and the corresponding target frame information set; train the joint model with the training data to obtain a trained joint model; combine the backbone network in the joint model and the multi-target detection head network connected with the backbone network to obtain the preset multi-target detection model; and combine the backbone network in the joint model and the single-target tracking head network connected with the backbone network to obtain the single-target tracker.
Step 104, for each detection object in the non-first frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set to obtain matched candidate target frame information; and determining the matched candidate target frame information as target frame information corresponding to the detection object.
In the embodiment of the application, for each detection object, if the matched candidate target frame information is not acquired, the detection object is determined to be a new detection object, and the detection frame information of the detection object is determined to be the target frame information of the detection object in the non-first frame picture. Under the condition that the detection objects are in one-to-one correspondence with the single target trackers, if the detection objects are newly added detection objects, a single target tracker needs to be newly added, and a correspondence relationship between the newly added detection objects and the newly added single target trackers is established.
In the embodiment of the application, there may be candidate target frame information that does not match any detection frame information. Thus, after step 104, the method may further include the following steps: judging whether first candidate target frame information exists in the candidate target frame information set, where the first candidate target frame information is candidate target frame information that does not match any detection frame information; acquiring the confidence corresponding to the first candidate target frame information; when the confidence corresponding to the first candidate target frame information is greater than the first confidence threshold, changing the state of the single-target tracker that outputs the first candidate target frame information to a tracking-lost state; and when the confidence corresponding to the first candidate target frame information is less than the second confidence threshold, changing the state of the single-target tracker that outputs the first candidate target frame information to a detection-object-disappeared state.
As shown in fig. 4, fig. 4 is a state transition diagram of the single-target tracker. In fig. 4, when there is detection frame information that does not match any candidate target frame information, it is determined that the detection frame information fails to match and that the detection object corresponding to the detection frame information is a newly added detection object (i.e., the new target in fig. 4); a single-target tracker is set for the newly added detection object, and its detection frame information is determined as the target frame information corresponding to the newly added detection object in the current frame. Multi-target detection and matching are then performed on the next frame picture. If candidate target frame information corresponding to the newly added detection object (i.e., the tracking frame in fig. 4) exists in the next frame picture, the newly added detection object is tracked successfully. If the detection frame information of the newly added detection object is not detected in the next frame picture, or the candidate target frame information output by its single-target tracker does not match the detection frame information, then when the confidence of the candidate target frame information is greater than the first confidence threshold, the state of the single-target tracker is determined to be tracking-lost, and when the confidence of the candidate target frame information is less than the second confidence threshold, the state of the single-target tracker is determined to be detection-object-disappeared. In addition, if the duration of the detection-object-disappeared state of a single-target tracker is greater than the preset time threshold, the single-target tracker is deleted.
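These state transitions can be sketched as a small function. The threshold values and the behavior for confidences falling between the two thresholds are assumptions, since the patent only requires an upper (first) and a lower (second) confidence threshold and leaves the gap open.

```python
# Threshold values are illustrative; the patent only posits a first
# (upper) and second (lower) confidence threshold.
FIRST_THRESHOLD = 0.7
SECOND_THRESHOLD = 0.3

def update_tracker_state(matched, confidence,
                         first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """State transition for one single-target tracker after matching."""
    if matched:
        return "tracking"
    if confidence > first:
        # Confident prediction with no matching detection: the detector
        # likely missed the object, so the tracker is temporarily lost.
        return "lost"
    if confidence < second:
        # Low-confidence unmatched prediction: the object has disappeared.
        return "disappeared"
    # Between the two thresholds: keep tracking (assumed behavior).
    return "tracking"
```

A tracker whose state remains `"disappeared"` longer than the preset time threshold would then be deleted, as described above.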
Step 105, tracking each detection object according to the target frame information of each detection object in each frame of the picture sequence.
According to the multi-target tracking method, the picture sequence to be processed is acquired; for each frame picture in the picture sequence, when the frame picture is a non-first frame picture, the candidate target frame information set corresponding to the non-first frame picture is determined by combining the non-first frame picture, the previous frame picture of the non-first frame picture, and the target frame information set corresponding to the previous frame picture; the detection frame information set corresponding to the non-first frame picture is acquired; for each detection object in the non-first frame picture, the detection frame information corresponding to the detection object is matched with the candidate target frame information in the candidate target frame information set to obtain the matched candidate target frame information, which is determined as the target frame information corresponding to the detection object; and each detection object is tracked according to the target frame information of each detection object in each frame of the picture sequence. In this way, the detection result of the current frame picture can be corrected and the tracking accuracy improved.
Fig. 5 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 5, the step 104 may specifically include the following steps:
Step 1041, for each detection object in the non-first frame picture, determining a detection frame picture corresponding to the detection object in the non-first frame picture according to the detection frame information corresponding to the detection object and the non-first frame picture.
In the embodiment of the present application, the detection frame information corresponding to the detection object may include the coordinate information of the four vertexes of the detection frame. Correspondingly, the detection frame picture corresponding to the detection object in the non-first frame picture may be determined, for example, by enlarging the detection frame according to a preset rule based on the coordinate information of its four vertexes, and then cropping the non-first frame picture according to the enlarged detection frame information to obtain the detection frame picture corresponding to the detection object in the non-first frame picture.
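A minimal sketch of this expand-then-crop step, assuming an axis-aligned box given as (x1, y1, x2, y2) corner coordinates and a center-based expansion factor standing in for the patent's "preset rule":

```python
import numpy as np

def crop_expanded_box(image, box, scale=1.2):
    """Expand a detection box about its center by `scale`, then crop the frame.

    image: H x W x C array; box: (x1, y1, x2, y2) derived from the four
    vertex coordinates. The scale factor is an assumed placeholder.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    # clamp the expanded box to the image bounds before cropping
    nx1 = max(0, int(round(cx - half_w)))
    ny1 = max(0, int(round(cy - half_h)))
    nx2 = min(w, int(round(cx + half_w)))
    ny2 = min(h, int(round(cy + half_h)))
    return image[ny1:ny2, nx1:nx2]
```

The same routine also covers the candidate-box expansion of step 1042, since both steps enlarge a box and crop the non-first frame picture.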
Step 1042, determining a candidate target frame picture set according to the candidate target frame information set and the non-first frame picture.
In the embodiment of the application, for each piece of candidate target frame information in the candidate target frame information set, the candidate target frame information is expanded according to a preset rule to obtain the expanded candidate target frame information; the non-first frame picture is then cropped according to the expanded candidate target frame information to obtain the candidate target frame picture corresponding to the candidate target frame information in the non-first frame picture.
Step 1043, for each candidate target frame picture to be compared in the candidate target frame picture set, acquiring feature distance information and cross-comparison (intersection-over-union) data of the detection frame picture and the candidate target frame picture to be compared.
In the embodiment of the present application, the multi-target tracking device may acquire the feature distance information of the detection frame picture and the candidate target frame picture to be compared by, for example, extracting features from the detection frame picture (such as color histogram features), extracting features from the candidate target frame picture to be compared, and calculating the distance between the features of the two pictures. The distance information may be, for example, the JS divergence (Jensen-Shannon divergence).
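The color-histogram plus JS-divergence variant can be sketched as follows; the bin count and the per-channel histogram layout are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Per-channel color histogram, concatenated and normalized to a distribution."""
    hist = np.concatenate([
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(patch.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric, and bounded in [0, ln 2] when using the natural logarithm.
    """
    p, q = p + eps, q + eps          # avoid log(0) on empty bins
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The feature distance between a detection frame picture and a candidate picture is then `js_divergence(color_histogram(a), color_histogram(b))`.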
In the embodiment of the application, the multi-target tracking device may acquire the cross-comparison data of the detection frame picture and the candidate target frame picture to be compared by acquiring the overlapping part of the two pictures, acquiring the area of the overlapping part, and normalizing that area (for example, by the area of the union of the two frames, yielding the intersection-over-union, IoU).
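Normalizing the overlap area by the area of the union gives the standard intersection-over-union; a sketch for axis-aligned (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, in [0, 1]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```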
Step 1044, determining the matching degree of the detection frame picture and the candidate target frame picture to be compared according to the feature distance information and the cross-matching data.
In the embodiment of the application, the characteristic distance information of the detection frame picture and the candidate target frame picture to be compared can be normalized, and the normalized characteristic distance information and the cross-correlation data are weighted and summed to obtain the matching degree of the detection frame picture and the candidate target frame picture to be compared.
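A sketch of this weighted combination, assuming the JS divergence as the feature distance (its maximum for discrete distributions is ln 2, which gives a simple normalization) and equal weights for the two terms. Both assumptions are illustrative, since the patent leaves the normalization and the weights unspecified:

```python
import math

def matching_degree(feature_distance, overlap_iou, weight=0.5):
    """Combine a normalized feature similarity with IoU into one score in [0, 1].

    feature_distance is assumed to be a JS divergence in [0, ln 2]; it is
    mapped to a similarity so that larger always means a better match.
    """
    similarity = 1.0 - feature_distance / math.log(2)
    return weight * similarity + (1.0 - weight) * overlap_iou
```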
Step 1045, determining a matched candidate target frame picture according to the matching degree of each candidate target frame picture to be compared in the detection frame picture and the candidate target frame picture set.
In the embodiment of the application, the candidate target frame picture with the largest matching degree can be determined as the candidate target frame picture matched with the detection frame picture; or determining the candidate target frame picture with the matching degree larger than the preset matching degree threshold value as the candidate target frame picture matched with the detection frame picture.
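The two selection rules above can also be combined: take the candidate with the largest matching degree, and accept it only when it exceeds a minimum threshold. The threshold value here is an assumed placeholder:

```python
def best_match(scores, min_score=0.4):
    """Index of the highest-scoring candidate, or None when no candidate
    reaches the (assumed) matching-degree threshold."""
    if not scores:
        return None
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx if scores[idx] >= min_score else None
```

Returning `None` is what triggers the "new target" branch of fig. 4 for a detection box, or the lost/disappeared branch for an unmatched candidate box.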
Step 1046, determining the candidate target frame information corresponding to the matched candidate target frame picture as the matched candidate target frame information.
According to the multi-target tracking method, for each detection object in the non-first frame picture, the detection frame picture corresponding to the detection object in the non-first frame picture is determined according to the detection frame information corresponding to the detection object and the non-first frame picture; a candidate target frame picture set is determined according to the candidate target frame information set and the non-first frame picture; for each candidate target frame picture to be compared in the candidate target frame picture set, the feature distance information and cross-comparison data of the detection frame picture and the candidate target frame picture to be compared are acquired; the matching degree of the detection frame picture and the candidate target frame picture to be compared is determined according to the feature distance information and the cross-comparison data; the matched candidate target frame picture is determined according to the matching degree of the detection frame picture and each candidate target frame picture to be compared in the candidate target frame picture set; and the candidate target frame information corresponding to the matched candidate target frame picture is determined as the matched candidate target frame information. Therefore, the candidate target frame information matched with the detection frame information can be determined by combining the cross-comparison data, the feature distance information, and the like, which improves the matching accuracy and thus the tracking accuracy.
In order to implement the embodiments described in fig. 1 to 5, the embodiment of the present application further proposes a multi-target tracking device.
Fig. 6 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 6, the multi-target tracking apparatus 600 includes: the system comprises an acquisition module 610, a determination module 620, a matching module 630 and a tracking module 640.
The acquiring module 610 is configured to acquire a sequence of pictures to be processed;
a determining module 620, configured to determine, for each frame picture in the picture sequence, a candidate target frame information set corresponding to a non-first frame picture by combining the non-first frame picture, a previous frame picture of the non-first frame picture, and a target frame information set corresponding to the previous frame picture when the frame picture is the non-first frame picture;
the acquiring module 610 is further configured to acquire a detection frame information set corresponding to the non-first frame picture; the detection frame information set includes: detection frame information corresponding to each detection object in the non-first frame picture;
the matching module 630 is configured to match, for each detection object in the non-first frame picture, detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set, and obtain matched candidate target frame information; determining the matched candidate target frame information as target frame information corresponding to the detection object;
And the tracking module 640 is configured to track each detection object according to target frame information of each detection object in each frame of the picture sequence.
According to the multi-target tracking device, the picture sequence to be processed is acquired; for each frame picture in the picture sequence, when the frame picture is a non-first frame picture, the candidate target frame information set corresponding to the non-first frame picture is determined by combining the non-first frame picture, the previous frame picture of the non-first frame picture, and the target frame information set corresponding to the previous frame picture; the detection frame information set corresponding to the non-first frame picture is acquired; for each detection object in the non-first frame picture, the detection frame information corresponding to the detection object is matched with the candidate target frame information in the candidate target frame information set to obtain the matched candidate target frame information, which is determined as the target frame information corresponding to the detection object; and each detection object is tracked according to the target frame information of each detection object in each frame of the picture sequence. In this way, the detection result of the current frame picture can be corrected and the tracking accuracy improved.
In order to achieve the above embodiments, the embodiments of the present application further provide an electronic device.
Fig. 7 is a block diagram of an electronic device for the multi-target tracking method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 701 is illustrated in fig. 7.
The memory 702 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the multi-target tracking method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the multi-target tracking method provided by the present application.
The memory 702 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 610, the determination module 620, the matching module 630, and the tracking module 640 shown in fig. 6) corresponding to the multi-target tracking method according to the embodiments of the present application. The processor 701 executes various functional applications of the server and data processing, i.e., implements the multi-objective tracking method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 702.
Memory 702 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of the multi-target tracked electronic device, and the like. In addition, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 702 may optionally include memory located remotely from processor 701, which may be connected to multi-target tracking electronic devices via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the multi-target tracking method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or otherwise, in fig. 7 by way of example.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for multi-target tracking, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or the like. The output device 704 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that the various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (11)

1. A multi-target tracking method, comprising:
acquiring a picture sequence to be processed;
for each frame of picture in the picture sequence, when the frame of picture is a non-first frame of picture, acquiring a previous frame of picture of the non-first frame of picture and a target frame information set corresponding to the previous frame of picture; the target frame information set includes: target frame information corresponding to each detection object in the previous frame of picture;
for each detection object in the previous frame of picture, combining target frame information corresponding to the detection object in the previous frame of picture, the previous frame of picture and the non-initial frame of picture to obtain candidate target frame information;
generating a candidate target frame information set corresponding to the non-first frame picture according to the acquired candidate target frame information;
Acquiring a detection frame information set corresponding to the non-first frame picture; the detection frame information set includes: detection frame information corresponding to each detection object in the non-first frame picture;
for each detection object in the non-initial frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set to obtain matched candidate target frame information; determining the matched candidate target frame information as target frame information corresponding to the detection object;
and tracking each detection object according to the target frame information of each detection object in each frame of the picture sequence.
2. The method according to claim 1, wherein the obtaining, for each detection object in the previous frame picture, candidate target frame information by combining the target frame information corresponding to the detection object in the previous frame picture, the previous frame picture, and the non-first frame picture comprises:
aiming at each detection object in the previous frame of picture, carrying out target frame interception processing on the previous frame of picture according to target frame information corresponding to the detection object, and obtaining a target frame picture;
Inputting the target frame picture and the non-initial frame picture into a single target tracker to obtain a plurality of pieces of predicted target frame information and confidence degrees corresponding to each piece of predicted target frame information;
and determining candidate target frame information according to the confidence coefficient corresponding to each piece of predicted target frame information.
3. The method of claim 2, wherein determining candidate target frame information based on the confidence level corresponding to each predicted target frame information comprises:
determining the predicted target frame information with the highest corresponding confidence as the candidate target frame information; or,
and matching the plurality of pieces of predicted target frame information with detection target frame information corresponding to the detection object, and determining candidate target frame information according to a matching result.
4. The method according to claim 1, wherein for each detection object in the non-first frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set, and obtaining matched candidate target frame information, includes:
for each detection object in the non-first frame picture, determining a detection frame picture corresponding to the detection object in the non-first frame picture according to detection frame information corresponding to the detection object and the non-first frame picture;
Determining a candidate target frame picture set according to the candidate target frame information set and the non-first frame picture;
aiming at each candidate target frame picture to be compared in the candidate target frame picture set, acquiring characteristic distance information and cross comparison data of the detection frame picture and the candidate target frame picture to be compared;
determining the matching degree of the detection frame picture and the candidate target frame picture to be compared according to the characteristic distance information and the cross-matching data;
determining matched candidate target frame pictures according to the matching degree of the detection frame pictures and each candidate target frame picture to be compared in the candidate target frame picture set;
and determining the candidate target frame information corresponding to the matched candidate target frame picture as the matched candidate target frame information.
5. The method as recited in claim 1, further comprising:
and for each detection object, if the matched candidate target frame information is not acquired, determining that the detection object is a newly added detection object, and determining the detection frame information of the detection object as the target frame information of the detection object in the non-first frame picture.
6. The method according to claim 2, wherein the number of single-target trackers is at least one, each single-target tracker is used for tracking a detection object, and outputting candidate target frame information of the non-first frame picture;
For each detection object in the non-first frame picture, matching detection frame information corresponding to the detection object with candidate target frame information in a candidate target frame information set, and after obtaining the matched candidate target frame information, further comprising:
judging whether first candidate target frame information exists in the candidate target frame information set; the first candidate target frame information is candidate target frame information which is not matched with the detection frame information;
acquiring confidence corresponding to the first candidate target frame information;
when the confidence coefficient corresponding to the first candidate target frame information is larger than a first confidence coefficient threshold value, changing the state of a single target tracker outputting the first candidate target frame information into a tracking losing state;
and when the confidence coefficient corresponding to the first candidate target frame information is smaller than a second confidence coefficient threshold value, changing the state of the single target tracker outputting the first candidate target frame information into the disappearance of the detection object.
7. The method as recited in claim 6, further comprising:
judging whether a first single-target tracker exists, wherein the first single-target tracker is a single-target tracker whose state is that the detection object has disappeared and whose state duration is greater than a preset time threshold;
And deleting the first single-target tracker when the first single-target tracker exists.
8. The method according to claim 2, wherein the obtaining the detection frame information set corresponding to the non-first frame picture includes:
performing multi-target detection on the non-first frame picture by adopting a preset multi-target detection model, and acquiring a detection frame information set corresponding to the non-first frame picture;
wherein the multi-target detection model and the single-target tracker are trained in the following manner:
acquiring an initial joint model; the joint model includes: the system comprises a backbone network, a multi-target detection head network and a single-target tracking head network, wherein the multi-target detection head network and the single-target tracking head network are respectively connected with the backbone network;
acquiring training data, the training data comprising: a current frame picture, a previous frame picture of the current frame picture and a corresponding target frame information set;
training the joint model by adopting the training data to obtain a trained joint model;
combining a main network in the joint model and a multi-target detection head network connected with the main network to obtain a preset multi-target detection model;
and combining a backbone network in the joint model and a single-target tracking head network connected with the backbone network to obtain the single-target tracker.
9. A multi-target tracking apparatus, comprising:
the acquisition module is used for acquiring a picture sequence to be processed;
the determining module is used for acquiring a previous frame picture of the non-initial frame picture and a target frame information set corresponding to the previous frame picture when the frame picture is the non-initial frame picture aiming at each frame picture in the picture sequence; the target frame information set includes: target frame information corresponding to each detection object in the previous frame of picture; for each detection object in the previous frame of picture, combining target frame information corresponding to the detection object in the previous frame of picture, the previous frame of picture and the non-initial frame of picture to obtain candidate target frame information; generating a candidate target frame information set corresponding to the non-first frame picture according to the acquired candidate target frame information;
the acquisition module is further used for acquiring a detection frame information set corresponding to the non-first frame picture; the detection frame information set includes: detection frame information corresponding to each detection object in the non-first frame picture;
the matching module is used for matching the detection frame information corresponding to each detection object in the non-initial frame picture with the candidate target frame information in the candidate target frame information set to obtain matched candidate target frame information; determining the matched candidate target frame information as target frame information corresponding to the detection object;
And the tracking module is used for tracking each detection object according to the target frame information of each detection object in each frame of the picture sequence.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
11. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-8.
CN202010154953.9A 2020-03-06 2020-03-06 Multi-target tracking method and device and electronic equipment Active CN111462174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010154953.9A CN111462174B (en) 2020-03-06 2020-03-06 Multi-target tracking method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN111462174A CN111462174A (en) 2020-07-28
CN111462174B true CN111462174B (en) 2023-10-31

Family

ID=71682681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010154953.9A Active CN111462174B (en) 2020-03-06 2020-03-06 Multi-target tracking method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111462174B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116008B (en) * 2020-09-18 2024-07-05 平安科技(深圳)有限公司 Processing method of target detection model based on intelligent decision and related equipment thereof
CN112528932B (en) * 2020-12-22 2023-12-08 阿波罗智联(北京)科技有限公司 Method and device for optimizing position information, road side equipment and cloud control platform
CN112861819A (en) * 2021-04-01 2021-05-28 潘振波 Method and device for detecting crossing of fence in transformer substation operation and electronic equipment
CN113610819A (en) * 2021-08-11 2021-11-05 杭州申昊科技股份有限公司 Defect detection method and device, electronic equipment and storage medium
CN113450386B (en) * 2021-08-31 2021-12-03 北京美摄网络科技有限公司 Face tracking method and device
CN116403074B (en) * 2023-04-03 2024-05-14 上海锡鼎智能科技有限公司 Semi-automatic image labeling method and device based on active labeling

Citations (9)

Publication number Priority date Publication date Assignee Title
JPH06274625A (en) * 1993-03-18 1994-09-30 Toshiba Corp Moving object tracking method for monitor image
CN103218827A (en) * 2013-03-21 2013-07-24 上海交通大学 Contour tracing method based on shape-transmitting united division and image-matching correction
CN106778141A (en) * 2017-01-13 2017-05-31 北京元心科技有限公司 Unlocking method and device based on gesture recognition and mobile terminal
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN108053427A (en) * 2017-10-31 2018-05-18 深圳大学 A kind of modified multi-object tracking method, system and device based on KCF and Kalman
CN109753940A (en) * 2019-01-11 2019-05-14 京东方科技集团股份有限公司 Image processing method and device
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110659559A (en) * 2019-08-02 2020-01-07 浙江省北大信息技术高等研究院 Multi-target tracking method and system for monitoring scene
CN110827325A (en) * 2019-11-13 2020-02-21 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2002190020A (en) * 2000-12-20 2002-07-05 Monolith Co Ltd Method and device for image effect

Also Published As

Publication number Publication date
CN111462174A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111462174B (en) Multi-target tracking method and device and electronic equipment
US11854237B2 (en) Human body identification method, electronic device and storage medium
CN111582375B (en) Data enhancement policy searching method, device, equipment and storage medium
CN110659600B (en) Object detection method, device and equipment
CN111275190B (en) Compression method and device of neural network model, image processing method and processor
CN110968718B (en) Target detection model negative sample mining method and device and electronic equipment
CN110852321B (en) Candidate frame filtering method and device and electronic equipment
CN111783948A (en) Model training method and device, electronic equipment and storage medium
CN111275827B (en) Edge-based augmented reality three-dimensional tracking registration method and device and electronic equipment
US20210256725A1 (en) Target detection method, device, electronic apparatus and storage medium
CN111695519B (en) Method, device, equipment and storage medium for positioning key point
JP7270114B2 (en) Face keypoint detection method, device and electronic device
CN111708477B (en) Key identification method, device, equipment and storage medium
CN112241716B (en) Training sample generation method and device
CN111462179B (en) Three-dimensional object tracking method and device and electronic equipment
CN111861991A (en) Method and device for calculating image definition
CN111680597A (en) Face recognition model processing method, device, equipment and storage medium
CN115101069A (en) Voice control method, device, equipment, storage medium and program product
CN111488972B (en) Data migration method, device, electronic equipment and storage medium
CN111783644B (en) Detection method, detection device, detection equipment and computer storage medium
CN112488126A (en) Feature map processing method, device, equipment and storage medium
CN111696134A (en) Target detection method and device and electronic equipment
CN111767990A (en) Neural network processing method and device
CN112381877B (en) Positioning fusion and indoor positioning method, device, equipment and medium
CN112183484B (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant