CN115994928B - Target tracking method, device, equipment and medium


Info

Publication number
CN115994928B
Authority
CN
China
Prior art keywords
detection
result
prediction result
detection result
matching
Prior art date
Legal status
Active
Application number
CN202310293456.0A
Other languages
Chinese (zh)
Other versions
CN115994928A (en)
Inventor
Xue Wei
Zhao Shiyu
Current Assignee
Imotion Automotive Technology Suzhou Co Ltd
Original Assignee
Imotion Automotive Technology Suzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by Imotion Automotive Technology Suzhou Co Ltd
Priority to CN202310293456.0A
Publication of CN115994928A
Application granted
Publication of CN115994928B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a target tracking method, apparatus, device and medium in the technical field of computer vision, comprising the following steps: performing target detection on the current frame to obtain a detection result set, and determining a high-score detection result set and a low-score detection result set within it; matching the high-score detection result set with the prediction results of the previous frame, updating each matched prediction result with its detection result and putting it into the tracking set, and putting the detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively; matching the low-score detection result set with the first prediction result set, likewise updating and collecting matched prediction results, discarding detection results that failed to match, and putting prediction results that failed to match into a second prediction result set; matching the first detection legacy set with the second prediction result set, updating and collecting matched prediction results, and putting detection results that failed to match into a second detection legacy set; and initializing the second detection legacy set and putting it into the tracking set so as to return the target tracking result.

Description

Target tracking method, device, equipment and medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a target tracking method, apparatus, device, and medium.
Background
The most mature tracking paradigm in current use is tracking-by-detection. In the first stage, a detection model produces detection results; in the second stage, a distance- or feature-based matching method associates the detection results with the tracking results of the previous frame. However, the existing tracking method has the following problem: after a target has been lost for several consecutive frames, its detection box may have no intersection with the tracking prediction when it reappears, so the two cannot be matched, the lost target can never be re-matched, and tracking ID switches result.
In summary, how to increase the matching success rate so that targets can be tracked continuously, avoiding lost targets that can never be re-matched, is a problem that remains to be solved.
Disclosure of Invention
Accordingly, the present invention aims to provide a target tracking method, apparatus, device and medium that increase the matching success rate so that targets can be tracked continuously and lost targets do not remain unmatched. The specific scheme is as follows:
in a first aspect, the present application discloses a target tracking method, including:
detecting each target in the current frame to obtain a detection result set, and determining a high-score detection result set and a low-score detection result set in the detection result set;
matching the high-score detection result set with the prediction results of the previous frame, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into a tracking set, and putting the high-score detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively;
matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result with each successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding low-score detection results that failed to match, and putting prediction results that failed to match into a second prediction result set;
matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into the tracking set, and putting the high-score detection results and prediction results that failed to match into a second detection legacy set and a lost tracking set, respectively;
initializing the high-score detection results in the second detection legacy set, and placing the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
wherein said matching the first detection legacy set with the second prediction result set comprises:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on cosine similarity;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Optionally, the detecting each target in the current frame to obtain a detection result set includes:
detecting the current frame with a target detector to obtain the detection result and detection score corresponding to each target, and constructing the detection result set based on the detection results.
Optionally, the determining the high-score detection result set and the low-score detection result set in the detection result set includes:
determining a preset detection score threshold, and determining the high-score detection result set and the low-score detection result set in the detection result set based on the detection score threshold and the detection scores corresponding to the targets.
Optionally, before the matching the high-score detection result set with the prediction results of the previous frame, the method further includes:
predicting the tracking result of the previous frame with a Kalman prediction method to obtain the prediction results of the previous frame.
Optionally, the determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value includes:
taking the smaller of the CIoU distance cost and the appearance feature distance cost as the target cost value, and judging whether the target cost value is smaller than a preset cost threshold;
if the target cost value is smaller than the preset cost threshold value, the matching result is successful;
and if the target cost value is not smaller than the preset cost threshold, the matching result is a matching failure.
Optionally, the determining, based on cosine similarity, an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set includes:
determining the cosine similarity between the high-score detection result and the prediction result;
judging whether the cosine similarity and the CIoU distance are smaller than the preset appearance feature threshold and the preset CIoU threshold, respectively;
if yes, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set by using the cosine similarity and a preset coefficient;
and if not, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set as 1.
In a second aspect, the present application discloses a target tracking apparatus comprising:
the detection result acquisition module is used for detecting each target in the current frame to obtain a detection result set, and determining a high-score detection result set and a low-score detection result set in the detection result set;
the first matching module is used for matching the high-score detection result set with the prediction results of the previous frame, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into the tracking set, and putting the high-score detection results and prediction results that failed to match into the first detection legacy set and the first prediction result set, respectively;
the second matching module is used for matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result with each successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding low-score detection results that failed to match, and putting prediction results that failed to match into the second prediction result set;
the third matching module is used for matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into the tracking set, and putting the high-score detection results and prediction results that failed to match into the second detection legacy set and the lost tracking set, respectively;
the tracking result acquisition module is used for initializing the high-score detection results in the second detection legacy set and placing the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
the third matching module is specifically configured to:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on cosine similarity;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the previously disclosed target tracking method.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the steps of the previously disclosed target tracking method.
Therefore, the method and the device detect each target in the current frame to obtain a detection result set, and determine a high-score detection result set and a low-score detection result set in the detection result set; match the high-score detection result set with the prediction results of the previous frame, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into a tracking set, and put the high-score detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively; match the low-score detection result set with the first prediction result set, update the corresponding prediction result with each successfully matched low-score detection result, put the updated prediction result into the tracking set, discard low-score detection results that failed to match, and put prediction results that failed to match into a second prediction result set; match the first detection legacy set with the second prediction result set, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into the tracking set, and put the high-score detection results and prediction results that failed to match into a second detection legacy set and a lost tracking set, respectively; and initialize the high-score detection results in the second detection legacy set and place them into the tracking set, so as to return a target tracking result based on the tracking set. Matching the first detection legacy set with the second prediction result set comprises: determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method; determining an appearance feature distance cost between them based on cosine similarity; and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining the matching result based on the target cost value.
It can be seen that, after the high-score and low-score detection result sets of the current frame have been matched in turn against the prediction results of the previous frame, a first detection legacy set of high-score detections that were not successfully matched and a second prediction result set of prediction results that were not successfully matched remain. These two sets are then matched once more: on success the prediction result is updated with the corresponding high-score detection result and placed into the tracking set; on failure the high-score detection result and the prediction result are placed into the second detection legacy set and the lost tracking set, respectively, after which the high-score detection results in the second detection legacy set are initialized and placed into the tracking set, so that the final target tracking result can be returned based on the tracking set. Specifically, when the first detection legacy set is matched with the second prediction result set, the distance is computed with the CIoU method, the appearance feature distance between the detection target and the tracking target is computed with cosine similarity, the final target cost value is determined from the CIoU distance cost and the appearance feature distance cost, and the matching result is decided from that target cost value; the target cost value therefore accurately reflects the final cost between the detection target and the tracking target of the previous frame, which raises the matching success rate and reduces the ID switching of tracked targets. In other words, an additional matching pass between the first detection legacy set and the second prediction result set is added on top of the original tracking pipeline, which widens the matching range to the surrounding area, raises the matching success rate, solves the problem that lost targets cannot be re-matched and therefore cannot be tracked continuously, and reduces the ID switching of tracked targets.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the prior-art description are briefly introduced below. It is apparent that the drawings described below are only embodiments of the present invention, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a target tracking method disclosed in the present application;
FIG. 2 is a specific matching flow chart disclosed herein;
FIG. 3 is a schematic diagram of the positions of a detection target and a tracking target disclosed in the present application;
FIG. 4 is a schematic diagram of a target tracking apparatus disclosed in the present application;
fig. 5 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The existing tracking method has the following problem: after a target has been lost for several consecutive frames, its detection box may have no intersection with the tracking prediction when it reappears, so the two cannot be matched, the lost target can never be re-matched, and tracking ID switches result. Therefore, the embodiments of the present application disclose a target tracking method, apparatus, device and medium that increase the matching success rate so that targets can be tracked continuously and lost targets do not remain unmatched.
Referring to fig. 1, an embodiment of the present application discloses a target tracking method, which includes:
step S11: and detecting each target in the current frame to obtain a detection result set, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set.
In this embodiment, a video V to be processed is acquired, and each frame f of the video V is traversed. Taking the current frame f_k as an example, target detection is performed on the current frame to obtain the corresponding detection result set D_k, and a high-score detection result set D_high and a low-score detection result set D_low are determined in the detection result set.
In a specific embodiment, the detecting each target in the current frame to obtain a detection result set specifically includes: detecting the current frame with a target detector to obtain the detection result and detection score corresponding to each target, and constructing the detection result set based on the detection results.
Further, in a specific embodiment, the determining the high-score detection result set and the low-score detection result set in the detection result set includes: determining a preset detection score threshold, and determining the high-score detection result set and the low-score detection result set in the detection result set based on the detection score threshold and the detection scores corresponding to the targets.
It will be appreciated that this embodiment detects the current frame with a target detector Det to obtain the detection result and detection score corresponding to each target, and constructs the detection result set based on the detection results. The detection result set is then divided into the high-score detection result set and the low-score detection result set according to the preset detection score threshold and the detection score of each target: when a detection score is below the threshold, the corresponding detection result is put into the low-score detection result set; when it is not below the threshold, the detection result is put into the high-score detection result set.
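To make the split concrete, a minimal Python sketch follows; the threshold value and all names are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of the score-based split; the threshold value is illustrative.
SCORE_THRESHOLD = 0.5  # preset detection score threshold

def split_detections(detections):
    """Split (box, score) detections into high-score and low-score sets."""
    d_high = [d for d in detections if d[1] >= SCORE_THRESHOLD]  # score not below threshold
    d_low = [d for d in detections if d[1] < SCORE_THRESHOLD]    # score below threshold
    return d_high, d_low
```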
Step S12: matching the high-score detection result set with the prediction results of the previous frame, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into a tracking set, and putting the high-score detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively.
In this embodiment, the high-score detection result set D_high is matched with the prediction results P_k of the previous frame. If a match succeeds, the corresponding prediction result is updated with the matched high-score detection result, and the updated prediction result is put into the tracking set T. If a match fails, the unmatched high-score detection results are put into the first detection legacy set D_remain, and the unmatched prediction results are put into the first prediction result set P_remain.
It should be noted that, in a specific embodiment, before the matching of the high-score detection result set with the prediction results of the previous frame, the method further includes: predicting the tracking result of the previous frame with a Kalman prediction method to obtain the prediction results of the previous frame. It will be appreciated that the prediction results of the previous frame are obtained by applying Kalman prediction to the tracking set T_{k-1} of the previous frame.
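The patent does not specify the motion model, so the sketch below assumes a standard constant-velocity Kalman filter over the box center, which is the common choice in detection-based trackers; all parameter values are illustrative.

```python
import numpy as np

def kalman_predict(x, P, dt=1.0, q=1e-2):
    """One constant-velocity Kalman prediction step (model assumed, not given by the patent).

    x: state vector [cx, cy, vx, vy]; P: 4x4 state covariance; dt: frame interval.
    """
    F = np.array([[1.0, 0.0, dt,  0.0],
                  [0.0, 1.0, 0.0, dt ],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    Q = q * np.eye(4)             # process noise, illustrative magnitude
    x_pred = F @ x                # propagate the state one frame ahead
    P_pred = F @ P @ F.T + Q      # propagate the uncertainty
    return x_pred, P_pred
```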
Step S13: matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result with each successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding low-score detection results that failed to match, and putting prediction results that failed to match into the second prediction result set.
In this embodiment, the low-score detection result set D_low is matched with the first prediction result set P_remain. If a match succeeds, the corresponding prediction result is updated with the matched low-score detection result, and the updated prediction result is put into the tracking set T. If a match fails, the unmatched low-score detection results are discarded, and the unmatched prediction results are put into the second prediction result set P_re-remain.
Step S14: matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result with each successfully matched high-score detection result, putting the updated prediction result into the tracking set, and putting the high-score detection results and prediction results that failed to match into the second detection legacy set and the lost tracking set, respectively.
In this embodiment, the first detection legacy set D_remain is matched once more with the second prediction result set P_re-remain. If a match succeeds, the corresponding prediction result is updated with the matched high-score detection result, and the updated prediction result is put into the tracking set T. If a match fails, the unmatched high-score detection results are put into the second detection legacy set D_re-remain, and the unmatched prediction results are put into the lost tracking set T_loss.
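Taken together, steps S12 to S14 amount to a three-stage cascade over the sets above. The sketch below shows only the bookkeeping; `match` stands in for any assignment routine that returns matched pairs plus the unmatched leftovers, and `update`/`init_track` are hypothetical placeholders.

```python
def update(pred, det):
    """Refresh a matched track with its detection (placeholder)."""
    pred["box"] = det["box"]
    return pred

def init_track(det):
    """Start a new track from a leftover high-score detection (placeholder)."""
    return {"box": det["box"], "confirmed": False}

def track_frame(d_high, d_low, p_k, match):
    """Bookkeeping sketch of steps S12-S14; `match` returns (pairs, unmatched_dets, unmatched_preds)."""
    tracking = []
    pairs, d_remain, p_remain = match(d_high, p_k)             # S12: high-score vs. predictions
    tracking += [update(p, d) for d, p in pairs]
    pairs, _, p_re_remain = match(d_low, p_remain)             # S13: low-score vs. first leftovers
    tracking += [update(p, d) for d, p in pairs]               # unmatched low-score dets are dropped
    pairs, d_re_remain, t_loss = match(d_remain, p_re_remain)  # S14: extra CIoU + appearance pass
    tracking += [update(p, d) for d, p in pairs]
    tracking += [init_track(d) for d in d_re_remain]           # S15: leftovers start new tracks
    return tracking, t_loss                                    # t_loss: predictions still unmatched
```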
Specifically, referring to fig. 2, the embodiment of the present application provides the specific steps for matching the first detection legacy set with the second prediction result set, including:
step S141: determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method; and determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method.
In this embodiment, when the first detection legacy set is matched with the second prediction result set, the distance metric used is the CIoU method. CIoU reflects the distance between the detection target box and the tracking target box better than plain IoU, and still yields a usable distance value for matching when the two boxes do not intersect. First, the CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set is determined based on the CIoU method, and the appearance feature distance cost between them is determined based on cosine similarity. It can be understood that the matching value has two parts: the first is the distance computation, for which the additional matching pass in this embodiment uses CIoU; the second is the appearance feature distance between the detection target and the tracking target, for which this embodiment uses cosine similarity.
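For reference, the sketch below implements the standard CIoU definition (overlap term minus a normalized center-distance term minus an aspect-ratio term) and returns 1 - CIoU as the cost; the patent does not fix this exact cost convention, so treat it as one reasonable reading. Because CIoU keeps decreasing as two disjoint boxes move apart, the cost stays informative even when plain IoU is exactly 0, which is the property this matching pass relies on.

```python
import math

def ciou_distance(box_a, box_b):
    """CIoU distance cost between boxes given as (x1, y1, x2, y2); lower means closer."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared center distance over the squared diagonal of the enclosing box.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term.
    v = (4.0 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                                - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1.0 - iou + v) if (1.0 - iou + v) > 0 else 0.0
    ciou = iou - (rho2 / c2 if c2 > 0 else 0.0) - alpha * v
    return 1.0 - ciou
```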
In a specific embodiment, the determining, based on cosine similarity, the appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set includes: determining the cosine similarity between the high-score detection result and the prediction result; judging whether the cosine similarity and the CIoU distance are smaller than the preset appearance feature threshold and the preset CIoU threshold, respectively; if yes, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set by using the cosine similarity and a preset coefficient; if not, setting the appearance feature distance cost between them to 1. In this embodiment, the feature similarity is computed as the cosine distance of the features. Because ByteTrack computes only distances and does not use feature similarity, the matching rule of the ByteTrack pipeline is not simply carried over: since the matching range has been widened, computing CIoU alone might also match the detection boxes of other targets, so the feature cosine similarity is used to further constrain the matching. When the cosine similarity is smaller than the preset appearance feature threshold and the CIoU distance is also smaller than the preset CIoU threshold, the appearance feature distance cost is the product of the preset coefficient and the cosine similarity; otherwise, the appearance feature distance cost is set to 1. It can be understood that, when the cosine similarity and the CIoU distance are not both smaller than their respective thresholds, the detection result and the tracking result are not the same target; setting the cost value to 1 therefore guarantees that the subsequent matching result is a matching failure.
The specific formula is as follows:

$$
c_{i,j}^{app} =
\begin{cases}
0.5 \cdot d_{i,j}^{cos}, & d_{i,j}^{cos} < \theta_{emb} \ \text{and} \ d_{i,j}^{CIoU} < \theta_{CIoU} \\
1, & \text{otherwise}
\end{cases}
$$

where $c_{i,j}^{app}$ is the appearance feature distance cost between the $i$-th prediction result and the $j$-th detection result, $d_{i,j}^{cos}$ is the cosine similarity between the $i$-th prediction result and the $j$-th detection result, $\theta_{emb}$ is the preset appearance feature threshold, $\theta_{CIoU}$ is the preset CIoU threshold, $d_{i,j}^{CIoU}$ is the CIoU distance between the $i$-th prediction result and the $j$-th detection result, and 0.5 is the preset coefficient, which can be set according to the actual situation. The cosine similarity itself is computed from the appearance features of the $i$-th prediction result and the $j$-th detection result.
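A direct transcription of this rule might look as follows. Note one interpretive assumption: the patent calls $d_{i,j}^{cos}$ the cosine similarity, yet gates on it being small and uses it as a cost, so the sketch computes it as the cosine distance 1 - sim; the threshold values are illustrative.

```python
import numpy as np

THETA_EMB = 0.25   # preset appearance feature threshold (illustrative)
THETA_CIOU = 0.9   # preset CIoU threshold (illustrative)

def cosine_distance(feat_a, feat_b):
    """1 - cosine similarity between two appearance feature vectors."""
    sim = float(np.dot(feat_a, feat_b)
                / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-12))
    return 1.0 - sim

def appearance_cost(d_cos, d_ciou, coeff=0.5):
    """Gated appearance cost from the formula above: pass both gates or force a non-match."""
    if d_cos < THETA_EMB and d_ciou < THETA_CIOU:
        return coeff * d_cos
    return 1.0
```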
For example, referring to fig. 3, when a tracked target is lost and later reappears, the prediction box and the corresponding detection box do not necessarily overlap, owing to prediction or detection error, so the IoU value is 0 and the two cannot be matched by IoU alone. In the figure, solid boxes denote detection targets, labeled 1', 2', and 3'; dashed boxes denote tracking targets, labeled 1, 2, and 3; identical numbers denote the same target. Even when a solid box and a dashed box do not intersect, matching can still be performed by setting a CIoU threshold. In addition, the additional matching pass does not interfere with the matching logic of the original tracking pipeline, so it suits most current tracking pipelines in a plug-and-play fashion.
Step S142: determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
In this embodiment, the determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value, specifically includes: taking the smaller of the CIoU distance cost and the appearance feature distance cost as the target cost value, and judging whether the target cost value is smaller than a preset cost threshold; if the target cost value is smaller than the preset cost threshold, the matching result is a matching success; if not, the matching result is a matching failure. It can be understood that this embodiment fuses the CIoU distance cost and the appearance feature distance cost by selecting the smaller of the two as the final target cost value, and a match succeeds only when the target cost value satisfies the cost threshold condition. Compared with prior-art schemes that use a weighted average, i.e. a weighted sum of the distance cost and the feature cost, as the final cost value, the target cost value obtained in this way more accurately reflects the final cost between the detection target and the tracking target of the previous frame, which effectively raises the matching success rate and reduces the ID switching of tracked targets.
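As a sketch, the fusion and decision reduce to a few lines; the cost threshold is illustrative, and in practice the fused costs for all pairs would be assembled into a matrix and solved with an assignment routine such as the Hungarian algorithm, which the patent leaves to the surrounding pipeline.

```python
COST_THRESHOLD = 0.8  # preset cost threshold (illustrative)

def target_cost(ciou_cost, app_cost):
    """Fuse the two costs by taking the smaller value, per the rule above."""
    return min(ciou_cost, app_cost)

def is_match(ciou_cost, app_cost):
    """A pair matches only when its fused cost falls below the preset threshold."""
    return target_cost(ciou_cost, app_cost) < COST_THRESHOLD
```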
Step S15: initializing high-score detection results in the second detection legacy set, and placing the initialized high-score detection results into the tracking set so as to return target tracking results based on the tracking set.
In this embodiment, the high-score detection results in the second detection legacy set D_re-remain are initialized as new trackers and put into the tracking set T, so that the target tracking result can be returned based on the resulting tracking set. It can be understood that the high-score detection results in the second detection legacy set correspond to targets in the current frame that were not successfully matched, so the purpose of initialization is to treat each of them as a new target to be matched against the next frame, further improving the matching success rate. During initialization, a new ID value is assigned, the position of the high-score detection result is recorded, and its state is set to unconfirmed before it is put into the tracking set of the current frame. Further, it should be noted that when a track in the lost tracking set T_loss has been lost continuously for more than a preset number of frames, it is deleted from the lost tracking set; the preset number of frames may be set to 30.
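The initialization and lost-track pruning described here can be sketched as below, refining the `init_track` placeholder from the earlier cascade sketch; field names are hypothetical.

```python
import itertools

_next_id = itertools.count(1)  # running track-ID generator (illustrative)
MAX_LOST_FRAMES = 30           # preset number of frames before a lost track is deleted

def init_track(det_box):
    """Turn a leftover high-score detection into a new, unconfirmed track with a fresh ID."""
    return {"id": next(_next_id), "box": det_box,
            "confirmed": False, "lost_frames": 0}

def prune_lost(t_loss):
    """Remove tracks that have been lost for more than the preset number of frames."""
    return [t for t in t_loss if t["lost_frames"] <= MAX_LOST_FRAMES]
```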
Therefore, the method and the device detect each target in the current frame to obtain a detection result set, and determine a high-score detection result set and a low-score detection result set in the detection result set; match the high-score detection result set with the prediction results of the previous frame, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into a tracking set, and put the high-score detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively; match the low-score detection result set with the first prediction result set, update the corresponding prediction result with each successfully matched low-score detection result, put the updated prediction result into the tracking set, discard low-score detection results that failed to match, and put prediction results that failed to match into a second prediction result set; match the first detection legacy set with the second prediction result set, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into the tracking set, and put the high-score detection results and prediction results that failed to match into a second detection legacy set and a lost tracking set, respectively; and initialize the high-score detection results in the second detection legacy set and place them into the tracking set, so as to return a target tracking result based on the tracking set. Matching the first detection legacy set with the second prediction result set comprises: determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method; determining an appearance feature distance cost between them based on cosine similarity; and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining the matching result based on the target cost value.
It can be seen that, after the high-score and low-score detection result sets of the current frame have been matched in turn against the prediction results of the previous frame, a first detection legacy set of high-score detections that were not successfully matched and a second prediction result set of prediction results that were not successfully matched remain. These two sets are then matched once more: on success the prediction result is updated with the corresponding high-score detection result and placed into the tracking set; on failure the high-score detection result and the prediction result are placed into the second detection legacy set and the lost tracking set, respectively, after which the high-score detection results in the second detection legacy set are initialized and placed into the tracking set, so that the final target tracking result can be returned based on the tracking set. Specifically, when the first detection legacy set is matched with the second prediction result set, the distance is computed with the CIoU method, the appearance feature distance between the detection target and the tracking target is computed with cosine similarity, the final target cost value is determined from the CIoU distance cost and the appearance feature distance cost, and the matching result is decided from that target cost value; the target cost value therefore accurately reflects the final cost between the detection target and the tracking target of the previous frame, which raises the matching success rate and reduces the ID switching of tracked targets. In other words, an additional matching pass between the first detection legacy set and the second prediction result set is added on top of the original tracking pipeline, which widens the matching range to the surrounding area, raises the matching success rate, solves the problem that lost targets cannot be re-matched and therefore cannot be tracked continuously, and reduces the ID switching of tracked targets.
Referring to fig. 4, an embodiment of the present application discloses a target tracking apparatus, which includes:
a detection result obtaining module 11, configured to detect each target in the current frame to obtain a detection result set, and determine a high-resolution detection result set and a low-resolution detection result set in the detection result set;
a first matching module 12, configured to match the high-resolution detection result set with the prediction result of the previous frame, update the high-resolution detection result that is successfully matched with the corresponding prediction result, put the updated prediction result into the tracking set, and put the high-resolution detection result and the prediction result that are failed to be matched into the first detection legacy set and the first prediction result set, respectively;
a second matching module 13, configured to match the low-resolution detection result set with the first prediction result set, update the low-resolution detection result that is successfully matched with the corresponding prediction result, put the updated prediction result into the tracking set, discard the low-resolution detection result that is failed to be matched, and put the prediction result that is failed to be matched into a second prediction result set;
a third matching module 14, configured to match the first detection legacy set with the second prediction result set, update the high-resolution detection result that is successfully matched with the corresponding prediction result, and put the updated prediction result into the tracking set, and put the high-resolution detection result and the prediction result that are failed to match into the second detection legacy set and the lost tracking set, respectively;
A tracking result obtaining module 15, configured to initialize a high-score detection result in the second detection legacy set, and put the initialized high-score detection result into the tracking set, so as to return a target tracking result based on the tracking set;
wherein, the third matching module 14 is specifically configured to:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on cosine similarity;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Therefore, the method and the device detect each target in the current frame to obtain a detection result set, and determine a high-score detection result set and a low-score detection result set in the detection result set; match the high-score detection result set with the prediction results of the previous frame, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into a tracking set, and put the high-score detection results and prediction results that failed to match into a first detection legacy set and a first prediction result set, respectively; match the low-score detection result set with the first prediction result set, update the corresponding prediction result with each successfully matched low-score detection result, put the updated prediction result into the tracking set, discard low-score detection results that failed to match, and put prediction results that failed to match into a second prediction result set; match the first detection legacy set with the second prediction result set, update the corresponding prediction result with each successfully matched high-score detection result, put the updated prediction result into the tracking set, and put the high-score detection results and prediction results that failed to match into a second detection legacy set and a lost tracking set, respectively; and initialize the high-score detection results in the second detection legacy set and place them into the tracking set, so as to return a target tracking result based on the tracking set. Matching the first detection legacy set with the second prediction result set comprises: determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on the CIoU method; determining an appearance feature distance cost between them based on cosine similarity; and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining the matching result based on the target cost value.
It can be seen that, after the high-score and low-score detection result sets of the current frame have been matched in turn against the prediction results of the previous frame, a first detection legacy set of high-score detections that were not successfully matched and a second prediction result set of prediction results that were not successfully matched remain. These two sets are then matched once more: on success the prediction result is updated with the corresponding high-score detection result and placed into the tracking set; on failure the high-score detection result and the prediction result are placed into the second detection legacy set and the lost tracking set, respectively, after which the high-score detection results in the second detection legacy set are initialized and placed into the tracking set, so that the final target tracking result can be returned based on the tracking set. Specifically, when the first detection legacy set is matched with the second prediction result set, the distance is computed with the CIoU method, the appearance feature distance between the detection target and the tracking target is computed with cosine similarity, the final target cost value is determined from the CIoU distance cost and the appearance feature distance cost, and the matching result is decided from that target cost value; the target cost value therefore accurately reflects the final cost between the detection target and the tracking target of the previous frame, which raises the matching success rate and reduces the ID switching of tracked targets. In other words, an additional matching pass between the first detection legacy set and the second prediction result set is added on top of the original tracking pipeline, which widens the matching range to the surrounding area, raises the matching success rate, solves the problem that lost targets cannot be re-matched and therefore cannot be tracked continuously, and reduces the ID switching of tracked targets.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. Specifically, the electronic device includes: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the target tracking method performed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
Processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 21 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 21 may also comprise a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 22 may be any carrier for storing resources, such as read-only memory, random access memory, a magnetic disk, or an optical disk. The resources stored thereon include an operating system 221, a computer program 222, and data 223, and the storage may be temporary or permanent.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so that the processor 21 can operate on and process the mass data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. The computer program 222 may further include, in addition to the computer program that performs the target tracking method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, computer programs that perform other specific tasks. The data 223 may include, in addition to data received by the electronic device and transmitted from external devices, data collected through its own input/output interface 25, and so on.
Further, the embodiment of the application also discloses a computer readable storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the method steps executed in the target tracking process disclosed in any of the previous embodiments are realized.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The target tracking method, apparatus, device, and storage medium provided by the present invention have been described above in detail, with specific examples used to illustrate the principles and embodiments of the invention; the above description of the embodiments is only intended to help understand the method of the invention and its core idea. At the same time, a person skilled in the art may vary the specific embodiments and the application scope in accordance with the idea of the invention; in view of the above, the contents of this description should not be construed as limiting the invention.

Claims (8)

1. A target tracking method, comprising:
detecting each target in the current frame to obtain a detection result set, and determining a high-score detection result set and a low-score detection result set in the detection result set;
matching the high-score detection result set with the prediction result of the previous frame, updating the corresponding prediction result with each successfully matched high-score detection result and putting the updated prediction result into a tracking set, and putting the high-score detection results and the prediction results that fail to match into a first detection legacy set and a first prediction result set, respectively;
matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result with each successfully matched low-score detection result and putting the updated prediction result into the tracking set, discarding the low-score detection results that fail to match, and putting the prediction results that fail to match into a second prediction result set;
matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result with each successfully matched high-score detection result and putting the updated prediction result into the tracking set, and putting the high-score detection results and the prediction results that fail to match into a second detection legacy set and a lost tracking set, respectively;
initializing the high-score detection results in the second detection legacy set, and putting the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
wherein said matching the first detection legacy set with the second prediction result set comprises:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value;
and wherein said determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set based on the cosine similarity method comprises:
determining a cosine similarity using the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set;
judging whether the cosine similarity and the CIoU distance cost are respectively smaller than a preset appearance feature threshold and a preset CIoU threshold;
if yes, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set using the cosine similarity and a preset coefficient;
and if not, setting the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set to 1.
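For illustration only, a minimal Python sketch of the appearance-cost gating recited at the end of claim 1. The embeddings are assumed to be NumPy vectors; the threshold values and the coefficient are illustrative assumptions (the claim only states that they are preset), and the translated wording leaves it open whether the similarity itself or a distance derived from it is thresholded.

import numpy as np

def appearance_cost(det_feat, pred_feat, ciou_cost,
                    feat_thresh=0.25, ciou_thresh=0.9, coeff=0.5):
    # Appearance-feature distance cost between one leftover high-score
    # detection and one unmatched prediction. Threshold and coefficient
    # values are assumptions, not taken from the patent.
    sim = float(np.dot(det_feat, pred_feat) /
                (np.linalg.norm(det_feat) * np.linalg.norm(pred_feat) + 1e-12))
    # Gate on BOTH quantities being below their preset thresholds, as recited.
    if sim < feat_thresh and ciou_cost < ciou_thresh:
        return coeff * sim   # cost from the cosine similarity and coefficient
    return 1.0               # otherwise the cost is fixed to 1 (pair gated out)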
2. The target tracking method according to claim 1, wherein said detecting each target in the current frame to obtain a detection result set comprises:
detecting the current frame with a target detector to obtain a detection result and a detection score corresponding to each target, and constructing the detection result set based on the detection results.
3. The target tracking method according to claim 2, wherein said determining a high-score detection result set and a low-score detection result set in the detection result set comprises:
determining a preset detection score threshold, and determining the high-score detection result set and the low-score detection result set in the detection result set based on the detection score threshold and the detection score corresponding to each target.
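As an illustrative sketch of claims 2 and 3, splitting detector output by a preset score threshold; the tuple layout of a detection and the 0.5 value are assumptions, not taken from the patent.

def split_detections(detections, score_thresh=0.5):
    # Each detection is assumed to be a (box, score, feature) tuple.
    # Scores at or above the preset threshold go to the high-score set,
    # the rest to the low-score set.
    high = [d for d in detections if d[1] >= score_thresh]
    low = [d for d in detections if d[1] < score_thresh]
    return high, low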
4. The target tracking method according to claim 1, further comprising, before said matching the high-score detection result set with the prediction result of the previous frame:
predicting the tracking result of the previous frame using a Kalman prediction method to obtain the prediction result of the previous frame.
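Claim 4 only requires a Kalman prediction step. As an illustrative sketch, a constant-velocity model over box center and size is a common choice; the 8-dimensional state layout and the noise level are assumptions.

import numpy as np

def kalman_predict(mean, cov, dt=1.0, q=1e-2):
    # One constant-velocity prediction step. Assumed state layout:
    # [cx, cy, w, h, vx, vy, vw, vh].
    n = 4
    F = np.eye(2 * n)
    F[:n, n:] = dt * np.eye(n)   # positions advance by velocity * dt
    Q = q * np.eye(2 * n)        # assumed process-noise covariance
    return F @ mean, F @ cov @ F.T + Q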
5. The target tracking method according to claim 1, wherein said determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value, comprises:
taking the smaller of the CIoU distance cost and the appearance feature distance cost as the target cost value, and judging whether the target cost value is smaller than a preset cost threshold;
if the target cost value is smaller than the preset cost threshold, the matching result is a successful match;
and if the target cost value is not smaller than the preset cost threshold, the matching result is a failed match.
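A sketch of claim 5's fusion rule, for illustration only: the target cost is the element-wise minimum of the two cost matrices, and pairs whose fused cost is not below the preset threshold are rejected. The claim does not name an assignment solver; the Hungarian algorithm via SciPy is an assumed choice, as is the 0.8 threshold.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_fused_cost(ciou_costs, app_costs, cost_thresh=0.8):
    # Element-wise minimum of the CIoU and appearance cost matrices
    # gives the target cost values (claim 5).
    fused = np.minimum(ciou_costs, app_costs)
    rows, cols = linear_sum_assignment(fused)   # assumed Hungarian matching
    # Keep only pairs whose target cost is below the preset cost threshold;
    # the rest count as match failures.
    return [(r, c) for r, c in zip(rows, cols) if fused[r, c] < cost_thresh]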
6. A target tracking device, comprising:
a detection result acquisition module, configured to detect each target in the current frame to obtain a detection result set, and to determine a high-score detection result set and a low-score detection result set in the detection result set;
a first matching module, configured to match the high-score detection result set with the prediction result of the previous frame, update the corresponding prediction result with each successfully matched high-score detection result and put the updated prediction result into a tracking set, and put the high-score detection results and the prediction results that fail to match into a first detection legacy set and a first prediction result set, respectively;
a second matching module, configured to match the low-score detection result set with the first prediction result set, update the corresponding prediction result with each successfully matched low-score detection result and put the updated prediction result into the tracking set, discard the low-score detection results that fail to match, and put the prediction results that fail to match into a second prediction result set;
a third matching module, configured to match the first detection legacy set with the second prediction result set, update the corresponding prediction result with each successfully matched high-score detection result and put the updated prediction result into the tracking set, and put the high-score detection results and the prediction results that fail to match into a second detection legacy set and a lost tracking set, respectively;
a tracking result acquisition module, configured to initialize the high-score detection results in the second detection legacy set and put the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
wherein the third matching module is specifically configured to:
determine a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determine an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
and determine a target cost value from the CIoU distance cost and the appearance feature distance cost, and determine a matching result based on the target cost value;
and wherein the third matching module is further configured to:
determine a cosine similarity using the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set;
judge whether the cosine similarity and the CIoU distance cost are respectively smaller than a preset appearance feature threshold and a preset CIoU threshold;
if yes, determine the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set using the cosine similarity and a preset coefficient;
and if not, set the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set to 1.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the target tracking method according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the target tracking method according to any one of claims 1 to 5.
CN202310293456.0A 2023-03-24 2023-03-24 Target tracking method, device, equipment and medium Active CN115994928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310293456.0A CN115994928B (en) 2023-03-24 2023-03-24 Target tracking method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115994928A (en) 2023-04-21
CN115994928B (en) 2023-06-09

Family

ID=85995395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310293456.0A Active CN115994928B (en) 2023-03-24 2023-03-24 Target tracking method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115994928B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972443A (en) * 2022-06-28 2022-08-30 深圳一清创新科技有限公司 Target tracking method and device and unmanned vehicle
CN114973205A (en) * 2022-06-28 2022-08-30 深圳一清创新科技有限公司 Traffic light tracking method and device and unmanned automobile
CN115830075A (en) * 2023-02-20 2023-03-21 武汉广银飞科技发展有限公司 Hierarchical association matching method for pedestrian multi-target tracking

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022205936A1 (en) * 2021-03-30 2022-10-06 深圳市优必选科技股份有限公司 Multi-target tracking method and apparatus, and electronic device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Online target tracking algorithm combined with feature point matching; Liu Xingyun; Dai Shengkui; Journal of Huaqiao University (Natural Science Edition), Issue 03; full text *

Also Published As

Publication number Publication date
CN115994928A (en) 2023-04-21

Similar Documents

Publication Publication Date Title
KR102581429B1 (en) Method and apparatus for detecting obstacle, electronic device, storage medium and program
CN107886048B (en) Target tracking method and system, storage medium and electronic terminal
WO2020098708A1 (en) Lane line detection method and apparatus, driving control method and apparatus, and electronic device
EP3882820A1 (en) Node classification method, model training method, device, apparatus, and storage medium
US11783588B2 (en) Method for acquiring traffic state, relevant apparatus, roadside device and cloud control platform
WO2023273041A1 (en) Target detection method and apparatus in vehicle-road coordination, and roadside device
CN113012176A (en) Sample image processing method and device, electronic equipment and storage medium
CN112528927B (en) Confidence determining method based on track analysis, road side equipment and cloud control platform
EP4170561A1 (en) Method and device for improving performance of data processing model, storage medium and electronic device
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN116645396A (en) Track determination method, track determination device, computer-readable storage medium and electronic device
CN113569657A (en) Pedestrian re-identification method, device, equipment and storage medium
CN112917467B (en) Robot positioning and map building method and device and terminal equipment
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN115205855A (en) Vehicle target identification method, device and equipment fusing multi-scale semantic information
CN109242882B (en) Visual tracking method, device, medium and equipment
CN115994928B (en) Target tracking method, device, equipment and medium
CN112819889A (en) Method and device for determining position information, storage medium and electronic device
CN117274370A (en) Three-dimensional pose determining method, three-dimensional pose determining device, electronic equipment and medium
CN111626990A (en) Target detection frame processing method and device and electronic equipment
CN112561956B (en) Video target tracking method and device, electronic equipment and storage medium
US20210390334A1 (en) Vehicle association method and device, roadside equipment and cloud control platform
CN113688920A (en) Model training and target detection method and device, electronic equipment and road side equipment
CN111401285A (en) Target tracking method and device and electronic equipment
CN114049615B (en) Traffic object fusion association method and device in driving environment and edge computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant