CN115994928B - Target tracking method, device, equipment and medium - Google Patents
- Publication number
- CN115994928B CN115994928B CN202310293456.0A CN202310293456A CN115994928B CN 115994928 B CN115994928 B CN 115994928B CN 202310293456 A CN202310293456 A CN 202310293456A CN 115994928 B CN115994928 B CN 115994928B
- Authority
- CN
- China
- Prior art keywords
- detection
- result
- prediction result
- detection result
- matching
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The application discloses a target tracking method, apparatus, device and medium, relating to the technical field of computer vision and comprising the following steps: performing target detection on the current frame to obtain a detection result set; matching the high-score detection result set with the prediction results of the previous frame, updating the prediction results with the detection results and putting them into the tracking set if the matching succeeds, and putting the detection results and prediction results that fail to match into the first detection legacy set and the first prediction result set respectively; matching the low-score detection result set with the first prediction result set, updating the prediction results and putting them into the tracking set if the matching succeeds, discarding the detection results that fail to match, and putting the prediction results that fail to match into the second prediction result set; matching the first detection legacy set with the second prediction result set, updating the prediction results and putting them into the tracking set if the matching succeeds, and putting the detection results that fail to match into the second detection legacy set; and initializing the detection results in the second detection legacy set and putting them into the tracking set, so as to return a target tracking result.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular, to a target tracking method, apparatus, device, and medium.
Background
The most mature tracking methods currently in use are detection-based. In the first stage, a detection model is used to obtain detection results; in the second stage, a distance- or feature-matching method is used to associate the detection results with the tracking results of the previous frame. However, existing tracking methods have the following problem: after a target has been lost for several consecutive frames, when it reappears, the detection box and the tracking prediction result may have no intersection, so the two cannot be matched; the lost target can then never be re-matched, which causes tracking ID switches.
In summary, how to increase the matching success rate so as to track targets continuously and avoid the situation in which a lost target can never be re-matched remains to be solved.
Disclosure of Invention
Accordingly, the present invention aims to provide a target tracking method, apparatus, device and medium that increase the matching success rate so as to track targets continuously and avoid the situation in which a lost target can never be re-matched. The specific scheme is as follows:
in a first aspect, the present application discloses a target tracking method, including:
detecting each target in the current frame to obtain a detection result set, and determining a high-score detection result set and a low-score detection result set in the detection result set;
matching the high-score detection result set with the prediction results of the previous frame, updating the corresponding prediction results with the successfully matched high-score detection results, putting the updated prediction results into a tracking set, and putting the high-score detection results and prediction results that fail to match into a first detection legacy set and a first prediction result set respectively;
matching the low-score detection result set with the first prediction result set, updating the corresponding prediction results with the successfully matched low-score detection results, putting the updated prediction results into the tracking set, discarding the low-score detection results that fail to match, and putting the prediction results that fail to match into a second prediction result set;
matching the first detection legacy set with the second prediction result set, updating the corresponding prediction results with the successfully matched high-score detection results, putting the updated prediction results into the tracking set, and putting the high-score detection results and prediction results that fail to match into a second detection legacy set and a lost tracking set respectively;
initializing the high-score detection results in the second detection legacy set, and putting the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
wherein said matching the first detection legacy set with the second prediction result set comprises:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Optionally, the detecting each target in the current frame to obtain a detection result set includes:
and detecting the current frame by using a target detector to obtain detection results and detection scores corresponding to the targets, and constructing a detection result set based on the detection results.
Optionally, the determining the high-score detection result set and the low-score detection result set in the detection result set includes:
and determining a preset detection score threshold, and determining the high-score detection result set and the low-score detection result set in the detection result set based on the detection score threshold and the detection scores corresponding to the targets.
Optionally, before matching the high-score detection result set with the prediction results of the previous frame, the method further includes:
and predicting the tracking result of the previous frame by using a Kalman prediction method to obtain a prediction result of the previous frame.
Optionally, the determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value includes:
taking the smaller of the CIoU distance cost and the appearance feature distance cost as the target cost value, and judging whether the target cost value is smaller than a preset cost threshold;
if the target cost value is smaller than the preset cost threshold value, the matching result is successful;
and if the target cost value is not smaller than the preset cost threshold, the matching result is a matching failure.
Optionally, the determining, based on the cosine similarity method, an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set includes:
determining the cosine similarity, together with a preset CIoU threshold and a preset appearance feature threshold;
judging whether the cosine similarity and the CIoU distance are smaller than the corresponding preset appearance feature threshold and preset CIoU threshold respectively;
if yes, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set by using the cosine similarity and a preset coefficient;
and if not, setting the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set to 1.
In a second aspect, the present application discloses a target tracking apparatus comprising:
a detection result acquisition module, configured to detect each target in the current frame to obtain a detection result set, and to determine a high-score detection result set and a low-score detection result set in the detection result set;
a first matching module, configured to match the high-score detection result set with the prediction results of the previous frame, update the corresponding prediction results with the successfully matched high-score detection results, put the updated prediction results into the tracking set, and put the high-score detection results and prediction results that fail to match into the first detection legacy set and the first prediction result set respectively;
a second matching module, configured to match the low-score detection result set with the first prediction result set, update the corresponding prediction results with the successfully matched low-score detection results, put the updated prediction results into the tracking set, discard the low-score detection results that fail to match, and put the prediction results that fail to match into the second prediction result set;
a third matching module, configured to match the first detection legacy set with the second prediction result set, update the corresponding prediction results with the successfully matched high-score detection results, put the updated prediction results into the tracking set, and put the high-score detection results and prediction results that fail to match into the second detection legacy set and the lost tracking set respectively;
a tracking result acquisition module, configured to initialize the high-score detection results in the second detection legacy set and put the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set;
the third matching module is specifically configured to:
determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the previously disclosed target tracking method.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the previously disclosed object tracking method.
Therefore, the present application detects each target in the current frame to obtain a detection result set, and determines a high-score detection result set and a low-score detection result set in the detection result set; matches the high-score detection result set with the prediction results of the previous frame, updates the corresponding prediction results with the successfully matched high-score detection results, puts the updated prediction results into a tracking set, and puts the high-score detection results and prediction results that fail to match into a first detection legacy set and a first prediction result set respectively; matches the low-score detection result set with the first prediction result set, updates the corresponding prediction results with the successfully matched low-score detection results, puts the updated prediction results into the tracking set, discards the low-score detection results that fail to match, and puts the prediction results that fail to match into a second prediction result set; matches the first detection legacy set with the second prediction result set, updates the corresponding prediction results with the successfully matched high-score detection results, puts the updated prediction results into the tracking set, and puts the high-score detection results and prediction results that fail to match into a second detection legacy set and a lost tracking set respectively; and initializes the high-score detection results in the second detection legacy set and puts the initialized high-score detection results into the tracking set, so as to return a target tracking result based on the tracking set.
Wherein said matching the first detection legacy set with the second prediction result set comprises: determining a CIoU distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method; determining an appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method; and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Therefore, after the high-score detection result set and the low-score detection result set of the current frame are matched in turn with the prediction results of the previous frame, a first detection legacy set formed by the high-score detection results that were not successfully matched and a second prediction result set formed by the prediction results that were not successfully matched are obtained. The first detection legacy set and the second prediction result set are then matched again: if the matching succeeds, the prediction result is updated with the corresponding high-score detection result and the updated prediction result is put into the tracking set; if the matching fails, the corresponding high-score detection results and prediction results are put into the second detection legacy set and the lost tracking set respectively, the high-score detection results in the second detection legacy set are initialized, and the initialized high-score detection results are put into the tracking set, so as to return the final target tracking result based on the tracking set. Specifically, when the first detection legacy set is matched with the second prediction result set, the CIoU method is used when calculating the distance, and cosine similarity is used when calculating the appearance feature distance between the detection target and the tracking target; the final target cost value is determined from the CIoU distance cost and the appearance feature distance cost, and the matching result is determined based on the target cost value. The target cost value can therefore accurately reflect the final cost between the detection target and the tracking target of the previous frame, which increases the matching success rate and reduces the ID switching of tracking targets.
That is, a matching process between the first detection legacy set and the second prediction result set is added on top of the original tracking flow, so that the matching range can be further expanded to the surrounding area, the matching success rate is increased, the problem that targets cannot be continuously tracked because lost targets cannot be re-matched is solved, and the ID switching of tracking targets is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a target tracking method disclosed in the present application;
FIG. 2 is a specific matching flow chart disclosed herein;
FIG. 3 is a schematic diagram of the positions of a detection target and a tracking target disclosed in the present application;
FIG. 4 is a schematic diagram of a target tracking apparatus disclosed in the present application;
fig. 5 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Existing tracking methods have the following problem: after a target has been lost for several consecutive frames, when it reappears, the detection box and the tracking prediction result may have no intersection, so the two cannot be matched; the lost target can then never be re-matched, which causes tracking ID switches. Therefore, embodiments of the present application disclose a target tracking method, apparatus, device and medium that can increase the matching success rate so as to track targets continuously and avoid the situation in which a lost target cannot be re-matched.
Referring to fig. 1, an embodiment of the present application discloses a target tracking method, which includes:
step S11: and detecting each target in the current frame to obtain a detection result set, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set.
In this embodiment, a video to be traversed is acquiredVAnd traverse the videoVEach frame of (2)f. In the current framef k For example, performing target detection on the current frame to obtain a corresponding detection result setD k And determining a high-resolution detection result set in the detection result setD high And a low score detection result setD low 。
In a specific embodiment, the detecting each target in the current frame to obtain a detection result set specifically includes: and detecting the current frame by using a target detector to obtain detection results and detection scores corresponding to the targets, and constructing a detection result set based on the detection results.
Further, in a specific embodiment, the determining the high-score detection result set and the low-score detection result set in the detection result set includes: and determining a preset detection score threshold value, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set based on the detection score threshold value and the detection scores corresponding to the targets.
It will be appreciated that this embodiment uses a target detector Det to detect the current frame, obtains the detection results and detection scores corresponding to the targets, and constructs a detection result set based on the detection results. The detection result set is then divided into a high-score detection result set and a low-score detection result set according to a preset detection score threshold and the detection scores corresponding to the targets. Specifically, when a detection score is lower than the threshold, the corresponding detection result is put into the low-score detection result set; when it is not lower than the threshold, the corresponding detection result is put into the high-score detection result set.
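The split described above can be sketched as follows; the function and parameter names (split_by_score, tau) and the default threshold value are illustrative assumptions, not taken from the patent:

```python
def split_by_score(detections, tau=0.6):
    """Split (box, score) detections into high- and low-score sets.

    A detection goes to the low-score set when its score is below the
    threshold tau, otherwise to the high-score set.
    """
    d_high = [d for d in detections if d[1] >= tau]
    d_low = [d for d in detections if d[1] < tau]
    return d_high, d_low
```

Detections at or above the threshold form D_high; the rest form D_low.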
Step S12: matching the high-score detection result set with the prediction results of the previous frame, updating the corresponding prediction results with the successfully matched high-score detection results, putting the updated prediction results into a tracking set, and putting the high-score detection results and prediction results that fail to match into a first detection legacy set and a first prediction result set respectively.
In this embodiment, the high-score detection result set D_high is matched with the prediction results P_k of the previous frame. If the matching succeeds, the corresponding prediction result is updated with the matched high-score detection result, and the updated prediction result is put into the tracking set T; if the matching fails, the unmatched high-score detection results are put into a first detection legacy set D_remain, and the unmatched prediction results are put into a first prediction result set P_remain.
It should be noted that, in a specific embodiment, before matching the high-score detection result set with the prediction results of the previous frame, the method further includes: predicting the tracking result of the previous frame by a Kalman prediction method to obtain the prediction results of the previous frame. It will be appreciated that the prediction results of the previous frame are obtained by performing Kalman prediction on the tracking set T_{k-1} of the previous frame.
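The Kalman prediction step can be sketched minimally as follows. The patent only states that Kalman prediction is applied to the previous frame's tracking results; the constant-velocity state layout and the isotropic process noise used here are illustrative assumptions:

```python
import numpy as np

def kalman_predict(x, P, q=1e-2):
    """One Kalman predict step: x' = F x, P' = F P F^T + Q,
    for a constant-velocity model whose state stacks positions over
    velocities, e.g. [cx, cy, w, h, vx, vy, vw, vh] (assumed layout)."""
    n = x.shape[0] // 2
    F = np.eye(2 * n)
    F[:n, n:] = np.eye(n)   # each position advances by its velocity
    Q = q * np.eye(2 * n)   # simple isotropic process noise (assumed)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```

In a full tracker this predict step would run once per track before the matching in step S12.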
Step S13: matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result by using the successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding the low-score detection result with failed matching, and putting the prediction result with failed matching into the second prediction result set.
In this embodiment, the low-score detection result set D_low is matched with the first prediction result set P_remain. If the matching succeeds, the corresponding prediction result is updated with the matched low-score detection result, and the updated prediction result is put into the tracking set T; if the matching fails, the unmatched low-score detection results are discarded, and the unmatched prediction results are put into a second prediction result set P_re-remain.
Step S14: matching the first detection legacy set with the second prediction result set, updating the corresponding prediction results with the successfully matched high-score detection results, putting the updated prediction results into the tracking set, and putting the high-score detection results and prediction results that fail to match into the second detection legacy set and the lost tracking set respectively.
In this embodiment, the first detection legacy set D_remain is matched again with the second prediction result set P_re-remain. If the matching succeeds, the corresponding prediction result is updated with the matched high-score detection result, and the updated prediction result is put into the tracking set T; if the matching fails, the unmatched high-score detection results are put into a second detection legacy set D_re-remain, and the unmatched prediction results are put into the lost tracking set T_loss.
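Each of the matching rounds in steps S12 to S14 needs an assignment step that pairs detections with predictions under a cost threshold and reports the leftovers. The patent does not name the assignment algorithm; as a hedged stand-in for the Hungarian matching commonly used in such trackers, a simple greedy version:

```python
def greedy_match(cost, thresh):
    """Greedily pair rows (detections) with columns (predictions)
    whose cost is below thresh; return the matched pairs and the
    unmatched row/column indices."""
    n_cols = len(cost[0]) if cost else 0
    pairs = sorted(
        (cost[i][j], i, j) for i in range(len(cost)) for j in range(n_cols)
    )
    used_i, used_j, matches = set(), set(), []
    for c, i, j in pairs:
        if c < thresh and i not in used_i and j not in used_j:
            matches.append((i, j))
            used_i.add(i)
            used_j.add(j)
    unmatched_rows = [i for i in range(len(cost)) if i not in used_i]
    unmatched_cols = [j for j in range(n_cols) if j not in used_j]
    return matches, unmatched_rows, unmatched_cols
```

The leftover detection indices would feed the next legacy set and the leftover prediction indices the next prediction result set, mirroring the flow above.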
Specifically, referring to fig. 2, the embodiment of the present application provides specific steps for matching the first detection legacy set with the second prediction result set, including:
step S141: determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method; and determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method.
In this embodiment, when the first detection legacy set is matched with the second prediction result set, the matching method used is the CIoU method. The CIoU method is used for better reflecting the distance between the detection target frame and the tracking target frame, and the distance parameter can be obtained for matching when the detection target frame and the tracking target frame are not intersected. The method comprises the steps of firstly determining CIoU distance cost between a high-resolution detection result in a first detection legacy set and a prediction result in a second prediction result set based on a CIoU method, and determining appearance feature distance cost between the high-resolution detection result in the first detection legacy set and the prediction result in the second prediction result set based on a cosine similarity method. It can be understood that the calculation of the matching value has two parts, one part is distance calculation, and the additional matching in this embodiment uses CIoU; the second part is to calculate the appearance feature distance between the detection target and the tracking target, and the present embodiment uses the cosine similarity calculation mode.
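A sketch of the CIoU distance cost for two boxes in (x1, y1, x2, y2) form, following the published CIoU definition (the patent names CIoU but does not spell out its formula, so this is an assumption based on the standard definition); note that disjoint boxes still yield a finite, informative cost:

```python
import math

def ciou_distance(a, b):
    """CIoU distance cost 1 - CIoU(a, b) for boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (
        math.atan((ax2 - ax1) / (ay2 - ay1))
        - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

For two non-intersecting boxes the IoU term is 0, but the center-distance term still grows with separation, which is exactly what lets the expanded matching range work.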
In a specific embodiment, the determining, based on the cosine similarity method, the appearance feature distance cost between a high-score detection result in the first detection legacy set and a prediction result in the second prediction result set includes: determining the cosine similarity, together with a preset CIoU threshold and a preset appearance feature threshold; judging whether the cosine similarity and the CIoU distance are smaller than the corresponding preset appearance feature threshold and preset CIoU threshold respectively; if yes, determining the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set by using the cosine similarity and a preset coefficient; and if not, setting the appearance feature distance cost between the high-score detection result in the first detection legacy set and the prediction result in the second prediction result set to 1. In this embodiment, the feature similarity is computed as the cosine distance of the features. Since ByteTrack calculates only the distance and does not use feature similarity, the matching mode of the ByteTrack tracking flow is not simply continued: because the matching range is expanded, other target detection boxes may also be matched when the CIoU is calculated, so the feature cosine similarity is used to further constrain the matching. When the cosine similarity is smaller than the preset appearance feature threshold and the CIoU distance is also smaller than the preset CIoU threshold, the appearance feature distance cost is the product of a preset coefficient and the cosine similarity; otherwise, the appearance feature distance cost is set to 1.
It can be understood that when the cosine similarity and the CIoU distance are not smaller than the respective corresponding thresholds, it is indicated that the detection result and the tracking result are not the same target, and therefore, by setting the cost value to 1, it is ensured that the subsequent matching result is a matching failure.
The specific formula is as follows:

    c_app(i, j) = 0.5 · s(i, j),  if s(i, j) < θ_emb and d_CIoU(i, j) < θ_CIoU
    c_app(i, j) = 1,              otherwise

wherein c_app(i, j) is the appearance feature distance cost between the i-th prediction result and the j-th detection result, s(i, j) is the cosine similarity between the i-th prediction result and the j-th detection result, θ_emb is the preset appearance feature threshold, θ_CIoU is the preset CIoU threshold, and d_CIoU(i, j) is the CIoU distance between the i-th prediction result and the j-th detection result; 0.5 is a preset coefficient that can be set according to the actual situation. The appearance feature distance cost is thus determined jointly by the preset appearance feature threshold θ_emb and the preset CIoU threshold θ_CIoU.
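The gated appearance cost described above transcribes directly into code. The 0.5 coefficient follows the text, while the default threshold values here are illustrative assumptions:

```python
def appearance_cost(cos_sim, ciou_dist, theta_emb=0.25, theta_ciou=0.8, coeff=0.5):
    """Gated appearance feature distance cost: coeff * cosine similarity
    when both the similarity and the CIoU distance pass their gates,
    otherwise 1 (which guarantees a later matching failure)."""
    if cos_sim < theta_emb and ciou_dist < theta_ciou:
        return coeff * cos_sim
    return 1.0
```

Returning 1 when either gate fails ensures that a detection and prediction that are clearly not the same target can never be paired by the appearance branch.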
For example, referring to fig. 3, when a tracking target is lost and then reappears, due to prediction or detection errors the prediction box and the corresponding detection box of the tracking target do not necessarily have an overlapping area, so the IoU value is 0 and the two cannot be matched. In the figure, the solid-line boxes represent detection targets, denoted 1', 2' and 3'; the dotted-line boxes represent tracking targets, denoted 1, 2 and 3; the same numeral denotes the same target. When a solid-line box and a dotted-line box do not intersect, matching can still be performed by setting a CIoU threshold. In addition, this additional matching flow does not interfere with the matching logic of the original tracking flow, and can be applied to most current tracking flows in a plug-and-play manner.
Step S142: and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
In this embodiment, the determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value, specifically includes: taking the smaller of the CIoU distance cost and the appearance feature distance cost as the target cost value, and judging whether the target cost value is smaller than a preset cost threshold; if the target cost value is smaller than the preset cost threshold, the matching result is a successful match; otherwise, the matching result is a matching failure. It can be understood that this embodiment fuses the CIoU distance cost and the appearance feature distance cost by selecting the smaller of the two as the final target cost value, and a match succeeds only when that value satisfies the cost threshold condition. Compared with the prior-art scheme in which the distance cost and the feature cost are weighted and summed to obtain the final cost value, the target cost value obtained in this way more accurately reflects the final cost between the detection target and the tracking target of the previous frame, which effectively increases the matching success rate and reduces the ID switching phenomenon of the tracking target.
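A minimal sketch of this min-fusion decision, with illustrative names and a placeholder cost threshold:

```python
def fuse_and_match(ciou_cost, app_cost, cost_thresh=0.7):
    """Take the smaller of the two costs as the target cost and decide the match.

    Returns (target_cost, matched). Using min() rather than a weighted sum
    lets whichever cue (geometry or appearance) is more confident decide,
    which is the behavior the embodiment describes.
    """
    target_cost = min(ciou_cost, app_cost)
    return target_cost, target_cost < cost_thresh
```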
Step S15: initializing high-score detection results in the second detection legacy set, and placing the initialized high-score detection results into the tracking set so as to return target tracking results based on the tracking set.
In this embodiment, the high-score detection results in the second detection legacy set D_re-remain are initialized as new trackers(t) and put into the tracking set T, so as to return a target tracking result based on the resulting tracking set. It can be understood that the high-score detection results in the second detection legacy set correspond to targets in the current frame that were not successfully matched, so the purpose of initialization is to treat each of them as a new target to be matched against the next frame, further improving the matching success rate. During initialization, a new ID value is assigned, the position of the high-score detection result is recorded, its state is set to an unconfirmed state, and it is then put into the tracking set of the current frame. Further, it should be noted that when a target in the lost tracking set T_loss has been continuously lost for more than a preset number of frames, it is deleted from the lost tracking set, wherein the preset number of frames may specifically be set to 30.
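The initialization and lost-track pruning described above can be sketched as follows. The unconfirmed flag, the fresh ID per new track, and the 30-frame default mirror the description; everything else (class name, fields, data layout) is an assumption for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # global ID generator; each new track gets a fresh ID

@dataclass
class Track:
    box: tuple                 # last known position (x1, y1, x2, y2)
    track_id: int = field(default_factory=lambda: next(_next_id))
    confirmed: bool = False    # newly initialized tracks start unconfirmed
    lost_frames: int = 0       # consecutive frames without a match

def init_new_tracks(remaining_high_score_dets, tracking_set):
    """Turn every unmatched high-score detection into a fresh track."""
    for box in remaining_high_score_dets:
        tracking_set.append(Track(box=box))

def prune_lost(lost_set, max_lost=30):
    """Drop tracks continuously lost for more than max_lost frames."""
    return [t for t in lost_set if t.lost_frames <= max_lost]
```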
Therefore, the method and the device detect each target in the current frame to obtain a detection result set, and determine a high-resolution detection result set and a low-resolution detection result set in the detection result set; matching the high-resolution detection result set with the prediction result of the previous frame, updating the corresponding prediction result by the high-resolution detection result which is successfully matched, putting the updated prediction result into a tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to be matched into a first detection legacy set and a first prediction result set; matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result by using the successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding the low-score detection result with failed matching, and putting the prediction result with failed matching into a second prediction result set; matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result by using the successfully matched high-resolution detection result, putting the updated prediction result into the tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to match into the second detection legacy set and the lost tracking set; initializing high-score detection results in the second detection legacy set, and placing the initialized high-score detection results into the tracking set so as to return target tracking results based on the tracking set. 
Wherein said matching the first set of detected carryover with the second set of predicted outcomes comprises: determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method; determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method; and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Therefore, after the high-resolution detection result set and the low-resolution detection result set of the current frame are sequentially matched with the prediction result of the previous frame, a first detection legacy set formed by the high-resolution detection result which is not successfully matched and a second prediction result set formed by the prediction result which is not successfully matched are obtained; and then matching the first detection legacy set and the second prediction result set again, if the matching is successful, updating the prediction result by using the corresponding high-resolution detection result, and placing the updated prediction result into the tracking set, if the matching is failed, respectively placing the corresponding high-resolution detection result and the prediction result into the second detection legacy set and the lost tracking set, initializing the high-resolution detection result in the second detection legacy set, and placing the initialized high-resolution detection result into the tracking set so as to return the final target tracking result based on the tracking set. Specifically, when the first detection legacy set is matched with the second prediction result set, the matching method used in calculating the distance is a CIoU method, when the appearance feature distance between the detection target and the tracking target is calculated, a cosine similarity calculation mode is used, the final target cost value is determined from the CIoU distance cost and the appearance feature distance cost, and the matching result is determined based on the target cost value, so that the target cost value can accurately reflect the final cost between the detection target and the tracking target of the previous frame, the matching success rate is increased, and the tracking target ID switching phenomenon is reduced. 
That is, the matching process of the first detection legacy set and the second prediction result set is further added on the basis of the original tracking process, so that the matching range can be further expanded to a peripheral area, the matching success rate is increased, the problem that targets cannot be continuously tracked due to the fact that lost targets cannot be matched again is solved, and the phenomenon of ID switching of tracking targets is reduced.
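Put together, the added matching stage can be sketched as a small assignment routine over the leftover detections and predictions. Real trackers typically solve the assignment with the Hungarian algorithm; a greedy nearest-cost assignment is used here only to keep the sketch dependency-free, and all names, as well as the scalar cost function in the test, are illustrative.

```python
def extra_matching_stage(det_boxes, pred_boxes, cost_fn, cost_thresh=0.7):
    """Greedily assign leftover high-score detections to leftover predictions.

    cost_fn(det, pred) returns the fused target cost for one pair.
    Returns (matches, unmatched_dets, unmatched_preds) as index lists:
    matched predictions get updated and re-enter the tracking set, while
    unmatched ones go to the second detection legacy / lost tracking sets.
    """
    pairs = sorted(
        ((cost_fn(d, p), i, j)
         for i, d in enumerate(det_boxes)
         for j, p in enumerate(pred_boxes)),
        key=lambda t: t[0],
    )
    matches, used_d, used_p = [], set(), set()
    for cost, i, j in pairs:
        # Only pairs under the cost threshold can match, one match per item
        if cost >= cost_thresh or i in used_d or j in used_p:
            continue
        matches.append((i, j))
        used_d.add(i)
        used_p.add(j)
    unmatched_dets = [i for i in range(len(det_boxes)) if i not in used_d]
    unmatched_preds = [j for j in range(len(pred_boxes)) if j not in used_p]
    return matches, unmatched_dets, unmatched_preds
```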
Referring to fig. 4, an embodiment of the present application discloses a target tracking apparatus, which includes:
a detection result obtaining module 11, configured to detect each target in the current frame to obtain a detection result set, and determine a high-resolution detection result set and a low-resolution detection result set in the detection result set;
a first matching module 12, configured to match the high-resolution detection result set with the prediction result of the previous frame, update the high-resolution detection result that is successfully matched with the corresponding prediction result, put the updated prediction result into the tracking set, and put the high-resolution detection result and the prediction result that are failed to be matched into the first detection legacy set and the first prediction result set, respectively;
a second matching module 13, configured to match the low-resolution detection result set with the first prediction result set, update the low-resolution detection result that is successfully matched with the corresponding prediction result, put the updated prediction result into the tracking set, discard the low-resolution detection result that is failed to be matched, and put the prediction result that is failed to be matched into a second prediction result set;
a third matching module 14, configured to match the first detection legacy set with the second prediction result set, update the high-resolution detection result that is successfully matched with the corresponding prediction result, and put the updated prediction result into the tracking set, and put the high-resolution detection result and the prediction result that are failed to match into the second detection legacy set and the lost tracking set, respectively;
a tracking result obtaining module 15, configured to initialize a high-score detection result in the second detection legacy set, and put the initialized high-score detection result into the tracking set, so as to return a target tracking result based on the tracking set;
wherein, the third matching module 14 is specifically configured to:
determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
and determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Specifically, the electronic device 20 may include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the target tracking method performed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The processor 21 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 21 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 21 may also comprise a main processor, which is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and a coprocessor, which is a low-power processor for processing data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon include an operating system 221, a computer program 222, and data 223, and the storage may be temporary storage or permanent storage.
The operating system 221, which may be Windows, Unix, Linux, or the like, is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, so as to implement the processor 21's operation and processing of the mass data 223 in the memory 22. The computer program 222 may further include, in addition to the computer program that can be used to perform the target tracking method performed by the electronic device 20 as disclosed in any of the previous embodiments, computer programs that can be used to perform other specific tasks. The data 223 may include, in addition to data received by the electronic device and transmitted by the external device, data collected by its own input/output interface 25, and so on.
Further, the embodiment of the application also discloses a computer readable storage medium, wherein the storage medium stores a computer program, and when the computer program is loaded and executed by a processor, the method steps executed in the target tracking process disclosed in any of the previous embodiments are realized.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The target tracking method, device, equipment and storage medium provided by the present invention have been described above using specific examples to illustrate the principles and embodiments of the present invention; the above examples are only used to help understand the method and core idea of the present invention. Meanwhile, since those skilled in the art will make variations to the specific embodiments and application scope in accordance with the ideas of the present invention, the contents of this description should not be construed as limiting the present invention.
Claims (8)
1. A target tracking method, comprising:
detecting each target in the current frame to obtain a detection result set, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set;
matching the high-resolution detection result set with the prediction result of the previous frame, updating the corresponding prediction result by the high-resolution detection result which is successfully matched, putting the updated prediction result into a tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to be matched into a first detection legacy set and a first prediction result set;
matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result by using the successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding the low-score detection result with failed matching, and putting the prediction result with failed matching into a second prediction result set;
matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result by using the successfully matched high-resolution detection result, putting the updated prediction result into the tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to match into the second detection legacy set and the lost tracking set;
initializing a high-score detection result in the second detection legacy set, and placing the initialized high-score detection result into the tracking set so as to return a target tracking result based on the tracking set;
wherein said matching the first set of detected carryover with the second set of predicted outcomes comprises:
determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value;
and determining appearance feature distance cost between the high-resolution detection result in the first detection legacy set and the prediction result in the second prediction result set based on the cosine similarity method, including:
determining cosine similarity by utilizing high-resolution detection results in the first detection legacy set and prediction results in the second prediction result set;
judging whether the cosine similarity and the CIoU distance cost are respectively smaller than a corresponding preset appearance characteristic threshold value and a preset CIoU threshold value or not;
if yes, determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set by using the cosine similarity and a preset coefficient;
and if not, determining the appearance characteristic distance cost between the high-resolution detection result in the first detection legacy set and the prediction result in the second prediction result set as 1.
2. The method according to claim 1, wherein detecting each object in the current frame to obtain a detection result set includes:
and detecting the current frame by using a target detector to obtain detection results and detection scores corresponding to the targets, and constructing a detection result set based on the detection results.
3. The target tracking method of claim 2, wherein the determining a high score detection result set and a low score detection result set of the detection result sets comprises:
and determining a preset detection score threshold value, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set based on the detection score threshold value and the detection scores corresponding to the targets.
4. The target tracking method of claim 1, wherein before the matching the high-resolution detection result set with the prediction result of the previous frame, further comprising:
and predicting the tracking result of the previous frame by using a Kalman prediction method to obtain a prediction result of the previous frame.
5. The target tracking method according to claim 1, wherein the determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value, comprises:
taking the smaller value of the CIoU distance cost and the appearance characteristic distance cost as a target cost value, and judging whether the target cost value is smaller than a preset cost threshold value or not;
if the target cost value is smaller than the preset cost threshold value, the matching result is successful;
and if the target cost value is not smaller than the preset cost threshold, the matching result is a matching failure.
6. An object tracking device, comprising:
the detection result acquisition module is used for detecting each target in the current frame to acquire a detection result set, and determining a high-resolution detection result set and a low-resolution detection result set in the detection result set;
The first matching module is used for matching the high-resolution detection result set with the prediction result of the previous frame, updating the corresponding prediction result by the high-resolution detection result which is successfully matched, putting the updated prediction result into the tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to be matched into the first detection legacy set and the first prediction result set;
the second matching module is used for matching the low-score detection result set with the first prediction result set, updating the corresponding prediction result by using the successfully matched low-score detection result, putting the updated prediction result into the tracking set, discarding the low-score detection result with failed matching, and putting the prediction result with failed matching into the second prediction result set;
the third matching module is used for matching the first detection legacy set with the second prediction result set, updating the corresponding prediction result by the high-resolution detection result which is successfully matched, putting the updated prediction result into the tracking set, and respectively putting the high-resolution detection result and the prediction result which are failed to be matched into the second detection legacy set and the lost tracking set;
The tracking result acquisition module is used for initializing the high-score detection result in the second detection legacy set and placing the initialized high-score detection result into the tracking set so as to return a target tracking result based on the tracking set;
the third matching module is specifically configured to:
determining CIoU distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a CIoU method;
determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set based on a cosine similarity method;
determining a target cost value from the CIoU distance cost and the appearance feature distance cost, and determining a matching result based on the target cost value;
and, the third matching module is further configured to:
determining cosine similarity by utilizing high-resolution detection results in the first detection legacy set and prediction results in the second prediction result set;
judging whether the cosine similarity and the CIoU distance cost are respectively smaller than a corresponding preset appearance characteristic threshold value and a preset CIoU threshold value or not;
if yes, determining appearance feature distance cost between a high-resolution detection result in the first detection legacy set and a prediction result in the second prediction result set by using the cosine similarity and a preset coefficient;
and if not, determining the appearance characteristic distance cost between the high-resolution detection result in the first detection legacy set and the prediction result in the second prediction result set as 1.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the object tracking method according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program; wherein the computer program when executed by a processor implements the steps of the object tracking method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310293456.0A CN115994928B (en) | 2023-03-24 | 2023-03-24 | Target tracking method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310293456.0A CN115994928B (en) | 2023-03-24 | 2023-03-24 | Target tracking method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115994928A CN115994928A (en) | 2023-04-21 |
CN115994928B true CN115994928B (en) | 2023-06-09 |
Family
ID=85995395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310293456.0A Active CN115994928B (en) | 2023-03-24 | 2023-03-24 | Target tracking method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115994928B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022205936A1 (en) * | 2021-03-30 | 2022-10-06 | 深圳市优必选科技股份有限公司 | Multi-target tracking method and apparatus, and electronic device and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114972443A (en) * | 2022-06-28 | 2022-08-30 | 深圳一清创新科技有限公司 | Target tracking method and device and unmanned vehicle |
CN114973205A (en) * | 2022-06-28 | 2022-08-30 | 深圳一清创新科技有限公司 | Traffic light tracking method and device and unmanned automobile |
CN115830075A (en) * | 2023-02-20 | 2023-03-21 | 武汉广银飞科技发展有限公司 | Hierarchical association matching method for pedestrian multi-target tracking |
-
2023
- 2023-03-24 CN CN202310293456.0A patent/CN115994928B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022205936A1 (en) * | 2021-03-30 | 2022-10-06 | 深圳市优必选科技股份有限公司 | Multi-target tracking method and apparatus, and electronic device and readable storage medium |
Non-Patent Citations (1)
Title |
---|
结合特征点匹配的在线目标跟踪算法;刘兴云;戴声奎;;华侨大学学报(自然科学版)(第03期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN115994928A (en) | 2023-04-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102581429B1 (en) | Method and apparatus for detecting obstacle, electronic device, storage medium and program | |
CN107886048B (en) | Target tracking method and system, storage medium and electronic terminal | |
WO2020098708A1 (en) | Lane line detection method and apparatus, driving control method and apparatus, and electronic device | |
EP3882820A1 (en) | Node classification method, model training method, device, apparatus, and storage medium | |
US11783588B2 (en) | Method for acquiring traffic state, relevant apparatus, roadside device and cloud control platform | |
WO2023273041A1 (en) | Target detection method and apparatus in vehicle-road coordination, and roadside device | |
CN113012176A (en) | Sample image processing method and device, electronic equipment and storage medium | |
CN112528927B (en) | Confidence determining method based on track analysis, road side equipment and cloud control platform | |
EP4170561A1 (en) | Method and device for improving performance of data processing model, storage medium and electronic device | |
CN113112542A (en) | Visual positioning method and device, electronic equipment and storage medium | |
CN116645396A (en) | Track determination method, track determination device, computer-readable storage medium and electronic device | |
CN113569657A (en) | Pedestrian re-identification method, device, equipment and storage medium | |
CN112917467B (en) | Robot positioning and map building method and device and terminal equipment | |
US20230245429A1 (en) | Method and apparatus for training lane line detection model, electronic device and storage medium | |
CN115205855A (en) | Vehicle target identification method, device and equipment fusing multi-scale semantic information | |
CN109242882B (en) | Visual tracking method, device, medium and equipment | |
CN115994928B (en) | Target tracking method, device, equipment and medium | |
CN112819889A (en) | Method and device for determining position information, storage medium and electronic device | |
CN117274370A (en) | Three-dimensional pose determining method, three-dimensional pose determining device, electronic equipment and medium | |
CN111626990A (en) | Target detection frame processing method and device and electronic equipment | |
CN112561956B (en) | Video target tracking method and device, electronic equipment and storage medium | |
US20210390334A1 (en) | Vehicle association method and device, roadside equipment and cloud control platform | |
CN113688920A (en) | Model training and target detection method and device, electronic equipment and road side equipment | |
CN111401285A (en) | Target tracking method and device and electronic equipment | |
CN114049615B (en) | Traffic object fusion association method and device in driving environment and edge computing equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |