CN112116634A - Multi-target tracking method with a semi-online mechanism - Google Patents
Multi-target tracking method with a semi-online mechanism
- Publication number
- CN112116634A (application CN202010754142.2A)
- Authority
- CN
- China
- Prior art keywords
- frame
- detection
- target
- trajectory
- kalman
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/277 — Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06N3/045 — Neural networks; architectures; combinations of networks
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30196 — Subject of image: human being; person
Abstract
A multi-target tracking method with a semi-online mechanism: detection boxes for pedestrians or other moving targets are obtained from a pedestrian or moving-target video; a Kalman family map is built from the position-change information between detection boxes within a time window; pairs of Kalman heads are then found from the Kalman family map, and the detection box of the tracked target or moving object in the next frame is obtained from the combined similarity of an appearance model, a motion model, and a size-change model, so that the target or moving object remains inside a detection box in that frame; otherwise the target is reported lost. Detection boxes whose similarity exceeds a threshold are spliced into the Kalman family map, the motion model and appearance model in the map are updated, and the pedestrian or moving-object target is tracked in the next frame. The method applies to any trajectory-splicing multi-target tracking algorithm, i.e. it is not constrained by the different trajectories generated by multiple targets such as pedestrians and moving objects; it effectively improves tracking accuracy and reduces the number of identity switches.
Description
Technical Field
The invention relates to a tracking method, in particular to a multi-target tracking method with a semi-online mechanism.
Background
Multi-target tracking is mainly applied to tracking the trajectories of multiple people or moving objects in a video sequence shot by a camera. In a driverless-vehicle scene, the pedestrians or other vehicles captured by the vehicle's camera can be tracked in real time and their motion trajectories predicted, so that the vehicle can make effective avoidance or automatic-driving decisions according to the targets' motion. In multi-camera surveillance scenes, several pedestrians can be tracked on demand, and the walking trajectories and positions of multiple pedestrian targets can be monitored through the videos captured by different cameras. In a sports scene shot by a camera, such as a basketball game, the trajectories of the individual athletes can be tracked by a multi-target tracking method, and actions and behavior on the field analyzed from the tracked trajectories. Multi-target tracking can also be applied to tracking multiple targets such as enemy ships and vehicles in military scenes. There are many tracking methods at present, but to track efficiently, multi-target tracking still needs continual improvement in real-time performance, accuracy, and other respects.
MOT (multi-target tracking) can be broadly divided into online MOT and offline MOT. The difference is: the former advances frame by frame in real time and can output a tracking trajectory immediately, so its real-time performance is generally higher than the latter's, while its accuracy is relatively lower; the latter must wait until the forward pass over the whole video sequence is complete and the detection boxes of all frames are available before tracking, so it is harder for it to meet real-time requirements, but its accuracy is generally higher because it makes better use of global information. Online tracking requires that trajectory tracking be completed immediately after the detection of each new frame. The online tracking algorithm therefore intuitively has better real-time performance, but cannot effectively exploit the global information of the video, which may reduce accuracy; in contrast, offline tracking links trajectories only after all frames of a given video sequence have been detected. That mode exploits global information well and yields relatively accurate tracking, but cannot meet real-time requirements. The temporal receptive fields of online tracking, semi-online tracking, and offline tracking are, respectively, the current frame, a time window, and the whole sequence, increasing in that order, while their real-time performance decreases in the same order.
Occlusion has long been one of the difficulties of MOT: although algorithms iterate and improve rapidly, most still struggle to remain robust under severe occlusion. Whether online MOT, offline MOT, or MOT built with deep learning, various approaches have attempted to solve the occlusion problem, but essentially by sacrificing real-time performance. Both real-time performance and accuracy are critical in practical tracking applications. For example, poor real-time performance of the tracking algorithm in a driverless car can delay the vehicle's judgment, leading to misjudgment or delayed decisions and causing avoidable traffic accidents; poor accuracy can cause multiple targets to be tracked chaotically and tracking to fail — for example, when tracking a criminal suspect across many smart city cameras, losing the target or tracking a non-suspect lets the real suspect escape.
Disclosure of Invention
The invention aims to provide a semi-online multi-target tracking method.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a multi-target tracking method of a semi-online mechanism is characterized in that detection frames of pedestrians or moving targets are obtained through a YOLO-V3 detector according to videos of the pedestrians or the moving targets, a Kalman sequence spectrum is obtained according to position change information among the detection frames in a period of time window, then a pair of Kalman heads are found according to the Kalman sequence spectrum, detection frames of targets or moving objects to be tracked in the next frame are obtained through similarity of an appearance model, a movement model and a size change model, the targets or the moving objects are enabled to be located in the detection frames in the frame, and otherwise, the targets are indicated to be lost; and splicing the detection frames with the similarity higher than the threshold value into the Kalman sequence spectrum, updating a motion model and an appearance model in the Kalman sequence spectrum, and tracking the pedestrian or moving object target in the next frame.
The invention is further improved in that the similarity of the appearance models is obtained by the following process:

In the nth frame of the video, the patch size is fixed to [64, 128]. There are D detection boxes and D patches; the Xth detection box is denoted d_X^n and its corresponding patch is p_X^n.

In the nth frame, the region of each detection box is cropped and resized, giving D patches of the fixed size; the pixels of each of the D patches are then divided into groups by color interval.

The grouped matrix is reshaped into a one-dimensional vector Tsr_X, from which the appearance model is obtained; the appearance models of the Xth detection box and the Yth trajectory are written f(X) and f(Y). Finally, the appearance model is updated by vector fusion. The appearance-model similarity is given by equation (3-1),

where Λ_A(X, Y) represents the similarity of the appearance models.
The invention is further improved in that the similarities of the motion model and the size-change model are obtained by the following process. The time difference between adjacent frames is Δt; the kth target in the nth frame is d_k^n; its position center coordinate is (x, y), with velocity vector (v_x, v_y) and acceleration vector (a_x, a_y); the size of the target's detection box is (w, h), with corresponding size-change speed (v_w, v_h) and size-change driving force; the detector impact factor is α.

The motion state and size state of the kth target in the nth frame are s_k^n = [x, y, v_x, v_y, a_x, a_y]^T and z_k^n = [w, h, v_w, v_h]^T respectively; the covariance matrix between the motion-state factors is P_s, and the covariance matrix between the size-state factors is P_z. According to the laws of physical motion, the position prediction equation and size prediction equation for the next frame are:

x_{n+1} = x_n + v_x·Δt + (1/2)·a_x·Δt²,  v_{x,n+1} = v_x + a_x·Δt  (and likewise for y),

w_{n+1} = w_n + v_w·Δt,  v_{w,n+1} = v_w  (and likewise for h).

Let F_s and F_z denote the corresponding state-transfer matrices. The two iterative state-transfer equations and the covariance-matrix update equation then simplify to:

s_k^{n+1} = F_s·s_k^n,  z_k^{n+1} = F_z·z_k^n,  P^{n+1} = F·P^n·F^T + Q.   (3-8), (3-9)

With (3-8) and (3-9) as the iterative equations of the motion model and the size model, Kalman-filter prediction based on the normal distribution is carried out for each to obtain the position prediction information and size prediction information for the (n+1)th frame.
For any first trajectory segment X and second trajectory segment Y, v_f(X) and v_b(Y) are respectively the forward velocity vector pointing from the head to the tail of the first trajectory X and the reverse velocity vector pointing from the tail to the head of the second trajectory Y; K(·) represents the motion process simulated by the Kalman filter; F(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and B(X, Y) is the reverse similarity score pointing from the head of trajectory Y to the tail of trajectory X;

where Λ_M(X, Y) represents the similarity between the first trajectory segment X and the second trajectory segment Y.
A further development of the invention consists in defining the length of the time window as N and the minimum instantiation length of a short trajectory as T_m. The Kalman family map is denoted KFM, the kth detection box in the nth frame is denoted d_k^n, and O(d_k^n) represents the order of the detection box d_k^n in its corresponding fragment trajectory in the KFM.

If O(d_k^n) = -1, the detection box has not yet been cascaded with any fragment trajectory in the KFM; O(d_k^n) = x means d_k^n is the (x+1)th member of some fragment trajectory in the KFM. The ith fragment trajectory in the KFM is defined as TK_i. If the length of the ith fragment trajectory exceeds T_m and its motion model, appearance model, and size model were not updated in the nth frame, the ith fragment trajectory is instantiated as a reliable short trajectory ST_j; otherwise the ith fragment trajectory is disassembled.
the further improvement of the invention is that the specific process of splicing the detection frames with similarity higher than the threshold value into the Kalman sequence spectrum is as follows:
first, finding out the detection frame KH of the n-th frame of pedestrian pictures: detection frame in n-th frame and n +1 frame picturesIn (1),for the detection frame in the image of the nth frame,finding out each pair of detection frames which possibly belong to the same target and are close in the IOU relation for the detection frames in the (n +1) th frame picture; if it is notThen will beAndrespectively labeled as 0 and 1, and willAndreferred to as a pair detection frame KH;
there will be several pairs KH in the nth and n +1 frames (e.g.,and);representing detection boxes in KFMRepresenting order in corresponding patch trajectoriesThe ith detection box in the nth frame is represented as
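The Kalman-head pairing of step one can be sketched in a few lines; a minimal Python sketch, in which the pairing threshold (0.5 here) and the greedy best-overlap matching strategy are assumptions, since the text gives only the IOU condition:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as [x, y, w, h] (top-left corner plus size)."""
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    ix = max(0.0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = ix * iy
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def find_kalman_heads(dets_n, dets_n1, iou_thresh=0.5):
    """Pair detections of frame n with detections of frame n+1 whose IOU
    exceeds iou_thresh; each pair is a candidate Kalman head (KH)."""
    pairs, used = [], set()
    for i, da in enumerate(dets_n):
        # greedily take the best-overlapping unused box of frame n+1
        best_j, best_iou = -1, iou_thresh
        for j, db in enumerate(dets_n1):
            if j in used:
                continue
            v = iou(da, db)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Each returned pair (i, j) corresponds to labeling d_i^n with order 0 and d_j^{n+1} with order 1 in a new fragment trajectory.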
Step two, prediction:

Predict the position of each pedestrian target in the next frame picture from that target's motion model in the current (n+1)th frame picture.

Step three, trajectory growth: according to equation (3), select the detection box whose position is most similar to the predicted position, append it to the unstable trajectory TK_i, and update TK_i's motion model and appearance model;

the updated motion model and appearance model of the unstable trajectory TK_i are then used to predict the position in the next frame.

Step four: repeat the process of steps one to three for the tracking of each frame.

Step five, instantiation or backtracking: instantiate or backtrack the short trajectories in the KFM of the current frame according to the following conditions:

a) Instantiation: if an unstable trajectory TK_i in the Kalman family map has length of at least T_m but was not updated in the last frame, the unstable trajectory TK_i is instantiated as a new reliable trajectory ST_j;

b) Backtracking: if an unstable trajectory TK_i in the Kalman family map has length less than the threshold T_m and was not updated in the last frame, the unstable trajectory TK_i is deleted from the Kalman family map KFM, and its fragment trajectory is marked in the map as a forbidden route.
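The instantiation/backtracking rules of step five amount to one pass over the fragment trajectories of the KFM; a minimal sketch in Python, in which the track representation (a dict with a detection list and an update flag) and the default T_m = 5 (the value suggested later in the description) are assumptions:

```python
T_M = 5  # minimum instantiation length; the description suggests 5 in practice

def update_kfm(fragment_tracks, reliable_tracks, forbidden_routes):
    """One instantiation/backtracking pass over the fragment trajectories of
    the Kalman family map. Each track is a dict with 'detections' (list) and
    'updated_last_frame' (bool). Long stale fragments become reliable short
    trajectories; short stale fragments are deleted and their route marked
    forbidden. Returns the fragments that remain in the KFM."""
    remaining = []
    for tk in fragment_tracks:
        if tk["updated_last_frame"]:
            remaining.append(tk)           # still growing: keep as a fragment
        elif len(tk["detections"]) >= T_M:
            reliable_tracks.append(tk)     # a) instantiate as reliable ST_j
        else:
            # b) backtrack: delete the fragment and forbid its route
            forbidden_routes.append(tk["detections"])
    return remaining
```

The forbidden-route list is what prevents later exploration from re-creating a path that has already been judged unreliable.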
A further improvement of the invention is that the specific process of step two is: establish the motion model of the unstable trajectory TK_i from d_i^n and d_j^{n+1}; from this motion model, predict the position in the (n+2)th frame of the detections belonging to trajectory TK_i, and define this position as the predicted box.
Compared with the prior art, the invention has the following beneficial effects:

First: the method applies to any trajectory-splicing multi-target tracking algorithm, i.e. it is not constrained by the different trajectories generated by multiple targets such as pedestrians and moving objects; it effectively improves tracking accuracy and reduces the number of identity switches.

Second: generated tracking results can be checked and erroneous results corrected, making the algorithm more robust. For example, when a target is mislocalized in the current video frame during pedestrian tracking, i.e. when the result of the online multi-target tracking algorithm is wrong, the error can be detected by the backtracking module within the method's time window and the tracking trajectory corrected.

Third: by masking the intersection (IOU) regions between targets, the discrimination between multiple targets is effectively improved at a very small computational cost. This effectively addresses the disappearance of target features under severe occlusion in crowded places such as shopping malls and station crossings, and improves the feature discrimination between partially occluded and occluding targets.

Fourth: while meeting the real-time requirement, the invention can use the global information within the current time window to check and correct erroneous tracking results within a certain time. The method is very robust in various extreme scenes and adapts well to other algorithms based on short-trajectory splicing.
Drawings
FIG. 1 is a flowchart of a backtracking algorithm of the present invention.
FIG. 2 is an overall algorithm flow diagram of the present invention.
FIG. 3 is a schematic diagram of an IOU mask module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of appearance model establishment according to an embodiment of the present invention.
Figure 5 shows a comparison of MOT2015 algorithm performance.
FIG. 6 shows a comparison of the FPS for the MOT2015 algorithms.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
The invention adopts an MOT with a semi-online mechanism, which achieves a good compromise between real-time performance and accuracy.
Referring to FIG. 1, the specific process of the invention is as follows. A pedestrian or moving-object video shot by a camera is passed through a YOLO-V3 detector to obtain the detection boxes of the pedestrians or moving objects, i.e. each box frames a pedestrian or moving object and excludes other objects and background. From the video collected over a period of time, a Kalman family map is acquired according to the position-change information between detection boxes within the time window; a pair of Kalman heads (Kalman Head, KH) is then found from the Kalman family map, and the detection box of the target or moving object to be tracked in the next frame is acquired from the similarity of the appearance model, motion model, and size-change model, so that the target or moving object always lies inside a detection box in that frame; otherwise the target is reported lost. Detection boxes whose similarity exceeds the threshold are spliced into the Kalman family map, and the motion model and appearance model in the map are updated for tracking the pedestrian or moving-object target in the next frame.
The similarity of the appearance model is obtained through the following process:

In the nth frame, the patch size is fixed to [64, 128] and the number of pixel-histogram groups is 64. There are D detection boxes and D patches; the Xth detection box is denoted d_X^n and its corresponding patch is p_X^n.

Then, in the nth frame, the region in which each detection box lies is cropped and resized (each patch is cropped and resized to a tensor of shape [64, 128]). After these operations, D patches of the fixed size are obtained. The pixels of the D patches are then each divided into groups (e.g. 64 groups) by color interval,

and the grouped matrix is reshaped into a one-dimensional vector Tsr_X; that is, a 1 × 144 tensor is obtained by reshaping the 3 × 64 tensor produced by color-interval grouping. Tsr_X is used as the representation vector. Combining with [12], the appearance model is obtained, and the appearance models of the Xth detection box and the Yth trajectory are written f(X) and f(Y). Finally, the appearance model is updated by vector fusion, and the appearance similarity is given by equation (3-1),

where Λ_A(X, Y) represents the similarity of the appearance models. This is an effective way of enhancing the discrimination between targets when the relation between trajectories and detection boxes is complex and a single physical motion model is insufficient.
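The histogram appearance model above can be sketched as follows; a minimal Python sketch in which the cosine form of the similarity (equation (3-1) is not reproduced in the text) and the fusion weight eta are assumptions:

```python
import numpy as np

def appearance_vector(patch, bins=64):
    """Histogram appearance vector for one patch.
    patch: H x W x 3 uint8 array (e.g. a detection crop resized to 128 x 64).
    Each channel's pixels are grouped into `bins` color intervals, giving a
    3 x bins matrix that is reshaped into a single 1-D vector."""
    hist = np.stack([
        np.bincount(patch[:, :, c].ravel().astype(np.int64) * bins // 256,
                    minlength=bins)
        for c in range(3)
    ]).astype(np.float64)
    vec = hist.reshape(-1)               # reshape 3 x bins -> 1-D vector
    return vec / (vec.sum() + 1e-12)     # normalize for size-independent comparison

def appearance_similarity(f_x, f_y):
    """Cosine similarity between two appearance vectors -- one plausible
    choice for the similarity of equation (3-1)."""
    return float(np.dot(f_x, f_y) /
                 (np.linalg.norm(f_x) * np.linalg.norm(f_y) + 1e-12))

def fuse(f_track, f_new, eta=0.5):
    """Update a trajectory's appearance model by vector fusion (running blend)."""
    return eta * f_track + (1.0 - eta) * f_new
```

Normalizing the histogram makes the similarity insensitive to patch area, so detections of different original sizes remain comparable after the fixed-size resize.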
The similarities of the motion model and the size-change model are obtained through the following process. The time difference between adjacent frames is Δt; the kth target in the nth frame is d_k^n; its position center coordinate is (x, y), with velocity vector (v_x, v_y) and acceleration vector (a_x, a_y); the size of the target's detection box is (w, h), with corresponding size-change speed (v_w, v_h) and size-change driving force; the detector impact factor is α (the higher the detector's mIOU, the higher the value; the default is 0.7).

The motion state and size state of the kth target in the nth frame are s_k^n = [x, y, v_x, v_y, a_x, a_y]^T and z_k^n = [w, h, v_w, v_h]^T respectively; the covariance matrix between the motion-state factors is P_s, and the covariance matrix between the size-state factors is P_z. According to the laws of physical motion, the position prediction equation and size prediction equation for the next frame are:

x_{n+1} = x_n + v_x·Δt + (1/2)·a_x·Δt²,  v_{x,n+1} = v_x + a_x·Δt  (and likewise for y),

w_{n+1} = w_n + v_w·Δt,  v_{w,n+1} = v_w  (and likewise for h).

Let F_s and F_z denote the corresponding state-transfer matrices. The two iterative state-transfer equations and the covariance-matrix update equation then simplify to:

s_k^{n+1} = F_s·s_k^n,  z_k^{n+1} = F_z·z_k^n,  P^{n+1} = F·P^n·F^T + Q.   (3-8), (3-9)

With (3-8) and (3-9) as the iterative equations of the motion model and the size model, Kalman-filter prediction based on the normal distribution is carried out for each to obtain the position prediction information and size prediction information for the (n+1)th frame.
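The iterative prediction of equations (3-8) and (3-9) can be sketched with a constant-acceleration transition matrix; a minimal Python sketch, in which the process-noise matrix Q is an assumption (the text does not specify it):

```python
import numpy as np

def make_motion_model(dt=1.0):
    """Constant-acceleration transition matrix F for the motion state
    s = [x, y, vx, vy, ax, ay]^T, matching the kinematic prediction equations:
    x' = x + vx*dt + 0.5*ax*dt^2, vx' = vx + ax*dt (likewise for y)."""
    F = np.eye(6)
    F[0, 2] = F[1, 3] = dt
    F[0, 4] = F[1, 5] = 0.5 * dt * dt
    F[2, 4] = F[3, 5] = dt
    return F

def kalman_predict(s, P, F, Q):
    """One Kalman prediction step, the iterative form of (3-8)/(3-9):
    s' = F s,  P' = F P F^T + Q."""
    return F @ s, F @ P @ F.T + Q
```

The size model works the same way with a 4-dimensional state [w, h, vw, vh]^T and a constant-velocity transition matrix.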
For any two trajectory segments X and Y, v_f(X) and v_b(Y) are respectively the forward velocity vector pointing from the head to the tail of trajectory X and the reverse velocity vector pointing from the tail to the head of trajectory Y (as derived from equations (3-10) and (3-11)). K(·) represents the motion process simulated by the Kalman filter. F(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and B(X, Y) is the reverse similarity score pointing from the head of trajectory Y to the tail of trajectory X.

The overall similarity is given by (3-12): Λ_M(X, Y) represents the similarity between the first segment X and the second segment Y computed from equations (3-10) and (3-11). Λ_M(X, Y) ranges over [0, 1]; the closer Λ_M(X, Y) is to 1, the more likely the first segment X and the second segment Y belong to the same target under the model's simulation of the physical motion, which is an important basis for judging the relationship between fragment trajectories.
Trajectory confidence can be intuitively understood as the degree of match between a constructed trajectory and the true trajectory of the target. The confidence conf(T_i) of a trajectory is given by equation (3-13),

where the first factor is the average similarity between the detections in the existing trajectory (with d_a^m and d_b^n denoting two detection boxes in trajectory T_i), and the second factor represents the continuity of the trajectory: α is the number of frames in which the target is missing, and β is a control parameter related to detector accuracy (default 0.4).
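The confidence of equation (3-13) can be sketched as follows; since the formula itself is not reproduced in the text, the exact form below — mean per-link similarity discounted by a continuity factor exp(-β·α) — is an assumption consistent with the description of its two factors:

```python
import math

def trajectory_confidence(similarities, missing_frames, beta=0.4):
    """Sketch of trajectory confidence: the mean of the per-link similarity
    scores along the trajectory, multiplied by a continuity penalty
    exp(-beta * missing_frames). beta defaults to 0.4 as in the text."""
    if not similarities:
        return 0.0
    mean_sim = sum(similarities) / len(similarities)
    return mean_sim * math.exp(-beta * missing_frames)
```

A trajectory with perfectly consistent links and no missing frames gets confidence 1; each missing frame multiplies the confidence by exp(-β).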
The video sequence selected by the semi-online mechanism on the time axis lies between those of the online and offline mechanisms, and its performance is a good compromise between the two; moreover, the semi-online tracking mechanism can be further improved in real-time performance and accuracy through algorithmic optimization, such as occlusion handling and semantic-segmentation optimization.
The invention defines the length of the time window as N, and the minimum instantiation length of a short trajectory as T_m. The Kalman family map is denoted KFM and records the detection relations of the motion model and the appearance model. The kth detection box in the nth frame is written d_k^n; it also contains the detected coordinates and reliability in the list [x, y, w, h, conf]. O(d_k^n) represents the order of the detection box d_k^n in its corresponding fragment trajectory.

If O(d_k^n) = -1, the detection box has not yet been cascaded with any fragment trajectory in the KFM; O(d_k^n) = x means d_k^n is the (x+1)th member of some fragment trajectory in the KFM. The ith fragment trajectory in the KFM is defined as TK_i; if its length exceeds T_m and its motion model, appearance model, and size model were not updated in the nth frame, it is instantiated as a reliable short trajectory ST_j; otherwise the trajectory is disassembled.
Taking the nth frame of pedestrian pictures as an example, the short-trajectory tracking process and the trajectory backtracking strategy of the invention are introduced, as shown in FIG. 2:
Step one, finding the Kalman head pairs KH in the nth frame of pedestrian pictures: among the detection boxes of the nth and (n+1)th frame pictures, d_i^n is a detection box in the nth frame picture and d_j^{n+1} is a detection box in the (n+1)th frame picture; every pair of detection boxes that are close in the IOU relation, and may therefore belong to the same target, is found. If the IOU between d_i^n and d_j^{n+1} exceeds the pairing threshold, d_i^n and d_j^{n+1} are labeled 0 and 1 respectively, and (d_i^n, d_j^{n+1}) is called a detection-box pair KH.

After this step, there will be several pairs KH in the nth and (n+1)th frames. O(d_k^n) denotes the order of detection box d_k^n in its corresponding fragment trajectory, and the ith detection box in the nth frame is denoted d_i^n.
Step two, prediction:

Predict the position of each pedestrian target in the next frame picture from that target's motion model in the current (n+1)th frame picture. The specific process is: establish the motion model of the unstable trajectory TK_i from d_i^n and d_j^{n+1}; from this motion model, predict the position in the (n+2)th frame of the detections belonging to trajectory TK_i, and define this position as the predicted box.

Step three, trajectory growth: according to the matching strategy of equation (3), select the detection box whose position is most similar to the predicted box, append it to the unstable trajectory TK_i, and update TK_i's motion model and appearance model.

The updated motion model and appearance model of the unstable trajectory TK_i are then used to predict the position in the next frame.

Step four: the process of steps one to three is repeated for the tracking of each frame.
Fifthly, instantiation or backtracking: the KFM (e.g., TK) in the current frame is determined according to the following condition0,TK1,…,TKi) Instantiating or backtracking the short track in (1):
a) instantiation: if the unstable track TK in the Kalman sequence spectrumiLength, if not updated in the last frame, the unstable trajectory TKiInstantiated as a new reliable trajectory STj。
For example, an unstable trajectory TKiIf the length is greater than or equal to the threshold value TmThe trace is then a reliable trace. Threshold value TmAccording to actual conditions, 5 is generally adopted.
b) Backtracking: if the unstable track TK in the Kalman sequence spectrumiIs less than a threshold value TmAnd there is no update in the last frame, the unstable trajectory TK will be deleted in the Kalman family diagram KFMiAnd is provided withAnd marking the fragment track in the Kalman sequence spectrum as a forbidden route so as to avoid the path reappearing by later exploration.
The IOU mask module is adopted by the invention to process the condition that two or more targets are mutually shielded, and the process is as follows. As shown in fig. 3, a scene with objects occluded from each other is shown. When the target A and the target B are mutually shielded, before the characteristics are extracted from the detection frame area, the IOU area between the A and the B is used as a mask to cover the pixel information of the IOU area, so that related targets are prevented from sharing the characteristic information of the IOU area, and the distinguishing degree of different target appearance models is effectively improved. However, when a plurality of targets block each other, it is easy to cause the detection area of the target to be almost completely covered by the plurality of IOU masks, thereby causing a phenomenon that the appearance features of the blocked target are completely covered. To avoid this, the present invention sets threshIOUTo avoid the worst case.
Referring to FIG. 3, in the nth frame, the kth detection box is denoted D_k^n, and the set of IOU masks between D_k^n and the other detection boxes is denoted M_k^n. For the kth detection box, assume there is a set of detection boxes S_k^n each of which overlaps D_k^n, i.e., occlusion occurs within the region of D_k^n. For the detection boxes in S_k^n, the union of their IOU masks with D_k^n is recorded as the total occlusion merge area U_k^n, obtained by formula (4). If the residual area of D_k^n after being covered by U_k^n is R_k^n, the process of U_k^n covering D_k^n is expressed by formula (5).
If the obtained R_k^n is less than the preset threshold Thres_IM, i.e., R_k^n < Thres_IM, the appearance features of the target are difficult to express in the appearance model. The invention therefore sorts the detection boxes in S_k^n by the area of their occluded region with D_k^n, and then removes from S_k^n, one by one, the detection box with the smallest occluded area, obtaining a new set of detection boxes. The new set is substituted into formula (4) and formula (5) again until R_k^n ≥ Thres_IM, at which point the final IOU mask is obtained. When the IOU mask module extracts appearance features, the pixel values of the original image region covered by the mask are set to zero: the occluding and occluded targets no longer share the intersection region, which increases the feature discrimination between the targets.
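The masking loop above can be sketched as follows. Since formulas (4) and (5) are not reproduced here, the residual-area threshold `THRESH_IM`, the function names, and the box layout `(x1, y1, x2, y2)` are assumptions for illustration.

```python
import numpy as np

THRESH_IM = 0.2  # hypothetical residual-area ratio threshold Thres_IM

def intersection(a, b):
    """Intersection rectangle of boxes (x1, y1, x2, y2), or None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def apply_iou_masks(patch, box, others):
    """Zero out regions of `patch` where `box` overlaps other detections,
    dropping the smallest overlaps until enough of the target stays visible.
    `patch` is assumed to be the cropped image region of `box`."""
    h, w = patch.shape[:2]
    overlaps = [r for r in (intersection(box, o) for o in others) if r]
    # largest overlaps first, so the smallest occluded area is popped first
    overlaps.sort(key=lambda r: (r[2] - r[0]) * (r[3] - r[1]), reverse=True)
    while overlaps:
        mask = np.zeros((h, w), bool)
        for x1, y1, x2, y2 in overlaps:      # map to patch-local coordinates
            mask[max(y1 - box[1], 0):min(y2 - box[1], h),
                 max(x1 - box[0], 0):min(x2 - box[0], w)] = True
        residual = 1.0 - mask.mean()         # fraction of the box left visible
        if residual >= THRESH_IM:
            break                            # enough of the target remains
        overlaps.pop()                       # drop the smallest occluded area
    else:
        mask = np.zeros((h, w), bool)        # nothing satisfies the threshold
    out = patch.copy()
    out[mask] = 0                            # cover IOU-region pixels with zeros
    return out
```
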
The following are specific examples.
The time window length is first set to 40 frames. The video and the detection boxes of each frame are taken as input; features are extracted, and the patch corresponding to each detection box, obtained after cropping and resizing, is given an appearance representation by grouped pixel histograms, as shown in fig. 3, to build the appearance model. The appearance model is built as follows: in the nth frame, the patch size is fixed to [64,128], the number of pixel-histogram groups is 64, and there are D detection boxes and D patches; the Xth detection box is denoted D_X^n, and the patch corresponding to the Xth detection box is P_X^n.
Within the nth frame, crop and resize operations are performed on the region of each detection box (each patch is cropped and resized to a tensor of shape [64,128]). After these operations, D fixed-size patches are obtained, one per detection box.
Then the invention divides the pixels of each of the D patches into groups (e.g., 64 groups) according to color interval, and the matrix obtained by grouping is reshaped into a one-dimensional vector TsrX. That is, a 3 × 64 tensor obtained by color-interval grouping is reshaped into a 1 × 192 tensor. The one-dimensional vector TsrX is then taken as the appearance model of the patch corresponding to the Xth detection box.
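The grouping step above can be sketched as follows; the function name is ours, while `bins=64` and the 3 × 64 → 1 × 192 reshape follow the description.

```python
import numpy as np

def appearance_vector(patch, bins=64):
    """Per-channel color-interval histogram of an [H, W, 3] uint8 patch,
    stacked to shape (3, bins) and reshaped to a 1 x (3*bins) vector TsrX."""
    hist = np.stack([
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)                       # one histogram per color channel
    ])                                          # shape (3, 64)
    return hist.reshape(-1).astype(np.float32)  # shape (192,)
```
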
Combining the appearance models, the appearance model of the Xth detection box and that of the Yth trajectory are denoted f(X) and f(Y). Finally, the invention updates the appearance model through vector fusion, and the appearance similarity is obtained as shown in formula (7).
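Since the fusion formula and formula (7) are not reproduced here, the sketch below substitutes common choices: an exponential moving average for the vector fusion (the weight `beta` is hypothetical) and cosine similarity for ΛA(X, Y).

```python
import numpy as np

def fuse(f_track, f_det, beta=0.9):
    """Update a trajectory appearance vector with the newest detection vector
    by exponential moving average (an assumed form of the vector fusion)."""
    return beta * f_track + (1.0 - beta) * f_det

def appearance_similarity(f_x, f_y, eps=1e-8):
    """Cosine similarity between appearance vectors f(X) and f(Y)."""
    return float(np.dot(f_x, f_y) /
                 (np.linalg.norm(f_x) * np.linalg.norm(f_y) + eps))
```
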
In the formula, ΛA(X, Y) represents the similarity of the appearance model.
Through the above steps, fragment trajectories with greatly improved reliability are obtained, and noise appearing in the detections is effectively suppressed, as shown in Table 1.
TABLE 1 comparison of the results of the algorithms at MOT15
Referring to fig. 5 and 6, the present invention combines the advantages of online and offline MOT at the expense of a small amount of real-time performance, achieves good improvements in MOTA, MOTP, IDS, ML, MT, and FM, and strikes a proper balance between real-time performance and accuracy.
Compared with the baseline on the MOT2015 dataset, nearly all metrics except fps improve: MOTA and MOTP rise by 12.6% and 6.3% respectively, showing that the algorithm greatly improves continuous tracking of targets overall, and that the short trajectory fragments it generates are more accurate and robust. IDS drops by 82, a modest reduction in the total number of identity switches. The algorithm also has a higher MT value and a lower ML value, indicating that, to a certain extent, the more robust short trajectories reduce the number of missed frames between partial fragment trajectories. Across the compared algorithms, this algorithm ranks first on MOTA, and other metrics such as MOTP, Recall, and IDS are well above the average overall, which indicates strong stability and generalization of the algorithm framework. Notably, the backtracking mechanism contains no computationally complex module: it relies only on the simple appearance model and motion model, carrying out tracking in a state between online and offline, so the algorithm has a very significant FPS advantage over the other algorithms in the table.
The invention adopts a semi-online mechanism to optimize both the real-time performance and the accuracy of the multi-target tracking method. The method can detect and correct errors in the established tracking result and effectively improves the discrimination of target appearance features; it is fast and has low computational resource requirements, so it can run on embedded platforms such as the NVIDIA Jetson TX2 in scenarios such as automatic driving and pedestrian tracking. It effectively addresses the difficulty that existing multi-target tracking algorithms cannot simultaneously achieve optimal real-time performance and algorithm accuracy (e.g., the MOTA metric, Multiple Object Tracking Accuracy).
Claims (6)
1. A multi-target tracking method with a semi-online mechanism, characterized in that detection boxes of pedestrians or moving targets are obtained from a pedestrian or moving-target video by a YOLO-V3 detector; a Kalman sequence spectrum is obtained from the position-change information among the detection boxes within a time window; a pair of Kalman heads is then found from the Kalman sequence spectrum, and the detection box of the target or moving object to be tracked in the next frame is obtained through the similarity of the appearance model, the motion model, and the size-change model, so that the target or moving object lies within the detection box in that frame; otherwise the target is considered lost; the detection boxes whose similarity is higher than the threshold are spliced into the Kalman sequence spectrum, the motion model and appearance model in the Kalman sequence spectrum are updated, and the pedestrian or moving-object target is tracked in the next frame.
2. The multi-target tracking method for the semi-online mechanism according to claim 1, wherein the similarity of the appearance model is obtained through the following processes:
in the nth video frame, the patch size is fixed to [64,128]; there are D detection boxes and D patches; the Xth detection box is denoted D_X^n, and the patch corresponding to the Xth detection box is P_X^n;
in the nth frame, crop and resize operations are performed on the region of each detection box to obtain D fixed-size patches, one per detection box; the pixels of the D patches are then divided into groups according to color interval,
and the matrix obtained by grouping is reshaped into a one-dimensional vector TsrX, which is taken as the appearance model of the corresponding patch; the appearance models of the Xth detection box and the Yth trajectory are denoted f(X) and f(Y); finally, the appearance model is updated by vector fusion, and the similarity of the appearance model is given by formula (3-1);
in the formula, ΛA(X, Y) represents the similarity of the appearance model.
3. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the similarities of the motion model and the size-change model are obtained through the following process: the time difference between adjacent frames is δt; the position center coordinates of the kth target in the nth frame are (x, y), the velocity vector corresponding to the coordinates is (vx, vy), and the acceleration vector corresponding to the coordinates is (ax, ay); the size of the detection box corresponding to the target is (w, h), the corresponding size-change speed is (vw, vh), and the driving force of the size change is (aw, ah); the detector influence factor is α;
the motion state and the size state of the kth target in the nth frame are (x, y, vx, vy, ax, ay) and (w, h, vw, vh, aw, ah) respectively; the covariance matrix between the element factors of the motion state is Pm, and the covariance matrix between the element factors of the size state is Ps; according to the physical laws of motion, the position prediction equation and the size prediction equation for the next frame are obtained as follows:
namely, it is
Namely, it is
letting F denote the corresponding state-transition matrix, the two iterative state-transition equations and the covariance-matrix update equation simplify to:
and (3-8) and (3-9) are used as iterative equations of a motion model and a size model, and Kalman filter prediction based on normal distribution is respectively carried out to obtain position prediction information of the (n +1) th frameAnd size prediction information
for any first-segment trajectory X and second-segment trajectory Y, the forward velocity vector points from the head to the tail of the first trajectory X, and the reverse velocity vector points from the tail to the head of the second trajectory Y; the motion process is simulated by a Kalman filter; f(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and the reverse similarity score points from the head of trajectory Y to the tail of trajectory X;
wherein ΛM(X, Y) represents the similarity between the first-segment trajectory X and the second-segment trajectory Y.
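A constant-acceleration Kalman prediction step matching the state layout in the claim above can be sketched as follows. The transition matrix F follows the kinematic equations; the process-noise scale `q` and the function name are illustrative assumptions, and the detector influence factor α is omitted for simplicity.

```python
import numpy as np

def predict(state, P, dt=1.0, q=1e-2):
    """One Kalman prediction step for the motion state (x, y, vx, vy, ax, ay):
    s' = F s,  P' = F P F^T + Q  (constant-acceleration model)."""
    F = np.eye(6)
    F[0, 2] = F[1, 3] = dt            # position += velocity * dt
    F[2, 4] = F[3, 5] = dt            # velocity += acceleration * dt
    F[0, 4] = F[1, 5] = 0.5 * dt**2   # position += 0.5 * acceleration * dt^2
    Q = q * np.eye(6)                 # simple isotropic process noise
    return F @ state, F @ P @ F.T + Q
```

The size state (w, h, vw, vh, aw, ah) iterates with the same structure, only with size variables in place of position.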
4. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the length of the time window is defined as N and the minimum instantiation length of a short trajectory is Tm; the Kalman family diagram is denoted KFM, the kth detection box in the nth frame is denoted D_k^n, and each detection box in the KFM carries an order denoting its position in the corresponding fragment trajectory;
if a detection box carries no order, it has not been cascaded with any fragment trajectory in the KFM, while an order x means the box is the (x+1)th member of some fragment trajectory in the KFM; the ith fragment trajectory in the KFM is defined as TKi; if the length of the ith fragment trajectory is greater than Tm and its motion model, appearance model, and size model are not updated in the nth frame, the ith fragment trajectory is instantiated as a reliable short trajectory STj; otherwise the ith fragment trajectory is disassembled;
5. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the specific process of splicing the detection boxes whose similarity is higher than the threshold into the Kalman sequence spectrum is as follows:
in the first step, the paired detection boxes KH of the nth-frame pedestrian pictures are found: among the detection boxes of the nth-frame and (n+1)th-frame pictures, D_i^n is a detection box in the nth-frame picture and D_j^{n+1} is a detection box in the (n+1)th-frame picture; each pair of detection boxes that may belong to the same target and are close in the IOU relation is found; if the IOU between D_i^n and D_j^{n+1} exceeds the threshold, D_i^n and D_j^{n+1} are labeled 0 and 1 respectively, and (D_i^n, D_j^{n+1}) is called a paired detection box KH;
several pairs KH will exist between the nth and (n+1)th frames; each detection box in the KFM carries an order in its corresponding fragment trajectory, and the ith detection box in the nth frame is denoted D_i^n;
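The KH-pairing step can be sketched as a greedy IOU match between consecutive frames. The IOU threshold value and the one-to-one greedy assignment are assumptions for illustration; the claim only requires that paired boxes be "close in the IOU relation".

```python
def iou(a, b):
    """IOU of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def find_kh_pairs(boxes_n, boxes_n1, thresh=0.5):
    """Pair boxes of frame n with boxes of frame n+1 whose IOU exceeds a
    threshold, i.e. boxes that plausibly belong to the same target."""
    pairs, used = [], set()
    for i, a in enumerate(boxes_n):
        best_j, best = -1, thresh
        for j, b in enumerate(boxes_n1):
            if j not in used and iou(a, b) > best:
                best_j, best = j, iou(a, b)   # keep the best unused match
        if best_j >= 0:
            used.add(best_j)
            pairs.append((i, best_j))         # a KH pair (D_i^n, D_j^{n+1})
    return pairs
```
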
Step two, prediction:
the position of each pedestrian target in the next frame picture is predicted according to the motion model of that pedestrian target in the current (n+1)th frame picture;
Step three, trajectory growth: according to formula (3), the detection box whose position is most similar to the predicted position is selected and appended to the trajectory, and the motion model and appearance model of the unstable trajectory TKi are updated;
the updated motion model and appearance model of the unstable trajectory TKi are then used to predict the position in the next frame;
the fourth step: repeating the process from the first step to the third step for tracking each frame;
fifthly, instantiation or backtracking: instantiating or backtracking the short track in KFM in the current frame according to the following conditions:
a) Instantiation: if the length of an unstable trajectory TKi in the Kalman sequence spectrum is greater than or equal to the threshold Tm and it is not updated in the last frame, the unstable trajectory TKi is instantiated as a new reliable trajectory STj;
b) Backtracking: if the length of an unstable trajectory TKi in the Kalman sequence spectrum is less than the threshold Tm and it is not updated in the last frame, the unstable trajectory TKi is deleted from the Kalman family diagram KFM, and its fragment trajectory in the Kalman sequence spectrum is marked as a forbidden route.
6. The multi-target tracking method with the semi-online mechanism according to claim 5, wherein the specific process of the second step is as follows: an unstable trajectory TKi is established according to D_i^n and D_j^{n+1}, and according to it the position in the (n+2)th frame of the detection belonging to trajectory TKi is predicted, this position being defined accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010754142.2A CN112116634B (en) | 2020-07-30 | 2020-07-30 | Multi-target tracking method of semi-online machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112116634A true CN112116634A (en) | 2020-12-22 |
CN112116634B CN112116634B (en) | 2024-05-07 |
Family
ID=73799581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010754142.2A Active CN112116634B (en) | 2020-07-30 | 2020-07-30 | Multi-target tracking method of semi-online machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112116634B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101141633A (en) * | 2007-08-28 | 2008-03-12 | 湖南大学 | Moving object detecting and tracing method in complex scene |
CN103530894A (en) * | 2013-10-25 | 2014-01-22 | 合肥工业大学 | Video target tracking method based on multi-scale block sparse representation and system thereof |
CN103632376A (en) * | 2013-12-12 | 2014-03-12 | 江苏大学 | Method for suppressing partial occlusion of vehicles by aid of double-level frames |
CN104915970A (en) * | 2015-06-12 | 2015-09-16 | 南京邮电大学 | Multi-target tracking method based on track association |
CN105809714A (en) * | 2016-03-07 | 2016-07-27 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Track confidence coefficient based multi-object tracking method |
CN106096645A (en) * | 2016-06-07 | 2016-11-09 | 上海瑞孚电子科技有限公司 | Resist and repeatedly block and the recognition and tracking method and system of color interference |
WO2017185688A1 (en) * | 2016-04-26 | 2017-11-02 | 深圳大学 | Method and apparatus for tracking on-line target |
US20180232891A1 (en) * | 2017-02-13 | 2018-08-16 | Electronics And Telecommunications Research Institute | System and method for tracking multiple objects |
CN108447080A (en) * | 2018-03-02 | 2018-08-24 | 哈尔滨工业大学深圳研究生院 | Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks |
CN109191497A (en) * | 2018-08-15 | 2019-01-11 | 南京理工大学 | A kind of real-time online multi-object tracking method based on much information fusion |
CN109919981A (en) * | 2019-03-11 | 2019-06-21 | 南京邮电大学 | A kind of multi-object tracking method of the multiple features fusion based on Kalman filtering auxiliary |
CN110135314A (en) * | 2019-05-07 | 2019-08-16 | 电子科技大学 | A kind of multi-object tracking method based on depth Trajectory prediction |
US20190295313A1 (en) * | 2018-03-21 | 2019-09-26 | Leigh Davies | Method and apparatus for masked occlusion culling |
CN110362715A (en) * | 2019-06-28 | 2019-10-22 | 西安交通大学 | A kind of non-editing video actions timing localization method based on figure convolutional network |
KR20200039043A (en) * | 2018-09-28 | 2020-04-16 | 한국전자통신연구원 | Object recognition device and operating method for the same |
US20200126241A1 (en) * | 2018-10-18 | 2020-04-23 | Deepnorth Inc. | Multi-Object Tracking using Online Metric Learning with Long Short-Term Memory |
KR20200061118A (en) * | 2018-11-23 | 2020-06-02 | 인하대학교 산학협력단 | Tracking method and system multi-object in video |
CN111242985A (en) * | 2020-02-14 | 2020-06-05 | 电子科技大学 | Video multi-pedestrian tracking method based on Markov model |
Non-Patent Citations (5)
Title |
---|
Seung-Hwan Bae et al., "Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, 31 March 2018, pages 595-610 |
StrongerHuang, "An in-depth analysis of the principle of the Kalman filter algorithm", HTTPS://MP.WEIXIN.QQ.COM/S/OSTYC-NA-GFJNCZ2XQQTDQ, 24 June 2020, pages 1-18 |
Embedded ARM, "An in-depth interpretation: the Kalman filter, a powerful tool worth understanding!", Embedded ARM, 8 September 2019, pages 1-21 |
Huitiandi, "A detailed explanation of the Kalman filter principle", HTTPS://WWW.SOHU.COM/A/332038419_650579, 7 August 2019, pages 1-24 |
Li Minghua et al., "An online multi-object tracking algorithm based on hierarchical data association", Modern Computer, vol. 2018, no. 5, 15 February 2018, pages 25-29 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906533A (en) * | 2021-02-07 | 2021-06-04 | 成都睿码科技有限责任公司 | Safety helmet wearing detection method based on self-adaptive detection area |
CN112906533B (en) * | 2021-02-07 | 2023-03-24 | 成都睿码科技有限责任公司 | Safety helmet wearing detection method based on self-adaptive detection area |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||