CN112116634A - Multi-target tracking method with a semi-online mechanism - Google Patents
Multi-target tracking method with a semi-online mechanism
- Publication number
- CN112116634A (application CN202010754142.2A)
- Authority
- CN
- China
- Prior art keywords
- frame
- detection
- target
- trajectory
- kalman
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/277 — Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06N3/045 — Neural networks; architectures; combinations of networks
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- G06T2207/30196 — Subject of image: human being; person
Abstract
A multi-target tracking method with a semi-online mechanism: detection boxes for pedestrians or other moving targets are obtained from a pedestrian or moving-target video; a Kalman family map is built from the position-change information between detection boxes within a time window; pairs of Kalman heads are then found from the Kalman family map, and the detection box of the tracked target or moving object in the next frame is obtained from the combined similarity of an appearance model, a motion model, and a size-change model, so that the target or moving object remains inside a detection box in that frame; otherwise the target is reported lost. Detection boxes whose similarity exceeds a threshold are spliced into the Kalman family map, the motion model and appearance model in the map are updated, and the pedestrian or moving-object target is tracked in the next frame. The method applies to any trajectory-splicing multi-target tracking algorithm, i.e. it is not constrained by the different trajectories generated by multiple targets such as pedestrians and moving objects; it effectively improves tracking accuracy and reduces the number of identity switches.
Description
Technical Field
The invention relates to a tracking method, in particular to a multi-target tracking method with a semi-online mechanism.
Background
Multi-target tracking is mainly applied to tracking the trajectories of multiple people or moving objects in a video sequence shot by a camera. In a driverless-vehicle scene, the pedestrians or other vehicles captured by the vehicle's camera can be tracked in real time and their motion trajectories predicted, so that the vehicle can make effective avoidance or automatic-driving decisions according to the targets' motion. In multi-camera surveillance scenes, several pedestrians can be tracked on demand, and the walking trajectories and positions of multiple pedestrian targets can be monitored through the videos captured by different cameras. In a sports scene shot by a camera, such as a basketball game, the trajectories of the individual athletes can be tracked by a multi-target tracking method, and actions and behavior on the field analyzed from the tracked trajectories. Multi-target tracking can also be applied to tracking multiple targets such as enemy ships and vehicles in military scenes. There are many tracking methods at present, but to track efficiently, multi-target tracking still needs continual improvement in real-time performance, accuracy, and other respects.
MOT (multi-target tracking) can be broadly divided into online MOT and offline MOT. The difference is: the former advances frame by frame in real time and can output a tracking trajectory immediately, so its real-time performance is generally higher than the latter's, while its accuracy is relatively lower; the latter must wait until the forward pass over the whole video sequence is complete and the detection boxes of all frames are available before tracking, so it is harder for it to meet real-time requirements, but its accuracy is generally higher because it makes better use of global information. Online tracking requires that trajectory tracking be completed immediately after the detection of each new frame. The online tracking algorithm therefore intuitively has better real-time performance, but cannot effectively exploit the global information of the video, which may reduce accuracy; in contrast, offline tracking links trajectories only after all frames of a given video sequence have been detected. That mode exploits global information well and yields relatively accurate tracking, but cannot meet real-time requirements. The temporal receptive fields of online tracking, semi-online tracking, and offline tracking are, respectively, the current frame, a time window, and the whole sequence, increasing in that order, while their real-time performance decreases in the same order.
Occlusion has long been one of the difficulties of MOT: although algorithms iterate and improve rapidly, most still struggle to remain robust under severe occlusion. Whether online MOT, offline MOT, or MOT built with deep learning, various approaches have attempted to solve the occlusion problem, but essentially by sacrificing real-time performance. Both real-time performance and accuracy are critical in practical tracking applications. For example, poor real-time performance of the tracking algorithm in a driverless car can delay the vehicle's judgment, leading to misjudgment or delayed decisions and causing avoidable traffic accidents; poor accuracy can cause multiple targets to be tracked chaotically and tracking to fail — for example, when tracking a criminal suspect across many smart city cameras, losing the target or tracking a non-suspect lets the real suspect escape.
Disclosure of Invention
The invention aims to provide a semi-online multi-target tracking method.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a multi-target tracking method of a semi-online mechanism is characterized in that detection frames of pedestrians or moving targets are obtained through a YOLO-V3 detector according to videos of the pedestrians or the moving targets, a Kalman sequence spectrum is obtained according to position change information among the detection frames in a period of time window, then a pair of Kalman heads are found according to the Kalman sequence spectrum, detection frames of targets or moving objects to be tracked in the next frame are obtained through similarity of an appearance model, a movement model and a size change model, the targets or the moving objects are enabled to be located in the detection frames in the frame, and otherwise, the targets are indicated to be lost; and splicing the detection frames with the similarity higher than the threshold value into the Kalman sequence spectrum, updating a motion model and an appearance model in the Kalman sequence spectrum, and tracking the pedestrian or moving object target in the next frame.
The invention is further improved in that the similarity of the appearance models is obtained by the following process:

In the nth frame of the video, the patch size is fixed to [64, 128]. There are D detection boxes and D patches; the Xth detection box is denoted d_X^n and its corresponding patch is p_X^n.

In the nth frame, the region of each detection box is cropped and resized, giving D patches of the fixed size; the pixels of each of the D patches are then divided into groups by color interval.

The grouped matrix is reshaped into a one-dimensional vector Tsr_X, from which the appearance model is obtained; the appearance models of the Xth detection box and the Yth trajectory are written f(X) and f(Y). Finally, the appearance model is updated by vector fusion. The appearance-model similarity is given by equation (3-1),

where Λ_A(X, Y) represents the similarity of the appearance models.
The invention is further improved in that the similarities of the motion model and the size-change model are obtained by the following process. The time difference between adjacent frames is Δt; the kth target in the nth frame is d_k^n; its position center coordinate is (x, y), with velocity vector (v_x, v_y) and acceleration vector (a_x, a_y); the size of the target's detection box is (w, h), with corresponding size-change speed (v_w, v_h) and size-change driving force; the detector impact factor is α.

The motion state and size state of the kth target in the nth frame are s_k^n = [x, y, v_x, v_y, a_x, a_y]^T and z_k^n = [w, h, v_w, v_h]^T respectively; the covariance matrix between the motion-state factors is P_s, and the covariance matrix between the size-state factors is P_z. According to the laws of physical motion, the position prediction equation and size prediction equation for the next frame are:

x_{n+1} = x_n + v_x·Δt + (1/2)·a_x·Δt²,  v_{x,n+1} = v_x + a_x·Δt  (and likewise for y),

w_{n+1} = w_n + v_w·Δt,  v_{w,n+1} = v_w  (and likewise for h).

Let F_s and F_z denote the corresponding state-transfer matrices. The two iterative state-transfer equations and the covariance-matrix update equation then simplify to:

s_k^{n+1} = F_s·s_k^n,  z_k^{n+1} = F_z·z_k^n,  P^{n+1} = F·P^n·F^T + Q.   (3-8), (3-9)

With (3-8) and (3-9) as the iterative equations of the motion model and the size model, Kalman-filter prediction based on the normal distribution is carried out for each to obtain the position prediction information and size prediction information for the (n+1)th frame.
For any first trajectory segment X and second trajectory segment Y, v_f(X) and v_b(Y) are respectively the forward velocity vector pointing from the head to the tail of the first trajectory X and the reverse velocity vector pointing from the tail to the head of the second trajectory Y; K(·) represents the motion process simulated by the Kalman filter; F(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and B(X, Y) is the reverse similarity score pointing from the head of trajectory Y to the tail of trajectory X;

where Λ_M(X, Y) represents the similarity between the first trajectory segment X and the second trajectory segment Y.
A further development of the invention consists in defining the length of the time window as N and the minimum instantiation length of a short trajectory as T_m. The Kalman family map is denoted KFM, the kth detection box in the nth frame is denoted d_k^n, and O(d_k^n) represents the order of the detection box d_k^n in its corresponding fragment trajectory in the KFM.

If O(d_k^n) = -1, the detection box has not yet been cascaded with any fragment trajectory in the KFM; O(d_k^n) = x means d_k^n is the (x+1)th member of some fragment trajectory in the KFM. The ith fragment trajectory in the KFM is defined as TK_i. If the length of the ith fragment trajectory exceeds T_m and its motion model, appearance model, and size model were not updated in the nth frame, the ith fragment trajectory is instantiated as a reliable short trajectory ST_j; otherwise the ith fragment trajectory is disassembled.
the further improvement of the invention is that the specific process of splicing the detection frames with similarity higher than the threshold value into the Kalman sequence spectrum is as follows:
first, finding out the detection frame KH of the n-th frame of pedestrian pictures: detection frame in n-th frame and n +1 frame picturesIn (1),for the detection frame in the image of the nth frame,finding out each pair of detection frames which possibly belong to the same target and are close in the IOU relation for the detection frames in the (n +1) th frame picture; if it is notThen will beAndrespectively labeled as 0 and 1, and willAndreferred to as a pair detection frame KH;
there will be several pairs KH in the nth and n +1 frames (e.g.,and);representing detection boxes in KFMRepresenting order in corresponding patch trajectoriesThe ith detection box in the nth frame is represented as
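The Kalman-head pairing of step one can be sketched in a few lines; a minimal Python sketch, in which the pairing threshold (0.5 here) and the greedy best-overlap matching strategy are assumptions, since the text gives only the IOU condition:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as [x, y, w, h] (top-left corner plus size)."""
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    ix = max(0.0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = ix * iy
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def find_kalman_heads(dets_n, dets_n1, iou_thresh=0.5):
    """Pair detections of frame n with detections of frame n+1 whose IOU
    exceeds iou_thresh; each pair is a candidate Kalman head (KH)."""
    pairs, used = [], set()
    for i, da in enumerate(dets_n):
        # greedily take the best-overlapping unused box of frame n+1
        best_j, best_iou = -1, iou_thresh
        for j, db in enumerate(dets_n1):
            if j in used:
                continue
            v = iou(da, db)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs
```

Each returned pair (i, j) corresponds to labeling d_i^n with order 0 and d_j^{n+1} with order 1 in a new fragment trajectory.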
Step two, prediction:

Predict the position of each pedestrian target in the next frame picture from that target's motion model in the current (n+1)th frame picture.

Step three, trajectory growth: according to equation (3), select the detection box whose position is most similar to the predicted position, append it to the unstable trajectory TK_i, and update TK_i's motion model and appearance model;

the updated motion model and appearance model of the unstable trajectory TK_i are then used to predict the position in the next frame.

Step four: repeat the process of steps one to three for the tracking of each frame.

Step five, instantiation or backtracking: instantiate or backtrack the short trajectories in the KFM of the current frame according to the following conditions:

a) Instantiation: if an unstable trajectory TK_i in the Kalman family map has length of at least T_m but was not updated in the last frame, the unstable trajectory TK_i is instantiated as a new reliable trajectory ST_j;

b) Backtracking: if an unstable trajectory TK_i in the Kalman family map has length less than the threshold T_m and was not updated in the last frame, the unstable trajectory TK_i is deleted from the Kalman family map KFM, and its fragment trajectory is marked in the map as a forbidden route.
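The instantiation/backtracking rules of step five amount to one pass over the fragment trajectories of the KFM; a minimal sketch in Python, in which the track representation (a dict with a detection list and an update flag) and the default T_m = 5 (the value suggested later in the description) are assumptions:

```python
T_M = 5  # minimum instantiation length; the description suggests 5 in practice

def update_kfm(fragment_tracks, reliable_tracks, forbidden_routes):
    """One instantiation/backtracking pass over the fragment trajectories of
    the Kalman family map. Each track is a dict with 'detections' (list) and
    'updated_last_frame' (bool). Long stale fragments become reliable short
    trajectories; short stale fragments are deleted and their route marked
    forbidden. Returns the fragments that remain in the KFM."""
    remaining = []
    for tk in fragment_tracks:
        if tk["updated_last_frame"]:
            remaining.append(tk)           # still growing: keep as a fragment
        elif len(tk["detections"]) >= T_M:
            reliable_tracks.append(tk)     # a) instantiate as reliable ST_j
        else:
            # b) backtrack: delete the fragment and forbid its route
            forbidden_routes.append(tk["detections"])
    return remaining
```

The forbidden-route list is what prevents later exploration from re-creating a path that has already been judged unreliable.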
A further improvement of the invention is that the specific process of step two is: establish the motion model of the unstable trajectory TK_i from d_i^n and d_j^{n+1}; from this motion model, predict the position in the (n+2)th frame of the detections belonging to trajectory TK_i, and define this position as the predicted box.
Compared with the prior art, the invention has the following beneficial effects:

First: the method applies to any trajectory-splicing multi-target tracking algorithm, i.e. it is not constrained by the different trajectories generated by multiple targets such as pedestrians and moving objects; it effectively improves tracking accuracy and reduces the number of identity switches.

Second: generated tracking results can be checked and erroneous results corrected, making the algorithm more robust. For example, when a target is mislocalized in the current video frame during pedestrian tracking, i.e. when the result of the online multi-target tracking algorithm is wrong, the error can be detected by the backtracking module within the method's time window and the tracking trajectory corrected.

Third: by masking the intersection (IOU) regions between targets, the discrimination between multiple targets is effectively improved at a very small computational cost. This effectively addresses the disappearance of target features under severe occlusion in crowded places such as shopping malls and station crossings, and improves the feature discrimination between partially occluded and occluding targets.

Fourth: while meeting the real-time requirement, the invention can use the global information within the current time window to check and correct erroneous tracking results within a certain time. The method is very robust in various extreme scenes and adapts well to other algorithms based on short-trajectory splicing.
Drawings
FIG. 1 is a flowchart of a backtracking algorithm of the present invention.
FIG. 2 is an overall algorithm flow diagram of the present invention.
FIG. 3 is a schematic diagram of an IOU mask module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of appearance model establishment according to an embodiment of the present invention.
Figure 5 shows a comparison of MOT2015 algorithm performance.
FIG. 6 shows a comparison of the FPS for the MOT2015 algorithms.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
The invention adopts an MOT with a semi-online mechanism, which achieves a good compromise between real-time performance and accuracy.
Referring to FIG. 1, the specific process of the invention is as follows. A pedestrian or moving-object video shot by a camera is passed through a YOLO-V3 detector to obtain the detection boxes of the pedestrians or moving objects, i.e. each box frames a pedestrian or moving object and excludes other objects and background. From the video collected over a period of time, a Kalman family map is acquired according to the position-change information between detection boxes within the time window; a pair of Kalman heads (Kalman Head, KH) is then found from the Kalman family map, and the detection box of the target or moving object to be tracked in the next frame is acquired from the similarity of the appearance model, motion model, and size-change model, so that the target or moving object always lies inside a detection box in that frame; otherwise the target is reported lost. Detection boxes whose similarity exceeds the threshold are spliced into the Kalman family map, and the motion model and appearance model in the map are updated for tracking the pedestrian or moving-object target in the next frame.
The similarity of the appearance model is obtained through the following process:

In the nth frame, the patch size is fixed to [64, 128] and the number of pixel-histogram groups is 64. There are D detection boxes and D patches; the Xth detection box is denoted d_X^n and its corresponding patch is p_X^n.

Then, in the nth frame, the region in which each detection box lies is cropped and resized (each patch is cropped and resized to a tensor of shape [64, 128]). After these operations, D patches of the fixed size are obtained. The pixels of the D patches are then each divided into groups (e.g. 64 groups) by color interval,

and the grouped matrix is reshaped into a one-dimensional vector Tsr_X; that is, a 1 × 144 tensor is obtained by reshaping the 3 × 64 tensor produced by color-interval grouping. Tsr_X is used as the representation vector. Combining with [12], the appearance model is obtained, and the appearance models of the Xth detection box and the Yth trajectory are written f(X) and f(Y). Finally, the appearance model is updated by vector fusion, and the appearance similarity is given by equation (3-1),

where Λ_A(X, Y) represents the similarity of the appearance models. This is an effective way of enhancing the discrimination between targets when the relation between trajectories and detection boxes is complex and a single physical motion model is insufficient.
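The histogram appearance model above can be sketched as follows; a minimal Python sketch in which the cosine form of the similarity (equation (3-1) is not reproduced in the text) and the fusion weight eta are assumptions:

```python
import numpy as np

def appearance_vector(patch, bins=64):
    """Histogram appearance vector for one patch.
    patch: H x W x 3 uint8 array (e.g. a detection crop resized to 128 x 64).
    Each channel's pixels are grouped into `bins` color intervals, giving a
    3 x bins matrix that is reshaped into a single 1-D vector."""
    hist = np.stack([
        np.bincount(patch[:, :, c].ravel().astype(np.int64) * bins // 256,
                    minlength=bins)
        for c in range(3)
    ]).astype(np.float64)
    vec = hist.reshape(-1)               # reshape 3 x bins -> 1-D vector
    return vec / (vec.sum() + 1e-12)     # normalize for size-independent comparison

def appearance_similarity(f_x, f_y):
    """Cosine similarity between two appearance vectors -- one plausible
    choice for the similarity of equation (3-1)."""
    return float(np.dot(f_x, f_y) /
                 (np.linalg.norm(f_x) * np.linalg.norm(f_y) + 1e-12))

def fuse(f_track, f_new, eta=0.5):
    """Update a trajectory's appearance model by vector fusion (running blend)."""
    return eta * f_track + (1.0 - eta) * f_new
```

Normalizing the histogram makes the similarity insensitive to patch area, so detections of different original sizes remain comparable after the fixed-size resize.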
The similarities of the motion model and the size-change model are obtained through the following process. The time difference between adjacent frames is Δt; the kth target in the nth frame is d_k^n; its position center coordinate is (x, y), with velocity vector (v_x, v_y) and acceleration vector (a_x, a_y); the size of the target's detection box is (w, h), with corresponding size-change speed (v_w, v_h) and size-change driving force; the detector impact factor is α (the higher the detector's mIOU, the higher the value; the default is 0.7).

The motion state and size state of the kth target in the nth frame are s_k^n = [x, y, v_x, v_y, a_x, a_y]^T and z_k^n = [w, h, v_w, v_h]^T respectively; the covariance matrix between the motion-state factors is P_s, and the covariance matrix between the size-state factors is P_z. According to the laws of physical motion, the position prediction equation and size prediction equation for the next frame are:

x_{n+1} = x_n + v_x·Δt + (1/2)·a_x·Δt²,  v_{x,n+1} = v_x + a_x·Δt  (and likewise for y),

w_{n+1} = w_n + v_w·Δt,  v_{w,n+1} = v_w  (and likewise for h).

Let F_s and F_z denote the corresponding state-transfer matrices. The two iterative state-transfer equations and the covariance-matrix update equation then simplify to:

s_k^{n+1} = F_s·s_k^n,  z_k^{n+1} = F_z·z_k^n,  P^{n+1} = F·P^n·F^T + Q.   (3-8), (3-9)

With (3-8) and (3-9) as the iterative equations of the motion model and the size model, Kalman-filter prediction based on the normal distribution is carried out for each to obtain the position prediction information and size prediction information for the (n+1)th frame.
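The iterative prediction of equations (3-8) and (3-9) can be sketched with a constant-acceleration transition matrix; a minimal Python sketch, in which the process-noise matrix Q is an assumption (the text does not specify it):

```python
import numpy as np

def make_motion_model(dt=1.0):
    """Constant-acceleration transition matrix F for the motion state
    s = [x, y, vx, vy, ax, ay]^T, matching the kinematic prediction equations:
    x' = x + vx*dt + 0.5*ax*dt^2, vx' = vx + ax*dt (likewise for y)."""
    F = np.eye(6)
    F[0, 2] = F[1, 3] = dt
    F[0, 4] = F[1, 5] = 0.5 * dt * dt
    F[2, 4] = F[3, 5] = dt
    return F

def kalman_predict(s, P, F, Q):
    """One Kalman prediction step, the iterative form of (3-8)/(3-9):
    s' = F s,  P' = F P F^T + Q."""
    return F @ s, F @ P @ F.T + Q
```

The size model works the same way with a 4-dimensional state [w, h, vw, vh]^T and a constant-velocity transition matrix.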
For any two trajectory segments X and Y, v_f(X) and v_b(Y) are respectively the forward velocity vector pointing from the head to the tail of trajectory X and the reverse velocity vector pointing from the tail to the head of trajectory Y (as derived from equations (3-10) and (3-11)). K(·) represents the motion process simulated by the Kalman filter. F(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and B(X, Y) is the reverse similarity score pointing from the head of trajectory Y to the tail of trajectory X.

The overall similarity is given by (3-12): Λ_M(X, Y) represents the similarity between the first segment X and the second segment Y computed from equations (3-10) and (3-11). Λ_M(X, Y) ranges over [0, 1]; the closer Λ_M(X, Y) is to 1, the more likely the first segment X and the second segment Y belong to the same target under the model's simulation of the physical motion, which is an important basis for judging the relationship between fragment trajectories.
Trajectory confidence can be intuitively understood as the degree of match between a constructed trajectory and the true trajectory of the target. The confidence conf(T_i) of a trajectory is given by equation (3-13),

where the first factor is the average similarity between the detections in the existing trajectory (with d_a^m and d_b^n denoting two detection boxes in trajectory T_i), and the second factor represents the continuity of the trajectory: α is the number of frames in which the target is missing, and β is a control parameter related to detector accuracy (default 0.4).
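The confidence of equation (3-13) can be sketched as follows; since the formula itself is not reproduced in the text, the exact form below — mean per-link similarity discounted by a continuity factor exp(-β·α) — is an assumption consistent with the description of its two factors:

```python
import math

def trajectory_confidence(similarities, missing_frames, beta=0.4):
    """Sketch of trajectory confidence: the mean of the per-link similarity
    scores along the trajectory, multiplied by a continuity penalty
    exp(-beta * missing_frames). beta defaults to 0.4 as in the text."""
    if not similarities:
        return 0.0
    mean_sim = sum(similarities) / len(similarities)
    return mean_sim * math.exp(-beta * missing_frames)
```

A trajectory with perfectly consistent links and no missing frames gets confidence 1; each missing frame multiplies the confidence by exp(-β).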
The video sequence selected by the semi-online mechanism on the time axis lies between those of the online and offline mechanisms, and its performance is a good compromise between the two; moreover, the semi-online tracking mechanism can be further improved in real-time performance and accuracy through algorithmic optimization, such as occlusion handling and semantic-segmentation optimization.
The invention defines the length of the time window as N, and the minimum instantiation length of a short trajectory as T_m. The Kalman family map is denoted KFM and records the detection relations of the motion model and the appearance model. The kth detection box in the nth frame is written d_k^n; it also contains the detected coordinates and reliability in the list [x, y, w, h, conf]. O(d_k^n) represents the order of the detection box d_k^n in its corresponding fragment trajectory.

If O(d_k^n) = -1, the detection box has not yet been cascaded with any fragment trajectory in the KFM; O(d_k^n) = x means d_k^n is the (x+1)th member of some fragment trajectory in the KFM. The ith fragment trajectory in the KFM is defined as TK_i; if its length exceeds T_m and its motion model, appearance model, and size model were not updated in the nth frame, it is instantiated as a reliable short trajectory ST_j; otherwise the trajectory is disassembled.
Taking the nth frame of pedestrian pictures as an example, the short-trajectory tracking process and the trajectory backtracking strategy of the invention are introduced, as shown in FIG. 2:
Step one, finding the Kalman head pairs KH in the nth frame of pedestrian pictures: among the detection boxes of the nth and (n+1)th frame pictures, d_i^n is a detection box in the nth frame picture and d_j^{n+1} is a detection box in the (n+1)th frame picture; every pair of detection boxes that are close in the IOU relation, and may therefore belong to the same target, is found. If the IOU between d_i^n and d_j^{n+1} exceeds the pairing threshold, d_i^n and d_j^{n+1} are labeled 0 and 1 respectively, and (d_i^n, d_j^{n+1}) is called a detection-box pair KH.

After this step, there will be several pairs KH in the nth and (n+1)th frames. O(d_k^n) denotes the order of detection box d_k^n in its corresponding fragment trajectory, and the ith detection box in the nth frame is denoted d_i^n.
Step two, prediction:

Predict the position of each pedestrian target in the next frame picture from that target's motion model in the current (n+1)th frame picture. The specific process is: establish the motion model of the unstable trajectory TK_i from d_i^n and d_j^{n+1}; from this motion model, predict the position in the (n+2)th frame of the detections belonging to trajectory TK_i, and define this position as the predicted box.

Step three, trajectory growth: according to the matching strategy of equation (3), select the detection box whose position is most similar to the predicted box, append it to the unstable trajectory TK_i, and update TK_i's motion model and appearance model.

The updated motion model and appearance model of the unstable trajectory TK_i are then used to predict the position in the next frame.

Step four: the process of steps one to three is repeated for the tracking of each frame.
Fifthly, instantiation or backtracking: the KFM (e.g., TK) in the current frame is determined according to the following condition0,TK1,…,TKi) Instantiating or backtracking the short track in (1):
a) instantiation: if the unstable track TK in the Kalman sequence spectrumiLength, if not updated in the last frame, the unstable trajectory TKiInstantiated as a new reliable trajectory STj。
For example, an unstable trajectory TKiIf the length is greater than or equal to the threshold value TmThe trace is then a reliable trace. Threshold value TmAccording to actual conditions, 5 is generally adopted.
b) Backtracking: if the unstable track TK in the Kalman sequence spectrumiIs less than a threshold value TmAnd there is no update in the last frame, the unstable trajectory TK will be deleted in the Kalman family diagram KFMiAnd is provided withAnd marking the fragment track in the Kalman sequence spectrum as a forbidden route so as to avoid the path reappearing by later exploration.
The IOU mask module is adopted by the invention to process the condition that two or more targets are mutually shielded, and the process is as follows. As shown in fig. 3, a scene with objects occluded from each other is shown. When the target A and the target B are mutually shielded, before the characteristics are extracted from the detection frame area, the IOU area between the A and the B is used as a mask to cover the pixel information of the IOU area, so that related targets are prevented from sharing the characteristic information of the IOU area, and the distinguishing degree of different target appearance models is effectively improved. However, when a plurality of targets block each other, it is easy to cause the detection area of the target to be almost completely covered by the plurality of IOU masks, thereby causing a phenomenon that the appearance features of the blocked target are completely covered. To avoid this, the present invention sets threshIOUTo avoid the worst case.
Referring to FIG. 3, in the nth frame, the kth detection box is denoted D_k^n, and the set of IOU masks between D_k^n and the other detection boxes is denoted M_k^n. For the kth detection box, assume there is a set of detection boxes S_k^n each of which overlaps D_k^n, i.e., occlusion occurs within the region of D_k^n. For the detection boxes in S_k^n, the union of their IOU masks with D_k^n is recorded as the total occlusion merge area U_k^n, obtained by formula (4). If the residual area of D_k^n after being covered by U_k^n is R_k^n, the process of U_k^n covering D_k^n is expressed by formula (5).
If the obtained R_k^n is less than the preset threshold Thres_IM, i.e., R_k^n < Thres_IM, the appearance features of the target are difficult to express in the appearance model. The invention therefore sorts the detection boxes in S_k^n by the area of their occluded region with D_k^n, and then removes from S_k^n, one by one, the detection box with the smallest occluded area, obtaining a new set of detection boxes. The new set is substituted into formula (4) and formula (5) again until R_k^n ≥ Thres_IM, at which point the final IOU mask is obtained. When the IOU mask module extracts appearance features, the pixel values of the original image region covered by the mask are set to zero: the occluding and occluded targets no longer share the intersection region, which increases the feature discrimination between the targets.
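The masking loop above can be sketched as follows. Since formulas (4) and (5) are not reproduced here, the residual-area threshold `THRESH_IM`, the function names, and the box layout `(x1, y1, x2, y2)` are assumptions for illustration.

```python
import numpy as np

THRESH_IM = 0.2  # hypothetical residual-area ratio threshold Thres_IM

def intersection(a, b):
    """Intersection rectangle of boxes (x1, y1, x2, y2), or None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def apply_iou_masks(patch, box, others):
    """Zero out regions of `patch` where `box` overlaps other detections,
    dropping the smallest overlaps until enough of the target stays visible.
    `patch` is assumed to be the cropped image region of `box`."""
    h, w = patch.shape[:2]
    overlaps = [r for r in (intersection(box, o) for o in others) if r]
    # largest overlaps first, so the smallest occluded area is popped first
    overlaps.sort(key=lambda r: (r[2] - r[0]) * (r[3] - r[1]), reverse=True)
    while overlaps:
        mask = np.zeros((h, w), bool)
        for x1, y1, x2, y2 in overlaps:      # map to patch-local coordinates
            mask[max(y1 - box[1], 0):min(y2 - box[1], h),
                 max(x1 - box[0], 0):min(x2 - box[0], w)] = True
        residual = 1.0 - mask.mean()         # fraction of the box left visible
        if residual >= THRESH_IM:
            break                            # enough of the target remains
        overlaps.pop()                       # drop the smallest occluded area
    else:
        mask = np.zeros((h, w), bool)        # nothing satisfies the threshold
    out = patch.copy()
    out[mask] = 0                            # cover IOU-region pixels with zeros
    return out
```
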
The following are specific examples.
The time window length is first set to 40 frames. The video and the detection boxes of each frame are taken as input; features are extracted, and the patch corresponding to each detection box, obtained after cropping and resizing, is given an appearance representation by grouped pixel histograms, as shown in fig. 3, to build the appearance model. The appearance model is built as follows: in the nth frame, the patch size is fixed to [64,128], the number of pixel-histogram groups is 64, and there are D detection boxes and D patches; the Xth detection box is denoted D_X^n, and the patch corresponding to the Xth detection box is P_X^n.
Within the nth frame, crop and resize operations are performed on the region of each detection box (each patch is cropped and resized to a tensor of shape [64,128]). After these operations, D fixed-size patches are obtained, one per detection box.
Then the invention divides the pixels of each of the D patches into groups (e.g., 64 groups) according to color interval, and the matrix obtained by grouping is reshaped into a one-dimensional vector TsrX. That is, a 3 × 64 tensor obtained by color-interval grouping is reshaped into a 1 × 192 tensor. The one-dimensional vector TsrX is then taken as the appearance model of the patch corresponding to the Xth detection box.
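The grouping step above can be sketched as follows; the function name is ours, while `bins=64` and the 3 × 64 → 1 × 192 reshape follow the description.

```python
import numpy as np

def appearance_vector(patch, bins=64):
    """Per-channel color-interval histogram of an [H, W, 3] uint8 patch,
    stacked to shape (3, bins) and reshaped to a 1 x (3*bins) vector TsrX."""
    hist = np.stack([
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)                       # one histogram per color channel
    ])                                          # shape (3, 64)
    return hist.reshape(-1).astype(np.float32)  # shape (192,)
```
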
Combining the appearance models, the appearance model of the Xth detection box and that of the Yth trajectory are denoted f(X) and f(Y). Finally, the invention updates the appearance model through vector fusion, and the appearance similarity is obtained as shown in formula (7).
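Since the fusion formula and formula (7) are not reproduced here, the sketch below substitutes common choices: an exponential moving average for the vector fusion (the weight `beta` is hypothetical) and cosine similarity for ΛA(X, Y).

```python
import numpy as np

def fuse(f_track, f_det, beta=0.9):
    """Update a trajectory appearance vector with the newest detection vector
    by exponential moving average (an assumed form of the vector fusion)."""
    return beta * f_track + (1.0 - beta) * f_det

def appearance_similarity(f_x, f_y, eps=1e-8):
    """Cosine similarity between appearance vectors f(X) and f(Y)."""
    return float(np.dot(f_x, f_y) /
                 (np.linalg.norm(f_x) * np.linalg.norm(f_y) + eps))
```
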
In the formula, ΛA(X, Y) represents the similarity of the appearance model.
Through the above steps, fragment trajectories with greatly improved reliability are obtained, and noise appearing in the detections is effectively suppressed, as shown in Table 1.
TABLE 1 comparison of the results of the algorithms at MOT15
Referring to fig. 5 and 6, the present invention combines the advantages of online and offline MOT at the expense of a small amount of real-time performance, achieves good improvements in MOTA, MOTP, IDS, ML, MT, and FM, and strikes a proper balance between real-time performance and accuracy.
Compared with the baseline on the MOT2015 dataset, nearly all metrics except fps improve: MOTA and MOTP rise by 12.6% and 6.3% respectively, showing that the algorithm greatly improves continuous tracking of targets overall, and that the short trajectory fragments it generates are more accurate and robust. IDS drops by 82, a modest reduction in the total number of identity switches. The algorithm also has a higher MT value and a lower ML value, indicating that, to a certain extent, the more robust short trajectories reduce the number of missed frames between partial fragment trajectories. Across the compared algorithms, this algorithm ranks first on MOTA, and other metrics such as MOTP, Recall, and IDS are well above the average overall, which indicates strong stability and generalization of the algorithm framework. Notably, the backtracking mechanism contains no computationally complex module: it relies only on the simple appearance model and motion model, carrying out tracking in a state between online and offline, so the algorithm has a very significant FPS advantage over the other algorithms in the table.
The invention adopts a semi-online mechanism to optimize both the real-time performance and the accuracy of the multi-target tracking method. The method can detect and correct errors in the established tracking result and effectively improves the discrimination of target appearance features; it is fast and has low computational resource requirements, so it can run on embedded platforms such as the NVIDIA Jetson TX2 in scenarios such as automatic driving and pedestrian tracking. It effectively addresses the difficulty that existing multi-target tracking algorithms cannot simultaneously achieve optimal real-time performance and algorithm accuracy (e.g., the MOTA metric, Multiple Object Tracking Accuracy).
Claims (6)
1. A multi-target tracking method with a semi-online mechanism, characterized in that detection boxes of pedestrians or moving targets are obtained from a pedestrian or moving-target video by a YOLO-V3 detector; a Kalman sequence spectrum is obtained from the position-change information among the detection boxes within a time window; a pair of Kalman heads is then found from the Kalman sequence spectrum, and the detection box of the target or moving object to be tracked in the next frame is obtained through the similarity of the appearance model, the motion model, and the size-change model, so that the target or moving object lies within the detection box in that frame; otherwise the target is considered lost; the detection boxes whose similarity is higher than the threshold are spliced into the Kalman sequence spectrum, the motion model and appearance model in the Kalman sequence spectrum are updated, and the pedestrian or moving-object target is tracked in the next frame.
2. The multi-target tracking method for the semi-online mechanism according to claim 1, wherein the similarity of the appearance model is obtained through the following processes:
in the nth video frame, the patch size is fixed to [64,128]; there are D detection boxes and D patches; the Xth detection box is denoted D_X^n, and the patch corresponding to the Xth detection box is P_X^n;
in the nth frame, crop and resize operations are performed on the region of each detection box to obtain D fixed-size patches, one per detection box; the pixels of the D patches are then divided into groups according to color interval,
and the matrix obtained by grouping is reshaped into a one-dimensional vector TsrX, which is taken as the appearance model of the corresponding patch; the appearance models of the Xth detection box and the Yth trajectory are denoted f(X) and f(Y); finally, the appearance model is updated by vector fusion, and the similarity of the appearance model is given by formula (3-1);
in the formula, ΛA(X, Y) represents the similarity of the appearance model.
3. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the similarities of the motion model and the size-change model are obtained through the following process: the time difference between adjacent frames is δt; the position center coordinates of the kth target in the nth frame are (x, y), the velocity vector corresponding to the coordinates is (vx, vy), and the acceleration vector corresponding to the coordinates is (ax, ay); the size of the detection box corresponding to the target is (w, h), the corresponding size-change speed is (vw, vh), and the driving force of the size change is (aw, ah); the detector influence factor is α;
the motion state and the size state of the kth target in the nth frame are (x, y, vx, vy, ax, ay) and (w, h, vw, vh, aw, ah) respectively; the covariance matrix between the element factors of the motion state is Pm, and the covariance matrix between the element factors of the size state is Ps; according to the physical laws of motion, the position prediction equation and the size prediction equation for the next frame are obtained as follows:
namely, it is
Namely, it is
letting F denote the corresponding state-transition matrix, the two iterative state-transition equations and the covariance-matrix update equation simplify to:
and (3-8) and (3-9) are used as iterative equations of a motion model and a size model, and Kalman filter prediction based on normal distribution is respectively carried out to obtain position prediction information of the (n +1) th frameAnd size prediction information
for any first-segment trajectory X and second-segment trajectory Y, the forward velocity vector points from the head to the tail of the first trajectory X, and the reverse velocity vector points from the tail to the head of the second trajectory Y; the motion process is simulated by a Kalman filter; f(X, Y) is the forward similarity score pointing from the tail of trajectory X to the head of trajectory Y, and the reverse similarity score points from the head of trajectory Y to the tail of trajectory X;
wherein ΛM(X, Y) represents the similarity between the first-segment trajectory X and the second-segment trajectory Y.
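A constant-acceleration Kalman prediction step matching the state layout in the claim above can be sketched as follows. The transition matrix F follows the kinematic equations; the process-noise scale `q` and the function name are illustrative assumptions, and the detector influence factor α is omitted for simplicity.

```python
import numpy as np

def predict(state, P, dt=1.0, q=1e-2):
    """One Kalman prediction step for the motion state (x, y, vx, vy, ax, ay):
    s' = F s,  P' = F P F^T + Q  (constant-acceleration model)."""
    F = np.eye(6)
    F[0, 2] = F[1, 3] = dt            # position += velocity * dt
    F[2, 4] = F[3, 5] = dt            # velocity += acceleration * dt
    F[0, 4] = F[1, 5] = 0.5 * dt**2   # position += 0.5 * acceleration * dt^2
    Q = q * np.eye(6)                 # simple isotropic process noise
    return F @ state, F @ P @ F.T + Q
```

The size state (w, h, vw, vh, aw, ah) iterates with the same structure, only with size variables in place of position.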
4. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the length of the time window is defined as N and the minimum instantiation length of a short trajectory is Tm; the Kalman family diagram is denoted KFM, the kth detection box in the nth frame is denoted D_k^n, and each detection box in the KFM carries an order denoting its position in the corresponding fragment trajectory;
if a detection box carries no order, it has not been cascaded with any fragment trajectory in the KFM, while an order x means the box is the (x+1)th member of some fragment trajectory in the KFM; the ith fragment trajectory in the KFM is defined as TKi; if the length of the ith fragment trajectory is greater than Tm and its motion model, appearance model, and size model are not updated in the nth frame, the ith fragment trajectory is instantiated as a reliable short trajectory STj; otherwise the ith fragment trajectory is disassembled;
5. The multi-target tracking method with the semi-online mechanism according to claim 1, wherein the specific process of splicing the detection boxes whose similarity is higher than the threshold into the Kalman sequence spectrum is as follows:
in the first step, the paired detection boxes KH of the nth-frame pedestrian pictures are found: among the detection boxes of the nth-frame and (n+1)th-frame pictures, D_i^n is a detection box in the nth-frame picture and D_j^{n+1} is a detection box in the (n+1)th-frame picture; each pair of detection boxes that may belong to the same target and are close in the IOU relation is found; if the IOU between D_i^n and D_j^{n+1} exceeds the threshold, D_i^n and D_j^{n+1} are labeled 0 and 1 respectively, and (D_i^n, D_j^{n+1}) is called a paired detection box KH;
several pairs KH will exist between the nth and (n+1)th frames; each detection box in the KFM carries an order in its corresponding fragment trajectory, and the ith detection box in the nth frame is denoted D_i^n;
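The KH-pairing step can be sketched as a greedy IOU match between consecutive frames. The IOU threshold value and the one-to-one greedy assignment are assumptions for illustration; the claim only requires that paired boxes be "close in the IOU relation".

```python
def iou(a, b):
    """IOU of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def find_kh_pairs(boxes_n, boxes_n1, thresh=0.5):
    """Pair boxes of frame n with boxes of frame n+1 whose IOU exceeds a
    threshold, i.e. boxes that plausibly belong to the same target."""
    pairs, used = [], set()
    for i, a in enumerate(boxes_n):
        best_j, best = -1, thresh
        for j, b in enumerate(boxes_n1):
            if j not in used and iou(a, b) > best:
                best_j, best = j, iou(a, b)   # keep the best unused match
        if best_j >= 0:
            used.add(best_j)
            pairs.append((i, best_j))         # a KH pair (D_i^n, D_j^{n+1})
    return pairs
```
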
Step two, prediction:
the position of each pedestrian target in the next frame picture is predicted according to the motion model of that pedestrian target in the current (n+1)th frame picture;
Step three, trajectory growth: according to formula (3), the detection box whose position is most similar to the predicted position is selected and appended to the trajectory, and the motion model and appearance model of the unstable trajectory TKi are updated;
the updated motion model and appearance model of the unstable trajectory TKi are then used to predict the position in the next frame;
the fourth step: repeating the process from the first step to the third step for tracking each frame;
fifthly, instantiation or backtracking: instantiating or backtracking the short track in KFM in the current frame according to the following conditions:
a) Instantiation: if the length of an unstable trajectory TKi in the Kalman sequence spectrum is greater than or equal to the threshold Tm and it is not updated in the last frame, the unstable trajectory TKi is instantiated as a new reliable trajectory STj;
b) Backtracking: if the length of an unstable trajectory TKi in the Kalman sequence spectrum is less than the threshold Tm and it is not updated in the last frame, the unstable trajectory TKi is deleted from the Kalman family diagram KFM, and its fragment trajectory in the Kalman sequence spectrum is marked as a forbidden route.
6. The multi-target tracking method with the semi-online mechanism according to claim 5, wherein the specific process of the second step is as follows: an unstable trajectory TKi is established according to D_i^n and D_j^{n+1}, and according to it the position in the (n+2)th frame of the detection belonging to trajectory TKi is predicted, this position being defined accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010754142.2A CN112116634B (en) | 2020-07-30 | 2020-07-30 | Multi-target tracking method of semi-online machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112116634A true CN112116634A (en) | 2020-12-22 |
CN112116634B CN112116634B (en) | 2024-05-07 |
Family
ID=73799581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010754142.2A Active CN112116634B (en) | 2020-07-30 | 2020-07-30 | Multi-target tracking method of semi-online machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112116634B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101141633A (en) * | 2007-08-28 | 2008-03-12 | 湖南大学 | Moving object detecting and tracing method in complex scene |
CN103530894A (en) * | 2013-10-25 | 2014-01-22 | 合肥工业大学 | Video target tracking method based on multi-scale block sparse representation and system thereof |
CN103632376A (en) * | 2013-12-12 | 2014-03-12 | 江苏大学 | Method for suppressing partial occlusion of vehicles by aid of double-level frames |
CN104915970A (en) * | 2015-06-12 | 2015-09-16 | 南京邮电大学 | Multi-target tracking method based on track association |
CN105809714A (en) * | 2016-03-07 | 2016-07-27 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Track confidence coefficient based multi-object tracking method |
CN106096645A (en) * | 2016-06-07 | 2016-11-09 | 上海瑞孚电子科技有限公司 | Resist and repeatedly block and the recognition and tracking method and system of color interference |
WO2017185688A1 (en) * | 2016-04-26 | 2017-11-02 | 深圳大学 | Method and apparatus for tracking on-line target |
US20180232891A1 (en) * | 2017-02-13 | 2018-08-16 | Electronics And Telecommunications Research Institute | System and method for tracking multiple objects |
CN108447080A (en) * | 2018-03-02 | 2018-08-24 | 哈尔滨工业大学深圳研究生院 | Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks |
CN109191497A (en) * | 2018-08-15 | 2019-01-11 | 南京理工大学 | A kind of real-time online multi-object tracking method based on much information fusion |
CN109919981A (en) * | 2019-03-11 | 2019-06-21 | 南京邮电大学 | A kind of multi-object tracking method of the multiple features fusion based on Kalman filtering auxiliary |
CN110135314A (en) * | 2019-05-07 | 2019-08-16 | 电子科技大学 | A kind of multi-object tracking method based on depth Trajectory prediction |
US20190295313A1 (en) * | 2018-03-21 | 2019-09-26 | Leigh Davies | Method and apparatus for masked occlusion culling |
CN110362715A (en) * | 2019-06-28 | 2019-10-22 | 西安交通大学 | A kind of non-editing video actions timing localization method based on figure convolutional network |
KR20200039043A (en) * | 2018-09-28 | 2020-04-16 | 한국전자통신연구원 | Object recognition device and operating method for the same |
US20200126241A1 (en) * | 2018-10-18 | 2020-04-23 | Deepnorth Inc. | Multi-Object Tracking using Online Metric Learning with Long Short-Term Memory |
KR20200061118A (en) * | 2018-11-23 | 2020-06-02 | 인하대학교 산학협력단 | Tracking method and system multi-object in video |
CN111242985A (en) * | 2020-02-14 | 2020-06-05 | 电子科技大学 | Video multi-pedestrian tracking method based on Markov model |
Non-Patent Citations (5)
Title |
---|
Seung-Hwan Bae et al., "Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, 31 March 2018, pages 595-610 |
StrongerHuang, "An in-depth analysis of the principle of the Kalman filter algorithm", HTTPS://MP.WEIXIN.QQ.COM/S/OSTYC-NA-GFJNCZ2XQQTDQ, 24 June 2020, pages 1-18 |
Embedded ARM, "An in-depth interpretation: the Kalman filter, a powerful tool worth understanding!", Embedded ARM, 8 September 2019, pages 1-21 |
Huitiandi, "A detailed explanation of the Kalman filter principle", HTTPS://WWW.SOHU.COM/A/332038419_650579, 7 August 2019, pages 1-24 |
Li Minghua et al., "An online multi-object tracking algorithm based on hierarchical data association", Modern Computer, vol. 2018, no. 5, 15 February 2018, pages 25-29 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906533A (en) * | 2021-02-07 | 2021-06-04 | 成都睿码科技有限责任公司 | Safety helmet wearing detection method based on self-adaptive detection area |
CN112906533B (en) * | 2021-02-07 | 2023-03-24 | 成都睿码科技有限责任公司 | Safety helmet wearing detection method based on self-adaptive detection area |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||