CN113052877A - Multi-target tracking method based on multi-camera fusion - Google Patents
Multi-target tracking method based on multi-camera fusion
- Publication number
- CN113052877A CN113052877A CN202110299952.8A CN202110299952A CN113052877A CN 113052877 A CN113052877 A CN 113052877A CN 202110299952 A CN202110299952 A CN 202110299952A CN 113052877 A CN113052877 A CN 113052877A
- Authority
- CN
- China
- Prior art keywords
- target
- matching
- camera
- target tracking
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/292—Multi-camera tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
Abstract
The invention combines deep learning with computer vision algorithms and discloses a multi-target tracking method based on multi-camera fusion, comprising the following steps: S1, obtaining target detection boxes from a detector; S2, predicting each target's position at the next time step by Kalman filtering; S3, associating prediction boxes with detection boxes through cascade matching and IOU matching; S4, matching the predicted tracks to the detection boxes at the current time with the Hungarian algorithm; S5, correcting the matches through auxiliary views; S6, updating the Kalman filter. Through multi-camera fusion, the method mitigates the ID-switch problem caused by occlusion in multi-target tracking.
Description
Technical Field
The invention combines deep learning with computer vision algorithms, and specifically discloses a multi-target tracking method based on multi-camera fusion.
Background
Multi-target tracking is gaining increasing attention in computer vision due to its academic and commercial potential. Although a wide variety of approaches to the problem exist today, issues such as target overlap and dramatic appearance changes remain significant challenges. Handling target overlap and related problems more effectively is of great significance for the practical application of multi-target tracking, and a broad range of solutions has been proposed over the past decades.
The main task of Multi-Object Tracking (MOT, also MTT) is to locate multiple objects of interest simultaneously in a given video, maintain their IDs, and record their trajectories. These objects may be pedestrians, vehicles on the road, players on a field, groups of animals (birds, bats, ants, fish, cells, etc.), or even different parts of a single object. Besides single-target challenges such as scale change, out-of-plane rotation, and illumination change, multi-target tracking must also deal with more complex key problems, including: 1) frequent occlusion; 2) track initialization and termination; 3) similar appearance; 4) interaction among multiple targets.
The most widely followed multi-target tracking algorithms in industry are none other than SORT and DeepSORT. Both realize multi-target tracking by matching detections against Kalman-filter predictions and then updating the filter, but neither handles occlusion and ID switches well.
The present method combines deep learning and computer vision to build, on top of DeepSORT, a multi-target tracking algorithm fused across multiple cameras, which mitigates the occlusion and ID-switch problems.
Disclosure of Invention
The invention aims to provide a multi-target tracking method based on multi-camera fusion, which adopts the following scheme:
a multi-target tracking method based on multi-camera fusion comprises the following steps:
s1, detecting by a detector to obtain a target detection frame;
s2, predicting the position of the target at the next moment through Kalman filtering;
s3, cascade matching and IOU matching associating the prediction box with the detection box;
s4, matching the predicted track with the current moment detection box by using a Hungarian algorithm;
s5, correcting the match through an auxiliary view;
s6, Kalman filtering updating;
further, in the step s1, a detection frame of the target to be detected is obtained through yolov4 target detection;
further, in step s2, the next time prediction is performed on the detected target by using the kalman filtering technique;
further, in step s3, the prediction block and the detection block are associated by cascade matching and IOU matching
Further, the specific association process is as follows:
s31, distance measurement is carried out on the detection box and the prediction box through the mahalanobis distance;
s32 measuring the appearance characteristics by cosine distance
s33 measuring degree of coincidence by IOU cross-correlation
Further, in the above step s4, the Hungarian algorithm is used to find the optimal match between the detection box and the prediction box.
Further, in the step s5, the matching is corrected by the auxiliary view angle
Further, the correction process is as follows:
s51, restoring the real scene of the matching box by camera calibration algorithm
s52, correcting the matching result by the auxiliary camera
s53, modifying and restoring the corrective result
Further, in step s6, the kalman filter is updated.
The invention has the following advantages:
The method tracks multiple targets in real time in a detect-and-predict fashion using a deep neural network and computer vision. For the inherent occlusion problem of multi-target tracking, it further proposes a multi-camera assistance scheme: occluded regions in the main camera's view are corrected through auxiliary views, which mitigates occlusion-induced ID switches and improves multi-target tracking accuracy.
Drawings
FIG. 1 is a block diagram of a multi-target tracking method based on multi-camera fusion according to the present invention;
detailed description of the invention
The invention will be described in further detail with reference to the accompanying figure 1 and the following detailed description:
referring to fig. 1, a multi-target tracking method based on multi-camera fusion includes the following steps:
s1, detecting with the detector to obtain the target detection boxes
To realize robust multi-target tracking, the method adopts a detect-plus-predict scheme and uses the fast and effective YOLOv4 to detect the persons to be tracked in each video frame.
s2, predicting the position of the target at the next moment by Kalman filtering
The detection algorithm only localizes persons within a frame and cannot by itself associate the same person across frames. Kalman filtering predicts each person's state at the next time step from the current one, providing a reference for matching and association. The calculation is as follows:
predicting a trajectory for a current time based on previous trajectories
x′=Fx
where x is the state vector of the track at time t-1 and F is the state transition matrix; the formula predicts the state vector x′ at time t.
The state vector is eight-dimensional: cx and cy are the coordinates of the target's center point, r is the aspect ratio, h is the height, and the remaining four components are the corresponding velocities.
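The prediction step can be sketched as follows (a minimal constant-velocity sketch with an assumed unit time step and an illustrative process-noise value, not the patent's exact implementation):

```python
import numpy as np

def kalman_predict(x, P, dt=1.0, q=0.01):
    """Predict the next state x' = F x and covariance P' = F P F^T + Q.

    x : (8,) state [cx, cy, r, h, vcx, vcy, vr, vh]
    P : (8, 8) state covariance
    """
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)      # position components += velocity * dt
    Q = q * np.eye(8)               # assumed process noise (hypothetical value)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.0])
x_pred, P_pred = kalman_predict(x, np.eye(8))
# the center moves by its velocity: cx 100 -> 102, cy 50 -> 49
```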
s3, cascade matching and IOU matching associating the prediction box with the detection box;
s31, measuring the distance between the detection boxes and the prediction boxes with the Mahalanobis distance:
d(1)(i,j)=(dj-yi)TSi-1(dj-yi)
where dj is the position of the j-th detection box, yi is the predicted target position of the i-th tracker, and Si is the covariance matrix between the two distributions. This distance associates detection boxes with prediction boxes by motion; beyond the distance constraint alone, the correlation of the appearance features inside the boxes must also be examined.
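The squared Mahalanobis distance above can be sketched directly (standard form; with an identity covariance it reduces to the squared Euclidean distance):

```python
import numpy as np

def mahalanobis_sq(d_j, y_i, S_i):
    """Squared Mahalanobis distance (d_j - y_i)^T S_i^{-1} (d_j - y_i)."""
    diff = d_j - y_i
    # solve S_i z = diff instead of explicitly inverting S_i
    return float(diff @ np.linalg.solve(S_i, diff))

d = mahalanobis_sq(np.array([1.0, 2.0]), np.array([0.0, 0.0]), np.eye(2))
# identity covariance -> squared Euclidean distance 1^2 + 2^2 = 5
```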
s32, measuring the appearance features with the cosine distance
A 128-dimensional feature vector is extracted from each detection box and prediction box by a neural network and L2-normalized onto the unit hypersphere. The cosine distance between feature vectors is then computed:
d(2)(i,j)=min{1-rjTrk(i) | rk(i)∈Ri}
where rj is the feature vector of the j-th detection and Ri is the feature gallery of the i-th track, which keeps the feature vectors of the last k successful associations.
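The appearance metric can be sketched as the smallest cosine distance between a detection's feature and a track's gallery (an illustration with 2-D vectors in place of the 128-dimensional embeddings):

```python
import numpy as np

def cosine_distance(track_gallery, det_feature):
    """Smallest cosine distance between a detection feature and a track gallery.

    All features are L2-normalized first, so 1 - dot product is the
    cosine distance between unit vectors.
    """
    g = np.asarray(track_gallery, dtype=float)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    f = det_feature / np.linalg.norm(det_feature)
    return float(np.min(1.0 - g @ f))

gallery = [[1.0, 0.0], [0.0, 1.0]]          # past features of one track
d = cosine_distance(gallery, np.array([2.0, 0.0]))
# same direction as the first gallery entry -> distance 0.0
```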
s33, measuring the degree of overlap with the IOU
IOU(A,B)=area(A∩B)/area(A∪B)
where A and B are the detection box and the prediction box, area(A∩B) is the area of their intersection, and area(A∪B) is the area of their union.
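A minimal IOU computation for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

v = iou((0, 0, 2, 2), (1, 0, 3, 2))
# intersection 2, union 6 -> 1/3
```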
s4, finding the optimal matching between the detection box and the prediction box by using the Hungarian algorithm;
the Hungarian algorithm is mainly used for solving the distribution problem and finding an optimal distribution, so that the cost for completing all tasks is minimum, and the method is called as a KM algorithm. The specific calculation process is as follows:
1. for each row of the matrix, the smallest element is subtracted
2. For each column of the matrix, the smallest element is subtracted
3. Covering all 0's in the matrix with a minimum of horizontal or vertical lines
4. If the number of lines is equal to N, the optimal allocation is found and the algorithm ends, otherwise step 55 is entered, the smallest element not covered by any line is found, each row not covered by a line subtracts this element, each column covered by a line adds this element, and step 3 is returned.
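In practice this assignment is usually delegated to a library routine (e.g. SciPy's `linear_sum_assignment`); the brute-force equivalent below only illustrates what the Hungarian/KM algorithm computes on a small cost matrix:

```python
from itertools import permutations

def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    Returns the same result as the Hungarian/KM algorithm but runs in
    O(n!), so it is suitable only for illustrating tiny matrices.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assign, total = optimal_assignment(cost)
# rows 0, 1, 2 -> columns 1, 0, 2 with total cost 1 + 2 + 2 = 5
```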
s5, correcting the match through an auxiliary view;
To better mitigate the ID-switch problem caused by occlusion, several auxiliary cameras are used to correct the occluded regions.
s51, restoring the real scene of the matching box by camera calibration algorithm
Selecting key points through the visual angle of the camera and the real scene, and calculating a conversion matrix from the visual angle of the camera to the real scene, wherein the calculation process is as follows:
1. Select four marker points P1, P2, P3, P4 in the real scene and their corresponding points Q1, Q2, Q3, Q4 in the camera view, and assemble
P=[P1,P2,P3]T  Q=[Q1,Q2,Q3]T
2. Compute the weights
V=(P-1)T*P4T
R=(Q-1)T*Q4T
3. Compute the transformation matrix
T′=(QT*W*P)T
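One standard way to realize such a four-point camera-to-scene mapping is a planar homography computed via the projective-basis method, whose weight vectors play the role of V and R above. A NumPy sketch (an illustration of the technique, not necessarily the patent's exact formulation):

```python
import numpy as np

def basis_matrix(pts):
    """3x3 projective basis for four 2D points (rows of `pts`).

    Solves M v = p4 in homogeneous coordinates, then scales the columns
    of M by v; the result maps the canonical projective basis onto the
    four input points.
    """
    h = np.vstack([np.asarray(pts, float).T, np.ones(4)])  # 3x4 homogeneous
    M, p4 = h[:, :3], h[:, 3]
    v = np.linalg.solve(M, p4)      # weights (the role of V / R above)
    return M * v                    # scale column i by v[i]

def homography(src, dst):
    """Homography H with dst ~ H src for the four correspondences."""
    return basis_matrix(dst) @ np.linalg.inv(basis_matrix(src))

def apply_h(H, pt):
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# camera-view corners of a marker square and their real-scene positions
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 2), (0, 2)]
H = homography(src, dst)
# apply_h(H, (0.5, 0.5)) maps the square's center to (1.0, 1.0)
```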
s52, correcting the matching result by the auxiliary camera
The positions in each camera view are restored to the real scene through the calibration matrix, and the matching results of the main camera are reconciled with those of the auxiliary cameras.
s53, modifying and restoring the corrective result
The matching result is then revised once more according to the auxiliary cameras.
s6 Kalman Filter update
1. The measurement matrix H maps the mean vector x′ of the track into the detection space.
2. The covariance matrix P′ is mapped into the detection space and the measurement noise matrix R is added:
S=HP′HT+R
3. Compute the Kalman gain K, which weights the importance of the estimation error:
K=P′HTS-1
4. Compute the updated mean vector x and covariance matrix P, where y=z-Hx′ is the residual between the detection z and the projected mean:
x=x′+Ky
P=(I-KH)P′
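The update step can be sketched as follows, assuming the measurement z is the detected box [cx, cy, r, h] and H selects the first four state components (a common choice; the noise value is illustrative):

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, r_noise=0.1):
    """Standard Kalman update: S = H P' H^T + R, K = P' H^T S^-1,
    x = x' + K y, P = (I - K H) P'."""
    H = np.hstack([np.eye(4), np.zeros((4, 4))])   # measure [cx, cy, r, h]
    R = r_noise * np.eye(4)                        # assumed measurement noise
    y = z - H @ x_pred                             # innovation / residual
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(8) - K @ H) @ P_pred
    return x_new, P_new

x_pred = np.zeros(8)
z = np.array([1.0, 1.0, 0.5, 2.0])
x_new, P_new = kalman_update(x_pred, np.eye(8), z)
# the updated mean moves toward the measurement; velocities stay untouched
```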
It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. A multi-target tracking method based on multi-camera fusion is characterized by comprising the following steps:
s1, detecting by a detector to obtain a target detection frame;
s2, predicting the position of the target at the next moment through Kalman filtering;
s3, cascade matching and IOU matching associating the prediction box with the detection box;
s4, matching the predicted track with the current moment detection box by using a Hungarian algorithm;
s5, correcting the match through an auxiliary view;
s6, Kalman filter update.
2. The multi-target tracking method based on multi-camera fusion according to claim 1, wherein in step s5, the matched targets in the multi-target tracking are corrected and improved by a multi-view fusion method.
3. The multi-target tracking method based on multi-camera fusion according to claim 1, wherein in step s5, the multiple camera views are restored to the real scene for correction through a camera calibration technique.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110299952.8A CN113052877A (en) | 2021-03-22 | 2021-03-22 | Multi-target tracking method based on multi-camera fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113052877A true CN113052877A (en) | 2021-06-29 |
Family
ID=76513963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110299952.8A Pending CN113052877A (en) | 2021-03-22 | 2021-03-22 | Multi-target tracking method based on multi-camera fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052877A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108109162A (en) * | 2018-01-08 | 2018-06-01 | 中国石油大学(华东) | A kind of multiscale target tracking merged using self-adaptive features |
CN109816690A (en) * | 2018-12-25 | 2019-05-28 | 北京飞搜科技有限公司 | Multi-target tracking method and system based on depth characteristic |
CN109919981A (en) * | 2019-03-11 | 2019-06-21 | 南京邮电大学 | A kind of multi-object tracking method of the multiple features fusion based on Kalman filtering auxiliary |
CN110490911A (en) * | 2019-08-14 | 2019-11-22 | 西安宏规电子科技有限公司 | Multi-cam multi-target tracking method based on Non-negative Matrix Factorization under constraint condition |
CN110533687A (en) * | 2018-05-11 | 2019-12-03 | 深眸科技(深圳)有限公司 | Multiple target three-dimensional track tracking and device |
CN111163290A (en) * | 2019-11-22 | 2020-05-15 | 东南大学 | Device and method for detecting and tracking night navigation ship |
CN111914664A (en) * | 2020-07-06 | 2020-11-10 | 同济大学 | Vehicle multi-target detection and track tracking method based on re-identification |
CN112016445A (en) * | 2020-08-27 | 2020-12-01 | 重庆科技学院 | Monitoring video-based remnant detection method |
CN112070807A (en) * | 2020-11-11 | 2020-12-11 | 湖北亿咖通科技有限公司 | Multi-target tracking method and electronic device |
Non-Patent Citations (2)
Title |
---|
- Li Zhihua; Chen Yaowu: "Continuous object tracking based on multiple cameras", Journal of Electronic Measurement and Instrumentation, no. 02, 15 February 2009 (2009-02-15) *
- Ma Jingqi; Zhong Zhenyu; Lei Huan; Wu Liangsheng: "Object tracking method fusing structured human-body features with the kernelized correlation filter algorithm", Journal of Computer Applications, no. 1, 10 July 2020 (2020-07-10) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114049477A (en) * | 2021-11-16 | 2022-02-15 | 中国水利水电科学研究院 | Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type |
CN114049477B (en) * | 2021-11-16 | 2023-04-07 | 中国水利水电科学研究院 | Fish passing fishway system and dynamic identification and tracking method for fish quantity and fish type |
CN115601402A (en) * | 2022-12-12 | 2023-01-13 | 知行汽车科技(苏州)有限公司(Cn) | Target post-processing method, device and equipment for cylindrical image detection frame and storage medium |
CN115601402B (en) * | 2022-12-12 | 2023-03-28 | 知行汽车科技(苏州)股份有限公司 | Target post-processing method, device and equipment for cylindrical image detection frame and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113269098B (en) | Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle | |
Nabati et al. | Rrpn: Radar region proposal network for object detection in autonomous vehicles | |
Wolf et al. | Robust vision-based localization for mobile robots using an image retrieval system based on invariant features | |
Kitt et al. | Monocular visual odometry using a planar road model to solve scale ambiguity | |
CN108573496B (en) | Multi-target tracking method based on LSTM network and deep reinforcement learning | |
CN111488795A (en) | Real-time pedestrian tracking method applied to unmanned vehicle | |
CN102609953A (en) | Multi-object appearance-enhanced fusion of camera and range sensor data | |
CN110782494A (en) | Visual SLAM method based on point-line fusion | |
CN113052877A (en) | Multi-target tracking method based on multi-camera fusion | |
CN112052802B (en) | Machine vision-based front vehicle behavior recognition method | |
CN112991391A (en) | Vehicle detection and tracking method based on radar signal and vision fusion | |
Wolf et al. | Using an image retrieval system for vision-based mobile robot localization | |
CN114088081A (en) | Map construction method for accurate positioning based on multi-segment joint optimization | |
Cho et al. | Distance-based camera network topology inference for person re-identification | |
CN114332158A (en) | 3D real-time multi-target tracking method based on camera and laser radar fusion | |
Sharma | Feature-based efficient vehicle tracking for a traffic surveillance system | |
CN113689502B (en) | Multi-information fusion obstacle measurement method | |
CN114581678A (en) | Automatic tracking and re-identifying method for template feature matching | |
Kang et al. | Robust visual tracking framework in the presence of blurring by arbitrating appearance-and feature-based detection | |
Walia et al. | A novel approach of multi-stage tracking for precise localization of target in video sequences | |
Zhang et al. | Target tracking for mobile robot platforms via object matching and background anti-matching | |
Chenchen et al. | A camera calibration method for obstacle distance measurement based on monocular vision | |
CN115761693A (en) | Method for detecting vehicle location mark points and tracking and positioning vehicles based on panoramic image | |
CN115144828B (en) | Automatic online calibration method for intelligent automobile multi-sensor space-time fusion | |
CN107610154B (en) | Spatial histogram representation and tracking method of multi-source target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||