CN114638855A - Multi-target tracking method, equipment and medium - Google Patents

Multi-target tracking method, equipment and medium Download PDF

Info

Publication number
CN114638855A
CN114638855A
Authority
CN
China
Prior art keywords
target
track
unmatched
tracking
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210072211.0A
Other languages
Chinese (zh)
Inventor
姚诚达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Huichuang Information Technology Co ltd
Original Assignee
Shandong Huichuang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Huichuang Information Technology Co ltd filed Critical Shandong Huichuang Information Technology Co ltd
Priority to CN202210072211.0A priority Critical patent/CN114638855A/en
Publication of CN114638855A publication Critical patent/CN114638855A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

The embodiment of the application discloses a multi-target tracking method, device and medium. The method includes: obtaining the feature association degree between the image corresponding to a detection target and the image corresponding to a track target, and establishing a first association cost matrix based on the feature association degree; obtaining the coordinate position information of the current frame's tracking-track prediction frame, and establishing a second association cost matrix based on the coordinate position information of the tracking-track prediction frame and the coordinate position information of the detection frame; performing cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices to obtain a cascade matching set, unmatched track targets and unmatched detection targets; and performing IOU matching on the unmatched track targets and unmatched detection targets to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set. The method improves the association accuracy between detection targets and tracking targets.

Description

Multi-target tracking method, equipment and medium
Technical Field
The present application relates to the field of target detection technologies, and in particular, to a multi-target tracking method, device, and medium.
Background
Multi-target tracking is an important research topic in the field of automated driving. Target tracking builds on target detection: it continuously estimates the motion state of each target and compensates for the unstable detection performance of a single sensor. It also provides target detection information over consecutive time steps to the decision, control and execution modules of an intelligent vehicle; through accurate and effective tracking, an autonomous vehicle can obtain the speed of the vehicles in its field of view and make corresponding motion plans.
Current tracking algorithms rely only on the appearance information of an image or on the three-dimensional position information of a lidar and track targets independently, so the tracking precision is low. Moreover, because a vehicle encounters scenarios in which the target vehicle is occluded during driving, the ID of the tracked target is frequently switched, so the association accuracy between detection targets and tracking targets is low.
Disclosure of Invention
The embodiment of the application provides a multi-target tracking method, device and medium, which are used to solve the following technical problem: during driving, a vehicle encounters scenarios in which the target vehicle is occluded, so the ID of the tracked target is frequently switched and the association accuracy between detection targets and tracking targets is low.
The embodiment of the application adopts the following technical scheme:
the embodiment of the application provides a multi-target tracking method, which includes: inputting the image corresponding to the current frame's detection target into a preset deep feature descriptor to obtain the feature association degree between the image corresponding to the detection target and the image corresponding to a track target, and establishing a first association cost matrix based on the feature association degree; obtaining the coordinate position information of the current frame's tracking-track prediction frame according to the coordinate position information of the previous frame's tracking-track frame and a Kalman filter, and establishing a second association cost matrix based on the coordinate position information of the tracking-track prediction frame and the coordinate position information of the detection frame; performing cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices to obtain a cascade matching set, unmatched track targets and unmatched detection targets; and performing IOU matching on the unmatched track targets and unmatched detection targets to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
By taking multi-modal information as input and using both the appearance features of the image and the three-dimensional position information of the point cloud, the embodiment of the application improves the matching accuracy between detection targets and tracking targets. For targets occluded over a long time, the cosine distance based on appearance features completes the association of the target's motion state; when vehicles with similar appearance are encountered, the Mahalanobis distance based on three-dimensional motion-state information completes the association instead. Combining the two metrics effectively avoids false detections, missed detections and similar problems when one of them fails, and effectively reduces the frequent ID switching of tracked targets caused by occlusion. In addition, through a two-stage matching algorithm, the embodiment of the application improves the matching efficiency between the detection target sequence and the tracking target sequence as much as possible and reduces missed matches.
In an implementation of the present application, performing cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices to obtain a cascade matching set, unmatched track targets and unmatched detection targets specifically includes: initializing the plurality of detection targets as an unmatched target set and the plurality of track targets as a track set; and performing cascade matching based on the first and second association cost matrices corresponding to the unmatched target set and the track set, together with a preset maximum number of lost frames, to obtain an initial matching set, unmatched track targets and unmatched detection targets.
In an implementation of the present application, performing cascade matching based on the first and second association cost matrices of the unmatched target set and the track set, together with the preset maximum number of lost frames, to obtain an initial matching set, unmatched track targets and unmatched detection targets, specifically includes: determining the first sub-track set in the track set whose number of lost frames is 0, and taking the first sub-track set and the first and second association cost matrices of the unmatched target set as the input of the Hungarian algorithm to obtain a first matched set and a first unmatched target set; determining the second sub-track set whose number of lost frames is K, and taking the second sub-track set and the first and second association cost matrices of the first unmatched target set as the input of the Hungarian algorithm to obtain a second matched set and a second unmatched target set; gradually increasing K until it reaches the preset maximum number of lost frames, and determining the unmatched track targets and unmatched detection targets corresponding to the preset maximum number of lost frames; and taking the matched sets corresponding to the different numbers of lost frames as the initial matching set.
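The cascade matching procedure above can be sketched as follows. This is a minimal illustrative sketch rather than the patented implementation: the combined cost matrix is assumed to be precomputed (for example, a weighted sum of the first and second association cost matrices), the gate threshold of 0.7 is a hypothetical value, and `scipy.optimize.linear_sum_assignment` stands in for the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cascade_match(cost, track_ages, max_age, gate=0.7):
    """Cascade matching: tracks with fewer lost frames are matched first.

    cost:       (num_tracks, num_detections) combined association cost matrix
    track_ages: lost-frame count per track (0 = matched in the previous frame)
    max_age:    preset maximum number of lost frames
    gate:       hypothetical cost gate; matches above it are rejected
    """
    num_tracks, num_dets = cost.shape
    unmatched_dets = list(range(num_dets))
    matches = []
    for age in range(max_age + 1):          # K = 0, 1, ..., max_age
        if not unmatched_dets:
            break
        track_idx = [t for t in range(num_tracks) if track_ages[t] == age]
        if not track_idx:
            continue
        # Hungarian assignment on the sub-matrix for this lost-frame count
        sub = cost[np.ix_(track_idx, unmatched_dets)]
        rows, cols = linear_sum_assignment(sub)
        for r, c in zip(rows, cols):
            if sub[r, c] <= gate:
                matches.append((track_idx[r], unmatched_dets[c]))
        matched_dets = {d for _, d in matches}
        unmatched_dets = [d for d in unmatched_dets if d not in matched_dets]
    matched_tracks = {t for t, _ in matches}
    unmatched_tracks = [t for t in range(num_tracks) if t not in matched_tracks]
    return matches, unmatched_tracks, unmatched_dets
```

Tracks lost for fewer frames are given priority, which is what keeps recently seen targets from being stolen by long-lost tracks with coincidentally low cost.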
In an implementation of the present application, before performing cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices, the method further includes: obtaining the number of times each track target has been detected, and classifying a track target as a confirmed track target when this number is greater than a preset threshold, so that cascade matching is performed on the confirmed track targets; and obtaining the unmatched track targets so that IOU matching can be performed on them.
In an implementation of the present application, performing IOU matching based on the unmatched track targets and unmatched detection targets to obtain an IOU matching set specifically includes: establishing an IOU association matrix based on the unmatched track targets and unmatched detection targets; and taking the IOU association matrix as the input of the Hungarian algorithm to obtain the matching relationship between the unmatched track targets and the unmatched detection targets.
In the present application, the cost association matrix between detection targets and track targets is obtained through the Mahalanobis distance and the cosine distance; the cost association matrix is used as the input of the Hungarian algorithm, whose output is the optimal matching relationship between detection targets and track targets, thereby completing target matching between consecutive frames. Through the two-stage matching algorithm, the embodiment of the application improves the matching efficiency between the detection target sequence and the tracking target sequence as much as possible and reduces missed matches.
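The second matching stage described above can be sketched as below, again with `scipy`'s Hungarian solver standing in for the Hungarian algorithm. The box format `(x1, y1, x2, y2)` and the minimum-IOU gate of 0.3 are assumptions for illustration, not values stated in the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def iou_match(track_boxes, det_boxes, min_iou=0.3):
    """Build the IOU association matrix (as cost = 1 - IOU) and solve it
    with the Hungarian algorithm; matches below min_iou are rejected."""
    cost = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - min_iou]
```

Using `1 - IOU` as the cost turns the maximum-overlap problem into the minimum-cost assignment problem that the Hungarian algorithm solves.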
In an implementation of the present application, obtaining the coordinate position information of the current frame's tracking-track prediction frame according to the coordinate position information of the previous frame's tracking-track frame and a Kalman filter specifically includes: obtaining the predicted state vector corresponding to the current frame's tracking track through the prediction model in the Kalman filter and the state vector corresponding to the previous frame's tracking track; obtaining the measurement noise corresponding to the tracking track through a preset sensor; and updating the predicted state vector through the measurement model in the Kalman filter and the measurement noise to obtain the final state vector corresponding to the current frame's tracking track.
In an implementation of the present application, establishing the second association cost matrix based on the coordinate position information of the prediction frame and the coordinate position information of the detection frame specifically includes: obtaining the coordinate position information of the detection frame corresponding to the current frame's detection target from the three-dimensional point cloud information; and establishing the second association cost matrix based on the Mahalanobis distance between the coordinate position information of the detection frame corresponding to the current frame's detection target and the coordinate position information of the prediction frame corresponding to the current frame image's tracking track.
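A minimal sketch of how the second association cost matrix might be built from squared Mahalanobis distances between Kalman-predicted positions and point-cloud detections. The vector layout of a "position" and all function names are illustrative assumptions; the covariance of each prediction is taken from the corresponding Kalman filter.

```python
import numpy as np


def mahalanobis_cost(predictions, covariances, detections):
    """Second association cost matrix: entry (i, j) is the squared
    Mahalanobis distance between the i-th predicted position (with its
    covariance) and the j-th detected position."""
    cost = np.zeros((len(predictions), len(detections)))
    for i, (mean, cov) in enumerate(zip(predictions, covariances)):
        cov_inv = np.linalg.inv(cov)
        for j, z in enumerate(detections):
            d = np.asarray(z, dtype=float) - mean
            cost[i, j] = float(d @ cov_inv @ d)  # d^T S^-1 d
    return cost
```

Unlike the Euclidean distance, this metric discounts errors along directions where the filter is already uncertain, which is why it is used for associating three-dimensional motion states.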
In an implementation of the present application, before inputting the image corresponding to the current frame's detection target into the preset deep feature descriptor, the method further includes: establishing a corresponding tracking target manager for each track target; and recording track information for the different track targets through the tracking target manager, the track information including at least one of the motion state information, the feature information, the state information and the state parameter information of the track target.
The embodiment of the application provides a multi-target tracking device, including: at least one processor; and
a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to: input the image corresponding to the current frame's detection target into a preset deep feature descriptor to obtain the feature association degree between the image corresponding to the detection target and the image corresponding to a track target, and establish a first association cost matrix based on the feature association degree; obtain the coordinate position information of the current frame's tracking-track prediction frame according to the coordinate position information of the previous frame's tracking-track frame and a Kalman filter, and establish a second association cost matrix based on the coordinate position information of the tracking-track prediction frame and the coordinate position information of the detection frame; perform cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices to obtain a cascade matching set, unmatched track targets and unmatched detection targets; and perform IOU matching on the unmatched track targets and unmatched detection targets to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
A non-volatile computer storage medium provided in an embodiment of the present application stores computer-executable instructions configured to: input the image corresponding to the current frame's detection target into a preset deep feature descriptor to obtain the feature association degree between the image corresponding to the detection target and the image corresponding to a track target, and establish a first association cost matrix based on the feature association degree; obtain the coordinate position information of the current frame's tracking-track prediction frame according to the coordinate position information of the previous frame's tracking-track frame and a Kalman filter, and establish a second association cost matrix based on the coordinate position information of the tracking-track prediction frame and the coordinate position information of the detection frame; perform cascade matching between the detection targets and the state-confirmed track targets based on the first and second association cost matrices to obtain a cascade matching set, unmatched track targets and unmatched detection targets; and perform IOU matching on the unmatched track targets and unmatched detection targets to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
The embodiment of the application adopts at least one of the above technical solutions, which can achieve the following beneficial effects: by taking multi-modal information as input and using both the appearance features of the image and the three-dimensional position information of the point cloud, the matching accuracy between detection targets and tracking targets is improved. For targets occluded over a long time, the cosine distance based on appearance features completes the association of the target's motion state; when vehicles with similar appearance are encountered, the Mahalanobis distance based on three-dimensional motion-state information completes the association instead. Combining the two metrics effectively avoids false detections and missed detections when one of them fails, and effectively reduces the frequent ID switching of tracked targets caused by occlusion. In addition, through a two-stage matching algorithm, the matching efficiency between the detection target sequence and the tracking target sequence is improved as much as possible and missed matches are reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the following drawings show only some embodiments described in the present application, and those skilled in the art can derive other drawings from them without creative effort. In the attached figures:
fig. 1 is a flowchart of a multi-target tracking method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of a multi-target tracking algorithm flow provided by an embodiment of the present application;
fig. 3 is a diagram of a track state transition relationship provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a constant velocity model provided by an embodiment of the present application;
fig. 5 is a block diagram of a kalman filtering algorithm according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of target tracking at different times according to an embodiment of the present application;
fig. 7 is a schematic diagram of point cloud tracking at different time points according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a calculated tracking target trajectory and a calculated heading angle according to an embodiment of the present application;
FIG. 9 is a tracking effect diagram in a daytime scene according to an embodiment of the present application;
fig. 10 is a tracking effect diagram in a night scene according to an embodiment of the present application;
fig. 11 is a road testing diagram provided in an embodiment of the present application;
FIG. 12 is a tracking trajectory and course angle diagram corresponding to a road test under a daytime scene according to an embodiment of the present application;
fig. 13 is a tracking trajectory and course angle diagram corresponding to a road test in an night scene according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a multi-target tracking device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a multi-target tracking method, equipment and medium.
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only a part of the embodiments of the present application, not all of them. All other embodiments derived by a person skilled in the art from the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-target tracking method according to an embodiment of the present application. As shown in fig. 1, the multi-target tracking method includes the following steps:
s101, the multi-target tracking equipment inputs an image corresponding to a current frame detection target into a preset depth feature descriptor to obtain a feature association degree between the image corresponding to the detection target and an image corresponding to a track target, and a first association cost matrix is established based on the feature association degree.
In one embodiment of the present application, a corresponding tracking target manager is established for each track target. Track information is recorded for the different track targets through the tracking target manager. The track information includes at least one of the motion state information, feature information, state information and state parameter information of the track target.
Specifically, each track carries abundant information, so corresponding data structures need to be established to organize and store the different tracks, so that they can be matched against detection targets and the tracking results can be output.
Further, the tracking target manager mainly has the following functions:
(1) Recording the motion state of the track: the prediction and update of the track's motion state are realized through the Kalman filtering algorithm, and the similarity between the track's motion state and that of the target detection sequence is calculated to form a motion-information association matrix.
(2) Storing the features of the track: at most 100 appearance features are stored for each track and are used for similarity calculation against the features of detection targets to form an appearance-information association matrix. Keeping 100 appearance features helps solve the frequent ID switching caused by occlusion.
(3) Saving the state information of the track: a track has three possible states, namely Confirmed, Tentative and Deleted. Tracks in different states are subject to different processing.
(4) Storing the parameter information that determines the track state. Hits: the number of times the track has been matched, i.e. successfully tracked; if it is greater than 3, the track is regarded as a confirmed track, which avoids false detections to a certain extent. Age: records how many frames have elapsed since the track was initialized, incremented by one each frame regardless of whether the track is matched, i.e. the length of the track. Time_Since_Update: records how many consecutive frames the track has not been matched; if it exceeds 30 frames, the track is marked as Deleted. This parameter handles targets occluded for up to 30 frames, so that a target reappearing after occlusion is not marked as a new track.
Fig. 3 is a track state transition diagram according to an embodiment of the present disclosure. As shown in fig. 3, the matching module outputs unmatched detection targets, matched pairs and unmatched tracks. An unmatched detection target is initialized as a new track whose state is defined as Tentative, and is placed in the track library. For each matched pair, the corresponding Hits count is accumulated; if it is not less than 3, the track is defined as Confirmed and put into the track library. For an unmatched track, it is first checked whether its state is Confirmed: if so, it is put into the track library; if not, it is checked whether the number of frames for which the track has not been matched is greater than 30, in which case the track is deleted, and otherwise it is put into the track library.
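The state-transition rules above can be condensed into a small state machine. This is an illustrative sketch: the class and method names are assumptions, while the thresholds (3 hits to confirm, 30 missed frames to delete) are taken from the text.

```python
class Track:
    """Minimal track state machine following the rules in the text."""

    TENTATIVE = "Tentative"
    CONFIRMED = "Confirmed"
    DELETED = "Deleted"

    def __init__(self):
        self.state = Track.TENTATIVE
        self.hits = 0                # times successfully matched
        self.age = 0                 # frames since initialization
        self.time_since_update = 0   # consecutive unmatched frames

    def predict(self):
        """Called once per frame, before matching."""
        self.age += 1
        self.time_since_update += 1

    def mark_matched(self):
        """Called when the track is matched to a detection this frame."""
        self.hits += 1
        self.time_since_update = 0
        if self.hits >= 3:           # Hits threshold from the text
            self.state = Track.CONFIRMED

    def mark_missed(self, max_age=30):
        """Called when the track went unmatched this frame."""
        if self.time_since_update > max_age:  # 30-frame rule from the text
            self.state = Track.DELETED
```

A tracking-target manager would hold one such object per track, alongside the Kalman state and the gallery of stored appearance features.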
In one embodiment of the present application, a small deep learning network, called a deep feature descriptor, is trained in advance specifically for learning vehicle features. The network structure of the deep feature descriptor is shown in Table 1.
Name                          Patch Size/Stride   Output Size
Conv 1                        3×3/1               32×128×64
Conv 2                        3×3/1               32×128×64
Max Pool 3                    3×3/2               32×64×32
Residual 4                    3×3/1               32×64×32
Residual 5                    3×3/1               32×64×32
Residual 6                    3×3/2               64×32×16
Residual 7                    3×3/1               64×32×16
Residual 8                    3×3/2               128×16×8
Residual 9                    3×3/1               128×16×8
Dense 10                                          128
Batch and L2 normalization                        128
TABLE 1
As shown in Table 1, the final output of the deep feature descriptor is a 128-dimensional feature of the target. The image patch corresponding to each target in the detection sequence is input into the deep feature descriptor to obtain its 128-dimensional feature, which is compared via cosine distance with the features stored in the track target sequence to judge the degree of association between them.
In one embodiment of the application, the similarity between two feature vectors is evaluated by the cosine distance. The first association cost matrix is established according to this distance metric, i.e. an association matrix of appearance feature vectors between the detection target sequence and the tracking target sequence.
Specifically, assume that the detection target sequence is D = {D_0, D_1, D_2, ..., D_i} and the tracking target sequence is T = {T_0, T_1, T_2, ..., T_j}. A cosine-distance association matrix is established whose entry for detection D_m and track T_n is
cost(m, n) = min_k ( 1 - f(D_m) · f_k(T_n) / ( ||f(D_m)|| ||f_k(T_n)|| ) )
where f(D_m) is the appearance feature of D_m and f_k(T_n) is the k-th feature stored for track T_n.
At most 100 feature vectors are saved for each tracked target; therefore, for each detected target, the cosine distance to each of the (at most 100) features saved in a given track is calculated, and the minimum value is taken as the entry in the association matrix.
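A minimal sketch of the appearance association matrix just described: for each detection, take the minimum cosine distance over the features stored in each track's gallery. Function and variable names are assumptions for illustration.

```python
import numpy as np


def cosine_cost_matrix(det_features, track_galleries):
    """First association cost matrix.

    det_features:    list of 128-d appearance features, one per detection
    track_galleries: per track, the list of stored features (up to 100)
    Entry (i, j) is the minimum cosine distance between detection i and
    any feature stored for track j.
    """
    det = np.asarray(det_features, dtype=float)
    det /= np.linalg.norm(det, axis=1, keepdims=True)
    cost = np.zeros((len(det), len(track_galleries)))
    for j, gallery in enumerate(track_galleries):
        g = np.asarray(gallery, dtype=float)
        g /= np.linalg.norm(g, axis=1, keepdims=True)
        # cosine distance = 1 - cosine similarity; keep the best match
        cost[:, j] = (1.0 - det @ g.T).min(axis=1)
    return cost
```

Taking the minimum over the gallery is what makes the metric robust to occlusion: a target that reappears only needs to resemble one of its previously stored views.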
S102: the multi-target tracking device obtains the coordinate position information of the current frame's tracking-track prediction frame according to the coordinate position information of the previous frame's tracking-track frame and a Kalman filter, and establishes a second association cost matrix based on the coordinate position information of the tracking-track prediction frame and the coordinate position information of the detection frame.
In an embodiment of the present application, the predicted state vector corresponding to the current frame's tracking track is obtained through the prediction model in the Kalman filter and the state vector corresponding to the previous frame's tracking track. The measurement noise corresponding to the tracking track is obtained through a preset sensor. The predicted state vector is updated through the measurement model in the Kalman filter and the measurement noise to obtain the final state vector corresponding to the current frame's tracking track.
Specifically, the embodiment of the application realizes estimation of the motion state through the Kalman filtering algorithm, which solves the problem of discrete-data linear filtering by a recursive method. The Kalman filter is described by a series of recursive formulas and is an efficient, computable state-estimation method that uses the minimum mean square error as the criterion for the best estimate.
Specifically, Kalman filtering has two main models: a prediction model and a measurement model. The prediction model is typically an empirical or physical formula whose function is to predict the state at time k+1 from the state at time k. The measurement model describes the relation by which parameters directly measured by the sensor are converted into some or all of the parameters in the state vector. The state vector obtained by the prediction model is updated through the measurement model to obtain the optimal state estimate. When both the prediction model and the measurement model are linear, the standard Kalman filtering algorithm can be selected. Standard Kalman filtering is divided into two steps, prediction and update:
the calculation formula of the prediction step is as follows:
x_k = F x_{k-1} + B u_k
P_k = F P_{k-1} F^T + Q
where x_k represents the state vector to be estimated, an n-dimensional vector; F is the n×n prediction matrix; B is the control matrix and u_k the control vector; P_k is the covariance matrix and Q the prediction noise. In this step, the optimal state estimate at the previous time is known, and the state value at the current time is predicted by the prediction model. The covariance is also predicted and updated; the covariance describes the error of the prediction process and reflects the credibility of the predicted state values.
The update step has the following formula:
G_k = P_k H^T (H P_k H^T + R)^{-1}
x_k = x_k + G_k (z_k - H x_k)
P_k = (I - G_k H) P_k
Specifically, the first step is to calculate the Kalman gain G_k. The principle is to compare whether the prediction error or the sensor measurement error is larger, i.e., to compare the magnitudes of the covariances: for the prediction this is P_k, and for the measurement it is the sensor measurement covariance R. H is the measurement matrix. The second step updates the predicted value with the measured value to obtain the optimal state estimate; the result is a linear weighting of the predicted value and the sensor measurement. The third step updates the covariance matrix P_k, computing the error of the prediction process during state estimation, which feeds into the Kalman gain of the next estimate: if the prediction error of the next cycle is large, the weight of the predicted value in the next cycle is correspondingly reduced, giving the measured value a greater say.
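The predict/update equations above can be illustrated with a one-dimensional (scalar) Kalman filter sketch. The default noise values Q, R and the scalar reduction are illustrative assumptions for clarity, not parameters from the embodiment:

```python
def kf_predict(x, P, F=1.0, B=0.0, u=0.0, Q=0.01):
    # Prediction step: x_k = F x_{k-1} + B u,  P_k = F P_{k-1} F + Q
    x = F * x + B * u
    P = F * P * F + Q
    return x, P

def kf_update(x, P, z, H=1.0, R=0.1):
    # Update step: the gain G weighs the prediction covariance P
    # against the measurement covariance R, then corrects the
    # prediction with the measurement z.
    G = P * H / (H * P * H + R)   # Kalman gain
    x = x + G * (z - H * x)       # blend prediction and measurement
    P = (1.0 - G * H) * P         # shrink covariance after the update
    return x, P
```

With a large prediction covariance and a small measurement noise, the updated state moves most of the way toward the measurement, which is exactly the weighting behavior described above.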
In an embodiment of the application, the coordinate position information of the detection frame corresponding to the current frame's detection target is obtained from three-dimensional point cloud information. A second associated cost matrix is established based on the Mahalanobis distance between the coordinate position information of the detection frame corresponding to the current frame's detection target and the coordinate position information of the prediction frame corresponding to the current frame's tracking trajectory.
Fig. 4 is a schematic diagram of a constant velocity model according to an embodiment of the present application. The formula and parametric meaning of kalman filtering is further explained based on a constant velocity model and parameter settings as shown in fig. 4.
A prediction step:
The state vector of the embodiment of the application is x_k = (p_x, p_y, v_x, v_y). Using the constant velocity (CV) model of the vehicle, as shown in Fig. 4, and assuming that the vehicle travels on the road at a constant velocity v, the following formulas can be obtained:
s_t = s_{t-1} + v·Δt
v_t = v_{t-1}
Decomposing the formula in the X and Y directions gives:

p_{x,t} = p_{x,t-1} + v_x·Δt
p_{y,t} = p_{y,t-1} + v_y·Δt
v_{x,t} = v_{x,t-1}
v_{y,t} = v_{y,t-1}
The matrix form is:

[p_x]         [1 0 Δt 0 ] [p_x]
[p_y]       = [0 1 0  Δt] [p_y]
[v_x]         [0 0 1  0 ] [v_x]
[v_y]_{k+1}   [0 0 0  1 ] [v_y]_k

corresponding to the formula x_{k+1} = F x_k. Thus, in this model, the prediction matrix F is the 4×4 matrix above.
However, in actual driving the vehicle does not always keep a constant speed; it accelerates and decelerates. Taking the degree of accelerator depression in the model as the control variable, with acceleration input u_k = (a_x, a_y), the control matrix B can be written as:

B = [Δt²/2  0
     0      Δt²/2
     Δt     0
     0      Δt]
The second step of the prediction is the update of the covariance. P_k is the covariance matrix of the state-vector probability distribution. In practice the vehicle may, for example, be disturbed by wind and fluctuate, i.e., noise is introduced; Q is the covariance matrix of the prediction noise, and the prediction process is compensated by introducing Q. In this model Q is a 4×4 matrix that is empirically initialized.
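A minimal sketch of the constant-velocity matrices described above, in plain Python. The helper names are hypothetical, and the form of B assumes an acceleration input (a_x, a_y), a common textbook choice rather than a value stated in the patent:

```python
def cv_matrices(dt):
    # State x = (p_x, p_y, v_x, v_y). F propagates position by velocity
    # over one step; B maps an acceleration input (a_x, a_y) into
    # position (0.5*a*dt^2) and velocity (a*dt) increments.
    F = [[1.0, 0.0, dt,  0.0],
         [0.0, 1.0, 0.0, dt ],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    B = [[0.5 * dt * dt, 0.0],
         [0.0, 0.5 * dt * dt],
         [dt, 0.0],
         [0.0, dt]]
    return F, B

def mat_vec(M, v):
    # Plain matrix-vector product for the small 4x4 case.
    return [sum(m * x for m, x in zip(row, v)) for row in M]
```

Applying F to a state with nonzero velocity advances the position by v·Δt while leaving the velocity unchanged, matching the decomposed CV equations.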
an updating step:
In this step, the above predicted values are updated by the sensor measurements; in this model the measurement yields (p_x, p_y). The measurement matrix H follows from the relation between the state vector and the observation:

z_k = H x_k

H = [1 0 0 0
     0 1 0 0]
R is the measurement noise of the sensor, here the lidar; R is a 2×2 matrix that is empirically initialized.
As can be seen, both the prediction model and the measurement model are linear, thus meeting the requirements of standard Kalman filtering.
Fig. 5 is a block diagram of the Kalman filtering algorithm flow provided in an embodiment of the present application. As shown in fig. 5, Kalman filtering has two main models, a prediction model and a measurement model. First, Q, P and R are initialized, the predicted value is calculated according to the prediction model, and the covariance is predicted. Second, the Kalman gain is calculated, the predicted value is updated according to the measured value, and the covariance is updated. Kalman filtering is realized through these steps.
In one embodiment of the present application, the Mahalanobis distance represents the covariance distance between state vectors and is an efficient way to calculate the similarity between two vectors. Its principle is to rotate the state vector along the directions of the eigenvectors so that the dimensions become independent, and then to normalize so that every dimension has the same scale. The Mahalanobis distance of a single data point is:

D_M(x) = sqrt((x - μ)^T Σ^{-1} (x - μ))
The Mahalanobis distance between data points x and y is calculated as follows, where Σ is the covariance matrix of the multidimensional random variable and μ is the mean of the variable:

D_M(x, y) = sqrt((x - y)^T Σ^{-1} (x - y))
The Mahalanobis distance is applied to calculate the motion correlation between the three-dimensional detection frame and the three-dimensional tracking prediction frame; the calculation formula is:

d(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)
where d_j and y_i respectively denote the three-dimensional position of the j-th detection frame and of the i-th tracking prediction frame, and S_i denotes the covariance matrix between the two. An association matrix of the Mahalanobis distance measure can then be obtained:

M_mah = [d(i, j)]
In the embodiment of the application, the Mahalanobis distance is thresholded using the 95% confidence interval corresponding to the inverse χ² distribution. The measurement state is set to (x, y, γ, h), which has four degrees of freedom, and the corresponding χ² distribution table threshold is t = 9.4877. If the Mahalanobis distance is greater than the threshold, the motion states of the detected target and the tracked target are considered unrelated, and the corresponding distance is set to 100000, treated as an infinite value.
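The χ² gating step can be sketched as follows. For simplicity the sketch assumes a diagonal covariance S_i (independent per-dimension variances), whereas the embodiment uses the full covariance matrix; the constant names are illustrative:

```python
GATE = 9.4877       # chi-square 95% threshold for 4 degrees of freedom
INF_COST = 100000   # cost assigned to gated-out (incompatible) pairs

def mahalanobis_sq(det, pred, var):
    # Squared Mahalanobis distance under a diagonal covariance:
    # sum over dimensions of (difference^2 / variance). The full form
    # would use the inverse of the covariance matrix S_i instead.
    return sum((d - p) ** 2 / s for d, p, s in zip(det, pred, var))

def gated_cost(det, pred, var):
    # Keep the distance when it passes the chi-square gate,
    # otherwise mark the pair as effectively infinite cost.
    d2 = mahalanobis_sq(det, pred, var)
    return d2 if d2 <= GATE else INF_COST
```

A pair whose squared distance exceeds the 9.4877 gate is forced to the sentinel cost, so the assignment step can never select it.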
The embodiment of the application has the advantage that the cosine distance of the appearance features is very effective for long-term occlusion, while the Mahalanobis distance based on three-dimensional motion-state information has a great advantage when vehicles with similar appearance are encountered. Combining the two metrics effectively avoids problems such as false detection and missed detection when one condition fails, and greatly improves matching accuracy.
S103, the multi-target tracking device carries out cascade matching on the detection target and the track target determined by the state based on the first correlation cost matrix and the second correlation cost matrix to obtain a cascade matching set, an unmatched track target and an unmatched detection target.
In an embodiment of the application, the number of times that a track target is detected is obtained, and when the number of times is greater than a preset number threshold, the track target is divided into determined track targets so as to perform cascade matching on the determined track targets.
In particular, the most important part of the tracking task is solving the matching problem between the previous and subsequent frames of the same target, that is, realizing target association between the detection sequence and the tracking sequence. A target track is still preserved if its number of lost frames is within 30. However, if a track target is lost for a long time, the uncertainty of the track during prediction increases, i.e., the covariance grows; since the Mahalanobis distance uses the inverse of the covariance, the Mahalanobis distance shrinks, so a detected target may be matched with a long-lost track, causing mismatching. To solve this problem, a cascade matching algorithm is introduced: the more recently a track was matched, the higher its priority in the next round of matching. Generally speaking, continuously matched tracks participate in the matching task first, then tracks that have missed one frame, with the number of missed frames increasing one by one until tracks missing 30 frames participate, at which point the cascade matching ends.
In one embodiment of the present application, a plurality of detected targets are initialized as a set of unmatched targets, and a plurality of trajectory targets are initialized as a set of trajectories. And performing cascade matching based on the first associated cost matrix and the second associated cost matrix of the unmatched target set, the track set and the preset maximum loss frame number to obtain an initial matching set, an unmatched track target and an unmatched detection target.
Specifically, a first sub-track set with the number of lost frames being 0 in the track set is determined, and the first sub-track set, a first associated cost matrix of the unmatched target set and a second associated cost matrix are used as input of the Hungarian algorithm to obtain a first matched set and a first unmatched target set. And determining a second sub-track set with the number of lost frames being K in the track set, and taking the second sub-track set, a first incidence cost matrix and a second incidence cost matrix of the first unmatched target set as the input of the Hungarian algorithm to obtain a second matched set and a second unmatched target set. And gradually increasing the value K until the value K reaches the preset maximum loss frame number, and determining an unmatched track target and an unmatched detection target corresponding to the preset maximum loss frame number. And taking the matching sets respectively corresponding to different lost frame numbers as initial matching sets.
Specifically, the steps of cascade matching are as follows:
(1) The inputs of the cascade matching algorithm are the tracking target sequence T = {1, ..., N}, the detection target sequence D = {1, ..., M} and the maximum allowable number of lost frames A_max, here 30.
(2) A first correlation cost matrix based on cosine distances between two sequences is calculated.
(3) A second correlation cost matrix between the two sequences based on mahalanobis distance is calculated.
(4) The matching set M is initialized as empty, and the detection target sequence D is initialized as the unmatched target set U.
(5) The number of lost frames is traversed from 0 to A_max. The sub-track set whose number of lost frames is 0 is selected from the track set and input, together with the associated cost matrices of the unmatched targets, into the Hungarian algorithm.
(6) A matching set and unmatched targets are obtained. The track set whose number of lost frames is increased by one is selected together with this round's unmatched targets and the operation is repeated, until the number of lost frames reaches 30 and the cascade matching ends.
(7) The matching sets obtained in each round are combined to obtain the final matching set M, together with the detection targets left unmatched in the last round.
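The seven steps above can be sketched as follows. A simple greedy assignment stands in for the Hungarian algorithm, and the track and detection representations (id plus lost-frame count, and an abstract cost function) are illustrative assumptions:

```python
def greedy_assign(cost, tracks, dets, max_cost=1000.0):
    # Stand-in for the Hungarian algorithm: repeatedly take the
    # cheapest remaining (track, detection) pair. Illustrative only;
    # the patent uses the Hungarian algorithm for optimal assignment.
    pairs, used_t, used_d = [], set(), set()
    cells = sorted((cost[t][d], t, d)
                   for t in range(len(tracks))
                   for d in range(len(dets)))
    for c, t, d in cells:
        if c < max_cost and t not in used_t and d not in used_d:
            pairs.append((tracks[t], dets[d]))
            used_t.add(t)
            used_d.add(d)
    return pairs

def cascade_match(tracks, detections, cost_fn, max_age=30):
    # tracks: list of (track_id, frames_lost); detections: list of ids.
    # Freshest tracks (fewest lost frames) get matching priority.
    matches, unmatched_dets = [], list(detections)
    for age in range(max_age + 1):
        subset = [t for t, lost in tracks if lost == age]
        if not subset or not unmatched_dets:
            continue
        cost = [[cost_fn(t, d) for d in unmatched_dets] for t in subset]
        pairs = greedy_assign(cost, subset, unmatched_dets)
        matches.extend(pairs)
        matched_d = {d for _, d in pairs}
        unmatched_dets = [d for d in unmatched_dets if d not in matched_d]
    matched_t = {t for t, _ in matches}
    unmatched_tracks = [t for t, _ in tracks if t not in matched_t]
    return matches, unmatched_tracks, unmatched_dets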
Since the matching of detection targets and track targets can be cast as a bipartite graph matching problem, the Hungarian algorithm is selected for matching. According to the embodiment of the application, the cost association matrices of the detection sequence and the tracking sequence are obtained through the Mahalanobis distance and the cosine distance and used as the input of the Hungarian algorithm; the output is the optimal matching relation between the detection sequence and the tracking sequence, thereby completing the target matching problem between the front and rear frames, i.e., RE-ID.
S104, the multi-target tracking device performs IOU matching based on the unmatched track target and the unmatched detection target to obtain an IOU matching set, and multi-target tracking is achieved according to the cascade matching set and the IOU matching set.
In an embodiment of the application, when the number of times is not greater than a preset number threshold, dividing the track target into the track target to be determined, so as to perform IOU matching on the track target to be determined.
In an embodiment of the application, the number of times that a track target is detected is obtained, and when the number of times is greater than a preset number threshold, the track target is divided into determined track targets so as to perform cascade matching on the determined track targets. And acquiring the unmatched track target to perform IOU matching on the unmatched track target.
Specifically, a track is used as input to the cascade matching algorithm only when it is in the determined state. When the track state is still to be determined, the track has been detected no more than 3 times up to the current moment, and performing cascade matching at that point is meaningless.
In one embodiment of the present application, an IOU incidence matrix is established based on unmatched trajectory targets and unmatched detection targets. And taking the IOU incidence matrix as the input of the Hungarian algorithm to obtain the matching relation between the unmatched track targets and the unmatched detection targets.
Specifically, IOU matching operates on the detection sequence and the tracking sequence. The metric for judging whether two targets match is whether the IOU-based cost of the two targets is within a threshold; if the cost is smaller than the threshold, the two targets are considered to have a certain correlation. An IOU association matrix of the tracking sequence and the detection sequence can easily be established:

M_iou(i, j) = 1 - IOU(T_i, D_j)
The IOU association matrix is taken as the input of the Hungarian algorithm, and the output is the matching relation between the tracking sequence and the detection sequence. The IOU is a very important metric for judging whether two targets have a matching relation, and its correct use is significant for establishing matching relations among multiple targets. The tracking target sequences whose states are determined are input, together with the detection sequence, into the cascade matching algorithm to obtain matched pairs, unmatched tracking targets and unmatched detection targets. Then a new unmatched tracking set, consisting of the tracking target sequence whose state is to be determined plus the unmatched tracking targets, is input into the IOU matching calculation together with the unmatched detection targets.
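The IOU computation and the resulting association matrix can be sketched for axis-aligned 2-D boxes as follows; the box format (x1, y1, x2, y2) and the function names are illustrative choices:

```python
def iou(box_a, box_b):
    # Boxes given as (x1, y1, x2, y2). Returns intersection-over-union
    # in [0, 1]; 1 means identical boxes, 0 means no overlap.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def iou_cost_matrix(track_boxes, det_boxes):
    # Cost = 1 - IoU, so well-overlapping pairs get low cost and
    # the matrix can feed directly into an assignment algorithm.
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]
```

The resulting matrix has one row per tracked box and one column per detected box, matching the M_iou(i, j) form above.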
According to the embodiment of the application, the matching efficiency between the detection target sequence and the tracking target sequence is improved as much as possible through a two-stage matching algorithm, and the occurrence of a missing matching phenomenon is reduced.
According to the embodiment of the application, through multi-mode information input, the matching accuracy between the detection target and the tracking target is improved by using the appearance characteristics of the image and the three-dimensional position information of the point cloud. For a long-time shielding problem, the cosine distance of the appearance characteristics is used for completing the association of the target motion state, when vehicles with similar appearances are encountered, the Mahalanobis distance based on the three-dimensional motion state information is used for completing the association of the target motion state, and the two measurement modes are combined, so that the problems of false detection, missing detection and the like when one condition fails can be effectively avoided, and the problem of frequent ID switching of the tracked target caused by the shielding problem is effectively reduced.
Fig. 2 is a block diagram of a multi-target tracking algorithm flow provided in an embodiment of the present application. As shown in fig. 2, first, a separate trace manager is established for each target to implement management of trace target information. And establishing a correlation matrix between the detection target and the tracking target by using the appearance information and the motion state information of the target, and then realizing the correlation between the front frame and the rear frame between the detection sequence and the tracking sequence by adopting a cascade matching algorithm and an IOU (input output Unit) matching algorithm. And predicting and updating the motion state of the target by adopting Kalman filtering based on a constant speed model.
Fig. 6 is a schematic diagram of target tracking at different times according to an embodiment of the present application. As shown in fig. 6, a scene in the KITTI tracking data set is randomly selected to display the tracking effect. Two vehicles are tracked in the scene, and the tracking effect at 3 moments is captured; from top to bottom are the first, second and third moments. At the first moment, tracking target 224 has already been tracked for a period of time and tracking target 230 has just begun to be tracked. The second moment continues the tracking for 1.3 s after the first moment, and the third moment continues for 0.8 s after the second.
Fig. 7 is a schematic diagram of point cloud tracking at different time points according to an embodiment of the present application. As shown in fig. 7, the point cloud tracking display effect and the tracking track corresponding to the first, second and third moments are respectively from left to right.
As can be seen from fig. 6 and 7, the tracking algorithm in the embodiment of the present application can continuously track multiple targets without switching the tracked IDs. Track 230 suffers a two-frame missed detection in the middle, yet after re-detection the target is still labeled with its original tracking ID and no ID switching occurs. This scene demonstrates that the tracking algorithm provided by this patent can track multiple targets simultaneously, and that when a brief missed detection occurs the original target ID is still tracked continuously, effectively reducing frequent ID switching.
Fig. 8 is a schematic diagram of a calculated tracking target track and heading angle according to an embodiment of the present application. As shown in fig. 8, fig. 8(a) compares the track obtained by the KF algorithm with the actual track (ground truth). The tracked trajectory and the ground truth substantially coincide, and the mean square error of the position deviation is calculated to be 1.059 m. Figs. 8(b) and 8(c) show the position change of the target in the X and Y directions, with calculated mean square errors of 1.016 m and 0.303 m respectively, indicating that the main error lies in the X direction, i.e., the driving direction of the car. The poorer positioning in the driving direction arises mainly because, when the point-cloud detection algorithm clusters the target, the obtained target position carries a large error, which in turn produces a large position error in tracking. Fig. 8(d) compares the heading angle obtained by tracking with the actual heading angle; the mean square error of the heading-angle deviation is 0.033 degrees, showing that the tracking algorithm is accurate for the heading angle.
Fig. 9 is a tracking effect diagram in a daytime scene according to an embodiment of the present application. As shown in fig. 9, the figure captures the real-vehicle test platform passing under an overpass, a scene involving illumination change. Fig. 9 contains four pictures, from left to right and top to bottom: just entering the overpass, under the overpass, leaving the overpass, and after leaving the overpass. Throughout the process, target No. 4 (the vehicle ahead in the same lane) and target No. 18 (the vehicle in the left adjacent lane) are tracked. While passing the overpass, detection of target 18 is lost due to overexposure and occlusion when leaving the overpass, but the target is soon detected again and its ID does not change. Target No. 4 is tracked stably throughout, with no switching of target ID. This intuitively shows that the tracking algorithm can track two vehicles simultaneously and can, to a certain extent, mitigate frequent target-ID switching caused by problems such as occlusion.
Fig. 10 is a tracking effect diagram in a night scene according to an embodiment of the present application. As shown in fig. 10, the number of tracked vehicles changes during the tracking process, from tracking one vehicle to tracking two vehicles simultaneously. It can be seen intuitively that the tracking algorithm of the embodiment also adapts well to the night environment and tracks multiple vehicles simultaneously; the IDs of the tracked targets are relatively stable and no frequent switching occurs.
Through real vehicle test verification, the algorithm can meet the requirements of a real vehicle test platform on multi-target detection and tracking tasks, can complete detection, fusion and tracking of multiple targets in the daytime and at night, can correct detection results through a fusion process, and effectively reduces the problems of missed detection and false detection. Through multi-mode information input, the matching accuracy between a detection target and a tracking target can be improved by utilizing the appearance characteristics of an image and the three-dimensional position information of point cloud, and the problem of frequent ID switching of the tracking target caused by the shielding problem is effectively reduced.
Fig. 11 is a road measurement diagram provided in an embodiment of the present application. According to the embodiment of the application, the motion state of the target is predicted and updated through a Kalman filtering algorithm, and the tracking effect of the target position and the course angle is displayed by selecting a day scene and a night scene. The selected day scene is shown in fig. 11(a), and the night scene is shown in fig. 11 (b).
Fig. 12 is a tracking trajectory and heading angle diagram corresponding to a road test in a daytime scene according to an embodiment of the present application. Figs. 12(a) and 12(b) illustrate that the tracking algorithm according to the embodiment of the present application can continuously and stably track the front vehicle target to obtain its trajectory and heading information. The driving direction and behavior of the target can be judged from the heading angle; a violent change of angle within a short time indicates that the target may be changing lanes or turning.
Fig. 13 is a tracking trajectory and course angle diagram corresponding to a road test under an night scene provided in an embodiment of the present application. The result of tracking the target in the scene is shown in fig. 13(a), and the course angle tracking is shown in fig. 13(b), which illustrates that the tracking algorithm can also track the target position and course angle in the night scene.
Fig. 14 is a schematic structural diagram of a multi-target tracking device according to an embodiment of the present application. As shown in fig. 14, the multi-target tracking device includes: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
inputting an image corresponding to a current frame detection target into a preset depth feature descriptor to obtain a feature correlation degree between the image corresponding to the detection target and an image corresponding to a track target, and establishing a first correlation cost matrix based on the feature correlation degree;
acquiring coordinate position information of a current frame tracking track prediction frame according to coordinate position information of a previous frame tracking track frame and a Kalman filter, and establishing a second associated cost matrix based on the coordinate position information of the tracking track prediction frame and the coordinate position information of a detection frame;
based on the first incidence cost matrix and the second incidence cost matrix, cascade matching is carried out on a detection target and a track target determined by the state, and a cascade matching set, an unmatched track target and an unmatched detection target are obtained;
and performing IOU matching based on the unmatched track target and the unmatched detection target to obtain an IOU matching set so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
Embodiments of the present application further provide a non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to:
inputting an image corresponding to a current frame detection target into a preset depth feature descriptor to obtain a feature correlation degree between the image corresponding to the detection target and an image corresponding to a track target, and establishing a first correlation cost matrix based on the feature correlation degree;
acquiring coordinate position information of a current frame tracking track prediction frame according to coordinate position information of a previous frame tracking track frame and a Kalman filter, and establishing a second associated cost matrix based on the coordinate position of the tracking track prediction frame and the coordinate position information of a detection frame;
based on the first incidence cost matrix and the second incidence cost matrix, cascade matching is carried out on a detection target and a track target determined by the state, and a cascade matching set, an unmatched track target and an unmatched detection target are obtained;
and performing IOU matching based on the unmatched track target and the unmatched detection target to obtain an IOU matching set so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the device, and the nonvolatile computer storage medium, since they are basically similar to the embodiments of the method, the description is simple, and the relevant points can be referred to the partial description of the embodiments of the method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and alterations to the embodiments of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A multi-target tracking method, characterized in that the method comprises:
inputting an image corresponding to a detection target of a current frame into a preset depth feature descriptor to obtain a feature correlation between the image corresponding to the detection target and an image corresponding to a trajectory target, and establishing a first association cost matrix based on the feature correlation;
acquiring coordinate position information of a tracking-trajectory prediction box of the current frame according to coordinate position information of a tracking-trajectory box of a previous frame and a Kalman filter, and establishing a second association cost matrix based on the coordinate position information of the tracking-trajectory prediction box and of a detection box;
performing cascade matching between the detection target and a trajectory target in a confirmed state based on the first association cost matrix and the second association cost matrix, to obtain a cascade matching set, an unmatched trajectory target, and an unmatched detection target; and
performing IOU matching based on the unmatched trajectory target and the unmatched detection target to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
2. The multi-target tracking method according to claim 1, wherein the performing cascade matching between the detection target and the trajectory target in the confirmed state based on the first association cost matrix and the second association cost matrix, to obtain a cascade matching set, an unmatched trajectory target, and an unmatched detection target specifically comprises:
initializing a plurality of the detection targets into an unmatched target set, and initializing a plurality of the trajectory targets into a trajectory set; and
performing cascade matching based on the first and second association cost matrices of the unmatched target set, the trajectory set, and a preset maximum number of lost frames, to obtain an initial matching set, an unmatched trajectory target, and an unmatched detection target.
3. The multi-target tracking method according to claim 2, wherein the performing cascade matching based on the first and second association cost matrices of the unmatched target set, the trajectory set, and the preset maximum number of lost frames, to obtain an initial matching set, an unmatched trajectory target, and an unmatched detection target specifically comprises:
determining, in the trajectory set, a first sub-trajectory set whose number of lost frames is 0, and taking the first sub-trajectory set and the first and second association cost matrices of the unmatched target set as inputs of the Hungarian algorithm to obtain a first matched set and a first unmatched target set; determining, in the trajectory set, a second sub-trajectory set whose number of lost frames is K, and taking the second sub-trajectory set and the first and second association cost matrices of the first unmatched target set as inputs of the Hungarian algorithm to obtain a second matched set and a second unmatched target set;
gradually increasing the value of K until it reaches the preset maximum number of lost frames, and determining the unmatched trajectory target and the unmatched detection target corresponding to the preset maximum number of lost frames; and
taking the matched sets respectively corresponding to the different numbers of lost frames as the initial matching set.
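The cascade described in claims 2 and 3 — matching tracks in order of increasing lost-frame count so that recently seen tracks get priority over detections — can be sketched as below. This is an illustrative simplification, not the patent's implementation: `greedy_match` is a stand-in for the Hungarian assignment named in the claim, and the cost threshold, function names, and data shapes are assumptions.

```python
def greedy_match(tracks, dets, cost, max_cost=0.7):
    """Greedy stand-in for the Hungarian assignment used in claim 3."""
    matches, used = [], set()
    for t in tracks:
        best, best_c = None, max_cost
        for d in dets:
            if d in used:
                continue
            c = cost[(t, d)]
            if c < best_c:
                best, best_c = d, c
        if best is not None:
            matches.append((t, best))
            used.add(best)
    return matches, [d for d in dets if d not in used]

def cascade_match(tracks_by_lost, dets, cost, max_lost=3):
    """Match tracks grouped by lost-frame count K = 0, 1, ..., max_lost,
    letting the most recently seen tracks claim detections first."""
    all_matches, unmatched_dets = [], list(dets)
    for k in range(max_lost + 1):
        sub = tracks_by_lost.get(k, [])
        m, unmatched_dets = greedy_match(sub, unmatched_dets, cost)
        all_matches += m
    matched = {t for t, _ in all_matches}
    unmatched_tracks = [t for ts in tracks_by_lost.values()
                        for t in ts if t not in matched]
    return all_matches, unmatched_tracks, unmatched_dets
```

Tracks left unmatched after the final pass at K = max_lost, together with the leftover detections, are exactly the inputs to the IOU matching stage of claim 1.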
4. The multi-target tracking method according to claim 1, wherein, before the cascade matching is performed between the detection target and the trajectory target in the confirmed state based on the first association cost matrix and the second association cost matrix, the method further comprises:
acquiring the number of times the trajectory target has been detected, and classifying the trajectory target as a confirmed trajectory target when the number of times is greater than a preset threshold, so as to perform cascade matching on the confirmed trajectory target; and
obtaining an unmatched trajectory target, and performing IOU matching on the unmatched trajectory target.
5. The multi-target tracking method according to claim 1, wherein the performing IOU matching based on the unmatched trajectory target and the unmatched detection target to obtain an IOU matching set specifically comprises:
establishing an IOU association matrix based on the unmatched trajectory target and the unmatched detection target; and
taking the IOU association matrix as an input of the Hungarian algorithm to obtain a matching relationship between the unmatched trajectory target and the unmatched detection target.
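As context for claim 5, the entries of such an IOU association matrix can be computed as below; the cost fed to an assignment solver is conventionally 1 − IOU so that higher overlap means lower cost. The function names and the (x1, y1, x2, y2) box format are illustrative assumptions, not taken from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def iou_cost_matrix(track_boxes, det_boxes):
    """(1 - IOU) cost matrix between unmatched tracks and detections,
    suitable as input to a Hungarian-style assignment solver."""
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]
```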
6. The multi-target tracking method according to claim 1, wherein the acquiring coordinate position information of the tracking-trajectory prediction box of the current frame according to the coordinate position information of the tracking-trajectory box of the previous frame and the Kalman filter specifically comprises:
obtaining a predicted state vector corresponding to the tracking trajectory of the current frame through a prediction model in the Kalman filter and a state vector corresponding to the tracking trajectory of the previous frame;
acquiring measurement noise corresponding to the tracking trajectory through a preset sensor; and
updating the predicted state vector through a measurement model in the Kalman filter and the measurement noise to obtain a final state vector corresponding to the tracking trajectory of the current frame.
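The predict/update cycle of claim 6 follows the standard Kalman recursion. A minimal one-dimensional sketch is shown below; it uses a random-walk prediction model and illustrative noise values q and r, whereas the patent's filter would use a box-state motion model and sensor-derived measurement noise.

```python
def kalman_step(x, p, z, q=1e-2, r=1e-1):
    """One predict/update cycle of a 1-D Kalman filter.
    x, p: previous state estimate and its variance; z: new measurement;
    q, r: process and measurement noise variances (illustrative values)."""
    # Predict: propagate the state (random-walk model here) and
    # inflate the variance by the process noise.
    x_pred, p_pred = x, p + q
    # Update: blend the prediction with the measurement, weighted
    # by the Kalman gain.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # corrected state
    p_new = (1 - k) * p_pred           # reduced uncertainty
    return x_new, p_new
```

The corrected state lands between the prediction and the measurement, closer to whichever has the smaller variance; the same recursion, applied per component of the box-state vector, yields the prediction box used in the second association cost matrix.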
7. The multi-target tracking method according to claim 1, wherein the establishing the second association cost matrix based on the coordinate position information of the tracking-trajectory prediction box and of the detection box specifically comprises:
acquiring, according to three-dimensional point cloud information, coordinate position information of a detection box corresponding to the detection target of the current frame; and
establishing the second association cost matrix based on the Mahalanobis distance between the coordinate position information of the detection box corresponding to the detection target of the current frame and the coordinate position information of the prediction box corresponding to the image tracking trajectory of the current frame.
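In the common simplification where the covariance is taken as diagonal (e.g. from the Kalman filter's predicted per-component uncertainty), the Mahalanobis distance of claim 7 reduces to a variance-weighted sum of squared differences. The sketch below assumes that diagonal form; the patent does not specify the covariance structure.

```python
def mahalanobis_sq(det, pred, var):
    """Squared Mahalanobis distance between a detection-box state and a
    predicted-box state, assuming a diagonal covariance `var`."""
    return sum((d - p) ** 2 / v for d, p, v in zip(det, pred, var))
```

In DeepSORT-style trackers such distances are commonly gated with a chi-square threshold before entering the cost matrix, so implausible track-detection pairs are excluded from assignment.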
8. The multi-target tracking method according to claim 1, wherein, before the image corresponding to the detection target of the current frame is input into the preset depth feature descriptor, the method further comprises:
establishing a corresponding tracking target manager based on each trajectory target; and
recording trajectory information for different trajectory targets through the tracking target manager, wherein the trajectory information comprises at least one of motion state information of the trajectory target, feature information of the trajectory target, state information of the trajectory target, and state parameter information of the trajectory target.
9. A multi-target tracking device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
input an image corresponding to a detection target of a current frame into a preset depth feature descriptor to obtain a feature correlation between the image corresponding to the detection target and an image corresponding to a trajectory target, and establish a first association cost matrix based on the feature correlation;
acquire coordinate position information of a tracking-trajectory prediction box of the current frame according to coordinate position information of a tracking-trajectory box of a previous frame and a Kalman filter, and establish a second association cost matrix based on the coordinate position information of the tracking-trajectory prediction box and of a detection box;
perform cascade matching between the detection target and a trajectory target in a confirmed state based on the first association cost matrix and the second association cost matrix, to obtain a cascade matching set, an unmatched trajectory target, and an unmatched detection target; and
perform IOU matching based on the unmatched trajectory target and the unmatched detection target to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
10. A non-transitory computer storage medium storing computer-executable instructions configured to:
input an image corresponding to a detection target of a current frame into a preset depth feature descriptor to obtain a feature correlation between the image corresponding to the detection target and an image corresponding to a trajectory target, and establish a first association cost matrix based on the feature correlation;
acquire coordinate position information of a tracking-trajectory prediction box of the current frame according to coordinate position information of a tracking-trajectory box of a previous frame and a Kalman filter, and establish a second association cost matrix based on the coordinate position information of the tracking-trajectory prediction box and of a detection box;
perform cascade matching between the detection target and a trajectory target in a confirmed state based on the first association cost matrix and the second association cost matrix, to obtain a cascade matching set, an unmatched trajectory target, and an unmatched detection target; and
perform IOU matching based on the unmatched trajectory target and the unmatched detection target to obtain an IOU matching set, so as to realize multi-target tracking according to the cascade matching set and the IOU matching set.
CN202210072211.0A 2022-01-21 2022-01-21 Multi-target tracking method, equipment and medium Pending CN114638855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210072211.0A CN114638855A (en) 2022-01-21 2022-01-21 Multi-target tracking method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210072211.0A CN114638855A (en) 2022-01-21 2022-01-21 Multi-target tracking method, equipment and medium

Publications (1)

Publication Number Publication Date
CN114638855A 2022-06-17

Family

ID=81946717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210072211.0A Pending CN114638855A (en) 2022-01-21 2022-01-21 Multi-target tracking method, equipment and medium

Country Status (1)

Country Link
CN (1) CN114638855A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063454A (en) * 2022-08-16 2022-09-16 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium
CN115830075A (en) * 2023-02-20 2023-03-21 武汉广银飞科技发展有限公司 Hierarchical association matching method for pedestrian multi-target tracking
CN115830079A (en) * 2023-02-15 2023-03-21 天翼交通科技有限公司 Method, device and medium for tracking trajectory of traffic participant
CN115908498A (en) * 2022-12-27 2023-04-04 清华大学 Multi-target tracking method and device based on category optimal matching
CN116228989A (en) * 2023-03-30 2023-06-06 北京数原数字化城市研究中心 Three-dimensional track prediction method, device, equipment and medium
CN116777950A (en) * 2023-04-19 2023-09-19 长沙理工大学 Multi-target visual tracking method, device, equipment and medium based on camera parameters
CN117237401A (en) * 2023-11-08 2023-12-15 北京理工大学前沿技术研究院 Multi-target tracking method, system, medium and equipment for fusion of image and point cloud
CN117495916A (en) * 2023-12-29 2024-02-02 苏州元脑智能科技有限公司 Multi-target track association method, device, communication equipment and storage medium
CN117576167A (en) * 2024-01-16 2024-02-20 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium
CN117876416A (en) * 2024-03-12 2024-04-12 浙江芯昇电子技术有限公司 Multi-target tracking method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
CN114638855A (en) Multi-target tracking method, equipment and medium
Milan et al. Online multi-target tracking using recurrent neural networks
CN111127513B (en) Multi-target tracking method
Jana et al. YOLO based Detection and Classification of Objects in video records
CN112308881B (en) Ship multi-target tracking method based on remote sensing image
Segal et al. Latent data association: Bayesian model selection for multi-target tracking
CN108573496B (en) Multi-target tracking method based on LSTM network and deep reinforcement learning
WO2023065395A1 (en) Work vehicle detection and tracking method and system
CN104156984A (en) PHD (Probability Hypothesis Density) method for multi-target tracking in uneven clutter environment
CN114049382B (en) Target fusion tracking method, system and medium in intelligent network connection environment
US11880985B2 (en) Tracking multiple objects in a video stream using occlusion-aware single-object tracking
CN110349188B (en) Multi-target tracking method, device and storage medium based on TSK fuzzy model
Cao et al. Correlation-based tracking of multiple targets with hierarchical layered structure
CN115546705A (en) Target identification method, terminal device and storage medium
CN114972410A (en) Multi-level matching video racing car tracking method and system
Yi et al. Multi-Person tracking algorithm based on data association
Lindenmaier et al. GM-PHD filter based sensor data fusion for automotive frontal perception system
CN113537077A (en) Label multi-Bernoulli video multi-target tracking method based on feature pool optimization
Iter et al. Target tracking with kalman filtering, knn and lstms
CN116563341A (en) Visual positioning and mapping method for processing dynamic object in complex environment
Cheng et al. 3D vehicle object tracking algorithm based on bounding box similarity measurement
Sköld Estimating 3d-trajectories from monocular video sequences
Ju et al. Online multi-object tracking based on hierarchical association framework
CN112183204A (en) Method and device for detecting parking event
CN108346158B (en) Multi-target tracking method and system based on main block data association

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination