CN112528730B - Cost matrix optimization method based on space constraint under Hungary algorithm - Google Patents


Publication number
CN112528730B
Authority
CN
China
Prior art keywords: target, representing, detection, det, tracked
Prior art date
Legal status: Active
Application number
CN202011128387.0A
Other languages
Chinese (zh)
Other versions
CN112528730A
Inventor
柯逍
叶宇
李悦洲
於志勇
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202011128387.0A
Publication of CN112528730A
Application granted
Publication of CN112528730B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/48: Matching video sequences
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition

Abstract

The invention relates to a cost matrix optimization method based on space constraint under the Hungarian algorithm, which comprises the steps of firstly obtaining the stored appearance feature vector set of all tracked targets, then obtaining the detection results of all pedestrians in a frame, and combining the two sets to construct an initial cost matrix; then, estimating the current position of each target by Kalman filtering according to the information of the tracked targets, and modifying the weights of the cost matrix according to the relative relation between the estimated positions and the detection results, for the subsequent assignment task; and finally, obtaining the optimal assignment of the cost matrix by using the Hungarian algorithm, distributing the detection results according to the optimal assignment, and updating and storing the appearance features of the tracked targets. The method can effectively improve the matching effect of the Hungarian algorithm according to the video scene and remove some unreasonable matches.

Description

Cost matrix optimization method based on space constraint under Hungary algorithm
Technical Field
The invention relates to the field of computer vision, in particular to a cost matrix optimization method based on space constraint under Hungarian algorithm.
Background
Multi-Object Tracking (MOT) takes an image sequence as input, finds the moving objects in it, associates the moving objects across different frames (giving each a consistent identity), and outputs the motion trajectories of the different objects. The mainstream framework currently adopted by academia for the MOT problem is TBD (Tracking-by-Detection), i.e. tracking based on detection; in this framework, the multi-target tracking problem is expressed as an association matching problem: if a detection result obtained in a certain frame matches a detection result obtained in the previous frame, the two are identified as the same target.
In traditional multi-target tracking, in the matching and association stage, the Hungarian algorithm is mostly used to associate detection results with the tracked targets they belong to according to a similarity/distance measure, i.e. detection results identifying the same target are assigned the same ID. However, this has a certain disadvantage: because there is no proper spatial constraint, a detection result and a tracked target that are far apart may be mistaken for the same target because their appearances are extremely similar. This situation is impossible in practice, since a target cannot move very far between two adjacent frames of a video sequence; therefore, the cost matrix is optimized by a spatial-constraint method, and the optimized matrix is then used as the input of the Hungarian algorithm.
Disclosure of Invention
In view of the above, the invention aims to provide a cost matrix optimization method based on space constraint under the Hungarian algorithm, which can effectively improve the matching effect of the Hungarian algorithm according to the video scene and remove some unreasonable matches.
The invention is realized by adopting the following scheme: a cost matrix optimization method based on space constraint under Hungarian algorithm comprises the following steps:
step S1: acquiring an appearance characteristic vector set of all tracked targets of a current frame and a pedestrian detection result of the current frame, and combining the two sets to construct an initial cost matrix;
step S2: estimating the speed of the target in the previous frame by using Kalman filtering according to the target position and the speed of the tracked target, performing comparative analysis on the linearly estimated target position and the actually detected pedestrian, and modifying the weight of the initial cost matrix for subsequent assignment tasks, namely applying the weight to the step S3;
step S3: obtain the optimal assignment of the modified cost matrix by using the Hungarian algorithm, distribute the pedestrian detection results according to the optimal assignment, and judge which detection results belong to tracked targets and which belong to new targets; update the Kalman filter for a detection result belonging to a tracked target, initialize a new tracker for one belonging to a new target, and finally input the new targets into the appearance feature extraction network to obtain and store new appearance feature vectors.
Further, the step S1 specifically includes the following steps:
step S11: let the stored historical appearance feature set be F = {f_i | i = 1, 2, ..., M}, where M denotes the number of tracked targets and f_i, the feature of the i-th tracked target, is a 128-dimensional vector;
step S12: let the pedestrian detection result be R = {Kframe_k, Person_v, det_x, det_y, det_w, det_h, det_c | k = 1, 2, ..., U; v = 1, 2, ..., V}, the set of all detection results in a video sequence, where U represents the number of image frames in the video sequence, V represents the number of detected pedestrians in one frame image, Kframe_k represents the k-th frame image in the video sequence, Person_v represents the v-th pedestrian in that frame image, det_x, det_y, det_w and det_h respectively represent the x and y coordinates of the upper-left corner of the pedestrian's detection box and the box's width and height, and det_c represents the confidence of the detection box;
step S13: let the confidence threshold of pedestrian detection be Th_d and the pedestrian width-to-height ratio threshold be Th_r, and delete the detection results satisfying:
det_c < Th_d or det_w / det_h > Th_r
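The deletion rule of step S13 can be sketched as follows; the threshold values Th_d = 0.5 and Th_r = 0.8 are illustrative assumptions, not values fixed by the patent:

```python
# Detection filtering of step S13: drop low-confidence boxes and boxes
# whose width-to-height ratio is implausible for a standing pedestrian.
Th_d = 0.5   # confidence threshold (assumed value)
Th_r = 0.8   # width-to-height ratio threshold (assumed value)

def filter_detections(detections, th_d=Th_d, th_r=Th_r):
    """Keep (det_x, det_y, det_w, det_h, det_c) tuples with
    det_c >= th_d and det_w/det_h <= th_r."""
    return [d for d in detections
            if d[4] >= th_d and d[2] / d[3] <= th_r]

dets = [(10, 20, 40, 100, 0.9),   # kept
        (50, 60, 90, 80, 0.95),   # dropped: det_w/det_h > Th_r
        (70, 80, 30, 90, 0.3)]    # dropped: det_c < Th_d
print(filter_detections(dets))    # [(10, 20, 40, 100, 0.9)]
```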
step S14: input the detection results into the feature extraction network to obtain, analogously, the appearance feature set D = {d_j | j = 1, 2, ..., N} of the detection results, where N denotes the number of detected pedestrians in one frame image; compute the pairwise distance between every feature vector in the pedestrian-detection appearance feature set and every feature vector in the tracked-target appearance feature set to obtain the initial cost matrix, set as
C = [c_ij], i = 1, 2, ..., N; j = 1, 2, ..., M
where c_ij ∈ C represents the appearance-feature distance (similarity) between the i-th detection result and the j-th tracked target, also called the allocation cost; the distance between two appearance features is computed by the cosine measure:
c_ij = (Σ_s G_s·E_s) / (sqrt(Σ_s G_s^2) · sqrt(Σ_s E_s^2))
where G_s and E_s respectively denote the s-th elements of the feature vectors G and E, with G ∈ F and E ∈ D.
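Under the reading that c_ij is a cosine similarity (consistent with the later gating step, where setting c_ij = 0 excludes a pair), the initial cost matrix of step S14 can be sketched with NumPy; the 2-D toy features stand in for the 128-dimensional vectors:

```python
import numpy as np

def cosine_similarity_matrix(F, D):
    """Initial cost matrix of step S14: entry (i, j) is the cosine measure
    between tracked-target feature F[i] and detection feature D[j]."""
    F = np.asarray(F, dtype=float)                      # shape (M, dim)
    D = np.asarray(D, dtype=float)                      # shape (N, dim)
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)   # row-normalize
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return Fn @ Dn.T                                    # (M, N) matrix

# Toy 2-D features in place of the 128-dimensional re-ID vectors.
F = [[1.0, 0.0], [0.0, 1.0]]
D = [[1.0, 0.0], [1.0, 1.0]]
C = cosine_similarity_matrix(F, D)
print(np.round(C, 3))
```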
Further, the step S2 specifically includes the following steps:
step S21: let the information to be estimated of the tracked target be O = {color_x, color_y, color_w, color_h}, where color_x represents the estimated value of the x coordinate of the tracked target, color_y the estimated value of its y coordinate, color_w the estimated value of its width, and color_h the estimated value of its height;
step S22: the velocity of the target is estimated by using a prediction formula of Kalman filtering, and the calculation method comprises the following steps:
x̂_k^- = A·x̂_{k-1} + B·u_k
P_k^- = A·P_{k-1}·A^T + Q
where x̂_k^- and x̂_{k-1} respectively denote the prior state estimate at time k and the posterior state estimate at time k-1, A denotes the state transition matrix, B the transformation matrix from the input to the state variables, u_k the input at time k, P_k^- and P_{k-1} respectively the prior estimate covariance at time k and the posterior estimate covariance at time k-1, Q the system process covariance, and T the matrix transpose;
step S23: updating the state of the filter by using an updating formula of Kalman filtering, and obtaining a final Kalman prediction result, wherein the specific calculation method comprises the following steps:
K_k = P_k^-·H^T·(H·P_k^-·H^T + R)^(-1)
x̂_k = x̂_k^- + K_k·(z_k - H·x̂_k^-)
P_k = (I - K_k·H)·P_k^-
where K_k denotes the filter gain matrix (Kalman gain), H the transformation matrix from state variables to observed variables, R the measurement noise covariance, z_k the observed variable, P_k the posterior estimate covariance at time k, and x̂_k the posterior state estimate at time k, i.e. the optimal Kalman estimate sought;
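A minimal one-coordinate, constant-velocity sketch of the predict (step S22) and update (step S23) formulas; the matrices A, H, Q, R below are illustrative assumptions, not the patent's full bounding-box state:

```python
import numpy as np

# One-coordinate constant-velocity Kalman filter. State = [position, velocity].
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # state transition: position += velocity
H = np.array([[1.0, 0.0]])    # only the position is observed
Q = np.eye(2) * 1e-2          # process covariance (assumed)
R = np.array([[1e-1]])        # measurement noise covariance (assumed)

def predict(x, P):
    x_prior = A @ x                # x_k^- = A x_{k-1}  (no control input, B u_k = 0)
    P_prior = A @ P @ A.T + Q      # P_k^- = A P_{k-1} A^T + Q
    return x_prior, P_prior

def update(x_prior, P_prior, z):
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)      # Kalman gain K_k
    x = x_prior + K @ (z - H @ x_prior)       # posterior state estimate
    P = (np.eye(2) - K @ H) @ P_prior         # posterior covariance
    return x, P

x, P = np.array([[0.0], [1.0]]), np.eye(2)    # start: position 0, velocity 1
x, P = predict(x, P)                          # predicted position: 1.0
x, P = update(x, P, np.array([[1.2]]))        # measurement at 1.2
print(x.ravel())                              # position pulled toward 1.2
```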
step S24: the moving speed of a pedestrian and the size of its detection box cannot change much between two adjacent frames of a video, so the speed and moving direction of the pedestrian estimated in the previous frame by Kalman filtering are used to predict the pedestrian's position in the current frame;
L_k = L_{k-1} + ΔV_{k-1}
E_k = δ·(w, h)
M_k = L_k + E_k
where L_k and L_{k-1} are the positions of the target at time k and time k-1 respectively, ΔV_{k-1} denotes the target velocity estimated by Kalman filtering at time k-1, E_k denotes the expansion vector of the target, δ is the expansion coefficient, w and h denote the width and height of the target, and M_k is the target estimation domain;
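One possible reading of the estimation domain M_k = L_k + E_k is a center-preserving expansion of the predicted box by δ·(w, h); both this interpretation and the value δ = 0.2 are assumptions for illustration:

```python
def estimation_domain(pred_x, pred_y, w, h, delta=0.2):
    """Target estimation domain M_k of step S24: expand the Kalman-predicted
    box (top-left x, y, width, height) by the expansion vector
    E_k = delta * (w, h), keeping the box center fixed.
    The centering and delta = 0.2 are illustrative assumptions."""
    return (pred_x - delta * w / 2,
            pred_y - delta * h / 2,
            w * (1 + delta),
            h * (1 + delta))

print(estimation_domain(100, 50, 40, 80))  # approximately (96.0, 42.0, 48.0, 96.0)
```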
step S25: compute the GIOU (generalized intersection over union) between the pedestrian detection results of the current frame and all target estimation domains to eliminate wrong matching results, computed as:
IOU = (S_a ∩ S_b) / (S_a ∪ S_b)
GIOU = IOU - (S_c - (S_a ∪ S_b)) / S_c
where S_a denotes the bounding-box area of the first object, S_b the bounding-box area of the second object, and S_c the area of the smallest box enclosing S_a and S_b; a bounding-box area is computed as the rectangle's length multiplied by its width;
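The IOU and GIOU of step S25 can be sketched for axis-aligned (x, y, w, h) boxes as follows:

```python
def iou_giou(a, b):
    """IOU and GIOU of step S25 for axis-aligned boxes given as
    (x, y, w, h) with (x, y) the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))   # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    iou = inter / union
    # S_c: area of the smallest box enclosing both a and b
    sc = (max(ax2, bx2) - min(a[0], b[0])) * (max(ay2, by2) - min(a[1], b[1]))
    giou = iou - (sc - union) / sc
    return iou, giou

iou, giou = iou_giou((0, 0, 2, 2), (1, 1, 2, 2))
print(iou, giou)   # IOU = 1/7; GIOU = 1/7 - 2/9 (negative: boxes barely overlap)
```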
step S26: modify the initial cost matrix according to the IOU (intersection over union). Let the predefined IOU threshold be t. If the intersection over union IOU_ij of f_i and d_j is less than t, then:
c_ij = 0
if the IOU_ij of f_i and d_j is greater than or equal to t, then:
c_ij = c_ij × IOU_ij
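The gating rule of step S26 can be sketched vectorized over the whole matrix; the threshold t = 0.3 is an assumed value:

```python
import numpy as np

def gate_cost_matrix(C, IOU, t=0.3):
    """Spatial gating of step S26: weights of pairs with IOU_ij < t are
    set to 0; the rest are rescaled by the IOU. t = 0.3 is assumed."""
    C = np.asarray(C, dtype=float)
    IOU = np.asarray(IOU, dtype=float)
    return np.where(IOU < t, 0.0, C * IOU)

C = np.array([[0.9, 0.8],
              [0.7, 0.95]])
IOU = np.array([[0.6, 0.1],    # pair (0, 1) is spatially implausible
                [0.2, 0.5]])   # pair (1, 0) likewise
G = gate_cost_matrix(C, IOU)
print(G)   # off-diagonal weights zeroed, diagonal rescaled by the IOU
```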
further, the step S3 specifically includes the following steps:
step S31: and calculating the optimal assignment of the modified cost matrix by using the Hungarian algorithm, wherein the optimal assignment must meet the following conditions:
max Σ_i Σ_j c_ij·bin_ij
Σ_j bin_ij ≤ 1, for every i
Σ_i bin_ij ≤ 1, for every j
where bin_ij is a binary decision variable, bin_ij ∈ {0, 1}: 0 means the i-th target feature is not matched to the j-th detection result, and 1 means it is matched;
step S32: calculating to obtain matching results according to the Hungarian algorithm in the step S31, judging whether all detection results are successfully matched with the tracked target, if the detection results are successfully matched with the tracked target, indicating that the detection results belong to the target, taking the target coordinates and the speed of the detection results as input, updating the state of a Kalman filter, and executing the step S33, otherwise executing the step S34;
step S33: inputting the successfully matched detection result into an appearance feature extraction network to extract a feature vector and update the historical appearance feature of the tracked target;
step S34: if a detection result cannot be matched with any tracked target, the detection result corresponds to a new target; initialize a new target tracker and Kalman filter, and execute step S35;
step S35: inputting the detection result of unsuccessful matching into the appearance feature extraction network to extract feature vectors and update the historical appearance features of the new target, and then turning to step S11.
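A sketch of step S31 using SciPy's linear_sum_assignment in place of a hand-written Hungarian solver; since the gated c_ij act as similarity weights here, the total weight is maximized:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Optimal assignment of step S31 on a gated cost matrix. The gated c_ij
# behave as similarity weights, so the total weight is maximized; SciPy's
# solver stands in for a hand-written Hungarian implementation.
C = np.array([[0.54, 0.0],
              [0.0, 0.475]])
rows, cols = linear_sum_assignment(C, maximize=True)
for i, j in zip(rows, cols):
    if C[i, j] > 0:
        print(f"track {i} matched to detection {j}")   # update Kalman filter (S32)
    else:
        print(f"detection {j} unmatched: new target")  # initialize tracker (S34)
```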
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with the traditional Hungarian algorithm, only Kalman filtering and the IOU-related computation are added to predict the target position, so the method runs efficiently and consumes little memory compared with prediction by a neural network.
2. The method does not depend heavily on the detection results: even if a target is missed by the detector, its position can still be predicted and labeled, and the labeled target can be tracked continuously without track fragments.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, this embodiment provides a cost matrix optimization method based on space constraint under the Hungarian algorithm, wherein the cost matrix is optimized based on a spatial constraint and the optimized cost matrix is used in the Hungarian algorithm; this can effectively improve the matching effect of the Hungarian algorithm and remove some unreasonable matches;
the method comprises the following steps:
step S1: acquiring an appearance characteristic vector set of all tracked targets of a current frame and a pedestrian detection result of the current frame, and combining the two sets to construct an initial cost matrix;
step S2: estimating the speed of the target in the previous frame by using Kalman filtering according to the target position and the speed of the tracked target, performing comparative analysis on the linearly estimated target position and the actually detected pedestrian, and modifying the weight of the initial cost matrix for subsequent assignment tasks, namely applying the weight to the step S3;
step S3: obtain the optimal assignment of the modified cost matrix by using the Hungarian algorithm, distribute the pedestrian detection results according to the optimal assignment, and judge which detection results belong to tracked targets and which belong to new targets; update the Kalman filter for a detection result belonging to a tracked target, initialize a new tracker for one belonging to a new target, and finally input the new targets into the appearance feature extraction network to obtain and store new appearance feature vectors.
Preferably, in this embodiment, once step S2 is completed, the subsequent process uses the modified cost matrix as the input of the Hungarian algorithm, whose output is the optimal assignment. These assignments are the basis of association matching.
In this embodiment, the step S1 includes the following steps:
step S11: let the stored historical appearance feature set be F = {f_i | i = 1, 2, ..., M}, where M denotes the number of tracked targets and f_i, the feature of the i-th tracked target, is a 128-dimensional vector;
step S12: let the pedestrian detection result be R = {Kframe_k, Person_v, det_x, det_y, det_w, det_h, det_c | k = 1, 2, ..., U; v = 1, 2, ..., V}, the set of all detection results in a video sequence, where U represents the number of image frames in the video sequence, V represents the number of detected pedestrians in one frame image, Kframe_k represents the k-th frame image in the video sequence, Person_v represents the v-th pedestrian in that frame image, det_x, det_y, det_w and det_h respectively represent the x and y coordinates of the upper-left corner of the pedestrian's detection box and the box's width and height, and det_c represents the confidence of the detection box;
step S13: let the confidence threshold of pedestrian detection be Th_d and the pedestrian width-to-height ratio threshold be Th_r, and delete the detection results satisfying:
det_c < Th_d or det_w / det_h > Th_r
step S14: input the detection results into the feature extraction network to obtain, analogously, the appearance feature set D = {d_j | j = 1, 2, ..., N} of the detection results, where N represents the number of detected pedestrians in one frame image; compute the pairwise distance between every feature vector in the pedestrian-detection appearance feature set and every feature vector in the tracked-target appearance feature set to obtain the initial cost matrix, set as
C = [c_ij], i = 1, 2, ..., N; j = 1, 2, ..., M
where c_ij ∈ C represents the appearance-feature distance (similarity) between the i-th detection result and the j-th tracked target, also called the allocation cost; the distance between two appearance features is computed by the cosine measure:
c_ij = (Σ_s G_s·E_s) / (sqrt(Σ_s G_s^2) · sqrt(Σ_s E_s^2))
where G_s and E_s respectively denote the s-th elements of the feature vectors G and E, with G ∈ F and E ∈ D.
In this embodiment, the step S2 specifically includes the following steps:
step S21: let the information to be estimated of the tracked target be O = {color_x, color_y, color_w, color_h}, where color_x represents the estimated value of the x coordinate of the tracked target, color_y the estimated value of its y coordinate, color_w the estimated value of its width, and color_h the estimated value of its height;
step S22: the velocity of the target is estimated by using a prediction formula of Kalman filtering, and the calculation method comprises the following steps:
x̂_k^- = A·x̂_{k-1} + B·u_k
P_k^- = A·P_{k-1}·A^T + Q
where x̂_k^- and x̂_{k-1} respectively denote the prior state estimate at time k and the posterior state estimate at time k-1, A denotes the state transition matrix, B the transformation matrix from the input to the state variables, u_k the input at time k, P_k^- and P_{k-1} respectively the prior estimate covariance at time k and the posterior estimate covariance at time k-1, Q the system process covariance, and T the matrix transpose;
step S23: updating the state of the filter by using an updating formula of Kalman filtering, and obtaining a final Kalman prediction result, wherein the specific calculation method comprises the following steps:
K_k = P_k^-·H^T·(H·P_k^-·H^T + R)^(-1)
x̂_k = x̂_k^- + K_k·(z_k - H·x̂_k^-)
P_k = (I - K_k·H)·P_k^-
where K_k denotes the filter gain matrix (Kalman gain), H the transformation matrix from state variables to observed variables, R the measurement noise covariance, z_k the observed variable, P_k the posterior estimate covariance at time k, and x̂_k the posterior state estimate at time k, i.e. the optimal Kalman estimate sought;
step S24: the moving speed of a pedestrian and the size of its detection box cannot change much between two adjacent frames of a video, so the speed and moving direction of the pedestrian estimated in the previous frame by Kalman filtering are used to predict the pedestrian's position in the current frame;
L_k = L_{k-1} + ΔV_{k-1}
E_k = δ·(w, h)
M_k = L_k + E_k
where L_k and L_{k-1} are the positions of the target at time k and time k-1 respectively, ΔV_{k-1} denotes the target velocity estimated by Kalman filtering at time k-1, E_k denotes the expansion vector of the target, δ is the expansion coefficient, w and h denote the width and height of the target, and M_k is the target estimation domain;
step S25: compute the GIOU (generalized intersection over union) between the pedestrian detection results of the current frame and all target estimation domains to eliminate wrong matching results (based on the earlier assumption that the position of the same target does not change much between two adjacent frames), computed as:
IOU = (S_a ∩ S_b) / (S_a ∪ S_b)
GIOU = IOU - (S_c - (S_a ∪ S_b)) / S_c
where S_a denotes the bounding-box area of the first object, S_b the bounding-box area of the second object, and S_c the area of the smallest box enclosing S_a and S_b; a bounding-box area is computed as the rectangle's length multiplied by its width;
step S26: modify the initial cost matrix according to the IOU (intersection over union). Let the predefined IOU threshold be t. If the intersection over union IOU_ij of f_i and d_j is less than t, then:
c_ij = 0
if the IOU_ij of f_i and d_j is greater than or equal to t, then:
c_ij = c_ij × IOU_ij
that is, the allocation cost (appearance-feature weight) of a pair with IOU < t is set to 0, while that of a pair with IOU ≥ t is rescaled by the IOU: the closer the predicted and detected positions, the larger the weight and the higher the confidence.
In this embodiment, the step S3 specifically includes the following steps:
step S31: and calculating the optimal assignment of the modified cost matrix by using the Hungarian algorithm, wherein the optimal assignment must meet the following conditions:
max Σ_i Σ_j c_ij·bin_ij
Σ_j bin_ij ≤ 1, for every i
Σ_i bin_ij ≤ 1, for every j
where bin_ij is a binary decision variable, bin_ij ∈ {0, 1}: 0 means the i-th target feature is not matched to the j-th detection result, and 1 means it is matched;
step S32: calculating to obtain matching results according to the Hungarian algorithm in the step S31, judging whether all detection results are successfully matched with the tracked target, if the detection results are successfully matched with the tracked target, indicating that the detection results belong to the target, taking the target coordinates and the speed of the detection results as input, updating the state of a Kalman filter, and executing the step S33, otherwise executing the step S34;
step S33: inputting the successfully matched detection result into an appearance feature extraction network to extract a feature vector and update the historical appearance feature of the tracked target;
step S34: if a detection result cannot be matched with any tracked target, it corresponds to a new target; initialize a new target tracker and Kalman filter, and execute step S35. The target tracker contains information such as the target position, appearance features and life cycle; the Kalman filter is the module within the target tracker responsible only for predicting the target position. Initializing a new target tracker means computing this information (target position, appearance features, etc.) afresh for the new target.
Step S35: inputting the detection result of unsuccessful matching into the appearance feature extraction network to extract feature vectors and update the historical appearance features of the new target, and then turning to step S11.
Preferably, in the traditional multi-target tracking task, the Hungarian algorithm is efficient and simple to implement, so it is often used in the final association matching stage; but the traditional Hungarian algorithm only considers the similarity/distance measure between detection results and tracked targets, so matching errors easily occur. For example, two objects far apart in two adjacent images may be mistaken for one object because their appearances are extremely similar, causing a tracking failure. This embodiment therefore adds a spatial constraint, as in step S26, and predicts the position of the tracked target, as in step S24, which effectively solves the problem.
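The whole association step (appearance similarity, spatial gating, optimal assignment) can be sketched end to end; the helper names and all numeric values are illustrative assumptions, with toy 2-D features in place of re-ID vectors and fixed boxes in place of the Kalman-predicted estimation domains:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_sim(F, D):
    # Appearance similarity matrix (step S14).
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return F @ D.T

def iou(a, b):
    # Intersection over union of two (x, y, w, h) boxes (step S25).
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

track_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
det_feats   = np.array([[0.9, 0.1], [0.1, 0.9]])
track_boxes = [(0, 0, 10, 20), (100, 100, 10, 20)]   # predicted positions
det_boxes   = [(101, 101, 10, 20), (1, 1, 10, 20)]   # current detections

C = cosine_sim(track_feats, det_feats)               # appearance only
IOU = np.array([[iou(tb, db) for db in det_boxes] for tb in track_boxes])
C = np.where(IOU < 0.3, 0.0, C * IOU)                # spatial gating (step S26)

rows, cols = linear_sum_assignment(C, maximize=True) # step S31
for i, j in zip(rows, cols):
    if C[i, j] > 0:
        print(f"track {i} matched to detection {j}")
    else:
        print(f"detection {j} starts a new target")
```

In this toy data, appearance alone would match each track to the distant detection with the more similar feature; the spatial gate zeroes those implausible pairs, so each track is matched to the nearby detection instead, which is exactly the failure mode the spatial constraint removes.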
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (2)

1. A cost matrix optimization method based on space constraint under Hungarian algorithm is characterized in that: the method comprises the following steps:
step S1: acquiring an appearance characteristic vector set of all tracked targets of a current frame and a pedestrian detection result of the current frame, and combining the two sets to construct an initial cost matrix;
step S2: estimating the speed of the target in the previous frame by using Kalman filtering according to the target position and the speed of the tracked target, performing comparative analysis on the linearly estimated target position and the actually detected pedestrian, and modifying the weight of the initial cost matrix for subsequent assignment tasks, namely applying the weight to the step S3;
step S3: obtain the optimal assignment of the modified cost matrix by using the Hungarian algorithm, distribute the pedestrian detection results according to the optimal assignment, and judge which detection results belong to tracked targets and which belong to new targets; update the Kalman filter for a detection result belonging to a tracked target, initialize a new tracker for one belonging to a new target, and finally input the new targets into the appearance feature extraction network to obtain and store new appearance feature vectors;
the step S1 includes the following steps:
step S11: let the stored historical appearance feature set be F = {f_i | i = 1, 2, ..., M}, where M denotes the number of tracked targets and f_i, the feature of the i-th tracked target, is a 128-dimensional vector;
step S12: let the pedestrian detection result be R = {Kframe_k, Person_v, det_x, det_y, det_w, det_h, det_c | k = 1, 2, ..., U; v = 1, 2, ..., V}, where R represents the set of all detection results in a video sequence, U represents the number of image frames in the video sequence, V represents the number of detected pedestrians in one frame image, Kframe_k represents the k-th frame image in the video sequence, Person_v represents the v-th pedestrian in that frame image, det_x, det_y, det_w and det_h respectively represent the x and y coordinates of the upper-left corner of the pedestrian's detection box and the box's width and height, and det_c represents the confidence of the detection box;
step S13: let the confidence threshold of pedestrian detection be Th_d and the pedestrian width-to-height ratio threshold be Th_r, and delete the detection results satisfying:
det_c < Th_d or det_w / det_h > Th_r
step S14: inputting the detection results into the feature extraction network and likewise obtaining the appearance feature set D of the detection results, D = {d_j | j = 1,2,...,N}, where N represents the number of all detected pedestrians in one frame of image; calculating the distance between every pair of feature vectors in the pedestrian detection appearance feature set and the tracked target appearance feature set to obtain the initial cost matrix, set as

C = [c_ij], i = 1,2,...,N, j = 1,2,...,M
wherein c_ij ∈ C represents the distance between the appearance features of the i-th detection result and the j-th tracked target, also called the distribution cost; the distance between two appearance features is calculated with the cosine distance, as follows:

c_ij = 1 − (Σ_{s=1}^{128} G_s·E_s) / (√(Σ_{s=1}^{128} G_s²) · √(Σ_{s=1}^{128} E_s²))
wherein G_s and E_s respectively represent the s-th elements of the feature vectors G and E, with G ∈ F and E ∈ D;
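The cost-matrix construction of step S14 can be sketched as follows, assuming plain Python lists as feature vectors (in practice F and D would be 128-dimensional network embeddings; all function names are illustrative):

```python
import math

# Sketch of step S14: cosine-distance cost matrix between tracked-target
# features and detection features. Pure Python for clarity only.
def cosine_distance(g, e):
    """Cosine distance between feature vectors G and E."""
    dot = sum(g_s * e_s for g_s, e_s in zip(g, e))
    norm_g = math.sqrt(sum(g_s * g_s for g_s in g))
    norm_e = math.sqrt(sum(e_s * e_s for e_s in e))
    return 1.0 - dot / (norm_g * norm_e)

def build_cost_matrix(tracked_features, detection_features):
    """c_ij: distance between the i-th detection and the j-th tracked target."""
    return [[cosine_distance(f, d) for f in tracked_features]
            for d in detection_features]
```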
the step S3 specifically includes the following steps:
step S31: calculating the optimal assignment of the modified cost matrix by using the Hungarian algorithm, wherein the optimal assignment must satisfy the following conditions:

min Σ_{i=1}^{N} Σ_{j=1}^{M} bin_ij·c_ij

Σ_{j=1}^{M} bin_ij ≤ 1, i = 1,2,...,N

Σ_{i=1}^{N} bin_ij ≤ 1, j = 1,2,...,M

wherein bin_ij is a binary decision variable, bin_ij ∈ {0,1}; 0 represents that the i-th target feature is not matched with the j-th detection result, and 1 represents that the i-th target feature is matched with the j-th detection result;
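The assignment objective of step S31 can be made concrete with a brute-force search over permutations (the Hungarian algorithm reaches the same minimum in polynomial time; the function name and the square-matrix assumption are illustrative only):

```python
from itertools import permutations

# Brute-force illustration of the step S31 assignment conditions: choose
# bin_ij so that each row and each column is used at most once and the
# total cost is minimal. Only suitable for tiny matrices.
def optimal_assignment(cost):
    """Return (minimum total cost, tuple mapping row i -> column perm[i])."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # bin_ij = 1 exactly when perm[i] == j, which enforces the
        # one-match-per-row and one-match-per-column constraints.
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm
```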
step S32: calculating the matching results according to the Hungarian algorithm of step S31, and judging whether each detection result is successfully matched with a tracked target; if a detection result is successfully matched with a tracked target, indicating that the detection result belongs to that target, taking the target coordinates and velocity of the detection result as input, updating the state of the Kalman filter, and executing step S33; otherwise executing step S34;
step S33: inputting the successfully matched detection result into an appearance feature extraction network to extract a feature vector and update the historical appearance feature of the tracked target;
step S34: if a detection result cannot be matched with any tracked target, the detection result corresponds to a new target; a new target tracker and Kalman filter are initialized, and step S35 is executed;
step S35: and inputting the detection result which is not successfully matched into the appearance feature extraction network to extract feature vectors and update the historical appearance features of the new target, and turning to the step S11.
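The dispatch logic of steps S32 to S35 can be sketched as follows, assuming `matches` maps each detection index to a tracked-target index (or None when unmatched); every helper name (`extract_feature`, `update_kalman`, `init_tracker`) is a placeholder, not an API from the patent:

```python
# Hypothetical sketch of the S32-S35 flow: matched detections update an
# existing tracker; unmatched ones spawn a new tracker. Helper callables
# are injected so the sketch stays self-contained.
def dispatch(matches, detections, trackers, extract_feature,
             update_kalman, init_tracker):
    for j, det in enumerate(detections):
        i = matches.get(j)
        if i is not None:
            update_kalman(trackers[i], det)                 # S32: update filter
            trackers[i]["feature"] = extract_feature(det)   # S33: refresh feature
        else:
            t = init_tracker(det)                           # S34: new target
            t["feature"] = extract_feature(det)             # S35: store feature
            trackers.append(t)
    return trackers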
2. The cost matrix optimization method based on space constraint under the Hungarian algorithm according to claim 1, characterized in that step S2 specifically comprises the following steps:
step S21: let the information to be estimated of the tracked target be O = {color_x, color_y, color_w, color_h}, where color_x represents the estimated value of the x coordinate of the tracked target, color_y the estimated value of the y coordinate, color_w the estimated value of the width, and color_h the estimated value of the height;
step S22: the velocity of the target is estimated by using the prediction formulas of Kalman filtering, calculated as follows:

x̂_k^- = A·x̂_{k-1} + B·u_k

P_k^- = A·P_{k-1}·A^T + Q

wherein x̂_k^- and x̂_{k-1} respectively represent the prior state estimate at time k and the posterior state estimate at time k−1, A represents the state transition matrix, B represents the transformation matrix from the input to the state variable, u_k represents the input at time k, P_k^- and P_{k-1} respectively represent the prior estimate covariance at time k and the posterior estimate covariance at time k−1, Q represents the system process covariance, and T denotes the transpose of a matrix;
step S23: updating the state of the filter by using the update formulas of Kalman filtering to obtain the final Kalman prediction result, specifically calculated as follows:

K_k = P_k^-·H^T·(H·P_k^-·H^T + R)^{-1}

x̂_k = x̂_k^- + K_k·(z_k − H·x̂_k^-)

P_k = (I − K_k·H)·P_k^-

wherein K_k represents the filter gain matrix, H represents the transition matrix from the state variable to the observation variable, R represents the measurement noise covariance, z_k represents the observed variable, I represents the identity matrix, P_k represents the posterior estimate covariance at time k, and x̂_k represents the posterior estimate at time k, namely the solved optimal Kalman estimate;
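The predict and update formulas of steps S22 and S23 can be sketched in scalar form, with every matrix (A, B, H, Q, R) reduced to an illustrative scalar constant so each formula can be read directly; the defaults are arbitrary:

```python
# Scalar sketch of the Kalman filter formulas in steps S22-S23.
def kalman_predict(x_post, p_post, a=1.0, b=0.0, u=0.0, q=0.01):
    x_prior = a * x_post + b * u      # x^-_k = A x_{k-1} + B u_k
    p_prior = a * p_post * a + q      # P^-_k = A P_{k-1} A^T + Q
    return x_prior, p_prior

def kalman_update(x_prior, p_prior, z, h=1.0, r=0.1):
    k_gain = p_prior * h / (h * p_prior * h + r)   # filter gain K_k
    x_post = x_prior + k_gain * (z - h * x_prior)  # posterior estimate x_k
    p_post = (1.0 - k_gain * h) * p_prior          # posterior covariance P_k
    return x_post, p_post
```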
step S24: the moving speed and detection frame size of a pedestrian in two adjacent frames of the video cannot change abruptly, so the speed and moving direction of the pedestrian estimated by Kalman filtering in the previous frame are used to predict the position of the pedestrian in the current frame:
L_k = L_{k-1} + ΔV_{k-1}

E_k = δ·(w, h)

M_k = L_k + E_k

wherein L_k and L_{k-1} respectively represent the position of the target at time k and at time k−1, ΔV_{k-1} represents the target velocity estimated by Kalman filtering at time k−1, E_k represents the expansion vector of the target, δ is the expansion coefficient, w and h represent the width and height of the target, and M_k is the target estimation domain;
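The target estimation domain of step S24 can be sketched as follows. The exact form of the expansion vector E_k is given in a figure formula not recoverable from the text, so this assumes E_k = δ·(w, h) applied symmetrically around the predicted position; the function name and the default δ are illustrative:

```python
# Hypothetical sketch of step S24: predict the box position from the
# Kalman-estimated velocity and enlarge it by the expansion coefficient
# delta to form the target estimation domain M_k.
def estimation_domain(x, y, w, h, vx, vy, delta=0.2):
    # Predicted position: L_k = L_{k-1} + delta_V_{k-1}
    px, py = x + vx, y + vy
    # Assumed expansion vector E_k = (delta*w, delta*h), applied symmetrically
    ew, eh = delta * w, delta * h
    return (px - ew / 2, py - eh / 2, w + ew, h + eh)
```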
step S25: calculating the GIOU between the pedestrian detection results of the current frame and all target estimation domains, as follows:

IOU = (S_a ∩ S_b) / (S_a ∪ S_b)

GIOU = IOU − (S_c − (S_a ∪ S_b)) / S_c

wherein S_a represents the area of the bounding box of the first object, S_b represents the area of the bounding box of the second object, and S_c is the area of the smallest box enclosing S_a and S_b; the area of a bounding box is calculated as the length of the rectangle multiplied by the width of the rectangle;
step S26: modifying the initial cost matrix according to the IOU calculation result; let the predefined IOU threshold be t; if the IOU between f_i and d_j, denoted IOU_ij, is less than t, then:

c_ij = 0

and if IOU_ij between f_i and d_j is greater than or equal to t, then:

c_ij = c_ij × IOU_ij
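The GIOU computation of step S25 and the IOU-gated cost modification of step S26 can be sketched as follows, with boxes given as (x, y, w, h) tuples; the threshold t = 0.3 and all names are illustrative:

```python
# Sketch of steps S25-S26: IOU/GIOU between two axis-aligned boxes and the
# IOU-gated modification of an assignment cost c_ij.
def iou_giou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Intersection area (S_a ∩ S_b)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    # Union area (S_a ∪ S_b)
    union = aw * ah + bw * bh - inter
    iou = inter / union
    # Area S_c of the smallest box enclosing both
    sc = ((max(ax + aw, bx + bw) - min(ax, bx)) *
          (max(ay + ah, by + bh) - min(ay, by)))
    return iou, iou - (sc - union) / sc

def modify_cost(c_ij, iou_ij, t=0.3):
    # Patent rule: zero the cost below the IOU threshold, otherwise scale it.
    return 0.0 if iou_ij < t else c_ij * iou_ij
```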
CN202011128387.0A 2020-10-20 2020-10-20 Cost matrix optimization method based on space constraint under Hungary algorithm Active CN112528730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011128387.0A CN112528730B (en) 2020-10-20 2020-10-20 Cost matrix optimization method based on space constraint under Hungary algorithm


Publications (2)

Publication Number Publication Date
CN112528730A CN112528730A (en) 2021-03-19
CN112528730B true CN112528730B (en) 2022-06-10

Family

ID=74978936


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160274A (en) * 2021-04-19 2021-07-23 桂林电子科技大学 Improved deep sort target detection tracking method based on YOLOv4
CN113269098B (en) * 2021-05-27 2023-06-16 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN114782495B (en) * 2022-06-16 2022-10-18 西安中科立德红外科技有限公司 Multi-target tracking method, system and computer storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107292911A (en) * 2017-05-23 2017-10-24 南京邮电大学 A kind of multi-object tracking method merged based on multi-model with data correlation
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN111126152A (en) * 2019-11-25 2020-05-08 国网信通亿力科技有限责任公司 Video-based multi-target pedestrian detection and tracking method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10664705B2 (en) * 2014-09-26 2020-05-26 Nec Corporation Object tracking apparatus, object tracking system, object tracking method, display control device, object detection device, and computer-readable medium


Non-Patent Citations (3)

Title
Hengle Ren et al.; "Multi-pedestrian Tracking Based on Social Forces"; 2018 IEEE International Conference on Intelligence and Safety for Robotics (ISR); 2018-11-15; pp. 527-532 *
Ke Xiao et al.; "Automatic Image Annotation Based on Monte Carlo Dataset Balancing and Robust Incremental Extreme Learning Machine"; Acta Electronica Sinica; 2017-12-15; vol. 45, no. 12; pp. 2925-2935 *
Yang Tianyang; "Research on Visual Multi-Object Tracking Technology in Video Surveillance"; China Masters' Theses Full-text Database, Information Science and Technology; 2020-06-15; no. 06; pp. 1-91 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant