CN116246232A - Cross-border head and local feature strategy optimized vehicle multi-target tracking method - Google Patents

Info

Publication number: CN116246232A
Application number: CN202310260074.8A (priority application)
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: target, frame, tracking, vehicle, detection
Inventors: 曾地荣, 吴伟华, 叶桔
Applicant and current assignee: Jiangsu Huazhen Information Technology Co., Ltd.
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)

Classifications

    • G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; context analysis; selection of dictionaries
    • G06V10/763: Recognition using machine-learning clustering, non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V2201/07: Indexing scheme, target detection
    • Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)

Abstract

The invention discloses a vehicle multi-target tracking method optimized by a cross-border head (cross-camera) and local feature strategy, and relates to the technical field of intelligent video monitoring and security. The method comprises the following steps: the anchor boxes of a yolov7 detection model and the height and depth of its network are adjusted; given a traffic-scene picture, detection boxes and their confidence scores are obtained by inference, and the position of each vehicle is known from the coordinates of its detection box; targets are then tracked through an occlusion discrimination module, motion estimation, association matching and state updating. The invention proposes local feature re-identification to obtain a vehicle multi-target tracking method: local feature re-identification exploits the multiple attributes of a vehicle to strengthen the extraction of its visual features, specifically re-identifying local features within the same vehicle, such as the annual inspection mark or the tissue box inside it, thereby realizing local tracking of the vehicle and, further, tracking the target vehicle across camera views.

Description

Cross-border head and local feature strategy optimized vehicle multi-target tracking method
Technical Field
The invention belongs to the technical field of intelligent video monitoring and security, and particularly relates to a vehicle multi-target tracking method optimized by a cross-border head and local feature strategy.
Background
Multi-object tracking (MOT) is important in the field of video monitoring: it aims to detect targets in a video stream and estimate their spatio-temporal trajectories, and it plays an important role in traffic-security applications such as multi-target vehicle tracking and traffic-flow monitoring.
MOT generally comprises two parts, object detection and tracking; the tracking part consists of three steps: state estimation, data association and target positioning. Most current detection-based MOT methods build on the SORT and JDE families. These methods achieve good results, but still have the following disadvantages:
(1) SORT-based methods adopt Kalman filtering as the motion-state estimation model: the motion-estimation box for the next frame is predicted and associated with the detection box the detector produces for that frame, so that the track state can still be predicted under occlusion or missed detection. Since Kalman filtering is linear while real scenes are nonlinear, and since the tracked target is easily affected by occlusion and by its moving speed, the width and height of the estimated box become inaccurate. As with the IOU-based variant of SORT, tracking quality follows the quality of the prediction box; in a complex real scene, camera motion can push the prediction box to a wrong position, and the low overlap between detection and prediction boxes degrades tracking performance.
(2) Because of vehicle speed, camera resolution and shooting angle, high-quality vehicle pictures usually cannot be obtained. When vehicle identification fails, vehicle re-identification (ReID) becomes a very important fallback; but current ReID is affected by low image resolution, occlusion, viewing angle, pose changes, illumination changes, visual ambiguity and so on, which enlarges differences within the same ID and shrinks differences between different IDs, so that tracked vehicle targets get lost.
Disclosure of Invention
The invention aims to provide a vehicle multi-target tracking method optimized by cross-border head and local feature strategies, so as to solve the technical problems described in the Background section.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention relates to a vehicle multi-target tracking method optimized by cross-border head and local feature strategies, which comprises the following steps:
adjusting the anchor boxes of a yolov7 detection model and the height and depth of its network; given a traffic-scene picture, obtaining detection boxes by inference, giving each detection box a confidence score, and knowing the position of each vehicle from the coordinates of its detection box;
tracking the target through an occlusion discrimination module, motion estimation, association matching and state updating;
tracking a vehicle across multiple cameras by cross-shot tracking;
and counting the successfully tracked IDs, setting a time threshold, counting the number of vehicles passing within the specified time, monitoring the traffic flow in the scene, and providing timely feedback.
Further, the coordinate position of the detection frame is obtained as follows:

B_t^i = Dect_Model(I_t) = (x, y, w, h, confidence)

wherein Dect_Model represents the yolov7 detection model, I_t the t-th input frame, and B_t^i the detection result of the i-th target in the t-th frame; x, y, w and h give the position of the detection box, x and y being the coordinates of its top-left corner and w and h its width and height; confidence represents the confidence score of the box.
Further, the occlusion module identifies the degree of visibility of the current target by sampling the image area of each target and inputting it into the Yolov7 detection network.
Further, local feature extraction takes a reference frame and the current frame as input, generates visual features through a weight-sharing backbone, establishes pixel correspondences between the two frames with a feature interaction model, and then generates local features for each object, enhancing the robustness of the target's appearance features.
Further, motion estimation adds ECC camera motion compensation to supply the information on the detection-noise scale that Kalman filtering ignores, aligning the same target across adjacent frames: Kalman filtering learns the target's motion state in frame t-1 and estimates the target detection box of the next frame, recorded as the initial target box (x, y, w, h), to which camera compensation is then added.
Further, the camera motion compensation process first generates a 3×3 homography matrix I with the ECC algorithm, and then aligns the target in the next frame to the target in the previous frame through the homography association, improving tracking accuracy. The homography matrix I is calculated as follows:

I = ECC(J_{t-1}^i, J_t^i)

(x', y', 1)^T = I · (x, y, 1)^T

wherein J_{t-1}^i represents the i-th target image in frame t-1 and J_t^i the i-th target image in frame t; the calculated x, y values give the coordinates of the top-left and bottom-right corners of the target box.
Further, the association matching steps are as follows:
using appearance and motion information simultaneously to solve the association assignment problem;
using the generalized IOU as the motion cost matrix, comparing the generalized IOU distance between the detection box box_det and the target prediction box box_target;
and propagating the local reference target to the current frame to form a target prior, fusing the target prior features with the visual features, and sending the result to the local feature detection head to obtain the local tracking target of the vehicle.
Further, the motion cost matrix D uses a weighted sum of appearance and motion information, calculated as follows:

D = λ·D_a + (1 - λ)·D_m

wherein the weight factor λ is 0.95, D_a represents the appearance information and D_m the motion information; the matched and unmatched vehicle IDs are determined through a distance function;
under the space-time range constraint, targets within the IOU distance range are tracked, improving the detection rate of tracked targets; the generalized IOU function is:

GIoU(A, B) = IoU(A, B) - |C \ (A ∪ B)| / |C|

where C is the smallest box enclosing boxes A and B;
matching by using local feature reid according to similarity cosine distance D cos Identifying frame F for local characteristics to be re-identified part And a vehicle local feature base F all Judging, wherein the calculation formula is as follows:
Figure BDA0004130926630000042
further, the state update adopts an exponential moving average mode to update the appearance state of the ith track at the t frame, and the EMA update strategy not only improves the matching quality, but also reduces the time consumption, and the formula is as follows:
Figure BDA0004130926630000043
wherein ,
Figure BDA0004130926630000044
status of the t frame representing the i-th track,/->
Figure BDA0004130926630000045
Representing the current appearance state;
and judging and updating the tracking state of the target object in the current frame, and if the target state in the continuous 30 frames is judged to be state_delete, considering that tracking is lost.
Further, cross-shot tracking includes offline tracking and real-time online tracking;
offline tracking generates the motion trajectory of a target over multiple cameras and realizes cross-border-head target tracking through trajectory-to-trajectory matching;
real-time online tracking uses local ReID and a dynamic clustering algorithm as the motion cost matrix to build the links between local feature and target, target and trajectory, and trajectory and trajectory, realizing cross-border-head vehicle target tracking.
The invention has the following beneficial effects:
1. The invention proposes local feature re-identification to obtain a vehicle multi-target tracking method. Local feature re-identification exploits the multiple attributes of a vehicle to strengthen the extraction of its visual features, specifically re-identifying local features within the same vehicle (for example the annual inspection mark or the tissue box inside it), thereby realizing local tracking of the vehicle and, further, tracking the target vehicle across camera views.
2. The invention proposes a trajectory dynamic clustering algorithm based on space-time constraints: the space-time constraint models the spatio-temporal relationship between related cameras through the trajectory clustering algorithm, and the cross-camera trajectory features of target vehicles are dynamically clustered to constrain cross-border-head matching, strengthening cross-camera vehicle tracking.
3. The invention introduces camera motion compensation and occlusion discrimination to estimate the single-frame motion trajectory of a vehicle; camera motion compensation captures local motion well in uncontrolled environments, and its sensitivity to moving parts compensates for the estimation of nonlinear or variable-speed target motion; the occlusion module extracts local-feature ReID features under occlusion, improving estimation accuracy through the local ReID method.
4. The invention proposes a three-stage association matching strategy; the first and second stages weight the appearance-motion information and the generalized IOU distance function to construct the motion cost matrix between detections and trajectories; the third stage introduces local feature matching and constructs a motion cost matrix between local features and trajectories, improving the accuracy of the target to be tracked and reducing the possibility of tracking the wrong target.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a MOT network framework of the present invention;
fig. 2 is a schematic diagram of an initialization state setting according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1 and 2, the present invention is a vehicle multi-objective tracking method optimized by cross-border head and local feature strategies.
1. Target detection section
YOLOv7 surpasses currently known detectors in both speed and accuracy by using module re-parameterization and a dynamic label assignment strategy. For vehicle detection in traffic scenes, the invention starts from an existing pre-trained yolov7 model and fine-tunes the detection anchor boxes and the height and depth of the network, so that the smallest possible model footprint is kept while the best detection effect is achieved. Given a traffic-scene picture, the detection model infers detection boxes with confidence scores, and the position of each vehicle is known from the coordinates of its detection box. The detection result is shown in formula (1):
B_t^i = Dect_Model(I_t) = (x, y, w, h, confidence)    (1)

wherein Dect_Model represents the yolov7 detection model, I_t the t-th input frame, and B_t^i the detection result of the i-th target in the t-th frame; x, y, w and h give the position of the detection box, x and y being the coordinates of its top-left corner and w and h its width and height; confidence represents the confidence score of the box.
2. Target tracking section
Step 1: judging and shielding module
The occlusion module uses the multi-target detection bounding boxes B_r = {b_1, b_2, ..., b_N}, where N represents the number of targets; by sampling the image area of each target and inputting it into the Yolov7 detection network, it identifies the degree of visibility of the current target, and thereby the occlusion of each target at each moment.
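In the patent, the occlusion degree is estimated by the detection network itself; as a rough, hypothetical stand-in, a purely geometric visibility score can be computed from the bounding boxes alone (this pairwise approximation is ours, not the patent's):

```python
def inter_area(a, b):
    """Intersection area of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return iw * ih


def visibility(box, others):
    """Fraction of `box` not covered by the single most-overlapping other box.

    A crude sketch: overlapping occluders are not merged, so this is only an
    upper bound on the truly visible fraction.
    """
    area = box[2] * box[3]
    covered = max((inter_area(box, o) for o in others), default=0.0)
    return 1.0 - covered / area
```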
Local feature extraction takes a reference frame and the current frame as input. Visual features are generated through a weight-sharing backbone (two identical networks using the same model weights, to ensure process consistency); pixel correspondences between the two frames are then established with a feature interaction model, specifically by comparing pixel-level distances between the frames. To improve accuracy, the target's image area is interpolated with Gaussian filtering to 128×96 before being sent to the backbone network. Local features are then generated for each object, enhancing the robustness of the target's appearance features; robustness here means generalization ability, i.e. a good detection effect on all targets.
In this embodiment, the backbone network is an existing network, specifically, a Resnet network, which is an existing mature technology.
Step 2: motion estimation
The Kalman filtering method suits non-occluded scenes and targets moving at uniform speed, and ignores information on the detection-noise scale; adding ECC camera motion compensation aligns the same target across adjacent frames and well overcomes the weakness of Kalman estimation on variable-speed targets.
First, the target's motion state in frame t-1 is learned by Kalman filtering, and the target detection box of the next frame is estimated and recorded as the initial target box (x, y, w, h). Camera compensation is then added on top of this initial box: the camera motion compensation process first generates a 3×3 homography matrix I with the existing ECC algorithm, and then builds the homography association through an image registration function so that the target in the next frame is aligned with the target in the previous frame, improving tracking accuracy.
In this embodiment, the implementation of homography association is implemented by constructing an image registration function in opencv in the prior art.
The calculation process is shown in formula (2):

I = ECC(J_{t-1}^i, J_t^i)    (2)

(x', y', 1)^T = I · (x, y, 1)^T

wherein J_{t-1}^i represents the i-th target image in frame t-1 and J_t^i the i-th target image in frame t; the calculated x, y values give the coordinates of the top-left and bottom-right corners of the target box.
After the target frame is obtained, the current state is initialized, and the pseudo code process is shown in fig. 2.
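The motion-estimation step can be sketched as a constant-velocity prediction followed by warping the predicted box with the homography I; in practice I would come from an ECC routine such as OpenCV's findTransformECC, and the simplified state layout below is our assumption, not the patent's:

```python
def kf_predict(state):
    """Constant-velocity prediction step; state = (x, y, w, h, vx, vy)."""
    x, y, w, h, vx, vy = state
    return (x + vx, y + vy, w, h, vx, vy)


def warp_point(H, pt):
    """Apply a 3x3 homography (list of rows) to a 2D point."""
    x, y = pt
    xn = H[0][0] * x + H[0][1] * y + H[0][2]
    yn = H[1][0] * x + H[1][1] * y + H[1][2]
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xn / s, yn / s)


def compensate_box(H, box):
    """Camera motion compensation: warp the predicted box's top-left and
    bottom-right corners into the previous frame's coordinate system."""
    x, y, w, h = box
    x1, y1 = warp_point(H, (x, y))
    x2, y2 = warp_point(H, (x + w, y + h))
    return (x1, y1, x2 - x1, y2 - y1)
```

For a pure translation homography the box is simply shifted; a full Kalman filter would also propagate the covariance, which this sketch omits.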
Step 3: correlation matching stage
The invention proposes a three-stage matching strategy. The first stage uses appearance and motion information simultaneously to solve the assignment problem; the motion cost matrix D uses a weighted sum of appearance and motion information, as shown in formula (3):

D = λ·D_a + (1 - λ)·D_m    (3)

wherein the weight factor λ is 0.95, D_a represents the appearance information and D_m the motion information; the matched and unmatched vehicle IDs are determined through a distance function. The IDs left unmatched, together with the unmatched vehicle IDs of the previous frame, include IDs that have low confidence but can still be detected.
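The fusion of formula (3) is an elementwise weighted sum over the tracks-by-detections cost matrices; a minimal sketch:

```python
def fuse_costs(appearance_cost, motion_cost, lam=0.95):
    """D = lam * D_a + (1 - lam) * D_m, elementwise.

    Both inputs are tracks-by-detections matrices (lists of rows); lam = 0.95
    follows the weight factor stated in the text.
    """
    return [[lam * a + (1.0 - lam) * m for a, m in zip(row_a, row_m)]
            for row_a, row_m in zip(appearance_cost, motion_cost)]
```

The fused matrix would then be fed to a standard assignment solver (e.g. the Hungarian algorithm) to decide matched and unmatched IDs.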
In the second stage, the generalized IOU is used as the motion cost matrix, comparing the detection box box_det and the target prediction box box_target; under the space-time range constraint, targets within the IOU distance range are tracked, further improving the detection rate of tracked targets. The generalized IOU function is shown in formula (4):

GIoU(A, B) = IoU(A, B) - |C \ (A ∪ B)| / |C|    (4)

where C is the smallest box enclosing boxes A and B.
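The standard generalized IoU of formula (4) can be computed directly from two (x, y, w, h) boxes; a self-contained sketch:

```python
def giou(a, b):
    """Generalized IoU of two (x, y, w, h) boxes.

    GIoU = IoU - |C \\ (A u B)| / |C|, where C is the smallest box
    enclosing both A and B; the value lies in (-1, 1].
    """
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / union if union > 0 else 0.0
    cw = max(ax2, bx2) - min(ax1, bx1)  # enclosing box width
    ch = max(ay2, by2) - min(ay1, by1)  # enclosing box height
    c = cw * ch
    return iou - (c - union) / c if c > 0 else iou
```

Unlike plain IoU, GIoU stays informative (negative) for non-overlapping boxes, which is why it works as a distance for the second matching stage.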
the third stage is to propagate the local reference target to the current frame to form a target prior, then to fuse the target prior feature with the visual feature, and send the fused target prior feature to the local feature detection head to obtain the local tracking target of the vehicle, and finally to use the local feature reid to match, according to the similarity cosine distance D cos Identifying frame F for local characteristics to be re-identified part And a vehicle local feature base F all And judging, wherein the process is as shown in a formula (5):
Figure BDA0004130926630000092
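The ReID check of formula (5) reduces to a cosine distance between the feature to be re-identified and each entry of the local-feature base; the 0.3 acceptance threshold below is an illustrative assumption:

```python
import math


def cosine_distance(u, v):
    """D_cos = 1 - cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def match_local(query, gallery, thresh=0.3):
    """Index of the closest gallery feature, or None if none is within thresh."""
    best, best_d = None, thresh
    for i, feat in enumerate(gallery):
        d = cosine_distance(query, feat)
        if d < best_d:
            best, best_d = i, d
    return best
```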
step 4: status update phase
The appearance state of the i-th track at frame t is updated with an exponential moving average; the EMA update strategy both improves matching quality and reduces time consumption, as shown in formula (6):

e_t^i = α·e_{t-1}^i + (1 - α)·f_t^i    (6)

wherein e_t^i represents the appearance state of the i-th track at frame t, f_t^i represents the current appearance embedding, and α is a hyper-parameter, typically 0.8.
In addition, the tracking state of the current frame's target object is judged and updated; if the target state over 30 consecutive frames is judged to be state_delete, the track is considered lost.
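Formula (6) and the 30-frame deletion rule can be sketched together; the Track class layout below is our illustration, not the patent's data structure:

```python
def ema_update(prev_state, current_feat, alpha=0.8):
    """e_t = alpha * e_{t-1} + (1 - alpha) * f_t, per embedding dimension."""
    return [alpha * p + (1.0 - alpha) * c
            for p, c in zip(prev_state, current_feat)]


class Track:
    """Minimal track whose state flips to lost after 30 straight misses."""

    MAX_MISSED = 30

    def __init__(self, appearance):
        self.appearance = list(appearance)
        self.missed = 0
        self.lost = False

    def step(self, matched, feat=None):
        if matched:
            self.missed = 0
            self.appearance = ema_update(self.appearance, feat)
        else:
            self.missed += 1
            if self.missed >= self.MAX_MISSED:
                self.lost = True  # state_delete held for 30 consecutive frames
```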
3. Cross-border head tracking
Cross-shot tracking comprises two mechanisms, offline tracking and real-time online tracking. Offline tracking generates the motion trajectory of a target over multiple cameras and realizes cross-border-head target tracking through trajectory-to-trajectory matching. Real-time online tracking uses local ReID and a dynamic clustering algorithm as the motion cost matrix to build the links between local feature and target, target and trajectory, and trajectory and trajectory, realizing cross-border-head vehicle target tracking.
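The dynamic clustering for online cross-camera association can be sketched as a greedy single-pass scheme over track appearance features; the cosine metric and 0.3 threshold are illustrative assumptions, not the patent's exact algorithm:

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_tracks(features, thresh=0.3):
    """Assign each track feature to the first cluster within thresh of its
    founding feature, else open a new cluster; the returned cluster ids act
    as cross-camera vehicle identities."""
    founders, labels = [], []
    for f in features:
        for cid, c in enumerate(founders):
            if cosine_distance(f, c) < thresh:
                labels.append(cid)
                break
        else:
            founders.append(f)
            labels.append(len(founders) - 1)
    return labels
```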
4. Traffic flow monitoring part
First, the successfully tracked IDs are counted; then a time threshold is set, the number of vehicles passing within the specified time is counted, the traffic flow in the scene is monitored, and timely feedback is given.
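The traffic-flow count can be sketched as counting distinct successfully tracked IDs seen inside a sliding time window; timestamps in seconds and the window parameter are illustrative:

```python
def count_flow(events, window):
    """events: (timestamp, track_id) pairs for successfully tracked vehicles.

    Returns how many distinct vehicle IDs appear within `window` seconds of
    the most recent event, i.e. the flow over the specified time threshold.
    """
    if not events:
        return 0
    now = max(t for t, _ in events)
    return len({tid for t, tid in events if now - t <= window})
```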
The following benefits are achieved when the present embodiment is in use:
1. Local feature re-identification is proposed to obtain the vehicle multi-target tracking method. Local feature re-identification exploits the multiple attributes of a vehicle to strengthen the extraction of its visual features, specifically re-identifying local features within the same vehicle (for example the annual inspection mark or the tissue box inside it), thereby realizing local tracking of the vehicle and, further, tracking the target vehicle across camera views.
2. A trajectory dynamic clustering algorithm based on space-time constraints is proposed: the space-time constraint models the spatio-temporal relationship between related cameras through the trajectory clustering algorithm, and the cross-camera trajectory features of target vehicles are dynamically clustered to constrain cross-border-head matching, strengthening cross-camera vehicle tracking.
3. To improve estimation accuracy, camera motion compensation and occlusion discrimination are introduced to estimate the single-frame motion trajectory of a vehicle. Camera motion compensation captures local motion well in uncontrolled environments, and its sensitivity to moving parts compensates for the estimation of nonlinear or variable-speed target motion; the occlusion module extracts local-feature ReID features under occlusion, improving estimation accuracy through the local ReID method.
4. To increase the number of target-vehicle trajectories, a three-stage association matching strategy is proposed. The first and second stages weight the appearance-motion information and the generalized IOU distance function to construct the motion cost matrix between detections and trajectories; the third stage introduces local feature matching and constructs a motion cost matrix between local features and trajectories, improving the accuracy of the target to be tracked and reducing the possibility of tracking the wrong target.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A cross-border head and local feature policy optimized vehicle multi-target tracking method is characterized by comprising the following steps:
adjusting the anchor boxes of a yolov7 detection model and the height and depth of its network; obtaining detection boxes by inference from a traffic-scene picture, giving each detection box a confidence score, and knowing the position of each vehicle from the coordinates of its detection box;
tracking the target through an occlusion discrimination module, motion estimation, association matching and state updating;
tracking a vehicle across multiple cameras by cross-shot tracking;
and counting the successfully tracked IDs, setting a time threshold, counting the number of vehicles passing within the specified time, monitoring the traffic flow in the scene, and providing timely feedback.
2. The cross-border head and local feature policy optimized vehicle multi-target tracking method of claim 1, wherein the coordinate position of the detection frame is obtained as follows:

B_t^i = Dect_Model(I_t) = (x, y, w, h, confidence)

wherein Dect_Model represents the yolov7 detection model, I_t the t-th input frame, and B_t^i the detection result of the i-th target in the t-th frame; x, y, w and h give the position of the detection box, x and y being the coordinates of its top-left corner and w and h its width and height; confidence represents the confidence score of the box.
3. The cross-border head and local feature policy optimized vehicle multi-target tracking method of claim 1, wherein the occlusion module identifies the degree of visibility of the current target by sampling the image area of each target and inputting it into the Yolov7 detection network.
4. The vehicle multi-target tracking method optimized by cross-border head and local feature strategies according to claim 3, wherein local feature extraction takes a reference frame and the current frame as input, generates visual features through a weight-sharing backbone, establishes pixel correspondences between the two frames with a feature interaction model, and generates local features for each object, enhancing the robustness of the target's appearance features.
5. The vehicle multi-target tracking method optimized by cross-border head and local feature strategies according to claim 1, wherein the motion estimation adds ECC camera motion compensation to supply the information on the detection-noise scale that Kalman filtering ignores, aligning the same target across adjacent frames; Kalman filtering learns the target's motion state in frame t-1 and estimates the target detection box of the next frame, recorded as the initial target box (x, y, w, h), to which camera compensation is then added.
6. The vehicle multi-target tracking method optimized by cross-border head and local feature strategies according to claim 5, wherein the camera motion compensation process firstly uses an ECC algorithm to generate a 3×3 homography matrix I, and the homography matrix I is calculated as follows by aligning a next frame of image target with a previous frame of image target through homography association to improve tracking accuracy:
[Formula images FDA0004130926590000021 and FDA0004130926590000022, which define the homography mapping, are not reproduced in the published text.]
wherein the symbol rendered as image FDA0004130926590000023 represents the i-th target image in the (t-1)-th frame, and the symbol rendered as image FDA0004130926590000024 represents the i-th target image in the t-th frame; the computed x, y are the coordinates of the upper-left and lower-right corners of the target box.
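Once ECC has produced the 3×3 homography (e.g. via OpenCV's cv2.findTransformECC between consecutive frames), applying it to a predicted box reduces to warping its two corners. A sketch assuming the matrix is already given:

```python
import numpy as np

def compensate_box(box, H):
    """Warp a (x1, y1, x2, y2) box by a 3x3 homography H.

    H would come from ECC image alignment between frame t-1 and
    frame t; here it is taken as given. The two box corners are
    mapped as homogeneous points and re-normalized.
    """
    x1, y1, x2, y2 = box
    pts = np.array([[x1, y1, 1.0], [x2, y2, 1.0]]).T  # 3x2 column points
    warped = H @ pts
    warped /= warped[2]                               # perspective divide
    (nx1, nx2), (ny1, ny2) = warped[0], warped[1]
    return nx1, ny1, nx2, ny2
```

With an identity matrix the box is unchanged; a translation homography shifts both corners, which is exactly the camera-motion correction added on top of the Kalman prediction.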
7. The cross-border head and local feature policy optimized vehicle multi-target tracking method of claim 1, wherein the association matching steps are as follows:
solving the association assignment problem using appearance and motion information simultaneously;
using the generalized IoU distance between the detection box box_det and the target prediction box box_target as the motion cost matrix D;
propagating the local reference target to the current frame to form a target prior, fusing the target prior features with the visual features, and feeding the fused features to the local feature detection head to obtain the local tracking target of the vehicle.
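The assignment step above can be sketched with a greedy matcher over the cost matrix; a production implementation would typically use the Hungarian algorithm (scipy.optimize.linear_sum_assignment), and the max_cost gate here is an illustrative threshold, not a value from the patent:

```python
import numpy as np

def greedy_assign(cost, max_cost=0.7):
    """Greedy one-to-one assignment on a tracks x detections cost matrix.

    Repeatedly takes the cheapest remaining (track, detection) pair
    whose cost is below max_cost. Returns the matched pairs plus the
    unmatched track and detection indices, which feed track creation
    and deletion.
    """
    cost = np.asarray(cost, dtype=float)
    matches, used_t, used_d = [], set(), set()
    # all (row, col) index pairs sorted by ascending cost
    order = np.dstack(np.unravel_index(np.argsort(cost, axis=None), cost.shape))[0]
    for t, d in order:
        if int(t) in used_t or int(d) in used_d or cost[t, d] > max_cost:
            continue
        matches.append((int(t), int(d)))
        used_t.add(int(t)); used_d.add(int(d))
    un_t = [t for t in range(cost.shape[0]) if t not in used_t]
    un_d = [d for d in range(cost.shape[1]) if d not in used_d]
    return matches, un_t, un_d
```

Feeding it the fused cost D = λ·D_a + (1−λ)·D_m yields matched vehicle IDs and leftovers in one pass.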
8. The cross-border head and local feature policy optimized vehicle multi-target tracking method of claim 7, wherein the cost matrix D is a weighted sum of appearance and motion information, calculated as follows:
D = λ·D_a + (1 − λ)·D_m
wherein the weight factor λ is 0.95, D_a represents the appearance information, and D_m represents the motion information; matched and unmatched vehicle IDs are determined through the distance function;
under the space-time constraint, targets within the IoU distance range are tracked, improving the detection rate of tracked targets; the generalized IoU function (formula image FDA0004130926590000031 in the original, reconstructed here in its standard form) is:
GIoU(A, B) = |A ∩ B| / |A ∪ B| − |C \ (A ∪ B)| / |C|
where C is the smallest box enclosing both A and B;
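The generalized IoU can be implemented directly from that formula; this sketch follows the textbook definition rather than any patent-specific variant:

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes.

    GIoU subtracts from the IoU the fraction of the smallest enclosing
    box C not covered by the union, so it stays informative (goes
    negative) even when the boxes do not overlap at all.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    cw = max(ax2, bx2) - min(ax1, bx1)   # enclosing box width
    ch = max(ay2, by2) - min(ay1, by1)   # enclosing box height
    c = cw * ch
    return iou - (c - union) / c if c > 0 else iou
```

The motion cost for association is then typically taken as 1 − GIoU, so closer boxes get lower cost.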
matching is performed with the local-feature ReID according to the similarity cosine distance D_cos, judging the local feature box to be re-identified, F_part, against the vehicle local feature base F_all; the calculation (formula image FDA0004130926590000032 in the original, reconstructed here as the standard cosine distance) is:
D_cos = 1 − (F_part · F_all) / (‖F_part‖ · ‖F_all‖)
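A sketch of the cosine-distance ReID check, assuming F_part is a single feature vector and F_all a row-per-vehicle gallery matrix (the shapes are illustrative, not specified by the claim):

```python
import numpy as np

def cosine_distance(f_part, f_all):
    """Cosine distance between a query local feature and a gallery.

    f_part: (d,) feature of the box to re-identify.
    f_all:  (n, d) vehicle local-feature base.
    Returns (n,) distances D_cos = 1 - cos_sim; the smallest entry
    points at the best vehicle ID candidate.
    """
    f_part = f_part / np.linalg.norm(f_part)
    f_all = f_all / np.linalg.norm(f_all, axis=1, keepdims=True)
    return 1.0 - f_all @ f_part
```

Thresholding the minimum distance decides whether the local feature re-attaches to an existing vehicle ID or spawns a new one.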
9. The cross-border head and local feature policy optimized vehicle multi-target tracking method according to claim 1, wherein the state update refreshes the appearance state of the i-th track at the t-th frame in an exponential moving average (EMA) manner; the EMA update policy both improves matching quality and reduces time consumption. The formula (image FDA0004130926590000033 in the original, reconstructed here as the standard EMA update) is:
e_t^i = α·e_{t−1}^i + (1 − α)·f_t^i
wherein e_t^i represents the state of the i-th track at the t-th frame, f_t^i represents the current appearance state, and α is a hyperparameter;
the tracking state of the target object in the current frame is judged and updated, and a target is considered lost when its state is judged State_Delete for 30 consecutive frames.
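The EMA update is a one-line operation; this sketch assumes the embeddings are plain vectors, and alpha=0.9 is only a placeholder value, not the patent's hyperparameter:

```python
import numpy as np

def ema_update(e_prev, f_cur, alpha=0.9):
    """Exponential-moving-average update of a track's appearance state.

    e_prev: previous appearance embedding of the track.
    f_cur:  embedding of the newly matched detection.
    One O(d) update per frame replaces keeping a gallery of past
    features, which is why EMA both smooths the appearance state and
    cuts the matching time.
    """
    return alpha * np.asarray(e_prev) + (1.0 - alpha) * np.asarray(f_cur)
```

Each matched track calls this once per frame; unmatched tracks keep their previous state until deletion.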
10. The cross-border head and local feature policy optimized vehicle multi-target tracking method of claim 1, wherein cross-border head tracking comprises offline tracking and real-time online tracking;
the offline tracking generates target motion trajectories from multiple cameras and realizes cross-border head target tracking through trajectory-to-trajectory matching;
the real-time online tracking uses the local ReID together with a dynamic clustering algorithm as the motion cost matrix to build the links between local features and targets, between targets and trajectories, and between trajectories, thereby realizing cross-border head vehicle target tracking.
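The claim does not specify the dynamic clustering; as a minimal stand-in, single-camera tracks can be merged into global vehicle IDs by thresholding pairwise cosine distance with union-find (the 0.3 threshold is illustrative):

```python
import numpy as np

def link_tracks(embeddings, thresh=0.3):
    """Group per-camera tracks into global vehicle IDs.

    embeddings: (n, d) mean appearance feature of each single-camera
    track. Tracks whose pairwise cosine distance falls below thresh
    are merged with union-find, a minimal stand-in for the dynamic
    clustering mentioned above. Returns one global ID per track.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - e @ e.T
    parent = list(range(len(e)))

    def find(i):  # path-halving union-find root lookup
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n = len(e)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < thresh:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]
```

Tracks of the same vehicle seen from different cameras then share one ID, which is the goal of the cross-border head association.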
CN202310260074.8A 2023-03-16 2023-03-16 Cross-border head and local feature strategy optimized vehicle multi-target tracking method Pending CN116246232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310260074.8A CN116246232A (en) 2023-03-16 2023-03-16 Cross-border head and local feature strategy optimized vehicle multi-target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310260074.8A CN116246232A (en) 2023-03-16 2023-03-16 Cross-border head and local feature strategy optimized vehicle multi-target tracking method

Publications (1)

Publication Number Publication Date
CN116246232A true CN116246232A (en) 2023-06-09

Family

ID=86626039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310260074.8A Pending CN116246232A (en) 2023-03-16 2023-03-16 Cross-border head and local feature strategy optimized vehicle multi-target tracking method

Country Status (1)

Country Link
CN (1) CN116246232A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453103A (en) * 2023-06-15 2023-07-18 松立控股集团股份有限公司 Vehicle cross-mirror tracking license plate recognition method, system and electronic equipment
CN117541620A (en) * 2023-11-02 2024-02-09 杭州像素元科技有限公司 Cross-camera multi-vehicle tracking method combining road topological structure and overlapping view fields


Similar Documents

Publication Publication Date Title
CN116246232A (en) Cross-border head and local feature strategy optimized vehicle multi-target tracking method
CN101950426B (en) Vehicle relay tracking method in multi-camera scene
Kastrinaki et al. A survey of video processing techniques for traffic applications
CN112258600A (en) Simultaneous positioning and map construction method based on vision and laser radar
CN108445480A (en) Mobile platform based on laser radar adaptively extends Target Tracking System and method
CN113506317B (en) Multi-target tracking method based on Mask R-CNN and apparent feature fusion
CN106022263B (en) A kind of wireless vehicle tracking of fusion feature matching and optical flow method
CN107292911A (en) A kind of multi-object tracking method merged based on multi-model with data correlation
CN101295405A (en) Portrait and vehicle recognition alarming and tracing method
CN106204484A (en) A kind of traffic target tracking based on light stream and local invariant feature
CN111882586B (en) Multi-actor target tracking method oriented to theater environment
CN112991391A (en) Vehicle detection and tracking method based on radar signal and vision fusion
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
CN111161309B (en) Searching and positioning method for vehicle-mounted video dynamic target
Hamdi et al. Drotrack: High-speed drone-based object tracking under uncertainty
Zhu et al. Tracking multiple objects through occlusion with online sampling and position estimation
CN110555867B (en) Multi-target object tracking method integrating object capturing and identifying technology
CN106127798B (en) Dense space-time contextual target tracking based on adaptive model
CN112967320B (en) Ship target detection tracking method based on bridge anti-collision
Veeraraghavan et al. Switching kalman filter-based approach for tracking and event detection at traffic intersections
CN109344685A (en) A kind of wisdom pallet and its intelligent positioning method for tracing
CN109102520A (en) The moving target detecting method combined based on fuzzy means clustering with Kalman filter tracking
CN104240268B (en) A kind of pedestrian tracting method based on manifold learning and rarefaction representation
CN104537690B (en) One kind is based on the united moving spot targets detection method of maximum time index
CN110349184A (en) The more pedestrian tracting methods differentiated based on iterative filtering and observation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination