CN117576166A - Target tracking method and system based on camera and low-frame-rate laser radar - Google Patents
Target tracking method and system based on camera and low-frame-rate laser radar
- Publication number
- CN117576166A CN117576166A CN202410054817.0A CN202410054817A CN117576166A CN 117576166 A CN117576166 A CN 117576166A CN 202410054817 A CN202410054817 A CN 202410054817A CN 117576166 A CN117576166 A CN 117576166A
- Authority
- CN
- China
- Prior art keywords
- target
- detection target
- matching
- confidence
- matching cost
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a target tracking method and a target tracking system based on a camera and a low-frame-rate laser radar, wherein the method comprises the following steps: S1, acquiring an image and point cloud data at the current moment; S2, identifying a 2D detection target in the image; when the point cloud data is not empty, identifying a 3D detection target in the point cloud data; when the point cloud data is empty, taking the 2D detection target as an actual detection target and turning to step S4; S3, projecting the 3D detection target into a two-dimensional coordinate system of the image, and fusing the 3D detection target with the corresponding 2D detection target to obtain an actual detection target; S4, predicting a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result by adopting a Hungary algorithm according to a matching threshold and the total matching cost; S5, carrying out state updating and image feature updating on the successfully matched predicted track target. The method ensures stable and accurate target tracking at low cost.
Description
Technical Field
The invention relates to the field of target tracking, in particular to a target tracking method and system based on a camera and a low-frame-rate laser radar.
Background
With the development of deep learning technology, target detection and tracking technology is increasingly applied in the security industry, and the specific requirements of that industry in turn place higher demands on target detection and tracking. For example, in a dangerous workplace it is necessary to determine whether a target is close to a dangerous area based on the target's 3D motion trajectory, and in a confidential place it is necessary to determine the intention of a target based on its 3D motion trajectory.
To meet these needs, the prior art offers two main solutions. The first is a lidar-based 3D target tracking method, which predicts the 3D spatial position of a target from the point cloud and tracks the 3D target based on the target's consecutive 3D spatial positions or point cloud features. However, such methods require a high-frame-rate lidar together with a high-performance computing platform that can process the high-frame-rate point cloud data in real time, which leads to higher costs. The second is a camera-based 3D target tracking method, which detects and tracks the target from the image while using a lidar to acquire the 3D spatial position of the image target, thereby tracking the 3D target. However, this method relies entirely on the image, is easily affected by environmental factors such as illumination and weather, and performs poorly in practical applications.
There is no effective solution to the above problems in the prior art.
Disclosure of Invention
In order to solve the above problems, the invention provides a target tracking method and a target tracking system based on a camera and a low-frame-rate laser radar, which track a target by processing point cloud data acquired by the low-frame-rate laser radar and images acquired by the camera, and perform target matching of adjacent frames by fusing the 2D detection frame, 3D position and image features of the target, so as to solve the problems of high cost and unstable tracking effect in the prior art.
In order to achieve the above object, the present invention provides a target tracking method based on a camera and a low frame rate lidar, comprising: s1, acquiring an image and point cloud data at the current moment; s2, detecting a target in the image by adopting a first identification model to obtain a 2D detection target; when the point cloud data are not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, taking the 2D detection target as an actual detection target and transferring to step S4; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively; s3, projecting the 3D detection target into a two-dimensional coordinate system of the image, and fusing the 3D detection target with a corresponding 2D detection target to obtain an actual detection target; s4, carrying out state prediction on the track target at the previous moment to obtain a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result of the predicted track target and the actual detection target by adopting a Hungary algorithm according to a matching threshold and the total matching cost; s5, carrying out state updating and image feature updating on the successfully matched predicted track target according to the matched actual detection target.
Further optionally, the projecting the 3D detection target into the two-dimensional coordinate system of the image and fusing with the corresponding 2D detection target includes: S301, projecting a three-dimensional target frame of the 3D detection target into the two-dimensional coordinate system of the image according to the camera intrinsic parameters and the extrinsic parameters relative to the laser radar to obtain a two-dimensional target frame; S302, calculating the matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances to obtain a matching score matrix of all the two-dimensional target frames and all the target frames of the 2D detection targets; S303, obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and a matching score threshold; S304, fusing the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
Further optionally, the obtaining, by using a hungarian algorithm, a matching result between the predicted track target and the actual detection target according to the matching threshold and the total matching cost includes: s401, marking an actual detection target with the fused confidence coefficient higher than or equal to a first confidence coefficient threshold value as a high confidence coefficient target, and marking an actual detection target with the fused confidence coefficient higher than or equal to a second confidence coefficient threshold value and lower than the first confidence coefficient threshold value as a low confidence coefficient target; wherein the first confidence threshold is greater than the second confidence threshold; s402, matching the high-confidence-degree target with a predicted track target by adopting a Hungary algorithm according to a first matching cost threshold and matching cost to obtain a predicted track target corresponding to the high-confidence-degree target; s403, taking the high-confidence-degree targets and the low-confidence-degree targets which are not successfully matched as other targets, and matching the other targets with the unmatched predicted track targets by adopting a Hungary algorithm according to a second matching cost threshold and matching cost to obtain predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
Further optionally, the calculating the total matching cost of the predicted track target and the actual detection target includes: s404, calculating the IOU matching cost of the target frame of the 2D detection target corresponding to the actual detection target and the 2D target frame of the predicted track target; s405, calculating feature matching cost of the image feature vector of the predicted track target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model; s406, calculating the distance matching cost of the target frame of the 3D detection target corresponding to the actual detection target and the 3D target frame of the predicted track target; s407, weighting and summing the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
Further optionally, after the performing state update and image feature update on the successfully matched predicted track target according to the matched actual detection target, the method includes: s6, for the predicted track target which is not successfully matched, adding one to the lost frame number of the predicted track target, and deleting the corresponding target when the lost frame number corresponding to the predicted track target is higher than a frame number threshold; s7, for an actual detection target which is not successfully matched, initializing a new track when the actual detection target corresponds to a 3D detection target and the confidence coefficient is higher than a third confidence coefficient threshold value; otherwise, no initialization is performed.
On the other hand, the invention also provides a target tracking system based on the camera and the low frame rate laser radar, which comprises: the data acquisition module is used for acquiring the image and the point cloud data at the current moment; the target identification module is used for detecting a target in the image by adopting a first identification model to obtain a 2D detection target; when the point cloud data are not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, transferring the 2D detection target as an actual detection target to a track target matching module for processing; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively; the fusion module is used for projecting the 3D detection target into a two-dimensional coordinate system of the image and fusing the 3D detection target with a corresponding 2D detection target to obtain an actual detection target; the track target matching module is used for carrying out state prediction on the track target at the previous moment to obtain a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result of the predicted track target and the actual detection target by adopting a Hungary algorithm according to a matching threshold and the total matching cost; and the updating module is used for carrying out state updating and image characteristic updating on the successfully matched predicted track target according to the matched actual detection target.
Further optionally, the fusion module includes: the projection sub-module is used for projecting the three-dimensional target frame of the 3D detection target into the two-dimensional coordinate system of the image according to the camera intrinsic parameters and the extrinsic parameters relative to the laser radar to obtain a two-dimensional target frame; the matching score calculation sub-module is used for calculating the matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances to obtain a matching score matrix of all the two-dimensional target frames and all the target frames of the 2D detection targets; the target matching sub-module is used for obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and a matching score threshold; the fusion sub-module is used for fusing the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
Further optionally, the track target matching module includes: the priority determining sub-module is used for marking the actual detection targets with the fused confidence coefficient higher than or equal to the first confidence coefficient threshold value as high-confidence coefficient targets, and marking the actual detection targets with the fused confidence coefficient higher than or equal to the second confidence coefficient threshold value and lower than the first confidence coefficient threshold value as low-confidence coefficient targets; wherein the first confidence threshold is greater than the second confidence threshold; the first matching sub-module is used for matching the high-confidence-degree target with the predicted track target by adopting a Hungary algorithm according to a first matching cost threshold and matching cost to obtain the predicted track target corresponding to the high-confidence-degree target; the second matching sub-module is used for taking the high-confidence-coefficient targets and the low-confidence-coefficient targets which are not successfully matched as other targets, and matching the other targets with the unmatched predicted track targets by adopting a Hungary algorithm according to a second matching cost threshold value and matching cost to obtain predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
Further optionally, the track target matching module includes: the first matching cost calculation sub-module is used for calculating the IOU matching cost of the target frame of the 2D detection target corresponding to the actual detection target and the 2D target frame of the predicted track target; the second matching cost calculation sub-module calculates the feature matching cost of the image feature vector of the predicted track target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model; a third matching cost calculation sub-module for calculating the distance matching cost of the target frame of the 3D detection target corresponding to the actual detection target and the 3D target frame of the predicted track target; and the total matching cost calculation sub-module is used for carrying out weighted summation on the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
Further optionally, the system further comprises: the track deleting module is used for adding one to the lost frame number of the unmatched predicted track target, and deleting the corresponding target when the lost frame number corresponding to the predicted track target is higher than a frame number threshold; the track initialization module is used for initializing a new track when the actual detection target is not matched with the successfully-detected target, the 3D detection target corresponds to the actually-detected target, and the confidence coefficient is higher than a third confidence coefficient threshold value; otherwise, no initialization is performed.
The technical scheme has the following beneficial effects: on the basis of using the low-frame-rate laser radar, the detection results of the point cloud and the image are fused, the total matching cost is obtained by integrating various information, and the target matching of the adjacent frames is carried out on the basis of the total matching cost, so that a more accurate and stable tracking effect is obtained.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for camera and low frame rate lidar based target tracking provided by an embodiment of the present invention;
FIG. 2 is a flow chart of a target fusion method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a track matching method provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a total matching cost calculation method provided by an embodiment of the present invention;
FIG. 5 is a flowchart of a method for processing an unmatched object according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a camera and low frame rate lidar based target tracking system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a fusion module according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a submodule for matching tracks in a track target matching module according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a sub-module for calculating total matching cost in a track target matching module according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a track deleting module and a track initializing module according to an embodiment of the present invention.
Reference numerals: 100-a data acquisition module; 200-a target recognition module; 300-a fusion module; 3001-projection submodules; 3002-matching score calculation sub-module; 3003-target matching sub-module; 3004-fusion submodule; 400-track target matching module; 4001-priority determination submodule; 4002-a first matching sub-module; 4003-a second matching sub-module; 4004-a first matching cost calculation sub-module; 4005-a second matching cost calculation sub-module; 4006-a third matching cost calculation sub-module; 4007—a total matching cost calculation sub-module; 500-updating the module; 600-track deletion module; 700-track initialization module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to solve the problem that a low-frame-rate radar cannot be adopted to perform efficient and stable target tracking in the prior art, an embodiment of the present invention provides a target tracking method based on a camera and a low-frame-rate laser radar, and fig. 1 is a flowchart of the target tracking method based on the camera and the low-frame-rate laser radar provided in the embodiment of the present invention, as shown in fig. 1, the method includes:
s1, acquiring an image and point cloud data at the current moment;
the point cloud data is acquired using a low frame rate lidar and the image data is acquired using a high frame rate camera. Because of the low frame rate of lidar, point cloud data may not be acquired when an image is acquired. And if the point cloud data is not successfully acquired, setting the point cloud data to be empty. Wherein, the low frame rate radar refers to a laser radar with a frame rate below 10 HZ.
As an alternative implementation, two threads are started when data is acquired, one thread is used for reading a video stream from a camera, and the two threads are called camera threads; the other thread is used for receiving point cloud data from the lidar, and is called a lidar thread. The acquisition based on double threads ensures that the data of the camera and the laser radar can be acquired simultaneously.
And combining the images and the point cloud data at the same moment into a data pair according to the time stamp.
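As a non-limiting illustration, the pairing of an image with point cloud data of the same moment can be sketched as follows; the queue structure and the tolerance value are assumptions of the example, not requirements of the embodiment:

```python
import queue

PAIRING_TOLERANCE_S = 0.05  # assumed maximum timestamp difference for pairing

def make_data_pair(image_msg, lidar_queue: "queue.Queue"):
    """Pair the current image with point cloud data of the same moment.
    image_msg: (timestamp, image) produced by the camera thread.
    lidar_queue: filled by the lidar thread with (timestamp, point_cloud) items.
    Returns (image, point_cloud), where point_cloud is None (empty) when no point
    cloud is close enough in time -- the common case for a low-frame-rate lidar."""
    img_ts, image = image_msg
    best_cloud, best_dt = None, PAIRING_TOLERANCE_S
    while not lidar_queue.empty():
        pc_ts, cloud = lidar_queue.get_nowait()
        dt = abs(img_ts - pc_ts)
        if dt < best_dt:
            best_cloud, best_dt = cloud, dt
    return image, best_cloud
```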
S2, detecting a target in the image by adopting a first identification model to obtain a 2D detection target; when the point cloud data is not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, taking the 2D detection target as an actual detection target and turning to step S4; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively;
the first recognition model and the second recognition model are trained in advance. With respect to the first recognition model, corresponding image data is acquired and annotated in advance, and a neural network model for image object detection is trained using the annotated annotation image. As an alternative embodiment, the network model includes YOLO series, center net series, and DETR series.
And carrying out target detection on the image by adopting the trained first recognition model to obtain a 2D detection target. For one frame image, there may be a plurality of 2D detection targets.
And (3) regarding the second recognition model, acquiring and labeling corresponding point cloud data in advance, and training a neural network model for point cloud target detection by using the labeled point cloud data. As an alternative embodiment, the network model comprises: pointNet, centerPoint, pointPillars.
And when the point cloud data is not empty, performing target recognition on the point cloud data by adopting a trained second recognition model to obtain a 3D detection target. For one point cloud data set, there may be multiple 3D detection targets.
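For illustration only, the branching of step S2 can be sketched as follows; detect_2d and detect_3d stand in for the trained first and second recognition models, and their interfaces are assumptions of this sketch rather than a specific library API:

```python
def run_detection(image, point_cloud, detect_2d, detect_3d):
    """Run the first recognition model on the image and, when point cloud data is
    present, the second recognition model on the point cloud.
    detect_2d(image)       -> list of dicts {'box2d', 'class', 'conf'}
    detect_3d(point_cloud) -> list of dicts {'box3d', 'class', 'conf'}"""
    dets_2d = detect_2d(image)
    dets_3d = detect_3d(point_cloud) if point_cloud is not None else []
    return dets_2d, dets_3d
```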
S3, projecting the 3D detection target into a two-dimensional coordinate system of the image, and fusing the 3D detection target with a corresponding 2D detection target to obtain an actual detection target;
and projecting the 3D detection target into the image so that the 3D detection target and the 2D detection target can be matched according to the distance, and the corresponding relation between the 3D detection target and the 2D detection target is obtained, so that the corresponding 3D detection target and the 2D detection target are fused, and the actual detection target corresponds to the 2D detection target and the matched 3D detection target.
S4, carrying out state prediction on the track target at the previous moment to obtain a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result of the predicted track target and the actual detection target by adopting a Hungary algorithm according to a matching threshold and the total matching cost;
and carrying out state prediction on the track target obtained at the previous moment to obtain a predicted track target at the current moment. As an alternative embodiment, the states of the track targets are [ x3d, y3d, z3d, x2d, y2d, a2d, h2d, vx3d, vy3d, vz3d, vx2d, vy2d, va2d, vh2d ]. Where x3D, y3D, z3D denote the center coordinates of the target in the 3D space detection frame. x2d, y2d represents the center coordinates of the object detection frame on the image, and a2d and h2d represent the aspect ratio and height of the image detection frame. vx3d, vy3d, vz3d, vx2d, vy2d, va2d, vh2d represent the rate of change of the corresponding state value.
Then, the state transition equation of the target is:
$$x_t = F\,x_{t-1}$$

wherein $x_t$ is the state vector of the target at time t, $x_{t-1}$ is the state vector of the target at time t-1, and $F$ is the state transition matrix. Assuming constant-velocity motion of the target, then:

$$F = \begin{bmatrix} I_7 & \Delta t\,I_7 \\ 0 & I_7 \end{bmatrix}$$

where $I_7$ is the 7×7 identity matrix and $\Delta t$ is the time interval between adjacent frames.
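A minimal sketch of this constant-velocity prediction, assuming the 14-dimensional state listed above and an example frame interval, is given below in Python:

```python
import numpy as np

DT = 0.04  # assumed time interval between frames; the embodiment does not fix a value

def predict_track_state(x_prev: np.ndarray) -> np.ndarray:
    """Constant-velocity prediction of the 14-dimensional track state
    [x3d, y3d, z3d, x2d, y2d, a2d, h2d, vx3d, vy3d, vz3d, vx2d, vy2d, va2d, vh2d]."""
    F = np.eye(14)
    F[:7, 7:] = DT * np.eye(7)  # each position-type component advances by velocity * DT
    return F @ x_prev
```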
after the predicted track target is obtained, the predicted track target is matched with an actual detection target (if the point cloud data is empty, the 2D detection target is taken as the actual detection target) so as to ensure that a continuous motion track of the target is obtained. This process may be in a many-to-many situation where multiple predicted trajectory targets match multiple actual detection targets, in which case a global match is required. At this time, it is necessary to calculate the total matching cost of each predicted track target corresponding to each actual detection target, where the total matching cost may be calculated by integrating the distance matching cost, the image feature matching cost, and the like. And then carrying out global matching according to the obtained total matching cost and a preset matching threshold value to obtain an optimal matching result, so that the position of a certain target in the frame and the position of the previous frame are matched into the same track.
S5, carrying out state updating and image feature updating on the successfully matched predicted track target according to the matched actual detection target.
The method comprises the steps of updating the successfully matched predicted track target, namely updating the state parameters according to the matched detection target, and updating the image characteristics of the track target according to the image characteristics of the detection target so as to meet the matching requirement of the next frame.
Wherein, the state update of the track target is realized by the following formula:
$$\hat{x}_t = x_t + K_t\,(z_t - H\,x_t)$$

wherein $x_t$ is the predicted state variable of the track target, $z_t$ is the observation formed from the matched actual detection target, $H$ is the transition matrix from the state variable to the actual detection target, $K_t$ is the Kalman gain, and $\hat{x}_t$ is the updated track state variable. Note that when the actual detection target does not include a 3D detection target, the 3D-position-related variables among the state variables are not updated.
And presetting a confidence coefficient threshold value for updating the image features, and if the confidence coefficient of the actual detection target is higher than the confidence coefficient threshold value, updating the image features of the track by using the image features corresponding to the actual detection target. Otherwise, the image features of the track are not updated to reduce mismatching. As an alternative embodiment, image features are extracted using a pre-trained CNN network model.
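A simplified sketch of the update step, assuming a standard Kalman filter form and an example confidence threshold for the appearance-feature update, is shown below; the covariance handling and matrix shapes are assumptions of this sketch:

```python
import numpy as np

FEATURE_UPDATE_CONF = 0.6  # assumed confidence threshold for updating image features

def update_track(x_pred, P_pred, z, H, R, track_feature, det_feature, det_conf):
    """Kalman-style state update plus conditional image-feature update.
    x_pred, P_pred: predicted state and covariance; z: observation built from the
    matched actual detection target; H: matrix mapping the state to the observation;
    R: observation noise. When the detection carries no 3D information, H and z are
    assumed to already exclude the 3D rows, so the 3D-related state stays unchanged."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)      # state update
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred

    # Only adopt the detection's appearance feature when the detection is confident,
    # to reduce mismatching caused by low-quality detections.
    if det_conf >= FEATURE_UPDATE_CONF:
        track_feature = det_feature
    return x_new, P_new, track_feature
```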
As an optional implementation manner, fig. 2 is a flowchart of a target fusion method provided by an embodiment of the present invention, where, as shown in fig. 2, a 3D detection target is projected into a two-dimensional coordinate system of an image and fused with a corresponding 2D detection target, including:
s301, projecting a three-dimensional target frame of a 3D detection target into a two-dimensional coordinate system of an image according to camera internal parameters and external parameters of a relative laser radar to obtain a two-dimensional target frame;
Assume that a vertex of the three-dimensional target frame of a 3D detection target has homogeneous coordinates $P_l = (x_l, y_l, z_l, 1)^T$ in the laser radar coordinate system. The coordinates $P_c$ of this point in the camera coordinate system can be obtained from:

$$P_c = T\,P_l$$

wherein $T$ is the extrinsic matrix of the camera relative to the laser radar. After the coordinates of the vertices of the three-dimensional target frame in the camera coordinate system are obtained, the pixel coordinates (u, v) of the three-dimensional target frame on the image (two-dimensional coordinate system) can be obtained by the following formula, wherein $A$ is the camera intrinsic matrix and $z_c$ is the depth of the point in the camera coordinate system:

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$$
After the projection of the vertexes of the three-dimensional target frame on the image is obtained, the circumscribed frames of all the vertexes on the image can be calculated, so that the corresponding two-dimensional target frame is obtained.
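The projection and circumscribed-box computation of step S301 can be illustrated with the following sketch (the corner ordering and matrix shapes are assumptions of the example):

```python
import numpy as np

def project_3d_box_to_image(corners_lidar, T_cam_lidar, A):
    """Project the 8 corners of a 3D target frame (lidar coordinates, shape (8, 3))
    into the image and return the circumscribed 2D box (x_min, y_min, x_max, y_max).
    T_cam_lidar: 4x4 extrinsic matrix of the camera relative to the lidar.
    A: 3x3 camera intrinsic matrix. Corners are assumed to lie in front of the camera."""
    corners_h = np.hstack([corners_lidar, np.ones((8, 1))])   # homogeneous coordinates
    corners_cam = (T_cam_lidar @ corners_h.T)[:3, :]          # camera coordinates
    pix = A @ corners_cam                                     # perspective projection
    u, v = pix[0] / pix[2], pix[1] / pix[2]                   # divide by depth
    return float(u.min()), float(v.min()), float(u.max()), float(v.max())
```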
S302, calculating the matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances to obtain matching score matrixes of all the two-dimensional target frames and all the target frames of the 2D detection targets;
Let the two-dimensional target frames projected from the 3D detection targets onto the image be $B^{proj}_{m\times 4}$ and the target frames of the 2D detection targets be $B^{2d}_{n\times 4}$, where the subscript $a\times b$ denotes a matrix of a rows and b columns. The matching score between each frame of $B^{proj}$ and each frame of $B^{2d}$ is calculated to obtain the matching score matrix $S_{m\times n}$. The calculation mode of the matching score includes: IOU, DIOU, and CIOU.
S303, obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and the matching score threshold;
and presetting a matching score threshold, obtaining a matching score matrix and the matching score threshold according to calculation, and performing global matching by adopting a Hungary algorithm to obtain the corresponding relation between the two-dimensional target frame and the target frame of the 2D detection target, thereby obtaining the matching result of the 2D detection target and the 3D detection target.
S304, fusing the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
And the matching result shows the corresponding relation between the 3D detection target and the 2D detection target, and the targets are fused according to the corresponding relation.
The fusion includes a fusion of the class and confidence of the identified object.
Because the image contains abundant semantic information, the target category can be identified more accurately, and therefore the category in the 2D detection target is directly used as the category of the actual detection target.
The confidence can be fused in various ways. Taking average fusion as an example, assuming the confidence of the 3D detection target is c1 and the confidence of the 2D detection target is c2, the fused confidence is (c1+c2)/2. In addition, the weighting coefficients can be set according to the image inside the target frame of the 2D detection target to realize dynamic fusion of the confidence, for example fusion based on image gradient, brightness and contrast.
The first recognition model recognizes the 2D detection target and outputs the category and the confidence of the 2D detection target; the second recognition model recognizes the 3D detection target and outputs the category and the confidence of the 3D detection target.
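A minimal sketch of the confidence fusion described above, with the 2D weight as an assumed parameter, is:

```python
def fuse_confidence(c3d: float, c2d: float, w2d: float = 0.5) -> float:
    """Fuse the 3D and 2D detection confidences. w2d = 0.5 reproduces the average
    fusion described above; a dynamic w2d could instead be derived from the image
    gradient, brightness or contrast inside the 2D target frame (not shown here)."""
    return w2d * c2d + (1.0 - w2d) * c3d
```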
As an optional implementation manner, fig. 3 is a flowchart of a track matching method provided by the embodiment of the present invention, as shown in fig. 3, a matching result of a predicted track target and an actual detection target is obtained by using a hungarian algorithm according to a matching threshold and a total matching cost, including:
s401, marking an actual detection target with the fused confidence coefficient higher than or equal to a first confidence coefficient threshold value as a high confidence coefficient target, and marking an actual detection target with the fused confidence coefficient higher than or equal to a second confidence coefficient threshold value and lower than the first confidence coefficient threshold value as a low confidence coefficient target; wherein the first confidence threshold is greater than the second confidence threshold;
Two confidence thresholds are preset, namely a first confidence threshold $\theta_{high}$ and a second confidence threshold $\theta_{low}$, with $\theta_{high} > \theta_{low}$. The actual detection targets with confidence higher than or equal to the first confidence threshold are called high-confidence targets, the actual detection targets with confidence higher than or equal to the second confidence threshold but lower than the first confidence threshold are labeled low-confidence targets, and targets with confidence lower than the second confidence threshold are not subjected to subsequent matching processing.
S402, matching the high-confidence-degree target with a predicted track target by adopting a Hungary algorithm according to a first matching cost threshold and matching cost to obtain a predicted track target corresponding to the high-confidence-degree target;
firstly, matching a high confidence coefficient target with a predicted track target to obtain a matching result:
$$M_1 = \mathrm{Hungarian}(C_1, \theta_1)$$

wherein $C_1$ is the matching cost matrix between the high-confidence targets and the predicted track targets, and $\theta_1$ is the preset first matching cost threshold. $\mathrm{Hungarian}()$ is the Hungarian matching function, which returns the matching result $M_1$ according to the matching cost matrix and the first matching cost threshold.
S403, taking the high-confidence-degree targets and the low-confidence-degree targets which are not successfully matched as other targets, and matching the other targets with the unmatched predicted track targets by adopting a Hungary algorithm according to a second matching cost threshold and matching cost to obtain predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
Combining the high confidence coefficient target and the low confidence coefficient target which are not successfully matched into other targets, and carrying out secondary matching on the other targets and the unmatched predicted track target to obtain a matching result:
$$M_2 = \mathrm{Hungarian}(C_2, \theta_2)$$

wherein $C_2$ is the matching cost matrix between the remaining targets and the unmatched predicted track targets, $\theta_2$ is the preset second matching cost threshold, and $\theta_2 < \theta_1$. That is, the matching requirement is stricter for actual detection targets with low confidence.
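The two-round matching of steps S401-S403 can be sketched as follows; the cost matrix is assumed to be the total matching cost described in the next section, and the threshold values are placeholders:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(cost, threshold):
    """Hungarian matching; only pairs whose cost does not exceed the threshold are kept."""
    if cost.size == 0:
        return []
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= threshold]

def cascaded_match(total_cost, det_conf, theta_high, theta_low, theta1, theta2):
    """total_cost: (num_tracks, num_detections) total matching cost matrix.
    det_conf: fused confidence of each actual detection target.
    theta_high / theta_low: confidence thresholds; theta1 / theta2: matching cost
    thresholds with theta2 < theta1 (stricter in the second round)."""
    high = [j for j, c in enumerate(det_conf) if c >= theta_high]
    low = [j for j, c in enumerate(det_conf) if theta_low <= c < theta_high]

    # Round 1: high-confidence detections vs. all predicted track targets.
    m1 = hungarian_match(total_cost[:, high], theta1)
    matches = [(t, high[d]) for t, d in m1]

    # Round 2: unmatched high-confidence plus low-confidence detections
    # vs. the predicted track targets left unmatched in round 1.
    matched_dets = {d for _, d in matches}
    matched_trks = {t for t, _ in matches}
    rest_dets = [j for j in high + low if j not in matched_dets]
    rest_trks = [i for i in range(total_cost.shape[0]) if i not in matched_trks]
    m2 = hungarian_match(total_cost[np.ix_(rest_trks, rest_dets)], theta2)
    matches += [(rest_trks[t], rest_dets[d]) for t, d in m2]
    return matches
```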
As an optional implementation manner, fig. 4 is a flowchart of a total matching cost calculating method provided by an embodiment of the present invention, and as shown in fig. 4, calculating a total matching cost of a predicted track target and an actual detection target includes:
s404, calculating the IOU matching cost of the target frame of the 2D detection target corresponding to the actual detection target and the 2D target frame of the predicted track target;
in order to improve accuracy and stability of the matching result, the embodiment calculates the matching cost based on multiple items of information.
The IOU matching cost is calculated as follows:
$$cost_{iou} = 1 - \mathrm{IOU}(B^{trk}_{2d},\, B^{det}_{2d})$$

wherein $B^{trk}_{2d}$ and $B^{det}_{2d}$ are the 2D target frames of the predicted track target and the actual detection target, respectively, and IOU() is a function that calculates the IOU of two rectangular boxes.
S405, calculating feature matching cost of the image feature vector of the predicted track target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model;
Furthermore, the embodiment also calculates the feature matching cost of the image feature vector, and the calculation process is as follows:
$$cost_{feat} = 1 - \cos(f^{trk},\, f^{det})$$

wherein $f^{trk}$ and $f^{det}$ are the image feature vectors of the predicted track target and the actual detection target, respectively, extracted by a pre-trained CNN model, including but not limited to network models such as VGG and ResNet. $\cos()$ is a function that calculates the cosine similarity of two vectors.
S406, calculating the distance matching cost of the target frame of the 3D detection target corresponding to the actual detection target and the 3D target frame of the predicted track target;
because the frame rate of the laser radar is low, the condition that the point cloud data at the current moment is empty may occur, and then a 3D detection target does not exist, and at the moment, the distance matching cost of 3D is not calculated.
When the point cloud data is not empty, the 3D distance matching cost is calculated as follows:
$$cost_{dist} = \mathrm{Dist}(p^{trk}_{3d},\, p^{det}_{3d})$$

wherein $p^{trk}_{3d}$ and $p^{det}_{3d}$ are the center coordinates of the 3D target frames of the predicted track target and the actual detection target, respectively, and Dist() is a function that calculates the distance between two 3D points.
S407, weighting and summing the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
When the point cloud data is not empty, carrying out weighted average on the three matching costs to obtain the total matching cost:
$$cost_{total} = \alpha\,cost_{iou} + \beta\,cost_{feat} + \gamma\,cost_{dist}$$

If the current frame data pair does not contain point cloud data, the $cost_{dist}$ term is not calculated and the total matching cost is:

$$cost_{total} = \alpha\,cost_{iou} + \beta\,cost_{feat}$$

wherein α, β and γ are the weights of the corresponding matching cost items.
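A sketch of the individual cost terms and their weighted combination, with assumed example weights, is given below:

```python
import numpy as np

def feature_cost(f_trk, f_det):
    """1 - cosine similarity between two image feature vectors."""
    return 1.0 - float(np.dot(f_trk, f_det) /
                       (np.linalg.norm(f_trk) * np.linalg.norm(f_det) + 1e-9))

def distance_cost(p_trk, p_det):
    """Euclidean distance between the centers of two 3D target frames."""
    return float(np.linalg.norm(np.asarray(p_trk) - np.asarray(p_det)))

def total_matching_cost(cost_iou, cost_feat, cost_dist=None,
                        alpha=0.4, beta=0.4, gamma=0.2):
    """Weighted sum of the cost terms; the weights are assumed example values.
    cost_dist is None when the current data pair contains no point cloud data."""
    total = alpha * cost_iou + beta * cost_feat
    if cost_dist is not None:
        total += gamma * cost_dist
    return total
```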
As an optional implementation manner, fig. 5 is a flowchart of a method for processing an unmatched target according to an embodiment of the present invention, where, as shown in fig. 5, after performing state update and image feature update on a successfully matched predicted track target according to a matched actual detection target, the method includes:
s6, for the predicted track target which is not successfully matched, adding one to the lost frame number of the predicted track target, and deleting the corresponding target when the lost frame number corresponding to the predicted track target is higher than a frame number threshold;
and accumulating the lost frame numbers of the track targets which are not successfully matched, and when the accumulated lost frame numbers of the track targets are higher than a preset frame number threshold value, considering that the targets are moved out of the detection range, and deleting the track targets.
S7, for an actual detection target which is not successfully matched, initializing a new track when the actual detection target corresponds to a 3D detection target and the confidence coefficient is higher than a third confidence coefficient threshold value; otherwise, no initialization is performed.
For an unmatched detection target, whether to initialize a new track is decided according to whether the detection target contains 3D position information and according to its confidence. If the target contains 3D position information (i.e., the frame data contains point cloud data), its confidence is checked further; otherwise no new track is initialized. If the confidence of the target is higher than the threshold, the target is regarded as a newly appearing target and a new track is initialized; otherwise no new track is initialized.
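Steps S6-S7 can be sketched as follows; the threshold values and the Track container are assumptions of the example:

```python
from dataclasses import dataclass

MAX_LOST_FRAMES = 30   # assumed frame-number threshold
NEW_TRACK_CONF = 0.7   # assumed third confidence threshold

@dataclass
class Track:
    state: list
    feature: list
    lost_frames: int = 0

def manage_unmatched(tracks, unmatched_tracks, unmatched_dets):
    """Delete stale tracks and initialize new tracks from unmatched detections.
    Each detection is assumed to be a dict with keys 'state', 'feature',
    'has_3d' and 'confidence'."""
    # Unmatched predicted track targets: accumulate lost frames, delete when stale.
    for t in unmatched_tracks:
        t.lost_frames += 1
    tracks[:] = [t for t in tracks if t.lost_frames <= MAX_LOST_FRAMES]

    # Unmatched actual detection targets: only detections that carry 3D position
    # information and are sufficiently confident initialize a new track.
    for det in unmatched_dets:
        if det["has_3d"] and det["confidence"] > NEW_TRACK_CONF:
            tracks.append(Track(state=det["state"], feature=det["feature"]))
```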
The embodiment of the invention also provides a target tracking system based on the camera and the low-frame-rate laser radar, and fig. 6 is a schematic structural diagram of the target tracking system based on the camera and the low-frame-rate laser radar, as shown in fig. 6, and the system comprises:
the data acquisition module 100 is used for acquiring an image and point cloud data at the current moment;
the point cloud data is acquired using a low frame rate lidar and the image data is acquired using a high frame rate camera. Because of the low frame rate of lidar, point cloud data may not be acquired simultaneously when images are acquired. And if the point cloud data is not successfully acquired, setting the point cloud data to be empty. Wherein, the low frame rate radar refers to a laser radar with a frame rate below 10 HZ.
As an alternative implementation, two threads are started when data is acquired, one thread is used for reading a video stream from a camera, and the two threads are called camera threads; the other thread is used for receiving point cloud data from the lidar, and is called a lidar thread. The acquisition based on double threads ensures that the data of the camera and the laser radar can be acquired simultaneously.
And combining the images and the point cloud data at the same moment into a data pair according to the time stamp.
The target recognition module 200 is configured to detect a target in the image by using the first recognition model, so as to obtain a 2D detection target; when the point cloud data is not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, transferring the 2D detection target as an actual detection target to a track target matching module for processing; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively;
the first recognition model and the second recognition model are trained in advance. With respect to the first recognition model, corresponding image data is acquired and annotated in advance, and a neural network model for image object detection is trained using the annotated annotation image. As an alternative embodiment, the network model includes YOLO series, center net series, and DETR series.
And carrying out target detection on the image by adopting the trained first recognition model to obtain a 2D detection target. For one frame image, there may be a plurality of 2D detection targets.
And (3) regarding the second recognition model, acquiring and labeling corresponding point cloud data in advance, and training a neural network model for point cloud target detection by using the labeled point cloud data. As an alternative embodiment, the network model comprises: pointNet, centerPoint, pointPillars.
And when the point cloud data is not empty, performing target recognition on the point cloud data by adopting a trained second recognition model to obtain a 3D detection target. For one point cloud data set, there may be multiple 3D detection targets.
The fusion module 300 is configured to project the 3D detection target into a two-dimensional coordinate system of the image, and fuse the 3D detection target with a corresponding 2D detection target to obtain an actual detection target;
and projecting the 3D detection target into the image so that the 3D detection target and the 2D detection target can be matched according to the distance, and the corresponding relation between the 3D detection target and the 2D detection target is obtained, so that the corresponding 3D detection target and the 2D detection target are fused, and the actual detection target corresponds to the 2D detection target and the matched 3D detection target.
The track target matching module 400 is configured to perform state prediction on a track target at a previous time to obtain a predicted track target at a current time, calculate a total matching cost of the predicted track target and an actual detection target, and obtain a matching result of the predicted track target and the actual detection target by adopting a hungarian algorithm according to a matching threshold and the total matching cost;
and carrying out state prediction on the track target obtained at the previous moment to obtain a predicted track target at the current moment. As an alternative embodiment, the states of the track targets are [ x3d, y3d, z3d, x2d, y2d, a2d, h2d, vx3d, vy3d, vz3d, vx2d, vy2d, va2d, vh2d ]. Where x3D, y3D, z3D denote the center coordinates of the target in the 3D space detection frame. x2d, y2d represents the center coordinates of the object detection frame on the image, and a2d and h2d represent the aspect ratio and height of the image detection frame. vx3d, vy3d, vz3d, vx2d, vy2d, va2d, vh2d represent the rate of change of the corresponding state value.
Then, the state transition equation of the target is:
$$x_t = F\,x_{t-1}$$

wherein $x_t$ is the state vector of the target at time t, $x_{t-1}$ is the state vector of the target at time t-1, and $F$ is the state transition matrix. Assuming constant-velocity motion of the target, then:

$$F = \begin{bmatrix} I_7 & \Delta t\,I_7 \\ 0 & I_7 \end{bmatrix}$$

where $I_7$ is the 7×7 identity matrix and $\Delta t$ is the time interval between adjacent frames.
After the predicted track target is obtained, the predicted track target is matched with an actual detection target (if the point cloud data is empty, the 2D detection target is taken as the actual detection target) so as to ensure that a continuous motion track of the target is obtained. This process may be in a many-to-many situation where multiple predicted trajectory targets match multiple actual detection targets, in which case a global match is required. At this time, it is necessary to calculate the total matching cost of each predicted track target corresponding to each actual detection target, where the total matching cost may be calculated by integrating the distance matching cost, the image feature matching cost, and the like. And then carrying out global matching according to the obtained total matching cost and a preset matching threshold value to obtain an optimal matching result, so that the position of a certain target in the frame and the position of the previous frame are matched into the same track.
And the updating module 500 is used for carrying out state updating and image characteristic updating on the successfully matched predicted track target according to the matched actual detection target.
The method comprises the steps of updating the successfully matched predicted track target, namely updating the state parameters according to the matched detection target, and updating the image characteristics of the track target according to the image characteristics of the detection target so as to meet the matching requirement of the next frame.
Wherein, the state update of the track target is realized by the following formula:
$$\hat{x}_t = x_t + K_t\,(z_t - H\,x_t)$$

wherein $x_t$ is the predicted state variable of the track target, $z_t$ is the observation formed from the matched actual detection target, $H$ is the transition matrix from the state variable to the actual detection target, $K_t$ is the Kalman gain, and $\hat{x}_t$ is the updated track state variable. Note that when the actual detection target does not include a 3D detection target, the 3D-position-related variables among the state variables are not updated.
And presetting a confidence coefficient threshold value for updating the image features, and if the confidence coefficient of the actual detection target is higher than the confidence coefficient threshold value, updating the image features of the track by using the image features corresponding to the actual detection target. Otherwise, the image features of the track are not updated to reduce mismatching. As an alternative embodiment, image features are extracted using a pre-trained CNN network model.
As an alternative implementation manner, fig. 7 is a schematic structural diagram of a fusion module provided by an embodiment of the present invention, and as shown in fig. 7, a fusion module 300 includes:
The projection submodule 3001 is used for projecting a three-dimensional target frame of the 3D detection target into a two-dimensional coordinate system of an image according to the camera internal parameters and the external parameters of the relative laser radar to obtain a two-dimensional target frame;
Assume that a vertex of the three-dimensional target frame of a 3D detection target has homogeneous coordinates $P_l = (x_l, y_l, z_l, 1)^T$ in the laser radar coordinate system. The coordinates $P_c$ of this point in the camera coordinate system can be obtained from:

$$P_c = T\,P_l$$

wherein $T$ is the extrinsic matrix of the camera relative to the laser radar. After the coordinates of the vertices of the three-dimensional target frame in the camera coordinate system are obtained, the pixel coordinates (u, v) of the three-dimensional target frame on the image (two-dimensional coordinate system) can be obtained by the following formula, wherein $A$ is the camera intrinsic matrix and $z_c$ is the depth of the point in the camera coordinate system:

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$$
after the projection of the vertexes of the three-dimensional target frame on the image is obtained, the circumscribed frames of all the vertexes on the image can be calculated, so that the corresponding two-dimensional target frame is obtained.
A matching score calculating submodule 3002, configured to calculate matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances, and obtain matching score matrices of all the two-dimensional target frames and all the target frames of the 2D detection targets;
Let the two-dimensional target frames projected from the 3D detection targets onto the image be $B^{proj}_{m\times 4}$ and the target frames of the 2D detection targets be $B^{2d}_{n\times 4}$, where the subscript $a\times b$ denotes a matrix of a rows and b columns. The matching score between each frame of $B^{proj}$ and each frame of $B^{2d}$ is calculated to obtain the matching score matrix $S_{m\times n}$. The calculation mode of the matching score includes: IOU, DIOU, and CIOU.
The target matching submodule 3003 is used for obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and the matching score threshold;
and presetting a matching score threshold, obtaining a matching score matrix and the matching score threshold according to calculation, and performing global matching by adopting a Hungary algorithm to obtain the corresponding relation between the two-dimensional target frame and the target frame of the 2D detection target, thereby obtaining the matching result of the 2D detection target and the 3D detection target.
A fusion submodule 3004, configured to fuse the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
And the matching result shows the corresponding relation between the 3D detection target and the 2D detection target, and the targets are fused according to the corresponding relation. The fusion includes a fusion of the class and confidence of the identified object.
Because the image contains abundant semantic information, the target category can be identified more accurately, and therefore the category of the 2D detection target is directly taken as the category of the actual detection target.
The confidence can be fused in various ways. Taking average fusion as an example, assuming the confidence of the 3D detection target is c1 and the confidence of the 2D detection target is c2, the fused confidence is (c1+c2)/2. In addition, the weighting coefficients can be set according to the image inside the target frame of the 2D detection target to realize dynamic fusion of the confidence, for example fusion based on image gradient, brightness and contrast.
The first recognition model detects targets in the image and outputs the category and the confidence of each 2D detection target; the second recognition model detects targets in the point cloud data and outputs the category and the confidence of each 3D detection target.
As an alternative implementation manner, fig. 8 is a schematic structural diagram of a sub-module for track matching in a track target matching module provided in an embodiment of the present invention, and as shown in fig. 8, a track target matching module 400 includes:
the priority determining submodule 4001 is configured to mark an actual detection target with a fused confidence level higher than or equal to the first confidence level threshold as a high confidence level target, and mark an actual detection target with a fused confidence level higher than or equal to the second confidence level threshold and lower than the first confidence level threshold as a low confidence level target; wherein the first confidence threshold is greater than the second confidence threshold;
Two confidence thresholds are preset: a first confidence threshold $\tau_{1}$ and a second confidence threshold $\tau_{2}$, with $\tau_{1} > \tau_{2}$. Actual detection targets whose fused confidence is higher than or equal to $\tau_{1}$ are called high-confidence targets; those whose confidence is higher than or equal to $\tau_{2}$ but lower than $\tau_{1}$ are labelled low-confidence targets; targets whose confidence is lower than $\tau_{2}$ are not subjected to subsequent matching processing.
The first matching submodule 4002 is used for matching the high-confidence targets with the predicted track targets by the Hungarian algorithm according to a first matching cost threshold and the matching cost, to obtain the predicted track targets corresponding to the high-confidence targets;
First, the high-confidence targets are matched against the predicted track targets to obtain a matching result:

$$M_{\mathrm{high}} = \mathrm{Hungarian}(C_{\mathrm{high}}, \theta_{1}),$$

wherein $C_{\mathrm{high}}$ is the matching cost matrix between the high-confidence targets and the predicted track targets, $\theta_{1}$ is the preset first matching cost threshold, and $\mathrm{Hungarian}(\cdot)$ is the Hungarian matching function, which returns the matching result according to the matching cost matrix and the first matching cost threshold.
The second matching submodule 4003 is configured to take the high-confidence targets and the low-confidence targets that are not successfully matched as other targets, and to match the other targets with the unmatched predicted track targets by the Hungarian algorithm according to a second matching cost threshold and the matching cost, to obtain the predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
The unmatched high-confidence targets and the low-confidence targets are combined into the other targets, and a second round of matching is carried out between the other targets and the unmatched predicted track targets:

$$M_{\mathrm{rest}} = \mathrm{Hungarian}(C_{\mathrm{rest}}, \theta_{2}),$$

wherein $C_{\mathrm{rest}}$ is the matching cost matrix between the other targets and the unmatched predicted track targets, and $\theta_{2}$ is the preset second matching cost threshold, with $\theta_{2} < \theta_{1}$. That is, the matching requirement is stricter for actual detection targets with low confidence.
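A sketch of this two-stage (cascaded) matching: high-confidence targets are matched first with the looser threshold θ1, then the leftover high-confidence targets plus the low-confidence targets are matched against the remaining tracks with the stricter threshold θ2 < θ1. Function names, index handling and thresholds are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian(cost, threshold):
    """Minimum-cost assignment; keep only pairs whose cost does not exceed the threshold."""
    cost = np.asarray(cost)
    if cost.size == 0:
        return []
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= threshold]

def cascaded_match(cost, high_idx, low_idx, theta1, theta2):
    """cost: (num_detections, num_tracks) total matching cost matrix.
    high_idx / low_idx: row indices of high- / low-confidence detections.
    theta2 < theta1, i.e. the second stage matches more strictly."""
    cost = np.asarray(cost)
    high_idx = np.asarray(high_idx, dtype=int)
    low_idx = np.asarray(low_idx, dtype=int)

    # Stage 1: high-confidence detections against all predicted track targets
    stage1 = hungarian(cost[high_idx], theta1)
    matches = [(int(high_idx[r]), int(c)) for r, c in stage1]

    matched_dets = {d for d, _ in matches}
    matched_trks = {t for _, t in matches}
    rest_dets = np.array([d for d in np.concatenate([high_idx, low_idx])
                          if d not in matched_dets], dtype=int)
    rest_trks = np.array([t for t in range(cost.shape[1])
                          if t not in matched_trks], dtype=int)

    # Stage 2: leftover detections against still-unmatched tracks, stricter threshold
    stage2 = hungarian(cost[np.ix_(rest_dets, rest_trks)], theta2)
    matches += [(int(rest_dets[r]), int(rest_trks[c])) for r, c in stage2]
    return matches
```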
As an alternative implementation manner, fig. 9 is a schematic structural diagram of a sub-module for calculating a total matching cost in a track target matching module provided in an embodiment of the present invention, and as shown in fig. 9, a track target matching module 400 includes:
a first matching cost calculation submodule 4004, configured to calculate an IOU matching cost of a target frame of the 2D detection target corresponding to the actual detection target and a 2D target frame of the predicted track target;
in order to improve accuracy and stability of the matching result, the embodiment calculates the matching cost based on multiple items of information.
The IOU matching cost is calculated as

$$cost_{\mathrm{iou}} = 1 - \mathrm{IOU}(b_{\mathrm{trk}}, b_{\mathrm{det}}),$$

wherein $b_{\mathrm{trk}}$ and $b_{\mathrm{det}}$ are the 2D target frames of the predicted track target and of the actual detection target, respectively, and $\mathrm{IOU}(\cdot)$ is a function that calculates the IOU of two rectangular boxes.
A second matching cost calculation sub-module 4005 that calculates a feature matching cost of the image feature vector of the predicted trajectory target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model;
Furthermore, this embodiment also calculates a feature matching cost from the image feature vectors:

$$cost_{\mathrm{feat}} = 1 - \cos(f_{\mathrm{trk}}, f_{\mathrm{det}}),$$

wherein $f_{\mathrm{trk}}$ and $f_{\mathrm{det}}$ are the image feature vectors of the predicted track target and of the actual detection target, respectively, extracted with a pre-trained CNN model (including but not limited to network models such as VGG and ResNet), and $\cos(\cdot)$ is a function that calculates the cosine similarity of two vectors.
A third matching cost calculation sub-module 4006 that calculates a distance matching cost of a target frame of the 3D detection target corresponding to the actual detection target and a 3D target frame of the predicted trajectory target;
Because the frame rate of the laser radar is low, the point cloud data at the current moment may be empty; in that case no 3D detection target exists and the 3D distance matching cost is not calculated.
When the point cloud data is not empty, the 3D distance matching cost is calculated as

$$cost_{\mathrm{dist}} = \mathrm{Dist}(p_{\mathrm{trk}}, p_{\mathrm{det}}),$$

wherein $p_{\mathrm{trk}}$ and $p_{\mathrm{det}}$ are the centre coordinates of the 3D target frames of the predicted track target and of the actual detection target, respectively, and $\mathrm{Dist}(\cdot)$ is a function that calculates the distance between two 3D points.
The total matching cost calculation submodule 4007 performs weighted summation on the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
When the point cloud data is not empty, the three matching costs are combined by weighted summation into the total matching cost:

$$cost_{\mathrm{total}} = \alpha \, cost_{\mathrm{iou}} + \beta \, cost_{\mathrm{feat}} + \gamma \, cost_{\mathrm{dist}};$$

if the current frame data pair contains no point cloud data, the $cost_{\mathrm{dist}}$ term is not calculated and the total matching cost becomes

$$cost_{\mathrm{total}} = \alpha \, cost_{\mathrm{iou}} + \beta \, cost_{\mathrm{feat}},$$

wherein α, β and γ are the weights of the corresponding matching cost terms.
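A sketch of the per-pair total matching cost, combining the IoU term, the appearance (cosine-similarity) term and, when point cloud data is present, the 3D centre-distance term. The dictionary keys, default weights α, β, γ and the unnormalised distance term are illustrative assumptions; the iou() helper is the one defined in the earlier sketch:

```python
import numpy as np

def total_matching_cost(track, det, alpha=0.4, beta=0.4, gamma=0.2):
    """track / det: dicts with '2d_box' (x1, y1, x2, y2), 'feature' (image feature
    vector) and, when point cloud data exists, '3d_center' (x, y, z)."""
    # IOU cost between the 2D target frames (iou() as in the earlier sketch)
    cost_iou = 1.0 - iou(track["2d_box"], det["2d_box"])

    # Appearance cost: 1 - cosine similarity of the image feature vectors
    f1 = np.asarray(track["feature"], dtype=float)
    f2 = np.asarray(det["feature"], dtype=float)
    cos_sim = float(f1 @ f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-9)
    cost_feat = 1.0 - cos_sim

    # 3D distance cost, only when the current frame carries point cloud data
    if det.get("3d_center") is not None and track.get("3d_center") is not None:
        dist = float(np.linalg.norm(np.asarray(track["3d_center"], dtype=float) -
                                    np.asarray(det["3d_center"], dtype=float)))
        return alpha * cost_iou + beta * cost_feat + gamma * dist
    return alpha * cost_iou + beta * cost_feat
```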
As an alternative implementation manner, fig. 10 is a schematic structural diagram of a track deletion module and a track initialization module provided by the embodiment of the present invention, and as shown in fig. 10, the system further includes:
the track deleting module 600 is configured to add one to the number of lost frames of the predicted track target that is not successfully matched, and delete the corresponding target when the number of lost frames corresponding to the predicted track target is higher than the threshold number of frames;
The lost-frame count of each track target that is not successfully matched is accumulated; when the accumulated lost-frame count of a track target exceeds the preset frame-number threshold, the target is considered to have moved out of the detection range and its track is deleted.
The track initialization module 700 is configured to initialize a new track when, for an actual detection target that is not successfully matched, the actual detection target corresponds to a 3D detection target and the confidence level is higher than a third confidence level threshold; otherwise, no initialization is performed.
For an unmatched detection target, whether a new track is initialized depends on whether the target contains 3D position information and on its confidence. If the target contains 3D position information (i.e., the frame data contains point cloud data), its confidence is examined further; otherwise no new track is initialized. If the confidence is higher than the threshold, the target is regarded as a newly appeared target and a new track is initialized; otherwise no new track is initialized.
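A sketch of these track lifecycle rules: a lost-frame counter that deletes stale tracks, and new-track initialisation only for unmatched detections that carry 3D information and clear the (third) confidence threshold. The class layout, keys and default thresholds are illustrative:

```python
class TrackManager:
    """Illustrative track lifecycle handling: deletion of stale tracks and
    initialisation of new tracks for unmatched detections."""

    def __init__(self, max_lost_frames=5, init_conf_threshold=0.6):
        self.max_lost = max_lost_frames        # frame-number threshold for deletion
        self.init_conf = init_conf_threshold   # third confidence threshold
        self.tracks = []                       # each track: {'state': ..., 'lost': int}

    def handle_unmatched(self, unmatched_tracks, unmatched_detections):
        # Unmatched tracks: increase the lost-frame counter, delete when exceeded
        for trk in unmatched_tracks:
            trk["lost"] += 1
        self.tracks = [t for t in self.tracks if t["lost"] <= self.max_lost]

        # Unmatched detections: initialise a new track only if the detection
        # carries 3D position information and clears the third confidence threshold
        for det in unmatched_detections:
            if det.get("3d_center") is not None and det["confidence"] > self.init_conf:
                self.tracks.append({"state": det, "lost": 0})
```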
The technical scheme has the following beneficial effects: on the basis of using the low-frame-rate laser radar, the detection results of the point cloud and the image are fused, the total matching cost is obtained by integrating various information, and the target matching of the adjacent frames is carried out on the basis of the total matching cost, so that a more accurate and stable tracking effect is obtained.
The foregoing embodiments describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the foregoing is only illustrative of embodiments of the present invention and is not intended to limit its scope; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention shall fall within the scope of the present invention.
Claims (10)
1. A camera and low frame rate lidar based target tracking method, comprising:
s1, acquiring an image and point cloud data at the current moment;
s2, detecting a target in the image by adopting a first identification model to obtain a 2D detection target; when the point cloud data are not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, taking the 2D detection target as an actual detection target and transferring to step S4; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively;
s3, projecting the 3D detection target into a two-dimensional coordinate system of the image, and fusing the 3D detection target with a corresponding 2D detection target to obtain an actual detection target;
s4, carrying out state prediction on the track target at the previous moment to obtain a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result of the predicted track target and the actual detection target by adopting a Hungary algorithm according to a matching threshold and the total matching cost;
s5, carrying out state updating and image feature updating on the successfully matched predicted track target according to the matched actual detection target.
2. The camera and low frame rate lidar based target tracking method of claim 1, wherein the projecting the 3D detection target into the two-dimensional coordinate system of the image and fusing with the corresponding 2D detection target comprises:
s301, projecting a three-dimensional target frame of the 3D detection target into a two-dimensional coordinate system of an image according to camera internal parameters and external parameters of a relative laser radar to obtain a two-dimensional target frame;
s302, calculating the matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances to obtain matching score matrixes of all the two-dimensional target frames and all the target frames of the 2D detection targets;
s303, obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and the matching score threshold;
s304, fusing the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
3. The method for tracking the target based on the camera and the low-frame-rate laser radar according to claim 2, wherein the obtaining the matching result of the predicted track target and the actual detection target by using the hungarian algorithm according to the matching threshold and the total matching cost comprises:
S401, marking an actual detection target with the fused confidence coefficient higher than or equal to a first confidence coefficient threshold value as a high confidence coefficient target, and marking an actual detection target with the fused confidence coefficient higher than or equal to a second confidence coefficient threshold value and lower than the first confidence coefficient threshold value as a low confidence coefficient target; wherein the first confidence threshold is greater than the second confidence threshold;
s402, matching the high-confidence-degree target with a predicted track target by adopting a Hungary algorithm according to a first matching cost threshold and matching cost to obtain a predicted track target corresponding to the high-confidence-degree target;
s403, taking the high-confidence-degree targets and the low-confidence-degree targets which are not successfully matched as other targets, and matching the other targets with the unmatched predicted track targets by adopting a Hungary algorithm according to a second matching cost threshold and matching cost to obtain predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
4. The method of claim 1, wherein calculating a total matching cost of the predicted trajectory target and the actual detected target comprises:
S404, calculating the IOU matching cost of the target frame of the 2D detection target corresponding to the actual detection target and the 2D target frame of the predicted track target;
s405, calculating feature matching cost of the image feature vector of the predicted track target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model;
s406, calculating the distance matching cost of the target frame of the 3D detection target corresponding to the actual detection target and the 3D target frame of the predicted track target;
s407, weighting and summing the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
5. The method for tracking a target based on a camera and a low frame rate lidar according to claim 2, wherein after the state update and the image feature update of the successfully matched predicted trajectory target according to the matched actual detection target, the method comprises:
s6, for the predicted track target which is not successfully matched, adding one to the lost frame number of the predicted track target, and deleting the corresponding target when the lost frame number corresponding to the predicted track target is higher than a frame number threshold;
s7, for an actual detection target which is not successfully matched, initializing a new track when the actual detection target corresponds to a 3D detection target and the confidence coefficient is higher than a third confidence coefficient threshold value; otherwise, no initialization is performed.
6. A camera and low frame rate lidar based target tracking system comprising:
the data acquisition module is used for acquiring the image and the point cloud data at the current moment;
the target identification module is used for detecting a target in the image by adopting a first identification model to obtain a 2D detection target; when the point cloud data are not empty, detecting a target in the point cloud data by adopting a second identification model to obtain a 3D detection target; when the point cloud data is empty, transferring the 2D detection target as an actual detection target to a track target matching module for processing; the first recognition model and the second recognition model are obtained by model training according to the annotation image and the annotation point cloud data respectively;
the fusion module is used for projecting the 3D detection target into a two-dimensional coordinate system of the image and fusing the 3D detection target with a corresponding 2D detection target to obtain an actual detection target;
the track target matching module is used for carrying out state prediction on the track target at the previous moment to obtain a predicted track target at the current moment, calculating the total matching cost of the predicted track target and the actual detection target, and obtaining a matching result of the predicted track target and the actual detection target by adopting a Hungary algorithm according to a matching threshold and the total matching cost;
And the updating module is used for carrying out state updating and image characteristic updating on the successfully matched predicted track target according to the matched actual detection target.
7. The camera and low frame rate lidar based target tracking system of claim 6, wherein the fusion module comprises:
the projection sub-module is used for projecting the three-dimensional target frame of the 3D detection target into a two-dimensional coordinate system of an image according to the camera internal parameters and the external parameters of the relative laser radar to obtain a two-dimensional target frame;
the matching score calculation sub-module is used for calculating the matching scores of the two-dimensional target frames and the target frames of the 2D detection targets according to the distances to obtain matching score matrixes of all the two-dimensional target frames and all the target frames of the 2D detection targets;
the target matching sub-module is used for obtaining a matching result of the 3D detection target and the 2D detection target by adopting a Hungary algorithm according to the matching score matrix and the matching score threshold;
the fusion sub-module is used for fusing the category and the confidence of the 3D detection target and the corresponding 2D detection target; the category and the confidence of the 2D detection target are detected according to the first recognition model, and the category and the confidence of the 3D detection target are detected according to the second recognition model.
8. The camera and low frame rate lidar based target tracking system of claim 7, wherein the trajectory target matching module comprises:
the priority determining sub-module is used for marking the actual detection targets with the fused confidence coefficient higher than or equal to the first confidence coefficient threshold value as high-confidence coefficient targets, and marking the actual detection targets with the fused confidence coefficient higher than or equal to the second confidence coefficient threshold value and lower than the first confidence coefficient threshold value as low-confidence coefficient targets; wherein the first confidence threshold is greater than the second confidence threshold;
the first matching sub-module is used for matching the high-confidence-degree target with the predicted track target by adopting a Hungary algorithm according to a first matching cost threshold and matching cost to obtain the predicted track target corresponding to the high-confidence-degree target;
the second matching sub-module is used for taking the high-confidence-coefficient targets and the low-confidence-coefficient targets which are not successfully matched as other targets, and matching the other targets with the unmatched predicted track targets by adopting a Hungary algorithm according to a second matching cost threshold value and matching cost to obtain predicted track targets corresponding to the other targets; wherein the second matching cost threshold is less than the first matching cost threshold.
9. The camera and low frame rate lidar based target tracking system of claim 6, wherein the trajectory target matching module comprises:
the first matching cost calculation sub-module is used for calculating the IOU matching cost of the target frame of the 2D detection target corresponding to the actual detection target and the 2D target frame of the predicted track target;
the second matching cost calculation sub-module calculates the feature matching cost of the image feature vector of the predicted track target and the image feature vector of the actual detection target; wherein, each image feature vector is extracted by adopting a pre-trained feature extraction model;
a third matching cost calculation sub-module for calculating the distance matching cost of the target frame of the 3D detection target corresponding to the actual detection target and the 3D target frame of the predicted track target;
and the total matching cost calculation sub-module is used for carrying out weighted summation on the IOU matching cost, the feature matching cost and the distance matching cost to obtain the total matching cost.
10. The camera and low frame rate lidar based target tracking system of claim 7, further comprising:
the track deleting module is used for adding one to the lost frame number of the unmatched predicted track target, and deleting the corresponding target when the lost frame number corresponding to the predicted track target is higher than a frame number threshold;
The track initialization module is used for initializing, for an actual detection target which is not successfully matched, a new track when the actual detection target corresponds to a 3D detection target and the confidence is higher than a third confidence threshold; otherwise, no initialization is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410054817.0A CN117576166B (en) | 2024-01-15 | 2024-01-15 | Target tracking method and system based on camera and low-frame-rate laser radar |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117576166A (en) | 2024-02-20
CN117576166B CN117576166B (en) | 2024-04-30 |
Family
ID=89862768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410054817.0A Active CN117576166B (en) | 2024-01-15 | 2024-01-15 | Target tracking method and system based on camera and low-frame-rate laser radar |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117576166B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110675431A (en) * | 2019-10-08 | 2020-01-10 | 中国人民解放军军事科学院国防科技创新研究院 | Three-dimensional multi-target tracking method fusing image and laser point cloud |
WO2020155873A1 (en) * | 2019-02-02 | 2020-08-06 | 福州大学 | Deep apparent features and adaptive aggregation network-based multi-face tracking method |
CN111932580A (en) * | 2020-07-03 | 2020-11-13 | 江苏大学 | Road 3D vehicle tracking method and system based on Kalman filtering and Hungary algorithm |
CN114332158A (en) * | 2021-12-17 | 2022-04-12 | 重庆大学 | 3D real-time multi-target tracking method based on camera and laser radar fusion |
CN114926808A (en) * | 2022-03-30 | 2022-08-19 | 吉林大学 | Target detection and tracking method based on sensor fusion |
WO2022188663A1 (en) * | 2021-03-09 | 2022-09-15 | 华为技术有限公司 | Target detection method and apparatus |
CN115984969A (en) * | 2023-02-10 | 2023-04-18 | 沈阳大学 | Lightweight pedestrian tracking method in complex scene |
CN116681730A (en) * | 2023-06-14 | 2023-09-01 | 中汽创智科技有限公司 | Target tracking method, device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
陆峰;徐友春;李永乐;王任栋;王东敏;: "基于多传感器数据融合的障碍物检测与跟踪", 军事交通学院学报, no. 02, 25 February 2018 (2018-02-25) * |
Also Published As
Publication number | Publication date |
---|---|
CN117576166B (en) | 2024-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||