CN116681730A - Target tracking method, device, computer equipment and storage medium - Google Patents

Target tracking method, device, computer equipment and storage medium

Info

Publication number
CN116681730A
Authority
CN
China
Prior art keywords
target object
current moment
data
fusion
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310705715.6A
Other languages
Chinese (zh)
Inventor
张振林
卫玉蓉
陈胤子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Innovation Co Ltd
Original Assignee
China Automotive Innovation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Innovation Co Ltd filed Critical China Automotive Innovation Co Ltd
Priority to CN202310705715.6A priority Critical patent/CN116681730A/en
Publication of CN116681730A publication Critical patent/CN116681730A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a target tracking method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: acquiring an original image and point cloud data at the current moment in a target scene; performing depth fitting on the original image to obtain a depth image of the original image; performing data fusion on the depth image and the point cloud data to obtain fusion data at the current moment; extracting features of a target object from the fusion data at the current moment; and updating the tracking tracks of target objects extracted at historical moments based on the features of the target object extracted at the current moment, so as to obtain the tracking track of the target object at the current moment. With this method, feature information of three-dimensional targets can be extracted more reliably in scenes that are unfavorable for lidar detection, the accuracy of association matching is improved, mismatches and missed detections in multi-target matching caused by the lidar failing to reach its normal working state are reduced, and target tracking efficiency is improved.

Description

Target tracking method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of intelligent driving technology, and in particular, to a target tracking method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of intelligent driving technology, perception technology has developed accordingly. Perception is mainly used to detect dynamic obstacles such as pedestrians and vehicles and to track them as targets. Target tracking generally requires association matching of the obstacles detected in successive frames, so that multiple types of obstacles can be tracked accurately in real time during intelligent driving.
In the conventional art, a lidar is generally used to detect dynamic obstacles. However, lidar is strongly affected by environmental factors, and detection relying only on point cloud data under severe weather conditions may suffer from false detections and missed detections, which easily degrades the real-time tracking result.
Disclosure of Invention
Accordingly, there is a need for a target tracking method, apparatus, computer device, computer-readable storage medium and computer program product that can improve the accuracy of real-time tracking in scenes unfavorable for lidar detection, reduce mismatches and missed detections in multi-target matching, and improve target tracking efficiency.
In a first aspect, the present application provides a method for tracking a target. The method comprises the following steps: acquiring an original image and point cloud data of the current moment in a target scene;
performing depth fitting on the original image to obtain a depth image of the original image;
performing data fusion on the depth image and the point cloud data to obtain fusion data at the current moment;
extracting characteristics of a target object from fusion data at the current moment;
and updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment to obtain the tracking track of the target object extracted at the current moment.
In one embodiment, the data fusion of the depth image and the point cloud data includes:
extracting texture information of each pixel point from the original image;
processing the depth image based on the texture information;
and carrying out data fusion on the processed depth image and the point cloud data.
In one embodiment, the data fusion of the depth image and the point cloud data to obtain fusion data of the current moment includes:
mapping the point cloud data from a laser radar coordinate system to a camera coordinate system corresponding to the original image to obtain mapped point cloud data;
Determining corresponding pixel points of each mapping laser point in the mapping point cloud data in the depth image;
determining an included angle between a normal vector of each mapping laser point and a normal vector of a corresponding pixel point;
reserving a mapping laser point of which the included angle is in a preset threshold range;
and carrying out data fusion on the depth image and the reserved mapping laser points to obtain fusion data at the current moment.
In one embodiment, the extracting the feature of the target object from the fusion data at the current time includes:
identifying one or more target areas corresponding to the targets based on the fusion data at the current moment;
according to the attribute data of each mapping laser point reserved at the current moment and the relative position relation between each reserved mapping laser point and the target area, corresponding weights are distributed for each mapping laser point reserved at the current moment;
and extracting the characteristics of the target object from the fusion data at the current moment according to the weight.
In one embodiment, the extracting the feature of the target object from the fusion data at the current time includes:
determining the parallax of each mapping laser point reserved at the current moment before and after mapping;
Determining fusion confidence between the reserved mapping laser points and the depth image according to the parallax;
and extracting the characteristics of the target object from the fusion data at the current moment by combining the fusion confidence.
In one embodiment, the tracking of the target object at the current time based on the feature of the target object extracted at the current time and the tracking track of the target object extracted at the historical time includes:
predicting the position of the target object at the current moment according to the tracking track of the target object extracted at the historical moment;
detecting the position of the target object at the current moment according to the characteristics of the target object extracted at the current moment;
constructing a cost matrix model between the position of the detected target object at the current moment and the predicted position of the target object;
solving the cost matrix model, and determining the matching degree between the detected target object at the current moment and the tracking track of the extracted target object at the historical moment;
and updating the tracking track of the target object extracted at the historical moment according to the matching degree to obtain the tracking track of the extracted target object at the current moment.
In one embodiment, the updating the tracking track of the target object extracted at the historical moment according to the matching degree includes:
If the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target objects extracted at the historical moment is larger than a preset first threshold value, updating the tracking track corresponding to the maximum matching degree based on the position of the target object detected at the current moment;
if the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target objects extracted at the historical moment is smaller than a preset second threshold value, establishing a new tracking track according to the characteristics of the target object;
if the maximum matching degree between the tracking track of the target object extracted at the historical moment and the target object detected at the current moment is smaller than a preset third threshold value, deleting the tracking track of the target object extracted at the historical moment;
wherein the preset first threshold is greater than the second threshold and the third threshold.
In a second aspect, the application further provides a target tracking device. The device comprises:
the data acquisition module is used for acquiring an original image and point cloud data at the current moment in the target scene;
the depth fitting module is used for performing depth fitting on the original image to obtain a depth image of the original image;
the data fusion module is used for carrying out data fusion on the depth image and the point cloud data to obtain fusion data at the current moment;
The feature extraction module is used for extracting features of the target object from the fusion data at the current moment;
and the real-time tracking module is used for updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment to obtain the tracking track of the target object extracted at the current moment.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the method of the first aspect.
According to the target tracking method, the device, the computer equipment, the storage medium and the computer program product, the corresponding depth image capable of reflecting the three-dimensional information of the target is obtained by performing depth fitting on the original image at the current moment in the scene, and the point cloud data collected under the scene (such as under severe weather conditions) which is unfavorable for laser radar detection can be supplemented and corrected by fusing the depth image and the point cloud data at the current moment, so that the characteristic information of the three-dimensional target can be better extracted, the accuracy of target identification and tracking is improved, and the occurrence of mismatching and missed detection events in the multi-target matching process due to the fact that the laser radar is difficult to reach the normal working state under the scene which is unfavorable for laser radar detection is reduced, so that the target tracking efficiency is improved.
Drawings
FIG. 1 is a diagram of an application environment of a target tracking method according to an embodiment;
FIG. 2 is a flow chart of a target tracking method according to an embodiment;
FIG. 3 is a schematic flow chart of data fusion of depth images and point cloud data in one embodiment;
FIG. 4 is a schematic flow chart of data fusion of depth images and point cloud data in one embodiment;
FIG. 5 is a flow chart of extracting features of a target object from fusion data at a current time in an embodiment;
fig. 6 is a flowchart illustrating a process of extracting features of a target object from fusion data at a current time in another embodiment;
fig. 7 is a flowchart illustrating a process of extracting features of a target object from fusion data at a current time in yet another embodiment;
FIG. 8 is a schematic flow chart of tracking a target object based on a tracking track extracted at a historical moment in one embodiment;
FIG. 9 is a flowchart of a target tracking method according to another embodiment;
FIG. 10 is a block diagram of an object tracking device according to one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The target tracking method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. The laser radar 102 and the camera 103 detect the target object 101 in the same scene at the same moment, obtaining point cloud data and an original image of that moment, respectively. The server 104 acquires the original image and the point cloud data of the same scene at the same moment from the camera 103 and the laser radar 102; performs depth fitting on the original image to obtain a depth image; fuses the depth image with the point cloud of the same moment to obtain fusion data of that moment; extracts the features of the target object from the fusion data; and tracks the target object in real time based on the features of the target object extracted at the current moment. The server 104 may be implemented as a stand-alone server or a server cluster including a plurality of servers.
In one embodiment, as shown in fig. 2, a target tracking method is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
step 202, obtaining an original image and point cloud data of a current moment in a target scene.
For target tracking, the relative position between the lidar and the camera needs to be fixed first. Then, an original image of the current moment in the target scene is obtained through shooting by a camera, and meanwhile, point cloud data of the current moment in the target scene is obtained through laser radar acquisition. Next, the server may acquire an original image captured by the camera and point cloud data acquired by the lidar.
The original image generally includes a plurality of pixel units, and each pixel unit stores pixel data for recording information such as two-dimensional position data and color data of the pixel unit.
When the laser radar detects the target object, a laser beam is emitted, the laser beam is reflected on the target object, and a receiver in the laser radar can receive the reflected light beam and obtain position data (such as three-dimensional coordinates and orientation) and attribute data (such as reflection intensity) of a reflection point according to the reflected light beam so as to form point cloud data.
Step 204, performing depth fitting on the original image to obtain a depth image of the original image.
The depth fitting of the original image may be achieved by the server inputting the original image into a trained depth annotation model. The depth labeling model can be obtained by training an initial convolutional neural network by using a sample image labeled with depth information, and can be used for labeling the depth information in the image. The depth image is an image in which depth information of each pixel point in the image is recorded, and the depth information may refer to a distance or a depth value corresponding to each pixel to the camera. And the server performs depth fitting on the original image by using the depth annotation model to obtain a depth image corresponding to the original image.
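For illustration only (not the patent's implementation), a minimal Python sketch of this depth-fitting step is given below; `depth_model` is a hypothetical trained depth annotation model that maps an HxWx3 image to an HxW depth map.

```python
import numpy as np

def fit_depth(original_image: np.ndarray, depth_model) -> np.ndarray:
    """Depth fitting: run a trained depth-annotation model on one RGB frame.

    `depth_model` is a hypothetical callable (e.g. a CNN trained on sample
    images labelled with depth information); it stands in for the depth
    annotation model described above.
    """
    image = original_image.astype(np.float32) / 255.0   # simple normalisation
    depth_image = depth_model(image)                     # HxW depth map
    assert depth_image.shape == original_image.shape[:2]
    return depth_image
```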
And 206, carrying out data fusion on the depth image and the point cloud data to obtain fusion data at the current moment.
The data fusion mainly comprises a process of determining the relevance between each pixel point in the depth image and the mapping laser point in the point cloud data, and can also comprise a process of storing the relevant data of the depth image and the mapping laser point and the relevant data capable of describing the relevance between the two, so as to be used for extracting the characteristics of the subsequent steps.
Both the depth image and the point cloud data are forms of data that capture depth information in a scene. In some scenes, the depth information cannot be acquired accurately from the depth image or the point cloud data alone, so the two can be fused, and more accurate depth information of the scene can be obtained from the resulting fusion data.
And step 208, extracting the characteristics of the target object from the fusion data at the current moment.
The target refers to an object of interest to be identified and tracked. As an example, in the field of intelligent driving, the object to be tracked and identified may typically be a dynamic obstacle on a road, such as a vehicle, a pedestrian, a driver, etc.
The characteristics of the target refer to characteristics or marks shown by the target, and mainly comprise category characteristics and attribute characteristics of the target. The category characteristics of the object may be used to indicate the category of the object, for example, to indicate the characteristics of the object as a vehicle, pedestrian, or driver. The attribute characteristics of the object may be used to indicate state information of the object, such as shape, size, position, movement speed, movement orientation, etc. of the object.
Optionally, the server feeds the fusion data at each moment into a pre-trained feature recognition model, which outputs detection results for one or more targets; each detection result includes the category features and attribute features of a target, so that the visual and motion characteristics of the detected targets can be determined. Here, the feature recognition model may be obtained by training a convolutional neural network with supervised learning on a large amount of sample data labeled with category features and attribute features. The visual characteristics of a target may be its type, shape, size, etc., and its motion characteristics may be its position, movement speed, movement direction, etc.
Step 210, updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment, so as to obtain the tracking track of the target object extracted at the current moment.
Following the processing of steps 202-208, the server can extract, at each moment, the features of the target object at that moment. Then, from the features of the target object extracted over a number of consecutive moments, the server can generate tracking tracks of the different target objects to reflect how their states change over those moments. Based on this, at the current moment the server may already hold tracking tracks of the target objects extracted at historical moments, and it updates these tracks based on the features of the target object extracted at the current moment to obtain the tracking track of the target object at the current moment.
According to the target tracking method, the server obtains the corresponding depth image capable of reflecting the three-dimensional information of the target by performing depth fitting on the original image at the current moment in the scene, and the point cloud data acquired under the scene (such as under severe weather conditions) which is unfavorable for laser radar detection can be supplemented and corrected by fusing the depth image and the point cloud data at the current moment, so that the characteristic information of the three-dimensional target can be better extracted, the accuracy of target identification and tracking is improved, and the occurrence of mismatching and missed detection events in the multi-target matching process due to the fact that the laser radar is difficult to reach a normal working state under the scene which is unfavorable for laser radar detection is reduced, so that the target tracking efficiency is improved.
In one embodiment, the data fusion operation may be performed after preprocessing the depth image, and accordingly, as shown in fig. 3, may include:
in step 301, texture information of each pixel point is extracted from the original image.
The texture information of the pixel point is the change information of the color and brightness of the pixel point and other surrounding pixel points, has space invariance and robustness, can help to identify different targets, classify and track the targets and the like, and can be generally extracted by sampling and counting the color and brightness around the pixel point.
Step 302, processing the depth image based on the texture information.
The server can process the depth information in the depth image through the extracted texture information to obtain an image with higher quality or more accurate depth information.
The processing formula is as follows:
D = k_p * Σ_q ω_d(p', q') * ω_r(I_p, I_q) * D_q,
where D is the depth information after transformation, D_q is the original depth information, k_p is the normalization coefficient, (p', q') are the coordinates of a pixel in the original image, (p, q) are the coordinates of the corresponding pixel in the higher-quality depth image, ω_d and ω_r are the spatial distance term (over pixel coordinates) and the texture similarity term, respectively, I_p is the texture information of the pixel at (p, q) in the horizontal direction, and I_q is the texture information of the pixel at (p, q) in the vertical direction.
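As an illustration, a minimal joint-bilateral-style reading of this formula is sketched below; the Gaussian forms of ω_d and ω_r and the parameters `radius`, `sigma_d` and `sigma_r` are assumptions, not values given in the application.

```python
import numpy as np

def refine_depth(depth: np.ndarray, texture: np.ndarray,
                 radius: int = 3, sigma_d: float = 2.0,
                 sigma_r: float = 0.1) -> np.ndarray:
    """Texture-guided depth refinement: D = k_p * sum_q w_d * w_r * D_q.

    w_d weights spatial distance, w_r weights texture similarity, and the
    normalisation coefficient k_p is 1 / sum(weights) over the window.
    """
    depth = depth.astype(np.float64)
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            dy, dx = np.mgrid[y0:y1, x0:x1]
            w_d = np.exp(-((dy - y) ** 2 + (dx - x) ** 2) / (2 * sigma_d ** 2))
            w_r = np.exp(-((texture[y0:y1, x0:x1] - texture[y, x]) ** 2)
                         / (2 * sigma_r ** 2))
            weights = w_d * w_r
            out[y, x] = np.sum(weights * depth[y0:y1, x0:x1]) / np.sum(weights)
    return out
```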
And step 303, carrying out data fusion on the processed depth image and the point cloud data.
In this embodiment, after the server extracts the depth information and texture information of the original image, the depth information is processed by the texture information, so that a depth image with higher quality or more accurate depth information can be obtained, and the characteristics of the target object can be extracted more accurately in the subsequent steps.
In one embodiment, as shown in fig. 4, the process of data fusion in step 206 may specifically include:
in step 401, the point cloud data is mapped from the laser radar coordinate system to the camera coordinate system corresponding to the original image, so as to obtain mapped point cloud data.
The laser radar coordinate system is a three-dimensional coordinate system established by taking the laser radar component as a center, and the camera coordinate system is a three-dimensional coordinate system established by taking the camera as a center. The server can determine the coordinates of the three-dimensional points corresponding to the coordinates of the three-dimensional laser points in the point cloud data in the camera coordinate system through a conversion matrix between the laser radar and the camera, so that the server can map the point cloud data from the laser radar coordinate system to the camera coordinate system corresponding to the original image to obtain mapped point cloud data.
Step 402, determining corresponding pixel points of each mapping laser point in the mapping point cloud data in the depth image.
The server can convert the coordinates of a three-dimensional point in the camera coordinate system into coordinates in the pixel coordinate system of the camera through a homography matrix H. Using the homography matrix preserves the image resolution and reduces image distortion. Specifically, the viewpoint conversion with the homography matrix is:
X' = H * X,
X = [x_c, y_c, z_c]^T,
X' = [u, v, 1]^T,
where [x_c, y_c, z_c] are the coordinates of the three-dimensional point in the camera coordinate system, and [u, v] are its coordinates in the pixel coordinate system of the camera after conversion by the homography matrix H.
The homography matrix H can be obtained by combining the internal parameters and external parameters calibrated for the camera. Specifically, H can be expressed as
H = s * M * [r_1  r_2  t],
where s is a scale factor; f_x, f_y, u_0, v_0 are the internal parameters of the camera, f_x and f_y being the focal lengths along the two coordinate axes of the camera coordinate system, and u_0 and v_0 being the offset parameters of the two coordinate axes caused by manufacturing error, which are usually very small; r_1, r_2 and t are the external parameters, namely the first two column vectors of the rotation matrix and the translation vector; and M is the internal reference (intrinsic) matrix, with
M = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]].
the internal parameters and external parameters of the camera can be determined by an internal and external parameter calibration method. Alternatively, the internal and external parameters of the camera may be determined by a Zhang Zhengyou calibration method.
Thus, through the homography matrix, the server can determine corresponding pixel points of each mapping laser point in the mapping point cloud data in the corresponding depth image.
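For illustration, a sketch of steps 401-402 (mapping lidar points into the camera frame and then into pixel coordinates) is given below; the extrinsic matrix `T_cam_lidar` and intrinsic matrix `K` are assumed to come from calibration (e.g. Zhang's method) and are not specified by the application.

```python
import numpy as np

def project_lidar_to_pixels(points_lidar: np.ndarray,
                            T_cam_lidar: np.ndarray,
                            K: np.ndarray):
    """Map Nx3 lidar points into the camera frame, then into pixel coordinates.

    T_cam_lidar is an assumed 4x4 extrinsic (lidar -> camera) matrix and K the
    3x3 intrinsic matrix [[fx, 0, u0], [0, fy, v0], [0, 0, 1]].
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])     # Nx4 homogeneous points
    points_cam = (T_cam_lidar @ homo.T).T[:, :3]          # Nx3 in the camera frame
    in_front = points_cam[:, 2] > 0                       # keep points in front of the camera
    points_cam = points_cam[in_front]
    uvw = (K @ points_cam.T).T                            # Nx3 projective coordinates
    pixels = uvw[:, :2] / uvw[:, 2:3]                     # perspective divide -> (u, v)
    return points_cam, pixels
```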
Step 403, determining an included angle between the normal vector of each mapped laser spot and the normal vector of the corresponding pixel spot.
Assume that the coordinates of a mapped laser point are X = [x_c, y_c, z_c]^T and the coordinates of its corresponding pixel in the pixel coordinate system are X' = [u, v, 1]^T. The included angle between the normal vector of the mapped laser point and the normal vector of the corresponding pixel point is then equal to the angle between the vectors X and X'.
And step 404, reserving the corresponding mapping laser points when the included angle is within a preset threshold range.
The reason for comparing the included angle with the preset threshold range is that calibration parameters such as the conversion matrix between the lidar and the camera cannot be perfectly accurate, so mapping the laser point cloud into the camera coordinate system or the pixel coordinate system inevitably introduces some error. When the included angle between the normal vector of a mapped laser point and the normal vector of its corresponding pixel deviates too much, this error has a larger influence and the mapping result becomes less reliable. Therefore, the included angle is compared with the preset threshold, and the mapped laser points whose included angle lies within the preset threshold range are retained. The mapped laser points whose included angle lies outside the preset threshold range are discarded.
The range of the preset threshold value can be adjusted according to different data. Alternatively, the preset threshold may be set to 15 degrees, for example. At this time, the mapping laser points at an included angle of less than 15 degrees will be retained, and the mapping laser points at an included angle of more than 15 degrees will be discarded. The mapping laser points when the included angle is equal to 15 degrees can be set to be reserved or discarded, and different settings are carried out according to specific situations.
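The normal-angle screening can be sketched as follows, assuming the unit normals have already been estimated for both the mapped laser points and the depth-image pixels; the 15-degree default mirrors the example threshold above.

```python
import numpy as np

def filter_by_normal_angle(point_normals: np.ndarray,
                           pixel_normals: np.ndarray,
                           max_angle_deg: float = 15.0) -> np.ndarray:
    """Return a boolean mask of the mapped laser points to retain.

    Both inputs are Nx3 arrays of normal vectors; a point is kept when the
    angle between its normal and the corresponding pixel normal is within
    the preset threshold.
    """
    cos_angle = np.sum(point_normals * pixel_normals, axis=1)
    cos_angle /= (np.linalg.norm(point_normals, axis=1)
                  * np.linalg.norm(pixel_normals, axis=1) + 1e-12)
    angles = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angles < max_angle_deg
```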
And step 405, performing data fusion on the depth image and the reserved mapping laser points to obtain fusion data at the current moment.
In this step, the data fusion mainly includes determining the correlation between each pixel point in the depth image and the reserved mapped laser point, without focusing on the discarded mapped laser point. Meanwhile, the step can further comprise saving the depth image data, the reserved relevant data of the mapping laser points and the relevant data or information capable of describing the relevance between the two data for feature extraction.
In this embodiment, the server determines the included angle between the normal vector of each mapped laser point in the mapped point cloud and the normal vector of its corresponding pixel in the depth image, retains the mapped laser points whose included angle lies within the preset threshold range, and discards those whose included angle exceeds the preset threshold. This avoids the problem that the mapping result of a laser point becomes inaccurate when the included angle is too large, so that the feature information of the three-dimensional target can be extracted better and more accurately in the subsequent steps.
In one embodiment, as shown in fig. 5, extracting the feature of the target object from the fusion data at the current time includes:
In step 501, based on the fusion data at the current time, one or more target areas corresponding to the targets are identified.
The fusion data at the current moment actually contains depth image data at the moment, and the depth image data comprises original image data and depth information of each pixel point in the original image. The class characteristics of one or more targets may be extracted based on the fusion data and further target areas to which the one or more targets approximately correspond may be identified.
Step 502, corresponding weights are allocated to each mapping laser point reserved at the current moment according to the attribute data of each mapping laser point reserved at the current moment and the relative position relation between each reserved mapping laser point and the target area.
The purpose of assigning weights is to enable more accurate identification of the object and extraction of the object features in subsequent steps.
The attribute data of a mapped laser point may include the attribute data of the laser point it corresponds to before mapping. As mentioned earlier, the attribute data of a laser point in the point cloud may include its reflection intensity. In general, points in the same region of the same object have similar reflection intensities. Therefore, after the mapped laser points to be retained have been screened out, the probability that a retained mapped laser point and the neighbouring mapped laser points near its corresponding pixel in the depth image belong to the same target object can be determined from the difference in reflection intensity between them. Corresponding weights can then be assigned to the mapped laser points according to the attribute data of the retained mapped laser points.
Further, the weight value needs to be set in consideration of the attribute data of each of the reserved mapped laser points and the relative positional relationship with the target area. For example, in the case where the object is a vehicle, the mapped laser points located within the target area of the vehicle and near the edge of the target area may generally be set to have a higher weight than the laser points located away from the target area, because it can better reflect the characteristics of the object. Meanwhile, the mapped laser spot located near the edge of the target area may be set to have a higher weight than the laser spot within the target area, because the relevant data of the mapped laser spot located at the edge may determine the attribute characteristics of the vehicle (e.g., length, width, orientation, etc. of the vehicle).
The specific setting rule of the weight should comprehensively consider the attribute data of each reserved mapping laser point and the relative position relation with the target area.
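One possible reading of this weighting rule is sketched below; the concrete weight values, the edge margin, and the intensity-similarity heuristic are illustrative assumptions, not values prescribed by the application.

```python
import numpy as np

def assign_point_weights(pixels: np.ndarray,
                         intensities: np.ndarray,
                         box: tuple,
                         edge_margin: float = 5.0) -> np.ndarray:
    """Heuristic weights for retained mapped laser points.

    `pixels` is Nx2 (u, v); `intensities` is the reflection intensity of each
    point; `box` = (x_min, y_min, x_max, y_max) is a detected target region.
    """
    x_min, y_min, x_max, y_max = box
    u, v = pixels[:, 0], pixels[:, 1]
    inside = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    near_edge = inside & (
        (np.minimum(u - x_min, x_max - u) < edge_margin) |
        (np.minimum(v - y_min, y_max - v) < edge_margin))

    weights = np.full(len(pixels), 0.2)   # far from the target region
    weights[inside] = 0.6                 # inside the target region
    weights[near_edge] = 1.0              # near the region edge (shape/orientation cues)

    # Points whose reflection intensity is close to the local median are more
    # likely to belong to the same object, so boost them slightly.
    median_i = np.median(intensities[inside]) if inside.any() else np.median(intensities)
    similar = np.abs(intensities - median_i) < 0.1 * (np.abs(median_i) + 1e-6)
    weights[similar & inside] *= 1.2
    return weights
```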
Step 503, extracting the characteristics of the target object from the fusion data at the current moment according to the weight.
Specifically, a weight-related loss function may be constructed, and the pre-trained convolutional neural network may be trained with the minimum of the loss function as a goal. And inputting the fusion data at different moments into a trained convolutional neural network, so as to extract the characteristics of the target object. The characteristics of the object extracted herein may include attribute characteristics of the object, such as shape, size, position, speed, orientation, etc. of the object.
In this embodiment, the server allocates corresponding weights for the reserved mapping laser points based on the attribute data of the reserved mapping laser points, so that accuracy of feature information extracted from the three-dimensional object can be improved.
In another embodiment, as shown in fig. 6, extracting the feature of the target object from the fusion data at the current time includes:
step 601, determining the parallax of each mapping laser point reserved at the current moment before and after mapping.
For any mapped laser point, the parallax refers to the difference between the depth value of its original laser point in the lidar coordinate system (denoted z_r) and the depth value of the mapped laser point in the camera coordinate system (denoted z_c), i.e. Δd = z_r - z_c.
Step 602, determining the fusion confidence between the reserved mapping laser points and the corresponding depth images according to the parallax.
The fusion confidence P_d can be computed as the Gaussian likelihood of the parallax:
P_d = (1 / (sqrt(2π) * σ_d)) * exp(-(Δd - μ_d)^2 / (2σ_d^2)),
where Δd is the parallax of the mapped laser point, which can be regarded as obeying a Gaussian distribution with mean μ_d and variance σ_d^2. The mean μ_d and standard deviation σ_d may be determined by a maximum likelihood estimation algorithm.
In this way, a fusion confidence between each retained mapped laser point and its corresponding pixel is established from the parallax error: the smaller the error, the higher the fusion confidence between the mapped laser point and the corresponding pixel.
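A sketch of this confidence computation is shown below, under the assumption that the confidence is taken as the Gaussian likelihood of the parallax with maximum-likelihood estimates of the mean and standard deviation; the exact normalisation used in the application is not given.

```python
import numpy as np

def fusion_confidence(z_lidar: np.ndarray, z_camera: np.ndarray) -> np.ndarray:
    """Gaussian-style fusion confidence from the mapping parallax.

    delta_d = z_r - z_c is assumed to follow N(mu, sigma^2); mu and sigma are
    the maximum-likelihood estimates from the retained points.
    """
    delta_d = z_lidar - z_camera
    mu = delta_d.mean()                  # ML estimate of the mean
    sigma = delta_d.std() + 1e-9         # ML estimate of the standard deviation
    conf = np.exp(-((delta_d - mu) ** 2) / (2.0 * sigma ** 2))
    return conf                          # in (0, 1]; larger means better agreement
```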
And 603, extracting the characteristics of the target object from the fusion data at the current moment by combining the fusion confidence.
Specifically, a loss function related to fusion confidence can be constructed, and the pre-trained convolutional neural network is trained with the minimum of the loss function as a target. And inputting the fusion data at the current moment into a trained convolutional neural network, so as to extract the characteristics of the target object. The characteristics of the object extracted herein may include attribute characteristics of the object, such as shape, size, position, speed, orientation, etc. of the object.
In this embodiment, the server obtains the fusion confidence coefficient of each mapping laser point and the corresponding pixel point through the parallax of each mapping laser point before and after mapping, and by combining the fusion confidence coefficient, the accuracy of the feature information extracted from the three-dimensional object in the subsequent step can be better improved.
In yet another embodiment, the feature of the target object may be extracted from the fusion data at the corresponding time based on the weight and the fusion confidence of the reserved mapped laser point, as shown in fig. 7, which specifically includes:
in step 701, based on the fusion data at the current time, one or more target areas corresponding to the targets are identified.
Step 702, allocating corresponding weights for each reserved mapping laser point according to the attribute data of each reserved mapping laser point at the current moment and the relative position relation with the target area.
Step 703, determining the parallax of each reserved mapping laser spot before and after mapping.
And step 704, determining fusion confidence between the reserved mapping laser points and the corresponding depth images according to the parallax.
Step 705, combining the weight and the fusion confidence, extracting the characteristics of the target object from the fusion data at the current moment.
Specifically, a loss function related to the weight and the fusion confidence can be constructed, and the pre-trained convolutional neural network is trained with the minimum loss function as a target. And inputting the fusion data at the current moment into a trained convolutional neural network, so as to extract the characteristics of the target object. The characteristics of the object extracted herein may include attribute characteristics of the object, such as shape, size, position, speed, orientation, etc. of the object.
In this embodiment, the server performs data fusion on the corresponding depth image and the reserved mapping laser point by combining the weight and the fusion confidence, so that the image information and the point cloud information can be fused better, and the characteristics of the three-dimensional object can be extracted more accurately.
It will be appreciated that the order of execution of steps 701-704 described above is not fixed, as the calculation of weights and fusion confidence is relatively independent. For example, steps 701 and 702 may be performed first, followed by steps 703 and 704; steps 703 and 704 may be performed first, and steps 701 and 702 may be performed later; or steps 703 and 704 may be performed simultaneously with steps 701 and 702.
In addition, it can be understood that it is not necessary to obtain both the weights of the retained mapped laser points and their fusion confidences with the corresponding pixels; the depth image and the retained mapped laser points of the same moment can also be fused directly.
In one embodiment, as shown in fig. 8, performing object tracking at the current time based on the feature of the object extracted at the current time and the tracking trajectory of the object extracted at the historical time includes:
step 801, predicting the position of the target at the current moment according to the tracking track of the target extracted at the historical moment.
Specifically, it is assumed that n targets have been extracted at the historical moment, each target corresponding to 1 tracking track; the positions of the n tracks in the original image of the sensing frame at the current moment are predicted by a prediction equation of Kalman filtering. For example, according to the set motion equation and noise equation, the vehicle position at the current time can be predicted from the vehicle position in the original image at the time immediately before the current time. Specifically, the prediction equation for kalman filtering is as follows:
x'(k) = A * x(k-1) + B * u(k)
P'(k) = A * P(k-1) * A^T + Q
The purpose of the above equations is to predict the state quantity x'(k) at time k (i.e., the current time) from the state quantity x(k-1) at time k-1 (the state quantity containing the position information) via a matrix transformation, where A is the state transition matrix, B is the control input matrix, u(k) is the control input, P(k-1) is the covariance matrix of the state vector at time k-1, P'(k) is the predicted covariance matrix of the state vector at time k, and Q is the process noise matrix.
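A minimal sketch of this prediction step follows; the constant-velocity state layout [px, py, vx, vy] and the example matrices are assumptions for illustration, not values prescribed by the application.

```python
import numpy as np

def kalman_predict(x_prev, P_prev, A, B, u, Q):
    """Prediction step: x'(k) = A x(k-1) + B u(k), P'(k) = A P(k-1) A^T + Q."""
    x_pred = A @ x_prev + B @ u
    P_pred = A @ P_prev @ A.T + Q
    return x_pred, P_pred

# Example constant-velocity model with time step dt
dt = 0.1
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
B = np.zeros((4, 1))          # no control input in this example
u = np.zeros(1)
Q = np.eye(4) * 1e-2          # process noise
```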
Step 802, detecting the position of the target object at the current moment according to the characteristics of the target object extracted at the current moment.
It was mentioned above that the characteristics of the object include class characteristics and attribute characteristics of the object. Wherein the attribute features include shape, size, position, speed, orientation. Therefore, the position of the target object can be determined according to the characteristics of the extracted target object.
Step 803, constructing a cost matrix model between the position of the detected object at the current moment and the predicted position of the existing object.
Specifically, the server determines a similarity value between each target object detected at the current moment and each target tracking track obtained before the current moment, and represents these values as a two-dimensional matrix. The similarity between the m detected targets and the n tracking tracks can be calculated, and the values of the cost matrix cost(m, n) can be computed from the IoU (Intersection-over-Union) similarity and the Mahalanobis distance. The Mahalanobis distance D_M between any two points x_1 and x_2 is
D_M(x_1, x_2) = sqrt((x_1 - x_2)^T * X^(-1) * (x_1 - x_2)),
where X is the covariance matrix of x_1 and x_2.
Alternatively, the values in the cost matrix may also be calculated from the Euclidean distance and the IoU similarity. However, the Mahalanobis distance has an advantage over the Euclidean distance because it removes the interference caused by correlation between the variables and is not affected by their scales.
Step 804, solving the cost matrix model, and determining the matching degree between the detected target object and the tracking track of the existing target object at the current moment.
The solution of the cost matrix model can be treated as a bipartite-graph matching problem: by searching for augmenting paths with the weighted KM (Kuhn-Munkres) matching algorithm, the target objects and tracking tracks with the highest matching degree can be obtained.
The calculated cost matrix is a two-dimensional matrix. Bipartite-graph matching is performed on it with the KM matching algorithm (that is, the weighted Hungarian matching algorithm) to obtain detection and tracking results with a high degree of association matching.
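For illustration, the sketch below builds a cost matrix from the Mahalanobis distance and IoU and solves it with SciPy's Hungarian solver, used here in place of a hand-written KM implementation; the weighting factor `alpha` and the shared covariance `cov` are assumed parameters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mahalanobis(x1, x2, cov):
    d = x1 - x2
    return float(np.sqrt(d.T @ np.linalg.inv(cov) @ d))

def iou(a, b):
    """IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(det_boxes, det_centers, trk_boxes, trk_centers, cov, alpha=0.5):
    """Build cost(m, n) from Mahalanobis distance and (1 - IoU), then solve it."""
    m, n = len(det_boxes), len(trk_boxes)
    cost = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            cost[i, j] = (alpha * mahalanobis(det_centers[i], trk_centers[j], cov)
                          + (1 - alpha) * (1.0 - iou(det_boxes[i], trk_boxes[j])))
    rows, cols = linear_sum_assignment(cost)   # minimum-cost matching
    return list(zip(rows, cols)), cost
```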
And step 805, updating the tracking track of the target object extracted at the historical moment according to the matching degree to obtain the tracking track of the extracted target object at the current moment.
Specifically, the result of the association matching is updated through Kalman filtering: for a target object successfully associated with a historical tracking track, the historical track information is updated with the current observation; for a target object that is not successfully associated, a tracking track ID for a new target object is created; and the ID of any track that has not been associated for longer than a set threshold is deleted, so that accurate real-time tracking is achieved. The update formulas of the Kalman filter are as follows:
K(k) = P'(k) * H^T * (H * P'(k) * H^T + R)^(-1)
x(k) = x'(k) + K(k) * (z(k) - H * x'(k))
P(k) = (I - K(k) * H) * P'(k)
where H is the conversion matrix from state quantities to measurement quantities, R is the measurement noise matrix, K(k) is the Kalman gain at time k, x(k) is the updated state quantity at time k (i.e., the current time), and P(k) is the updated covariance matrix of the state vector at time k. The measurement is the current state value acquired by the lidar: z(k) is the observation provided by the sensor (for example, the lidar) at time k, and I is the identity matrix.
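A corresponding sketch of the update step is given below, with an assumed position-only measurement model for the [px, py, vx, vy] state; H and R here are example values, not values from the application.

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """Update step: Kalman gain K(k), corrected state x(k), covariance P(k)."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)           # x(k) = x'(k) + K(k)(z(k) - H x'(k))
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred  # P(k) = (I - K(k) H) P'(k)
    return x_new, P_new

# Example measurement model: the lidar observes position only
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
R = np.eye(2) * 1e-1
```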
In this embodiment, the server combines the prediction and update algorithm of the kalman filter, constructs the cost matrix and solves the cost matrix, so that matching between the target object detected at the current moment and the history tracking track obtained before the current moment can be accurately realized, and further real-time tracking of the target object is realized.
Optionally, the process of updating the tracking track may specifically be as follows:
if the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target objects extracted at historical moments is greater than a preset first threshold, the tracking track corresponding to that maximum matching degree is updated based on the position of the target object detected at the current moment; if the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target objects extracted at historical moments is smaller than a preset second threshold, a new tracking track is established according to the features of that target object; if the maximum matching degree between a tracking track extracted at a historical moment and the target objects detected at the current moment is smaller than a preset third threshold, that tracking track is deleted. The preset first threshold is greater than the second threshold and the third threshold. A sketch of this rule is given after this paragraph.
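The track-maintenance rule above can be sketched as follows; the threshold values and the track data structure are placeholders for the preset first, second and third thresholds, not values from the application.

```python
def manage_tracks(tracks, detections, match_scores,
                  t_update=0.7, t_new=0.3, t_delete=0.3):
    """Threshold-based track maintenance.

    `tracks` is a list of dicts with a "positions" history, `detections` the
    detected positions at the current moment, and `match_scores[i][j]` the
    matching degree between detection i and existing track j.
    """
    n_tracks = len(tracks)
    new_tracks = []
    for i, det in enumerate(detections):
        if n_tracks:
            best_j = max(range(n_tracks), key=lambda j: match_scores[i][j])
            best = match_scores[i][best_j]
        else:
            best_j, best = None, 0.0
        if best > t_update:
            tracks[best_j]["positions"].append(det)   # update the matched track
        elif best < t_new:
            new_tracks.append({"positions": [det]})   # start a new track
    # Delete existing tracks whose best match with any current detection is too low.
    survivors = []
    for j, trk in enumerate(tracks):
        best = max((match_scores[i][j] for i in range(len(detections))), default=0.0)
        if best >= t_delete:
            survivors.append(trk)
    return survivors + new_tracks
```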
In one embodiment, as shown in fig. 9, there is provided a target tracking method, including:
step 901, obtaining an original image and point cloud data of a current moment in a target scene.
Step 902, performing depth fitting on the original image to obtain a depth image of the original image;
in step 903, texture information of each pixel is extracted from the original image.
Step 904, processing the depth image based on the texture information.
In step 905, the point cloud data is mapped from the laser radar coordinate system to the camera coordinate system corresponding to the original image, so as to obtain mapped point cloud data.
Step 906, determining corresponding pixel points of each mapping laser point in the mapping point cloud data in the processed depth image.
In step 907, the angle between the normal vector of each mapped laser spot and the normal vector of the corresponding pixel spot is determined.
Step 908, the mapped laser points with included angles within a preset threshold are retained.
And step 909, performing data fusion on the depth image and the reserved mapping laser points to obtain fusion data at the current moment.
In step 910, a target area corresponding to one or more targets is identified based on the fusion data at the current time.
Step 911, corresponding weights are allocated to the mapping laser points reserved at the current moment according to the attribute data of the mapping laser points reserved at the current moment and the relative position relation between the reserved mapping laser points and the target area.
Step 912, extracting the feature of the target object from the fusion data at the current moment according to the weight.
Step 913, predicting the position of the target object at the current moment according to the tracking track of the target object extracted at the historical moment.
Step 914, detecting the position of the target object at the current moment according to the characteristics of the target object extracted at the current moment.
In step 915, a cost matrix model is constructed between the position of the detected object at the current time and the predicted position of the object.
And step 916, solving the cost matrix model, and determining the matching degree between the detected target object at the current moment and the tracking track of the extracted target object at the historical moment.
Step 917, updating the tracking track of the target object extracted at the historical moment according to the matching degree, so as to obtain the tracking track of the extracted target object at the current moment.
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a target tracking device for realizing the target tracking method. The implementation of the solution provided by the device is similar to that described in the above method, so the specific limitation of one or more embodiments of the object tracking device provided below may be referred to above for limitation of the object tracking method, and will not be repeated here.
In one embodiment, as shown in fig. 10, there is provided a target tracking apparatus including: a data acquisition module 1001, a depth fitting module 1002, a data fusion module 1003, a feature extraction module 1004, and a track tracking module 1005, wherein:
the data acquisition module 1001 is configured to acquire an original image and point cloud data at a current moment in a target scene;
the depth fitting module 1002 is configured to perform depth fitting on the original image to obtain a depth image of the original image;
the data fusion module 1003 is configured to perform data fusion on the depth image and the point cloud data, so as to obtain fusion data at the current moment;
the feature extraction module 1004 is configured to extract features of the target object from the fusion data at the current moment;
The track tracking module 1005 is configured to update the tracking track of the target object extracted at the historical moment based on the feature of the target object extracted at the current moment, so as to obtain the tracking track of the target object extracted at the current moment.
In one embodiment, the data fusion module 1003 is specifically configured to:
extracting texture information of each pixel point from the original image;
processing the depth image based on the texture information;
and carrying out data fusion on the processed depth image and the point cloud data.
In one embodiment, the data fusion module 1003 is specifically configured to:
mapping the point cloud data from a laser radar coordinate system to a camera coordinate system corresponding to the original image to obtain mapped point cloud data;
determining corresponding pixel points of each mapping laser point in the mapping point cloud data in the depth image;
determining an included angle between a normal vector of each mapping laser point and a normal vector of a corresponding pixel point;
reserving a mapping laser point of which the included angle is in a preset threshold range;
and carrying out data fusion on the depth image and the reserved mapping laser points to obtain fusion data at the current moment.
In one embodiment, the feature extraction module 1004 is specifically configured to:
Identifying one or more target areas corresponding to the targets based on the fusion data at the current moment;
according to the attribute data of each mapping laser point reserved at the current moment and the relative position relation between each reserved mapping laser point and the target area, corresponding weights are distributed for each mapping laser point reserved at the current moment;
and extracting the characteristics of the target object from the fusion data at the current moment according to the weight.
In one embodiment, the feature extraction module 1004 is specifically configured to:
determining the parallax of each mapping laser point reserved at the current moment before and after mapping;
determining fusion confidence between the reserved mapping laser points and the depth image according to the parallax;
and extracting the characteristics of the target object from the fusion data at the current moment by combining the fusion confidence.
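The disclosure does not define the parallax measure, so the sketch below takes one hedged interpretation: compare the depth carried by each retained mapped laser point with the depth-image value at its pixel, and convert the discrepancy into a confidence in [0, 1]; the Gaussian form and the sigma value are assumptions.

```python
import numpy as np

def fusion_confidence(point_depth, image_depth_at_pixel, sigma=0.5):
    """Illustrative only: a confidence that decays smoothly as the disagreement
    (treated here as the 'parallax' before and after mapping) grows."""
    parallax = np.abs(point_depth - image_depth_at_pixel)
    return np.exp(-(parallax ** 2) / (2.0 * sigma ** 2))
```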
In one embodiment, the track tracking module 1005 is specifically configured to:
predicting the position of the target object at the current moment according to the tracking track of the target object extracted at the historical moment;
detecting the position of the target object at the current moment according to the characteristics of the target object extracted at the current moment;
constructing a cost matrix model between the position of the detected target object at the current moment and the predicted position of the target object;
solving the cost matrix model, and determining the matching degree between the target object detected at the current moment and the tracking track of the target object extracted at the historical moment;
and updating the tracking track of the target object extracted at the historical moment according to the matching degree to obtain the tracking track of the extracted target object at the current moment.
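A minimal sketch of the matching step, assuming the cost is the Euclidean distance between each detected position and each predicted position and that the cost matrix is solved with the Hungarian algorithm from SciPy; the conversion of cost into a matching degree is likewise an assumed choice.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections_to_tracks(detected_pos, predicted_pos):
    """detected_pos: (D, 2 or 3) positions detected at the current moment;
    predicted_pos: (T, 2 or 3) positions predicted from the historical tracks.
    Returns (detection_index, track_index, matching_degree) triples."""
    cost = np.linalg.norm(detected_pos[:, None, :] - predicted_pos[None, :, :], axis=2)
    det_idx, trk_idx = linear_sum_assignment(cost)        # Hungarian assignment
    match_degree = 1.0 / (1.0 + cost[det_idx, trk_idx])   # larger means a better match
    return list(zip(det_idx, trk_idx, match_degree))
```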
In one embodiment, the track tracking module 1005 is specifically configured to:
if the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target object extracted at the historical moment is greater than a preset first threshold, update the tracking track corresponding to the maximum matching degree based on the position of the target object detected at the current moment;
if the maximum matching degree between the target object detected at the current moment and the tracking tracks of the target object extracted at the historical moment is smaller than a preset second threshold, establish a new tracking track according to the characteristics of the target object;
if the maximum matching degree between a tracking track of the target object extracted at the historical moment and the target objects detected at the current moment is smaller than a preset third threshold, delete the tracking track of the target object extracted at the historical moment;
wherein the preset first threshold is greater than both the second threshold and the third threshold.
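For illustration, the bookkeeping for the three cases above could look like the sketch below; the threshold values, the track object's update() method, and the make_track() factory are all assumptions, with the first threshold kept larger than the other two as required.

```python
def manage_tracks(matches, detections, tracks, make_track,
                  t_update=0.7, t_new=0.3, t_delete=0.3):
    """matches: iterable of (detection_index, track_index, matching_degree)
    produced by the assignment step; tracks: list of historical track objects."""
    best_for_track = {}
    new_tracks = []
    for d_idx, t_idx, degree in matches:
        best_for_track[t_idx] = max(best_for_track.get(t_idx, 0.0), degree)
        if degree > t_update:
            tracks[t_idx].update(detections[d_idx])            # case 1: refresh the matched track
        elif degree < t_new:
            new_tracks.append(make_track(detections[d_idx]))   # case 2: start a new track

    # Case 3: drop historical tracks whose best matching degree stays below the threshold.
    kept = [trk for i, trk in enumerate(tracks) if best_for_track.get(i, 0.0) >= t_delete]
    return kept + new_tracks
```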
Each of the modules in the above target tracking apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them to execute the operations corresponding to the respective modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store original image data, point cloud data, depth image data, fusion data, data generated during neural network training, and the like. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a target tracking method.
Those skilled in the art will appreciate that the structure shown in FIG. 11 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-transitory computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided in the present application may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, and data processing logic devices based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The foregoing embodiments merely illustrate several implementations of the present application, and although they are described in considerable detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art may make several variations and improvements without departing from the concept of the application, all of which fall within the protection scope of the application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of tracking a target, the method comprising:
acquiring an original image and point cloud data of the current moment in a target scene;
performing depth fitting on the original image to obtain a depth image of the original image;
performing data fusion on the depth image and the point cloud data to obtain fusion data at the current moment;
extracting characteristics of a target object from fusion data at the current moment;
and updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment to obtain the tracking track of the target object extracted at the current moment.
2. The method of claim 1, wherein the performing data fusion on the depth image and the point cloud data comprises:
extracting texture information of each pixel point from the original image;
processing the depth image based on the texture information;
and carrying out data fusion on the processed depth image and the point cloud data.
3. The method of claim 1, wherein the performing data fusion on the depth image and the point cloud data to obtain fusion data at the current moment comprises:
mapping the point cloud data from a laser radar coordinate system to a camera coordinate system corresponding to the original image to obtain mapped point cloud data;
determining corresponding pixel points of each mapping laser point in the mapping point cloud data in the depth image;
determining an included angle between a normal vector of each mapping laser point and a normal vector of a corresponding pixel point;
reserving a mapping laser point of which the included angle is in a preset threshold range;
and carrying out data fusion on the depth image and the reserved mapping laser points to obtain fusion data at the current moment.
4. The method according to claim 3, wherein the extracting characteristics of the target object from the fusion data at the current moment comprises:
identifying one or more target areas corresponding to the targets based on the fusion data at the current moment;
assigning a corresponding weight to each mapping laser point reserved at the current moment according to the attribute data of each reserved mapping laser point and the relative positional relationship between each reserved mapping laser point and the target area;
and extracting the characteristics of the target object from the fusion data at the current moment according to the weight.
5. The method according to claim 3, wherein the extracting characteristics of the target object from the fusion data at the current moment comprises:
determining the parallax of each mapping laser point reserved at the current moment before and after mapping;
determining fusion confidence between the reserved mapping laser points and the depth image according to the parallax;
and extracting the characteristics of the target object from the fusion data at the current moment by combining the fusion confidence.
6. The method according to any one of claims 1 to 5, wherein the updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment to obtain the tracking track of the target object extracted at the current moment comprises:
predicting the position of the target object at the current moment according to the tracking track of the target object extracted at the historical moment;
detecting the position of the target object at the current moment according to the characteristics of the target object extracted at the current moment;
constructing a cost matrix model between the position of the detected target object at the current moment and the predicted position of the target object;
solving the cost matrix model, and determining the matching degree between the detected target object at the current moment and the tracking track of the extracted target object at the historical moment;
and updating the tracking track of the target object extracted at the historical moment according to the matching degree to obtain the tracking track of the extracted target object at the current moment.
7. The method of claim 6, wherein updating the tracking trajectory of the object extracted at the historical moment according to the matching degree comprises:
if the maximum matching degree of the tracking track of the target object detected at the current moment and the tracking track of the target object extracted at the historical moment is larger than a preset first threshold value, updating the tracking track corresponding to the maximum matching degree based on the position of the target object detected at the current moment;
If the maximum matching degree of the tracking track of the target object detected at the current moment and the tracking track of the target object extracted at the historical moment is smaller than a preset second threshold value, a new tracking track is established according to the characteristics of the target object;
if the maximum matching degree between the tracking track of the target object extracted at the historical moment and the target object detected at the current moment is smaller than a preset third threshold value, deleting the tracking track of the target object extracted at the historical moment;
wherein the preset first threshold is greater than the second threshold and the third threshold.
8. A target tracking device, the device comprising:
the data acquisition module is used for acquiring an original image and point cloud data at the current moment in the target scene;
the depth fitting module is used for performing depth fitting on the original image to obtain a depth image of the original image;
the data fusion module is used for carrying out data fusion on the depth image and the point cloud data to obtain fusion data at the current moment;
the feature extraction module is used for extracting features of the target object from the fusion data at the current moment;
and the track tracking module is used for updating the tracking track of the target object extracted at the historical moment based on the characteristics of the target object extracted at the current moment to obtain the tracking track of the target object extracted at the current moment.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310705715.6A 2023-06-14 2023-06-14 Target tracking method, device, computer equipment and storage medium Pending CN116681730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310705715.6A CN116681730A (en) 2023-06-14 2023-06-14 Target tracking method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310705715.6A CN116681730A (en) 2023-06-14 2023-06-14 Target tracking method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116681730A true CN116681730A (en) 2023-09-01

Family

ID=87781935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310705715.6A Pending CN116681730A (en) 2023-06-14 2023-06-14 Target tracking method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116681730A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576166A (en) * 2024-01-15 2024-02-20 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117576166B (en) * 2024-01-15 2024-04-30 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar

Similar Documents

Publication Publication Date Title
WO2020052540A1 (en) Object labeling method and apparatus, movement control method and apparatus, device, and storage medium
JP6095018B2 (en) Detection and tracking of moving objects
CN104035439B (en) BAYESIAN NETWORK TO TRACK OBJECTS USING SCAN POINTS USING MULTIPLE LiDAR SENSORS
CN110675431A (en) Three-dimensional multi-target tracking method fusing image and laser point cloud
CN110472553B (en) Target tracking method, computing device and medium for fusion of image and laser point cloud
WO2021072696A1 (en) Target detection and tracking method and system, and movable platform, camera and medium
CN110782483B (en) Multi-view multi-target tracking method and system based on distributed camera network
CN114049382B (en) Target fusion tracking method, system and medium in intelligent network connection environment
CN110363165B (en) Multi-target tracking method and device based on TSK fuzzy system and storage medium
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
Ji et al. RGB-D SLAM using vanishing point and door plate information in corridor environment
Song et al. End-to-end learning for inter-vehicle distance and relative velocity estimation in adas with a monocular camera
CN114119659A (en) Multi-sensor fusion target tracking method
CN116681730A (en) Target tracking method, device, computer equipment and storage medium
CN114998276A (en) Robot dynamic obstacle real-time detection method based on three-dimensional point cloud
CN114495045A (en) Sensing method, sensing device, sensing system and related equipment
CN115187941A (en) Target detection positioning method, system, equipment and storage medium
Fischer et al. StickyLocalization: robust end-to-end relocalization on point clouds using graph neural networks
WO2020237501A1 (en) Multi-source collaborative road vehicle monitoring system
Cai et al. 3D vehicle detection based on LiDAR and camera fusion
CN116643291A (en) SLAM method for removing dynamic targets by combining vision and laser radar
Christiansen et al. Monocular vehicle distance sensor using HOG and Kalman tracking
Spampinato et al. Deep learning localization with 2D range scanner
CN114863201A (en) Training method and device of three-dimensional detection model, computer equipment and storage medium
CN115100565A (en) Multi-target tracking method based on spatial correlation and optical flow registration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination