WO2021134258A1 - Point cloud-based target tracking method and apparatus, computer device and storage medium - Google Patents


Info

Publication number
WO2021134258A1
Authority
WO
WIPO (PCT)
Prior art keywords
current frame
point cloud
cloud data
feature map
feature
Application number
PCT/CN2019/130034
Other languages
French (fr)
Chinese (zh)
Inventor
许家妙
何明
叶茂盛
邹晓艺
吴伟
许双杰
曹通易
Original Assignee
深圳元戎启行科技有限公司
Application filed by 深圳元戎启行科技有限公司
Priority to PCT/CN2019/130034
Publication of WO2021134258A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • This application relates to a point cloud-based target tracking method, device, computer equipment, storage medium, and vehicle.
  • The automatic driving equipment can be controlled by tracking surrounding target objects or pedestrians.
  • Traditionally, target tracking is usually based on captured images.
  • However, the inventor realizes that target tracking based on images is easily affected by image quality.
  • For example, image quality is lower when the ambient light changes or the target moves quickly. Based on low-quality images, the target cannot be tracked accurately, and the accuracy of the tracking result is low.
  • According to various embodiments disclosed in this application, a point cloud-based target tracking method, device, computer equipment, storage medium, and vehicle are provided.
  • a point cloud-based target tracking method includes:
  • the target tracking model trained based on the point cloud data of the previous frame is called, and the target tracking area corresponding to the point cloud data of the current frame is determined according to the image features.
  • a point cloud-based target tracking device includes:
  • the feature map generating module is used to generate the feature map of the current frame according to the point cloud data of the current frame;
  • a candidate region acquiring module configured to acquire a candidate region corresponding to the current frame feature map; intercept a candidate feature map matching the candidate region in the current frame feature map;
  • a feature extraction module for extracting image features corresponding to the candidate feature map;
  • the target tracking module is used to call the target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image features.
  • A computer device includes a memory and one or more processors; the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • the target tracking model trained based on the point cloud data of the previous frame is called, and the target tracking area corresponding to the point cloud data of the current frame is determined according to the image features.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the target tracking model trained based on the point cloud data of the previous frame is called, and the target tracking area corresponding to the point cloud data of the current frame is determined according to the image features.
  • A vehicle is configured to execute the steps of the above-mentioned point cloud-based target tracking method.
  • Fig. 1 is an application scenario diagram of a point cloud-based target tracking method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a point cloud-based target tracking method according to one or more embodiments.
  • Fig. 3 is a schematic flowchart of the steps of generating a feature map of the current frame according to the point cloud data of the current frame according to one or more embodiments.
  • Fig. 4 is a schematic flowchart of a point cloud-based target tracking method in another embodiment.
  • Fig. 5 is a block diagram of a point cloud-based target tracking device according to one or more embodiments.
  • Figure 6 is a block diagram of a computer device according to one or more embodiments.
  • the point cloud-based target tracking method provided in this application can be applied to the application environment of automatic driving as shown in FIG. 1.
  • the laser sensor 102 can communicate with the computer device 104.
  • the laser sensor 102 may be a vehicle-mounted laser sensor, and the computer device 104 may be a vehicle-mounted computer device.
  • the point cloud data can be collected by the laser sensor 102, or pre-stored by a computer device.
  • the computer device 104 generates a feature map of the current frame according to the point cloud data of the current frame.
  • the computer device 104 obtains the candidate region corresponding to the feature map of the current frame, and intercepts the candidate feature map matching the candidate region in the feature map of the current frame.
  • the computer device 104 extracts the image features corresponding to the candidate feature map, calls the target tracking model trained based on the point cloud data of the previous frame, and determines the target tracking area corresponding to the point cloud data of the current frame according to the image features.
  • the laser sensor 102 may be a laser sensor carried by an automatic driving device, and may specifically include a laser radar, a laser scanner, and the like.
  • a point cloud-based target tracking method is provided. Taking the method applied to the computer device 104 in FIG. 1 as an example for description, the method includes the following steps:
  • Step 202 Generate a feature map of the current frame according to the point cloud data of the current frame.
  • The laser sensor is mounted on a device capable of autonomous driving.
  • For example, the laser sensor may be mounted on an unmanned vehicle, or on a vehicle that includes an automatic driving mode.
  • Laser sensors can be used to collect surrounding environmental data.
  • the laser sensor can emit a detection signal, such as a laser beam.
  • the laser sensor compares the reflected signal with the detection signal to obtain surrounding environmental data.
  • the environmental data may specifically be point cloud data.
  • Point cloud data refers to a collection of point data corresponding to multiple points on the surface of objects in the scanning environment, recorded in the form of points. "Multiple" can refer to two or more.
  • the laser sensor can collect according to a preset frequency to obtain multi-frame point cloud data.
  • the preset frequency can be preset according to actual needs.
  • the point cloud data may be three-dimensional point cloud data, and each frame of point cloud data may include point data corresponding to multiple points.
  • the point data may specifically include at least one of three-dimensional coordinates, laser reflection intensity, and color information corresponding to the point.
  • the three-dimensional coordinates may be the coordinates of the point in the Cartesian coordinate system, and specifically include the horizontal axis (x) coordinate, the longitudinal axis (y) coordinate, and the vertical axis (z) coordinate of the point in the Cartesian coordinate system.
  • the Cartesian coordinate system is a three-dimensional space coordinate system established with a laser sensor as the origin.
  • the three-dimensional space coordinate system includes a horizontal axis (x axis), a longitudinal axis (y axis), and a vertical axis (z axis).
  • the three-dimensional space coordinate system established with the laser sensor as the origin satisfies the right-hand rule.
  • the computer equipment can follow the time sequence of the point cloud data collected by the laser sensor, and track the target according to the multi-frame point cloud data in turn.
  • a target refers to a living or non-living body in the surrounding environment.
  • the target can be moving or stationary.
  • the target may specifically include at least one of pedestrians, roadblocks, vehicles, and buildings.
  • The current frame of point cloud data refers to the frame of point cloud data being processed by the computer equipment. It is understandable that when the computer device finishes tracking the point cloud data of the current frame and starts to track the point cloud data of the next frame, the point cloud data of the current frame is re-recorded as the point cloud data of the previous frame, and the point cloud data of the next frame is re-recorded as the point cloud data of the current frame.
  • the computer device can obtain the point data included in the point cloud data of the current frame, encode the points according to the point data, and obtain the point features corresponding to each of the multiple points.
  • the point feature can be expressed in the form of a vector, and the point feature of each point can include a point vector.
  • the point vector can be a multidimensional vector.
  • the computer device can generate a feature map according to the point features corresponding to each of the multiple points, and the feature map generated from the point cloud data of the current frame can be recorded as the current frame feature map.
  • the feature map generated from the point cloud data in this implementation does not include RGB (red, green, blue) channels.
  • Since the generated feature map is different from a traditionally captured image, target tracking based on the point cloud data will not be affected by factors such as ambient lighting and target movement speed.
  • Step 204 Obtain a candidate region corresponding to the feature map of the current frame.
  • the candidate area is a feature map area used to determine the target tracking area, and the candidate area may include the target tracking area.
  • the candidate area is an area range in the feature map of the current frame, and the candidate area can be the entire area of the feature map of the current frame, or a partial area of the feature map of the current frame.
  • the candidate area may include the area size, the area shape, and the position in the current frame feature map.
  • The candidate area may take one of a variety of shapes; for example, the shape of the candidate region is generally rectangular, but it may also be circular.
  • After the computer device generates the feature map of the current frame according to the point cloud data of the current frame, it can obtain the candidate region corresponding to the feature map of the current frame in a variety of ways. For example, the computer device may record the entire area of the feature map of the current frame as the corresponding candidate area. The computer device can also obtain the candidate area corresponding to the feature map of the current frame according to the point cloud data of the previous frame. Specifically, the computer device can obtain the target area of the previous frame corresponding to the point cloud data of the previous frame according to the tracking result of the point cloud data of the previous frame, and then determine the candidate area corresponding to the feature map of the current frame according to the target area of the previous frame.
  • In one embodiment, obtaining the candidate area corresponding to the feature map of the current frame includes: obtaining the target area of the previous frame corresponding to the point cloud data of the previous frame; expanding the target area of the previous frame according to a preset multiple; and using the expanded target area of the previous frame as the candidate area corresponding to the feature map of the current frame.
  • the computer device may obtain the target area of the previous frame corresponding to the point cloud data of the previous frame according to the tracking result of the point cloud data of the previous frame.
  • the target area of the previous frame refers to the area where the target is located in the point cloud data of the previous frame.
  • the computer device can expand the target area of the previous frame according to the preset multiple to obtain the expanded target area of the previous frame, so as to ensure that the current frame feature map in the candidate area can include the target.
  • The computer equipment can expand the area of the target area of the previous frame by a preset multiple, or expand the side lengths of the target area according to the preset multiple and determine the closed area formed by the expanded side lengths as the expanded target area of the previous frame.
  • the computer device may determine the expanded previous frame target area in the current frame feature map according to the expanded previous frame target area, and determine the expanded previous frame target area as the candidate area corresponding to the current frame feature map.
  • the preset multiple may be preset by the user according to actual needs, for example, the preset multiple may specifically be 2 times.
  • the candidate area corresponding to the feature map of the current frame is determined according to the target area of the previous frame, so that the target of the previous frame is tracked in the feature map of the current frame.
  • The computer device can expand the target area of the previous frame according to the preset multiple and determine the expanded target area of the previous frame as the candidate area corresponding to the feature map of the current frame, which ensures the accuracy of target tracking without processing the entire current frame feature map, effectively saving the computing resources of the computer equipment.
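The expansion step described above can be sketched in Python; the `(cx, cy, w, h)` region representation, the clipping to map bounds, and the function name are illustrative assumptions, not details fixed by the disclosure.

```python
def expand_region(region, multiple=2.0, bounds=None):
    """Expand a rectangular target region about its center by a preset multiple.

    region: (cx, cy, w, h) -- center coordinates plus side lengths (assumed format).
    multiple: preset expansion factor for each side length (2 in the example above).
    bounds: optional (W, H) feature-map size used to clip the expanded sides.
    """
    cx, cy, w, h = region
    w2, h2 = w * multiple, h * multiple
    if bounds is not None:
        W, H = bounds
        # never let the candidate area exceed the feature map itself
        w2, h2 = min(w2, W), min(h2, H)
    return (cx, cy, w2, h2)
```

Expanding about the center keeps the previous-frame target near the middle of the candidate area, which is why a factor such as 2 is enough to cover plausible frame-to-frame motion.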
  • Step 206 Extract a candidate feature map matching the candidate region in the current frame feature map.
  • The computer device can intercept, from the current frame feature map, the candidate feature map corresponding to the candidate region according to the candidate region corresponding to the current frame feature map.
  • the candidate feature map can be the entire current frame feature map, or a part of the current frame feature map.
  • the candidate feature map may include the target to be tracked, and the candidate feature map obtained by interception matches the size and shape of the candidate area.
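The interception of a candidate feature map matching the candidate region can be illustrated with a short sketch; the `(H, W, C)` array layout and the clipping behaviour are assumptions made for the example.

```python
import numpy as np

def crop_candidate(feature_map, region):
    """Intercept the sub-map of `feature_map` covered by the candidate region.

    feature_map: array of shape (H, W, C).
    region: (x0, y0, w, h) in feature-map coordinates; the crop is clipped
    to the map borders, so the result matches the overlapping area.
    """
    H, W = feature_map.shape[:2]
    x0, y0, w, h = region
    x0, y0 = max(0, x0), max(0, y0)
    return feature_map[y0:min(y0 + h, H), x0:min(x0 + w, W)]
```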
  • Step 208 Extract image features corresponding to the candidate feature map.
  • the computer device can perform feature extraction on the selected candidate feature map, and extract the image features corresponding to the candidate feature map from the candidate feature map.
  • the image feature corresponding to the candidate feature map may include at least one of multiple feature types.
  • the image features may specifically include at least one of multiple feature types such as edge features, corner features, and regional features.
  • Image features can be recorded by means of feature vectors, etc.
  • the computer device can extract the image features corresponding to the candidate feature map, perform feature analysis on the candidate feature map according to the image features, and determine the target tracking area corresponding to the point cloud data of the current frame in the candidate feature map according to the analysis result.
  • In one embodiment, extracting the image features corresponding to the candidate feature map includes: obtaining a feature extraction model; inputting the candidate feature map into the feature extraction model; and performing feature extraction on the candidate feature map according to the feature extraction model to obtain the image features corresponding to the candidate feature map.
  • the computer device can obtain the feature extraction model, and the feature extraction model can be pre-configured in the computer device.
  • the computer device can input the intercepted candidate feature maps into the feature extraction model, and perform operations on the input candidate feature maps through the feature extraction model to perform feature extraction on the candidate feature maps.
  • the feature extraction model can be one of a variety of neural network models.
  • the feature extraction model may specifically be one of neural network models such as a traditional Convolutional Neural Networks (CNN) model and a VGG (Visual Geometry Group Network) model.
  • the feature extraction model may specifically include an input layer, a convolutional layer, a pooling layer, a fully connected layer, a BN (Batch Normalization, batch normalization) layer, an output layer, and so on.
  • the computer device can sequentially perform operations corresponding to the network structure on the candidate feature maps according to the network structure of the feature extraction model to obtain image features corresponding to the candidate feature maps output by the feature extraction model.
  • The computer device can obtain the feature extraction model, input the candidate feature map into the feature extraction model, and perform feature extraction on the candidate feature map according to the feature extraction model to obtain the image features corresponding to the candidate feature map, which effectively improves the accuracy of the image features. Furthermore, the target tracking area corresponding to the point cloud data of the current frame is determined according to the image features, which improves the accuracy of target tracking.
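The disclosure leaves the feature extraction model open (a CNN, VGG, and so on). Purely to illustrate the kind of sliding-window operation a convolutional layer performs, here is a naive single-channel "valid"-mode cross-correlation in NumPy; it is a toy stand-in, not the disclosed model.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2-D cross-correlation (the per-layer operation of a CNN)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of the kernel with the window it currently covers
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

A real feature extraction model stacks many such layers with learned kernels, pooling, and normalization, as the network structure described below.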
  • Step 210 Call the target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image features.
  • the computer device can call the target tracking model, and perform tracking processing on the image features corresponding to the candidate feature map according to the target tracking model to obtain the target tracking area corresponding to the point cloud data of the current frame.
  • the target tracking model is a tracking model obtained after training based on the point cloud data of the previous frame.
  • the target tracking model is used to track the area where the target is located in the point cloud data of the current frame to obtain the target tracking area.
  • the target tracking area refers to the location area of the target in the feature map of the current frame estimated by the tracking process.
  • the computer device can input the extracted image features into the target tracking model trained based on the point cloud data of the previous frame, and use the target tracking model to track the image features of the current frame feature map to obtain the target tracking area output by the target tracking model.
  • the computer device can obtain multiple frames of point cloud data, and process each frame of point cloud data in sequence according to the sequence of the point cloud data collection time of each frame.
  • the tracking model can be trained according to the point cloud data of the previous frame corresponding to the point cloud data of the current frame to obtain the target tracking model corresponding to the point cloud data of the current frame.
  • After the computer equipment finishes processing the point cloud data of the current frame, it can perform tracking processing on the point cloud data of the next frame.
  • the computer equipment can train the tracking model according to the point cloud data of the current frame, and obtain the target tracking model corresponding to the point cloud data of the next frame, so as to track the point cloud data of the next frame.
  • In this way, the computer equipment can iteratively train the tracking model frame by frame as the point cloud data is processed.
  • the computer device can obtain the point cloud data of the current frame, and generate a feature map of the current frame according to the point cloud data of the current frame.
  • the computer device can extract the image features of the candidate feature map in the current frame feature map, and track the image features according to the target tracking model trained based on the point cloud data of the previous frame to determine the target tracking area corresponding to the current frame point cloud data.
  • the target tracking based on point cloud data in this embodiment will not be affected by factors such as external light, target movement speed, etc., which effectively improves the accuracy of target tracking.
  • generating a feature map of the current frame according to the point cloud data of the current frame includes:
  • Step 302 Obtain the point cloud data of the current frame.
  • Step 304 Perform structured processing on the point cloud data of the current frame to obtain a processing result.
  • Step 306 Encode the points in the point cloud data of the current frame based on the processing result to obtain point features corresponding to the points.
  • Step 308 Generate a feature map of the current frame corresponding to the point cloud data of the current frame according to the point features.
  • the computer equipment can obtain the point cloud data of multiple frames within the visible range collected by the laser sensor, and process the point cloud data of each frame in sequence according to the time sequence of the point cloud data collected by the laser sensor.
  • the computer device may record the point cloud data that is being processed or being processed as the point cloud data of the current frame.
  • the computer device can process the point cloud data of the current frame to generate a feature map of the current frame corresponding to the point cloud data of the current frame.
  • the computer device may perform structured processing on the point cloud data of the current frame to obtain a processing result after the structured processing.
  • the computer device may perform rasterization processing on the current frame point cloud data, or may perform voxelization processing on the current frame point cloud data.
  • the computer equipment can rasterize the plane with the laser sensor as the origin, and divide the plane into multiple grids.
  • the structured space after the structuring process may be a columnar space, and the points may be distributed in the columnar space corresponding to the vertical axis of the grid, that is, the abscissa and ordinate of the points in the columnar space are within the corresponding grid coordinate range.
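The rasterization into columnar (pillar) spaces can be sketched as grouping points by the grid cell their x and y coordinates fall into; the dictionary representation, cell size, and function name are assumptions made for this illustration.

```python
import numpy as np

def assign_to_pillars(points, cell_size, origin=(0.0, 0.0)):
    """Rasterize the plane around the sensor into square grids and group each
    point into the columnar space (pillar) whose grid cell contains its
    horizontal (x) and longitudinal (y) coordinates."""
    cells = {}
    for p in points:
        key = (int((p[0] - origin[0]) // cell_size),
               int((p[1] - origin[1]) // cell_size))
        cells.setdefault(key, []).append(p)
    # one (n_points, n_features) array per occupied pillar
    return {k: np.array(v) for k, v in cells.items()}
```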
  • the computer device can encode the point in the point cloud data of the current frame according to the processing result of the structured processing to obtain the point feature corresponding to the point.
  • the point feature can be a point vector corresponding to the point.
  • the computer device can count the point data of all points in each structured space, encode the points according to the statistical point data, and obtain the point vectors corresponding to the points.
  • After the computer equipment rasterizes the origin plane, it counts the point data in each columnar space.
  • the point data may specifically include the three-dimensional coordinates and reflection coefficients corresponding to each point.
  • the computer equipment can encode the point according to the point data in the columnar space to obtain the point vector corresponding to the point.
  • the point vector may be a 9-dimensional vector.
  • the point vector may specifically include the horizontal axis coordinate, the longitudinal axis coordinate, the vertical axis coordinate, the reflection coefficient, the distance from the center of the columnar space, and the distance between the point and the average value of the three-dimensional coordinates of all points.
  • the distance between a point and the center of the columnar space can be represented by the horizontal axis distance and the longitudinal axis distance.
  • the distance between the point and the average of the three-dimensional coordinates of all points can be represented by the horizontal axis distance, the longitudinal axis distance, and the vertical axis distance.
  • the computer equipment can record the 9-dimensional vector corresponding to the point as the point feature corresponding to the point.
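The 9-dimensional point vector described above (three coordinates, the reflection coefficient, two offsets from the pillar center, and three offsets from the mean of all points in the pillar) can be computed as a sketch; the array layout and function name are assumptions.

```python
import numpy as np

def encode_pillar_points(points, pillar_center):
    """Encode each point in one columnar space as a 9-dimensional vector.

    points: (N, 4) array of (x, y, z, reflection coefficient).
    pillar_center: (cx, cy) horizontal center of the columnar space.
    Returns (N, 9): the 4 raw values, the x/y distance from the pillar
    center (2 dims), and the offset from the mean of all points (3 dims).
    """
    xyz = points[:, :3]
    mean = xyz.mean(axis=0)
    center_off = xyz[:, :2] - np.asarray(pillar_center)  # 2 dims
    mean_off = xyz - mean                                # 3 dims
    return np.hstack([points, center_off, mean_off])     # 4 + 2 + 3 = 9
```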
  • the computer device can count the point features in multiple structured spaces, and generate the current frame feature map corresponding to the current frame point cloud data according to the multiple point features.
  • the computer device can encode the points according to the point cloud data of the current frame to obtain the point characteristics, and generate the current frame feature map corresponding to the point cloud data of the current frame according to the point characteristics.
  • Target tracking based on the point cloud data is not affected by factors such as ambient light or target movement speed, which ensures the accuracy of tracking the target.
  • The current frame feature map is generated according to the current frame point cloud data in this implementation, which can effectively use the depth features in the point cloud data and thus effectively improve the accuracy of target tracking.
  • the computer device may collect a preset number of sampling points from the structured space, encode the sampling points, and obtain the point characteristics corresponding to the sampling points.
  • the computer device can generate the current frame feature map corresponding to the current frame point cloud data according to the point features corresponding to the sampling points in each structured space. Specifically, the computer device can count the number of points included in the structured space, and compare the number of points with the preset number.
  • the preset number can be preset according to actual needs and historical point cloud data after big data analysis. When the number of points in the structured space is greater than or equal to the preset number, the computer device may randomly select a preset number of points from the structured space as sampling points.
  • When the number of points in the structured space is less than the preset number, the computer device can obtain all points in the structured space as sampling points and add virtual points as sampling points, so that the preset number of sampling points is collected.
  • the three-dimensional coordinates of the virtual point may be located at the origin of the coordinate system.
  • the computer device collects a preset number of points from each structured space as sampling points, so that the number of sampling points in each structured space is the same, thereby balancing the point features of multiple structured spaces, which helps the computer equipment generate a feature map of the current frame based on the structured data.
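The fixed-size sampling of each structured space (random selection when there are enough points, padding with all-zero virtual points at the origin otherwise) might look like this sketch:

```python
import numpy as np

def sample_pillar(points, num_samples, rng=None):
    """Collect a preset number of sampling points from one structured space.

    If the space holds at least `num_samples` points, sample that many
    without replacement; otherwise keep all real points and pad with
    virtual points at the coordinate origin (all-zero rows).
    """
    if rng is None:
        rng = np.random.default_rng()
    n, d = points.shape
    if n >= num_samples:
        idx = rng.choice(n, size=num_samples, replace=False)
        return points[idx]
    pad = np.zeros((num_samples - n, d))
    return np.vstack([points, pad])
```

Equal-sized pillars are what make it possible to stack all structured spaces into one dense tensor for the image generation model.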
  • In one embodiment, generating the current frame feature map corresponding to the point cloud data of the current frame according to the point features includes: generating a point feature matrix based on multiple point features; calling the image generation model and inputting the point feature matrix into the image generation model; and obtaining the current frame feature map output by the image generation model.
  • the computer device may generate a point feature matrix based on multiple point features in multiple structured spaces, and the point feature matrix may specifically include the point features, the structured spaces, and the corresponding points.
  • the computer equipment can call the image generation model.
  • the image generation model may be pre-configured in the computer device, and the image generation model may be obtained by training a large number of point feature samples and feature map samples corresponding to the point feature samples.
  • the image generation model can be one of a variety of neural network models.
  • the image generation model may be a convolutional neural network model, and specifically may be a PointNet model.
  • The computer device can input the generated point feature matrix into the image generation model, calculate the point feature matrix through the image generation model, and perform a maximum pooling operation on the point features along the point-count dimension to obtain the current frame feature map corresponding to the current frame point cloud data.
  • the computer device generates a point feature matrix based on multiple point features, and performs operations on the point feature matrix according to the image generation model to obtain the current frame feature map output by the image generation model.
  • the computer equipment generates a feature map of the current frame according to the point cloud data of the current frame, which effectively utilizes the depth features in the point cloud data, and improves the accuracy of target tracking based on the point cloud data.
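The maximum pooling over the point-count dimension, followed by scattering one pooled vector per pillar back onto the 2-D grid, can be sketched as follows; the shapes and names are illustrative assumptions, and a real image generation model (e.g. PointNet-style) would apply learned per-point transforms before the pooling.

```python
import numpy as np

def pillar_feature_map(point_features, pillar_index, grid_shape):
    """Max-pool per-point features into one vector per pillar, then scatter
    the pillar vectors onto the 2-D grid to form the current frame feature map.

    point_features: (P, N, C) -- P pillars, N sampled points each, C channels.
    pillar_index: (P, 2) grid coordinates (row, col) of each pillar.
    grid_shape: (H, W) of the output feature map.
    """
    pooled = point_features.max(axis=1)  # (P, C): max over the point dimension
    H, W = grid_shape
    fmap = np.zeros((H, W, pooled.shape[1]))
    for (r, c), vec in zip(pillar_index, pooled):
        fmap[r, c] = vec
    return fmap
```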
  • In one embodiment, calling the target tracking model trained based on the point cloud data of the previous frame and determining the target tracking area corresponding to the point cloud data of the current frame according to the image features includes: generating an image feature matrix according to the image features; inputting the image feature matrix into the target tracking model, and obtaining the area labels output by the target tracking model; and determining the target tracking area corresponding to the point cloud data of the current frame according to the area labels.
  • the computer device can generate a corresponding image feature matrix according to the extracted image features. Specifically, the computer device can process the extracted image features, and after each column of image features are connected to the previous column of image features, a column vector is generated according to the image features. The computer device can cyclically shift the column vector, and arrange all the image features obtained after the shift in columns to obtain an image feature matrix. Cyclic shifting may include rotating left or rotating right.
  • the computer device can input the generated image feature matrix to the target tracking model, which is obtained after training based on the point cloud data of the previous frame.
  • the target tracking model can use at least one of a variety of visual tracking algorithms, such as the KCF (Kernel Correlation Filter) algorithm or a KCF-based target tracking algorithm.
  • the computer device can perform operations on the image feature matrix through the target tracking model, and obtain the area label output by the target tracking model after the calculation.
  • the target tracking model can output multiple area labels, which are used to mark corresponding areas.
  • the area marked by the area label indicates the possible range of the target, and the size and shape of the area can be the same as the target area of the previous frame corresponding to the point cloud data of the previous frame.
• The area label can indicate the possibility that the target is within the corresponding area. In one of the embodiments, the larger the label value of an area label, the higher the possibility that the target is within the corresponding area.
  • the computer device can compare multiple area labels output by the target tracking model, and obtain the area label with the largest label value from the multiple area labels as the target area label.
• The computer device can determine the area corresponding to the target area label as the target tracking area.
• The computer device can obtain the cyclic offset corresponding to the target area label, determine the area obtained by offsetting the target area of the previous frame according to the cyclic offset, and determine the offset area as the target tracking area corresponding to the point cloud data of the current frame.
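As a sketch of how the peak area label and its cyclic offset could determine the tracking area — assuming, for illustration only, that the model outputs a two-dimensional response map of label values (one per cyclic shift) and that areas are (x, y, w, h) boxes:

```python
import numpy as np

def peak_offset(response):
    """Return the (row, col) cyclic offset with the largest label value.

    `response` is a 2-D array of area labels, one per cyclic shift of
    the previous frame's target area."""
    return np.unravel_index(np.argmax(response), response.shape)

def shifted_area(prev_area, offset):
    """Shift the previous frame's target area (x, y, w, h) by the offset;
    size and shape stay the same as the previous target area."""
    x, y, w, h = prev_area
    dy, dx = offset
    return (x + dx, y + dy, w, h)

response = np.zeros((5, 5))
response[3, 1] = 0.9  # strongest label: vertical shift 3, horizontal shift 1
area = shifted_area((10, 20, 4, 4), peak_offset(response))
```

The tracking area keeps the previous box's size and shape and moves by the offset of the strongest label, matching the selection step described above.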
  • the computer device can generate an image feature matrix based on image features, call the target tracking model to perform operations on the image feature matrix, and determine the target tracking area corresponding to the point cloud data of the current frame according to the area label output by the target tracking model.
• This embodiment uses the point cloud features corresponding to the point cloud data of the current frame, which effectively improves the accuracy of target tracking based on point cloud data.
• Before the step of calling the target tracking model trained based on the point cloud data of the previous frame, the above point cloud-based target tracking method further includes:
  • Step 402 Generate a feature map of the previous frame according to the point cloud data of the previous frame.
• Step 404 Intercept a sample feature map from the feature map of the previous frame, and extract the sample feature corresponding to the sample feature map.
  • Step 406 Generate a sample label corresponding to the sample feature.
  • Step 408 Train the standard tracking model according to the sample features and sample labels to obtain the target tracking model.
• Before calling the target tracking model to process the image features corresponding to the point cloud data of the current frame, the computer device needs to train the standard tracking model according to the point cloud data of the previous frame to obtain the target tracking model.
• The computer device may generate a feature map of the previous frame according to the point cloud data of the previous frame, intercept a sample feature map from the feature map of the previous frame, and extract the sample feature corresponding to the sample feature map. It is understandable that, since the computer device tracks the point cloud data collected by the laser sensor frame by frame, the target tracking model is trained iteratively based on multiple frames of point cloud data. The manner in which the computer device generates the feature map of the previous frame, intercepts the sample feature map, and extracts the sample feature is the same as or similar to the manner in which it generates the feature map of the current frame, intercepts the candidate feature map, and extracts the image features, so the details are not repeated here.
• The computer device can generate the sample label corresponding to each sample feature according to the sample feature. Specifically, the computer device may perform cyclic shifts on the sample feature to obtain multiple sample features, generate a sample feature matrix based on the multiple sample features, and determine the sample label corresponding to each sample feature according to the shift value corresponding to that sample feature in the sample feature matrix.
  • the shift value may specifically include a horizontal axis shift value and a vertical axis shift value.
• The computer device can apply a preset function to the shift values corresponding to the multiple sample features in the sample feature matrix to obtain the sample label corresponding to each of the multiple sample features.
  • the preset function may be a function preset by the user, and the preset function may specifically be a two-dimensional Gaussian function.
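The labeling step above — a two-dimensional Gaussian applied to the horizontal- and vertical-axis shift values, so that the unshifted sample gets the highest label — can be sketched as below. The bandwidth `sigma` and the wrap-around handling of large shifts are illustrative assumptions.

```python
import numpy as np

def gaussian_labels(rows, cols, sigma=2.0):
    """Label each cyclic shift (dy, dx) with a 2-D Gaussian of the
    shift values: the zero-shift sample gets label 1.0, and labels
    decay as the shift grows."""
    dy = np.arange(rows).reshape(-1, 1)
    dx = np.arange(cols).reshape(1, -1)
    # treat shifts past the halfway point as wrap-around (small) shifts
    dy = np.minimum(dy, rows - dy)
    dx = np.minimum(dx, cols - dx)
    return np.exp(-(dy**2 + dx**2) / (2 * sigma**2))

labels = gaussian_labels(8, 8)
```

A standard tracking model can then be regressed toward these labels: shifts close to the original sample are treated as near-positives, distant shifts as negatives.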
  • the computer device can train the established standard tracking model according to the multiple sample features in the sample feature matrix and the sample labels corresponding to the multiple sample features to obtain the target tracking model.
• The computer device can train the standard tracking model based on the point cloud data of the previous frame to obtain the target tracking model, so as to use the target tracking model to track the target tracking area of the point cloud data of the current frame, and iteratively train the target tracking model on multiple frames of point cloud data, which effectively improves the accuracy of target tracking.
• The above-mentioned point cloud-based target tracking method further includes: detecting the point cloud data of the current frame to obtain a target detection area; and determining, according to the target detection area and the target tracking area, the current frame target area corresponding to the current frame point cloud data.
• The computer device can detect the point cloud data of the current frame to obtain the target detection area.
  • the computer device may use at least one of multiple target detection algorithms to perform target detection on the point cloud data of the current frame to obtain the target detection area.
• The computer device can determine the current frame target area corresponding to the current frame point cloud data according to the target detection area and the target tracking area corresponding to the current frame point cloud data. Specifically, the computer device can compare the target detection area with the target tracking area. When the two areas are the same, the computer device can determine the area corresponding to the target detection area as the current frame target area corresponding to the current frame point cloud data. When the two areas are not the same, the computer device can integrate the target detection area and the target tracking area, and determine the integrated area as the current frame target area corresponding to the current frame point cloud data.
  • the computer device can obtain the detection confidence level corresponding to the target detection area and the tracking confidence level corresponding to the target tracking area.
• The computer device may integrate the target detection area and the target tracking area based on the detection confidence and the tracking confidence, and determine the integrated area as the current frame target area corresponding to the current frame point cloud data.
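One plausible way to integrate the two areas based on the detection and tracking confidences is a confidence-weighted average of the box parameters. This particular weighting is an assumption for illustration; the description does not fix the integration formula.

```python
def fuse_areas(detect_area, track_area, detect_conf, track_conf):
    """Combine detection and tracking boxes (x, y, w, h).

    Identical areas are returned unchanged; otherwise each box
    parameter is averaged, weighted by the corresponding confidence."""
    if detect_area == track_area:
        return detect_area
    total = detect_conf + track_conf
    return tuple(
        (d * detect_conf + t * track_conf) / total
        for d, t in zip(detect_area, track_area)
    )

# A detection with 3x the tracker's confidence pulls the fused box toward it.
fused = fuse_areas((10, 10, 4, 4), (12, 10, 4, 4),
                   detect_conf=3.0, track_conf=1.0)
```

With equal confidences this reduces to a plain midpoint of the two boxes, matching the "integrate the two areas" behavior described above.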
• The computer device can obtain the target detection area by detecting the current frame point cloud data, adjust the target detection area according to the target tracking area, and determine the adjusted area as the current frame target area corresponding to the current frame point cloud data, which effectively improves the accuracy of determining the target area.
• The above-mentioned point cloud-based target tracking method further includes: determining target displacement data according to the target area of the current frame and the target area of the previous frame; acquiring the point cloud collection frequency; and determining the target motion data according to the point cloud collection frequency and the target displacement data.
  • the computer device can compare the target area of the current frame with the target area of the previous frame, and determine the target displacement data according to the comparison result.
  • the target displacement data may include the length and direction of the target displacement.
  • the computer equipment can obtain the point cloud collection frequency corresponding to the laser sensor.
  • the point cloud collection frequency can be preset by the user according to actual needs, and the laser sensor collects point cloud data according to the set point cloud collection frequency.
  • the point cloud collection frequency can be a constant.
  • a laser sensor can collect point cloud data at a frequency of 50 frames per second.
  • the point cloud collection frequency can also be a variable.
  • the laser sensor can adjust the point cloud collection frequency according to different situations or modes. For example, a laser sensor can increase the point cloud collection frequency when there are many targets in the environment and the movement speed is fast, and reduce the point cloud collection frequency when there are fewer targets in the environment and the movement speed is slow.
  • the computer device can determine the time difference between the acquisition time of the point cloud data of the previous frame and the acquisition time of the point cloud data of the current frame according to the acquired point cloud acquisition frequency. For example, when the point cloud acquisition frequency is 50 frames per second, the computer device can determine that the time difference between the two frames is 0.02 seconds.
  • the computer device can determine the target motion data corresponding to the target according to the time difference and the target displacement data.
  • the target motion data may specifically include information such as the motion speed and direction corresponding to the target, so that the computer equipment can prompt or control the unmanned driving device according to the target motion data.
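The computation above — deriving the motion speed and direction from the inter-frame displacement and the point cloud collection frequency — can be sketched as follows. The function name and the assumption that coordinates are in metres are illustrative.

```python
import math

def target_motion(prev_center, curr_center, collection_hz):
    """Derive speed and heading from the displacement between two
    consecutive target areas and the point cloud collection frequency."""
    dt = 1.0 / collection_hz                    # e.g. 50 Hz -> 0.02 s per frame
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    distance = math.hypot(dx, dy)               # length of the displacement
    speed = distance / dt                       # metres per second if coords are metres
    heading = math.degrees(math.atan2(dy, dx))  # direction of motion
    return speed, heading

speed, heading = target_motion((0.0, 0.0), (0.3, 0.4), collection_hz=50)
```

At a 50 Hz collection frequency, a 0.5 m displacement between frames corresponds to a speed of 25 m/s, consistent with the 0.02 s frame interval mentioned above.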
• The computer device can determine the target displacement data according to the target area of the current frame and the target area of the previous frame, and determine the target motion data according to the point cloud collection frequency and the target displacement data, which helps the computer device to prompt or control the unmanned driving device according to the target motion data.
• A point cloud-based target tracking device is provided, including: a feature map generation module 502, a candidate region acquisition module 504, a feature extraction module 506, and a target tracking module 508, where:
  • the feature map generating module 502 is configured to generate a feature map of the current frame according to the point cloud data of the current frame.
  • the candidate region acquiring module 504 is used to acquire the candidate region corresponding to the feature map of the current frame; and intercept the candidate feature map matching the candidate region in the feature map of the current frame.
  • the feature extraction module 506 is used to extract image features corresponding to the candidate feature map.
  • the target tracking module 508 is configured to call the target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image characteristics.
• The above-mentioned feature map generation module 502 is also used to obtain the current frame point cloud data; structure the current frame point cloud data to obtain a processing result; encode the points in the current frame point cloud data based on the processing result to obtain the point feature corresponding to each point; and generate the current frame feature map corresponding to the point cloud data of the current frame according to the point features.
• The above-mentioned feature map generation module 502 is further configured to generate a point feature matrix based on multiple point features; call the image generation model and input the point feature matrix into it; and obtain the current frame feature map output by the image generation model.
• The above-mentioned candidate area acquisition module 504 is also used to acquire the target area of the previous frame corresponding to the point cloud data of the previous frame; expand the target area of the previous frame by a preset multiple; and determine the expanded previous frame target area as the candidate area corresponding to the feature map of the current frame.
  • the feature extraction module 506 is also used to obtain a feature extraction model; input the candidate feature map to the feature extraction model; perform feature extraction on the candidate feature map according to the feature extraction model to obtain the image corresponding to the candidate feature map feature.
• The above-mentioned target tracking module 508 is also used to generate an image feature matrix based on the image features; input the image feature matrix into the target tracking model to obtain the area label output by the target tracking model; and determine the target tracking area corresponding to the current frame point cloud data according to the area label.
• The above-mentioned point cloud-based target tracking device further includes a model training module, which is used to generate a feature map of the previous frame according to the point cloud data of the previous frame; intercept a sample feature map from the feature map of the previous frame and extract the sample features corresponding to the sample feature map; generate sample labels corresponding to the sample features; and train the standard tracking model according to the sample features and sample labels to obtain the target tracking model.
• The above-mentioned point cloud-based target tracking device further includes a target area determining module, which is used to detect the point cloud data of the current frame to obtain the target detection area, and determine the current frame target area corresponding to the current frame point cloud data according to the target detection area and the target tracking area.
• The above-mentioned point cloud-based target tracking device further includes a target data determining module, which is used to determine the target displacement data according to the target area of the current frame and the target area of the previous frame; acquire the point cloud collection frequency; and determine the target motion data according to the point cloud collection frequency and the target displacement data.
  • Each module in the above-mentioned point cloud-based target tracking device can be implemented in whole or in part by software, hardware, and a combination thereof.
• The above-mentioned modules may be embedded in, or independent of, the processor in the computer equipment in the form of hardware, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store the target tracking data based on the point cloud.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a point cloud-based target tracking method.
  • FIG. 6 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
• The specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device including a memory and one or more processors.
  • the memory stores computer-readable instructions.
• When the computer-readable instructions are executed by the one or more processors, the steps in the above method embodiments are implemented.
  • one or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
• When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps in the above method embodiments.
  • a vehicle is provided.
  • the vehicle may specifically include self-driving vehicles, electric vehicles, bicycles, and aircraft.
• The vehicle includes the above-mentioned computer equipment and can execute the steps in the above-mentioned embodiments of the point cloud-based target tracking method.
• The objects to which the embodiments created by the present invention can be applied are not limited to autonomous vehicles, electric vehicles, bicycles, aircraft, robots, and the like, but also include simulation devices and test equipment related to these devices.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
• RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A point-cloud based target tracking method, comprising: according to point cloud data of a current frame, generating a feature map of the current frame; acquiring a candidate region corresponding to the feature map of the current frame; in the feature map of the current frame, intercepting a candidate feature map that matches the candidate region; extracting an image feature corresponding to the candidate feature map; and calling a target tracking model obtained by training on the basis of point cloud data of the previous frame, and according to the image feature, determining a target tracking region corresponding to the point cloud data of the current frame.

Description

Point cloud-based target tracking method, device, computer equipment and storage medium

Technical Field

This application relates to a point cloud-based target tracking method, device, computer equipment, storage medium, and vehicle.

Background

With the development of computer technology, visual tracking technology has emerged, through which information such as the position and speed of a target can be tracked. For example, in the field of automatic driving, automatic driving equipment can be controlled by tracking surrounding target objects or pedestrians. In traditional methods, target tracking is usually based on captured images.

However, the inventor realizes that target tracking based on images is easily affected by image quality. Image quality degrades when the ambient lighting changes, the target moves quickly, and so on. Based on low-quality images, the target cannot be tracked accurately, and the accuracy of the tracking result is low.

Summary of the Invention

According to various embodiments disclosed in the present application, a point cloud-based target tracking method, device, computer equipment, storage medium, and vehicle are provided.
A point cloud-based target tracking method includes:

generating a feature map of the current frame according to the point cloud data of the current frame;

acquiring a candidate region corresponding to the feature map of the current frame;

intercepting a candidate feature map matching the candidate region in the feature map of the current frame;

extracting the image features corresponding to the candidate feature map; and

calling a target tracking model trained based on the point cloud data of the previous frame, and determining the target tracking area corresponding to the point cloud data of the current frame according to the image features.
A point cloud-based target tracking device includes:

a feature map generating module, configured to generate a feature map of the current frame according to the point cloud data of the current frame;

a candidate region acquiring module, configured to acquire a candidate region corresponding to the feature map of the current frame, and intercept a candidate feature map matching the candidate region in the feature map of the current frame;

a feature extraction module, configured to extract the image features corresponding to the candidate feature map; and

a target tracking module, configured to call a target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image features.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:

generating a feature map of the current frame according to the point cloud data of the current frame;

acquiring a candidate region corresponding to the feature map of the current frame;

intercepting a candidate feature map matching the candidate region in the feature map of the current frame;

extracting the image features corresponding to the candidate feature map; and

calling a target tracking model trained based on the point cloud data of the previous frame, and determining the target tracking area corresponding to the point cloud data of the current frame according to the image features.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:

generating a feature map of the current frame according to the point cloud data of the current frame;

acquiring a candidate region corresponding to the feature map of the current frame;

intercepting a candidate feature map matching the candidate region in the feature map of the current frame;

extracting the image features corresponding to the candidate feature map; and

calling a target tracking model trained based on the point cloud data of the previous frame, and determining the target tracking area corresponding to the point cloud data of the current frame according to the image features.

A vehicle, which executes the steps of the above point cloud-based target tracking method.
The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings, and claims.

Description of the Drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
Fig. 1 is an application scenario diagram of a point cloud-based target tracking method according to one or more embodiments.

Fig. 2 is a schematic flowchart of a point cloud-based target tracking method according to one or more embodiments.

Fig. 3 is a schematic flowchart of the steps of generating a feature map of the current frame according to the point cloud data of the current frame according to one or more embodiments.

Fig. 4 is a schematic flowchart of a point cloud-based target tracking method in another embodiment.

Fig. 5 is a block diagram of a point cloud-based target tracking device according to one or more embodiments.

Fig. 6 is a block diagram of a computer device according to one or more embodiments.
Detailed Description

In order to make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not used to limit it.

The point cloud-based target tracking method provided in this application can be applied to the automatic driving application environment shown in Fig. 1. The laser sensor 102 can communicate with the computer device 104. The laser sensor 102 may be a vehicle-mounted laser sensor, and the computer device 104 may be a vehicle-mounted computer device. The point cloud data may be collected by the laser sensor 102 or pre-stored by the computer device. The computer device 104 generates a feature map of the current frame according to the point cloud data of the current frame, obtains the candidate region corresponding to the feature map of the current frame, and intercepts the candidate feature map matching the candidate region in the feature map of the current frame. The computer device 104 extracts the image features corresponding to the candidate feature map, calls the target tracking model trained based on the point cloud data of the previous frame, and determines the target tracking area corresponding to the point cloud data of the current frame according to the image features. The laser sensor 102 may be a laser sensor carried by an automatic driving device, and may specifically include a lidar, a laser scanner, and the like.
In one of the embodiments, as shown in Fig. 2, a point cloud-based target tracking method is provided. Taking the method applied to the computer device 104 in Fig. 1 as an example, the method includes the following steps:

Step 202: Generate a feature map of the current frame according to the point cloud data of the current frame.

The laser sensor is carried by a device capable of automatic driving. For example, it may be a laser sensor mounted on an unmanned vehicle, or a laser sensor mounted on a vehicle that includes an automatic driving mode. The laser sensor can be used to collect surrounding environmental data. Specifically, the laser sensor can emit a detection signal, such as a laser beam, and compare the reflected signal with the detection signal to obtain the surrounding environmental data. The environmental data may specifically be point cloud data. Point cloud data refers to a collection of point data corresponding to multiple points on the surfaces of objects in the scanned environment, recorded in the form of points. "Multiple" may refer to two or more. The laser sensor can collect at a preset frequency to obtain multiple frames of point cloud data. The preset frequency can be set in advance according to actual needs.

The point cloud data may be three-dimensional point cloud data, and each frame of point cloud data may include the point data corresponding to each of multiple points. The point data may specifically include at least one of the three-dimensional coordinates, laser reflection intensity, and color information corresponding to a point. The three-dimensional coordinates may be the coordinates of the point in a Cartesian coordinate system, specifically including the horizontal-axis, longitudinal-axis, and vertical-axis coordinates of the point. The Cartesian coordinate system is a three-dimensional space coordinate system established with the laser sensor as the origin, including a horizontal axis (x-axis), a longitudinal axis (y-axis), and a vertical axis (z-axis). The three-dimensional space coordinate system established with the laser sensor as the origin satisfies the right-hand rule.
The computer device can track the target according to multiple frames of point cloud data in turn, following the time sequence in which the laser sensor collects the point cloud data. A target refers to a living or non-living body in the surrounding environment; a target can be moving or stationary. For example, the target may specifically include at least one of pedestrians, roadblocks, vehicles, and buildings. The current frame point cloud data refers to the frame of point cloud data being processed by the computer device. It is understandable that when the computer device finishes tracking the current frame point cloud data and starts to track the next frame of point cloud data, the current frame point cloud data can be recorded as the previous frame point cloud data, and the next frame is re-recorded as the current frame point cloud data. The computer device can obtain the point data included in the current frame point cloud data, encode the points according to the point data, and obtain the point features corresponding to each of the multiple points. A point feature can be expressed in the form of a vector, and the point feature of each point can include a point vector. The point vector can be a multidimensional vector.
The computer device can generate a feature map according to the point features corresponding to each of the multiple points, and the feature map generated from the point cloud data of the current frame can be recorded as the current frame feature map. Compared with the environmental image collected in the traditional way, the feature map generated from the point cloud data in this implementation does not include RGB (red, green, blue) channels. The generated feature map is different from the traditionally collected image, based on the point cloud data Target tracking will not be affected by factors such as ambient lighting and target movement speed.
Step 204: Obtain a candidate region corresponding to the current frame feature map.
The candidate region is a feature map region used to determine the target tracking region, and it may contain the target tracking region. The candidate region is a region within the current frame feature map; it may be the entire current frame feature map or only part of it. A candidate region may be described by its size, its shape, and its position within the current frame feature map, and it may take one of several shapes: in principle the candidate region is generally rectangular, but it may also be circular.
After generating the current frame feature map from the current frame point cloud data, the computer device may obtain the corresponding candidate region in a variety of ways. For example, the computer device may take the entire current frame feature map as the candidate region. It may also obtain the candidate region corresponding to the current frame feature map from the previous frame of point cloud data. Specifically, based on the tracking result for the previous frame of point cloud data, the computer device may obtain the previous-frame target region and determine from it the candidate region corresponding to the current frame feature map.
In one embodiment, obtaining the candidate region corresponding to the current frame feature map includes: obtaining the previous-frame target region corresponding to the previous frame of point cloud data; enlarging the previous-frame target region by a preset multiple; and determining the enlarged previous-frame target region as the candidate region corresponding to the current frame feature map.
Specifically, the computer device may obtain the previous-frame target region corresponding to the previous frame of point cloud data from the tracking result for that frame. The previous-frame target region is the region where the target was located in the previous frame of point cloud data. The computer device may enlarge the previous-frame target region by a preset multiple to obtain the enlarged previous-frame target region, ensuring that the current frame feature map within the candidate region can contain the target. The computer device may enlarge the region's area by the preset multiple, or enlarge its side lengths by the preset multiple and determine the closed region formed by the enlarged side lengths as the enlarged previous-frame target region. The computer device then locates the enlarged previous-frame target region in the current frame feature map and determines it as the candidate region corresponding to the current frame feature map. The preset multiple may be set in advance by the user according to actual needs; for example, the preset multiple may specifically be 2.
In this embodiment, the candidate region corresponding to the current frame feature map is determined from the previous-frame target region, so the previous frame's target is tracked within the current frame feature map. By enlarging the previous-frame target region by a preset multiple and determining the enlarged region as the candidate region corresponding to the current frame feature map, the computer device ensures tracking accuracy without having to process the entire current frame feature map, which effectively saves its computing resources.
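For illustration only (this concrete representation is not part of the disclosure), the enlargement of the previous-frame target region can be sketched in Python, assuming an axis-aligned rectangular region represented as a hypothetical (cx, cy, w, h) tuple and a side-length multiple of 2:

```python
def expand_region(region, multiple=2.0):
    """Enlarge a rectangular region about its center by scaling each
    side length by `multiple`; the result serves as the candidate
    region in the current frame feature map."""
    cx, cy, w, h = region
    return (cx, cy, w * multiple, h * multiple)

def clip_to_map(region, map_w, map_h):
    """Clip an (enlarged) region to the bounds of the feature map."""
    cx, cy, w, h = region
    x0 = max(0.0, cx - w / 2)
    y0 = max(0.0, cy - h / 2)
    x1 = min(float(map_w), cx + w / 2)
    y1 = min(float(map_h), cy + h / 2)
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)
```

Clipping the enlarged region to the feature map bounds, as in `clip_to_map`, keeps the candidate region valid when the target was near the edge of the map.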
Step 206: Crop a candidate feature map matching the candidate region from the current frame feature map.
According to the candidate region corresponding to the current frame feature map, the computer device may crop the candidate feature map out of the current frame feature map, obtaining a candidate feature map corresponding to the candidate region. Matching the candidate region, the candidate feature map may be the whole of the current frame feature map or only part of it. The candidate feature map may contain the target to be tracked, and the cropped candidate feature map matches the size and shape of the candidate region.
Step 208: Extract the image features corresponding to the candidate feature map.
The computer device may perform feature extraction on the cropped candidate feature map, extracting from it the corresponding image features. The image features corresponding to the candidate feature map may include at least one of multiple feature types; for example, they may specifically include at least one of edge features, corner features, region features, and so on. Image features may be recorded as feature vectors or in similar forms. The computer device may extract the image features corresponding to the candidate feature map, analyze the candidate feature map according to these features, and determine, within the candidate feature map, the target tracking region corresponding to the current frame of point cloud data according to the analysis result.
In one embodiment, extracting the image features corresponding to the candidate feature map includes: obtaining a feature extraction model; inputting the candidate feature map into the feature extraction model; and performing feature extraction on the candidate feature map with the feature extraction model to obtain the image features corresponding to the candidate feature map.
Specifically, the computer device may obtain a feature extraction model, which may be pre-configured in the computer device. The computer device may input the cropped candidate feature map into the feature extraction model, which operates on it to extract its features. The feature extraction model may be one of a variety of neural network models; for example, it may specifically be a conventional convolutional neural network (CNN) model or a VGG (Visual Geometry Group Network) model. The feature extraction model may specifically include an input layer, convolutional layers, pooling layers, fully connected layers, BN (batch normalization) layers, an output layer, and so on. Following the model's network structure, the computer device performs the corresponding operations on the candidate feature map in sequence and obtains the image features that the feature extraction model outputs for the candidate feature map.
In this embodiment, the computer device may obtain the feature extraction model, input the candidate feature map into it, and perform feature extraction on the candidate feature map according to the model, obtaining the corresponding image features and effectively improving their accuracy. Determining the target tracking region corresponding to the current frame of point cloud data from these image features in turn improves the accuracy of target tracking.
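As a hedged sketch of the kind of operations such a feature extraction model chains together — the actual CNN or VGG extractor is learned, and the filter and sizes below are arbitrary assumptions — a single "valid" convolution followed by 2×2 max pooling can be written in plain NumPy:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel feature map
    with one filter: the basic operation of a convolutional layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling, as used in a pooling layer."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

A trained model stacks many such layers (plus nonlinearities and batch normalization) with learned filter weights; the snippet only shows the shape of the computation.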
Step 210: Invoke the target tracking model trained on the previous frame of point cloud data, and determine the target tracking region corresponding to the current frame of point cloud data according to the image features.
The computer device may invoke the target tracking model and perform tracking processing on the image features corresponding to the candidate feature map to obtain the target tracking region corresponding to the current frame of point cloud data. The target tracking model is the tracking model obtained after training on the previous frame of point cloud data; it is used to track the region where the target is located in the current frame of point cloud data, yielding the target tracking region. The target tracking region is the location of the target in the current frame feature map as estimated by the tracking processing. The computer device may input the extracted image features into the target tracking model trained on the previous frame of point cloud data, perform tracking processing on the image features of the current frame feature map with this model, and obtain the target tracking region it outputs.
Understandably, the computer device may obtain multiple frames of point cloud data and process each frame in turn, in the order in which the frames were collected. When performing tracking processing on the current frame of point cloud data, the tracking model may be trained on the previous frame of point cloud data to obtain the target tracking model corresponding to the current frame. After the current frame has been processed, the next frame of point cloud data is tracked: the computer device may train the tracking model on the current frame of point cloud data to obtain the target tracking model corresponding to the next frame, and track the next frame with it. The computer device may thus iteratively train the tracking model in the order in which the point cloud data is processed.
In this embodiment, the computer device may obtain the current frame of point cloud data and generate the current frame feature map from it. The computer device may extract the image features of the candidate feature map within the current frame feature map, perform tracking processing on the image features according to the target tracking model trained on the previous frame of point cloud data, and determine the target tracking region corresponding to the current frame of point cloud data. Compared with the traditional image-based approach to target tracking, tracking based on point cloud data in this embodiment is not affected by factors such as external lighting or target movement speed, which effectively improves the accuracy of target tracking.
In one embodiment, as shown in FIG. 3, generating the current frame feature map from the current frame point cloud data includes:
Step 302: Obtain the current frame of point cloud data.
Step 304: Perform structured processing on the current frame of point cloud data to obtain a processing result.
Step 306: Encode the points in the current frame of point cloud data based on the processing result to obtain the point feature corresponding to each point.
Step 308: Generate the current frame feature map corresponding to the current frame of point cloud data according to the point features.
The computer device may obtain the multiple frames of point cloud data collected by the laser sensor within its visible range, and process each frame in turn according to the time order in which the laser sensor collected them. The computer device may record the frame that is about to be processed or is being processed as the current frame of point cloud data. The computer device may process the current frame of point cloud data to generate the corresponding current frame feature map.
Specifically, the computer device may perform structured processing on the current frame of point cloud data to obtain the processing result. The structured processing can be done in a variety of ways. For example, the computer device may rasterize the current frame of point cloud data, or voxelize it. Taking rasterization as an example, the computer device may rasterize the plane whose origin is the laser sensor, dividing the plane into multiple grid cells. The structured space obtained by the structuring may be a pillar-shaped space: the points are distributed in the pillar that each grid cell spans along the vertical axis, i.e. the x and y coordinates of the points in a pillar fall within the coordinate range of the corresponding grid cell.
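A minimal sketch of this rasterization step follows; the grid size, coordinate ranges, and function names are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

def assign_points_to_pillars(points, grid_size, x_range, y_range):
    """Rasterize the sensor-origin plane: map each point's (x, y) to
    the index of the grid cell (pillar) it falls in. Points outside
    the ranges are dropped. `points` is an (N, >=2) array."""
    (x0, x1), (y0, y1) = x_range, y_range
    gx, gy = grid_size
    inside = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
              (points[:, 1] >= y0) & (points[:, 1] < y1))
    pts = points[inside]
    ix = ((pts[:, 0] - x0) / (x1 - x0) * gx).astype(int)
    iy = ((pts[:, 1] - y0) / (y1 - y0) * gy).astype(int)
    return pts, ix * gy + iy   # flat pillar index for each kept point
```

Each pillar index then identifies one structured space whose points are encoded together in the next step.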
According to the result of the structured processing, the computer device may encode the points in the current frame of point cloud data to obtain the point feature corresponding to each point. A point feature may be the point vector corresponding to the point. Specifically, the computer device may gather the point data of all points within each structured space and encode the points according to the gathered point data, obtaining the point vector corresponding to each point.
As an example, after rasterizing the origin plane, the computer device gathers the point data within each pillar-shaped space. The point data may specifically include the three-dimensional coordinates and reflection coefficient of each point. The computer device may encode each point according to the point data in its pillar, obtaining the corresponding point vector; for example, a 9-dimensional vector. The point vector may specifically include the point's x-axis, y-axis, and z-axis coordinates, its reflection coefficient, its distance from the center of the pillar, and its distance from the mean of the three-dimensional coordinates of all points in the pillar. The distance from the pillar center may be expressed as x-axis and y-axis offsets, and the distance from the coordinate mean as x-axis, y-axis, and z-axis offsets. The computer device may record the 9-dimensional vector corresponding to a point as that point's point feature.
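The 9-dimensional encoding described above can be sketched as follows; the column layout [x, y, z, reflectance, offsets from pillar center, offsets from pillar mean] is one plausible arrangement assumed for illustration, not the only one:

```python
import numpy as np

def encode_pillar_points(points, pillar_center):
    """Encode each point of one pillar as a 9-D feature:
    [x, y, z, reflectance, dx_c, dy_c, dx_m, dy_m, dz_m], where
    (dx_c, dy_c) is the offset from the pillar center and
    (dx_m, dy_m, dz_m) is the offset from the mean of all points in
    the pillar. `points` is an (N, 4) array of x, y, z, reflectance."""
    xyz = points[:, :3]
    mean = xyz.mean(axis=0)                      # mean of all points in pillar
    dc = xyz[:, :2] - np.asarray(pillar_center)  # offset from pillar center
    dm = xyz - mean                              # offset from coordinate mean
    return np.hstack([points, dc, dm])           # (N, 9) point features
```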
The computer device may gather the point features of the multiple structured spaces and generate the current frame feature map corresponding to the current frame of point cloud data from these point features.
In this embodiment, the computer device may encode the points according to the current frame of point cloud data to obtain their point features, and generate the current frame feature map corresponding to the current frame of point cloud data from these features. Point cloud data is not affected by factors such as ambient lighting or target movement speed, which ensures the accuracy of target tracking. At the same time, compared with the traditional approach of tracking targets in point cloud data with a Kalman filter, generating the current frame feature map from the current frame of point cloud data makes effective use of the deep features in the point cloud data and thus effectively improves tracking accuracy.
In one embodiment, the computer device may collect a preset number of sampling points from each structured space and encode the sampling points to obtain their point features. The computer device may then generate the current frame feature map corresponding to the current frame of point cloud data from the point features of the sampling points of each structured space. Specifically, the computer device may count the number of points included in a structured space and compare it with the preset number. The preset number may be set in advance according to actual needs and big-data analysis of historical point cloud data. When the number of points in the structured space is greater than or equal to the preset number, the computer device may randomly draw the preset number of points from the structured space as sampling points. When the number of points in the structured space is smaller than the preset number, the computer device may take all points in the structured space as sampling points and add virtual points as further sampling points, so that the preset number of sampling points can still be collected. The three-dimensional coordinates of a virtual point may be located at the origin of the coordinate system.
In this embodiment, by collecting a preset number of points from each structured space as sampling points, the computer device makes the number of sampling points in every structured space the same, balancing the point features of the multiple structured spaces, which helps the computer device generate the current frame feature map from the structured data.
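A sketch of this fixed-size sampling, with random subsampling for crowded pillars and zero-coordinate virtual points as padding (function and parameter names are assumptions):

```python
import numpy as np

def sample_pillar(points, n_samples, rng=None):
    """Return exactly `n_samples` points for one pillar: randomly
    subsample when the pillar has too many points; otherwise keep all
    points and pad with virtual points at the coordinate origin.
    `points` is an (N, C) array (e.g. x, y, z, reflectance)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(points)
    if n >= n_samples:
        idx = rng.choice(n, size=n_samples, replace=False)
        return points[idx]
    pad = np.zeros((n_samples - n, points.shape[1]))  # virtual points
    return np.vstack([points, pad])
```

Because every pillar yields the same number of rows, the per-pillar results can be stacked into one dense tensor for the image generation model.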
In one embodiment, generating the current frame feature map corresponding to the current frame of point cloud data according to the point features includes: generating a point feature matrix from the multiple point features; invoking an image generation model and inputting the point feature matrix into the image generation model; and obtaining the current frame feature map output by the image generation model.
The computer device may generate a point feature matrix from the multiple point features of the multiple structured spaces; the point feature matrix may specifically cover the point features, the structured spaces, the corresponding numbers of points, and so on. The computer device may invoke an image generation model. The image generation model may be pre-configured in the computer device and may be obtained by training on a large number of point feature samples together with the feature map samples corresponding to those point feature samples. The image generation model may be one of a variety of neural network models; for example, it may be a convolutional neural network model, specifically a PointNet model. The computer device may input the generated point feature matrix into the image generation model, which operates on the point feature matrix and performs a max-pooling operation over the point-count dimension of the point features, obtaining the current frame feature map corresponding to the current frame of point cloud data.
In this embodiment, the computer device generates a point feature matrix from the multiple point features and operates on it according to the image generation model, obtaining the current frame feature map output by the model. By generating the current frame feature map from the current frame of point cloud data, the computer device makes effective use of the deep features in the point cloud data and improves the accuracy of target tracking based on point cloud data.
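The max-pooling over the point-count dimension can be illustrated as follows, assuming the point feature matrix is arranged as (pillars, points, channels) and the pillars tile an H × W grid; a real PointNet-style model would first transform each point feature with learned layers before pooling:

```python
import numpy as np

def pillar_features_to_map(point_features, grid_hw):
    """Collapse per-point features into one feature vector per pillar
    by max-pooling over the point dimension (PointNet-style), then
    arrange the pillar vectors into an H x W x C feature map.
    `point_features`: (P, N, C) array — P pillars, N sampled points
    each, C-dimensional point features. `grid_hw`: (H, W), H * W == P."""
    h, w = grid_hw
    pooled = point_features.max(axis=1)   # (P, C): max over the points
    return pooled.reshape(h, w, -1)       # (H, W, C) pseudo-image
```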
In one embodiment, invoking the target tracking model trained on the previous frame of point cloud data and determining the target tracking region corresponding to the current frame of point cloud data according to the image features includes: generating an image feature matrix from the image features; inputting the image feature matrix into the target tracking model and obtaining the region labels output by the target tracking model; and determining the target tracking region corresponding to the current frame of point cloud data according to the region labels.
The computer device may generate the corresponding image feature matrix from the extracted image features. Specifically, the computer device may process the extracted image features by appending each column of image features to the preceding column, generating a column vector from the image features. The computer device may then cyclically shift the column vector and arrange all image features obtained from the shifts as columns, obtaining the image feature matrix. The cyclic shift may be a cyclic left shift or a cyclic right shift.
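The cyclic-shift construction can be sketched as building a circulant matrix from the flattened feature vector; this is the standard sample matrix of correlation-filter trackers, and the snippet is purely illustrative:

```python
import numpy as np

def circulant(v):
    """Build the circulant matrix whose columns are all cyclic shifts
    of the flattened feature vector `v` — the image feature matrix
    formed by the cyclic-shift step."""
    v = np.asarray(v).ravel()
    n = len(v)
    return np.stack([np.roll(v, k) for k in range(n)], axis=1)
```

In practice a KCF-style tracker never materializes this matrix — circulant matrices are diagonalized by the discrete Fourier transform, so the training and detection steps are carried out with FFTs — but the explicit form shows what the shifted columns are.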
The computer device may input the generated image feature matrix into the target tracking model, which is obtained after training on the previous frame of point cloud data. The target tracking model may adopt at least one of a variety of visual tracking algorithms; for example, it may specifically use the KCF (Kernel Correlation Filter) algorithm, a KCF-based target tracking algorithm, or the like. The computer device may operate on the image feature matrix with the target tracking model and obtain the region labels that the model outputs after the computation. The target tracking model may output multiple region labels, each used to mark a corresponding region. A region marked by a region label represents a range in which the target may be located; the size and shape of the region may be the same as those of the previous-frame target region corresponding to the previous frame of point cloud data. A region label may express the likelihood that the target lies within the corresponding region. In one embodiment, the region label may be the probability that the target is located in the corresponding region.
The computer device may compare the multiple region labels output by the target tracking model and take, from among them, the region label with the largest label value as the target region label. The computer device may determine the region corresponding to the target region label as the target tracking region. Specifically, the computer device may obtain the cyclic offset corresponding to the target region label, determine the region obtained by shifting the previous-frame target region by this cyclic offset, and determine that shifted region as the target tracking region corresponding to the current frame of point cloud data.
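Selecting the largest region label and shifting the previous-frame target region by the corresponding cyclic offset might look like this; the (cx, cy, w, h) region representation and the wrap-around convention are assumptions made for illustration:

```python
import numpy as np

def shift_region_by_response(prev_region, response):
    """Pick the cyclic shift with the largest label value from a 2-D
    response map of region labels and move the previous-frame target
    region by that offset. `prev_region` = (cx, cy, w, h); `response`
    is an (H, W) map indexed by (dy, dx) cyclic offsets."""
    h, w = response.shape
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    # treat offsets beyond half the window size as negative (wrap-around)
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    cx, cy, rw, rh = prev_region
    return (cx + dx, cy + dy, rw, rh)
```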
In this embodiment, the computer device may generate an image feature matrix from the image features, invoke the target tracking model to operate on the image feature matrix, and determine the target tracking region corresponding to the current frame of point cloud data according to the region labels output by the target tracking model. Compared with traditional Kalman filtering of point cloud data, which does not take point cloud features into account, this embodiment makes use of the point cloud features corresponding to the current frame of point cloud data, effectively improving the accuracy of target tracking based on point cloud data.
In one embodiment, as shown in FIG. 4, before the step of invoking the target tracking model trained on the previous frame of point cloud data, the above point cloud-based target tracking method further includes:
Step 402: Generate the previous frame feature map from the previous frame of point cloud data.
Step 404: Crop a sample feature map from the previous frame feature map, and extract the sample features corresponding to the sample feature map.
Step 406: Generate the sample labels corresponding to the sample features.
Step 408: Train a standard tracking model with the sample features and the sample labels to obtain the target tracking model.
Before invoking the target tracking model to process the image features corresponding to the current frame of point cloud data, the computer device also needs to train the standard tracking model on the previous frame of point cloud data to obtain the target tracking model.
Specifically, the computer device may generate the previous frame feature map from the previous frame of point cloud data, crop a sample feature map out of the previous frame feature map, and extract the sample features corresponding to the sample feature map. Understandably, since the computer device can track the point cloud data collected by the laser sensor frame by frame, the target tracking model is trained iteratively on multiple frames of point cloud data. Therefore, the way in which the computer device generates the previous frame feature map from the previous frame of point cloud data, crops the sample feature map from the previous frame feature map, and extracts the corresponding sample features may be the same as or similar to the way, described in the above embodiments, in which the computer device generates the current frame feature map from the current frame of point cloud data, crops the candidate feature map from the current frame feature map, and extracts the image features of the candidate feature map; the details are therefore not repeated here.
计算机设备可以根据样本特征生成样本特征所对应的样本标签。具体的,计算机设备可以对样本特征进行循环移位,得到多个样本特征,根据多个样本特征生成样本特征矩阵。根据样本特征矩阵中每个样本特征所对应的移位值确定样本特征对应的样本标签。移位值具体可以包括横轴移位值和纵轴移位值。计算机设备可以根据预设函数对样本特征矩阵中多个样本特征所对应的移位值进行运算,得到多个样本特征各自对应的样本标签。其中,预设函数可以是用户预先设置的函数,预设函数具体可以是二维高斯函数。The computer device can generate, according to the sample features, the sample labels corresponding to the sample features. Specifically, the computer device may perform cyclic shifts on the sample features to obtain multiple sample features, and generate a sample feature matrix from the multiple sample features. The sample label corresponding to each sample feature is then determined according to the shift value corresponding to that sample feature in the sample feature matrix. The shift value may specifically include a horizontal-axis shift value and a vertical-axis shift value. The computer device can apply a preset function to the shift values corresponding to the multiple sample features in the sample feature matrix to obtain the sample label corresponding to each sample feature. The preset function may be a function preset by the user, and may specifically be a two-dimensional Gaussian function.
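The cyclic-shift-plus-Gaussian-label construction above is the one familiar from correlation-filter trackers. A minimal sketch of it follows; the function name, the `sigma` parameter, and the exact label formula are illustrative assumptions, not taken from the patent text:

```python
import numpy as np

def build_training_samples(sample_feature, sigma=2.0):
    """Generate cyclically shifted samples and 2-D Gaussian labels.

    sample_feature: 2-D array (H x W) cropped from the previous-frame
    feature map. Each (row, col) cyclic shift yields one training
    sample; its label is a Gaussian of the shift magnitude, peaking
    at 1.0 for the unshifted sample.
    """
    h, w = sample_feature.shape
    ys = np.arange(h)
    xs = np.arange(w)
    # Wrap-around-aware shift magnitudes along each axis.
    dy = np.minimum(ys, h - ys)[:, None]
    dx = np.minimum(xs, w - xs)[None, :]
    labels = np.exp(-(dy ** 2 + dx ** 2) / (2.0 * sigma ** 2))
    # All cyclic shifts of the base sample, one per (row, col) offset.
    shifted = np.stack([np.roll(np.roll(sample_feature, r, axis=0), c, axis=1)
                        for r in range(h) for c in range(w)])
    return shifted, labels.ravel()
```

Flattening each shifted sample into a row then gives the sample feature matrix described in the text.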
计算机设备可以根据样本特征矩阵中的多个样本特征,以及多个样本特征各自对应的样本标签对建立的标准跟踪模型进行训练,得到目标跟踪模型。The computer device can train the established standard tracking model according to the multiple sample features in the sample feature matrix and the sample labels corresponding to the multiple sample features to obtain the target tracking model.
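The patent does not fix the form of the standard tracking model. A common, simple choice consistent with a sample feature matrix and Gaussian labels is closed-form ridge regression, sketched here under that assumption (all names and the regularization weight `lam` are illustrative):

```python
import numpy as np

def train_ridge_tracker(samples, labels, lam=1e-2):
    """Fit w minimising ||X w - y||^2 + lam ||w||^2 in closed form.

    samples: (n, ...) array whose entries are the shifted sample
    features (flattened to rows); labels: (n,) Gaussian sample labels.
    """
    X = samples.reshape(len(samples), -1).astype(float)
    y = np.asarray(labels, dtype=float)
    d = X.shape[1]
    # Normal equations with Tikhonov regularisation.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return w

def predict_response(w, features):
    """Response scores for candidate features; the peak locates the target."""
    return features.reshape(len(features), -1) @ w
```

At tracking time, the image features cropped from the current-frame candidate region would be scored with `predict_response`, and the highest-response position taken as the target tracking area.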
在本实施例中,计算机设备可以根据上一帧点云数据对标准跟踪模型进行训练,得到目标跟踪模型,从而利用目标跟踪模型对当前帧点云数据的目标跟踪区域进行跟踪,对多帧点云数据迭代训练目标跟踪模型,有效的提高了目标跟踪的准确性。In this embodiment, the computer device can train the standard tracking model based on the previous-frame point cloud data to obtain the target tracking model, so as to use the target tracking model to track the target area in the current-frame point cloud data. Iteratively training the target tracking model on multiple frames of point cloud data effectively improves the accuracy of target tracking.
在其中一个实施例中,上述基于点云的目标跟踪方法还包括对当前帧点云数据进行检测,得到目标检测区域;根据目标检测区域与目标跟踪区域,确定当前帧点云数据所对应的当前帧目标区域。In one of the embodiments, the above point cloud-based target tracking method further includes: detecting the current-frame point cloud data to obtain a target detection area; and determining, according to the target detection area and the target tracking area, the current-frame target area corresponding to the current-frame point cloud data.
计算机设备可以根据点云数据,对当前帧点云数据进行检测,得到目标检测区域。计算机设备可以采用多种目标检测算法中的至少一种对当前帧点云数据进行目标检测,得到目标检测区域。计算机设备可以根据当前帧点云数据对应的目标检测区域和目标跟踪区域,确定当前帧点云数据所对应的当前帧目标区域。具体的,计算机设备可以将目标检测区域与目标跟踪区域进行比对。当目标检测区域与目标跟踪区域相同时,计算机设备可以确定目标检测区域所对应的区域作为当前帧点云数据对应的当前帧目标区域。当目标检测区域与目标跟踪区域不相同时,计算机设备可以综合目标检测区域与目标跟踪区域,确定综合区域作为当前帧点云数据对应的当前帧目标区域。Based on the point cloud data, the computer device can detect the current-frame point cloud data to obtain the target detection area. The computer device may use at least one of multiple target detection algorithms to perform target detection on the current-frame point cloud data to obtain the target detection area. The computer device can then determine the current-frame target area corresponding to the current-frame point cloud data according to the target detection area and the target tracking area. Specifically, the computer device can compare the target detection area with the target tracking area. When the target detection area is the same as the target tracking area, the computer device can take the area corresponding to the target detection area as the current-frame target area corresponding to the current-frame point cloud data. When the target detection area and the target tracking area differ, the computer device can combine the target detection area and the target tracking area, and take the combined area as the current-frame target area corresponding to the current-frame point cloud data.
在其中一个实施例中,计算机设备可以获取目标检测区域对应的检测置信度,以及目标跟踪区域对应的跟踪置信度。计算机设备可以基于检测置信度和跟踪置信度,综合目标检测区域和目标跟踪区域,确定综合区域作为当前帧点云数据对应的当前帧目标区域。In one of the embodiments, the computer device can obtain the detection confidence corresponding to the target detection area and the tracking confidence corresponding to the target tracking area. Based on the detection confidence and the tracking confidence, the computer device can combine the target detection area and the target tracking area, and take the combined area as the current-frame target area corresponding to the current-frame point cloud data.
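The exact fusion rule for differing detection and tracking areas is left open in the text. One plausible reading of the confidence-based combination is a per-coordinate confidence-weighted average, sketched below; the box format and function name are assumptions, not from the patent:

```python
def fuse_regions(det_box, det_conf, trk_box, trk_conf):
    """Confidence-weighted merge of detection and tracking boxes.

    Boxes are (x1, y1, x2, y2) tuples. If the two boxes agree, either
    one is returned directly; otherwise each coordinate is averaged,
    weighted by the detection and tracking confidences.
    """
    if det_box == trk_box:
        return det_box
    total = det_conf + trk_conf
    return tuple((d * det_conf + t * trk_conf) / total
                 for d, t in zip(det_box, trk_box))
```

With a high detection confidence the combined area stays close to the detection result, matching the text's framing of the tracking area as an adjustment to the detected area.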
在本实施例中,计算机设备可以根据当前帧点云数据检测得到目标检测区域,根据目标跟踪区域对目标检测区域进行调整,确定调整后的区域为当前帧点云数据对应的当前帧目标区域,有效的提高了确定目标区域的准确性。In this embodiment, the computer device can detect the target detection area from the current-frame point cloud data, adjust the target detection area according to the target tracking area, and take the adjusted area as the current-frame target area corresponding to the current-frame point cloud data, which effectively improves the accuracy of determining the target area.
在其中一个实施例中,上述基于点云的目标跟踪方法还包括根据当前帧目标区域与上一帧目标区域确定目标位移数据;获取点云采集频率;根据点云采集频率与目标位移数据确定目标运动数据。In one of the embodiments, the above point cloud-based target tracking method further includes: determining target displacement data according to the current-frame target area and the previous-frame target area; acquiring the point cloud acquisition frequency; and determining target motion data according to the point cloud acquisition frequency and the target displacement data.
计算机设备可以将当前帧目标区域与上一帧目标区域进行比对,根据比对结果确定目标位移数据。目标位移数据可以包括目标位移的长度以及方向。计算机设备可以获取激光传感器对应的点云采集频率。点云采集频率可以是用户根据实际需求预先设置的,激光传感器根据设置的点云采集频率采集点云数据。点云采集频率可以是一个常量。例如,激光传感器可以按照每秒50帧的频率采集点云数据。点云采集频率还可以是一个变量。例如,激光传感器可以根据不同的情况或模式调整点云采集频率。比如激光传感器可以在环境中目标较多并且运动速度较快的情况下增大点云采集频率,在环境中目标较少并且运动速度较慢的情况下减小点云采集频率。The computer device can compare the target area of the current frame with the target area of the previous frame, and determine the target displacement data according to the comparison result. The target displacement data may include the length and direction of the target displacement. The computer equipment can obtain the point cloud collection frequency corresponding to the laser sensor. The point cloud collection frequency can be preset by the user according to actual needs, and the laser sensor collects point cloud data according to the set point cloud collection frequency. The point cloud collection frequency can be a constant. For example, a laser sensor can collect point cloud data at a frequency of 50 frames per second. The point cloud collection frequency can also be a variable. For example, the laser sensor can adjust the point cloud collection frequency according to different situations or modes. For example, a laser sensor can increase the point cloud collection frequency when there are many targets in the environment and the movement speed is fast, and reduce the point cloud collection frequency when there are fewer targets in the environment and the movement speed is slow.
计算机设备可以根据获取到的点云采集频率,确定上一帧点云数据的采集时间与当前帧点云数据的采集时间之间的时间差。例如,当点云采集频率为每秒50帧时,计算机设备可以确定两帧之间的时间差为0.02秒。计算机设备可以根据时间差和目标位移数据确定目标对应的目标运动数据。目标运动数据具体可以包括目标对应的运动速度大小、方向等信息,以便计算机设备根据目标运动数据对无人驾驶设备进行提示或控制。The computer device can determine the time difference between the acquisition time of the point cloud data of the previous frame and the acquisition time of the point cloud data of the current frame according to the acquired point cloud acquisition frequency. For example, when the point cloud acquisition frequency is 50 frames per second, the computer device can determine that the time difference between the two frames is 0.02 seconds. The computer device can determine the target motion data corresponding to the target according to the time difference and the target displacement data. The target motion data may specifically include information such as the motion speed and direction corresponding to the target, so that the computer equipment can prompt or control the unmanned driving device according to the target motion data.
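The motion computation described above can be sketched directly: the inter-frame interval is the reciprocal of the acquisition frequency, and the displacement over that interval gives speed and heading. The function name and the use of 2-D box centers are illustrative assumptions:

```python
import math

def target_motion(prev_center, curr_center, acquisition_hz):
    """Derive speed and heading from consecutive-frame target centers.

    The inter-frame time is the reciprocal of the point cloud
    acquisition frequency, e.g. 50 Hz -> 0.02 s between frames.
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    dt = 1.0 / acquisition_hz
    speed = math.hypot(dx, dy) / dt             # distance per second
    heading = math.degrees(math.atan2(dy, dx))  # direction of motion
    return speed, heading
```

For example, a 0.5 m displacement between frames captured at 50 Hz corresponds to a speed of 25 m/s, which the computer device could then use to prompt or control the unmanned driving device.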
在本实施例中,计算机设备可以根据当前帧目标区域和上一帧目标区域确定目标位移数据,根据点云采集频率和目标位移数据确定目标运动数据,有助于计算机设备根据目标运动数据对无人驾驶设备进行提示或控制。In this embodiment, the computer device can determine the target displacement data according to the current-frame target area and the previous-frame target area, and determine the target motion data according to the point cloud acquisition frequency and the target displacement data, which helps the computer device prompt or control the unmanned driving device according to the target motion data.
应该理解的是,虽然图2-4的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-4中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the flowcharts of FIGS. 2-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
在其中一个实施例中,如图5所示,提供了一种基于点云的目标跟踪装置,包括:特征图生成模块502、候选区域获取模块504、特征提取模块506和目标跟踪模块508,其中:In one of the embodiments, as shown in FIG. 5, a point cloud-based target tracking device is provided, including: a feature map generation module 502, a candidate region acquisition module 504, a feature extraction module 506, and a target tracking module 508, where :
特征图生成模块502,用于根据当前帧点云数据生成当前帧特征图。The feature map generating module 502 is configured to generate a feature map of the current frame according to the point cloud data of the current frame.
候选区域获取模块504,用于获取当前帧特征图对应的候选区域;在当前帧特征图中截取与候选区域相匹配的候选特征图。The candidate region acquiring module 504 is used to acquire the candidate region corresponding to the feature map of the current frame; and intercept the candidate feature map matching the candidate region in the feature map of the current frame.
特征提取模块506,用于提取候选特征图所对应的图像特征。The feature extraction module 506 is used to extract image features corresponding to the candidate feature map.
目标跟踪模块508,用于调用基于上一帧点云数据训练得到的目标跟踪模型,根据图像特征确定当前帧点云数据对应的目标跟踪区域。The target tracking module 508 is configured to call the target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image characteristics.
在其中一个实施例中,上述特征图生成模块502还用于获取当前帧点云数据;将当前帧点云数据进行结构化处理,得到处理结果;基于处理结果对当前帧点云数据中的点进行编码,得到点对应的点特征;根据点特征生成当前帧点云数据对应的当前帧特征图。In one of the embodiments, the above feature map generation module 502 is further configured to obtain the current-frame point cloud data; perform structured processing on the current-frame point cloud data to obtain a processing result; encode the points in the current-frame point cloud data based on the processing result to obtain point features corresponding to the points; and generate, according to the point features, the current-frame feature map corresponding to the current-frame point cloud data.
在其中一个实施例中,上述特征图生成模块502还用于根据多个点特征生成点特征矩阵;调用图像生成模型,将点特征矩阵输入至图像生成模型;获取图像生成模型输出的当前帧特征图。In one of the embodiments, the above feature map generation module 502 is further configured to generate a point feature matrix according to multiple point features; call an image generation model and input the point feature matrix into the image generation model; and obtain the current-frame feature map output by the image generation model.
在其中一个实施例中,上述候选区域获取模块504还用于获取上一帧点云数据对应的上一帧目标区域;根据预设倍数将上一帧目标区域进行扩大;确定扩大后的上一帧目标区域作为当前帧特征图对应的候选区域。In one of the embodiments, the above candidate region acquisition module 504 is further configured to acquire the previous-frame target area corresponding to the previous-frame point cloud data; expand the previous-frame target area by a preset multiple; and determine the expanded previous-frame target area as the candidate region corresponding to the current-frame feature map.
在其中一个实施例中,上述特征提取模块506还用于获取特征提取模型;将候选特征图输入至特征提取模型;根据特征提取模型对候选特征图进行特征提取,得到候选特征图所对应的图像特征。In one of the embodiments, the above feature extraction module 506 is further configured to acquire a feature extraction model; input the candidate feature map into the feature extraction model; and perform feature extraction on the candidate feature map according to the feature extraction model to obtain the image features corresponding to the candidate feature map.
在其中一个实施例中,上述目标跟踪模块508还用于根据图像特征生成图像特征矩阵;将图像特征矩阵输入至目标跟踪模型,获取目标跟踪模型输出的区域标签;根据区域标签确定当前帧点云数据所对应的目标跟踪区域。In one of the embodiments, the above target tracking module 508 is further configured to generate an image feature matrix according to the image features; input the image feature matrix into the target tracking model to obtain an area label output by the target tracking model; and determine, according to the area label, the target tracking area corresponding to the current-frame point cloud data.
在其中一个实施例中,上述基于点云的目标跟踪装置还包括模型训练模块,用于根据上一帧点云数据生成上一帧特征图;在上一帧特征图中截取样本特征图,提取样本特征图所对应的样本特征;生成与样本特征所对应的样本标签;根据样本特征与样本标签对标准跟踪模型进行训练,得到目标跟踪模型。In one of the embodiments, the above point cloud-based target tracking device further includes a model training module, configured to generate a previous-frame feature map according to the previous-frame point cloud data; crop a sample feature map from the previous-frame feature map and extract the sample features corresponding to the sample feature map; generate sample labels corresponding to the sample features; and train the standard tracking model according to the sample features and the sample labels to obtain the target tracking model.
在其中一个实施例中,上述基于点云的目标跟踪装置还包括目标区域确定模块,用于对当前帧点云数据进行检测,得到目标检测区域;根据目标检测区域与目标跟踪区域,确定当前帧点云数据所对应的当前帧目标区域。In one of the embodiments, the above point cloud-based target tracking device further includes a target area determining module, configured to detect the current-frame point cloud data to obtain a target detection area, and determine, according to the target detection area and the target tracking area, the current-frame target area corresponding to the current-frame point cloud data.
在其中一个实施例中,上述基于点云的目标跟踪装置还包括目标数据确定模块,用于根据当前帧目标区域与上一帧目标区域确定目标位移数据;获取点云采集频率;根据点云采集频率与目标位移数据确定目标运动数据。In one of the embodiments, the above point cloud-based target tracking device further includes a target data determining module, configured to determine target displacement data according to the current-frame target area and the previous-frame target area; acquire the point cloud acquisition frequency; and determine target motion data according to the point cloud acquisition frequency and the target displacement data.
关于基于点云的目标跟踪装置的具体限定可以参见上文中对于基于点云的目标跟踪方法的限定,在此不再赘述。上述基于点云的目标跟踪装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific definition of the point cloud-based target tracking device, please refer to the above definition of the point cloud-based target tracking method, which will not be repeated here. Each module in the above-mentioned point cloud-based target tracking device can be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图6所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储基于点云的目标跟踪数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种基于点云的目标跟踪方法。In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 6. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium. The database of the computer equipment is used to store the target tracking data based on the point cloud. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by the processor to realize a point cloud-based target tracking method.
本领域技术人员可以理解,图6中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
在其中一个实施例中,提供了一种计算机设备,包括存储器和一个或多个处理器,存储器中储存有计算机可读指令,计算机可读指令被处理器执行时,使得一个或多个处理器执行时实现上述方法实施例中的步骤。In one of the embodiments, a computer device is provided, including a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to implement the steps in the foregoing method embodiments.
在其中一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行时实现上述方法实施例中的步骤。In one of the embodiments, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps in the foregoing method embodiments.
在其中一个实施例中,提供了一种交通工具,交通工具具体可以包括自动驾驶车辆、电动车、自行车以及飞行器等,交通工具包括上述计算机设备,可以执行上述基于点云的目标跟踪方法实施例中的步骤。In one of the embodiments, a vehicle is provided. The vehicle may specifically include a self-driving vehicle, an electric vehicle, a bicycle, an aircraft, and the like. The vehicle includes the above computer device and can execute the steps in the foregoing point cloud-based target tracking method embodiments.
本发明创造的实施例、实施对象并不局限于自动驾驶车辆、电动车、自行车、飞行器、机器人等,也包括运用到与这些装置相关的仿真模拟装置、测试设备等。The embodiments and implementation objects created by the present invention are not limited to autonomous vehicles, electric vehicles, bicycles, aircrafts, robots, etc., but also include simulation devices and test equipment related to these devices.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the foregoing method embodiments can be implemented by computer-readable instructions instructing relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. For conciseness, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features is not contradictory, it should be considered within the scope described in this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (20)

  1. 一种基于点云的目标跟踪方法,包括:A point cloud-based target tracking method includes:
    根据当前帧点云数据生成当前帧特征图;Generate a feature map of the current frame according to the point cloud data of the current frame;
    获取所述当前帧特征图对应的候选区域;Acquiring the candidate area corresponding to the feature map of the current frame;
    在所述当前帧特征图中截取与所述候选区域相匹配的候选特征图;Intercept a candidate feature map matching the candidate region in the current frame feature map;
    提取所述候选特征图所对应的图像特征;及Extracting the image feature corresponding to the candidate feature map; and
    调用基于上一帧点云数据训练得到的目标跟踪模型,根据所述图像特征确定所述当前帧点云数据对应的目标跟踪区域。The target tracking model trained based on the point cloud data of the previous frame is called, and the target tracking area corresponding to the point cloud data of the current frame is determined according to the image characteristics.
  2. 根据权利要求1所述的方法,其特征在于,所述根据当前帧点云数据生成当前帧特征图,包括:The method according to claim 1, wherein the generating a feature map of the current frame according to the point cloud data of the current frame comprises:
    获取所述当前帧点云数据;Acquiring the point cloud data of the current frame;
    将所述当前帧点云数据进行结构化处理,得到处理结果;Structured processing the point cloud data of the current frame to obtain a processing result;
    基于所述处理结果对所述当前帧点云数据中的点进行编码,得到所述点对应的点特征;及Encoding the points in the point cloud data of the current frame based on the processing result to obtain the point features corresponding to the points; and
    根据所述点特征生成所述当前帧点云数据对应的当前帧特征图。The current frame feature map corresponding to the current frame point cloud data is generated according to the point feature.
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述点特征生成所述当前帧点云数据对应的当前帧特征图,包括:The method according to claim 2, wherein the generating a current frame feature map corresponding to the current frame point cloud data according to the point feature comprises:
    根据多个所述点特征生成点特征矩阵;Generating a point feature matrix according to a plurality of said point features;
    调用图像生成模型,将所述点特征矩阵输入至所述图像生成模型;及Calling an image generation model, and input the point feature matrix to the image generation model; and
    获取所述图像生成模型输出的当前帧特征图。Obtain the current frame feature map output by the image generation model.
  4. 根据权利要求1所述的方法,其特征在于,所述获取所述当前帧特征图对应的候选区域,包括:The method according to claim 1, wherein said obtaining the candidate area corresponding to the feature map of the current frame comprises:
    获取上一帧点云数据对应的上一帧目标区域;Obtain the target area of the previous frame corresponding to the point cloud data of the previous frame;
    根据预设倍数将所述上一帧目标区域进行扩大;及Expand the target area of the previous frame according to a preset multiple; and
    确定扩大后的上一帧目标区域作为所述当前帧特征图对应的候选区域。The expanded target area of the previous frame is determined as the candidate area corresponding to the feature map of the current frame.
  5. 根据权利要求1所述的方法,其特征在于,所述提取所述候选特征图所对应的图像特征,包括:The method according to claim 1, wherein said extracting the image feature corresponding to the candidate feature map comprises:
    获取特征提取模型;Obtain feature extraction model;
    将所述候选特征图输入至所述特征提取模型;及Input the candidate feature map to the feature extraction model; and
    根据所述特征提取模型对所述候选特征图进行特征提取,得到所述候选特征图所对应的图像特征。Perform feature extraction on the candidate feature map according to the feature extraction model to obtain the image feature corresponding to the candidate feature map.
  6. 根据权利要求1所述的方法,其特征在于,所述调用基于上一帧点云数据训练得到的目标跟踪模型,根据所述图像特征确定所述当前帧点云数据对应的目标跟踪区域,包括:The method according to claim 1, wherein the invoking the target tracking model trained based on the point cloud data of the previous frame, and determining the target tracking area corresponding to the point cloud data of the current frame according to the image features, comprises :
    根据所述图像特征生成图像特征矩阵;Generating an image feature matrix according to the image features;
    将所述图像特征矩阵输入至所述目标跟踪模型,获取所述目标跟踪模型输出的区域标签;及Input the image feature matrix to the target tracking model, and obtain the area label output by the target tracking model; and
    根据所述区域标签确定所述当前帧点云数据所对应的目标跟踪区域。The target tracking area corresponding to the point cloud data of the current frame is determined according to the area tag.
  7. 根据权利要求1所述的方法,其特征在于,在所述调用基于上一帧点云数据训练得到的目标跟踪模型之前,所述方法还包括:The method according to claim 1, characterized in that, before the invoking the target tracking model trained based on the point cloud data of the previous frame, the method further comprises:
    根据上一帧点云数据生成上一帧特征图;Generate the feature map of the previous frame according to the point cloud data of the previous frame;
    在所述上一帧特征图中截取样本特征图,提取所述样本特征图所对应的样本特征;Cropping a sample feature map from the previous-frame feature map, and extracting the sample features corresponding to the sample feature map;
    生成与所述样本特征所对应的样本标签;及Generate a sample label corresponding to the sample feature; and
    根据所述样本特征与所述样本标签对标准跟踪模型进行训练,得到目标跟踪模型。The standard tracking model is trained according to the sample feature and the sample label to obtain a target tracking model.
  8. 根据权利要求1所述的方法,其特征在于,所述方法还包括:The method according to claim 1, wherein the method further comprises:
    对所述当前帧点云数据进行检测,得到目标检测区域;及Detect the point cloud data of the current frame to obtain the target detection area; and
    根据所述目标检测区域与所述目标跟踪区域,确定所述当前帧点云数据所对应的当前帧目标区域。According to the target detection area and the target tracking area, the current frame target area corresponding to the current frame point cloud data is determined.
  9. 根据权利要求8所述的方法,其特征在于,所述方法还包括:The method according to claim 8, wherein the method further comprises:
    根据所述当前帧目标区域与上一帧目标区域确定目标位移数据;Determining target displacement data according to the target area of the current frame and the target area of the previous frame;
    获取点云采集频率;及Obtain the point cloud collection frequency; and
    根据所述点云采集频率与所述目标位移数据确定目标运动数据。Determine target motion data according to the point cloud collection frequency and the target displacement data.
  10. 一种基于点云的目标跟踪装置,包括:A point cloud-based target tracking device includes:
    特征图生成模块,用于根据当前帧点云数据生成当前帧特征图;The feature map generating module is used to generate the feature map of the current frame according to the point cloud data of the current frame;
    候选区域获取模块,用于获取所述当前帧特征图对应的候选区域;在所述当前帧特征图中截取与所述候选区域相匹配的候选特征图;A candidate region acquiring module, configured to acquire a candidate region corresponding to the current frame feature map; intercept a candidate feature map matching the candidate region in the current frame feature map;
    特征提取模块,用于提取所述候选特征图所对应的图像特征;及A feature extraction module for extracting image features corresponding to the candidate feature map; and
    目标跟踪模块,用于调用基于上一帧点云数据训练得到的目标跟踪模型,根据所述图像特征确定所述当前帧点云数据对应的目标跟踪区域。The target tracking module is used to call the target tracking model trained based on the point cloud data of the previous frame, and determine the target tracking area corresponding to the point cloud data of the current frame according to the image characteristics.
  11. 根据权利要求10所述的装置,其特征在于,所述特征图生成模块还用于获取所述当前帧点云数据;将所述当前帧点云数据进行结构化处理,得到处理结果;基于所述处理结果对所述当前帧点云数据中的点进行编码,得到所述点对应的点特征;及根据所述点特征生成所述当前帧点云数据对应的当前帧特征图。The device according to claim 10, wherein the feature map generation module is further configured to: obtain the current-frame point cloud data; perform structured processing on the current-frame point cloud data to obtain a processing result; encode the points in the current-frame point cloud data based on the processing result to obtain point features corresponding to the points; and generate, according to the point features, the current-frame feature map corresponding to the current-frame point cloud data.
  12. 一种计算机设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    根据当前帧点云数据生成当前帧特征图;Generate a feature map of the current frame according to the point cloud data of the current frame;
    获取所述当前帧特征图对应的候选区域;Acquiring the candidate area corresponding to the feature map of the current frame;
    在所述当前帧特征图中截取与所述候选区域相匹配的候选特征图;Intercept a candidate feature map matching the candidate region in the current frame feature map;
    提取所述候选特征图所对应的图像特征;及Extracting the image feature corresponding to the candidate feature map; and
    调用基于上一帧点云数据训练得到的目标跟踪模型,根据所述图像特征确定所述当前帧点云数据对应的目标跟踪区域。The target tracking model trained based on the point cloud data of the previous frame is called, and the target tracking area corresponding to the point cloud data of the current frame is determined according to the image characteristics.
  13. 根据权利要求12所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令时还执行以下步骤:The computer device according to claim 12, wherein the processor further executes the following steps when executing the computer-readable instruction:
    获取所述当前帧点云数据;Acquiring the point cloud data of the current frame;
    将所述当前帧点云数据进行结构化处理,得到处理结果;Structured processing the point cloud data of the current frame to obtain a processing result;
    基于所述处理结果对所述当前帧点云数据中的点进行编码,得到所述点对应的点特征;及Encoding the points in the point cloud data of the current frame based on the processing result to obtain the point features corresponding to the points; and
    根据所述点特征生成所述当前帧点云数据对应的当前帧特征图。The current frame feature map corresponding to the current frame point cloud data is generated according to the point feature.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    generating a point feature matrix according to a plurality of the point features;
    invoking an image generation model and inputting the point feature matrix into the image generation model; and
    obtaining the current frame feature map output by the image generation model.
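A hypothetical stand-in for claim 14's image generation model: scatter the point feature matrix onto a 2D canvas with per-cell max pooling, in the spirit of pillar-style encoders. In practice the claimed model would be a trained network; this sketch only shows the matrix-to-feature-map shape transformation.

```python
import numpy as np

def scatter_to_feature_map(point_features: np.ndarray, cell_idx, grid: int = 64) -> np.ndarray:
    """Turn an (N, C) point feature matrix into a (C, grid, grid) feature map
    by max-pooling the features of all points that land in the same cell."""
    channels = point_features.shape[1]
    canvas = np.zeros((channels, grid, grid), dtype=np.float32)
    for feat, (x, y) in zip(point_features, cell_idx):
        if 0 <= x < grid and 0 <= y < grid:
            canvas[:, y, x] = np.maximum(canvas[:, y, x], feat)
    return canvas
```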
  15. The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring a previous frame target region corresponding to the previous frame point cloud data;
    expanding the previous frame target region by a preset multiple; and
    determining the expanded previous frame target region as the candidate region corresponding to the current frame feature map.
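Claim 15's region expansion admits a simple geometric sketch: grow the previous-frame target box about its center by the preset multiple. The claim leaves the exact convention open; here the multiple is applied to both width and height, which is an assumption.

```python
def expand_region(region, multiple=2.0):
    """Expand an (x0, y0, x1, y1) target region about its center by `multiple`."""
    x0, y0, x1, y1 = region
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) / 2.0 * multiple
    half_h = (y1 - y0) / 2.0 * multiple
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

For example, expanding the box (0, 0, 10, 10) by a multiple of 2 yields (-5.0, -5.0, 15.0, 15.0): a box twice as wide and twice as tall, centered on the original target.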
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    generating a current frame feature map according to current frame point cloud data;
    acquiring a candidate region corresponding to the current frame feature map;
    cropping, from the current frame feature map, a candidate feature map matching the candidate region;
    extracting image features corresponding to the candidate feature map; and
    invoking a target tracking model trained on previous frame point cloud data, and determining, according to the image features, a target tracking region corresponding to the current frame point cloud data.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    acquiring the current frame point cloud data;
    structuring the current frame point cloud data to obtain a processing result;
    encoding points in the current frame point cloud data based on the processing result to obtain point features corresponding to the points; and
    generating, according to the point features, the current frame feature map corresponding to the current frame point cloud data.
  18. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    generating a point feature matrix according to a plurality of the point features;
    invoking an image generation model and inputting the point feature matrix into the image generation model; and
    obtaining the current frame feature map output by the image generation model.
  19. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    acquiring a previous frame target region corresponding to the previous frame point cloud data;
    expanding the previous frame target region by a preset multiple; and
    determining the expanded previous frame target region as the candidate region corresponding to the current frame feature map.
  20. A vehicle, configured to perform the point cloud-based target tracking method according to any one of claims 1 to 9.
PCT/CN2019/130034 2019-12-30 2019-12-30 Point cloud-based target tracking method and apparatus, computer device and storage medium WO2021134258A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/130034 WO2021134258A1 (en) 2019-12-30 2019-12-30 Point cloud-based target tracking method and apparatus, computer device and storage medium


Publications (1)

Publication Number Publication Date
WO2021134258A1 true WO2021134258A1 (en) 2021-07-08

Family

ID=76686049


Country Status (1)

Country Link
WO (1) WO2021134258A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213093A1 (en) * 2016-01-27 2017-07-27 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting vehicle contour based on point cloud data
CN107341819A (en) * 2017-05-09 2017-11-10 深圳市速腾聚创科技有限公司 Method for tracking target and storage medium
CN109271880A (en) * 2018-08-27 2019-01-25 深圳清创新科技有限公司 Vehicle checking method, device, computer equipment and storage medium
CN110412617A (en) * 2019-09-06 2019-11-05 李娜 It is a kind of based on the unmanned plane rescue mode of self feed back laser radar scanning and application
CN110533695A (en) * 2019-09-04 2019-12-03 深圳市唯特视科技有限公司 A kind of trajectory predictions device and method based on DS evidence theory


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516687A (en) * 2021-07-09 2021-10-19 东软睿驰汽车技术(沈阳)有限公司 Target tracking method, device, equipment and storage medium
CN113689471A (en) * 2021-09-09 2021-11-23 中国联合网络通信集团有限公司 Target tracking method and device, computer equipment and storage medium
CN113689471B (en) * 2021-09-09 2023-08-18 中国联合网络通信集团有限公司 Target tracking method, device, computer equipment and storage medium


Legal Events

Date  Code  Title / Description
      121   Ep: the epo has been informed by wipo that ep was designated in this application
            Ref document number: 19958458; Country of ref document: EP; Kind code of ref document: A1
      NENP  Non-entry into the national phase
            Ref country code: DE
      122   Ep: pct application non-entry in european phase
            Ref document number: 19958458; Country of ref document: EP; Kind code of ref document: A1