WO2021134285A1

WO2021134285A1 - Image tracking processing method and apparatus, and computer device and storage medium

Info

Publication number: WO2021134285A1
Application number: PCT/CN2019/130077
Authority: WO
Inventors: 许双杰; 何明; 叶茂盛; 邹晓艺; 吴伟; 许家妙; 曹通易
Original assignee: 深圳元戎启行科技有限公司
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-08
Also published as: CN113490965A

Abstract

An image tracking processing method, comprising: acquiring point cloud data of a current frame; pre-processing the point cloud data of the current frame to generate a projected image; acquiring a standard region image corresponding to point cloud data of a standard frame; calling a target tracking model, and acquiring a candidate region tag corresponding to a candidate region on the basis of the projected image and the standard region image; and determining a target tracking region corresponding to the point cloud data of the current frame according to the candidate region tag.

Description

Image tracking processing method, device, computer equipment and storage medium

Technical field

This application relates to an image tracking processing method, device, computer equipment, storage medium, and vehicle.

Background technique

Visual tracking refers to the use of computer technology to extract, identify, and track targets to obtain information such as the location of the target for subsequent processing and analysis. With the development of computer technology, visual tracking technology can be implemented in many application scenarios. For example: visual tracking technology can be applied to related fields such as autonomous driving and assisted driving.

In the traditional approach, visual tracking is usually performed on images captured by cameras and similar equipment. However, the inventor realized that when target tracking is based on captured images, the tracking result is easily affected by image quality. Factors such as changes in ambient lighting and the movement speed of the target lower the image quality, which in turn lowers the accuracy and robustness of the target tracking results.

Summary of the invention

According to various embodiments disclosed in the present application, an image tracking processing method, device, computer equipment, storage medium, and vehicle are provided.

An image tracking processing method, including:

Obtaining the point cloud data of the current frame;

Preprocessing the point cloud data of the current frame to generate a projection image;

Obtaining the standard area image corresponding to the standard frame point cloud data;

Calling the target tracking model, and obtaining the candidate area label corresponding to the candidate area based on the projection image and the standard area image; and

Determining the target tracking area corresponding to the point cloud data of the current frame according to the candidate area label.

An image tracking processing device, including:

Point cloud acquisition module for acquiring point cloud data of the current frame;

The preprocessing module is used to preprocess the point cloud data of the current frame to generate a projection image;

The standard image acquisition module is used to acquire the standard area image corresponding to the standard frame point cloud data; and

The target tracking module is used to call the target tracking model, obtain the candidate area label corresponding to the candidate area based on the projection image and the standard area image, and determine the target tracking area corresponding to the current frame point cloud data according to the candidate area label.

A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Obtain the point cloud data of the current frame;

One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:

Obtain the point cloud data of the current frame;

A vehicle, which executes the steps of the above-mentioned image tracking processing method.

The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is an application scene diagram of an image tracking processing method according to one or more embodiments.

FIG. 2 is a schematic flowchart of an image tracking processing method according to one or more embodiments.

FIG. 3 is a schematic flowchart of the step of obtaining a standard detection area corresponding to a standard frame image according to one or more embodiments.

FIG. 4 is a block diagram of an image tracking processing device according to one or more embodiments.

FIG. 5 is a block diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

The image tracking processing method provided in this application can be applied in a variety of application environments. For example, it can be applied in the autonomous driving environment shown in FIG. 1, which can include a laser sensor 102 and a computer device 104. The computer device 104 can communicate with the laser sensor 102 over a connection established with the laser sensor 102, and this connection can be wired or wireless. The laser sensor 102 can collect multi-frame point cloud data of the surrounding environment. The computer device 104 can acquire the current frame point cloud data collected by the laser sensor 102, and can also acquire preset standard frame point cloud data. The computer device 104 preprocesses the point cloud data of the current frame, generates a projection image, and obtains a standard area image corresponding to the standard frame point cloud data. The computer device 104 calls the target tracking model, and obtains the candidate area label corresponding to the candidate area based on the projection image and the standard area image. The computer device 104 determines the target tracking area corresponding to the point cloud data of the current frame according to the candidate area label. The laser sensor 102 may be a laser sensor carried by an autonomous driving device, and may specifically include a laser radar, a laser scanner, and the like.

In one of the embodiments, as shown in FIG. 2, an image tracking processing method is provided. Taking the method applied to the computer device 104 in FIG. 1 as an example for description, the method includes the following steps:

Step 202: Obtain the point cloud data of the current frame.

The laser sensor may be mounted on a device capable of autonomous driving. For example, it can be carried by an unmanned vehicle, or by a vehicle that includes an autonomous driving model. The laser sensor can be used to collect environmental data within its visual range. Specifically, the laser sensor can emit a detection signal, such as a laser beam, and compare the signal reflected by objects in the environment with the detection signal to obtain the surrounding environment data. The environmental data collected by the laser sensor may specifically be point cloud data. Point cloud data refers to a collection of point data, recorded in the form of points, corresponding to multiple points on the surfaces of objects in the scanned environment. Here, "multiple" means two or more. The laser sensor can collect data at a preset frequency to obtain multi-frame point cloud data. The preset frequency may be set according to actual needs, for example, 50 frames per second.

The point cloud data may be three-dimensional point cloud data, and each frame of point cloud data may include point data corresponding to multiple points. The point data may specifically include at least one of the three-dimensional coordinates, laser reflection intensity, and color information corresponding to the point. The three-dimensional coordinates may be the coordinates of the point in a Cartesian coordinate system, and specifically include the horizontal axis (x axis) coordinate, the longitudinal axis (y axis) coordinate, and the vertical axis (z axis) coordinate of the point. The Cartesian coordinate system is a three-dimensional space coordinate system established with the location of the laser sensor as the origin. It includes a horizontal axis (x axis), a longitudinal axis (y axis), and a vertical axis (z axis), and it satisfies the right-hand rule.

The computer device can obtain point cloud data. Specifically, the computer device may obtain the collected point cloud data in real time each time the laser sensor collects one frame of point cloud data, or may obtain the collected multi-frame point cloud data after the laser sensor has collected multiple frames. The computer device can perform target tracking on the multi-frame point cloud data in turn, following the time sequence in which the laser sensor collected it. The computer device may record the point cloud data on which target tracking has started or is in progress as the point cloud data of the current frame. Targets can include living or non-living objects in the surrounding environment, and can be moving or stationary. For example, the target may specifically include at least one of pedestrians, roadblocks, vehicles, and buildings. It is understandable that when the computer device finishes tracking on the point cloud data of the current frame and starts tracking on the next frame, the point cloud data of the current frame can be recorded as the point cloud data of the previous frame according to the order of collection, and the point cloud data of the next frame can be acquired and recorded as the point cloud data of the current frame.

Step 204: Preprocessing the point cloud data of the current frame to generate a projection image.

The computer device may preprocess the acquired point cloud data of the current frame. The preprocessing may include at least one of several processing methods, such as data cleaning, point cloud segmentation, and point cloud projection. By generating a projection image from point cloud data that contains a large number of discrete points, the computer device effectively reduces the amount of data to be computed and saves its computing resources.

For example, the preprocessing of the point cloud data of the current frame may include point cloud projection. Specifically, the computer device may obtain the point data corresponding to multiple points in the point cloud data of the current frame, and extract the three-dimensional coordinates of the points from the point data. The computer device can project the points in the point cloud data of the current frame onto a plane according to their three-dimensional coordinates, and record the image formed by the projected points as the projected image. The generated projection image is a two-dimensional image. For example, the computer device can project the points in the point cloud data of the current frame onto the x-y plane, in which the horizontal axis (x axis) and the longitudinal axis (y axis) lie, to obtain a top view of the point cloud, and record this top view as the projection image.
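The top-view projection described above can be illustrated with a minimal sketch. It assumes each point is an `(x, y, z)` tuple in the sensor coordinate system and that each pixel stores the number of points falling into it; the function name `project_top_view`, the imaged ranges, and the cell size are hypothetical choices for illustration and are not specified in this application.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor coordinate system


def project_top_view(points: List[Point],
                     x_range: Tuple[float, float] = (-40.0, 40.0),
                     y_range: Tuple[float, float] = (-40.0, 40.0),
                     cell: float = 0.5) -> List[List[int]]:
    """Project points onto the x-y plane and count the points in each pixel."""
    # Assumed image geometry: one pixel covers a cell x cell square of the plane.
    width = math.ceil((x_range[1] - x_range[0]) / cell)
    height = math.ceil((y_range[1] - y_range[0]) / cell)
    image = [[0] * width for _ in range(height)]
    for x, y, _z in points:  # the z coordinate is dropped by the projection
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # the point lies outside the imaged region
        # Clamp the indices to guard against floating-point rounding at the edge.
        col = min(int((x - x_range[0]) / cell), width - 1)
        row = min(int((y - y_range[0]) / cell), height - 1)
        image[row][col] += 1
    return image
```

Storing a point count per pixel is only one possible encoding; maximum height or reflection intensity per pixel would follow the same pattern.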

The preprocessing of the point cloud data of the current frame may also include data cleaning and point cloud projection. Specifically, the computer device can clean the point cloud data of the current frame by removing abnormal point data from the point data it contains, thereby avoiding interference of abnormal point data with target tracking and ensuring the accuracy of the tracking results. The computer device can then perform point cloud projection on the cleaned current frame point cloud data to obtain the projected image.
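As an illustration of data cleaning, the sketch below removes points under one assumed definition of "abnormal": coordinates that are not finite numbers, or points farther from the sensor than a maximum range. This application does not define which points count as abnormal, so the criteria, the function name `clean_points`, and the default range are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor coordinate system


def clean_points(points: List[Point], max_range: float = 100.0) -> List[Point]:
    """Drop points with non-finite coordinates or beyond max_range (assumed criteria)."""
    cleaned = []
    for x, y, z in points:
        if not all(math.isfinite(v) for v in (x, y, z)):
            continue  # NaN or infinite coordinate
        if math.sqrt(x * x + y * y + z * z) > max_range:
            continue  # farther from the sensor origin than the assumed range
        cleaned.append((x, y, z))
    return cleaned
```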

The preprocessing of the point cloud data of the current frame may also include point cloud segmentation and point cloud projection. Specifically, the computer device may divide the point cloud data of the current frame into multiple sub-point clouds according to the point data, and generate a segmentation threshold for each sub-point cloud based on the point data it contains. The computer device can segment the points in each sub-point cloud according to the corresponding segmentation threshold, and aggregate the segmentation results of the multiple sub-point clouds to obtain the ground point set and the non-ground point set corresponding to the point cloud data of the current frame. The computer device can project the points in the non-ground point set to generate the projected image. Segmenting the point cloud data eliminates the interference of ground points with target tracking, thereby ensuring the accuracy of the tracking results. In one of the embodiments, the preprocessing of the point cloud data of the current frame may also include data cleaning, point cloud segmentation, and point cloud projection together.
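The segmentation into ground and non-ground point sets can be sketched as follows. This application does not specify how sub-point clouds are formed or how the threshold is computed, so the sketch makes two assumptions: sub-point clouds are square cells in the x-y plane, and the threshold of each sub-point cloud is its lowest z value plus a fixed tolerance. The name `split_ground` and both parameters are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor coordinate system


def split_ground(points: List[Point], cell: float = 2.0,
                 tolerance: float = 0.2) -> Tuple[List[Point], List[Point]]:
    """Return (ground_points, non_ground_points) using a per-cell height threshold."""
    # Assumption: each sub-point cloud is one cell x cell square of the x-y plane.
    sub_clouds: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        sub_clouds[key].append(p)

    ground: List[Point] = []
    non_ground: List[Point] = []
    for sub_cloud in sub_clouds.values():
        # Assumed threshold: lowest point of the sub-point cloud plus a tolerance.
        threshold = min(p[2] for p in sub_cloud) + tolerance
        for p in sub_cloud:
            (ground if p[2] <= threshold else non_ground).append(p)
    return ground, non_ground
```

Only the non-ground set would then be passed on to the projection step. A sub-point cloud that contains no ground at all would still have its lowest points labeled as ground under this simple rule, which is one reason a practical threshold would likely be more elaborate.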

Step 206: Obtain a standard area image corresponding to the standard frame point cloud data.

The standard frame point cloud data can be used as a reference basis for target tracking, and the computer device can perform target tracking on the current frame point cloud data based on the standard frame point cloud data. The standard frame point cloud data can be one of a variety of point cloud data. For example, the standard frame point cloud data may be a frame of point cloud data determined by the user from multiple frames of point cloud data according to actual needs, or may be the first frame of point cloud data in the multiple frames of point cloud data collected by the laser sensor.

The computer equipment can obtain the standard area image corresponding to the standard frame point cloud data. The standard frame point cloud data may correspond to one or more standard area images, and the standard area image refers to an image corresponding to the area where the target is located in the standard frame point cloud data. The standard area image can be an image of various shapes. For example, the standard area image can be rectangular or circular. The standard area image may be a part of the standard image corresponding to the standard frame point cloud data, and the standard image may be obtained after point cloud projection is performed according to the standard frame point cloud data.

The computer device can obtain the standard area image corresponding to the standard frame point cloud data in a variety of ways. Specifically, the computer device can detect the standard frame point cloud data to obtain a standard area image corresponding to the standard frame point cloud data. The standard area image can also be preset by the user according to actual needs. For example, the computer device can receive the target to be tracked selected by the user in advance, and determine the standard area image corresponding to the target to be tracked. The computer equipment can obtain the standard area image corresponding to the standard frame point cloud data.

Step 208: Call the target tracking model, and obtain the candidate area label corresponding to the candidate area based on the projection image and the standard area image.

The computer device can call the target tracking model, and perform tracking processing on the projected image with the target tracking model to obtain the tracking area corresponding to the point cloud data of the current frame. The target tracking model can be pre-configured in the computer device. The target tracking model can be one of a variety of deep learning models, for example, a convolutional neural network model or a deep belief network model. The target tracking model may be obtained by training a deep learning model on point cloud image samples.

The computer device can input the projection image generated by preprocessing and the standard area image corresponding to the standard frame point cloud data into the target tracking model, which processes the projection image and the standard area image and outputs the candidate area label corresponding to each candidate area. A candidate area is an area of the projected image in which the target may be located, and may specifically include the location, range, and shape of that area. The candidate area label is the label corresponding to the candidate area, and is uniquely associated with it. The candidate area label may include the area confidence, that is, the probability that the candidate area is the real area of the target.

Step 210: Determine the target tracking area corresponding to the point cloud data of the current frame according to the candidate area label.

The computer device can obtain the candidate area labels corresponding to the multiple candidate areas, and determine the target tracking area corresponding to the point cloud data of the current frame according to the candidate area labels, so as to achieve target tracking. The target tracking area is the location area of the target in the point cloud data of the current frame as estimated by the tracking processing, and may be a target frame corresponding to the target. Specifically, the computer device may use one of a variety of algorithms to determine the target tracking area. For example, the computer device can use a maximum value algorithm to compare the candidate area labels with each other, and determine the candidate area whose label has the highest area confidence as the target tracking area corresponding to the point cloud data of the current frame.
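The maximum value selection described above can be sketched in a few lines. The `Candidate` structure, with a rectangular box and a single confidence value, is a hypothetical representation of a candidate area and its label; this application leaves the exact data layout open.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical candidate area: a rectangle plus the confidence from its label."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    confidence: float


def select_tracking_region(candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose label carries the highest area confidence."""
    if not candidates:
        raise ValueError("at least one candidate area is required")
    return max(candidates, key=lambda c: c.confidence)
```

For instance, given candidates with confidences 0.35, 0.91, and 0.60, the function returns the candidate with confidence 0.91 as the target tracking area.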

In one of the embodiments, the computer device may also use a non-maximum suppression algorithm (Non-Maximum Suppression, NMS for short) to filter the candidate area labels. Specifically, the computer device may screen the multiple candidate areas by area confidence according to the non-maximum suppression algorithm, removing the unselected candidate areas in each round until the screening ends. The computer device can determine the candidate area corresponding to the selected candidate area label as the target tracking area corresponding to the point cloud data of the current frame, which effectively improves the accuracy of determining the target tracking area from multiple candidate areas.
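A common greedy form of non-maximum suppression is sketched below for axis-aligned rectangular candidate areas. In each round it keeps the remaining candidate with the highest confidence and removes the candidates that overlap it too strongly. The intersection-over-union overlap measure and the 0.5 threshold are conventional choices assumed here; this application does not state which overlap criterion is used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes: List[Box], scores: List[float],
                            iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    while order:
        best = order.pop(0)  # highest remaining confidence
        kept.append(best)
        # Remove every remaining candidate that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```

When a single target is tracked, the first returned index would correspond to the target tracking area; the remaining indices are candidates that do not overlap it strongly.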

In this embodiment, the computer device preprocesses the acquired point cloud data of the current frame, generates a projection image, and performs tracking on the projection image. Generating a projection image from current frame point cloud data that contains a large number of discrete points effectively reduces the amount of calculation and saves the computing resources of the computer device. The target tracking model is called to process the projection image and the standard area image corresponding to the standard frame point cloud data, the candidate area label corresponding to each candidate area is obtained, and the target tracking area is determined according to the candidate area labels, so that target tracking is realized based on the current frame point cloud data. Compared with the traditional image-based target tracking method, the point cloud data collected by the laser sensor is not easily affected by factors such as changes in ambient lighting and the movement speed of the target, which effectively improves the accuracy and robustness of target tracking.

In one of the embodiments, the step of preprocessing the point cloud data of the current frame to generate a projection image includes: obtaining a target tracking task; obtaining a corresponding image plane according to the target tracking task; and projecting the points in the current frame point cloud data onto the image plane to obtain the projected image.

The computer equipment can acquire the target tracking task, and the target tracking task can be used to instruct the computer equipment and the laser sensor to track the target. The target tracking task can be triggered according to the user's operating instructions, or it can be automatically generated by the computer equipment according to actual needs. Target tracking tasks can carry tracking task types. The tracking task type refers to the task type corresponding to the target tracking task, and the target tracking task can correspond to one of a variety of task types.

The tracking task type can be used to represent multiple tracking scenarios. In different tracking scenarios, the requirements for point cloud projection can be different, and the tracking task type of the target tracking task can also be different. The computer device can obtain the image plane corresponding to the tracking task type according to the tracking task type. The image plane is used to project the point cloud data of the current frame to generate a projected image. In different tracking scenarios, the computer equipment can determine different planes as image planes.

For example, when a vehicle equipped with a laser sensor is driving on a level road, the distribution of targets in the horizontal plane where the vehicle is located needs to be determined. The computer device can take the horizontal plane where the laser sensor is located, that is, the x-y plane formed by the horizontal axis (x axis) and the longitudinal axis (y axis) of the spatial coordinate system, as the image plane, and disregard the vertical axis (z axis) coordinate in the three-dimensional coordinates of the points. When the vehicle is driving on an uphill or downhill route, the computer device can take the vertical plane corresponding to the laser sensor, that is, the y-z plane formed by the longitudinal axis (y axis) and the vertical axis (z axis) of the spatial coordinate system, as the image plane, and disregard the horizontal axis (x axis) coordinate in the three-dimensional coordinates of the points.

The computer device can project multiple points in the point cloud data of the current frame, project the multiple points into the image plane, and obtain multiple projection points in the image plane. The computer device can record the images corresponding to multiple projection points in the image plane as the projected image, and the projected image is a two-dimensional image. The computer device can track according to the generated projection image to obtain a two-dimensional target tracking area in the projection image.
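Selecting the image plane from the tracking task type can be sketched as a simple lookup. The task type names `"level_road"` and `"slope"` and the mapping table `PLANE_AXES` are hypothetical; this application only states that a level road corresponds to the x-y plane and an uphill or downhill route to the y-z plane.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor coordinate system

# Hypothetical task type names mapped to the indices of the two axes that are kept.
PLANE_AXES: Dict[str, Tuple[int, int]] = {
    "level_road": (0, 1),  # x-y plane: the z coordinate is disregarded
    "slope": (1, 2),       # y-z plane: the x coordinate is disregarded
}


def project_to_image_plane(points: List[Point],
                           task_type: str) -> List[Tuple[float, float]]:
    """Project 3D points onto the image plane associated with the task type."""
    first, second = PLANE_AXES[task_type]
    return [(p[first], p[second]) for p in points]
```

The resulting two-dimensional projection points would still need to be discretized into pixels to form the projected image.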

In one of the embodiments, the computer device may obtain multiple image planes, and project the points in the point cloud data of the current frame onto each of them to obtain multiple projection images. The computer device can track each projection image separately to obtain the target tracking area corresponding to each projection image. It can be understood that a target tracking area determined in a two-dimensional projection image is also two-dimensional. The computer device can combine the target tracking areas corresponding to the multiple projection images to generate a three-dimensional target tracking area corresponding to the point cloud data of the current frame. This determines the position and size of the tracked target in three-dimensional space more accurately, and helps the computer device analyze and control autonomous driving according to the three-dimensional target tracking area.
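One way to combine two-dimensional tracking areas into a three-dimensional one is sketched below for the case of an x-y plane area and a y-z plane area, both axis-aligned and expressed in sensor coordinates. This application does not describe the combination rule; taking the overlap of the two y intervals is an assumption, and the function name `fuse_tracking_areas` is hypothetical.

```python
from typing import Tuple

Box2D = Tuple[float, float, float, float]
Box3D = Tuple[float, float, float, float, float, float]


def fuse_tracking_areas(xy_box: Box2D, yz_box: Box2D) -> Box3D:
    """Combine an x-y area (x_min, y_min, x_max, y_max) and a y-z area
    (y_min, z_min, y_max, z_max) into (x_min, y_min, z_min, x_max, y_max, z_max)."""
    x_min, y_min_a, x_max, y_max_a = xy_box
    y_min_b, z_min, y_max_b, z_max = yz_box
    # Assumption: both areas describe the same target, so their y intervals overlap.
    y_min = max(y_min_a, y_min_b)
    y_max = min(y_max_a, y_max_b)
    if y_min > y_max:
        raise ValueError("the two tracking areas do not overlap along the y axis")
    return (x_min, y_min, z_min, x_max, y_max, z_max)
```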

In this embodiment, the computer device can determine the corresponding image plane according to the target tracking task, and project the points in the point cloud data of the current frame onto that image plane to obtain the projected image. This dimensionality reduction reduces the data volume of the point cloud data of the current frame. Because the computer device performs target tracking on the generated projection image, it can use the image features in the projection image. Compared with the traditional approach of tracking targets in point cloud data by Kalman filtering, this effectively improves the accuracy of target tracking based on point cloud data.

In one of the embodiments, the step of obtaining the standard area image corresponding to the standard frame point cloud data includes: generating a standard frame image according to the standard frame point cloud data; obtaining the standard detection area corresponding to the standard frame image; and intercepting, from the standard frame image, the standard area image that matches the standard detection area.

The computer device can obtain the standard frame point cloud data. The standard frame point cloud data can be a frame of point cloud data that the user selects from the multi-frame point cloud data according to actual needs and in which the target is determined, or it can be the first frame of point cloud data in the multi-frame point cloud data collected by the laser sensor.

Specifically, the computer device may use various methods to generate a standard frame image based on the standard frame point cloud data. For example, the computer device can project the points in the standard frame point cloud data, and take the image obtained by the projection as the standard frame image. The way the computer device generates the standard frame image by projecting the standard frame point cloud data may be similar to the way the projection image is generated from the current frame point cloud data in the above embodiment, so it is not repeated here. The computer device can also obtain the point data included in the standard frame point cloud data, encode the points according to the point data to obtain the point feature corresponding to each of the multiple points, and generate a feature map from these point features. The computer device can record the feature map generated from the standard frame point cloud data as the standard frame image.

The computer device can obtain the standard detection area corresponding to the standard frame image. The standard detection area can be used to indicate the area where the target is located in the standard frame image, and it can be a part of the area range in the standard frame image. The standard detection area can be detected by computer equipment based on standard frame point cloud data. Specifically, the computer device can perform target detection based on the standard frame point cloud data to obtain the standard detection area. The computer device can also generate a standard frame image based on the standard frame point cloud data, and then perform target detection based on the standard frame image to obtain a standard detection area. The standard detection area may specifically include the position, range, and area shape of the target in the standard frame point cloud data. The computer device can obtain one standard detection area corresponding to the standard frame image, and can also obtain multiple corresponding standard detection areas.

The computer device can intercept the standard area image from the standard frame image according to the standard detection area corresponding to the standard frame image, to obtain the standard area image corresponding to the standard detection area. The standard area image may include the target to be tracked, and the intercepted standard area image matches the size and shape of the standard detection area.
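Intercepting the standard area image amounts to cropping the standard frame image to the standard detection area. The sketch below assumes the image is a list of pixel rows and the detection area is a rectangle given as `(row_min, col_min, row_max, col_max)` with exclusive upper bounds; both representations and the name `crop_standard_area` are assumptions, since this application also allows non-rectangular areas.

```python
from typing import List, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), upper bounds exclusive


def crop_standard_area(image: Image, region: Region) -> Image:
    """Cut the rectangular detection area out of the standard frame image."""
    row_min, col_min, row_max, col_max = region
    rows = len(image)
    cols = len(image[0]) if image else 0
    # Clamp the area to the image so that a partly outside area still yields a crop.
    row_min, row_max = max(0, row_min), min(rows, row_max)
    col_min, col_max = max(0, col_min), min(cols, col_max)
    return [row[col_min:col_max] for row in image[row_min:row_max]]
```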

In this embodiment, the computer device generates a standard frame image according to the standard frame point cloud data, acquires a standard detection area corresponding to the standard frame image, and intercepts a standard area image matching the standard detection area from the standard frame image. The computer equipment can use the intercepted standard area image as the basis of target tracking, and perform target tracking on the projected image. By generating the image, the depth characteristics of the point cloud data are used, which effectively improves the accuracy of target tracking.

In one of the embodiments, as shown in Fig. 3, the step of obtaining the standard detection area corresponding to the standard frame image includes:

Step 302: Perform rasterization processing on the standard frame point cloud data to obtain multiple rasters.

Step 304: Extract point features corresponding to the standard frame point cloud data in the multiple rasters to generate a point feature matrix.

Step 306: Invoke the target detection model, and input the point feature matrix into the target detection model to obtain the point cloud detection area corresponding to the standard frame point cloud data.

Step 308: Determine a standard detection area corresponding to the standard frame image according to the point cloud detection area.

The computer equipment can detect the target according to the standard frame point cloud data, and obtain the standard detection area corresponding to the target. Specifically, the computer device may perform rasterization processing on the standard frame point cloud data, and divide the three-dimensional space corresponding to the standard frame point cloud data into multiple grids. The computer device can determine the grid to which the point belongs according to the three-dimensional coordinates of the point in the standard frame point cloud data.
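The rasterization step can be sketched as assigning each point to a grid cell from its three-dimensional coordinates. The grid origin, the cell size, and the name `rasterize` are hypothetical parameters; this application does not give grid dimensions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the sensor coordinate system
GridIndex = Tuple[int, int, int]


def rasterize(points: List[Point],
              origin: Point = (0.0, 0.0, 0.0),
              cell_size: Point = (1.0, 1.0, 1.0)) -> Dict[GridIndex, List[Point]]:
    """Group points by the grid cell that contains them."""
    grids: Dict[GridIndex, List[Point]] = defaultdict(list)
    for p in points:
        # The cell index along each axis is the floor of the scaled offset.
        index = (
            math.floor((p[0] - origin[0]) / cell_size[0]),
            math.floor((p[1] - origin[1]) / cell_size[1]),
            math.floor((p[2] - origin[2]) / cell_size[2]),
        )
        grids[index].append(p)
    return dict(grids)
```

Only non-empty grid cells appear in the result, which keeps the structure compact for sparse point clouds.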

The computer equipment can count the point data corresponding to the points in each grid, perform feature extraction on the points in each grid, and obtain the point features corresponding to the points. Specifically, the computer device can call the feature extraction model to extract the point features in the grid. The feature extraction model can be obtained after training through a large number of point cloud samples and point feature samples. The feature extraction model can be one of a variety of neural network models. For example, the feature extraction model may be a convolutional neural network model, and specifically may be a PointNet model. The computer device can input the point data in each grid to the feature extraction model, and calculate the point data through the feature extraction model to obtain the point features output by the feature extraction model. The computer equipment can count the point features corresponding to multiple points in the grid to generate a point feature matrix. The point feature matrix can be a three-dimensional matrix.
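To illustrate the shape of a three-dimensional point feature matrix, the sketch below substitutes simple hand-computed statistics (point count, mean height, maximum height) for the learned feature extraction model described above. It is a stand-in for illustration only and is not the PointNet-style model. It also assumes the grids are indexed by `(row, col)` in a plane; the name `build_feature_matrix` is hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor coordinate system


def build_feature_matrix(grids: Dict[Tuple[int, int], List[Point]],
                         rows: int, cols: int) -> List[List[List[float]]]:
    """Build a rows x cols x 3 matrix of per-grid features.

    Stand-in features: [point count, mean z, max z]. A learned feature
    extraction model would produce these vectors instead.
    """
    matrix = [[[0.0, 0.0, 0.0] for _ in range(cols)] for _ in range(rows)]
    for (row, col), pts in grids.items():
        if not pts or not (0 <= row < rows and 0 <= col < cols):
            continue  # skip empty grids and grids outside the matrix
        heights = [p[2] for p in pts]
        matrix[row][col] = [float(len(pts)), sum(heights) / len(heights), max(heights)]
    return matrix
```

Empty grids keep an all-zero feature vector, so the matrix has a fixed size regardless of how many points the frame contains.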

The computer device can call the target detection model and detect the target in the standard frame point cloud data through the target detection model. The target detection model may be pre-trained and configured in the computer device. The target detection model may be obtained by training a convolutional neural network (CNN) model, and may specifically include one of a YOLO model or a Mask RCNN model. The computer device can input the generated point feature matrix to the target detection model and process the point feature matrix through the target detection model to obtain the detection area output by the target detection model. The computer device can de-rasterize the detection area output by the target detection model to obtain the point cloud detection area corresponding to the standard frame point cloud data.

Since the point cloud detection area is a three-dimensional detection area corresponding to the standard frame point cloud data, the computer device can determine the standard detection area corresponding to the standard frame image according to the point cloud detection area. Specifically, the computer device can project the point cloud detection area onto the same image plane onto which the standard frame point cloud data was projected to generate the standard frame image, so as to obtain the standard detection area corresponding to the standard frame image.
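One way to picture this projection is to project the eight corners of a three-dimensional box onto the image plane and take their two-dimensional bounding rectangle. The sketch below assumes a simple pinhole projection with a hypothetical focal length and principal point, and an axis-aligned box entirely in front of the plane (z > 0); the application does not prescribe a particular projection model.

```python
from itertools import product


def project_box(box_min, box_max, focal, center):
    """Project an axis-aligned 3D box and return (u_min, v_min, u_max, v_max).

    Assumes a pinhole model: u = focal * x / z + center[0], and likewise for v.
    """
    us, vs = [], []
    # zip pairs the min and max value per axis; product enumerates 8 corners.
    for x, y, z in product(*zip(box_min, box_max)):
        us.append(focal * x / z + center[0])
        vs.append(focal * y / z + center[1])
    return (min(us), min(vs), max(us), max(vs))


# Illustrative box and camera parameters only.
box2d = project_box(
    box_min=(-1.0, -0.5, 4.0),
    box_max=(1.0, 0.5, 8.0),
    focal=100.0,
    center=(320.0, 240.0),
)
```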

In one of the embodiments, when the computer device performs detection based on the standard frame image, the two-dimensional detection area corresponding to the standard frame image can be obtained. The computer equipment can directly record the two-dimensional detection area corresponding to the standard frame image as the standard detection area corresponding to the standard frame image.

In this embodiment, the computer device can call the target detection model to detect the point feature matrix corresponding to the standard frame point cloud data and obtain the standard detection area corresponding to the standard frame image, so that the computer device can track the target in the current frame point cloud data based on the standard detection area, which effectively improves the accuracy of target tracking.

In one of the embodiments, calling the target tracking model and obtaining the candidate area label corresponding to the candidate area based on the projection image and the standard area image includes: extracting the current image feature corresponding to the projected image and the standard image feature corresponding to the standard area image; inputting the current image feature and the standard image feature into the target tracking model; and filtering the current image feature and the standard image feature based on the target tracking model to obtain candidate region labels corresponding to multiple candidate regions output by the target tracking model.

The computer device can perform feature extraction on the projection image corresponding to the current frame point cloud data and the standard region image corresponding to the standard frame point cloud data to obtain the current image feature corresponding to the projection image and the standard image feature corresponding to the standard region image. Specifically, the computer device may extract the image features of the projected image and the standard area image sequentially in a single thread, or it may extract them in parallel in multiple threads. The computer device can call the image feature model, perform feature extraction on the projection image and the standard area image, and obtain the current image features and standard image features output by the image feature model. The image feature model may be a two-dimensional convolutional neural network model. In one of the embodiments, when the computer device extracts image features in parallel in multiple threads, the computer device can obtain the Siamese (twin) network model corresponding to the image feature model and extract the features of the projected image and the standard area image in parallel.
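The multi-threaded variant can be sketched with a thread pool in which both branches call the same extractor, mirroring the shared weights of a twin network. The function `extract_features` below is only a stand-in (row means) for a two-dimensional convolutional model, and every name here is an assumption introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(image):
    """Stand-in for a 2D CNN: one value per image row (the row mean)."""
    return [sum(row) / len(row) for row in image]


def extract_pair(projected_image, standard_area_image):
    """Run the same extractor on both images in two worker threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        current = pool.submit(extract_features, projected_image)
        standard = pool.submit(extract_features, standard_area_image)
        return current.result(), standard.result()


# Tiny illustrative images given as nested lists of pixel values.
current_feature, standard_feature = extract_pair([[1, 3], [5, 7]], [[2, 4]])
```

Because both branches call one function, the two feature sets are directly comparable, which is the property the twin network arrangement relies on.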

The computer device can input the extracted current image features and standard image features into the target tracking model. The target tracking model can be one of a variety of convolutional neural network models. For example, the target tracking model may specifically include a SiamMask model, a Siamese RPN (Region Proposal Network) model, etc. The computer device can process the current image features and standard image features based on the target tracking model. Specifically, the target tracking model can perform convolution filtering on the current image feature and the standard image feature, comparing the current image feature with the standard image feature to obtain the candidate region labels corresponding to the multiple candidate regions output by the target tracking model. The image in each candidate area corresponds to the standard area image. In one of the embodiments, when the standard frame point cloud data corresponds to multiple standard area images, the computer device can obtain Siamese (twin) network models corresponding to multiple target tracking models and process the standard image features corresponding to the multiple standard area images to obtain candidate regions corresponding to the multiple standard area images.
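Siamese trackers commonly realize this comparison as a sliding cross-correlation, in which the standard image feature acts as a filter kernel over the current image feature. The sketch below illustrates that idea on single-channel features; it is an assumption about how the filtering could look, not the tracking model of this application, and the dictionary keys `position` and `confidence` are hypothetical.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def candidate_labels(response):
    """Turn every response cell into a candidate with a raw score."""
    return [
        {"position": (i, j), "confidence": score}
        for i, row in enumerate(response)
        for j, score in enumerate(row)
    ]


# Illustrative features: the template pattern sits at offset (1, 1).
current_feature = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
standard_feature = [[1, 2], [3, 4]]
response = cross_correlate(current_feature, standard_feature)
best = max(candidate_labels(response), key=lambda c: c["confidence"])
```

The strongest response appears where the current feature best matches the standard feature, which is why each response cell can be treated as a scored candidate.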

In this embodiment, the computer device calls the target tracking model to process the current image feature corresponding to the projected image and the standard image feature corresponding to the standard area image, obtaining candidate area labels corresponding to multiple candidate areas. This makes full use of the image features of the image corresponding to the point cloud data and determines the multiple candidate regions through a deep learning model. Compared with tracking by Kalman filtering on the point cloud, the accuracy of target tracking is effectively improved.

In one of the embodiments, the step of filtering the current image features and standard image features based on the target tracking model includes: obtaining a historical feature matrix; adjusting the standard image features according to the historical feature matrix; and performing filtering processing according to the adjusted image features.

Before calling the target tracking model to perform operations on standard image features and current image features, the computer device can also obtain a historical feature matrix. The historical feature matrix refers to a feature matrix generated by a computer device based on historical image features corresponding to historical target images in historical point cloud data. The historical point cloud data may include the point cloud data including the target collected by the laser sensor before the point cloud data of the current frame. The historical feature matrix may be generated from image features corresponding to the target in multiple frames of historical point cloud data, and the historical feature matrix and historical point cloud data may be stored in a memory corresponding to the computer device.

It can be understood that the computer device can record the current frame point cloud data as historical point cloud data after finishing target tracking on the current frame point cloud data. The computer device can adjust the historical feature matrix according to the image features of the target tracking area corresponding to the current frame point cloud data. By continuously adjusting the historical feature matrix corresponding to the target in this way, the accuracy and robustness of the historical feature matrix are effectively improved.

The computer device can adjust the standard image feature according to the acquired historical feature matrix. Specifically, the computer device can perform convolution processing on the historical feature matrix and the standard image feature through the target tracking model to obtain the adjusted image feature. The computer device may perform convolution filtering according to the adjusted image feature and the current image feature to obtain candidate region labels corresponding to multiple candidate regions.
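The interplay between the historical feature matrix and the standard image feature can be pictured with the simplified sketch below. It substitutes a weighted blend for the convolution performed inside the target tracking model and a moving average for the history update; both substitutions, the `weight` and `momentum` parameters, and the flat feature vectors are assumptions made purely for illustration.

```python
def adjust_standard_feature(standard, history, weight=0.5):
    """Blend the standard feature with the history (stand-in for convolution)."""
    return [(1.0 - weight) * s + weight * h for s, h in zip(standard, history)]


def update_history(history, tracked, momentum=0.9):
    """Fold the feature of the newly tracked area into the history."""
    return [momentum * h + (1.0 - momentum) * t for h, t in zip(history, tracked)]


# Illustrative flat feature vectors only.
adjusted = adjust_standard_feature([1.0, 3.0], [3.0, 1.0], weight=0.5)
new_history = update_history([2.0, 4.0], [4.0, 2.0], momentum=0.5)
```

After each frame is tracked, `update_history` would be called with the feature of the target tracking area, so that later frames are compared against an appearance model built from multiple historical frames.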

In this embodiment, the computer device can obtain the historical feature matrix corresponding to the target to adjust the standard image features, and perform filtering processing according to the adjusted image features to obtain candidate region labels corresponding to multiple candidate regions. Because the standard image features are adjusted through the historical feature matrix corresponding to the target, the adjusted image features more accurately reflect the characteristics of the target in the image. By drawing on multiple frames of historical point cloud data, the accuracy and robustness of target tracking are effectively improved.

It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a part of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 4, an image tracking processing device is provided, including: a point cloud acquisition module 402, a preprocessing module 404, a standard image acquisition module 406, and a target tracking module 408, wherein:

The point cloud acquisition module 402 is used to acquire the point cloud data of the current frame.

The preprocessing module 404 is used to preprocess the point cloud data of the current frame to generate a projection image.

The standard image acquisition module 406 is used to acquire the standard area image corresponding to the standard frame point cloud data.

The target tracking module 408 is used to call the target tracking model, obtain the candidate area label corresponding to the candidate area based on the projection image and the standard area image; determine the target tracking area corresponding to the point cloud data of the current frame according to the candidate area label.

In one of the embodiments, the preprocessing module 404 is also used to obtain a target tracking task; obtain a corresponding image plane according to the target tracking task; project a point in the point cloud data of the current frame onto the image plane to obtain a projected image.

In one of the embodiments, the above-mentioned standard image acquisition module 406 is further configured to generate a standard frame image according to the standard frame point cloud data; acquire the standard detection area corresponding to the standard frame image; and intercept, from the standard frame image, a standard area image matching the standard detection area.

In one of the embodiments, the above-mentioned standard image acquisition module 406 is also used to rasterize the standard frame point cloud data to obtain multiple grids; extract point features corresponding to the standard frame point cloud data in the multiple grids to generate a point feature matrix; call the target detection model and input the point feature matrix to the target detection model to obtain the point cloud detection area corresponding to the standard frame point cloud data; and determine the standard detection area corresponding to the standard frame image according to the point cloud detection area.

In one of the embodiments, the target tracking module 408 is also used to extract the current image features corresponding to the projected image and the standard image features corresponding to the standard area image; input the current image features and standard image features into the target tracking model; and perform filtering processing on the current image features and standard image features based on the target tracking model to obtain candidate region labels corresponding to multiple candidate regions output by the target tracking model.

In one of the embodiments, the target tracking module 408 is also used to obtain a historical feature matrix; adjust the standard image features according to the historical feature matrix, and perform filtering processing according to the adjusted image features.

In one of the embodiments, the candidate area label includes the area confidence, and the target tracking module 408 is also used to screen multiple candidate areas according to the area confidence, and determine the selected candidate area as the target tracking area corresponding to the point cloud data of the current frame.
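Screening by area confidence can be sketched as a threshold followed by taking the highest-scoring survivor. The threshold value, the dictionary layout of a candidate, and the function name are assumptions for illustration; the application only states that candidates are screened according to the area confidence.

```python
def select_tracking_area(candidates, min_confidence=0.5):
    """Return the most confident candidate above the threshold, or None."""
    kept = [c for c in candidates if c["confidence"] >= min_confidence]
    if not kept:
        return None
    return max(kept, key=lambda c: c["confidence"])


# Illustrative candidates; "box" is a hypothetical (u_min, v_min, u_max, v_max).
candidates = [
    {"box": (10, 10, 50, 40), "confidence": 0.35},
    {"box": (12, 11, 52, 42), "confidence": 0.91},
    {"box": (80, 20, 120, 60), "confidence": 0.62},
]
tracking_area = select_tracking_area(candidates)
no_area = select_tracking_area(candidates, min_confidence=0.95)
```

Returning `None` when nothing passes the threshold lets the caller distinguish a lost target from a low-quality match.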

For the specific limitations of the image tracking processing device, refer to the limitations of the image tracking processing method above, which will not be repeated here. Each module in the above-mentioned image tracking processing device can be implemented in whole or in part by software, hardware, or a combination thereof. The above-mentioned modules can be embedded in, or independent of, the processor in the computer device in the form of hardware, or can be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store image tracking processing data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by the processor to realize an image tracking processing method.

Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.

In one of the embodiments, a computer device is provided, including a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more processors implement the steps in the above method embodiments.

In one of the embodiments, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps in the above method embodiments.

In one of the embodiments, a vehicle is provided. The vehicle may specifically include self-driving vehicles, electric vehicles, bicycles, and aircraft. The vehicle includes the above-mentioned computer device and can execute the steps in the above-mentioned image tracking processing method embodiment.

The embodiments and implementation objects of the present invention are not limited to autonomous vehicles, electric vehicles, bicycles, aircraft, robots, and the like; they also include simulation devices and test equipment related to these devices.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

The technical features of the above embodiments can be combined arbitrarily. In order to make the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, the combination should be considered as falling within the scope described in this specification.

The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims

An image tracking processing method, including:

Obtain the point cloud data of the current frame;

Preprocessing the point cloud data of the current frame to generate a projection image;

Obtain the standard area image corresponding to the standard frame point cloud data;

Calling the target tracking model, and obtaining the candidate area label corresponding to the candidate area based on the projection image and the standard area image; and

The target tracking area corresponding to the point cloud data of the current frame is determined according to the candidate area tag.
The method according to claim 1, wherein the preprocessing the point cloud data of the current frame to generate a projection image comprises:

Obtain target tracking tasks;

Obtain a corresponding image plane according to the target tracking task; and

Projecting the points in the point cloud data of the current frame onto the image plane to obtain a projected image.
The method according to claim 1, wherein said obtaining a standard area image corresponding to standard frame point cloud data comprises:

Generating a standard frame image according to the standard frame point cloud data;

Acquiring the standard detection area corresponding to the standard frame image; and

A standard area image matching the standard detection area is intercepted from the standard frame image.
The method according to claim 3, wherein said obtaining the standard detection area corresponding to the standard frame image comprises:

Performing rasterization processing on the standard frame point cloud data to obtain multiple grids;

Extracting point features corresponding to standard frame point cloud data in a plurality of said grids to generate a point feature matrix;

Calling a target detection model, input the point feature matrix to the target detection model, and obtain the point cloud detection area corresponding to the standard frame point cloud data; and

The standard detection area corresponding to the standard frame image is determined according to the point cloud detection area.
The method according to claim 1, wherein the invoking the target tracking model to obtain the candidate area label corresponding to the candidate area based on the projection image and the standard area image comprises:

Extracting the current image feature corresponding to the projected image and the standard image feature corresponding to the standard area image;

Inputting the current image feature and the standard image feature to the target tracking model; and

Perform filtering processing on the current image feature and the standard image feature based on the target tracking model to obtain candidate region labels corresponding to multiple candidate regions output by the target tracking model.
The method according to claim 5, wherein the filtering processing of the current image feature and the standard image feature based on the target tracking model comprises:

Obtain the historical feature matrix; and

The standard image feature is adjusted according to the historical feature matrix, and filtering processing is performed according to the adjusted image feature.
The method according to claim 1, wherein the candidate area label includes an area confidence, and the determining the target tracking area corresponding to the current frame point cloud data according to the candidate area label comprises:

Screening a plurality of the candidate regions according to the region confidence; and

Determine the selected candidate area as the target tracking area corresponding to the point cloud data of the current frame.
An image tracking processing device, including:

Point cloud acquisition module for acquiring point cloud data of the current frame;

The preprocessing module is used to preprocess the point cloud data of the current frame to generate a projection image;

The standard image acquisition module is used to acquire the standard area image corresponding to the standard frame point cloud data; and

The target tracking module is used to call the target tracking model, obtain the candidate area label corresponding to the candidate area based on the projection image and the standard area image, and determine the target tracking area corresponding to the current frame point cloud data according to the candidate area label.
The device according to claim 8, wherein the preprocessing module is further used to obtain a target tracking task; obtain a corresponding image plane according to the target tracking task; and project the points in the current frame point cloud data onto the image plane to obtain a projected image.
A computer device, including a memory and one or more processors, wherein the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps:

Obtain the point cloud data of the current frame;

Preprocessing the point cloud data of the current frame to generate a projection image;

Obtain the standard area image corresponding to the standard frame point cloud data;

Calling the target tracking model, and obtaining the candidate area label corresponding to the candidate area based on the projection image and the standard area image; and

The target tracking area corresponding to the point cloud data of the current frame is determined according to the candidate area tag.
The computer device according to claim 10, wherein the processor further executes the following steps when executing the computer-readable instruction:

Obtain target tracking tasks;

Obtain a corresponding image plane according to the target tracking task; and

Projecting the points in the point cloud data of the current frame onto the image plane to obtain a projected image.
The computer device according to claim 10, wherein the processor further executes the following steps when executing the computer-readable instruction:

Generating a standard frame image according to the standard frame point cloud data;

Acquiring the standard detection area corresponding to the standard frame image; and

A standard area image matching the standard detection area is intercepted from the standard frame image.
The computer device according to claim 12, wherein the processor further executes the following steps when executing the computer-readable instruction:

Performing rasterization processing on the standard frame point cloud data to obtain multiple grids;

Extracting point features corresponding to standard frame point cloud data in a plurality of said grids to generate a point feature matrix;

Calling a target detection model, input the point feature matrix to the target detection model, and obtain the point cloud detection area corresponding to the standard frame point cloud data; and

The standard detection area corresponding to the standard frame image is determined according to the point cloud detection area.
The computer device according to claim 10, wherein the processor further executes the following steps when executing the computer-readable instruction:

Extracting the current image feature corresponding to the projected image and the standard image feature corresponding to the standard area image;

Inputting the current image feature and the standard image feature to the target tracking model; and

Perform filtering processing on the current image feature and the standard image feature based on the target tracking model to obtain candidate region labels corresponding to multiple candidate regions output by the target tracking model.
The computer device according to claim 14, wherein the processor further executes the following steps when executing the computer-readable instruction:

Obtain the historical feature matrix; and

The standard image feature is adjusted according to the historical feature matrix, and filtering processing is performed according to the adjusted image feature.
One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Obtain the point cloud data of the current frame;

Preprocessing the point cloud data of the current frame to generate a projection image;

Obtain the standard area image corresponding to the standard frame point cloud data;

Calling the target tracking model, and obtaining the candidate area label corresponding to the candidate area based on the projection image and the standard area image; and

The target tracking area corresponding to the point cloud data of the current frame is determined according to the candidate area tag.
The storage medium according to claim 16, wherein the following steps are further executed when the computer-readable instructions are executed by the processor:

Obtain target tracking tasks;

Obtain a corresponding image plane according to the target tracking task; and

Projecting the points in the point cloud data of the current frame onto the image plane to obtain a projected image.
The storage medium according to claim 16, wherein the following steps are further executed when the computer-readable instructions are executed by the processor:

Generating a standard frame image according to the standard frame point cloud data;

Acquiring the standard detection area corresponding to the standard frame image; and

A standard area image matching the standard detection area is intercepted from the standard frame image.
The storage medium according to claim 18, wherein the following steps are further executed when the computer-readable instructions are executed by the processor:

Performing rasterization processing on the standard frame point cloud data to obtain multiple grids;

Extracting point features corresponding to standard frame point cloud data in a plurality of said grids to generate a point feature matrix;

Calling a target detection model, input the point feature matrix to the target detection model, and obtain the point cloud detection area corresponding to the standard frame point cloud data; and

The standard detection area corresponding to the standard frame image is determined according to the point cloud detection area.
A vehicle, configured to execute the steps of the image tracking processing method according to any one of claims 1-7.