CN115984583A - Data processing method, apparatus, computer device, storage medium and program product - Google Patents


Info

Publication number
CN115984583A
CN115984583A (application CN202211722555.8A)
Authority
CN
China
Prior art keywords
point cloud; voxel; pixel; cloud data; super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211722555.8A
Other languages
Chinese (zh)
Other versions
CN115984583B (en)
Inventor
申瑜茜
陈影
姜波
王镇
王瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Woya Technology Co ltd
Original Assignee
Guangzhou Woya Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Woya Technology Co ltd filed Critical Guangzhou Woya Technology Co ltd
Priority to CN202211722555.8A priority Critical patent/CN115984583B/en
Publication of CN115984583A publication Critical patent/CN115984583A/en
Application granted granted Critical
Publication of CN115984583B publication Critical patent/CN115984583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The present application relates to a data processing method, apparatus, computer device, storage medium and program product. The method comprises the following steps: acquiring point cloud data of a target area; inputting the point cloud data into a feature extraction network to obtain a voxel feature corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data. The feature extraction network is obtained by performing contrastive learning based on the superpixel feature and the point cloud feature corresponding to each superpixel in a sample image. By adopting the method, data processing efficiency can be improved.

Description

Data processing method, apparatus, computer device, storage medium and program product
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, an apparatus, a computer device, a storage medium, and a program product.
Background
Lidar is currently used in more and more industries. Taking automatic driving as an example, the lidar serves as the "eyes" of the vehicle: it can sense the driving environment and detect road conditions and obstacles while the vehicle is driving, and thus plays an extremely important role.
The point cloud data generated by the lidar contains a great deal of information and is huge in volume. In the related art, when point cloud data is used for point cloud segmentation or target detection, supervised learning is usually performed in the model training stage based on a large amount of labeled point cloud data.
However, this method suffers from low data processing efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, an apparatus, a computer device, a storage medium, and a program product capable of improving data processing efficiency.
In a first aspect, the present application provides a data processing method. The method comprises the following steps:
acquiring point cloud data of a target area;
inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the superpixel features and the point cloud features corresponding to the superpixels in the sample image.
In one embodiment, the method further comprises:
acquiring the sample image and sample point cloud data for a sample region, and performing superpixel segmentation on the sample image to obtain each superpixel;
acquiring a super-pixel characteristic corresponding to each super-pixel according to the sample image, and acquiring a point cloud characteristic corresponding to each super-pixel according to the sample point cloud data;
and performing contrastive learning based on each superpixel feature and each point cloud feature to obtain the feature extraction network.
In one embodiment, the obtaining of the super-pixel feature corresponding to each super-pixel according to the sample image includes:
inputting the sample image into a neural network model to obtain pixel characteristics corresponding to the sample image;
and determining target pixel characteristics corresponding to the super pixels from the pixel characteristics, and performing pooling processing on the target pixel characteristics to obtain the super pixel characteristics.
In one embodiment, the obtaining the point cloud feature corresponding to each super pixel according to the sample point cloud data includes:
inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to each initial voxel point cloud in the sample point cloud data;
and acquiring the point cloud characteristics corresponding to the super pixels according to the initial voxel characteristics.
In one embodiment, the obtaining the point cloud feature corresponding to each superpixel according to each initial voxel feature includes:
according to the coordinate mapping relation between the sample point cloud data and the sample image, projecting each initial voxel characteristic to a coordinate system where the pixel characteristic is located to obtain an intermediate voxel characteristic;
and determining target voxel characteristics corresponding to the superpixels respectively from the intermediate voxel characteristics, and performing pooling processing on the target voxel characteristics respectively to obtain the point cloud characteristics.
In one embodiment, the method further comprises:
performing downsampling processing on the intermediate voxel characteristic so that the characteristic dimension of the intermediate voxel characteristic after the downsampling processing is the same as the characteristic dimension of the pixel characteristic;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the down-sampling processing.
In one embodiment, the method further comprises:
according to the coordinate mapping relation, eliminating invalid point features from the intermediate voxel features to obtain the eliminated intermediate voxel features;
the determining of the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics includes:
and determining the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the elimination.
In one embodiment, the method further comprises:
inputting each voxel characteristic into a prediction network to obtain a prediction category corresponding to each voxel point cloud, wherein the prediction network is obtained by utilizing the characteristic extraction network and a point cloud training sample to perform fine tuning pre-training on an initial prediction network.
In a second aspect, the present application further provides a data processing apparatus. The device comprises:
the first acquisition module is used for acquiring point cloud data of a target area;
the processing module is used for inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the superpixel features and the point cloud features corresponding to the superpixels in the sample image.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method according to the first aspect as described above when the computer program is executed by the processor.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to the first aspect.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of the method according to the first aspect as described above.
According to the data processing method, the data processing apparatus, the computer device, the storage medium and the program product, point cloud data of a target area is acquired and input into a feature extraction network to obtain the voxel feature corresponding to each voxel point cloud in the point cloud data; each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data. The feature extraction network is obtained by contrastive learning based on the superpixel features and the point cloud features corresponding to the superpixels in a sample image; because this contrastive learning is self-supervised and does not require a large amount of labeled point cloud data, data processing efficiency can be improved.
Drawings
FIG. 1 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 2 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 3 is a schematic flow diagram of training a feature extraction network in one embodiment;
FIG. 4 is a schematic diagram of a sample image and superpixel segmentation of the sample image into superpixels, according to one embodiment;
FIG. 5 is a schematic flow chart illustrating obtaining superpixel features corresponding to each superpixel in one embodiment;
FIG. 6 is a schematic flow chart illustrating an embodiment of obtaining point cloud features corresponding to superpixels;
FIG. 7 is a flow diagram illustrating a step 602, according to an embodiment;
FIG. 8 is a diagram illustrating mapping relationships between sample point cloud data and sample images according to one embodiment;
FIG. 9 is a block diagram of a training process for a feature extraction network in one embodiment;
FIG. 10 is a diagram illustrating segmentation results obtained by performing panorama segmentation on a sample image according to an embodiment;
FIG. 11 is a block diagram showing the structure of a data processing apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
LiDAR (Light Detection And Ranging) provides rich information about the 3D world and is now increasingly used across industries. Taking automatic driving as an example, the lidar serves as the "eyes" of the vehicle: it can sense the driving environment and detect road conditions and obstacles while the vehicle is driving. Understanding the point cloud data generated by the lidar is therefore important for safe driving of the vehicle under different external conditions.
In the related art, when point cloud data is used for point cloud segmentation (also called semantic segmentation) or target detection, a network structure consisting of a feature extraction network plus a prediction network or target detection network is generally needed, and in the model training stage of the feature extraction network, supervised learning is usually performed based on a large amount of labeled point cloud data. However, labeling point cloud data is a time-consuming and costly task, so this way of training the feature extraction network suffers from low data processing efficiency.
In view of this, an embodiment of the present application provides a data processing method: point cloud data of a target area is acquired and input into a feature extraction network to obtain the voxel feature corresponding to each voxel point cloud in the point cloud data, where each voxel point cloud is obtained by performing voxel segmentation on the point cloud data and each voxel feature is used to perform point cloud segmentation processing or target detection processing on the point cloud data. The feature extraction network is obtained by performing contrastive learning based on the superpixel feature and the point cloud feature corresponding to each superpixel in a sample image; since contrastive learning is self-supervised, a large amount of labeled point cloud data is not required, which improves data processing efficiency.
Next, an implementation environment of the data processing method provided in the embodiment of the present application will be briefly described.
For example, the data processing method provided by the embodiment of the present application may be used in the computer device shown in fig. 1, where the computer device may be a terminal or a server. Its internal structure may be as shown in fig. 1. The computer device comprises a processor, a memory, an Input/Output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data processing data. The input/output interface of the computer device is used for exchanging information between the processor and external devices. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by the processor to implement a data processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or arrange components differently.
It should be noted that, in the data processing method provided in the embodiments of the present application, the execution subject may also be a data processing apparatus, and the data processing apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of the two. In the following method embodiments, the execution subject is taken to be a computer device as an example.
In one embodiment, as shown in fig. 2, there is provided a data processing method including the steps of:
step 201, a computer device acquires point cloud data of a target area.
Taking the data processing method applied to an automatic driving scene as an example: the computer device may be a terminal, for example a terminal of a driver, but of course also an associated controller in the vehicle; the vehicle is also provided with a laser radar, and the target area is a scanning area covered by the laser radar.
In this way, during the driving process of the vehicle, the computer device may acquire the point cloud data of the target area scanned by the laser radar, where the point cloud data includes a plurality of information, such as three-dimensional coordinates (XYZ), laser reflection intensity, and color information.
Step 202, inputting the point cloud data into a feature extraction network by the computer equipment to obtain a voxel feature corresponding to each voxel point cloud in the point cloud data.
The feature extraction network is obtained by pre-training. In the embodiment of the application, sample point cloud data (i.e., point cloud data serving as training samples) and sample images (images serving as training samples) are combined for self-supervised learning to obtain the feature extraction network; the sample point cloud data and the sample images are both acquired for the same sample area.
As an implementation, on the basis of the superpixels segmented from the sample image, the superpixel feature of each superpixel (i.e., the feature in the pixel dimension) is obtained from the sample image, and the point cloud feature of each superpixel is obtained from the sample point cloud data; contrastive learning is then performed on the obtained features. That is, the feature extraction network is obtained based on the superpixel feature and the point cloud feature corresponding to each superpixel in the sample image.
During training, the feature extraction network, through contrastive learning, fully learns to perform voxel segmentation on the sample point cloud data and to extract the sample voxel feature corresponding to each sample voxel point cloud obtained by that segmentation. A voxel here may be a pillar: all points in the point cloud are divided uniformly along the horizontal x-y directions, with no division in the z direction. Performing voxel segmentation on the sample point cloud data therefore means uniformly dividing all of its points along the horizontal x-y directions only, and each sample voxel point cloud is the set of all points falling inside one unit column after the division.
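The pillar division described above can be sketched as follows. This is a simplified illustration, not the patent's implementation; the pillar size and the key-packing scheme are assumptions made only for the example.

```python
import numpy as np

def assign_pillars(points, pillar_size=0.5):
    """Group points into pillars (vertical unit columns) by discretizing
    only the horizontal x-y coordinates; z is deliberately not divided."""
    xy = points[:, :2]
    idx = np.floor(xy / pillar_size).astype(np.int64)
    # pack the 2-D grid index into one key per pillar
    # (assumes |index| < 100000; purely illustrative)
    keys = idx[:, 0] * 100000 + idx[:, 1]
    pillars = {}
    for k, p in zip(keys, points):
        pillars.setdefault(int(k), []).append(p)
    return {k: np.stack(v) for k, v in pillars.items()}

pts = np.array([
    [0.1, 0.1, 0.0],
    [0.2, 0.3, 5.0],   # same pillar as the first point despite different z
    [1.1, 0.1, 0.2],
])
pillars = assign_pillars(pts, pillar_size=0.5)
print(len(pillars))  # 2 pillars
```

Note how the second point shares a pillar with the first even though its z coordinate differs greatly, since the division happens only in x-y.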
The specific training process of the feature extraction network will be described in detail in the following embodiments, and will not be described herein again.
In this way, after obtaining the point cloud data of the target area, the computer device inputs it into the feature extraction network to obtain the voxel feature corresponding to each voxel point cloud in the point cloud data. Each voxel point cloud is obtained by performing voxel segmentation on the point cloud data: the feature extraction network performs voxel segmentation on the point cloud data to obtain each voxel point cloud, and then performs feature extraction on each voxel point cloud to obtain its voxel feature.
In the embodiment of the application, each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data. Point cloud segmentation processing refers to performing voxel segmentation on the point cloud data to obtain each voxel point cloud and outputting a corresponding prediction category for each voxel point cloud; target detection processing refers to outputting the position of a target in the point cloud data in the form of a target position frame.
Since the point cloud data is three-dimensional spatial data, the feature extraction network is a 3D Backbone network. In actual application, a corresponding 3D Head (head network) can be selected according to the business requirement and connected to the feature extraction network to form a 3D Backbone + 3D Head architecture, so as to implement the required function, for example a point cloud segmentation function or a target detection function.
In this embodiment, point cloud data of the target area is acquired and input into the feature extraction network to obtain the voxel feature corresponding to each voxel point cloud in the point cloud data, where each voxel point cloud is obtained by performing voxel segmentation on the point cloud data and each voxel feature is used to perform point cloud segmentation processing or target detection processing. The feature extraction network is obtained by contrastive learning based on the superpixel features and the point cloud features corresponding to the superpixels in the sample image; because this training does not rely on a large amount of labeled point cloud data, data processing efficiency can be improved.
In one embodiment, based on the embodiment shown in fig. 2, the embodiment relates to a training process of a feature extraction network. Referring to fig. 3, the process includes steps 301, 302, and 303:
step 301, a computer device obtains a sample image and sample point cloud data for a sample region, and performs superpixel segmentation on the sample image to obtain each superpixel.
It should be noted that, in the training phase of the feature extraction network, in view of a large data processing amount, the computer device may be a server, and the server may be a single server or a server cluster formed by a plurality of servers.
In the embodiment of the application, the sample image can be obtained by capturing the sample region with a camera, and the sample point cloud data can be obtained by scanning the sample region with a lidar. The sample region can be any region covered by both the camera and the lidar, and there may be multiple sample images.
Illustratively, assume there are C sample images, denoted I_1, ..., I_C; each sample image may be a three-channel RGB image. Assuming n points in the sample point cloud data, the sample point cloud data is denoted P = (p_i), i = 1, ..., n. Each point in the sample point cloud data may include multidimensional information, for example six-dimensional information: three-dimensional coordinates (XYZ), laser reflection intensity, color information, and a normal vector, among others.
Next, in a possible implementation, the computer device performs superpixel segmentation on the sample image. In this embodiment, the Simple Linear Iterative Clustering (SLIC) algorithm may be used, yielding a superpixel segmentation result of Q superpixels S_1, ..., S_Q.
A superpixel is an irregular block of adjacent pixels with similar texture, color, brightness and other characteristics that carries a certain visual significance.
Exemplarily, referring to fig. 4, fig. 4 is a schematic diagram of a sample image and of its segmentation into superpixels. Fig. 4 marks only three superpixels, but it is understood that each closed region in fig. 4 is a superpixel.
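The idea of assigning every pixel a superpixel label can be sketched with a toy stand-in for SLIC. Real SLIC clusters pixels in combined color-and-position space; the regular-grid partition below is only an illustrative assumption that produces the same kind of output, namely a per-pixel label map.

```python
import numpy as np

def grid_superpixels(h, w, cell=4):
    """Toy stand-in for SLIC: partition an h x w image into square
    'superpixels' on a regular grid and return the per-pixel label map."""
    rows = np.arange(h) // cell          # grid row of each pixel row
    cols = np.arange(w) // cell          # grid column of each pixel column
    n_cols = int(np.ceil(w / cell))
    labels = rows[:, None] * n_cols + cols[None, :]
    return labels

labels = grid_superpixels(8, 8, cell=4)
print(labels.max() + 1)  # 4 superpixels for an 8x8 image with 4x4 cells
```

A genuine implementation would instead use an iterative clustering such as `skimage.segmentation.slic`, which yields irregular regions following image edges.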
Step 302, the computer device obtains the superpixel features corresponding to the superpixels according to the sample image, and obtains the point cloud features corresponding to the superpixels according to the sample point cloud data.
First, a process of acquiring a superpixel feature corresponding to each superpixel from a sample image by a computer device will be described.
Illustratively, referring to fig. 5, the computer device may perform steps 501 and 502 shown in fig. 5 to implement a process of acquiring a superpixel feature corresponding to each superpixel according to the sample image:
step 501, inputting a sample image into a neural network model by computer equipment to obtain pixel characteristics corresponding to the sample image.
The neural network model may be, for example, a residual neural network such as ResNet-50. The computer device inputs the sample images I_1, ..., I_C into ResNet-50 to obtain the initial pixel features g(I_1), ..., g(I_C); the ResNet-50 uses weights pre-trained with the self-supervised model MoCov2 on the ImageNet dataset.
Since the sample image is planar two-dimensional data, the neural network model is a 2D Backbone (a 2D pre-trained Backbone). The computer device then up-samples the initial pixel features g(I_1), ..., g(I_C) through a 2D Head to a specific dimension, obtaining the pixel features g'(I_1), ..., g'(I_C) corresponding to the sample images.
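The up-sampling role of the 2D Head can be illustrated minimally. The patent does not specify the interpolation scheme; nearest-neighbour repetition below is an assumption chosen only for brevity.

```python
import numpy as np

def upsample_nn(feat, factor):
    """Nearest-neighbour up-sampling of an (h, w, c) feature map back
    toward image resolution, standing in for the 2D Head."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

feat = np.arange(4, dtype=float).reshape(2, 2, 1)  # coarse backbone output
up = upsample_nn(feat, 2)
print(up.shape)  # (4, 4, 1)
```

After this step every image pixel has an associated feature vector, which is what the per-superpixel pooling in step 502 operates on.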
Step 502, the computer device determines target pixel characteristics corresponding to each superpixel from the pixel characteristics, and performs pooling processing on each target pixel characteristic to obtain each superpixel characteristic.
The computer device pools the pixel features g'(I_1), ..., g'(I_C) obtained in step 501 according to the superpixels, obtaining the superpixel features g'_{S1}(I_1), ..., g'_{SQ}(I_1), ..., g'_{S1}(I_C), ..., g'_{SQ}(I_C); that is, each superpixel in each sample image gets a corresponding superpixel feature.
Since a superpixel comprises multiple adjacent pixels with similar texture, color, brightness and other features, in this embodiment of the application the computer device may determine, from the pixel features obtained in step 501, the target pixel features corresponding to each superpixel. For example, if superpixel 1 comprises 100 pixels, the computer device takes the features of those 100 pixels as the target pixel features and then pools them, for example by averaging, to obtain the superpixel feature of that superpixel.
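The per-superpixel average pooling just described can be sketched as follows; the shapes and the use of a plain mean are illustrative assumptions consistent with the text.

```python
import numpy as np

def superpixel_pool(pixel_feats, labels, n_sp):
    """Average-pool dense per-pixel features (H, W, C) into one feature
    vector per superpixel, using a superpixel label map (H, W)."""
    c = pixel_feats.shape[-1]
    flat_feats = pixel_feats.reshape(-1, c)
    flat_labels = labels.reshape(-1)
    out = np.zeros((n_sp, c))
    for s in range(n_sp):
        out[s] = flat_feats[flat_labels == s].mean(axis=0)
    return out

feats = np.zeros((2, 2, 3))
feats[0, 0] = [1, 1, 1]
feats[0, 1] = [3, 3, 3]
labels = np.array([[0, 0], [1, 1]])   # top row = superpixel 0, bottom = 1
sp_feats = superpixel_pool(feats, labels, 2)
print(sp_feats[0])  # mean of [1,1,1] and [3,3,3] -> [2. 2. 2.]
```

The same pooling operator is reused later on the point cloud side (steps 602 and 6022), which is what makes the two feature sets directly comparable per superpixel.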
The above process is a process in which the computer device obtains the superpixel features corresponding to the superpixels according to the sample image, and the process in which the computer device obtains the point cloud features corresponding to the superpixels according to the sample point cloud data is described. Illustratively, referring to fig. 6, a computer device may execute steps 601 and 602 shown in fig. 6 to implement a process of acquiring point cloud features corresponding to respective superpixels according to sample point cloud data:
step 601, inputting sample point cloud data into an initial feature extraction network by computer equipment to obtain initial voxel features corresponding to each initial voxel point cloud in the sample point cloud data.
The initial feature extraction network may be a parameter-initialized 3D Backbone network, for example a parameter-initialized pillar-based 3D network. The computer device inputs the sample point cloud data into the pillar-based 3D network, which performs voxel segmentation on the sample point cloud data to obtain each initial voxel point cloud and then performs feature extraction, obtaining the initial voxel feature corresponding to each initial voxel point cloud: the pillar feature f(P).
It should be noted that any pillar-based 3D Backbone can be used in the embodiments of the present application.
Unlike the point-based models common in the prior art, the 3D network used in the embodiment of the present application is pillar-based; a network with this structure requires less computation and consumes less time during training.
Step 602, the computer device obtains point cloud features corresponding to the superpixels according to the initial voxel features.
The computer device pools the initial voxel features obtained in step 601 according to the superpixels to obtain the point cloud feature corresponding to each superpixel.
In one possible implementation, referring to fig. 7, step 602 may include steps 6021 and 6022 shown in fig. 7:
step 6021, the computer device projects each initial voxel characteristic to the coordinate system of the pixel characteristic according to the coordinate mapping relation between the sample point cloud data and the sample image, and obtains the middle voxel characteristic.
The lidar and the camera are in different coordinate systems, so the data they acquire are also in different coordinate systems; the computer device can project points output by the lidar into the coordinate system of the image output by the camera according to the pose relation between the lidar and the camera. In the embodiment of the application, the computer device can establish the coordinate mapping relation between the sample point cloud data and the sample image according to this pose relation, so that each 3D point p_i in the sample point cloud data can be mapped, through the coordinate mapping relation, to its pixel coordinate (u, v) in the pixel coordinate system of the sample image; the pixel coordinate corresponding to a 3D point that is not visible to the camera is 0.
Exemplarily, referring to fig. 8, fig. 8 is a schematic diagram of the mapping between sample point cloud data and the corresponding sample image.
In this way, the computer device may project each initial voxel feature to the coordinate system where the pixel feature is located according to the coordinate mapping relationship, to obtain an intermediate voxel feature, which is in the same pixel coordinate system as the sample image and the pixel feature.
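A minimal sketch of such a coordinate mapping follows, using a standard pinhole camera model: a rigid lidar-to-camera transform followed by the camera intrinsics. The matrices `T` and `K` below are assumed illustrative values, not calibration data from the patent.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project 3-D lidar points into the camera's pixel coordinate
    system: lidar->camera rigid transform, then pinhole intrinsics.
    Returns the (u, v) coordinate of each point and its camera-frame depth."""
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])  # (n, 4) homogeneous
    cam = (T_cam_lidar @ homo.T).T[:, :3]              # camera frame
    uv = (K @ cam.T).T                                 # perspective projection
    uv = uv[:, :2] / uv[:, 2:3]                        # divide by depth
    return uv, cam[:, 2]

T = np.eye(4)                                          # identity extrinsics
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0,   0.0,  1.0]])                     # simple intrinsics
pts = np.array([[0.0, 0.0, 2.0]])                      # 2 m straight ahead
uv, depth = project_points(pts, T, K)
print(uv[0])  # lands at the principal point (50, 50)
```

In practice `T_cam_lidar` comes from the lidar-camera pose calibration the text refers to, and the projected (u, v) coordinates are what the superpixel label map is indexed with.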
Step 6022, the computer device determines the target voxel features corresponding to each superpixel from the intermediate voxel features, and performs pooling processing on each target voxel feature to obtain each point cloud feature.
The computer device pools the intermediate voxel features obtained in step 6021 according to the superpixels to obtain the point cloud feature corresponding to each superpixel.
Illustratively, before step 6022, the computer device may further perform down-sampling processing on the intermediate voxel features, such that the feature dimension of the down-sampled intermediate voxel features is the same as the feature dimension of the pixel features.
That is, the initial feature extraction network serves as the 3D Backbone to obtain the initial voxel feature corresponding to each initial voxel point cloud in the sample point cloud data, i.e. the pillar feature f(P); f(P) is then down-sampled by a 3D Head to the same dimension as the pixel features, resulting in f'(P).
Thus, determining the target voxel features respectively corresponding to the superpixels from the intermediate voxel features comprises: the computer device determines the target voxel features corresponding to the superpixels from the intermediate voxel features after the downsampling processing. That is, the computer device pools f'(P) according to the superpixels to obtain the point cloud feature corresponding to each superpixel.
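A minimal sketch of the dimension matching done by the 3D Head, here reduced to a single hypothetical linear projection from the pillar-feature dimension down to the pixel-feature dimension (all sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
f_P = rng.normal(size=(1000, 256))          # f(P): hypothetical 256-dim pillar features
W_head = rng.normal(size=(256, 64)) * 0.05  # stand-in for the learned 3D Head projection
f_prime = f_P @ W_head                      # f'(P): now matches a 64-dim pixel feature
```

In practice the 3D Head is a learned network layer rather than a fixed random matrix; the point of the sketch is only that f'(P) ends up with the same feature dimension as the pixel features so the two can be compared.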
Because the scanning range of the laser radar is usually larger than the coverage range of the camera, the area corresponding to the sample point cloud data output by the laser radar is larger than the area corresponding to the sample image output by the camera. In this regard, for example, before step 6022, the computer device may further remove invalid point features from the intermediate voxel features according to the coordinate mapping relation, to obtain the retained intermediate voxel features f_T'(P); an invalid point feature is a point feature corresponding to an area not covered by the camera, and f_T'(P) contains the valid point features. Thus, determining the target voxel features respectively corresponding to the superpixels from the intermediate voxel features comprises: the computer device determines the target voxel features corresponding to the superpixels from the intermediate voxel features after removal. Then, the computer device pools the target voxel features corresponding to the superpixels to obtain the point cloud feature corresponding to each superpixel.
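Removing the invalid point features then reduces to boolean masking with the visibility information produced by the coordinate mapping; in the sketch below the visibility mask is hand-written for illustration:

```python
import numpy as np

inter_feats = np.arange(10, dtype=float).reshape(5, 2)   # intermediate voxel features
visible = np.array([True, False, True, True, False])     # from the coordinate mapping
f_T = inter_feats[visible]   # retained features f_T'(P): camera-covered points only
```

Only the rows whose points fall inside the camera coverage survive, so downstream pooling never mixes in features from regions the image cannot supervise.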
Step 303, the computer device performs contrastive learning based on the superpixel features and the point cloud features to obtain the feature extraction network.
After the computer device obtains the superpixel features corresponding to the superpixels from the sample image and the point cloud features corresponding to the superpixels from the sample point cloud data, it performs contrastive learning based on the superpixel features and the point cloud features; that is, the initial feature extraction network is trained based on the superpixel features and the point cloud features to obtain the feature extraction network, so that the feature extraction network learns an effective 3D representation.
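The embodiment does not fix a particular contrastive objective. A common choice for such cross-modal training is an InfoNCE-style loss in which the point cloud feature and superpixel feature of the same superpixel form the positive pair and all other superpixels act as negatives; the sketch below assumes this formulation:

```python
import numpy as np

def contrastive_loss(point_feats, superpixel_feats, temperature=0.07):
    """InfoNCE-style loss: row i of both inputs describes the same superpixel."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    q = superpixel_feats / np.linalg.norm(superpixel_feats, axis=1, keepdims=True)
    logits = (p @ q.T) / temperature                 # (M, M) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positive pairs on the diagonal
```

Aligned feature pairs (matching rows) yield a near-zero loss, while mismatched pairings are penalized, which is exactly the pressure that pulls the 3D features toward the 2D features of the same superpixel.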
In the related art, 3D neural network techniques that operate directly on raw point clouds are mostly suited to indoor scenes with dense point clouds. Such approaches have the following disadvantages when processing sparse point clouds: point cloud pairs or image-pixel-to-point pairs must be constructed, and processing the resulting large number of pairwise relations consumes a great deal of computing space and time, so large-scale sparse point cloud data such as autonomous driving data cannot be handled; moreover, the applicable model is only a point-based 3D model, which has certain limitations.
In the embodiment of the present application, referring to fig. 9, fig. 9 is a schematic diagram of the training process of an exemplary feature extraction network. A self-supervised pre-training method is designed for the scene characteristics of multi-sensor (laser radar and camera) fusion in an autonomous driving scene: the laser radar and camera on a vehicle are used to distill a self-supervised pre-trained 2D Backbone into a 3D Backbone, obtaining the feature extraction network. In addition, in the training process of the feature extraction network, no point cloud or image annotation is needed; superpixels are used to segment visually similar regions so as to collect 3D point features and 2D pixel features, and the 3D Backbone is then trained on the self-supervised task of matching the point features and pixel features of each superpixel, where using per-superpixel point features and pixel features can greatly reduce the consumption of computing space and time. Furthermore, unlike the common point-based models in the prior art, the 3D network used in the embodiment of the present application is a pillar-based model; a network with this structure has a smaller computation load and consumes less time in training.
In another possible embodiment, the step in step 301 in which the computer device performs superpixel segmentation on the sample image to obtain each superpixel may be replaced by the following step B:
and step B, carrying out panoramic segmentation on the sample image by the computer equipment to obtain each super pixel.
Panoptic segmentation combines instance segmentation and semantic segmentation: not only are all objects in the sample image segmented, but different objects of the same class are also distinguished individually. This way of generating superpixels is closer to the superpixel assumption made in pre-training: each superpixel is a single object in the sample image.
In this embodiment of the present application, the panoptic segmentation model may be PanopticFPN, which extracts image features at different levels using ResNet101 and a Feature Pyramid Network (FPN), outputs instance segmentation and semantic segmentation results from these image features through two different branches, and finally merges the results of the two branches to output a panoptic segmentation result.
Illustratively, referring to fig. 10, fig. 10 is a schematic diagram of an exemplary panoptic segmentation result on a sample image.
Compared with superpixel segmentation, which divides an image simply based on its colors and edges, the PanopticFPN panoptic segmentation model is trained on the COCO-Panoptic dataset in a fully supervised manner and has learned the characteristics of a large number of everyday objects in images. With such a learned model, more reasonable superpixel segmentations can be generated accurately for images in autonomous driving scenes, which facilitates subsequent accurate pre-training.
In one embodiment, based on the embodiment shown in fig. 2, the data processing method further includes step A:
Step A, the computer device inputs each voxel feature into a prediction network to obtain the prediction category corresponding to each voxel point cloud.
The prediction network is obtained by fine-tuning the initial prediction network using the feature extraction network and point cloud training samples, where the point cloud training samples may be a small amount of labelled point cloud data. Fine-tuning means performing additional training on a dataset highly similar to the prediction network's usage scenario, so that the parameters of the prediction network adapt better to the required task and achieve a better effect.
Therefore, since the prediction network is obtained by self-supervised pre-training of the feature extraction network followed by fine-tuning, the problem of scarce labelled samples in point cloud segmentation (or point cloud semantic segmentation) and target detection tasks can be alleviated, and a large amount of unlabelled data can be used effectively to improve task accuracy. In addition, self-supervised pre-training of the feature extraction network on a large amount of unlabelled data, followed by fine-tuning on a small labelled dataset, can achieve effects similar to or even better than supervised training; this reduces the burden of annotating large datasets, accelerates model iteration, promotes the rapid evolution of autonomous driving, and transfers well to various complex downstream tasks.
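A toy sketch of the fine-tuning idea (all data, dimensions, and the choice of a linear head are hypothetical): the pre-trained feature extraction network is treated as frozen, and only a small prediction head is trained on a handful of labelled samples by gradient descent on a cross-entropy loss:

```python
import numpy as np

rng = np.random.default_rng(1)
voxel_feats = rng.normal(size=(20, 8))        # frozen backbone output (hypothetical)
labels = (voxel_feats[:, 0] > 0).astype(int)  # small labelled point cloud sample

W = np.zeros((8, 2))                          # the prediction head being fine-tuned
for _ in range(500):
    logits = voxel_feats @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)                 # softmax
    grad = voxel_feats.T @ (probs - np.eye(2)[labels]) / len(labels)
    W -= 0.5 * grad                           # plain gradient descent step

pred = (voxel_feats @ W).argmax(axis=1)       # prediction category per voxel
```

Because the backbone is frozen, only the small head's parameters are updated, which is why a small labelled dataset suffices after large-scale unlabelled pre-training.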
It should be understood that, although the steps in the flowcharts involved in the embodiments described above are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least a part of the steps in the flowcharts involved in the embodiments described above may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a data processing apparatus for implementing the above-mentioned data processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so the specific limitations in one or more embodiments of the data processing device provided below may refer to the limitations on the data processing method in the above description, and are not described herein again.
In one embodiment, as shown in fig. 11, there is provided a data processing apparatus including:
a first obtaining module 1101, configured to obtain point cloud data of a target area;
a processing module 1102, configured to input the point cloud data into a feature extraction network, to obtain voxel features corresponding to each voxel point cloud in the point cloud data, where each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used to perform point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
In one embodiment, the apparatus further comprises:
the second acquisition module is used for acquiring the sample image and the sample point cloud data aiming at the sample region and performing superpixel segmentation on the sample image to obtain each superpixel;
the third acquisition module is used for acquiring the superpixel characteristics corresponding to the superpixels according to the sample image and acquiring the point cloud characteristics corresponding to the superpixels according to the sample point cloud data;
and the training module is used for performing contrastive learning based on each super-pixel feature and each point cloud feature to obtain the feature extraction network.
In one embodiment, the third obtaining module includes:
the first acquisition unit is used for inputting the sample image into a neural network model to obtain pixel characteristics corresponding to the sample image; and determining target pixel characteristics corresponding to the super pixels from the pixel characteristics, and performing pooling processing on the target pixel characteristics to obtain the super pixel characteristics.
In one embodiment, the third obtaining module further includes:
the second acquisition unit is used for inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to each initial voxel point cloud in the sample point cloud data;
and the third acquisition unit is used for acquiring the point cloud characteristics corresponding to the super pixels according to the initial voxel characteristics.
In one embodiment, the third obtaining unit is specifically configured to project each of the initial voxel features to a coordinate system in which the pixel feature is located according to a coordinate mapping relationship between the sample point cloud data and the sample image, so as to obtain an intermediate voxel feature; and determining target voxel characteristics corresponding to the superpixels from the intermediate voxel characteristics, and performing pooling processing on the target voxel characteristics to obtain the point cloud characteristics.
In one embodiment, the apparatus further comprises:
a down-sampling module, configured to perform down-sampling processing on the intermediate voxel characteristic, so that a feature dimension of the intermediate voxel characteristic after the down-sampling processing is the same as a feature dimension of the pixel characteristic;
the third obtaining unit is specifically configured to determine the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the down-sampling processing.
In one embodiment, the apparatus further comprises:
the eliminating module is used for eliminating invalid point characteristics from the intermediate voxel characteristics according to the coordinate mapping relation to obtain the eliminated intermediate voxel characteristics;
the third obtaining unit is specifically configured to determine the target voxel features corresponding to the superpixels respectively from the intermediate voxel features after the elimination.
In one embodiment, the apparatus further comprises:
and the prediction module is used for inputting the voxel characteristics into a prediction network to obtain the prediction categories corresponding to the voxel point clouds, and the prediction network is obtained by utilizing the characteristic extraction network and the point cloud training samples to perform fine tuning pre-training on the initial prediction network.
The various modules in the data processing apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring point cloud data of a target area;
inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring the sample image and sample point cloud data aiming at a sample region, and performing superpixel segmentation on the sample image to obtain each superpixel;
acquiring a super-pixel characteristic corresponding to each super-pixel according to the sample image, and acquiring a point cloud characteristic corresponding to each super-pixel according to the sample point cloud data;
and performing contrastive learning based on each super-pixel feature and each point cloud feature to obtain the feature extraction network.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the sample image into a neural network model to obtain pixel characteristics corresponding to the sample image;
and determining target pixel characteristics corresponding to the super pixels from the pixel characteristics, and performing pooling processing on the target pixel characteristics to obtain the super pixel characteristics.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to initial voxel point clouds in the sample point cloud data;
and acquiring the point cloud characteristics corresponding to the super pixels according to the initial voxel characteristics.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
according to the coordinate mapping relation between the sample point cloud data and the sample image, projecting each initial voxel characteristic to a coordinate system where the pixel characteristic is located to obtain an intermediate voxel characteristic;
and determining target voxel characteristics corresponding to the superpixels from the intermediate voxel characteristics, and performing pooling processing on the target voxel characteristics to obtain the point cloud characteristics.
In one embodiment, the processor when executing the computer program further performs the steps of:
performing downsampling processing on the intermediate voxel characteristic so that the characteristic dimension of the intermediate voxel characteristic after the downsampling processing is the same as the characteristic dimension of the pixel characteristic;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the down-sampling processing.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
according to the coordinate mapping relation, eliminating invalid point features from the intermediate voxel features to obtain the eliminated intermediate voxel features;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the removed intermediate voxel characteristics.
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting each voxel characteristic into a prediction network to obtain a prediction category corresponding to each voxel point cloud, wherein the prediction network is obtained by utilizing the characteristic extraction network and a point cloud training sample to perform fine tuning pre-training on an initial prediction network.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring point cloud data of a target area;
inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the sample image and sample point cloud data aiming at a sample area, and performing superpixel segmentation on the sample image to obtain each superpixel;
acquiring a super-pixel characteristic corresponding to each super-pixel according to the sample image, and acquiring a point cloud characteristic corresponding to each super-pixel according to the sample point cloud data;
and performing contrastive learning based on each super-pixel feature and each point cloud feature to obtain the feature extraction network.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the sample image into a neural network model to obtain pixel characteristics corresponding to the sample image;
and determining target pixel characteristics corresponding to the super pixels from the pixel characteristics, and performing pooling processing on the target pixel characteristics to obtain the super pixel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to each initial voxel point cloud in the sample point cloud data;
and acquiring the point cloud characteristics corresponding to the super pixels according to the initial voxel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
according to a coordinate mapping relation between the sample point cloud data and the sample image, projecting each initial voxel characteristic to a coordinate system where the pixel characteristic is located to obtain an intermediate voxel characteristic;
and determining target voxel characteristics corresponding to the superpixels from the intermediate voxel characteristics, and performing pooling processing on the target voxel characteristics to obtain the point cloud characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing downsampling processing on the intermediate voxel characteristic so that the characteristic dimension of the intermediate voxel characteristic after the downsampling processing is the same as the characteristic dimension of the pixel characteristic;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the down-sampling processing.
In one embodiment, the computer program when executed by the processor further performs the steps of:
according to the coordinate mapping relation, eliminating invalid point features from the intermediate voxel features to obtain the eliminated intermediate voxel features;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the removed intermediate voxel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting each voxel characteristic into a prediction network to obtain a prediction category corresponding to each voxel point cloud, wherein the prediction network is obtained by utilizing the characteristic extraction network and a point cloud training sample to perform fine tuning pre-training on an initial prediction network.
In one embodiment, a computer program product is provided, comprising a computer program which when executed by a processor performs the steps of:
acquiring point cloud data of a target area;
inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the sample image and sample point cloud data aiming at a sample region, and performing superpixel segmentation on the sample image to obtain each superpixel;
acquiring a super-pixel characteristic corresponding to each super-pixel according to the sample image, and acquiring a point cloud characteristic corresponding to each super-pixel according to the sample point cloud data;
and performing contrastive learning based on each super-pixel feature and each point cloud feature to obtain the feature extraction network.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the sample image into a neural network model to obtain pixel characteristics corresponding to the sample image;
and determining target pixel characteristics corresponding to the super pixels from the pixel characteristics, and performing pooling processing on the target pixel characteristics to obtain the super pixel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to initial voxel point clouds in the sample point cloud data;
and acquiring the point cloud characteristics corresponding to the super pixels according to the initial voxel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
according to the coordinate mapping relation between the sample point cloud data and the sample image, projecting each initial voxel characteristic to a coordinate system where the pixel characteristic is located to obtain an intermediate voxel characteristic;
and determining target voxel characteristics corresponding to the superpixels from the intermediate voxel characteristics, and performing pooling processing on the target voxel characteristics to obtain the point cloud characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing downsampling processing on the intermediate voxel characteristic so that the characteristic dimension of the intermediate voxel characteristic after the downsampling processing is the same as the characteristic dimension of the pixel characteristic;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the intermediate voxel characteristics after the down-sampling processing.
In one embodiment, the computer program when executed by the processor further performs the steps of:
according to the coordinate mapping relation, eliminating invalid point features from the intermediate voxel features to obtain the eliminated intermediate voxel features;
the determining the target voxel characteristics respectively corresponding to the superpixels from the intermediate voxel characteristics comprises:
and determining the target voxel characteristics corresponding to the super pixels from the removed intermediate voxel characteristics.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting each voxel characteristic into a prediction network to obtain a prediction category corresponding to each voxel point cloud, wherein the prediction network is obtained by utilizing the characteristic extraction network and a point cloud training sample to perform fine tuning pre-training on an initial prediction network.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) referred to in the present application are processed only on a legitimate basis (for example, with the consent of the personal information subject, or where necessary for the performance of a contract) and only within the prescribed or agreed scope. If a user refuses the processing of personal information other than what is necessary for the basic functions, the user's use of those basic functions is not affected.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include a Read-Only Memory (ROM), a magnetic tape, a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile memory, a Resistive Random Access Memory (ReRAM), a Magnetic Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), a Phase Change Memory (PCM), a graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, etc., without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (12)

1. A method of data processing, the method comprising:
acquiring point cloud data of a target area;
inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, wherein each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
2. The method of claim 1, further comprising:
acquiring the sample image and sample point cloud data for a sample region, and performing super-pixel segmentation on the sample image to obtain each super-pixel;
acquiring a super-pixel feature corresponding to each super-pixel according to the sample image, and acquiring a point cloud feature corresponding to each super-pixel according to the sample point cloud data;
and performing contrastive learning based on each super-pixel feature and each point cloud feature to obtain the feature extraction network.
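The cross-modal training objective described in claim 2 is commonly realized with an InfoNCE-style contrastive loss, where the i-th super-pixel feature and the i-th pooled point cloud feature form a positive pair and all other pairings act as negatives. The following numpy sketch is illustrative only (function names and the temperature value are assumptions, not the patented implementation):

```python
import numpy as np

def info_nce_loss(superpixel_feats, pointcloud_feats, temperature=0.1):
    """InfoNCE-style contrastive loss: the i-th superpixel feature is paired
    with the i-th pooled point cloud feature; other pairs are negatives."""
    # L2-normalize both feature sets so the dot product is cosine similarity
    sp = superpixel_feats / np.linalg.norm(superpixel_feats, axis=1, keepdims=True)
    pc = pointcloud_feats / np.linalg.norm(pointcloud_feats, axis=1, keepdims=True)
    logits = sp @ pc.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs lie on the diagonal
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss pulls each super-pixel feature toward the point cloud feature of the same region while pushing it away from the features of other regions.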
3. The method of claim 2, wherein said acquiring a super-pixel feature corresponding to each of said super-pixels according to the sample image comprises:
inputting the sample image into a neural network model to obtain pixel features corresponding to the sample image;
and determining, from the pixel features, target pixel features corresponding to each super-pixel, and pooling the target pixel features to obtain the super-pixel features.
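The pooling step of claim 3 reduces the per-pixel features inside each super-pixel to a single feature vector. A minimal numpy sketch using average pooling over a super-pixel label map (average pooling is an assumption; the claim does not fix the pooling operator):

```python
import numpy as np

def pool_superpixel_features(pixel_feats, superpixel_labels):
    """Average-pool per-pixel features into one feature per superpixel.
    pixel_feats: (H, W, C) feature map; superpixel_labels: (H, W) int label map."""
    labels = superpixel_labels.ravel()
    feats = pixel_feats.reshape(-1, pixel_feats.shape[-1])
    n_sp = int(labels.max()) + 1
    pooled = np.zeros((n_sp, feats.shape[1]))
    for s in range(n_sp):
        # mean over all pixels belonging to superpixel s
        pooled[s] = feats[labels == s].mean(axis=0)
    return pooled
```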
4. The method of claim 3, wherein said acquiring a point cloud feature corresponding to each of said super-pixels according to the sample point cloud data comprises:
inputting the sample point cloud data into an initial feature extraction network to obtain initial voxel features corresponding to initial voxel point clouds in the sample point cloud data;
and acquiring the point cloud feature corresponding to each super-pixel according to the initial voxel features.
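The "initial voxel point clouds" of claim 4 come from grouping raw points into a regular voxel grid before feature extraction. A minimal voxelization sketch (the grid origin and voxel size are illustrative assumptions):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Group raw points (N, 3) into voxels of edge length voxel_size.
    Returns the occupied voxel coordinates and, for each point, the index
    of the voxel it falls into."""
    voxel_coords = np.floor(points / voxel_size).astype(np.int64)
    # unique occupied voxels; `inverse` maps each point to its voxel index
    occupied, inverse = np.unique(voxel_coords, axis=0, return_inverse=True)
    return occupied, inverse
```

Each resulting group of points (one per occupied voxel) is what the feature extraction network encodes into one initial voxel feature.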
5. The method of claim 4, wherein said acquiring the point cloud feature corresponding to each super-pixel according to the initial voxel features comprises:
projecting each initial voxel feature into the coordinate system of the pixel features according to a coordinate mapping relationship between the sample point cloud data and the sample image, to obtain intermediate voxel features;
and determining, from the intermediate voxel features, target voxel features corresponding to each super-pixel, and pooling the target voxel features respectively to obtain the point cloud features.
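The "coordinate mapping relationship" of claim 5 is typically a lidar-to-camera extrinsic matrix combined with a pinhole camera intrinsic matrix. A sketch of the standard projection (the matrix conventions are an assumption; the patent does not specify the calibration format):

```python
import numpy as np

def project_to_image(voxel_centers, extrinsic, intrinsic):
    """Project 3D voxel centers (N, 3) into pixel coordinates using a 4x4
    lidar-to-camera extrinsic and a 3x3 camera intrinsic (pinhole model)."""
    homog = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    cam = (extrinsic @ homog.T)[:3]   # points expressed in the camera frame
    uv = intrinsic @ cam
    uv = uv[:2] / uv[2]               # perspective divide
    valid = cam[2] > 0                # keep only points in front of the camera
    return uv.T, valid
```

Each voxel feature then inherits the pixel coordinate of its projected center, which places it in the same coordinate system as the pixel features so that per-super-pixel pooling can be applied.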
6. The method of claim 5, further comprising:
downsampling the intermediate voxel features so that the feature dimension of the downsampled intermediate voxel features is the same as the feature dimension of the pixel features;
wherein said determining, from the intermediate voxel features, the target voxel features corresponding to each super-pixel comprises:
determining the target voxel features corresponding to each super-pixel from the downsampled intermediate voxel features.
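Claim 6's dimension matching can be done in several ways (a learned linear projection being the most common); the sketch below uses simple channel-group average pooling purely for illustration, assuming the voxel feature dimension is a multiple of the pixel feature dimension:

```python
import numpy as np

def downsample_channels(feats, target_dim):
    """Reduce the feature dimension of (N, C) voxel features to target_dim by
    average-pooling contiguous channel groups. Assumes C % target_dim == 0."""
    n, c = feats.shape
    assert c % target_dim == 0, "channel count must be divisible by target_dim"
    return feats.reshape(n, target_dim, c // target_dim).mean(axis=2)
```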
7. The method of claim 5, further comprising:
removing invalid point features from the intermediate voxel features according to the coordinate mapping relationship, to obtain culled intermediate voxel features;
wherein said determining, from the intermediate voxel features, the target voxel features corresponding to each super-pixel comprises:
determining the target voxel features corresponding to each super-pixel from the culled intermediate voxel features.
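One natural reading of claim 7's "invalid point features" is voxel features whose projected pixel coordinates fall outside the image bounds (e.g., points behind the camera or outside the field of view). A hedged sketch of that filtering, with illustrative names:

```python
import numpy as np

def filter_valid_features(inter_feats, uv, img_h, img_w):
    """Drop voxel features whose projected pixel coordinates (u, v) fall
    outside the image bounds; returns the kept features and the boolean mask."""
    valid = (
        (uv[:, 0] >= 0) & (uv[:, 0] < img_w)
        & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    )
    return inter_feats[valid], valid
```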
8. The method of claim 1, further comprising:
inputting each voxel feature into a prediction network to obtain a prediction category corresponding to each voxel point cloud, wherein the prediction network is obtained by using the feature extraction network and point cloud training samples to fine-tune a pre-trained initial prediction network.
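The prediction step of claim 8 maps each voxel feature to a class label; in the simplest form the fine-tuned prediction network ends in a linear classification head whose argmax gives the predicted category. A minimal sketch (the head shape and linearity are assumptions):

```python
import numpy as np

def predict_categories(voxel_feats, head_weights, head_bias):
    """Linear prediction head: map (N, C) voxel features to per-voxel class
    scores and return the argmax category for each voxel."""
    logits = voxel_feats @ head_weights + head_bias
    return logits.argmax(axis=1)
```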
9. A data processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring point cloud data of a target area;
the processing module is used for inputting the point cloud data into a feature extraction network to obtain voxel features corresponding to each voxel point cloud in the point cloud data, each voxel point cloud is obtained by performing voxel segmentation on the point cloud data, and each voxel feature is used for performing point cloud segmentation processing or target detection processing on the point cloud data;
the feature extraction network is obtained by performing contrastive learning based on the super-pixel features and the point cloud features corresponding to the super-pixels in the sample image.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, realizes the steps of the method of any one of claims 1 to 8.
CN202211722555.8A 2022-12-30 2022-12-30 Data processing method, apparatus, computer device, storage medium, and program product Active CN115984583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211722555.8A CN115984583B (en) 2022-12-30 2022-12-30 Data processing method, apparatus, computer device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN115984583A true CN115984583A (en) 2023-04-18
CN115984583B CN115984583B (en) 2024-02-02

Family

ID=85969790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211722555.8A Active CN115984583B (en) 2022-12-30 2022-12-30 Data processing method, apparatus, computer device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115984583B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107167811A (en) * 2017-04-26 2017-09-15 西安交通大学 The road drivable region detection method merged based on monocular vision with laser radar
CN110222626A (en) * 2019-06-03 2019-09-10 宁波智能装备研究院有限公司 A kind of unmanned scene point cloud target mask method based on deep learning algorithm
CN111815776A (en) * 2020-02-04 2020-10-23 山东水利技师学院 Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images
CN114092780A (en) * 2021-11-12 2022-02-25 天津大学 Three-dimensional target detection method based on point cloud and image data fusion
CN115439637A (en) * 2022-08-12 2022-12-06 北京宾理信息科技有限公司 Vehicle-mounted augmented reality rendering method and system, vehicle and storage medium
CN115457492A (en) * 2022-09-30 2022-12-09 苏州万集车联网技术有限公司 Target detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115984583B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
Ma et al. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera
O’Mahony et al. Deep learning vs. traditional computer vision
CN109859190B (en) Target area detection method based on deep learning
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
Fu et al. Using convolutional neural network to identify irregular segmentation objects from very high-resolution remote sensing imagery
CN112991413A (en) Self-supervision depth estimation method and system
Yang et al. A multi-task Faster R-CNN method for 3D vehicle detection based on a single image
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
Xiao et al. Single image dehazing based on learning of haze layers
US20230326173A1 (en) Image processing method and apparatus, and computer-readable storage medium
JP2023533907A (en) Image processing using self-attention-based neural networks
CN111126385A (en) Deep learning intelligent identification method for deformable living body small target
US20220277581A1 (en) Hand pose estimation method, device and storage medium
Yan et al. Monocular depth estimation with guidance of surface normal map
Grigorev et al. Depth estimation from single monocular images using deep hybrid network
CN112686952A (en) Image optical flow computing system, method and application
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN113421217A (en) Method and device for detecting travelable area
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
Xu et al. Extended non-local feature for visual saliency detection in low contrast images
Wang et al. Salient object detection using biogeography-based optimization to combine features
Jiang et al. Semantic segmentation network combined with edge detection for building extraction in remote sensing images
CN114170231A (en) Image semantic segmentation method and device based on convolutional neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant