CN113902043A - Target identification method, device and equipment
- Publication number
- CN113902043A (application number CN202111445551.5A)
- Authority
- CN
- China
- Prior art keywords
- point
- point cloud data
- target
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The application discloses a target identification method and device, wherein the method comprises the following steps: acquiring current frame point cloud data and multi-frame historical point cloud data in a target environment; obtaining point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and obtaining multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data; obtaining point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics; and identifying the target in the target environment according to the point dimension description characteristics in the target environment. With this processing, multi-view point dimension description characteristics are obtained from the multi-frame point cloud, so the perception performance of the laser point cloud is ensured while both high target identification accuracy and fast identification speed are effectively achieved.
Description
Technical Field
The application relates to the technical field of automatic driving, in particular to a target identification method, device and equipment.
Background
In fields such as unmanned driving and robotics, machine perception is an important component, and perception sensors include lidar, cameras, ultrasonic sensors, millimeter-wave radar, and the like. Compared with sensors such as cameras, ultrasonic sensors, and millimeter-wave radar, the laser point cloud signal of a multi-line lidar contains accurate target position information and geometric shape information of the target, so it plays an important role in perception for unmanned driving and robots.
At present, commonly used methods for target identification from laser point clouds include traditional segmentation-and-detection methods, deep learning methods based on laser point cloud projection, voxelization-based 3D laser point cloud detection methods, point cloud 3D detection methods based on point dimension characteristics, and the like. However, in the process of implementing the present invention, the inventors found that none of these methods can achieve both high target identification accuracy and fast target identification speed at the same time; they therefore struggle to meet the real-time and accuracy requirements of an autonomous driving vehicle system for target detection and cannot ensure the safety of automatic driving under complex road conditions.
Disclosure of Invention
The application provides a target identification method, which aims to solve the problem that the prior art cannot achieve both high target identification accuracy and fast target identification speed. The application additionally provides a target recognition apparatus, an electronic device, and a vehicle.
The application provides a target identification method, which comprises the following steps:
acquiring current frame point cloud data and multi-frame historical point cloud data in a target environment;
obtaining point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and obtaining multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data;
obtaining point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics;
and identifying the target in the target environment according to the point dimension description characteristics in the target environment.
Optionally, the obtaining of the multi-view feature according to the current frame point cloud data and the multi-frame historical point cloud data includes:
aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning;
extracting the characteristics of a top view from each frame of point cloud data in the multi-frame historical point cloud data;
and projecting the current frame point cloud data into the visual angle of a front view, and extracting the characteristics of the front visual angle.
Optionally, the extracting features of the top view from each frame of point cloud data in the multiple frames of historical point cloud data includes:
voxelizing each frame of point cloud data in the multi-frame historical point cloud data;
extracting features in non-empty voxels in each frame of point cloud data;
and splicing the features in all the non-empty voxels to obtain the features of the multi-frame accumulated top view.
Optionally, the method further includes:
aiming at the same laser point in the historical point cloud data and the current frame point cloud data, acquiring the characteristics of a top view and the characteristics of a front view corresponding to the same laser point;
and splicing the characteristics of the top view and the characteristics of the front view corresponding to the same laser point to obtain the multi-view characteristics corresponding to the same laser point.
Optionally, the obtaining, according to the point feature of each point and the multi-view feature, a point dimension description feature in the target environment includes:
and splicing the point characteristics of the same laser point and the multi-view characteristics corresponding to the same laser point to obtain the point dimension description characteristics corresponding to the same laser point.
Optionally, the identifying the target in the target environment according to the point dimension description feature in the target environment includes:
and performing multi-task learning on the point dimension description characteristics of each laser point, wherein the multi-task learning comprises but is not limited to central point, size and direction supervision, so as to realize a target identification task in the target environment.
Optionally, the multitask learning further includes point cloud segmentation, and then the method further includes:
and performing point cloud segmentation on the point dimension description characteristics of each laser point to realize a point cloud segmentation task in the target environment.
Optionally, the identifying the target in the target environment according to the point dimension description feature in the target environment includes:
according to the point dimension description characteristics under the target environment, target frame prediction is carried out on the target environment;
the method further comprises the following steps:
shifting the foreground points according to the predicted offset values of the foreground points to obtain foreground offset points;
selecting a plurality of target key points from the foreground offset points;
acquiring a foreground offset point set corresponding to each target key point;
determining a target prediction frame according to the foreground offset point set;
and screening the target prediction frames to eliminate redundant target prediction frames.
The present application further provides a target recognition apparatus, including:
the multi-frame point cloud obtaining unit is used for obtaining current frame point cloud data and multi-frame historical point cloud data in a target environment;
the point feature extraction unit is used for obtaining the point features of each point in the current frame point cloud data and the multi-frame historical point cloud data;
the multi-view characteristic extraction unit is used for obtaining multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data;
the feature fusion unit is used for obtaining point dimension description features under the target environment according to the point features of each point and the multi-view features;
and the target identification unit is used for identifying the target in the target environment according to the point dimension description characteristics in the target environment.
The present application further provides an electronic device, comprising:
a processor; and
a memory for storing a program implementing the above target identification method; after the device is powered on, the program for the method is run by the processor.
The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the method has the following advantages:
According to the target identification method provided by the embodiment of the application, current frame point cloud data and multi-frame historical point cloud data in a target environment are obtained; point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data are obtained, and multi-view characteristics are obtained according to the current frame point cloud data and the multi-frame historical point cloud data; point dimension description characteristics under the target environment are obtained according to the point characteristics of each point and the multi-view characteristics; and the target in the target environment is identified according to the point dimension description characteristics in the target environment. With this processing, multi-view point dimension description characteristics are obtained from the multi-frame point cloud, so the perception performance of the laser point cloud is ensured while both high target identification accuracy and fast identification speed are effectively achieved.
Drawings
FIG. 1 is a flow chart of an embodiment of a target identification method provided herein;
FIG. 2 is a schematic network diagram of an embodiment of a target identification method provided in the present application;
fig. 3 is a schematic diagram of target frame prediction in an embodiment of the target identification method provided in the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The application provides a target identification method, a target identification apparatus, a device, and a vehicle. Each of the schemes is described in detail in the following embodiments.
First embodiment
Please refer to FIG. 1, which is a flowchart of an embodiment of a target identification method according to the present application. The method is carried out by a subject including but not limited to an unmanned vehicle, such as a smart logistics vehicle, and the identifiable objects include pedestrians, vehicles, buildings, trees, curbs, traffic lights, zebra crossings, and the like. The target identification method provided by the application comprises the following steps:
step S101: acquiring current frame point cloud data and multi-frame historical point cloud data in a target environment.
According to the method provided by the embodiment of the application, during driving, the spatial coordinates of each sampling point on the surfaces of objects in the environment along the vehicle's driving road can be obtained by a three-dimensional space scanning device mounted on the vehicle, yielding a point set; this mass of point data is called road environment point cloud (Point Cloud) data. The road environment point cloud data records the scanned object surfaces in the form of points, each of which contains three-dimensional coordinates and may also contain color information (RGB) or reflection intensity information (Intensity). With the point cloud data, the target space can be expressed under the same spatial reference system.
The three-dimensional space scanning device may be a laser radar (lidar), which performs laser detection and ranging by laser scanning to obtain information about obstacles in the surrounding environment, such as buildings, trees, people, and vehicles; the measured data are represented as the discrete points of a digital surface model (DSM). In a specific implementation, a multi-line lidar with 16, 32, or 64 beams, for example, may be used. The frame rate of the collected point cloud data differs with the number of laser beams; for example, 16-line and 32-line lidars generally collect 10 frames of point cloud data per second. The three-dimensional space scanning device may also be a three-dimensional laser scanner, a photographic scanner, or the like.
According to the method provided by the embodiment of the application, after point cloud data (current frame point cloud data) is collected at a certain moment, target identification can be carried out on the current road environment according to the current frame point cloud data and point cloud data (multi-frame historical point cloud data) of a plurality of historical frames before the current frame. The plurality of history frames may be a plurality of history frames adjacent to the current frame, or one or more history frames not adjacent to the current frame.
Step S103: and acquiring point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and acquiring multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data.
The current frame point cloud data and the multi-frame historical point cloud data are combined to obtain richer characteristic information of each target under the target environment. In this embodiment, two types of features are available: the current frame point cloud data and the point characteristics of each point in the multi-frame historical point cloud data, and the multi-view characteristics obtained based on the current frame point cloud data and the multi-frame historical point cloud data.
The current frame point cloud data and the multi-frame historical point cloud data, collected at different positions, may contain point cloud data for the same point and for different points on the surface of the same target, so together they can represent the target space more accurately. Correspondingly, richer point features of the same target, namely the point features of each point in the current frame point cloud data and the multi-frame historical point cloud data, can be extracted from these point cloud data.
In one example, the point features of each point in the current frame point cloud data and the multiple frames of historical point cloud data may be obtained by a multi-tier perceptron. In specific implementation, the current frame point cloud data and the multiple frames of historical point cloud data can be used as input data of a multilayer perceptron, and output data of the multilayer perceptron is point characteristics of each point in the current frame point cloud data and the multiple frames of historical point cloud data. The multilayer perceptron can adopt a mature perceptron in the prior art, such as a perceptron based on a neural network.
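As an illustration of this step, the following is a minimal sketch of a shared multilayer perceptron applied independently to each laser point; the PyTorch framework, the layer sizes, and the four raw input attributes (x, y, z, intensity) are assumptions chosen for the example, not values specified by this embodiment.

```python
import torch
import torch.nn as nn

# Shared MLP applied point-wise: every laser point is mapped from its raw
# attributes to a higher-dimensional point feature, independently of other points.
point_mlp = nn.Sequential(
    nn.Linear(4, 32),   # 4 raw attributes per point: x, y, z, intensity (assumed)
    nn.ReLU(),
    nn.Linear(32, 64),  # 64-dimensional point feature (illustrative size)
    nn.ReLU(),
)

points = torch.randn(10000, 4)      # stand-in for current-frame + historical points
point_features = point_mlp(points)  # shape: (10000, 64)
```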
The multi-view feature may include features of the target environment observed from different viewing angles, such as a front view feature, a top view feature, a back view feature, a bottom view feature, a left view feature, a right view feature, and may also be a designated viewing angle, such as a 45 degree elevation view. The multi-view feature can be obtained according to the current frame point cloud data and the multi-frame historical point cloud data.
In one example, the multi-view feature may be obtained by:
step S201: and aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning.
The multi-frame historical point cloud data and the current frame point cloud data are point cloud data of a target space collected at different positions. The point cloud data of different historical frames can comprise point cloud data of the same target, the point cloud data of different frames of the same target are located in different coordinate systems, and the multi-frame historical point cloud data are converted to the coordinate system of the current frame point cloud data, so that the point cloud data of different frames of the same target can correspond to each other. In specific implementation, the position information corresponding to each historical frame can be obtained, and the multiple frames of historical point cloud data are aligned to the coordinate system of the current frame of point cloud data according to the position information. And aligning the multi-frame historical point cloud data to the coordinate system of the current frame point cloud data through positioning, which belongs to the field of the prior art and is not repeated here.
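A minimal sketch of this alignment is given below, assuming the localization module provides a 4x4 ego-to-world pose matrix for every frame; the pose source and matrix convention are assumptions of the example.

```python
import numpy as np

def align_to_current(points_hist, pose_hist, pose_curr):
    """Transform points of one historical frame into the current frame's coordinate system.

    points_hist: (N, 3) points in the historical frame's ego coordinates.
    pose_hist, pose_curr: 4x4 ego-to-world poses reported by localization.
    """
    homo = np.hstack([points_hist, np.ones((points_hist.shape[0], 1))])
    world = homo @ pose_hist.T                    # historical ego frame -> world frame
    current = world @ np.linalg.inv(pose_curr).T  # world frame -> current ego frame
    return current[:, :3]
```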
Step S203: and extracting the characteristics of the top view from each frame of point cloud data in the multi-frame historical point cloud data.
After the multi-frame historical point cloud data are aligned to the coordinate system of the current frame point cloud data, rich top view features can be extracted from the aligned multi-frame point cloud data.
In this embodiment, the features of the top view may be extracted for each frame of point cloud data in the multiple frames of historical point cloud data after the coordinate system conversion. In a specific implementation, step S203 may include the following sub-steps: 1) voxelizing each frame of point cloud data in the multiple frames of historical point clouds; 2) extracting features in non-empty voxels in each frame of point cloud data; 3) splicing the features in all the non-empty voxels to obtain the features of the multi-frame accumulated top view. With this processing, the point cloud is voxelized, the features in each non-empty voxel are extracted, and the features are then spliced together to form the multi-frame accumulated top view features.
In another example, step S203 may include the following sub-steps: 1) overlapping current frame point cloud data and aligned multi-frame historical point cloud data; 2) voxelizing the superposed multi-frame point cloud data; 3) projecting the point cloud data of the non-empty voxels to a top view plane; 4) and determining the characteristics of the top view according to the projected data.
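The following sketch illustrates the voxelization idea shared by both variants above: points are binned into a top-view grid and only non-empty voxels keep a feature (here simply the mean of the points in the voxel). The grid resolution, range, and mean-pooling feature are illustrative assumptions rather than values from this embodiment.

```python
import numpy as np

def voxelize_top_view(points, voxel_size=0.2, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0)):
    """Bin points into top-view voxels; return the indices of non-empty voxels
    and a simple per-voxel feature (mean of the contained points)."""
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    ix = ((pts[:, 0] - x_range[0]) / voxel_size).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / voxel_size).astype(np.int64)
    width = int((y_range[1] - y_range[0]) / voxel_size)
    flat = ix * width + iy                                # flattened 2D voxel index
    uniq, inverse = np.unique(flat, return_inverse=True)  # non-empty voxels only
    sums = np.zeros((uniq.shape[0], pts.shape[1]))
    counts = np.zeros(uniq.shape[0])
    np.add.at(sums, inverse, pts)
    np.add.at(counts, inverse, 1.0)
    return uniq, sums / counts[:, None]
```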
Step S205: and projecting the current frame point cloud data to the visual angle of a front view, and extracting the characteristics of the front visual angle.
The current frame point cloud data generally contains relatively complete point cloud data for the side of a target (such as buildings along the road or vehicles traveling on it) that faces the executing subject (such as the current unmanned vehicle), so richer front view features of the target space can be extracted from the current frame point cloud data. In a specific implementation, step S205 may include the following sub-steps: 1) projecting the current frame point cloud data onto the front view plane; 2) determining the features of the front view from the projected data.
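A minimal sketch of projecting a point cloud into a front (forward-facing) view follows; the spherical range-image projection and the field-of-view and resolution values are assumptions chosen for illustration.

```python
import numpy as np

def project_front_view(points, rows=64, cols=512, v_fov=(-25.0, 3.0), h_fov=(-90.0, 90.0)):
    """Project points into a front-view image whose pixels store the measured range."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x * x + y * y + z * z) + 1e-6
    azimuth = np.degrees(np.arctan2(y, x))        # horizontal angle
    elevation = np.degrees(np.arcsin(z / rng))    # vertical angle
    u = ((azimuth - h_fov[0]) / (h_fov[1] - h_fov[0]) * (cols - 1)).astype(np.int64)
    v = ((elevation - v_fov[0]) / (v_fov[1] - v_fov[0]) * (rows - 1)).astype(np.int64)
    valid = (u >= 0) & (u < cols) & (v >= 0) & (v < rows)
    image = np.zeros((rows, cols))
    image[v[valid], u[valid]] = rng[valid]        # range channel of the front view
    return image
```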
In another example, the process of obtaining the multi-view feature may further include at least one of: extracting the characteristics of a right view from each frame of point cloud data in the multi-frame historical point cloud data; extracting the characteristics of a left view from each frame of point cloud data in the multi-frame historical point cloud data; extracting the characteristics of a bottom view of each frame of point cloud data in the multi-frame historical point cloud data; and projecting the current frame point cloud data into a visual angle of a rear view, and extracting the characteristics of the rear visual angle. Therefore, the characteristics of more visual angles can be extracted, and the characteristic information of richer target space is obtained.
In specific implementation, the multi-view feature can also be obtained by the following steps: aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning; extracting the characteristics of a front view angle for each frame of point cloud data in the multi-frame historical point cloud data; and projecting the current frame point cloud data to a visual angle of a top view, and extracting the characteristics of the top view. Therefore, richer forward-looking angle characteristics can be extracted according to the multi-frame point cloud data.
In addition, the multi-view feature can be obtained by the following steps: aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning; and extracting the characteristics of the front view angle and the characteristics of the top view angle of each frame of point cloud data in the multiple frames of historical point cloud data and the current frame of point cloud data. Therefore, richer front view angle characteristics and top view characteristics can be extracted according to the multi-frame point cloud data.
In one example, after the features of the top view and the features of the front view of the target environment are extracted, the method may further include the following steps:
step S401: and aiming at the same laser point in the historical point cloud data and the current frame point cloud data, obtaining the characteristics of a top view and the characteristics of a front view corresponding to the same laser point.
In this embodiment, the multi-view features obtained in step S103 include an overall top view feature and an overall front view feature of the target environment. According to the position information of each laser point in the historical point cloud data and the current frame point cloud data, the top view feature and the front view feature corresponding to that laser point can be obtained. That is, the top view feature and the front view feature of the whole target environment are converted into the top view feature and the front view feature of each point in the target environment.
Step S403: and splicing the characteristics of the top view and the characteristics of the front view corresponding to the same laser point to obtain the multi-view characteristics corresponding to the same laser point.
After the top view features and the front view features of the points in the target environment are obtained, the top view features and the front view features of the points can be spliced to obtain the multi-view features of the points.
Step S105: and obtaining the point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics.
The target environment includes a plurality of laser points, and in this embodiment, the point dimension description feature of each point includes not only the point feature of the point obtained based on the original point cloud data, but also the multi-view feature of each point. Therefore, the point dimension characteristic data can more accurately keep the measurement information of the original target, and more abundant characteristic information of each laser point can be represented, so that the perception performance of the sensor can be improved.
In this embodiment, step S105 can be implemented as follows: the point characteristics of the same laser point and the multi-view characteristics corresponding to that laser point are spliced to obtain the point dimension description characteristics corresponding to that laser point. In a specific implementation, one way to splice the point feature and the multi-view feature of a laser point is to concatenate their dimensions; for example, if the point feature is 10-dimensional and the multi-view feature is 15-dimensional, the concatenated feature is 25-dimensional. Alternatively, a weighted summation can be performed over the dimensions of the point feature and the multi-view feature; for example, if the point feature is 10-dimensional with a weight of 0.6 and the multi-view feature is 10-dimensional with a weight of 0.4, the fused feature is still 10-dimensional.
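The two splicing options described above can be written out as follows; the dimensions and weights mirror the example values in this paragraph and are otherwise arbitrary.

```python
import numpy as np

point_feat = np.random.rand(10)    # 10-dimensional point feature
view_feat = np.random.rand(15)     # 15-dimensional multi-view feature

# Option 1: concatenate the dimensions -> 25-dimensional description feature.
desc_concat = np.concatenate([point_feat, view_feat])

# Option 2: weighted summation (both features must have the same dimension).
view_feat_10 = np.random.rand(10)
desc_weighted = 0.6 * point_feat + 0.4 * view_feat_10   # still 10-dimensional
```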
Step S107: and identifying the target in the target environment according to the point dimension description characteristics in the target environment.
The point dimension description features have richer feature information, and target identification is performed based on the point dimension description features, so that a more accurate identification result can be obtained. The target in the target environment may be identified by identifying a target category (e.g., building, tree, person, vehicle, etc.), identifying a target frame (e.g., bounding box of the target), identifying a target point cloud (e.g., dividing the point cloud of the entire environment into point clouds corresponding to the targets), and so on.
In this embodiment, step S107 can be implemented as follows: and performing multi-task learning on the point dimension description characteristics of each laser point, wherein the multi-task learning comprises but is not limited to central point, size and direction supervision, so as to realize a target identification task in the target environment. The center point may be a center point of the dynamic target. The size may be the size of the dynamic object. The direction may be an orientation of a dynamic object, such as a direction of travel of a vehicle. By adopting the multi-task learning mode, the target identification efficiency can be effectively improved.
The multitask learning can further comprise target point cloud segmentation, and correspondingly, the method can further comprise the following steps: and performing point cloud segmentation on the point dimension description characteristics of each laser point to realize a point cloud segmentation task in the target environment, such as performing point cloud segmentation on static targets (such as trees and buildings) in the target environment. Thus, point cloud data of each target in the target environment can be obtained.
The multitask learning can further comprise target class identification, and correspondingly, the method can further comprise the following steps: and identifying the target category in the target environment according to the point dimension description characteristics in the target environment. In this way, the categories of objects in the object environment, such as buildings, trees, people, vehicles, etc., can be obtained.
In this embodiment, the specific process of target identification is as follows: 1) acquiring current frame and multi-frame historical laser point cloud data in a target environment through a laser radar; 2) acquiring point characteristics of each laser point in a target environment according to multi-frame original point cloud data, and acquiring multi-view characteristics of each laser point, such as characteristics of a top view and characteristics of a front view, according to the multi-frame original point cloud data; 3) performing feature extraction on the original multi-view features through a convolutional neural network; 4) fusing the point characteristics of each point with the multi-view characteristics processed by the convolutional neural network to obtain point dimension description characteristics; 5) and performing target identification processing based on the point dimension description characteristics.
In one example, the target recognition process is performed by a neural-network-based target recognition model. The target recognition model includes a point feature extraction network, a multi-view feature processing network, a feature fusion network, and a multi-task learning network. The point feature extraction network is used for extracting the point features of all laser points in the target environment from the multi-frame point cloud data; it can adopt a multilayer perceptron structure or another network structure. The multi-view feature processing network is used for performing feature transformation on the multi-view features; it can adopt a convolutional neural network or the like. The feature fusion network is used for obtaining point dimension description features from the point features and the transformed multi-view features. The multi-task learning network, also called a multi-task decision network, is used for multi-task target identification based on the point dimension description features. With this processing, an end-to-end point cloud panoramic segmentation network based on multi-frame point clouds can be realized, both high target identification accuracy and fast target identification speed can be effectively achieved, and this is an important component for ensuring the perception performance of the laser point cloud.
As shown in FIG. 2, the point feature extraction network is configured to extract the point features of each laser point in the target environment from the current frame point cloud data and the multiple frames of historical point cloud data. The point feature extraction network can adopt a multilayer perceptron, which may be a multilayer fully connected network. In this embodiment, the point features of different laser points share no receptive field, so the points are isolated from each other. The multi-view feature processing network comprises a top view feature processing network and a front view feature processing network. The top view feature processing network is used for performing feature transformation on the multi-frame accumulated top view features. It can adopt a U-shaped network comprising several convolution layers and several deconvolution layers, so that an output feature map with the same resolution as the input feature map is obtained; from this, the transformed top view feature of each laser point, i.e., the point-level feature of the top view, can be obtained. Similarly, the front view feature processing network may also adopt a U-shaped network to obtain the transformed front view feature of each laser point, i.e., the point-level feature of the front view. In this embodiment, the transformed point-level multi-view features carry rich structured semantic information and have stronger representation capability. The input data of the feature fusion network comprises the point features of the individual laser points, the point-level transformed top view features, and the point-level transformed front view features. The feature fusion may be a splicing of the different features of the same laser point (point feature, transformed top view feature, transformed front view feature), a weighted summation of the different features, or a fusion using an attention mechanism. The output data of the feature fusion network comprises the point dimension description feature of each laser point, which has accurate target location information and rich target semantic information. The multi-task learning network can comprise a point cloud segmentation network, a target category identification network, and a target frame regression network.
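One detail worth illustrating is how a whole-scene top-view feature map is turned into point-level top-view features: each laser point looks up the cell of the output feature map that its (x, y) coordinates fall into. The nearest-cell lookup, grid origin, and resolution below are assumptions of this sketch; bilinear interpolation would be an alternative.

```python
import numpy as np

def gather_point_level_features(feature_map, points, voxel_size=0.2, x_min=-50.0, y_min=-50.0):
    """feature_map: (C, H, W) top-view features output by the U-shaped network.
    Returns one C-dimensional top-view feature per laser point (nearest-cell lookup)."""
    c, h, w = feature_map.shape
    ix = np.clip(((points[:, 0] - x_min) / voxel_size).astype(np.int64), 0, w - 1)
    iy = np.clip(((points[:, 1] - y_min) / voxel_size).astype(np.int64), 0, h - 1)
    return feature_map[:, iy, ix].T   # shape: (N, C)
```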
The target frame regression network can predict a three-dimensional target frame for each laser point. Since each foreground target has multiple laser points (foreground points), the target prediction frames output by the target frame regression network may contain a large number of redundant frames. To eliminate them, the target frame regression network can adopt a non-maximum suppression (NMS) method and take the result with the highest prediction score (confidence of the foreground target) as the final output target prediction frame. However, the NMS method has two drawbacks: first, it is computationally expensive and not suitable for running on low-power devices; second, because of noise, a single detection frame cannot achieve high-accuracy target frame prediction.
In one example, the target frame regression network may predict a target frame using the following steps:
step S501: and carrying out migration processing on the foreground points according to the predicted values of the offsets of the foreground points to serve as foreground migration points.
In a target environment, some targets are foreground targets, such as automobiles and pedestrians; others are background targets, such as buildings, trees, curbs, and traffic lights. The laser points of a foreground target are called foreground points, and the laser points of a background target are called background points.
In this embodiment, the target category corresponding to each laser point may be predicted by the target category identification network, and the foreground points determined according to the target category, as shown in (a) of FIG. 3. Through the target frame regression network, the offset of each laser point relative to the target center point can be predicted. In this step, each foreground point is shifted toward the corresponding target center point according to its predicted offset value; the shifted foreground points are referred to as foreground offset points for short and, as shown in (b) of FIG. 3, they gather together.
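A minimal sketch of this shifting step, assuming the classification head provides a boolean foreground mask and the regression head provides a per-point (dx, dy, dz) offset:

```python
import numpy as np

def shift_foreground_points(points_xyz, offsets, foreground_mask):
    """Move each foreground point by its predicted offset toward its target center;
    the returned array contains only the foreground offset points."""
    return points_xyz[foreground_mask] + offsets[foreground_mask]
```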
Step S503: and selecting a plurality of target key points from the foreground offset points.
The plurality of target key points may include points of different foreground targets in the target environment. In this embodiment, each foreground point after the shift is taken as a vote, as shown in (b) of fig. 3, and of the votes shifted to the corresponding target central point, a plurality of laser points can be selected as main votes (i.e., target key points) by using the farthest point sampling method, as shown in (c) of fig. 3.
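Farthest point sampling can be sketched as follows: a greedy selection that keeps the chosen key points well spread out over the foreground offset points; the starting index is arbitrary.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Select k target key points (main votes) from the foreground offset points."""
    n = points.shape[0]
    selected = [0]                         # arbitrary starting point
    min_dist = np.full(n, np.inf)
    for _ in range(k - 1):
        last = points[selected[-1]]
        min_dist = np.minimum(min_dist, np.linalg.norm(points - last, axis=1))
        selected.append(int(np.argmax(min_dist)))
    return np.array(selected)
```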
Step S505: and acquiring a foreground offset point set corresponding to each target key point.
For each target key point, the surrounding foreground offset points can be obtained by a ball query. For example, a sphere radius may be set, the sphere range determined according to this radius, and the original foreground points corresponding to the foreground offset points within the sphere grouped into the point set from which a target prediction frame is formed.
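A minimal sketch of the ball query, with the sphere radius as an illustrative assumption:

```python
import numpy as np

def ball_query(key_point, offset_points, radius=1.0):
    """Indices of the foreground offset points lying inside a sphere around one key point."""
    dist = np.linalg.norm(offset_points - key_point, axis=1)
    return np.where(dist < radius)[0]
```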
Step S507: and determining a target prediction frame according to the foreground offset point set.
In a specific implementation, the target frame may be predicted by an estimator. For example, a mean estimator averages the center point, length, width, height, direction, and score (the target category confidence of each foreground point) over the single set of original laser points corresponding to one main vote, so as to obtain a target prediction frame, as shown in (d) of FIG. 3. The confidence of the target category of each laser point in the target environment can be predicted through the target category recognition network; for example, the confidence that a certain laser point belongs to an automobile may be 90%, and the confidence that another laser point belongs to a pedestrian may be 86%.
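The mean estimator mentioned above can be sketched as a simple average of the per-point box predictions inside one vote cluster; the eight-value box encoding and the naive averaging of the yaw angle are assumptions of the example.

```python
import numpy as np

def mean_box_estimate(box_preds):
    """box_preds: (M, 8) per-point predictions [cx, cy, cz, length, width, height, yaw, score]
    for the original foreground points behind one main vote; returns the averaged box
    (yaw is averaged naively, which is adequate for a sketch)."""
    return box_preds.mean(axis=0)
```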
Step S509: and screening the target prediction frame.
Since the number K of main votes may be greater than the actual number of targets, the target prediction frames determined directly from the main votes need to be screened to filter out redundant target prediction frames. For example, if there are 5 foreground targets in the target environment and the number K of main votes is preset to 256, multiple target key points are sampled on each of the 5 foreground targets (256 key points in total), forming 256 target prediction frames, and there are obviously a large number of redundant target frames that need to be filtered out.
In a specific implementation, a non-maximum suppression (NMS) method may be used, taking the result with the highest prediction score (confidence of the foreground target) as the final output target prediction frame. Alternatively, a clustering method (such as the DBSCAN or k-means algorithm) can be adopted to aggregate the target frames corresponding to the multiple target key points sampled on each target. In this way, frame estimation is realized at point granularity by using every foreground point.
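For the NMS option, a minimal greedy sketch over axis-aligned top-view boxes is given below; the IoU threshold and the axis-aligned simplification (practical detectors usually use rotated boxes) are assumptions of the example.

```python
import numpy as np

def top_view_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (M, 4) as [x1, y1, x2, y2] in the top view; scores: (M,).
    Returns indices of the boxes kept, highest-scoring first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = rest[iou < iou_threshold]
    return keep
```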
In the present embodiment, through the above steps S501 to S509, target frame prediction with higher accuracy is realized by means of voting.
As can be seen from the foregoing embodiments, the target identification method provided in the embodiments of the present application obtains current frame point cloud data and multiple frames of historical point cloud data in a target environment; obtains point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and obtains multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data; obtains point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics; and identifies the target in the target environment according to the point dimension description characteristics in the target environment. With this processing, multi-view point dimension description characteristics are obtained from the multi-frame point cloud, so the perception performance of the laser point cloud is ensured while both high target identification accuracy and fast identification speed are effectively achieved.
Second embodiment
In the foregoing embodiment, a target identification method is provided, and correspondingly, the present application also provides a target identification apparatus. The apparatus corresponds to an embodiment of the method described above. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
An object recognition apparatus of this embodiment includes: a multi-frame point cloud acquisition unit, a point feature extraction unit, a multi-view feature extraction unit, a feature fusion unit, and a target identification unit.
The multi-frame point cloud obtaining unit is used for obtaining current frame point cloud data and multi-frame historical point cloud data in a target environment; the point feature extraction unit is used for obtaining the point features of each point in the current frame point cloud data and the multi-frame historical point cloud data; the multi-view feature extraction unit is used for obtaining multi-view features according to the current frame point cloud data and the multi-frame historical point cloud data; the feature fusion unit is used for obtaining point dimension description features under the target environment according to the point features of each point and the multi-view features; and the target identification unit is used for identifying the target in the target environment according to the point dimension description features in the target environment.
Optionally, the multi-view feature extraction unit includes:
the coordinate conversion subunit is used for aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning;
the top view feature extraction subunit is used for extracting features of a top view for each frame of point cloud data in the multiple frames of historical point cloud data;
and the front view feature extraction subunit is used for projecting the current frame point cloud data into the view angle of the front view and extracting the features of the front view.
Optionally, the top view feature extraction subunit includes:
the voxelization subunit is used for voxelizing each frame of point cloud data in the multi-frame historical point cloud data;
the voxel characteristic extraction subunit is used for extracting the characteristics in the non-empty voxels in each frame of point cloud data;
and the voxel characteristic splicing subunit is used for splicing the characteristics in all the non-empty voxels to obtain the characteristics of the multi-frame accumulated top view.
Optionally, the apparatus further comprises:
a multi-view feature acquisition unit of point granularity, configured to obtain, for a same laser point in the historical point cloud data and the current frame point cloud data, a top view feature and a front view feature corresponding to the same laser point;
and the multi-view characteristic splicing unit is used for splicing the top view characteristic and the front view characteristic corresponding to the same laser point to obtain the multi-view characteristic corresponding to the same laser point.
Optionally, the feature fusion unit is specifically configured to splice a point feature of the same laser point and a multi-view feature corresponding to the same laser point, so as to obtain a point dimension description feature corresponding to the same laser point.
Optionally, the target identification unit is specifically configured to perform multi-task learning on the point dimension description feature of each laser point, where the multi-task learning includes, but is not limited to, center point, size, and direction supervision, so as to implement a target identification task in the target environment.
Optionally, the multi-task learning further includes point cloud segmentation, and the point dimension description features of each laser point are subjected to point cloud segmentation, so that a point cloud segmentation task in the target environment is realized.
Optionally, the multi-task learning further includes target frame prediction, in which a target frame of the target environment is predicted according to the point dimension description feature in the target environment; the target frame prediction unit is used for shifting the foreground points according to the predicted offset values of the foreground points to obtain foreground offset points; selecting a plurality of target key points from the foreground offset points; acquiring a foreground offset point set corresponding to each target key point; determining a target prediction frame according to the foreground offset point set; and screening the target prediction frames to eliminate redundant target prediction frames.
Third embodiment
In the foregoing embodiment, a target identification method is provided, and accordingly, the present application also provides an electronic device. The device corresponds to the method embodiment described above. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The application provides an electronic device, including: a processor and a memory, wherein the memory is used for storing a program implementing the above target identification method; after the device is powered on, the program for the method is run by the processor.
The electronic equipment can be a server side or an unmanned vehicle.
In one example, the electronic device is a server, the server can store multi-frame historical point cloud data of an unmanned vehicle in a target environment, receive current frame point cloud data of the unmanned vehicle in the target environment, obtain point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and obtain multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data; obtaining point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics; and identifying the target in the target environment according to the point dimension description characteristics in the target environment.
In another example, the electronic device is an unmanned vehicle, the unmanned vehicle collects current frame point cloud data in a target environment, acquires multi-frame historical point cloud data of the unmanned vehicle in the target environment, acquires point features of each point in the current frame point cloud data and the multi-frame historical point cloud data, and acquires multi-view features according to the current frame point cloud data and the multi-frame historical point cloud data; obtaining point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics; and identifying the target in the target environment according to the point dimension description characteristics in the target environment.
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (10)
1. A method of object recognition, comprising:
acquiring current frame point cloud data and multi-frame historical point cloud data in a target environment;
obtaining point characteristics of each point in the current frame point cloud data and the multi-frame historical point cloud data, and obtaining multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data;
obtaining point dimension description characteristics under the target environment according to the point characteristics of each point and the multi-view characteristics;
and identifying the target in the target environment according to the point dimension description characteristics in the target environment.
2. The method of claim 1, wherein obtaining a multi-view feature from the current frame point cloud data and the plurality of frames of historical point cloud data comprises:
aligning the multi-frame historical point cloud data to a coordinate system of the current frame point cloud data through positioning;
extracting the characteristics of a top view from each frame of point cloud data in the multi-frame historical point cloud data;
and projecting the current frame point cloud data into the visual angle of a front view, and extracting the characteristics of the front visual angle.
3. The method of claim 2, wherein extracting the features of the top view for each frame of point cloud data in the plurality of frames of historical point cloud data comprises:
voxelizing each frame of point cloud data in the multi-frame historical point cloud data;
extracting features in non-empty voxels in each frame of point cloud data;
and splicing the features in all the non-empty voxels to obtain the features of the multi-frame accumulated top view.
4. The method of claim 2, further comprising:
aiming at the same laser point in the historical point cloud data and the current frame point cloud data, acquiring the characteristics of a top view and the characteristics of a front view corresponding to the same laser point;
and splicing the characteristics of the top view and the characteristics of the front view corresponding to the same laser point to obtain the multi-view characteristics corresponding to the same laser point.
5. The method according to claim 4, wherein the obtaining the point dimension description feature under the target environment according to the point feature of each point and the multi-view feature comprises:
and splicing the point characteristics of the same laser point and the multi-view characteristics corresponding to the same laser point to obtain the point dimension description characteristics corresponding to the same laser point.
6. The method of claim 5, wherein the identifying the target in the target environment according to the point dimension description features in the target environment comprises:
and performing multi-task learning on the point dimension description characteristics of each laser point, wherein the multi-task learning comprises but is not limited to central point, size and direction supervision, so as to realize a target identification task in the target environment.
7. The method of claim 5, wherein the multitask learning further comprises point cloud segmentation, then the method further comprises:
and performing point cloud segmentation on the point dimension description characteristics of each laser point to realize a point cloud segmentation task in the target environment.
8. The method according to any one of claims 1-7, wherein the identifying the target in the target environment according to the point dimension description features in the target environment comprises:
according to the point dimension description characteristics under the target environment, target frame prediction is carried out on the target environment;
the method further comprises the following steps:
shifting the foreground points according to the predicted offset values of the foreground points to obtain foreground offset points;
selecting a plurality of target key points from the foreground offset points;
acquiring a foreground offset point set corresponding to each target key point;
determining a target prediction frame according to the foreground offset point set;
and screening the target prediction frames to eliminate redundant target prediction frames.
9. An object recognition apparatus, comprising:
the multi-frame point cloud obtaining unit is used for obtaining current frame point cloud data and multi-frame historical point cloud data in a target environment;
the point feature extraction unit is used for obtaining the point features of each point in the current frame point cloud data and the multi-frame historical point cloud data;
the multi-view characteristic extraction unit is used for obtaining multi-view characteristics according to the current frame point cloud data and the multi-frame historical point cloud data;
the feature fusion unit is used for obtaining point dimension description features under the target environment according to the point features of each point and the multi-view features;
and the target identification unit is used for identifying the target in the target environment according to the point dimension description characteristics in the target environment.
10. An electronic device, comprising:
a processor; and
a memory for storing a program implementing the target identification method according to any one of claims 1 to 8; after the device is powered on, the program for the method is run by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111445551.5A CN113902043A (en) | 2021-11-30 | 2021-11-30 | Target identification method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113902043A true CN113902043A (en) | 2022-01-07 |
Family
ID=79195138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111445551.5A Pending CN113902043A (en) | 2021-11-30 | 2021-11-30 | Target identification method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113902043A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||