CN114913331A - Point cloud data-based target detection method and device - Google Patents

Point cloud data-based target detection method and device

Info

Publication number
CN114913331A
CN114913331A (application CN202110174725.2A)
Authority
CN
China
Prior art keywords
point
target
voxel
characteristic
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110174725.2A
Other languages
Chinese (zh)
Other versions
CN114913331B (en)
Inventor
苗振伟
陈纪凯
朱均
刘凯旋
郝培涵
占新
卿泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuzhou Online E Commerce Beijing Co ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202110174725.2A
Publication of CN114913331A
Application granted
Publication of CN114913331B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Optical Radar Systems And Details Thereof (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target detection method and device based on point cloud data. The method comprises the following steps: voxelizing the point cloud data and extracting the voxel characteristics of non-empty voxels; obtaining, according to the voxel characteristics, the voxel characteristic and pixel characteristic corresponding to each laser point in the non-empty voxels; obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel, and the voxel characteristic and pixel characteristic corresponding to the laser point; and determining the target to be identified in the point cloud data according to the fusion characteristics of the laser points. Target segmentation and detection at the point level are thereby realized, with high detection accuracy and precision.

Description

Target detection method and device based on point cloud data
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to a target detection method and device based on point cloud data.
Background
Laser point cloud data can be used to predict the position information and geometric shape information of a target, and therefore plays an important role in machine perception fields such as unmanned driving and robotics.
In the prior art, methods for realizing target detection by using point cloud data mainly include the following:
1. Methods based on a conventional segmentation-and-detection algorithm. The ground point cloud is filtered from the laser point cloud data by a ground segmentation algorithm, the point cloud data is clustered by a graph-based segmentation and clustering algorithm to filter the background point cloud, and the segmented point cloud clusters are classified by a classifier (such as an SVM classifier). However, the graph-based segmentation and clustering algorithm involves a large calculation amount and depends on the ground segmentation algorithm, so the detection accuracy and precision are limited in complex urban environments.
2. Deep learning methods based on laser point cloud projection. The 3D laser point cloud data is projected onto a specific 2D plane, reducing the 3D laser point cloud target detection problem to the target detection problem of a 2D image. However, the reduction in dimensionality causes the point cloud data to lose part of the target information, reducing the accuracy and precision of the final target prediction.
In summary, the prior-art methods for detecting targets by using point cloud data can hardly meet the requirements of accuracy and precision.
Disclosure of Invention
In view of the above, the present disclosure is proposed to provide a method and apparatus for object detection based on point cloud data that overcomes or at least partially solves the above problems.
In a first aspect, an embodiment of the present disclosure provides a target detection method based on point cloud data, including:
performing voxelization on the point cloud data, and extracting the voxel characteristics of non-empty voxels;
obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the voxel characteristics;
obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel, and the voxel characteristic and the pixel characteristic corresponding to the laser point;
and determining the target to be identified in the point cloud data according to the fusion characteristics of the laser points.
In a second aspect, an embodiment of the present disclosure provides a target detection apparatus based on point cloud data, including:
the voxelization module is used for voxelizing the point cloud data and extracting the voxel characteristics of non-empty voxels;
the characteristic acquisition module is used for acquiring the voxel characteristic and the pixel characteristic corresponding to the laser point in the non-empty voxel according to the voxel characteristic extracted by the voxelization module;
the fusion module is used for obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel extracted by the voxelization module, and the voxel characteristic and the pixel characteristic which are corresponding to the laser point and obtained by the characteristic obtaining module;
and the target identification module is used for determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points obtained by the fusion module.
In a third aspect, an embodiment of the present disclosure provides a computer program product with a target detection function, which includes a computer program/instructions that, when executed by a processor, implement the above target detection method based on point cloud data.
In a fourth aspect, an embodiment of the present disclosure provides a server, including: the system comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the target detection method based on the point cloud data when executing the program.
The beneficial effects of the above technical scheme provided by the embodiment of the present disclosure at least include:
the target detection method based on the point cloud data provided by the embodiment of the disclosure voxelizes the point cloud data and extracts the voxel characteristics of non-empty voxels; obtaining a voxel characteristic and a pixel characteristic corresponding to a laser point in a non-empty voxel according to the voxel characteristic; obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point and the voxel characteristic of the non-empty voxel, and the voxel characteristic and the pixel characteristic corresponding to the laser point; and determining the target to be identified in the point cloud data according to the fusion characteristics of the laser points. The fusion characteristics of the laser points comprise the original characteristics of the points, and the accurate position information of the points is reserved; the method also comprises voxel characteristics and pixel characteristics corresponding to the laser points, and retains relative information between the surrounding laser points, so that the characteristics of the points are more abundantly represented, the fusion characteristics of the points not only retain the original characteristics of the points, but also comprise abundant context semantic information, the characteristic expression capability of each point is enhanced, and the accuracy and precision of the target identified by deep learning by utilizing the fusion characteristics of the points are ensured.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings:
fig. 1 is a flowchart of a target detection method based on point cloud data according to an embodiment of the disclosure;
FIG. 2 is a flowchart illustrating an implementation of step S12 in FIG. 1;
fig. 3 is a flowchart of a specific implementation of target identification based on foreground point features in the second embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a specific implementation of step S33 in FIG. 3;
FIG. 5 is an exemplary diagram of a method for object detection based on point cloud data in an embodiment of the disclosure;
fig. 6 is a schematic structural diagram of a target detection apparatus based on point cloud data in an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to solve the problem that the accuracy and precision of target detection by using point cloud data are low in the prior art, the embodiment of the disclosure provides a target detection method and device based on point cloud data, which realize point dimension target segmentation detection and have high detection accuracy and precision.
Example one
The embodiment of the disclosure provides a target detection method based on point cloud data, the flow of which is shown in fig. 1, and the method comprises the following steps:
step S11: and (4) performing voxelization on the point cloud data, and extracting the voxel characteristics of non-empty voxels.
The point cloud data can be obtained by a multi-line laser radar (lidar), for example a 4-line, 16-line, 32-line, 64-line or 128-line lidar.
Voxelizing the point cloud data converts point data without spatial extent into cube data with three-dimensional spatial information. Specifically, a minimum cuboid containing all the point cloud data is determined according to the minimum and maximum values of the point cloud data in the X, Y and Z coordinate directions; the voxel size is determined according to the size of the minimum cuboid and the resolution requirement; and the minimum cuboid is evenly divided into voxels according to the voxel size.
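As a rough illustration only (not the patented implementation), the bounding-cuboid voxelization described above might look like the following sketch; the voxel size is a placeholder value.

```python
import numpy as np

def voxelize(points, voxel_size=(0.1, 0.1, 0.2)):
    """Assign every laser point to a voxel of the minimum bounding cuboid.

    points: (N, 4) array of {x, y, z, intensity}; voxel_size is a placeholder.
    Returns integer voxel coordinates per point, the grid origin and the grid shape.
    """
    xyz = points[:, :3]
    origin = xyz.min(axis=0)                    # minimum corner of the cuboid
    extent = xyz.max(axis=0) - origin           # cuboid size along X, Y, Z
    size = np.asarray(voxel_size)
    grid_shape = np.maximum(np.ceil(extent / size).astype(int), 1)
    coords = np.minimum(((xyz - origin) / size).astype(int), grid_shape - 1)
    return coords, origin, grid_shape

# Non-empty voxels are simply the unique voxel coordinates:
# coords, origin, shape = voxelize(points)
# non_empty = np.unique(coords, axis=0)
```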
In one embodiment, the voxel characteristics of the non-empty voxels may be extracted using a multi-layer perceptron (MLP) network and/or convolutional neural network (CNN) layers.
A multi-layer perceptron (MLP) is a type of artificial neural network (ANN) comprising an input layer, an output layer and at least one hidden layer in between. A neural network is a technology modeled on biological neural networks: it connects a number of feature values and reaches its target through combinations of linear and nonlinear operations.
A convolutional neural network (CNN) is a class of feedforward neural network that contains convolution computations and has a deep structure, and is one of the representative algorithms of deep learning. The voxel characteristics of non-empty voxels can be extracted through the pooling layer of a convolutional neural network.
Alternatively, other hand-crafted characteristics of the non-empty voxels may be obtained and added to the voxel characteristics extracted through the network.
The acquired voxel characteristic can be the characteristics of each laser point in the non-empty voxel, or a characteristic obtained by fusing the characteristics of the laser points in the non-empty voxel with the characteristics of the surrounding laser points.
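A minimal sketch of such per-voxel characteristic extraction with a shared MLP followed by pooling; the channel sizes and the PointNet-style max pooling are assumptions, not the disclosed network.

```python
import torch
import torch.nn as nn

class VoxelFeatureEncoder(nn.Module):
    """Encode the points of each voxel into one voxel characteristic vector.

    A shared MLP processes every point; max pooling then aggregates the points
    of the voxel, so the result also reflects the neighbouring points inside
    that voxel (hypothetical channel sizes).
    """
    def __init__(self, in_channels=4, out_channels=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, 32), nn.ReLU(inplace=True),
            nn.Linear(32, out_channels), nn.ReLU(inplace=True),
        )

    def forward(self, voxel_points):
        # voxel_points: (num_voxels, max_points_per_voxel, 4), zero-padded.
        point_feats = self.mlp(voxel_points)     # per-point features
        voxel_feats, _ = point_feats.max(dim=1)  # pool over the points of a voxel
        return voxel_feats                       # (num_voxels, out_channels)
```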
Step S12: obtain, according to the voxel characteristics, the shallow information of the laser points in the non-empty voxels and the voxel characteristics and pixel characteristics corresponding to the laser points.
Referring to fig. 2, a specific implementation flow of step S12 may include the following steps:
step S121: and splicing the original characteristics of the laser points in the non-empty voxels and the voxel characteristics of the non-empty voxels to obtain the shallow information of the laser points.
And (2) performing one-dimensional convolution on the original characteristics { x, y, z, intensity } (the intensity is the reflection intensity information of a point) of the laser point by using a multilayer perception network MLP, splicing the obtained processed characteristics with the voxel characteristics of the voxel where the laser point is located, and then obtaining shallow information of the point dimension of the laser point through the multilayer perception network MLP.
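This splicing step might be sketched as below; the channel sizes are assumptions, and the voxel characteristic of each point is assumed to have been gathered beforehand.

```python
import torch
import torch.nn as nn

class ShallowPointFeature(nn.Module):
    """Sketch of the point-dimension shallow information described above."""
    def __init__(self, point_dim=4, voxel_dim=64, out_dim=64):
        super().__init__()
        # MLP over the raw {x, y, z, intensity} characteristics of a point.
        self.point_mlp = nn.Sequential(nn.Linear(point_dim, 32), nn.ReLU(inplace=True))
        # Second MLP over the spliced point + voxel characteristics.
        self.fuse_mlp = nn.Sequential(nn.Linear(32 + voxel_dim, out_dim), nn.ReLU(inplace=True))

    def forward(self, raw_points, voxel_feat_per_point):
        # raw_points: (N, 4); voxel_feat_per_point: (N, voxel_dim), the characteristic
        # of the voxel containing each point.
        p = self.point_mlp(raw_points)
        return self.fuse_mlp(torch.cat([p, voxel_feat_per_point], dim=-1))
```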
Step S122: input the voxel characteristics of the non-empty voxels into a characteristic network integrating three-dimensional convolution and two-dimensional convolution, and extract the three-dimensional network characteristics and two-dimensional network characteristics of the non-empty voxels.
The voxel characteristics of the non-empty voxels are input into the characteristic network integrating three-dimensional convolution and two-dimensional convolution, three-dimensional convolution and two-dimensional convolution are performed in turn, and the three-dimensional network characteristics and two-dimensional network characteristics of the non-empty voxels are extracted.
Sparse three-dimensional convolution can be performed on the voxel characteristics of the non-empty voxels through a backbone network to obtain three-dimensional data volumes of set resolutions that include the processed voxel characteristics of the non-empty voxels, from which the three-dimensional network characteristics of the non-empty voxels are extracted; two-dimensional convolution is then performed on the three-dimensional data volumes through the backbone network to obtain two-dimensional data volumes of corresponding resolutions that include the pixel characteristics of the non-empty voxels, from which the two-dimensional network characteristics of the non-empty voxels are extracted.
For example, 2x, 4x and 8x downsampling may be performed through a sparse 3D convolutional backbone network, so that three-dimensional data volumes are extracted from 4 layers of the 3D network, corresponding to full, 1/2, 1/4 and 1/8 resolution respectively; other resolutions may also be used. Two-dimensional convolution is then performed on the three-dimensional data volume at each resolution to obtain a two-dimensional data volume of the corresponding resolution.
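The disclosure uses sparse 3D convolutions in the backbone; the dense Conv3d sketch below only illustrates the multi-scale downsampling and the 3D-to-2D (bird's-eye-view) step, with assumed channel counts.

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Greatly simplified stand-in for the 3D + 2D characteristic backbone."""
    def __init__(self, c_in=64):
        super().__init__()
        # Three 3D stages producing 1/2, 1/4 and 1/8 resolution volumes.
        self.stages_3d = nn.ModuleList([
            nn.Sequential(nn.Conv3d(c, 2 * c, 3, stride=2, padding=1), nn.ReLU(inplace=True))
            for c in (c_in, 2 * c_in, 4 * c_in)
        ])
        # 2D convolution applied after collapsing the height dimension.
        self.conv_2d = nn.Sequential(nn.LazyConv2d(256, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, voxel_volume):
        # voxel_volume: (B, C, D, H, W) dense voxel characteristic volume.
        volumes = []
        x = voxel_volume
        for stage in self.stages_3d:
            x = stage(x)
            volumes.append(x)              # three-dimensional data volumes
        b, c, d, h, w = x.shape
        bev = x.reshape(b, c * d, h, w)    # collapse height: 3D volume -> 2D map
        pixel_map = self.conv_2d(bev)      # two-dimensional data volume
        return volumes, pixel_map
```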
Step S123: obtain the voxel characteristics and pixel characteristics corresponding to the laser points in the non-empty voxels according to the correspondence between the laser points in the non-empty voxels and the three-dimensional network characteristics and two-dimensional network characteristics.
The original characteristics of a laser point in the point cloud include its position information. The projection positions of the laser point in the three-dimensional data volume and the two-dimensional data volume are determined according to this position information; that is, the voxel where the laser point is located is determined from its position, and the processed voxel characteristic and pixel characteristic of that voxel in the three-dimensional and two-dimensional data volumes are taken as the voxel characteristic and pixel characteristic corresponding to the laser point.
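One possible way to look up, for each laser point, its processed voxel characteristic and pixel characteristic from the data volumes (a sketch; a single sample and full-resolution volumes are assumed, and resolution scaling and bounds clamping are omitted):

```python
import torch

def gather_point_features(points_xyz, origin, voxel_size, voxel_volume, pixel_map):
    """points_xyz: (N, 3); origin, voxel_size: (3,) tensors.
    voxel_volume: (C3, D, H, W) processed voxel characteristics.
    pixel_map:    (C2, H, W) pixel characteristics on the same grid.
    """
    coords = ((points_xyz - origin) / voxel_size).long()  # voxel of each point
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    point_voxel_feats = voxel_volume[:, z, y, x].t()      # (N, C3)
    point_pixel_feats = pixel_map[:, y, x].t()            # (N, C2)
    return point_voxel_feats, point_pixel_feats
```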
Step S13: obtain the fusion characteristic of the laser point according to the shallow information of the laser point and its corresponding voxel characteristic and pixel characteristic.
The point-dimension shallow information of the laser point is fused with the corresponding voxel characteristic and pixel characteristic to obtain the point-dimension fusion characteristic of the laser point.
Step S14: determine the target to be identified in the point cloud data according to the fusion characteristics of the laser points.
The point cloud segmentation task and the target detection task are realized by performing multi-task learning on the fusion characteristics of each laser point, including foreground/background segmentation, point classification, IoU score supervision, and supervision of the center point, size and angle.
Specifically, foreground/background segmentation distinguishes whether a laser point is a foreground point or a background point; if it is a background point it is filtered out, and if it is a foreground point it is used for subsequent target identification. Point classification means that, after a point is determined to be a foreground point, the type of the target corresponding to the foreground point is further distinguished, such as a person, a vehicle or a tree. In target detection evaluation, the IoU (intersection over union) parameter can be understood simply as the overlap ratio between the target window generated by the model and the ground-truth window, and the IoU score is the score of target prediction accuracy. The center point, size and angle are those of the target contour identified from the foreground points.
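A sketch of per-point multi-task heads corresponding to the tasks listed above; the output layout (3 center offsets + 3 sizes + 1 yaw angle) and the class set are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Foreground/background, class, IoU score, and box (center offset, size, angle) heads."""
    def __init__(self, feat_dim=128, num_classes=3):
        super().__init__()
        self.seg_head = nn.Linear(feat_dim, 1)            # foreground vs background
        self.cls_head = nn.Linear(feat_dim, num_classes)  # e.g. person / vehicle / tree
        self.iou_head = nn.Linear(feat_dim, 1)            # IoU score supervision
        self.box_head = nn.Linear(feat_dim, 7)            # center offset (3), size (3), yaw (1)

    def forward(self, fused_point_feats):
        # fused_point_feats: (N, feat_dim) fusion characteristics of the laser points.
        return {
            "foreground_logit": self.seg_head(fused_point_feats),
            "class_logits": self.cls_head(fused_point_feats),
            "iou_score": torch.sigmoid(self.iou_head(fused_point_feats)),
            "box": self.box_head(fused_point_feats),
        }
```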
The target detection method based on point cloud data provided by the first embodiment of the disclosure voxelizes the point cloud data and extracts the voxel characteristics of non-empty voxels; obtains, according to the voxel characteristics, the shallow information of the laser points in the non-empty voxels and the voxel characteristics and pixel characteristics corresponding to the laser points; obtains the fusion characteristic of each laser point according to its shallow information and its corresponding voxel characteristic and pixel characteristic; and determines the target to be identified in the point cloud data according to the fusion characteristics of the laser points. The fusion characteristic of a laser point includes the shallow information of the point, which retains its accurate position information, as well as the voxel characteristic and pixel characteristic corresponding to the point, which retain the relative information between surrounding laser points. The characteristics of each point are therefore represented more richly: the fusion characteristic retains the original characteristics of the point and also contains rich context semantic information, which enhances the feature expression capability of each point and ensures the accuracy and precision of the target identified by deep learning using the fusion characteristics of the points.
In one embodiment, before the point cloud data is voxelized, at least one of the following steps may be further performed:
filtering background point clouds in the point cloud data;
and filtering point clouds in non-interest blocks in the point cloud data.
By filtering the background point cloud and the point cloud data in non-interest blocks, more than 50% of irrelevant points can be removed, which greatly reduces the number of laser points to be processed and the subsequent calculation amount.
When the above target detection method is applied to the field of automatic driving, the region of interest (as opposed to the non-interest blocks) refers to a region that affects automatic driving, such as the driving road and sidewalks close to the driving road.
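A trivial numpy sketch of such pre-filtering; the region-of-interest bounds are placeholder values and the background mask is assumed to come from an upstream step.

```python
import numpy as np

def prefilter_points(points, roi_min=(-50.0, -20.0, -3.0), roi_max=(50.0, 20.0, 3.0),
                     background_mask=None):
    """Drop points outside a rectangular region of interest and, optionally,
    points already labelled as background."""
    xyz = points[:, :3]
    in_roi = np.all((xyz >= roi_min) & (xyz <= roi_max), axis=1)
    keep = in_roi if background_mask is None else in_roi & ~background_mask
    return points[keep]
```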
Example two
The second embodiment of the present disclosure provides a target identification method based on foreground point features, a flow of which is shown in fig. 3, and the method includes the following steps:
step S31: and identifying the foreground points by the fusion characteristics of the laser points through a deep learning network, and determining target prediction information of the foreground points.
The method comprises the steps of identifying a foreground point after multi-task learning is carried out on the fusion characteristics of each laser point, and determining central point position information, size information, angle information and prediction scores of a target corresponding to the foreground point to serve as target prediction information. Optionally, the target prediction information may further include a classification of the target.
The central point position information can be the offset of the foreground point relative to the target central point, and the central point position information is the offset of the foreground point relative to the target central point rather than a specific coordinate value of the central point because the coordinate value of the central point position is often large, so that the deep learning process is simplified, and the calculated amount is reduced.
Step S32: screen a preset number of foreground points as main foreground points by farthest point sampling.
In one embodiment, foreground points whose prediction scores in the target prediction information are lower than a preset second score threshold are deleted; the center point position of the target corresponding to each remaining foreground point is determined according to the position information in the original characteristics of the foreground point and the offset, in the target prediction information, of the foreground point relative to the target center point; and a preset number of foreground points are then screened as main foreground points by farthest point sampling over the center point positions of the targets corresponding to the foreground points.
Specifically, one foreground point may be randomly selected as a main foreground point, and the foreground point farthest from the already selected main foreground points is then repeatedly added as a main foreground point until the number of main foreground points reaches the preset number.
The preset number is determined according to the specific situation, and may be determined according to the accuracy requirement of prediction, the approximate number of targets, the size of the target contour and other factors. For example, the predetermined number may be 256, or may be other numbers.
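A plain farthest point sampling sketch over the predicted target center positions; 256 is simply the example preset number mentioned above.

```python
import numpy as np

def farthest_point_sampling(centers, num_samples=256):
    """Pick a preset number of representative foreground points by farthest
    point sampling; centers: (N, 3) predicted target center positions."""
    n = centers.shape[0]
    num_samples = min(num_samples, n)
    selected = [np.random.randint(n)]          # start from a random foreground point
    dist = np.full(n, np.inf)
    for _ in range(num_samples - 1):
        # Distance of every point to its nearest already-selected point.
        d = np.linalg.norm(centers - centers[selected[-1]], axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(dist.argmax()))    # take the farthest remaining point
    return np.asarray(selected)
```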
Step S33: determine the recognition result of the target to be recognized according to the target prediction information of the main foreground points.
In one embodiment, as shown in fig. 4, the following steps may be included:
step S331: and aiming at each main foreground point, determining a set of main foreground points of which the distance between the target central point and the target central point of the main foreground point is less than a preset distance threshold according to the target central point position information in the target prediction information.
For each main foreground point, a ball query method may be used to determine a main foreground point set corresponding to the same target as the main foreground point.
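A brute-force ball query sketch over the predicted target centers of the main foreground points; the distance threshold is a placeholder value.

```python
import numpy as np

def ball_query(centers, radius=1.0):
    """For each main foreground point, return the indices of the main foreground
    points whose predicted target centers lie within the distance threshold."""
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return [np.flatnonzero(row < radius) for row in dist]
```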
Step S332: determine a first target prediction result according to the target prediction information of each main foreground point in the set.
The information of the corresponding target can be predicted by different estimators from the target prediction information of the main foreground points in the set; for example, an average estimator simply averages the center point, size, angle and prediction score of the target in the target prediction information of each main foreground point in the set to obtain the output for a single target.
Step S333: cluster the first target prediction results, and determine a second target prediction result from the first target prediction results of the same class as the target recognition result of the target to be recognized.
Although each main foreground point corresponds to one main foreground point set and therefore to one first target prediction result, taking different main foreground points within the same set as the reference often yields the same main foreground point set, so the resulting first target prediction results are also often the same. Therefore, a class of first target prediction results obtained by clustering may contain only one unique first target prediction result; in that case, this unique first target prediction result is taken as the final recognition result of the target.
When a class of first target prediction results contains more than one first target prediction result, the average of the prediction scores in the target prediction information of the main foreground points corresponding to the first target prediction results of that class is determined, and the second target prediction result is determined from the first target prediction results whose average score is higher than a preset first score threshold.
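The averaging and clustering of steps S332/S333 might be sketched as below; the rounding-based grouping of near-identical first results and the threshold value are assumptions for illustration.

```python
import numpy as np

def merge_predictions(point_boxes, point_scores, groups, first_score_threshold=0.3):
    """point_boxes: (M, 7) per-main-foreground-point target predictions
    (center, size, angle); point_scores: (M,); groups: index sets from ball_query."""
    firsts, mean_scores = [], []
    for idx in groups:                                    # one set per main foreground point
        firsts.append(point_boxes[idx].mean(axis=0))      # average estimator -> first result
        mean_scores.append(point_scores[idx].mean())
    firsts, mean_scores = np.asarray(firsts), np.asarray(mean_scores)
    # Cluster first results that are effectively identical.
    _, cluster_ids = np.unique(np.round(firsts, 1), axis=0, return_inverse=True)
    results = []
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        if mean_scores[members].mean() > first_score_threshold:
            results.append(firsts[members].mean(axis=0))  # second target prediction result
    return np.asarray(results)
```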
According to the target identification method based on foreground point characteristics provided by the second embodiment of the disclosure, a preset number of foreground points are screened as main foreground points by farthest point sampling, which greatly reduces the number of main foreground points and the calculation amount, thereby meeting the real-time requirement of target prediction; at the same time, the selected main foreground points are highly representative and can well complete the prediction of each target, ensuring the prediction accuracy.
Referring to fig. 5, the above process of target detection based on point cloud data can be summarized as follows:
(1) Point cloud data input: the point cloud data is input into a multi-dimensional-fusion end-to-end 3D perception network.
(2) Point cloud voxelization and voxel characteristic extraction: the point cloud data is voxelized and the voxel characteristic of each non-empty voxel is extracted; the characteristic may include the output of the pooling layer of a lightweight multi-layer perceptron (MLP) + convolutional neural network (CNN), and may also include manually designed characteristics.
(3) 3D backbone processing, voxel characteristic extraction and 2D pixel characteristic extraction: the voxel characteristics are further input into a characteristic backbone network integrating 3D convolution and 2D convolution, and voxel and pixel characteristics with richer semantic information at multiple scales are extracted, namely three-dimensional data volumes of various resolutions including the processed voxel characteristics of the non-empty voxels, and two-dimensional data volumes including the pixel characteristics of the non-empty voxels.
(4) Determining the voxel where a point is located: the projection positions of the point in the three-dimensional data volume and the two-dimensional data volume are determined according to the position information in the original characteristic of the point, that is, the voxel to which the point belongs is determined.
(5) Determining the processed voxel characteristic and pixel characteristic of the point: the processed voxel characteristic and pixel characteristic of the point are determined according to the voxel to which the point belongs.
(6) Point feature fusion: the original characteristics of the laser point are passed through an MLP network and spliced with the voxel characteristics, and the point-dimension shallow characteristics within the voxel are then extracted through another MLP network; on the basis of the shallow characteristics of the point and the corresponding processed voxel and pixel characteristics, the characteristics of these three dimensions are further fused to obtain a point-dimension description characteristic with accurate position information and rich semantic information.
(7) Point cloud segmentation and target detection: the point cloud segmentation task and the target detection task are realized by performing multi-task learning on the fusion characteristics of each laser point, including foreground/background segmentation, point classification, IoU score supervision, and supervision of the center point, size and angle.
The target detection method in the embodiment of the disclosure can be applied to automatic driving, and can predict objects around a vehicle in the automatic driving process, so as to provide guarantee for the safe realization of automatic driving; the method can also be applied to scenes such as high-precision maps and Augmented Reality (AR) navigation.
Based on the inventive concept of the present disclosure, an embodiment of the present disclosure further provides a target detection apparatus based on point cloud data, which has a structure as shown in fig. 6, and includes:
the voxelization module 61 is used for voxelizing the point cloud data and extracting the voxel characteristics of non-empty voxels;
a feature obtaining module 62, configured to obtain shallow information of a laser point in the non-empty voxel according to the voxel feature extracted by the voxelization module 61, and a voxel feature and a pixel feature corresponding to the laser point;
a fusion module 63, configured to obtain the fusion feature of the laser point according to the shallow information of the laser point obtained by the feature obtaining module 62 and the voxel feature and pixel feature corresponding to the laser point;
and the target identification module 64 is configured to determine a target to be identified in the point cloud data according to the fusion feature of the laser point obtained by the fusion module 63.
In an embodiment, the feature obtaining module 62 is configured to obtain shallow information of a laser point in the non-empty voxel according to the voxel feature, and specifically configured to:
and splicing the original characteristics of the laser points in the non-empty voxels and the voxel characteristics of the non-empty voxels to obtain the shallow information of the laser points.
In an embodiment, the feature obtaining module 62 obtains a voxel feature and a pixel feature corresponding to the laser point, and is specifically configured to:
inputting the voxel characteristics of the non-empty voxels into a characteristic network integrating three-dimensional convolution and two-dimensional convolution, and extracting the three-dimensional network characteristics and the two-dimensional network characteristics of the non-empty voxels; and obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the corresponding relationship between the laser points in the non-empty voxels and the three-dimensional network characteristics and the two-dimensional network characteristics.
In one embodiment, the target identification module 64 determines the target to be identified in the point cloud data according to the fusion feature of the laser point, and is specifically configured to:
identifying a foreground point by the fusion characteristics of the laser points through a deep learning network, and determining target prediction information of the foreground point; screening a preset number of foreground points as main foreground points by using a farthest point sampling mode; and determining the recognition result of the target to be recognized according to the target prediction information of the main foreground point.
In an embodiment, the target identification module 64 determines an identification result of the target to be identified according to the target prediction information of the primary foreground point, and is specifically configured to:
aiming at each main foreground point, determining a set of main foreground points of which the distance between a target central point and the target central point of the main foreground point is less than a preset distance threshold according to the target central point position information in the target prediction information of the main foreground point; determining a first target prediction result according to the target prediction information of each main foreground point in the set; and clustering the first target prediction results, and determining a second target prediction result according to the first target prediction results of the same class as a target recognition result of the target to be recognized.
In one embodiment, the target identification module 64 determines the second target prediction result according to the first target prediction result of the same class, and is specifically configured to:
if the first target prediction results of the same class contain more than one first target prediction result, determining the average score of the prediction scores in the target prediction information of the main foreground points corresponding to the first target prediction results of the class; and determining a second target prediction result according to the first target prediction result with the score average value higher than a preset first score threshold value.
In one embodiment, the target identification module 64 determines the target prediction information of the foreground point, and is specifically configured to:
and determining central point position information, size information, angle information and a prediction score of a target corresponding to the foreground point as target prediction information, wherein the central point position information is the offset of the foreground point relative to the target central point.
In one embodiment, the target identification module 64 filters a preset number of foreground points as main foreground points in a farthest point sampling manner, and is specifically configured to:
deleting foreground points with the prediction scores lower than a preset second score threshold value in the target prediction information; determining the central point position of a target corresponding to a foreground point according to the position information in the original features of the foreground point and the offset of the foreground point in the target prediction information relative to the target central point; and screening a preset number of foreground points as main foreground points by utilizing a mode of sampling the farthest points according to the central point position of the target corresponding to the foreground points.
In one embodiment, voxelization module 61 extracts voxel characteristics of non-empty voxels, and is specifically configured to:
and extracting the voxel characteristics of the non-empty voxels by using the multilayer perception network MLP and/or the convolutional neural network CNN.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the inventive concept of the present disclosure, an embodiment of the present disclosure further provides a computer program product with a target detection function, which includes a computer program/instructions that, when executed by a processor, implement the above target detection method based on point cloud data.
Based on the inventive concept of the present disclosure, an embodiment of the present disclosure further provides a server, including: the system comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the target detection method based on the point cloud data when executing the program.
Unless specifically stated otherwise, terms such as processing, computing, calculating, determining, displaying, or the like, may refer to an action and/or process of one or more processing or computing systems or similar devices that manipulates and transforms data represented as physical (e.g., electronic) quantities within the processing system's registers and memories into other data similarly represented as physical quantities within the processing system's memories, registers or other such information storage, transmission or display devices. Information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, the disclosure may lie in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the disclosure.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or". The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Claims (10)

1. A target detection method based on point cloud data comprises the following steps:
performing voxelization on the point cloud data, and extracting the voxel characteristics of non-empty voxels;
obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the voxel characteristics;
obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel, and the voxel characteristic and the pixel characteristic corresponding to the laser point;
and determining the target to be identified in the point cloud data according to the fusion characteristic of the laser point.
2. The method according to claim 1, wherein obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points specifically comprises:
inputting the voxel characteristics of the non-empty voxels into a characteristic network integrating three-dimensional convolution and two-dimensional convolution, and extracting the three-dimensional network characteristics and the two-dimensional network characteristics of the non-empty voxels;
and obtaining the voxel characteristics and the pixel characteristics corresponding to the laser points in the non-empty voxels according to the corresponding relationship between the laser points in the non-empty voxels and the three-dimensional network characteristics and the two-dimensional network characteristics.
3. The method according to claim 1, wherein the determining the target to be identified in the point cloud data according to the fusion feature of the laser point specifically comprises:
identifying a foreground point from the fusion characteristics of the laser point through a deep learning network, and determining target prediction information of the foreground point;
screening a preset number of foreground points as main foreground points by using a farthest point sampling mode;
and determining the recognition result of the target to be recognized according to the target prediction information of the main foreground point.
4. The method as claimed in claim 3, wherein the determining the recognition result of the target to be recognized according to the target prediction information of the primary foreground point specifically includes:
aiming at each main foreground point, according to the position information of the target central point in the target prediction information, determining a set of the main foreground points of which the distance between the target central point and the target central point of the main foreground point is less than a preset distance threshold;
determining a first target prediction result according to the target prediction information of each main foreground point in the set;
and clustering the first target prediction results, and determining a second target prediction result according to the first target prediction results of the same class as a target recognition result of the target to be recognized.
5. The method of claim 4, wherein determining the second target prediction based on the first target prediction of the same class comprises:
if the first target prediction results of the same class comprise more than one first target prediction result, determining the average score of the prediction scores in the target prediction information of the main foreground point corresponding to the first target prediction results of the class;
and determining a second target prediction result according to the first target prediction result with the average score higher than a preset first score threshold.
6. The method according to any one of claims 3 to 5, wherein the determining the target prediction information of the foreground point specifically comprises:
and determining central point position information, size information, angle information and a prediction score of a target corresponding to the foreground point as target prediction information, wherein the central point position information is the offset of the foreground point relative to the target central point.
7. The method as claimed in claim 6, wherein the filtering a preset number of foreground points as the main foreground points by using a farthest point sampling method specifically includes:
deleting foreground points with prediction scores lower than a preset second score threshold in the target prediction information;
determining the central point position of a target corresponding to a foreground point according to the position information in the original features of the foreground point and the offset of the foreground point in the target prediction information relative to the target central point;
and screening a preset number of foreground points as main foreground points by utilizing a mode of sampling the farthest points according to the central point position of the target corresponding to the foreground points.
8. The method according to claim 1, wherein the extracting voxel characteristics of non-empty voxels specifically includes:
and extracting the voxel characteristics of the non-empty voxels by using a multi-layer perceptron (MLP) network and/or a convolutional neural network (CNN).
9. An object detection apparatus based on point cloud data, comprising:
the voxelization module is used for voxelizing the point cloud data and extracting the voxel characteristics of non-empty voxels;
the characteristic acquisition module is used for acquiring the voxel characteristic and the pixel characteristic corresponding to the laser point in the non-empty voxel according to the voxel characteristic extracted by the voxelization module;
the fusion module is used for obtaining the fusion characteristic of the laser point according to the original characteristic of the laser point, the voxel characteristic of the non-empty voxel extracted by the voxelization module, and the voxel characteristic and the pixel characteristic which are corresponding to the laser point and obtained by the characteristic obtaining module;
and the target identification module is used for determining a target to be identified in the point cloud data according to the fusion characteristics of the laser points obtained by the fusion module.
10. A computer program product with object detection functionality, comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method for object detection based on point cloud data according to any of claims 1 to 8.
CN202110174725.2A 2021-02-08 2021-02-08 Target detection method and device based on point cloud data Active CN114913331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110174725.2A CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110174725.2A CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Publications (2)

Publication Number Publication Date
CN114913331A (en) 2022-08-16
CN114913331B (en) 2024-09-20

Family

ID=82760874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110174725.2A Active CN114913331B (en) 2021-02-08 2021-02-08 Target detection method and device based on point cloud data

Country Status (1)

Country Link
CN (1) CN114913331B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816050A (en) * 2019-02-23 2019-05-28 深圳市商汤科技有限公司 Object pose estimation method and device
CN111199206A (en) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 Three-dimensional target detection method and device, computer equipment and storage medium
US20200191971A1 (en) * 2018-12-17 2020-06-18 National Chung-Shan Institute Of Science And Technology Method and System for Vehicle Detection Using LIDAR
CN111476242A (en) * 2020-03-31 2020-07-31 北京经纬恒润科技有限公司 Laser point cloud semantic segmentation method and device
WO2020207166A1 (en) * 2019-04-11 2020-10-15 腾讯科技(深圳)有限公司 Object detection method and apparatus, electronic device, and storage medium
CN111951368A (en) * 2020-08-31 2020-11-17 广州大学 Point cloud, voxel and multi-view fusion deep learning method
CN112052860A (en) * 2020-09-11 2020-12-08 中国人民解放军国防科技大学 Three-dimensional target detection method and system
US20210012555A1 (en) * 2019-07-08 2021-01-14 Waymo Llc Processing point clouds using dynamic voxelization

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200191971A1 (en) * 2018-12-17 2020-06-18 National Chung-Shan Institute Of Science And Technology Method and System for Vehicle Detection Using LIDAR
CN109816050A (en) * 2019-02-23 2019-05-28 深圳市商汤科技有限公司 Object pose estimation method and device
WO2020207166A1 (en) * 2019-04-11 2020-10-15 腾讯科技(深圳)有限公司 Object detection method and apparatus, electronic device, and storage medium
US20210012555A1 (en) * 2019-07-08 2021-01-14 Waymo Llc Processing point clouds using dynamic voxelization
CN111199206A (en) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 Three-dimensional target detection method and device, computer equipment and storage medium
CN111476242A (en) * 2020-03-31 2020-07-31 北京经纬恒润科技有限公司 Laser point cloud semantic segmentation method and device
CN111951368A (en) * 2020-08-31 2020-11-17 广州大学 Point cloud, voxel and multi-view fusion deep learning method
CN112052860A (en) * 2020-09-11 2020-12-08 中国人民解放军国防科技大学 Three-dimensional target detection method and system

Also Published As

Publication number Publication date
CN114913331B (en) 2024-09-20

Similar Documents

Publication Publication Date Title
Simon et al. Complexer-yolo: Real-time 3d object detection and tracking on semantic point clouds
CN111626217A (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
CN112287860B (en) Training method and device of object recognition model, and object recognition method and system
CN114565900A (en) Target detection method based on improved YOLOv5 and binocular stereo vision
CN113902897A (en) Training of target detection model, target detection method, device, equipment and medium
CN112825192B (en) Object identification system and method based on machine learning
CN113378760A (en) Training target detection model and method and device for detecting target
CN112711034B (en) Object detection method, device and equipment
CN113408324A (en) Target detection method, device and system and advanced driving assistance system
CN113095152A (en) Lane line detection method and system based on regression
WO2021056516A1 (en) Method and device for target detection, and movable platform
CN110619299A (en) Object recognition SLAM method and device based on grid
CN113255444A (en) Training method of image recognition model, image recognition method and device
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN115100616A (en) Point cloud target detection method and device, electronic equipment and storage medium
CN117576665B (en) Automatic driving-oriented single-camera three-dimensional target detection method and system
CN114662600A (en) Lane line detection method and device and storage medium
CN116778262B (en) Three-dimensional target detection method and system based on virtual point cloud
CN117789160A (en) Multi-mode fusion target detection method and system based on cluster optimization
CN113255779A (en) Multi-source perception data fusion identification method and system and computer readable storage medium
CN112529917A (en) Three-dimensional target segmentation method, device, equipment and storage medium
CN114913331B (en) Target detection method and device based on point cloud data
CN114863108B (en) Method, system, electronic equipment and computer readable storage medium for point cloud processing
CN116343143A (en) Target detection method, storage medium, road side equipment and automatic driving system
CN115468576A (en) Automatic driving positioning method and system based on multi-mode data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20230627
Address after: Room 437, Floor 4, Building 3, No. 969, Wenyi West Road, Wuchang Subdistrict, Yuhang District, Hangzhou City, Zhejiang Province
Applicant after: Wuzhou Online E-Commerce (Beijing) Co.,Ltd.
Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK
Applicant before: ALIBABA GROUP HOLDING Ltd.
GR01 Patent grant
GR01 Patent grant