CN115546785A - Three-dimensional target detection method and device

Info

Publication number
CN115546785A
CN115546785A
Authority
CN
China
Prior art keywords
target
detected
points
determining
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211504429.5A
Other languages
Chinese (zh)
Inventor
张天奇
曹容川
陈博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202211504429.5A priority Critical patent/CN115546785A/en
Publication of CN115546785A publication Critical patent/CN115546785A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/64 - Three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Abstract

The application discloses a three-dimensional target detection method and device. The method includes: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; performing feature extraction on the plurality of second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected. The method and device solve the technical problem in the related art that the target size is uncertain because the candidate region cannot be accurately judged.

Description

Three-dimensional target detection method and device
Technical Field
The application relates to the technical field of automatic driving, in particular to a three-dimensional target detection method and device.
Background
Currently, autonomous driving technology has gradually become the focus of future vehicle development, and a crucial task in autonomous driving is estimating the state of surrounding objects in complex real-world environments.
Generally, the related art employs point-cloud-based three-dimensional target detection. In view of the irregular distribution of point cloud data, a voxelization method may be adopted during detection, i.e., the point cloud is converted into voxels with hand-crafted features to realize three-dimensional target detection; however, the limited generalization ability of hand-crafted features restricts their real performance in the complex real world. Alternatively, the point cloud may be converted into multiple views in two-dimensional space, but constructing the multiple views usually causes information loss.
Therefore, whether the point cloud is converted into a regular voxel grid or projected onto a two-dimensional space, quantization errors are introduced that limit performance, so that the prediction error is large and the problem of size ambiguity arises.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a three-dimensional target detection method and device, which aim to at least solve the technical problem in the related art that the target size is uncertain because the candidate region cannot be accurately judged.
According to an aspect of the embodiments of the present application, there is provided a three-dimensional target detection method, including: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
Optionally, normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain the plurality of second points to be detected includes: establishing a normalized coordinate system with the center of the three-dimensional candidate region as the origin, the heading direction as the x-axis, the direction horizontally orthogonal to the heading direction as the y-axis, and the vertically upward direction as the z-axis, and normalizing the plurality of first points to be detected to obtain the plurality of second points to be detected.
Optionally, performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud includes: acquiring a plurality of target sub-features of the plurality of second points to be detected; and determining the target features of the original point cloud based on the plurality of target sub-features.
Optionally, determining the target detection result of the target to be detected based on the target features includes: calculating the classification loss of the target features based on a classification loss function and determining the target category of the target to be detected; calculating the regression loss of the target features based on a regression loss function and determining the target pose of the target to be detected; and determining the target detection result of the target to be detected according to the target category and the target pose.
Optionally, calculating the classification loss of the target features based on the classification loss function and determining the target category of the target to be detected includes: determining the number of positive samples in the original point cloud, wherein positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; determining the probability of the target category to which the original point cloud belongs using a probability estimation model; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Optionally, calculating the regression loss of the target features based on a regression loss function and determining the target pose of the target to be detected includes: calculating the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction according to the pose coordinates of the second point to be detected, wherein the pose coordinates include position information, size information, and orientation information; calculating the regression loss of the target features based on the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction; and determining the target pose of the target to be detected according to the regression loss.
According to another aspect of the embodiments of the present application, there is also provided a three-dimensional target detection apparatus, including: an acquisition module, configured to acquire an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; a processing module, configured to normalize the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; an extraction module, configured to perform feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud; and a determining module, configured to determine a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
According to another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium including a stored program, wherein a device in which the non-volatile storage medium is located executes the above three-dimensional target detection method by running the stored program.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a memory in which a computer program is stored, and a processor configured to execute the above-described three-dimensional object detection method by the computer program.
In the embodiment of the application, an original point cloud in a three-dimensional candidate region where a target to be detected is located is acquired, wherein the original point cloud comprises a plurality of first points to be detected; the plurality of first points to be detected within the three-dimensional candidate region are normalized to obtain a plurality of second points to be detected; feature extraction is performed on the plurality of second points to be detected to obtain the target features of the original point cloud; and a target detection result of the target to be detected is determined based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected. Because the distribution of the original point cloud is irregular, normalizing the first points to be detected in the original point cloud avoids further quantization error, thereby solving the technical problem in the related art that the target size is uncertain because the candidate region cannot be accurately judged.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flowchart of a three-dimensional target detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative PointNet structure according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a three-dimensional object detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For a better understanding of the embodiments of the present application, some of the terms or expressions appearing in the course of describing the embodiments of the present application are to be interpreted as follows:
object Detection (OD) is a behavioral action to find and mark out a region of interest from a video or picture, and can identify and locate objects of a specific category by extracting features through an algorithm.
Example 1
Autonomous driving has long been a popular research direction in the vehicle field. Current autonomous driving systems can be divided into two types: unmanned driving, which focuses on fully driverless vehicle operation to save drivers' labor cost, and ADAS (Advanced Driver Assistance System), which focuses on assisting the driver to reduce driving stress and improve vehicle safety. Both use various sensors mounted on the vehicle to collect data, combine those data with map data for system computation, identify the type, accurate position, and orientation of obstacles, and thereby guarantee that a reasonable driving route is planned and the vehicle is controlled to safely reach a preset pose.
Autonomous vehicles typically use LiDAR (Light Detection and Ranging) and multiple cameras mounted at different poses on the vehicle to collect perception data, and then analyze and detect the collected visual data to locate targets such as lanes, vehicles, and pedestrians. With the rapid development of deep learning and artificial intelligence, the ability of computers to analyze image data has improved remarkably compared with traditional methods.
In intelligent driving systems, mainstream sensor solutions include laser radar, cameras, and millimeter-wave radar. Laser radar offers three-dimensional modeling, a wide detection range, and high detection accuracy, so three-dimensional target detection based on laser radar point clouds is a popular research direction. However, it still faces two problems: first, the data output by a laser radar sensor is a sparse point cloud and lacks the dense features of images output by a camera; second, the point cloud output by the laser radar is three-dimensional data, and compressing it to a two-dimensional projection reduces the high computational cost of three-dimensional laser radar data but usually causes information loss.
To solve the above problems, the present application provides an embodiment of a three-dimensional target detection method. Specifically, an unfiltered original point cloud is used to reflect the spatial position of the target to be detected more accurately; because the original point cloud is irregularly distributed, the first points to be detected in the original point cloud are normalized to avoid further quantization error; and, to address the size-ambiguity problem that may arise during target detection, different anchor points are set for different categories of targets so as to refine the predicted values of the target detection result.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than here.
FIG. 1 is a schematic flowchart of an alternative three-dimensional target detection method according to an embodiment of the present application. As shown in FIG. 1, the method at least includes steps S102-S108, where:
Step S102: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected.
Optionally, the original point cloud in the three-dimensional candidate region where the target to be detected is located may be acquired by a laser radar, a 3D camera, or the like; since a laser radar can accurately measure the distance between the sensor and an obstacle and provides rich geometric, shape, and scale information, the laser radar is the preferred mode.
Although the acquired original point cloud can reflect the most accurate position information, the point cloud often reflects only the features of the plurality of first points to be detected while ignoring the space between them; basic information such as the scale of the target to be detected is encoded by that space, which causes quantization error in the predicted values.
For this reason, the embodiment of the present application may obtain the refined predicted value through steps S104 to S108.
Step S104: normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
Specifically, in order to improve the recall rate and detection accuracy of target detection, the length and width of the three-dimensional candidate region bounding box may be expanded. In addition, because the coordinates of a first point to be detected are based on the pixel values of the original image of the target to be detected, the size of the target to be detected needs to be determined in advance during target detection; if the original image is scaled, the size information of the target to be detected cannot be accurately determined. Each first point to be detected (x, y, z) is therefore converted into the normalized coordinate system to obtain a second point to be detected (x', y', z'), where the normalized coordinate system takes the heading direction as the x-axis, the direction horizontally orthogonal to the heading direction as the y-axis, and the vertically upward direction as the z-axis.
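As an illustration of this step, the box-relative normalization can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the candidate box is described by its center (cx, cy, cz) and a heading angle theta about the vertical axis, and the function and variable names are hypothetical rather than taken from the patent.

```python
import numpy as np

def normalize_points(points: np.ndarray, center, heading: float) -> np.ndarray:
    """Map raw first points (N, 3) into a box-centered frame whose x-axis
    is the heading direction and whose z-axis points vertically up."""
    cos_t, sin_t = np.cos(heading), np.sin(heading)
    # Translate so the candidate-box center becomes the origin.
    shifted = points - np.asarray(center, dtype=np.float64)
    # Rotate about z by -heading so the heading direction aligns with +x.
    rot = np.array([[cos_t,  sin_t, 0.0],
                    [-sin_t, cos_t, 0.0],
                    [0.0,    0.0,   1.0]])
    return shifted @ rot.T
```

Expressing every candidate box in such a canonical frame removes the dependence on where and how the box sits in the lidar frame, which is what lets the subsequent network reason about size without ambiguity.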
Step S106: performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud.
Optionally, a plurality of target sub-features of a plurality of second points to be detected are obtained; based on the plurality of target sub-features, a target feature of the original point cloud is determined.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) with three fully connected layers and a max-pooling operator for feature aggregation.
The two spatial transformation networks in PointNet are: an input transform network, which aligns the point cloud in three-dimensional space; intuitively, it changes the pose of the input so that the new pose is more favorable for classification, for example by rotating the target to be detected to face front; and a feature transform network, which aligns the extracted 64-dimensional features, i.e., transforms the point cloud at the feature level. The multi-layer perceptron is implemented as convolutions with shared weights: the first-layer convolution kernel has size 1×3 (the dimension of each point being x, y, z), and each subsequent kernel has size 1×1, i.e., the feature extraction layer connects each point only with itself. After the two spatial transformation networks and two multi-layer perceptrons, a 512-dimensional feature is extracted for each point; the max-pooling operator aggregates the N×512 per-point features into a global feature, and the last multi-layer perceptron yields probability scores over 256 categories. Finally, the target features of the original point cloud are fed into two branches: one classifies the target to be detected, i.e., 256×(C+1), and the other regresses the pose of the target to be detected, i.e., 256×7, where 7 denotes 7 degrees of freedom: three-dimensional coordinates, dimensions, and rotation angle.
Specifically, as shown in FIG. 2, to eliminate the size-ambiguity problem, the plurality of first points to be detected in the original point cloud are normalized, and the plurality of normalized second points to be detected are used as input, where N denotes the number of second points to be detected and 3 denotes the dimension of a coordinate; target sub-features of the plurality of second points to be detected are extracted from the input data by the multi-layer perceptron; and the max-pooling operator is applied over each feature dimension to obtain the target features of the original point cloud.
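The following PyTorch sketch mirrors the shared-MLP-plus-max-pooling structure described above. It is an illustration rather than the patent's exact network: the two transform networks are omitted, and the layer widths, class handling, and names (PointFeatureNet, cls_head, reg_head) are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class PointFeatureNet(nn.Module):
    """PointNet-style extractor: per-point shared MLP realized as 1x1
    convolutions, max pooling over points, then two output branches."""
    def __init__(self, num_classes: int):
        super().__init__()
        # The first layer consumes the (x, y, z) triple of each point;
        # later 1x1 layers never mix information across points.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, kernel_size=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=1), nn.ReLU(),
            nn.Conv1d(128, 512, kernel_size=1), nn.ReLU(),
        )
        self.cls_head = nn.Linear(512, num_classes + 1)  # C classes + background
        self.reg_head = nn.Linear(512, 7)                # x, y, z, l, w, h, theta

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3) normalized second points to be detected
        feats = self.mlp(points.transpose(1, 2))  # (B, 512, N) per-point features
        global_feat = feats.max(dim=2).values     # (B, 512) max pool over points
        return self.cls_head(global_feat), self.reg_head(global_feat)

# Usage: logits, pose = PointFeatureNet(num_classes=3)(torch.randn(2, 1024, 3))
```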
Step S108: determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected.
As an optional implementation, the classification loss of the target features is calculated based on a classification loss function, and the target category of the target to be detected is determined; the regression loss of the target features is calculated based on a regression loss function, and the target pose of the target to be detected is determined; and the target detection result of the target to be detected is determined according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, where positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; the probability of the target category to which the original point cloud belongs is determined using a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
For example, the intersection-over-union of each second point to be detected is calculated; second points to be detected above a preset threshold are taken as positive samples and those below it as negative samples, where different preset thresholds are set for different categories of targets and may be chosen according to the actual situation. PointNet outputs the predicted probability of the target category to which the original point cloud belongs, and the classification loss is calculated by a classification loss function, which may be a cross-entropy loss function, for example of the form

L_cls = -(1/B) Σ_i [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]

where B is the number of all second points to be detected in the original point cloud, y_i is the true category of the target to be detected, and p_i is the probability of the target category to which the original point cloud belongs as predicted by PointNet. The smaller the classification loss, the smaller the uncertainty and the better the classification result.
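A direct numpy rendering of this classification loss is sketched below; the binary form mirrors the cross-entropy expression given above, and the variable names (labels, probs) are hypothetical.

```python
import numpy as np

def classification_loss(labels: np.ndarray, probs: np.ndarray,
                        eps: float = 1e-7) -> float:
    """Cross-entropy over B second points to be detected.
    labels: (B,) 1 for positive samples (IoU above the preset threshold), else 0.
    probs:  (B,) predicted probability of the target category."""
    probs = np.clip(probs, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.mean(labels * np.log(probs)
                          + (1.0 - labels) * np.log(1.0 - probs)))
```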
It should be noted that the target anchor point corresponding to the target category may be determined according to the target category of the target to be detected, for example via a per-category lookup as sketched below.
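One minimal realization of per-category anchors is a dictionary lookup. The sizes below are illustrative values of the kind commonly used for cars, pedestrians, and cyclists in lidar detection; they are not taken from the patent.

```python
# Hypothetical per-category anchor sizes (length, width, height) in meters.
ANCHOR_SIZES = {
    "car":        (3.9, 1.6, 1.56),
    "pedestrian": (0.8, 0.6, 1.73),
    "cyclist":    (1.76, 0.6, 1.73),
}

def anchor_for(category: str) -> tuple:
    """Return the target anchor size associated with a target category."""
    return ANCHOR_SIZES[category]
```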
Optionally, the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction are calculated according to the pose coordinates of the second point to be detected, where the pose coordinates include position information, size information, and orientation information; the regression loss of the target features is calculated based on the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction; and the target pose of the target to be detected is determined according to the regression loss.
For example, for a positive sample, the pose coordinates of the first point to be detected, e.g., (x, y, z, l, w, h, θ), are first converted into the normalized coordinate system to obtain the pose coordinates of the second point to be detected, (x', y', z', l', w', h', θ'). The regression loss of the target size and the regression loss of the target position are then each calculated by a regression loss function. Specifically, the minimum residual of the target to be detected between the original direction and the flipped direction is calculated, and the regression loss of the target direction is determined according to this minimum residual, with the two cases of the residual handled by separate formulas. [The normalization transformation and the loss formulas appear in the original only as equation images.]
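One common way to realize the flipped-direction minimum residual is sketched below. Because the patent's own case formulas are not reproduced, the modulo-pi wrapping here is an assumption about their intent: a box and its 180-degree-flipped twin should incur the same direction loss.

```python
import numpy as np

def direction_residual(theta_pred: float, theta_gt: float) -> float:
    """Minimum angle residual over the original and flipped (theta + pi)
    directions, wrapped into [0, pi/2]."""
    diff = abs(theta_pred - theta_gt) % np.pi  # flipping by pi costs nothing
    return min(diff, np.pi - diff)
```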
Having calculated the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction as above, the regression loss of the target features may be calculated by a regression loss function, where the regression loss function may be the stable Smooth L1, which takes the standard form

smooth_L1(x) = 0.5 x², if |x| < 1; |x| - 0.5, otherwise.

Finally, the target pose of the target to be detected is determined according to the regression loss of the target features.
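A sketch of the Smooth L1 loss and of combining the three regression terms follows; summing the residuals with equal weight is an assumption, since the patent does not state how the size, position, and direction terms are aggregated.

```python
import numpy as np

def smooth_l1(x: np.ndarray) -> np.ndarray:
    """Stable Smooth L1: quadratic near zero, linear in the tails."""
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x ** 2, absx - 0.5)

def regression_loss(size_res, pos_res, dir_res) -> float:
    """Total Smooth L1 loss over size, position, and direction residuals."""
    residuals = np.concatenate([np.atleast_1d(size_res),
                                np.atleast_1d(pos_res),
                                np.atleast_1d(dir_res)]).astype(float)
    return float(np.sum(smooth_l1(residuals)))
```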
The overall loss function can also be determined as a weighted combination of the classification loss and the regression loss:

L = λ1 · L_cls + λ2 · L_reg

where λ1 and λ2 are weighting parameters that can be set according to the actual situation. The smaller the calculated value of the overall loss function L, the smaller the inconsistency between the true value and the predicted value, i.e., the closer the predicted value is to the true value.
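In code the combination reduces to a one-line weighted sum; the default weights below are placeholders, since λ1 and λ2 are left to be set according to the actual situation.

```python
def total_loss(cls_loss: float, reg_loss: float,
               lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """Weighted combination of the classification and regression losses."""
    return lambda1 * cls_loss + lambda2 * reg_loss
```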
In the embodiment of the application, an original point cloud in a three-dimensional candidate region where a target to be detected is located is acquired, wherein the original point cloud comprises a plurality of first points to be detected; the plurality of first points to be detected within the three-dimensional candidate region are normalized to obtain a plurality of second points to be detected; feature extraction is performed on the plurality of second points to be detected to obtain the target features of the original point cloud; and a target detection result of the target to be detected is determined based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected. Because the distribution of the original point cloud is irregular, normalizing the first points to be detected in the original point cloud avoids further quantization error, thereby solving the technical problem in the related art that the target size is uncertain because the candidate region cannot be accurately judged.
Example 2
According to an embodiment of the present application, there is also provided a three-dimensional object detection apparatus for implementing the above three-dimensional object detection method. As shown in FIG. 3, the apparatus at least includes an obtaining module 31, a processing module 32, an extracting module 33, and a determining module 34, where:
the acquiring module 31 is configured to acquire an original point cloud in a three-dimensional candidate region where the target to be detected is located, where the original point cloud includes a plurality of first points to be detected.
Optionally, the original point cloud in the three-dimensional candidate region where the target to be detected is located may be acquired by a laser radar, a 3D camera, or the like; since a laser radar can accurately measure the distance between the sensor and an obstacle and provides rich geometric, shape, and scale information, the laser radar is the preferred mode.
Although the acquired original point cloud can reflect the most accurate position information, the point cloud often reflects only the features of the plurality of first points to be detected while ignoring the space between them; basic information such as the scale of the target to be detected is encoded by that space, which causes quantization error in the predicted values.
For this reason, the embodiment of the present application may obtain the refined predicted value through the following processing module 32, the extraction module 33, and the determination module 34.
The processing module 32 is configured to perform normalization processing on the plurality of first points to be detected within the three-dimensional candidate region range to obtain a plurality of second points to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
Specifically, in order to improve the recall rate and detection accuracy of target detection, the length and width of the three-dimensional candidate region bounding box may be expanded. In addition, because the coordinates of a first point to be detected are based on the pixel values of the original image of the target to be detected, the size of the target to be detected needs to be determined in advance during target detection; if the original image is scaled, the size information of the target to be detected cannot be accurately determined. Each first point to be detected (x, y, z) is therefore converted into the normalized coordinate system to obtain a second point to be detected (x', y', z'), where the normalized coordinate system takes the heading direction as the x-axis, the direction horizontally orthogonal to the heading direction as the y-axis, and the vertically upward direction as the z-axis.
And the extraction module 33 is configured to perform feature extraction on the plurality of second points to be detected to obtain target features of the original point cloud.
Optionally, a plurality of target sub-features of the plurality of second points to be detected are acquired, and the target features of the original point cloud are determined based on the plurality of target sub-features.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) with three fully connected layers and a max-pooling operator for feature aggregation.
The two spatial transformation networks in PointNet are: an input transform network, which aligns the point cloud in three-dimensional space; intuitively, it changes the pose of the input so that the new pose is more favorable for classification, for example by rotating the target to be detected to face front; and a feature transform network, which aligns the extracted 64-dimensional features, i.e., transforms the point cloud at the feature level. The multi-layer perceptron is implemented as convolutions with shared weights: the first-layer convolution kernel has size 1×3 (the dimension of each point being x, y, z), and each subsequent kernel has size 1×1, i.e., the feature extraction layer connects each point only with itself. After the two spatial transformation networks and two multi-layer perceptrons, a 1024-dimensional feature is extracted for each point; the max-pooling operator aggregates the overall features into a 1×1024 global feature, and the last multi-layer perceptron yields probability scores over k categories. Finally, the target features of the original point cloud are fed into two branches, one for classifying the target to be detected and the other for regressing the pose of the target to be detected.
The determining module 34 is configured to determine a target detection result of the target to be detected based on the target feature, where the target detection result includes a target category and a target pose of the target to be detected.
As an optional implementation, the classification loss of the target features is calculated based on a classification loss function, and the target category of the target to be detected is determined; the regression loss of the target features is calculated based on a regression loss function, and the target pose of the target to be detected is determined; and the target detection result of the target to be detected is determined according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, where positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; the probability of the target category to which the original point cloud belongs is determined using a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
It should be noted that the target anchor point corresponding to the target type may be determined according to the target type of the target to be detected.
Optionally, the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction are calculated according to the pose coordinates of the second point to be detected, where the pose coordinates include: pose information, size information, and orientation information; calculating a regression loss of the target feature based on the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction; and determining the target pose of the target to be detected according to the regression loss.
It should be noted that each module of the three-dimensional object detection apparatus in the embodiment of the present application corresponds one-to-one to an implementation step of the three-dimensional object detection method in Embodiment 1; since the detailed description has already been given in Embodiment 1, details not covered in this embodiment may refer to Embodiment 1 and are not repeated here.
Example 3
According to an embodiment of the present application, there is also provided a nonvolatile storage medium including a stored program, wherein, when the program runs, a device in which the nonvolatile storage medium is located is controlled to execute the three-dimensional object detection method in embodiment 1.
Optionally, the device in which the non-volatile storage medium is located executes the following steps by running the program: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) with three fully connected layers and a max-pooling operator for feature aggregation.
As an optional implementation, the classification loss of the target features is calculated based on a classification loss function, and the target category of the target to be detected is determined; the regression loss of the target features is calculated based on a regression loss function, and the target pose of the target to be detected is determined; and the target detection result of the target to be detected is determined according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, where positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; the probability of the target category to which the original point cloud belongs is determined using a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Example 4
According to an embodiment of the present application, there is also provided a processor configured to execute a program, where the program executes the three-dimensional object detection method in embodiment 1 when running.
Optionally, when the program runs, the following steps are performed: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) with three fully connected layers and a max-pooling operator for feature aggregation.
As an optional implementation, the classification loss of the target features is calculated based on a classification loss function, and the target category of the target to be detected is determined; the regression loss of the target features is calculated based on a regression loss function, and the target pose of the target to be detected is determined; and the target detection result of the target to be detected is determined according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, where positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; the probability of the target category to which the original point cloud belongs is determined using a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Example 5
According to an embodiment of the present application, there is also provided an electronic device including: a memory in which a computer program is stored, and a processor configured to execute the three-dimensional object detection method in embodiment 1 by the computer program.
Optionally, when the computer program runs, the following steps are performed: acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected; performing feature extraction on the plurality of second points to be detected to obtain the target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) with three fully connected layers and a max-pooling operator for feature aggregation.
As an optional implementation, the classification loss of the target features is calculated based on a classification loss function, and the target category of the target to be detected is determined; the regression loss of the target features is calculated based on a regression loss function, and the target pose of the target to be detected is determined; and the target detection result of the target to be detected is determined according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, where positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold; the probability of the target category to which the original point cloud belongs is determined using a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, or portions or all or portions of the technical solutions that contribute to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that, as will be apparent to those skilled in the art, numerous modifications and adaptations can be made without departing from the principles of the present application and such modifications and adaptations are intended to be considered within the scope of the present application.

Claims (10)

1. A three-dimensional target detection method is characterized by comprising the following steps:
acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected;
normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected;
performing feature extraction on the second points to be detected to obtain the target features of the original point cloud;
and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises a target category and a target pose of the target to be detected.
2. The method according to claim 1, wherein normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected comprises:
and establishing a normalized coordinate system by taking the center of the three-dimensional candidate area as an origin, taking the course direction as an x axis, taking the direction horizontally orthogonal to the course direction as a y axis, and taking the vertical direction as a z axis, and performing normalization processing on the plurality of first points to be detected to obtain a plurality of second points to be detected.
3. The method according to claim 1, wherein performing feature extraction on the second points to be detected to obtain target features of the original point cloud comprises:
acquiring a plurality of target sub-features of a plurality of second points to be detected;
determining a target feature of the original point cloud based on a plurality of the target sub-features.
4. The method according to claim 1, wherein determining the target detection result of the target to be detected based on the target feature comprises:
calculating the classification loss of the target features based on a classification loss function, and determining the target category of the target to be detected;
calculating the regression loss of the target features based on a regression loss function, and determining the target pose of the target to be detected;
and determining a target detection result of the target to be detected according to the target category and the target pose.
5. The method according to claim 4, wherein calculating the classification loss of the target features based on a classification loss function and determining the target category of the target to be detected comprises:
determining the number of positive samples in the original point cloud, wherein the positive samples are second points to be detected whose intersection-over-union is greater than a preset threshold;
determining the probability of the target category to which the original point cloud belongs by adopting a probability estimation model;
calculating a classification loss of the target feature based on the number of positive samples and the probability;
and determining the target category of the target to be detected according to the classification loss.
6. The method of claim 5, comprising:
and determining a target anchor point corresponding to the target category according to the target category of the target to be detected.
7. The method according to claim 2, wherein calculating the regression loss of the target feature based on a regression loss function, and determining the target pose of the target to be detected comprises:
calculating the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction according to the pose coordinates of the second point to be detected, wherein the pose coordinates include: position information, size information, and orientation information;
calculating a regression loss for the target feature based on the regression loss for the target dimension, the regression loss for the target location, and the regression loss for the target orientation;
and determining the target pose of the target to be detected according to the regression loss.
8. A three-dimensional object detecting device, comprising:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, and the original point cloud comprises a plurality of first points to be detected;
the processing module is used for normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected;
the extraction module is used for extracting the features of the second points to be detected to obtain the target features of the original point cloud;
and the determining module is used for determining a target detection result of the target to be detected based on the target characteristics, wherein the target detection result comprises a target category and a target pose of the target to be detected.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored program, wherein a device on which the non-volatile storage medium resides executes the three-dimensional object detection method according to any one of claims 1 to 7 by running the stored program.
10. An electronic device, comprising: a memory in which a computer program is stored and a processor configured to execute the three-dimensional object detection method of any one of claims 1 to 7 by the computer program.
CN202211504429.5A 2022-11-29 2022-11-29 Three-dimensional target detection method and device Pending CN115546785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211504429.5A CN115546785A (en) 2022-11-29 2022-11-29 Three-dimensional target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211504429.5A CN115546785A (en) 2022-11-29 2022-11-29 Three-dimensional target detection method and device

Publications (1)

Publication Number Publication Date
CN115546785A true CN115546785A (en) 2022-12-30

Family

ID=84722303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211504429.5A Pending CN115546785A (en) 2022-11-29 2022-11-29 Three-dimensional target detection method and device

Country Status (1)

Country Link
CN (1) CN115546785A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199206A (en) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 Three-dimensional target detection method and device, computer equipment and storage medium
CN112287939A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN114509785A (en) * 2022-02-16 2022-05-17 中国第一汽车股份有限公司 Three-dimensional object detection method, device, storage medium, processor and system


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication

Application publication date: 20221230