CN115546785A - Three-dimensional target detection method and device - Google Patents
- Publication number: CN115546785A
- Application number: CN202211504429.5A
- Authority
- CN
- China
- Prior art keywords: target, detected, points, determining, point cloud
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/64: Scenes; scene-specific elements; type of objects; three-dimensional objects
- G06V10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
- G06V20/56: Context or environment of the image exterior to a vehicle, by using sensors mounted on the vehicle
- G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection
Abstract
The application discloses a three-dimensional target detection method and device. The method comprises: acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected. The method and device solve the technical problem that the target size is uncertain because the related art cannot accurately judge the candidate area.
Description
Technical Field
The application relates to the technical field of automatic driving, and in particular to a three-dimensional target detection method and device.
Background
Currently, automatic driving technology has gradually become central to the development of future vehicle driving, and a crucial task in automatic driving is estimating the state of surrounding objects in a complex real-world environment.
Generally, the related art employs three-dimensional target detection based on point clouds. Given the irregular distribution of point cloud data, one approach in the detection process is voxel segmentation: the point cloud is converted into voxels with hand-crafted features to realize three-dimensional target detection, but the limited generalization ability of hand-crafted features restricts performance in the complex real world. Alternatively, the point cloud can be converted into multiple views in two-dimensional space, but constructing the multiple views usually causes data loss.
Therefore, whether the point cloud is converted into a regular voxel grid or projected onto a two-dimensional space, quantization errors limit performance, so that the error of the predicted value is large and the problem of size ambiguity arises.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a three-dimensional target detection method and device, aiming to solve at least the technical problem of target size uncertainty caused by the related art's inability to accurately judge the candidate area.
According to an aspect of the embodiments of the present application, there is provided a three-dimensional target detection method, comprising: acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizing the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected.
Optionally, normalizing the plurality of first points to be detected within the three-dimensional candidate area to obtain the plurality of second points to be detected comprises: establishing a normalized coordinate system with the center of the three-dimensional candidate area as the origin, the heading direction as the x-axis, the direction horizontally orthogonal to the heading as the y-axis, and the vertically upward direction as the z-axis, and normalizing the plurality of first points to be detected in this coordinate system to obtain the plurality of second points to be detected.
Optionally, the step of performing feature extraction on the second points to be detected to obtain target features of the original point cloud includes: acquiring a plurality of target sub-features of a plurality of second points to be detected; based on the plurality of target sub-features, a target feature of the original point cloud is determined.
Optionally, determining the target detection result of the target to be detected based on the target features comprises: calculating the classification loss of the target features based on a classification loss function and determining the target category of the target to be detected; calculating the regression loss of the target features based on a regression loss function and determining the target pose of the target to be detected; and determining the target detection result of the target to be detected according to the target category and the target pose.
Optionally, calculating the classification loss of the target features based on the classification loss function and determining the target category of the target to be detected comprises: determining the number of positive samples in the original point cloud, wherein a positive sample is a second point to be detected whose intersection-over-union ratio is greater than a preset threshold; determining, with a probability estimation model, the probability of the target category to which the original point cloud belongs; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Optionally, calculating the regression loss of the target features based on a regression loss function and determining the target pose of the target to be detected comprises: calculating the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction according to the pose coordinates of the second points to be detected, wherein the pose coordinates comprise position information, size information, and orientation information; calculating the regression loss of the target features based on the regression losses of the target size, the target position, and the target direction; and determining the target pose of the target to be detected according to the regression loss.
According to another aspect of the embodiments of the present application, there is also provided a three-dimensional target detection apparatus, comprising: an acquisition module for acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; a processing module for normalizing the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; an extraction module for performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and a determining module for determining a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected.
According to another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium comprising a stored program, wherein a device in which the non-volatile storage medium is located executes the above three-dimensional target detection method by running the stored program.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a memory in which a computer program is stored, and a processor configured to execute the above-described three-dimensional object detection method by the computer program.
In the embodiments of the application, the method acquires an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizes the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performs feature extraction on the second points to be detected to obtain target features of the original point cloud; and determines a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected. Because the distribution of the original point cloud is irregular, normalizing the first points to be detected in the original point cloud avoids further quantization error, thereby solving the technical problem of target size uncertainty caused by the related art's inability to accurately judge the candidate area.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flow chart of a three-dimensional target detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative PointNet configuration according to embodiments of the present application;
fig. 3 is a schematic structural diagram of a three-dimensional object detection apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For a better understanding of the embodiments of the present application, some of the terms or expressions appearing in the course of describing the embodiments of the present application are to be interpreted as follows:
object Detection (OD) is a behavioral action to find and mark out a region of interest from a video or picture, and can identify and locate objects of a specific category by extracting features through an algorithm.
Example 1
Automatic driving has long been a popular research direction in the vehicle field. Current automatic driving systems can be divided into two types: unmanned driving, which focuses on fully driverless operation to save driver labor cost, and ADAS (Advanced Driver Assistance System), which focuses on assisting the driver to reduce driving stress and improve vehicle safety. Both use various sensors mounted on the vehicle to collect data and combine the data with map data for system calculation, so that the type, accurate position, and orientation of an obstacle are identified, a reasonable driving route is planned, and the vehicle is controlled to safely reach a preset pose.
Autonomous vehicles typically use LiDAR (Light Detection and Ranging) and a plurality of cameras mounted at different poses on the vehicle to collect perception data, and then analyze the collected visual data to detect and locate targets such as lanes, vehicles, and pedestrians. With the rapid development of deep learning and artificial intelligence, the ability of computers to analyze image data has improved remarkably compared with traditional methods.
In smart driving systems, mainstream sensor solutions include laser radar (LiDAR), cameras, and millimeter-wave radar (RADAR). Laser radar offers three-dimensional modeling, a wide detection range, and high detection precision, so three-dimensional target detection based on laser radar point clouds is a popular research direction. However, it still faces problems. First, the data output by a laser radar sensor is a sparse point cloud that lacks the dense characteristics of a camera image. Second, the point cloud output by the laser radar is three-dimensional data; compressing it into a two-dimensional projection reduces the high computational cost of three-dimensional laser radar data, but this process usually causes information loss.
In order to solve the above problems, an embodiment of a three-dimensional target detection method is provided in the present application. Specifically, an unfiltered original point cloud is used to reflect the spatial position of the target to be detected more accurately; because the distribution of the original point cloud is irregular, the first points to be detected in the original point cloud are normalized to avoid further quantization error; and, for the size ambiguity that may arise during target detection, different anchor points are set for different categories of targets to refine the predicted value of the target detection result.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a schematic flowchart of an alternative three-dimensional target detection method according to an embodiment of the present application, and as shown in fig. 1, the method at least includes steps S102-S108, where:
step S102, acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected.
Optionally, the original point cloud in the three-dimensional candidate area where the target to be detected is located may be acquired with a laser radar, a 3D camera, or the like. Because the laser radar can accurately measure the distance between the sensor and an obstacle and provides rich geometric, shape, and scale information, it is the preferred mode.
The acquired original point cloud can reflect the most accurate position information. However, the point cloud often reflects only the characteristics of the individual first points to be detected and ignores the space between them, and because basic information such as the scale of the target to be detected is encoded by that space, a quantization error appears in the predicted value.
For this reason, the embodiment of the present application may obtain the refined predicted value through steps S104 to S108.
Step S104, carrying out normalization processing on a plurality of first points to be detected in the three-dimensional candidate area range to obtain a plurality of second points to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
Specifically, in order to improve the recall rate and detection accuracy of target detection, the length and width of the three-dimensional candidate area bounding box may be expanded. In addition, because the coordinates of the first points to be detected are based on the pixel values of the original image of the target to be detected, the size of the target must be known in advance during detection, and if the original image is scaled, the size information of the target cannot be accurately determined. The first points to be detected are therefore converted into the normalized coordinate system to obtain the second points to be detected, where the normalized coordinate system is established with the center of the three-dimensional candidate area as the origin, the heading direction as the x-axis, the direction horizontally orthogonal to the heading as the y-axis, and the vertically upward direction as the z-axis.
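The normalization of step S104 can be illustrated with a minimal sketch. The function below is a simplified, hypothetical implementation that assumes the center and heading angle of the candidate area are given; it is not the exact transformation of the embodiment:

```python
import math

def normalize_points(points, center, heading):
    """Map first points to be detected into the normalized frame of the
    candidate area: origin at the area center, x-axis along the heading
    direction, y-axis horizontally orthogonal to it, z-axis vertical."""
    cx, cy, cz = center
    c, s = math.cos(heading), math.sin(heading)
    normalized = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        # rotating by -heading makes the heading direction the +x axis
        normalized.append((dx * c + dy * s, -dx * s + dy * c, z - cz))
    return normalized
```

A point lying one metre ahead of the center along the heading maps to (1, 0, 0) regardless of the global heading angle, which removes the dependence of the coordinates on the global pose of the candidate area.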
And S106, extracting the features of the second points to be detected to obtain the target features of the original point cloud.
Optionally, a plurality of target sub-features of a plurality of second points to be detected are obtained; based on the plurality of target sub-features, a target feature of the original point cloud is determined.
As an optional implementation, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multi-layer perceptron (MLP) of three fully-connected layers and a max pooling operator for feature aggregation.
The two spatial transformation networks in PointNet are: an input transformation network, which aligns the point cloud in three-dimensional space; intuitively, it changes the pose so that the new pose is more favorable for classification, for example by rotating the target to be detected to face the front; and a feature transformation network, which aligns the extracted 64-dimensional features, i.e. transforms the point cloud at the feature level. The multi-layer perceptron is implemented with weight-sharing convolutions: the convolution kernel of the first layer is 1 x 3 (each point has the three dimensions x, y, z), and every subsequent kernel is 1 x 1, i.e. the feature extraction layers connect each point only with itself. After the two spatial transformation networks and two multi-layer perceptrons, a 512-dimensional feature is extracted for each point; the max pooling operator aggregates these N x 512 point features into a single 512-dimensional global feature, which the last multi-layer perceptron maps to a 256-dimensional descriptor. Finally, the target features of the original point cloud are fed into two branches: one classifies the target to be detected, i.e. 256 x (C+1), and the other performs pose regression of the target to be detected, i.e. 256 x 7, where 7 denotes the 7 degrees of freedom: the three-dimensional coordinates, the dimensions, and the rotation angle.
Specifically, as shown in fig. 2, to eliminate the size ambiguity problem, the plurality of first points to be detected in the original point cloud are normalized, and the plurality of normalized second points to be detected are used as input, where N denotes the number of second points to be detected and 3 the dimension of the coordinates; target sub-features of the second points to be detected are extracted from the input by the multi-layer perceptron; and the max pooling operator is applied over each feature dimension to obtain the target feature of the original point cloud.
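The pattern of per-point feature extraction followed by max pooling can be sketched in miniature. The toy functions below stand in for PointNet's shared MLP (reduced here to a single linear layer with ReLU, with hypothetical weights) and for its max pooling aggregation; they illustrate the mechanism only:

```python
def point_feature(point, weights):
    """Shared per-point layer: one linear map plus ReLU, a stand-in for
    the stacked 1x1-convolution perceptron layers of PointNet."""
    x, y, z = point
    return [max(0.0, wx * x + wy * y + wz * z + b) for wx, wy, wz, b in weights]

def global_feature(points, weights):
    """Aggregate the per-point sub-features with an element-wise max
    pooling operator, yielding one global feature for the point cloud."""
    feats = [point_feature(p, weights) for p in points]
    return [max(column) for column in zip(*feats)]
```

Because the maximum is taken independently in each feature dimension, the global feature does not depend on the order of the input points, which is why this style of aggregation suits an unordered point cloud.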
And S108, determining a target detection result of the target to be detected based on the target characteristics, wherein the target detection result comprises the target type and the target pose of the target to be detected.
As an optional implementation manner, the classification loss of the target features is calculated based on a classification loss function, and the target class of the target to be detected is determined; calculating the regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected; and determining a target detection result of the target to be detected according to the target category and the target pose.
Optionally, the number of positive samples in the original point cloud is determined, wherein a positive sample is a second point to be detected whose intersection-over-union ratio is greater than the preset threshold; the probability of the target category to which the original point cloud belongs is determined with a probability estimation model; the classification loss of the target features is calculated based on the number of positive samples and the probability; and the target category of the target to be detected is determined according to the classification loss.
For example, by calculating the intersection-over-union ratio of the second points to be detected, those above the preset threshold are determined to be positive samples and those below it negative samples, where different preset thresholds are set for different categories of targets and may be chosen according to the actual situation. PointNet outputs the predicted probability of the target category to which the original point cloud belongs, and the classification loss is calculated by a classification loss function, which may be a cross-entropy loss function of the form:

L_cls = -(1/B) * Σ_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

where B is the number of all second points to be detected in the original point cloud, y_i is the true category of the target to be detected, and p_i is the probability, output by PointNet, of the target category to which the original point cloud is predicted to belong. The smaller L_cls, the smaller the uncertainty and the better the classification result.
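A cross-entropy classification loss of this kind can be sketched as follows; the function assumes binary labels per point (1 for a positive sample, 0 for a negative one) and is an illustration rather than the exact loss of the embodiment:

```python
import math

def classification_loss(labels, probs, eps=1e-7):
    """Binary cross-entropy averaged over all B second points to be
    detected: labels are the true categories, probs the predicted
    probabilities output by the network."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

Confident, correct predictions drive the loss toward zero, while an uninformative prediction of 0.5 everywhere yields a loss of log 2 per point.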
It should be noted that the target anchor point corresponding to the target type may be determined according to the target type of the target to be detected.
Optionally, the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction are calculated according to the pose coordinates of the second points to be detected, wherein the pose coordinates comprise position information, size information, and orientation information; the regression loss of the target features is calculated based on these three regression losses; and the target pose of the target to be detected is determined according to the regression loss.
For example, for a positive sample, the pose coordinates of the first point to be detected, (x, y, z, l, w, h, θ), are first converted into the normalized coordinate system to obtain the pose coordinates of the second point to be detected, (x', y', z', l, w, h, θ'). Writing the center of the three-dimensional candidate area as (x_c, y_c, z_c) and its heading angle as θ_c, the normalization transformation can be expressed by the following formulas:

x' = (x - x_c) * cos(θ_c) + (y - y_c) * sin(θ_c)
y' = -(x - x_c) * sin(θ_c) + (y - y_c) * cos(θ_c)
z' = z - z_c
θ' = θ - θ_c
The regression loss of the target size is calculated by a regression loss function, which may take the form of a sum over the size residuals:

L_size = SmoothL1(l - l*) + SmoothL1(w - w*) + SmoothL1(h - h*)

where l, w, h are the predicted dimensions and l*, w*, h* the ground-truth dimensions.
The regression loss of the target position is calculated analogously over the position residuals:

L_loc = SmoothL1(x' - x'*) + SmoothL1(y' - y'*) + SmoothL1(z' - z'*)

where (x', y', z') are the predicted and (x'*, y'*, z'*) the ground-truth normalized coordinates.
specifically, the pass formula is determinedCalculating the minimum residual error of the target to be detected in the original direction and the overturning direction, and determining the regression loss of the target direction according to the minimum residual error, wherein when the minimum residual error is detected, the regression loss of the target direction is determinedWhen the utility model is used, the water is discharged,(ii) a When in useWhen the utility model is used, the water is discharged,。
Having calculated the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction, the regression loss of the target features can be calculated by a regression loss function, which may be the stable Smooth L1:

SmoothL1(x) = 0.5 * x^2, if |x| < 1; |x| - 0.5, otherwise

with the total regression loss L_reg = L_size + L_loc + L_dir.
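The Smooth L1 function and the combination of the three regression terms can be sketched as below; the plain sum of the three terms is an assumption for illustration:

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero for stable gradients on small
    residuals, linear beyond |x| = 1 for robustness to outliers."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def regression_loss(size_residuals, position_residuals, direction_residual):
    """Regression loss of the target features: Smooth L1 summed over the
    size residuals (l, w, h), the position residuals (x, y, z), and the
    direction residual."""
    return (sum(smooth_l1(r) for r in size_residuals)
            + sum(smooth_l1(r) for r in position_residuals)
            + smooth_l1(direction_residual))
```

The quadratic region keeps gradients small near a good fit, while the linear region prevents a single outlier point from dominating the loss.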
and finally, determining the target pose of the target to be detected according to the regression loss of the target characteristics.
The overall loss function can also be determined as a weighted combination of the classification loss and the regression loss:

L = λ1 * L_cls + λ2 * L_reg

where λ1 and λ2 are weighting parameters that can be set according to the actual situation. The smaller the calculated value of the overall loss function L, the smaller the inconsistency between the real value and the predicted value, i.e. the closer the predicted value is to the real value.
In the embodiments of the application, the method acquires an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected; normalizes the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performs feature extraction on the second points to be detected to obtain target features of the original point cloud; and determines a target detection result of the target to be detected based on the target features, wherein the target detection result comprises the target category and the target pose of the target to be detected. Because the distribution of the original point cloud is irregular, normalizing the first points to be detected in the original point cloud avoids further quantization error, thereby solving the technical problem of target size uncertainty caused by the related art's inability to accurately judge the candidate area.
Example 2
According to an embodiment of the present application, there is also provided a three-dimensional object detection apparatus for implementing the three-dimensional object detection method, as shown in fig. 3, the apparatus at least includes an obtaining module 31, a processing module 32, an extracting module 33, and a determining module 34, where:
the acquiring module 31 is configured to acquire an original point cloud in a three-dimensional candidate region where the target to be detected is located, where the original point cloud includes a plurality of first points to be detected.
Optionally, the original point cloud in the three-dimensional candidate area where the target to be detected is located may be obtained by a laser radar, a 3D camera, or the like. Since a laser radar can accurately measure the distance between the sensor and an obstacle and provides rich geometric, shape, and scale information, the laser radar is the preferred mode.
Although the obtained original point cloud can reflect the most accurate position information, the point cloud often reflects only the characteristics of the plurality of first points to be detected while ignoring the space between them; because basic information such as the proportion of the target to be detected is encoded by that space, a quantization error in the predicted value results.
For this reason, the embodiment of the present application may obtain the refined predicted value through the following processing module 32, the extraction module 33, and the determination module 34.
The processing module 32 is configured to perform normalization processing on the plurality of first points to be detected within the three-dimensional candidate region range to obtain a plurality of second points to be detected.
Optionally, a normalized coordinate system is established by taking the center of the three-dimensional candidate area as the origin, the heading direction as the x-axis, the direction horizontally orthogonal to the heading direction as the y-axis, and the vertical direction as the z-axis, and the plurality of first points to be detected are normalized in this coordinate system to obtain the plurality of second points to be detected.
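For illustration only, the normalization step described above can be sketched as a rigid transform into the candidate box's canonical frame. The function name and the convention that `heading` is the box's yaw angle about the vertical axis are assumptions.

```python
import math

def normalize_points(points, center, heading):
    """Transform first points to be detected (world frame) into the
    normalized frame: origin at the box center, x along the heading,
    y horizontally orthogonal to the heading, z vertical."""
    cx, cy, cz = center
    c, s = math.cos(heading), math.sin(heading)
    out = []
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        # rotate by -heading about the vertical (z) axis
        out.append((c * dx + s * dy, -s * dx + c * dy, dz))
    return out
```

Each returned tuple is a second point to be detected, expressed relative to the candidate box rather than the sensor.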
Specifically, in order to improve the recall rate and the detection accuracy of target detection, the length and width of the bounding box of the three-dimensional candidate region may be enlarged. In addition, because the coordinates of the first point to be detected are based on the pixel values of the original image of the target to be detected, the size of the target to be detected needs to be determined in advance during target detection; if the original image is zoomed, the size information of the target to be detected cannot be accurately determined. The coordinates of the first point to be detected are therefore converted into the normalized coordinate system to obtain the second point to be detected, where the normalized coordinate system takes the center of the three-dimensional candidate area as the origin, the heading direction as the x-axis, the direction horizontally orthogonal to the heading direction as the y-axis, and the vertical direction as the z-axis.
And the extraction module 33 is configured to perform feature extraction on the plurality of second points to be detected to obtain target features of the original point cloud.
Optionally, a plurality of target sub-features of the plurality of second points to be detected are acquired, and the target features of the original point cloud are determined based on the plurality of target sub-features.
As an optional implementation manner, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multilayer perceptron (MLP) with three fully-connected layers and a max-pooling operator for feature aggregation.
The two spatial transformation networks in PointNet are as follows: the input transformation network aligns the point cloud in three-dimensional space; intuitively, it adjusts the pose so that the adjusted pose is more conducive to classification, for example, rotating the target to be detected to face the front. The feature transformation network aligns the extracted 64-dimensional features, that is, it transforms the point cloud at the feature level. The multilayer perceptron is implemented by convolutions with shared weights: the convolution kernel of the first layer has size 1 x 3 (each point has the three dimensions x, y, z), and each subsequent convolution kernel has size 1 x 1, so the feature extraction layers operate on each point independently. After passing through the two spatial transformation networks and two multilayer perceptrons, a 1024-dimensional feature is extracted for each point; the max-pooling operator aggregates these into a 1 x 1024 global feature, and the last multilayer perceptron produces probability scores for the k categories. Finally, the target features of the original point cloud are fed into two branches: one branch classifies the target to be detected, and the other regresses the pose of the target to be detected.
The determining module 34 is configured to determine a target detection result of the target to be detected based on the target feature, where the target detection result includes a target category and a target pose of the target to be detected.
As an optional implementation manner, the classification loss of the target features is calculated based on a classification loss function, and the target class of the target to be detected is determined; calculating the regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected; and determining a target detection result of the target to be detected according to the target category and the target pose.
Optionally, determining the number of positive samples in the original point cloud, where the positive samples are second points to be detected whose intersection-over-union ratio is greater than a preset threshold; determining the probability of the target category to which the original point cloud belongs by adopting a probability estimation model; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
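For illustration only, the classification-loss computation described above can be sketched as cross-entropy normalized by the number of positive samples. Using plain cross-entropy over the estimated probabilities is an assumption, since the embodiment does not name a specific classification loss function.

```python
import math

def classification_loss(probs, labels, num_positive):
    """Cross-entropy over predicted class probabilities, normalized by
    the number of positive samples (second points to be detected whose
    IoU with the ground truth exceeds the preset threshold)."""
    total = -sum(math.log(p[y]) for p, y in zip(probs, labels))
    return total / max(num_positive, 1)  # guard against zero positives
```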
It should be noted that the target anchor point corresponding to the target type may be determined according to the target type of the target to be detected.
Optionally, the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction are calculated according to the pose coordinates of the second points to be detected, where the pose coordinates include position information, size information, and orientation information; the regression loss of the target feature is calculated based on the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction; and the target pose of the target to be detected is determined according to the regression loss.
It should be noted that each module of the three-dimensional object detection device in this embodiment corresponds one-to-one to an implementation step of the three-dimensional object detection method in embodiment 1. Since the steps are described in detail in embodiment 1, details not shown in this embodiment can be found in embodiment 1 and are not repeated here.
Example 3
According to an embodiment of the present application, there is also provided a nonvolatile storage medium including a stored program, wherein, when the program runs, a device in which the nonvolatile storage medium is located is controlled to execute the three-dimensional object detection method in embodiment 1.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to perform the following steps: acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, where the original point cloud includes a plurality of first points to be detected; performing normalization processing on the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, where the target detection result includes a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation manner, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multilayer perceptron (MLP) with three fully-connected layers and a max-pooling operator for feature aggregation.
As an optional implementation manner, the classification loss of the target features is calculated based on a classification loss function, and the target class of the target to be detected is determined; calculating the regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected; and determining a target detection result of the target to be detected according to the target category and the target pose.
Optionally, determining the number of positive samples in the original point cloud, where the positive samples are second points to be detected whose intersection-over-union ratio is greater than a preset threshold; determining the probability of the target category to which the original point cloud belongs by adopting a probability estimation model; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Example 4
According to an embodiment of the present application, there is also provided a processor configured to execute a program, where the program executes the three-dimensional object detection method in embodiment 1 when running.
Optionally, when the program runs, the following steps are performed: acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, where the original point cloud includes a plurality of first points to be detected; performing normalization processing on the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, where the target detection result includes a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation manner, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multilayer perceptron (MLP) with three fully-connected layers and a max-pooling operator for feature aggregation.
As an optional implementation manner, the classification loss of the target features is calculated based on a classification loss function, and the target class of the target to be detected is determined; calculating the regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected; and determining a target detection result of the target to be detected according to the target category and the target pose.
Optionally, determining the number of positive samples in the original point cloud, where the positive samples are second points to be detected whose intersection-over-union ratio is greater than a preset threshold; determining the probability of the target category to which the original point cloud belongs by adopting a probability estimation model; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
Example 5
According to an embodiment of the present application, there is also provided an electronic device including: a memory in which a computer program is stored, and a processor configured to execute the three-dimensional object detection method in embodiment 1 by the computer program.
Optionally, when the computer program runs, the following steps are performed: acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, where the original point cloud includes a plurality of first points to be detected; performing normalization processing on the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected; performing feature extraction on the second points to be detected to obtain target features of the original point cloud; and determining a target detection result of the target to be detected based on the target features, where the target detection result includes a target category and a target pose of the target to be detected.
Optionally, a normalization coordinate system is established by taking the center of the three-dimensional candidate area as an origin, the heading direction as an x-axis, the direction horizontally orthogonal to the heading direction as a y-axis, and the vertical direction as a z-axis, and the plurality of first points to be detected are normalized to obtain a plurality of second points to be detected.
As an optional implementation manner, PointNet is used to perform feature extraction on the second points to be detected to obtain the target features of the original point cloud. PointNet consists of a multilayer perceptron (MLP) with three fully-connected layers and a max-pooling operator for feature aggregation.
As an optional implementation manner, the classification loss of the target features is calculated based on a classification loss function, and the target class of the target to be detected is determined; calculating the regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected; and determining a target detection result of the target to be detected according to the target category and the target pose.
Optionally, determining the number of positive samples in the original point cloud, where the positive samples are second points to be detected whose intersection-over-union ratio is greater than a preset threshold; determining the probability of the target category to which the original point cloud belongs by adopting a probability estimation model; calculating the classification loss of the target features based on the number of positive samples and the probability; and determining the target category of the target to be detected according to the classification loss.
Optionally, a target anchor point corresponding to the target category is determined according to the target category of the target to be detected.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, or the portions thereof that contribute to the prior art, may be embodied in whole or in part in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present application and it should be noted that, as will be apparent to those skilled in the art, numerous modifications and adaptations can be made without departing from the principles of the present application and such modifications and adaptations are intended to be considered within the scope of the present application.
Claims (10)
1. A three-dimensional target detection method is characterized by comprising the following steps:
acquiring an original point cloud in a three-dimensional candidate area where a target to be detected is located, wherein the original point cloud comprises a plurality of first points to be detected;
normalizing the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected;
extracting the features of the second points to be detected to obtain the target features of the original point cloud;
and determining a target detection result of the target to be detected based on the target characteristics, wherein the target detection result comprises a target category and a target pose of the target to be detected.
2. The method according to claim 1, wherein normalizing the plurality of first points to be detected within the three-dimensional candidate region to obtain a plurality of second points to be detected comprises:
and establishing a normalized coordinate system by taking the center of the three-dimensional candidate area as an origin, taking the course direction as an x axis, taking the direction horizontally orthogonal to the course direction as a y axis, and taking the vertical direction as a z axis, and performing normalization processing on the plurality of first points to be detected to obtain a plurality of second points to be detected.
3. The method according to claim 1, wherein performing feature extraction on the second points to be detected to obtain target features of the original point cloud comprises:
acquiring a plurality of target sub-features of a plurality of second points to be detected;
determining a target feature of the original point cloud based on a plurality of the target sub-features.
4. The method according to claim 1, wherein determining the target detection result of the target to be detected based on the target feature comprises:
calculating the classification loss of the target characteristics based on a classification loss function, and determining the target category of the target to be detected;
calculating regression loss of the target characteristics based on a regression loss function, and determining the target pose of the target to be detected;
and determining a target detection result of the target to be detected according to the target category and the target pose.
5. The method according to claim 4, wherein the step of calculating the classification loss of the target features based on a classification loss function and determining the target class of the target to be detected comprises:
determining the number of positive samples in the original point cloud, wherein the positive samples are second points to be detected, and the intersection ratio of the second points to be detected is larger than a preset threshold value;
determining the probability of the target category to which the original point cloud belongs by adopting a probability pre-estimation model;
calculating a classification loss of the target feature based on the number of positive samples and the probability;
and determining the target category of the target to be detected according to the classification loss.
6. The method of claim 5, comprising:
and determining a target anchor point corresponding to the target category according to the target category of the target to be detected.
7. The method according to claim 2, wherein calculating the regression loss of the target feature based on a regression loss function, and determining the target pose of the target to be detected comprises:
calculating the regression loss of the target size, the regression loss of the target position, and the regression loss of the target direction according to the pose coordinates of the second point to be detected, wherein the pose coordinates comprise: position information, size information, and orientation information;
calculating a regression loss for the target feature based on the regression loss for the target dimension, the regression loss for the target location, and the regression loss for the target orientation;
and determining the target pose of the target to be detected according to the regression loss.
8. A three-dimensional object detecting device, comprising:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring an original point cloud in a three-dimensional candidate region where a target to be detected is located, and the original point cloud comprises a plurality of first points to be detected;
the processing module is used for carrying out normalization processing on the plurality of first points to be detected within the three-dimensional candidate area to obtain a plurality of second points to be detected;
the extraction module is used for extracting the features of the second points to be detected to obtain the target features of the original point cloud;
and the determining module is used for determining a target detection result of the target to be detected based on the target characteristics, wherein the target detection result comprises a target category and a target pose of the target to be detected.
9. A non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored program, wherein a device on which the non-volatile storage medium resides executes the three-dimensional object detection method according to any one of claims 1 to 7 by running the stored program.
10. An electronic device, comprising: a memory in which a computer program is stored and a processor configured to execute the three-dimensional object detection method of any one of claims 1 to 7 by the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211504429.5A CN115546785A (en) | 2022-11-29 | 2022-11-29 | Three-dimensional target detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115546785A true CN115546785A (en) | 2022-12-30 |
Family
ID=84722303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211504429.5A Pending CN115546785A (en) | 2022-11-29 | 2022-11-29 | Three-dimensional target detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546785A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199206A (en) * | 2019-12-30 | 2020-05-26 | 上海眼控科技股份有限公司 | Three-dimensional target detection method and device, computer equipment and storage medium |
CN112287939A (en) * | 2020-10-29 | 2021-01-29 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method, device, equipment and medium |
CN114509785A (en) * | 2022-02-16 | 2022-05-17 | 中国第一汽车股份有限公司 | Three-dimensional object detection method, device, storage medium, processor and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242041B (en) | Laser radar three-dimensional target rapid detection method based on pseudo-image technology | |
CN113706480B (en) | Point cloud 3D target detection method based on key point multi-scale feature fusion | |
CN102236794A (en) | Recognition and pose determination of 3D objects in 3D scenes | |
CN108182695B (en) | Target tracking model training method and device, electronic equipment and storage medium | |
WO2021017211A1 (en) | Vehicle positioning method and device employing visual sensing, and vehicle-mounted terminal | |
CN114782499A (en) | Image static area extraction method and device based on optical flow and view geometric constraint | |
CN111913177A (en) | Method and device for detecting target object and storage medium | |
CN111739099B (en) | Falling prevention method and device and electronic equipment | |
CN110569926A (en) | point cloud classification method based on local edge feature enhancement | |
CN110992424B (en) | Positioning method and system based on binocular vision | |
CN110636248B (en) | Target tracking method and device | |
CN112405526A (en) | Robot positioning method and device, equipment and storage medium | |
CN110826575A (en) | Underwater target identification method based on machine learning | |
CN116246119A (en) | 3D target detection method, electronic device and storage medium | |
CN115565072A (en) | Road garbage recognition and positioning method and device, electronic equipment and medium | |
CN115240168A (en) | Perception result obtaining method and device, computer equipment and storage medium | |
CN115546785A (en) | Three-dimensional target detection method and device | |
CN115236672A (en) | Obstacle information generation method, device, equipment and computer readable storage medium | |
CN112766100A (en) | 3D target detection method based on key points | |
CN115994934B (en) | Data time alignment method and device and domain controller | |
CN113963027B (en) | Uncertainty detection model training method and device, and uncertainty detection method and device | |
CN117557599B (en) | 3D moving object tracking method and system and storage medium | |
CN117576665B (en) | Automatic driving-oriented single-camera three-dimensional target detection method and system | |
CN116977572B (en) | Building elevation structure extraction method for multi-scale dynamic graph convolution | |
Ambata et al. | Three-dimensional mapping system for environment and object detection on an unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20221230 |