Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
In the related art, a point cloud is usually used to detect a target, so as to guide an airplane, an automobile, a robot, and the like to avoid an obstacle and plan a path. Especially for intelligent driving, a laser radar sensor is usually equipped to detect a target through a laser point cloud output by the laser radar sensor.
However, the above-mentioned target detection using a point cloud usually employs a clustering algorithm. Clustering merely groups similar elements together, so the accuracy of a clustering result is often low and the error rate of the target detection result is high; consequently, airplanes, automobiles, robots and the like cannot be reliably guided to avoid obstacles and plan paths.
In order to solve the technical problem, the embodiment provides a target detection method, which determines a point cloud vector through the number of points in a point cloud to be processed and the attribute of each point; and then, extracting target three-dimensional point cloud features of the point cloud vector by using a three-dimensional convolution neural network, and determining target information in the point cloud to be processed according to the extracted target three-dimensional point cloud features, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system.
The method has the following advantages: the target three-dimensional point cloud feature extraction is performed on the point cloud vector by using the three-dimensional convolutional neural network, and the three-dimensional convolution guarantees that height information is retained during the three-dimensional point cloud feature extraction, so that the three-dimensional structure information of the point cloud is retained to the maximum extent; the target information in the point cloud to be processed is determined according to the extracted target three-dimensional point cloud feature, so that the three-dimensional target in the point cloud can be accurately found out, thereby solving the problems that the error rate of the existing target detection result is high and that airplanes, automobiles, robots and the like subsequently cannot be guided to avoid obstacles and plan paths.
Fig. 1 is a schematic structural diagram of a target detection system according to an embodiment of the present disclosure. As shown in Fig. 1, the system includes: a sensor 101, a first processor 102 and a second processor 103. Taking the target as a vehicle as an example, the sensor 101 may generate a point cloud to be processed in real time, where the point cloud is used to identify the surroundings of the vehicle. The first processor 102 may determine a point cloud vector including geometric position information of each dimension of a point in a three-dimensional coordinate system by combining the number of points in the point cloud generated by the sensor 101 and an attribute of each point, process the point cloud vector by using a three-dimensional convolutional neural network to extract a target three-dimensional point cloud feature of the point cloud vector, determine target information in the point cloud according to the target three-dimensional point cloud feature, where the target information includes geometric position information of each dimension of a target in the three-dimensional coordinate system, and send the target information to the second processor 103 for subsequent driving planning.
Here, the first processor 102 and the second processor 103 may be a vehicle computing platform, an unmanned aerial vehicle processor, or the like. The present embodiment does not particularly limit the implementation manner of the first processor 102 and the second processor 103, as long as the first processor 102 and the second processor 103 can perform the above-described corresponding functions.
It should be understood that the above architecture is only an exemplary system architecture block diagram, and in particular, the architecture may be configured according to application requirements, for example, the first processor 102 and the second processor 103 may be separately configured or may be combined together to meet different application requirements.
In addition, the target detection system may further include a receiving device, a display device, and the like.
In a specific implementation process, the receiving device may be an input/output interface or a communication interface. The receiving device may receive an instruction from a user; for example, the receiving device may be an input interface connected to a mouse.
The display device may be configured to display the target information. The display device can also be a touch display screen, and is used for receiving a user instruction while displaying the target information so as to realize interaction with a user.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a target detection method according to an embodiment of the present application, where the execution body of the embodiment may be the first processor in the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes:
S201, determining a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point, wherein the point cloud vector comprises the geometric position information of each dimension of the points in a three-dimensional coordinate system.
The point cloud can be an image point cloud, a radar point cloud, a laser point cloud and the like, and one or more point clouds can be adopted in subsequent processing according to actual conditions.
Here, as described above, before determining the point cloud vector according to the number of points and the attribute of each point in the point cloud to be processed, the point cloud to be processed may be acquired by a sensor. Specifically, the acquisition range may be limited in a three-dimensional space, for example, to F meters in front of the sensor, B meters behind the sensor, L meters to the left and R meters to the right of the sensor, U meters above the sensor, and D meters below the sensor. Thus, the processing range of the point cloud in the whole three-dimensional space is limited to (F + B) × (L + R) × (U + D). The values of F, B, L, R, U and D can be set according to actual conditions.
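The range limitation described above can be sketched as a simple box filter over the raw points; the concrete range values and the N × 4 point layout (x, y, z, reflectivity) used here are illustrative assumptions, not values specified by the embodiment.

```python
# Minimal sketch: crop a point cloud to the (F + B) x (L + R) x (U + D)
# processing range. Range values and point layout are assumptions.
import numpy as np

def crop_point_cloud(points, F=50.0, B=10.0, L=20.0, R=20.0, U=3.0, D=2.0):
    """Keep only points inside the box [-B, F] x [-L, R] x [-D, U]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (x >= -B) & (x <= F) & (y >= -L) & (y <= R) & (z >= -D) & (z <= U)
    return points[mask]

pts = np.array([[10.0, 0.0, 0.0, 0.3],    # inside the range, kept
                [100.0, 0.0, 0.0, 0.5]])  # too far ahead, discarded
print(crop_point_cloud(pts).shape)  # (1, 4)
```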
After the point cloud to be processed is obtained, a point cloud vector can be determined according to the number of points in the point cloud to be processed and the attributes of each point.
Optionally, the determining a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point includes:
carrying out mesh division on the point cloud to be processed;
determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point, wherein the attribute of each point comprises a three-dimensional coordinate and a reflectivity of the point in a three-dimensional coordinate system;
and determining the point cloud vector according to the vector of each divided grid.
Specifically, the determining the vector of each divided grid according to the number of points in each divided grid and the attribute of each point includes:
adjusting the number of points in each divided grid according to a preset number of points;
and determining the vector of each divided grid according to the product of the number of points in each grid and the attribute of each point after adjustment.
Illustratively, the grid is divided along each coordinate axis of the point cloud coordinate system: the forward direction (X axis) is divided every resx meters, the left-right direction (Y axis) every resy meters, and the upward direction (Z axis) every resz meters. Thus, for the entire three-dimensional space, T = ((F + B)/resx) × ((L + R)/resy) × ((U + D)/resz) small grids are divided, and each small grid is a small rectangular parallelepiped of resx × resy × resz. The values of resx, resy and resz may be set according to actual conditions.
In addition, all the points are assigned to grids according to the grid ranges, and the grid to which each point belongs is determined. The number N of laser points contained in any grid can be limited: when the number of laser points is larger than N, random sampling can be carried out to obtain N points, and the redundant laser points are discarded; when the number of points is less than N, points can be randomly copied to reach N, so that all the small grids contain the same number of points. The value of N can be set according to actual conditions.
Thus, by the above processing, the entire point cloud is represented by a vector of K × N × 4, where K represents the number of grids whose number of points is not 0, N is the maximum number of points per grid, and 4 denotes that each point has a 4-dimensional attribute, namely the x, y and z coordinates and the reflectivity. It should be understood that, in addition to using the K × N × 4 vector to represent the point cloud, other information such as density and height may be added to the point cloud vector.
The whole three-dimensional space is divided into a fixed number of small grids, and then the feature vector of each grid is determined; that is, the point cloud is preprocessed (point cloud encoding) in a certain structured manner, and the original structure information of the point cloud is retained to the maximum extent.
S202, processing the point cloud vector by utilizing a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector.
Here, for feature extraction, in order to retain three-dimensional structure information, the embodiment of the present application employs three-dimensional convolution to extract features from a three-dimensional space, and retains spatial structure information.
Optionally, the three-dimensional convolutional neural network comprises a third convolutional neural network.
The processing the point cloud vector by using a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector comprises:
and performing three-dimensional grid feature extraction on the vector of each divided grid by using the third convolutional neural network.
And the third convolutional neural network is obtained by training a three-dimensional grid vector and three-dimensional grid characteristics.
After the point cloud is preprocessed in a certain structured manner, feature extraction is performed on each small grid by using the three-dimensional convolutional neural network, which can be implemented with convolutional layers, up-sampling layers and fully-connected layers. That is, the K × N × 4 vector is input, and after passing through a series of convolutional layers, up-sampling layers and fully-connected layers, a K × C feature vector is obtained, where the feature corresponding to each small grid is C-dimensional. For all grids in the three-dimensional space, the feature corresponding to each grid is C-dimensional, and if a grid contains no point, its feature is a C-dimensional zero vector. If the total number of small grids is X1 × Y1 × Z1, the resulting feature vector is X1 × Y1 × Z1 × C. Structured data is thus obtained, from which features can be extracted directly using convolution operations.
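The final scattering step above, from K per-grid features back to a dense X1 × Y1 × Z1 × C grid with zero vectors for empty grids, can be sketched as follows; all shapes here are illustrative assumptions, and the K × C features are assumed to have been computed already.

```python
# Sketch: scatter K x C voxel features into a dense X1 x Y1 x Z1 x C grid;
# voxels that contain no points remain C-dimensional zero vectors.
import numpy as np

def scatter_to_dense(voxel_feats, voxel_coords, grid_shape):
    """voxel_feats: K x C, voxel_coords: K x 3 integer (x, y, z) indices."""
    k, c = voxel_feats.shape
    dense = np.zeros(grid_shape + (c,), dtype=voxel_feats.dtype)
    dense[voxel_coords[:, 0], voxel_coords[:, 1], voxel_coords[:, 2]] = voxel_feats
    return dense

feats = np.array([[1.0, 2.0], [3.0, 4.0]])          # K=2 voxels, C=2
coords = np.array([[0, 0, 0], [1, 2, 1]])
dense = scatter_to_dense(feats, coords, (2, 3, 2))  # X1=2, Y1=3, Z1=2
print(dense.shape)     # (2, 3, 2, 2)
print(dense[1, 2, 1])  # the second voxel's feature
print(dense[0, 1, 0])  # an empty voxel stays a zero vector
```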
Optionally, the three-dimensional convolutional neural network further comprises a fourth convolutional neural network.
After the three-dimensional grid feature extraction is performed on the vector of each divided grid by using the third convolutional neural network, the method further includes:
and extracting the target three-dimensional point cloud characteristics of the extracted three-dimensional grid characteristics by using the fourth convolutional neural network.
And the fourth convolutional neural network is obtained by training three-dimensional grid characteristics and three-dimensional point cloud characteristics.
For example, if two-dimensional convolution were used, the feature vector would need to be reshaped to X1 × Y1 × (Z1 × C), where Z1 × C is the feature dimension; that is, with two-dimensional convolution the height dimension (Z direction) has to be merged with the feature channel dimension, so certain height information is lost. By using three-dimensional convolution, the method of the present application keeps the structural information in all directions and retains the spatial information of the original point cloud to the maximum extent. After a series of three-dimensional convolution operations, the final feature vector is obtained, denoted as X2 × Y2 × Z2 × C2.
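The shape difference above can be made concrete with made-up sizes: the 2D-convolution path folds the height axis Z1 into the channel dimension, while the 3D-convolution path keeps it as a separate spatial axis.

```python
# Shape-level illustration only; all sizes are illustrative assumptions.
X1, Y1, Z1, C = 64, 64, 8, 16

# Two-dimensional convolution: height merged into the channel dimension,
# so the network can no longer convolve along Z.
shape_2d = (X1, Y1, Z1 * C)

# Three-dimensional convolution: height preserved as its own spatial axis.
shape_3d = (X1, Y1, Z1, C)

print(shape_2d)  # (64, 64, 128)
print(shape_3d)  # (64, 64, 8, 16)
```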
In addition, the three-dimensional convolutional neural network may include:
a plurality of convolution layers for performing convolution operation on the point cloud vector to output three-dimensional point cloud characteristics;
an up-sampling layer, which is connected with at least one of the plurality of convolution layers and is used for acquiring the three-dimensional point cloud features output by at least one of the plurality of convolution layers and processing the acquired three-dimensional point cloud features to output processed three-dimensional point cloud features;
and a full-connection layer, which is connected with convolution layers among the plurality of convolution layers and with the up-sampling layer, and is used for acquiring the three-dimensional point cloud features output by the convolution layers and the processed three-dimensional point cloud features, performing feature fusion on them to generate fused three-dimensional point cloud features, inputting the fused three-dimensional point cloud features into another convolution layer, and determining the target three-dimensional point cloud features after the convolution operation of the other convolution layer.
Optionally, the plurality of convolutional layers have different depths;
the full-connection layer is connected with a first convolution layer and a second convolution layer among the plurality of convolution layers and with the up-sampling layer, so as to acquire the three-dimensional point cloud features output by the first convolution layer and the processed three-dimensional point cloud features; feature fusion is performed on the three-dimensional point cloud features output by the first convolution layer and the processed three-dimensional point cloud features to generate fused three-dimensional point cloud features, and the fused three-dimensional point cloud features are input into the second convolution layer to determine the target three-dimensional point cloud features after the convolution operation of the second convolution layer, where the depth of the second convolution layer is greater than that of the first convolution layer.
Optionally, the number of the full-connection layers is multiple.
Optionally, the number of the up-sampling layers is multiple.
Illustratively, as shown in Fig. 3, the three-dimensional convolutional neural network includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a first up-sampling layer, a first full-connection layer, a fifth convolutional layer, a second up-sampling layer, a second full-connection layer, and a sixth convolutional layer, where the first to sixth convolutional layers all use the ReLU activation function.
The first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer sequentially perform three-dimensional feature extraction on the point cloud. The first up-sampling layer up-samples the three-dimensional features extracted by the fourth convolution layer according to a first preset spatial resolution; the first full-connection layer performs feature fusion on the three-dimensional features up-sampled by the first up-sampling layer and the three-dimensional features extracted by the third convolution layer; the fifth convolution layer performs three-dimensional feature extraction on the features fused by the first full-connection layer; the second up-sampling layer up-samples the three-dimensional features extracted by the fifth convolution layer according to a second preset spatial resolution; the second full-connection layer performs feature fusion on the three-dimensional features up-sampled by the second up-sampling layer and the three-dimensional features extracted by the second convolution layer; and the sixth convolution layer performs three-dimensional feature extraction on the features fused by the second full-connection layer and determines the target three-dimensional point cloud features.
Wherein the plurality of convolutional layers have different depths. The depths of the first convolution layer, the second convolution layer, the third convolution layer and the fourth convolution layer are increased in sequence.
The up-sampling layer can enlarge the feature map and increase its information; for example, where a convolution layer reduces the resolution, up-sampling restores a higher resolution, thereby facilitating subsequent processing.
The full-connection layer realizes splicing and fusion of data: data output by a plurality of convolution layers (some of which have been processed by an up-sampling layer) are respectively input into different full-connection layers, so that deep-layer and shallow-layer features are fused.
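The fusion pattern above can be sketched as follows, under the assumption that "feature fusion" concatenates an up-sampled deep feature map with a shallow one along the channel axis (a common skip-connection style); nearest-neighbour repetition stands in for the up-sampling layer, and all sizes are illustrative.

```python
# Sketch of deep/shallow feature fusion: upsample the low-resolution deep
# features, then concatenate them with the high-resolution shallow ones.
import numpy as np

def upsample_nn(feat, factor=2):
    """Nearest-neighbour upsampling of an X x Y x Z x C feature map."""
    for axis in range(3):
        feat = np.repeat(feat, factor, axis=axis)
    return feat

shallow = np.ones((4, 4, 4, 8))   # early conv layer: high resolution
deep = np.ones((2, 2, 2, 16))     # later conv layer: low resolution
fused = np.concatenate([shallow, upsample_nn(deep)], axis=-1)
print(fused.shape)  # (4, 4, 4, 24): deep and shallow channels side by side
```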
S203, determining target information in the point cloud to be processed according to the target three-dimensional point cloud characteristics, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system.
The point cloud to be processed includes the point cloud corresponding to the target. Here, the point cloud may be encoded for input to a neural network, and the features of the point cloud may be learned through the neural network and directly used for prediction of the three-dimensional target. The prediction is dense on the neural network feature map, and the final detection result can be obtained through end-to-end learning.
Optionally, the determining target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining the coordinates of the central point, the three-dimensional size and the yaw angle of the target in the point cloud to be processed in a three-dimensional coordinate system based on the characteristics of the target three-dimensional point cloud by utilizing a first convolution neural network.
The first convolutional neural network is obtained through training on point cloud features and the center point coordinates, three-dimensional size and yaw angle of a target in a three-dimensional coordinate system.
Here, for each scene point cloud, its features are first extracted, and then target information is determined based on the features by using a convolutional neural network. The target information can be expressed by 7 parameters: (x, y, z) represents the center point coordinates, (l, h, w) represents the three-dimensional size, and r represents the yaw angle. Besides the center point, three-dimensional size and yaw angle, the target information can also be represented by corner point coordinates, bottom-face rectangle coordinates and height, and the like.
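The relationship between the 7-parameter representation and the corner representation mentioned above can be sketched as follows; the corner computation assumes the yaw r rotates the box about the vertical axis, which is one common convention and not necessarily the one used by the embodiment.

```python
# Sketch: bird's-eye-view corners of a box given the (x, y, l, w, r)
# subset of the 7 parameters (x, y, z, l, h, w, r).
import math

def box_bev_corners(x, y, l, w, r):
    """Bird's-eye-view corners of a box centred at (x, y) with yaw r."""
    c, s = math.cos(r), math.sin(r)
    corners = []
    for dx, dy in [(l / 2, w / 2), (l / 2, -w / 2),
                   (-l / 2, -w / 2), (-l / 2, w / 2)]:
        corners.append((x + dx * c - dy * s, y + dx * s + dy * c))
    return corners

# An axis-aligned 4 m x 2 m vehicle box centred at the origin:
print(box_bev_corners(0.0, 0.0, 4.0, 2.0, 0.0))
# [(2.0, 1.0), (2.0, -1.0), (-2.0, -1.0), (-2.0, 1.0)]
```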
Specifically, the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining the target information according to the effective points in the target three-dimensional point cloud characteristics.
Optionally, before the determining the target information according to the effective points in the target three-dimensional point cloud feature, the method further includes:
obtaining the Euclidean distance of each point in the target three-dimensional point cloud characteristic from the central point of the grid where the point is located;
and determining effective points in the target three-dimensional point cloud feature according to the pixel Euclidean distance and a preset distance threshold.
After the point cloud to be processed is divided into grids, each grid is mapped onto the feature map to obtain Ri. Thus, for each pixel point on the target three-dimensional point cloud feature, the nearest Ri is first found, and the judgment may use the pixel Euclidean distance L between the point and the center point. A distance threshold T is set at the same time: if L is smaller than T, the point is an effective point; otherwise, it is an invalid point. The target information, that is, the center point coordinates, the three-dimensional size and the yaw angle, is determined according to all the effective points. The distance threshold T may be set according to actual conditions; for example, taking the target as a vehicle, the threshold may be set according to the length of the vehicle body.
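The effective-point test above can be sketched as follows: a point is effective when its Euclidean distance to the nearest mapped grid center Ri is below the threshold T. The center coordinates, sample points and threshold are illustrative assumptions.

```python
# Sketch of the effective-point judgment: keep only points whose distance
# to the nearest grid center is below the threshold T.
import math

def effective_points(points, centers, t):
    """Keep points whose distance to the nearest center is below t."""
    def nearest_dist(p):
        return min(math.dist(p, c) for c in centers)
    return [p for p in points if nearest_dist(p) < t]

centers = [(0.0, 0.0), (10.0, 10.0)]
pts = [(1.0, 1.0), (5.0, 5.0)]            # the second point is far from both
print(effective_points(pts, centers, t=2.0))  # [(1.0, 1.0)]
```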
In addition, after the target information in the point cloud to be processed is determined according to the target three-dimensional point cloud feature, the category probability of the target can be determined based on the target three-dimensional point cloud feature by utilizing a second convolutional neural network;
and removing the error target of the target according to the class probability of the target.
And the second convolutional neural network is obtained by point cloud characteristics and target class probability training.
Here, a large amount of target information in the point cloud to be processed may be determined (obtained by performing dense prediction on each voxel point). In order to obtain an accurate result, the final detection result may be obtained through non-maximum suppression and by setting a corresponding score threshold. Specifically, the class probability of the target may be determined based on the target three-dimensional point cloud feature, for example, the probability that the target is a vehicle. Finally, the correspondence between the target information and the class probability of the target may be determined; for example, the class probability corresponding to first target information is: the probability of being a vehicle is 99%; the class probability corresponding to second target information is: the probability of being a vehicle is 10%. The erroneous target, for example, the second target, is then removed according to the correspondence between the target information and the class probability, so as to obtain the final detection result.
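The post-processing above can be sketched as follows: dense predictions are first filtered by a class-probability score threshold, then reduced by non-maximum suppression. For brevity the overlap test here uses axis-aligned bird's-eye-view IoU; a rotated-box IoU would normally be used for yawed boxes, and all boxes, scores and thresholds are illustrative assumptions.

```python
# Sketch: score thresholding plus greedy non-maximum suppression.
def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter > 0 else 0.0

def nms(boxes, scores, score_thr=0.5, iou_thr=0.3):
    keep = []
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: -scores[i])
    for i in order:  # highest score first; drop heavily overlapping boxes
        if all(iou_axis_aligned(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 4, 2), (0.5, 0, 4.5, 2), (10, 10, 14, 12)]
scores = [0.99, 0.60, 0.10]  # last detection falls below the threshold
print(nms(boxes, scores))    # [0]: box 1 suppressed, box 2 thresholded out
```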
According to the target detection method provided by the embodiment, point cloud vectors are determined according to the number of points in the point cloud to be processed and the attributes of the points; and then, extracting the target three-dimensional point cloud features of the point cloud vector by using a three-dimensional convolution neural network, reserving height information, further reserving three-dimensional structure information of the point cloud to the maximum extent, and determining target information in the point cloud to be processed according to the extracted target three-dimensional point cloud features, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system, so as to accurately find out the three-dimensional target in the point cloud.
The method comprises the steps of utilizing a convolutional neural network to detect a three-dimensional target object on point cloud data, wherein the detection comprises the detection of information such as coordinates, three-dimensional sizes and yaw angles of the object in the real world relative to a sensor, so that dynamic obstacles can be detected by utilizing the point cloud data, and airplanes, automobiles, robots and the like are guided to carry out obstacle avoidance and path planning.
Especially for automatically driven automobiles, which are usually equipped with a laser radar sensor, detecting obstacles through the laser point cloud is an important part of the whole technical link.
Fig. 4 is a schematic flowchart of another target detection method provided in an embodiment of the present application; this embodiment describes a specific implementation process in detail on the basis of the embodiment of Fig. 2. As shown in Fig. 4, the method includes:
s401, carrying out grid division on the point cloud to be processed.
S402, adjusting the number of points in each divided grid according to the preset number of points.
And S403, determining the vector of each divided grid according to the product of the number of points in each grid after adjustment and the attribute of each point, wherein the attribute of each point comprises the three-dimensional coordinate and the reflectivity of the point in the three-dimensional coordinate system.
Here, the whole three-dimensional space is divided into quantitative small grids, and then the feature vector of each grid is determined, that is, the point cloud is preprocessed (point cloud coding) in a certain structuring mode, and the original structure information of the point cloud is maximally reserved.
And S404, performing three-dimensional grid feature extraction on the vector of each divided grid by using a third convolutional neural network, wherein the third convolutional neural network is obtained by training three-dimensional grid vectors and three-dimensional grid features.
S405, extracting target three-dimensional point cloud characteristics of the extracted three-dimensional grid characteristics by using a fourth convolutional neural network, wherein the fourth convolutional neural network is obtained by training the three-dimensional grid characteristics and the three-dimensional point cloud characteristics.
And the third convolutional neural network and the fourth convolutional neural network are three-dimensional convolutional neural networks.
For feature extraction, in order to retain three-dimensional structure information, the embodiment of the present application adopts three-dimensional convolution, extracts features from a three-dimensional space, and retains spatial structure information.
S406, determining a central point coordinate, a three-dimensional size and a yaw angle of a target in the point cloud to be processed in a three-dimensional coordinate system based on the target three-dimensional point cloud feature by using a first convolution neural network, wherein the first convolution neural network is obtained by training the point cloud feature and the central point coordinate, the three-dimensional size and the yaw angle of the target in the three-dimensional coordinate system.
After being encoded, the point cloud is used as input to a neural network, and the features of the point cloud are learned through the neural network and directly used for prediction of the three-dimensional target. The prediction is dense on the neural network feature map, and the final detection result can be obtained through end-to-end learning.
S407, determining the class probability of the target based on the target three-dimensional point cloud feature by using a second convolutional neural network, wherein the second convolutional neural network is obtained by point cloud feature and target class probability training.
And S408, removing the error target of the target according to the class probability of the target.
Since a large amount of target information in the point cloud to be processed may be determined (obtained by performing dense prediction on each voxel point), in order to obtain an accurate result, the erroneous target is removed, so as to obtain the final detection result.
According to the target detection method provided by the embodiment, point cloud vectors are determined according to the number of points in the point cloud to be processed and the attributes of the points; and then, extracting target three-dimensional point cloud features of the point cloud vector by using a three-dimensional convolutional neural network, reserving height information, further reserving three-dimensional structure information of the point cloud to the maximum extent, and determining target information in the point cloud to be processed according to the extracted target three-dimensional point cloud features, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system, so that a three-dimensional target in the point cloud is accurately found out, and the problems that the error rate of the existing target detection result is high, and the follow-up guidance of airplanes, automobiles, robots and the like for obstacle avoidance and path planning cannot be carried out are solved.
Fig. 5 is a schematic structural diagram of an object detection device according to an embodiment of the present application. For convenience of explanation, only portions related to the embodiments of the present application are shown. As shown in Fig. 5, the object detection device 50 includes: a first determination module 501, an extraction module 502 and a second determination module 503.
The first determining module 501 is configured to determine a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point, where the point cloud vector includes geometric position information of each dimension of the point in a three-dimensional coordinate system.
An extracting module 502, configured to process the point cloud vector by using a three-dimensional convolutional neural network to extract a target three-dimensional point cloud feature of the point cloud vector.
A second determining module 503, configured to determine target information in the point cloud to be processed according to the target three-dimensional point cloud feature, where the target information includes geometric position information of each dimension of a target in a three-dimensional coordinate system.
The device provided in this embodiment may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
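As a rough illustration of how the three modules above cooperate, the following Python sketch chains a toy version of each stage. The function names and the stand-in feature transform are assumptions made for illustration only; they are not the networks described in this application.

```python
import numpy as np

def determine_point_cloud_vector(points):
    """First determining module: build a vector from the number of points
    and the per-point attributes (x, y, z, reflectivity)."""
    count = len(points)
    # Toy encoding: point count followed by the mean of each attribute.
    return np.concatenate(([count], points.mean(axis=0)))

def extract_features(vector):
    """Extracting module: stand-in for the 3D convolutional neural network."""
    return vector * 2.0  # placeholder transform, not a real convolution

def determine_target_info(features):
    """Second determining module: recover geometric position information."""
    return {"center": features[1:4]}

points = np.array([[1.0, 2.0, 0.5, 0.8],
                   [1.2, 2.1, 0.4, 0.7]])
vector = determine_point_cloud_vector(points)
info = determine_target_info(extract_features(vector))
```

In a real implementation each stage would be a learned network; the pipeline shape (vector, features, target information) is the part taken from the text.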
Fig. 6 is a schematic structural diagram of another target detection device according to an embodiment of the present application. As shown in Fig. 6, on the basis of the embodiment in Fig. 5, the target detection device of this embodiment further includes: a third determining module 504 and a removal module 505.
In one possible design, the second determining module 503 is specifically configured to:
and determining, by using a first convolutional neural network and based on the target three-dimensional point cloud features, the center point coordinates, the three-dimensional size, and the yaw angle, in a three-dimensional coordinate system, of the target in the point cloud to be processed.
In one possible design, the first convolutional neural network is obtained by training on point cloud features together with the center point coordinates, three-dimensional sizes, and yaw angles of targets in a three-dimensional coordinate system.
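The text specifies only the inputs and outputs of the first convolutional neural network. As a hedged sketch, a minimal stand-in is a regression head producing seven values, (cx, cy, cz, length, width, height, yaw); the feature dimension and the weights below are random placeholders, not trained parameters from this application.

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM = 16  # assumed feature size, for illustration only
W = rng.standard_normal((7, FEATURE_DIM)) * 0.1  # stand-in for learned weights
b = np.zeros(7)

def regress_box(point_cloud_feature):
    """Map a target 3D point cloud feature to (center xyz, size lwh, yaw)."""
    out = W @ point_cloud_feature + b
    center, size, yaw = out[:3], out[3:6], float(out[6])
    return center, size, yaw

center, size, yaw = regress_box(np.ones(FEATURE_DIM))
```

The seven-value output layout mirrors the center point coordinates, three-dimensional size, and yaw angle named in the text; everything else is an assumption.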
In one possible design, the third determining module 504 is configured to determine the class probability of the target based on the target three-dimensional point cloud feature by using a second convolutional neural network after the second determining module 503 determines the target information in the point cloud to be processed according to the target three-dimensional point cloud feature.
The removing module 505 is configured to remove an error target from the target according to the class probability of the target.
In one possible design, the second convolutional neural network is obtained by training on point cloud features and target class probabilities.
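The removal of error targets can be pictured as a simple confidence filter: candidates whose class probability, as output by the second convolutional neural network, falls below a threshold are discarded. The threshold value and the sample data below are illustrative assumptions, not values from this application.

```python
import numpy as np

def remove_error_targets(targets, class_probs, threshold=0.5):
    """Keep only the targets whose class probability reaches the threshold."""
    keep = np.asarray(class_probs) >= threshold
    return [t for t, k in zip(targets, keep) if k]

candidates = ["car", "noise_cluster", "pedestrian"]
probabilities = [0.92, 0.13, 0.77]
kept = remove_error_targets(candidates, probabilities)
```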
In one possible design, the first determining module 501 is specifically configured to:
dividing the point cloud to be processed into grids;
determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point, wherein the attribute of each point comprises a three-dimensional coordinate and a reflectivity of the point in a three-dimensional coordinate system;
and determining the point cloud vector according to the vector of each divided grid.
In one possible design, the first determining module 501 determines the vector of each divided grid according to the number of points in each divided grid and the attribute of each point by:
adjusting the number of points in each divided grid according to a preset number of points;
and determining the vector of each divided grid according to the product of the number of points in each grid and the attribute of each point after adjustment.
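A hedged sketch of this gridding step: points are binned into 2D grid cells, each cell is padded or truncated to a preset point count, and the cell vector is formed from the adjusted points' attributes. The cell size, preset count, and zero-padding scheme are assumptions chosen for illustration.

```python
import numpy as np

def grid_vectors(points, cell_size=1.0, preset_count=4):
    """Bin points (x, y, z, reflectivity) into grid cells and build one
    fixed-length vector per non-empty cell."""
    cells = {}
    for p in points:
        key = (int(p[0] // cell_size), int(p[1] // cell_size))
        cells.setdefault(key, []).append(p)
    vectors = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)[:preset_count]          # truncate long cells
        pad = np.zeros((preset_count - len(pts), 4))  # zero-pad short cells
        vectors[key] = np.vstack([pts, pad]).ravel()  # preset_count * 4 values
    return vectors

pts = np.array([[0.2, 0.3, 0.1, 0.9],
                [0.7, 0.4, 0.2, 0.8],
                [1.5, 0.1, 0.0, 0.5]])
vecs = grid_vectors(pts)
```

The fixed per-cell length (preset_count × 4 attributes) is what makes the vectors suitable input for the subsequent convolutional stages.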
In one possible design, the three-dimensional convolutional neural network includes a third convolutional neural network.
The extracting module 502 is specifically configured to:
and performing three-dimensional grid feature extraction on the vector of each divided grid by using the third convolutional neural network.
In one possible design, the third convolutional neural network is obtained by training three-dimensional grid vectors and three-dimensional grid features.
In one possible design, the three-dimensional convolutional neural network further includes a fourth convolutional neural network.
The extracting module 502 is further configured to, after performing three-dimensional mesh feature extraction on the vector of each divided mesh by using the third convolutional neural network:
and extracting the target three-dimensional point cloud characteristics of the extracted three-dimensional grid characteristics by using the fourth convolutional neural network.
In one possible design, the fourth convolutional neural network is obtained by training three-dimensional grid features and three-dimensional point cloud features.
In one possible design, the second determining module 503 is specifically configured to:
and determining the target information according to the effective points in the target three-dimensional point cloud characteristics.
In one possible design, the second determining module 503 is further configured to, before determining the target information according to the valid points in the target three-dimensional point cloud feature:
obtaining the Euclidean distance of each point in the target three-dimensional point cloud characteristic from the central point of the grid where the point is located;
and determining effective points in the target three-dimensional point cloud features according to the Euclidean distance of each point and a preset distance threshold.
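A possible reading of this valid-point selection, sketched under assumed cell size and threshold values: a point counts as effective when its Euclidean distance to the center of its grid cell stays below the preset threshold. Measuring the distance in the x-y plane is also an assumption of this sketch.

```python
import numpy as np

def valid_points(points, cell_size=1.0, threshold=0.5):
    """Keep points whose distance to their grid cell center is below the
    preset threshold (distance measured in the x-y plane here)."""
    centers = (np.floor(points[:, :2] / cell_size) + 0.5) * cell_size
    dist = np.linalg.norm(points[:, :2] - centers, axis=1)
    return points[dist < threshold]

pts = np.array([[0.50, 0.50, 0.0],   # exactly at its cell center
                [0.95, 0.95, 0.0]])  # near a cell corner, distance ~0.64
kept = valid_points(pts)
```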
In one possible design, the three-dimensional convolutional neural network comprises:
a plurality of convolution layers, configured to perform a convolution operation on the point cloud vector to output three-dimensional point cloud features;
an upsampling layer, connected to at least one of the plurality of convolution layers and configured to acquire the three-dimensional point cloud features output by the at least one convolution layer and process the acquired features to output processed three-dimensional point cloud features;
and a fully connected layer, connected to convolution layers among the plurality of convolution layers and to the upsampling layer, and configured to acquire the three-dimensional point cloud features output by the convolution layers and the processed three-dimensional point cloud features, perform feature fusion on the acquired features and the processed features to generate fused three-dimensional point cloud features, and input the fused three-dimensional point cloud features into another convolution layer, the target three-dimensional point cloud features being determined after the convolution operation of the other convolution layer.
In one possible design, the plurality of convolutional layers differ in depth of convolutional layer;
the fully connected layer is connected to a first convolution layer and a second convolution layer among the plurality of convolution layers and to the upsampling layer, so as to acquire the three-dimensional point cloud features output by the first convolution layer and the processed three-dimensional point cloud features. Feature fusion is performed on these to generate fused three-dimensional point cloud features, and the fused features are input into the second convolution layer; the target three-dimensional point cloud features are determined after the convolution operation of the second convolution layer, the depth of the second convolution layer being greater than that of the first convolution layer.
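The layer arrangement described above (a shallow convolution stage, an upsampling of a processed output back to the shallow stage's resolution, feature fusion, then a deeper convolution stage) can be sketched at the shape level. The pooling-based "convolution" stages and nearest-neighbour upsampling below are crude stand-ins for the learned 3D convolution layers, chosen only to show how the tensor shapes line up.

```python
import numpy as np

def conv_stage(x):
    """Stand-in for a convolution layer: stride-2 average pooling that
    halves each spatial axis of a 3D feature volume."""
    d, h, w = (s // 2 for s in x.shape)
    return x[:2 * d, :2 * h, :2 * w].reshape(d, 2, h, 2, w, 2).mean(axis=(1, 3, 5))

def upsample(x):
    """Nearest-neighbour upsampling by 2 along each axis."""
    return x.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

x = np.random.default_rng(0).random((8, 8, 8))   # toy point cloud volume
shallow = conv_stage(x)                          # first convolution layer
processed = upsample(conv_stage(shallow))        # upsampling layer output
fused = np.stack([shallow, processed]).mean(axis=0)  # feature fusion step
target = conv_stage(fused)                       # second, deeper convolution layer
```

Because the upsampling restores the processed features to the shallow output's resolution, the fusion step can combine them elementwise before the deeper stage runs.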
In one possible design, there are multiple fully connected layers.
In one possible design, there are multiple upsampling layers.
The device provided by the embodiment of the present application may be used to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 7 is a schematic diagram of a hardware structure of the target detection device according to the embodiment of the present application. As shown in fig. 7, the object detection device 70 of the present embodiment includes: a memory 701 and a processor 702; wherein
A memory 701 for storing program instructions;
a processor 702 for executing the program instructions stored in the memory, where the program instructions, when executed, perform the following steps:
determining a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point, wherein the point cloud vector comprises geometric position information of each dimension of the point in a three-dimensional coordinate system;
processing the point cloud vector by using a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector;
and determining target information in the point cloud to be processed according to the target three-dimensional point cloud characteristics, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system.
In one possible design, the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining, by using a first convolutional neural network and based on the target three-dimensional point cloud features, the center point coordinates, the three-dimensional size, and the yaw angle, in a three-dimensional coordinate system, of the target in the point cloud to be processed.
In one possible design, the first convolutional neural network is obtained by training on point cloud features together with the center point coordinates, three-dimensional sizes, and yaw angles of targets in a three-dimensional coordinate system.
In one possible design, after the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature, the method further includes:
determining the class probability of the target based on the target three-dimensional point cloud characteristic by utilizing a second convolutional neural network;
and removing the error target of the target according to the class probability of the target.
In one possible design, the second convolutional neural network is obtained by training on point cloud features and target class probabilities.
In one possible design, the determining a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point includes:
dividing the point cloud to be processed into grids;
determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point, wherein the attribute of each point comprises a three-dimensional coordinate and a reflectivity of the point in a three-dimensional coordinate system;
and determining the point cloud vector according to the vector of each divided grid.
In one possible design, the determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point includes:
adjusting the number of points in each divided grid according to a preset number of points;
and determining the vector of each divided grid according to the product of the number of points in each grid and the attribute of each point after adjustment.
In one possible design, the three-dimensional convolutional neural network includes a third convolutional neural network;
the processing the point cloud vector by using a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector comprises:
and performing three-dimensional grid feature extraction on the vector of each divided grid by using the third convolutional neural network.
In one possible design, the third convolutional neural network is obtained by training three-dimensional grid vectors and three-dimensional grid features.
In one possible design, the three-dimensional convolutional neural network further comprises a fourth convolutional neural network;
after the three-dimensional grid feature extraction is performed on the vector of each divided grid by using the third convolutional neural network, the method further includes:
and extracting the target three-dimensional point cloud characteristics of the extracted three-dimensional grid characteristics by using the fourth convolutional neural network.
In one possible design, the fourth convolutional neural network is obtained by training three-dimensional grid features and three-dimensional point cloud features.
In one possible design, the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining the target information according to the effective points in the target three-dimensional point cloud characteristics.
In one possible design, before the determining the target information according to the valid points in the target three-dimensional point cloud feature, the method further includes:
obtaining the Euclidean distance of each point in the target three-dimensional point cloud characteristic from the central point of the grid where the point is located;
and determining effective points in the target three-dimensional point cloud features according to the Euclidean distance of each point and a preset distance threshold.
In one possible design, the three-dimensional convolutional neural network comprises:
a plurality of convolution layers, configured to perform a convolution operation on the point cloud vector to output three-dimensional point cloud features;
an upsampling layer, connected to at least one of the plurality of convolution layers and configured to acquire the three-dimensional point cloud features output by the at least one convolution layer and process the acquired features to output processed three-dimensional point cloud features;
and a fully connected layer, connected to convolution layers among the plurality of convolution layers and to the upsampling layer, and configured to acquire the three-dimensional point cloud features output by the convolution layers and the processed three-dimensional point cloud features, perform feature fusion on the acquired features and the processed features to generate fused three-dimensional point cloud features, and input the fused three-dimensional point cloud features into another convolution layer, the target three-dimensional point cloud features being determined after the convolution operation of the other convolution layer.
In one possible design, the plurality of convolutional layers differ in depth of convolutional layer;
the fully connected layer is connected to a first convolution layer and a second convolution layer among the plurality of convolution layers and to the upsampling layer, so as to acquire the three-dimensional point cloud features output by the first convolution layer and the processed three-dimensional point cloud features. Feature fusion is performed on these to generate fused three-dimensional point cloud features, and the fused features are input into the second convolution layer; the target three-dimensional point cloud features are determined after the convolution operation of the second convolution layer, the depth of the second convolution layer being greater than that of the first convolution layer.
In one possible design, there are multiple fully connected layers.
In one possible design, there are multiple upsampling layers.
In one possible design, memory 701 may be separate or integrated with processor 702.
When the memory 701 is separately provided, the object detection apparatus further includes a bus 703 for connecting the memory 701 and the processor 702.
In one possible design, the target detection device 70 may be a stand-alone device that includes the above-described memory 701, processor 702, and the like. Alternatively, taking a vehicle as an example, the components of the target detection device 70 may be integrated on the vehicle in a distributed manner, i.e., the memory 701, the processor 702, and the like may be disposed at different locations of the vehicle.
Fig. 8 is a schematic structural diagram of a movable platform according to an embodiment of the present disclosure. As shown in fig. 8, the movable platform 80 of the present embodiment includes: a movable platform body 801, and a target detection device 802; the target detection device 802 is disposed on the movable platform body 801, and the movable platform body 801 and the target detection device 802 are connected wirelessly or through wires.
The target detection device 802 determines a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point, wherein the point cloud vector comprises the geometric position information of each dimension of the point in a three-dimensional coordinate system;
processing the point cloud vector by using a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector;
and determining target information in the point cloud to be processed according to the target three-dimensional point cloud characteristics, wherein the target information comprises geometric position information of each dimension of a target in a three-dimensional coordinate system.
In one possible design, the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining, by using a first convolutional neural network and based on the target three-dimensional point cloud features, the center point coordinates, the three-dimensional size, and the yaw angle, in a three-dimensional coordinate system, of the target in the point cloud to be processed.
In one possible design, the first convolutional neural network is obtained by training on point cloud features together with the center point coordinates, three-dimensional sizes, and yaw angles of targets in a three-dimensional coordinate system.
In one possible design, after the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature, the method further includes:
determining the class probability of the target based on the target three-dimensional point cloud characteristic by utilizing a second convolutional neural network;
and removing the error target of the target according to the class probability of the target.
In one possible design, the second convolutional neural network is obtained by training on point cloud features and target class probabilities.
In one possible design, the determining a point cloud vector according to the number of points in the point cloud to be processed and the attribute of each point includes:
dividing the point cloud to be processed into grids;
determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point, wherein the attribute of each point comprises a three-dimensional coordinate and a reflectivity of the point in a three-dimensional coordinate system;
and determining the point cloud vector according to the vector of each divided grid.
In one possible design, the determining a vector of each divided grid according to the number of points in each divided grid and the attribute of each point includes:
adjusting the number of points in each divided grid according to a preset number of points;
and determining the vector of each divided grid according to the product of the number of points in each grid and the attribute of each point after adjustment.
In one possible design, the three-dimensional convolutional neural network includes a third convolutional neural network;
the processing the point cloud vector by using a three-dimensional convolution neural network to extract a target three-dimensional point cloud feature of the point cloud vector comprises:
and performing three-dimensional grid feature extraction on the vector of each divided grid by using the third convolutional neural network.
In one possible design, the third convolutional neural network is obtained by training three-dimensional grid vectors and three-dimensional grid features.
In one possible design, the three-dimensional convolutional neural network further comprises a fourth convolutional neural network;
after the three-dimensional grid feature extraction is performed on the vector of each divided grid by using the third convolutional neural network, the method further includes:
and extracting the target three-dimensional point cloud characteristics of the extracted three-dimensional grid characteristics by using the fourth convolutional neural network.
In one possible design, the fourth convolutional neural network is obtained by training three-dimensional grid features and three-dimensional point cloud features.
In one possible design, the determining the target information in the point cloud to be processed according to the target three-dimensional point cloud feature includes:
and determining the target information according to the effective points in the target three-dimensional point cloud characteristics.
In one possible design, before the determining the target information according to the valid points in the target three-dimensional point cloud feature, the method further includes:
obtaining the Euclidean distance of each point in the target three-dimensional point cloud characteristic from the central point of the grid where the point is located;
and determining effective points in the target three-dimensional point cloud features according to the Euclidean distance of each point and a preset distance threshold.
In one possible design, the three-dimensional convolutional neural network comprises:
a plurality of convolution layers, configured to perform a convolution operation on the point cloud vector to output three-dimensional point cloud features;
an upsampling layer, connected to at least one of the plurality of convolution layers and configured to acquire the three-dimensional point cloud features output by the at least one convolution layer and process the acquired features to output processed three-dimensional point cloud features;
and a fully connected layer, connected to convolution layers among the plurality of convolution layers and to the upsampling layer, and configured to acquire the three-dimensional point cloud features output by the convolution layers and the processed three-dimensional point cloud features, perform feature fusion on the acquired features and the processed features to generate fused three-dimensional point cloud features, and input the fused three-dimensional point cloud features into another convolution layer, the target three-dimensional point cloud features being determined after the convolution operation of the other convolution layer.
In one possible design, the plurality of convolutional layers differ in depth of convolutional layer;
the fully connected layer is connected to a first convolution layer and a second convolution layer among the plurality of convolution layers and to the upsampling layer, so as to acquire the three-dimensional point cloud features output by the first convolution layer and the processed three-dimensional point cloud features. Feature fusion is performed on these to generate fused three-dimensional point cloud features, and the fused features are input into the second convolution layer; the target three-dimensional point cloud features are determined after the convolution operation of the second convolution layer, the depth of the second convolution layer being greater than that of the first convolution layer.
In one possible design, there are multiple fully connected layers.
In one possible design, there are multiple upsampling layers.
The movable platform provided by this embodiment includes a movable platform body and a target detection device disposed on the movable platform body. The target detection device determines a point cloud vector through the number of points in a point cloud to be processed and the attribute of each point. A three-dimensional convolutional neural network is then used to extract target three-dimensional point cloud features from the point cloud vector, which retains height information and thus preserves the three-dimensional structure information of the point cloud to the maximum extent. Target information in the point cloud to be processed is determined according to the extracted target three-dimensional point cloud features, the target information including geometric position information of each dimension of a target in a three-dimensional coordinate system, so that a three-dimensional target in the point cloud is accurately found. This solves the problems that existing target detection results have a high error rate and therefore cannot guide airplanes, automobiles, robots, and the like in obstacle avoidance and path planning.
An embodiment of the present application provides a computer-readable storage medium, in which program instructions are stored, and when a processor executes the program instructions, the object detection method as described above is implemented.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative; the division of the modules is only one logical division, and other divisions may be used in practice. For example, a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the present application may be embodied directly in a hardware processor, or performed by a combination of hardware and software modules within the processor.
The memory may include a high-speed RAM memory, and may further include a non-volatile memory (NVM), such as at least one magnetic disk memory; the memory may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Alternatively, the processor and the storage medium may reside as discrete components in an electronic device or a host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.