WO2022166400A1 - Method, apparatus, and device for processing a three-dimensional point cloud, and storage medium - Google Patents

Method, apparatus, and device for processing a three-dimensional point cloud, and storage medium

Info

Publication number
WO2022166400A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
points
indistinguishable
neural network
point cloud
Prior art date
Application number
PCT/CN2021/137305
Other languages
English (en)
French (fr)
Inventor
乔宇
徐名业
张钧皓
周志鹏
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2022166400A1 publication Critical patent/WO2022166400A1/zh


Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00: Arrangements for image or video recognition or understanding
            • G06V 10/40: Extraction of image or video features
              • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00: Pattern recognition
            • G06F 18/20: Analysing
              • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/24: Classification techniques
                • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F 18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
              • G06N 3/08: Learning methods

Definitions

  • The present application belongs to the field of computer technology, and in particular relates to a method for processing a 3D point cloud, an apparatus for processing a 3D point cloud, a device for processing a 3D point cloud, and a storage medium.
  • A point cloud is a collection of point data on the surface of a product obtained by measuring instruments in reverse engineering. In addition to geometric positions, point cloud data may also carry color information. Color information is usually obtained by capturing a color image with a camera and assigning the color (RGB) of each pixel to the corresponding point in the point cloud. Intensity information is the echo intensity collected by the receiving device of a laser scanner; it is related to the target's surface material, roughness, and angle of incidence, as well as the instrument's emitted energy and laser wavelength.
  • However, when processing point cloud data, unlike images, 3D point cloud data is unstructured (non-normalized).
  • Multi-view projection techniques project the unstructured 3D point cloud into 2D images and then process the 2D images.
  • At present, processing point cloud data requires first converting it into another data format, for example projecting the 3D point cloud onto 2D images as input to a convolutional neural network. This process has two shortcomings: (1) because of occlusion, the projection itself causes some data to be lost; (2) the data conversion is computationally expensive. It is therefore necessary to construct convolutional neural networks that process 3D point cloud data directly.
  • The embodiments of the present application provide a method for processing a 3D point cloud, an apparatus for processing a 3D point cloud, a device for processing a 3D point cloud, and a storage medium, so as to solve the problem that existing convolutional neural networks capable of directly processing 3D point cloud data cannot accurately extract the feature information of each point, which makes the prediction results inaccurate when predicting the categories of these points.
  • A first aspect of the embodiments of the present application provides a method for processing a three-dimensional point cloud, including:
  • acquiring point cloud data including a plurality of points;
  • inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature; and
  • determining, based on the target feature corresponding to each point, the prediction category corresponding to each point.
  • A second aspect of the embodiments of the present application provides an apparatus for processing a three-dimensional point cloud, including:
  • an acquiring unit, configured to acquire point cloud data including a plurality of points;
  • a processing unit, configured to input the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature; and
  • a determining unit, configured to determine the prediction category corresponding to each point based on the target feature corresponding to each point.
  • A third aspect of the embodiments of the present application provides a device for processing a three-dimensional point cloud, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method for processing a three-dimensional point cloud according to the first aspect.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for processing a three-dimensional point cloud according to the first aspect.
  • A fifth aspect of the embodiments of the present application provides a computer program product that, when run on a device for processing a three-dimensional point cloud, causes the device to execute the steps of the method for processing a three-dimensional point cloud according to the first aspect.
  • The method, apparatus, and device for processing a three-dimensional point cloud and the storage medium provided by the embodiments of the present application have the following beneficial effects:
  • A device for processing a three-dimensional point cloud processes point cloud data with a trained convolutional neural network to obtain the target feature corresponding to each point, and determines the prediction category of each point based on its target feature. When extracting the target feature of each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the network from that local enhanced feature.
  • The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.
  • FIG. 1 is a schematic diagram of hard-to-segment regions in a complex point cloud scene provided by the present application;
  • FIG. 2 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by another embodiment of the present application;
  • FIG. 4 is a schematic diagram of a geometric attention fusion module provided by the present application;
  • FIG. 5 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by yet another embodiment of the present application;
  • FIG. 6 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by still another embodiment of the present application;
  • FIG. 7 is a schematic diagram of a focusing module provided by the present application;
  • FIG. 8 is the evaluation process of the new evaluation criterion for indistinguishable regions provided by the present application;
  • FIG. 9 is a semantic segmentation network for large-scale complex scene point clouds provided by the present application;
  • FIG. 10 is the adaptive change process of indistinguishable points during training provided by the present application;
  • FIG. 11 is an application scenario diagram provided by the present application;
  • FIG. 12 is a schematic diagram of an apparatus for processing a three-dimensional point cloud provided by an embodiment of the present application;
  • FIG. 13 is a schematic diagram of a device for processing a three-dimensional point cloud provided by another embodiment of the present application.
  • In the prior art, a point cloud is a collection of point data on the surface of a product obtained by measuring instruments in reverse engineering.
  • In addition to geometric positions, point cloud data may also carry color information.
  • Color information is usually obtained by capturing a color image with a camera and assigning the color (RGB) of each pixel to the corresponding point in the point cloud.
  • Intensity information is the echo intensity collected by the receiving device of a laser scanner; it is related to the target's surface material, roughness, and angle of incidence, as well as the instrument's emitted energy and laser wavelength.
  • However, when processing point cloud data, unlike images, 3D point cloud data is unstructured (non-normalized).
  • Multi-view projection techniques project the unstructured 3D point cloud into 2D images and then process the 2D images.
  • At present, processing point cloud data requires first converting it into another data format, for example projecting the 3D point cloud onto 2D images as input to a convolutional neural network. This process has two shortcomings: (1) because of occlusion, the projection itself causes some data to be lost; (2) the data conversion is computationally expensive, consuming large amounts of memory and computer resources, and spatial geometric information is easily lost during the conversion.
  • One-dimensional convolutional neural networks can directly operate on and process unstructured point cloud data. Their basic idea is to learn a spatial encoding of each point and then aggregate all single-point features into an overall representation; however, this design cannot fully capture the relationships between points.
  • An enhanced version of point cloud convolution divides the point cloud into overlapping local regions based on distance measurements in the underlying space, and uses 2D convolution to extract local neighborhood structures that capture fine geometry. However, it only considers the local region of each point and cannot correlate similar local features across the point cloud.
  • Existing convolutional neural networks that can directly process 3D point cloud data cannot accurately extract the feature information of each point, resulting in inaccurate predictions when classifying these points.
  • Moreover, existing 3D scene point cloud processing methods perform especially poorly on hard-to-segment regions; the problems concentrate at object segmentation boundaries, inside easily confused objects, and in some scattered, confusing small regions.
  • FIG. 1 is a schematic diagram of hard-to-segment regions in a complex point cloud scene provided by the present application.
  • The first type is the complex boundary region, consisting of boundary points (object boundaries and prediction boundaries). In most cases it is difficult to accurately determine the boundary between different objects; since each point's feature is characterized by information from its local region, predictions for boundary points of different-category objects that are close in Euclidean space are over-smoothed, so the categories of these points cannot be predicted accurately.
  • The second type is the confusable interior region, which contains interior points from different classes of objects with similar textures and geometries. For example, doors and walls have a similar appearance: both are almost flat and have similar colors. In such cases it is difficult, even for humans, to accurately identify whether certain points belong to a door or a wall.
  • The third type is isolated small regions, which are scattered and hard to predict. In addition, because of occlusion, objects in the scene are not fully captured by the device, so the categories of points in isolated small regions also cannot be predicted accurately.
  • In view of this, the present application provides a method for processing a three-dimensional point cloud.
  • In this method, a device for processing a three-dimensional point cloud processes the point cloud data through a trained convolutional neural network to obtain the target feature corresponding to each point, and determines the prediction category of each point based on its target feature.
  • When extracting the target feature of each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the network from that local enhanced feature.
  • The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.
  • The method for processing a 3D point cloud provided by this application can be applied to various fields that need to analyze 3D point clouds, such as autonomous driving (e.g., obstacle detection and automatic path planning for self-driving equipment), robotics (e.g., object detection and route recognition for home service robots), and other human-computer interaction fields. The method can provide users with real-time, accurate recognition and detection, improving accuracy and user experience while ensuring the safety of human-computer interaction activities. This is only an exemplary description and is not limiting.
  • FIG. 2 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by an embodiment of the present application.
  • The execution subject of the method for processing a three-dimensional point cloud in this embodiment is a device for processing a three-dimensional point cloud, which includes but is not limited to a smartphone, a tablet computer, a computer, a personal digital assistant (PDA), a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, an independent server, a distributed server, a server cluster, or a cloud server, and may also include a terminal such as a desktop computer.
  • the method for processing a three-dimensional point cloud as shown in FIG. 2 may include S101 to S103, and the specific implementation principles of each step are as follows.
  • S101: Acquire point cloud data including a plurality of points.
  • Point cloud data including a plurality of points may be acquired by the device that processes the three-dimensional point cloud.
  • Specifically, if the device for processing the three-dimensional point cloud includes a laser device, a stereo camera, or a time-of-flight camera, the point cloud data can be collected directly by that sensor.
  • Specifically, point cloud data of three-dimensional objects can be collected using a data acquisition method based on automatic point cloud stitching: during acquisition, multiple stations are used to scan, the data of the stations are stitched together to obtain the point cloud data, and accurate registration of point clouds from different angles is achieved by iteratively optimizing the coordinate transformation parameters. The point cloud data may also be collected by another device and transmitted to the device of the present application.
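The iterative optimization of coordinate transformation parameters mentioned above can be sketched minimally. The snippet below is an illustration only, not the filing's registration procedure: it computes the least-squares rigid transform between two sets of corresponding points with the SVD-based Kabsch method, the building block of one ICP-style registration iteration; all function and variable names are hypothetical.

```python
import numpy as np

def best_fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i.

    src, dst: (N, 3) arrays of corresponding points from two scan stations.
    """
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t
```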
  • S102: Input the point cloud data into a trained convolutional neural network for processing to obtain the target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature.
  • S103: Based on the target feature corresponding to each point, determine the prediction category corresponding to each point.
  • In this embodiment, a pre-trained convolutional neural network is stored in advance in the device for processing a three-dimensional point cloud.
  • The convolutional neural network is obtained by training an initial convolutional neural network based on a training set and a test set using a machine learning algorithm.
  • The convolutional neural network includes a geometric attention fusion module and a focusing module; the geometric attention fusion module is used to extract the local enhanced feature of each point, and the focusing module is used to extract the target feature of each point based on its local enhanced feature.
  • the training set includes sample point cloud data of multiple sample points
  • the test set includes sample features and sample categories corresponding to each sample point.
  • the convolutional neural network can be pre-trained by a device that processes 3D point clouds, or the files corresponding to the convolutional neural network can be transplanted to a device that processes 3D point clouds after being pre-trained by other devices. That is, the execution subject for training the convolutional neural network and the execution subject for using the convolutional neural network may be the same or different. For example, when other equipment is used to train the initial convolutional neural network, after the other equipment finishes training the initial convolutional neural network, the network parameters of the initial convolutional neural network are fixed to obtain a file corresponding to the convolutional neural network. This file is then ported to a device that processes 3D point clouds.
  • After acquiring the point cloud data of the plurality of points, the device for processing the 3D point cloud uses the geometric attention fusion module included in the convolutional neural network to extract the local enhanced feature of each point, and then, based on each point's local enhanced feature, uses the focusing module included in the convolutional neural network to extract the target feature of each point.
  • Then, based on the target feature corresponding to each point, the predicted probability value of each category is determined for that point; based on the predicted probability values of the categories, the prediction category corresponding to each point is determined.
  • In this embodiment, when extracting the target feature corresponding to each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the network from that local enhanced feature.
  • The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.
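As a minimal sketch of the S101 to S103 pipeline, assuming `model` stands in for the trained convolutional neural network described above (all names are hypothetical, not the filing's actual interface):

```python
import torch

def predict_point_categories(model, points):
    """points: (N, C) tensor of point cloud data (coordinates plus optional color).

    Returns the prediction category of each point (S103) from the per-point
    class scores produced by the trained network (S102).
    """
    model.eval()
    with torch.no_grad():
        logits = model(points.unsqueeze(0))      # (1, N, num_classes)
        probs = torch.softmax(logits, dim=-1)    # predicted probability per category
        return probs.argmax(dim=-1).squeeze(0)   # (N,) predicted category per point
```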
  • Illustratively, FIG. 3 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by another embodiment of the present application, which mainly concerns one possible implementation of extracting the local enhanced feature of each point based on the geometric attention fusion module.
  • the method includes:
  • S201: For each point in the point cloud data, use the K-nearest-neighbor query algorithm to obtain the point's neighbors in Euclidean space, and determine the point's neighbors in eigenvalue space based on its neighbors in Euclidean space.
  • Specifically, an eigenvalue graph structure is determined from the point's neighbors in Euclidean space; a three-dimensional structure tensor is determined from the eigenvalue graph structure; the structure tensor is decomposed to obtain an eigenvalue matrix; and the point's neighbors in eigenvalue space are determined based on the eigenvalue matrix.
  • Alternatively, a tuple of eigenvalues is calculated from the original coordinates of each point (the structure tensor of a 3D neighborhood yields three eigenvalues) and used as the input feature of that point.
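A hedged sketch of this eigenvalue computation (the exact formula is rendered as an image in the original filing; this NumPy version is an illustration under the stated reading, with a brute-force neighbor search and illustrative parameter names):

```python
import numpy as np

def eigenvalue_features(points, k=16):
    """points: (N, 3). Returns (N, 3) sorted eigenvalues of each local structure tensor."""
    # Brute-force K-nearest-neighbor query in Euclidean space.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)   # (N, N)
    knn_idx = np.argsort(d2, axis=1)[:, 1:k + 1]                    # drop the point itself
    feats = np.empty((points.shape[0], 3))
    for i, idx in enumerate(knn_idx):
        nb = points[idx] - points[idx].mean(axis=0)
        tensor = nb.T @ nb / k                   # 3x3 structure tensor (covariance)
        w = np.linalg.eigvalsh(tensor)           # ascending eigenvalues
        feats[i] = w[::-1]                       # tuple (lambda1 >= lambda2 >= lambda3)
    return feats
```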
  • S202: The local feature corresponding to the point can be obtained by fusing the point's neighbors in Euclidean space with its neighbors in eigenvalue space according to formula (1): a learnable nonlinear function (in this embodiment, a two-layer two-dimensional convolution) applied to the concatenation of the Euclidean-space features and the eigenvalue-space features.
  • S203: Aggregate the local features corresponding to the point to obtain the local enhanced feature corresponding to the point.
  • For each point, the local features computed in S202 are aggregated; the aggregated result is the local enhanced feature corresponding to the point.
  • Optionally, the local features corresponding to the point may be aggregated by attention pooling to obtain the point's local enhanced feature.
  • Specifically, the local features corresponding to the point can be aggregated according to formula (2), another set of learnable nonlinear functions, to obtain the point's local enhanced feature.
  • To facilitate understanding of how the geometric attention fusion module extracts the local enhanced feature of each point, refer to FIG. 4, a schematic diagram of the geometric attention fusion module provided by this application.
  • As shown in FIG. 4, the geometric attention fusion module may also be called the geometry-based attention fusion module. The module takes as input the point coordinates, point features, and eigenvalues ("feature roots") of each point. K-nearest-neighbor search in eigenvalue space is performed based on the eigenvalues, and K-nearest-neighbor search in Euclidean space is performed based on the point coordinates and point features, yielding the point's neighbors in Euclidean space and its neighbors in eigenvalue space.
  • The point's neighbors in Euclidean space and its neighbors in eigenvalue space are fused to obtain the local feature corresponding to the point.
  • The local feature is then processed by a multilayer perceptron with dot-product and summation operations to obtain the local enhanced feature corresponding to the point. That is, in the geometry-based attention fusion module the inputs are point-wise coordinates, point features, and eigenvalues; features are aggregated in eigenvalue space and Euclidean space, and attention pooling generates the output feature of each point. A sketch of the whole module follows.
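Putting the pieces of FIG. 4 together, a minimal PyTorch sketch of the geometric attention fusion idea: features gathered from K neighbors in Euclidean space and K neighbors in eigenvalue space are concatenated, passed through a shared MLP (standing in for formula (1)), and aggregated with attention pooling (standing in for formula (2)). All module and parameter names are hypothetical, not the filing's implementation.

```python
import torch
import torch.nn as nn

def knn_indices(space, k):
    """(N, D) -> (N, k) indices of K nearest neighbors in the given space."""
    d = torch.cdist(space, space)                        # (N, N) pairwise distances
    return d.topk(k + 1, largest=False).indices[:, 1:]   # drop the point itself

class GeometricAttentionFusion(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(                        # learnable nonlinear function
            nn.Linear(2 * in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim))
        self.score = nn.Linear(out_dim, 1)               # attention-pooling scores

    def forward(self, coords, eigvals, feats):
        """coords: (N, 3), eigvals: (N, 3), feats: (N, in_dim)."""
        nbr_e = feats[knn_indices(coords, self.k)]       # (N, k, in_dim) Euclidean neighbors
        nbr_v = feats[knn_indices(eigvals, self.k)]      # (N, k, in_dim) eigenvalue neighbors
        local = self.mlp(torch.cat([nbr_e, nbr_v], dim=-1))   # (N, k, out_dim) local features
        attn = torch.softmax(self.score(local), dim=1)        # (N, k, 1) attention weights
        return (attn * local).sum(dim=1)                      # (N, out_dim) local enhanced feature
```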
  • Illustratively, FIG. 5 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by yet another embodiment of the present application, which mainly concerns one possible implementation of extracting the target feature of each point based on its local enhanced feature.
  • the method includes:
  • S301: Compute a local difference for each point based on its local enhanced feature, obtaining the local difference corresponding to each point.
  • S302: Determine the indistinguishable points among the plurality of points according to the local difference corresponding to each point.
  • S303: Use a multilayer perceptron to extract the target feature corresponding to each indistinguishable point.
  • The acquired points include indistinguishable points, i.e., points whose prediction category is difficult to determine: the points in the hard-to-segment regions of the complex point cloud scene shown in FIG. 1, namely the complex boundary regions, confusable interior regions, and isolated small regions.
  • Illustratively, the focusing module, also referred to as the indistinguishable area focusing (IAF) module, can adaptively select indistinguishable points and enhance the features of every point.
  • The IAF module is a new indistinguishable-region model based on hierarchical semantic features that adaptively selects indistinguishable points.
  • To enhance the features of indistinguishable points, the IAF module first obtains the fine-grained features and high-level semantic features of the indistinguishable points, and then enhances the features through a non-local operation between the indistinguishable points and the corresponding overall point set. To adaptively discover indistinguishable points during training, low-level geometric information and high-level semantic information can be used to mine them.
  • The local difference refers to the difference between each point and its neighboring points.
  • The local difference reflects, to a certain extent, how distinctive each point is, which depends on low-level geometric features, the latent space, and high-level semantic features; local difference is therefore used as the criterion for mining indistinguishable points. For each point p_i, the K nearest neighbors in Euclidean space are obtained, a local difference is computed for the point at each layer, and these per-layer local differences are accumulated. The points are sorted in descending order of the accumulated result, and the portion of points with the largest local differences is selected as the indistinguishable points. These correspond to the points in the three region types mentioned earlier, and the indistinguishable points change dynamically as the network is iteratively updated.
  • At the beginning of training, the indistinguishable points are distributed in regions where the raw attributes (coordinates and colors) change rapidly; as training progresses, the indistinguishable points move to the indistinguishable regions described above.
  • The label predictions and intermediate features of the indistinguishable points are aggregated, and a multilayer perceptron is then used to extract the features of the indistinguishable points separately. To enhance point features, and especially those of indistinguishable points, the present application updates the features of all points through a non-local mechanism, thereby implicitly enhancing the features of the indistinguishable points; in addition, the predicted output of the current layer is also computed.
  • Illustratively, computing the local difference of each point from its local enhanced feature can be implemented by formula (3), and the per-layer results are accumulated by formula (4); a sketch of the mining procedure follows.
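A sketch of the mining criterion: per-layer local differences between each point and its Euclidean K neighbors are accumulated, and the points with the largest accumulated difference are selected as indistinguishable points. Formulas (3) and (4) are rendered as images in the original filing, so the difference measure below (mean feature distance to the neighbors) and the selection ratio are assumptions for illustration.

```python
import torch

def mine_indistinguishable_points(coords, layer_feats, k=16, ratio=0.1):
    """coords: (N, 3); layer_feats: list of (N, C_l) features from several layers.

    Returns indices of the top `ratio` fraction of points by accumulated
    local difference, i.e. the dynamically selected indistinguishable points.
    """
    nbr = torch.cdist(coords, coords).topk(k + 1, largest=False).indices[:, 1:]  # (N, k)
    accumulated = torch.zeros(coords.shape[0])
    for f in layer_feats:
        diff = f.unsqueeze(1) - f[nbr]                 # (N, k, C_l) difference to neighbors
        accumulated += diff.norm(dim=-1).mean(dim=1)   # per-layer local difference, accumulated
    n_pick = max(1, int(ratio * coords.shape[0]))
    return accumulated.topk(n_pick).indices            # points with largest local difference
```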
  • Illustratively, FIG. 6 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by still another embodiment of the present application, which mainly concerns using a multilayer perceptron to extract the target feature corresponding to each indistinguishable point.
  • the method includes:
  • S401: Acquire the predicted label corresponding to each indistinguishable point, and acquire the intermediate feature corresponding to each indistinguishable point. The predicted label of each indistinguishable point can be obtained by formula (5).
  • S402: For each indistinguishable point, aggregate the predicted label and the intermediate feature corresponding to the point, e.g. by formula (6), to obtain the aggregation result corresponding to the point.
  • S403: Based on the aggregation result corresponding to each indistinguishable point, use a multilayer perceptron to extract the target feature of the point. To enhance point features, and especially those of indistinguishable points, the present application uses a non-local mechanism, formula (7), to update the features of all points, thereby implicitly enhancing the features of the indistinguishable points; a sketch follows.
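A hedged sketch of the non-local enhancement standing in for formula (7), whose exact form is rendered as an image in the filing: every point's feature is updated by attending over the aggregated features of the indistinguishable point set, so the indistinguishable points implicitly sharpen all features. The projection layout is an assumption for illustration.

```python
import torch
import torch.nn as nn

class NonLocalEnhance(nn.Module):
    """Updates all point features by attention over the indistinguishable set."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query projection for all points
        self.k = nn.Linear(dim, dim)   # key projection for indistinguishable points
        self.v = nn.Linear(dim, dim)   # value projection for indistinguishable points

    def forward(self, feats, indist_feats):
        """feats: (N, C) all points; indist_feats: (M, C) aggregated indistinguishable features."""
        scale = feats.shape[-1] ** 0.5
        attn = torch.softmax(self.q(feats) @ self.k(indist_feats).T / scale, dim=-1)
        return feats + attn @ self.v(indist_feats)     # residual non-local update
```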
  • FIG. 7 is a schematic diagram of a focusing module provided by the present application.
  • As shown in FIG. 7, the focusing module may also be referred to as the indistinguishable-area focusing processing module. The input features, the corresponding encoder-layer features, and the predictions of the previous layer are processed by upsampling, multilayer-perceptron learning, and other operations; finally, the indistinguishable points and their corresponding target features are extracted, and the predicted output of the current layer is also computed.
  • In this embodiment, a new Indistinguishable Area Focusing Network (IAF-Net) is proposed, which uses hierarchical semantic features to adaptively select indistinguishable points and enhances the fine-grained features of points, especially the indistinguishable ones. A multi-stage loss is also introduced to improve the feature representation progressively, and in terms of network design a cascaded structure is adopted to learn the geometric features of the point cloud data progressively.
  • Optionally, the present application also provides a new evaluation criterion for indistinguishable regions. Based on a preset measurement method, it can be evaluated whether the prediction category corresponding to each indistinguishable point is accurate; when it is detected that the number of indistinguishable points with accurate predicted categories does not meet a preset threshold, training of the convolutional neural network continues.
  • Specifically, to better distinguish the effectiveness of different methods in 3D semantic segmentation, a new evaluation method based on indistinguishable points is proposed. For the whole point cloud P = {p_1, p_2, ..., p_N}, we have the predictions Pred = {Z_i, 1 ≤ i ≤ N} and the ground truth Label = {Z_{i,gt}, 1 ≤ i ≤ N}. For every point p_i satisfying Z_i ≠ Z_{i,gt}, the predictions of its K nearest neighbors in Euclidean space are {Z_j, 1 ≤ j ≤ K}; the number m_i of those neighbors that also satisfy Z_j ≠ Z_{j,gt} is counted, and the ratio m_i/K is partitioned by the thresholds 0, ζ_1, ζ_2, 1 into three parts S1, S2, S3. The statistics over S1, S2, and S3 are used as the new evaluation criterion, corresponding to segmentation performance on the three types of indistinguishable regions.
  • FIG. 8 is an evaluation process of the new evaluation criteria for indivisible regions provided by this application.
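Under the reading above (m_i counted over the Euclidean neighbors of each mispredicted point, normalized by K, and bucketed by thresholds 0 < ζ_1 < ζ_2 < 1), a sketch of the metric's bookkeeping; the threshold values and the per-bucket counts returned are assumptions for illustration, since the filing's final statistic is rendered as an image.

```python
import numpy as np

def ipbm_buckets(coords, pred, gt, k=16, z1=0.3, z2=0.7):
    """Split mispredicted points into three parts S1/S2/S3 by the fraction of
    mispredicted neighbors, mirroring the indistinguishable-points-based metric."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    nbr = np.argsort(d2, axis=1)[:, 1:k + 1]        # (N, k) Euclidean neighbors
    wrong = pred != gt                              # (N,) mispredicted mask
    ratios = wrong[nbr].mean(axis=1)[wrong]         # m_i / K for each mispredicted p_i
    s1 = (ratios <= z1).sum()
    s2 = ((ratios > z1) & (ratios <= z2)).sum()
    s3 = (ratios > z2).sum()
    return s1, s2, s3                               # counts in the three region types
```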
  • Optionally, determining the prediction category corresponding to each point based on its target feature includes: determining, based on the target feature corresponding to each indistinguishable point, the predicted probability value of each category for that point, and determining the prediction category of each indistinguishable point based on those predicted probability values.
  • FIG. 9 is a semantic segmentation network for large-scale complex scene point clouds provided by the present application.
  • the semantic segmentation network includes a feature extraction unit and a segmentation unit.
  • In the feature extraction unit, a hierarchical structure is used to learn features at every level. The network takes N points as input and extracts the features of the point cloud using the geometric attention fusion module and the indistinguishable-area focusing processing module described above.
  • For segmentation, the network connects the layers and computes the predicted class probability of each point in the point cloud.
  • Illustratively, through the segmentation unit, the predicted probability value of each category is determined for each indistinguishable point.
  • Based on the predicted probability values of the categories, the prediction category corresponding to each indistinguishable point is determined. For example, if an indistinguishable point has a predicted probability value of 0.6 for the category "table" and 0.9 for the category "book", then the prediction category of that point is "book". This is only an exemplary description and is not limiting.
  • In the above embodiments, indistinguishable points include points located on complex boundaries, points with similar local textures but different categories, and points in isolated small hard regions; these points greatly affect the performance of 3D semantic segmentation.
  • To address this problem, the Indistinguishable Area Focusing Network (IAF-Net) adaptively selects indistinguishable points using hierarchical semantic features and enhances fine-grained point features, especially those of the indistinguishable points, with a multi-stage loss improving the feature representation progressively.
  • In addition, to analyze segmentation performance on indistinguishable regions, a new indistinguishable-points-based metric (IPBM) is proposed. IAF-Net achieves results comparable to the state of the art on popular 3D point cloud datasets such as S3DIS and ScanNet, and clearly outperforms other methods on the IPBM.
  • The embodiments of the present application directly process point cloud data with a point cloud convolutional neural network that shares local geometric information, without converting the point cloud data into other, more complex data formats. This reduces memory occupation and computer resource consumption and allows rich feature data to be extracted faster. Moreover, the geometric-change attention method is better suited to exploring the overall structural geometric features of point cloud edge contours, improving the accuracy of classification and segmentation tasks.
  • Optionally, before the point cloud data is input into the trained convolutional neural network for processing, the method for processing a three-dimensional point cloud provided by the present application further includes: obtaining a training set and a test set, where the training set includes sample point cloud data of multiple sample points and the test set includes the sample feature and sample category corresponding to each sample point; and training the initial convolutional neural network on the training set to obtain the in-training convolutional neural network.
  • The in-training convolutional neural network is verified based on the sample set; when the verification result does not meet the preset condition, the network parameters of the in-training convolutional neural network are adjusted and training continues on the training set; when the verification result meets the preset condition, training stops and the resulting network is used as the trained convolutional neural network.
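A generic sketch of the described train/verify loop; the optimizer, the stopping threshold, and the network itself are placeholders, since the filing does not fix these details.

```python
import torch
import torch.nn as nn

def train_until_valid(model, train_loader, valid_loader, target_acc=0.9, max_epochs=100):
    criterion = nn.CrossEntropyLoss()                   # cross-entropy loss, as described
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(max_epochs):
        model.train()
        for points, labels in train_loader:             # sample point cloud data
            optimizer.zero_grad()
            logits = model(points)                      # (B, N, num_classes)
            loss = criterion(logits.flatten(0, 1), labels.flatten())
            loss.backward()                             # back-propagation training
            optimizer.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for points, labels in valid_loader:         # verification step
                pred = model(points).argmax(dim=-1)
                correct += (pred == labels).sum().item()
                total += labels.numel()
        if correct / total >= target_acc:               # preset condition met: stop training
            break
    return model
```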
  • the sample point cloud data of multiple sample points can be collected by the device, or it can be collected by other devices and transmitted to the device.
  • To augment the point cloud data, the points in the point cloud data can be rotated, and/or
  • the point coordinates can be perturbed within a predetermined range around each point; and/or points in the point cloud data can be randomly deleted.
  • For example, a random probability is generated according to a preset maximum random probability, and points in the point cloud data are deleted according to the generated probability.
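A sketch of the three augmentations just described: rotation, per-point coordinate jitter within a preset range, and random point deletion with a randomly drawn drop probability. The magnitudes and the choice of rotation axis are illustrative assumptions.

```python
import numpy as np

def augment_point_cloud(points, jitter=0.02, max_drop=0.1):
    """points: (N, 3). Rotation about the vertical axis, bounded jitter, random deletion."""
    theta = np.random.uniform(0, 2 * np.pi)                      # random rotation angle
    rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                    [np.sin(theta),  np.cos(theta), 0],
                    [0, 0, 1]])
    points = points @ rot.T
    points += np.random.uniform(-jitter, jitter, points.shape)   # perturb within preset range
    drop_p = np.random.uniform(0, max_drop)                      # drawn up to the preset maximum
    keep = np.random.rand(points.shape[0]) > drop_p              # randomly delete points
    return points[keep]
```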
  • Before the parameters of the convolutional neural network are trained, the collected 3D point cloud data can further be manually classified and filtered by category to complete the preliminary data preparation.
  • The first part of the classified point cloud data is used to train the network, obtaining the convolution kernels of the trained convolutional neural network; the second part is used as verification data to evaluate the network. For example, during data preparation, 90% of the data of each 3D point cloud category is selected as training data for network training, and the remaining 10% is reserved as verification data for later evaluation of the model's recognition accuracy and generalization ability.
  • FIG. 10 is an adaptive change process of indistinguishable points in the training process provided by the present application.
  • the indistinguishable points are distributed in regions where the original attributes (coordinates and colors) change rapidly.
  • the indistinguishable points are located in the indistinguishable regions mentioned in the introduction.
  • Optionally, the present application may further process the extracted point cloud features: after the geometric feature information passes through several deconvolution modules, a max-K pooling operation may be used to extract the global geometric features of the point cloud for subsequent classification, segmentation, or registration.
  • The feature obtained by the multi-layer convolution modules is an N×M matrix, where N is the number of points and M is the dimension of each point feature.
  • Max-K pooling takes the K largest values in each of the M feature dimensions across the N points, yielding a K×M global feature vector for the point cloud.
  • The output features of the convolution modules at each layer can be combined for the max pooling operation and finally passed through a fully connected layer.
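A minimal sketch of the max-K pooling just described: from an N×M per-point feature matrix, the K largest values are taken in each of the M feature dimensions, producing a K×M global descriptor.

```python
import torch

def max_k_pooling(features, k):
    """features: (N, M) per-point features. Returns a (K, M) global feature."""
    # topk along the point dimension keeps the K largest values per feature channel.
    return features.topk(k, dim=0).values
```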
  • the cross-entropy function can be used as the loss function
  • the back-propagation algorithm can be used to train and optimize the model.
  • For segmentation, the global feature of the point cloud and the object category information are concatenated with the previously extracted local per-point features to form higher-dimensional per-point features.
  • Segmentation prediction is then performed using the predicted probabilities of the object's segmentation parts obtained by a multilayer perceptron and a normalized exponential (softmax) function; a sketch of such a head follows.
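A sketch of the segmentation head just described, assuming the common design in which the global feature and an object-category one-hot vector are tiled and concatenated with the per-point local features, then a multilayer perceptron plus softmax yields part probabilities; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    def __init__(self, local_dim, global_dim, num_object_classes, num_parts):
        super().__init__()
        in_dim = local_dim + global_dim + num_object_classes
        self.mlp = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, num_parts))

    def forward(self, local_feats, global_feat, category_onehot):
        """local_feats: (N, local_dim); global_feat: (global_dim,);
        category_onehot: (num_object_classes,)."""
        n = local_feats.shape[0]
        tiled = torch.cat([global_feat, category_onehot]).expand(n, -1)  # tile to every point
        fused = torch.cat([local_feats, tiled], dim=-1)   # higher-dimensional point feature
        return torch.softmax(self.mlp(fused), dim=-1)     # per-part predicted probabilities
```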
  • This application designs a convolutional neural network structure for 3D point cloud classification and segmentation, adjusts the network parameters (including but not limited to the learning rate and batch size), and adopts different learning strategies to drive the convolutional neural network to converge in the best optimization direction for the network model; finally, the trained network model is used to test the verification data, realizing point cloud classification and segmentation.
  • The geometric-information disentanglement convolution designed by the present invention is a module within the neural network that can directly extract features with both large and small geometric variation from the signals distributed on the point cloud, so it can be used in combination with other modules in the neural network.
  • the number of input and output channels and the combination of output channels can be changed to achieve the best results in different tasks.
  • Different neural network structures can be designed by using the geometric feature information sharing module.
  • the present application can be applied to scene segmentation tasks and three-dimensional scene reconstruction tasks in the field of unmanned driving and robot vision.
  • Referring to FIG. 11, FIG. 11 is an application scenario diagram provided by the present application.
  • FIG. 11 mainly shows the application of the present invention to the scene segmentation task of unmanned vehicles and robot vision.
  • the class and location of objects can be obtained, which is the basis for other tasks in this field.
  • a method for processing a three-dimensional point cloud provided in this application can be used for scene segmentation tasks of an unmanned intelligent robot.
  • A depth camera is used to collect point cloud data of the actual scene; the trained neural network then extracts the local features of the point cloud and segments the scene.
  • The segmentation results, i.e., the different objects in the scene, can then serve as the basis for downstream tasks.
  • The input features can be changed according to the task: for example, the distance between a point and its neighbors, the color information of the point, combinations of feature vectors, and the local shape-context features of the point can be substituted in or combined.
  • The indistinguishable-area focusing module in the network is a portable point cloud feature learning module that can be used as a feature extractor for other point-cloud-related tasks, such as 3D point cloud completion and 3D point cloud detection.
  • FIG. 12 is a schematic diagram of an apparatus for processing a three-dimensional point cloud provided by an embodiment of the present application.
  • Each unit included in the apparatus is used to execute each step in the embodiment corresponding to FIG. 2 , FIG. 3 , FIG. 5 , and FIG. 6 .
  • For details that are not described here, please refer to the relevant descriptions in the embodiments corresponding to FIG. 2, FIG. 3, FIG. 5, and FIG. 6.
  • Referring to FIG. 12, the apparatus includes:
  • an acquiring unit 510 configured to acquire point cloud data including a plurality of points
  • a processing unit 520, configured to input the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature;
  • the determining unit 530 is configured to determine the prediction category corresponding to each point based on the target feature corresponding to each point.
  • processing unit 520 is specifically configured to:
  • for each point, obtain the point's neighbors in Euclidean space based on the geometric attention fusion module, and determine the point's neighbors in eigenvalue space based on its neighbors in Euclidean space;
  • fuse the point's neighbors in Euclidean space with its neighbors in eigenvalue space to obtain the local feature corresponding to the point; and aggregate the local features corresponding to the point to obtain the local enhanced feature corresponding to the point.
  • processing unit 520 is further configured to:
  • the local features corresponding to the points are aggregated based on the attention pooling method to obtain the local enhanced features corresponding to the points.
  • Optionally, the plurality of points include indistinguishable points, which are points whose prediction category is not easily determined among the plurality of points, and the processing unit 520 is further configured to:
  • compute a local difference for each point based on its local enhanced feature, and determine the indistinguishable points among the plurality of points according to the local differences; and
  • use a multilayer perceptron to extract the target feature corresponding to each indistinguishable point.
  • Optionally, the determining unit 530 is specifically configured to:
  • determine, based on the target feature corresponding to each indistinguishable point, the predicted probability value of each category for that point, and determine the prediction category corresponding to each indistinguishable point based on those predicted probability values.
  • the device further includes:
  • a sample acquisition unit configured to acquire a training set and a test set
  • the training set includes sample point cloud data of a plurality of sample points
  • the test set includes sample features and sample categories corresponding to each sample point
  • the first training unit is used to train the initial convolutional neural network through the training set to obtain the convolutional neural network in training;
  • a verification unit for verifying the convolutional neural network in the training based on the sample set
  • an adjustment unit configured to adjust the network parameters of the convolutional neural network in the training when the verification result does not meet the preset conditions, and continue to train the convolutional neural network in the training based on the training set;
  • the second training unit is configured to stop training the convolutional neural network in training when the verification result meets the preset condition, and use the trained convolutional neural network as the trained convolutional neural network.
  • the device further includes:
  • the evaluation unit is used to evaluate whether the prediction category corresponding to each indistinguishable point is accurate based on the preset measurement method
  • the third training unit is configured to continue training the convolutional neural network when it is detected that the number of indistinguishable points with accurate predicted categories does not meet the preset threshold.
  • FIG. 13 is a schematic diagram of a device for processing a three-dimensional point cloud provided by another embodiment of the present application.
  • the apparatus 6 for processing a three-dimensional point cloud in this embodiment includes: a processor 60 , a memory 61 , and computer instructions 62 stored in the memory 61 and executable on the processor 60 .
  • When the processor 60 executes the computer instructions 62, the steps in each of the foregoing method embodiments for processing a three-dimensional point cloud are implemented, for example, S101 to S103 shown in FIG. 2.
  • Alternatively, when the processor 60 executes the computer instructions 62, the functions of the units in the foregoing apparatus embodiments are implemented, for example, the functions of units 510 to 530 shown in FIG. 12.
  • the computer instructions 62 may be divided into one or more units, and the one or more units are stored in the memory 61 and executed by the processor 60 to complete the present application.
  • the one or more units may be a series of computer instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer instructions 62 in the apparatus 6 for processing three-dimensional point clouds.
  • the computer instructions 62 can be divided into an acquisition unit, a processing unit, and a determination unit, and the specific functions of each unit are as described above.
  • the device for processing a three-dimensional point cloud may include, but is not limited to, a processor 60 and a memory 61 .
  • Those skilled in the art will understand that FIG. 13 is only an example of the device 6 for processing three-dimensional point clouds and does not constitute a limitation; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the device for processing a 3D point cloud may also include input/output terminals, network access terminals, a bus, and the like.
  • the so-called processor 60 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 61 may be an internal storage unit of the device for processing three-dimensional point clouds, such as a hard disk or memory of the device for processing three-dimensional point clouds.
  • the memory 61 can also be an external storage terminal of the device for processing three-dimensional point clouds, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 61 may also include both an internal storage unit of the device for processing three-dimensional point clouds and an external storage terminal.
  • the memory 61 is used to store the computer instructions and other programs and data required by the terminal.
  • the memory 61 can also be used to temporarily store data that has been output or will be output.
  • Embodiments of the present application also provide a computer storage medium, which may be non-volatile or volatile; the computer storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the foregoing embodiments of the method for processing a three-dimensional point cloud are implemented.


Abstract

A method, apparatus, and device for processing a three-dimensional point cloud, and a storage medium, relating to the field of computer technology. The method includes: acquiring point cloud data including a plurality of points (S101); inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network including a geometric attention fusion module and a focusing module (S102); and determining, based on the target feature corresponding to each point, the prediction category corresponding to each point (S103). The target feature of each point extracted by this method contains the geometric information of the point, making the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.

Description

Method, apparatus, and device for processing a three-dimensional point cloud, and storage medium

TECHNICAL FIELD

The present application belongs to the field of computer technology, and in particular relates to a method for processing a three-dimensional point cloud, an apparatus for processing a three-dimensional point cloud, a device for processing a three-dimensional point cloud, and a storage medium.

BACKGROUND

A point cloud is a collection of point data on the surface of a product obtained by measuring instruments in reverse engineering. In addition to geometric positions, point cloud data may also carry color information. Color information is usually obtained by capturing a color image with a camera and assigning the color (RGB) of each pixel to the corresponding point in the point cloud. Intensity information is the echo intensity collected by the receiving device of a laser scanner; it is related to the target's surface material, roughness, and angle of incidence, as well as the instrument's emitted energy and laser wavelength.

However, when processing point cloud data, unlike images, 3D point cloud data is unstructured. Multi-view projection techniques project the unstructured 3D point cloud into 2D images and then process the 2D images. At present, processing point cloud data requires first converting it into another data format, for example projecting the 3D point cloud onto 2D images as input to a convolutional neural network. This process has two shortcomings: (1) because of occlusion, the projection itself causes some data to be lost; (2) the data conversion is computationally expensive. It is therefore necessary to construct convolutional neural networks that process 3D point cloud data directly.

However, existing convolutional neural networks that can directly process 3D point cloud data cannot accurately extract the feature information of each point, resulting in inaccurate prediction results when predicting the categories of these points.
SUMMARY

In view of this, the embodiments of the present application provide a method for processing a three-dimensional point cloud, an apparatus for processing a three-dimensional point cloud, a device for processing a three-dimensional point cloud, and a storage medium, so as to solve the problem that existing convolutional neural networks capable of directly processing 3D point cloud data cannot accurately extract the feature information of each point, which makes the prediction results inaccurate when predicting the categories of these points.

A first aspect of the embodiments of the present application provides a method for processing a three-dimensional point cloud, including:

acquiring point cloud data including a plurality of points;

inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature; and

determining, based on the target feature corresponding to each point, the prediction category corresponding to each point.

A second aspect of the embodiments of the present application provides an apparatus for processing a three-dimensional point cloud, including:

an acquiring unit, configured to acquire point cloud data including a plurality of points;

a processing unit, configured to input the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature; and

a determining unit, configured to determine the prediction category corresponding to each point based on the target feature corresponding to each point.

A third aspect of the embodiments of the present application provides a device for processing a three-dimensional point cloud, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method for processing a three-dimensional point cloud according to the first aspect.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for processing a three-dimensional point cloud according to the first aspect.

A fifth aspect of the embodiments of the present application provides a computer program product that, when run on a device for processing a three-dimensional point cloud, causes the device to execute the steps of the method for processing a three-dimensional point cloud according to the first aspect.

The method, apparatus, and device for processing a three-dimensional point cloud and the storage medium provided by the embodiments of the present application have the following beneficial effects:

In the embodiments of the present application, a device for processing a three-dimensional point cloud processes point cloud data with a trained convolutional neural network to obtain the target feature corresponding to each point, and determines the prediction category of each point based on its target feature. When extracting the target feature of each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the network from that local enhanced feature. The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.
BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of hard-to-segment regions in a complex point cloud scene provided by the present application;

FIG. 2 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by another embodiment of the present application;

FIG. 4 is a schematic diagram of a geometric attention fusion module provided by the present application;

FIG. 5 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by yet another embodiment of the present application;

FIG. 6 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by still another embodiment of the present application;

FIG. 7 is a schematic diagram of a focusing module provided by the present application;

FIG. 8 is the evaluation process of the new evaluation criterion for indistinguishable regions provided by the present application;

FIG. 9 is a semantic segmentation network for large-scale complex scene point clouds provided by the present application;

FIG. 10 is the adaptive change process of indistinguishable points during training provided by the present application;

FIG. 11 is an application scenario diagram provided by the present application;

FIG. 12 is a schematic diagram of an apparatus for processing a three-dimensional point cloud provided by an embodiment of the present application;

FIG. 13 is a schematic diagram of a device for processing a three-dimensional point cloud provided by another embodiment of the present application.
DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

In the prior art, a point cloud is a collection of point data on the surface of a product obtained by measuring instruments in reverse engineering. In addition to geometric positions, point cloud data may also carry color information. Color information is usually obtained by capturing a color image with a camera and assigning the color (RGB) of each pixel to the corresponding point in the point cloud. Intensity information is the echo intensity collected by the receiving device of a laser scanner; it is related to the target's surface material, roughness, and angle of incidence, as well as the instrument's emitted energy and laser wavelength.

However, when processing point cloud data, unlike images, 3D point cloud data is unstructured. Multi-view projection techniques project the unstructured 3D point cloud into 2D images and then process the 2D images. At present, processing point cloud data requires first converting it into another data format, for example projecting the 3D point cloud onto 2D images as input to a convolutional neural network. This process has two shortcomings: (1) because of occlusion, the projection itself causes some data to be lost; (2) the data conversion is computationally expensive, consuming large amounts of memory and computer resources, and spatial geometric information is easily lost during the conversion.

Voxelization methods have also been used to convert the unstructured point cloud data into spatial voxel data. Although this alleviates the data loss problem, the converted voxel data is voluminous and highly redundant.

In addition, one-dimensional convolutional neural networks can directly operate on and process unstructured point cloud data. Their basic idea is to learn a spatial encoding of each point and then aggregate all single-point features into an overall representation; however, this design cannot fully capture the relationships between points.

An enhanced version of point cloud convolution divides the point cloud into overlapping local regions based on distance measurements in the underlying space, and uses 2D convolution to extract local neighborhood structures that capture fine geometry. However, it only considers the local region of each point and cannot correlate similar local features across the point cloud.

It is therefore necessary to construct convolutional neural networks that process 3D point cloud data directly. However, existing convolutional neural networks that can directly process 3D point cloud data cannot accurately extract the feature information of each point, resulting in inaccurate predictions when classifying these points. Moreover, existing 3D scene point cloud processing methods perform especially poorly on hard-to-segment regions; the problems concentrate at object segmentation boundaries, inside easily confused objects, and in some scattered, confusing small regions.

Referring to FIG. 1, FIG. 1 is a schematic diagram of hard-to-segment regions in a complex point cloud scene provided by the present application. As shown in FIG. 1, the first type is the complex boundary region, consisting of boundary points (object boundaries and prediction boundaries). In most cases it is difficult to accurately determine the boundary between different objects; since each point's feature is characterized by information from its local region, predictions for boundary points of different-category objects that are close in Euclidean space are over-smoothed, so the categories of these points cannot be predicted accurately.

The second type is the confusable interior region, which contains interior points from different classes of objects with similar textures and geometries. For example, doors and walls have a similar appearance: both are almost flat and have similar colors. In such cases it is difficult, even for humans, to accurately identify whether certain points belong to a door or a wall.

The third type is isolated small regions, which are scattered and hard to predict. In addition, because of occlusion, objects in the scene are not fully captured by the device, so the categories of points in isolated small regions also cannot be predicted accurately.

In view of this, the present application provides a method for processing a three-dimensional point cloud. In the embodiments of the present application, a device for processing a three-dimensional point cloud processes point cloud data with a trained convolutional neural network to obtain the target feature corresponding to each point, and determines the prediction category of each point based on its target feature. When extracting the target feature of each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the network from that local enhanced feature. The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.

The method for processing a 3D point cloud provided by this application can be applied to various fields that need to analyze 3D point clouds, such as autonomous driving (e.g., obstacle detection and automatic path planning for self-driving equipment), robotics (e.g., object detection and route recognition for home service robots), and other human-computer interaction fields. The method can provide users with real-time, accurate recognition and detection, improving accuracy and user experience while ensuring the safety of human-computer interaction activities. This is only an exemplary description and is not limiting.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by an embodiment of the present application. The execution subject of the method in this embodiment is a device for processing a three-dimensional point cloud, which includes but is not limited to a smartphone, a tablet computer, a computer, a personal digital assistant (PDA), a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, an independent server, a distributed server, a server cluster, or a cloud server, and may also include a terminal such as a desktop computer. As shown in FIG. 2, the method may include S101 to S103; the specific implementation principles of each step are as follows.

S101: Acquire point cloud data including a plurality of points.

Point cloud data including a plurality of points may be acquired by the device that processes the three-dimensional point cloud. Specifically, if the device includes a laser device, a stereo camera, or a time-of-flight camera, the data can be collected directly by that sensor. Specifically, point cloud data of three-dimensional objects can be collected using a data acquisition method based on automatic point cloud stitching: during acquisition, multiple stations are used to scan, the data of the stations are stitched together to obtain the point cloud data, and accurate registration of point clouds from different angles is achieved by iteratively optimizing the coordinate transformation parameters.

The point cloud data may also be collected by another device and then transmitted to the device for processing three-dimensional point clouds of the present application. This is only an exemplary description and is not limiting.

S102: Input the point cloud data into a trained convolutional neural network for processing to obtain the target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting the local enhanced feature of each point, and the focusing module is used for extracting the target feature of each point based on its local enhanced feature.

S103: Based on the target feature corresponding to each point, determine the prediction category corresponding to each point.

In this embodiment, a pre-trained convolutional neural network is stored in advance in the device for processing a three-dimensional point cloud. The convolutional neural network is obtained by training an initial convolutional neural network based on a training set and a test set using a machine learning algorithm. The convolutional neural network includes a geometric attention fusion module and a focusing module; the geometric attention fusion module is used to extract the local enhanced feature of each point, and the focusing module is used to extract the target feature of each point based on its local enhanced feature. The training set includes sample point cloud data of multiple sample points, and the test set includes the sample feature and sample category corresponding to each sample point.

It can be understood that the convolutional neural network may be pre-trained by the device that processes 3D point clouds, or pre-trained by another device, with the file corresponding to the trained network then ported to the device that processes 3D point clouds. That is, the execution subject that trains the network and the execution subject that uses it may be the same or different. For example, when another device trains the initial convolutional neural network, after training finishes, the network parameters are fixed to obtain the file corresponding to the convolutional neural network, and this file is then ported to the device that processes 3D point clouds.

After acquiring the point cloud data of the plurality of points, the device uses the geometric attention fusion module included in the convolutional neural network to extract the local enhanced feature of each point, and then, based on each point's local enhanced feature, uses the focusing module to extract the target feature of each point. Then, based on the target feature corresponding to each point, the predicted probability value of each category is determined for that point; based on the predicted probability values of the categories, the prediction category corresponding to each point is determined.

In this embodiment, when extracting the target feature corresponding to each point, the local enhanced feature of each point is first extracted by the geometric attention fusion module, and the target feature is then extracted by the focusing module from that local enhanced feature. The target feature of each point extracted by this method contains the important geometric information of the point, which makes the extracted target features more accurate and effective; consequently, the categories predicted from these target features are highly accurate.
Illustratively, FIG. 3 is a schematic flowchart of a method for processing a three-dimensional point cloud provided by another embodiment of the present application, which mainly concerns one possible implementation of extracting the local enhanced feature of each point based on the geometric attention fusion module. Referring to FIG. 3, the method includes:

S201: For each point in the point cloud data, obtain the point's neighbors in Euclidean space based on the geometric attention fusion module, and determine the point's neighbors in eigenvalue space based on its neighbors in Euclidean space.

For each point in the point cloud data, the K-nearest-neighbor query algorithm is used to obtain the point's neighbors in Euclidean space; an eigenvalue graph structure is determined from those neighbors; a three-dimensional structure tensor is determined from the eigenvalue graph structure; the structure tensor is decomposed to obtain an eigenvalue matrix; and the point's neighbors in eigenvalue space are determined based on the eigenvalue matrix. Alternatively, a tuple of eigenvalues is computed from the original coordinates of each point and used as the input feature of that point.

S202: Fuse the point's neighbors in Euclidean space with its neighbors in eigenvalue space to obtain the local feature corresponding to the point.

The local feature corresponding to the point can be obtained by fusing the point's neighbors in Euclidean space with its neighbors in eigenvalue space according to formula (1), which can be written schematically as:

h_i = g([f_i^E, f_i^Λ])    (1)

where h_i denotes the local feature corresponding to each point, g(·) is a set of learnable nonlinear functions (in the geometric attention fusion module of this embodiment, g is a two-layer two-dimensional convolution), [·, ·] denotes the concatenation operation, the former term f_i^E denotes the features in Euclidean space, and the latter term f_i^Λ denotes the features in eigenvalue space.
S203: Aggregate the local features corresponding to the point to obtain the local enhanced feature corresponding to the point.

For each point, the local features computed in S202 are aggregated; the aggregated result is the local enhanced feature corresponding to the point.

Optionally, in one possible implementation, the local features corresponding to the point may be aggregated by attention pooling to obtain the point's local enhanced feature.

Specifically, the local features corresponding to the point can be aggregated according to formula (2), which can be written schematically as:

H_i = ρ({h_{i,k}, k = 1, ..., K})    (2)

where H_i denotes the local enhanced feature corresponding to each point and ρ(·) is a set of learnable nonlinear functions (attention pooling over the local features).

To facilitate understanding of how the geometric attention fusion module extracts the local enhanced feature of each point, refer to FIG. 4, a schematic diagram of the geometric attention fusion module provided by this application. As shown in FIG. 4, the geometric attention fusion module may also be called the geometry-based attention fusion module. The module takes as input the point coordinates, point features, and eigenvalues of each point; K-nearest-neighbor search in eigenvalue space is performed based on the eigenvalues, and K-nearest-neighbor search in Euclidean space is performed based on the point coordinates and point features, yielding the point's neighbors in Euclidean space and its neighbors in eigenvalue space. These two neighbor sets are fused to obtain the local feature corresponding to the point, which is then processed by a multilayer perceptron with dot-product and summation operations to obtain the local enhanced feature corresponding to the point. That is, in the geometry-based attention fusion module the inputs are point-wise coordinates, point features, and eigenvalues; features are aggregated in eigenvalue space and Euclidean space, and attention pooling generates the output feature of each point.

In this implementation, to better describe each point, the local features are enhanced with eigenvalues at every point; that is, the local enhanced feature of each point is extracted based on the geometric attention fusion module, effectively preserving the geometric information of each point in space. The module effectively captures the most important geometric information of each point and fuses the geometric information of the points, which facilitates the subsequent accurate extraction of each point's target feature from its local enhanced feature.
Illustratively, FIG. 5 is a schematic flowchart of a method for processing a 3D point cloud provided by yet another embodiment of the present application, mainly concerning one possible implementation of extracting the target feature of each point based on its locally enhanced feature. Referring to FIG. 5, the method comprises:

S301: Perform a local difference operation on each point based on its locally enhanced feature to obtain the local difference corresponding to each point.

S302: Determine the indistinguishable points among the plurality of points according to the local difference corresponding to each point.

S303: Extract, with a multilayer perceptron, the target feature corresponding to each indistinguishable point.
The acquired points include indistinguishable points, i.e. points among the plurality of points whose predicted category is difficult to determine: the points in the hard regions of the complex point cloud scene shown in FIG. 1, namely points in complex boundary regions, in confusable interior regions and in small isolated regions.

Illustratively, the focusing module may also be called the indistinguishable-area focusing (IAF) module; it adaptively selects the indistinguishable points and enhances the features of every point.

The IAF module is a new model of indistinguishable regions based on hierarchical semantic features, capable of adaptively selecting indistinguishable points. To enhance the features of the indistinguishable points, the IAF module first obtains their fine-grained features and high-level semantic features, and then enhances the features through a non-local operation between the indistinguishable points and the corresponding whole point set.

To discover the indistinguishable points adaptively during training, low-level geometric information and high-level semantic information can be exploited to mine them.
The local difference is the difference between each point and its neighbouring points. It reflects, to a certain extent, how distinctive each point is, a distinctiveness that depends on low-level geometric features, the latent space and high-level semantic features; we therefore use the local difference as the criterion for mining indistinguishable points. For each point p_i, we obtain the K nearest neighbours in Euclidean space and then compute the local difference of every point in every layer; these local differences are accumulated together, the accumulated results are sorted in descending order, and the points with the larger local differences are selected as the indistinguishable points. They correspond to the points of the three kinds of regions mentioned earlier, and they change dynamically as the network is iteratively updated. Note that at the beginning of training the indistinguishable points are distributed in regions where the raw attributes (coordinates and colour) change rapidly; as training proceeds, the indistinguishable points move to the indistinguishable regions mentioned in the introduction. We aggregate the intermediate features and the label predictions of the indistinguishable points, and then use a multilayer perceptron to extract the features of the indistinguishable points separately. To enhance the features of the points, and of the indistinguishable points in particular, the present application uses a non-local mechanism to update the features of all points through the equations below, thereby implicitly enhancing the features of the indistinguishable points. In addition, the prediction output of the current layer is also computed.
Illustratively, performing the local difference operation on each point based on its locally enhanced feature to obtain the local difference corresponding to each point may be implemented through the following equation (3), reproduced only as images PCTCN2021137305-appb-000014 to -000016 in the source. These local differences are then accumulated together through equation (4) (image PCTCN2021137305-appb-000017). We sort the points in descending order of the accumulated value LD_l and select the top fraction of points (the fraction is given as image PCTCN2021137305-appb-000018) as the indistinguishable points.
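A minimal sketch of this mining procedure, assuming the local difference is measured as the mean feature discrepancy to the K Euclidean neighbours and that the retained fraction is a hyperparameter; both assumptions stand in for details given only as images in the source:

```python
# Select indistinguishable points: accumulate per-point local differences,
# sort in descending order, and keep the top fraction.
import numpy as np

def select_indistinguishable(points: np.ndarray, feats: np.ndarray,
                             k: int = 16, ratio: float = 0.1) -> np.ndarray:
    """Return indices of points with the largest accumulated local difference."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = d2.argsort(axis=1)[:, 1:k + 1]          # K Euclidean neighbours, self excluded
    # Local difference: mean feature discrepancy to the neighbours.
    ld = np.abs(feats[:, None, :] - feats[knn]).mean(axis=(1, 2))
    order = ld.argsort()[::-1]                    # descending accumulated difference
    return order[:max(1, int(ratio * len(points)))]
```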
Illustratively, FIG. 6 is a schematic flowchart of a method for processing a 3D point cloud provided by a further embodiment of the present application, mainly concerning one possible implementation of extracting, with a multilayer perceptron, the target feature corresponding to each indistinguishable point. Referring to FIG. 6, the method comprises:

S401: Acquire the predicted label corresponding to each indistinguishable point and the intermediate feature corresponding to each indistinguishable point.

The predicted label corresponding to each indistinguishable point may be obtained through equation (5), reproduced only as image PCTCN2021137305-appb-000019 in the source.

S402: For each indistinguishable point, aggregate the predicted label and intermediate feature corresponding to the point to obtain the aggregation result corresponding to the point.

S403: Extract, with a multilayer perceptron and based on the aggregation result corresponding to each indistinguishable point, the target feature corresponding to each indistinguishable point.
For each indistinguishable point, the predicted label and intermediate feature corresponding to the point may be aggregated through equation (6), reproduced only as image PCTCN2021137305-appb-000020 in the source: we aggregate the intermediate features and the label predictions of the indistinguishable points, and then use a multilayer perceptron to extract the features of the indistinguishable points separately. In equation (6), j ∈ M_{l-1} indicates that the points concerned belong to the indistinguishable point set.
To enhance the features of the points, and of the indistinguishable points in particular, a non-local mechanism is used to update the features of all points through equation (7), reproduced only as image PCTCN2021137305-appb-000021 in the source, thereby implicitly enhancing the features of the indistinguishable points.
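A minimal PyTorch sketch of such a non-local update, with assumed query/key/value projections, since equation (7) itself is reproduced only as an image:

```python
# Non-local feature enhancement: every point attends to the set of
# indistinguishable points, implicitly strengthening their features.
import torch
import torch.nn as nn

class NonLocalEnhance(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(c, c) for _ in range(3))

    def forward(self, feats: torch.Tensor, indist: torch.Tensor) -> torch.Tensor:
        """feats: (N, C) all points; indist: (M, C) indistinguishable points."""
        attn = torch.softmax(self.q(feats) @ self.k(indist).T
                             / feats.shape[-1] ** 0.5, dim=-1)  # (N, M) affinities
        return feats + attn @ self.v(indist)                    # residual update
```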
To facilitate understanding of the processing performed by the focusing module, refer to FIG. 7, a schematic diagram of a focusing module provided by the present application. As shown in FIG. 7, the focusing module may also be called the indistinguishable-region focusing processing module. The input features, the features of the corresponding encoding layer and the predictions of the previous layer are processed by upsampling, multilayer-perceptron learning and similar operations; the indistinguishable points and their corresponding target features are finally extracted, and the prediction output of the current layer is computed at the same time.

In this embodiment, a new indistinguishable-area focusing network (IAF-Net) is proposed, which uses hierarchical semantic features to adaptively select the indistinguishable points and enhances the fine-grained features of points, especially those of the indistinguishable points. A multi-stage loss is also introduced to improve the feature representation in a progressive manner; in terms of network design, a cascaded structure is adopted to learn the geometric features of the point cloud data progressively.

Optionally, in one possible implementation, the present application further provides a new evaluation criterion for indistinguishable regions. Based on a preset metric, whether the predicted category corresponding to each indistinguishable point is accurate is evaluated; when it is detected that the number of indistinguishable points with an accurate predicted category does not meet a preset threshold, training of the convolutional neural network is continued.

Specifically, to better distinguish the effect of different methods in 3D semantic segmentation, we propose a new evaluation method based on an indistinguishable-points metric. This evaluation metric focuses on the effectiveness of a segmentation method in indistinguishable regions. For the whole point cloud P = {p_1, p_2, ..., p_N}, we have the predictions Pred = {Z_i, 1 ≤ i ≤ N} and the ground truth Label = {Z_{i,gt}, 1 ≤ i ≤ N}.
For every point p_i that satisfies Z_i ≠ Z_{i,gt}, the predictions of its K nearest neighbours in Euclidean space are {Z_j, 1 ≤ j ≤ K}; we count the number m_i of those neighbours that satisfy Z_j ≠ Z_{j,gt}, divide the ratio m_i / K (written only as image PCTCN2021137305-appb-000022 in the source) into three parts S1, S2 and S3 with the thresholds 0, ζ_1, ζ_2 and 1, and finally use the per-part statistic (image PCTCN2021137305-appb-000023) as the new evaluation criterion, corresponding to the segmentation performance on the three kinds of indistinguishable regions.
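A minimal sketch of this evaluation, with assumed values for the thresholds ζ_1 and ζ_2 and a simple per-bin count standing in for the statistic shown only as an image in the source:

```python
# IPBM-style evaluation: for every mispredicted point, bin the fraction of its
# K Euclidean neighbours that are also mispredicted by thresholds zeta1 < zeta2.
import numpy as np

def ipbm_bins(points, pred, label, k=16, zeta1=0.3, zeta2=0.7):
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn = d2.argsort(axis=1)[:, 1:k + 1]      # K nearest neighbours, self excluded
    wrong = pred != label                     # mispredicted points
    sizes = [0, 0, 0]                         # populations of S1, S2, S3
    for i in np.where(wrong)[0]:
        r = wrong[knn[i]].mean()              # m_i / K
        sizes[0 if r < zeta1 else (1 if r < zeta2 else 2)] += 1
    return sizes
```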
For ease of understanding, refer to FIG. 8, which shows the evaluation procedure of the new criterion for indistinguishable regions provided by the present application.

Optionally, in one possible implementation, determining the predicted category corresponding to each point based on its target feature comprises: determining, based on the target feature corresponding to each indistinguishable point, the predicted probability value of each category for that point; and determining, based on the per-category predicted probability values, the predicted category corresponding to each indistinguishable point.

Illustratively, refer to FIG. 9, which shows a semantic segmentation network for large complex point cloud scenes provided by the present application. The semantic segmentation network comprises a feature extraction unit and a segmentation unit.

In the feature extraction unit, a hierarchical structure is used to learn features at every level. The network takes N points as input and extracts the features of the point cloud with the geometric attention fusion module and the indistinguishable-region focusing processing module described above.
For segmentation, the network concatenates the levels and then computes the predicted probability of each category for every point in the point cloud. Illustratively, the segmentation unit determines the predicted probability value of each category for each indistinguishable point and determines the predicted category of each indistinguishable point from those values. For example, if an indistinguishable point has a predicted probability value of 0.6 for the category "table" and 0.9 for the category "book", its predicted category is "book". This is merely an illustrative example and is not limiting.

In the above implementation, the indistinguishable points include points located on complex boundaries, points with similar local textures but different categories, and points in small isolated hard regions; such points largely determine the performance of 3D semantic segmentation.

To address this problem, a new indistinguishable-area focusing network (IAF-Net) is proposed, which adaptively selects the indistinguishable points with hierarchical semantic features and enhances the fine-grained features of points, especially those of the indistinguishable points. A multi-stage loss is also introduced to improve the feature representation in a progressive manner. Furthermore, to analyse segmentation performance in indistinguishable regions, a new indistinguishable-points-based metric (IPBM) is proposed. IAF-Net achieves results comparable with the state of the art on popular 3D point cloud datasets such as S3DIS and ScanNet, and clearly outperforms other methods on the IPBM.

By designing a point cloud convolutional neural network with shared local geometric information, the embodiments of the present application process point cloud data directly, without converting it into other, more complex data formats, which reduces memory usage and the consumption of computing resources and allows rich feature data to be extracted more quickly. Combined with the geometric-variation attention method, this also makes it easier to explore the geometric features of the overall structure of point cloud edges and contours, improving the accuracy of classification and segmentation tasks.
Optionally, before inputting the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, the method for processing a 3D point cloud provided by the present application further comprises: acquiring a training set and a test set, the training set comprising sample point cloud data of a plurality of sample points and the test set comprising the sample feature and sample category corresponding to each sample point; training an initial convolutional neural network with the training set to obtain a convolutional neural network under training; validating the convolutional neural network under training on the test set; when the validation result does not meet a preset condition, adjusting the network parameters of the convolutional neural network under training and continuing to train it on the training set; and when the validation result meets the preset condition, stopping the training and taking the trained network as the trained convolutional neural network.
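A minimal skeleton of this train-validate loop; fit_epoch and evaluate stand in for the unspecified training and validation routines, and the accuracy threshold is an assumed form of the "preset condition":

```python
# Train until the validation result meets a preset condition, then stop and
# keep the trained network. All names here are illustrative.
def train_until_valid(model, fit_epoch, evaluate, target_acc=0.90, max_epochs=200):
    for _ in range(max_epochs):
        fit_epoch(model)                   # one pass over the training set
        if evaluate(model) >= target_acc:  # validation meets the preset condition
            break                          # stop training; keep the trained network
        # otherwise the network parameters keep being adjusted by training
    return model
```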
When the training set and test set are acquired, the sample point cloud data of the plurality of sample points may be collected by the device itself, or collected by another device and transmitted to the device. Optionally, whether collected locally or received from another device, the point cloud data may be augmented by rotating the points in the point cloud data and/or perturbing the coordinates of each point within a predetermined range around the point; and/or points in the point cloud data may be deleted at random. Illustratively, a random probability is generated according to a preset maximum random probability, and points in the point cloud data are deleted according to the generated probability. Experiments show that this data augmentation strengthens the generalisation ability learned by the convolutional neural network, and thereby improves test accuracy on the test set (point cloud data not used during training).
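A minimal sketch of the augmentations just described (rotation, bounded coordinate perturbation, random deletion); the jitter range and maximum drop probability are assumed values:

```python
# Point cloud augmentation: random rotation about the vertical axis, coordinate
# jitter within a preset range, and random point deletion with a randomly
# drawn probability.
import numpy as np

def augment(points: np.ndarray, jitter=0.01, max_drop=0.2) -> np.ndarray:
    theta = np.random.uniform(0, 2 * np.pi)               # random rotation angle
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    pts = points @ rot.T                                  # rotate the cloud
    pts += np.random.uniform(-jitter, jitter, pts.shape)  # perturb around each point
    drop_p = np.random.uniform(0, max_drop)               # randomly drawn probability
    keep = np.random.rand(len(pts)) > drop_p              # random deletion
    return pts[keep]
```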
When the parameters are fed into the convolutional neural network, a further step may be performed: the collected 3D point cloud data are manually classified and screened by category to complete the preliminary data preparation. A first part of the classified point cloud data may be used to train the convolution kernels of the convolutional neural network, yielding the trained network; a second part of the classified point cloud data may serve as validation data to evaluate the network. For example, after the data are organised, 90% of the data of each 3D point cloud category is selected as training data for the network, and the remaining 10% is reserved as experimental validation data for the later evaluation of the model's recognition accuracy and generalisation ability.

Illustratively, refer to FIG. 10, which shows the adaptive change of the indistinguishable points during training provided by the present application. At the beginning of training, the indistinguishable points are distributed in regions where the raw attributes (coordinates and colour) change rapidly; as training proceeds, the indistinguishable points move to the indistinguishable regions mentioned in the introduction.
Optionally, after the features of the point cloud data have been extracted, the present application may perform further processing. After the geometric feature information has been processed by several deconvolution modules, a max-K pooling operation may be used to extract the geometric features of the point cloud for subsequent classification, segmentation or registration. Suppose the feature produced by the multi-layer convolution modules is an N×M matrix, where N is the number of points and M is the dimension of each point feature; max-K pooling takes, in the i-th feature dimension across the N points, the K largest values, finally yielding a K×M global feature vector of the point cloud. The output features of each convolution module may be merged before the max pooling operation and finally passed through fully connected layers. In addition, the cross-entropy function may be used as the loss function, and the back-propagation algorithm may be used to train and optimise the model. For the segmentation task, on the basis of the obtained global feature, the global feature of the point cloud and the object category information are taken as local features of the point cloud and concatenated with the previously extracted local point features to form higher-dimensional local features; segmentation is then predicted from the predicted probabilities of the object parts obtained by a multilayer perceptron and a normalised exponential (softmax) function.
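A minimal sketch of the max-K pooling operation described above, mapping an N×M per-point feature matrix to a K×M global descriptor:

```python
# Max-K pooling: keep the K largest values along the point axis in each
# feature dimension.
import numpy as np

def max_k_pooling(feat: np.ndarray, k: int) -> np.ndarray:
    """feat: (N, M) per-point features -> (K, M) global feature."""
    # Sort each column in descending order and keep the top K entries.
    return -np.sort(-feat, axis=0)[:k]
```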
The present application designs a convolutional neural network structure for 3D point cloud classification and segmentation; the network parameters (including but not limited to the learning rate and batch size) are adjusted, and different learning strategies are adopted to drive the convolutional neural network to converge towards the best optimisation direction of the network model; finally, the trained network model is tested on the validation data to achieve classification and segmentation of the point cloud. In addition, the geometric-information disentangling convolution designed in the present invention is a module within the neural network that can extract features of both large and small geometric variation directly from signals distributed on the point cloud, and it can therefore be used in combination with other modules of the neural network. The numbers of input and output channels and the combination of output channels can be changed to achieve the best effect in different tasks, and different neural network structures can be designed by using the geometric-feature-information sharing module.

In addition, experimental verification shows that the point-cloud-oriented feature extraction method described in the present application can be tested on scene segmentation tasks over large-scale point cloud data (S3DIS, ScanNet). Compared with current state-of-the-art international methods, it achieves an mIoU of 64.6% on Area-5 and 70.3% under 6-fold cross-validation, a leading performance.

Illustratively, the present application can be applied to scene segmentation tasks and 3D scene reconstruction tasks in the fields of autonomous driving and robot vision. Refer to FIG. 11, an application scenario diagram provided by the present application; it mainly shows the invention applied to scene segmentation for autonomous vehicles and robot vision. By analysing and processing the 3D point cloud obtained from scanning, the category and position of objects can be obtained, which is the basis of other tasks in these fields.
Illustratively, the method for processing a 3D point cloud provided by the present application can be used for the scene segmentation task of an unmanned intelligent robot. First, point cloud data of a scene is collected with a depth camera, and the object categories in the scene point cloud data are annotated. Local features of the point cloud are extracted by the geometry-sharing-based convolutional neural network and used for pixel-level classification; this constitutes the training for scene segmentation. In actual use, a depth camera collects point cloud data of the real scene, the trained neural network is then used to extract local features of the point cloud, and the scene is segmented. The segmentation results (i.e. the different objects in the scene) are returned to the autonomous vehicle (or intelligent robot) for data storage and further analysis.

Optionally, in practical applications the input features can be changed according to the task; for example, the distances between a point and its neighbouring points, the colour information of the point, combinations of feature vectors, or the local shape-context information of the point can replace or be combined with the input features.

Optionally, the indistinguishable-region focusing module in the network is a portable point cloud feature-learning module that can be used as a feature extractor in other point-cloud-related tasks, such as 3D point cloud completion and 3D point cloud detection.
Refer to FIG. 12, a schematic diagram of an apparatus for processing a 3D point cloud provided by an embodiment of the present application. The units of the apparatus are configured to execute the steps in the embodiments corresponding to FIG. 2, FIG. 3, FIG. 5 and FIG. 6; for details, see the relevant descriptions of those embodiments. For ease of description, only the parts related to this embodiment are shown. Referring to FIG. 12, the apparatus comprises:
an acquisition unit 510, configured to acquire point cloud data comprising a plurality of points;

a processing unit 520, configured to input the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network comprising a geometric attention fusion module and a focusing module, the geometric attention fusion module being configured to extract a locally enhanced feature of each point, and the focusing module being configured to extract the target feature of each point based on its locally enhanced feature;

a determination unit 530, configured to determine, based on the target feature corresponding to each point, the predicted category corresponding to each point.

Optionally, the processing unit 520 is specifically configured to:

for each point in the point cloud data, acquire, by the geometric attention fusion module, the neighbouring points of the point in Euclidean space, and determine, based on the point's neighbouring points in Euclidean space, its neighbouring points in eigenvalue space;

fuse the point's neighbouring points in Euclidean space and in eigenvalue space to obtain the local feature corresponding to the point;

aggregate the local features corresponding to the point to obtain the locally enhanced feature corresponding to the point.

Optionally, the processing unit 520 is further configured to:

aggregate the local features corresponding to the point by attention pooling to obtain the locally enhanced feature corresponding to the point.

Optionally, the plurality of points comprises indistinguishable points, the indistinguishable points being those points among the plurality of points whose predicted category is difficult to determine, and the processing unit 520 is further configured to:

perform a local difference operation on each point based on its locally enhanced feature to obtain the local difference corresponding to each point;

determine the indistinguishable points among the plurality of points according to the local difference corresponding to each point;

extract, with a multilayer perceptron, the target feature corresponding to each indistinguishable point.

Optionally, the processing unit 520 is further configured to:

acquire the predicted label corresponding to each indistinguishable point and the intermediate feature corresponding to each indistinguishable point;

for each indistinguishable point, aggregate the predicted label and intermediate feature corresponding to the point to obtain the aggregation result corresponding to the point;

extract, with a multilayer perceptron and based on the aggregation result corresponding to each indistinguishable point, the target feature corresponding to each indistinguishable point.

Optionally, the determination unit 530 is specifically configured to:

determine, based on the target feature corresponding to each indistinguishable point, the predicted probability value of each category for that point;

determine, based on the per-category predicted probability values, the predicted category corresponding to each indistinguishable point.

Optionally, the apparatus further comprises:

a sample acquisition unit, configured to acquire a training set and a test set, the training set comprising sample point cloud data of a plurality of sample points and the test set comprising the sample feature and sample category corresponding to each sample point;

a first training unit, configured to train an initial convolutional neural network with the training set to obtain a convolutional neural network under training;

a validation unit, configured to validate the convolutional neural network under training on the test set;

an adjustment unit, configured to adjust, when the validation result does not meet a preset condition, the network parameters of the convolutional neural network under training and to continue training it on the training set;

a second training unit, configured to stop, when the validation result meets the preset condition, training the convolutional neural network under training and to take the trained network as the trained convolutional neural network.

Optionally, the apparatus further comprises:

an evaluation unit, configured to evaluate, based on a preset metric, whether the predicted category corresponding to each indistinguishable point is accurate;

a third training unit, configured to continue training the convolutional neural network when it is detected that the number of indistinguishable points whose predicted category is accurate does not meet a preset threshold.
Refer to FIG. 13, a schematic diagram of a device for processing a 3D point cloud provided by another embodiment of the present application. As shown in FIG. 13, the device 6 for processing a 3D point cloud of this embodiment comprises: a processor 60, a memory 61, and computer instructions 62 stored in the memory 61 and executable on the processor 60. When the processor 60 executes the computer instructions 62, the steps in each of the above method embodiments for processing a 3D point cloud are implemented, for example S101 to S103 shown in FIG. 2; alternatively, the functions of the units in the above embodiments are implemented, for example those of units 510 to 530 shown in FIG. 12.

Illustratively, the computer instructions 62 may be divided into one or more units, which are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more units may be a series of computer instruction segments capable of completing specific functions, the segments describing the execution process of the computer instructions 62 in the device 6 for processing a 3D point cloud. For example, the computer instructions 62 may be divided into an acquisition unit, a processing unit and a determination unit, whose specific functions are as described above.

The device for processing a 3D point cloud may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 13 is merely an example of the device 6 for processing a 3D point cloud and does not constitute a limitation of the device; the device may include more or fewer components than shown, or combine certain components, or different components. For example, the device may also include input/output terminals, network access terminals, a bus, and so on.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or any conventional processor.

The memory 61 may be an internal storage unit of the device for processing a 3D point cloud, such as its hard disk or internal memory. The memory 61 may also be an external storage terminal of the device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the device. Further, the memory 61 may include both an internal storage unit and an external storage terminal of the device. The memory 61 is used to store the computer instructions and the other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer storage medium, which may be non-volatile or volatile; the computer storage medium stores a computer program that, when executed by a processor, implements the steps in each of the above method embodiments for processing a 3D point cloud.

The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application; they shall all fall within the protection scope of the present application.

Claims (11)

  1. A method for processing a 3D point cloud, comprising:
    acquiring point cloud data comprising a plurality of points;
    inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network comprising a geometric attention fusion module and a focusing module, the geometric attention fusion module being configured to extract a locally enhanced feature of each point, and the focusing module being configured to extract the target feature of each point based on its locally enhanced feature;
    determining, based on the target feature corresponding to each point, a predicted category corresponding to each point.
  2. The method of claim 1, wherein extracting the locally enhanced feature of each point comprises:
    for each point in the point cloud data, acquiring, by the geometric attention fusion module, the neighbouring points of the point in Euclidean space, and determining, based on the point's neighbouring points in Euclidean space, its neighbouring points in eigenvalue space;
    fusing the point's neighbouring points in Euclidean space and in eigenvalue space to obtain a local feature corresponding to the point;
    aggregating the local features corresponding to the point to obtain the locally enhanced feature corresponding to the point.
  3. The method of claim 2, wherein aggregating the local features corresponding to the point to obtain the locally enhanced feature corresponding to the point comprises:
    aggregating the local features corresponding to the point by attention pooling to obtain the locally enhanced feature corresponding to the point.
  4. The method of claim 1, wherein the plurality of points comprises indistinguishable points, the indistinguishable points being those points among the plurality of points whose predicted category is difficult to determine, and wherein extracting the target feature of each point based on its locally enhanced feature comprises:
    performing a local difference operation on each point based on its locally enhanced feature to obtain a local difference corresponding to each point;
    determining the indistinguishable points among the plurality of points according to the local difference corresponding to each point;
    extracting, with a multilayer perceptron, a target feature corresponding to each indistinguishable point.
  5. The method of claim 4, wherein extracting, with a multilayer perceptron, the target feature corresponding to each indistinguishable point comprises:
    acquiring a predicted label corresponding to each indistinguishable point and an intermediate feature corresponding to each indistinguishable point;
    for each indistinguishable point, aggregating the predicted label and intermediate feature corresponding to the point to obtain an aggregation result corresponding to the point;
    extracting, with a multilayer perceptron and based on the aggregation result corresponding to each indistinguishable point, the target feature corresponding to each indistinguishable point.
  6. The method of claim 4, wherein determining, based on the target feature corresponding to each point, the predicted category corresponding to each point comprises:
    determining, based on the target feature corresponding to each indistinguishable point, a predicted probability value of each category for that point;
    determining, based on the per-category predicted probability values, the predicted category corresponding to each indistinguishable point.
  7. The method of any one of claims 1 to 6, wherein before inputting the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, the method further comprises:
    acquiring a training set and a test set, the training set comprising sample point cloud data of a plurality of sample points and the test set comprising a sample feature and a sample category corresponding to each sample point;
    training an initial convolutional neural network with the training set to obtain a convolutional neural network under training;
    validating the convolutional neural network under training on the test set;
    when the validation result does not meet a preset condition, adjusting the network parameters of the convolutional neural network under training and continuing to train it on the training set;
    when the validation result meets the preset condition, stopping the training and taking the trained network as the trained convolutional neural network.
  8. The method of claim 4, further comprising:
    evaluating, based on a preset metric, whether the predicted category corresponding to each indistinguishable point is accurate;
    when it is detected that the number of indistinguishable points whose predicted category is accurate does not meet a preset threshold, continuing to train the convolutional neural network.
  9. An apparatus for processing a 3D point cloud, comprising:
    an acquisition unit configured to acquire point cloud data comprising a plurality of points;
    a processing unit configured to input the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network comprising a geometric attention fusion module and a focusing module, the geometric attention fusion module being configured to extract a locally enhanced feature of each point, and the focusing module being configured to extract the target feature of each point based on its locally enhanced feature;
    a determination unit configured to determine, based on the target feature corresponding to each point, a predicted category corresponding to each point.
  10. A device for processing a 3D point cloud, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
PCT/CN2021/137305, priority date 2021-02-05, filing date 2021-12-12: Method, apparatus, device and storage medium for processing a three-dimensional point cloud (WO2022166400A1)

Applications Claiming Priority (2)

CN202110163660.1, priority date 2021-02-05
CN202110163660.1A (granted as CN112966696B), priority date 2021-02-05, filing date 2021-02-05: Method, apparatus, device and storage medium for processing a three-dimensional point cloud

