CN116206308A - Semantic segmentation method and device for point cloud data and excavator
- Publication number: CN116206308A
- Application number: CN202310165367.8A
- Authority
- CN
- China
- Prior art keywords
- point cloud
- cloud data
- dimensional point
- dimensional
- semantic segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/36—Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
- G06V10/766—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application discloses a semantic segmentation method and device for point cloud data, and an excavator. Three-dimensional point cloud data are first converted into two-dimensional point cloud data, which makes the data more compact and dense and avoids voxelization. The two-dimensional point cloud data are then input into a neural network model of the kind that is mature for two-dimensional image processing, which directly generates semantic information of the three-dimensional point cloud data, thereby realizing semantic segmentation of the three-dimensional environment scene. The training samples of the neural network model are generated in real time by a scene point cloud generation model, which produces training samples for the corresponding scene according to scene requirements. That is, when semantic segmentation of an environment scene is performed, training samples matching the type of that scene are generated in real time and used to further train the neural network model, which improves both the accuracy of the model's semantic segmentation of the environment scene and its generalization capability.
Description
Technical Field
The application relates to the technical field of perception of excavator working scenes, and in particular to a semantic segmentation method and device for point cloud data, and an excavator.
Background
With the continuous development of automation technology, more and more engineering equipment and engineering vehicles adopt automated operation, for example the automated operation of an excavator, i.e., unmanned excavation. To realize unmanned excavation, the excavator needs to understand its working scene, that is, to predict actual class labels from sensor data so that information such as the working area and obstacles in the scene becomes clear.
Three-dimensional environment perception is one of the core difficulties of automated operation. It is responsible for identifying pedestrians, vehicles, and other dynamic and static elements around the excavator so as to provide comprehensive environment information for planning work routes and for avoiding static obstacles and dynamic pedestrians, vehicles, and the like. Lidar-based three-dimensional semantic segmentation aims to identify the semantic categories of the elements in the three-dimensional scene point cloud scanned by the lidar, and is a basic task in three-dimensional environment perception.
Existing three-dimensional point cloud semantic recognition is built for a fixed working mode or scene, for example by using a single trained model for semantic recognition. However, for engineering equipment such as excavators, the working scene varies greatly, and using the same model across scenes yields low recognition accuracy.
Disclosure of Invention
The present application has been made to solve the above technical problem. Embodiments of the application provide a semantic segmentation method and device for point cloud data, and an excavator, which address this problem.
According to one aspect of the present application, there is provided a semantic segmentation method of point cloud data, including: projecting the three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data; inputting the two-dimensional point cloud data into a neural network model to generate semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generating model, and the scene point cloud generating model generates the training samples under the corresponding scene according to scene requirements.
In an embodiment, before the inputting the two-dimensional point cloud data into the neural network model and generating the semantic information of the three-dimensional point cloud data, the semantic segmentation method of the point cloud data further includes: and training the neural network model by adopting the training sample.
In an embodiment, the inputting the two-dimensional point cloud data into a neural network model, and generating the semantic information of the three-dimensional point cloud data includes: inputting the two-dimensional point cloud data into the neural network model to obtain semantic information of the three-dimensional point cloud data; the neural network model comprises a semantic segmentation model and a point cloud post-processing model, and the semantic segmentation model and the point cloud post-processing model are both converted into ONNX-format files and are integrated into one ONNX-format file.
In an embodiment, the inputting the two-dimensional point cloud data into a neural network model, and generating the semantic information of the three-dimensional point cloud data includes: carrying out semantic segmentation on the two-dimensional point cloud data to obtain semantic information of a two-dimensional image; and mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on the semantic information of the two-dimensional image to obtain the semantic information of the three-dimensional point cloud data.
In an embodiment, the mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on the semantic information of the two-dimensional image to obtain the semantic information of the three-dimensional point cloud data includes: mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on semantic information of the two-dimensional image to obtain initial information of the three-dimensional point cloud data; and post-processing the initial information to obtain semantic information of the three-dimensional point cloud data.
In an embodiment, the post-processing the initial information to obtain semantic information of the three-dimensional point cloud data includes: and eliminating edge errors in the initial information by adopting a proximity algorithm to obtain semantic information of the three-dimensional point cloud data.
In an embodiment, before the projecting the three-dimensional point cloud data into the two-dimensional image space to obtain the two-dimensional point cloud data, the semantic segmentation method of the point cloud data further includes: filtering the three-dimensional point cloud data to obtain filtered three-dimensional point cloud data;
the projecting the three-dimensional point cloud data into the two-dimensional image space to obtain the two-dimensional point cloud data comprises the following steps:
and projecting the filtered three-dimensional point cloud data to a two-dimensional image space to obtain the two-dimensional point cloud data.
In an embodiment, the filtering the three-dimensional point cloud data includes: and adopting any one or a combination of a plurality of filtering processing modes for the three-dimensional point cloud data: outlier filtering, null filtering, intensity filtering, box filtering, sphere filtering.
In an embodiment, the projecting the three-dimensional point cloud data into the two-dimensional image space to obtain the two-dimensional point cloud data includes: and projecting the three-dimensional point cloud data to a two-dimensional image space by adopting a spherical projection method to obtain the two-dimensional point cloud data.
According to another aspect of the present application, there is provided a semantic segmentation apparatus for point cloud data, including: the data projection module is used for projecting the three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data; the semantic generation module is used for inputting the two-dimensional point cloud data into a neural network model and generating semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generating model, and the scene point cloud generating model generates the training samples under the corresponding scene according to scene requirements.
According to another aspect of the present application, there is provided an excavator, including: a body; the laser radar is arranged on the machine body and is used for collecting three-dimensional point cloud data; and the semantic segmentation device of the point cloud data is connected with the laser radar.
According to the semantic segmentation method and device for point cloud data and the excavator of the present application, the three-dimensional point cloud data are projected into a two-dimensional image space to obtain two-dimensional point cloud data; the two-dimensional point cloud data are input into a neural network model to generate semantic information of the three-dimensional point cloud data; and the training samples of the neural network model are generated in real time by a scene point cloud generation model, which produces training samples for the corresponding scene according to scene requirements. The three-dimensional point cloud data are first converted into two-dimensional point cloud data, which makes the data more compact and dense and avoids voxelization; the two-dimensional point cloud data are then input into a neural network model of the kind that is mature for two-dimensional image processing, directly generating the semantic information of the three-dimensional point cloud data and thereby realizing semantic segmentation of the three-dimensional environment scene.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification; they illustrate the application and do not constitute a limitation of it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flowchart of a semantic segmentation method of point cloud data according to an exemplary embodiment of the present application.
Fig. 2 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application.
Fig. 3 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application.
Fig. 4 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application.
Fig. 5 is a schematic diagram of a semantic segmentation method of point cloud data according to an exemplary embodiment of the present application.
Fig. 6 is a schematic structural diagram of a semantic segmentation device for point cloud data according to an exemplary embodiment of the present application.
Fig. 7 is a schematic structural diagram of a semantic segmentation device for point cloud data according to another exemplary embodiment of the present application.
Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Fig. 1 is a flowchart of a semantic segmentation method of point cloud data according to an exemplary embodiment of the present application. As shown in fig. 1, the semantic segmentation method of the point cloud data includes the following steps:
step 110: and projecting the three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data.
Specifically, the semantic segmentation method for point cloud data can be applied to environmental perception of the working scene of an unmanned excavator: point cloud data in the working scene are collected and their categories identified, so that environmental data are perceived. In the application, one or more lidars and one or more cameras can be arranged on the body of the excavator. A lidar or camera is preferably arranged on top of the excavator cab, to guarantee the largest working field of view and to reduce the influence of vibration from the excavator's working device; further lidars or cameras can be arranged on the boom to cover the blind area above the cab. Compared with a camera, a lidar sensor has a wider field of view, returns more accurate distance measurements, and is less affected by environmental changes such as illumination, so it has great application prospects for perceiving the complex and changeable working scenes of an excavator.
In one embodiment, the specific implementation of step 110 may be: projecting the three-dimensional point cloud data into a two-dimensional image space by a spherical projection method to obtain two-dimensional point cloud data. In order to characterize the sparse, irregular three-dimensional point cloud compactly and structurally and to enable standard convolution operations, each point (x, y, z) of the three-dimensional point cloud data is projected into the two-dimensional image space through the following formula:

u = (1/2) · (1 − arctan(y, x)/π) · w
v = (1 − (arcsin(z/r) + |f_down|)/f) · h

where (u, v) are the coordinates of the point in the projected image, w and h represent the width and height, respectively, of the projected image, r = √(x² + y² + z²) represents the distance between each point and the origin of coordinates, and f represents the vertical field of view of the lidar (i.e., the visible range of the lidar in the vertical direction), f = |f_down| + |f_up|.
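As an illustrative sketch only (not part of the original disclosure), this spherical projection can be implemented in a few lines of NumPy; the image size (h = 64, w = 2048) and the vertical field of view below are assumed values typical of a rotating lidar, not parameters fixed by this application:

```python
import numpy as np

def spherical_projection(points, w=2048, h=64, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) array of lidar points into an (h, w) range image.

    Returns the range image plus per-point pixel coordinates (u, v) and
    range r, so that 2-D predictions can later be mapped back to 3-D.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                    # distance to origin
    yaw = np.arctan2(y, x)                                # horizontal angle
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_down = np.radians(fov_down_deg)
    fov = abs(np.radians(fov_up_deg)) + abs(fov_down)     # f = |f_down| + |f_up|

    u = 0.5 * (1.0 - yaw / np.pi) * w                     # column index
    v = (1.0 - (pitch + abs(fov_down)) / fov) * h         # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Rasterize: write farthest points first so the closest survives per pixel.
    range_image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]
    range_image[v[order], u[order]] = r[order]
    return range_image, u, v, r
```

Keeping the per-point (u, v, r) triples is what later allows the two-dimensional predictions to be mapped back onto the three-dimensional point cloud in step 122.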
Step 120: inputting the two-dimensional point cloud data into a neural network model, and generating semantic information of the three-dimensional point cloud data.
The training samples of the neural network model are generated in real time by a scene point cloud generation model, which produces training samples for the corresponding scene according to scene requirements. After the two-dimensional point cloud data are obtained, they are input into the trained neural network model, which directly generates the semantic information of the three-dimensional point cloud data, thereby realizing perception of the working environment. In the application, the scene point cloud generation model is built according to actual scenes: for different working scenes it can generate corresponding scene data (three-dimensional point cloud data of the corresponding scene), and training samples for the different working scenes are obtained by labeling these sample data (for example, the two-dimensional images obtained by converting the three-dimensional point cloud data). When the excavator performs actual operations, the scene point cloud generation model generates training samples corresponding to the current working scene, and the neural network model is further trained on them, so as to improve its semantic segmentation capability and accuracy for that scene. Specifically, based on data similarity, the scene point cloud generation model can generate training samples for the current operation using methods such as image bag-of-words feature matching and point cloud ICP matching, and the neural network model is trained iteratively until it meets the requirements of the actual working scene of the excavator.
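Purely as a hedged illustration of the point cloud ICP matching mentioned above (the application does not specify an implementation): candidate generated scenes could be scored against the current working scene by ICP registration fitness, keeping only sufficiently similar ones as training data. The sketch below uses the Open3D library; the voxel size and fitness threshold are assumed values.

```python
import open3d as o3d

def select_similar_samples(current_scene, candidates,
                           voxel_size=0.2, fitness_threshold=0.6):
    """Keep generated candidate scenes whose geometry matches the current
    working scene, scored by ICP registration fitness (fraction of
    matched points). Thresholds are illustrative assumptions."""
    target = current_scene.voxel_down_sample(voxel_size)
    selected = []
    for candidate in candidates:
        source = candidate.voxel_down_sample(voxel_size)
        result = o3d.pipelines.registration.registration_icp(
            source, target,
            max_correspondence_distance=voxel_size * 2,
            estimation_method=o3d.pipelines.registration
                .TransformationEstimationPointToPoint())
        if result.fitness >= fitness_threshold:
            selected.append(candidate)
    return selected
```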
Specifically, the neural network model comprises a semantic segmentation model and a point cloud post-processing model; both are converted into ONNX-format files, which are then merged into a single ONNX-format file. The Open Neural Network Exchange (ONNX) format is a standard for representing deep learning models that enables a model to be transferred between different frameworks. ONNX is an open file format designed for machine learning to store trained models; it allows different artificial intelligence frameworks to store model data and interact in the same format. Whatever training framework is used (e.g., TensorFlow, PyTorch, OneFlow, Paddle), the trained model can be uniformly converted into the ONNX format for storage. In the application, the semantic segmentation model and the point cloud post-processing model are each converted into an ONNX-format file, and the two files are then fused into one ONNX-format file. This removes one link from the model inference stage: the point cloud data are input directly into the fused ONNX file, and the segmented and post-processed results are output directly, which reduces the time consumed by the algorithm.
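A minimal sketch of one way to obtain such a single file, assuming both stages are implemented as PyTorch modules (the stand-in modules, tensor shapes, and file name below are illustrative assumptions, not the patent's actual models):

```python
import torch
import torch.nn as nn

class ArgmaxPost(nn.Module):
    """Stand-in post-processing: per-pixel argmax over class scores."""
    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return logits.argmax(dim=1)

class SegmentationWithPostProcess(nn.Module):
    """Wrap segmentation and post-processing so a single
    torch.onnx.export call yields one ONNX file for both stages."""
    def __init__(self, seg_model: nn.Module, post_model: nn.Module):
        super().__init__()
        self.seg_model = seg_model
        self.post_model = post_model

    def forward(self, range_image: torch.Tensor) -> torch.Tensor:
        logits = self.seg_model(range_image)   # (B, C, H, W) class scores
        return self.post_model(logits)         # (B, H, W) per-pixel labels

# Trivial stand-ins for the trained networks so the sketch runs as-is.
seg_model = nn.Conv2d(5, 20, kernel_size=1)    # 5 input channels, 20 classes
fused = SegmentationWithPostProcess(seg_model, ArgmaxPost()).eval()

dummy = torch.randn(1, 5, 64, 2048)            # assumed input layout
torch.onnx.export(fused, dummy, "fused_segmentation.onnx",
                  input_names=["range_image"], output_names=["labels"],
                  opset_version=13)
```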
According to the semantic segmentation method for point cloud data described above, the three-dimensional point cloud data are projected into a two-dimensional image space to obtain two-dimensional point cloud data; the two-dimensional point cloud data are input into a neural network model to generate semantic information of the three-dimensional point cloud data; and the training samples of the neural network model are generated in real time by a scene point cloud generation model according to scene requirements. The three-dimensional point cloud data are first converted into two-dimensional point cloud data, which makes the data more compact and dense and avoids voxelization; the two-dimensional point cloud data are then input into a neural network model of the kind that is mature for two-dimensional image processing, directly generating the semantic information of the three-dimensional point cloud data and thereby realizing semantic segmentation of the three-dimensional environment scene.
Fig. 2 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application. As shown in fig. 2, before step 120, the semantic segmentation method of the point cloud data may further include:
step 130: training the neural network model by using a training sample.
Before semantic segmentation is executed, the scene point cloud generation model generates training samples corresponding to the current working scene, and the neural network model is trained iteratively on them until it meets the requirements of the actual working scene of the excavator, improving the model's semantic segmentation capability and accuracy for the current working scene.
Fig. 3 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application. As shown in fig. 3, the step 120 may include:
step 121: and carrying out semantic segmentation on the two-dimensional point cloud data to obtain semantic information of the two-dimensional image.
Semantic segmentation of the two-dimensional point cloud data can be realized with a neural network model built for two-dimensional image processing (including standard convolution modules and the like), which reduces the difficulty and the computational cost of semantic segmentation; converting the sparse, unordered lidar point cloud into more compact, structured two-dimensional point cloud data likewise improves segmentation efficiency. Specifically, neural network models such as SalsaNext, RangeNet++, and SqueezeSegV3 can be used for the semantic segmentation.
Step 122: and mapping the two-dimensional point cloud data back to the three-dimensional point cloud space based on the semantic information of the two-dimensional image so as to obtain the semantic information of the three-dimensional point cloud data.
In one embodiment, the implementation of step 122 may be: mapping the two-dimensional point cloud data back to the three-dimensional point cloud space based on the semantic information of the two-dimensional image to obtain initial information of the three-dimensional point cloud data, and post-processing the initial information to obtain the semantic information of the three-dimensional point cloud data. After the semantic information of the two-dimensional image is obtained, the two-dimensional point cloud data are mapped back to the three-dimensional point cloud space, i.e., restored to three-dimensional point cloud data, so that scene information in three-dimensional space (including information on target objects and the like) is obtained. The restored three-dimensional point cloud data (the initial information) are then post-processed to correct the discretization errors caused by the spherical projection (i.e., the misclassification of object edges that occurs when the two-dimensional image is mapped back to the three-dimensional point cloud space), thereby obtaining accurate semantic information of the three-dimensional point cloud data. Specifically, a nearest-neighbor algorithm (such as KNN post-processing) is used to eliminate the edge errors in the initial information. KNN, the k-nearest-neighbor classification algorithm, is a classification method from data mining: a sample is represented by its k nearest neighbors in feature space, and if the majority of those k neighbors belong to a certain class, the sample is assigned to that class and takes on the characteristics of that class. The classification decision depends only on a small number of nearby samples rather than on class-domain boundaries, which makes the kNN method more suitable than other methods for sample sets whose class domains overlap or cross, such as object edges in a point cloud.
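A minimal sketch of such KNN-based edge cleanup, assuming a SciPy k-d tree for the neighbor search (the value k = 5 is an assumed choice, not one fixed by the application):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_label_smoothing(points, labels, k=5):
    """Re-label each 3-D point by majority vote among its k nearest
    neighbors, reducing edge misclassification introduced when 2-D
    predictions are mapped back onto the 3-D point cloud."""
    tree = cKDTree(points)
    _, neighbor_idx = tree.query(points, k=k)     # (N, k) neighbor indices
    neighbor_labels = labels[neighbor_idx]        # (N, k) neighbor labels
    smoothed = np.empty_like(labels)
    for i, row in enumerate(neighbor_labels):
        values, counts = np.unique(row, return_counts=True)
        smoothed[i] = values[np.argmax(counts)]   # majority class wins
    return smoothed
```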
Fig. 4 is a flowchart of a semantic segmentation method of point cloud data according to another exemplary embodiment of the present application. As shown in fig. 4, before step 110, the semantic segmentation method of the point cloud data may further include:
step 140: and filtering the three-dimensional point cloud data to obtain filtered three-dimensional point cloud data.
The lidar is mounted on top of the excavator cab or on the boom, so its emitted beams may be blocked by the excavator body, or the point cloud may be overly concentrated on the body; in addition, abnormal point cloud data may occur because excavator working scenes are complex. The three-dimensional point cloud data are therefore filtered to guarantee the accuracy of the subsequent semantic segmentation.
In one embodiment, the specific implementation of step 140 may be: any one or a combination of a plurality of the following filtering processing modes is adopted for the three-dimensional point cloud data: outlier filtering, null filtering, intensity filtering, box filtering, sphere filtering.
Correspondingly, the step 110 may include:
step 111: and projecting the filtered three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data.
After the filtering, the filtered three-dimensional point cloud data are projected into the two-dimensional image space. This not only improves the accuracy of the underlying point cloud data but also reduces the amount of point cloud computation, improving segmentation efficiency.
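Purely as an illustrative sketch of several of the filtering modes named above (all thresholds and the machine-body box are assumed values, not ones fixed by the application):

```python
import numpy as np

def prefilter_point_cloud(points, intensity, intensity_min=0.05,
                          body_box=((-2.0, 2.0), (-1.5, 1.5), (-2.0, 1.0)),
                          max_radius=80.0):
    """Combine null, intensity, box, and sphere filtering: drop invalid
    returns, weak returns, points on the excavator body, and far points."""
    pts = np.asarray(points, dtype=np.float32)

    valid = np.all(np.isfinite(pts), axis=1)                 # null filtering
    valid &= intensity > intensity_min                       # intensity filtering

    (x0, x1), (y0, y1), (z0, z1) = body_box                  # box filtering:
    on_body = ((pts[:, 0] >= x0) & (pts[:, 0] <= x1) &       # remove returns
               (pts[:, 1] >= y0) & (pts[:, 1] <= y1) &       # from the machine
               (pts[:, 2] >= z0) & (pts[:, 2] <= z1))        # body itself
    valid &= ~on_body

    valid &= np.linalg.norm(pts, axis=1) <= max_radius       # sphere filtering
    return pts[valid], intensity[valid]
```

Outlier filtering (e.g. statistical removal of isolated points) could be chained after these steps in the same way.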
Fig. 5 is a schematic diagram of a semantic segmentation method of point cloud data according to an exemplary embodiment of the present application. As shown in fig. 5, the semantic segmentation method of the point cloud data includes the following steps:
step 510: and the laser radar acquires the point cloud data.
The method and the device acquire three-dimensional point cloud data in a working scene in real time by using the laser radar arranged on the excavator.
Step 520: training samples are generated.
In the application, the scene point cloud generation model generates training samples for the corresponding scene according to scene requirements, so that the training data stay up to date.
Step 530: and (5) preprocessing point cloud data.
In the application, the three-dimensional point cloud data are pre-filtered to obtain more accurate three-dimensional point cloud data, providing an accurate data foundation for the subsequent semantic segmentation.
Step 540: and (5) projecting point cloud data.
In the application, the three-dimensional point cloud data are projected into the two-dimensional image space by the spherical projection method to obtain two-dimensional point cloud data. This represents the sparse, irregular three-dimensional point cloud compactly and structurally and enables standard convolution operations, which reduces the difficulty of semantic segmentation and improves its efficiency.
Step 550: and (5) model training.
The neural network model comprises a semantic segmentation model and a point cloud data post-processing model; that is, semantic segmentation and point cloud post-processing are integrated into one neural network model, which realizes both the segmentation and the post-processing of the point cloud data. Specifically, the semantic segmentation model can be exported as an ONNX network, the point cloud post-processing model can be expressed with ONNX operators, and those operators are integrated into the ONNX network.
Step 560: model deployment quantifies acceleration.
TensorRT is a high-performance deep learning inference optimizer developed by NVIDIA that provides low-latency, high-throughput inference deployment for deep learning applications. TensorRT can be used for inference acceleration on very large-scale data centers, embedded platforms, or autonomous driving platforms. In the application, the neural network model is deployed and optimized with a neural network inference optimizer such as TensorRT; in particular, TensorRT can perform FP16 quantization to reduce inference overhead and improve inference speed, which facilitates deploying the neural network model in the actual working environment of the excavator.
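A hedged sketch of this deployment step, building an FP16 engine from the fused ONNX file with the TensorRT Python API (API names follow TensorRT 8.x and may differ in other versions; file names carry over from the ONNX sketch above):

```python
import tensorrt as trt

def build_fp16_engine(onnx_path: str, engine_path: str) -> None:
    """Parse an ONNX model and serialize a TensorRT engine with FP16
    quantization enabled, for low-latency on-machine inference."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(f"ONNX parse error: {parser.get_error(0)}")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)        # enable FP16 quantization

    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)

build_fp16_engine("fused_segmentation.onnx", "fused_segmentation_fp16.engine")
```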
Step 570: and (5) point cloud data segmentation.
After the neural network model is trained, semantic segmentation is carried out on point cloud data acquired by the laser radar, so that a semantic segmentation result of a working scene is obtained.
Fig. 6 is a schematic structural diagram of a semantic segmentation device for point cloud data according to an exemplary embodiment of the present application. As shown in fig. 6, the semantic segmentation apparatus 60 of the point cloud data includes: the data projection module 61 is configured to project the three-dimensional point cloud data into a two-dimensional image space to obtain two-dimensional point cloud data; the semantic generation module 62 is used for inputting the two-dimensional point cloud data into the neural network model to generate semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generating model, and the scene point cloud generating model generates the training samples under the corresponding scene according to scene requirements.
According to the semantic segmentation device for point cloud data described above, the data projection module 61 projects the three-dimensional point cloud data into a two-dimensional image space to obtain two-dimensional point cloud data; the semantic generation module 62 then inputs the two-dimensional point cloud data into a neural network model to generate semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generation model according to scene requirements. The three-dimensional point cloud data are first converted into two-dimensional point cloud data, which makes the data more compact and dense and avoids voxelization; the two-dimensional point cloud data are then input into a neural network model of the kind that is mature for two-dimensional image processing, directly generating the semantic information of the three-dimensional point cloud data and thereby realizing semantic segmentation of the three-dimensional environment scene.
In an embodiment, the data projection module 61 may be further configured to: and projecting the three-dimensional point cloud data to a two-dimensional image space by adopting a spherical projection method to obtain two-dimensional point cloud data.
Fig. 7 is a schematic structural diagram of a semantic segmentation device for point cloud data according to another exemplary embodiment of the present application. As shown in fig. 7, the semantic segmentation apparatus 60 of the point cloud data may further include: the model training module 63 is configured to train the neural network model by using the training samples.
In one embodiment, as shown in FIG. 7, the semantic generation module 62 may include: a two-dimensional segmentation unit 621, configured to perform semantic segmentation on the two-dimensional point cloud data to obtain semantic information of a two-dimensional image; the point cloud mapping unit 622 is configured to map the two-dimensional point cloud data back to the three-dimensional point cloud space based on the semantic information of the two-dimensional image, so as to obtain the semantic information of the three-dimensional point cloud data.
In an embodiment, the point cloud mapping unit 622 may be further configured to: mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on semantic information of the two-dimensional image to obtain initial information of the three-dimensional point cloud data; and post-processing is carried out on the initial information to obtain semantic information of the three-dimensional point cloud data.
In an embodiment, as shown in fig. 7, the semantic segmentation apparatus 60 of the point cloud data may further include: the preprocessing module 64 is configured to perform filtering processing on the three-dimensional point cloud data, so as to obtain filtered three-dimensional point cloud data.
In an embodiment, the pre-processing module 64 may be further configured to: any one or a combination of a plurality of the following filtering processing modes is adopted for the three-dimensional point cloud data: outlier filtering, null filtering, intensity filtering, box filtering, sphere filtering.
Correspondingly, the data projection module 61 may be further configured to: and projecting the filtered three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data.
The application also provides an excavator, comprising: a body; the laser radar is arranged on the machine body and is used for collecting three-dimensional point cloud data; and the semantic segmentation device of the point cloud data is connected with the laser radar.
According to the excavator described above, the three-dimensional point cloud data are projected into a two-dimensional image space to obtain two-dimensional point cloud data; the two-dimensional point cloud data are input into a neural network model to generate semantic information of the three-dimensional point cloud data; and the training samples of the neural network model are generated in real time by a scene point cloud generation model according to scene requirements. The three-dimensional point cloud data are first converted into two-dimensional point cloud data, which makes the data more compact and dense and avoids voxelization; the two-dimensional point cloud data are then input into a neural network model of the kind that is mature for two-dimensional image processing, directly generating the semantic information of the three-dimensional point cloud data and thereby realizing semantic segmentation of the three-dimensional environment scene.
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 8. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.
Fig. 8 illustrates a block diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 8, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the methods of the various embodiments of the present application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
When the electronic device is a stand-alone device, the input means 13 may be a communication network connector for receiving the acquired input signals from the first device and the second device.
In addition, the input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information to the outside, including the determined distance information, direction information, and the like. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only those components of the electronic device 10 that are relevant to the present application are shown in fig. 8; components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
The computer program product may include program code, written in any combination of one or more programming languages, for performing the operations of embodiments of the present application, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.
Claims (10)
1. The semantic segmentation method of the point cloud data is characterized by comprising the following steps of:
projecting the three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data; and
inputting the two-dimensional point cloud data into a neural network model to generate semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generating model, and the scene point cloud generating model generates the training samples under the corresponding scene according to scene requirements.
2. The semantic segmentation method of point cloud data according to claim 1, wherein before the inputting of the two-dimensional point cloud data into a neural network model to generate semantic information of the three-dimensional point cloud data, the semantic segmentation method of point cloud data further comprises:
and training the neural network model by adopting the training sample.
3. The method for semantic segmentation of point cloud data according to claim 1, wherein the inputting the two-dimensional point cloud data into a neural network model, generating semantic information of the three-dimensional point cloud data comprises:
inputting the two-dimensional point cloud data into the neural network model to obtain semantic information of the three-dimensional point cloud data; the neural network model comprises a semantic segmentation model and a point cloud post-processing model, and the semantic segmentation model and the point cloud post-processing model are both converted into ONNX-format files and are integrated into one ONNX-format file.
4. The method for semantic segmentation of point cloud data according to claim 1, wherein the inputting the two-dimensional point cloud data into a neural network model, generating semantic information of the three-dimensional point cloud data comprises:
carrying out semantic segmentation on the two-dimensional point cloud data to obtain semantic information of a two-dimensional image; and
and mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on the semantic information of the two-dimensional image to obtain the semantic information of the three-dimensional point cloud data.
5. The method of claim 4, wherein mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on semantic information of the two-dimensional image to obtain semantic information of the three-dimensional point cloud data comprises:
mapping the two-dimensional point cloud data back to a three-dimensional point cloud space based on semantic information of the two-dimensional image to obtain initial information of the three-dimensional point cloud data; and
and carrying out post-processing on the initial information to obtain semantic information of the three-dimensional point cloud data.
6. The method for semantic segmentation of point cloud data according to claim 5, wherein the post-processing the initial information to obtain semantic information of the three-dimensional point cloud data comprises:
and eliminating edge errors in the initial information by adopting a proximity algorithm to obtain semantic information of the three-dimensional point cloud data.
7. The semantic segmentation method of point cloud data according to claim 1, wherein before the projecting three-dimensional point cloud data into a two-dimensional image space to obtain two-dimensional point cloud data, the semantic segmentation method of point cloud data further comprises:
filtering the three-dimensional point cloud data to obtain filtered three-dimensional point cloud data;
the projecting the three-dimensional point cloud data into the two-dimensional image space to obtain the two-dimensional point cloud data comprises the following steps:
and projecting the filtered three-dimensional point cloud data to a two-dimensional image space to obtain the two-dimensional point cloud data.
8. The semantic segmentation method of point cloud data according to claim 7, wherein the filtering the three-dimensional point cloud data comprises:
and adopting any one or a combination of a plurality of filtering processing modes for the three-dimensional point cloud data: outlier filtering, null filtering, intensity filtering, box filtering, sphere filtering.
9. A semantic segmentation apparatus for point cloud data, comprising:
the data projection module is used for projecting the three-dimensional point cloud data to a two-dimensional image space to obtain two-dimensional point cloud data; and
the semantic generation module is used for inputting the two-dimensional point cloud data into a neural network model and generating semantic information of the three-dimensional point cloud data; the training samples of the neural network model are generated in real time by a scene point cloud generating model, and the scene point cloud generating model generates the training samples under the corresponding scene according to scene requirements.
10. An excavator, comprising:
a body;
the laser radar is arranged on the machine body and is used for collecting three-dimensional point cloud data; and
the semantic segmentation device for point cloud data according to claim 9, wherein the semantic segmentation device for point cloud data is connected with the lidar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310165367.8A CN116206308A (en) | 2023-02-24 | 2023-02-24 | Semantic segmentation method and device for point cloud data and excavator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116206308A true CN116206308A (en) | 2023-06-02 |
Family
ID=86509063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310165367.8A Pending CN116206308A (en) | 2023-02-24 | 2023-02-24 | Semantic segmentation method and device for point cloud data and excavator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206308A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |