CN111178413A - 3D point cloud semantic segmentation method, device and system - Google Patents
- Publication number
- CN111178413A
- Authority
- CN
- China
- Prior art keywords
- point cloud
- point
- rgb image
- piece
- ground
- Prior art date
- 2019-12-20
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/10—Image analysis: Segmentation; Edge detection
- G06F18/214—Pattern recognition: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/23—Pattern recognition: Clustering techniques
- G06N3/045—Neural networks: Combinations of networks
- G06T5/70—Image enhancement or restoration: Denoising; Smoothing
- G06T7/90—Image analysis: Determination of colour characteristics
- G06T2207/10028—Image acquisition modality: Range image; Depth image; 3D point clouds
- G06T2207/20032—Filtering details: Median filtering
Abstract
The invention provides a 3D point cloud semantic segmentation method, device and system, used to solve the technical problem in the prior art that 3D point cloud semantics cannot be segmented effectively. The method comprises the following steps: acquiring an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detecting objects in the RGB image with a pre-trained model, and determining whether a detected object is of a specified type; if so, acquiring a second 3D point cloud from the first 3D point cloud and removing the ground point cloud from the second 3D point cloud; performing 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set; determining the maximum and minimum number of points contained in the 3D point cloud of the detected object, and filtering from the slice set the target point cloud slices whose point counts fall between the minimum and the maximum; and determining the target point cloud slice containing the most points as the 3D semantic point cloud of the detected object.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a method, a device and a system for 3D point cloud semantic segmentation.
Background
In recent years, with the wide application of 3D sensors such as laser radars and Red-Green-Blue-Depth (RGB-D) cameras in robotics and autonomous driving, applying deep learning to 3D point cloud data has become a research hotspot. A 3D point cloud is a set of vectors in a three-dimensional coordinate system, usually expressed as (x, y, z) coordinates, that generally represents the shape of an object's external surface. Besides the geometric information (x, y, z), a point may also carry RGB color, intensity, gray value, depth, or number of returns. Semantic segmentation of a 3D point cloud divides the cloud into semantically meaningful parts and labels each part as one of a set of predefined classes, i.e., it segments the point cloud and distinguishes the resulting segments.
At present, a common 3D point cloud semantic segmentation approach converts the 3D point cloud into hand-crafted voxel grid features or multi-view image features and then feeds them to a deep learning network for feature extraction. This feature-conversion approach involves a large data volume and complex computation, and if the resolution is reduced, the segmentation accuracy drops as well.
In view of this, how to segment 3D point cloud semantics effectively, and thereby improve segmentation accuracy, is a key research problem.
Disclosure of Invention
The embodiments of the present application provide a 3D point cloud semantic segmentation method, device and system, which are used to solve the technical problem in the prior art that 3D point cloud semantics cannot be segmented effectively.
In a first aspect, to solve the above technical problem, an embodiment of the present application provides a method for semantic segmentation of a 3D point cloud, where a technical scheme of the method is as follows:
acquiring an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detecting objects in the RGB image with a pre-trained model, and determining whether a detected object is of a specified type, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN);
if the object is of the specified type, acquiring a second 3D point cloud from the first 3D point cloud and removing a ground point cloud from the second 3D point cloud, wherein the second 3D point cloud corresponds to a target detection box of the detected object in the RGB image;
performing 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set, wherein the point cloud slice set comprises at least one point cloud slice and the point cloud slices represent the 3D point clouds of objects of different types;
determining the maximum and minimum number of points contained in the 3D point cloud of the detected object, and filtering from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum;
and acquiring, from the target point cloud slices, the slice containing the most points and determining it as the 3D semantic point cloud of the detected object.
In this embodiment of the application, the RGB image to be detected and the first 3D point cloud corresponding to the RGB image are acquired; a pre-trained model then detects objects in the RGB image and determines whether a detected object is of the specified type. If it is, a second 3D point cloud, corresponding to the target detection box of the detected object in the RGB image, is acquired from the first 3D point cloud, and the ground point cloud is removed from it. 3D clustering is then performed on the second 3D point cloud after ground removal to obtain a point cloud slice set comprising at least one point cloud slice, the slices representing the 3D point clouds of objects of different types. The maximum and minimum number of points contained in the detected object's 3D point cloud are determined, the target point cloud slices whose point counts fall between the minimum and the maximum are filtered from the slice set, and the slice containing the most points is determined as the 3D semantic point cloud of the detected object. Because the objects in the RGB image are detected by a target detection model generated from Faster R-CNN, and the 3D point cloud corresponding to the detected object's detection box undergoes a series of processing steps that filter out interfering points before the 3D semantic point cloud is determined, the data labeling workload is reduced, the segmentation precision is improved, and the 3D point cloud semantics are segmented effectively.
With reference to the first aspect, in a first optional implementation manner of the first aspect, before acquiring the RGB image to be detected and the first 3D point cloud corresponding to the RGB image, the method further includes:
obtaining a sample RGB image set for the pre-trained model, wherein the sample RGB image set comprises at least one sample RGB image;
marking objects of the specified type in the sample RGB images of the sample RGB image set to generate a sample data set;
and training the Faster R-CNN with the sample data set to generate the pre-trained model.
With reference to the first aspect, in a second optional implementation manner of the first aspect, removing the ground point cloud in the second 3D point cloud includes:
performing median filtering and voxelization on the second 3D point cloud;
determining, according to a ground equation, the ground point cloud in the median-filtered and voxelized second 3D point cloud;
and removing the ground point cloud from the median-filtered and voxelized second 3D point cloud.
With reference to the second optional implementation manner of the first aspect, in a third optional implementation manner of the first aspect, performing 3D clustering on the second 3D point cloud after removing the ground point cloud to obtain a point cloud slice set includes:
for a first point in the second 3D point cloud after ground removal, adding the points within the 3D neighborhood ball of the first point to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the first point with a preset threshold as its radius;
repeating until the points within the 3D neighborhood ball of every point in the slice have been added, thereby obtaining the point cloud slice corresponding to the first point;
acquiring, in the same way, the point cloud slice corresponding to a second point according to the 3D neighborhood ball of the second point, wherein the second point is any point not belonging to the point cloud slice corresponding to the first point;
and repeating until every point in the second 3D point cloud after ground removal has been added to a corresponding point cloud slice, thereby obtaining the point cloud slice set.
With reference to the first aspect or the third optional implementation manner of the first aspect, in a fourth optional implementation manner of the first aspect, determining the maximum and minimum number of points contained in the 3D point cloud of the detected object includes:
establishing a geometric model of the detected object according to the geometric dimensions of the detected object;
and acquiring the maximum and minimum number of points contained in the 3D point cloud of the detected object according to the geometric model and the volume occupied by a single point in the 3D point cloud.
In a second aspect, a 3D point cloud semantic segmentation apparatus is provided, including:
a first determining module, configured to acquire an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detect objects in the RGB image with a pre-trained model, and determine whether a detected object is of a specified type, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN);
a processing module, configured to, if the object is of the specified type, acquire a second 3D point cloud from the first 3D point cloud and remove the ground point cloud from the second 3D point cloud, wherein the second 3D point cloud corresponds to the target detection box of the detected object in the RGB image;
an acquisition module, configured to perform 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set, wherein the point cloud slice set comprises at least one point cloud slice and the point cloud slices represent the 3D point clouds of objects of different types;
a filtering module, configured to determine the maximum and minimum number of points contained in the 3D point cloud of the detected object and filter from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum;
and a second determining module, configured to acquire, from the target point cloud slices, the slice containing the most points and determine it as the 3D semantic point cloud of the detected object.
With reference to the second aspect, in a first optional implementation manner of the second aspect, the apparatus further includes a generating module configured to:
obtain a sample RGB image set for the pre-trained model, wherein the sample RGB image set comprises at least one sample RGB image;
mark objects of the specified type in the sample RGB images of the sample RGB image set to generate a sample data set;
and train the Faster R-CNN with the sample data set to generate the pre-trained model.
With reference to the second aspect, in a second optional implementation manner of the second aspect, the processing module is specifically configured to:
perform median filtering and voxelization on the second 3D point cloud;
determine, according to a ground equation, the ground point cloud in the median-filtered and voxelized second 3D point cloud;
and remove the ground point cloud from the median-filtered and voxelized second 3D point cloud.
With reference to the second optional implementation manner of the second aspect, in a third optional implementation manner of the second aspect, the acquisition module is specifically configured to:
for a first point in the second 3D point cloud after ground removal, add the points within the 3D neighborhood ball of the first point to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the first point with a preset threshold as its radius;
repeat until the points within the 3D neighborhood ball of every point in the slice have been added, obtaining the point cloud slice corresponding to the first point;
acquire, in the same way, the point cloud slice corresponding to a second point according to the 3D neighborhood ball of the second point, wherein the second point is any point not belonging to the point cloud slice corresponding to the first point;
and repeat until every point in the second 3D point cloud after ground removal has been added to a corresponding point cloud slice, obtaining the point cloud slice set.
With reference to the second aspect or the third optional implementation manner of the second aspect, in a fourth optional implementation manner of the second aspect, the filtering module is specifically configured to:
establish a geometric model of the detected object according to the geometric dimensions of the detected object;
and acquire the maximum and minimum number of points contained in the 3D point cloud of the detected object according to the geometric model and the volume occupied by a single point in the 3D point cloud.
In a third aspect, a system for semantic segmentation of 3D point cloud is provided, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the steps included in any one of the implementation modes of the first aspect according to the obtained program instructions.
In a fourth aspect, there is provided a storage medium having stored thereon computer-executable instructions for causing a computer to perform the steps included in any one of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application.
Fig. 1 is a schematic structural diagram of a 3D point cloud semantic segmentation system in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for semantic segmentation of a 3D point cloud in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus for semantic segmentation of 3D point cloud in the embodiment of the present application;
fig. 4 is a schematic structural diagram of a 3D point cloud semantic segmentation system in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described can be performed in an order different than here.
The terms "first" and "second" in the description, claims and drawings of the present application are used to distinguish different objects, not to describe a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may include other steps or elements not listed or inherent to such process, method, article or apparatus.
In the embodiments of the present application, "at least one" may mean one or at least two, for example, one, two, three, or more, and the embodiments of the present application are not limited.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship, unless otherwise specified.
At present, a common 3D point cloud semantic segmentation approach converts the 3D point cloud into hand-crafted voxel grid features or multi-view image features and then feeds them to a deep learning network for feature extraction. This feature-conversion approach involves a large data volume and complex computation, and if the resolution is reduced, the segmentation accuracy drops as well. The prior art therefore suffers from a heavy data labeling workload, low segmentation precision, and an inability to segment 3D point cloud semantics effectively.
In view of this, the present application provides a 3D point cloud semantic segmentation method. An RGB image to be detected and a first 3D point cloud corresponding to the RGB image are acquired; a pre-trained model then detects objects in the RGB image and determines whether a detected object is of a specified type. If it is, a second 3D point cloud, corresponding to the target detection box of the detected object in the RGB image, is acquired from the first 3D point cloud, and the ground point cloud is removed from it. 3D clustering is then performed on the second 3D point cloud after ground removal to obtain a point cloud slice set comprising at least one point cloud slice, the slices representing the 3D point clouds of different objects. The maximum and minimum number of points contained in the detected object's 3D point cloud are determined, the target point cloud slices whose point counts fall between the minimum and the maximum are filtered from the slice set, and the slice containing the most points is determined as the 3D semantic point cloud of the detected object. Because the objects in the RGB image are detected by a target detection model generated from Faster R-CNN, and the 3D point cloud corresponding to the detected object's detection box undergoes a series of processing steps that filter out interfering points before the 3D semantic point cloud is determined, the data labeling workload is reduced, the segmentation precision is improved, and the 3D point cloud semantics are segmented effectively.
In order to better understand the technical solutions, they are described in detail below through the drawings and the specific embodiments of the specification. It should be understood that the specific features of the embodiments and examples of the present application are detailed descriptions of the technical solutions, not limitations of them, and the technical features of the embodiments and examples may be combined with each other without conflict.
Fig. 1 shows the structure of a 3D point cloud semantic segmentation system to which the method provided by the embodiments of the present application is applicable. It should be understood that fig. 1 is a simplified description of such a system, not a limitation on the systems to which the method applies.
The system for semantic segmentation of 3D point clouds shown in fig. 1 includes a memory 101, a processor 102, and a bus interface 103. The memory 101 and the processor 102 are connected via a bus interface 103. The memory 101 is used to store program instructions. The processor 102 is configured to call the program instructions stored in the memory 101, and execute all steps included in the method for semantic segmentation of the 3D point cloud according to the obtained program instructions.
Referring to fig. 2, an embodiment of the present application provides a method for semantic segmentation of a 3D point cloud, which can be performed by the system shown in fig. 1. The specific flow of the method is described below.
Step 201: acquiring an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detecting objects in the RGB image with a pre-trained model, and determining whether a detected object is of a specified type.
In this embodiment of the application, before the RGB image to be detected and its corresponding first 3D point cloud are acquired, a pre-trained model for detecting objects in RGB images is generated, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN).
In a specific implementation, a sample RGB image set for the pre-trained model is acquired with an RGB-D camera. The RGB-D camera simultaneously captures the indoor scene within its field of view and outputs synchronized color information and depth information, embodied in an RGB image and a depth image whose pixels correspond one to one. The sample RGB image set comprises at least one sample RGB image; objects of the specified type are marked in each sample RGB image to generate a sample data set, and the Faster R-CNN is trained with this sample data set to generate a pre-trained model capable of detecting objects in RGB images. The features of each convolutional layer of the Faster R-CNN can then be extracted with the generated model to visually inspect their semantic expression; part of the training data is selected, its background removed, and mask images generated for transfer training, which improves the model's detection rate, reduces its false alarm rate, and enhances its generalization ability.
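By way of illustration only, such a pre-trained detector could be built on torchvision's Faster R-CNN implementation; the class count, learning rate and data loader below are hypothetical stand-ins, not parameters disclosed by the patent:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_pretrained_detector(num_classes: int):
    # Start from a COCO-pretrained Faster R-CNN and replace its box predictor
    # so that it classifies the specified object types (plus background).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

model = build_pretrained_detector(num_classes=2)  # e.g. background + one specified type
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
# Fine-tuning loop (data_loader is a hypothetical source of images and
# box/label targets in torchvision's detection format):
# for images, targets in data_loader:
#     loss_dict = model(images, targets)
#     loss = sum(loss_dict.values())
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```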
After transfer training of the pre-trained model, the RGB image to be detected and a depth image are acquired with the RGB-D camera, and the first 3D point cloud corresponding to the RGB image to be detected is obtained from the RGB image and the depth image. The first 3D point cloud is the set of spatial points formed from the color information stored in the whole RGB image and the position information stored in the whole depth image. The pre-trained model is then used to detect objects in the RGB image to be detected and to determine whether a detected object is of the specified type; if it is, step 202 is performed.
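A minimal sketch of this RGB-D back-projection, assuming a pinhole camera model with known intrinsics fx, fy, cx, cy (the patent does not specify the projection model):

```python
import numpy as np

def depth_to_point_cloud(depth_m, rgb, fx, fy, cx, cy):
    """depth_m: HxW depth in meters; rgb: HxWx3. Returns an Nx6 array (x, y, z, r, g, b)."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx  # back-project pixel columns to camera X
    y = (v - cy) * z / fy  # back-project pixel rows to camera Y
    valid = z > 0          # discard pixels with no depth reading
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    colors = rgb[valid].astype(np.float32)
    return np.hstack([points, colors])
```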
Step 202: if the object is of the specified type, acquiring a second 3D point cloud from the first 3D point cloud and removing the ground point cloud from the second 3D point cloud.
In this embodiment of the application, if the detected object is determined to be of the specified type, a second 3D point cloud is acquired from the first 3D point cloud and the ground point cloud in the second 3D point cloud is removed, wherein the second 3D point cloud corresponds to the target detection box of the detected object in the RGB image.
In a specific implementation, in order to remove noise from the second 3D point cloud and make its points uniformly distributed, the second 3D point cloud is subjected to median filtering and voxelization. Because a series of background point clouds, such as the ground and walls, still remains in the median-filtered and voxelized second 3D point cloud, these background point clouds need to be removed. For the ground point cloud, a ground equation can be determined according to the height of the RGB-D camera above the ground; a point of the second 3D point cloud is substituted into the ground equation, and if the error is below a certain threshold the point is considered a ground point. Repeating this operation identifies the ground point cloud in the median-filtered and voxelized second 3D point cloud, which is then removed. After the ground point cloud is removed, step 203 can be performed to remove the remaining background point clouds, such as the wall point cloud.
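A minimal sketch of these two operations, assuming the ground plane ax + by + cz + d = 0 is derived from the camera's mounting height; the plane coefficients, voxel size and 2 cm threshold are illustrative, not values from the patent:

```python
import numpy as np

def voxel_downsample(points, voxel=0.01):
    """Keep one representative point per voxel cell (a simple voxelization)."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

def remove_ground(points, plane=(0.0, -1.0, 0.0, 1.2), threshold=0.02):
    """points: Nx3; plane: (a, b, c, d) with unit normal, e.g. a camera 1.2 m above the floor."""
    a, b, c, d = plane
    dist = np.abs(points @ np.array([a, b, c]) + d)  # point-to-plane distance
    return points[dist > threshold]                  # keep only non-ground points
```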
Step 203: performing 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set.
In this embodiment of the application, besides the ground point cloud, the second 3D point cloud may also contain other background point clouds, such as walls. Therefore, 3D clustering needs to be performed on the second 3D point cloud after ground removal to obtain a point cloud slice set, from which the 3D semantic point cloud of the detected object is screened according to the maximum and minimum point counts of the detected object's 3D point cloud. The point cloud slice set comprises at least one point cloud slice, and the point cloud slices represent the 3D point clouds of objects of different types.
In a specific implementation, for a first point in the second 3D point cloud after ground removal, the points within the 3D neighborhood ball of the first point are added to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the point with a preset threshold as its radius; this continues until the points within the 3D neighborhood ball of every point in the slice have been added, yielding the point cloud slice corresponding to the first point. The point cloud slice corresponding to a second point, which is any point not belonging to the first slice, is obtained in the same way from the 3D neighborhood ball of the second point, and so on, until every point of the second 3D point cloud after ground removal has been added to a corresponding slice, yielding the point cloud slice set. For ease of understanding, an example is given below, followed by a code sketch:
for example, step one: for a point a in the second 3D point cloud after ground removal, determine the points within the 3D neighborhood ball of point a, say points b and c, and classify them into the point cloud slice A corresponding to point a; perform the same operation on points b and c, i.e. determine the points within their 3D neighborhood balls and classify those points into slice A as well, until no new point can be found to add to slice A, at which point slice A is determined;
step two: select a point d in the second 3D point cloud after ground removal that does not belong to slice A, and repeat the above operation on point d to determine its corresponding point cloud slice D;
step three: repeat steps one and two until every point in the second 3D point cloud after ground removal belongs to some point cloud slice, and the point cloud slice set is determined.
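A minimal sketch of this neighborhood-ball clustering, assuming a KD-tree accelerates the fixed-radius neighbor search; the 5 cm radius stands in for the preset threshold and is illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def cluster_point_cloud(points, radius=0.05):
    """points: Nx3 array. Returns a list of index arrays, one per point cloud slice."""
    tree = cKDTree(points)
    unassigned = set(range(len(points)))
    slices = []
    while unassigned:
        seed = unassigned.pop()           # step two: pick a point outside existing slices
        frontier, slice_idx = [seed], [seed]
        while frontier:                   # step one: grow the slice via neighborhood balls
            p = frontier.pop()
            for q in tree.query_ball_point(points[p], r=radius):
                if q in unassigned:
                    unassigned.remove(q)
                    frontier.append(q)
                    slice_idx.append(q)
        slices.append(np.array(slice_idx))  # step three: repeat until no point remains
    return slices
```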
Step 204: determining the maximum and minimum number of points contained in the 3D point cloud of the detected object, and filtering from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum.
In this embodiment of the application, a geometric model of the detected object can be established according to the geometric dimensions of the detected object; the maximum and minimum number of points contained in the 3D point cloud of the detected object are then obtained according to the geometric model and the volume occupied by a single point in the 3D point cloud, and the target point cloud slices whose point counts fall between the minimum and the maximum are filtered from the point cloud slice set. For ease of understanding, the following example is given:
for example, if the detected object is a cat, a geometric model of the cat is established according to the cat's geometric dimensions, such as its length, width and height. From the geometric model, the maximum and minimum volumes of the cat are determined to be, say, 0.027 and 0.015 cubic meters; since a single point of the 3D point cloud occupies 0.0001 cubic meters, the maximum and minimum numbers of points in the cat's 3D point cloud are 270 and 150, respectively.
Step 205: acquiring, from the target point cloud slices, the slice containing the most points and determining it as the 3D semantic point cloud of the detected object.
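Steps 204 and 205 together can be sketched as follows; the volumes and per-point volume reuse the illustrative cat example above and are not fixed parameters of the method:

```python
def select_semantic_slice(slices, vol_min=0.015, vol_max=0.027, point_vol=0.0001):
    """slices: list of per-slice index arrays. Returns the winning slice or None."""
    n_min = int(vol_min / point_vol)  # 150 points in the cat example
    n_max = int(vol_max / point_vol)  # 270 points in the cat example
    targets = [s for s in slices if n_min <= len(s) <= n_max]  # step 204: filter
    return max(targets, key=len) if targets else None          # step 205: most points
```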
Based on the same inventive concept, an embodiment of the present application provides a 3D point cloud semantic segmentation apparatus that can implement the functions corresponding to the 3D point cloud semantic segmentation method described above. The apparatus may be a hardware structure, a software module, or a combination of the two, and may be implemented by a chip system, which may consist of a chip or include both a chip and other discrete devices. Referring to fig. 3, the apparatus includes a first determining module 301, a processing module 302, an acquisition module 303, a filtering module 304 and a second determining module 305, where:
the first determining module 301 is configured to acquire an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detect objects in the RGB image with a pre-trained model, and determine whether a detected object is of a specified type, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN);
the processing module 302 is configured to, if the object is of the specified type, acquire a second 3D point cloud from the first 3D point cloud and remove the ground point cloud from the second 3D point cloud, wherein the second 3D point cloud corresponds to the target detection box of the detected object in the RGB image;
the acquisition module 303 is configured to perform 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set, wherein the point cloud slice set comprises at least one point cloud slice and the point cloud slices represent the 3D point clouds of objects of different types;
the filtering module 304 is configured to determine the maximum and minimum number of points contained in the 3D point cloud of the detected object and filter from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum;
and the second determining module 305 is configured to acquire, from the target point cloud slices, the slice containing the most points and determine it as the 3D semantic point cloud of the detected object.
In an optional embodiment, the apparatus further includes a generating module configured to:
obtain a sample RGB image set for the pre-trained model, wherein the sample RGB image set comprises at least one sample RGB image;
mark objects of the specified type in the sample RGB images of the sample RGB image set to generate a sample data set;
and train the Faster R-CNN with the sample data set to generate the pre-trained model.
In an optional implementation manner, the processing module 302 is specifically configured to:
perform median filtering and voxelization on the second 3D point cloud;
determine, according to a ground equation, the ground point cloud in the median-filtered and voxelized second 3D point cloud;
and remove the ground point cloud from the median-filtered and voxelized second 3D point cloud.
In an optional implementation manner, the acquisition module 303 is specifically configured to:
for a first point in the second 3D point cloud after ground removal, add the points within the 3D neighborhood ball of the first point to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the first point with a preset threshold as its radius;
repeat until the points within the 3D neighborhood ball of every point in the slice have been added, obtaining the point cloud slice corresponding to the first point;
acquire, in the same way, the point cloud slice corresponding to a second point according to the 3D neighborhood ball of the second point, wherein the second point is any point not belonging to the point cloud slice corresponding to the first point;
and repeat until every point in the second 3D point cloud after ground removal has been added to a corresponding point cloud slice, obtaining the point cloud slice set.
In an optional implementation manner, the filtering module 304 is specifically configured to:
establish a geometric model of the detected object according to the geometric dimensions of the detected object;
and acquire the maximum and minimum number of points contained in the 3D point cloud of the detected object according to the geometric model and the volume occupied by a single point in the 3D point cloud.
Based on the same inventive concept, an embodiment of the present application provides a 3D point cloud semantic segmentation system. Referring to fig. 4, the system includes at least one processor 402 and a memory 401 connected to the at least one processor. The specific connection medium between the processor 402 and the memory 401 is not limited in this embodiment; in fig. 4 they are connected by a bus 400, represented by a thick line, and the connection manner between other components is merely schematic and not limiting. The bus 400 may be divided into an address bus, a data bus, a control bus, and the like; for ease of illustration only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
In the embodiment of the present application, the memory 401 stores instructions executable by the at least one processor 402, and the at least one processor 402 may perform the steps included in the aforementioned method for semantic segmentation of 3D point cloud by calling the instructions stored in the memory 401.
The processor 402 is a control center of the 3D point cloud semantic segmentation system, and can connect various parts of the entire 3D point cloud semantic segmentation system by using various interfaces and lines, and implement various functions of the 3D point cloud semantic segmentation system by executing instructions stored in the memory 401. Optionally, the processor 402 may include one or more processing units, and the processor 402 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 402. In some embodiments, processor 402 and memory 401 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 402 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for semantic segmentation of 3D point cloud disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
By programming the processor 402, the code corresponding to the 3D point cloud semantic segmentation method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the 3D point cloud semantic segmentation method when running, and how to program the processor 402 is a technique known by those skilled in the art, and is not described here again.
Based on the same inventive concept, the present application further provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the steps of the method for semantic segmentation of 3D point cloud as described above.
In some possible embodiments, various aspects of the method for 3D point cloud semantic segmentation provided by the present application may also be implemented in the form of a program product including program code for causing a 3D point cloud semantic segmentation system to perform the steps in the method for 3D point cloud semantic segmentation according to various exemplary embodiments of the present application described above in this specification when the program product is run on the 3D point cloud semantic segmentation system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A method for semantic segmentation of a 3D point cloud, comprising:
acquiring an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detecting objects in the RGB image with a pre-trained model, and determining whether a detected object is of a specified type, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN);
if the object is of the specified type, acquiring a second 3D point cloud from the first 3D point cloud and removing a ground point cloud from the second 3D point cloud, wherein the second 3D point cloud corresponds to a target detection box of the detected object in the RGB image;
performing 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set, wherein the point cloud slice set comprises at least one point cloud slice and the point cloud slices represent the 3D point clouds of objects of different types;
determining the maximum and minimum number of points contained in the 3D point cloud of the detected object, and filtering from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum;
and acquiring, from the target point cloud slices, the slice containing the most points and determining it as the 3D semantic point cloud of the detected object.
2. The method of claim 1, wherein, before the acquiring of the RGB image to be detected and the first 3D point cloud corresponding to the RGB image, the method further comprises:
obtaining a sample RGB image set for the pre-trained model, wherein the sample RGB image set comprises at least one sample RGB image;
marking objects of the specified type in the sample RGB images of the sample RGB image set to generate a sample data set;
and training the Faster R-CNN with the sample data set to generate the pre-trained model.
3. The method of claim 1, wherein removing the ground point cloud from the second 3D point cloud comprises:
performing median filtering and voxelization on the second 3D point cloud;
determining, according to a ground equation, the ground point cloud in the median-filtered and voxelized second 3D point cloud;
and removing the ground point cloud from the median-filtered and voxelized second 3D point cloud.
4. The method of claim 3, wherein performing 3D clustering on the second 3D point cloud after removing the ground point cloud to obtain a point cloud slice set comprises:
for a first point in the second 3D point cloud after ground removal, adding the points within the 3D neighborhood ball of the first point to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the first point with a preset threshold as its radius;
repeating until the points within the 3D neighborhood ball of every point in the slice have been added, thereby obtaining the point cloud slice corresponding to the first point;
acquiring, in the same way, the point cloud slice corresponding to a second point according to the 3D neighborhood ball of the second point, wherein the second point is any point not belonging to the point cloud slice corresponding to the first point;
and repeating until every point in the second 3D point cloud after ground removal has been added to a corresponding point cloud slice, thereby obtaining the point cloud slice set.
5. The method of claim 1 or 4, wherein determining the maximum and minimum number of points contained in the 3D point cloud of the detected object comprises:
establishing a geometric model of the detected object according to the geometric dimensions of the detected object;
and acquiring the maximum and minimum number of points contained in the 3D point cloud of the detected object according to the geometric model and the volume occupied by a single point in the 3D point cloud.
6. A 3D point cloud semantic segmentation apparatus, comprising:
a first determining module, configured to acquire an RGB image to be detected and a first 3D point cloud corresponding to the RGB image, detect objects in the RGB image with a pre-trained model, and determine whether a detected object is of a specified type, wherein the pre-trained model is a target detection model generated from a Faster Region-based Convolutional Neural Network (Faster R-CNN);
a processing module, configured to, if the object is of the specified type, acquire a second 3D point cloud from the first 3D point cloud and remove the ground point cloud from the second 3D point cloud, wherein the second 3D point cloud corresponds to the target detection box of the detected object in the RGB image;
an acquisition module, configured to perform 3D clustering on the second 3D point cloud after ground removal to obtain a point cloud slice set, wherein the point cloud slice set comprises at least one point cloud slice and the point cloud slices represent the 3D point clouds of objects of different types;
a filtering module, configured to determine the maximum and minimum number of points contained in the 3D point cloud of the detected object and filter from the point cloud slice set the target point cloud slices whose point counts fall between the minimum and the maximum;
and a second determining module, configured to acquire, from the target point cloud slices, the slice containing the most points and determine it as the 3D semantic point cloud of the detected object.
7. The apparatus of claim 6, wherein the acquisition module is specifically configured to:
for a first point in the second 3D point cloud after ground removal, add the points within the 3D neighborhood ball of the first point to the point cloud slice corresponding to the first point, wherein the 3D neighborhood ball is a ball centered at the first point with a preset threshold as its radius;
repeat until the points within the 3D neighborhood ball of every point in the slice have been added, obtaining the point cloud slice corresponding to the first point;
acquire, in the same way, the point cloud slice corresponding to a second point according to the 3D neighborhood ball of the second point, wherein the second point is any point not belonging to the point cloud slice corresponding to the first point;
and repeat until every point in the second 3D point cloud after ground removal has been added to a corresponding point cloud slice, obtaining the point cloud slice set.
8. The apparatus of claim 6, wherein the filtering module is specifically configured to:
establishing a geometric model of the detected object according to the geometric size of the detected object;
and acquiring the maximum value and the minimum value of the number of points contained in the 3D point cloud of the detected object according to the geometric model and the size of the points in the 3D point cloud.
9. A system for 3D point cloud semantic segmentation, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and performing, according to the obtained program instructions, the steps of the method of any one of claims 1 to 5.
10. A storage medium storing computer-executable instructions for causing a computer to perform the steps of the method of any one of claims 1 to 5.
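The clustering recited in claims 4 and 7 amounts to region growing over 3D neighborhood balls: a point cloud piece grows from a seed point by absorbing every point within a preset radius of any point already in the piece, and each remaining unassigned point seeds the next piece. The sketch below is a minimal, non-authoritative Python reading of that procedure; the scipy k-d tree, the function name, and the radius value are illustrative assumptions, not part of the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighborhood_ball_cluster(points: np.ndarray, radius: float) -> list:
    """Split an (N, 3) point array into point cloud pieces (lists of indices)."""
    tree = cKDTree(points)  # accelerates the repeated neighborhood-ball queries
    unassigned = set(range(len(points)))
    pieces = []
    while unassigned:
        seed = unassigned.pop()        # the claims' "first point" / "second point"
        piece, frontier = [seed], [seed]
        while frontier:
            idx = frontier.pop()
            # Every point inside the 3D neighborhood ball of the current point:
            # a sphere centered on it with the preset threshold as its radius.
            for nb in tree.query_ball_point(points[idx], radius):
                if nb in unassigned:
                    unassigned.remove(nb)
                    piece.append(nb)
                    frontier.append(nb)
        pieces.append(piece)           # no further points can join this piece
    return pieces
```

Running `neighborhood_ball_cluster(ground_free_cloud, radius=0.05)` on the ground-removed second 3D point cloud would then yield the point cloud piece set once every point has been assigned; 0.05 m is only a placeholder for the preset threshold.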
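Claims 5 and 8 derive the point-count bounds from a geometric model of the object and the size of the points in the 3D point cloud, but fix no formula. The sketch below assumes one plausible reading (a box-shaped model, roughly uniform point spacing, and a symmetric slack margin; all three are assumptions not found in the patent): estimate how many samples the visible surface would produce and bracket that estimate.

```python
import math

def point_count_bounds(dims_m: tuple, point_spacing_m: float,
                       slack: float = 0.5) -> tuple:
    """Estimate (min, max) point counts for a box-modelled object.

    dims_m:          (length, width, height) of the object's box model, metres.
    point_spacing_m: typical distance between neighbouring scan points.
    slack:           assumed fractional margin around the nominal estimate.
    """
    l, w, h = dims_m
    # A single viewpoint sees at most roughly half of the box surface.
    visible_area = 0.5 * 2.0 * (l * w + l * h + w * h)
    # Roughly one sample per point_spacing^2 of visible surface.
    nominal = visible_area / point_spacing_m ** 2
    return int(nominal * (1.0 - slack)), math.ceil(nominal * (1.0 + slack))
```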
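The filtering module and the second determining module of claim 6 then reduce the point cloud piece set to a single 3D semantic point cloud: keep only the pieces whose point counts lie between the two bounds, and of those return the piece with the most points. A minimal sketch, reusing the two hypothetical helpers above:

```python
def select_semantic_piece(pieces: list, lo: int, hi: int):
    """Filter pieces by point count, then keep the one with the most points."""
    candidates = [p for p in pieces if lo <= len(p) <= hi]
    return max(candidates, key=len) if candidates else None

# e.g., for a ~0.6 x 0.3 x 0.4 m object scanned at ~1 cm point spacing:
# lo, hi = point_count_bounds((0.6, 0.3, 0.4), 0.01)
# piece = select_semantic_piece(
#     neighborhood_ball_cluster(ground_free_cloud, radius=0.05), lo, hi)
```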
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911329744.7A | 2019-12-20 | 2019-12-20 | 3D point cloud semantic segmentation method, device and system
Publications (1)
Publication Number | Publication Date |
---|---|
CN111178413A (en) | 2020-05-19
Family
ID=70655580
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911329744.7A (pending) | 3D point cloud semantic segmentation method, device and system | 2019-12-20 | 2019-12-20
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178413A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407947A (en) * | 2016-09-29 | 2017-02-15 | 百度在线网络技术(北京)有限公司 | Target object recognition method and device applied to unmanned vehicle |
WO2018068653A1 (en) * | 2016-10-10 | 2018-04-19 | 腾讯科技(深圳)有限公司 | Point cloud data processing method and apparatus, and storage medium |
CN109141364A (en) * | 2018-08-01 | 2019-01-04 | 北京进化者机器人科技有限公司 | Obstacle detection method, system and robot |
CN108717540A (en) * | 2018-08-03 | 2018-10-30 | 浙江梧斯源通信科技股份有限公司 | The method and device of pedestrian and vehicle are distinguished based on 2D laser radars |
CN109657698A (en) * | 2018-11-20 | 2019-04-19 | 同济大学 | A kind of magnetic-levitation obstacle detection method based on cloud |
CN109740628A (en) * | 2018-12-03 | 2019-05-10 | 深圳市华讯方舟太赫兹科技有限公司 | Point cloud clustering method, image processing equipment and the device with store function |
CN110275153A (en) * | 2019-07-05 | 2019-09-24 | 上海大学 | A kind of waterborne target detection and tracking based on laser radar |
Non-Patent Citations (2)
Title |
---|
Lu Huimin et al.: "ROS and Middle Size League Soccer Robots", 31 October 2016, National Defense Industry Press, page 125 *
Guo Ningbo et al.: "Research on a K-Nearest-Neighbor Denoising Algorithm for Point Cloud Data Based on RANSAC Segmentation", no. 12 *
Similar Documents
Publication | Title
---|---
EP2811423B1 (en) | Method and apparatus for detecting target
EP3330925B1 (en) | Method for 3d reconstruction of an environment of a mobile device, corresponding computer program product and device
US9245200B2 (en) | Method for detecting a straight line in a digital image
CN110717489A (en) | Method and device for identifying character area of OSD (on screen display) and storage medium
CN109658454B (en) | Pose information determination method, related device and storage medium
CN105303514A (en) | Image processing method and apparatus
CN110570442A (en) | Contour detection method under complex background, terminal device and storage medium
KR20190042077A (en) | Digital Object Unique Identifier (DOI) recognition method and device
CN110207702B (en) | Target positioning method and device
CN108805201A (en) | Destination image data set creation method and its device
CN110689134A (en) | Method, apparatus, device and storage medium for performing machine learning process
Yogeswaran et al. | 3d surface analysis for automated detection of deformations on automotive body panels
CN111382637A (en) | Pedestrian detection tracking method, device, terminal equipment and medium
CN113744256A (en) | Depth map hole filling method and device, server and readable storage medium
CN109492639A (en) | "loaded" position three-dimensional coordinate acquisition methods, system and image recognition apparatus
CN108960247B (en) | Image significance detection method and device and electronic equipment
CN114898321A (en) | Method, device, equipment, medium and system for detecting road travelable area
CN112101139B (en) | Human shape detection method, device, equipment and storage medium
CN112837384B (en) | Vehicle marking method and device and electronic equipment
CN113269790B (en) | Video clipping method, device, electronic equipment, server and storage medium
CN110673607A (en) | Feature point extraction method and device in dynamic scene and terminal equipment
CN116468838B (en) | Regional resource rendering method, system, computer and readable storage medium
CN114972492A (en) | Position and pose determination method and device based on aerial view and computer storage medium
CN116403062A (en) | Point cloud target detection method, system, equipment and medium
CN115588187A (en) | Pedestrian detection method, device and equipment based on three-dimensional point cloud and storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination