CN116468892A - Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium - Google Patents
- Publication number
- CN116468892A (application number CN202310444546.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- dimensional
- point
- point cloud
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a semantic segmentation method and device for a three-dimensional point cloud, an electronic device, and a storage medium. The method comprises: acquiring three-dimensional point cloud data to be processed; taking each point in the three-dimensional point cloud data as a center, searching for the nearest neighbor of the point in each of the eight subspaces corresponding to the point according to a preset search radius; if a nearest neighbor exists in a subspace, taking the feature of that nearest neighbor as the subspace feature, and otherwise taking the point feature of the point as the subspace feature; fusing each subspace feature with the point feature to obtain the fusion feature of the point; and determining the point cloud features of the three-dimensional point cloud data according to the fusion features, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result. The local relationships between points are thereby added before feature extraction, which reduces information loss and yields more accurate semantic segmentation of the three-dimensional point cloud.
Description
Technical Field
The present disclosure relates to the field of computer vision, and more particularly, to a semantic segmentation method and apparatus for three-dimensional point cloud, an electronic device, and a storage medium.
Background
In recent years, technologies such as computer vision and autonomous driving have developed rapidly. Research on two-dimensional data can no longer meet current demands, three-dimensional data processing is receiving more and more attention, and computer vision tasks are entering a new stage of development in the three-dimensional field. The three-dimensional point cloud semantic segmentation task classifies points of the same class into one subset according to the semantic information of a given point cloud. Semantic segmentation is widely applied in real scenes, and performing it accurately and quickly is one of the current research hotspots.
In the prior art, PointNet is the pioneering work that processes point clouds directly with a neural network, but it does not consider the local structural relationships within the point cloud. PointNet++ groups the point cloud into local regions, but each point within a local region is still processed separately, and the relationships between points are not considered. Later work mainly relies on convolution, graph, or attention mechanisms to build complex local geometric extractors; although these approaches improve performance, the complexity of the modules makes the models slow. The more recent PointNeXt achieves good results without a complex local feature extractor, but it ignores the relationships between points within a local region, which tends to cause information loss.
Therefore, how to more accurately perform semantic segmentation on the three-dimensional point cloud is a technical problem to be solved at present.
Disclosure of Invention
The embodiment of the application provides a semantic segmentation method, a semantic segmentation device, electronic equipment and a storage medium for three-dimensional point cloud, which are used for more accurately carrying out semantic segmentation on the three-dimensional point cloud.
In a first aspect, a semantic segmentation method of a three-dimensional point cloud is provided, the method comprising: acquiring three-dimensional point cloud data to be processed; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
In a second aspect, there is provided a semantic segmentation apparatus for a three-dimensional point cloud, the apparatus comprising: the acquisition module is used for acquiring the three-dimensional point cloud data to be processed; the searching module is used for searching nearest neighbor points of the points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; the determining module is used for taking the characteristics of the nearest neighbor points as subspace characteristics if the nearest neighbor points exist in the subspace, otherwise taking the point characteristics of the points as subspace characteristics; the fusion module is used for fusing each subspace feature with the point feature to obtain the fusion feature of the point; and the aggregation module is used for determining the point cloud characteristics of the three-dimensional point cloud data according to the fusion characteristics, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
In a third aspect, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the semantic segmentation method of the three-dimensional point cloud of the first aspect via execution of the executable instructions.
In a fourth aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the semantic segmentation method of a three-dimensional point cloud according to the first aspect.
By applying the technical scheme, three-dimensional point cloud data to be processed are obtained; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result, so that the local relation between points is added before the characteristics are extracted, the information loss is reduced, and more accurate semantic segmentation on the three-dimensional point cloud is realized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow diagram of a semantic segmentation method of a three-dimensional point cloud according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a semantic segmentation method of a three-dimensional point cloud according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of searching for nearest neighbors in an embodiment of the invention;
Fig. 4 is a schematic diagram of a convolution operation performed in an embodiment of the present disclosure;
Fig. 5 shows a schematic structural diagram of a semantic segmentation device for three-dimensional point cloud according to an embodiment of the present invention;
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It is noted that other embodiments of the present application will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise construction set forth herein below and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The subject application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor devices, distributed computing environments that include any of the above devices or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiment of the application provides a semantic segmentation method of a three-dimensional point cloud, as shown in fig. 1, comprising the following steps:
Step S101, acquiring three-dimensional point cloud data to be processed.
The three-dimensional point cloud data to be processed can be acquired in real time, for example by scanning a designated three-dimensional space with a lidar. It can also be uploaded by a user or obtained from other servers.
In some embodiments of the present application, the acquiring three-dimensional point cloud data to be processed includes:
acquiring original three-dimensional point cloud data;
transforming the original three-dimensional point cloud data according to Formula II (Formula II is an image in the original publication and is not reproduced in this text);
where $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ denotes the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groups and channels.
In this embodiment, the original three-dimensional point cloud data can be obtained by scanning the designated three-dimensional space with a lidar. Point cloud data is irregular and locally sparse, and these properties make a plain deep MLP structure unstable. To make the model more robust, the original three-dimensional point cloud data is therefore transformed according to Formula II to obtain the three-dimensional point cloud data to be processed, which improves stability in the subsequent semantic segmentation process.
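The transformation can be illustrated with a minimal NumPy sketch. Because Formula II itself is not reproduced in this text, the form used here is an assumption built only from the symbols defined above: each neighbor is centered on its point, divided by the scalar deviation σ plus ε, and scaled channel-wise by α. The function name `geometric_affine` and the way σ is computed (root mean square of the centered features over all groups and channels) are hypothetical choices for illustration.

```python
import numpy as np

def geometric_affine(grouped, centers, alpha, eps=1e-5):
    """Assumed form of the transform: alpha * (f_ij - f_i) / (sigma + eps).

    grouped: (n, k, d) features of the k neighbors of each of n points.
    centers: (n, d) features f_i of the n center points.
    alpha:   (d,) channel-wise scale (learnable in a real model).
    """
    diff = grouped - centers[:, None, :]      # center each local group on f_i
    # Assumption: sigma is one scalar over all groups and channels.
    sigma = np.sqrt(np.mean(diff ** 2))
    return alpha * diff / (sigma + eps)       # Hadamard product via broadcasting
```

With this form, every local group is rescaled by the same scalar, so groups of very different density end up on a comparable numeric scale before they reach the MLP layers.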
Step S102, searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers.
In this embodiment, because the relationships between points in the three-dimensional point cloud data need to be added to the point cloud features, the nearest neighbor of each point is searched for first. Specifically, taking each point in the three-dimensional point cloud data as the center, a search is performed in the eight subspaces corresponding to the point according to a preset search radius, and it is determined whether each subspace contains a nearest neighbor whose distance from the point does not exceed the preset search radius. The eight subspaces correspond to the eight octants of a spatial coordinate system with the point as its origin, that is, the eight regions formed when the space is divided by the positive and negative directions of the coordinate axes. Fig. 3 is a schematic diagram of searching for a nearest neighbor according to an embodiment of the present invention.
And step S103, if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic.
When searching according to the preset search radius, for a subspace in which a nearest neighbor exists, the feature of that nearest neighbor is taken as the subspace feature; for a subspace in which no nearest neighbor exists, that is, a subspace containing no point whose distance from the center point is within the preset search radius, the point feature of the center point itself is taken as the subspace feature.
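Steps S102 and S103 can be sketched as follows. This is a brute-force NumPy illustration rather than the implementation of the embodiment: the function name `octant_neighbor_features`, the octant numbering, and the convention that a zero offset counts as the positive side of an axis are all assumptions made for the example.

```python
import numpy as np

def octant_neighbor_features(points, feats, radius):
    """Pick, for each point, the nearest neighbor's feature in each octant.

    points: (N, 3) coordinates; feats: (N, d) per-point features.
    Returns (N, 8, d). An octant with no neighbor within `radius`
    keeps the center point's own feature (the fallback of step S103).
    """
    n = points.shape[0]
    # Start from the fallback: every octant holds the point's own feature.
    out = np.repeat(feats[:, None, :], 8, axis=1)
    for i in range(n):
        offsets = points - points[i]              # (N, 3) relative positions
        dist = np.linalg.norm(offsets, axis=1)    # (N,) Euclidean distances
        # Assumed octant id: 4*sx + 2*sy + sz, with s = 1 when offset >= 0.
        octant = ((offsets[:, 0] >= 0).astype(int) * 4
                  + (offsets[:, 1] >= 0).astype(int) * 2
                  + (offsets[:, 2] >= 0).astype(int))
        valid = dist <= radius
        valid[i] = False                          # exclude the point itself
        for o in range(8):
            idx = np.flatnonzero(valid & (octant == o))
            if idx.size:
                out[i, o] = feats[idx[np.argmin(dist[idx])]]
    return out
```

The double loop costs O(N²) and is only meant to make the rule explicit; a spatial index such as a k-d tree would be the usual choice for large point clouds.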
Step S104, fusing each subspace feature with the point feature to obtain the fusion feature of the point.
After each subspace feature is obtained, each subspace feature is fused with the point feature, and the fusion feature of the point is obtained, wherein the fusion feature covers the relationship between the points.
In some embodiments of the present application, the fusing each subspace feature with the point feature to obtain the fused feature of the point includes:
encoding each subspace feature and the point feature to obtain the encoding feature of the point;
and carrying out convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features.
In this embodiment, feature extraction is performed by convolution. To make the convolution efficient, the subspace features and the point feature are first encoded to obtain the encoded feature of the point, and convolution is then applied to the encoded feature along the X axis, the Y axis, and the Z axis of the point in sequence. This adds the directional information between points to the convolution; the fusion feature is obtained once the convolutions are complete, which improves the accuracy of the fusion feature.
In some embodiments of the present application, the convolving the encoded feature sequentially along the X-axis, the Y-axis, and the Z-axis of the point to obtain the fused feature includes:
performing a convolution operation on the coding features according to the X axis to combine eight subspace features of the points in pairs to obtain four-dimensional features;
performing a convolution operation on the four-dimensional features according to the Y axis to combine the four-dimensional features in pairs according to dimensions to obtain two-dimensional features;
performing a convolution operation on the two-dimensional features according to the Z axis to combine the two-dimensional features according to dimensions to obtain single-dimensional features;
and splicing the single-dimensional feature and the point feature to obtain the fusion feature.
In this embodiment, the four-dimensional feature has four dimensions, the two-dimensional feature has two dimensions, and the single-dimensional feature has one dimension; that is, the eight subspace features are merged in pairs into four, then two, then one. One convolution is performed along each of the X axis, the Y axis, and the Z axis in sequence to convert the eight subspace features in the encoded feature into a single-dimensional feature, and the single-dimensional feature is then spliced with the point feature to obtain the fusion feature, which further improves the accuracy of the fusion feature. For example, if the point feature is a d-dimensional feature, the encoded feature is a 2 × 2 × 2 × d feature, and after the eight subspace features are converted into the single-dimensional feature, the resulting fusion feature is also a d-dimensional feature.
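The axis-by-axis merge can be sketched in NumPy as below. The sketch assumes the eight subspace features are stored in octant order 4·sx + 2·sy + sz so that they reshape into a 2 × 2 × 2 × d tensor, and it models each convolution as a width-2 kernel along one axis with hypothetical weights `w_x`, `w_y`, `w_z` of shape (2, d, d). The source states that the final fusion feature is d-dimensional but does not describe how that width is reached after splicing, so the sketch stops at the concatenation.

```python
import numpy as np

def fuse_octants(subspace_feats, point_feat, w_x, w_y, w_z):
    """Merge eight octant features along X, then Y, then Z, then splice.

    subspace_feats: (8, d), ordered by octant id 4*sx + 2*sy + sz (assumed).
    point_feat:     (d,) feature of the center point.
    w_x, w_y, w_z:  (2, d, d) hypothetical width-2 kernels, one per axis.
    """
    d = point_feat.shape[0]
    m = subspace_feats.reshape(2, 2, 2, d)        # axes: x, y, z, channel
    four = np.einsum("xyzc,xco->yzo", m, w_x)     # X axis: 8 features -> 4
    two = np.einsum("yzc,yco->zo", four, w_y)     # Y axis: 4 features -> 2
    one = np.einsum("zc,zco->o", two, w_z)        # Z axis: 2 features -> 1
    # Splice the merged feature with the point feature (length 2*d here).
    return np.concatenate([one, point_feat])
```

Because each kernel has a separate weight matrix for the negative and the positive side of its axis, the merged feature depends on which octant a neighbor came from, which is how directional information enters the result.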
Step S105, determining point cloud features of the three-dimensional point cloud data according to each fusion feature, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
After the fusion characteristics of each point are obtained, the point cloud characteristics of the three-dimensional point cloud data are determined based on the fusion characteristics, and then characteristic aggregation is carried out on the point cloud characteristics based on a preset aggregation function, so that classification of the point cloud characteristics is completed, and a semantic segmentation result is obtained.
In some embodiments of the present application, the preset aggregation function includes a first aggregation function that adopts a maximum pooling operation and a second aggregation function that adopts an average pooling operation, and the feature aggregation is performed on the point cloud feature based on the preset aggregation function, so as to obtain a semantic segmentation result, where the method includes:
determining the aggregation feature of each point according to Formula I (Formula I is an image in the original publication and is not reproduced in this text);
determining the semantic segmentation result according to each aggregation feature;
where $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining mapping in Formula I denotes the residual MLP block, and $g_{i,j}$ are the K local neighborhood points of the i-th point in the point cloud features.
In feature aggregation, max pooling preserves texture features well, but because it keeps only the maximum value it discards other feature information. In this embodiment, the first aggregation function and the second aggregation function are used to perform the max pooling operation and the average pooling operation respectively, which ensures the integrity of the extracted features. In addition, a residual MLP (Multilayer Perceptron) is used for feature extraction, which avoids a complex local feature extractor, reduces the computational cost, and improves segmentation efficiency.
Optionally, the residual MLP block is composed of an FC layer, a normalization layer, and an activation layer, repeated twice.
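A minimal NumPy sketch of the two ingredients follows. It assumes that the max-pooled and average-pooled results are concatenated, and it models the residual MLP block as two FC, normalization, and activation stages with a skip connection. The function names, the batch-style normalization, and the ReLU activation are illustrative assumptions, since the text does not fix those choices.

```python
import numpy as np

def aggregate(neigh_feats):
    """Concatenate max pooling and average pooling over K neighbors.

    neigh_feats: (n, K, d) -> (n, 2 * d)
    """
    pooled_max = neigh_feats.max(axis=1)      # first aggregation function
    pooled_avg = neigh_feats.mean(axis=1)     # second aggregation function
    return np.concatenate([pooled_max, pooled_avg], axis=-1)

def residual_mlp_block(x, w1, w2, eps=1e-5):
    """(FC -> normalization -> activation) twice, plus a skip connection.

    x: (n, d); w1, w2: (d, d) weights (learned in a real model).
    """
    def fc_norm_act(h, w):
        h = h @ w                                          # FC layer
        h = (h - h.mean(axis=0)) / (h.std(axis=0) + eps)   # assumed batch-style norm
        return np.maximum(h, 0.0)                          # assumed ReLU activation
    return x + fc_norm_act(fc_norm_act(x, w1), w2)
```

Keeping both pooled vectors doubles the feature width, which is the price paid for retaining the average response alongside the strongest one.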
By applying the technical scheme, three-dimensional point cloud data to be processed are obtained; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result, so that the local relation between points is added before the characteristics are extracted, the information loss is reduced, and more accurate semantic segmentation on the three-dimensional point cloud is realized.
In order to further explain the technical idea of the invention, the technical scheme of the invention is described with specific application scenarios.
The embodiment of the application provides a semantic segmentation method of a three-dimensional point cloud, which is shown in fig. 2 and comprises the following steps:
Step S201, obtaining original three-dimensional point cloud data.
Step S202, performing linear processing on the original three-dimensional point cloud data.
Specifically, the original three-dimensional point cloud data is transformed according to Formula II to obtain the three-dimensional point cloud data to be processed (Formula II is an image in the original publication and is not reproduced in this text);
where $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ denotes the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groups and channels.
Step S203, processing the neighborhood point interrelationship.
Specifically, as shown in fig. 3, each point in the three-dimensional point cloud data is taken as a center, and the nearest neighbor point of the point is searched in eight subspaces corresponding to the point according to a preset searching radius; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as the subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic.
Each subspace feature and the point feature are encoded to obtain the encoded feature of the point, $M \in \mathbb{R}^{2 \times 2 \times 2 \times d}$. Convolution is then applied to the encoded feature along the X axis, the Y axis, and the Z axis of the point in sequence to obtain the fusion feature. Specifically, as shown in the first graph from the left in Fig. 4, one convolution is performed on the encoded feature along the X axis to combine the eight subspace features of the point in pairs, giving four-dimensional features; as shown in the second graph from the left in Fig. 4, one convolution is performed on the four-dimensional features along the Y axis to combine them in pairs by dimension, giving two-dimensional features; as shown in the third and fourth graphs from the left in Fig. 4, one convolution is performed on the two-dimensional features along the Z axis to combine them by dimension, giving a single-dimensional feature; finally, the single-dimensional feature and the point feature are spliced to obtain a fusion feature that is a d-dimensional vector.
Step S204, determining point cloud characteristics of the three-dimensional point cloud data according to the fusion characteristics, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
Specifically, the preset aggregation function includes a first aggregation function that uses a max pooling operation and a second aggregation function that uses an average pooling operation, and the aggregation feature of each point is determined according to Formula I (Formula I is an image in the original publication and is not reproduced in this text);
a semantic segmentation result is determined according to each aggregation feature;
where $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining mapping in Formula I denotes the residual MLP block, and $g_{i,j}$ are the K local neighborhood points of the i-th point in the point cloud features.
By applying the technical scheme, compared with the prior art, the method has the following beneficial effects:
1) Existing MLP-based methods do not consider the relationships between local points, while methods built around a complex local feature extractor increase the computational cost. The embodiment of the invention adds the relationships between local points to a simple residual MLP feature extraction, so that the model takes the original geometric relationships between data points fully into account while keeping the computational cost low, which effectively reduces information loss;
2) The prior art generally uses max pooling. Although max pooling preserves texture features well during feature aggregation, keeping only the maximum value discards other feature information. The embodiment of the application therefore splices the max pooling result and the average pooling result during feature extraction, which ensures the integrity of the extracted features.
The embodiment of the application also provides a semantic segmentation device for a three-dimensional point cloud. As shown in fig. 5, the device comprises: an obtaining module 501, configured to obtain three-dimensional point cloud data to be processed; a searching module 502, configured to take each point in the three-dimensional point cloud data as a center and search, according to a preset search radius, for the nearest neighbor of the point in each of the eight subspaces corresponding to the point, where the eight subspaces correspond to the eight quadrants (octants) of a spatial coordinate system with the point as the origin; a determining module 503, configured to take the feature of the nearest neighbor as the subspace feature if the nearest neighbor exists in the subspace, and otherwise take the point feature of the point as the subspace feature; a fusion module 504, configured to fuse each subspace feature with the point feature to obtain the fusion feature of the point; and an aggregation module 505, configured to determine the point cloud features of the three-dimensional point cloud data according to each fusion feature, and perform feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
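The behavior of the searching module and the determining module can be sketched with a brute-force NumPy search. This is an illustrative sketch only: the function name `octant_neighbor_features` is hypothetical, the O(N²) loop stands in for whatever spatial index a real system would use, and assigning points that lie exactly on a coordinate plane to the non-negative side is an assumption.

```python
import numpy as np

def octant_neighbor_features(points, feats, radius):
    """Collect one feature per octant for every point.

    points : (N, 3) coordinates
    feats  : (N, d) point features
    Returns an (N, 2, 2, 2, d) array indexed by octant along X, Y, Z.
    """
    n, d = feats.shape
    out = np.empty((n, 2, 2, 2, d), dtype=feats.dtype)
    for i in range(n):
        # Fallback: every octant starts with the point's own feature.
        out[i] = feats[i]
        offset = points - points[i]
        dist = np.linalg.norm(offset, axis=1)
        # Octant index per axis: 0 for a negative offset, 1 otherwise.
        octant = (offset >= 0).astype(int)
        best = np.full((2, 2, 2), np.inf)
        for j in range(n):
            if j == i or dist[j] > radius:
                continue  # skip the point itself and points outside the radius
            x, y, z = octant[j]
            if dist[j] < best[x, y, z]:
                best[x, y, z] = dist[j]
                out[i, x, y, z] = feats[j]
    return out
```

The result for each point has the 2×2×2×d layout used for the encoding feature before the axis-wise convolutions.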
In a specific application scenario, the fusion module 504 is specifically configured to: encode each subspace feature and the point feature to obtain the encoding feature of the point; and perform convolution operations on the encoding feature along the X axis, the Y axis and the Z axis of the point in sequence to obtain the fusion feature.
In a specific application scenario, the fusion module 504 is further specifically configured to: perform a convolution operation on the encoding feature along the X axis to combine the eight subspace features of the point in pairs, obtaining four-dimensional features; perform a convolution operation on the four-dimensional features along the Y axis to combine them in pairs by dimension, obtaining two-dimensional features; perform a convolution operation on the two-dimensional features along the Z axis to combine them by dimension, obtaining a single-dimensional feature; and splice the single-dimensional feature with the point feature to obtain the fusion feature.
In a specific application scenario, the preset aggregation function includes a first aggregation function that uses a max pooling operation and a second aggregation function that uses an average pooling operation, and the aggregation module 505 is specifically configured to: determine the aggregation feature of each point according to Formula I; and determine the semantic segmentation result according to each aggregation feature.
[Formula I: equation not reproduced in this text]
where $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the residual MLP block is denoted by a symbol that is missing from this text, and $g_{i,j}$ denotes the K local neighborhood points of the i-th point in the point cloud features.
In a specific application scenario, the obtaining module 501 is specifically configured to: acquire original three-dimensional point cloud data; and transform the original three-dimensional point cloud data according to Formula II.
[Formula II: equation not reproduced in this text]
where $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groupings and channels.
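Since Formula II is missing from this text, the sketch below is built only from the symbol definitions. It assumes, without confirmation from the source, that each neighbor is expressed as an offset from its center point, divided by a single scalar deviation σ plus ε, and scaled channel-wise by α. The function name `normalize_groups` and the choice of σ as the root mean square of all offsets are assumptions made for illustration.

```python
import numpy as np

def normalize_groups(grouped, centers, alpha, eps=1e-5):
    """Normalize grouped neighborhood features (assumed form of Formula II).

    grouped : (N, k, d) features of the k neighbors of each of N points
    centers : (N, d) features f_i of the center points
    alpha   : (d,) learnable per-channel scale
    """
    # Offset of every neighbor from its center point.
    diff = grouped - centers[:, None, :]
    # One scalar deviation over all groups and channels (assumed RMS).
    sigma = np.sqrt(np.mean(diff ** 2))
    # Hadamard product with alpha via broadcasting; eps guards against sigma = 0.
    return alpha * diff / (sigma + eps)
```

With ε in the denominator, a degenerate input in which every neighbor equals its center produces zeros rather than a division by zero.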
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 complete communication with each other through the communication bus 604,
a memory 603 for storing executable instructions of the processor;
a processor 601 configured to perform the following by executing the executable instructions:
acquiring three-dimensional point cloud data to be processed; taking each point in the three-dimensional point cloud data as a center and searching, according to a preset search radius, for the nearest neighbor of the point in each of the eight subspaces corresponding to the point, wherein the eight subspaces respectively correspond to eight directions of the point in three-dimensional space; if the nearest neighbor exists in the subspace, taking the feature of the nearest neighbor as the subspace feature, otherwise taking the point feature of the point as the subspace feature; fusing each subspace feature with the point feature to obtain the fusion feature of the point; and determining the point cloud features of the three-dimensional point cloud data according to each fusion feature, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include RAM (Random Access Memory) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor implements the semantic segmentation method of a three-dimensional point cloud as described above.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the semantic segmentation method of a three-dimensional point cloud as described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; identical and similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (10)
1. A semantic segmentation method of a three-dimensional point cloud, the method comprising:
acquiring three-dimensional point cloud data to be processed;
searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins;
if the nearest neighbor exists in the subspace, taking the feature of the nearest neighbor as a subspace feature, otherwise taking the point feature of the point as the subspace feature;
fusing each subspace feature with the point feature to obtain a fused feature of the point;
and determining the point cloud features of the three-dimensional point cloud data according to each fusion feature, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
2. The method of claim 1, wherein said fusing each of said subspace features with said point features to obtain a fused feature of said point comprises:
encoding each subspace feature and the point feature to obtain the encoding feature of the point;
and carrying out convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features.
3. The method of claim 2, wherein performing the convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features comprises:
performing a convolution operation on the coding features according to the X axis to combine eight subspace features of the points in pairs to obtain four-dimensional features;
performing a convolution operation on the four-dimensional features according to the Y axis to combine the four-dimensional features in pairs according to dimensions to obtain two-dimensional features;
performing a convolution operation on the two-dimensional features according to the Z axis to combine the two-dimensional features according to dimensions to obtain single-dimensional features;
and splicing the single-dimensional feature and the point feature to obtain the fusion feature.
4. The method of claim 1, wherein the preset aggregation function includes a first aggregation function using a maximum pooling operation and a second aggregation function using an average pooling operation, and performing feature aggregation on the point cloud features based on the preset aggregation function to obtain the semantic segmentation result comprises:
determining the aggregation feature of each point according to Formula I, wherein Formula I is specifically as follows:
[Formula I: equation not reproduced in this text]
determining the semantic segmentation result according to each aggregation feature;
wherein $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the residual MLP block is denoted by a symbol that is missing from this text, and $g_{i,j}$ denotes the K local neighborhood points of the i-th point in the point cloud features.
5. The method of claim 1, wherein the acquiring three-dimensional point cloud data to be processed comprises:
acquiring original three-dimensional point cloud data;
transforming the original three-dimensional point cloud data according to Formula II, wherein Formula II is specifically as follows:
[Formula II: equation not reproduced in this text]
wherein $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groupings and channels.
6. A semantic segmentation apparatus for a three-dimensional point cloud, the apparatus comprising:
the acquisition module is used for acquiring the three-dimensional point cloud data to be processed;
the searching module is used for searching nearest neighbor points of the points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins;
the determining module is used for taking the feature of the nearest neighbor as the subspace feature if the nearest neighbor exists in the subspace, and otherwise taking the point feature of the point as the subspace feature;
the fusion module is used for fusing each subspace feature with the point feature to obtain the fusion feature of the point;
and the aggregation module is used for determining the point cloud features of the three-dimensional point cloud data according to the fusion features, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
7. The apparatus of claim 6, wherein the fusion module is specifically configured to:
encoding each subspace feature and the point feature to obtain the encoding feature of the point;
and carrying out convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features.
8. The apparatus of claim 7, wherein the fusion module is further specifically configured to:
performing a convolution operation on the coding features according to the X axis to combine eight subspace features of the points in pairs to obtain four-dimensional features;
performing a convolution operation on the four-dimensional features according to the Y axis to combine the four-dimensional features in pairs according to dimensions to obtain two-dimensional features;
performing a convolution operation on the two-dimensional features according to the Z axis to combine the two-dimensional features according to dimensions to obtain single-dimensional features;
and splicing the single-dimensional feature and the point feature to obtain the fusion feature.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the semantic segmentation method of the three-dimensional point cloud of any of claims 1-5 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the semantic segmentation method of a three-dimensional point cloud according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310444546.5A CN116468892A (en) | 2023-04-24 | 2023-04-24 | Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116468892A (en) | 2023-07-21 |
Family
ID=87178628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310444546.5A Pending CN116468892A (en) | 2023-04-24 | 2023-04-24 | Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116468892A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||