CN116468892A - Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium - Google Patents

Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium

Info

Publication number
CN116468892A
Authority
CN
China
Prior art keywords
feature
dimensional
point
point cloud
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310444546.5A
Other languages
Chinese (zh)
Inventor
胡敏
李冬冬
李立江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Ruitu Technology Co ltd
Original Assignee
Beijing Zhongke Ruitu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Ruitu Technology Co ltd
Priority to CN202310444546.5A
Publication of CN116468892A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a semantic segmentation method and device for a three-dimensional point cloud, an electronic device and a storage medium. The method comprises the following steps: acquiring three-dimensional point cloud data to be processed; taking each point in the three-dimensional point cloud data as a center, searching for the nearest neighbor of the point in each of the eight subspaces corresponding to the point according to a preset search radius; if a nearest neighbor exists in a subspace, taking the feature of the nearest neighbor as the subspace feature, and otherwise taking the point feature of the point as the subspace feature; fusing each subspace feature with the point feature to obtain the fusion feature of the point; and determining the point cloud features of the three-dimensional point cloud data according to the fusion features, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result. The local relationships between points are thereby added before features are extracted, information loss is reduced, and more accurate semantic segmentation of the three-dimensional point cloud is achieved.

Description

Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and more particularly, to a semantic segmentation method and apparatus for three-dimensional point cloud, an electronic device, and a storage medium.
Background
In recent years, technologies such as computer vision and autonomous driving have developed rapidly. Research on two-dimensional data can no longer meet current demands, three-dimensional data processing is receiving more and more attention, and computer vision tasks are entering a new stage of development in the three-dimensional field. The three-dimensional point cloud semantic segmentation task refers to grouping points of the same class into a subset according to the semantic information of a given point cloud. Semantic segmentation is widely applied in real scenes, and accurate and fast semantic segmentation is one of the current research hotspots.
In the prior art, PointNet is the pioneering work in which a neural network processes point clouds directly, but the method does not consider the local structural relationships within the point cloud. Although PointNet++ groups the point cloud into different local point clouds, each point is still processed separately within each local point cloud, and point-to-point relationships are not considered. Later work mainly relies on convolution, graph, or attention mechanisms to build complex local geometry extractors. Although these approaches improve performance to some extent, the complexity of the modules also makes the models slow. The later-proposed PointNeXt achieves good results without using complex local feature extractors, but it ignores the point-to-point relationships within local regions, which tends to cause information loss.
Therefore, how to more accurately perform semantic segmentation on the three-dimensional point cloud is a technical problem to be solved at present.
Disclosure of Invention
The embodiment of the application provides a semantic segmentation method, a semantic segmentation device, electronic equipment and a storage medium for three-dimensional point cloud, which are used for more accurately carrying out semantic segmentation on the three-dimensional point cloud.
In a first aspect, a semantic segmentation method of a three-dimensional point cloud is provided, the method comprising: acquiring three-dimensional point cloud data to be processed; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
In a second aspect, there is provided a semantic segmentation apparatus for a three-dimensional point cloud, the apparatus comprising: the acquisition module is used for acquiring the three-dimensional point cloud data to be processed; the searching module is used for searching nearest neighbor points of the points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; the determining module is used for taking the characteristics of the nearest neighbor points as subspace characteristics if the nearest neighbor points exist in the subspace, otherwise taking the point characteristics of the points as subspace characteristics; the fusion module is used for fusing each subspace feature with the point feature to obtain the fusion feature of the point; and the aggregation module is used for determining the point cloud characteristics of the three-dimensional point cloud data according to the fusion characteristics, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
In a third aspect, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the semantic segmentation method of the three-dimensional point cloud of the first aspect via execution of the executable instructions.
In a fourth aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the semantic segmentation method of a three-dimensional point cloud according to the first aspect.
By applying the technical scheme, three-dimensional point cloud data to be processed are obtained; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result, so that the local relation between points is added before the characteristics are extracted, the information loss is reduced, and more accurate semantic segmentation on the three-dimensional point cloud is realized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow diagram of a semantic segmentation method of a three-dimensional point cloud according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a semantic segmentation method of a three-dimensional point cloud according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of searching for nearest neighbors in an embodiment of the invention;
Fig. 4 is a schematic diagram of a convolution operation performed in an embodiment of the present disclosure;
Fig. 5 shows a schematic structural diagram of a semantic segmentation device for three-dimensional point cloud according to an embodiment of the present invention;
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It is noted that other embodiments of the present application will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise construction set forth herein below and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The subject application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor devices, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiment of the application provides a semantic segmentation method of a three-dimensional point cloud, as shown in fig. 1, comprising the following steps:
Step S101, acquiring three-dimensional point cloud data to be processed.
The three-dimensional point cloud data to be processed can be acquired in real time, for example, the data acquisition is performed on the appointed three-dimensional space based on the laser radar, so as to obtain the three-dimensional point cloud data to be processed. The three-dimensional point cloud data to be processed can also be uploaded by a user or acquired from other servers.
In some embodiments of the present application, the acquiring three-dimensional point cloud data to be processed includes:
acquiring original three-dimensional point cloud data;
transforming the original three-dimensional point cloud data according to Formula II, wherein Formula II is as follows:
wherein $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar that characterizes the feature deviation across all local groups and channels.
In this embodiment, data can be collected from a designated three-dimensional space with a laser radar to obtain the original three-dimensional point cloud data. Because point cloud data is irregular and locally sparse, a model built on a simple deep MLP structure is not very stable. To make the model more robust, the original three-dimensional point cloud data is therefore transformed according to Formula II to obtain the three-dimensional point cloud data to be processed, which improves stability in the subsequent semantic segmentation process.
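The equation for Formula II is not reproduced in this text, so the following is only a minimal sketch of one transform that is consistent with the symbol definitions given for it. It assumes that each neighborhood feature is centered on its point feature, divided by a single scalar deviation σ computed over all groups and channels, and scaled channel-wise by α. The function name `geometric_affine` and the exact form of σ are assumptions, not the patent's stated formula.

```python
import numpy as np

def geometric_affine(grouped, centers, alpha, eps=1e-5):
    """Normalize grouped neighbor features around their center features.

    grouped: (n, k, d) neighborhood features f_{i,j}
    centers: (n, d)    point features f_i
    alpha:   (d,)      per-channel scale (learnable in a real model, fixed here)
    """
    diff = grouped - centers[:, None, :]   # f_{i,j} - f_i for every neighbor
    sigma = np.sqrt(np.mean(diff ** 2))    # assumed: one scalar over all groups and channels
    return alpha * diff / (sigma + eps)    # Hadamard product via broadcasting

rng = np.random.default_rng(0)
n, k, d = 4, 3, 5
grouped = rng.normal(size=(n, k, d))
centers = rng.normal(size=(n, d))
out = geometric_affine(grouped, centers, np.ones(d))
```

With `alpha` set to ones, the output has a root-mean-square value close to 1 regardless of how widely the input neighborhoods are spread, which is the kind of stabilizing effect the paragraph above describes.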
Step S102, searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers.
In this embodiment, because the relationships between points in the three-dimensional point cloud data need to be incorporated into the point cloud features, the nearest neighbors of each point are searched for first. Specifically, with each point in the three-dimensional point cloud data as the center, a search is performed in the eight subspaces corresponding to the point according to a preset search radius, and it is determined whether each subspace contains a nearest neighbor whose distance from the point is not greater than the preset search radius. The eight subspaces correspond to the eight quadrants of a spatial coordinate system with the point as the origin, the eight quadrants being formed by dividing the spatial coordinate system along the positive and negative directions of the coordinate axes. Fig. 3 is a schematic diagram of searching for a nearest neighbor according to an embodiment of the present invention.
Step S103, if the nearest neighbor exists in the subspace, taking the feature of the nearest neighbor as the subspace feature, otherwise taking the point feature of the point as the subspace feature.
When searching according to the preset search radius, for a subspace in which a nearest neighbor exists, the feature of that nearest neighbor is taken as the subspace feature; for a subspace in which no nearest neighbor exists, that is, a subspace containing no point whose distance from the point is not greater than the preset search radius, the point feature of the point itself is taken as the subspace feature.
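Steps S102 and S103 can be sketched as a brute-force search. This is an illustration only: the function name `octant_neighbor_features`, the octant numbering, and the choice to assign a neighbor lying exactly on a coordinate plane to the non-negative side are assumptions that the description does not specify.

```python
import numpy as np

def octant_neighbor_features(points, feats, radius):
    """For every point, take the nearest neighbor inside each of its 8 octants.

    points: (n, 3) coordinates; feats: (n, d) point features.
    Returns (n, 8, d); octant index = 4*(dx >= 0) + 2*(dy >= 0) + (dz >= 0).
    """
    n = feats.shape[0]
    # Fallback first: every octant starts with the point's own feature.
    sub = np.repeat(feats[:, None, :], 8, axis=1)
    for i in range(n):
        offset = points - points[i]                  # coordinates relative to point i
        dist = np.linalg.norm(offset, axis=1)
        octant = ((offset >= 0) * np.array([4, 2, 1])).sum(axis=1)
        valid = dist <= radius                       # within the preset search radius
        valid[i] = False                             # a point is not its own neighbor
        for o in range(8):
            cand = np.flatnonzero(valid & (octant == o))
            if cand.size:                            # nearest neighbor exists in this octant
                sub[i, o] = feats[cand[np.argmin(dist[cand])]]
    return sub

points = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-1.0, -1.0, -1.0], [5.0, 5.0, 5.0]])
feats = np.arange(8.0).reshape(4, 2)
sub = octant_neighbor_features(points, feats, radius=2.0)
```

In this toy input the first point finds neighbors in two opposite octants and keeps its own feature in the other six, while the isolated last point keeps its own feature in all eight. A practical implementation would replace the double loop with a spatial index, since the brute-force version is quadratic in the number of points.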
Step S104, fusing each subspace feature with the point feature to obtain the fusion feature of the point.
After each subspace feature is obtained, each subspace feature is fused with the point feature, and the fusion feature of the point is obtained, wherein the fusion feature covers the relationship between the points.
In some embodiments of the present application, the fusing each subspace feature with the point feature to obtain the fused feature of the point includes:
encoding each subspace feature and the point feature to obtain the encoding feature of the point;
and carrying out convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features.
In this embodiment, features are extracted through convolution operations. To make the convolution efficient, each subspace feature and the point feature are first encoded to obtain the encoding feature of the point, and convolution operations are then performed on the encoding feature along the X axis, the Y axis and the Z axis of the point in sequence, so that direction information between points is incorporated into the convolution. The fusion feature is obtained once the convolution operations are complete, which improves the accuracy of the fusion feature.
In some embodiments of the present application, the convolving the encoded feature sequentially along the X-axis, the Y-axis, and the Z-axis of the point to obtain the fused feature includes:
performing a convolution operation on the coding features according to the X axis to combine eight subspace features of the points in pairs to obtain four-dimensional features;
performing a convolution operation on the four-dimensional features according to the Y axis to combine the four-dimensional features in pairs according to dimensions to obtain two-dimensional features;
performing a convolution operation on the two-dimensional features according to the Z axis to combine the two-dimensional features according to dimensions to obtain single-dimensional features;
and splicing the single-dimensional feature and the point feature to obtain the fusion feature.
In this embodiment, the four-dimensional feature has four dimensions, the two-dimensional feature has two dimensions, and the single-dimensional feature has one dimension. A convolution operation is performed once along each of the X axis, the Y axis and the Z axis in turn, which converts the eight subspace features in the encoding feature into a single-dimensional feature; the single-dimensional feature is then spliced with the point feature to obtain the fusion feature, which further improves the accuracy of the fusion feature. For example, if the point feature is a d-dimensional feature, the encoding feature is a 2×2×2×d-dimensional feature, and after the eight subspace features are converted into the single-dimensional feature, the resulting fusion feature is also a d-dimensional feature.
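The axis-by-axis merging can be illustrated with a small sketch. It assumes that each "convolution" is a kernel-size-2 linear map that collapses one axis of the 2×2×2×d encoding feature, and that a final linear projection (hypothetical, named `w_out` here) brings the spliced 2d-dimensional vector back to d dimensions so that the fusion feature is d-dimensional as stated. The octant ordering and all weight shapes are assumptions.

```python
import numpy as np

def merge_axis(x, weight):
    """Kernel-size-2 convolution along the leading axis of x.

    x: (2, ..., d); weight: (2, d, d). Each of the two slices is linearly
    mapped and the results are summed, so the leading axis disappears.
    """
    return x[0] @ weight[0] + x[1] @ weight[1]

def fuse(sub_feats, point_feat, w_x, w_y, w_z, w_out):
    """sub_feats: (8, d) in octant order 4*x + 2*y + z; point_feat: (d,)."""
    d = point_feat.shape[0]
    m = sub_feats.reshape(2, 2, 2, d)           # encoding feature, axes (X, Y, Z, d)
    m = merge_axis(m, w_x)                      # (2, 2, d): four features remain
    m = merge_axis(m, w_y)                      # (2, d): two features remain
    m = merge_axis(m, w_z)                      # (d,): a single feature remains
    spliced = np.concatenate([m, point_feat])   # splice with the point feature, (2d,)
    return spliced @ w_out                      # assumed projection back to d dimensions

rng = np.random.default_rng(0)
d = 4
sub_feats = rng.normal(size=(8, d))
point_feat = rng.normal(size=d)
w_x, w_y, w_z = (rng.normal(size=(2, d, d)) for _ in range(3))
w_out = rng.normal(size=(2 * d, d))
fused = fuse(sub_feats, point_feat, w_x, w_y, w_z, w_out)
```

Because each merge uses a separate weight for the negative and positive half of an axis, the result depends on which octant a neighbor came from, which is how direction information enters the fused feature.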
Step S105, determining point cloud features of the three-dimensional point cloud data according to each fusion feature, and performing feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
After the fusion characteristics of each point are obtained, the point cloud characteristics of the three-dimensional point cloud data are determined based on the fusion characteristics, and then characteristic aggregation is carried out on the point cloud characteristics based on a preset aggregation function, so that classification of the point cloud characteristics is completed, and a semantic segmentation result is obtained.
In some embodiments of the present application, the preset aggregation function includes a first aggregation function that adopts a maximum pooling operation and a second aggregation function that adopts an average pooling operation, and the feature aggregation is performed on the point cloud feature based on the preset aggregation function, so as to obtain a semantic segmentation result, where the method includes:
determining the aggregation feature of each point according to Formula I, wherein Formula I is as follows:
determining the semantic segmentation result according to each aggregation feature;
wherein $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining operator in Formula I is the residual MLP block, and $g_{i,j}$ ($j = 1, \dots, K$) are the K local neighborhood points of the i-th point in the point cloud features.
In feature aggregation, max pooling retains texture features well, but because it keeps only the maximum value, other feature information is lost. In this embodiment, the first aggregation function and the second aggregation function are used to perform the max pooling operation and the average pooling operation respectively, which preserves the integrity of the extracted features. In addition, a residual MLP (Multilayer Perceptron) is used for feature extraction, which avoids a complex local feature extractor, reduces the computational cost and improves segmentation efficiency.
Optionally, the residual MLP block consists of a fully connected (FC) layer, a normalization layer and an activation layer, with this combination repeated twice.
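The aggregation step can be sketched as follows. Formula I itself is not reproduced in this text, so the sketch assumes that the residual MLP block is applied to every neighborhood feature and that the max-pooled and average-pooled results are concatenated, in line with the description's statement that the two pooling results are spliced. The normalization (per-channel standardization without learned parameters) and the ReLU activation are illustrative choices; the text only names a normalization layer and an activation layer.

```python
import numpy as np

def fc_norm_act(x, w, b, eps=1e-5):
    """FC layer, then per-channel normalization, then ReLU."""
    y = x @ w + b
    y = (y - y.mean(axis=0)) / (y.std(axis=0) + eps)
    return np.maximum(y, 0.0)

def residual_mlp(x, params):
    """Two FC + norm + activation stages with a skip connection; x: (m, d)."""
    (w1, b1), (w2, b2) = params
    return x + fc_norm_act(fc_norm_act(x, w1, b1), w2, b2)

def aggregate(neigh, params):
    """neigh: (n, K, d) neighborhood features g_{i,j}. Returns (n, 2d)."""
    n, k, d = neigh.shape
    h = residual_mlp(neigh.reshape(n * k, d), params).reshape(n, k, d)
    pooled_max = h.max(axis=1)     # first aggregation function A: max pooling
    pooled_mean = h.mean(axis=1)   # second aggregation function B: average pooling
    return np.concatenate([pooled_max, pooled_mean], axis=1)

rng = np.random.default_rng(0)
n, K, d = 6, 4, 3
neigh = rng.normal(size=(n, K, d))
params = [(0.1 * rng.normal(size=(d, d)), np.zeros(d)) for _ in range(2)]
f_out = aggregate(neigh, params)
```

The first d channels of `f_out` hold the max-pooled features and the last d channels hold the average-pooled features, so a downstream classifier sees both the strongest response and the overall neighborhood statistics.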
By applying the technical scheme, three-dimensional point cloud data to be processed are obtained; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result, so that the local relation between points is added before the characteristics are extracted, the information loss is reduced, and more accurate semantic segmentation on the three-dimensional point cloud is realized.
In order to further explain the technical idea of the invention, the technical scheme of the invention is described with specific application scenarios.
The embodiment of the application provides a semantic segmentation method of a three-dimensional point cloud, which is shown in fig. 2 and comprises the following steps:
Step S201, obtaining original three-dimensional point cloud data.
Step S202, performing linear processing on the original three-dimensional point cloud data.
Specifically, the original three-dimensional point cloud data is transformed according to Formula II to obtain the three-dimensional point cloud data to be processed, wherein Formula II is as follows:
wherein $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^d$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^d$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 10^{-5}$ is a small value that ensures numerical stability, and $\sigma$ is a scalar that characterizes the feature deviation across all local groups and channels.
Step S203, processing the neighborhood point interrelationship.
Specifically, as shown in fig. 3, each point in the three-dimensional point cloud data is taken as a center, and the nearest neighbor point of the point is searched in eight subspaces corresponding to the point according to a preset searching radius; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as the subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic.
Each subspace feature and the point feature are encoded to obtain the encoding feature of the point, $M \in \mathbb{R}^{2 \times 2 \times 2 \times d}$. Convolution operations are then performed on the encoding feature along the X axis, the Y axis and the Z axis of the point in sequence to obtain the fusion feature. Specifically, as shown in the first graph from the left in fig. 4, a convolution operation is performed on the encoding feature along the X axis to combine the eight subspace features of the point in pairs, giving a four-dimensional feature; as shown in the second graph from the left in fig. 4, a convolution operation is performed on the four-dimensional feature along the Y axis to combine it in pairs by dimension, giving a two-dimensional feature; as shown in the third and fourth graphs from the left in fig. 4, a convolution operation is performed on the two-dimensional feature along the Z axis to combine it by dimension, giving a single-dimensional feature; finally, the single-dimensional feature is spliced with the point feature to obtain a fusion feature that is a d-dimensional vector.
Step S204, determining point cloud characteristics of the three-dimensional point cloud data according to the fusion characteristics, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
Specifically, the preset aggregation function includes a first aggregation function that uses a max pooling operation and a second aggregation function that uses an average pooling operation, and the aggregation feature of each point is determined according to Formula I, wherein Formula I is as follows:
determining a semantic segmentation result according to each aggregation feature;
wherein $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining operator in Formula I is the residual MLP block, and $g_{i,j}$ ($j = 1, \dots, K$) are the K local neighborhood points of the i-th point in the point cloud features.
By applying the technical scheme, compared with the prior art, the method has the following beneficial effects:
1) Existing MLP-based methods do not consider the relationships between local points, while methods built on complex local feature extractors increase the computational cost. The embodiment of the invention adds the relationships between local points to simple residual MLP feature extraction, so that the model keeps the computational cost low while fully considering the original geometric relationships between data points, which effectively reduces information loss;
2) The prior art generally uses max pooling, which retains texture features well during feature aggregation but loses other feature information because only the maximum value is kept. The embodiment of the application therefore splices the max pooling result and the average pooling result during feature extraction, which preserves the integrity of the extracted features.
The embodiment of the application also provides a semantic segmentation device of the three-dimensional point cloud, as shown in fig. 5, the device comprises: an obtaining module 501, configured to obtain three-dimensional point cloud data to be processed; the searching module 502 is configured to search, with each point in the three-dimensional point cloud data as a center, for a nearest neighbor point of the point in eight subspaces corresponding to the point according to a preset searching radius, where the eight subspaces correspond to eight quadrants in a space coordinate system with the point as an origin; a determining module 503, configured to take, if the nearest neighbor exists in the subspace, a feature of the nearest neighbor as a subspace feature, otherwise take, as the subspace feature, a point feature of the point; a fusion module 504, configured to fuse each subspace feature with the point feature to obtain a fused feature of the point; the aggregation module 505 is configured to determine point cloud features of the three-dimensional point cloud data according to each of the fusion features, and perform feature aggregation on the point cloud features based on a preset aggregation function to obtain a semantic segmentation result.
In a specific application scenario, the fusion module 504 is specifically configured to: encode each subspace feature and the point feature to obtain the encoded feature of the point; and perform convolution operations on the encoded feature along the X axis, the Y axis and the Z axis of the point in sequence to obtain the fused feature.
In a specific application scenario, the fusion module 504 is further specifically configured to: perform a convolution operation on the encoded feature along the X axis, combining the eight subspace features of the point in pairs to obtain four-dimensional features; perform a convolution operation on the four-dimensional features along the Y axis, combining them in pairs by dimension to obtain two-dimensional features; perform a convolution operation on the two-dimensional features along the Z axis, combining them by dimension to obtain a single-dimensional feature; and splice the single-dimensional feature with the point feature to obtain the fused feature.
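The three-stage reduction described above (eight features, then four, then two, then one, followed by splicing with the point feature) can be sketched as below. This is an illustration under stated assumptions: the eight subspace features are assumed to be ordered by octant id `4*sx + 2*sy + sz`, each stage is modeled as one shared linear map followed by ReLU, and the names `pair_merge`, `fuse_octant_features`, `w_x`, `w_y` and `w_z` are hypothetical. The text does not specify kernel sizes, channel widths or activations.

```python
import numpy as np

def pair_merge(x, weight):
    """Merge the two entries along axis 0 with one shared linear map + ReLU.

    x: (2, ..., C); weight: (2*C, C_out). Returns (..., C_out).
    This is the same arithmetic as a bias-free convolution with
    kernel size 2 along that axis.
    """
    stacked = np.concatenate([x[0], x[1]], axis=-1)   # (..., 2*C)
    return np.maximum(stacked @ weight, 0.0)

def fuse_octant_features(sub_feats, point_feat, w_x, w_y, w_z):
    """sub_feats: (8, C) ordered by octant id; point_feat: (C,)."""
    c = sub_feats.shape[-1]
    grid = sub_feats.reshape(2, 2, 2, c)      # axes: x sign, y sign, z sign
    four = pair_merge(grid, w_x)              # X axis: 8 -> 4 features, (2, 2, C)
    two = pair_merge(four, w_y)               # Y axis: 4 -> 2 features, (2, C)
    one = pair_merge(two, w_z)                # Z axis: 2 -> 1 feature, (C,)
    return np.concatenate([one, point_feat])  # splice with the point's own feature
```

With all three weights of shape `(2*C, C)`, the result has `2*C` channels: `C` from the merged neighborhood and `C` from the point itself.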
In a specific application scenario, the preset aggregation functions include a first aggregation function that adopts a maximum pooling operation and a second aggregation function that adopts an average pooling operation, and the aggregation module 505 is specifically configured to: determine the aggregation feature of each point according to formula I, which is as follows:
[formula I is not reproduced in this text]
and determine the semantic segmentation result according to each aggregation feature;
wherein $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining operator in formula I is the residual MLP block, and $g_{i,j}$, $j = 1, \dots, K$, are the K local neighborhood points of the i-th point in the point cloud features.
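Formula I itself does not survive in this text, so the following is only a sketch of the idea stated in the description: the max pooling result and the average pooling result over the K neighbors are spliced. The function name `dual_pool` and the array layout are assumptions, and the residual MLP block that the formula applies is deliberately left out because its placement cannot be confirmed here.

```python
import numpy as np

def dual_pool(neigh_feats):
    """neigh_feats: (N, K, C) features of the K neighbors of each of N points.

    Returns (N, 2*C): max pooling and average pooling over the K neighbors,
    concatenated along the channel axis.
    """
    max_part = neigh_feats.max(axis=1)    # first aggregation function: max pooling
    avg_part = neigh_feats.mean(axis=1)   # second aggregation function: average pooling
    return np.concatenate([max_part, avg_part], axis=-1)
```

Concatenation doubles the channel count, so any layer consuming the aggregated feature would need an input width of `2*C`.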
In a specific application scenario, the obtaining module 501 is specifically configured to: acquire original three-dimensional point cloud data; and transform the original three-dimensional point cloud data according to formula II, which is as follows:
[formula II is not reproduced in this text]
wherein $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^{d}$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^{d}$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 1\mathrm{e}{-5}$ is a small value for ensuring numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groups and channels.
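Formula II is missing from this text, so its exact form cannot be confirmed. The listed symbols are consistent with a transform that centers each neighbor on its group center, divides by a single global deviation plus a small constant, and scales channel-wise by the learnable parameter. The sketch below assumes exactly that form; the function name, the argument layout, and the definition of the deviation as a root mean square over all offsets are assumptions.

```python
import numpy as np

def normalize_groups(grouped, centers, alpha, eps=1e-5):
    """Assumed form: alpha * (f_ij - f_i) / (sigma + eps).

    grouped: (N, K, D) neighbor features; centers: (N, D); alpha: (D,).
    """
    diff = grouped - centers[:, None, :]      # offset of each neighbor from its center
    sigma = np.sqrt(np.mean(diff ** 2))       # one scalar over all groups and channels
    return alpha * diff / (sigma + eps)       # Hadamard product via broadcasting
```

Because the deviation is one scalar shared by every group, relative scale differences between neighborhoods are preserved while the overall magnitude is normalized.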
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602 and the memory 603 communicate with one another through the communication bus 604;
a memory 603 for storing executable instructions of the processor;
a processor 601 configured to perform the following by executing the executable instructions:
acquiring three-dimensional point cloud data to be processed; searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight directions of the points in the three-dimensional space respectively; if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic; fusing each subspace feature with the point feature to obtain a fused feature of the point; and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is drawn as a single bold line in the figure, which does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include RAM (Random Access Memory) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor implements the semantic segmentation method of a three-dimensional point cloud as described above.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the semantic segmentation method of a three-dimensional point cloud as described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A semantic segmentation method of a three-dimensional point cloud, the method comprising:
acquiring three-dimensional point cloud data to be processed;
searching nearest neighbors of points in eight subspaces corresponding to the points according to a preset searching radius by taking each point in the three-dimensional point cloud data as a center, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins;
if the nearest neighbor exists in the subspace, taking the characteristic of the nearest neighbor as a subspace characteristic, otherwise taking the point characteristic of the point as the subspace characteristic;
fusing each subspace feature with the point feature to obtain a fused feature of the point;
and determining the point cloud characteristics of the three-dimensional point cloud data according to each fusion characteristic, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
2. The method of claim 1, wherein said fusing each of said subspace features with said point features to obtain a fused feature of said point comprises:
encoding each subspace feature and the point feature to obtain the encoding feature of the point;
and carrying out convolution operation on the coding features according to the X axis, the Y axis and the Z axis of the points in sequence to obtain the fusion features.
3. The method of claim 2, wherein convolving the encoded features sequentially along the X-axis, Y-axis, and Z-axis of the point to obtain the fused feature comprises:
performing a convolution operation on the coding features according to the X axis to combine eight subspace features of the points in pairs to obtain four-dimensional features;
performing a convolution operation on the four-dimensional features according to the Y axis to combine the four-dimensional features in pairs according to dimensions to obtain two-dimensional features;
performing a convolution operation on the two-dimensional features according to the Z axis to combine the two-dimensional features according to dimensions to obtain single-dimensional features;
and splicing the single-dimensional feature and the point feature to obtain the fusion feature.
4. The method of claim 1, wherein the preset aggregation function includes a first aggregation function using a maximum pooling operation and a second aggregation function using an average pooling operation, and performing feature aggregation on the point cloud features based on the preset aggregation function to obtain a semantic segmentation result comprises:
determining the aggregation feature of each point according to formula I, which is as follows:
[formula I is not reproduced in this text]
determining the semantic segmentation result according to each aggregation feature;
wherein $f_{out}$ is the aggregation feature, $A(\cdot)$ is the first aggregation function, $B(\cdot)$ is the second aggregation function, the remaining operator in formula I is the residual MLP block, and $g_{i,j}$, $j = 1, \dots, K$, are the K local neighborhood points of the i-th point in the point cloud features.
5. The method of claim 1, wherein the acquiring three-dimensional point cloud data to be processed comprises:
acquiring original three-dimensional point cloud data;
transforming the original three-dimensional point cloud data according to formula II, which is as follows:
[formula II is not reproduced in this text]
wherein $\{f_{i,j}\}_{j=1,\dots,k} \in \mathbb{R}^{k \times d}$ are the k local neighborhood points of $f_i \in \mathbb{R}^{d}$ after grouping, each neighborhood point $f_{i,j}$ is a d-dimensional vector, $\alpha \in \mathbb{R}^{d}$ is a learnable parameter, $\odot$ is the Hadamard product, $\epsilon = 1\mathrm{e}{-5}$ is a small value for ensuring numerical stability, and $\sigma$ is a scalar characterizing the feature deviation across all local groups and channels.
6. A semantic segmentation apparatus for a three-dimensional point cloud, the apparatus comprising:
the acquisition module is used for acquiring the three-dimensional point cloud data to be processed;
the searching module is used for searching nearest neighbor points of the points in eight subspaces corresponding to the points according to a preset searching radius by taking the points in the three-dimensional point cloud data as centers, wherein the eight subspaces correspond to eight quadrants in a space coordinate system taking the points as origins;
the determining module is used for taking the characteristics of the nearest neighbor points as subspace characteristics if the nearest neighbor points exist in the subspace, otherwise taking the point characteristics of the points as subspace characteristics;
the fusion module is used for fusing each subspace feature with the point feature to obtain the fusion feature of the point;
and the aggregation module is used for determining the point cloud characteristics of the three-dimensional point cloud data according to the fusion characteristics, and carrying out characteristic aggregation on the point cloud characteristics based on a preset aggregation function to obtain a semantic segmentation result.
7. The apparatus of claim 6, wherein the fusion module is specifically configured to:
encode each subspace feature and the point feature to obtain the encoding feature of the point;
and perform convolution operations on the encoding feature along the X axis, the Y axis and the Z axis of the point in sequence to obtain the fusion feature.
8. The apparatus of claim 7, wherein the fusion module is further specifically configured to:
perform a convolution operation on the encoding feature along the X axis, combining the eight subspace features of the point in pairs to obtain four-dimensional features;
perform a convolution operation on the four-dimensional features along the Y axis, combining them in pairs by dimension to obtain two-dimensional features;
perform a convolution operation on the two-dimensional features along the Z axis, combining them by dimension to obtain a single-dimensional feature;
and splice the single-dimensional feature with the point feature to obtain the fusion feature.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the semantic segmentation method of the three-dimensional point cloud of any of claims 1-5 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the semantic segmentation method of a three-dimensional point cloud according to any one of claims 1 to 5.
CN202310444546.5A 2023-04-24 2023-04-24 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium Pending CN116468892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310444546.5A CN116468892A (en) 2023-04-24 2023-04-24 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116468892A true CN116468892A (en) 2023-07-21

Family

ID=87178628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310444546.5A Pending CN116468892A (en) 2023-04-24 2023-04-24 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116468892A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination