CN113393474A - Feature fusion based three-dimensional point cloud classification and segmentation method - Google Patents
- Publication number
- CN113393474A (application CN202110648726.6A)
- Authority
- CN
- China
- Prior art keywords
- scale
- local
- point cloud
- features
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention provides a method for classifying and segmenting three-dimensional point clouds based on feature fusion. The method comprises the following steps: dividing the three-dimensional point cloud into a plurality of local regions; establishing a multi-scale region in each local region through the KNN algorithm; extracting fine-grained scale features of the scale regions through a graph attention convolution layer; and assigning an attention weight to each scale feature of a local region and performing weighted fusion of the scale features according to these weights to obtain local region features of the point cloud containing fine-grained geometric information. Context information among different local region features is then obtained through a bidirectional long short-term memory network, all local region features are fused to obtain the global semantic features of the point cloud, and the three-dimensional point cloud is classified and segmented. The invention mines fine-grained multi-scale information of different local regions, captures local region information by combining the correlation among regions of different scales, and improves the accuracy of classification and segmentation tasks in three-dimensional point cloud scene understanding.
Description
Technical Field
The invention relates to the technical field of computer applications, and in particular to a method for classifying and segmenting three-dimensional point clouds based on feature fusion.
Background
Typical tasks of three-dimensional point cloud scene understanding include target detection, shape classification, target segmentation, and the like; among them, the classification and segmentation tasks have become research focuses in fields such as surveying and mapping, navigation and positioning, and automatic driving, and are widely applied in real scenes: tree modeling from classified and segmented tree point cloud data helps grasp forest resource information for efficient forest resource management and decision making; weeds in vegetable greenhouses are identified through point cloud classification and segmentation to safeguard crop quality; classifying and segmenting medical point cloud data better assists doctors in making more accurate diagnoses and treatments; and segmenting and reconstructing incomplete cultural relics facilitates their study and protection. In these complex application scenarios, capturing fine-grained local geometric information (such as distance and direction) and context information (associations between point pairs and between different regions) of the point cloud is particularly critical.
Apparent-similarity confusion is the phenomenon in which two or more adjacent objects cannot be effectively distinguished because their apparent shapes are very similar, so the objects cannot be classified and segmented correctly. For example, beams, room columns, and walls are very similar in apparent shape and differ only in local geometric structure and contextual detail; if this detail information cannot be sufficiently mined, target classification and segmentation errors are easily caused.
Small-target mis-segmentation is the phenomenon in which a target of small volume and inconspicuous apparent detail is wrongly segmented in a complex environment because of large differences in target size.
Rough segmentation edges refer to the phenomenon in which, for adjacent or overlapping targets, the segmentation contour of a target is unclear because the structural relationship between the edge points of adjacent targets is neglected.
The practical application of classification and segmentation technology in three-dimensional point cloud scene understanding can effectively assist research work in different fields; however, constrained by multiple factors in complex real environments, three-dimensional point cloud classification and segmentation still faces many problems worthy of in-depth research and retains important research value. How to fully capture fine-grained local geometric information and context information of the point cloud, so as to cope with apparently similar targets that cannot be effectively distinguished in complex scenes, mis-segmented small targets, rough segmentation edges, and the like, is a problem to be solved urgently.
In the prior art, most point cloud classification and segmentation methods extract point cloud features manually and then realize classification and segmentation by constructing corresponding discriminant models. However, when facing point cloud data of ever-increasing variety and quantity, these conventional methods suffer from high computational cost, loss of detail information, and low accuracy.
A first point cloud classification and segmentation method in the prior art: an attention mechanism lets the network focus on useful information while ignoring useless information, and it is introduced into point cloud classification and segmentation algorithms to capture the more important information of the point cloud. Compared with RNN-based methods, attention-based methods can capture the associations among features of different layers of the point cloud, gather important information and eliminate useless information; they are not limited by the length of the input sequence when capturing long-term dependencies among features, have few parameters, and yield simpler models.
The attention mechanism mainly calculates attention weights through a probability distribution, so that the network ignores irrelevant information and focuses on important information. FIG. 1 is a schematic diagram of the general form of a prior art attention mechanism. The attention mechanism can be seen as a mapping from a query (Query) to key-value pairs (Key-Value), and the attention weight is the similarity between the Query and each Key. In FIG. 1, K = (k_1, k_2, ..., k_N) denotes the key sequence, V = (v_1, v_2, ..., v_N) denotes the value sequence, and Q = (q_1, q_2, ..., q_M) denotes the query sequence. The calculation process can be divided into the following three steps:
(1) Calculate the attention score e_ti. The score between the query q_t and each key k_i can be computed using the dot product, the scaled dot product, concatenation, or addition, as shown in formula (2-1):

e_ti = score(q_t, k_i)  (2-1)
(2) Calculate the attention weight α_ti. The attention score e_ti is normalized using the Softmax function, as shown in formula (2-2):

α_ti = exp(e_ti) / Σ_j exp(e_tj)  (2-2)
(3) Calculate the attention output. The attention weights α_ti and their corresponding values v_i are combined by weighted summation, as shown in formula (2-3).
Attention(q_t, K, V) = Σ_i α_ti v_i  (2-3)
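For illustration, the three steps above can be sketched in a few lines of NumPy; the scaled dot product is used here as one of the score options listed for formula (2-1), and all shapes and values are illustrative assumptions rather than a configuration taken from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Q: (M, d) queries, K: (N, d) keys, V: (N, dv) values."""
    d = Q.shape[-1]
    e = Q @ K.T / np.sqrt(d)      # (2-1) attention scores e_ti (scaled dot product)
    alpha = softmax(e, axis=-1)   # (2-2) attention weights alpha_ti via Softmax
    return alpha @ V              # (2-3) weighted sum of the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
out = attention(Q, K, V)          # (4, 16): one output vector per query
```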
The first prior art point cloud classification and segmentation method has the following disadvantage: the attention mechanism cannot learn the sequential order within a feature sequence, yet the order between points and between regions in point cloud classification and segmentation also carries part of the important information, so part of the important information of the point cloud is still lost.
A second point cloud classification and segmentation method in the prior art uses the bidirectional long short-term memory network (Bi-LSTM), a bidirectional variant of the long short-term memory network (LSTM); LSTM and Bi-LSTM are introduced in this section.
The long short-term memory network (LSTM) is an improved RNN that adds gate structures for memory, update, and transfer in the hidden layer, together with a core memory cell for storing historical information; its internal structure is shown in FIG. 2. In FIG. 2, x_t represents the input sequence value; h_(t-1) represents the hidden-layer state at time t-1; the memory cell c controls the transmission of information and is the core of the network; the input gate i determines how much of the current x_t is kept in c_t; the forget gate f determines how much of the previous c_(t-1) is kept in the current c_t; and the output gate o determines how much of c_t is transmitted to the current output h_t.
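The gate computations in FIG. 2 can be sketched as follows, assuming the standard tanh candidate form; the stacked parameter layout (W, U, b) is an illustrative assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the parameters of the four gate blocks."""
    z = W @ x_t + U @ h_prev + b
    d = h_prev.shape[0]
    i = sigmoid(z[0*d:1*d])      # input gate: how much of x_t enters c_t
    f = sigmoid(z[1*d:2*d])      # forget gate: how much of c_(t-1) is kept
    o = sigmoid(z[2*d:3*d])      # output gate: how much of c_t reaches h_t
    g = np.tanh(z[3*d:4*d])      # candidate memory content
    c_t = f * c_prev + i * g     # memory cell update
    h_t = o * np.tanh(c_t)       # hidden state output
    return h_t, c_t

rng = np.random.default_rng(0)
dx, dh = 8, 16
W, U, b = rng.normal(size=(4*dh, dx))*0.1, rng.normal(size=(4*dh, dh))*0.1, np.zeros(4*dh)
h, c = np.zeros(dh), np.zeros(dh)
for x_t in rng.normal(size=(5, dx)):  # unroll over a toy input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```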
The bidirectional long short-term memory network (Bi-LSTM) considers the influence of both the previous moment and the next moment on the information at the current moment, and can thus reasonably and effectively handle the forward-backward association of information at different moments; its internal structure is shown in FIG. 3. The input layer encodes the input data into a sequence meeting the network input requirements; the forward LSTM layer acquires historical information from front to back; the backward LSTM layer acquires future information from back to front; and the output layer integrates the outputs of the forward and backward LSTM layers through a sequence connection.
The second prior art point cloud classification and segmentation method has the following disadvantages: the LSTM controls the updating and transmission of historical information through the memory cell and the gate structures, can process and predict time sequences, and effectively alleviates the long-distance dependence problem of RNNs and CNNs, but if the input sequence is too long it still suffers from long-distance memory loss and vanishing gradients. The Bi-LSTM considers the influence of historical and future information on the current information simultaneously and can reasonably and effectively handle the forward-backward association of information, but if the feature sequence is too long, the gradient attenuation is still large and effective information at the front end of the input sequence is lost.
Disclosure of Invention
The invention provides a method for classifying and segmenting three-dimensional point cloud based on feature fusion, which is used for effectively understanding, classifying and segmenting a three-dimensional point cloud scene.
In order to achieve the purpose, the invention adopts the following technical scheme.
A method for classifying and segmenting three-dimensional point cloud based on feature fusion comprises the following steps:
dividing the three-dimensional point cloud into a plurality of local areas, establishing a multi-scale area in each local area through a KNN (K nearest neighbor) search algorithm, and extracting fine-grained multi-scale features of the multi-scale area through a graph attention convolution layer;
assigning an attention weight to each scale feature of the local region through a spatial attention mechanism, and performing weighted fusion of the scale features of the local region according to the attention weights to obtain local region features of the point cloud containing fine-grained geometric information;
acquiring context information among different local region features of the point cloud through a bidirectional long short-term memory network, and fusing the local region features according to the context information among the different local region features to obtain the global semantic features of the point cloud;
and classifying and segmenting the three-dimensional point cloud according to the global semantic features of the point cloud.
Preferably, the establishing a multi-scale region in each local region in the point cloud by a KNN search algorithm includes:
dividing the input three-dimensional point cloud into M local regions {L_1, L_2, ..., L_M} through the iterative farthest point sampling algorithm and the KNN search algorithm; in each local region L_m, constructing T scale regions with the KNN search algorithm, dividing the local region L_m into scale regions of T different scales {S_m1, S_m2, ..., S_mT}, and establishing the multi-scale region from all the scale regions.
Preferably, the extracting fine-grained multi-scale features of the multi-scale region by the attention convolutional layer comprises:
in each scale region, extracting association information between different neighborhood points and a central point through a graph convolution layer fusing spatial position information and characteristic attribute information, capturing local geometric information of point cloud to obtain fine-grained geometric characteristics of each scale region, extracting fine-grained scale characteristics of T different scale regions through a graph attention convolution layer, and obtaining fine-grained multi-scale characteristics of a multi-scale region according to the fine-grained scale characteristics of each scale region.
Preferably, the allocating an attention weight to each scale feature of the local region through the spatial attention mechanism, performing weighted fusion on each scale feature of the local region according to the attention weight, and obtaining the local region feature of the point cloud containing fine-grained geometric information includes:
distributing different weights to information from different neighborhood points by adopting a spatial attention mechanism, respectively distributing attention weights to scale features of T scale regions of each local region, aggregating neighborhood point information of each local region to a central point, and performing weighted fusion on the scale features of the T scale regions of each local region according to the attention weights to obtain 1 local region feature, wherein the local region feature comprises point cloud fine-grained local geometric information and point pair association information.
Preferably, the acquiring context information between different local area features of the point cloud through the bidirectional long-short term memory network includes:
an encoder of a context attention coding layer is formed by a bidirectional long short-term memory network Bi-LSTM and a grouping attention module; the encoder abstracts the M local region features of the point cloud into a local feature sequence R = {r_1, r_2, ..., r_m, ..., r_M}, extracts context information among different local region features from the local feature sequence R, and outputs local region features h_m containing the context information, calculated as shown in formula (4-1):

h_m = W_a · (LSTM_fw(r_m) ⊕ LSTM_bw(r_m))  (4-1)

where W_a is a learnable weight matrix, ⊕ represents the combination of the Bi-LSTM output layers, and LSTM represents the nonlinear activation function.
Preferably, the fusing the local region features according to the context information between the different local region features to obtain the global semantic features of the point cloud includes:
all local region features h_m containing context information after Bi-LSTM encoding form a local feature sequence H = {h_1, h_2, ..., h_m, ..., h_M};

a grouping attention module assigns different attention weights to the different local region features h_m: the association between different local region features is calculated from the local feature sequence H and its transpose, and the Softmax function is applied to normalize the relation map, yielding a grouping attention matrix G, where G_ji denotes the influence of the i-th local region feature on the j-th local region feature, i.e., the attention weight, as shown in formula (4-2):

G_ji = exp(h_i · h_j) / Σ_k exp(h_k · h_j)  (4-2)

weighted fusion between the local feature sequence H and the grouping attention matrix G is performed by matrix multiplication, the result of the weighted fusion is connected with the originally input local feature sequence H through a skip link, and the global semantic feature C of the three-dimensional point cloud is output, as shown in formula (4-3):

C = G · H + H  (4-3)
according to the technical scheme provided by the embodiment of the invention, the method provided by the embodiment of the invention not only can mine fine-grained multi-scale information of different local areas, but also can capture the local area information by combining the correlation among the different scale areas, so that the problem of insufficient feature expression capability of the local areas is solved, the distinguishing capability of the network on apparent similar targets is enhanced, and the accuracy of classification and segmentation tasks in the understanding of the three-dimensional point cloud scene is improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram in general form of an attention mechanism of the prior art;
FIG. 2 is a schematic diagram of an internal structure of a Long Short-Term Memory network (LSTM) in the prior art;
FIG. 3 is an internal block diagram of a prior art Bi-directional long short term memory (Bi-LSTM) network;
fig. 4 is an implementation schematic diagram of a three-dimensional point cloud scene understanding method based on feature fusion according to an embodiment of the present invention;
fig. 5 is a specific processing flow chart of a three-dimensional point cloud scene understanding method based on feature fusion according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an implementation of the feature extraction method for fusing fine-grained multi-scale information according to the present invention;
FIG. 7 is a diagram illustrating the structure of a graph attention convolution layer according to an embodiment of the present invention;
FIG. 8 is an internal block diagram of a spatial attention module according to an embodiment of the present invention;
fig. 9 is a schematic diagram of an implementation of a feature fusion method based on context attention RNN according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating an internal structure of a context attention RNN coding layer according to an embodiment of the present invention;
fig. 11 is an internal structural diagram of a grouping attention module according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The embodiment of the invention is based on the research and analysis of the existing three-dimensional point cloud scene understanding algorithm, mainly researches the three-dimensional point cloud scene understanding based on feature fusion, and realizes the tasks of shape classification, part segmentation and semantic segmentation in the three-dimensional scene understanding.
According to the embodiments of the invention, important information of the point cloud such as geometric shape information, spatial position information, and context information is fully captured through feature fusion at different levels, improving the discriminability of the learned features, coping with the problems that apparently similar objects cannot be effectively distinguished in complex scenes, small targets are mis-segmented, and segmentation edges are rough, and improving the performance of three-dimensional scene understanding tasks such as shape classification, part segmentation, and semantic segmentation.
Fig. 4 is an implementation schematic diagram of a feature fusion-based three-dimensional point cloud scene understanding method provided in an embodiment of the present invention, and a specific processing flow is shown in fig. 5, and includes the following processing steps:
Step S10: dividing the three-dimensional point cloud into a plurality of local regions, and establishing a multi-scale region in each local region through the KNN algorithm.
An implementation schematic diagram of the feature extraction method fusing fine-grained multi-scale information provided by the invention is shown in FIG. 6. The processing comprises: dividing the input three-dimensional point cloud into M local regions {L_1, L_2, ..., L_M} through the iterative farthest point sampling algorithm and the KNN search algorithm; in each local region L_m, constructing T scale regions with the KNN search algorithm (the value of T depends on the specific task and is initially set to 4, with neighborhood sizes K_T = [16, 32, 64, 128]); dividing the local region L_m into scale regions of T different scales {S_m1, S_m2, ..., S_mT}; and establishing the multi-scale region from all the scale regions.
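A brute-force sketch of step S10 under these settings is given below; the choice of the first sampling index and the helper names are assumptions for illustration, not the patent's concrete implementation.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Iteratively pick m center points that are mutually far apart."""
    n = points.shape[0]
    centers = [0]                                # start from an arbitrary point
    dist = np.full(n, np.inf)
    for _ in range(m - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(dist.argmax()))       # farthest remaining point
    return np.array(centers)

def multi_scale_regions(points, centers, scales=(16, 32, 64, 128)):
    """For each center, gather the K nearest neighbors at each scale K_T."""
    regions = []
    for c in centers:
        order = np.linalg.norm(points - points[c], axis=1).argsort()
        regions.append([points[order[:k]] for k in scales])  # S_m1 .. S_mT
    return regions

pts = np.random.default_rng(0).normal(size=(1024, 3))  # toy point cloud
ctr = farthest_point_sampling(pts, m=64)               # M = 64 local regions
regs = multi_scale_regions(pts, ctr)                   # regs[m][t]: t-th scale region of L_m
```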
Step S20: extracting fine-grained multi-scale features of the multi-scale region through the graph attention convolution layer.
In order to fully capture fine-grained local geometric information of point cloud, the invention provides a multi-scale feature extraction method based on a graph attention convolution layer at a multi-scale feature extraction stage. In each scale region, extracting the association information between different neighborhood points and a central point through a graph convolution layer fusing spatial position information and characteristic attribute information, and capturing local geometric information of point cloud to obtain fine-grained geometric characteristics of each scale region. And extracting fine-grained scale features of T different scale regions through the graph attention convolution layer, and obtaining the fine-grained multi-scale features of the multi-scale region according to the fine-grained scale features of each scale region.
A graph attention convolution layer structure provided by an embodiment of the present invention is shown in FIG. 7, in which P_t = {P_m, P_m1, P_m2, ..., P_mk} denotes the set of points in the t-th scale region of local region L_m, P_m represents the center point, and P_mk represents the k-th nearest neighbor of center point m; the relation between the center point and the neighborhood points is represented by a K-nearest-neighbor graph G(V, E), where V represents the vertex set of the graph and E represents the edge set of the graph; e_mk represents the edge coefficient of the graph, and α_mk represents the graph attention weight.
In order to fully capture the fine-grained local geometric information of the point cloud, the invention considers both the spatial position information and the feature attribute information of the point cloud when calculating the edge coefficients e_mk; in order to focus on capturing useful information and avoid redundancy of useless information, the invention also adopts attention pooling in place of the traditional max pooling aggregation (Max-pooling).
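A minimal sketch of such a graph attention convolution for one center point follows; the weight names W_pos, W_feat, a and the tanh scoring form are illustrative assumptions rather than the exact parameterization of the patent.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention_conv(center_xyz, nbr_xyz, center_feat, nbr_feat, W_pos, W_feat, a):
    """Fuse spatial position and feature attribute information over k neighbors."""
    pos_rel = nbr_xyz - center_xyz                         # spatial position info (k, 3)
    feat_rel = nbr_feat - center_feat                      # feature attribute info (k, d)
    e = np.tanh(pos_rel @ W_pos + feat_rel @ W_feat) @ a   # edge coefficients e_mk (k,)
    alpha = softmax(e)                                     # graph attention weights alpha_mk
    return alpha @ (nbr_feat @ W_feat)                     # attention pooling, not max pooling

rng = np.random.default_rng(0)
k, d, h = 16, 8, 32
out = graph_attention_conv(
    rng.normal(size=3), rng.normal(size=(k, 3)),
    rng.normal(size=d), rng.normal(size=(k, d)),
    W_pos=rng.normal(size=(3, h)) * 0.1,
    W_feat=rng.normal(size=(d, h)) * 0.1,
    a=rng.normal(size=h) * 0.1,
)  # out: (h,) aggregated feature of the center point
```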
Step S30: assigning attention weights to the scale features of each local region through a spatial attention mechanism, and performing weighted fusion of the scale features of each local region according to the attention weights to obtain local region features of the point cloud containing fine-grained geometric information.
In the multi-scale feature fusion stage, the importance of different scale features is emphasized, different weights are distributed to information from different neighborhood points by adopting a spatial attention mechanism, attention weights are respectively distributed to the scale features of T scale regions of each local region, the neighborhood point information of each local region is aggregated to a central point, the scale features of the T scale regions of each local region are subjected to weighted fusion according to the attention weights, and 1 local region feature is obtained, wherein the local region feature comprises fine-grained local geometric information and point pair association information.
In order to selectively aggregate the multi-scale features, the invention provides a spatial attention module, whose internal structure is shown in FIG. 8. The module uses the idea of the attention mechanism to emphasize the importance of different scale features, dynamically assigns appropriate attention weights, and performs weighted fusion of the multi-scale features with the attention weights to obtain the local region features.
As shown in FIG. 8, two MLP (Multi-Layer Perceptron) layers first map the multi-scale feature S into two new feature representations F1 and F2; the association between features of different scales is then calculated from F1 and the transpose of F2, and the Softmax function is applied to normalize the relation map, yielding the spatial attention matrix A. A_ij denotes the spatial attention weight, as shown in formula (3-4), where i and j represent the positions of the scale features in F1 and F2 respectively; the stronger the dependency between the two features, the larger the value, which also represents the long-term dependency between the features.

A_ij = exp(F1_i · F2_j) / Σ_k exp(F1_k · F2_j)  (3-4)
Meanwhile, the multi-scale feature S is converted into a new feature D through an MLP layer; matrix multiplication between the feature D and the spatial attention matrix A then performs weighted fusion of the multi-scale features; finally, a skip link connects the weighted fusion result with the originally input multi-scale feature S, and the final local region feature L is output, as shown in formula (3-5):

L = A · D + S  (3-5)
In conclusion, the method calculates the spatial correlation between features directly from F1 and the transpose of F2 without reshaping the matrices, thereby maintaining the original spatial distribution of the point cloud and better capturing its fine-grained multi-scale features. Compared with feature fusion based on max pooling (Max-pooling), the multi-scale feature fusion method based on spatial attention provided by the invention can selectively aggregate the multi-scale features and avoid feature pollution among scale features.
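A compact NumPy sketch of formulas (3-4) and (3-5) for one local region follows; reducing each MLP layer to a single linear map (W1, W2, W3) is an illustrative simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 64                                    # T scale features of dimension d
S = rng.normal(size=(T, d))                     # multi-scale features of one local region
W1, W2, W3 = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))  # stand-ins for the MLPs

F1, F2, D = S @ W1, S @ W2, S @ W3              # new feature representations F1, F2, D
rel = F1 @ F2.T                                 # association between scale features
A = np.exp(rel - rel.max(axis=-1, keepdims=True))
A = A / A.sum(axis=-1, keepdims=True)           # (3-4) spatial attention matrix A (Softmax)
L = A @ D + S                                   # (3-5) weighted fusion plus skip link
```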
Step S40: acquiring context information among different local region features of the point cloud through the bidirectional long short-term memory network.
An implementation schematic diagram of the feature fusion method based on context attention RNN provided by an embodiment of the invention is shown in FIG. 9. Firstly, the forward-backward associations among local region features are extracted through the Bi-LSTM model, capturing context information among different local regions; then a grouping attention mechanism is introduced to increase the consideration given to the importance of local region features; finally, the local region features are fused into global features through a pooling layer.
Since each local region is correlated not only with the local region preceding it but also with the local region following it, the invention selects the RNN-derived network Bi-LSTM as the encoder of the context attention coding layer. However, the Bi-LSTM model easily causes the information carried by features input first to be diluted by features input later, so the global feature vector cannot be represented reasonably. To solve this problem, the invention introduces a grouping attention module on top of the Bi-LSTM model to highlight the importance of different local region features to the global features of the whole input point cloud, so that the model focuses on useful feature information and avoids redundancy of useless feature information, thereby effectively alleviating the loss of semantic information at the front end of a long sequence. The internal structure of the context attention RNN coding layer provided by the embodiment of the invention is shown in FIG. 10; it mainly comprises a Bi-LSTM part and a grouping attention part, and the main calculation flow is as follows: firstly, the M local region features are abstracted into a local feature sequence R = {r_1, r_2, ..., r_m, ..., r_M}; then a Bi-LSTM encoder extracts context information between local regions; finally, local region features h_m containing the context information are output, calculated as shown in formula (4-1):

h_m = W_a · (LSTM_fw(r_m) ⊕ LSTM_bw(r_m))  (4-1)

where W_a is a learnable weight matrix; ⊕ adopts the concat combination mode, concat denoting the end-to-end concatenation function between vectors; and LSTM represents the nonlinear activation function.
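A PyTorch sketch of this encoding step is given below, with the two Bi-LSTM output directions concatenated and mapped by the learnable W_a as in formula (4-1); all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

M, d, hidden = 64, 128, 128                  # M local region features of dimension d
R = torch.randn(1, M, d)                     # local feature sequence R = {r_1, ..., r_M}

bilstm = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
W_a = nn.Linear(2 * hidden, d)               # learnable weight matrix W_a

out, _ = bilstm(R)                           # forward and backward outputs, concatenated
H = W_a(out)                                 # h_m with context information, shape (1, M, d)
```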
Step S50: fusing the local region features according to the context information among different local region features to obtain the global semantic features of the point cloud.
FIG. 11 is an internal structural diagram of the grouping attention module according to an embodiment of the present invention. After Bi-LSTM encoding, the local feature sequence H = {h_1, h_2, ..., h_m, ..., h_M} is obtained; the invention adopts a grouping attention module to assign different attention weights to the different local region features h_m, emphasizing the importance of different local region features. The grouping attention module does not apply a convolution first, because the attention module at the feature fusion stage takes as input the features of each local region, which contain association information among many points, and applying convolution would destroy these associations.
As shown in FIG. 11, the association between different local region features is first calculated from the local feature sequence H and its transpose, and the Softmax function is applied to normalize the relation map, yielding the grouping attention matrix G. G_ji denotes the influence of the i-th local region feature on the j-th local region feature, i.e., the attention weight, as shown in formula (4-2); the stronger the dependency between two features, the larger the value, which also represents the long-term dependency between local region features.

G_ji = exp(h_i · h_j) / Σ_k exp(h_k · h_j)  (4-2)

Weighted fusion by matrix multiplication is then performed between the local feature sequence H and the grouping attention matrix G; finally, a skip link connects the weighted fusion result with the originally input local feature sequence H, and the final global semantic feature C of the three-dimensional point cloud is output, as shown in formula (4-3):

C = G · H + H  (4-3)
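Formulas (4-2) and (4-3) amount to a self-attention pass over the local feature sequence, sketched below together with the final pooling into one global feature mentioned above; the shapes and the max-pooling choice are illustrative assumptions.

```python
import torch

M, d = 64, 128
H = torch.randn(M, d)              # local feature sequence after Bi-LSTM encoding
rel = H @ H.t()                    # association between local region features
G = torch.softmax(rel, dim=-1)     # (4-2) grouping attention matrix G
C_seq = G @ H + H                  # (4-3) weighted fusion plus skip link
C = C_seq.max(dim=0).values        # pooling layer: one global semantic feature (d,)
```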
Step S60: the three-dimensional point cloud is then classified and segmented according to its global semantic features.
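As a sketch of how the global semantic feature can feed the two task heads; the layer sizes, class count, and the per-point features F_pt are assumptions for illustration.

```python
import torch
import torch.nn as nn

num_classes = 40
C = torch.randn(1, 128)                      # global semantic feature of one point cloud

# Classification: a small MLP head over the global feature.
cls_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_classes))
pred = cls_head(C).argmax(dim=-1)            # predicted shape category

# Segmentation: concatenate the global feature to each per-point feature,
# then predict a label for every point.
N = 1024
F_pt = torch.randn(1, N, 64)                 # assumed per-point features
seg_in = torch.cat([F_pt, C.unsqueeze(1).expand(-1, N, -1)], dim=-1)  # (1, N, 192)
seg_head = nn.Linear(64 + 128, num_classes)
point_logits = seg_head(seg_in)              # one score vector per point
```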
In summary, the feature extraction method fusing fine-grained multi-scale information provided by the embodiments of the invention can not only mine the fine-grained multi-scale information of different local regions but also capture local region information by combining the correlation among regions of different scales; it addresses the insufficient feature expression capability of local regions, enhances the network's ability to distinguish apparently similar targets, and improves the accuracy of classification and segmentation tasks in three-dimensional point cloud scene understanding while keeping time and space complexity moderate.
The feature fusion method based on context attention RNN coding provided by the embodiments of the invention can fully capture the correlation among local regions and obtain fine-grained local geometric information and spatial context information; it effectively copes with apparently similar targets that cannot be distinguished in complex environments, mis-segmented small targets, rough segmentation edges, and the like, and improves the accuracy of point cloud classification and segmentation tasks while keeping time and space complexity moderate.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively simply; for relevant points, refer to the partial description of the method embodiments. The above-described embodiments of the apparatus and system are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (6)
1. A method for classifying and segmenting three-dimensional point cloud based on feature fusion is characterized by comprising the following steps:
dividing the three-dimensional point cloud into a plurality of local areas, establishing a multi-scale area in each local area through a KNN (K nearest neighbor) search algorithm, and extracting fine-grained multi-scale features of the multi-scale area through a graph attention convolution layer;
assigning an attention weight to each scale feature of the local region through a spatial attention mechanism, and performing weighted fusion of the scale features of the local region according to the attention weights to obtain local region features of the point cloud containing fine-grained geometric information;
acquiring context information among different local region features of the point cloud through a bidirectional long short-term memory network, and fusing the local region features according to the context information among the different local region features to obtain the global semantic features of the point cloud;
and classifying and segmenting the three-dimensional point cloud according to the global semantic features of the point cloud.
2. The method according to claim 1, wherein the establishing a multi-scale region in each local region of the point cloud by a KNN search algorithm comprises:
dividing the input three-dimensional point cloud into M local regions {L_1, L_2, ..., L_M} through the iterative farthest point sampling algorithm and the KNN search algorithm; in each local region L_m, constructing T scale regions with the KNN search algorithm, dividing the local region L_m into scale regions of T different scales {S_m1, S_m2, ..., S_mT}, and establishing the multi-scale region from all the scale regions.
3. The method of claim 2, wherein the extracting fine-grained multi-scale features of the multi-scale region by the graph attention convolution layer comprises:
in each scale region, extracting association information between different neighborhood points and a central point through a graph convolution layer fusing spatial position information and characteristic attribute information, capturing local geometric information of point cloud to obtain fine-grained geometric characteristics of each scale region, extracting fine-grained scale characteristics of T different scale regions through a graph attention convolution layer, and obtaining fine-grained multi-scale characteristics of a multi-scale region according to the fine-grained scale characteristics of each scale region.
4. The method of claim 1, wherein the assigning an attention weight to each scale feature of the local region through a spatial attention mechanism, and performing weighted fusion on each scale feature of the local region according to the attention weight to obtain the local region feature of the point cloud containing fine-grained geometric information comprises:
distributing different weights to information from different neighborhood points by adopting a spatial attention mechanism, respectively distributing attention weights to scale features of T scale regions of each local region, aggregating neighborhood point information of each local region to a central point, and performing weighted fusion on the scale features of the T scale regions of each local region according to the attention weights to obtain 1 local region feature, wherein the local region feature comprises point cloud fine-grained local geometric information and point pair association information.
5. The method as claimed in claim 1, wherein the obtaining of the context information between different local area features of the point cloud through the bidirectional long-short term memory network comprises:
an encoder of a context attention coding layer is formed by a bidirectional long short-term memory network Bi-LSTM and a grouping attention module; the encoder abstracts the M local region features of the point cloud into a local feature sequence R = {r_1, r_2, ..., r_m, ..., r_M}, extracts context information among different local region features from the local feature sequence R, and outputs local region features h_m containing the context information, calculated as shown in formula (4-1):

h_m = W_a · (LSTM_fw(r_m) ⊕ LSTM_bw(r_m))  (4-1)

where W_a is a learnable weight matrix, ⊕ represents the combination of the Bi-LSTM output layers, and LSTM represents the nonlinear activation function.
6. The method of claim 5, wherein the fusing the local region features according to context information between different local region features to obtain the global semantic features of the point cloud comprises:
all local region features h_m containing context information after Bi-LSTM encoding form a local feature sequence H = {h_1, h_2, ..., h_m, ..., h_M};

a grouping attention module assigns different attention weights to the different local region features h_m: the association between different local region features is calculated from the local feature sequence H and its transpose, and the Softmax function is applied to normalize the relation map, yielding a grouping attention matrix G, where G_ji denotes the influence of the i-th local region feature on the j-th local region feature, i.e., the attention weight, as shown in formula (4-2):

G_ji = exp(h_i · h_j) / Σ_k exp(h_k · h_j)  (4-2)

weighted fusion between the local feature sequence H and the grouping attention matrix G is performed by matrix multiplication, the result of the weighted fusion is connected with the originally input local feature sequence H through a skip link, and the global semantic feature C of the three-dimensional point cloud is output, as shown in formula (4-3):

C = G · H + H  (4-3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110648726.6A CN113393474B (en) | 2021-06-10 | 2021-06-10 | Feature fusion based three-dimensional point cloud classification and segmentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110648726.6A CN113393474B (en) | 2021-06-10 | 2021-06-10 | Feature fusion based three-dimensional point cloud classification and segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113393474A true CN113393474A (en) | 2021-09-14 |
CN113393474B CN113393474B (en) | 2022-05-13 |
Family
ID=77620332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110648726.6A Active CN113393474B (en) | 2021-06-10 | 2021-06-10 | Feature fusion based three-dimensional point cloud classification and segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393474B (en) |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190108639A1 (en) * | 2017-10-09 | 2019-04-11 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and Methods for Semantic Segmentation of 3D Point Clouds |
CN110111345A (en) * | 2019-05-14 | 2019-08-09 | 西安电子科技大学 | A kind of 3D point cloud dividing method based on attention network |
CN112801262A (en) * | 2019-11-14 | 2021-05-14 | 波音公司 | Attention weighting module and method for convolutional neural networks |
CN111192270A (en) * | 2020-01-03 | 2020-05-22 | 中山大学 | Point cloud semantic segmentation method based on point global context reasoning |
CN111242208A (en) * | 2020-01-08 | 2020-06-05 | 深圳大学 | Point cloud classification method, point cloud segmentation method and related equipment |
CN112241676A (en) * | 2020-07-07 | 2021-01-19 | 西北农林科技大学 | Method for automatically identifying terrain sundries |
CN112215101A (en) * | 2020-09-27 | 2021-01-12 | 武汉科技大学 | Attention mechanism-based three-dimensional target identification method and system |
CN112348056A (en) * | 2020-10-16 | 2021-02-09 | 北京大学深圳研究生院 | Point cloud data classification method, device, equipment and readable storage medium |
CN112257597A (en) * | 2020-10-22 | 2021-01-22 | 中国人民解放军战略支援部队信息工程大学 | Semantic segmentation method of point cloud data |
CN112633330A (en) * | 2020-12-06 | 2021-04-09 | 西安电子科技大学 | Point cloud segmentation method, system, medium, computer device, terminal and application |
CN112633350A (en) * | 2020-12-18 | 2021-04-09 | 湖北工业大学 | Multi-scale point cloud classification implementation method based on graph convolution |
CN112560865A (en) * | 2020-12-23 | 2021-03-26 | 清华大学 | Semantic segmentation method for point cloud under outdoor large scene |
CN112819080A (en) * | 2021-02-05 | 2021-05-18 | 四川大学 | High-precision universal three-dimensional point cloud identification method |
CN112819833A (en) * | 2021-02-05 | 2021-05-18 | 四川大学 | Large scene point cloud semantic segmentation method |
Non-Patent Citations (5)
Title |
---|
LEI WANG 等: "Graph Attention Convolution for Point Cloud Semantic Segmentation", 《PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
NAN LUO 等: "KVGCN: A KNN Searching and VLAD Combined Graph Convolutional Network for Point Cloud Segmentation", 《REMOTE SENSING》 * |
RUI GUO 等: "Point cloud classification by dynamic graph CNN with adaptive feature fusion", 《IET COMPUTER VISION》 * |
YANG JUN et al.: "3D point cloud semantic segmentation based on contextual attention CNN", Journal on Communications *
CHAI YUJING et al.: "Deep graph attention convolution network for point cloud semantic segmentation", Laser & Optoelectronics Progress *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529757B (en) * | 2022-01-21 | 2023-04-18 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
CN114529757A (en) * | 2022-01-21 | 2022-05-24 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
WO2023202695A1 (en) * | 2022-04-22 | 2023-10-26 | 北京灵汐科技有限公司 | Data processing method and apparatus, device, and medium |
CN114882285A (en) * | 2022-05-23 | 2022-08-09 | 北方民族大学 | Fine-grained three-dimensional point cloud classification method based on information enhancement |
CN114882285B (en) * | 2022-05-23 | 2024-03-29 | 北方民族大学 | Fine-grained three-dimensional point cloud classification method based on information enhancement |
CN116206306A (en) * | 2022-12-26 | 2023-06-02 | 山东科技大学 | Inter-category characterization contrast driven graph roll point cloud semantic annotation method |
CN115965788A (en) * | 2023-01-12 | 2023-04-14 | 黑龙江工程学院 | Point cloud semantic segmentation method based on multi-view image structural feature attention convolution |
CN116540790A (en) * | 2023-07-05 | 2023-08-04 | 深圳市保凌影像科技有限公司 | Tripod head stability control method and device, electronic equipment and storage medium |
CN116540790B (en) * | 2023-07-05 | 2023-09-08 | 深圳市保凌影像科技有限公司 | Tripod head stability control method and device, electronic equipment and storage medium |
CN116608866A (en) * | 2023-07-20 | 2023-08-18 | 华南理工大学 | Picture navigation method, device and medium based on multi-scale fine granularity feature fusion |
CN116608866B (en) * | 2023-07-20 | 2023-09-26 | 华南理工大学 | Picture navigation method, device and medium based on multi-scale fine granularity feature fusion |
CN118154996A (en) * | 2024-05-10 | 2024-06-07 | 山东科技大学 | Three-dimensional scene point cloud classification method for multi-scale depth feature aggregation |
CN118154996B (en) * | 2024-05-10 | 2024-08-27 | 山东科技大学 | Three-dimensional scene point cloud classification method for multi-scale depth feature aggregation |
CN118570194A (en) * | 2024-07-31 | 2024-08-30 | 烟台东泽汽车零部件有限公司 | Method and system for detecting defects of inner surface of special-shaped bushing based on three-dimensional point cloud |
CN118570194B (en) * | 2024-07-31 | 2024-10-18 | 烟台东泽汽车零部件有限公司 | Method and system for detecting defects of inner surface of special-shaped bushing based on three-dimensional point cloud |
Also Published As
Publication number | Publication date |
---|---|
CN113393474B (en) | 2022-05-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |