CN117036370A - Plant organ point cloud segmentation method based on attention mechanism and graph convolution - Google Patents
- Publication number: CN117036370A
- Application number: CN202310704110.5A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06T7/10 — Segmentation; Edge detection (G06T7/00 Image analysis)
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods (neural networks)
- G06V10/764 — Image or video recognition using machine-learning classification
- G06T2207/10028 — Range image; Depth image; 3D point clouds
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
A plant organ point cloud segmentation method based on an attention mechanism and graph convolution belongs to the technical field of three-dimensional point cloud instance segmentation. The method builds TRGCN, a dual-branch parallel instance segmentation network based on a point attention mechanism and spatial graph convolution, which takes a three-dimensional point cloud directly as input; the two branches focus on local feature extraction and global feature extraction respectively, and the two kinds of features are fused through a T-G feature coupling layer. Taking the point cloud data of five plants (tomato, corn, tobacco, sorghum and wheat) as research objects, the dual-branch parallel neural network architecture TRGCN can capture local and global features of the point cloud simultaneously, is used to train a highly robust instance segmentation model, improves the segmentation accuracy of plant point clouds, has good generalization ability, and can provide solid data support for fast, efficient and accurate plant phenotype analysis.
Description
Technical Field
The invention belongs to the technical field of three-dimensional point cloud instance segmentation, and particularly relates to a dual-branch parallel plant organ point cloud segmentation method based on an attention mechanism and spatial graph convolution.
Background
With the popularization of LiDAR equipment and the advent of various consumer-grade depth sensors, point cloud data is increasingly used in fields such as robotics, autonomous driving and city planning. In phenotyping research, the three-dimensional point cloud, as a direct representation of real-world geometry, has become the most effective data form for studying plant structure and morphology. Many studies have used the three-dimensional structure of plants for organ segmentation, growth monitoring, variety evaluation, and so on. Points in a three-dimensional coordinate system are the most basic units of a point cloud; they are analogous to pixels in a two-dimensional picture but can carry richer high-dimensional semantic information. In phenotyping research, the morphological structure of plant organs is an intuitive and important trait that reflects a plant's adaptation to external conditions and its growth status, for example photosynthetic efficiency and water absorption efficiency. Plant organ point cloud segmentation refers to semantically dividing a plant into its different organs (such as stems, leaves and fruits); it is the basis for deeper understanding of point cloud data, is of great significance for understanding the functional structure of plants, and remains a challenging research direction.
Traditional plant point cloud segmentation algorithms require hand-crafted feature descriptors prepared in advance, making the segmentation process complex and cumbersome; with the arrival of the big data era, such traditional processing methods struggle to meet the demand for fast and accurate analysis, so the need for automated segmentation methods keeps growing. With the rapid increase in graphics processor performance, deep learning, as a leading artificial intelligence technology, has been successfully applied to a variety of two-dimensional vision problems. However, because point clouds are unordered and complex in their spatial distribution, applying deep learning to point clouds still faces many challenges. Convolutional neural networks (CNNs), which perform well in visual segmentation tasks, extract features through shared convolution kernels, which improves model efficiency, and their inherent translation invariance makes their grasp of local features more precise. However, CNNs typically have a relatively small receptive field, are relatively weak at capturing global features, and cannot act directly on raw point cloud data. Another neural network structure applied to point cloud data is the graph convolutional network (GCN), which treats each individual point in the point cloud as a vertex of a graph data structure and can extract local features by performing convolution-like operations directly on the point cloud. The Transformer, with its outstanding performance in natural language processing, can likewise capture global features, and its core idea, the attention mechanism, is also well suited to processing point cloud data.
These deep learning methods have achieved satisfactory segmentation results on many common point cloud datasets, demonstrating the effectiveness of deep learning for point cloud segmentation.
However, the structural complexity of plant point clouds means that a relatively large amount of semantic information must be identified in the organ segmentation task. During acquisition, occlusion between leaves often causes parts of the point cloud to be missing, producing holes and sparse regions. In addition, the similarity between plant organs is high: different leaf instances often share the same color, morphological structure, texture and other characteristics, and such highly repetitive features are unfriendly to neural network learning. Finally, plants of different varieties have different geometric characteristics, and even plants of the same variety can exhibit different, sometimes large, phenotypic differences under different growing environments, placing high demands on the generalization ability of the network. In summary, the current segmentation accuracy of plant point clouds cannot meet these requirements.
Disclosure of Invention
The invention aims to solve the problem of accurate organ segmentation in complex plant point clouds, and provides a dual-branch parallel plant organ point cloud segmentation method based on an attention mechanism and spatial graph convolution.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a plant organ point cloud segmentation method based on an attention mechanism and graph convolution, the method comprising:
step one: the feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map features into a high-dimensional space, and applies a point cloud attention mechanism for preliminary feature extraction; the initial feature data is then input into a TRGCN block, and this module can be cascaded and stacked to deepen the understanding of high-dimensional features; the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud; the data then enters the dual-branch parallel network part, consisting of a local feature capture branch built from spatial graph convolution and a global feature learning branch built from a point attention mechanism; finally the feature data is input into the T-G feature coupling layer to obtain the target number of points and the corresponding high-dimensional abstract features; by stacking TRGCN blocks, the encoder part extracts high-dimensional feature information from the original plant point cloud for the segmentation task;
step two: the feature decoder part stacks three cascaded TRGCN blocks and receives the outputs of the three TRGCN blocks in the encoder respectively, but replaces the feature aggregation layer with an interpolation layer; the interpolation layer restores the features of the high-dimensional point set to the low-dimensional point set, and still outputs the grouping result of the K nearest neighbor algorithm for the two TRGCN branches to use; for segmentation result prediction, the decoder places an independent interpolation layer after the TRGCN blocks, uses a single point attention layer to preserve information integrity, and finally uses a multi-layer perceptron to output the segmentation result of the point cloud;
step three: training a network: all experiments in this study were performed on a stand-alone server equipped with a 12-core 20-thread CPU, 64GB memory, and a Nvidia GeForce RTX 3090Ti GPU; the neural network training is carried out by using an independent server, and in the training stage, all plant point cloud segmentation models adopt the same super parameters, wherein the super parameters are specifically as follows: training batch size was set to 32, initial learning rate was set to 0.001, the network was optimized using Adam method for 100 cycles, learning rate was halved every 20 cycles, weight decay was set to 0.0001, momentum was set to 0.9, K value of K nearest neighbor algorithm was set to 12, and feature dimension of point attention layer was set to 256.
Further, the first step specifically comprises:
(1) Feature aggregation layer
The specific process of feature aggregation is as follows: input x points with feature-dimensional features; first select center points by farthest point sampling (starting from a random point), then group the point cloud with a K nearest neighbor algorithm, input the groups into a multi-layer perceptron to aggregate neighbor-point features onto each center point, and finally apply a max pooling operation to obtain y points with feature'-dimensional features;
the feature aggregation layer samples and groups the input point set using the K nearest neighbor algorithm; it outputs the computed K-neighbor matrix and shares it with the subsequent parallel branches;
(2) Local feature capture branch
This branch is constructed based on a dynamic spatial graph convolution and is used to extract local features from the input plant point cloud; first, a feature graph G = (V, E) is constructed from the point set V and the neighbor information E, and edge convolution is applied to the input feature space for feature extraction; the formula for extracting the features of a point x_i is:

f_i = □_j h(x_i, x_j)

where x_j is one of the neighbor points of x_i, and □ and h denote an aggregation function and a relational operation, respectively; the neighbor-point features around a candidate point are aggregated through the relational operation to obtain the feature information of the candidate point, and this relational operation is defined as edge convolution;
the maximum pooling is adopted as an aggregation function, and the specific process is as follows:
conv i =Max(MLP(h(x i ,x i -x j )))
the relational operation h is defined as point x i ,x i And its neighbor point x j Feature difference value and point x of (2) i Linear combinations between the output values;
(3) Global feature learning branch
Features are extracted with a vector attention mechanism within a local neighborhood; the calculation formula is as follows:

y_i = Σ_{x_j ∈ X(i)} ρ(γ(β(φ(x_i), ψ(x_j)) + δ)) ⊙ α(x_j)

where x_j is one of the neighbor points of x_i, X is the independent point set of each single plant point cloud, ρ is a normalization function, γ is a mapping function, and β is a relational operation, defined in this study as the difference between the neighborhood point and the point of interest; φ, ψ and α are point-level feature transformations that produce the Q, K and V values of the self-attention mechanism (Query, Key and Value, the standard terms of the attention mechanism), respectively, and δ is a position encoding function; based on this attention mechanism, a point attention layer is proposed, with the improved calculation formula:

y_i = Σ_{x_j ∈ X(i)} ρ(γ(β(φ(x_i), ψ(x_j)) + δ)) ⊙ (α(x_j) + δ)
(4) T-G feature coupling layer
Through the above processing, two feature matrices of identical shape are obtained: a matrix G carrying significant local features and a matrix T carrying complete global features; the concatenated G and T are input into the feature coupling layer to obtain the target feature matrix:
TG=Linear(ReLU(Linear(T,G)))
the T-G feature coupling layer is built from two linear layers and one ReLU activation layer, so that the network can learn the most important information from each of the two matrices and combine them into the target feature matrix.
Compared with the prior art, the invention has the following beneficial effects: the invention designs TRGCN, a brand-new dual-branch parallel instance segmentation network based on a point attention mechanism and spatial graph convolution, which takes a three-dimensional point cloud directly as input; the two branches focus on local feature extraction and global feature extraction respectively, and the two kinds of features are fused through a T-G feature coupling layer. The results show that TRGCN performs excellently on different plant point clouds, achieves higher accuracy than other mainstream point cloud segmentation networks, has good generalization ability, and can provide solid data support for fast, efficient and accurate plant phenotype analysis.
Drawings
FIG. 1 is a network architecture diagram of a TRGCN of the present invention;
FIG. 2 is a block diagram of the TRGCN block of the present invention;
FIG. 3 is a schematic diagram of a TRGCN block global feature learning layer of the present invention;
fig. 4 is a graph of the segmentation result of five plant point clouds according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1:
This study is based on a point cloud self-attention mechanism and spatial graph convolution, and innovatively proposes TRGCN (Transformer Graph Convolution Network), a dual-branch parallel network designed with an encoder-decoder architecture (FIG. 1).
The feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map the features into a high-dimensional space (32 dimensions by default), and applies a point cloud attention mechanism for preliminary feature extraction. The initial feature data is then input into a TRGCN block (FIG. 2), which can be stacked in multiple cascades to continually deepen the understanding of high-dimensional features. Specifically, the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud; the data then enters the dual-branch parallel network part, consisting of a local feature capture branch built from spatial graph convolution and a global feature learning branch built from a point attention mechanism. Finally, the feature data is input into the specially designed T-G feature coupling layer to obtain the target number of points and the corresponding high-dimensional abstract features. By stacking TRGCN blocks, the encoder part extracts high-dimensional feature information from the original plant point cloud for the segmentation task.
(1) Feature aggregation layer
The feature aggregation layer in the TRGCN block is used to reduce the cardinality of the input point set and abstract higher-dimensional feature vectors as multiple modules are stacked. For example, from the original input to the first TRGCN block, the number of points is reduced from N to N/4 while the feature dimension of the point cloud is increased from F to 2F;
the specific process of feature polymerization is as follows: the method comprises the steps of inputting x points with feature dimension, firstly, sampling the points at the most distant randomly, then grouping the point clouds by adopting a K nearest neighbor algorithm, inputting the point clouds into a multi-layer perceptron to aggregate the neighbor point features to a central point, and finally, calculating to obtain y points with feature 'dimension feature by adopting a maximum pooling operation (default y=x/4, feature' =2 x feature).
The feature aggregation layer samples and groups the input point set using the K nearest neighbor algorithm. In addition, to save GPU memory during training, this layer outputs the computed K-neighbor matrix and shares it with the subsequent parallel branches.
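As a concrete illustration, the aggregation pipeline described above (farthest point sampling, K nearest neighbor grouping, a shared perceptron, max pooling) can be sketched in NumPy as follows; the function names, weight shapes and toy sizes are our own illustrative assumptions, not taken from the patent.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Iteratively pick the point farthest from the already-chosen set."""
    n = points.shape[0]
    chosen = [0]                      # start from an arbitrary (here: first) point
    dist = np.full(n, np.inf)
    for _ in range(m - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)    # distance to the nearest chosen point
        chosen.append(int(np.argmax(dist)))
    return np.array(chosen)

def knn_indices(points, centers, k):
    """For each center, indices of its k nearest neighbors in `points`."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]

def aggregate(features, neighbor_idx, weight):
    """Shared linear map (stand-in for the MLP) + ReLU over neighbor features, then max-pool."""
    grouped = features[neighbor_idx]           # (m, k, f)
    lifted = np.maximum(grouped @ weight, 0)   # shared "MLP" + ReLU, (m, k, f')
    return lifted.max(axis=1)                  # max pooling over each neighborhood

rng = np.random.default_rng(0)
pts = rng.standard_normal((64, 3))
feat = rng.standard_normal((64, 8))
centers_idx = farthest_point_sampling(pts, 16)     # y = x/4 centers
idx = knn_indices(pts, pts[centers_idx], k=12)     # K = 12 as in the patent
w = rng.standard_normal((8, 16))                   # feature' = 2 × feature
out = aggregate(feat, idx, w)
print(out.shape)  # (16, 16)
```

With 64 input points the sketch reproduces the default reduction y = x/4 (16 centers) and a doubled feature dimension (8 → 16); the K-neighbor index `idx` is the matrix that would be shared with the two parallel branches.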
(2) Local feature capture branch
This branch is constructed based on a dynamic spatial graph convolution for extracting local features from the input plant point cloud. First, a feature graph G = (V, E) is constructed from the point set V and the neighbor information E, and edge convolution is applied to the input feature space for feature extraction. The formula for extracting the features of a point x_i is:

f_i = □_j h(x_i, x_j)

where x_j represents one of the neighbor points of x_i, and □ and h denote an aggregation function and a relational operation, respectively. The feature information of a candidate point is obtained by aggregating the surrounding neighbor-point features through the relational operation, which is defined here as edge convolution. To enhance the understanding of local features in the point cloud, this study uses max pooling as the aggregation function, as follows:
conv_i = Max(MLP(h(x_i, x_i − x_j)))

The relational operation h is defined as a linear combination of the point x_i itself and the feature difference x_i − x_j between x_i and its neighbor point x_j. This choice not only preserves the mutually influencing features of local point sets, but also partially takes the global features of the whole into account.
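The edge convolution above can be sketched as follows, realizing h(x_i, x_i − x_j) as a learned linear combination of the center feature and the center-neighbor feature difference, followed by ReLU and max aggregation; the weight matrices and the random neighbor index are illustrative assumptions.

```python
import numpy as np

def edge_conv(features, neighbor_idx, w_center, w_diff):
    """Edge convolution: h(x_i, x_i - x_j) = x_i @ w_center + (x_i - x_j) @ w_diff,
    followed by ReLU and max aggregation over each neighborhood."""
    center = features[:, None, :]                          # (n, 1, f)
    neigh = features[neighbor_idx]                         # (n, k, f)
    edge = center @ w_center + (center - neigh) @ w_diff   # broadcasts over the k neighbors
    return np.maximum(edge, 0).max(axis=1)                 # ReLU then max-pool -> (n, f')

rng = np.random.default_rng(1)
feat = rng.standard_normal((32, 8))
idx = rng.integers(0, 32, size=(32, 12))    # stand-in for a shared kNN grouping, K = 12
w_c = rng.standard_normal((8, 16))
w_d = rng.standard_normal((8, 16))
out = edge_conv(feat, idx, w_c, w_d)
print(out.shape)  # (32, 16)
```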
(3) Global feature learning branch
As shown in FIG. 3, this branch is constructed based on a point cloud attention mechanism, which is well suited to processing point cloud data: the points can essentially be regarded as word vectors embedded in an attention space. This study adopts a vector attention mechanism within a local neighborhood to extract features; the calculation formula is as follows:

y_i = Σ_{x_j ∈ X(i)} ρ(γ(β(φ(x_i), ψ(x_j)) + δ)) ⊙ α(x_j)

where x_j is one of the K neighbor points of x_i; φ, ψ and α are point-level feature transformations that produce the Q, K and V values of the self-attention mechanism, respectively; δ is a position encoding function; ρ is a normalization function; γ is a mapping function; and β is a relational operation, defined in this study as the difference between the neighborhood point and the point of interest. Based on this attention mechanism, this study proposes a point attention layer, with the improved calculation formula:

y_i = Σ_{x_j ∈ X(i)} ρ(γ(β(φ(x_i), ψ(x_j)) + δ)) ⊙ (α(x_j) + δ)
Unlike the common attention mechanism, position encoding is also added to the α function to strengthen the understanding of features. On top of the point attention layer, the TRGCN encoder builds a residual structure in the global feature learning branch: a linear layer is added before and after the point attention layer, and the final output is residually connected to the input, which promotes information exchange, accelerates network convergence, and makes training a deep network feasible.
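A rough NumPy sketch of the point attention layer follows. It is one plausible reading of the formula rather than the patent's exact implementation: γ is taken as the identity, ρ as a per-channel softmax over the K neighbors, and the position encoding δ as a learned linear map of relative coordinates, added both inside the weight branch and to the values.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def point_attention(feat, pos, neighbor_idx, wq, wk, wv, w_pos):
    """Vector attention over each point's K-neighborhood."""
    q = feat @ wq                                  # phi(x_i): query features
    k = (feat @ wk)[neighbor_idx]                  # psi(x_j): key features, (n, k, f)
    v = (feat @ wv)[neighbor_idx]                  # alpha(x_j): value features
    delta = (pos[:, None, :] - pos[neighbor_idx]) @ w_pos  # delta: relative-position encoding
    weight = softmax(q[:, None, :] - k + delta, axis=1)    # rho(gamma(beta(...) + delta)), gamma = identity
    return (weight * (v + delta)).sum(axis=1)      # sum of weight ⊙ (alpha(x_j) + delta)

rng = np.random.default_rng(2)
pos = rng.standard_normal((32, 3))
feat = rng.standard_normal((32, 8))
idx = rng.integers(0, 32, size=(32, 12))           # stand-in for the shared kNN grouping
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
w_pos = rng.standard_normal((3, 8))
out = point_attention(feat, pos, idx, wq, wk, wv, w_pos)
print(out.shape)  # (32, 8)
```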
(4) T-G feature coupling layer
Through the above processing, two feature matrices of identical shape are obtained: a matrix G carrying significant local features and a matrix T carrying complete global features. The concatenated G and T are input into the feature coupling layer to obtain the target feature matrix:
TG=Linear(ReLU(Linear(T,G)))
the T-G feature coupling layer is built from two linear layers and one ReLU activation layer, so that the network can learn the most important information from each of the two matrices and combine them into the target feature matrix.
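A minimal sketch of the coupling layer, assuming the "spliced" input means channel-wise concatenation of T and G; the layer widths are illustrative.

```python
import numpy as np

def tg_coupling(t, g, w1, b1, w2, b2):
    """TG = Linear(ReLU(Linear(concat(T, G)))): fuse global (T) and local (G) features."""
    x = np.concatenate([t, g], axis=-1)      # splice the two branch outputs
    hidden = np.maximum(x @ w1 + b1, 0)      # first linear layer + ReLU
    return hidden @ w2 + b2                  # second linear layer

rng = np.random.default_rng(3)
t = rng.standard_normal((32, 16))            # global feature learning branch output
g = rng.standard_normal((32, 16))            # local feature capture branch output
w1, b1 = rng.standard_normal((32, 32)), np.zeros(32)
w2, b2 = rng.standard_normal((32, 16)), np.zeros(16)
tg = tg_coupling(t, g, w1, b1, w2, b2)
print(tg.shape)  # (32, 16)
```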
In summary, the TRGCN feature encoder can be adapted to different vision tasks by varying the number of stacked TRGCN blocks: fewer TRGCN blocks can serve lightweight classification networks, while more cascaded TRGCN blocks can serve finer-grained tasks such as point cloud segmentation and object recognition.
The feature decoder also stacks three cascaded TRGCN blocks and receives the outputs of the three TRGCN blocks in the encoder respectively, but replaces the feature aggregation layer with an interpolation layer. In contrast to the feature aggregation layer, the interpolation layer in the decoder restores the features of the high-dimensional (downsampled) point set to the low-dimensional (denser) point set, while still outputting the grouping result of the K nearest neighbor algorithm for the two TRGCN branches to use. For instance segmentation prediction, the decoder places an independent interpolation layer after the TRGCN blocks, uses a single point attention layer to preserve information integrity, and finally uses a multi-layer perceptron to output the segmentation result of the point cloud.
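The patent does not spell out the interpolation rule used by the decoder; the sketch below upsamples features by inverse-distance weighting over the three nearest downsampled points, a common choice for point cloud decoders, purely as an assumed illustration.

```python
import numpy as np

def interpolate_features(dense_pos, sparse_pos, sparse_feat, k=3, eps=1e-8):
    """Propagate features from a sparse (downsampled) point set back to a dense one
    using inverse-distance weighting over the k nearest sparse points."""
    d = np.linalg.norm(dense_pos[:, None, :] - sparse_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                  # k nearest sparse points
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + eps)
    w = w / w.sum(axis=1, keepdims=True)                # normalized interpolation weights
    return (sparse_feat[idx] * w[:, :, None]).sum(axis=1)

rng = np.random.default_rng(4)
dense = rng.standard_normal((64, 3))     # denser point set being restored
sparse = rng.standard_normal((16, 3))    # downsampled set with high-dimensional features
sfeat = rng.standard_normal((16, 32))
up = interpolate_features(dense, sparse, sfeat)
print(up.shape)  # (64, 32)
```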
Network training. All experiments in this study were performed on a stand-alone server equipped with a 12-core, 20-thread CPU, 64 GB of memory and an Nvidia GeForce RTX 3090 Ti GPU. In the training stage, the segmentation models for the five plant point clouds use the same hyperparameters, specifically: the training batch size is set to 32, the initial learning rate is set to 0.001, the network is optimized with the Adam method for 100 epochs, the learning rate is halved every 20 epochs, the weight decay is set to 0.0001, the momentum is set to 0.9, the K value of the K nearest neighbor algorithm is set to 12, and the feature dimension of the point attention layer is set to 256.
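The stated learning-rate schedule (initial rate 0.001, halved every 20 epochs over 100 epochs) can be written out directly; the optimizer itself (Adam with the listed weight decay and momentum) would come from the training framework, so only the schedule is sketched here.

```python
def learning_rate(epoch, base_lr=0.001, halve_every=20):
    """Learning rate for a given epoch: halved every `halve_every` epochs."""
    return base_lr * 0.5 ** (epoch // halve_every)

# Spot-check the schedule at a few epochs of the 100-epoch run.
schedule = [learning_rate(e) for e in (0, 19, 20, 40, 99)]
print(schedule)  # [0.001, 0.001, 0.0005, 0.00025, 6.25e-05]
```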
Organ instance segmentation tests were performed on the point cloud data of the five plants, achieving a highest mean intersection-over-union of 86.38% and a mean accuracy of 88.58%. To verify the segmentation capability of TRGCN, three mainstream point cloud segmentation networks were selected for comparison; across the five segmentation tasks, TRGCN leads the other three methods on nine metrics and achieves the best accuracy in most tasks. The accuracy improvement on sorghum leaves is particularly pronounced, indicating that TRGCN handles monocot point clouds especially well. Because the canopy structure of dicot crops is crowded and prone to occlusion, the segmentation results on the tobacco and tomato point clouds are not as good as those on the monocot crops, but they are still better than those of the other three segmentation networks. Specific test results are shown in Table 1, and FIG. 4 shows the segmentation results of the five plant point clouds.
The invention also takes the sorghum point cloud as a research object to examine the choice of TRGCN pooling layer and the number of cascaded TRGCN blocks. The results show that the network with max pooling segments best, with accuracy about 2% higher than average pooling and sum pooling. With three cascaded TRGCN blocks, the network achieves the best balance of training time and segmentation performance, obtaining higher segmentation accuracy at the expense of some training time. Specific test results are shown in Tables 2 and 3.
Table 1: comparison of segmentation accuracy between the TRGCN network of the present invention and other mainstream networks.
Pooling layer | Training time (s) | Mean IoU (%) | Mean accuracy (%)
---|---|---|---
Max pooling | 2082 | 78.9292 | 84.9198
Average pooling | 2085 | 75.7748 | 80.6104
Sum pooling | 2086 | 76.3709 | 82.9647

Table 2: ablation experiment 1 of the invention, segmentation results for the different pooling layers.
Number of TRGCN blocks | Training time (s) | Mean IoU (%)
---|---|---
2 | 1728 | 73.7120
3 | 2082 | 78.9292
4 | 2202 | 75.5498

Table 3: ablation experiment 2 of the invention, segmentation results for different numbers of stacked TRGCN blocks.
Claims (2)
1. A plant organ point cloud segmentation method based on an attention mechanism and graph convolution, characterized by comprising the following steps:
step one: the feature encoder takes the raw point cloud as input, maps the features into a high-dimensional space with a multi-layer perceptron, and performs initial feature extraction with a point cloud attention mechanism; the initial feature data are then fed into a TRGCN block, and these blocks can be cascaded to deepen the understanding of the high-dimensional features; inside a TRGCN block, the feature aggregation layer extracts neighborhood features while downsampling the point cloud, after which the data enter a two-branch parallel network consisting of a local feature capture branch built from spatial graph convolutions and a global feature learning branch built from a point attention mechanism; finally, the feature data are fed into the T-G feature coupling layer to obtain the target number of points and their corresponding high-dimensional abstract features;
step two: the feature decoder stacks three cascaded TRGCN blocks that receive the outputs of the three TRGCN blocks in the encoder, respectively, but replaces the feature aggregation layer with an interpolation layer; the interpolation layer restores the features of the high-dimensional point set to the lower-level point set, and still outputs the grouping result of the K nearest neighbor algorithm for the two TRGCN branches to use; for segmentation prediction, the decoder places a separate interpolation layer after the TRGCN blocks, uses a single point attention layer to preserve information integrity, and finally uses a multi-layer perceptron to output the segmentation result of the point cloud;
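The claim states only that the interpolation layer restores features from the downsampled point set to the denser one; a common realization is inverse-distance-weighted interpolation over the k nearest sparse points (as in PointNet++), which is the assumption made in this minimal NumPy sketch:

```python
import numpy as np

def interpolate_features(sparse_pos, sparse_feat, dense_pos, k=3):
    """Propagate features from a downsampled point set back to a denser one
    by inverse-distance weighting over the k nearest sparse points."""
    # Pairwise distances between every dense point and every sparse point.
    d = np.linalg.norm(dense_pos[:, None, :] - sparse_pos[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]                    # k nearest sparse points
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + 1e-8)  # inverse distances
    w = w / w.sum(axis=1, keepdims=True)                   # normalized weights
    return (sparse_feat[idx] * w[:, :, None]).sum(axis=1)
```

A dense point that coincides with a sparse point recovers (almost exactly) that point's feature, while points in between receive a smooth blend of their nearest sparse neighbors.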
step three: training the network: a dedicated server is equipped with a 12-core, 20-thread CPU, 64 GB of memory and an Nvidia GeForce RTX 3090Ti GPU; the neural network is trained on this server, and in the training stage all plant point cloud segmentation models use the same hyperparameters, specifically: the training batch size is set to 32, the initial learning rate to 0.001, the network is optimized with the Adam method for 100 epochs, the learning rate is halved every 20 epochs, the weight decay is set to 0.0001, the momentum to 0.9, the K value of the K nearest neighbor algorithm to 12, and the feature dimension of the point attention layer to 256.
2. The plant organ point cloud segmentation method based on an attention mechanism and graph convolution according to claim 1, characterized in that step one is specifically as follows:
(1) Feature aggregation layer
The specific process of feature aggregation is as follows: given x input points of dimension feature, first downsample the points with farthest point sampling from a random starting point, group the point cloud with the K nearest neighbor algorithm, feed the groups into a multi-layer perceptron to aggregate the neighbor point features onto each center point, and finally apply a max pooling operation to obtain y points with features of dimension feature';
the feature aggregation layer uses the K nearest neighbor algorithm to sample and group the input point set, and outputs the computed K-nearest-neighbor matrix, sharing it with the subsequent parallel branches;
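The farthest-point-sampling, KNN grouping, and pooling steps above can be sketched with NumPy as follows; the fixed starting index (the patent starts from a random point) and the omission of the learned MLP before the max pooling are simplifying assumptions:

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Pick m centroid indices, each maximally distant from those already
    chosen. `start` is fixed here for reproducibility."""
    chosen = [start]
    min_dist = np.full(len(points), np.inf)
    for _ in range(m - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)       # distance to nearest chosen point
        chosen.append(int(np.argmax(min_dist)))  # farthest from the chosen set
    return np.array(chosen)

def knn_group(points, centroid_idx, k):
    """For each centroid, return the indices of its k nearest neighbors;
    this is the matrix the layer shares with the parallel branches."""
    d = np.linalg.norm(points[centroid_idx][:, None, :] - points[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

def aggregate(features, groups):
    """Max-pool neighbor features onto each centroid (the per-point MLP that
    would precede the pooling is omitted in this sketch)."""
    return features[groups].max(axis=1)
```

Chaining the three functions turns x input points into y centroids carrying pooled neighborhood features, mirroring the layer's described behavior.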
(2) Local feature capture branch
This branch is built on a dynamic spatial graph convolution and extracts local features from the input plant point cloud; first, a feature graph G = (V, E) is constructed from the point set V and the neighbor information E, and edge convolution is applied to the input feature space for feature extraction; the formula for extracting the features of a point x_i is:

f_i = □_{x_j ∈ N(i)} h(x_i, x_j)

wherein x_j is one of the neighbor points of x_i, and □ and h denote an aggregation function and a relational operation, respectively; the neighbor point features around a candidate point are aggregated through the relational operation to obtain the feature information of the candidate point, and this relational operation is defined as the edge convolution;
the maximum pooling is adopted as an aggregation function, and the specific process is as follows:
conv i =Max(MLP(h(x i ,x i -x j )))
the relational operation h is defined as point x i ,x i And its neighbor point x j Feature difference value and point x of (2) i Linear combinations between the output values;
(3) Global feature learning branch
Features are extracted with a vector attention mechanism within a local neighborhood; the calculation formula is:

y_i = Σ_{x_j ∈ X(i)} ρ(γ(β(φ(x_i), ψ(x_j)) + δ)) ⊙ (α(x_j) + δ)

wherein x_j is a neighbor point of x_i, X is the independent point set of each single-plant point cloud, ρ is a normalization function, γ is a mapping function, β computes the difference between a neighborhood point and the point of interest, φ, ψ and α are point-level feature transformations that produce the Q, K and V values of the self-attention mechanism, respectively, and δ is a position encoding function; according to this attention mechanism a point attention layer is proposed with an improved calculation formula.
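The vector attention above can be sketched in NumPy using identity maps for the φ, ψ and α transforms, a softmax for ρ, the subtraction for β, and the raw coordinate difference for the position encoding δ; these are all simplifying assumptions (the patent uses learned point-level transforms), and the feature dimension is assumed to equal the coordinate dimension so that δ can be added directly:

```python
import numpy as np

def vector_attention(coords, features, neighbors):
    """Per-channel (vector) attention in a local neighborhood:
    y_i = sum_j softmax_j(phi(x_i) - psi(x_j) + delta) * (alpha(x_j) + delta),
    with phi, psi, alpha taken as identity and delta the relative position."""
    q = features[:, None, :]                  # phi(x_i), shape (n, 1, d)
    k = features[neighbors]                   # psi(x_j), shape (n, k, d)
    v = features[neighbors]                   # alpha(x_j), shape (n, k, d)
    delta = coords[:, None, :] - coords[neighbors]  # relative position encoding
    logits = q - k + delta                    # gamma is identity here
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)      # rho: softmax over the neighborhood
    return (w * (v + delta)).sum(axis=1)      # element-wise modulation, then sum
```

Unlike scalar dot-product attention, the weight here is a vector with one entry per feature channel, so each channel of the value is modulated independently.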
(4) T-G feature coupling layer
Through the above processing, two feature matrices of identical shape are obtained: a matrix G carrying significant local features and a matrix T carrying complete global features; the concatenation of G and T is fed into the feature coupling layer to obtain the target feature matrix:
TG=Linear(ReLU(Linear(T,G)))
the T-G feature coupling layer is built from two linear layers and one ReLU activation layer, so that the network can learn the most important information of each of the two matrices and merge them into the target feature matrix.
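The Linear -> ReLU -> Linear coupling in the formula above can be sketched as follows; the weight matrices are random stand-ins for the learned parameters, and the shapes are illustrative:

```python
import numpy as np

def tg_coupling(T, G, w1, w2):
    """T-G feature coupling: concatenate the global feature matrix T and the
    local feature matrix G, then apply Linear -> ReLU -> Linear, i.e.
    TG = Linear(ReLU(Linear(T, G)))."""
    tg = np.concatenate([T, G], axis=1)   # splice the two same-shaped matrices
    hidden = np.maximum(tg @ w1, 0.0)     # first linear layer + ReLU
    return hidden @ w2                    # second linear layer
```

The second linear layer's output width sets the dimension of the target feature matrix that the TRGCN block passes on.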
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310704110.5A CN117036370A (en) | 2023-06-14 | 2023-06-14 | Plant organ point cloud segmentation method based on attention mechanism and graph convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117036370A true CN117036370A (en) | 2023-11-10 |
Family
ID=88625100
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117455929A (en) * | 2023-12-26 | 2024-01-26 | 福建理工大学 | Tooth segmentation method and terminal based on double-flow self-attention force diagram convolution network |
CN117726822A (en) * | 2024-02-18 | 2024-03-19 | 安徽大学 | Three-dimensional medical image classification segmentation system and method based on double-branch feature fusion |
CN117745148A (en) * | 2024-02-10 | 2024-03-22 | 安徽省农业科学院烟草研究所 | Multi-source data-based rice stubble flue-cured tobacco planting quality evaluation method and system |
CN118632027A (en) * | 2024-08-08 | 2024-09-10 | 华侨大学 | Point cloud compression method based on graph rolling network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |