CN117036370A

CN117036370A - Plant organ point cloud segmentation method based on attention mechanism and graph convolution

Info

Publication number: CN117036370A
Application number: CN202310704110.5A
Authority: CN
Inventors: 马韫韬; 蔡智博; 朱晋宇; 郭焱; 李保国
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2023-06-14
Filing date: 2023-06-14
Publication date: 2023-11-10

Abstract

A plant organ point cloud segmentation method based on an attention mechanism and graph convolution belongs to the technical field of three-dimensional point cloud instance segmentation. The method comprises the steps of dividing a network TRGCN based on a point attention mechanism and a double-branch parallel example of space diagram convolution, directly inputting a three-dimensional point cloud, respectively focusing on local feature extraction and global feature extraction by double branches, and fusing the two features through a T-G feature coupling layer. The method takes five plant point cloud data of tomatoes, corns, tobaccos, sorghum and wheat as research objects, and the dual-branch parallel neural network architecture TRGCN can capture local characteristics and global characteristics of the point cloud at the same time, is used for training a high-robustness example segmentation model, can improve the segmentation precision of the plant point cloud, has good generalization capability, and can provide good data support for rapid, efficient and accurate plant phenotype analysis.

Description

A plant organ point cloud segmentation method based on attention mechanism and graph convolution

技术领域Technical field

本发明属于三维点云实例分割技术领域，具体涉及一种基于注意力机制和空间图卷积的双分支并行植物器官点云分割方法。The invention belongs to the technical field of three-dimensional point cloud instance segmentation, and specifically relates to a dual-branch parallel plant organ point cloud segmentation method based on attention mechanism and spatial graph convolution.

背景技术Background technique

随着激光雷达设备的普及和各种消费级深度传感器的出现，点云数据越来越广泛的应用于各个领域，比如机器人、自动驾驶和城市规划等。在表型分析研究中，三维点云作为现实世界的低分辨率表示，已经成为研究植物结构和形态最直接有效的数据形式。许多研究已经采用植物三维结构来进行器官分割、监测长势和评价品种等。三维坐标系中的点作为构成点云的最基本单元，类似于二维图片中的像素点，但可以容纳更多高维度的语义信息。在表型研究中，植物器官的形态结构是直观且重要的性状，可以反映出植物对外界条件的适应能力和生长情况，例如光合作用效率和水分吸收效率等。植物器官点云分割是指将植物按照不同器官(如茎、叶、果实等)进行语义划分的过程，是后续对点云数据进行深入理解的基础，并对理解植物的功能结构具有重要意义，目前是一个颇具挑战性的研究方向。With the popularization of lidar equipment and the emergence of various consumer-grade depth sensors, point cloud data is increasingly used in various fields, such as robotics, autonomous driving, and urban planning. In phenotypic analysis research, three-dimensional point cloud, as a low-resolution representation of the real world, has become the most direct and effective data form for studying plant structure and morphology. Many studies have used plant three-dimensional structures to segment organs, monitor growth, and evaluate varieties. Points in a three-dimensional coordinate system, as the most basic units that constitute point clouds, are similar to pixels in two-dimensional pictures, but can accommodate more high-dimensional semantic information. In phenotypic research, the morphological structure of plant organs is an intuitive and important trait, which can reflect the plant's ability to adapt to external conditions and growth, such as photosynthesis efficiency and water absorption efficiency. Plant organ point cloud segmentation refers to the process of semantically dividing plants according to different organs (such as stems, leaves, fruits, etc.). It is the basis for subsequent in-depth understanding of point cloud data and is of great significance for understanding the functional structure of plants. It is currently a challenging research direction.

传统的植物点云分割算法需要人工预先进行特征描述，这种分割过程复杂繁琐，随着大数据时代的来临，传统处理方法难以满足快速准确分析的需求。因此，自动化分割方法的需求日益增大。而随着计算机图形处理器性能的飞速增长，作为人工智能的主导技术，深度学习已经成功的用于解决各类二维视觉问题。但由于点云在空间汇总的无序性和复杂性，深度学习方法在点云上的应用还面临诸多挑战。在视觉分割任务中有良好表现的卷积神经网络(CNN)通过共享核卷积来提取特征，提高模型效率，CNN固有的平移不变性使其对局部特征的把控更为精准。然而，CNN自身的感受野通常比较小，对于全局特征的捕获能力相对较弱，并且无法直接作用于原始点云数据中。另一种应用于点云数据中的神经网络结构是图卷积网络(GCN)，它将点云中每个独立的点视为图数据结构中的顶点，可以通过直接在点云数据中进行类卷积操作来提取局部特征。在自然语言处理领域有着杰出表现的Transformer同样可以很好地捕获到全局特征，其核心思想注意力(Attention)机制也非常适合处理点云数据。这些深度学习方法在许多公共点云数据集上取得了令人满意的分割结果，展现了深度学习方法对于点云数据分割的有效性。The traditional plant point cloud segmentation algorithm requires manual feature description in advance. This segmentation process is complex and cumbersome. With the advent of the big data era, traditional processing methods cannot meet the needs of fast and accurate analysis. Therefore, there is an increasing need for automated segmentation methods. With the rapid growth of computer graphics processor performance, deep learning, as the leading technology of artificial intelligence, has been successfully used to solve various two-dimensional vision problems. However, due to the disorder and complexity of point clouds in spatial aggregation, the application of deep learning methods on point clouds still faces many challenges. Convolutional neural networks (CNN), which have good performance in visual segmentation tasks, extract features through shared kernel convolution and improve model efficiency. The inherent translation invariance of CNN makes it more accurate to control local features. However, the receptive field of CNN itself is usually relatively small, the ability to capture global features is relatively weak, and it cannot directly act on the original point cloud data. Another neural network structure applied to point cloud data is the graph convolutional network (GCN), which treats each independent point in the point cloud as a vertex in the graph data structure, which can be directly implemented in the point cloud data. Convolution-like operations to extract local features. Transformer, which has outstanding performance in the field of natural language processing, can also capture global features well, and its core idea attention mechanism is also very suitable for processing point cloud data. These deep learning methods have achieved satisfactory segmentation results on many public point cloud data sets, demonstrating the effectiveness of deep learning methods for point cloud data segmentation.

然而，相对来说，植物点云在结构上的复杂性导致在器官分割任务中需要识别的语义信息量也就更大。在获取点云时，叶片之间的遮挡问题常常会导致部分点云丢失，出现孔洞、稀疏的问题。此外，植物器官之间的相似度很高，不同叶片实例往往具有相同的颜色、形态结构和纹理等特征，这种高度重复的特征对于神经网络的学习并不友好。最后，不同品种植物具有的几何形态特征不同，即使同一品种的植物在不同生长环境下的表型特征也不尽相同，甚至会产生很大差异，这对于网络的泛化能力要求很高。综上，目前植物点云的分割精度尚不能满足要求。However, relatively speaking, the structural complexity of plant point clouds results in a greater amount of semantic information that needs to be recognized in the organ segmentation task. When obtaining point clouds, the occlusion problem between leaves often causes part of the point cloud to be lost, resulting in holes and sparse problems. In addition, the similarity between plant organs is very high, and different leaf instances often have the same characteristics such as color, morphological structure, and texture. This highly repetitive feature is not friendly to the learning of neural networks. Finally, different species of plants have different geometric morphological characteristics. Even the phenotypic characteristics of plants of the same species in different growth environments are different, and may even be very different, which requires high generalization capabilities of the network. In summary, the current segmentation accuracy of plant point clouds cannot meet the requirements.

发明内容Contents of the invention

本发明的目的是为了解决复杂植物点云中精准的器官分割问题，提供一种基于注意力机制和空间图卷积的双分支并行植物器官点云分割方法，该方法为复杂结构的植物三维点云提供了可靠且高效的器官分割方法，以番茄、玉米、烟草、高粱、小麦五种植物点云数据作为研究对象，基于注意力机制和空间图卷积全新设计一种双分支并行神经网络架构TRGCN，能够同时捕获点云的局部特征和全局特征，用于训练高鲁棒性的实例分割模型，可提高植物点云的分割精度，为快速、高效、准确的植物表型分析提供数据支持。The purpose of this invention is to solve the problem of accurate organ segmentation in complex plant point clouds and provide a two-branch parallel plant organ point cloud segmentation method based on attention mechanism and spatial graph convolution. This method is a three-dimensional plant point cloud with complex structure. The cloud provides a reliable and efficient organ segmentation method. Taking five plant point cloud data of tomato, corn, tobacco, sorghum, and wheat as the research object, a dual-branch parallel neural network architecture is newly designed based on the attention mechanism and spatial graph convolution. TRGCN can simultaneously capture local features and global features of point clouds and is used to train highly robust instance segmentation models. It can improve the segmentation accuracy of plant point clouds and provide data support for fast, efficient, and accurate plant phenotypic analysis.

为实现上述目的，本发明采取的技术方案如下：In order to achieve the above objects, the technical solutions adopted by the present invention are as follows:

一种基于注意力机制和图卷积的植物器官点云分割方法，所述方法为：A plant organ point cloud segmentation method based on attention mechanism and graph convolution. The method is:

步骤一：特征编码器以原始点云作为输入，采用多层感知机将特征映射到高维空间，并采用点云注意力机制初步提取特征，然后将初始特征数据输入TRGCN块，该模块能够多个级联叠加，以不断深化对高维特征的理解；所述TRGCN块中的特征聚合层提取邻域特征的同时下采样点云，然后进入双分支并行网络部分，分别是由空间图卷积构成的局部特征捕获分支和由点注意力机制构成的全局特征学习分支，最终将特征数据输入T-G特征耦合层，得到目标数量的点云以及对应的高维抽象特征，编码器部分通过叠加TRGCN块，从原始植物点云中抽取高维度的特征信息，用于分割任务；Step 1: The feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map the features to a high-dimensional space, and uses the point cloud attention mechanism to initially extract features, and then inputs the initial feature data into the TRGCN block. This module can cascade superposition to continuously deepen the understanding of high-dimensional features; the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling point clouds, and then enters the dual-branch parallel network part, which is composed of spatial graph convolution The local feature capture branch composed of the local feature capture branch and the global feature learning branch composed of the point attention mechanism finally input the feature data into the T-G feature coupling layer to obtain the target number of point clouds and corresponding high-dimensional abstract features. The encoder part superimposes the TRGCN block , extract high-dimensional feature information from the original plant point cloud for segmentation tasks;

步骤二：特征解码器部分同样堆叠三个级联的TRGCN块，并分别接收编码器中三个TRGCN块的输出，但用插值层替换特征聚合层，插值层将高维度点集的特征恢复到低维度点集中，但仍然输出K近邻算法的分组结果供TRGCN的两个分支计算所需；对于分割结果预测，解码器在TRGCN块后设置独立的插值层，并采用单个点注意力层来保证信息完整性，网络最后采用多层感知机输出点云的分割结果；Step 2: The feature decoder part also stacks three cascaded TRGCN blocks and receives the output of the three TRGCN blocks in the encoder respectively, but replaces the feature aggregation layer with an interpolation layer. The interpolation layer restores the features of the high-dimensional point set to Low-dimensional points are concentrated, but the grouping results of the K nearest neighbor algorithm are still output for calculation by the two branches of TRGCN; for segmentation result prediction, the decoder sets an independent interpolation layer after the TRGCN block and uses a single point attention layer to ensure Information integrity, the network finally uses a multi-layer perceptron to output the segmentation results of the point cloud;

步骤三：网络训练：本研究所有实验均在独立服务器上进行，该服务器配备了12核心20线程的CPU、64GB内存和一个Nvidia GeForce RTX 3090Ti GPU；使用独立服务器进行神经网络训练，训练阶段，所有的植物点云分割模型采用相同的超参数，所述超参数具体为：训练批次大小设置为32，初始学习率设置为0.001，使用Adam方法对网络进行优化，共训练100个周期，每20个周期将学习率减半，权重衰减设置为0.0001，动量设置为0.9，K近邻算法的K值设置为12，点注意力层的特征维度设置为256。Step 3: Network training: All experiments in this study were conducted on an independent server equipped with a 12-core 20-thread CPU, 64GB memory, and an Nvidia GeForce RTX 3090Ti GPU; an independent server was used for neural network training. During the training phase, all The plant point cloud segmentation model uses the same hyperparameters. The hyperparameters are specifically: the training batch size is set to 32, the initial learning rate is set to 0.001, the Adam method is used to optimize the network, and a total of 100 cycles are trained, every 20 The learning rate is halved every cycle, the weight decay is set to 0.0001, the momentum is set to 0.9, the K value of the K nearest neighbor algorithm is set to 12, and the feature dimension of the point attention layer is set to 256.

进一步地，所述步骤一具体为：Further, the step one is specifically:

(1)特征聚合层(1) Feature aggregation layer

特征聚合具体过程为：输入具有feature维度的x个点，先经过随机最远点采样，然后采用K近邻算法将点云分组，输入到多层感知机中将邻居点特征聚合到中心点，最后采用最大池化操作计算得到具有feature’维度特征的y个点；The specific process of feature aggregation is: input x points with feature dimension, first randomly sample the farthest point, then use the K nearest neighbor algorithm to group the point cloud, input it into the multi-layer perceptron to aggregate the neighbor point features to the center point, and finally The maximum pooling operation is used to calculate y points with feature' dimension characteristics;

特征聚合层采用K近邻算法对输入点集进行采样分组；特征聚合层将计算的K近邻矩阵输出，与后续并行分支共享；The feature aggregation layer uses the K nearest neighbor algorithm to sample and group the input point set; the feature aggregation layer outputs the calculated K nearest neighbor matrix and shares it with subsequent parallel branches;

(2)局部特征捕获分支(2) Local feature capture branch

该分支基于动态空间图卷积构建，用于提取输入的植物点云中的局部特征；首先，基于点集V和近邻信息E构建特征图G＝(V，E)，采用边缘卷积在输入的特征空间上进行特征提取；提取某点x_i特征的公式为：This branch is built based on dynamic spatial graph convolution and is used to extract local features in the input plant point cloud; first, a feature map G = (V, E) is constructed based on the point set V and neighbor information E, and edge convolution is used to extract the local features in the input plant point cloud. Feature extraction is performed on the feature space of ; the formula for extracting features of a certain point x _i is:

f_i＝？h(x_i,y_i)f _i =? h(x _i ,y _i )

其中，x_j为点x_i的邻居点之一，？和h分别代表某种聚合函数和某种关系操作；即采用一种关系操作将候选点周围的邻居点特征聚合，即得到候选点的特征信息，该关系操作定义为边缘卷积；Among them, x _j is one of the neighbor points of point _xi ,? and h respectively represent a certain aggregation function and a certain relational operation; that is, a relational operation is used to aggregate the features of neighbor points around the candidate point, that is, the characteristic information of the candidate point is obtained. The relational operation is defined as edge convolution;

采用最大池化作为聚合函数，具体过程为：Maximum pooling is used as the aggregation function. The specific process is:

conv_i＝Max(MLP(h(x_i,x_i-x_j)))conv _i =Max(MLP(h( _xi , _xi -x _j )))

关系操作h定义为点x_i，x_i和其邻居点x_j的特征差值和点x_i输出值之间的线性组合；The relational operation h is defined as the linear combination between the feature differences of point x _i , x _i and its neighbor point x _j and the output value of point x _i ;

(3)全局特征学习分支(3) Global feature learning branch

采用局部邻域中的向量注意力机制来提取特征，计算公式为：The vector attention mechanism in the local neighborhood is used to extract features. The calculation formula is:

其中，x_j是点x_i的K个邻居点之一，X为每个单株植物点云中的独立点集合，ρ是正则化函数，γ是映射函数，β是某种关系运算，在本研究中定义为邻域点与关注点的差值，φ、α是点级别的特征变换方法，分别得到自注意力机制中的Q、K、V(分别为Query、Key、Value，是注意力机制中的专有名词，对应中文是查询、键、值)值，δ是位置编码函数，根据以上注意力机制，提出点注意力层，改进后的计算公式为：Among them, x _j is one of the K neighbor points of point x _i , In this study, it is defined as the difference between the neighborhood point and the attention point, φ, α is a point-level feature transformation method, which is obtained from Q, K, and V in the attention mechanism (Query, Key, and Value respectively, which are proper nouns in the attention mechanism, corresponding to query, key, and value in Chinese) value, δ is the position encoding function. Based on the above attention mechanism, a point attention layer is proposed. The improved calculation formula is:

(4)T-G特征耦合层(4)T-G characteristic coupling layer

经过以上处理，得到两个维度以及形状完全相同的特征矩阵：具有显著局部特征的矩阵G和具有完整全局特征的矩阵T；将G和T拼接后输入特征耦合层得到目标特征矩阵：After the above processing, two feature matrices with identical dimensions and shapes are obtained: a matrix G with significant local features and a matrix T with complete global features; G and T are spliced and input into the feature coupling layer to obtain the target feature matrix:

TG＝Linear(ReLU(Linear(T,G)))TG＝Linear(ReLU(Linear(T,G)))

采用两个线性层和一个ReLU激活层设计了T-G特征耦合层，使网络能够学习两部分矩阵各自更重要的信息，并结合成为目标特征矩阵。The T-G feature coupling layer is designed using two linear layers and a ReLU activation layer, allowing the network to learn the more important information of the two matrices and combine them into the target feature matrix.

本发明相对于现有技术的有益效果为：本发明设计了一个全新的基于点注意力机制和空间图卷积的双分支并行实例分割网络TRGCN，直接输入三维点云，双分支分别关注局部特征提取和全局特征提取，并通过T-G特征耦合层融合两种特征。结果表明，TRGCN在不同的植物点云上均获得了优秀的性能，准确率高于其他主流点云分割网络，具有很好的泛化能力，可以为快速、高效、准确的植物表型分析提供良好的数据支撑。The beneficial effects of the present invention compared to the existing technology are: the present invention designs a brand new dual-branch parallel instance segmentation network TRGCN based on point attention mechanism and spatial graph convolution, which directly inputs the three-dimensional point cloud, and the dual branches focus on local features respectively. extraction and global feature extraction, and fuse the two features through the T-G feature coupling layer. The results show that TRGCN has achieved excellent performance on different plant point clouds, with an accuracy higher than other mainstream point cloud segmentation networks. It has good generalization ability and can provide fast, efficient and accurate plant phenotype analysis. Good data support.

附图说明Description of the drawings

图1为本发明TRGCN的网络架构图；Figure 1 is a network architecture diagram of the TRGCN of the present invention;

图2为本发明TRGCN块的结构图；Figure 2 is a structural diagram of the TRGCN block of the present invention;

图3为本发明TRGCN块全局特征学习层的架构图；Figure 3 is an architectural diagram of the global feature learning layer of the TRGCN block of the present invention;

图4为本发明对五种植物点云的分割结果图。Figure 4 is a diagram showing the segmentation results of five plant point clouds according to the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案及优点更加清楚明白，以下结合具体实施方式，对本发明进行进一步的详细说明。应当理解的是，此处所描述的具体实施方式仅用以解释本发明，并不限定本发明的保护范围。In order to make the purpose, technical solutions and advantages of the present invention more clear, the present invention will be further described in detail below in conjunction with specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit the scope of the present invention.

实施例1：Example 1:

本研究基于点云自注意力机制和空间图卷积，创新性的提出一种采用编码器-解码器架构设计的双分支并行网络Transformer Graph Convolution Network(TRGCN)(图1)。Based on the point cloud self-attention mechanism and spatial graph convolution, this research innovatively proposes a dual-branch parallel network Transformer Graph Convolution Network (TRGCN) designed using an encoder-decoder architecture (Figure 1).

特征编码器以原始点云作为输入，采用多层感知机将特征映射到高维空间(默认为32维)，并采用点云注意力机制初步提取特征。然后将初始特征数据输入TRGCN模块(图2)，该模块可多个级联叠加，以不断深化对高维特征的理解。具体来说，TRGCN块中的特征聚合层提取邻域特征的同时下采样点云，然后进入双分支并行网络部分，分别是由空间图卷积构成的局部特征捕获分支和由点注意力机制构成的全局特征学习分支。最终，将特征数据输入特别设计的T-G特征耦合层，得到目标数量的点云以及对应的高维抽象特征。编码器部分通过叠加TRGCN块，从原始植物点云中抽取高维度的特征信息，用于分割任务。The feature encoder takes the original point cloud as input, uses a multi-layer perceptron to map features to a high-dimensional space (default is 32 dimensions), and uses a point cloud attention mechanism to initially extract features. The initial feature data is then input into the TRGCN module (Figure 2), which can be superimposed in multiple cascades to continuously deepen the understanding of high-dimensional features. Specifically, the feature aggregation layer in the TRGCN block extracts neighborhood features while downsampling the point cloud, and then enters the dual-branch parallel network part, which is the local feature capture branch composed of spatial graph convolution and the point attention mechanism. The global feature learning branch. Finally, the feature data is input into the specially designed T-G feature coupling layer to obtain the target number of point clouds and corresponding high-dimensional abstract features. The encoder part extracts high-dimensional feature information from the original plant point cloud by superimposing TRGCN blocks for segmentation tasks.

(1)特征聚合层(1) Feature aggregation layer

TRGCN块中特征聚合层的作用是在多个模块堆叠过程中减少输入的点集基数同时抽象出跟高维度的特征向量。例如从原始输入到第一个TRGCN块，点数量从N减少为N/4，点云特征维度从F提升到2F；The function of the feature aggregation layer in the TRGCN block is to reduce the input point set base and abstract high-dimensional feature vectors during the stacking process of multiple modules. For example, from the original input to the first TRGCN block, the number of points is reduced from N to N/4, and the point cloud feature dimension is increased from F to 2F;

特征聚合具体过程如下：输入具有feature维度的x个点，先经过随机最远点采样，然后采用K近邻算法将点云分组，输入到多层感知机中将邻居点特征聚合到中心点，最后采用最大池化操作计算得到具有feature’维度特征的y个点(默认y＝x/4,feature’＝2*feature)。The specific process of feature aggregation is as follows: input x points with feature dimension, first randomly sample the farthest point, then use the K nearest neighbor algorithm to group the point cloud, input it into the multi-layer perceptron to aggregate the neighbor point features to the center point, and finally The maximum pooling operation is used to calculate y points with feature' dimension characteristics (default y=x/4, feature'=2*feature).

特征聚合层采用K近邻算法对输入点集进行采样分组。并且，为了节省训练时的显存空间，此层会将计算的K近邻矩阵输出，与后续并行分支共享。The feature aggregation layer uses the K nearest neighbor algorithm to sample and group the input point set. Moreover, in order to save video memory space during training, this layer will output the calculated K nearest neighbor matrix and share it with subsequent parallel branches.

(2)局部特征捕获分支(2) Local feature capture branch

该分支基于动态空间图卷积构建，用于提取输入的植物点云中的局部特征。首先，基于点集V和近邻信息E构建特征图G＝(V,E)，采用边缘卷积在输入的特征空间上进行特征提取。提取某点x_i特征的公式如下：This branch is built based on dynamic spatial graph convolution and is used to extract local features in the input plant point cloud. First, a feature map G = (V, E) is constructed based on the point set V and neighbor information E, and edge convolution is used to extract features on the input feature space. The formula for extracting features of a certain point x _i is as follows:

f_i＝？h(x_i,y_i)f _i =? h(x _i ,y _i )

其中，x_j代表点x_i的邻居点之一，？和h分别代表某种聚合函数和某种关系操作。即采用一种关系操作将候选点周围的邻居点特征聚合，即可得到候选点的特征信息，该关系操作定义为边缘卷积。为了增强对点云中局部特征的理解，本研究采用最大池化作为聚合函数，具体过程如下：Among them, x _j represents one of the neighbor points of point _xi ,? and h represent some kind of aggregate function and some kind of relational operation respectively. That is, a relational operation is used to aggregate the features of neighbor points around the candidate point to obtain the feature information of the candidate point. This relational operation is defined as edge convolution. In order to enhance the understanding of local features in point clouds, this study uses max pooling as the aggregation function. The specific process is as follows:

conv_i＝Max(MLP(h(x_i,x_i-x_j)))conv _i =Max(MLP(h( _xi , _xi -x _j )))

关系操作h定义为点x_i x_i和其邻居点x_j的特征差值和点x_i输出值之间的线性组合。这种选择不仅保留了局部点集中彼此影响的特征，也部分考虑到整体的全局特征。The relational operation h is defined as the linear combination between the characteristic difference between point x _i x _i and its neighbor point x _j and the output value of point x _i . This choice not only retains the characteristics of local point sets that influence each other, but also partially takes into account the overall global characteristics.

(3)全局特征学习分支(3) Global feature learning branch

如图3所示，此分支基于点云注意力机制构建，非常适合处理点云数据，本质上，点云数据可以看作是嵌入在注意力空间的词向量，本研究采用局部邻域中的向量注意力机制来提取特征，计算公式如下：As shown in Figure 3, this branch is built based on the point cloud attention mechanism and is very suitable for processing point cloud data. In essence, point cloud data can be regarded as word vectors embedded in the attention space. This study uses Vector attention mechanism is used to extract features. The calculation formula is as follows:

其中，x_j是点x_i的K个邻居点之一，φ、α是点级别的特征变换方法，分别得到自注意力机制中的Q、K、V值，δ是位置编码函数，ρ是正则化函数，γ是映射函数，β是某种关系运算，在本研究中定义为邻域点与关注点的差值。根据以上注意力机制，本研究提出了点注意力层，改进后的计算公式如下：Among them, x _j is one of the K neighbor points of point x _i , φ, α is a point-level feature transformation method, which is obtained from the Q, K, and V values in the attention mechanism respectively. δ is the position encoding function, ρ is the regularization function, γ is the mapping function, and β is some kind of relational operation. In this paper In the study, it is defined as the difference between the neighborhood point and the attention point. Based on the above attention mechanism, this study proposes a point attention layer. The improved calculation formula is as follows:

与普通注意力机制不同的是，在α函数中也加入了位置编码，以此强化对特征的理解。在点注意力层的基础上，TRGCN编码器在全局特征学习分支构建了一个残差结构。在点注意力层前后各添加一个线性层，并将最终输出与输入进行残差连接，以此促进信息交换，加速网络收敛，并为训练深层网络提供可能性。Different from the ordinary attention mechanism, position coding is also added to the α function to enhance the understanding of features. Based on the point attention layer, the TRGCN encoder constructs a residual structure in the global feature learning branch. Add a linear layer before and after the point attention layer, and conduct residual connection between the final output and the input to promote information exchange, accelerate network convergence, and provide the possibility to train deep networks.

(4)T-G特征耦合层(4)T-G characteristic coupling layer

经过以上处理，可以得到两个维度以及形状完全相同的特征矩阵：具有显著局部特征的矩阵G和具有完整全局特征的矩阵T。将G和T拼接后输入特征耦合层得到目标特征矩阵：After the above processing, two feature matrices with exactly the same dimensions and shapes can be obtained: a matrix G with significant local features and a matrix T with complete global features. After splicing G and T, input into the feature coupling layer to obtain the target feature matrix:

TG＝Linear(ReLU(Linear(T,G)))TG＝Linear(ReLU(Linear(T,G)))

采用两个线性层和一个ReLU激活层设计了T-G特征耦合层，使网络可以学习两部分矩阵各自更重要的信息，并结合成为目标特征矩阵。The T-G feature coupling layer is designed using two linear layers and a ReLU activation layer, so that the network can learn the more important information of the two matrices and combine them into the target feature matrix.

综上所述，TRGCN网络特征编码器部分可以通过改变TRGCN块的堆叠数量，设计适应不同视觉任务的模型。较少的TRGCN块可以用于轻量级分类网络，而更多的级联TRGCN块可以用于更细粒度的任务，例如点云分割和目标识别。In summary, the feature encoder part of the TRGCN network can design models adapted to different visual tasks by changing the stacking number of TRGCN blocks. Fewer TRGCN blocks can be used for lightweight classification networks, while more cascaded TRGCN blocks can be used for finer-grained tasks such as point cloud segmentation and object recognition.

特征解码器部分同样堆叠三个级联的TRGCN块，并分别接收编码器中三个TRGCN块的输出，但用插值层替换特征聚合层。与特征聚合层相反的是解码器中的插值层将高维度点集的特征恢复到低维度点集中，但仍然输出K近邻算法的分组结果供TRGCN的两个分支计算所需。对于实例分割结果预测，解码器在TRGCN块后设置独立的插值层，并采用单个点注意力层来保证信息完整性，网络最后采用多层感知机输出点云的分割结果。The feature decoder part also stacks three cascaded TRGCN blocks and receives the outputs of the three TRGCN blocks in the encoder respectively, but replaces the feature aggregation layer with an interpolation layer. Contrary to the feature aggregation layer, the interpolation layer in the decoder restores the features of the high-dimensional point set to the low-dimensional point set, but still outputs the grouping results of the K nearest neighbor algorithm for the calculation of the two branches of TRGCN. For instance segmentation result prediction, the decoder sets an independent interpolation layer after the TRGCN block and uses a single point attention layer to ensure information integrity. The network finally uses a multi-layer perceptron to output the segmentation result of the point cloud.

网络训练。本研究所有实验均在独立服务器上进行，该服务器配备了12核心20线程的CPU、64GB内存和一个Nvidia GeForce RTX 3090Ti GPU。在训练阶段，5种植物点云分割模型采用相同的超参数，具体为：训练批次大小设置为32，初始学习率设置为0.001，使用Adam方法对网络进行优化，共训练100个周期，每20个周期将学习率减半，权重衰减设置为0.0001，动量设置为0.9，K近邻算法的K值设置为12，点注意力层的特征维度设置为256。Network training. All experiments in this study were conducted on a dedicated server equipped with a 12-core 20-thread CPU, 64GB memory, and an Nvidia GeForce RTX 3090Ti GPU. In the training phase, the five plant point cloud segmentation models use the same hyperparameters, specifically: the training batch size is set to 32, the initial learning rate is set to 0.001, the Adam method is used to optimize the network, and a total of 100 cycles are trained. The learning rate is halved for 20 epochs, the weight decay is set to 0.0001, the momentum is set to 0.9, the K value of the K nearest neighbor algorithm is set to 12, and the feature dimension of the point attention layer is set to 256.

在5种植物点云数据上进行器官实例分割测试，取得了最高86.38％的平均交并比和88.58％的平均准确率。为了验证TRGCN的分割能力，另外选择了三个主流的点云分割网络与TRGCN进行对比，在5种分割任务中，TRGCN有9项指标领先其他三种方法，在绝大部分分割任务中取得了最优的精度，尤其在高粱叶片上的精度提高更为明显，这表明TRGCN更擅长处理单子叶植物点云。由于双子叶作物的冠层结构较为拥挤，容易造成遮挡问题，烟草和番茄点云的分割效果并不如单子叶作物，但分割效果仍然好于其他三种分割网络。具体的测试结果如表1所示，图4是五种植物点云的分割效果图。The organ instance segmentation test was conducted on five kinds of plant point cloud data, and the highest average intersection and union ratio of 86.38% and an average accuracy of 88.58% were achieved. In order to verify the segmentation ability of TRGCN, three mainstream point cloud segmentation networks were selected for comparison with TRGCN. Among the 5 segmentation tasks, TRGCN has 9 indicators leading the other three methods, and has achieved the best results in most segmentation tasks. The optimal accuracy, especially the improvement in accuracy on sorghum leaves, is more obvious, which shows that TRGCN is better at processing monocot point clouds. Since the canopy structure of dicotyledonous crops is relatively crowded and can easily cause occlusion problems, the segmentation effect of tobacco and tomato point clouds is not as good as that of monocot crops, but the segmentation effect is still better than the other three segmentation networks. The specific test results are shown in Table 1. Figure 4 is a segmentation rendering of five plant point clouds.

本发明还采用高粱点云作为研究对象，对TRGCN池化层和级联TRGCN块的堆叠数量进行探讨，结果表明采用最大池化的网络分割性能最佳，准确率比平均池化和求和池化高约2％。在级联的TRGCN块数量为3的时候网络在训练时间和分割效果上达到最优，牺牲一定的时间得到了更高的分割精度。具体的测试结果如表2和3所示。This invention also uses sorghum point cloud as the research object to discuss the stacking number of TRGCN pooling layers and cascaded TRGCN blocks. The results show that the network segmentation performance using maximum pooling is the best, and the accuracy is better than average pooling and summation pooling. The chemical is about 2% higher. When the number of cascaded TRGCN blocks is 3, the network achieves optimal training time and segmentation effect, sacrificing a certain amount of time to obtain higher segmentation accuracy. The specific test results are shown in Tables 2 and 3.

表1为本发明TRGCN网络与其他主流网络的分割精度对比表Table 1 is a comparison table of segmentation accuracy between the TRGCN network of the present invention and other mainstream networks.

池化层Pooling layer 训练时间(秒)Training time (seconds) 平均交并比(％)Average intersection ratio (%) 平均准确率(％)Average accuracy (%) 最大池化max pooling 20822082 78.929278.9292 84.919884.9198 平均池化average pooling 20852085 75.774875.7748 80.610480.6104 求和池化sum pooling 20862086 76.370976.3709 82.964782.9647

表2本发明消融实验1，不同池化层的分割效果表Table 2 Ablation experiment 1 of the present invention, segmentation effect table of different pooling layers

TRGCN块数量TRGCN block number 训练时间(秒)Training time (seconds) 平均交并比(％)Average intersection ratio (%) 22 17281728 73.712073.7120 33 20822082 78.929278.9292 44 22022202 75.549875.5498

表3本发明消融实验2，不同TRGCN块堆叠数量的分割效果表。Table 3: Ablation experiment 2 of the present invention, segmentation effect table with different stacking numbers of TRGCN blocks.

Claims

1. A plant organ point cloud segmentation method based on an attention mechanism and graph convolution is characterized by comprising the following steps of: the method comprises the following steps:

step one: the feature encoder takes an original point cloud as input, adopts a multi-layer perceptron to map features to a high-dimensional space, adopts a point cloud attention mechanism to extract the features preliminarily, and then inputs initial feature data into a TRGCN block, and the module can be used for cascade superposition to deepen understanding of the high-dimensional features; the feature aggregation layer in the TRGCN block extracts neighborhood features and downsamples point clouds at the same time, then enters a double-branch parallel network part, is a local feature capturing branch formed by space graph convolution and a global feature learning branch formed by a point attention mechanism respectively, and finally inputs feature data into the T-G feature coupling layer to obtain target number of point clouds and corresponding high-dimensional abstract features;

step two: the feature decoder part stacks three cascaded TRGCN blocks, and receives the outputs of the three TRGCN blocks in the encoder respectively, but replaces the feature aggregation layer with the interpolation layer, and the interpolation layer restores the features of the high-dimensional point set to the low-dimensional point set, and still outputs the grouping result of the K nearest neighbor algorithm for two branches of the TRGCN to calculate; for segmentation result prediction, the decoder sets an independent interpolation layer behind the TRGCN block, adopts a single point attention layer to ensure information integrity, and finally adopts a multi-layer perceptron to output a segmentation result of point cloud;

step three: training a network: a CPU with 12 cores and 20 threads, a 64GB memory and a Nvidia GeForce RTX 3090Ti GPU are arranged on an independent server; the neural network training is carried out by using an independent server, and in the training stage, all plant point cloud segmentation models adopt the same super parameters, wherein the super parameters are specifically as follows: training batch size was set to 32, initial learning rate was set to 0.001, the network was optimized using Adam method for 100 cycles, learning rate was halved every 20 cycles, weight decay was set to 0.0001, momentum was set to 0.9, K value of K nearest neighbor algorithm was set to 12, and feature dimension of point attention layer was set to 256.

2. The method for segmenting the plant organ point cloud based on the attention mechanism and the graph convolution according to claim 1, wherein the method comprises the following steps of: the first step is specifically as follows:

(1) Feature polymeric layer

The specific process of feature polymerization is as follows: inputting x points with feature dimension, firstly sampling the points at the most distant point by using a random, grouping the point clouds by adopting a K nearest neighbor algorithm, inputting the point clouds into a multi-layer perceptron to aggregate the neighbor point features to a central point, and finally obtaining y points with feature' dimension features by adopting a maximum pooling operation;

the characteristic aggregation layer adopts a K nearest neighbor algorithm to sample and group the input point set; the feature aggregation layer outputs the calculated K neighbor matrix and shares the K neighbor matrix with the subsequent parallel branches;

(2) Local feature capture branching

The branch is constructed based on a dynamic space diagram convolution and is used for extracting local features in an input plant point cloud; firstly, constructing a feature graph G= (V, E) based on a point set V and neighbor information E, and carrying out feature extraction on an input feature space by adopting edge convolution; extracting a certain point x _i The formula of the characteristics is as follows:

f _i ＝？h(x _i ,y _i )

wherein x is _j For point x _i Is one of the neighbor points? And h represents a certain aggregation function and a certain relational operation, respectively; the method comprises the steps that neighbor point features around candidate points are aggregated through a relation operation, so that feature information of the candidate points is obtained, and the relation operation is defined as edge convolution;

the maximum pooling is adopted as an aggregation function, and the specific process is as follows:

conv _i ＝Max(MLP(h(x _i ,x _i -x _j )))

the relational operation h is defined as point x _i ，x _i And its neighbor point x _j Feature difference value and point x of (2) _i Linear combinations between the output values;

(3) Global feature learning branching

The feature is extracted by adopting a vector attention mechanism in a local neighborhood, and the calculation formula is as follows:

wherein x is _j Is the point x _i X is an independent point set in each single plant point cloud, ρ is a regularization function, γ is a mapping function, β is a difference between a neighborhood point and a point of interest, φ,Alpha is a characteristic transformation method of point level, Q, K, V values in a self-attention mechanism are respectively obtained, delta is a position coding function, a point attention layer is provided according to the attention mechanism, and an improved calculation formula is as follows:

(4) T-G feature coupling layer

Through the processing, the feature matrix with two dimensions and identical shapes is obtained: a matrix G with significant local features and a matrix T with complete global features; and (3) inputting the spliced G and T into a feature coupling layer to obtain a target feature matrix:

TG＝Linear(ReLU(Linear(T,G)))

the T-G characteristic coupling layer is designed by adopting two linear layers and one ReLU activation layer, so that the network can learn more important information of each of the two parts of matrixes and combine the two parts of matrixes into a target characteristic matrix.