CN115375707B - A method and system for accurately segmenting plant leaves under complex backgrounds
- Publication number: CN115375707B
- Application number: CN202210990338.0A
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- feature map
- backbone network
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/11 Region-based segmentation (G06T: image data processing or generation; G06T7/00: image analysis; G06T7/10: segmentation, edge detection)
- G06N3/08 Learning methods (G06N: computing arrangements based on specific computational models; G06N3/02: neural networks)
- G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T5/70 Denoising; smoothing
- G06T7/13 Edge detection
- G06T2207/20081 Training; learning (G06T2207/20: special algorithmic details)
- G06T2207/20221 Image fusion; image merging (G06T2207/20212: image combination)
- Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
Description
Technical Field

The present invention relates to the technical field of image processing, and in particular to a method and system for accurately segmenting plant leaves under complex backgrounds.

Background Art

Plants are subject to biotic and abiotic stresses during growth. To identify the type of stress, phenotypic traits of the leaves must be monitored continuously. In-vivo plant imaging is a fast monitoring method, but accurately segmenting leaves from the complex backgrounds of such images remains a technical difficulty.

Traditional image segmentation algorithms depend heavily on expert experience and generalize poorly; their speed and accuracy on plant leaves in complex backgrounds are low, so their segmentation results cannot accurately reflect leaf phenotypic traits.

With advances in algorithm theory and hardware computing power, deep learning has become a better approach to image segmentation thanks to its strong nonlinear learning ability and robust generalization. In particular, the deep stacked structure and richer feature representations of convolutional neural networks (CNNs) have made them a mainstream architecture in plant phenotyping. Although CNNs are widely used in this field, they have clear drawbacks: they cannot fully learn the low-level features of an image, and their lack of global information makes them ineffective in some complex environments. In recent years, the Transformer has become a hot topic in computer vision. Because Transformers understand global context well and achieve high accuracy on large-dataset tasks, they were quickly adopted by plant researchers. Yet Transformers still have non-negligible shortcomings: lacking the right inductive biases, they often generalize worse than CNNs, and their larger model capacity brings more parameters and higher compute requirements, which prevents rapid deployment in agricultural tasks. Hybrid neural networks that combine the advantages of CNNs and Transformers are therefore emerging as a new research direction.
Summary of the Invention

The object of the present invention is to provide a method and system for accurately segmenting plant leaves under complex backgrounds, so as to segment plant leaves in complex backgrounds with improved accuracy and speed, thereby providing a data basis for accurately obtaining leaf phenotypic parameters.

To achieve the above object, the present invention provides the following scheme.

A method for accurately segmenting plant leaves under complex backgrounds, comprising:

acquiring a target leaf image, the target leaf image being a plant leaf image under a complex background;

inputting the target leaf image into a leaf segmentation model to obtain a predicted image segmentation mask for the target leaf image, the predicted mask being used to segment the target leaf image;

wherein the leaf segmentation model is obtained by training a hybrid neural segmentation network comprising a composite backbone network and an image segmentation network; the trained composite backbone network extracts features from the target leaf image to obtain its multi-layer fusion feature maps, and the trained image segmentation network determines the predicted image segmentation mask from those maps;

the composite backbone network comprises an auxiliary backbone network and a guided backbone network connected by multi-level residual skip connections; the auxiliary backbone network is built from a convolutional neural network and a Transformer network, and the guided backbone network is built from a convolutional neural network.
Optionally, determining the leaf segmentation model comprises:

obtaining a sample data set comprising a plurality of sample leaf images and their corresponding ground-truth image segmentation masks;

inputting each sample leaf image into the composite backbone network for feature extraction to obtain the multi-layer auxiliary feature maps and multi-layer fusion feature maps of each sample leaf image;

inputting the multi-layer auxiliary feature maps of each sample leaf image into the image segmentation network for predictive segmentation to obtain an auxiliary image segmentation mask for each sample leaf image;

inputting the multi-layer fusion feature maps of each sample leaf image into the image segmentation network for predictive segmentation to obtain a predicted image segmentation mask for each sample leaf image;

determining a comprehensive cross-entropy loss from the ground-truth, auxiliary, and predicted image segmentation masks, the comprehensive loss comprising the cross-entropy loss of the composite backbone network and the cross-entropy loss of the auxiliary backbone network;

training the hybrid neural segmentation network to minimize the comprehensive cross-entropy loss, thereby obtaining the leaf segmentation model, which comprises the trained composite backbone network and the trained image segmentation network.

Optionally, inputting each sample leaf image into the composite backbone network for feature extraction to obtain its multi-layer auxiliary feature maps and multi-layer fusion feature maps specifically comprises:

for any sample leaf image, inputting the sample leaf image into the auxiliary backbone network for feature extraction to obtain multi-layer auxiliary feature maps, the multi-layer auxiliary feature maps comprising multi-layer local feature maps and multi-layer global feature maps;

inputting the multi-layer auxiliary feature maps and the sample leaf image into the guided backbone network for feature extraction to obtain multi-layer fusion feature maps.
Optionally, the auxiliary backbone network comprises three convolutional neural network modules and two self-attention network modules, with a residual skip connection between each pair of adjacent modules.

Inputting the sample leaf image into the auxiliary backbone network for feature extraction to obtain multi-layer auxiliary feature maps specifically comprises:

inputting the sample leaf image into the first convolutional neural network module of the auxiliary backbone network to obtain the first-layer local feature map;

inputting the first-layer local feature map into the second convolutional neural network module to obtain the second-layer local feature map;

inputting the second-layer local feature map into the third convolutional neural network module to obtain the third-layer local feature map;

inputting the third-layer local feature map into the first self-attention network module to obtain the first-layer global feature map;

inputting the first-layer global feature map into the second self-attention network module to obtain the second-layer global feature map.
Optionally, the guided backbone network comprises five convolutional neural network modules, with a residual skip connection between each pair of adjacent modules.

Inputting the multi-layer auxiliary feature maps and the sample leaf image into the guided backbone network for feature extraction to obtain multi-layer fusion feature maps specifically comprises:

inputting the first- to third-layer local feature maps, the first- and second-layer global feature maps, and the sample leaf image into the first convolutional neural network module of the guided backbone network to obtain the first-layer fusion feature map;

inputting the second- and third-layer local feature maps, the first- and second-layer global feature maps, and the first-layer fusion feature map into the second convolutional neural network module to obtain the second-layer fusion feature map;

inputting the third-layer local feature map, the first- and second-layer global feature maps, and the second-layer fusion feature map into the third convolutional neural network module to obtain the third-layer fusion feature map;

inputting the first- and second-layer global feature maps and the third-layer fusion feature map into the fourth convolutional neural network module to obtain the fourth-layer fusion feature map;

inputting the second-layer global feature map and the fourth-layer fusion feature map into the fifth convolutional neural network module to obtain the fifth-layer fusion feature map.

Optionally, each residual skip connection between the auxiliary backbone network and the guided backbone network further includes a channel batch normalization module, and before the multi-layer auxiliary feature maps and the corresponding intermediate image are input into the corresponding convolutional neural network module of the guided backbone network, the method further comprises:

splicing the multi-layer auxiliary feature maps to obtain a corresponding spliced feature map, the spliced feature map being one of: the first- to third-layer local feature maps spliced with the first- and second-layer global feature maps; the second- and third-layer local feature maps spliced with the first- and second-layer global feature maps; the third-layer local feature map spliced with the first- and second-layer global feature maps; the first- and second-layer global feature maps spliced together; or the second-layer global feature map alone;

superimposing the spliced feature map element-wise onto the corresponding intermediate image to obtain a corresponding superimposed feature map, the intermediate image being one of: the sample leaf image, the first-layer fusion feature map, the second-layer fusion feature map, the third-layer fusion feature map, or the fourth-layer fusion feature map;

performing channel batch normalization on the superimposed feature map with the corresponding channel batch normalization module, and using the processed superimposed feature map as the input of the corresponding convolutional neural network module in the guided backbone network.
Optionally, the comprehensive cross-entropy loss is computed as:

$$L = L_{Lead} + \lambda \cdot L_{Assist}$$

where $L$ is the comprehensive cross-entropy loss; $L_{Lead}$ is the cross-entropy loss of the composite backbone network, determined from the ground-truth and predicted image segmentation masks; $L_{Assist}$ is the cross-entropy loss of the auxiliary backbone network, determined from the ground-truth and auxiliary image segmentation masks; and $\lambda$ is the weight of the auxiliary supervision.

Optionally, the cross-entropy loss of the composite backbone network is computed as:

$$L_{Lead} = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$$

and the cross-entropy loss of the auxiliary backbone network as:

$$L_{Assist} = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log q_i + (1-y_i)\log(1-q_i)\right]$$

where $m$ is the total number of pixels in the input sample leaf image; $y_i$ is the true class value of the $i$-th pixel (the true class values of all pixels constitute the ground-truth image segmentation mask); $p_i$ is the predicted class value of the $i$-th pixel (the predicted class values of all pixels constitute the predicted image segmentation mask); $q_i$ is the auxiliary class value of the $i$-th pixel (the auxiliary class values of all pixels constitute the auxiliary image segmentation mask); and $i$ indexes the pixels of the input sample leaf image.
Optionally, obtaining the sample data set comprises:

obtaining a leaf data set comprising a plurality of plant leaf images under complex backgrounds;

denoising, labeling, and augmenting each plant leaf image in the leaf data set to obtain a plurality of sample leaf images and their corresponding ground-truth image segmentation masks.

A system for accurately segmenting plant leaves under complex backgrounds, implemented with the above method, the system comprising:

a target leaf image acquisition module, configured to acquire a target leaf image, the target leaf image being a plant leaf image under a complex background;

an image segmentation mask prediction module, configured to input the target leaf image into a leaf segmentation model to obtain a predicted image segmentation mask for the target leaf image, the predicted mask being used to segment the target leaf image;

wherein the leaf segmentation model is obtained by training a hybrid neural segmentation network comprising a composite backbone network and an image segmentation network; the trained composite backbone network extracts features from the target leaf image to obtain its multi-layer fusion feature maps, and the trained image segmentation network determines the predicted image segmentation mask from those maps;

the composite backbone network comprises an auxiliary backbone network and a guided backbone network connected by multi-level residual skip connections; the auxiliary backbone network is built from a convolutional neural network and a Transformer network, and the guided backbone network is built from a convolutional neural network.
According to the specific embodiments provided herein, the present invention discloses the following technical effects:

The present invention provides a method and system for accurately segmenting plant leaves under complex backgrounds. An auxiliary backbone network extracts the local and global features of the target leaf image, a guided backbone network extracts its fusion features, and a multi-level skip connection structure between the two backbones effectively fuses high- and low-level feature maps, so that the composite backbone network can fully learn the edge information and multi-level features of the target leaf image. The design retains the good generalization and convergence of convolutional neural networks while gaining the broad global receptive field and higher model capacity of the Transformer network. It can therefore segment leaves in complex backgrounds with higher accuracy and speed, so that the segmentation results reflect leaf phenotypic traits more accurately.
Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the method for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of the auxiliary backbone network provided by an embodiment of the present invention;

Fig. 3 is a structural diagram of the composite backbone network provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the overall network structure of the leaf segmentation model provided by an embodiment of the present invention;

Fig. 5 is a detailed flowchart of the method for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of the computation of the batch-normalization-based weight contribution matrix provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of segmentation results for the first class of leaf images provided by an embodiment of the present invention;

Fig. 8 is a schematic diagram of segmentation results for the second class of leaf images provided by an embodiment of the present invention;

Fig. 9 is a structural diagram of the system for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention;

Fig. 10 is a detailed structural diagram of the system for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention.
Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

The object of the present invention is to provide a method and system for accurately segmenting plant leaves under complex backgrounds, so as to segment plant leaves in complex backgrounds with improved accuracy and speed, thereby providing a data basis for accurately obtaining leaf phenotypic parameters.

To make the above objects, features, and advantages of the present invention more comprehensible, the present invention is further described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the method for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:

Step 101: acquire a target leaf image, the target leaf image being a plant leaf image under a complex background.

Step 102: input the target leaf image into a leaf segmentation model to obtain a predicted image segmentation mask for the target leaf image. The predicted mask is used to segment the target leaf image, providing a data basis for obtaining leaf phenotypic parameters. The leaf segmentation model is obtained by training a hybrid neural segmentation network comprising a composite backbone network and an image segmentation network. The trained composite backbone network extracts features from the target leaf image to obtain its multi-layer fusion feature maps; the trained image segmentation network determines the predicted image segmentation mask from those maps. The composite backbone network comprises an auxiliary backbone network and a guided backbone network connected by multi-level residual skip connections; the auxiliary backbone network is built from a convolutional neural network and a Transformer network, and the guided backbone network is built from a convolutional neural network. The image segmentation network comprises an interconnected encoder module and decoder module, the encoder module being built on an atrous spatial pyramid pooling (ASPP) network.
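For illustration only, a minimal sketch of what steps 101 and 102 could look like at inference time is given below, assuming a PyTorch implementation. The checkpoint name, input resolution, and binarization threshold are hypothetical; the patent does not specify them.

```python
# Hypothetical inference sketch (file names, resolution, and threshold
# are assumptions, not taken from the patent).
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),   # assumed input resolution
    transforms.ToTensor(),
])

model = torch.load("leaf_seg_model.pt", map_location="cpu")  # assumed checkpoint
model.eval()

img = preprocess(Image.open("target_leaf.jpg")).unsqueeze(0)  # (1, 3, H, W)
with torch.no_grad():
    logits = model(img)                   # per-pixel leaf/background scores
    mask = torch.sigmoid(logits) > 0.5    # predicted image segmentation mask
```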
Further, the leaf segmentation model is determined as follows:

Step 1: obtain a sample data set comprising a plurality of sample leaf images and their corresponding ground-truth image segmentation masks.

Specifically, the sample data set is obtained as follows:

Step 1.1: obtain a leaf data set comprising a plurality of plant leaf images under complex backgrounds.

Preferably, the plant leaf images under complex backgrounds include a first class and a second class, where the first class comprises images of healthy leaves and the second class comprises images of leaves infected by diseases and pests and images of overlapping leaves under uneven illumination. By training on both classes together, the present invention improves the robustness and generality of the leaf segmentation model, so that accurate segmentation is still achieved on difficult samples.

Step 1.2: denoise, label, and augment each plant leaf image in the leaf data set to obtain a plurality of sample leaf images and their corresponding ground-truth image segmentation masks.
Step 2: input each sample leaf image into the composite backbone network for feature extraction to obtain its multi-layer auxiliary feature maps and multi-layer fusion feature maps. For any sample leaf image, step 2 specifically comprises:

Step 2.1: input the sample leaf image into the auxiliary backbone network for feature extraction to obtain multi-layer auxiliary feature maps, comprising multi-layer local feature maps and multi-layer global feature maps.

Step 2.2: input the multi-layer auxiliary feature maps and the sample leaf image into the guided backbone network for feature extraction to obtain multi-layer fusion feature maps.
Fig. 2 is a structural diagram of the auxiliary backbone network and Fig. 3 a structural diagram of the composite backbone network provided by embodiments of the present invention. As shown in Figs. 2 and 3, the auxiliary backbone network comprises three convolutional neural network modules (S0, S1, and S2) and two self-attention network modules (S3 and S4), with residual skip connections between adjacent modules. The guided backbone network comprises five convolutional neural network modules, also with residual skip connections between adjacent modules.

Further, module S0 of the auxiliary backbone network comprises two sequentially connected two-dimensional convolutional layers; modules S1 and S2 each comprise three sequentially connected two-dimensional convolutional layers; and modules S3 and S4 each comprise a self-attention layer followed by a fully connected layer. Residual skip connections exist between S0 and S1, S1 and S2, S2 and S3, S3 and S4, and between the self-attention layer and the fully connected layer within each of S3 and S4. Each convolutional neural network module of the guided backbone network consists of three two-dimensional convolutional layers.
Accordingly, inputting the sample leaf image into the auxiliary backbone network for feature extraction to obtain multi-layer auxiliary feature maps specifically comprises:

Step 2.1.1: input the sample leaf image into the first convolutional neural network module of the auxiliary backbone network to obtain the first-layer local feature map.

Step 2.1.2: input the first-layer local feature map into the second convolutional neural network module to obtain the second-layer local feature map.

Step 2.1.3: input the second-layer local feature map into the third convolutional neural network module to obtain the third-layer local feature map.

Step 2.1.4: input the third-layer local feature map into the first self-attention network module to obtain the first-layer global feature map.

Step 2.1.5: input the first-layer global feature map into the second self-attention network module to obtain the second-layer global feature map.
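The sketch below illustrates one way steps 2.1.1 to 2.1.5 could be realized in PyTorch. Channel widths, downsampling, and the attention-head count are assumptions, and the residual skips between adjacent modules are omitted for brevity; only the intra-module skips of S3 and S4 are shown.

```python
# Sketch of the auxiliary backbone (assumed PyTorch; parameters hypothetical).
import torch.nn as nn

def conv_block(in_c, out_c, n_layers):
    layers = []
    for i in range(n_layers):
        layers += [nn.Conv2d(in_c if i == 0 else out_c, out_c, 3, padding=1),
                   nn.BatchNorm2d(out_c), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class AttnBlock(nn.Module):
    """One self-attention layer plus one fully connected layer,
    each wrapped in a residual skip (as in modules S3 and S4)."""
    def __init__(self, dim, heads=4):  # head count assumed
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        t = t + self.attn(t, t, t)[0]     # residual around self-attention
        t = t + self.fc(t)                # residual around the FC layer
        return t.transpose(1, 2).reshape(b, c, h, w)

class AuxiliaryBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.s0 = conv_block(3, 64, 2)     # S0: two 2-D conv layers
        self.s1 = conv_block(64, 128, 3)   # S1: three 2-D conv layers
        self.s2 = conv_block(128, 256, 3)  # S2: three 2-D conv layers
        self.pool = nn.MaxPool2d(2)        # assumed downsampling between modules
        self.s3 = AttnBlock(256)           # S3: first self-attention module
        self.s4 = AttnBlock(256)           # S4: second self-attention module

    def forward(self, x):
        f0 = self.s0(x)                  # first-layer local feature map
        f1 = self.s1(self.pool(f0))      # second-layer local feature map
        f2 = self.s2(self.pool(f1))      # third-layer local feature map
        g1 = self.s3(self.pool(f2))      # first-layer global feature map
        g2 = self.s4(g1)                 # second-layer global feature map
        return f0, f1, f2, g1, g2
```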
Inputting the multi-layer auxiliary feature maps and the sample leaf image into the guided backbone network for feature extraction to obtain multi-layer fusion feature maps specifically comprises:

Step 2.2.1: input the first- to third-layer local feature maps, the first- and second-layer global feature maps, and the sample leaf image into the first convolutional neural network module of the guided backbone network to obtain the first-layer fusion feature map L0.

Step 2.2.2: input the second- and third-layer local feature maps, the first- and second-layer global feature maps, and the first-layer fusion feature map into the second convolutional neural network module to obtain the second-layer fusion feature map L1.

Step 2.2.3: input the third-layer local feature map, the first- and second-layer global feature maps, and the second-layer fusion feature map into the third convolutional neural network module to obtain the third-layer fusion feature map L2.

Step 2.2.4: input the first- and second-layer global feature maps and the third-layer fusion feature map into the fourth convolutional neural network module to obtain the fourth-layer fusion feature map L3.

Step 2.2.5: input the second-layer global feature map and the fourth-layer fusion feature map into the fifth convolutional neural network module to obtain the fifth-layer fusion feature map L4.
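At a high level, steps 2.2.1 to 2.2.5 amount to a loop that drops one more low-level auxiliary map at each stage. A sketch follows, where `merge` stands for the splice, element-wise addition, and channel batch normalization described next; all module names are hypothetical.

```python
# Sketch of the guided backbone forward pass (assumed PyTorch).
import torch.nn as nn

class GuidedBackbone(nn.Module):
    def __init__(self, merge_modules, conv_modules):
        super().__init__()
        self.merges = nn.ModuleList(merge_modules)  # five splice/add/CBN modules
        self.convs = nn.ModuleList(conv_modules)    # five modules of three 2-D conv layers

    def forward(self, image, aux_maps):
        # aux_maps = [f0, f1, f2, g1, g2] from the auxiliary backbone.
        x = image
        fused = []
        for k in range(5):
            x = self.merges[k](aux_maps[k:], x)  # stage k drops the k lowest maps
            x = self.convs[k](x)                 # yields L0 .. L4 in turn
            fused.append(x)
        return fused                             # [L0, L1, L2, L3, L4]
```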
Further, each residual skip connection between the auxiliary backbone network and the guided backbone network also includes a channel batch normalization module.

In this embodiment, before the multi-layer auxiliary feature maps and the corresponding intermediate image are input into the corresponding convolutional neural network module of the guided backbone network, the method further comprises:

(1) Splice the multi-layer auxiliary feature maps to obtain the corresponding spliced feature map, which is one of: the first- to third-layer local feature maps spliced with the first- and second-layer global feature maps; the second- and third-layer local feature maps spliced with the first- and second-layer global feature maps; the third-layer local feature map spliced with the first- and second-layer global feature maps; the first- and second-layer global feature maps spliced together; or the second-layer global feature map alone.

(2) Superimpose the spliced feature map element-wise onto the corresponding intermediate image to obtain the corresponding superimposed feature map; the intermediate image is one of: the sample leaf image, the first-layer fusion feature map, the second-layer fusion feature map, the third-layer fusion feature map, or the fourth-layer fusion feature map.

(3) Perform channel batch normalization on the superimposed feature map with the corresponding channel batch normalization module, and use the processed superimposed feature map as the input of the corresponding convolutional neural network module in the guided backbone network. Channel batch normalization suppresses unimportant channel features in the image, enabling effective fusion of multi-level feature maps across the two backbone networks.
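The patent does not give the internals of the channel batch normalization module; one plausible reading, consistent with weight contribution factors that suppress unimportant channels (and resembling normalization-based attention), is sketched below as an assumption.

```python
# Assumed channel batch normalization: BN scale factors (gamma) measure
# channel importance and are renormalized into contribution weights that
# gate the input channels. This is a guess at the mechanism, not the
# patent's definitive design.
import torch
import torch.nn as nn

class ChannelBatchNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = self.bn(x)
        gamma = self.bn.weight.abs()               # per-channel scale factors
        w = gamma / gamma.sum()                    # weight contribution factors
        return torch.sigmoid(y * w.view(1, -1, 1, 1)) * x
```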
That is, inputting the first- to third-layer local feature maps, the first- and second-layer global feature maps, and the sample leaf image into the first convolutional neural network module of the guided backbone network specifically comprises: splicing the first- to third-layer local feature maps with the first- and second-layer global feature maps to obtain a first spliced feature map; superimposing the first spliced feature map element-wise onto the sample leaf image to obtain a first superimposed feature map; and performing channel batch normalization on the first superimposed feature map with the first channel batch normalization module before inputting it into the first convolutional neural network module of the guided backbone network for feature extraction.

The second to fifth stages proceed analogously. At stage two, the second- and third-layer local feature maps are spliced with the first- and second-layer global feature maps, superimposed element-wise onto the first-layer fusion feature map, and channel batch normalized before entering the second convolutional module. At stage three, the third-layer local feature map is spliced with the first- and second-layer global feature maps, superimposed onto the second-layer fusion feature map, and normalized before entering the third convolutional module. At stage four, the first- and second-layer global feature maps are spliced together, superimposed onto the third-layer fusion feature map, and normalized before entering the fourth convolutional module. At stage five, the second-layer global feature map serves directly as the fifth spliced feature map, which is superimposed onto the fourth-layer fusion feature map and normalized before entering the fifth convolutional module.
Denote the second-layer global feature map extracted by S4 as feature map S4', the first-layer global feature map extracted by S3 as S3', the third-layer local feature map extracted by S2 as S2', the second-layer local feature map extracted by S1 as S1', and the first-layer local feature map extracted by S0 as S0'.

Further, taking the first spliced feature map as an example, splicing the multi-layer auxiliary feature maps proceeds as follows. First, a two-dimensional convolutional layer changes the channel dimension of S4' to match that of S3', and linear-interpolation upsampling brings S4' to the resolution of S3'; the two maps, now identical in channel dimension and resolution, are added element-wise to give feature map S'. Then S' is passed through a two-dimensional convolutional layer to match the channel dimension of S2' and upsampled by interpolation to the resolution of S2', and the two are added element-wise to give S''. Proceeding in the same way through the remaining maps finally yields the spliced feature map S'''' (the first spliced feature map).

Correspondingly, taking the first superimposed feature map as an example, superimposing the spliced feature map onto the corresponding intermediate image proceeds as follows: S'''' is passed through a two-dimensional convolutional layer to match the channel dimension of the intermediate image (the sample leaf image) and upsampled by interpolation to its resolution; the two, now identical in channel dimension and resolution, are added element-wise to give the first superimposed feature map.
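A sketch of this progressive splice and superposition follows, assuming PyTorch. For brevity the 1x1 projection convolutions are created inline; a real module would register them in `__init__` so their weights are trained.

```python
# Illustrative splice: deepest-first pairwise channel matching (1x1 conv),
# bilinear upsampling to the next map's resolution, and element-wise addition.
import torch.nn as nn
import torch.nn.functional as F

def splice(maps):
    # maps ordered deepest-first, e.g. [S4', S3', S2', S1', S0'] for the
    # first spliced feature map; appending the intermediate image as the
    # last element yields the corresponding superimposed feature map.
    out = maps[0]
    for nxt in maps[1:]:
        proj = nn.Conv2d(out.shape[1], nxt.shape[1], kernel_size=1)  # match channels
        out = F.interpolate(proj(out), size=nxt.shape[2:],
                            mode="bilinear", align_corners=False)    # match resolution
        out = out + nxt                                              # element-wise add
    return out
```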
Step 3: input the multi-layer auxiliary feature maps of each sample leaf image into the image segmentation network for predictive segmentation to obtain its auxiliary image segmentation mask; input the multi-layer fusion feature maps of each sample leaf image into the image segmentation network for predictive segmentation to obtain its predicted image segmentation mask.

Fig. 4 is a schematic diagram of the overall network structure of the leaf segmentation model provided by an embodiment of the present invention. As shown in Fig. 4, the predicted image segmentation mask is generated as follows. First, the feature map L4 extracted by the composite backbone network is passed to the atrous spatial pyramid pooling (ASPP) module in the encoder of the image segmentation network, where four convolution operations with different dilation rates and a global average pooling operation produce five multi-scale receptive-field feature maps. These five maps are concatenated along the channel dimension, aggregated by a two-dimensional convolution, and then upsampled by linear interpolation to four times their spatial size. Next, the feature map L1 extracted by the composite backbone network is compressed in the channel dimension by a two-dimensional convolution and residually concatenated, along the channel dimension, with the upsampled encoder feature map. The concatenated feature map is processed by a two-dimensional convolution, and the weight contribution factors of pixel batch normalization suppress unimportant pixel features. Finally, linear-interpolation upsampling expands the feature map to the input image size to obtain the predicted image segmentation mask.
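A condensed sketch of this encoder-decoder path follows. The four dilation rates, channel widths, and the pixel batch normalization head are assumptions; the patent names the operations but not their parameters.

```python
# Sketch of the ASPP encoder and decoder fusion of Fig. 4 (assumed PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_c, out_c=256, rates=(1, 6, 12, 18)):  # rates assumed
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_c, out_c, 3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r)
            for r in rates)
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_c, out_c, 1))
        self.project = nn.Conv2d(out_c * 5, out_c, 1)  # aggregate the five branches

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        g = F.interpolate(self.gap(x), size=x.shape[2:],
                          mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))

def decode(aspp, compress, head, l1, l4, out_size):
    x = F.interpolate(aspp(l4), scale_factor=4,
                      mode="bilinear", align_corners=False)  # 4x upsample
    x = torch.cat([compress(l1), x], dim=1)  # residual splice in the channel dim
    x = head(x)                              # conv + pixel batch normalization (assumed)
    return F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)
```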
Correspondingly, when generating the auxiliary image segmentation mask, the feature map extracted by module S4 of the auxiliary backbone network is passed to the ASPP module in the encoder of the image segmentation network (processed analogously to L4), and the feature map extracted by module S1 is compressed in the channel dimension by a two-dimensional convolution and residually concatenated with the upsampled encoder feature map (processed analogously to L1).
Step 4: determine the comprehensive cross-entropy loss from the ground-truth, auxiliary, and predicted image segmentation masks; the comprehensive cross-entropy loss comprises the cross-entropy loss of the composite backbone network and the cross-entropy loss of the auxiliary backbone network.

Step 5: train the hybrid neural segmentation network to minimize the comprehensive cross-entropy loss, obtaining the leaf segmentation model, which comprises the trained composite backbone network and the trained image segmentation network.
具体地,所述综合交叉熵损失的计算公式为:Specifically, the formula for calculating the comprehensive cross-entropy loss is:
L=LLead+λ·LAssist;L=L Lead +λ·L Assist ;
其中,L表示综合交叉熵损失;LLead表示复合主干网络的交叉熵损失,由所述真实图像分割掩码和所述预测图像分割掩码确定;LAssist表示辅助主干网络的交叉熵损失,由所述真实图像分割掩码和所述辅助图像分割掩码确定;λ表示辅助监督的权重。Among them, L represents the comprehensive cross-entropy loss; L Lead represents the cross-entropy loss of the composite backbone network, which is determined by the real image segmentation mask and the predicted image segmentation mask; L Assist represents the cross-entropy loss of the auxiliary backbone network, which is determined by the real image segmentation mask and the auxiliary image segmentation mask; λ represents the weight of the auxiliary supervision.
The cross-entropy loss of the composite backbone network is calculated as (the original formula image is not reproduced in this text; the standard per-pixel binary cross-entropy form consistent with the variable definitions below is shown):

L_Lead = −(1/m) · Σ_{i=1}^{m} [ y_i·log(p_i) + (1 − y_i)·log(1 − p_i) ]

The cross-entropy loss of the auxiliary backbone network is calculated in the same form:

L_Assist = −(1/m) · Σ_{i=1}^{m} [ y_i·log(q_i) + (1 − y_i)·log(1 − q_i) ]

where L_Lead denotes the cross-entropy loss of the composite backbone network; L_Assist denotes the cross-entropy loss of the auxiliary backbone network; m denotes the total number of pixels in the input sample leaf image; y_i denotes the real category value of the i-th pixel, the real category values of all pixels constituting the real image segmentation mask of the input sample leaf image; p_i denotes the predicted category value of the i-th pixel, the predicted category values of all pixels constituting the predicted image segmentation mask; q_i denotes the auxiliary category value of the i-th pixel, the auxiliary category values of all pixels constituting the auxiliary image segmentation mask; and i denotes the index of a pixel in the input sample leaf image.
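For illustration, a minimal PyTorch sketch of this joint supervision follows, assuming the binary cross-entropy form reconstructed above; the function name and the value λ = 0.4 are illustrative assumptions, not values stated in the patent.

```python
import torch.nn.functional as F

def comprehensive_loss(pred_logits, aux_logits, true_mask, lam=0.4):
    """pred_logits, aux_logits: raw mask logits of shape (N, 1, H, W);
    true_mask: float tensor of 0/1 labels with the same shape."""
    l_lead = F.binary_cross_entropy_with_logits(pred_logits, true_mask)
    l_assist = F.binary_cross_entropy_with_logits(aux_logits, true_mask)
    return l_lead + lam * l_assist  # L = L_Lead + λ·L_Assist
```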
In this embodiment, supervising and optimizing network training with two loss functions enables the neural network to converge faster and improves the efficiency of network training.
Fig. 5 is a detailed flowchart of the method for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention. As shown in Fig. 5, in practical applications, the specific steps of the method may also be as follows:
Step 501: Acquire first-type plant leaf images and second-type plant leaf images under complex backgrounds, where the first-type plant leaf images include images of healthy plant leaves, and the second-type plant leaf images include images of plant leaves infected by diseases and insect pests and images of stacked plant leaves under uneven illumination.
Step 502: Perform denoising, annotation, and augmentation on the first-type and second-type plant leaf images, and divide the processed plant leaf images into a training data set and a validation data set at a set ratio, preferably 8:2. The denoising uses an existing median-filter algorithm; the augmentation rotates and mirror-flips the obtained images to generate more training data, as sketched below.
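A minimal OpenCV sketch of this preprocessing follows for illustration; the kernel size, the single rotation/flip pair, and the file handling are illustrative assumptions rather than the patent's exact settings.

```python
import random
import cv2

def preprocess_and_split(image_paths, split=0.8):
    """Denoise, augment, and split images at an 8:2 train/val ratio."""
    samples = []
    for path in image_paths:
        img = cv2.imread(path)
        img = cv2.medianBlur(img, 5)  # median-filter denoising
        samples.append(img)
        samples.append(cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE))  # rotation
        samples.append(cv2.flip(img, 1))  # horizontal mirror flip
    random.shuffle(samples)
    k = int(len(samples) * split)
    return samples[:k], samples[k:]  # training set, validation set
```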
Step 503: Feed the plant leaf images in the training data set to the auxiliary backbone network, which contains convolutional layers and self-attention layers, within the composite backbone network to extract feature maps. As shown in Fig. 2, the image first passes through the S0, S1, and S2 convolutional neural network modules of the auxiliary backbone network to extract local and low-level features; the resulting feature maps are then passed into the S3 and S4 self-attention modules of the auxiliary backbone network to extract the global and semantic features of the image.
Step 504: Pass the plant leaf image and the feature maps extracted by the S0, S1, S2, S3, and S4 modules of the auxiliary backbone network into the CNN-based guided backbone network to extract feature maps. As shown in Fig. 3, while the convolutional modules of the guided backbone network extract local and semantic features bottom-up, the feature maps from the auxiliary S0 to S4 modules are concatenated at the short-skip residual connections to obtain the L0, L1, L2, L3, and L4 feature maps, realizing multi-level fusion of high- and low-level semantic features, as sketched below.
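The following minimal PyTorch sketch shows this composite-backbone data flow for illustration; the stage modules are placeholders supplied by the caller, the 1x1 fusion convolutions stand in for the patent's exact fusion layers, and matching spatial sizes between corresponding stages are assumed.

```python
import torch
import torch.nn as nn

class CompositeBackbone(nn.Module):
    """Sketch: auxiliary stages S0-S4 feed a guided CNN via skip concatenation."""
    def __init__(self, aux_stages, lead_stages, fuse_convs):
        super().__init__()
        # aux_stages: S0-S2 convolutional, S3-S4 self-attention modules.
        # lead_stages: the guided CNN stages.
        # fuse_convs: 1x1 convs restoring channel counts after each concat.
        self.aux = nn.ModuleList(aux_stages)
        self.lead = nn.ModuleList(lead_stages)
        self.fuse = nn.ModuleList(fuse_convs)

    def forward(self, x):
        # Auxiliary pass: local/low-level then global/semantic features.
        aux_feats, a = [], x
        for stage in self.aux:
            a = stage(a)
            aux_feats.append(a)
        # Guided pass: concatenate the matching auxiliary feature map at
        # each short-skip residual connection to form L0-L4.
        lead_feats, l = [], x
        for stage, s_i, fuse in zip(self.lead, aux_feats, self.fuse):
            l = stage(l)
            l = fuse(torch.cat([l, s_i], dim=1))
            lead_feats.append(l)
        return lead_feats  # multi-level fused feature maps L0-L4
```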
Step 505: Use the weight contribution factors in Channel Batch Normalization to suppress unimportant channel features in the L0 to L4 feature maps obtained in step 504, so that the multi-level feature maps from the different backbone networks are fused effectively. Fig. 6 is a schematic diagram of the calculation of the batch-normalization-based weight contribution matrix provided by an embodiment of the present invention. As shown in Fig. 6, the importance of a channel or pixel is calculated as (the original formula image is not reproduced in this text; the normalized-weight form consistent with the variable definitions below is shown):

ω_i = γ_i / Σ_j γ_j

where γ_i denotes the weight value of the i-th channel (or pixel) computed in batch normalization, γ_j denotes the weight value of the j-th channel (or pixel), and ω_i denotes the importance of the i-th channel (or pixel).
Specifically, after the importance matrix ω is obtained, the batch-normalized feature map is multiplied element-wise by ω, and the result is passed through the sigmoid nonlinearity to obtain the final weight contribution matrix. A larger value in the weight contribution matrix indicates higher importance, and a smaller value indicates relative unimportance. Multiplying the weight contribution matrix element-wise with the original feature map then suppresses unimportant channel or pixel features.
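A minimal PyTorch sketch of this reweighting (channel variant) follows for illustration; the module name is illustrative, and taking the absolute value of the BN scales is an implementation choice to keep the importances non-negative, not a detail stated in the patent.

```python
import torch
import torch.nn as nn

class ChannelWeightContribution(nn.Module):
    """Sketch: suppress unimportant channels using BN weight contribution."""
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = self.bn(x)
        gamma = self.bn.weight.abs()          # per-channel BN scale γ_i
        omega = gamma / gamma.sum()           # importance ω_i = γ_i / Σ_j γ_j
        # Element-wise product with ω, then sigmoid: weight contribution matrix.
        w = torch.sigmoid(y * omega.view(1, -1, 1, 1))
        return x * w                          # suppress unimportant channels
```

The pixel variant used in the decoder works the same way, except that the weights are computed per spatial position rather than per channel.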
Step 506: Pass the feature map L4 extracted by the composite backbone network (specifically, by the guided backbone network within it) into the ASPP module in the segmentation network encoder for learning. In the ASPP module, L4 is processed by four convolution operations with different atrous rates and a global average pooling operation, yielding five multi-scale receptive-field feature maps. These are concatenated along the channel dimension and aggregated with a two-dimensional convolution, and linear-interpolation upsampling then expands the spatial size of the feature map to four times the original. The channel dimension of the feature map L1 extracted by the composite backbone network (again, by the guided backbone network within it) is compressed with a two-dimensional convolution, and the result is residually concatenated along the channel dimension with the upsampled feature map. The concatenated feature map is processed by a two-dimensional convolution, and the weight contribution factor in Pixel Batch Normalization suppresses unimportant pixel features. Finally, linear-interpolation upsampling expands the spatial size of the feature map to the size of the input image to obtain the predicted segmentation mask. See Fig. 4 for the specific process.
Step 507: Supervise and optimize network training using two loss functions.

The hybrid neural segmentation network, consisting of the composite backbone network and the image segmentation network, is optimized with a cross-entropy loss (cost) function.

In the model, the feature maps extracted by the composite backbone network are used to produce the original cross-entropy loss L_Lead, and the feature maps extracted by the auxiliary backbone network are used to produce the auxiliary cross-entropy loss L_Assist. The original cross-entropy loss dominates the network optimization, while the auxiliary cross-entropy loss accelerates network convergence during optimization. A weight is added to balance the auxiliary supervision. The loss function of the hybrid neural segmentation network is therefore:
L = L_Lead + λ·L_Assist

where L_Lead denotes the cross-entropy loss of the composite backbone network (the original cross-entropy loss), L_Assist denotes the cross-entropy loss of the auxiliary backbone network (the auxiliary cross-entropy loss), and λ denotes the weight of the auxiliary supervision.
Step 508: Evaluate the segmentation results of the leaf segmentation model using the Boundary IoU.
The Boundary IoU is calculated as (the original formula image is not reproduced in this text; the standard Boundary IoU definition consistent with the variable definitions below is shown):

BIoU = |(G_d ∩ G) ∩ (P_d ∩ P)| / |(G_d ∩ G) ∪ (P_d ∩ P)|

where BIoU denotes the Boundary IoU; G denotes the real category mask of the plant leaf image; P denotes the predicted category mask of the plant leaf image; d denotes the pixel width of the boundary region; G_d denotes the set of all pixels within d pixels of the boundary of the real category mask; and P_d denotes the set of all pixels within d pixels of the boundary of the predicted category mask.
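For illustration, a minimal NumPy/SciPy sketch of this metric follows, assuming binary masks; the erosion-based band extraction (one erosion iteration per pixel of boundary width) approximates G_d ∩ G and P_d ∩ P, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_iou(gt, pred, d=2):
    """gt, pred: boolean (H, W) masks; d: boundary width in pixels."""
    def inner_band(mask):
        # Pixels of the mask within d pixels of its border: the mask
        # minus its d-step erosion approximates G_d ∩ G (or P_d ∩ P).
        eroded = binary_erosion(mask, iterations=d, border_value=0)
        return mask & ~eroded

    g_band, p_band = inner_band(gt), inner_band(pred)
    inter = np.logical_and(g_band, p_band).sum()
    union = np.logical_or(g_band, p_band).sum()
    return inter / union if union > 0 else 1.0
```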
In summary, the present invention proposes a method for accurately segmenting plant leaves under complex backgrounds based on convolutional neural networks and Transformer networks. The convolutional layers and self-attention layers in the composite backbone network extract the local and global features of leaf images, and a multi-level connection structure effectively fuses high- and low-level feature maps, fully learning the edge information and multi-level features of plant leaf images under complex backgrounds. The method retains the good generalization and convergence of convolutional layers while exploiting the broad global receptive field and higher model capacity of self-attention layers. It achieves high segmentation accuracy and speed for leaves in various states under complex backgrounds such as varying illumination and background-noise interference. The segmentation results of the first-type and second-type leaf images extracted by the method provided by the present invention are shown in Figs. 7 and 8.
Fig. 9 is a structural diagram of the system for accurately segmenting plant leaves under complex backgrounds provided by an embodiment of the present invention. As shown in Fig. 9, the present invention also provides a system for accurately segmenting plant leaves under complex backgrounds, which specifically includes:

A target leaf image acquisition module 901, configured to acquire a target leaf image, where the target leaf image is an image of a plant leaf under a complex background.
An image segmentation mask prediction module 902, configured to input the target leaf image into the leaf segmentation model to obtain the predicted image segmentation mask of the target leaf image. The predicted image segmentation mask is used to segment the target leaf image, thereby providing a data basis for obtaining leaf phenotype parameters. The leaf segmentation model is obtained by training a hybrid neural segmentation network, which includes a composite backbone network and an image segmentation network. The trained composite backbone network performs feature extraction on the target leaf image to obtain its multi-level fusion feature maps, and the trained image segmentation network determines the predicted image segmentation mask of the target leaf image from these multi-level fusion feature maps. The composite backbone network includes an auxiliary backbone network and a guided backbone network connected by multi-level residual skip connections; the auxiliary backbone network is constructed from a convolutional neural network and a Transformer network, and the guided backbone network is constructed from a convolutional neural network.
In practical applications, the specific structure of the system provided by the present invention may also be as shown in Fig. 10, including: an acquisition module 1001, a preprocessing module 1002, a feature map extraction module 1003, an encoder module 1004, a decoder module 1005, a training and optimization module 1006, and a prediction module 1007. The acquisition module 1001 collects plant leaf images under complex backgrounds; the preprocessing module 1002 annotates and denoises the leaf images; the feature map extraction module 1003, encoder module 1004, and decoder module 1005 extract and learn the feature maps of the leaf images; the training and optimization module 1006 trains and optimizes the network; and the prediction module 1007 predicts the leaf segmentation masks of plant leaf images under complex backgrounds. The system can therefore automatically segment plant leaves under complex backgrounds at scale and visualize the results.
Specifically, the acquisition module 1001 is configured to acquire first-type and second-type plant leaf images under complex backgrounds, where the first-type plant leaf images include images of healthy plant leaves, and the second-type plant leaf images include images of plant leaves infected by diseases and insect pests and images of stacked plant leaves under uneven illumination.

The preprocessing module 1002 is configured to denoise, annotate, and augment the first-type and second-type plant leaf images, and to divide the processed leaf images into a training data set and a validation data set at a ratio of 8:2. The denoising uses an existing median-filter algorithm, and the augmentation rotates and mirror-flips the obtained images to generate more training data.
The feature map extraction module 1003 is configured to first send the plant leaf image to the auxiliary backbone network, which contains convolutional layers and self-attention layers, within the composite backbone network. The image passes through the S0, S1, and S2 convolutional neural network modules of the auxiliary backbone network to extract local and low-level features, and the resulting feature maps are passed into the S3 and S4 self-attention modules to extract global and semantic features. The leaf image and the feature maps extracted by the S0 to S4 modules of the auxiliary backbone network are then passed into the CNN-based guided backbone network. While the convolutional modules of the guided backbone network extract local and semantic features bottom-up, the feature maps from the auxiliary S0 to S4 modules are concatenated at the short-skip residual connections, realizing multi-level fusion of high- and low-level semantic feature maps. Finally, the weight contribution factor in Channel Batch Normalization suppresses unimportant channel features.
The encoder module 1004 is configured to pass the feature maps extracted by the composite backbone network into the encoder of the image segmentation network for learning. In the encoder, the extracted feature maps are fed into the Atrous Spatial Pyramid Pooling (ASPP) module to obtain multi-scale receptive-field feature maps; the multi-scale feature maps are then concatenated, aggregated by a two-dimensional convolution, and expanded using linear-interpolation upsampling.

The decoder module 1005 is configured to residually concatenate, along the channel dimension, the feature map expanded by the encoder module 1004 with the feature map passed into the decoder. The concatenated feature map is processed by a two-dimensional convolution, the weight contribution factor in Pixel Batch Normalization suppresses unimportant pixel features, and the result is finally expanded by linear-interpolation upsampling to obtain the predicted segmentation mask.
The training and optimization module 1006 is configured to train and optimize the hybrid neural segmentation network based on convolutional neural networks and Transformer networks, using the training data set obtained by the preprocessing module 1002 and the two loss functions, to obtain the leaf segmentation model. See steps 503 to 507 for the specific process.

The prediction module 1007 is configured to input first-type and second-type plant leaf images under complex backgrounds, as leaf images to be processed, into the leaf segmentation model, segment the images to obtain their predicted leaf segmentation masks, and evaluate the results using the Boundary IoU.
The method and system disclosed by the present invention for accurately segmenting plant leaves under complex backgrounds, based on convolutional neural networks and Transformer networks, perform leaf segmentation with convolutional layers and self-attention layers. The method first passes the plant leaf image into the composite backbone network for feature extraction; the composite backbone network consists of an auxiliary backbone network and a guided backbone network. In the method, the auxiliary backbone network is a hybrid neural network based on convolutional neural networks and Transformers, and the guided backbone network is a convolutional neural network. The plant leaf image is first processed by the auxiliary backbone network, which contains convolutional layers and self-attention layers. Convolutional layers typically have deep nonlinear stacking and inductive-bias properties, and thus tend to generalize better and converge faster; self-attention layers typically divide the image into patches, vectorize them, and perform matrix operations on the patch vectors, and therefore offer a broad global receptive field and higher model capacity. In the composite backbone network, a multi-level cross structure with parallel flows connects the auxiliary and guided backbone networks, realizing effective fusion of feature maps at different levels. Because convolution alone cannot fully fuse the multi-level feature maps flowing between the backbone networks, the method uses batch-normalization weight contribution factors to suppress unimportant channels or pixels, merging multi-level features simply and effectively. Finally, the method introduces auxiliary supervision, which ensures the contribution of the auxiliary backbone network to the output of the segmentation model; the joint supervision of the original loss and the auxiliary loss enables the segmentation model to learn more discriminative features and converge quickly. With these techniques, the method achieves fast and accurate segmentation of plant leaves in different states even under varying lighting conditions and occlusion in complex environments. Experiments show that, when segmenting multi-state plant leaf images under complex backgrounds, the method reaches a Mean Intersection over Union of 94.0% and a Boundary IoU of 60.8%.
Compared with the prior art, the method and system for accurately segmenting plant leaves under complex backgrounds provided by the present invention have the following characteristics:
1. The leaf segmentation model provided by the present invention is built on convolutional neural networks and Transformer networks and achieves accurate segmentation of plant leaves under complex backgrounds.
2. The present invention uses a Transformer network to process leaf images and obtain the global context of the image.
3. The present invention processes leaf images with a hybrid neural network (the auxiliary backbone network) containing convolutional layers and self-attention layers, ensuring a global receptive field while retaining better generalization and faster convergence.
4. The present invention builds a composite backbone network with a multi-level structure to process leaf images, passing shallow and deep feature information back and forth and fusing the feature information of multiple backbone networks.
5. The present invention uses two kinds of loss supervision in network optimization and training, achieving faster convergence of the neural network.
6. The neural network built by the present invention can load existing pre-trained weights at no extra cost, enabling rapid transfer of information.
7. The present invention uses the weight contribution factor in Batch Normalization to suppress unimportant information in leaf images.
8. The present invention uses the Boundary Intersection over Union (BIoU) to evaluate segmentation quality, distinguishing the completeness of predicted image segmentation masks in finer detail.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the core idea of the present invention. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.