CN114821321A - Leaf hyperspectral image classification and regression method based on multi-scale cascaded convolutional neural network - Google Patents
Leaf hyperspectral image classification and regression method based on multi-scale cascaded convolutional neural network
- Publication number: CN114821321A (application CN202210450076.9A)
- Authority: CN (China)
- Prior art keywords: cnn, scale, network, convolution, hyperspectral
- Prior art date: 2022-04-27
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/2148 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
Description
Technical Field
The invention belongs to the field of plant science research, and in particular relates to a leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network.
Background
Biological processes in plant leaves, such as photosynthesis and transpiration, are closely related to leaf biochemical parameters such as chlorophyll content and water content. Taking basil and pepper leaves as examples: basil is an economically important herbaceous spice crop widely cultivated in greenhouses in southern China, and its leaves can be eaten fresh or dried for medicinal use; the chlorophyll content of basil leaves directly affects the growth and development, nutritional status and economic yield of the plant. Pepper is a shallow-rooted plant with a high degree of suberization and poor drought tolerance, and severe water stress can cause serious damage to its physiological mechanisms. Because plant leaves are affected by these biochemical parameters, their visible-near-infrared spectra show correspondingly different response characteristics. Hyperspectral imaging, a technology that unifies imagery and spectroscopy, simultaneously acquires the continuous spectrum of every pixel in an image and a continuous image for every spectral band; its spectral dimension consists of hundreds of contiguous bands from the visible to the infrared range. With advantages such as high resolution, rich band information and fast, non-destructive measurement, it has been widely applied in plant phenotyping research. However, hyperspectral image data cubes suffer from high spectral dimensionality, information redundancy between adjacent bands and limited labelled samples, while the traditional hyperspectral image analysis workflow is cumbersome and its results depend on expert experience. An efficient method for leaf hyperspectral image classification and regression is therefore needed, laying the foundation for subsequent prediction of plant biochemical parameter contents and the establishment of stress diagnosis models.
At present, in the field of plant science research, deep learning models represented by convolutional neural networks (CNNs) are increasingly applied to the analysis of hyperspectral images. A CNN autonomously learns and extracts local and global features of the data through multiple layers of convolution and pooling operations. According to the structure of the convolution kernel, CNNs can be divided into one-dimensional (1D-CNN), two-dimensional (2D-CNN) and three-dimensional (3D-CNN) networks. The 1D-CNN is currently the most widely used model for hyperspectral image analysis; it extracts and models deep spectral features from one-dimensional spectral data and usually requires the average spectrum of a region of interest (ROI) or individual pixel spectra to be extracted manually in advance. It has been applied to the classification of cotton seeds, the identification of low-temperature stress in maize plants, the prediction of water content in maize plants, the quantification of total anthocyanins in black wolfberry, and the detection of pesticide residues on leek leaves. The 2D-CNN mainly extracts deep spatial features and can analyse and model original or dimension-reduced hyperspectral images; it has been applied to the classification of apple leaf condition, the identification of disease-resistant rice seeds, the identification of diseased regions in potato canopies, and the evaluation of maize seed vigour. Compared with the 1D-CNN and the 2D-CNN, which learn features only in the spectral or the spatial dimension respectively, the 3D-CNN can extract deep joint spectral-spatial features end to end while preserving the continuity of information across adjacent spectral bands, making it particularly suitable for analysing the three-dimensional data cubes of hyperspectral images, which are continuous in both the spatial and the spectral dimensions.
At present, 3D-CNNs are rarely applied in plant science; reports are limited to soybean stem disease detection and cotton leaf aphid detection. This is because labelled samples are difficult and time-consuming to obtain in plant research, and 3D-CNNs have large numbers of parameters and high computational complexity, so they easily overfit the limited labelled training samples available in plant studies and their generalization performance degrades. Therefore, in the field of plant science research, an improved 3D-CNN model for leaf hyperspectral image classification and regression is urgently needed, one that reduces the computational complexity of the model and improves its generalization performance when training samples are limited.
Summary of the Invention
The purpose of the present invention is to provide a leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network, so as to solve the above technical problems.
To solve the above technical problems, the specific technical scheme of the leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network of the present invention is as follows:
A leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network comprises the following steps:
S1: building a hyperspectral imaging system;
S2: acquiring hyperspectral images of basil leaf and pepper leaf samples and preprocessing the images;
S3: introducing dilated convolution into the 3D-CNN to enlarge the receptive field of the convolution kernel without increasing the network parameters or the computational complexity, constructing spectral-spatial feature extraction structures of different scales to achieve multi-scale feature fusion, and determining the optimal multi-scale 3D-CNN network structure;
S4: cascading a 1D-CNN network after the optimal multi-scale 3D-CNN network of S3, taking three-dimensional data as input, learning more abstract spectral features on the basis of joint spectral-spatial feature extraction, and exploring the optimal framework of the proposed multi-scale 3D-1D-CNN network;
S5: comparing the optimal framework of the multi-scale 3D-1D-CNN network of S4 with baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN network models to verify the effectiveness of the method.
Further, S1, building the hyperspectral imaging system, includes acquiring hyperspectral images of the samples, setting the spectral band range and the spatial resolution, and correcting the raw hyperspectral images.
Further, the hyperspectral imaging system comprises a SNAPSCAN VNIR spectral camera, a 35 mm lens, a 150 W ring-shaped tungsten-halogen light source, image acquisition software and an image acquisition platform. The spectral range is 470-900 nm with 140 bands in total. The SNAPSCAN VNIR camera acquires hyperspectral images of a sample rapidly with a built-in scanning imaging mode that requires no relative displacement between the camera and the sample; its maximum spatial resolution is 3650×2048, and the actual spatial resolution during data acquisition is adjusted according to the leaf sample size. The vertical distance between the 35 mm lens and the sample is 34.5 cm, and the exposure time is set to 30 ms. The raw hyperspectral images acquired by the system are subjected to the following black-and-white correction to calculate the reflectance of the sample and to reduce the influence of illumination and camera dark current on the spectra acquired from different samples:
R = (R0 − D) / (W − D)
where R0 and R are the hyperspectral images before and after correction, respectively, and W and D are the white-reference image and the dark-current reference image, respectively.
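For illustration, a minimal NumPy sketch of this black-and-white correction is given below; the array names, shapes and clipping are assumptions, not part of the patent text:

```python
import numpy as np

def black_white_correction(raw, white, dark, eps=1e-8):
    """Convert a raw hyperspectral cube to reflectance: R = (R0 - D) / (W - D).

    raw   : raw hyperspectral image, shape (H, W, bands)
    white : white-reference image acquired under the same settings
    dark  : dark-current reference image
    """
    raw = raw.astype(np.float32)
    white = white.astype(np.float32)
    dark = dark.astype(np.float32)
    reflectance = (raw - dark) / (white - dark + eps)  # eps avoids division by zero
    return np.clip(reflectance, 0.0, 1.0)
```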
Further, for the basil leaves in S2, sweet basil cultivated under artificial LED lighting was subjected to three different light-intensity treatments; data collection was carried out when the basil had grown for 40 days, the relative chlorophyll value was measured, and 540 hyperspectral images were collected. For the pepper leaves in S2, two water treatments were applied to healthy pepper plants of uniform growth that had been cultivated for 50 days: the control group was irrigated normally while the experimental group underwent five days of sustained drought, and 600 hyperspectral images were collected.
The hyperspectral images and the corresponding label data pairs are read by a custom Python function.
Further, the image preprocessing in S2 includes background segmentation, noise removal and size adjustment.
The background segmentation uses the 800 nm near-infrared band, where the difference in spectral reflectance between leaf and background is largest, as the threshold band, with the threshold set to 0.15.
The noise removal uses the morphological transformations in opencv-python.
The size adjustment uniformly reduces the spatial dimensions to 120×120.
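A possible preprocessing sketch with opencv-python and NumPy is shown below; the band index for 800 nm, the structuring-element size and the interpolation choice are illustrative assumptions:

```python
import cv2
import numpy as np

def preprocess_leaf_cube(cube, band_800nm, threshold=0.15, out_size=(120, 120)):
    """Background segmentation at 800 nm, morphological noise removal, resize to 120x120.

    cube       : reflectance cube, shape (H, W, bands)
    band_800nm : index of the band closest to 800 nm (depends on the band list)
    """
    mask = (cube[:, :, band_800nm] > threshold).astype(np.uint8)   # leaf vs. background
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)          # remove isolated noise pixels
    cube = cube * mask[:, :, None]                                 # zero out the background
    # resize every band to the common 120x120 spatial size
    resized = np.stack(
        [cv2.resize(cube[:, :, b], out_size, interpolation=cv2.INTER_AREA)
         for b in range(cube.shape[2])], axis=-1)
    return resized
```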
Further, S2 includes dividing the preprocessed images into a training set, a validation set and a test set by random sampling: 380 basil leaf samples are randomly selected for training, 80 for validation and 80 for testing; 400 pepper leaf samples are randomly selected for training, 100 for validation and 100 for testing. The data sets are divided into multiple batches with a batch size of 8.
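The random split and batching could be reproduced with a sketch like the following; PyTorch is assumed as the framework, which the patent does not name:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

def split_and_batch(cubes, labels, n_train, n_val, n_test, batch_size=8, seed=0):
    """Randomly split cubes/labels, e.g. 380/80/80 (basil) or 400/100/100 (pepper)."""
    idx = np.random.default_rng(seed).permutation(len(cubes))
    parts = np.split(idx, [n_train, n_train + n_val])   # train / val / test indices
    loaders = []
    for part, shuffle in zip(parts, (True, False, False)):
        x = torch.as_tensor(cubes[part], dtype=torch.float32)
        y = torch.as_tensor(labels[part])
        loaders.append(DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=shuffle))
    return loaders  # train_loader, val_loader, test_loader
```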
Further, the construction of spectral-spatial feature extraction structures of different scales in S3 embeds a standard 3D convolution and a dilated 3D convolution in parallel in the same network layer, each occupying 50% of the channels; the feature maps output by the two kinds of kernels are batch-normalized, concatenated along the channel dimension and passed through a ReLU non-linear activation.
Further, the dilated 3D convolution in S3 is realized by inserting d−1 zero-valued weights between adjacent weights in all rows and columns of the standard 3D convolution, where d is the dilation factor; this enlarges the receptive field of the kernel without adding network parameters or losing information. The standard 3D convolution uses a 3×3×3 kernel with d = 1 and a receptive field of 3×3×3.
The dilated 3D convolution also uses a 3×3×3 kernel, and dilated structures with d = 2, 3 and 4 are tested, corresponding to receptive fields of 5×5×5, 7×7×7 and 9×9×9, respectively; kernels with different receptive fields extract spectral-spatial features of different scales from the feature maps.
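A minimal PyTorch sketch of such a two-branch module (standard plus dilated 3×3×3 convolution, half of the output channels each, per-branch batch normalization, concatenation, ReLU) is given below; the class name and the exact padding choices are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleConv3d(nn.Module):
    """Parallel standard (d=1) and dilated 3x3x3 convolutions, each with half the output channels."""

    def __init__(self, in_channels, out_channels, dilation=2, stride=1):
        super().__init__()
        half = out_channels // 2
        self.standard = nn.Conv3d(in_channels, half, kernel_size=3, stride=stride,
                                  padding=1, dilation=1, bias=False)
        self.dilated = nn.Conv3d(in_channels, out_channels - half, kernel_size=3, stride=stride,
                                 padding=dilation, dilation=dilation, bias=False)
        self.bn_standard = nn.BatchNorm3d(half)
        self.bn_dilated = nn.BatchNorm3d(out_channels - half)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        a = self.bn_standard(self.standard(x))      # 3x3x3 receptive field
        b = self.bn_dilated(self.dilated(x))        # 5x5x5 receptive field for dilation=2
        return self.relu(torch.cat([a, b], dim=1))  # concatenate along the channel dimension
```

With dilation = 2 the effective kernel extent of the second branch grows to 5×5×5 while its parameter count stays that of a 3×3×3 kernel, matching the motivation above.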
The optimal multi-scale 3D-CNN network structure is the model that concatenates the 3×3×3 dilated 3D convolution kernel with d = 2.
Further, cascading the 1D-CNN network after the optimal multi-scale 3D-CNN network in S4 means testing the model performance of the 3D-CNN-to-1D-CNN conversion at different positions of the multi-scale 3D ResNet-18 network to obtain the optimal multi-scale 3D-1D-CNN model.
The basic building blocks of the optimal multi-scale 3D-CNN cascaded with the 1D-CNN include convolutional layers, batch-normalization layers, ReLU non-linear activation layers, average-pooling layers and fully connected layers. From the input layer to the output layer, the whole network is divided into a 3D-CNN stage, a 3D-CNN-to-1D-CNN conversion stage and a 1D-CNN stage.
In the 3D-CNN stage, the preprocessed 120×120×140 hyperspectral image cube is taken as input. A 3×3×3 convolution kernel with strides of 2, 2 and 1 first extracts local spectral-spatial features and produces a 64-channel three-dimensional spectral image, which is then fed into consecutive optimal multi-scale 3D-CNN modules to extract and fuse spectral-spatial features at different scales.
In the 3D-CNN-to-1D-CNN conversion stage, testing shows that conversion at position ⑦ of the multi-scale 3D ResNet-18 network works best: the three-dimensional feature map output by the 3D-CNN, of size 256×15×15×35 (channels × spatial height × spatial width × bands), is converted into a one-dimensional feature of length 35 by a 15×15×1 3D convolution kernel.
In the 1D-CNN stage, consecutive one-dimensional convolution kernels of size 3 extract deep spectral features, yielding a 512×17 one-dimensional feature map; the extracted high-level abstract features are then compressed and aggregated by average pooling into a more expressive 512×1 feature, and the model prediction is finally output through a fully connected layer. Compared with the optimal multi-scale 3D-CNN network without the cascaded 1D-CNN, the number of parameters of this model is reduced by about 35.83%.
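The 3D-to-1D conversion and the 1D head described here might be sketched as follows in PyTorch; the channel counts other than those quoted in the text (256 in, 512 before pooling) and the stride choices are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Cascade3Dto1DHead(nn.Module):
    """Collapse the 15x15 spatial extent with a (15, 15, 1) 3D kernel, then run 1D convolutions
    over the remaining 35-band axis, average-pool and predict."""

    def __init__(self, in_channels=256, mid_channels=256, out_features=1):
        super().__init__()
        self.to_1d = nn.Conv3d(in_channels, mid_channels, kernel_size=(15, 15, 1), bias=False)
        self.spectral = nn.Sequential(                  # consecutive size-3 1D convolutions
            nn.Conv1d(mid_channels, 512, kernel_size=3, stride=2, bias=False),  # 35 -> 17
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Conv1d(512, 512, kernel_size=3, padding=1, bias=False),          # keeps length 17
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)             # 512 x 17 -> 512 x 1
        self.fc = nn.Linear(512, out_features)          # 1 output for regression, n classes otherwise

    def forward(self, x):                               # x: (N, 256, 15, 15, 35)
        x = self.to_1d(x)                               # (N, mid_channels, 1, 1, 35)
        x = x.flatten(2)                                # (N, mid_channels, 35)
        x = self.spectral(x)                            # (N, 512, 17)
        x = self.pool(x).squeeze(-1)                    # (N, 512)
        return self.fc(x)
```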
Further, the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN in S5 include convolutional layers, batch-normalization layers, ReLU non-linear activation layers, average-pooling layers and fully connected layers.
The baseline 1D-CNN is a one-dimensional convolutional neural network based on the residual network framework ResNet and performs feature extraction in the spectral dimension.
The baseline 2D-CNN is a ResNet-based two-dimensional convolutional neural network that performs feature extraction in the spatial dimension; the baseline 3D-CNN is a ResNet-based three-dimensional convolutional neural network that performs joint feature extraction in the spectral and spatial dimensions.
The multi-scale 3D-CNN is the optimal multi-scale 3D-CNN network of S3 and can capture spectral-spatial features at different scales.
All models are trained from scratch, with the initial network weights drawn from random initialization.
The loss function of the regression model is the mean squared error, and that of the classification model is the cross entropy.
The models are optimized with the gradient descent method, the momentum is set to 0.9, the initial learning rate is varied over {1×10⁻², 1×10⁻³, 1×10⁻⁴, 1×10⁻⁵}, the learning rate is decreased by one order of magnitude every 30 training epochs, and the total number of training epochs is 80.
The performance of the regression model is evaluated by the coefficient of determination R² and the root-mean-square error RMSE.
The classification model is evaluated by the F1-score and the accuracy.
The leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network of the present invention has the following advantages:
(1) Addressing the inability of 1D-CNN and 2D-CNN network models to extract joint spectral-spatial features from hyperspectral images of plant leaves, the present invention proposes a multi-scale 3D-CNN framework that incorporates dilated convolution, which facilitates the extraction of multi-scale joint spectral-spatial features from plant leaves and further improves the performance of the 3D-CNN model without increasing network parameters or computational complexity.
(2) Addressing the high computational complexity of 3D-CNN networks, their tendency to overfit when samples are limited and their low generalization performance, the present invention determines the optimal network structure that cascades a 1D-CNN after the multi-scale 3D-CNN, which further reduces the number of parameters, the computational complexity and the degree of overfitting of a purely three-dimensional convolutional network and improves generalization performance.
(3) The multi-scale cascaded convolutional neural network model proposed by the present invention for leaf hyperspectral image classification and regression facilitates the classification and regression of leaf hyperspectral images and can also provide new ideas and technical support for other image classification and regression methods in the field of agricultural informatization.
Brief Description of the Drawings
Fig. 1 is a flow chart of the leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network;
Fig. 2 is a schematic diagram of the multi-scale joint spectral-spatial feature extraction module that fuses standard and dilated 3D convolutions;
Fig. 3 shows the overall framework of the optimal multi-scale 3D-1D-CNN network for end-to-end modeling of hyperspectral images;
Fig. 4 compares the performance of multi-scale 3D-1D-CNN models for quantifying the SPAD value of basil leaf samples;
Fig. 5 compares the performance of multi-scale 3D-1D-CNN models for drought-stress identification of pepper leaf samples;
Fig. 6 shows the overall framework of the multi-scale 3D-CNN network without the cascaded 1D-CNN;
Fig. 7 compares the prediction results of the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN, multi-scale 3D-CNN and multi-scale 3D-1D-CNN models for quantifying the SPAD value of basil leaf samples;
Fig. 8 compares the prediction results of the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN, multi-scale 3D-CNN and multi-scale 3D-1D-CNN models for drought-stress identification of pepper leaf samples.
Detailed Description of the Embodiments
To better understand the purpose, structure and function of the present invention, the leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network of the present invention is described in further detail below with reference to the accompanying drawings.
First, dilated convolutions are embedded in the 3D-CNN to construct spectral-spatial feature extraction structures of different scales, achieving the fusion of multi-scale features and further improving the performance of the 3D-CNN model without increasing network parameters or computational complexity.
Second, based on the multi-scale 3D-CNN, an efficient multi-scale 3D-1D-CNN network structure is designed: a 1D-CNN is cascaded after the 3D-CNN to further extract high-level abstract spectral features, and the optimal framework of the proposed multi-scale 3D-1D-CNN network is explored to reduce the computational complexity and the degree of overfitting of purely three-dimensional convolution.
Finally, on two facility-crop leaf data sets with limited samples, the proposed multi-scale 3D-1D-CNN network model is compared with the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN models to verify the effectiveness of the proposed method.
As shown in Fig. 1, the leaf hyperspectral image classification and regression method based on a multi-scale cascaded convolutional neural network of the present invention specifically includes the following steps:
S1: Build a visible-near-infrared hyperspectral imaging system, acquire hyperspectral images of the samples, set the spectral band range and the spatial resolution, and correct the raw hyperspectral images.
The visible-near-infrared hyperspectral imaging system consists of a SNAPSCAN VNIR spectral camera (IMEC, Leuven, Belgium), a 35 mm lens, a 150 W ring-shaped tungsten-halogen light source, image acquisition software (HSImager) and an image acquisition platform. The spectral range in S1 is 470-900 nm with 140 bands. The spectral camera acquires hyperspectral images of a sample rapidly using a built-in scanning imaging mode that requires no relative displacement between the camera and the sample, avoiding spatial deformation of the spectral images. Its maximum spatial resolution is 3650×2048, and the actual spatial resolution during data acquisition is adjusted to the leaf sample size. The vertical distance between the 35 mm lens and the sample is 34.5 cm, and the exposure time is set to 30 ms. The raw hyperspectral images acquired by the hyperspectral imaging system are subjected to the following black-and-white correction to calculate the reflectance of the sample while reducing the influence of illumination and camera dark current on the spectra of different samples:
R = (R0 − D) / (W − D)
where R0 and R are the hyperspectral images before and after correction, respectively, and W and D are the white-reference image (diffuse reflectance close to 100%) and the dark-current reference image, respectively.
S2: Acquire hyperspectral images of basil leaf and pepper leaf samples, preprocess the images, and divide all images into a training set, a validation set and a test set; the image preprocessing includes background segmentation, noise removal and size adjustment.
For the basil leaves, sweet basil grown under artificial LED lighting was subjected to three light-intensity treatments of 200±5, 135±4 and 70±5 μmol m⁻² s⁻¹, all with a red:blue light ratio of 3:1 and a photoperiod of 16 hours/day. Each experimental group contained 40 pots with one plant per pot, 120 basil plants in total. Apart from the light intensity, all groups received normal water, fertilizer, gas, heat, disease, insect and weed management. Data collection was carried out when the basil had grown for 40 days. The SPAD value of each leaf was measured with a SPAD-502 chlorophyll meter (Minolta Camera Co., Osaka, Japan), three times each along the left and right sides of the midrib, and the mean of the six measurements was taken as the relative chlorophyll value of the sample. A total of 540 hyperspectral images were collected, each with dimensions 600×800×140.
For the pepper leaves, two water treatments were applied to healthy pepper plants of uniform growth that had been cultivated for 50 days: the control group was irrigated normally (drip irrigation, 600 ml/day), while the experimental group underwent five days of sustained drought (drip irrigation, 50 ml/day); drip irrigation was applied at 17:00 every day. Each group contained 10 pots with two plants per pot, 40 pepper plants in total. Apart from the amount of water applied, all groups received normal fertilizer, gas, heat, light, disease, insect and weed management. Three hundred leaf samples were collected from each group, giving 600 hyperspectral images in total, each with dimensions 120×200×140.
The hyperspectral images and the corresponding label data pairs are read by a custom Python function.
The image background segmentation uses the 800 nm near-infrared band, where the difference in spectral reflectance between leaf and background is largest, as the threshold band, with the threshold set to 0.15.
Noise removal uses the morphological transformations in opencv-python.
Size adjustment: the spatial dimensions are uniformly reduced to 120×120.
The images are divided into training, validation and test sets by random sampling: 380 basil leaf samples are randomly selected for training, 80 for validation and 80 for testing; 400 pepper leaf samples are randomly selected for training, 100 for validation and 100 for testing. The data sets are divided into multiple batches with a batch size of 8.
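The "custom Python function" for pairing images and labels is not specified in the text; a minimal sketch, under the assumption that each preprocessed cube is stored as a .npy file and the labels sit in a CSV keyed by file name, might be:

```python
import csv
import numpy as np
from pathlib import Path

def load_pairs(image_dir, label_csv):
    """Return (cubes, labels); assumes one .npy cube per sample and a CSV of name,label rows."""
    labels = {}
    with open(label_csv, newline="") as f:
        for name, value in csv.reader(f):
            labels[name] = float(value)
    cubes, targets = [], []
    for path in sorted(Path(image_dir).glob("*.npy")):
        if path.stem in labels:
            cubes.append(np.load(path))      # preprocessed cube, e.g. 120 x 120 x 140
            targets.append(labels[path.stem])
    return np.stack(cubes), np.asarray(targets)
```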
S3: Introduce dilated convolution into the 3D-CNN to enlarge the receptive field of the convolution kernel without increasing network parameters or computational complexity, construct spectral-spatial feature extraction structures of different scales to achieve multi-scale feature fusion, and determine the optimal multi-scale 3D-CNN network structure to improve the performance of the 3D-CNN model.
The 3D-CNN extracts continuous local joint spectral-spatial features end to end. The value of the j-th feature map of the i-th layer at position (x, y, z) can be expressed as

$$v_{ij}^{xyz} = \delta\left(b_{ij} + \sum_{k=1}^{m}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1}\sum_{r=0}^{R_i-1} w_{ijk}^{pqr}\, v_{(i-1)k}^{(x+p)(y+q)(z+r)}\right)$$

where P_i, Q_i and R_i are the sizes of the three-dimensional convolution kernel of the i-th layer, v_{(i-1)k}^{(x+p)(y+q)(z+r)} is the value of the k-th feature map of layer (i−1) at position (x+p, y+q, z+r), w_{ijk}^{pqr} is the convolution kernel applied to the k-th feature map of layer (i−1), m is the number of feature maps in layer (i−1), b_ij is the bias, and δ denotes the activation function.
The spectral-spatial feature extraction structures of different scales are built by embedding a standard 3D convolution and a dilated 3D convolution in parallel in the same network layer, each occupying 50% of the channels; the feature maps output by the different kernels are batch-normalized, concatenated along the channel dimension and passed through a ReLU non-linear activation, realizing the extraction and fusion of spectral-spatial features at different scales. The framework is shown in Fig. 2.
The dilated 3D convolution is realized by inserting d−1 zero-valued weights between adjacent weights in all rows and columns of the standard 3D convolution, where d is the dilation factor; this enlarges the receptive field of the kernel without adding network parameters or losing information.
The standard 3D convolution uses a 3×3×3 kernel with d = 1 and a receptive field of 3×3×3.
The dilated 3D convolution also uses a 3×3×3 kernel, and structures with d = 2, 3 and 4 are tested, corresponding to receptive fields of 5×5×5, 7×7×7 and 9×9×9, respectively; kernels with different receptive fields extract spectral-spatial features of different scales from the feature maps.
In the optimal multi-scale 3D-CNN network structure, the preprocessed 120×120×140 hyperspectral image cube is taken as input; a 3×3×3 convolution kernel with strides of 2, 2 and 1 first extracts local spectral-spatial features and produces a 64-channel three-dimensional spectral image, which is then fed into consecutive multi-scale joint spectral-spatial feature extraction modules.
The results in Table 1 and Table 2 show that when the 3×3×3 dilated 3D convolution kernel with d = 2 is concatenated, the multi-scale 3D-CNN performs better than the other variants.
Table 1. Performance comparison of multi-scale 3D-CNN models for quantifying the SPAD value of basil leaf samples
Table 2. Performance comparison of multi-scale 3D-CNN models for drought-stress identification of pepper leaf samples
S4: Cascade a 1D-CNN after the optimal multi-scale 3D-CNN, take three-dimensional data as input, learn more abstract spectral features on the basis of joint spectral-spatial feature extraction, and explore the optimal framework of the proposed multi-scale 3D-1D-CNN network to reduce the number of network parameters, the computational complexity of the model and the degree of overfitting, and to improve generalization performance.
The multi-scale 3D-1D-CNN network is obtained by testing the model performance of the 3D-CNN-to-1D-CNN conversion at different positions (①-⑩) of the multi-scale 3D ResNet-18 network; the resulting optimal multi-scale 3D-1D-CNN model is shown in Fig. 3.
The basic building blocks of the multi-scale 3D-CNN cascaded with the 1D-CNN include convolutional layers, batch-normalization layers, ReLU non-linear activation layers, average-pooling layers and fully connected layers. From the input layer to the output layer, the network is divided into a 3D-CNN stage, a 3D-CNN-to-1D-CNN conversion stage and a 1D-CNN stage.
In the 3D-CNN stage, the preprocessed 120×120×140 hyperspectral image cube is taken as input; a 3×3×3 convolution kernel with strides of 2, 2 and 1 first extracts local spectral-spatial features and produces a 64-channel three-dimensional spectral image, which is then fed into consecutive multi-scale 3D-CNN modules to extract and fuse spectral-spatial features at different scales. As shown in Fig. 3, there are 10 consecutive multi-scale 3D-CNN layers with output channel numbers of 64, 64, 64, 64, 128, 128, 128, 128, 256 and 256, the output channel number of each layer being the input channel number of the next.
In the 3D-CNN-to-1D-CNN conversion stage, Fig. 4 and Fig. 5 show that conversion at position ⑦ of the multi-scale 3D ResNet-18 network gives the best test-set results. At first, as the number of multi-scale 3D convolutional layers increases, the training complexity rises on both data sets but the prediction performance improves markedly, indicating that multi-scale spectral-spatial feature extraction on the input three-dimensional data cube helps the model learn increasingly rich information. However, when the number of multi-scale 3D convolutional layers grows beyond position ⑦, model complexity keeps rising while the generalization performance of prediction begins to decline and overfitting appears, mainly because the model has too many parameters for the limited sample data sets. Therefore, at position ⑦ the 256×15×15×35 (channels × spatial height × spatial width × bands) three-dimensional feature map output by the 3D-CNN is converted into a one-dimensional feature of length 35 by a 15×15×1 3D convolution kernel.
In the 1D-CNN stage, consecutive one-dimensional convolution kernels of size 3 extract deep spectral features, yielding a 512×17 one-dimensional feature map; the extracted high-level abstract features are then compressed and aggregated by average pooling into a more expressive 512×1 feature, and the model prediction is finally output through a fully connected layer.
Compared with the multi-scale 3D-CNN network without the cascaded 1D-CNN (Fig. 6), the multi-scale 3D-CNN cascaded with the 1D-CNN (Fig. 3) has about 35.83% fewer parameters.
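The parameter comparison quoted here can be reproduced for any pair of PyTorch models with a small helper; the model variable names in the commented line are placeholders:

```python
def count_parameters(model):
    """Total number of trainable parameters of a PyTorch module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# reduction = 1 - count_parameters(multiscale_3d_1d_cnn) / count_parameters(multiscale_3d_cnn)
```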
S5: Compare the optimal network framework of the multi-scale 3D-1D-CNN of S4 with the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN network models to verify the effectiveness of the proposed method.
Preferably, the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN in S5 mainly consist of convolutional layers, batch-normalization layers, ReLU non-linear activation layers, average-pooling layers and fully connected layers.
The baseline 1D-CNN is a one-dimensional convolutional neural network based on the residual network framework (ResNet) and performs feature extraction in the spectral dimension.
The baseline 2D-CNN is a ResNet-based two-dimensional convolutional neural network that performs feature extraction in the spatial dimension.
The baseline 3D-CNN is a ResNet-based three-dimensional convolutional neural network that performs joint feature extraction in the spectral and spatial dimensions.
The multi-scale 3D-CNN is the optimal multi-scale 3D-CNN network of S3, which can capture spectral-spatial features at different scales and avoids the incomplete representation of feature information at a single scale.
All models are trained from scratch, with the initial network weights drawn from random initialization.
The loss function of the regression model is the mean squared error, and that of the classification model is the cross entropy.
The models are optimized with the gradient descent method, the momentum is set to 0.9, the initial learning rate is varied over {1×10⁻², 1×10⁻³, 1×10⁻⁴, 1×10⁻⁵}, the learning rate is decreased by one order of magnitude every 30 training epochs, and the total number of training epochs is 80.
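A training loop matching these settings could be sketched as follows in PyTorch (an assumption; validation, device handling and logging are omitted):

```python
import torch
import torch.nn as nn

def train(model, train_loader, task="regression", lr=1e-3, epochs=80):
    """SGD with momentum 0.9, learning rate divided by 10 every 30 epochs, 80 epochs in total."""
    criterion = nn.MSELoss() if task == "regression" else nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            pred = model(x)
            loss = criterion(pred.squeeze(-1) if task == "regression" else pred, y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```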
The performance of the regression model is evaluated by the coefficient of determination (R²) and the root-mean-square error (RMSE).
The classification model is evaluated by the F1-score and the accuracy.
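These metrics can be computed, for example, with scikit-learn (assumed available) and NumPy; the macro averaging for the F1-score is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Coefficient of determination R2 and root-mean-square error RMSE."""
    return r2_score(y_true, y_pred), float(np.sqrt(mean_squared_error(y_true, y_pred)))

def classification_metrics(y_true, y_pred):
    """Accuracy and F1-score (macro-averaged here)."""
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")
```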
Compared with the multi-scale 3D-CNN, the optimal multi-scale 3D-1D-CNN framework proposed here not only improves model performance under small-sample conditions but also markedly reduces computational complexity, as shown in Table 3 and Table 4:
Table 3. Performance comparison of multi-scale 3D-1D-CNN models for quantifying the SPAD value of basil leaf samples
Table 4. Performance comparison of multi-scale 3D-1D-CNN models for drought-stress identification of pepper leaf samples
Fig. 7 and Fig. 8 visually compare the prediction results of the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN, multi-scale 3D-CNN and multi-scale 3D-1D-CNN models on the two leaf data sets:
The baseline 1D-CNN model loses a large amount of spectral-spatial information at the data input layer, resulting in insufficient learning, i.e. underfitting.
The baseline 2D-CNN model simplifies data preprocessing and increases the automation of the hyperspectral image analysis pipeline, but the 2D-CNN focuses only on the extraction of spatial features.
The joint spectral-spatial feature extraction of the baseline 3D-CNN model captures more effective and detailed local abstract features while retaining the information continuity of the spectral features, matching the data characteristics of three-dimensional hyperspectral image cubes, but the 3D-CNN model has a large number of parameters and high computational complexity.
The multi-scale 3D-CNN model improves the overall prediction performance without increasing the number of parameters or the computational complexity of the model, but on both data sets the test-set results are clearly lower than the validation-set results, i.e. the degree of overfitting is high.
It can be seen that, compared with the baseline 1D-CNN, baseline 2D-CNN, baseline 3D-CNN and multi-scale 3D-CNN, the proposed multi-scale 3D-1D-CNN model improves R² on the regression test set by 22.46%, 8.15%, 4.30% and 2%, respectively, and improves accuracy on the classification test set by 28.56%, 16.59%, 8.49% and 4.13%, respectively, demonstrating the effectiveness of the designed network under small-sample conditions: it reduces the number of parameters, the computational complexity and the overfitting of purely three-dimensional convolution and improves generalization performance.
In summary, the technical effect of the present invention is remarkable; it makes a solid technical contribution to the development of leaf hyperspectral image classification and regression methods and has broad application prospects and considerable economic benefits in the field of plant science research.
It should be understood that the present invention is described by way of embodiments, and those skilled in the art will appreciate that various changes or equivalent substitutions may be made to these features and embodiments without departing from the spirit and scope of the present invention. In addition, under the teachings of the present invention, these features and embodiments may be modified to suit a particular situation and material without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited by the specific embodiments disclosed herein; all embodiments falling within the scope of the claims of this application fall within the scope protected by the present invention.
Claims (10)
Priority Applications (1)
- Application CN202210450076.9A — priority date 2022-04-27, filing date 2022-04-27 — CN114821321A: Leaf hyperspectral image classification and regression method based on multi-scale cascaded convolutional neural network
Publications (1)
- CN114821321A — published 2022-07-29
Family
ID=82506704
Family Applications (1)
- Application CN202210450076.9A, filed 2022-04-27 — publication CN114821321A (en), status Pending (CN)
Patent Citations (5)
- CN110210313A — priority 2019-05-06, published 2019-09-06 — 河海大学 — Hyperspectral remote sensing image classification method based on multi-scale PCA-3D-CNN joint spatial-spectral features
- CN111191736A — priority 2020-01-05, published 2020-05-22 — 西安电子科技大学 — Hyperspectral image classification method based on deep feature cross fusion
- WO2021184891A1 — priority 2020-03-20, published 2021-09-23 — 中国科学院深圳先进技术研究院 — Remotely-sensed image-based terrain classification method and system
- CN111507319A — priority 2020-07-01, published 2020-08-07 — 南京信息工程大学 — Crop disease identification method based on a deep fusion convolutional network model
- CN112288668A — priority 2020-09-22, published 2021-01-29 — 西北工业大学 — Infrared and visible light image fusion method based on a deep unsupervised dense convolutional network
Non-Patent Citations (1)
- 刘启超 et al., "SSCDenseNet: a spatial-spectral convolutional dense network algorithm for hyperspectral image classification," 电子学报 (Acta Electronica Sinica), no. 4, 15 April 2020.
Cited By (9)
- CN115187870A / CN115187870B — 2022-09-13 (published 2022-10-14 / 2023-01-03) — 浙江蓝景科技有限公司杭州分公司 — Marine plastic waste material identification method and system, electronic device and storage medium
- CN116026787A — 2023-03-29 (published 2023-04-28) — 湖南汇湘轩生物科技股份有限公司 — Essence grade detection method and system
- CN116563649A / CN116563649B — 2023-07-10 (published 2023-08-08 / 2023-09-08) — 西南交通大学 — Lightweight hyperspectral image classification method and device based on a tensor mapping network
- CN116561590A / CN116561590B — 2023-07-10 (published 2023-08-08 / 2023-10-03) — 之江实验室 — Deep-learning-based method and device for predicting the load size and position on micro-nano optical fibers
- CN117079060A / CN117079060B — 2023-10-13 (published 2023-11-17 / 2024-03-12) — 之江实验室 — Intelligent leaf classification method and system based on photosynthetic signals
Legal Events
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination