CN113344077A - Anti-noise solanaceae disease identification method based on convolution capsule network structure - Google Patents


Info

Publication number
CN113344077A
CN113344077A (application CN202110638097.9A)
Authority
CN
China
Prior art keywords
network
convolution
capsule
solanaceae
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110638097.9A
Other languages
Chinese (zh)
Inventor
李振波
杨泳波
赵远洋
李晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202110638097.9A
Publication of CN113344077A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an anti-noise solanaceae disease identification method based on a convolutional capsule network structure, belonging to the technical field of image classification. A data set of common solanaceae diseases is established and manually labeled; the solanaceae disease data set is input to the network; disease identification is based on a deep convolutional network that performs feature processing and classification, with the data set divided into a training set, a validation set and a test set; the trained model is saved and then evaluated on the test set. The invention adopts a convolutional capsule network, which improves both speed and accuracy while showing good robustness to common salt-and-pepper, Gaussian and blur noise. Compared with a CNN, the network achieves higher identification accuracy with less data and provides a feasible basis for deployment on hardware and actual production.

Description

Anti-noise solanaceae disease identification method based on convolution capsule network structure
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to an anti-noise solanaceae disease identification method based on a convolution capsule network structure.
Background
Solanaceous crops such as tomatoes, eggplants, wolfberries, potatoes and peppers are widely cultivated and have high economic value. Diseases affect the yield and quality of solanaceae crops, and every year a large number of crops are damaged by different diseases, causing huge economic losses. Accurate detection and identification of plant diseases is therefore a key factor in plant production. Crop disease recognition based on traditional machine vision mainly preprocesses crop disease images with image processing techniques, extracts specific features, and then classifies the extracted features with a classifier, thereby realizing classification and recognition of crop diseases. Wang Liwei et al. [1], in "Study of grape leaf disease identification based on computer vision", used support vector machines with different kernel functions to identify grape leaf diseases. Crop disease image recognition based on deep feature learning generally adopts deep convolutional neural networks: the image is input, the learned features are classified automatically, and the network directly outputs class probabilities, making it an end-to-end image recognition method. Nachtigall et al. [2] used an AlexNet model to identify about two thousand images of six apple diseases, with an identification accuracy exceeding that of human experts.
Disease identification based on deep convolutional networks mainly uses existing network architectures to extract disease features automatically and to perform feature processing and classification. Zhong Yong et al. [3] identified diseased apple leaves based on DenseNet combined with regression, multi-label classification and a focal loss function; the accuracies on the test set were 93.51%, 93.31% and 93.71% respectively, better than the conventional cross-entropy loss. Chen et al. [4] used transfer learning, selecting VGGNet and Inception modules pre-trained on ImageNet for rice disease identification, reaching an average identification accuracy of 92%. However, convolutional neural networks identify noisy images poorly and require large amounts of training data. The capsule network, which needs less training data and is strongly resistant to noise, has been successfully applied in image processing. Ding Yongjun et al. [5] studied lily disease identification based on a convolutional capsule network, tested the anti-noise capability of the model with Gaussian noise, salt-and-pepper noise, speckle noise and affine-transformed images, and compared it with a VGG-16 network; the experimental results showed that the convolutional capsule network was clearly superior to the VGG-16 model.
In the invention, solanaceae diseases are taken as the research object, and a solanaceae data set is self-built containing 5 classes in total (4 disease classes and normal leaves) with 1835 pictures. With reference to the GoogLeNet architecture, a disease identification network with a degree of noise robustness is proposed by combining InceptionV2, SENet, the batch normalization algorithm and a capsule network. The InceptionV2 structure is combined with the SENet module and BN layers to build the feature extraction part of the network; the extracted features are passed to the capsule network, and the parameters are continuously optimized during training. The experimental results show that the method achieves high identification accuracy on both the self-built solanaceae disease data set and the public PlantVillage data set, has a degree of robustness to common Gaussian, salt-and-pepper and blur noise, and for the same amount of data its identification accuracy is better than that of common lightweight models. The present invention refers to the following prior art:
1.GoogLeNet
GoogLeNet proposes the Inception architecture, which makes good use of the computational resources in the network and increases the width and depth of the network without increasing the computational load. To optimize network quality, the Hebbian principle and multi-scale processing are adopted. GoogLeNet achieves good results in classification and detection; it won ILSVRC-2014 with a top-5 error rate of 6.67%.
2.Batch Normalization
Batch Normalization (BN) [6] was proposed to solve the internal covariate shift problem. BN forcibly pulls the distribution of the input to any neuron in each layer of the neural network back to a standard normal distribution with mean 0 and variance 1 by a normalization step, which avoids vanishing gradients and accelerates network convergence. Combining BN with the Inception structure improved the best ImageNet classification result, reaching a top-5 error rate of 4.9% and exceeding human-level performance.
3.InceptionV2
Szegedy et al. proposed the InceptionV2 [7] structure. InceptionV2 decomposes an originally large convolution kernel into smaller symmetric kernels. The small kernels save a large number of parameters, can process richer spatial features, and increase feature diversity. On ILSVRC-2012 the structure surpassed the most advanced methods of the time, reaching a top-5 error rate of 5.6%.
4.Squeeze-and-Excitation Networks
Squeeze-and-Excitation Networks (SENet) [8] won the classification task of the final ImageNet 2017 competition. SENet focuses on the relationships between channels: a squeeze-and-excitation (SE) block explicitly models the dependencies between channels, i.e. the importance of each feature channel, and then strengthens useful features and suppresses features that are not useful for the current task according to that importance. The SENet idea is simple and can easily be embedded in existing network structures.
5. Capsule network
In 2017, the Geoffrey Hinton team proposed the CapsNet [9] architecture, which achieved state-of-the-art performance on MNIST and obtained much better results than a traditional CNN on the MultiMNIST data set of overlapping handwritten digits. Compared with a CNN, the capsule network is more robust to noise, can preserve detailed information of a picture, and requires less data.
The invention aims to improve the accuracy and real-time performance of solanaceae disease identification as well as its suitability for hardware deployment. The mature deep-learning object recognition methods in academia perform well on public data sets, but agricultural data has its own characteristics and these methods cannot be directly transferred. In addition, most of the better-performing recognition models are large and cannot handle noisy or blurred pictures, so they do not meet the needs of actual mobile-terminal production. Therefore, the method adopts a convolutional capsule network tailored to the characteristics of solanaceae diseases, which improves speed and accuracy while remaining robust to common salt-and-pepper, Gaussian and blur noise. Compared with a CNN, the network reaches higher identification accuracy with less data and provides a feasible basis for hardware deployment and actual production.
Disclosure of Invention
The invention aims to provide an anti-noise solanaceae disease identification method based on a convolution capsule network structure, which is characterized by comprising the following steps of:
1) establishing a common solanaceae disease data set, automatically extracting disease characteristics by utilizing some existing network architectures, and manually marking;
2) inputting the solanaceae disease data set;
3) image augmentation, namely rotation, translation, flipping and similar operations, to enlarge the data set (see the sketch after this list);
4) dividing a data set into a training set, a verification set and a test set;
5) after data is input, parallel convolutional layers are designed, convolutional kernels with different scales are used for carrying out convolution, and a BN layer is added to each convolutional layer;
6) a characteristic extraction part of the network is built by combining an inclusion structure and a SeNet module;
7) transmitting the extracted features to a capsule network to realize classification;
8) training a training set in the data set;
9) storing the trained model;
10) the output is tested on the test data set using the trained model.
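As a concrete illustration of steps 3) and 4), the following is a minimal sketch of how the augmentation and the training/validation/test split might be implemented; it is not the invention's exact pipeline. The rotation range, translation fraction, image size and 70/15/15 split ratios are illustrative assumptions, and the folder name `solanaceae_dataset` is hypothetical.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Steps 3)-4): enlarge the data set with rotation/translation/flip augmentation,
# then split it into training, validation and test subsets.
# Concrete parameter values are illustrative assumptions, not taken from the patent.
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(30),                      # rotation
    transforms.RandomAffine(0, translate=(0.1, 0.1)),   # translation
    transforms.RandomHorizontalFlip(),                  # flipping
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("solanaceae_dataset", transform=augment)
n = len(dataset)
n_train, n_val = int(0.7 * n), int(0.15 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))
```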
Step 2) inputs the solanaceae disease data set; step 3) applies image augmentation, i.e. rotation, translation and flipping, to expand the data set; step 4) then divides the data set into a training set, a validation set and a test set. In step 5), after the data are input, parallel convolutional layers are designed: convolution is performed with kernels of different scales, and a BN layer is added after each convolutional layer, i.e. Batch Normalization is applied after the convolutions in Fig. 1. The main purpose of batch normalization is to forcibly pull the distribution of the input to every neuron in each layer back to a standard normal distribution with mean 0 and variance 1, which accelerates convergence and greatly speeds up training.
The input to batch normalization is the set of values of x over a mini-batch, B = {x_1, …, x_m}, together with the parameters γ and β to be learned; the output is {y_i = BN_{γ,β}(x_i)}. The specific process is as follows.
The mean and variance of batch B are first calculated.
Mean of each batch:
μ_B = (1/m) Σ_{i=1}^{m} x_i
Variance of each batch:
σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²
In the above formulas, x_i denotes the i-th sample in the batch, μ_B the mean of the batch and σ_B² the variance of the batch. The data are then normalized to obtain values with mean 0 and variance 1:
x̂_i = (x_i − μ_B) / √(σ_B² + ε)
where ε is a small constant set during normalization. The BN transform is therefore:
y_i = γ·x̂_i + β
where γ and β are the parameters to be learned and x_i are the data in batch B.
Step 6) combines the Inception structure with the SENet module to build the feature extraction part of the network: the Inception structure is used to reduce the number of model parameters and hence the model size, the SENet module is used to improve network accuracy, and the SENet and Inception structures are combined; the InceptionV2 structure is adopted.
The InceptionV2 structure replaces the 5 × 5 convolution in InceptionV1 with two 3 × 3 convolutions, and to increase the convergence speed a BN layer is added after each convolution in the InceptionV2 structure.
Step 7) feeds the extracted features into the capsule network for classification; a residual attention mechanism is introduced into the classification network, which gathers the local features of the target in the image and improves detection accuracy. That is, starting from the relationships between the feature channels, the interdependencies between the channels are explicitly modeled.
Step 8) trains on the training set of the data set; during training it is observed whether the training curve converges. The loss function used here is the margin (spacing) loss:
L_k = T_k·max(0, m+ − ||v_k||)² + λ(1 − T_k)·max(0, ||v_k|| − m−)²
where L_k denotes the loss of the k-th digit capsule, and the loss of the whole capsule network is the sum of the losses of all digit capsules. T_k indicates whether class k is present, 1 if present and 0 otherwise, and v_k denotes the k-th capsule. When the current sample belongs to class k, i.e. the capsule prediction is correct, T_k = 1, otherwise T_k = 0. The term T_k·max(0, m+ − ||v_k||)² computes the loss of a correctly predicted capsule, with m+ = 0.9, i.e. the term is 0 when the probability of the correct prediction is at least 0.9; the term λ(1 − T_k)·max(0, ||v_k|| − m−)² computes the loss of an incorrectly predicted capsule, with m− = 0.1, i.e. the term is 0 when the probability is at most 0.1. The initial value of λ is 0.5.
The beneficial effects are that, by adopting the convolutional capsule network, both speed and accuracy are improved while the network remains robust to common salt-and-pepper, Gaussian and blur noise. Compared with a CNN, the network achieves higher identification accuracy with less data and provides a feasible basis for hardware deployment and actual production.
Drawings
Fig. 1 is a schematic structural view of an input portion.
Fig. 2 is a schematic structural diagram of InceptionV2.
Fig. 3 is a schematic diagram of replacing the 5 × 5 convolution in InceptionV1 with two 3 × 3 convolutions.
Fig. 4 is the SE-Inception module.
FIG. 5 is a diagram of a convolutional capsule network architecture.
Fig. 6 is a flowchart of the anti-noise solanaceae disease identification based on the convolution capsule network.
Detailed Description
The invention aims to provide an anti-noise solanaceae disease identification method based on a convolutional capsule network structure; the identification flow of the convolutional capsule network is shown in Fig. 6. The method comprises the following steps:
1) establishing a common solanaceae disease data set, automatically extracting disease features using existing network architectures, and manually labeling;
2) inputting the solanaceae disease data set;
3) image augmentation, namely rotation, translation, flipping and similar operations, to enlarge the data set;
4) dividing a data set into a training set, a verification set and a test set;
5) after data is input, parallel convolutional layers are designed, convolutional kernels with different scales are used for carrying out convolution, and a BN layer is added to each convolutional layer;
6) a characteristic extraction part of the network is built by combining an inclusion structure and a SeNet module;
7) transmitting the extracted features to a capsule network to realize classification;
8) training a training set in the data set;
9) storing the trained model;
10) the output is tested on the test data set using the trained model.
The invention is further described with reference to the following figures and examples.
Step 2) inputs the solanaceae disease data set; step 3) applies image augmentation, i.e. rotation, translation and flipping, to expand the data set; step 4) then divides the data set into a training set, a validation set and a test set. In step 5), after the data are input, parallel convolutional layers are designed: convolution is performed with kernels of different scales, and a BN layer is added after each convolutional layer, i.e. Batch Normalization is applied after the convolutions in Fig. 1. The main purpose of batch normalization is to forcibly pull the distribution of the input to every neuron in each layer back to a standard normal distribution with mean 0 and variance 1, which accelerates convergence and greatly speeds up training. The main steps of batch normalization are as follows:
Input: the values of x over a mini-batch, B = {x_1, …, x_m}, and the parameters γ, β to be learned.
Output: {y_i = BN_{γ,β}(x_i)}.
Mean of each batch: μ_B = (1/m) Σ_{i=1}^{m} x_i
Variance of each batch: σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²
Normalization: x̂_i = (x_i − μ_B) / √(σ_B² + ε)
Scale and shift: y_i = γ·x̂_i + β
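For reference, a minimal NumPy sketch of the four batch-normalization steps listed above (batch mean, batch variance, normalization, scale-and-shift); in practice a framework-provided BN layer after each convolution would be used, so this is only illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over one mini-batch of values x (per feature).

    Implements the steps above: batch mean, batch variance, normalization
    to zero mean / unit variance, then the learned scale-and-shift
    y_i = gamma * x_hat_i + beta.
    """
    mu = x.mean(axis=0)                    # mu_B
    var = x.var(axis=0)                    # sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # scale and shift
```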
because the types of diseases are more, the color and texture characteristics of different diseases have larger difference; the same disease has obvious difference in different stages. Therefore, the convolution is performed on the input picture by adopting convolution kernels with different scales, so that local features of multiple scales can be extracted simultaneously, and the robustness of the network is improved (the structure of an input part is shown in fig. 1).
Step 6) combines the Inception structure (shown in Fig. 4) with the SENet module to build the feature extraction part of the network: the Inception structure is used to reduce the number of model parameters and hence the model size, the SENet module is used to improve network accuracy, and the SENet and Inception structures are combined; the InceptionV2 structure is adopted.
The InceptionV2 structure is obtained by replacing the 5 × 5 convolution in InceptionV1 with two 3 × 3 convolutions (as shown in Fig. 3). This has two advantages: first, a large number of parameters are saved; second, richer spatial features can be processed and feature diversity increases. To speed up convergence, a BN layer is also added after each convolution in the InceptionV2 structure.
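A brief sketch of how the 5 × 5 convolution can be factorized into two stacked 3 × 3 convolutions, each followed by a BN layer as described above (PyTorch-style); the 64/96 channel counts are illustrative assumptions, not values taken from the patent.

```python
import torch.nn as nn

def conv_bn(in_ch, out_ch, kernel_size):
    """Convolution followed by a BN layer and ReLU, padded to keep the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# A single 5x5 convolution ...
conv5x5 = conv_bn(64, 96, 5)

# ... factorized into two stacked 3x3 convolutions with the same receptive field
# but fewer parameters (2 * 3 * 3 = 18 weights per channel pair instead of 25).
conv5x5_factorized = nn.Sequential(conv_bn(64, 96, 3), conv_bn(96, 96, 3))
```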
Step 7) feeds the extracted features into the capsule network for classification, and a residual attention mechanism is introduced into the classification network, which gathers the local features of the target in the image and improves detection accuracy. Starting from the relationships between the feature channels, the interdependencies between channels are explicitly modeled; a "feature recalibration" strategy automatically learns the importance of each feature channel, then strengthens useful features and suppresses features that are not useful for the current task according to that importance, achieving adaptive channel-wise recalibration. In this way the whole network structure attends to both global and local information.
A schematic diagram of the combination of the SENet and Inception structures is shown in Fig. 4. The SENet module consists of three steps: squeeze, excitation and rescaling. The specific flow of SENet is as follows: assuming the original feature map is H × W × C, a 1 × 1 × C feature map is first obtained through global pooling (squeeze); then a fully connected layer, a ReLU activation, a second fully connected layer and a Sigmoid layer produce a 1 × 1 × C weight map (excitation); finally the weights are applied to restore the original feature-map size (rescaling). To better fit the complex correlations among channels, greatly reduce the number of parameters and the computation, and add more non-linearity, the first fully connected layer reduces the number of neurons to C/r, where r is the channel compression ratio; the second fully connected layer then raises the dimension again, yielding a 1 × 1 × C feature map. Moreover, because the channels are correlated with one another, a Sigmoid rather than a Softmax is used after the second fully connected layer.
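The squeeze-excitation-rescale flow described above can be sketched as follows (PyTorch-style). The module below is a generic SE block with compression ratio r, not the patent's exact implementation, although the feature-extraction part of the network does use r = 16.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling -> FC -> ReLU -> FC -> Sigmoid -> rescale."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # H x W x C -> 1 x 1 x C
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // r),          # reduce to C/r neurons
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),          # raise back to C
            nn.Sigmoid(),                                # per-channel importance in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # recalibrate each feature channel
```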
Step 8) trains on the training set of the data set; during training it is observed whether the training curve converges. The loss function used here is the margin (spacing) loss, given by formula (1):
L_k = T_k·max(0, m+ − ||v_k||)² + λ(1 − T_k)·max(0, ||v_k|| − m−)²   (1)
where L_k denotes the loss of the k-th digit capsule, and the loss of the whole capsule network is the sum of the losses of all digit capsules. When the current sample belongs to class k, T_k = 1, otherwise T_k = 0. The term T_k·max(0, m+ − ||v_k||)² computes the loss of a correctly predicted capsule, with m+ = 0.9, i.e. the term is 0 when the probability of the correct prediction is at least 0.9; the term λ(1 − T_k)·max(0, ||v_k|| − m−)² computes the loss of an incorrectly predicted capsule, with m− = 0.1, i.e. the term is 0 when the probability is at most 0.1. The initial value of λ is 0.5.
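A small sketch of formula (1) for a batch of capsule lengths, with m+ = 0.9, m− = 0.1 and λ = 0.5 as stated above; the tensor shapes are assumptions made for illustration.

```python
import torch

def margin_loss(v_lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss of formula (1).

    v_lengths: (batch, num_classes) capsule lengths ||v_k||.
    targets:   (batch, num_classes) one-hot labels T_k.
    """
    pos = targets * torch.clamp(m_pos - v_lengths, min=0.0) ** 2
    neg = lam * (1.0 - targets) * torch.clamp(v_lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=1).mean()   # sum over capsules, average over the batch
```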
The overall network model is the convolutional capsule network shown in Fig. 5; the whole network is divided into a feature extraction part and a capsule network part.
The part marked by the red dashed box in Fig. 5 is the feature extraction part, which is a stack of a multi-scale feature extraction module, a max-pooling layer, a convolutional layer, another max-pooling layer and two SE-Inception modules. The multi-scale feature extraction module, i.e. the MultiConv block shown in Fig. 2, borrows the idea of Inception and performs feature extraction with four different convolution kernels of size 1 × 1, 3 × 3, 5 × 5 and 7 × 7, with 32, 32, 16 and 16 kernels respectively. A BN layer is added after every convolutional layer in the network to accelerate convergence and prevent overfitting. The red modules in Fig. 2 represent the SE-Inception architecture, which gives the network strong feature extraction ability while keeping the number of parameters small; the channel compression ratio is set to 16.
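A sketch of how the multi-scale MultiConv block described above might look: four parallel convolutions with 1 × 1, 3 × 3, 5 × 5 and 7 × 7 kernels and 32, 32, 16 and 16 filters, each followed by a BN layer, concatenated along the channel axis. The padding choices and ReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class MultiConv(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / 7x7 convolutions (32, 32, 16, 16 filters), each with BN,
    concatenated along the channel dimension to capture local features at several scales."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
                          nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for k, out_ch in [(1, 32), (3, 32), (5, 16), (7, 16)]
        ])

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)  # 96 channels out
```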
The capsule network part mainly consists of a Reshape layer and a digit-capsule layer. Five 16-dimensional capsules are used in this layer, and the number of routing iterations is set to 2. The parameters of the specific convolutional capsule network are shown in Table 1.
Table 1 Network structure
In Table 1 the first column gives the name of each layer; the second column gives the size and stride of the convolution kernels, the capsule dimension dim_capsule and the number of routing iterations; the third column gives the output of the layer.
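A condensed sketch of a digit-capsule layer with dynamic routing in the spirit of the description above (5 capsules of dimension 16, 2 routing iterations); the shapes, initialization and simplified routing loop are assumptions rather than the patent's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Non-linearity that shrinks short vectors towards 0 and long vectors to length < 1."""
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

class DigitCaps(nn.Module):
    """Digit-capsule layer: num_caps output capsules of dimension caps_dim, routed by agreement."""
    def __init__(self, in_caps, in_dim, num_caps=5, caps_dim=16, routings=2):
        super().__init__()
        self.routings = routings
        # Transformation matrices mapping each input capsule to each output capsule.
        self.W = nn.Parameter(0.01 * torch.randn(1, in_caps, num_caps, caps_dim, in_dim))

    def forward(self, u):                                        # u: (batch, in_caps, in_dim)
        u_hat = (self.W @ u[:, :, None, :, None]).squeeze(-1)    # (batch, in_caps, num_caps, caps_dim)
        b = torch.zeros(u.size(0), u.size(1), u_hat.size(2), device=u.device)
        for _ in range(self.routings):                           # routing by agreement
            c = F.softmax(b, dim=2)                              # coupling coefficients
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))     # (batch, num_caps, caps_dim)
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
        return v
```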
Experiments on the self-built disease data set show that the network has higher accuracy and a smaller model size. Gaussian noise was then added to the raw data, and the recognition results are shown in Table 2.
Table 2 Comparison of experimental results under different levels of Gaussian noise
As can be seen from Table 2, the recognition accuracy of all models degrades to some extent as the noise is strengthened. Without noise, the accuracy of the Capsule SE-Inception model is slightly lower than that of ShuffleNetV2, but once Gaussian noise is added ShuffleNetV2 falls below Capsule SE-Inception, showing that it is less robust to Gaussian noise than the model presented here. Similarly, MobileNetV1 and MobileNetV2 perform worse than Capsule SE-Inception at all levels of Gaussian noise. MobileNetV3 performs best under Gaussian noise and resists it better than Capsule SE-Inception: at a Gaussian noise level of 0.03 the accuracy of Capsule SE-Inception is 95.51%, slightly higher than MobileNetV3; at 0.05 it is 94.67%, slightly below the 94.94% of MobileNetV3; and at 0.1 it is 84.39%, below the 88.48% of MobileNetV3. In summary, the resistance of Capsule SE-Inception to Gaussian noise is better than that of MobileNetV1, MobileNetV2 and ShuffleNetV2, and worse than that of MobileNetV3.
The data set was then corrupted with salt-and-pepper noise: the original pictures were processed with salt-and-pepper noise at ratios of 0.01, 0.02 and 0.05. The specific recognition accuracies are shown in Table 3.
Table 3 Comparison of experimental results under different levels of salt-and-pepper noise
As can be seen from Table 3, the recognition accuracy of almost all models deteriorates to some extent as the level of salt-and-pepper noise increases. With 0.01 salt-and-pepper noise, the accuracy of the Capsule SE-Inception model is slightly lower than that of MobileNetV3, at 95.23%. As the noise level increases, Capsule SE-Inception achieves the best results at 0.03 and 0.05, namely 94.33% and 77.59% respectively. The value after the slash represents the difference between the accuracy under the current noise level and that under the previous level, and it shows that the drop in accuracy on the validation set is smaller for this model than for the others, indicating strong resistance to salt-and-pepper noise. MobileNetV1 and MobileNetV2 both perform worse than Capsule SE-Inception on the original pictures and under all levels of salt-and-pepper noise. At a salt-and-pepper noise level of 0.05, the accuracy of Capsule SE-Inception is 77.59%, higher than the 75.06% of MobileNetV3, while MobileNetV1, MobileNetV2 and ShuffleNetV2 reach 64.71%, 66.99% and 36.87% respectively. In summary, Capsule SE-Inception resists salt-and-pepper noise better than MobileNetV1, MobileNetV2, MobileNetV3 and ShuffleNetV2.
The data set was also blurred: the original pictures were processed with median filters of different kernel sizes. The specific recognition accuracies are shown in Table 4.
Table 4 Comparison of experimental results under different levels of blur
Comparing noise-free pictures and pictures blurred with a 3 × 3 median filter, the recognition accuracy of Capsule SE-Inception decreases by 2.8 percentage points, MobileNetV3 by 3.7, MobileNetV1 by 5.7, and MobileNetV2 and ShuffleNetV2 both by more than 10 percentage points. Comparing the 5 × 5 and 3 × 3 filter kernels, Capsule SE-Inception is again the best, with a decrease of 2.68%; MobileNetV3 decreases by 4.54%, MobileNetV1 by 8.74%, MobileNetV2 by 10.12% and ShuffleNetV2 by 12.22%. Between the 7 × 7 and 5 × 5 kernels the decrease is more gradual for all models. In conclusion, Capsule SE-Inception recognizes blurred pictures better than the MobileNet series and ShuffleNetV2 used for comparison.
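A sketch of how the perturbed test images in Tables 2-4 might be generated: Gaussian noise at a given variance, salt-and-pepper noise at a given ratio, and median-filter blur. The helper functions and the file name `leaf.jpg` are hypothetical, and the exact noise parameterization used in the experiments is assumed.

```python
import cv2
import numpy as np

def gaussian_noise(img, var):
    """Add zero-mean Gaussian noise with the given variance (image scaled to [0, 1])."""
    noisy = img / 255.0 + np.random.normal(0.0, np.sqrt(var), img.shape)
    return (np.clip(noisy, 0, 1) * 255).astype(np.uint8)

def salt_pepper_noise(img, ratio):
    """Flip a `ratio` fraction of pixels to pure black or pure white."""
    noisy = img.copy()
    mask = np.random.rand(*img.shape[:2])
    noisy[mask < ratio / 2] = 0
    noisy[mask > 1 - ratio / 2] = 255
    return noisy

img = cv2.imread("leaf.jpg")                  # hypothetical sample image
perturbed = {
    "gauss_0.03": gaussian_noise(img, 0.03),
    "sp_0.05": salt_pepper_noise(img, 0.05),
    "blur_5x5": cv2.medianBlur(img, 5),       # median filtering with a 5x5 kernel
}
```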
To explore the recognition performance of the Capsule SE-Inception model with a small amount of training data, tests were performed on the PlantVillage data set using from 100% down to 10% of the data.
Table 5 Recognition results for different amounts of data
The detailed experimental results show that, for the same amount of data, the recognition accuracy of the proposed model is higher than that of the other comparison models.
The model with BN layers converges quickly, which greatly saves computing resources. Combining Inception with SENet improves network accuracy, and integrating the capsule network increases the noise robustness of the network while requiring only a small amount of data. The network has high accuracy and a small model size, which lays a foundation for applications on mobile terminals such as single-chip microcomputers or mobile phones.

Claims (6)

1. An anti-noise solanaceae disease identification method based on a convolution capsule network structure is characterized by comprising the following steps:
1) establishing a common solanaceae disease data set, automatically extracting disease characteristics by utilizing some existing network architectures, and manually marking;
2) inputting the solanaceae disease data set;
3) image augmentation, namely rotation, translation, flipping and similar operations, to enlarge the data set;
4) dividing a data set into a training set, a verification set and a test set;
5) after data is input, parallel convolutional layers are designed, convolutional kernels with different scales are used for carrying out convolution, and a BN layer is added to each convolutional layer;
6) a characteristic extraction part of the network is built by combining an inclusion structure and a SeNet module;
7) transmitting the extracted features to a capsule network to realize classification;
8) training a training set in the data set;
9) storing the trained model;
10) the output is tested on the test data set using the trained model.
2. The method for recognizing the anti-noise Solanaceae diseases based on the convolution capsule network structure as claimed in claim 1,
characterized in that step 2) inputs the solanaceae disease data set; step 3) applies image augmentation, i.e. rotation, translation and flipping, to expand the data set; step 4) then divides the data set into a training set, a validation set and a test set; in step 5), after the data are input, parallel convolutional layers are designed, convolution is performed with kernels of different scales, and a BN layer is added after each convolutional layer, i.e. Batch Normalization is applied after the convolutions in Fig. 1; the main purpose of batch normalization is to forcibly pull the distribution of the input to every neuron in each layer back to a standard normal distribution with mean 0 and variance 1, which accelerates convergence and greatly speeds up training; the input to batch normalization is the set of values of x over a mini-batch, B = {x_1, …, x_m}, and the parameters γ, β to be learned; the output is {y_i = BN_{γ,β}(x_i)}; the specific process is as follows:
the mean and variance of batch B are first calculated:
mean of each batch: μ_B = (1/m) Σ_{i=1}^{m} x_i
variance of each batch: σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²
where x_i denotes the i-th sample in the batch, μ_B the mean of the batch and σ_B² the variance of the batch; the data are then normalized to obtain values with mean 0 and variance 1:
x̂_i = (x_i − μ_B) / √(σ_B² + ε)
where ε is a small constant set during normalization; the BN transform is therefore:
y_i = γ·x̂_i + β
where γ and β are the parameters to be learned and x_i are the data in batch B.
3. The method for recognizing anti-noise solanaceae diseases based on the convolutional capsule network structure as claimed in claim 1, wherein step 6) combines the Inception structure with the SENet module to build the feature extraction part of the network: the Inception structure is used to reduce the number of model parameters and hence the model size, the SENet module is used to improve network accuracy, and the SENet and Inception structures are combined; the InceptionV2 structure is adopted.
4. The method for recognizing anti-noise solanaceae diseases based on the convolutional capsule network structure, characterized in that the InceptionV2 structure replaces the 5 × 5 convolution in InceptionV1 with two 3 × 3 convolutions, and a BN layer is added after each convolution in the InceptionV2 structure in order to increase the convergence speed.
5. The method for recognizing anti-noise solanaceae diseases based on the convolutional capsule network structure as claimed in claim 1, wherein step 7) feeds the extracted features into the capsule network for classification, and a residual attention mechanism is introduced into the classification network, which gathers the local features of the target in the image and improves detection accuracy; that is, starting from the relationships between the feature channels, the interdependencies between the channels are explicitly modeled.
6. The method for recognizing anti-noise solanaceae diseases based on the convolutional capsule network structure as claimed in claim 1, wherein step 8) trains on the training set of the data set; during training it is observed whether the training curve converges; the loss function used here is the margin (spacing) loss:
L_k = T_k·max(0, m+ − ||v_k||)² + λ(1 − T_k)·max(0, ||v_k|| − m−)²
where L_k denotes the loss of the k-th digit capsule, and the loss of the whole capsule network is the sum of the losses of all digit capsules;
T_k indicates whether class k is present, 1 if present and 0 otherwise; v_k denotes the k-th capsule; when the current sample belongs to class k, i.e. the capsule prediction is correct, T_k = 1, otherwise T_k = 0; max(0, m+ − ||v_k||)² computes the loss of a correctly predicted capsule, with m+ = 0.9, i.e. the term is 0 when the probability of the correct prediction is at least 0.9; (1 − T_k)·max(0, ||v_k|| − m−)² computes the loss of an incorrectly predicted capsule, with m− = 0.1, i.e. the term is 0 when the probability is at most 0.1; the initial value of λ is 0.5.
CN202110638097.9A 2021-06-08 2021-06-08 Anti-noise solanaceae disease identification method based on convolution capsule network structure Pending CN113344077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110638097.9A CN113344077A (en) 2021-06-08 2021-06-08 Anti-noise solanaceae disease identification method based on convolution capsule network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110638097.9A CN113344077A (en) 2021-06-08 2021-06-08 Anti-noise solanaceae disease identification method based on convolution capsule network structure

Publications (1)

Publication Number Publication Date
CN113344077A true CN113344077A (en) 2021-09-03

Family

ID=77474885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110638097.9A Pending CN113344077A (en) 2021-06-08 2021-06-08 Anti-noise solanaceae disease identification method based on convolution capsule network structure

Country Status (1)

Country Link
CN (1) CN113344077A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086799A (en) * 2018-07-04 2018-12-25 江苏大学 A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN111696101A (en) * 2020-06-18 2020-09-22 中国农业大学 Light-weight solanaceae disease identification method based on SE-Inception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Haoqi: "Research on remote-sensing ship recognition in complex scenes based on capsule networks", China Master's Theses Full-text Database, 15 April 2021 (2021-04-15), pages 4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170137A (en) * 2021-11-05 2022-03-11 成都理工大学 Pepper disease identification method, identification system and computer readable storage medium
CN114170137B (en) * 2021-11-05 2023-07-04 成都理工大学 Pepper disease identification method, identification system and computer readable storage medium
CN114227382A (en) * 2022-01-18 2022-03-25 湖北汽车工业学院 Cutter damage monitoring system and method based on novel capsule network
CN114548153A (en) * 2022-01-21 2022-05-27 电子科技大学 Planetary gearbox fault diagnosis method based on residual error-capsule network

Similar Documents

Publication Publication Date Title
WO2022036777A1 (en) Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN109117864B (en) Coronary heart disease risk prediction method, model and system based on heterogeneous feature fusion
AU2020104006A4 (en) Radar target recognition method based on feature pyramid lightweight convolutional neural network
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN108510485B (en) Non-reference image quality evaluation method based on convolutional neural network
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN109949255B (en) Image reconstruction method and device
Jin et al. Deep learning for underwater image recognition in small sample size situations
CN113344077A (en) Anti-noise solanaceae disease identification method based on convolution capsule network structure
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
CN106228185A (en) A kind of general image classifying and identifying system based on neutral net and method
CN107680077A (en) A kind of non-reference picture quality appraisement method based on multistage Gradient Features
CN113693563B (en) Brain function network classification method based on hypergraph attention network
CN111984817B (en) Fine-grained image retrieval method based on self-attention mechanism weighting
CN110909591A (en) Self-adaptive non-maximum value inhibition processing method for pedestrian image detection by using coding vector
CN113553972A (en) Apple disease diagnosis method based on deep learning
CN112749675A (en) Potato disease identification method based on convolutional neural network
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
CN110766082B (en) Plant leaf disease and insect pest degree classification method based on transfer learning
CN113420794A (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN116884067B (en) Micro-expression recognition method based on improved implicit semantic data enhancement
Zheng et al. Fruit tree disease recognition based on convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination