CN114219672A - Vegetable disease diagnosis method and device, electronic device and storage medium - Google Patents

Info

Publication number
CN114219672A
Authority
CN
China
Prior art keywords
vegetable disease
vegetable
disease diagnosis
feature set
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111424433.6A
Other languages
Chinese (zh)
Inventor
张领先
徐畅
李凯雨
丁俊琦
朱昕怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02Agriculture; Fishing; Forestry; Mining
    • AHUMAN NECESSITIES
    • A01AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01GHORTICULTURE; CULTIVATION OF VEGETABLES, FLOWERS, RICE, FRUIT, VINES, HOPS OR SEAWEED; FORESTRY; WATERING
    • A01G13/00Protecting plants
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Analysis (AREA)
  • Agronomy & Crop Science (AREA)
  • Animal Husbandry (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Toxicology (AREA)
  • Environmental Sciences (AREA)
  • Medical Informatics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a vegetable disease diagnosis method and device, an electronic device, and a storage medium. The vegetable disease diagnosis method comprises: acquiring vegetable disease prescription source data and screening it to obtain an experimental data set; cleaning and encoding the experimental data set to obtain an initial feature set; selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set; and inputting the target feature set into a vegetable disease diagnosis model to obtain the vegetable disease category. The method, device, electronic device, and storage medium provided by the invention overcome the low diagnostic efficiency and the difficulty of acquiring image data in the prior art, and use the multi-source information contained in prescription data to assist efficient and accurate diagnosis of vegetable diseases, thereby avoiding misdiagnosis.

Description

Vegetable disease diagnosis method and device, electronic device and storage medium
Technical Field
The invention relates to the technical field of crop disease diagnosis, in particular to a vegetable disease diagnosis method and device, electronic equipment and a storage medium.
Background
Vegetables such as tomatoes are crops that are highly susceptible to disease. Three common tomato diseases are tomato virus disease, tomato late blight, and tomato gray mold. The affected parts of the tomato include the roots, stem base, stems, tender leaves/branches, leaves, flowers, fruits/grains, whole plant, tender buds, and the like, and each disease presents differently at the corresponding affected part. At present, whether a vegetable such as a tomato has a given disease is diagnosed by observing the characteristics of each part of the plant or with the aid of computer vision; this approach is prone to misdiagnosis, and image data is difficult to acquire. Meanwhile, using information mined from crop prescriptions to assist rapid and accurate disease diagnosis is a new and unexplored research topic in agriculture. How to effectively mine the relationships among these pieces of information to assist accurate diagnosis is a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a vegetable disease diagnosis method, a device, electronic equipment and a storage medium, which are used for solving the defects of low vegetable disease diagnosis efficiency and difficulty in image data acquisition in the prior art, and assisting in efficient and accurate diagnosis of vegetable diseases by using multi-source information contained in prescription data to avoid the occurrence of misdiagnosis.
The invention provides a vegetable disease diagnosis method, which comprises the following steps:
acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set;
cleaning and coding the experimental data set to obtain an initial feature set;
selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
and inputting the target feature set into a vegetable disease diagnosis model to obtain the category of vegetable diseases.
According to the vegetable disease diagnosis method provided by the invention, cleaning and encoding the experimental data set to obtain an initial feature set comprises:
removing duplicate values, applying consistency processing, and removing abnormal values from the experimental data set to obtain a cleaned experimental data set;
and performing label encoding and one-hot encoding on the cleaned experimental data set to obtain an initial feature set.
According to the vegetable disease diagnosis method provided by the invention, selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set comprises:
extracting features from the initial feature set based on a recursive feature elimination method to obtain an intermediate feature set;
and performing cross validation on subsets of the intermediate feature set based on a gradient boosting decision tree method, and selecting the optimal number of features as the target feature set.
The vegetable disease diagnosis method provided by the invention further comprises the following steps:
and optimizing the hyper-parameters of the vegetable disease diagnosis model based on a Bayesian optimization algorithm.
According to the vegetable disease diagnosis method provided by the invention, the vegetable disease diagnosis model is optimized based on the Bayesian optimization algorithm, and the method comprises the following steps:
modeling an objective function corresponding to the vegetable disease diagnosis model, and calculating the mean value and the variance of function values corresponding to each sampling point of the objective function;
determining an acquisition function based on the mean value and the variance of the function value corresponding to each sampling point of the objective function, and selecting the sampling point of the current iteration based on the acquisition function;
and iteratively reducing the observation area corresponding to the sampling point of the iteration based on the sampling point of the iteration until the optimal solution of the hyper-parameters of the vegetable disease diagnosis model is determined from the observation area, so as to complete the optimization of the hyper-parameters of the vegetable disease diagnosis model.
According to the vegetable disease diagnosis method provided by the invention, the vegetable disease diagnosis model is obtained by training the LightGBM model by taking the characteristics corresponding to the vegetable disease prescription source data as a sample and the vegetable disease category corresponding to the vegetable disease prescription source data as a sample label.
The present invention also provides a vegetable disease diagnosis device, comprising:
the data set construction module is used for acquiring vegetable disease prescription source data and screening the vegetable disease prescription source data to obtain an experimental data set;
the data set processing module is used for cleaning and coding the experimental data set to obtain an initial feature set;
the feature selection module is used for selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
and the disease diagnosis module is used for inputting the target feature set into a vegetable disease diagnosis model to obtain vegetable disease categories.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the vegetable disease diagnosis methods.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the vegetable disease diagnosis methods described above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the above-described vegetable disease diagnosis methods.
According to the vegetable disease diagnosis method, the vegetable disease diagnosis device, the electronic device, and the storage medium provided by the invention, vegetable disease prescription source data are acquired and screened to obtain an experimental data set; the experimental data set is cleaned and encoded to obtain an initial feature set; features are selected from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set; and the target feature set is input into a vegetable disease diagnosis model to obtain the vegetable disease category.
According to the vegetable disease diagnosis method provided by the invention, vegetable disease prescription source data are obtained and processed to produce an initial feature set, a target feature set is derived from the initial feature set, and the target feature set is input into a trained vegetable disease diagnosis model for diagnosis and identification to obtain the vegetable disease category. Plant protection personnel do not need to take part in any stage of the process; farmers only need to fill in disease-related information, and intelligent identification is then performed through prescription big data analysis and the vegetable disease diagnosis model. This overcomes the low diagnostic efficiency and the difficulty of acquiring image data in the prior art, and uses the multi-source information contained in prescription data to assist efficient and accurate diagnosis of vegetable diseases and avoid misdiagnosis.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a vegetable disease diagnosis method provided by the present invention;
FIG. 2 is a flow chart of preprocessing the vegetable disease prescription source data in the vegetable disease diagnosis method provided by the present invention;
FIG. 3 is a flow chart of obtaining a target feature set in the method for diagnosing vegetable diseases according to the present invention;
FIG. 4 is a flow chart of a Bayesian optimization-based vegetable disease diagnosis model provided by the invention;
FIG. 5 is a second schematic flow chart of the method for diagnosing vegetable diseases according to the present invention;
FIG. 6 is a schematic structural view of a vegetable disease diagnosing apparatus according to the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The vegetable disease diagnosis method, apparatus, electronic device, and storage medium of the present invention are described below with reference to FIG. 1 to FIG. 7.
As shown in FIG. 1, the present invention provides a method for diagnosing a vegetable disease, comprising:
Step 110: acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set.
It is understood that the vegetable in the present invention may be tomato or another vegetable. For example, the vegetable disease prescription source data can include tomato virus disease data, tomato late blight data, and tomato gray mold data.
The method selects, from a prescription database, prescription data for several tomato diseases recorded from March 25, 2019 to November 19, 2020 as the data source of vegetable disease prescription information, that is, the vegetable disease prescription source data. Each prescription record contains the following information: district/county, time, development stage, affected part, occurrence area, occurrence severity, main symptoms, inquiry record, field distribution of symptoms, diagnosis result, name of the pesticide prescribed, and quantity of the pesticide prescribed. A total of 7344 prescription records are obtained, and the statistics for each disease are shown in Table 1.
Table 1. Statistics of the prescription data
Disease name | Number of records | Ratio (%)
Tomato virus disease | 2607 | 35.5
Tomato late blight | 3248 | 44.2
Tomato gray mold | 1489 | 20.3
Total | 7344 | 100
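As a quick arithmetic check of Table 1, the ratios can be recomputed from the record counts. The snippet below is only an illustrative sketch; the dictionary keys are English renderings of the disease names, and the variable names are chosen here for illustration.

```python
# Record counts per disease, transcribed from Table 1.
counts = {
    "Tomato virus disease": 2607,
    "Tomato late blight": 3248,
    "Tomato gray mold": 1489,
}

total = sum(counts.values())
# Share of each disease in percent, rounded to one decimal place.
ratios = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in ratios.items():
    print(f"{name}: {counts[name]} records, {pct}%")
print(f"Total: {total} records")
```

The three classes are noticeably unbalanced, which is worth keeping in mind when accuracy is used as the evaluation metric.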
Step 120: cleaning and encoding the experimental data set to obtain an initial feature set.
It can be understood that the experimental data set is derived from the vegetable disease prescription source data, so some of the data in the experimental data set is redundant and needs to be cleaned.
In this embodiment, referring to FIG. 2, the preprocessing flow for the vegetable disease prescription source data comprises the following main steps: the source data files are sorted and converted to obtain the experimental data set; the experimental data set is cleaned (duplicate removal, missing value handling, consistency processing, and abnormal value handling); data statistics are computed; and finally the output data are encoded (label encoding and one-hot encoding) to obtain the initial feature set.
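The cleaning stage described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the field names ("stage", "area"), the valid area range, and the choice to drop records with missing values are hypothetical, since the source does not specify how each cleaning rule is implemented.

```python
def clean_records(records, area_range=(0.0, 1000.0)):
    """Deduplicate, normalize, and filter prescription records.

    Field names and the valid area range are hypothetical examples.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Consistency processing: trim whitespace and unify letter case.
        norm = {k: v.strip().lower() if isinstance(v, str) else v
                for k, v in rec.items()}
        # Missing-value handling: here, drop records with any empty field.
        if any(v is None or v == "" for v in norm.values()):
            continue
        # Abnormal-value handling: drop records whose area is out of range.
        if not (area_range[0] <= norm["area"] <= area_range[1]):
            continue
        # Duplicate removal: keep only the first copy of each record.
        key = tuple(sorted(norm.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


raw = [
    {"stage": "Flowering ", "area": 2.5},
    {"stage": "flowering", "area": 2.5},   # duplicate after normalization
    {"stage": "seedling", "area": -1.0},   # abnormal area value
    {"stage": "", "area": 1.0},            # missing stage
    {"stage": "fruiting", "area": 3.0},
]
cleaned = clean_records(raw)
```

Normalizing before deduplicating means that records differing only in case or stray whitespace are treated as the same record.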
Label encoding may convert the 12 districts corresponding to the region attribute into label values 1 to 12, the 6 stages corresponding to the development stage attribute (seedling, growth, flowering, fruiting, maturation, and harvest) into label values 1 to 6, and the 3 levels corresponding to the occurrence severity attribute (mild, moderate, and severe) into label values 1 to 3.
One-hot encoding converts categorical variables into a form that machine learning algorithms can readily use, which facilitates computing the loss function or the accuracy. Because producers describe the affected part, the main symptoms, and the field distribution of symptoms as multiple-choice selections, the multiple-choice values of these three attributes are one-hot encoded: the 10 options for the affected part (root, stem base, stem, tender leaf/branch, leaf, flower, fruit/grain, whole plant, tender bud, and so on), the 27 options for the main symptoms (wilting, dwarfing, fallen leaves, mosaic leaves, vesicular, rotting, and so on), and the 9 options for the field distribution (local distribution, scattered distribution, line distribution, field edge, uniform distribution, and so on).
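The two encodings can be illustrated with a short sketch. The category lists below are partly taken from the text (stages, severity levels) and partly abbreviated for brevity (only five affected-part options are listed); the record layout and function names are assumptions made for this example.

```python
STAGES = ["seedling", "growth", "flowering", "fruiting", "maturation", "harvest"]
SEVERITY = ["mild", "moderate", "severe"]
# Only a subset of the affected-part options is listed here for brevity.
PARTS = ["root", "stem", "leaf", "flower", "fruit"]


def label_encode(value, categories):
    """Map a category to an integer label starting at 1."""
    return categories.index(value) + 1


def multi_hot(selected, options):
    """One 0/1 indicator per option; several may be 1 for multiple-choice fields."""
    chosen = set(selected)
    return [1 if opt in chosen else 0 for opt in options]


def encode(record):
    """Concatenate label-encoded and one-hot-encoded attributes into one vector."""
    return ([label_encode(record["stage"], STAGES),
             label_encode(record["severity"], SEVERITY)]
            + multi_hot(record["parts"], PARTS))


example = {"stage": "flowering", "severity": "moderate", "parts": ["leaf", "fruit"]}
features = encode(example)
print(features)
```

Because the affected part is a multiple-choice field, a single record can set more than one indicator to 1, which is why the helper is written as a multi-hot vector rather than a strict one-hot vector.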
Step 130: selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set.
It can be understood that a crop disease prescription data set may contain many irrelevant, redundant, or noisy features, which can cause the curse of dimensionality and degrade classifier performance. Feature selection offers an effective solution: by eliminating irrelevant and redundant data, it can reduce computation time, improve learning accuracy, and promote a better understanding of the learning model or the data.
Step 140: inputting the target feature set into a vegetable disease diagnosis model to obtain the vegetable disease category.
It can be understood that the vegetable disease diagnosis model is a trained machine learning model (in this embodiment, a LightGBM model).
In some embodiments, the cleaning and encoding of the experimental data set to obtain an initial feature set includes:
removing duplicate values, applying consistency processing, and removing abnormal values from the experimental data set to obtain a cleaned experimental data set;
and performing label encoding and one-hot encoding on the cleaned experimental data set to obtain an initial feature set.
It can be appreciated that the experimental data set may contain many irrelevant, redundant, or noisy features, which can cause the curse of dimensionality and degrade classifier performance. Feature selection offers an effective solution: by deleting irrelevant and redundant data, it can reduce computation time, improve learning accuracy, and promote a better understanding of the learning model (i.e., the vegetable disease diagnosis model) or the data.
In some embodiments, selecting features from the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set includes:
extracting features from the initial feature set based on a recursive feature elimination method to obtain an intermediate feature set;
and performing cross validation on subsets of the intermediate feature set based on a gradient boosting decision tree method, and selecting the optimal number of features as the target feature set.
It can be understood that, in this embodiment, a wrapper-style recursive feature elimination method is implemented in the Python programming language to extract the features of the crop disease data, and a GBDT (Gradient Boosting Decision Tree) model is used together with it to find the optimal feature subset that achieves the highest classification accuracy, that is, the above-mentioned target feature set. Selecting the target feature set from the initial feature set with the combined method is divided into two stages, as shown in FIG. 3:
(1) Recursive Feature Elimination (RFE): the GBDT model is trained repeatedly and the features are ranked by importance according to the model's feature importance attribute; after each round of training, the one or more least important features are deleted and the model is retrained on the reduced feature set, until the complete feature set has been traversed. The important feature variables are thereby screened out, and their number is d.
(2) Cross Validation (CV): for a feature set of size d, the number of non-empty subsets is $2^d - 1$. The subsets, which contain different numbers of features, are input to the GBDT classifier in turn, and the feature subset with the highest classification accuracy is output as the optimal feature subset.
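The two stages can be sketched structurally as follows. This is not the GBDT-based implementation of the embodiment: the importance function below is a deliberately simple stand-in (the absolute gap between class means) so that the example runs with the standard library only, and the toy data and feature names are hypothetical. In practice the importances of a trained GBDT model would be plugged in as `importance_fn`.

```python
from itertools import combinations


def mean_gap_importance(rows, labels, features):
    """Stand-in importance: |mean(class 1) - mean(class 0)| per feature."""
    scores = {}
    for f in features:
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        scores[f] = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores


def recursive_feature_elimination(rows, labels, features, keep, importance_fn):
    """Repeatedly re-score the remaining features and drop the weakest one."""
    remaining = list(features)
    while len(remaining) > keep:
        scores = importance_fn(rows, labels, remaining)
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return remaining


def nonempty_subsets(features):
    """All 2**d - 1 non-empty subsets of a d-element feature list."""
    return [subset
            for size in range(1, len(features) + 1)
            for subset in combinations(features, size)]


# Hypothetical toy data: two informative features and two uninformative ones.
rows = [
    {"leaf": 1, "root": 0, "fruit": 1, "noise": 1},
    {"leaf": 1, "root": 0, "fruit": 0, "noise": 0},
    {"leaf": 0, "root": 1, "fruit": 1, "noise": 1},
    {"leaf": 0, "root": 1, "fruit": 0, "noise": 0},
]
labels = [1, 1, 0, 0]

selected = recursive_feature_elimination(
    rows, labels, ["leaf", "root", "fruit", "noise"],
    keep=2, importance_fn=mean_gap_importance)
subsets = nonempty_subsets(selected)
```

Note that the number of subsets grows exponentially with d, so exhaustively scoring every subset is only practical when d is small.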
Further, 32 features are finally obtained through feature selection, covering the region, the development stage, the occurrence area, the occurrence severity, the affected part (features such as whole plant, fruit/grain, and root), the main symptoms (features such as branches, insects/mites found, and leaf spots), and the field distribution of symptoms (features such as individual plants and scattered distribution).
In some embodiments, the vegetable disease diagnosis method further comprises:
and optimizing the hyper-parameters of the vegetable disease diagnosis model based on a Bayesian optimization algorithm.
It can be understood that Bayesian optimization is a method for finding the extremum of a function whose expression is unknown, and it is applied here to the hyper-parameter combination optimization problem. It uses the information from points already searched to determine the next search point, which improves both the quality of the result and the search speed; it is more effective than grid search and random search, and has the advantages of fewer iterations and finer parameter granularity. The Bayesian optimization flow is shown in FIG. 4.
In some embodiments, the optimizing the hyper-parameters of the vegetable disease diagnosis model based on the bayesian optimization algorithm includes:
modeling an objective function corresponding to the vegetable disease diagnosis model, and calculating the mean value and the variance of function values corresponding to each sampling point of the objective function;
determining an acquisition function based on the mean value and the variance of the function value corresponding to each sampling point of the objective function, and selecting the sampling point of the current iteration based on the acquisition function;
and iteratively reducing the observation area corresponding to the sampling point of the iteration based on the sampling point of the iteration until the optimal solution of the hyper-parameters of the vegetable disease diagnosis model is determined from the observation area, so as to complete the optimization of the hyper-parameters of the vegetable disease diagnosis model.
It can be understood that, to draw suitable samples and keep Bayesian optimization from becoming trapped in a local optimum, the upper confidence bound method is used to balance exploration and exploitation when selecting the sampling point of the current iteration. Exploration means sampling in regions that have not yet been sampled, and exploitation means sampling, according to the posterior distribution, in the region where the global optimum is most likely to appear.
The observation region of the sampling points is narrowed iteratively in this way until the optimal solution is found, which completes the optimization of the hyper-parameters of the vegetable disease diagnosis model.
It can be understood that the overall idea of Bayesian optimization is as follows. Suppose there is a function $f: \mathcal{X} \to \mathbb{R}$. The goal is to find, within the search space $\mathcal{X}$:
$$x^{*} = \arg\max_{x \in \mathcal{X}} f(x)$$
where $x$ represents a hyper-parameter combination and $\mathcal{X}$ represents the hyper-parameter search space.
The core of the Bayesian optimization algorithm consists of two parts. The prior function (PF) models the objective function, that is, it computes the mean and variance of the function value at each point, usually by Gaussian process regression. The acquisition function (AC) determines the sampling point of the current iteration; common acquisition functions include EI (expected improvement), PI (probability of improvement), and UCB (upper confidence bound).
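The UCB function is:
$$\mathrm{UCB}(x) = \mu(x) + \varepsilon\,\sigma(x)$$
The PI function, in its standard form, is:
$$\mathrm{PI}(x) = \Phi\!\left(\frac{\mu(x) - \upsilon^{*}}{\sigma(x)}\right)$$
The EI function, in its standard form, is:
$$\mathrm{EI}(x) = \left(\mu(x) - \upsilon^{*}\right)\Phi\!\left(\frac{\mu(x) - \upsilon^{*}}{\sigma(x)}\right) + \sigma(x)\,\phi\!\left(\frac{\mu(x) - \upsilon^{*}}{\sigma(x)}\right)$$
where $\mu(x)$ is the posterior mean, $\sigma(x)$ is the posterior standard deviation (the square root of the variance), $\upsilon^{*}$ is the current best function value, $\Phi(\cdot)$ and $\phi(\cdot)$ are the cumulative distribution function and the probability density function of the standard normal distribution, and $\varepsilon$ is a balance parameter that trades off local against global search and can be set manually.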
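The three acquisition functions can be written compactly in Python. The UCB form follows the formula given above; PI and EI are written in their commonly used maximization forms, which is an assumption about the exact variant intended here. The function names and the default value of `eps` are illustrative.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ucb(mu, sigma, eps=2.0):
    """Upper confidence bound: mean plus eps times the standard deviation."""
    return mu + eps * sigma


def probability_of_improvement(mu, sigma, best):
    """Probability that the value at this point exceeds the current best."""
    if sigma == 0.0:
        return 1.0 if mu > best else 0.0
    return normal_cdf((mu - best) / sigma)


def expected_improvement(mu, sigma, best):
    """Expected amount by which this point improves on the current best."""
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)
```

A larger `eps` in UCB favors uncertain regions (exploration), while a smaller value favors points with a high predicted mean (exploitation).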
The Bayesian optimization algorithm flow is as follows:
(1) The algorithm first initializes $n_0$ candidate solutions, typically by picking points evenly throughout the feasible domain.
(2) A loop is then started that adds one point per iteration until N candidate solutions have been found.
(3) Each time the next point is sought, a Gaussian process regression model is built from the n candidate solutions found so far, giving the posterior distribution of the function value at any point.
(4) An acquisition function is constructed from the posterior distribution, and its maximum point is taken as the next search point.
(5) The function value at the next search point is calculated.
(6) Finally, the algorithm returns the maximum of the N candidate solutions as the optimal solution.
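Steps (1) to (6) can be sketched for a one-dimensional search space as follows. This is a minimal illustration, not the embodiment's implementation: the squared-exponential kernel, its length scale, the zero-mean Gaussian process, the candidate grid, and the UCB balance parameter are all assumed values, and a real hyper-parameter search would be multi-dimensional.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel (the length scale is an assumed value)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    weights = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, lo, hi, n_init=3, n_total=12, eps=2.0, grid=201):
    candidates = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    # (1) Initialize n_init points spread evenly over the feasible domain.
    picked = [round(i * (grid - 1) / (n_init - 1)) for i in range(n_init)]
    xs = [candidates[i] for i in picked]
    ys = [f(x) for x in xs]
    # (2) Add one point per iteration until n_total points are evaluated.
    while len(xs) < n_total:
        best_idx, best_score = None, -math.inf
        for i, x in enumerate(candidates):
            if i in picked:
                continue
            # (3) GP posterior built from the points found so far.
            mean, std = gp_posterior(xs, ys, x)
            # (4) UCB acquisition; its maximizer is the next search point.
            score = mean + eps * std
            if score > best_score:
                best_idx, best_score = i, score
        picked.append(best_idx)
        xs.append(candidates[best_idx])
        # (5) Evaluate the objective at the new point.
        ys.append(f(xs[-1]))
    # (6) Return the best of the evaluated candidates.
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


# Toy objective with its maximum at x = 0.3 (purely illustrative).
best_x, best_y = bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

The small `noise` term added to the kernel diagonal keeps the linear solves numerically stable when sampled points lie close together.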
The vegetable disease diagnosis model in this embodiment is obtained by training a LightGBM (Light Gradient Boosting Machine) model. When the Bayesian optimization algorithm is used to tune the LightGBM model, different hyper-parameter combinations of the LightGBM model serve as the independent variable x, the accuracy obtained by five-fold cross validation serves as the output y of the Bayesian framework, and the ratio of the training set to the test set is set to 7:3.
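The data partitioning described above can be sketched with index lists. How the 7:3 split and the five-fold cross validation are combined is not spelled out in the text; the sketch assumes the common arrangement in which cross validation runs on the training portion only. The `fit_and_score` callback is a hypothetical placeholder for training a model on one fold and returning its validation accuracy.

```python
import random


def train_test_split_indices(n, test_ratio=0.3, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def cross_val_accuracy(fit_and_score, indices, k=5):
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(indices, k)]
    return sum(scores) / len(scores)


# 7344 records in total, as reported in Table 1.
train_idx, test_idx = train_test_split_indices(7344)
folds = list(k_fold_indices(train_idx, k=5))
```

In a Bayesian optimization run, `cross_val_accuracy` would be the objective value y returned for one hyper-parameter combination x.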
The result of selecting parameters of the LightGBM model based on bayesian optimization is shown in table 2:
Table 2. Optimal hyper-parameters of the LightGBM model based on Bayesian optimization
Parameter | Meaning | Search range | Final value
colsample_bytree | Ratio of columns randomly sampled per tree | (0.5, 1) | 0.56
min_child_samples | Minimum number of samples in a leaf node | (2, 200) | 9
num_leaves | Number of leaf nodes | (5, 1000) | 884
subsample | Sampling ratio | (0.6, 1) | 0.7
max_depth | Maximum depth of tree | (2, 10) | 6
n_estimators | Number of iterations | (10, 1000) | 1000
reg_alpha | L1 regularization | (0, 10) | 1.08
reg_lambda | L2 regularization | (0, 10) | 4.8
min_gain_to_split | Minimum gain to split | (0, 1) | 0.15
learning_rate | Learning rate | (0, 1) | 0.1
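For reference, the search ranges and selected values of Table 2 can be written as plain Python dictionaries. The dictionary and function names are chosen here for illustration; the parameter names are those listed in the table, and a dictionary such as `BEST_PARAMS` could plausibly be passed as keyword arguments to a LightGBM classifier, although the source does not show that call.

```python
# Search ranges, transcribed from Table 2 as (lower, upper) bounds.
SEARCH_SPACE = {
    "colsample_bytree": (0.5, 1),
    "min_child_samples": (2, 200),
    "num_leaves": (5, 1000),
    "subsample": (0.6, 1),
    "max_depth": (2, 10),
    "n_estimators": (10, 1000),
    "reg_alpha": (0, 10),
    "reg_lambda": (0, 10),
    "min_gain_to_split": (0, 1),
    "learning_rate": (0, 1),
}

# Values selected by Bayesian optimization, transcribed from Table 2.
BEST_PARAMS = {
    "colsample_bytree": 0.56,
    "min_child_samples": 9,
    "num_leaves": 884,
    "subsample": 0.7,
    "max_depth": 6,
    "n_estimators": 1000,
    "reg_alpha": 1.08,
    "reg_lambda": 4.8,
    "min_gain_to_split": 0.15,
    "learning_rate": 0.1,
}


def within_range(params, space):
    """Check that every selected value lies inside its search range."""
    return {name: space[name][0] <= value <= space[name][1]
            for name, value in params.items()}


checks = within_range(BEST_PARAMS, SEARCH_SPACE)
```

The selected `n_estimators` value sits on the upper bound of its range, which may suggest that a wider range would be worth examining.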
In some embodiments, the vegetable disease diagnosis model is obtained by training a LightGBM model by taking the features corresponding to the vegetable disease prescription source data as a sample and taking the vegetable disease category corresponding to the vegetable disease prescription source data as a sample label.
It can be understood that, to diagnose vegetable diseases efficiently and accurately, several learning models can be combined to obtain a better result, so that the combined model generalizes better. LightGBM is a distributed gradient boosting framework based on decision tree algorithms; it is efficient, has low memory usage and high accuracy, supports parallel learning, and can process large-scale data.
Both the LightGBM model and the XGBoost model are efficient implementations of the GBDT model, so LightGBM is similar in principle to GBDT and XGBoost. All of them are based on the gradient boosting (GB) algorithm: the GBDT model is a gradient boosting algorithm that uses decision trees as base learners (basis functions), and XGBoost is an improved version of the GBDT model.
The gradient boosting algorithm starts from a given objective loss function whose domain is the set of all feasible functions (basis functions); it then approaches a local minimum step by step by iteratively selecting a basis function in the negative gradient direction.
The gradient boosting algorithm proceeds as follows:
for the training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$,
Figure BDA0003378474740000121
Figure BDA0003378474740000122
Loss function $L(y, F(x))$;
1. Initialize the model to a constant:
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$$
2. Iteratively generate M base learners; for $m = 1, 2, \ldots, M$:
2.1 Calculate the pseudo-residuals (negative gradient direction), where $i = 1, 2, \ldots, N$:
$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$$
2.2 Based on the data
$$\{(x_i, r_{im})\}_{i=1}^{N}$$
calculate a basis function $h_m(x)$ that fits the pseudo-residuals;
2.3 Calculate the optimal step size $\gamma_m$, where L is the loss function and $\gamma_m$ is selected by empirical risk minimization:
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{N} L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right)$$
2.4 Update the model:
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
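The four steps above can be sketched in a few lines of code. The sketch below is illustrative only and rests on assumptions not stated in the source: a squared-error loss, one-dimensional inputs, and a one-split regression stump as the base learner. The function names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump h(x) to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # every x is identical, so fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs, ys, n_rounds=20):
    """Gradient boosting for the squared loss L = (y - F)^2 / 2."""
    # Step 1: the constant minimising the squared loss is the mean of y.
    f0 = sum(ys) / len(ys)
    learners = []
    preds = [f0] * len(xs)
    for _ in range(n_rounds):
        # Step 2.1: the negative gradient of the squared loss is y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        # Step 2.2: fit a base learner to the pseudo-residuals.
        h = fit_stump(xs, residuals)
        hx = [h(x) for x in xs]
        # Step 2.3: closed-form line search for the squared loss.
        denom = sum(v * v for v in hx)
        if denom == 0:
            break  # nothing left to fit
        gamma = sum(r * v for r, v in zip(residuals, hx)) / denom
        # Step 2.4: update the model.
        learners.append((gamma, h))
        preds = [p + gamma * v for p, v in zip(preds, hx)]

    def model(x):
        return f0 + sum(g * h(x) for g, h in learners)

    return model
```

For other loss functions the pseudo-residual in step 2.1 and the line search in step 2.3 would change, while the overall loop stays the same.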
Compared with other boosting ensemble methods, LightGBM adds the Gradient-based One-Side Sampling (GOSS) algorithm and the Exclusive Feature Bundling (EFB) algorithm; therefore, the LightGBM model is selected and further optimized in the present invention:
(1) The main idea of gradient-based one-side sampling is to reduce the number of samples: most of the samples with small gradients are excluded, and only the remaining samples are used to calculate the information gain. Because samples with large gradients contribute more to the information gain, GOSS keeps all instances with large gradients and randomly samples the instances with small gradients, which improves efficiency while preserving the accuracy of the information gain estimate. To offset the effect on the data distribution, GOSS applies a constant multiplier to the small-gradient data when the information gain is calculated. Specifically, GOSS first sorts the instances by the absolute value of their gradients and selects the top a × 100% of them. It then randomly samples b × 100% of the instances from the remaining data. Finally, the sampled small-gradient data are multiplied by (1-a)/b when the information gain is calculated, so that the algorithm pays more attention to under-trained instances without changing the distribution of the original data set too much.
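The sampling procedure described in (1) can be sketched as follows. This is a minimal illustration, not LightGBM's internal implementation; the function name `goss_sample`, the default ratios and the fixed random seed are assumptions, and b is interpreted as a fraction of the whole data set so that the multiplier (1-a)/b restores the weight of the small-gradient part.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return (indices, weights) chosen by gradient-based one-side sampling."""
    n = len(gradients)
    # Sort instance indices by the absolute value of the gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)   # instances that are always kept
    rand_n = int(b * n)  # instances sampled from the small-gradient rest
    top = order[:top_n]
    rest = order[top_n:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))
    # Small-gradient samples are re-weighted by (1 - a) / b.
    weight = (1 - a) / b
    indices = top + sampled
    weights = [1.0] * len(top) + [weight] * len(sampled)
    return indices, weights
```

With a = 0.2 and b = 0.1, each sampled small-gradient instance stands in for eight instances when the information gain is accumulated.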
(2) The main idea of exclusive feature bundling is to reduce the number of features: mutually exclusive features, that is, features that rarely take non-zero values at the same time, are bundled together, which improves computational efficiency while preserving the information. To merge mutually exclusive features, LightGBM relies on the histogram algorithm, which discretizes each continuous feature into k discrete bins and builds a histogram of width k to accumulate statistics over those bins. With the histogram algorithm, the optimal split point can be found by traversing only the k bins, without traversing all the data.
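A simplified sketch of the bundling idea in (2) is given below. It assumes that both features are already binned into integers, that bin 0 means "default/zero" in each feature, and that the two features are strictly mutually exclusive; LightGBM's actual bundling is more elaborate, and the names `bundle_exclusive` and `histogram` are hypothetical.

```python
def bundle_exclusive(feature_a, feature_b, bins_a):
    """Merge two mutually exclusive binned features into one feature.

    Bin 0 means "default/zero" in both inputs; feature_a uses bins 1..bins_a.
    The non-zero bins of feature_b are shifted by bins_a so they never collide.
    """
    bundled = []
    for a, b in zip(feature_a, feature_b):
        if a != 0 and b != 0:
            raise ValueError("features are not mutually exclusive")
        if a != 0:
            bundled.append(a)
        elif b != 0:
            bundled.append(b + bins_a)
        else:
            bundled.append(0)
    return bundled


def histogram(binned, gradients, n_bins):
    """Sum the gradients per bin; a split search then scans n_bins buckets."""
    sums = [0.0] * n_bins
    for bin_id, g in zip(binned, gradients):
        sums[bin_id] += g
    return sums
```

After bundling, one histogram over the merged feature carries the statistics of both original features, so the split search traverses the bins once instead of once per feature.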
In some embodiments, the vegetable disease diagnosis method further comprises:
comparing the vegetable disease diagnosis model based on the Bayesian-optimized LightGBM model with seven common machine learning models, taking precision (Precision), recall (Recall), the comprehensive evaluation index (F1-score), accuracy (Accuracy), the macro average (Macro avg) and the weighted average (Weighted avg) as evaluation indexes. Precision, recall, the F1-score and accuracy are evaluation indexes for binary classification problems. The F1-score takes both the precision and the recall of a classifier (namely the vegetable disease diagnosis model) into account; the higher the F1-score, the better the overall performance of the classifier. The expressions for these four indexes are as follows:
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of positive samples correctly predicted as positive; TN is the number of negative samples correctly predicted as negative; FP is the number of negative samples wrongly predicted as positive; FN is the number of positive samples wrongly predicted as negative. The diagnosis of multiple tomato diseases is a multi-class classification problem. For a multi-class problem, class i has its own $TP_i$, $TN_i$, $FP_i$ and $FN_i$, and the evaluation indexes can be extended to a macro average and a weighted average.
The macro average (Macro avg) is the arithmetic mean of an evaluation index (precision, recall or F1-score) over all classes, but it ignores possible imbalance among the classes. The weighted average (Weighted avg) therefore multiplies the evaluation index of each class by the proportion $w_i$ of that class in the total samples and sums the results.
Taking $Precision_{macro}$ and $Precision_{weighted}$ as examples, for n classes:
$$Precision_i = \frac{TP_i}{TP_i + FP_i}$$
$$Precision_{macro} = \frac{1}{n} \sum_{i=1}^{n} Precision_i$$
$$Precision_{weighted} = \sum_{i=1}^{n} w_i \, Precision_i$$
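As an illustration of the macro and weighted averages, the following sketch computes per-class precision from true and predicted labels and then averages it in both ways. The function name `precision_summary` and the class labels are hypothetical, and returning 0.0 for a class that is never predicted is a convention chosen for this sketch.

```python
def precision_summary(y_true, y_pred):
    """Return (per-class precision, macro average, weighted average)."""
    classes = sorted(set(y_true))
    n = len(y_true)
    per_class, weights = {}, {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        # Convention of this sketch: precision is 0.0 if c is never predicted.
        per_class[c] = tp / (tp + fp) if tp + fp else 0.0
        # w_i: proportion of class c among all true samples.
        weights[c] = sum(1 for t in y_true if t == c) / n
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(weights[c] * per_class[c] for c in classes)
    return per_class, macro, weighted
```

When the classes are unbalanced the two averages differ, because the weighted average gives more influence to the classes with more samples.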
in order to verify the effectiveness of the feature selection method on the vegetable disease diagnosis model based on the Bayesian optimization LightGBM model, feature variables before and after feature selection are input into the vegetable disease diagnosis model based on the Bayesian optimization LightGBM model, and the experimental results are shown in Table 3.
Table 3 vegetable disease diagnosis model experiment results based on LightGBM model
Figure BDA0003378474740000144
As Table 3 shows, feature selection reduces the running time of the Bayesian-optimized LightGBM model by 47.73% while maintaining accuracy. It also reduces the difficulty of data acquisition at the early stage and improves the operating efficiency of the model.
The LightGBM-based vegetable disease diagnosis model achieves a good classification effect on the three classes of data: tomato virus disease, tomato late blight and tomato gray mold. The classification effect is best for tomato virus disease, with precision and recall reaching 97.27% and 94.68% respectively and an F1-score of 95.95%.
Compared with tomato virus disease, the classification effect for tomato late blight and tomato gray mold is slightly inferior, because the two diseases are both common and difficult to distinguish in the actual field environment. Both appear as an off-white mildew layer on the fruit at the early stage of onset and as dark brown spots on the leaves at the later stage. Tomato late blight in particular spreads rapidly and can cause destructive damage within a short time, so using the LightGBM-based disease diagnosis model to identify and diagnose vegetable diseases in time is of great practical significance.
The comparison between the vegetable disease diagnosis model based on the Bayesian-optimized LightGBM model and seven common machine learning models is shown in Table 4.
TABLE 4 comparison of different models
Figure BDA0003378474740000151
Figure BDA0003378474740000161
According to the accuracy, macro-average and weighted-average results in Table 4, the vegetable disease diagnosis model provided by the invention has better overall classification performance. It outperforms every common machine learning method in Table 4, whether a single machine learning model (KNN, DT or SVM), the RF algorithm based on the bagging ensemble framework, or the other algorithms based on the boosting ensemble framework (AdaBoost, GBDT and XGBoost).
In other embodiments, the vegetable disease diagnosis method can be used for diagnosing tomato diseases, and the corresponding process is shown in fig. 5 and includes data set construction, data preprocessing, feature verification, model construction and model optimization.
In summary, the method for diagnosing vegetable diseases provided by the present invention comprises: acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set; cleaning and encoding the experimental data set to obtain an initial feature set; performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set; and inputting the target feature set into a vegetable disease diagnosis model to obtain the category of vegetable diseases.
According to the vegetable disease diagnosis method provided by the invention, vegetable disease prescription source data are acquired and processed to obtain an initial feature set, a target feature set is obtained from the initial feature set, and the target feature set is input into a trained vegetable disease diagnosis model for diagnosis and identification to obtain the vegetable disease category. The whole process requires no participation by plant protection personnel: farmers only need to fill in disease-related information, and intelligent identification is carried out through prescription big data analysis and the vegetable disease diagnosis model. This overcomes the low diagnosis efficiency and the difficulty of acquiring image data in the prior art, makes use of the multi-source information contained in prescription data for the first time, assists efficient and accurate diagnosis of vegetable diseases, and avoids misdiagnosis.
Moreover, by screening, cleaning and encoding the vegetable disease prescription source data and selecting features with a method combining recursive feature elimination and a gradient boosting decision tree, the target feature set to be identified is determined. This reduces the number of features to be identified, lowers the difficulty of early-stage data collection, and improves the data processing efficiency of the vegetable disease diagnosis model.
The present invention provides a vegetable disease diagnosis apparatus, which can be referred to in correspondence with the above-described vegetable disease diagnosis method.
As shown in fig. 6, a vegetable disease diagnosis device 600 according to the present invention includes: a data set construction module 610, a data set processing module 620, a feature selection module 630 and a disease diagnosis module 640.
The data set construction module 610 is configured to obtain vegetable disease prescription source data, and screen the vegetable disease prescription source data to obtain an experimental data set.
The data set processing module 620 is configured to clean and encode the experimental data set to obtain an initial feature set.
The feature selection module 630 is configured to perform feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set.
The disease diagnosis module 640 is configured to input the target feature set into a vegetable disease diagnosis model to obtain vegetable disease categories.
In some embodiments, the data set processing module 620 includes: a cleaning unit and a coding unit.
The cleaning unit is used for performing duplicate-value deletion, unification and abnormal-value deletion on the experimental data set to obtain the cleaned experimental data set.
The coding unit is used for performing label encoding and one-hot encoding on the cleaned experimental data set to obtain an initial feature set.
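The duplicate deletion and the two encodings handled by these units can be sketched as follows. The helper names `drop_duplicates`, `label_encode` and `one_hot_encode` and the sample values are hypothetical; sorting the categories before assigning codes is an assumption made so that the output is deterministic.

```python
def drop_duplicates(rows):
    """Remove repeated records while keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Expand a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories
```

Label encoding suits the disease category used as the sample label, while one-hot encoding avoids imposing an artificial order on unordered categorical features.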
In some embodiments, the feature selection module 630 includes: an extraction unit and a verification unit.
The extraction unit is used for extracting features from the initial feature set based on a recursive feature elimination method to obtain an intermediate feature set.
The verification unit is used for performing cross-validation on the subsets of the intermediate feature set based on a gradient boosting decision tree method, and selecting the optimal number of features as the target feature set.
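The interplay of the extraction unit and the verification unit can be sketched as a loop that repeatedly drops the least important feature and keeps the subset with the best cross-validated score. In this sketch the two callables `importance_fn` and `cv_score_fn` are hypothetical hooks standing in for a gradient boosting decision tree and its cross-validation; the toy importances and scores below are invented purely for illustration.

```python
def rfe_with_cv(features, importance_fn, cv_score_fn):
    """Recursive feature elimination that keeps the best-scoring subset.

    importance_fn(subset) -> {feature: importance} for a model fitted on subset.
    cv_score_fn(subset)   -> cross-validated score of a model using subset.
    """
    current = list(features)
    best_subset, best_score = list(current), cv_score_fn(current)
    while len(current) > 1:
        importances = importance_fn(current)
        # Eliminate the feature the current model relies on least.
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        score = cv_score_fn(current)
        # ">=" prefers the smaller subset when two scores tie.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (assumptions): fixed importances and a score that rewards
# the two informative features "a" and "b" and penalises subset size.
TOY_IMPORTANCE = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}


def toy_importance(subset):
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_cv_score(subset):
    return 0.4 * ("a" in subset) + 0.4 * ("b" in subset) - 0.01 * len(subset)


SELECTED, SCORE = rfe_with_cv(["a", "b", "c", "d"], toy_importance, toy_cv_score)
```

In practice the two hooks would train and cross-validate an actual model on each candidate subset, which is the costly part of the procedure.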
In some embodiments, the vegetable disease diagnosis device further comprises: a hyper-parameter optimization module.
The hyper-parameter optimization module is used for optimizing hyper-parameters of the vegetable disease diagnosis model based on a Bayesian optimization algorithm.
In some embodiments, the hyper-parameter optimization module comprises: a modeling unit, an iteration selection unit and an optimization unit.
The modeling unit is used for modeling an objective function corresponding to the vegetable disease diagnosis model and calculating the mean value and the variance of function values corresponding to each sampling point of the objective function.
The iteration selection unit is used for determining an acquisition function based on the mean value and the variance of the function value corresponding to each sampling point of the target function, and selecting the sampling point of the iteration based on the acquisition function.
The optimization unit is used for iteratively reducing an observation area corresponding to the sampling point of the iteration based on the sampling point of the iteration until the hyper-parameter optimal solution of the vegetable disease diagnosis model is determined from the observation area, so that the hyper-parameter optimization of the vegetable disease diagnosis model is completed.
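The role of the acquisition function in the iteration selection unit can be illustrated with a short sketch. The source does not name a particular acquisition function; expected improvement (EI) is used here as an assumption, and the function names, the exploration margin `xi` and the candidate values are hypothetical. The mean and variance are assumed to come from the surrogate model built by the modeling unit.

```python
import math


def expected_improvement(mean, variance, best_so_far, xi=0.01):
    """Expected improvement (for maximisation) at one candidate point."""
    std = math.sqrt(variance)
    improvement = mean - best_so_far - xi
    if std == 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def next_sample(candidates, best_so_far):
    """candidates: {point: (mean, variance)}; return the point with highest EI."""
    return max(
        candidates,
        key=lambda p: expected_improvement(*candidates[p], best_so_far),
    )
```

A point with a modest predicted mean but a large variance can still be chosen, which is how the acquisition function balances exploring uncertain regions against exploiting regions already known to be good.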
In some embodiments, the vegetable disease diagnosis model is obtained by training a LightGBM model by taking the features corresponding to the vegetable disease prescription source data as a sample and taking the vegetable disease category corresponding to the vegetable disease prescription source data as a sample label.
The electronic device, the computer program product, and the storage medium provided by the present invention are described below, and the electronic device, the computer program product, and the storage medium described below may be referred to in correspondence with the above-described vegetable disease diagnosis method.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a vegetable disease diagnostic method comprising:
step 110, acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set;
step 120, cleaning and encoding the experimental data set to obtain an initial feature set;
step 130, performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
step 140, inputting the target feature set into a vegetable disease diagnosis model to obtain the vegetable disease category.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the vegetable disease diagnosis method provided by the above methods, the method including:
step 110, acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set;
step 120, cleaning and encoding the experimental data set to obtain an initial feature set;
step 130, performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
step 140, inputting the target feature set into a vegetable disease diagnosis model to obtain the vegetable disease category.
In still another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements a method for diagnosing vegetable diseases provided by the above methods, the method including:
step 110, acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set;
step 120, cleaning and encoding the experimental data set to obtain an initial feature set;
step 130, performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
step 140, inputting the target feature set into a vegetable disease diagnosis model to obtain the vegetable disease category.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for diagnosing a vegetable disease, comprising:
acquiring vegetable disease prescription source data, and screening the vegetable disease prescription source data to obtain an experimental data set;
cleaning and coding the experimental data set to obtain an initial feature set;
performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
and inputting the target feature set into a vegetable disease diagnosis model to obtain the category of vegetable diseases.
2. A vegetable disease diagnosis method as claimed in claim 1, wherein the cleaning and encoding of the experimental data set to obtain an initial feature set comprises:
performing duplicate-value deletion, unification and abnormal-value deletion on the experimental data set to obtain a cleaned experimental data set;
and performing label coding and one-hot coding on the cleaned experimental data set to obtain an initial feature set.
3. The vegetable disease diagnosis method according to claim 1, wherein the performing of feature selection on the initial feature set based on the method combining recursive feature elimination and a gradient boosting decision tree to obtain the target feature set comprises:
extracting features from the initial feature set based on a recursive feature elimination method to obtain an intermediate feature set;
and performing cross-validation on the subsets of the intermediate feature set based on a gradient boosting decision tree method, and selecting the optimal number of features as the target feature set.
4. A vegetable disease diagnosis method according to claim 1, characterized by further comprising:
and optimizing the hyper-parameters of the vegetable disease diagnosis model based on a Bayesian optimization algorithm.
5. A vegetable disease diagnosis method according to claim 4, wherein the optimizing the hyper-parameters of the vegetable disease diagnosis model based on the Bayesian optimization algorithm comprises:
modeling an objective function corresponding to the vegetable disease diagnosis model, and calculating the mean value and the variance of function values corresponding to each sampling point of the objective function;
determining an acquisition function based on the mean value and the variance of the function value corresponding to each sampling point of the target function, and selecting the sampling point of the iteration based on the acquisition function;
and iteratively reducing the observation area corresponding to the sampling point of the iteration based on the sampling point of the iteration until the optimal solution of the hyper-parameters of the vegetable disease diagnosis model is determined from the observation area, so as to complete the optimization of the hyper-parameters of the vegetable disease diagnosis model.
6. A vegetable disease diagnosis method according to any one of claims 1 to 5, wherein the vegetable disease diagnosis model is obtained by training a LightGBM model with characteristics corresponding to vegetable disease prescription source data as a sample and vegetable disease categories corresponding to vegetable disease prescription source data as sample labels.
7. A vegetable disease diagnostic device characterized by comprising:
the data set construction module is used for acquiring vegetable disease prescription source data and screening the vegetable disease prescription source data to obtain an experimental data set;
the data set processing module is used for cleaning and coding the experimental data set to obtain an initial feature set;
the feature selection module is used for performing feature selection on the initial feature set based on a method combining recursive feature elimination and a gradient boosting decision tree to obtain a target feature set;
and the disease diagnosis module is used for inputting the target feature set into a vegetable disease diagnosis model to obtain vegetable disease categories.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the vegetable disease diagnosis method according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the vegetable disease diagnosis method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the steps of the vegetable disease diagnosis method according to any one of claims 1 to 6.
CN202111424433.6A 2021-11-26 2021-11-26 Vegetable disease diagnosis method and device, electronic device and storage medium Pending CN114219672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111424433.6A CN114219672A (en) 2021-11-26 2021-11-26 Vegetable disease diagnosis method and device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN114219672A true CN114219672A (en) 2022-03-22

Family

ID=80698535

Country Status (1)

Country Link
CN (1) CN114219672A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination