Pesticide residue type identification method based on fluorescence spectrum
Technical Field
The invention relates to a pesticide residue detection method, in particular to a pesticide residue type identification method based on fluorescence spectrum.
Background
In current agricultural production, pesticides are indispensable production inputs that help control crop diseases and insect pests and increase yields. However, the residues left in agricultural products, soil, the environment and organisms after application, mainly organochlorine, organophosphorus, pyrethroid and carbamate pesticides, pose a serious threat to human health, in particular through carcinogenic, teratogenic and mutagenic effects (the "three effects") and through effects on reproductive function.
At present, pesticide residues are detected mainly by chromatography, mass spectrometry, spectroscopy and immunoassay-based biological detection, whose principles and characteristics differ. Chromatography, mass spectrometry, combined chromatography-mass spectrometry and high performance liquid chromatography require complex sample pretreatment, specialist expertise and expensive equipment; for example, the current national standards (GB 23200.7-2016 and GB 23200.14-2016) use chromatography-mass spectrometry to detect pesticide residues in fruit juice. Rapid detection methods such as immunoassay and enzyme inhibition suffer from poor sensitivity to pesticides and unstable performance. Fluorescence spectroscopy is sensitive, simple to operate and fast, but its spectral data are large in volume and high in dimension, which makes them difficult to analyze.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a method for identifying the types of pesticide residues based on fluorescence spectrum that improves the identification rate and the analysis speed for pesticide residues.
The technical scheme is as follows: the invention discloses a method for identifying pesticide residue types based on fluorescence spectrum, which comprises the following steps:
(1) Performing fluorescence spectrum detection on different types of pesticide residues, recording the spectral information, and determining the corresponding fluorescence characteristic peaks to obtain the spectral characteristics;
(2) Applying a particle swarm algorithm to reduce the dimension of the original spectral data, thereby simplifying the model training process;
(3) Training a pesticide residue classification model with a support vector machine method on the spectral data after dimension reduction;
(4) Applying the support vector machine model to identify the pesticide residue types, and verifying the identification by testing.
The spectrometer used in step (1) is an LS55 fluorescence spectrophotometer; the emission wavelength scanning range is set to 200-800 nm, the slit width to 5.0 nm, the scanning speed to 500 nm/min, and the excitation wavelength to 265 nm.
In step (2), a particle swarm algorithm is applied to reduce the dimension of the fluorescence spectrum. The number of particles is set to 100, the particle dimension equals the number of wavelengths in the original fluorescence spectrum (401), both learning factors are set to 1.5, the inertia weight decreases linearly from 0.9 to 0.4, and the particle velocity range is set to [-10, 10]. The fitness function is a linear combination of the classification accuracy and the feature dimension after dimension reduction, with the accuracy term weighted 0.8 and the feature-count term weighted 0.2:
$f(X_i) = 0.8 \cdot \mathrm{ErrorRate}(i) + 0.2 \cdot \mathrm{Dimension}(i)/D$
where ErrorRate(i) is the test-set prediction error rate corresponding to the i-th particle, obtained from the classification result of the support vector machine model; Dimension(i) is the number of features selected by the i-th particle; and D is the original spectral dimension.
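The fitness expression above can be sketched in a few lines. This is a minimal illustration, not the inventors' code; the function and argument names (`fitness`, `error_rate`, `n_selected`, `n_total`) are hypothetical, and the error rate is assumed to be supplied as a fraction in [0, 1] by a separately trained classifier.

```python
def fitness(error_rate: float, n_selected: int, n_total: int,
            w_error: float = 0.8, w_dim: float = 0.2) -> float:
    """Weighted sum of error rate and fraction of retained wavelengths.

    Lower values are better: both a lower error rate and fewer selected
    wavelengths reduce the result.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must lie in (0, n_total]")
    return w_error * error_rate + w_dim * n_selected / n_total


# Illustrative call: a particle that keeps 176 of 401 wavelengths
# and yields zero test error (about 0.0878).
example = fitness(error_rate=0.0, n_selected=176, n_total=401)
```

Because the accuracy term carries four times the weight of the dimension term, a particle is only rewarded for dropping wavelengths when doing so costs little or no classification accuracy.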
The kernel function in step (3) is a radial basis function (RBF) with the expression $K(x_i, x_j) = \exp(-g\|x_i - x_j\|^2)$, where $x_i$ is the fluorescence intensity vector of a sample after dimension reduction by the particle swarm algorithm.
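As a small illustration of the kernel expression, the sketch below evaluates the RBF kernel for two intensity vectors using only the standard library. The function name `rbf_kernel` is hypothetical, and the two short vectors are made-up values rather than measured spectra.

```python
import math
from typing import Sequence


def rbf_kernel(x_i: Sequence[float], x_j: Sequence[float], g: float) -> float:
    """Return exp(-g * ||x_i - x_j||^2) for two equal-length vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * squared_distance)


# Identical vectors give a kernel value of 1; the value decays toward 0
# as the vectors move apart.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], g=0.66)
apart = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=0.66)
```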
In step (3), a grid search is used to optimize the penalty factor c and the kernel function parameter g: K-fold cross-validation gives the validation-set classification accuracy for each parameter pair, and the pair with the highest classification accuracy is taken as the optimal c and g.
In the grid search, the range of c is set to $[2^{-10}, 2^{10}]$, the range of g is set to $[2^{-10}, 2^{10}]$, the step is $2^{0.5}$, and the error threshold is $10^{-4}$; the K-fold cross-validation uses 5 folds.
The optimal parameters are c = 13.93 and g = 0.66.
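The search grid described above can be enumerated as follows. This sketch assumes that the step of 2^0.5 is multiplicative, that is, that the exponent of 2 advances by 0.5 between neighbouring grid points; the text does not state this explicitly, and the helper name `log2_grid` is hypothetical.

```python
from typing import List


def log2_grid(lo_exp: float = -10.0, hi_exp: float = 10.0,
              step_exp: float = 0.5) -> List[float]:
    """Values 2**e for e running from lo_exp to hi_exp in steps of step_exp."""
    n_steps = round((hi_exp - lo_exp) / step_exp)
    return [2.0 ** (lo_exp + k * step_exp) for k in range(n_steps + 1)]


c_values = log2_grid()  # candidate penalty factors
g_values = log2_grid()  # candidate kernel parameters
grid = [(c, g) for c in c_values for g in g_values]
```

Under this assumption each parameter takes 41 values, so the search evaluates 1681 (c, g) pairs, each with 5-fold cross-validation.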
Beneficial effects: compared with the prior art, the invention has the following notable advantages. The identification rate for pesticide residues is high and the analysis is fast. Sample spectra are measured with a fluorescence spectrophotometer, which is efficient and highly sensitive. The original spectral data are reduced in dimension with a particle swarm algorithm, which reduces the data volume and shortens the model training time. The classification model is trained with a support vector machine method, and model performance verification confirms that a high-performance pesticide residue classification model is obtained.
Drawings
FIG. 1 is a flow chart of a method for identifying pesticide residue types based on fluorescence spectrum;
FIG. 2 is a fluorescence spectrum of bifenthrin;
FIG. 3 is a fluorescence spectrum of prochloraz;
FIG. 4 is a fluorescence spectrum of cyromazine;
FIG. 5 is a flow chart of a particle swarm algorithm for realizing spectral dimension reduction;
FIG. 6 is a plot of the characteristic wavelengths of bifenthrin selected after particle swarm dimension reduction of the original fluorescence spectrum;
FIG. 7 is a plot of the characteristic wavelengths of prochloraz selected after particle swarm dimension reduction of the original fluorescence spectrum;
FIG. 8 is a plot of the characteristic wavelengths of cyromazine selected after particle swarm dimension reduction of the original fluorescence spectrum;
FIG. 9 is a diagram of the pesticide category identification results of the classification model established by the support vector machine method.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the method for identifying the pesticide residue type based on the fluorescence spectrum proceeds as follows:
(1) Fluorescence spectrum detection of pesticide residues
Sample preparation: three typical pesticides were selected in this example: bifenthrin (active ingredient 100 g/L, emulsifiable concentrate), prochloraz (active ingredient 450 g/L, aqueous emulsion) and cyromazine (active ingredient 75%, wettable powder). Sample preparation is described using bifenthrin as an example. First, a measured quantity of bifenthrin pesticide is weighed and diluted with purified water to a standard solution with a concentration of 30 μg/mL. Aliquots of the standard solution are then diluted by 60 different factors to give 60 bifenthrin solutions of different concentrations; the concentration of each sample is recorded and the pesticide category is labeled 1. Following the same steps, 60 prochloraz samples are prepared and labeled category 2, and 60 cyromazine samples are prepared and labeled category 3, for a total of 180 samples.
Spectral data acquisition: an LS55 fluorescence spectrophotometer (PerkinElmer, USA) is used, with the emission wavelength scanning range set to 200-800 nm, the slit width to 5.0 nm, the scanning speed to 500 nm/min and the excitation wavelength to 265 nm; the data are acquired with the FL WinLab software connected to the spectrophotometer. Each sample is measured 5 times and the measurements are averaged. The fluorescence spectra of bifenthrin, prochloraz and cyromazine at different concentrations are measured, and the fluorescence spectrum of each pesticide is plotted to analyze its fluorescence characteristics further. Some of the bifenthrin fluorescence spectra are shown in FIG. 2; the sample concentrations in the figure are 0 μg/mL, 0.925 μg/mL, 1.967 μg/mL, 3.623 μg/mL, 6.574 μg/mL, 9.852 μg/mL and 12.523 μg/mL. Some of the prochloraz fluorescence spectra are shown in FIG. 3; the sample concentrations in the figure are 0 μg/mL, 0.705 μg/mL, 1.469 μg/mL, 2.077 μg/mL, 2.684 μg/mL and 3.000 μg/mL. Some of the cyromazine fluorescence spectra are shown in FIG. 4; the sample concentrations in the figure are 0 μg/mL, 0.705 μg/mL, 1.469 μg/mL, 2.077 μg/mL, 2.684 μg/mL and 3.000 μg/mL. The results show that the characteristic fluorescence peaks of bifenthrin, prochloraz and cyromazine are at 295 nm, 308 nm and 356 nm, respectively, and that the fluorescence intensity increases with concentration.
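The averaging of repeated scans and the location of a characteristic peak can be sketched as below. The helper names (`average_scans`, `peak_wavelength`) and the three-point spectra are hypothetical and serve only to show the arithmetic; they are not measured data, and taking the intensity maximum as the characteristic peak is a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def average_scans(scans: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of repeated scans of the same sample."""
    if not scans:
        raise ValueError("at least one scan is required")
    length = len(scans[0])
    if any(len(scan) != length for scan in scans):
        raise ValueError("all scans must cover the same wavelengths")
    return [sum(scan[k] for scan in scans) / len(scans) for k in range(length)]


def peak_wavelength(wavelengths: Sequence[float],
                    intensities: Sequence[float]) -> Tuple[float, float]:
    """Return (wavelength, intensity) at the maximum of the spectrum."""
    k = max(range(len(intensities)), key=lambda idx: intensities[idx])
    return wavelengths[k], intensities[k]


# Made-up example: five repeated scans at three emission wavelengths (nm).
wavelengths = [290.0, 295.0, 300.0]
scans = [
    [1.0, 5.0, 2.0],
    [1.2, 5.2, 2.2],
    [0.8, 4.8, 1.8],
    [1.0, 5.0, 2.0],
    [1.0, 5.0, 2.0],
]
mean_spectrum = average_scans(scans)
peak_nm, peak_intensity = peak_wavelength(wavelengths, mean_spectrum)
```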
(2) Spectral dimension reduction with the particle swarm algorithm
As shown in FIG. 5, the particle swarm algorithm is applied to reduce the dimension of the sample fluorescence spectrum. The number of particles is set to 100, the particle dimension equals the number of wavelengths in the original fluorescence spectrum (401), both learning factors are set to 1.5, the inertia weight decreases linearly from 0.9 to 0.4, and the particle velocity range is set to [-10, 10]. The fitness function is a linear combination of the classification accuracy and the feature dimension after dimension reduction. Because the classification accuracy is more important than the feature dimension, the accuracy term is weighted 0.8 and the feature-count term is weighted 0.2:
$f(X_i) = 0.8 \cdot \mathrm{ErrorRate}(i) + 0.2 \cdot \mathrm{Dimension}(i)/D$
where ErrorRate(i) is the test-set prediction error rate corresponding to the i-th particle, obtained from the classification result of the support vector machine model; Dimension(i) is the number of features selected by the i-th particle; and D is the original spectral dimension.
The individual extremum and the population extremum are determined from the initial particle fitness values, and the position and velocity of each particle are updated according to the particle update formula. When the particle swarm algorithm is used for feature selection on the fluorescence spectrum, a set of significant characteristic wavelengths is selected from the original high-dimensional data, and decoding the optimal individual gives the corresponding dimension reduction result. After the bifenthrin, prochloraz and cyromazine spectra are reduced by the particle swarm algorithm, 176 of the original 401 fluorescence wavelengths are selected as the characteristic spectrum. FIGS. 6, 7 and 8 show the characteristic wavelengths (circles mark the selected wavelengths) of the fluorescence spectra of bifenthrin (0.03890 mg/mL), prochloraz (3.7214 μg/mL) and cyromazine (0.04719 mg/mL), respectively, after particle swarm dimension reduction. After dimension reduction, the number of wavelengths is less than half of the original number, which further simplifies the subsequent training of the support vector machine model.
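One particle update with the stated settings (learning factors 1.5, inertia weight decreasing linearly from 0.9 to 0.4, velocity limited to [-10, 10]) might look like the sketch below. The text does not give the update formula or the position encoding, so two things here are assumptions: the canonical velocity update, and a binary encoding in which each of the 401 positions is 1 (wavelength kept) or 0 (dropped), decoded through a sigmoid of the velocity. The iteration count passed to `inertia_weight` is also hypothetical.

```python
import math
import random
from typing import List, Tuple

W_START, W_END = 0.9, 0.4    # inertia weight range from the text
C1 = C2 = 1.5                # learning factors from the text
V_MIN, V_MAX = -10.0, 10.0   # velocity range from the text


def inertia_weight(iteration: int, max_iterations: int) -> float:
    """Linearly decrease the inertia weight from W_START to W_END."""
    if max_iterations < 2:
        return W_END
    fraction = iteration / (max_iterations - 1)
    return W_START - (W_START - W_END) * fraction


def update_particle(position: List[int], velocity: List[float],
                    personal_best: List[int], global_best: List[int],
                    w: float, rng: random.Random) -> Tuple[List[int], List[float]]:
    """Canonical velocity update followed by an ASSUMED binary decoding."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        v_new = w * v + C1 * r1 * (p - x) + C2 * r2 * (g - x)
        v_new = max(V_MIN, min(V_MAX, v_new))  # clamp to the velocity range
        # Assumption: sigmoid transfer maps velocity to a keep-probability.
        keep_probability = 1.0 / (1.0 + math.exp(-v_new))
        new_velocity.append(v_new)
        new_position.append(1 if rng.random() < keep_probability else 0)
    return new_position, new_velocity


rng = random.Random(0)
D = 401  # number of wavelengths in the original spectrum
position = [rng.randint(0, 1) for _ in range(D)]
velocity = [rng.uniform(V_MIN, V_MAX) for _ in range(D)]
new_position, new_velocity = update_particle(
    position, velocity, position, position, inertia_weight(0, 100), rng)
```

In a full run this update would be applied to all 100 particles per iteration, with the personal and global bests refreshed from the fitness values after each pass.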
(3) Training the classification model with a support vector machine
This embodiment applies a nonlinear support vector machine model whose kernel function is the radial basis function (RBF): $K(x_i, x_j) = \exp(-g\|x_i - x_j\|^2)$, where $x_i$ is the 176-dimensional fluorescence intensity vector obtained after dimension reduction by the particle swarm algorithm. The support vector machine model corresponds to objective function (1). Let its optimal solution be $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_n^*)$; the corresponding optimal classification function is then formula (2).
In formula (2), $x_i$ is a support vector, $x$ is an unknown vector, $\alpha_i$ is the Lagrange multiplier, and $K(x, y)$ is the kernel function.
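The expressions for the objective function and the classification function do not appear in the text above. As a reference sketch only, the textbook soft-margin dual problem and decision function that match the symbols used here ($\alpha_i$, $c$, $K$) are given below. The class labels $y_i \in \{-1, +1\}$, the bias $b^*$ and the two-class form are assumptions that are not taken from the source; a three-class task such as this one is usually handled by combining several binary classifiers.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le c

f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*} \right)
```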
In support vector machine modeling, the penalty factor c and the kernel function parameter g are two important parameters whose values affect the accuracy of the classification model. Both are optimized by grid search. The specific steps are as follows: first, the search ranges of c and g are set and the pair (c, g) traverses every point in the grid; for each pair, K-fold cross-validation gives the validation-set classification accuracy; finally, the pair with the highest classification accuracy is taken as the optimal c and g. The ranges of c and g are both set to $[2^{-10}, 2^{10}]$, with a step of $2^{0.5}$ and an error threshold of $10^{-4}$. With 5-fold cross-validation, the optimal values obtained are c = 13.93 and g = 0.66.
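The search procedure can be outlined as follows. This is a skeleton under stated assumptions rather than the inventors' implementation: `fit_and_score` is a hypothetical callback that would train an RBF support vector machine on the training indices and return the accuracy on the validation indices, and `toy_score` is a placeholder (not an SVM) used only so that the example runs. The error threshold is not modeled, and the small grid in the demonstration is illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Split]:
    """Shuffle sample indices and return k (train, validation) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


def grid_search(n_samples: int, c_values: Sequence[float],
                g_values: Sequence[float],
                fit_and_score: Callable[[List[int], List[int], float, float], float],
                k: int = 5):
    """Return (c, g, mean accuracy) with the highest cross-validated accuracy."""
    splits = k_fold_indices(n_samples, k)
    best_c, best_g, best_acc = None, None, -1.0
    for c in c_values:
        for g in g_values:
            acc = sum(fit_and_score(tr, va, c, g) for tr, va in splits) / k
            if acc > best_acc:
                best_c, best_g, best_acc = c, g, acc
    return best_c, best_g, best_acc


def toy_score(train_idx: List[int], val_idx: List[int], c: float, g: float) -> float:
    """Placeholder scorer, NOT an SVM: peaks at c = 16 and g = 0.5."""
    return 1.0 / (1.0 + abs(math.log2(c) - 4.0) + abs(math.log2(g) + 1.0))


best_c, best_g, best_acc = grid_search(
    133,
    [2.0 ** e for e in (-2, 0, 2, 4, 6)],
    [2.0 ** e for e in (-3, -1, 1)],
    toy_score,
)
```

Replacing `toy_score` with a function that trains and scores a real classifier is all that is needed to use the same loop on measured spectra.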
The parameters of the support vector machine model are set to the optimal values, c = 13.93 and g = 0.66, with the radial basis kernel function. The 176-dimensional fluorescence intensity obtained after particle swarm dimension reduction is the model input, and the corresponding sample category is the model output. The model uses 180 samples in 3 categories (60 per category), with bifenthrin, prochloraz and cyromazine labeled 1, 2 and 3, respectively. 42 bifenthrin samples, 42 prochloraz samples and 49 cyromazine samples are randomly selected to form a training set of 133 samples, and the remaining 47 samples form the test set. Once trained, the support vector machine model can be used to identify the category of unknown pesticide samples.
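The random division into training and test sets with the stated per-class counts can be sketched as below. The function name `split_by_class` and the fixed seed are hypothetical; the text does not say how the random selection was performed.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Training-set sizes per category as stated in the text (1: bifenthrin,
# 2: prochloraz, 3: cyromazine).
TRAIN_PER_CLASS: Dict[int, int] = {1: 42, 2: 42, 3: 49}


def split_by_class(labels: Sequence[int], train_counts: Dict[int, int],
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly pick the requested number of training samples per class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls, n_train in train_counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


labels = [cls for cls in (1, 2, 3) for _ in range(60)]  # 180 samples
train_idx, test_idx = split_by_class(labels, TRAIN_PER_CLASS)
```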
(4) Test verification of pesticide residue type identification
Taking the three typical pesticides as examples, the 47 test samples comprise 18 bifenthrin samples (category 1), 18 prochloraz samples (category 2) and 11 cyromazine samples (category 3). The 176-dimensional fluorescence intensities obtained from the sample fluorescence spectra after particle swarm dimension reduction are input to the support vector machine model. The test result is shown in FIG. 9: the classification model identifies all three pesticide residues with 100% accuracy.
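The accuracy figures for such a test can be computed as in the sketch below. The helper names are hypothetical, and `y_pred` is a placeholder that simply copies the true labels to mirror the reported outcome; it is not the model's actual output.

```python
from collections import Counter
from typing import Dict, Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Accuracy computed separately for each true category."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


# Test-set composition from the text: 18, 18 and 11 samples.
y_true = [1] * 18 + [2] * 18 + [3] * 11
y_pred = list(y_true)  # placeholder predictions, not real model output
accuracy = overall_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
```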