CN112287601B - Method, medium and application for constructing tobacco leaf quality prediction model by using R language - Google Patents


Info

Publication number
CN112287601B
CN112287601B (application CN202011141976.2A)
Authority
CN
China
Prior art keywords
model
prediction
tobacco
variable
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011141976.2A
Other languages
Chinese (zh)
Other versions
CN112287601A (en)
Inventor
李伟
王攀磊
鲁耀
张静
刘浩
董石飞
杨应明
王超
耿川雄
陈拾华
杨景华
王建新
聂鑫
朱海滨
林昆
杨义
段宗颜
张忠武
严君
邹炳礼
周敏
周绍松
Current Assignee
Hongyun Honghe Tobacco Group Co Ltd
Institute of Agricultural Environment and Resources of Yunnan Academy of Agricultural Sciences
Original Assignee
Hongyun Honghe Tobacco Group Co Ltd
Institute of Agricultural Environment and Resources of Yunnan Academy of Agricultural Sciences
Priority date
Filing date
Publication date
Application filed by Hongyun Honghe Tobacco Group Co Ltd and Institute of Agricultural Environment and Resources of Yunnan Academy of Agricultural Sciences
Priority to CN202011141976.2A
Publication of CN112287601A
Application granted
Publication of CN112287601B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention belongs to the technical field of tobacco quality prediction and discloses a method, medium and application for constructing a tobacco leaf quality prediction model using the R language. Data transformation and variable screening are performed on the predictor variables; a predictor variable set and an outcome variable set are created, and the data are split and resampled; several regression methods are selected for modeling; the prediction performance of the different models is evaluated using the root mean square error (RMSE) and the coefficient of determination R², and the optimal model is selected from the candidate models according to these metrics. The ecological factor model for predicting tobacco leaf quality provided by the invention can predict, from the current year's ecological and climatic conditions, quality fluctuations of single-grade tobacco leaves in different areas, enabling targeted adjustment of the grade, quantity and proportion of purchased tobacco leaves and ensuring a stable supply of tobacco of consistent quality.

Description

Method, medium and application for constructing tobacco leaf quality prediction model by using R language
Technical Field
The invention belongs to the technical field of tobacco quality prediction, and particularly relates to a method, medium and application for constructing a tobacco quality prediction model by using R language.
Background
Tobacco leaf quality is the result of the combined action of genetic factors, the ecological environment and cultivation technology. Many studies have shown that ecological factors such as climate, soil and topography are important factors influencing the agronomic traits, physical characteristics, chemical components, disease incidence, aroma substance content and smoking quality of tobacco leaves. Because tobacco leaf quality is shaped by many factors, varies widely and is difficult to quantify, the influence of the ecological environment is particularly significant, and changes in light, temperature, water and air conditions produce large differences in tobacco leaf quality between planting areas and between years. Therefore, constructing an ecological factor model for predicting tobacco leaf quality, which uses ecological factors such as climate, soil and cultivation management to predict changes in tobacco leaf quality, is very important for improving tobacco leaf quality.
Through the above analysis, the problems and defects of the prior art are as follows. Existing prediction models mainly focus on predicting the sensory quality of tobacco leaves from their intrinsic chemical components. Research on the correlation between ecological factors and tobacco leaf quality mainly explores the influence and contribution of ecological factors through methods such as principal component regression analysis and grey correlation analysis, identifies the key ecological factors, and guides tobacco production by regulating them. No existing prediction model predicts tobacco leaf quality from the external ecological factors of flue-cured tobacco growth.
The difficulty of solving these problems and defects is as follows: on the one hand, constructing the prediction model requires a large amount of complete tobacco quality data and the corresponding ecological factor data; on the other hand, the data types involved in the invention are complex, comprising both continuous predictor variables and outcome variables, and the prediction model constructed by each regression method carries uncertainty.
The significance of solving these problems and defects is as follows: the invention constructs the prediction model using the R language, which provides multiple regression methods. R is open-source software for mathematical and statistical computing; it can provide as many candidate models as possible, supports relatively complex prediction model construction on large data sets, allows model uncertainty to be explored through rigorous training and testing, and enables selection of the optimal model. This reduces the workload and cost of tobacco leaf testing and solves the problems of raw tobacco supply and blending caused by lags in quality testing. According to the ecological and climatic conditions of the current year, the prediction model is used to evaluate and predict tobacco leaf quality, ensuring a stable supply of the tobacco raw material grades and quantities required by the cigarette formula module and stable quality of cigarette products.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a method, a medium and application for constructing a tobacco quality prediction model by using R language.
The invention is realized in such a way that the tobacco leaf quality prediction method based on the ecological factor model comprises the following steps:
step one, performing data transformation and predictor variable screening on the predictor variables used in tobacco quality prediction;
step two, creating a predictor variable set and an outcome variable set for tobacco quality prediction, and splitting and resampling the data;
step three, selecting several regression methods to model the data, obtaining different tobacco quality prediction models;
step four, evaluating the prediction performance of the different models using the root mean square error (RMSE) and the coefficient of determination R², and selecting the optimal model according to their performance.
Further, in step one, the data transformation performed on the predictor variables includes centering, standardization and skewness transformation; centering subtracts the variable mean from all values, so the transformed variable has mean 0; standardization divides each variable by its own standard deviation, forcing the standard deviation to 1; the skewness transformation removes distribution skew, transforming a right-skewed or left-skewed distribution into an unskewed, approximately symmetric one.
Further, in step one, the method for performing the data transformation on the predictor variables includes:
(I) constructing a trans object using the preProcess function in the caret package, applying centering (center), standardization (scale) and skewness transformation (BoxCox) to the data simultaneously;
(II) after trans is constructed, transforming the original data using the predict function.
Further, in step one, the method for screening the predictor variables includes:
(1) removing zero-variance variables: detect near-zero-variance variables to be filtered using the nearZeroVar function in the caret package; if the output shows that the data set contains zero-variance variables, those variables need to be removed;
(2) removing multicollinear variables.
Further, in step (2), the method for removing multicollinear variables includes:
1) calculating the correlation coefficient matrix among all predictor variables using the cor function;
2) finding the pair of predictor variables with the largest absolute correlation coefficient using the findCorrelation function, denoting them predictor variables A and B;
3) calculating the average correlation of A with the other predictor variables, performing the same calculation for B, and listing the highly correlated variable columns with the head function;
4) if the average correlation of A is greater, removing A; otherwise removing B;
5) repeating steps 2) to 4) until the absolute values of all correlation coefficients are below the set threshold.
Further, in step two, the method for creating the predictor variable set and the outcome variable set includes:
(I) building the predictor variable set predictors from columns 1 to n of the data set;
(II) building the outcome variable set result from the outcome variable column, column n+1 of the data set.
Further, in step two, the method for splitting the data includes:
(1) using the createDataPartition function in the caret package to randomly select sample rows for constructing the training set;
(2) after obtaining the training rows, creating a predictor variable training set TrainPredictors and an outcome variable training set TrainResult containing those rows;
(3) simultaneously creating a predictor variable test set TestPredictors and an outcome variable test set TestResult from the remaining samples.
Further, in step two, the method for resampling the data includes: K-fold cross-validation resampling, implemented using the trainControl function in the caret package.
Further, the K-fold cross-validation method includes:
1) randomly dividing the samples into k subsets of comparable size, first fitting the model with all samples except the first subset;
2) predicting the held-out first-fold samples with the model and evaluating the model with the results;
3) returning the first subset to the training set, holding out the second subset for model evaluation, and so on;
4) calculating the mean and standard deviation of the k model evaluation results, and using them to relate tuning-parameter choices to model performance.
Further, in step three, the candidate models comprise linear regression models, nonlinear regression models and regression tree models; the linear regression models comprise the generalized linear model, the stepwise regression linear model and the partial least squares regression model; the nonlinear regression models comprise the support vector machine (SVM) model and the K nearest neighbor model; the regression tree models include the simple regression tree, the regression model tree, the random forest and the Cubist model.
Further, in step four, the models are fitted and evaluated using the train function in the caret package; the prediction performance of each model is compared using the resamples function in caret, and the results are inspected with summary(resamp).
Further, in the model comparison results, models can be ranked by their RMSE and R²: the smaller the RMSE, the higher the prediction accuracy of the model; the larger the R², the better the model fit.
It is another object of the present invention to provide a computer readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method for tobacco leaf quality prediction based on an ecological factor model.
Another object of the present invention is to provide a computer terminal including:
a transformation and screening module for performing data transformation and predictor variable screening on the predictor variables in tobacco quality prediction;
a splitting and resampling module for creating a predictor variable set and an outcome variable set for tobacco quality prediction, and splitting and resampling the data;
the prediction model acquisition module is used for selecting a plurality of regression methods to model the data; and obtaining prediction models in different tobacco quality predictions.
An optimal model screening module for determining a coefficient R by adopting a Root Mean Square Error (RMSE) 2 And evaluating the prediction effects of the different prediction models, and selecting an optimal model from the prediction models according to the prediction effects.
The invention further aims to provide an application of the tobacco quality prediction method based on the ecological factor model in predicting the agronomic traits, physical characteristics, chemical components, disease incidence, aroma substance content and smoking quality of tobacco leaves across different planting areas and different years.
Combining all the technical schemes above, the advantages and positive effects of the invention are as follows. The tobacco leaf quality prediction method based on the ecological factor model uses the R language to construct the optimal ecological factor model for predicting tobacco leaf quality. R is open-source software for mathematical and statistical computing; it supports relatively complex prediction model construction on large data sets, and since each prediction model carries uncertainty, R can provide as many candidate models as possible, explore their uncertainty through rigorous training and testing, and select the optimal one. The model provided by the invention can predict quality fluctuations of single-grade tobacco leaves in different areas from the current year's ecological and climatic conditions, enabling targeted, proactive adjustment of the grade, quantity and proportion of purchased tobacco leaves and ensuring a stable supply of tobacco of consistent quality.
Drawings
Fig. 1 is a flowchart of a tobacco leaf quality prediction method based on an ecological factor model provided by an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a tobacco leaf quality prediction method based on an ecological factor model, and the invention is described in detail below with reference to the accompanying drawings.
The tobacco leaf quality prediction method based on the ecological factor model provided by the embodiment of the invention comprises the following steps: performing data transformation and predictor variable screening on the predictor variables in tobacco quality prediction;
creating a predictor variable set and an outcome variable set for tobacco quality prediction, and splitting and resampling the data;
selecting several regression methods to model the data, obtaining different tobacco quality prediction models;
evaluating the prediction performance of the different models using the root mean square error RMSE and the coefficient of determination R², and selecting the optimal model according to their performance.
Specifically, as shown in fig. 1, the tobacco quality prediction method based on the ecological factor model provided by the embodiment of the invention comprises the following steps:
s101, data preprocessing: and respectively carrying out data transformation and prediction variable screening processing on the prediction variables.
S102, data division: and creating a prediction variable set and a result variable set, and respectively carrying out segmentation and resampling on the data.
S103, data modeling: and selecting a plurality of regression methods to model the data.
S104, model preference: using root mean square error RMSE and decision coefficient R 2 And evaluating the prediction effects of different models, and selecting an optimal model from the test models according to the model effects.
The invention is further described below with reference to examples.
Example 1
1. Data preprocessing
Data preprocessing generally refers to adding, deleting or transforming training set data; transforming the data to reduce the influence of skewness and outliers can significantly improve the performance of a model.
1.1 data transformation
Predictive models require the predictor variables to be on the same dimension or scale, so data transformations, namely centering, standardization and skewness transformation, are performed on the variables. Centering subtracts the mean from each variable, so the transformed variable has mean 0. Standardization divides each variable by its own standard deviation, forcing the standard deviation to 1. The skewness transformation removes distribution skew, transforming a right-skewed or left-skewed distribution into an unskewed, approximately symmetric one.
The method constructs a trans object using the preProcess function in the caret package, applying centering (center), standardization (scale) and skewness transformation (BoxCox) to the data simultaneously. trans is constructed as follows:
trans<-preProcess(tobacco.numeric,
method=c("BoxCox","center","scale"))
After trans is constructed, the original data are transformed using the predict function. In the following command, data is the original data and transformed.data is the transformed data.
transformed.data<-predict(trans,data)
1.2 predictive variable screening
Some predictor variables need to be removed before modeling to improve model performance and stability. Using fewer variables for prediction reduces computational complexity, and deleting redundant predictor variables yields a more compact and interpretable model.
1.2.1 removing zero variance variable
A zero-variance variable is a predictor that takes only one value; it contributes almost nothing to the model and therefore needs to be identified and removed. If the ratio of the number of distinct values to the sample size is low (e.g. 10%) and the ratio of the highest frequency to the second-highest frequency is large, the variable is a near-zero-variance variable.
The method detects near-zero-variance variables to be filtered using the nearZeroVar function in the caret package:
nearZeroVar(data)
If the output shows that the data set contains zero-variance variables, those variables need to be removed.
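As a minimal sketch (assuming data is the predictor data frame), the detected near-zero-variance columns can be dropped as follows; the length check guards against the case where no such columns are found:

```r
library(caret)

nzv <- nearZeroVar(data)   # indices of near-zero-variance columns
if (length(nzv) > 0) {
  data <- data[, -nzv]     # drop them only if any were detected
}
```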
1.2.2 removal of multiple co-linear variables
Collinearity refers to a strong correlation between a pair of predictor variables; collinearity among several predictor variables is called multicollinearity. Redundant predictor variables typically increase the complexity of the model rather than its information content, and in linear regression models highly correlated predictors can give a very unstable model, so highly correlated variables should be avoided in the data. The specific algorithm is as follows:
1. calculate the correlation coefficient matrix of the predictor variables;
2. find the pair of predictor variables with the largest absolute correlation coefficient (denoted predictor variables A and B);
3. calculate the average correlation of A with the other predictor variables, and perform the same calculation for B;
4. if the average correlation of A is greater, remove A; otherwise remove B;
5. repeat steps 2 to 4 until the absolute values of all correlation coefficients are below the set threshold.
The method uses the cor function to calculate the correlation coefficients among all predictor variables. In the following command, data is the data set and correlations is the matrix of pairwise correlation coefficients between the predictor variables in the data set.
correlations<-cor(data)
After calculating the correlation coefficients, use the findCorrelation function to find predictor variables with high correlations. In the following command, correlations is the correlation coefficient matrix, highcorr.correlations holds the predictor variables whose correlation exceeds 0.75, and cutoff is the threshold for filtering by correlation coefficient:
highcorr.correlations<-findCorrelation(correlations,cutoff=0.75)
Use the head function to list the highly correlated variable columns:
head(highcorr.correlations)
Then remove the highly correlated variable columns. In the following command, data.filtered is the data after removing the multicollinear variables:
data.filtered<-data[,-highcorr.correlations]
2. data partitioning
2.1 creating a set of prediction variables and a set of result variables
When constructing the prediction model, the data structure comprises several predictor variables and one outcome variable, and a separate data set must be built for each.
The following command builds columns 1 to n of the data set into the predictor variable set predictors:
predictors<-data[,1:n]
The following command builds the outcome variable column, column n+1 of the data set, into the outcome variable set result:
result<-data[,n+1]
Thus the predictor variable set predictors and the outcome variable set result are established.
2.2 data partitioning
Some models learn not only the generalizable patterns in the data but also noise characteristics specific to individual samples; this is called overfitting. An overfitted model generally cannot accurately predict new samples. Inappropriate tuning parameters may cause overfitting, so the model parameters must be tuned against data to give the most appropriate predictions. The data used to evaluate the model must therefore not be used to build or tune it, so that an unbiased estimate of model performance can be obtained. When building the prediction model, part of the samples are selected to build the model and the rest are reserved for model evaluation. The sample set used for modeling is called the "training set" and the sample set used to verify model performance is called the "test set".
The method uses the createDataPartition function in the caret package to randomly select sample rows and construct the training set. In the following command, trainningrows holds the randomly drawn sample rows assigned to the training set, and p=0.8 means 80% of the sample rows are drawn as the training set; note that createDataPartition takes the outcome variable as its first argument:
trainningrows<-createDataPartition(result,
p=0.8,
list=FALSE)
After obtaining the training rows, create a predictor variable training set TrainPredictors and an outcome variable training set TrainResult containing those rows:
TrainPredictors<-predictors[trainningrows,]
TrainResult<-result[trainningrows]
At the same time, create a predictor variable test set TestPredictors and an outcome variable test set TestResult from the remaining samples:
TestPredictors<-predictors[-trainningrows,]
TestResult<-result[-trainningrows]
2.3 resampling
Resampling fits a model with a subset of the training samples, evaluates the model with the remaining training samples, repeats the process many times, and then summarizes the results. Resampling allows a reasonable assessment of a model's predictive performance on future samples. Several sampling methods can be used for resampling.
The method uses K-fold cross-validation. Its principle is to randomly divide the samples into k subsets of comparable size; first fit the model with all samples except the first subset (the first fold), then predict the held-out first-fold samples with the model and evaluate the model with the results; then return the first subset to the training set, hold out the second subset for evaluation, and so on. The k model evaluation results thus obtained are summarized (typically by mean and standard deviation) and used to relate tuning-parameter choices to model performance.
Resampling by K-fold cross-validation can be achieved using the trainControl function in the caret package. In the following command, trainControl is the resampling control function, method="cv" selects K-fold cross-validation, and number=10 specifies 10 folds.
trainControl(method="cv",number=10)
3. Data modeling
The method selects several regression methods to model the data and chooses the optimal model from the candidates according to their performance. A linear regression model, a nonlinear regression model and regression tree models are selected. The linear regression models comprise the generalized linear model, the stepwise regression linear model and the partial least squares regression model; the nonlinear regression models comprise the support vector machine (SVM) model and the K nearest neighbor model; the regression tree models include the simple regression tree, the regression model tree, the random forest and the Cubist model.
The models above are fitted and evaluated with the train function in the caret package. The general command is as follows, where fit is the fitted model object, method specifies the regression method (the method strings used by the different models are given below), and trControl specifies the resampling method, here 10-fold cross-validation.
fit<-train(x=TrainPredictors,y=TrainScore,
method="x",
trControl=trainControl(method="cv",number=10))
4. Model selection
The prediction effects of the different models are evaluated with the root mean squared error (RMSE) and the coefficient of determination (R²). RMSE is a function of the model residuals, where a residual is an observed value minus the model's prediction; the RMSE value describes the average distance between observations and model predictions. The coefficient of determination (R²) is interpreted as the proportion of the information contained in the data that the model can explain.
The prediction effect of each model is evaluated with the resamples function in caret. In the following commands, resamp holds the evaluation results of the models and is inspected with summary(resamp); fit1, fit2 and fit3 represent different models.
resamp<-resamples(list(fit1,fit2,fit3))
summary(resamp)
In the model comparison results, the models are ranked by their RMSE and R²: the smaller the RMSE, the higher the model's prediction accuracy; the larger the R², the better the model's degree of simulation.
5. Model verification
(1) Model prediction
The prediction models are built on the training set as above, and the better-performing models are selected according to RMSE and R². This section tests the prediction effect of each selected model using the predict function and the test-set data. The command is as follows, where predict is the prediction function, fit is the model to be tested, and TestPredictors holds the predictor variables of the test set.
PredictedResult<-predict(fit,TestPredictors)
(2) Model verification
The predicted values PredictedResult obtained from the model are compared with the observed test-set values TestResult to measure the model's prediction effect. Model quality is assessed with the following two visualization methods.
1) A scatter plot of observed versus predicted values shows the model's fitting effect. The plot function displays the scatter plot of observations against predictions. For an ideal model the predicted and observed values fall along the line with slope 1; the closer the points lie to that line, the better the model's predictions.
plot(TestScore,PredictedResult)
2) A scatter plot of residuals versus predicted values reveals systematic patterns in the predictions.
The difference between an observed value and its predicted value is the model residual, calculated with the following command:
residualvalues<-TestResult-Predictedresult
The residuals of a model free of systematic error should be evenly distributed around 0; the plot function displays the scatter plot of residuals against predicted values.
plot(PredictedResult,residualvalues)
3) Calculating the RMSE and R² of the observed and predicted values
The R2 and RMSE functions quantify the agreement between observed and predicted values; the commands are as follows:
R2(PredictedResult,TestResult)
RMSE(PredictedResult,TestResult)
Similarly, the larger the R², the better the fit between observed and predicted values; the smaller the RMSE, the closer the predictions are to the observations and the better the model's prediction effect.
Example 2
1. Data preprocessing
First, the data required by the model are preprocessed. Transforming the data reduces the influence of skewness and outliers and can markedly improve model performance.
1.1 data transformation
1.1.1 importing data
library(readxl)
# load the data-reading package "readxl" (an R package is a collection of code, functions and sample data)
tobacco<-read_excel("tobacco.xlsx",col_names=TRUE)
# import data and name "tobacco"
1.1.2 data structures and transformations
(1) Viewing data structures
str(tobacco)
The example data comprise 595 samples and 51 variables: 50 predictor variables and 1 result variable. Among the predictor variables, 6 are character variables and 44 are continuous variables.
The character variables among the predictors must be converted to factor type. In this example the 6 character variables Area, Cultivar, Position, soiltype, landform and transplant are converted to factors.
tobacco$Area<-factor(tobacco$Area)
tobacco$Cultivar<-factor(tobacco$Cultivar)
tobacco$Position<-factor(tobacco$Position)
tobacco$soiltype<-factor(tobacco$soiltype)
tobacco$landform<-factor(tobacco$landform)
tobacco$transplant<-factor(tobacco$transplant,levels=c("early","middle","late"),ordered=TRUE)
The continuous variables require centering, standardization and skewness processing. In this example the preProcess function of the caret package was applied to 44 continuous variables in total: TN (total nitrogen), Ni (nicotine), TS (total sugar), RS (reducing sugar), K (potassium), Cl (chlorine), PE (petroleum ether extract), St (starch), N/Ni (nitrogen-to-nicotine ratio), RS/Ni (sugar-to-nicotine ratio), DS (two-sugar difference), K/Cl (potassium-to-chlorine ratio), particlesize (soil particle size), altitude, pH, som (soil organic matter), an (soil available nitrogen), ap (soil available phosphorus), ak (soil available potassium), scl (soil chlorine), B (soil boron), growthdays (growth period), leafnumber (leaf number), Napplication (nitrogen application amount), mayrainfall, junerainfall, julyrainfall, augustrainfall and growthrainfall (May, June, July, August and growth-period rainfall), maytem, junetem, julytem, augusttem and growthtem (May, June, July, August and growth-period temperature), maysun, junesun, julysun, augustsun and growthsun (May, June, July, August and growth-period sunshine), and mayhumidity, junehumidity, julyhumidity, augusthumidity and growthhumidity (May, June, July, August and growth-period humidity).
library(caret)
# load the caret package
tobacco.numeric<-as.data.frame(tobacco[,c(5:34,38:69)])
Screen the numeric data to build a data set:
trans<-preProcess(tobacco.numeric,
method=c("BoxCox","center","scale"))
The preProcess function integrates the skewness, centering and standardization transformations, producing the trans object.
tobacco.transformed.numeric<-predict(trans,tobacco.numeric)
# transform the continuous variables using the trans object.
tobacco.factor<-tobacco[,c(1:4,35:37)]
tobacco.transformed<-cbind(tobacco.factor,tobacco.transformed.numeric)
# combine the factor-type predictors and the transformed continuous predictors.
1.2 predictive variable screening
1.2.1 removing zero variance variable
The near-zero-variance variables to be filtered out are detected with the nearZeroVar function in the caret package.
nearZeroVar(tobacco.transformed.numeric)
1.2.2 removing multicollinear variables
Removing highly collinear variables among the chemical components
library(corrplot)
# load correlation coefficient calculation package
Remove the variables with high multicollinearity among the tobacco chemical components:
tobacco.chemical<-tobacco.transformed[,26:37] # extract the tobacco chemical-component predictors
correlations.chemical<-cor(tobacco.chemical) # compute the correlation-coefficient matrix
highcorr.chemical<-findCorrelation(correlations.chemical,cutoff=0.75) # find variables with correlation coefficients above 0.75
head(highcorr.chemical) # list the highly correlated variable columns; in this example RS/Ni, TS and Cl are the multicollinear variables
##[1] 10 3 6
tobacco.chemical.filtered<-tobacco.chemical[,-highcorr.chemical] # remove the multicollinear variables
Removing highly collinear variables among the ecological factors
tobacco.ecological.numeric<-tobacco.transformed[,38:69] # extract the ecological-factor predictors
correlations.ecological<-cor(tobacco.ecological.numeric) # compute the correlation-coefficient matrix
highcorr.ecological<-findCorrelation(correlations.ecological,cutoff=0.75)
# find the variables with correlation coefficients above 0.75
head(highcorr.ecological) # list the highly correlated variable columns
##[1] 32 28 17 29 15 30
In this example May humidity (mayhumidity), June humidity (junehumidity), July humidity (julyhumidity), growth-period humidity (growthhumidity), July rainfall (julyrainfall) and growth-period rainfall (growthrainfall) are the multicollinear variables.
tobacco.ecological.filtered<-tobacco.ecological.numeric[,-highcorr.ecological] # remove the multicollinear variables
The transformed and screened variables are combined into the predictor data set.
tobacco.filtered<-cbind(tobacco.factor,tobacco.chemical.filtered,tobacco.ecological.filtered)
The data set is exported for later use. (Note that write.csv always writes column names, so a col.names argument is unnecessary.)
write.csv(tobacco.filtered,"tobacco.filtered.csv",row.names=FALSE)
2. Data set construction
2.1 creating a set of prediction variables and a set of result variables
Importing the preprocessed data
tobacco.filtered<-read.csv("tobacco.filtered.csv")
tobacco<-read_excel("tobacco.xlsx",col_names=TRUE)
Creating a set of prediction variables
(1) Creation of prediction variable set suitable for conventional prediction model such as linear regression model
predictors<-tobacco.filtered[,-c(4,8:25)]
(2) Creation of a prediction variable set suitable for the support vector machine, the K-nearest-neighbor model, etc.
ind.Area<-nnet::class.ind(predictors$Area)
ind.Cultivar<-nnet::class.ind(predictors$Cultivar)
ind.Position<-nnet::class.ind(predictors$Position)
ind.soiltype<-nnet::class.ind(predictors$soiltype)
ind.landform<-nnet::class.ind(predictors$landform)
ind.transplant<-nnet::class.ind(predictors$transplant)
ind<-cbind(ind.Area,ind.Cultivar,ind.Position,ind.soiltype,ind.landform,ind.trans plant)
trans.1<-preProcess(ind,method=c("BoxCox","center","scale"))
ind.transformed<-predict(trans.1,ind)
predictors.ind<-cbind(ind.transformed,predictors[,-c(1:6)])
A result variable set is created; the result variable in this example is the tobacco leaf sensory quality score.
score<-tobacco$SCORE
2.2 data partitioning
2.2.1 data partitioning
A training set and a test set of predicted variables and result variables, respectively, are created.
set.seed(222) # set the random-number seed to ensure reproducible results
trainningrows<-createDataPartition(score,
p=0.8,
list=FALSE)
In this example 80% of the sample rows are randomly selected as training rows; trainningrows indexes the samples assigned to the training set.
TrainPredictors<-predictors[trainningrows,]
TrainPredictors.ind<-predictors.ind[trainningrows,] # select predictor-variable samples into the training set
TrainScore<-score[trainningrows] # select result-variable samples into the training set
TestPredictors<-predictors[-trainningrows,]
TestPredictors.ind<-predictors.ind[-trainningrows,] # select predictor-variable samples into the test set
TestScore<-score[-trainningrows] # select result-variable samples into the test set
2.2.2 resampling
In this example the 10-fold cross-validation resampling method is selected. The corresponding setting in the train function is as follows:
trControl=trainControl(method="cv",number=10)
3. Data modeling
3.1 Linear regression model
3.1.1 generalized Linear model
Inputting a code:
set.seed(222)
glm1<-train(x=TrainPredictors,y=TrainScore,
method="glm",
trControl=trainControl(method="cv",number=10))
glm1
outputting a result:
## Generalized Linear Model
## 477 samples # the prediction uses 477 samples
## 30 predictors # the prediction uses 30 predictor variables
## Resampling: Cross-Validated (10 fold) # resampling method: 10-fold cross-validation
## Summary of sample sizes: 429, 430, 430, 430, 429, 429, ...
## Resampling results:
## RMSE      Rsquared    MAE
## 3.001024  0.03957612  2.327293
3.1.2 stepwise regression Linear model
Inputting a code:
set.seed(222)
glmstep1<-train(x=TrainPredictors,y=TrainScore,
method="glmStepAIC",
trControl=trainControl(method="cv",number=10))
outputting a result:
3.1.3 common linear regression input code:
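The source shows this code listing only as an image. A sketch of the presumable call, following the pattern of the glm example above (the object name lm1 is taken from the resamples call in section 4; method="lm" is caret's ordinary linear regression; details are assumptions):

```r
set.seed(222)
lm1<-train(x=TrainPredictors,y=TrainScore,
    method="lm",
    trControl=trainControl(method="cv",number=10))
lm1
```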
outputting a result:
3.1.4 partial least squares plsr input code:
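This listing also appears only as an image. A sketch of the presumable partial least squares call (object name plsr1 from section 4; method="pls" is caret's PLS regression; the tuneLength value is an assumed setting for how many component counts to try):

```r
set.seed(222)
plsr1<-train(x=TrainPredictors,y=TrainScore,
    method="pls",
    tuneLength=20, # number of PLS components to try (assumed value)
    trControl=trainControl(method="cv",number=10))
plsr1
```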
outputting a result:
3.2 nonlinear regression model
3.2.1 support vector machine SVM input code:
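The SVM listing is shown only as an image. A sketch of the presumable call, using the dummy-coded predictor set created in section 2.1 for the SVM and K-nearest-neighbor models (object name SVM1 from section 4; the radial kernel and tuneLength are assumptions):

```r
set.seed(222)
SVM1<-train(x=TrainPredictors.ind,y=TrainScore,
    method="svmRadial", # radial-basis SVM; kernel choice is an assumption
    tuneLength=14,      # number of cost values to try (assumed value)
    trControl=trainControl(method="cv",number=10))
SVM1
```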
outputting a result:
/>
3.2.2 K-nearest-neighbor input code:
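Again only an image appears in the source. A sketch of the presumable K-nearest-neighbor call on the dummy-coded predictor set (object name knnTune from section 4; the candidate grid of k values is an assumption):

```r
set.seed(222)
knnTune<-train(x=TrainPredictors.ind,y=TrainScore,
    method="knn",
    tuneGrid=data.frame(k=1:20), # candidate neighbor counts (assumed grid)
    trControl=trainControl(method="cv",number=10))
knnTune
```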
outputting a result:
3.3 regression Tree model
3.3.1 simple regression tree (single tree) input code:
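The simple regression tree listing is an image in the source. A sketch of the presumable call (object name rpartTune1 from section 4; method="rpart" is caret's single CART tree; tuneLength is an assumed setting):

```r
set.seed(222)
rpartTune1<-train(x=TrainPredictors,y=TrainScore,
    method="rpart",
    tuneLength=10, # number of complexity-parameter candidates (assumed value)
    trControl=trainControl(method="cv",number=10))
rpartTune1
```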
outputting a result:
/>

3.3.2 regression model tree input code:
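The regression model tree listing is likewise an image. A sketch of the presumable call (object name M5Tune1 from section 4; method="M5" is caret's M5 model tree, which depends on the RWeka package):

```r
set.seed(222)
M5Tune1<-train(x=TrainPredictors,y=TrainScore,
    method="M5", # M5 regression model tree; requires the RWeka package
    trControl=trainControl(method="cv",number=10))
M5Tune1
```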
outputting a result:
3.3.3 random forest input codes:
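The random forest listing appears only as an image. A sketch of the presumable call (object name randomforest1 from section 4; method="rf" fits a random forest via the randomForest package, with mtry tuned by cross-validation):

```r
set.seed(222)
randomforest1<-train(x=TrainPredictors,y=TrainScore,
    method="rf", # random forest; mtry is tuned over the cross-validation folds
    trControl=trainControl(method="cv",number=10))
randomforest1
```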
outputting a result:
3.3.4 Cubist input code:
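The Cubist listing is also an image in the source. A sketch of the presumable call (object name cubist1 from section 4; method="cubist" fits the rule-based Cubist model via the Cubist package):

```r
set.seed(222)
cubist1<-train(x=TrainPredictors,y=TrainScore,
    method="cubist", # Cubist rule-based regression model
    trControl=trainControl(method="cv",number=10))
cubist1
```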
outputting a result:
The invention will be further described with reference to specific examples and experimental data.
4. Model fitting effect
The fitting effect of each model is evaluated by comparing MAE, RMSE and R².
Inputting a code:
resamp<-resamples(list(glm=glm1,lm=lm1,plsr=plsr1,SVM=SVM1,knnTune=knnTune,rpartTune=rpartTune1,M5Tune=M5Tune1,cubist=cubist1,randomforest=randomforest1))
The models are compared using the resamples function.
Outputting a result:
Note: MAE (mean absolute error) is the mean of the absolute errors. RMSE (root mean squared error) is the square root of the mean squared difference between predicted values and observations and measures the model residuals, where a residual is an observed value minus the model's prediction; the RMSE value describes the average distance between observations and model predictions. The coefficient of determination (R²) is interpreted as the proportion of the information contained in the data that the model can explain.
Model R² values above 0.26 are considered good, values between 0.13 and 0.26 moderate, and values between 0.02 and 0.13 poor (Cohen et al., 1988). From the model comparison, the random forest model has the lowest MAE and RMSE values and the highest R² (close to 0.26), giving the best prediction effect; the SVM and Cubist models come next, and the remaining models simulate poorly.
5. Model predictive effects
Taking random forests as an example, the prediction and evaluation process is as follows:
(1) Prediction
Using the prediction function, predicting a test sample by using a random forest model:
PredictedScore.randomforest<-predict(randomforest1,TestPredictors) # randomforest1 is the random forest model, TestPredictors is the test-set predictors, and PredictedScore.randomforest holds the predicted values
(2) Calculating the RMSE and R² of the predicted and observed values
R2(PredictedScore.randomforest,TestScore)
##[1]0.256195
RMSE(PredictedScore.randomforest,TestScore)
##[1]2.330102
Similarly, the 10 models were each used for prediction and evaluated; the results are shown in the table below:
Among the models, the random forest model has the smallest mean absolute error (MAE) and root mean square error (RMSE) between predicted and observed values and the largest coefficient of determination (R²), so its prediction effect is the best.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they are implemented in whole or in part as a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the invention is not limited thereto, but any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention will be apparent to those skilled in the art within the scope of the present invention.

Claims (7)

1. The tobacco quality prediction method based on the ecological factor model is characterized by comprising the following steps of:
respectively carrying out data transformation and screening treatment on the predicted variables in tobacco quality prediction; converting the character-type variables among the predicted variables into factor type: the 6 character variables Area, Cultivar (variety), Position (leaf position), soiltype (soil type), landform (topography) and transplant (transplanting period) are converted into factors; subjecting the continuous variables to centering, standardization and skewness treatment: the preProcess function of the caret package was applied to 44 continuous variables in total, namely TN (total nitrogen), Ni (nicotine), TS (total sugar), RS (reducing sugar), K (potassium), Cl (chlorine), PE (petroleum ether extract), St (starch), N/Ni (nitrogen-to-nicotine ratio), RS/Ni (sugar-to-nicotine ratio), DS (two-sugar difference), K/Cl (potassium-to-chlorine ratio), particlesize (soil particle size), altitude, pH, som (soil organic matter), an (soil available nitrogen), ap (soil available phosphorus), ak (soil available potassium), scl (soil chlorine), B (soil boron), growthdays (growth period), leafnumber (leaf number), Napplication (nitrogen application amount), mayrainfall, junerainfall, julyrainfall, augustrainfall and growthrainfall (May, June, July, August and growth-period rainfall), maytem, junetem, julytem, augusttem and growthtem (May, June, July, August and growth-period temperature), maysun, junesun, julysun, augustsun and growthsun (May, June, July, August and growth-period sunshine), and mayhumidity, junehumidity, julyhumidity, augusthumidity and growthhumidity (May, June, July, August and growth-period humidity);
creating a prediction variable set and a result variable set in tobacco quality prediction, and respectively dividing and resampling the data; the result variable refers to the sensory quality evaluation score of tobacco leaves;
selecting a plurality of regression methods to model the data; obtaining different prediction models of tobacco quality;
using the root mean square error RMSE and the coefficient of determination R² to evaluate the prediction effects of the different prediction models, and selecting an optimal model from the prediction models according to the prediction effects;
the data transformation of the predicted variables comprises centering, standardization and skewness transformation; centering subtracts the variable's mean from every value, so that the transformed variable has mean 0; standardization divides each variable by its own standard deviation, forcing the standard deviation of the variable to be 1; the skewness transformation removes distributional skew, so that a right-skewed or left-skewed distribution is transformed into an unbiased one and the variables become approximately symmetrically distributed;
the method for carrying out data transformation on the predicted variable comprises the following steps:
(I) constructing a trans object with the preProcess function in the caret package, simultaneously applying centering, standardization (scale) and skewness conversion (BoxCox) to the data;
(II) after the trans object is constructed, converting the original data with the predict function;
the test models selected are linear regression models, nonlinear regression models and regression tree models; the linear regression models comprise a generalized linear model, a stepwise regression linear model and a partial least squares regression model; the nonlinear regression models comprise a support vector machine (SVM) model and a K-nearest-neighbor model; the regression tree models comprise a simple regression tree, a regression model tree, a random forest and a Cubist model;
predicting and evaluating the models with the train function in the caret package; evaluating the prediction effect of each model with the resamples function in the caret package;
in the model comparison results, the models are selected according to their RMSE and R²: the smaller the RMSE, the higher the model's prediction accuracy; the larger the R², the better the model's degree of simulation;
according to the tobacco quality prediction method, quality fluctuation conditions of single-grade tobacco leaves in different areas in the current year are predicted according to the current year ecological climate change condition, so that the purchase grade and quantity of the tobacco leaves are adjusted in a targeted manner, the quantity and proportion of the purchase grade of the tobacco leaves are actively adjusted, and stable quality supply of the tobacco leaves is ensured.
2. The method for predicting tobacco leaf quality based on an ecological factor model as claimed in claim 1, wherein the method for screening predicted variables comprises the following steps:
(1) removing zero-variance variables: detecting the near-zero-variance variables to be filtered with the nearZeroVar function in the caret package; if the data set is shown to contain a zero-variance variable, that variable is removed;
(2) Removing multiple collinearity variables;
in step (2), the method for removing multiple co-linear variables comprises the following steps:
1) Calculating a correlation coefficient matrix among all the predicted variables by using a cor function in the corrplot packet;
2) Finding out the pair of prediction variables with the maximum absolute value of the correlation coefficient by using a findCorrelation function, and marking the pair of prediction variables as prediction variables A and B;
3) Calculating the average value of the correlation coefficients of the A and other predicted variables by using a head function, performing the same calculation on the B, and listing a variable column with high correlation coefficient;
4) If the average correlation coefficient of A is greater, then A is removed; if not, removing B;
5) repeating steps 2)-4) until the absolute values of all correlation coefficients are below the set threshold.
3. The method for predicting tobacco leaf quality based on an ecological factor model as recited in claim 1, wherein the method for creating the set of predicted variables and the set of result variables comprises:
(I) Establishing a predictive variable set predictor from the first 1 to n predictive variable columns in the data set;
(II) establishing a result variable set result from the result-variable column, the (n+1)th column of the data set;
the method for carrying out segmentation processing on the data comprises the following steps:
(1) Randomly selecting training lines from the samples using the createDataPartition function in the caret packet;
(2) After obtaining a training line, creating a predictive variable training set TrainPreactor and a result variable training set TrainResult containing the training line;
(3) And simultaneously creating a prediction variable test set TestPrectors and a result variable test set TestResult by using the residual samples.
4. The method for predicting tobacco leaf quality based on an ecological factor model according to claim 1, wherein the method for resampling data comprises: resampling by the K-fold cross-validation method, realized with the trainControl function in the caret package;
the K-fold cross validation method comprises the following steps:
1) Randomly dividing the samples into k sub-sets of comparable size, first fitting the model with all samples except the first sub-set;
2) Predicting the reserved first folding sample by using the model, and evaluating the model by using the result;
3) Then returning the first subset to the training set, reserving the second subset for model evaluation, and then analogizing;
4) summarizing the k model evaluation results by calculating their mean and standard deviation, and on that basis determining the relationship between the tuning parameters and model performance.
5. A computer terminal, the computer terminal comprising:
the segmentation resampling module is used for creating a prediction variable set and a result variable set in tobacco quality prediction, and respectively carrying out segmentation and resampling on the data;
the prediction model acquisition module is used for selecting a plurality of regression methods to model the data; obtaining prediction models in different tobacco quality predictions;
an optimal model screening module for evaluating the prediction effects of the different prediction models using the root mean square error RMSE and the coefficient of determination R², and selecting an optimal model from the prediction models according to the prediction effects.
6. A computer readable storage medium storing instructions that when run on a computer cause the computer to perform the ecological factor model based tobacco quality prediction method of any one of claims 1 to 4.
7. An application of the tobacco leaf quality prediction method based on the ecological factor model according to any one of claims 1-4 in detection, evaluation and prediction of tobacco leaf economic character, disease rate, appearance quality, physical characteristics, chemical composition, aroma substances and sensory evaluation of tobacco leaf production quality in different planting areas and different years.
CN202011141976.2A 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language Active CN112287601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141976.2A CN112287601B (en) 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language


Publications (2)

Publication Number Publication Date
CN112287601A CN112287601A (en) 2021-01-29
CN112287601B true CN112287601B (en) 2023-08-01

Family

ID=74424144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141976.2A Active CN112287601B (en) 2020-10-23 2020-10-23 Method, medium and application for constructing tobacco leaf quality prediction model by using R language

Country Status (1)

Country Link
CN (1) CN112287601B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256021B (en) * 2021-06-16 2021-10-15 北京德风新征程科技有限公司 Product quality alarm method and device based on ensemble learning
CN113488113B (en) * 2021-07-12 2024-02-23 浙江中烟工业有限责任公司 Industrial use value identification method for redried strip tobacco
CN115481750A (en) * 2022-09-20 2022-12-16 云南省农业科学院农业环境资源研究所 On-line prediction method and system for nitrate nitrogen in underground water based on machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107991969A (en) * 2017-12-25 2018-05-04 云南五佳生物科技有限公司 A kind of wisdom tobacco planting management system based on Internet of Things
CN110990784A (en) * 2019-11-19 2020-04-10 湖北中烟工业有限责任公司 Cigarette ventilation rate prediction method based on gradient lifting regression tree

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201611596D0 (en) * 2016-07-04 2016-08-17 British American Tobacco Investments Ltd Apparatus and method for classifying a tobacco sample into one of a predefined set of taste categories
CN110751335B (en) * 2019-10-21 2024-06-14 中国气象局沈阳大气环境研究所 Regional ecological quality annual scene prediction evaluation method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107991969A (en) * 2017-12-25 2018-05-04 云南五佳生物科技有限公司 A kind of wisdom tobacco planting management system based on Internet of Things
CN110990784A (en) * 2019-11-19 2020-04-10 湖北中烟工业有限责任公司 Cigarette ventilation rate prediction method based on gradient lifting regression tree

Also Published As

Publication number Publication date
CN112287601A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112287601B (en) Method, medium and application for constructing tobacco leaf quality prediction model by using R language
Luo et al. Greater than the sum of the parts: how the species composition in different forest strata influence ecosystem function
Singh et al. Identifying dominant controls on hydrologic parameter transfer from gauged to ungauged catchments–A comparative hydrology approach
CN108877905B (en) Hospital outpatient quantity prediction method based on Xgboost framework
CN110674604A (en) Transformer DGA data prediction method based on multi-dimensional time sequence frame convolution LSTM
Yang et al. A new method for generating a global forest aboveground biomass map from multiple high-level satellite products and ancillary information
Haque et al. Crop yield analysis using machine learning algorithms
Kawakita et al. Prediction and parameter uncertainty for winter wheat phenology models depend on model and parameterization method differences
Zhang et al. Machine learning models for net photosynthetic rate prediction using poplar leaf phenotype data
Pagès et al. Links between root length density profiles and models of the root system architecture
Renton et al. Functional–structural plant modelling using a combination of architectural analysis, L-systems and a canonical model of function
Michel et al. Reconstructing climatic modes of variability from proxy records using ClimIndRec version 1.0
CN113361194B (en) Sensor drift calibration method based on deep learning, electronic equipment and storage medium
Aboelyazeed et al. A differentiable, physics-informed ecosystem modeling and learning framework for large-scale inverse problems: Demonstration with photosynthesis simulations
Cornet et al. Assessing allometric models to predict vegetative growth of yams in different environments
Andermann et al. The origin and evolution of open habitats in North America inferred by Bayesian deep learning models
van Oijen et al. Process‐based modeling of timothy regrowth
CN112881333B (en) Near infrared spectrum wavelength screening method based on improved immune genetic algorithm
US20230214668A1 (en) Hyperparameter adjustment device, non-transitory recording medium in which hyperparameter adjustment program is recorded, and hyperparameter adjustment program
He et al. Developing machine learning models with multiple environmental data to predict stand biomass in natural coniferous-broad leaved mixed forests in Jilin Province of China
Aboelyazeed et al. A differentiable ecosystem modeling framework for large-scale inverse problems: demonstration with photosynthesis simulations
Clark et al. Deep learning for monthly rainfall-runoff modelling: a comparison with classical rainfall-runoff modelling across Australia
Clark et al. Deep learning for monthly rainfall–runoff modelling: a large-sample comparison with conceptual models across Australia
CN113361596B (en) Sensor data augmentation method, system and storage medium
Hisano et al. Functional diversity enhances dryland forest productivity under long-term climate change

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant