CN113657452A - Tobacco leaf quality grade classification prediction method based on principal component analysis and super learning - Google Patents
- Publication number
- CN113657452A (application CN202110817834.1A)
- Authority
- CN
- China
- Prior art keywords
- index
- data
- classification prediction
- tobacco
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a tobacco leaf quality grade classification prediction method based on principal component analysis and super learning, which comprises the following steps: 1) grouping the tobacco leaf quality data samples according to the set index types; 2) performing principal component analysis on the index data in each index data set separately, reducing the dimension of the data and eliminating correlation; 3) training each base learning algorithm in the super learning framework with each processed index data set to obtain the first-level classification prediction models; 4) selecting validation data and inputting them into the corresponding first-level classification prediction model to obtain classification prediction results; 5) using the classification prediction results as input data to train the meta-learner in the super learning framework, obtaining an optimized weight combination of the first-level classification prediction models and creating a super learning model; 6) inputting the index data of the tobacco quality data to be identified into the super learning model to obtain their tobacco quality grade classification prediction result.
Description
Technical Field
The invention relates to a tobacco quality grade classification prediction method based on machine learning, in particular to a method for realizing tobacco quality grade classification prediction based on principal component analysis and super learning.
Background
Tobacco leaves are an important raw material for the tobacco industry. The relationships among their quality indexes, such as appearance, physical properties, chemical composition and sensory quality, receive much research attention and directly affect the quality of cigarette products. China is a major grower, producer and consumer of tobacco leaves, and leaf quality varies widely with climate, soil, regional environment, variety, cultivation measures, stalk position and curing process. Tracking quality trends and determining the quality grade of tobacco leaves is therefore important for tobacco leaf production and the cigarette industry. Tobacco quality evaluation is also a complex systems-engineering task, and scientific, objective and accurate evaluation helps guide the production, purchase and industrial use of tobacco raw materials.
Conventional chemical-component evaluation is relatively objective and carries rich quality information, but it cannot reflect tobacco quality comprehensively. Many researchers have studied index-based evaluation methods, which rate leaf quality by the accumulated score of the indexes or by correlation parameters. Most proposed evaluation systems monitor and analyze only one or a few indexes of the tobacco leaves and do not evaluate all indexes comprehensively. Comprehensive evaluation methods, in turn, give widely differing results, because they involve many indexes with complex weight relationships and are affected by factors such as sample source and algorithm mechanism.
In recent years, researchers have studied automatic tobacco leaf grading. Most methods grade leaves by their appearance features using image processing and colorimetry; others use hyperspectral imaging or infrared spectroscopy to capture internal structural features. However, these methods do not jointly consider intrinsic characteristics such as chemical composition, physical properties and sensory (smoke panel) evaluation indexes, and they suffer from long acquisition times and possible damage to the leaves.
In addition, mathematical statistics is widely used in tobacco quality evaluation, including fuzzy mathematics, canonical correlation analysis, cluster analysis and principal component analysis. For example, researchers have combined appearance quality evaluation with conventional chemical-component evaluation to build a quality evaluation model based on cluster analysis, aiming to provide a scientific method for evaluating tobacco leaf quality. Such work mainly uses statistical methods to analyze a single quality aspect or the pairwise relationship between two quality aspects.
In summary, accurate evaluation of tobacco leaf quality matters greatly for tobacco leaf production and the cigarette industry, yet existing methods, faced with many indexes and with differences between subjective and objective measures, yield widely differing evaluation results and low accuracy in quality grade classification prediction.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
To address the large differences in evaluation results and the low accuracy of quality grade classification prediction that arise because tobacco quality grade evaluation involves many indexes and both subjective and objective measures, the invention provides a tobacco quality grade classification prediction method based on principal component analysis and super learning.
To achieve this purpose, the invention adopts the following technical solution:
The tobacco leaf quality grade classification prediction method based on principal component analysis and super learning comprises the following steps:
(1) Group the tobacco leaf quality data into appearance indexes, sensory quality indexes, chemical composition indexes and physical property indexes.
(2) Perform principal component analysis separately on the appearance, sensory quality, chemical composition and physical property index data to reduce the dimension of each index data set and eliminate correlation among the data. The dimension-reduced data serve as input data for the subsequent super learning.
(3) Select classification prediction algorithms as the base learning algorithms of the super learning. Any algorithm that supports classification prediction may be selected, such as multiple logistic regression, gradient boosting decision trees, random forests and support vector machines.
(4) Form one data set each from the dimension-reduced appearance, sensory quality, chemical composition and physical property index data obtained by principal component analysis; train each selected base learning algorithm on each data set to obtain the first-level classification prediction models.
(5) Following the V-fold cross-validation scheme, divide each of the dimension-reduced appearance, sensory quality, chemical composition and physical property index data sets into V subsets of equal size. In each round, one subset is used as the validation sample set and the remaining subsets as the training sample set.
(6) For each fold, train a model on the training sample set with each base learning algorithm, apply the trained model to the corresponding validation sample set for classification prediction, and store the classification prediction results for that validation sample set.
(7) Stack the tobacco quality classification prediction results of the first-level classification prediction models as the input data of the second-level classification prediction model, i.e. the meta-learning algorithm. Select as the meta-learning algorithm of the super learning one of linear classification, gradient boosting machine, random forest, neural network, naive Bayes, XGBoost or a similar algorithm; train the meta-learning model by minimizing a loss function to obtain the optimized weight combination parameters of the first-level classification prediction models.
(8) Combine the first-level classification prediction models obtained in step (4) with the weight combination parameters obtained in step (7) to create the super learning model for tobacco quality grade classification prediction.
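The V-fold partition of steps (5) and (6) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `v_fold_indices` and `train_validation_splits`, the random shuffling and the `seed` parameter are assumptions. When the sample count is not divisible by V, the fold sizes differ by at most one.

```python
import random

def v_fold_indices(n_samples, v, seed=0):
    """Shuffle the sample indices and deal them into v folds (hypothetical helper)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold i takes every v-th shuffled index starting at position i.
    return [idx[i::v] for i in range(v)]

def train_validation_splits(n_samples, v, seed=0):
    """Yield (training indices, validation indices), one pair per fold."""
    folds = v_fold_indices(n_samples, v, seed)
    for k in range(v):
        validation = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, validation

for training, validation in train_validation_splits(n_samples=10, v=3):
    print(len(training), len(validation))
```

Each sample lands in exactly one validation set, so the stored validation predictions cover the whole data set once.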
Compared with the prior art, the invention has the following positive effects:
In the tobacco quality grade classification prediction model based on principal component analysis and super learning, principal component analysis reduces the dimension of the appearance, sensory quality, chemical composition and physical property index data, eliminates the correlation among different data of the same index type, and reduces the effect of correlated multi-dimensional data on classification accuracy. The stacked ensemble mechanism of super learning produces an optimized weighted combination of the base learning models, so that the influence of appearance quality, chemical composition, physical properties and sensory quality on the quality grade is considered jointly and the accuracy of tobacco quality grade classification prediction is improved. Training and modeling under V-fold cross-validation helps avoid overfitting, so the proposed model has good robustness.
Drawings
FIG. 1 is a flow chart of a tobacco leaf quality grade classification prediction method based on principal component analysis and super learning.
Detailed Description
To make the technical solution, inventive features, objectives and effects of the invention easy to understand, embodiments of the invention are described in detail below.
Because tobacco quality grade evaluation involves many indexes, existing classification evaluation methods give widely differing results and low accuracy in quality grade classification prediction. To address this, the invention provides a tobacco quality grade classification prediction method based on principal component analysis and super learning, with the following specific steps:
Step 1: Group the tobacco leaf quality data into appearance indexes, sensory quality indexes, chemical composition indexes and physical property indexes.
The tobacco leaf quality data comprise appearance index, sensory quality index, chemical composition index and physical property index values, 30 evaluation index items in total. The appearance indexes follow the flue-cured tobacco grading standard GB 2635-1992 and cover 6 items: color, maturity, leaf structure, body, oil content and chroma. Each sample is scored on a 10-point scale, a higher score meaning higher quality. The sensory quality indexes cover 7 items: aroma quality, aroma amount, concentration, strength, offensive odor, irritation and aftertaste. They are scored quantitatively on a 9-point scale following the standard YC/T 138-1998, the sensory evaluation method for tobacco and tobacco products. The chemical composition indexes of flue-cured tobacco comprise 7 measured items (total plant alkaloids, total sugar, reducing sugar, total nitrogen, potassium, chlorine and starch) and 3 derived items (nitrogen-to-alkaloid ratio, sugar-to-alkaloid ratio and potassium-to-chlorine ratio). The physical properties of tobacco leaves refer to their external form and physical characteristics; the physical property indexes cover 7 items: thickness, elongation, filling value, tensile force, stem content, equilibrium moisture content and leaf surface density.
The sample data containing the 30 evaluation index items are split by index type into four data sets: an appearance index sample set, a sensory quality index sample set, a chemical composition index sample set and a physical property index sample set.
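As an illustration of this grouping, the sketch below splits one sample record into the four index sets. The column names and the helper `split_by_index_type` are hypothetical; the patent lists the index items but gives no identifiers or data format.

```python
# Hypothetical column names for the 30 evaluation index items (6 + 7 + 10 + 7).
INDEX_GROUPS = {
    "appearance": ["color", "maturity", "leaf_structure", "body",
                   "oil_content", "chroma"],
    "sensory": ["aroma_quality", "aroma_amount", "concentration", "strength",
                "offensive_odor", "irritation", "aftertaste"],
    "chemical": ["total_alkaloid", "total_sugar", "reducing_sugar",
                 "total_nitrogen", "potassium", "chlorine", "starch",
                 "nitrogen_alkaloid_ratio", "sugar_alkaloid_ratio",
                 "potassium_chlorine_ratio"],
    "physical": ["thickness", "elongation", "filling_value", "tensile_force",
                 "stem_content", "equilibrium_moisture", "leaf_surface_density"],
}

def split_by_index_type(record, groups=INDEX_GROUPS):
    """Split one sample (dict: index name -> value) into one dict per index type."""
    return {group: {name: record[name] for name in names}
            for group, names in groups.items()}
```

Keeping the grouping in a single mapping makes it straightforward to add an index category or change the items in a category, which the method explicitly allows.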
It should be noted that, although this grouping follows the currently accepted classification scheme for tobacco quality evaluation indexes, the proposed technical solution is not limited to it; the solution remains applicable if, in the future, other evaluation index categories are added or specific evaluation index items are added or removed.
Step 2: Perform principal component analysis separately on the appearance, sensory quality, chemical composition and physical property index data to reduce the dimension of each index data set and eliminate correlation among the data. The dimension-reduced data serve as input data for the subsequent super learning.
Principal component analysis exploits the dependencies between variables to represent high-dimensional data in a lower-dimensional form that is easier to process, without losing too much information. Suppose a p-dimensional vector is to be reduced to a q-dimensional subspace. Dimensionality reduction is achieved by projecting the original vector onto the subspace spanned by q principal components. The principal components are computed by maximizing the projection variance. The first principal component is the direction in space along which the projection of the p-dimensional data has the largest variance. The second principal component is the direction with the largest projection variance among all directions orthogonal to the first principal component. By analogy, the k-th principal component is the direction with the largest projection variance among all directions orthogonal to the first k-1 principal components.
Assume there are n observation records, each with p variables. The centered data are arranged as an n × p matrix X, and the i-th observation is the p-dimensional vector $\mathbf{x}_i$. Select a p-dimensional unit vector $\mathbf{w}$; written as a p × 1 matrix, $\mathbf{w}$ is denoted W. The projection of $\mathbf{x}_i$ onto the direction $\mathbf{w}$ is $\mathbf{x}_i^{T}\mathbf{w}$. Because the data are centered, the variance of the projections of all observations onto the direction $\mathbf{w}$ is:
$$\sigma_{\mathbf{w}}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}_i^{T}\mathbf{w}\right)^{2} = \frac{1}{n}(XW)^{T}(XW) = W^{T}\,\frac{X^{T}X}{n}\,W = W^{T}VW$$
where $V = X^{T}X/n$ is the covariance matrix of the observed data. To find the unit vector $\mathbf{w}$ that maximizes $\sigma_{\mathbf{w}}^{2}$ under the constraint $\mathbf{w}^{T}\mathbf{w} = 1$, i.e. $W^{T}W = 1$, introduce a Lagrange multiplier λ, multiply it by the constraint equation (written as $1 - W^{T}W = 0$) and add the result to the objective function, which turns the problem into an unconstrained optimization problem. Namely:
$$L(W, \lambda) = W^{T}VW + \lambda\left(1 - W^{T}W\right)$$
Setting the partial derivatives to 0 to obtain the extreme value:
$$\frac{\partial L}{\partial \lambda} = 1 - W^{T}W = 0, \qquad \frac{\partial L}{\partial W} = 2VW - 2\lambda W = 0$$
gives:
$$W^{T}W = 1$$
$$VW = \lambda W$$
It follows that the sought vector $\mathbf{w}$ is an eigenvector of the covariance matrix V of the observed data. Moreover, $\sigma_{\mathbf{w}}^{2} = W^{T}VW = \lambda W^{T}W = \lambda$, so the $\mathbf{w}$ that maximizes the variance $\sigma_{\mathbf{w}}^{2}$ is the eigenvector corresponding to the largest eigenvalue λ. Since V is a symmetric p × p covariance matrix, it has p mutually orthogonal eigenvectors. Since a covariance matrix is also positive semidefinite, the eigenvalues of V are ≥ 0. The eigenvectors of V constitute the principal components of the observed data. Each eigenvalue gives the variance explained by the corresponding principal component, i.e. the cumulative variance of the projections onto the first q principal component directions is $\sum_{i=1}^{q}\lambda_i$.
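The claim that the variance-maximizing direction is the eigenvector of V with the largest eigenvalue can be checked numerically. The sketch below uses NumPy on synthetic data; the data, the helper name `leading_direction` and the 1/n normalization of the covariance matrix are assumptions made for illustration.

```python
import numpy as np

def leading_direction(X):
    """Return centered data, covariance V, largest eigenvalue and its eigenvector."""
    Xc = X - X.mean(axis=0)                # center each variable
    V = Xc.T @ Xc / Xc.shape[0]            # covariance matrix (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(V)   # eigh returns ascending eigenvalues
    return Xc, V, eigvals[-1], eigvecs[:, -1]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated synthetic data
Xc, V, lam_max, w = leading_direction(X)

# Variance of the projections onto w equals the largest eigenvalue.
proj_var = np.mean((Xc @ w) ** 2)

# No random unit direction u achieves a larger projected variance u^T V u.
U = rng.normal(size=(1000, 4))
U /= np.linalg.norm(U, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", U, V, U)

print(np.isclose(proj_var, lam_max), bool((random_vars <= lam_max + 1e-9).all()))
```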
Suppose the sample data have n observation records, each with p variables, expressed as an n × p matrix X whose i-th observation is the p-dimensional vector $\mathbf{x}_i$. The principal component analysis algorithm for reducing the sample data to q dimensions is as follows:
(1) Center the sample data by subtracting from each variable its mean over the n observations;
(2) Compute the covariance matrix $X^{T}X$ of the sample data (the constant factor 1/n does not change the eigenvectors);
(3) Perform eigenvalue decomposition of the covariance matrix $X^{T}X$;
(4) Sort the eigenvalues in descending order and take the eigenvectors $\mathbf{w}_1, \ldots, \mathbf{w}_q$ corresponding to the q largest eigenvalues to form the feature vector matrix W;
(5) Project the centered sample data onto W to obtain the q-dimensional data XW.
The above principal component analysis algorithm is applied separately to the four sample data sets (appearance, sensory quality, chemical composition and physical property indexes). For each index data set, the principal components whose cumulative variance contribution rate exceeds 95 percent are selected to form the feature vector matrix W. The original data are thereby converted into a low-dimensional representation, and the correlation among the index items of the original data is eliminated.
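A minimal NumPy sketch of this procedure follows: center, form $X^{T}X$, decompose, sort, keep the smallest number of components whose cumulative variance contribution exceeds the threshold, then project. The function name `pca_reduce` and its return values are assumptions; the patent does not name an implementation.

```python
import numpy as np

def pca_reduce(X, threshold=0.95):
    """Reduce X (n x p) to the fewest principal components whose cumulative
    variance contribution rate exceeds `threshold`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc                            # covariance up to a constant factor
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum() # cumulative contribution rate
    # First position where the cumulative rate is strictly above the threshold.
    q = int(np.searchsorted(ratio, threshold, side="right")) + 1
    q = min(q, X.shape[1])                     # guard against rounding at 1.0
    W = eigvecs[:, :q]                         # feature vector matrix
    return Xc @ W, W, ratio[:q]
```

In the method described here, such a function would be called once per index data set, so each of the four groups gets its own matrix W and its own reduced dimension q.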
Step 3: Select classification prediction algorithms for the base learners in the super learning. Any algorithm that supports classification prediction may be selected, including multiple logistic regression, gradient boosting decision trees, random forests and support vector machines.
Super learning is a stacked ensemble learning method. It trains a group of base learning algorithms and collects their predictions under V-fold cross-validation, then constructs an optimized weighted combination of the base learners from those predictions, which improves the accuracy and stability of the final prediction.
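The stacking idea can be sketched end to end with NumPy. Everything concrete below is an assumption made to keep the example short: the nearest-centroid base learner stands in for the base algorithms the patent names, and the meta-learner is an ordinary least-squares fit whose weights are clipped to be non-negative and normalised, which is only one simple way to obtain a weighted combination.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Stand-in base learner (an assumption): one mean vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def predict_proba(centroids, X):
    """Class probabilities as a softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    z = -d2 - (-d2).max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def out_of_fold_proba(X, y, n_classes, v=5, seed=0):
    """V-fold cross-validation: each sample is predicted by a model
    that did not see it during training."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, v)
    oof = np.zeros((len(y), n_classes))
    for k, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = fit_centroids(X[train], y[train], n_classes)
        oof[val] = predict_proba(model, X[val])
    return oof

def meta_weights(oof_list, y, n_classes):
    """Least-squares weights of the first-level models, clipped to be
    non-negative and normalised to sum to 1 (a simplification)."""
    target = np.eye(n_classes)[y].ravel()                 # one-hot labels
    A = np.stack([p.ravel() for p in oof_list], axis=1)   # one column per model
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    w = np.clip(w, 0.0, None)
    if w.sum() == 0.0:
        return np.full(len(oof_list), 1.0 / len(oof_list))
    return w / w.sum()

def combine(proba_list, weights):
    """Weighted combination of first-level class probabilities."""
    return sum(w * p for w, p in zip(weights, proba_list))
```

Here `y` is assumed to be an integer NumPy array of class labels, and every training fold is assumed to contain all classes. The out-of-fold predictions are used only to fit the weights; for prediction on new samples, the first-level models trained on the full data sets would be combined with those weights.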
Algorithms that support multi-class classification prediction, such as multiple logistic regression, gradient boosting decision trees, random forests and support vector machines, can serve as base learning algorithms for tobacco quality grade classification prediction. A base learning algorithm library is built, and the selected base learning algorithms are added to it. The multiple logistic regression and gradient boosting decision tree algorithms applied in the present invention are described here.
(1) Multiple logistic regression algorithm
Multiple logistic regression is a generalization of logistic regression. It is implemented with the Softmax regression algorithm, which models classification as the conditional probability of a class given the observed data. Suppose the N observation records contain K different classes, each output class $k$ having a corresponding coefficient vector $\beta_k$. Given an observation $x$, the conditional probability that $x$ belongs to class $c$ is modeled in the standard Softmax form:
$$P(y = c \mid x) = \frac{\exp(\beta_c^{T} x)}{\sum_{k=1}^{K} \exp(\beta_k^{T} x)}$$
The parameters are estimated by the maximum likelihood method. The likelihood function is defined as:
$$L(\beta) = \prod_{i=1}^{N} \prod_{k=1}^{K} P(y_i = k \mid x_i)^{\mathbb{1}\{y_i = k\}}$$
The log-likelihood function to be maximized is:
$$\ell(\beta) = \sum_{i=1}^{N} \sum_{k=1}^{K} \mathbb{1}\{y_i = k\} \log P(y_i = k \mid x_i)$$
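As an illustration of the Softmax model and its log-likelihood, the following sketch evaluates both for given coefficient vectors. It uses only the Python standard library; the function names and the list-of-lists data layout are assumptions made for this example, and no parameter fitting is shown.

```python
import math

def softmax_probs(betas, x):
    """Class probabilities for one observation x, given one coefficient
    vector per class (betas[k] plays the role of beta_k)."""
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(betas, X, y):
    """Sum over observations of the log-probability of the observed class index."""
    return sum(math.log(softmax_probs(betas, x)[c]) for x, c in zip(X, y))
```

A maximum likelihood fit would search for the `betas` that maximize `log_likelihood`, for example with a gradient-based optimizer.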
(2) gradient boosting decision tree algorithm
Gradient boosting is a machine learning technique that combines gradient-based optimization with boosting. Gradient-based optimization uses the gradient of the loss function to guide its minimization. Boosting builds a robust ensemble learning system for prediction tasks by incrementally strengthening weak models. The following describes the Gradient Boosting Decision Tree (GBDT) algorithm for classification, which implements a K-class classification model.
K regression trees are constructed in the algorithm, each tree representing one target class. m denotes the number of weak classifiers added to the current ensemble. In the inner loop, the first step is to compute the residual $r_{ikm}$ (line 5 of the algorithm), which is in fact the gradient value over the N samples for the Classification And Regression Tree (CART). A regression tree is then constructed to fit these gradient values (line 6 of the algorithm). For the generated decision tree, the best negative-gradient fitting value of each leaf node is computed separately (line 7 of the algorithm). Following the gradient descent optimization method, the constructed regression tree is added to the ensemble learning model to improve training accuracy (line 8 of the algorithm). Training is completed after M iterations, and the model is then used for prediction tasks.
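The residual computation in the inner loop can be illustrated as follows. This sketch assumes the multinomial log-loss that is commonly used for K-class gradient boosting, under which the residual for sample i and class k is the class indicator minus the current Softmax probability; the text does not state the loss, and the function name and data layout are hypothetical.

```python
import math

def class_residuals(F, y):
    """F[i][k]: current ensemble score of sample i for class k.
    y[i]: true class index of sample i.
    Returns r[i][k] = 1{y_i = k} - p_k(x_i), the negative gradient of the
    multinomial log-loss (an assumed loss) with respect to F[i][k]."""
    residuals = []
    for scores, label in zip(F, y):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        residuals.append(
            [(1.0 if k == label else 0.0) - e / total for k, e in enumerate(exps)]
        )
    return residuals
```

In each boosting iteration, one regression tree per class would then be fitted to the corresponding column of these residuals.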
Step 4: The dimension-reduced data of the appearance indexes, sensory quality indexes, chemical composition indexes, and physical property indexes after principal component analysis are denoted $X_{wg,i} = (Y_i, W_{wg,i})$, $X_{gg,i} = (Y_i, W_{gg,i})$, $X_{hx,i} = (Y_i, W_{hx,i})$, and $X_{wl,i} = (Y_i, W_{wl,i})$, $i = 1, \dots, n$, where Y is the grade category corresponding to the sample indexes and W is the principal component values of the corresponding indexes after dimensionality reduction. $Y_i$ is the grade category of the i-th graded tobacco record, and $W_{wg,i}$, $W_{gg,i}$, $W_{hx,i}$, and $W_{wl,i}$ are the principal component values of its appearance, sensory quality, chemical composition, and physical property index data, respectively, after dimensionality reduction. Each algorithm in the base learning algorithm library is trained on $X_{wg} = \{X_{wg,i} : i = 1, \dots, n\}$, $X_{gg} = \{X_{gg,i} : i = 1, \dots, n\}$, $X_{hx} = \{X_{hx,i} : i = 1, \dots, n\}$, and $X_{wl} = \{X_{wl,i} : i = 1, \dots, n\}$ separately. If the library contains K base learning algorithms, 4 × K first-stage classification prediction models are obtained.
And 5: according to the V-fold cross validation scheme, the data set X is divided intowg、Xgg、XhxAnd XwlIn the same orderAnd segmenting into a training sample set and a verification sample set. The specific operation is as follows: data set Xwg、Xgg、XhxAnd XwlDividing into V subsets with equal size according to the same sequence, and for XjJ ∈ (wg, gg, hx, wl), select the V-th group as the validation sample set, and the other groups as the training sample set, where V ═ 1, …, V. Definition of Tj(v) Is XjV-th training data packet of, Vj(v) Is XjThe corresponding authentication data packet. Then Tj(v)=Xj\Vj(v),v=1,…,V&j∈(wg,gg,hx,wl)。
Step 6: for the v-th folded packet, at Tj(v) Use of j ∈ (wg, gg, hx, wl)Training the model by each algorithm in (1), and applying the trained model to the corresponding verification sample set Vj(v) Performs classification prediction on the data and retains the data at Vj(v) The predicted result of (1):
and 7: stacking the tobacco leaf quality classification prediction results of the first-stage classification prediction model to obtain an n multiplied by 4K matrix expressed asIn which symbols are usedRepresents Vj(v) Verifying covariate W corresponding to samplej. A weighted combination of the prediction results for the first class classification is proposed as follows:
and (3) carrying out fitting estimation by using a multi-class classification supporting algorithm, wherein a multivariate logistic regression algorithm is also used as a meta-learner for modeling and estimating the alpha parameter, and a weight parameter combination alpha which enables the final loss to be minimum is selected. The following were used:
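Steps 6 and 7 can be sketched as building a matrix of out-of-fold predictions with one column per combination of index set and base learner. This is an illustrative sketch only: the learner interface (`fit(X_train, y_train)` returning a `predict(row)` function), the contiguous fold assignment, and the toy `majority_learner` stand-in are assumptions, and hard labels are stored where a real implementation might store class probabilities.

```python
from collections import Counter

def majority_learner(X_train, y_train):
    """Toy stand-in for a base learner: always predicts the most frequent training label."""
    top = Counter(y_train).most_common(1)[0][0]
    return lambda row: top

def out_of_fold_matrix(datasets, labels, learners, V):
    """datasets: dict name -> list of feature rows, same sample order in every data set.
    learners: list of fit functions; fit(X_train, y_train) returns predict(row).
    Returns an n x (len(datasets) * len(learners)) matrix of validation-fold predictions."""
    n = len(labels)
    Z = [[None] * (len(datasets) * len(learners)) for _ in range(n)]
    fold_of = [i * V // n for i in range(n)]     # contiguous folds shared by all data sets
    col = 0
    for X in datasets.values():
        for fit in learners:
            for v in range(V):
                train = [i for i in range(n) if fold_of[i] != v]
                predict = fit([X[i] for i in train], [labels[i] for i in train])
                for i in range(n):
                    if fold_of[i] == v:          # predict only on held-out samples
                        Z[i][col] = predict(X[i])
            col += 1
    return Z
```

With four index sets and K base learners the result has 4K columns, and a meta-learner would then be trained on this matrix against the true grades.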
and 8: correspondingly classifying and predicting the first-stage classification model obtained in the step 4 according to the weighted combination of m (z | alpha)And the weight parameters obtained in the step 7In combination, a super learning model for tobacco leaf quality grade classification prediction is created:
it should be noted that the super-learning algorithm does not limit the method for weighted combination of the first-stage classification prediction results. Here, a convex combination limit is imposed on the alpha parameter, i.e.Is for the final super-learning prediction modelIt is possible to provide a better stability of the liquid,and predicting the k parameter weight estimated value of the model for the j first-stage classification. Since the prediction result of the super-learning requires a bounded penalty function, the limitation of convex combinations means if the base learning algorithm library isThe algorithm in (1) is bounded, then the overall convex combination will also be bounded.
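The effect of the convex-combination constraint can be seen in a small sketch that combines class probabilities from several first-stage models with given weights. The function name and the probability layout are hypothetical, and the weights are supplied by hand rather than estimated by a meta-learner.

```python
def combine_predictions(prob_lists, alpha, tol=1e-9):
    """prob_lists[m][c]: probability of class c from first-stage model m.
    alpha[m]: weight of model m; must be non-negative and sum to 1.
    Returns the alpha-weighted class probabilities."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > tol:
        raise ValueError("alpha must be non-negative and sum to 1")
    n_classes = len(prob_lists[0])
    return [
        sum(a * p[c] for a, p in zip(alpha, prob_lists))
        for c in range(n_classes)
    ]
```

Because the weights are non-negative and sum to 1, each combined probability lies between the smallest and largest value predicted by the individual models, which is the boundedness property described above.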
Based on the above technical scheme, the method was implemented on a tobacco scientific research big data analysis model and visualization platform. The study used 4133 tobacco leaf quality records collected between 2010 and 2017. Each observation record contains values for the appearance indexes, sensory quality indexes, chemical composition indexes, and physical property indexes, 30 evaluation index items in total. In addition, each observation record includes the corresponding grade, tobacco-growing area, aroma type, tobacco variety, and similar information. The tobacco leaf quality grades are B2F, C3F, and X2F. First, the multiple logistic regression algorithm and the gradient boosting decision tree algorithm were used to model and evaluate the classification prediction effect on the appearance indexes, sensory quality indexes, chemical composition indexes, and physical property indexes separately. Then, the method based on principal component analysis and super learning proposed by the invention was used for modeling and evaluation, and a comparative analysis was carried out. The classification prediction effect was evaluated using precision, recall, accuracy, and F1 score.
Of the tobacco leaf quality data, 70% was randomly selected, 2878 records in total, as training samples. The remaining 30%, 1255 records in total, was used as test samples. Classification experiments were carried out with the multiple logistic regression algorithm and the gradient boosting decision tree algorithm, respectively. The appearance index items in the tobacco leaf quality data were taken as input variables. The confusion matrices of the test results for the three quality grades B2F, C3F, and X2F are shown in Tables 1 and 2:
TABLE 1 appearance index multiple logistic regression model confusion matrix
TABLE 2 appearance index gradient boosting decision tree model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 371 | 42 | 2 | 0.1060 | 44/415 | 0.91 |
C3F | 30 | 358 | 59 | 0.1991 | 89/447 | 0.81 |
X2F | 6 | 41 | 346 | 0.1196 | 47/393 | 0.85 |
Total | 407 | 441 | 407 | 0.1434 | 180/1255 | |
Recall | 0.89 | 0.80 | 0.88 |
For the appearance indexes, the multiple logistic regression model achieved precision of 91%, 80%, and 85% on the B2F, C3F, and X2F tobacco leaf quality grades, respectively, recall of 90%, 80%, and 86%, and F1 scores of 0.905, 0.8, and 0.855. The overall model accuracy was 85%. The gradient boosting decision tree model achieved precision of 91%, 81%, and 85% on B2F, C3F, and X2F, respectively, recall of 89%, 80%, and 88%, and F1 scores of 0.9, 0.805, and 0.865. The overall model accuracy was 86%.
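The per-class figures can be recomputed directly from a confusion matrix. The sketch below uses the counts of Table 2 and assumes that rows are actual grades and columns are predicted grades, which is consistent with the row totals 415, 447, and 393 in the Rate column; the function name is hypothetical, and F1 is computed here as the usual harmonic mean of precision and recall.

```python
def per_class_metrics(cm, labels):
    """cm[i][j]: number of samples of actual class i predicted as class j.
    Returns ({label: (precision, recall, f1)}, overall_accuracy)."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    metrics = {}
    for i, name in enumerate(labels):
        predicted = sum(cm[r][i] for r in range(n))   # column total
        actual = sum(cm[i])                           # row total
        precision = cm[i][i] / predicted
        recall = cm[i][i] / actual
        f1 = 2 * precision * recall / (precision + recall)
        metrics[name] = (precision, recall, f1)
    return metrics, correct / total

# Counts from Table 2 (appearance indexes, gradient boosting decision tree).
table2 = [[371, 42, 2], [30, 358, 59], [6, 41, 346]]
metrics, accuracy = per_class_metrics(table2, ["B2F", "C3F", "X2F"])
```

Rounded to two decimals, the precision and recall values computed this way match the Precision column and Recall row of Table 2, and the accuracy matches the reported 86%.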
The sensory quality index items in the tobacco leaf quality data were then taken as input variables. The confusion matrices of the test results are shown in Tables 3 and 4:
TABLE 3 sensory quality index multiple logistic regression model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 360 | 49 | 6 | 0.1325 | 55/415 | 0.86 |
C3F | 47 | 353 | 47 | 0.2103 | 94/447 | 0.79 |
X2F | 13 | 46 | 334 | 0.1501 | 59/393 | 0.86 |
Total | 420 | 448 | 387 | 0.1657 | 208/1255 | |
Recall | 0.87 | 0.79 | 0.85 |
TABLE 4 sensory quality index gradient boosting decision tree model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 358 | 46 | 11 | 0.1373 | 57/415 | 0.85 |
C3F | 52 | 348 | 47 | 0.2215 | 99/447 | 0.81 |
X2F | 9 | 34 | 350 | 0.1094 | 43/393 | 0.86 |
Total | 419 | 428 | 408 | 0.1586 | 199/1255 | |
Recall | 0.86 | 0.78 | 0.89 |
For the sensory quality indexes, the multiple logistic regression model achieved precision of 86%, 79%, and 86% on the B2F, C3F, and X2F tobacco leaf quality grades, respectively, recall of 87%, 79%, and 85%, and F1 scores of 0.865, 0.79, and 0.855. The overall model accuracy was 83%. The gradient boosting decision tree model achieved precision of 85%, 81%, and 86% on B2F, C3F, and X2F, respectively, recall of 86%, 78%, and 89%, and F1 scores of 0.855, 0.795, and 0.875. The overall model accuracy was 84%.
The chemical composition index items in the tobacco leaf quality data were then taken as input variables. The confusion matrices of the test results are shown in Tables 5 and 6:
TABLE 5 chemical composition index multiple logistic regression model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 328 | 76 | 11 | 0.2096 | 87/415 | 0.76 |
C3F | 91 | 273 | 83 | 0.3893 | 174/447 | 0.63 |
X2F | 11 | 85 | 297 | 0.2443 | 96/393 | 0.76 |
Total | 430 | 434 | 391 | 0.2845 | 357/1255 | |
Recall | 0.79 | 0.61 | 0.76 |
TABLE 6 chemical composition index gradient boosting decision tree model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 328 | 76 | 11 | 0.2096 | 87/415 | 0.75 |
C3F | 101 | 261 | 85 | 0.4161 | 186/447 | 0.60 |
X2F | 10 | 98 | 285 | 0.2748 | 108/393 | 0.75 |
Total | 439 | 435 | 381 | 0.3036 | 381/1255 | |
Recall | 0.79 | 0.58 | 0.73 |
For the chemical composition indexes, the multiple logistic regression model achieved precision of 76%, 63%, and 76% on the B2F, C3F, and X2F tobacco leaf quality grades, respectively, recall of 79%, 61%, and 76%, and F1 scores of 0.775, 0.620, and 0.76. The overall model accuracy was 72%. The gradient boosting decision tree model achieved precision of 75%, 60%, and 75% on B2F, C3F, and X2F, respectively, recall of 79%, 58%, and 73%, and F1 scores of 0.77, 0.59, and 0.74. The overall model accuracy was 70%.
The physical property index items in the tobacco leaf quality data were then taken as input variables. The confusion matrices of the test results are shown in Tables 7 and 8:
TABLE 7 multiple logistic regression model confusion matrix for physical property index
TABLE 8 gradient boosting decision tree model confusion matrix for physical property index
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 378 | 35 | 2 | 0.0892 | 37/415 | 0.92 |
C3F | 34 | 398 | 15 | 0.1096 | 49/447 | 0.89 |
X2F | 1 | 16 | 376 | 0.0433 | 17/393 | 0.96 |
Total | 413 | 449 | 393 | 0.0821 | 103/1255 | |
Recall | 0.91 | 0.89 | 0.96 |
For the physical property indexes, the multiple logistic regression model achieved precision of 88%, 86%, and 96% on the B2F, C3F, and X2F tobacco leaf quality grades, respectively, recall of 91%, 86%, and 93%, and F1 scores of 0.895, 0.86, and 0.945. The overall model accuracy was 90%. The gradient boosting decision tree model achieved precision of 92%, 89%, and 96% on B2F, C3F, and X2F, respectively, recall of 91%, 89%, and 96%, and F1 scores of 0.915, 0.89, and 0.96. The overall model accuracy was 92%.
Experimental evaluation was then carried out using the tobacco leaf quality grade classification model based on principal component analysis and super learning. Of the tobacco leaf quality data, 70% was randomly selected, 2910 records in total, as training samples. The remaining 30%, 1223 records in total, was used as test samples. The confusion matrix of the test results is shown in Table 9:
TABLE 9 principal component analysis and super learning model confusion matrix
Actual \ Predicted | B2F | C3F | X2F | Error | Rate | Precision |
B2F | 375 | 14 | 0 | 0.0360 | 14/389 | 0.97 |
C3F | 10 | 402 | 9 | 0.0451 | 19/421 | 0.95 |
X2F | 0 | 9 | 404 | 0.0218 | 9/413 | 0.98 |
Total | 385 | 425 | 413 | 0.0343 | 42/1223 | |
Recall | 0.96 | 0.95 | 0.98 |
The tobacco leaf quality grade classification model based on principal component analysis and super learning achieved precision of 97%, 95%, and 98% on the B2F, C3F, and X2F tobacco leaf quality grades, respectively, recall of 96%, 95%, and 98%, and F1 scores of 0.965, 0.95, and 0.98. The overall accuracy of the model was 97%.
According to the evaluation results, the tobacco leaf quality grade classification model based on principal component analysis and super learning markedly improves tobacco leaf quality grade classification prediction.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. A person skilled in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall be determined by the claims.
Claims (9)
1. A tobacco leaf quality grade classification prediction method based on principal component analysis and super learning comprises the following steps:
1) grouping the tobacco quality data samples according to set index types to obtain N groups of index data sets with different index types;
2) performing principal component analysis on the index data in each index data set respectively, performing dimensionality reduction on the corresponding index data and eliminating correlation among the index data;
3) taking each index data set processed in step 2) as input data of each base learning algorithm in a super learning framework, and training on the input data to obtain the corresponding first-stage classification prediction models, N × M first-stage classification prediction models in total, wherein M is the number of base learning algorithms in the super learning framework;
4) selecting a part of data from each index data set processed in the step 2) as verification data and inputting the verification data into each first-stage classification prediction model obtained by training the index data set to obtain a corresponding classification prediction result;
5) training with each classification prediction result obtained in step 4) as input data of a meta-learner in the super learning framework to obtain an optimized weight combination of the first-stage classification prediction models;
6) combining each first-stage classification prediction model with the optimization weight combination to create a super learning model for tobacco quality grade classification prediction;
7) and inputting the index data of the tobacco quality data to be identified into the super learning model to obtain the tobacco quality grade classification prediction result of the tobacco quality data to be identified.
2. The method of claim 1, wherein the index categories include an appearance index, a sensory quality index, a chemical composition index, and a physical property index; the index data sets include an appearance index data set, a sensory quality index data set, a chemical composition index data set, and a physical property index data set.
3. The method of claim 2, wherein the appearance indexes comprise 6 indexes: tobacco leaf color, maturity, leaf structure, body, oil content, and color intensity; the sensory quality indexes comprise 7 indexes: aroma quality, aroma quantity, concentration, strength, offensive odor, irritation, and aftertaste; the chemical composition indexes comprise 10 indexes: total alkaloids, total sugar, reducing sugar, total nitrogen, potassium, chlorine, starch, nitrogen-to-alkaloid ratio, sugar-to-alkaloid ratio, and potassium-to-chlorine ratio; the physical property indexes comprise 7 indexes: thickness, elongation, filling value, tensile force, stem content, equilibrium moisture content, and leaf surface density.
5. The method of claim 1, wherein the base learning algorithm is a classification prediction algorithm.
6. The method of claim 5, wherein the classification prediction algorithm comprises a multiple logistic regression algorithm, a gradient boosting decision tree algorithm, a random forest algorithm, or a support vector machine classification prediction algorithm.
7. The method of claim 1, wherein the meta-learner is a classification prediction algorithm selected from a linear classification algorithm, a gradient boosting machine, a random forest algorithm, a neural network, a naive Bayes algorithm, or an XGBoost algorithm.
8. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110817834.1A CN113657452A (en) | 2021-07-20 | 2021-07-20 | Tobacco leaf quality grade classification prediction method based on principal component analysis and super learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113657452A true CN113657452A (en) | 2021-11-16 |
Family
ID=78489595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110817834.1A Pending CN113657452A (en) | 2021-07-20 | 2021-07-20 | Tobacco leaf quality grade classification prediction method based on principal component analysis and super learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657452A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |