CN117131970A - Air separation system oxygen extraction rate prediction method and system based on ensemble learning - Google Patents
- Publication number
- CN117131970A (CN202310476394.7A)
- Authority
- CN
- China
- Prior art keywords
- extraction rate
- oxygen extraction
- model
- target
- air separation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F25—REFRIGERATION OR COOLING; COMBINED HEATING AND REFRIGERATION SYSTEMS; HEAT PUMP SYSTEMS; MANUFACTURE OR STORAGE OF ICE; LIQUEFACTION SOLIDIFICATION OF GASES
- F25J—LIQUEFACTION, SOLIDIFICATION OR SEPARATION OF GASES OR GASEOUS OR LIQUEFIED GASEOUS MIXTURES BY PRESSURE AND COLD TREATMENT OR BY BRINGING THEM INTO THE SUPERCRITICAL STATE
- F25J3/00—Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification
- F25J3/02—Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification by rectification, i.e. by continuous interchange of heat and material between a vapour stream and a liquid stream
- F25J3/04—Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification by rectification, i.e. by continuous interchange of heat and material between a vapour stream and a liquid stream for air
- F25J3/04763—Start-up or control of the process; Details of the apparatus used
- F25J3/04769—Operation, control and regulation of the process; Instrumentation within the process
- F25J3/04848—Control strategy, e.g. advanced process control or dynamic modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
Abstract
The application relates to an ensemble-learning-based method and system for predicting the oxygen extraction rate of an air separation system, in the technical field of air separation prediction. The method comprises: collecting historical air separation system data and preprocessing it to determine target historical data; screening features by combining Lasso feature screening with the Pearson correlation coefficient method to form a target oxygen extraction rate influence factor set; constructing the primary learners and the secondary learner for Stacking ensemble learning based on the principles of the Boosting and Bagging ensemble algorithms; training the models on the target oxygen extraction rate influence factor set; and fusing the primary learners according to the Stacking ensemble learning principle to determine the target oxygen extraction rate prediction model. Fusing several Boosting and Bagging ensemble models with the Stacking method improves the generalization capability and prediction accuracy of the oxygen extraction rate prediction model and assists air separation enterprises in adjusting and analyzing the oxygen extraction rate.
Description
Technical Field
The application belongs to the technical field of air separation prediction, and particularly relates to a method and a system for predicting the oxygen extraction rate of an air separation system based on ensemble learning.
Background
With the development of the social economy, China's metallurgical industry has grown rapidly, and air separation systems play an important role in it.
With the development of intelligent technology, traditional industry is also gradually becoming intelligent. Sensors in a factory collect massive amounts of production and operation data, which cannot be analyzed manually because doing so is too time-consuming and labor-intensive. Analyzing this data with data mining, machine learning and similar technologies makes it possible to detect industrial equipment faults quickly and accurately and to predict product quality and output. Depending on the problem to be handled, data mining covers: correlation analysis, dependency modeling, cluster analysis, fault and outlier detection, classification prediction, regression prediction, and the like.
Regression prediction is divided into linear and nonlinear regression prediction. Linear regression prediction considers only the linear relation between the independent and dependent variables and is therefore limited, whereas nonlinear regression prediction also considers nonlinear relations. Current regression prediction methods mainly predict data by building a single model, but the predictive performance of a single linear or nonlinear regression model is limited.
Therefore, an ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system is provided. The method mainly comprises data acquisition, data preprocessing, feature variable screening and prediction model building. Raw data from the actual air separation process are collected directly at the factory by various sensors, and abnormal data are handled accordingly. Feature variable screening selects the feature variables strongly correlated with the oxygen extraction rate, which effectively reduces the complexity of building the prediction model and can markedly improve its accuracy. The prediction model is built with an ensemble learning method, which combines several prediction models to obtain better generalization performance and prediction accuracy.
Disclosure of Invention
The application provides a method and a system for predicting the oxygen extraction rate of an air separation system, to solve the technical problems that the oxygen extraction rate of existing air separation systems has been little studied and that existing prediction methods rely on a single model with insufficient generalization capability, so that the final prediction results fit the actual values poorly.
In view of the above problems, the application provides a method and a system for predicting the oxygen extraction rate of an air separation system based on ensemble learning.
In a first aspect, the present application provides a method for predicting the oxygen extraction rate of an air separation system based on ensemble learning, the method comprising: collecting historical data of the air separation system and preprocessing it; analyzing the original data and screening feature variables, determining target historical data and constructing a target oxygen extraction rate influence factor set; establishing a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on the GBDT gradient boosting decision tree, a Bagging ensemble model based on the support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on the random forest; training the single prediction models in sequence with the target oxygen extraction rate influence factor set as input to obtain a target model set; and fusing the target model set based on the Stacking ensemble learning principle to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with that model.
In a second aspect, the present application provides an air separation system oxygen extraction rate prediction system based on ensemble learning, the system comprising: a data acquisition module for collecting historical data of the air separation system to obtain target initial data; a data preprocessing module for preprocessing the collected original data to obtain target historical data; a feature variable selection module for selecting the feature variables with a larger influence on the oxygen extraction rate, discarding irrelevant feature variables, and constructing a target oxygen extraction rate influence factor set from the screening result; a model set building module for establishing a single prediction model set based on algorithm principle analysis, namely the base learners and the meta learner of the Stacking model, wherein the single prediction model set comprises: a Boosting ensemble model based on the GBDT gradient boosting decision tree, a Bagging ensemble model based on the support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on the random forest; a model set training module for training the single prediction models in sequence with the target oxygen extraction rate influence factor set as input to obtain a target single prediction model set; and a model fusion module for fusing the target single prediction model set based on the Stacking ensemble learning principle to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with that model.
The beneficial effects of the application are as follows:
1. The application screens feature variables with a Lasso + Pearson approach. The Pearson correlation coefficient method is simple and convenient, but screening variables one by one with the Pearson correlation coefficient alone is time-consuming for high-dimensional feature variables, and the finally selected feature variables may suffer from multicollinearity and autocorrelation. The Lasso algorithm is well suited to screening high-dimensional feature variables and can resolve collinearity among independent variables. The screening process is therefore divided into two parts: Lasso feature screening first compresses the variable dimension and resolves collinearity among the independent variables, and the Pearson correlation coefficient method then further selects the feature variables related to the oxygen extraction rate. The two parts complement each other to screen the feature variables of the air separation system better and effectively prevent spurious correlation between the selected feature variables and the oxygen extraction rate;
2. The application builds the prediction model on the idea of ensemble learning. Combining several models lets different learners correct one another's errors and yields a strong supervised model, improving the accuracy and generalization capability of the final model;
3. The application combines several ensemble learning ideas: Boosting, Bagging and Stacking. Boosting trains in a stepwise, sequential manner, continually adjusting the importance of the data according to the learning result of the previous model; the GBDT algorithm based on the Boosting idea has strong supervised learning capability and high prediction precision and can handle nonlinear data. The core idea of Bagging is bootstrap sampling: each base model is fitted on a slightly different training set, so the base models differ from one another and have slightly different capabilities. Stacking trains with different models and can automatically fuse their advantages. Finally, the Stacking idea is used to fuse the models based on the Boosting and Bagging ideas, so that the final model combines the advantages of the different models, which further improves its performance and improves the generalization capability and prediction accuracy of the air separation system oxygen extraction rate prediction model.
Table 1 below shows the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) of the ensemble model for three feature screening schemes: Lasso feature variable screening only; Lasso combined with the Pearson correlation coefficient method, keeping the feature variables with moderate or stronger correlation with the oxygen extraction rate; and Lasso combined with the Pearson correlation coefficient method, keeping the feature variables with weak or stronger correlation with the oxygen extraction rate.
Table 1 Comparison of feature variable screening methods
According to the results of Table 1, screening feature variables by combining Lasso feature screening with the Pearson correlation coefficient method effectively improves the generalization capability of the model. Within that scheme, building the model from the feature variables with weak or stronger correlation with the oxygen extraction rate works better than building it from the feature variables with moderate or stronger correlation. The analysis and the example verification show that the feature variable screening method combining Lasso screening with the Pearson correlation coefficient method effectively improves both the generalization capability and the prediction accuracy of the model.
To further compare the performance of single-model building with that of the Stacking ensemble that combines Boosting and Bagging, the feature variables with weak or stronger correlation were first screened using Lasso and the Pearson correlation coefficient method, and five-fold cross-validation was then carried out on Model 1 (the Boosting ensemble model based on the GBDT gradient boosting decision tree), Model 2 (the Bagging ensemble model based on the support vector machine), Model 3 (the Bagging ensemble model based on ridge regression), and the Stacking model that fuses the three models.
Table 2 comparison of predictive model creation methods
Model | Model 1 | Model 2 | Model 3 | Stacking model |
Cross validation results | 0.76561 | 0.43019 | 0.87913 | 0.88555 |
According to the five-fold cross-validation results in Table 2, the model obtained by Stacking fusion achieves better generalization capability and prediction accuracy than any of the individual models.
Drawings
FIG. 1 is a schematic flow chart of the method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 2 is a schematic flow chart of obtaining the target oxygen extraction rate influence factor set in the method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 3 is a schematic flow chart of constructing the oxygen extraction rate prediction model in the method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 4 is a schematic structural diagram of the air separation system oxygen extraction rate prediction system based on ensemble learning.
Detailed Description
The application provides an ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system, to solve the technical problems that the oxygen extraction rate of existing air separation systems has been little studied and that existing prediction methods rely on a single model with insufficient generalization capability, so that the final prediction results fit the actual values poorly.
Example 1
As shown in FIG. 1, the application provides a method for predicting the oxygen extraction rate of an air separation system based on ensemble learning, which comprises the following steps:
(1) Historical data of the air separation system are collected, that is, data recorded in the past at a preset time interval. The collected data are integrated and labeled in time order so that they can be identified and distinguished later. For example, the air separation system data are collected every 60 s, giving 1440 records per 24 hours. The collected feature variables comprise 214 air separation system feature variables, such as oxygen outlet tower flow, air inlet fractionating tower flow, lower tower resistance, upper tower resistance and liquid oxygen evaporator pressure. The oxygen extraction rate y is obtained from the formula: oxygen extraction rate = oxygen outlet tower flow / air inlet fractionating tower flow.
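The target computation of step (1) can be sketched as follows. This is only an illustration: the function name, the tuple layout and the flow values are hypothetical and are not taken from the application.

```python
def oxygen_extraction_rate(oxygen_out_flow, air_in_flow):
    """Ratio from step (1): oxygen outlet tower flow / air inlet fractionating tower flow."""
    if air_in_flow <= 0:
        raise ValueError("air inlet flow must be positive")
    return oxygen_out_flow / air_in_flow


# Hypothetical samples: (time index, oxygen outlet flow, air inlet flow).
samples = [(0, 20000.0, 100000.0), (1, 20500.0, 100000.0), (2, 19000.0, 95000.0)]
rates = [oxygen_extraction_rate(o, a) for _, o, a in samples]

# One record every 60 s gives 1440 records per 24 hours.
samples_per_day = 24 * 60 * 60 // 60
```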
(2) The historical data of the air separation system are preprocessed to obtain the target historical data. Preprocessing comprises filling missing values and handling abnormal values. Missing values are filled with the mean value. A box plot is used to calculate the minimum observed value, 25% quantile, median, 75% quantile and maximum observed value of the oxygen extraction rate; data above the maximum observed value or below the minimum observed value are regarded as abnormal. Because the historical data samples are abundant, the air separation system data samples corresponding to abnormal values are simply deleted.
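A minimal sketch of the preprocessing in step (2) is given below. The text does not say how the box-plot "minimum" and "maximum observed values" are computed; the usual whisker limits Q1 - 1.5*IQR and Q3 + 1.5*IQR are assumed here, and all function names and data are hypothetical.

```python
import statistics


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def boxplot_bounds(values, whisker=1.5):
    """Assumed whisker limits: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def drop_outlier_rows(rows, target):
    """Delete the samples whose target value lies outside the box-plot limits."""
    low, high = boxplot_bounds(target)
    kept = [(r, y) for r, y in zip(rows, target) if low <= y <= high]
    return [r for r, _ in kept], [y for _, y in kept]


# Hypothetical oxygen extraction rates; the last value is an implausible spike.
target = [0.20, 0.21, 0.19, 0.20, 0.22, 0.21, 0.95]
rows = list(range(len(target)))
kept_rows, kept_target = drop_outlier_rows(rows, target)
```

Deleting whole samples, rather than clipping values, follows the text's reasoning that enough data remains after removal.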
(3) Max_Min normalization is applied to the air separation system feature variables, which have different dimensions (units), mapping all data into the range 0-1 so that the data become dimensionless. The normalization formula is:

$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where x is the original data, x_min is the minimum value of the corresponding feature variable, and x_max is the maximum value of the corresponding feature variable. Max_Min normalization maps data of different dimensions onto the same scale and eliminates the influence of dimension among indexes.
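As an illustration of step (3), the sketch below scales one feature column with the standard min-max mapping (x - x_min) / (x_max - x_min). The function name is hypothetical, and returning zeros for a constant column is an assumption made to avoid division by zero.

```python
def min_max_normalize(column):
    """Map a feature column into [0, 1] using (x - x_min) / (x_max - x_min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical readings of one feature variable.
scaled = min_max_normalize([10.0, 15.0, 20.0])
```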
(4) A first round of feature variable screening is applied to the normalized data x_new using the Lasso algorithm. Adding L1 regularization prevents overfitting and shrinks the coefficients of unimportant feature variables to 0, which realizes the first round of feature variable selection and yields an initial oxygen extraction rate influence factor set. The cost function of the Lasso algorithm is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2} + \lambda\sum_{j=1}^{n}\left|\theta_{j}\right|$$

where y^(i) is the i-th oxygen extraction rate value, θ^T is the transposed vector of feature variable coefficients, x^(i) is the vector of feature variable values of the i-th sample, m is the number of samples, n is the number of feature variables, λ is the penalty coefficient that controls the strength of the penalty, and λΣ|θ_j| is the L1 penalty term. In the Lasso feature variable screening of this method, the best model is obtained with a penalty coefficient of 0.01.
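The shrink-to-zero behaviour described in step (4) can be illustrated with a small coordinate-descent solver. The sketch assumes the common objective (1/(2m)) * sum((y_i - theta.x_i)^2) + lambda * sum(|theta_j|) with no intercept. The solver, its function names and the toy data are illustrative assumptions and are not taken from the application.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2m)) * sum((y_i - theta.x_i)^2) + lam * sum(|theta_j|)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iter):
        for j in range(n):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(m):
                pred_without_j = sum(theta[k] * X[i][k] for k in range(n) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            theta[j] = soft_threshold(rho / m, lam) / (z / m) if z > 0 else 0.0
    return theta


# Hypothetical normalised data: y depends on feature 0 only.
X = [[0.2, 0.5], [0.4, 0.0], [0.6, 0.0], [0.8, 0.1]]
y = [0.1, 0.2, 0.3, 0.4]
theta = lasso_coordinate_descent(X, y, lam=0.01)
selected = [j for j, t in enumerate(theta) if t != 0.0]
```

Features whose coefficient ends at exactly 0 are dropped; the remaining indices in `selected` play the role of the initial influence factor set.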
(5) The initial oxygen extraction rate influence factor set contains a plurality of oxygen extraction rate influence factors. To further simplify model construction and improve the generalization capability of the model, each influence factor is analyzed against the oxygen extraction rate in turn using the Pearson correlation coefficient method. Feature variables whose Pearson correlation coefficient with the oxygen extraction rate has an absolute value greater than or equal to 0.2, that is, feature variables with weak or stronger correlation, are retained to form the target oxygen extraction rate influence factor set. The Pearson correlation coefficient is calculated as:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^{2}\,\sum (Y - \bar{Y})^{2}}}$$

where Σ(X - X̄)² is the sum of squared deviations from the mean of a single feature variable X, Σ(Y - Ȳ)² is the sum of squared deviations from the mean of the oxygen extraction rate Y, and Σ(X - X̄)(Y - Ȳ) is the sum of the products of the deviations from the mean of X and Y. By calculating the Pearson correlation coefficient between each feature variable and the oxygen extraction rate, the influence factors more strongly related to the oxygen extraction rate are further selected, and the target oxygen extraction rate influence factor set is constructed.
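The second screening round of step (5) can be sketched as below, using the standard Pearson formula and the 0.2 threshold stated in the text. The feature names and values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of deviation products over the root of the squared-deviation sums."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    lxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    lxx = sum((x - mx) ** 2 for x in xs)
    lyy = sum((y - my) ** 2 for y in ys)
    if lxx == 0 or lyy == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return lxy / math.sqrt(lxx * lyy)


def select_by_pearson(features, target, threshold=0.2):
    """Keep the features with |r| >= threshold (weak or stronger correlation)."""
    selected = {}
    for name, column in features.items():
        r = pearson_r(column, target)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical feature columns and oxygen extraction rates.
features = {
    "lower_tower_resistance": [1.0, 2.0, 3.0, 4.0],
    "unrelated_channel": [1.0, -1.0, -1.0, 1.0],
}
target = [2.0, 4.0, 6.0, 8.0]
selected = select_by_pearson(features, target)
```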
(6) The target historical data are divided into a training set and a test set in a 7:3 ratio. The training set is used to train the models, and the test set is used to evaluate how well the built models fit.
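A sketch of the 7:3 split in step (6) follows. The text does not state whether samples are shuffled before splitting; a seeded shuffle is assumed here, and the function name is hypothetical.

```python
import random


def split_7_3(rows, targets, train_fraction=0.7, seed=0):
    """Split samples into training and test sets in a 7:3 ratio.

    Assumption: rows are shuffled with a fixed seed before splitting.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([rows[i] for i in train_idx], [targets[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [targets[i] for i in test_idx])
    return train, test


# Hypothetical data set of ten samples.
rows = [[float(i)] for i in range(10)]
targets = [0.2 + 0.001 * i for i in range(10)]
(train_X, train_y), (test_X, test_y) = split_7_3(rows, targets)
```

For time-ordered plant data, a chronological split without shuffling may be preferable; the text does not say which was used.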
(7) A Boosting ensemble model based on the GBDT gradient boosting decision tree is constructed. A decision tree is used as the base regressor; on the training set, base functions are combined linearly and the residual produced during training is reduced step by step, so that weak learners are progressively boosted into a strong learner. The resulting Boosting ensemble model is defined as H_1.
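The residual-fitting loop of step (7) can be illustrated with a stripped-down gradient boosting regressor for squared error. Depth-1 trees (stumps), the number of rounds and the learning rate are simplifying assumptions, as are all names and data; the application does not specify these settings.

```python
def fit_stump(X, residuals):
    """Best depth-1 regression tree (one split) under squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # no split possible: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (squared-error gradient)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbdt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical nonlinear relation between one normalised feature and the target.
X = [[i / 10] for i in range(10)]
y = [row[0] ** 2 for row in X]
model = fit_gbdt(X, y)
```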
(8) A Bagging ensemble model based on the support vector machine is constructed, with the support vector machine as the base model. The training data set is randomly sampled T times, drawing m samples each time. The random sampling ensures that every sampling set contains the same number of samples as the training set while the contents of the T sampling sets differ from one another. After sampling, the support vector machine base model is trained on each subset to obtain T base models. The T base models then each predict the test data, giving T prediction results, and the mean of the T prediction results is taken as the final prediction of the model. This model is defined as H_2.
(9) A Bagging ensemble model based on ridge regression is constructed, with the ridge regression algorithm as the base model. The training set is randomly sampled T times, drawing m samples each time. The random sampling ensures that every sampling set contains the same number of samples as the training set while the contents of the T sampling sets differ. The ridge regression algorithm is then trained on each of the T sampling sets to obtain T ridge regression base models. Following the Bagging idea, the final prediction of the model is the mean of the T prediction results. This model is defined as H_3.
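The Bagging procedure shared by the Bagging models above (T bootstrap sampling sets, one base model per set, mean of the T predictions) can be sketched as follows. A one-feature ridge regression with a closed-form solution stands in for the base model; that choice, the value of alpha and all names and data are assumptions made to keep the example short.

```python
import random


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge regression for one feature with an unpenalised intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def bootstrap_sample(xs, ys, rng):
    """Draw len(xs) samples with replacement: one sampling set."""
    idx = [rng.randrange(len(xs)) for _ in xs]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def fit_bagging(xs, ys, fit_base, n_estimators=10, seed=0):
    """Train one base model per bootstrap sampling set (T = n_estimators)."""
    rng = random.Random(seed)
    return [fit_base(*bootstrap_sample(xs, ys, rng)) for _ in range(n_estimators)]


def predict_bagging(models, x):
    """Final prediction: mean of the T base-model predictions."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical linear data, y = 2x + 1.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]
models = fit_bagging(xs, ys, lambda a, b: fit_ridge_1d(a, b, alpha=1e-6))
prediction = predict_bagging(models, 0.5)
```

Swapping `fit_base` for another regressor gives the other Bagging variants without changing the sampling or averaging code.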
(10) The meta learner is trained. A Bagging ensemble model based on the random forest is constructed, with the random forest as the base model. The training data set is randomly sampled T times, drawing m samples each time. The random sampling ensures that every sampling set contains the same number of samples as the training set while the contents of the T sampling sets differ from one another. After sampling, the random forest base model is trained on each subset to obtain T base models. The T base models then each predict the test data, giving T prediction results, and the mean of the T prediction results is taken as the final prediction of the model. This model is defined as MH.
(11) Referring to FIG. 3, based on the Stacking ensemble learning principle, models H_1, H_2 and H_3 serve as the primary learners of the Stacking prediction model. The primary learners first learn from the original data. Their outputs for the original data are then stacked column-wise to form new data of dimension (m, p), where m is the number of samples and p is the number of base learners; in this method p is 3. The (m, p)-dimensional new data are fed as input to the meta learner of the Stacking ensemble model, which fuses the primary learners.
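The column-wise stacking of step (11) can be sketched as below. The three primary learners are hypothetical stand-in functions rather than trained H_1, H_2 and H_3 models, and a linear least-squares meta learner is swapped in for the random-forest Bagging meta learner MH to keep the example self-contained.

```python
def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def stack_outputs(primary_learners, rows):
    """Column-stack the primary predictions into an (m, p) matrix."""
    return [[h(row) for h in primary_learners] for row in rows]


def fit_linear_meta(Z, y):
    """Least-squares weights for the stacked features (normal equations)."""
    p = len(Z[0])
    A = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve_linear_system(A, rhs)


def predict_stacked(primary_learners, weights, row):
    return sum(w * h(row) for w, h in zip(weights, primary_learners))


# Hypothetical stand-ins for the three trained primary learners (p = 3).
primary = [lambda x: x, lambda x: x ** 2, lambda x: 1.0]
rows = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.5 * x + 0.25 * x ** 2 + 0.1 for x in rows]

Z = stack_outputs(primary, rows)   # shape (m, p) = (6, 3)
weights = fit_linear_meta(Z, y)
```

In practice the stacked matrix is often built from out-of-fold predictions so that the meta learner does not see outputs fitted on the same samples; the text describes stacking the outputs on the original data directly.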
(12) The primary learners, the meta learner and the Stacking ensemble model fused by the meta learner are each verified with five-fold cross-validation. Specifically, the training set is divided into five folds; each time, one fold is selected as the validation set and the other four folds are used as the training set, and this is repeated five times so that every fold serves as the validation set once. Five-fold cross-validation effectively reduces the risk of overfitting.
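The five-fold procedure of step (12) can be sketched as follows. The `fit` and `score` callables are hypothetical placeholders for any of the models above; contiguous folds without shuffling are an assumption.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx); every fold is the validation set exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_validate(fit, score, X, y, k=5):
    """Average validation score over the k folds."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / len(scores)


# Hypothetical usage: a mean predictor scored by mean squared error.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def mse(model, X_val, y_val):
    return sum((yi - model) ** 2 for yi in y_val) / len(y_val)


X = [[float(i)] for i in range(10)]
y = [1.0] * 10
cv_mse = cross_validate(fit_mean, mse, X, y)
```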
(13) Referring to fig. 4, an embodiment of the application further provides an ensemble-learning-based oxygen extraction rate prediction system for an air separation system, which mainly comprises the following modules:
the data acquisition module is used for acquiring historical data of the air separation system, the historical data comprising 214 characteristic variables of the air separation system, such as oxygen outlet tower flow, air inlet fractionating tower flow, lower tower resistance, upper tower resistance and liquid oxygen evaporator pressure, together with the oxygen extraction rate as the target variable, wherein the oxygen extraction rate is calculated as the final index value from the oxygen outlet tower flow and the air inlet fractionating tower flow;
the data preprocessing module is used for preprocessing the collected raw data, the preprocessing comprising: handling missing values, handling abnormal values, and normalizing the data to a uniform scale, so as to form the data set required for constructing the oxygen extraction rate prediction model of the air separation system;
the feature variable selection module is used for selecting feature variables that are strongly correlated with the oxygen extraction rate and removing feature variables that are irrelevant to or only weakly correlated with it, thereby improving modeling efficiency and model accuracy; the module comprises Lasso feature variable screening and the Pearson correlation coefficient method, and finally gives the importance of each selected feature variable to the oxygen extraction rate;
the model set building module is used for building a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on gradient boosting decision trees (GBDT), a Bagging ensemble model based on support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on random forest;
the model set training module is used for training each single prediction model in turn with the target oxygen extraction rate influence factor set as input, to obtain a target single prediction model set;
the model fusion module is used for fusing the target single prediction model set based on the principle of the Stacking ensemble learning method, to obtain a target air separation system oxygen extraction rate prediction model;
and the oxygen extraction rate prediction module is used for predicting the oxygen extraction rate with the target air separation system oxygen extraction rate prediction model to obtain a target value.
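The data preprocessing and feature variable selection modules can be sketched roughly as below. Several choices here are assumptions rather than details given in the text: the box plot rule uses the conventional 1.5 × IQR whiskers, normalization is shown as min-max scaling, the Pearson threshold of 0.5 is arbitrary, the Lasso first pass is omitted, and the feature names and data are invented.

```python
from math import sqrt
from statistics import fmean, quantiles
from typing import Dict, List, Optional


def fill_missing_with_mean(col: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    mean = fmean(v for v in col if v is not None)
    return [mean if v is None else v for v in col]


def boxplot_outlier_mask(col: List[float], whisker: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - w*IQR, Q3 + w*IQR] (assumed box plot rule)."""
    q1, _, q3 = quantiles(col, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v < low or v > high for v in col]


def min_max_scale(col: List[float]) -> List[float]:
    """Rescale to [0, 1] so variables with different units are comparable."""
    low, span = min(col), max(col) - min(col)
    return [0.0 if span == 0 else (v - low) / span for v in col]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def screen_by_pearson(features: Dict[str, List[float]], target: List[float],
                      threshold: float) -> List[str]:
    """Keep features whose |r| with the target reaches the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]


# Toy data, invented for illustration only.
filled = fill_missing_with_mean([1.0, None, 3.0])
mask = boxplot_outlier_mask([10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, 100.0])
scaled = min_max_scale([2.0, 4.0, 6.0])
target = [1.0, 2.0, 3.0, 4.0]
features = {"air_flow": [2.0, 4.0, 6.0, 8.0], "noise": [1.0, -1.0, -1.0, 1.0]}
kept = screen_by_pearson(features, target, threshold=0.5)
```

In this toy run only the reading of 100.0 is flagged as abnormal, and only the hypothetical `air_flow` feature survives screening because `noise` is uncorrelated with the target.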
The foregoing is only a preferred embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present application and according to its technical solution and inventive concept, shall be covered by the scope of protection of the present application.
Claims (10)
1. An ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system, characterized by comprising the following steps: collecting historical data of the air separation system, and preprocessing the historical data to obtain target historical data; analyzing the target historical data and screening characteristic variables, and constructing a target oxygen extraction rate influence factor set according to the screening results; constructing the base learners and the meta learner of Stacking ensemble learning based on the principles of the Boosting ensemble learning algorithm and the Bagging ensemble learning algorithm, wherein the base learners comprise: a Boosting ensemble model based on gradient boosting decision trees (GBDT), a Bagging ensemble model based on support vector machine, and a Bagging ensemble model based on ridge regression; and the meta learner is a Bagging ensemble model based on random forest; training the base learners and the meta learner in turn with the target oxygen extraction rate influence factor set as input, to obtain target base learners and a target meta learner respectively; and fusing the target base learners with the target meta learner based on the principle of the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with the oxygen extraction rate prediction model.
2. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 1, wherein: in preprocessing the historical data of the air separation system, missing values are filled by mean imputation, abnormal values are identified using a box plot, and abrupt abnormal values are deleted.
3. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 1, wherein: a first round of feature variable screening is performed using Lasso feature screening, and an initial oxygen extraction rate influence factor set comprising a plurality of oxygen extraction rate influence factors is constructed according to the analysis result; Pearson correlation coefficient analysis is then performed in turn between each oxygen extraction rate influence factor in the initial set and the oxygen extraction rate, to obtain the target oxygen extraction rate influence factor set.
4. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 1, wherein: the data corresponding to the target oxygen extraction rate influence factor set is divided into training data and test data at a ratio of 7:3, and the base learners and the meta learner are trained in turn with the training data as input, to obtain target base learners and a target meta learner respectively.
5. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 4, wherein: in the Boosting ensemble model based on gradient boosting decision trees (GBDT), a series of base learners is trained on sample subsets using the weak learning algorithm CART; the series of base learners is used to predict the target oxygen extraction rate, and the resulting series of prediction results is fused by weighting to generate the final Stacking primary learner model H_1.
6. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 4, wherein: in the Bagging ensemble model based on support vector machine, the support vector machine is used as the base model; the target oxygen extraction rate training data set is sampled uniformly with replacement; a plurality of base models is obtained from the different sample sets, and each predicts the oxygen extraction rate to give a plurality of prediction results; the result of the Bagging ensemble model based on support vector machine is obtained by averaging, generating the final Bagging primary learner model H_2.
7. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 4, wherein: in the Bagging ensemble model based on ridge regression, the ridge regression algorithm is used as the base model; the target oxygen extraction rate training data set is sampled uniformly with replacement; a plurality of ridge regression weak learners is obtained from the different sample sets, and each predicts the oxygen extraction rate to give a plurality of prediction results; the result of the Bagging ensemble model based on ridge regression is obtained by averaging, generating the final Stacking primary learner model H_3.
8. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 4, wherein: in the Bagging ensemble model based on random forest, the random forest algorithm is used as the base model; the target oxygen extraction rate training data set is sampled uniformly with replacement; a plurality of random forest weak learners is obtained from the different sample sets, and each predicts the oxygen extraction rate to give a plurality of prediction results; the result of the Bagging ensemble model based on random forest is obtained by averaging, generating the final Bagging secondary learner model MH.
9. The ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system according to claim 4, wherein: the models H_1, H_2 and H_3 are used as the primary learners of the Stacking prediction model; the primary learners first learn from the raw data, and their outputs for the raw data are then stacked column-wise to form new data of dimension (m, p), where m is the number of samples and p is the number of base learners (p = 3 in this method); the (m, p)-dimensional new data is fed as input to the meta learner of the Stacking ensemble model for fusion.
10. An ensemble-learning-based oxygen extraction rate prediction system for an air separation system, comprising: a data acquisition module, used for acquiring historical data of each parameter variable from the sensors of the air separation system; a data preprocessing module, used for preprocessing the acquired raw data to obtain target historical data; a feature variable selection module, used for selecting feature variables with a larger influence on the oxygen extraction rate, removing irrelevant feature variables, and constructing a target oxygen extraction rate influence factor set according to the screening result; a model set building module, used for building a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on gradient boosting decision trees (GBDT), a Bagging ensemble model based on support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on random forest; a model set training module, used for training each single prediction model in turn with the target oxygen extraction rate influence factor set as input, to obtain a target single prediction model set; and a model fusion module, used for fusing the target single prediction model set based on the principle of the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with the target air separation system oxygen extraction rate prediction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310476394.7A CN117131970A (en) | 2023-04-28 | 2023-04-28 | Air separation system oxygen extraction rate prediction method and system based on ensemble learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117131970A (en) | 2023-11-28 |
Family ID: 88858915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310476394.7A Pending CN117131970A (en) | 2023-04-28 | 2023-04-28 | Air separation system oxygen extraction rate prediction method and system based on ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117131970A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117574329A (en) * | 2024-01-15 | 2024-02-20 | 南京信息工程大学 | Nitrogen dioxide refined space distribution method based on ensemble learning |
CN117574329B (en) * | 2024-01-15 | 2024-04-30 | 南京信息工程大学 | Nitrogen dioxide refined space distribution method based on ensemble learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111337768B (en) | Deep parallel fault diagnosis method and system for dissolved gas in transformer oil | |
CN115276006B (en) | Load prediction method and system for power integration system | |
CN105678332B (en) | Converter steelmaking end point judgment method and system based on flame image CNN recognition modeling | |
CN109614973A (en) | Rice seedling and Weeds at seedling image, semantic dividing method, system, equipment and medium | |
CN108985380B (en) | Point switch fault identification method based on cluster integration | |
CN111145042A (en) | Power distribution network voltage abnormity diagnosis method adopting full-connection neural network | |
CN106815492A (en) | A kind of bacterial community composition and the automatic mode of diversity analysis for 16SrRNA genes | |
CN112101480A (en) | Multivariate clustering and fused time sequence combined prediction method | |
CN112330050A (en) | Power system load prediction method considering multiple features based on double-layer XGboost | |
CN112756759B (en) | Spot welding robot workstation fault judgment method | |
CN117131970A (en) | Air separation system oxygen extraction rate prediction method and system based on ensemble learning | |
CN111008726B (en) | Class picture conversion method in power load prediction | |
CN108334943A (en) | The semi-supervised soft-measuring modeling method of industrial process based on Active Learning neural network model | |
CN111896495A (en) | Method and system for discriminating Taiping Houkui production places based on deep learning and near infrared spectrum | |
CN112489497A (en) | Airspace operation complexity evaluation method based on deep convolutional neural network | |
CN111046961A (en) | Fault classification method based on bidirectional long-and-short-term memory unit and capsule network | |
CN105844334B (en) | A kind of temperature interpolation method based on radial base neural net | |
CN115757103A (en) | Neural network test case generation method based on tree structure | |
CN113657023A (en) | Near-surface ozone concentration inversion method based on combination of machine learning and deep learning | |
CN106570514A (en) | Automobile wheel hub classification method based on word bag model and support vector machine | |
CN111061151B (en) | Distributed energy state monitoring method based on multivariate convolutional neural network | |
CN107808245A (en) | Based on the network scheduler system for improving traditional decision-tree | |
CN111026075A (en) | Error matching-based fault detection method for medium-low pressure gas pressure regulator | |
CN116304941A (en) | Ocean data quality control method and device based on multi-model combination | |
CN113108949B (en) | Model fusion-based sonde temperature sensor error prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||