CN117131970A - Air separation system oxygen extraction rate prediction method and system based on ensemble learning - Google Patents


Info

Publication number
CN117131970A
Authority
CN
China
Prior art keywords
extraction rate
oxygen extraction
model
target
air separation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310476394.7A
Other languages
Chinese (zh)
Inventor
王曙燕
刘甜甜
李冠雄
孙家泽
王小银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kaifeng Dear Air Separation Industrial Co ltd
Xian University of Posts and Telecommunications
Original Assignee
Kaifeng Dear Air Separation Industrial Co ltd
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kaifeng Dear Air Separation Industrial Co ltd, Xian University of Posts and Telecommunications
Priority to CN202310476394.7A
Publication of CN117131970A
Legal status: Pending

Classifications

    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F25 REFRIGERATION OR COOLING; COMBINED HEATING AND REFRIGERATION SYSTEMS; HEAT PUMP SYSTEMS; MANUFACTURE OR STORAGE OF ICE; LIQUEFACTION SOLIDIFICATION OF GASES
    • F25J LIQUEFACTION, SOLIDIFICATION OR SEPARATION OF GASES OR GASEOUS OR LIQUEFIED GASEOUS MIXTURES BY PRESSURE AND COLD TREATMENT OR BY BRINGING THEM INTO THE SUPERCRITICAL STATE
    • F25J3/00 Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification
    • F25J3/02 Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification by rectification, i.e. by continuous interchange of heat and material between a vapour stream and a liquid stream
    • F25J3/04 Processes or apparatus for separating the constituents of gaseous or liquefied gaseous mixtures involving the use of liquefaction or solidification by rectification, i.e. by continuous interchange of heat and material between a vapour stream and a liquid stream for air
    • F25J3/04763 Start-up or control of the process; Details of the apparatus used
    • F25J3/04769 Operation, control and regulation of the process; Instrumentation within the process
    • F25J3/04848 Control strategy, e.g. advanced process control or dynamic modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Thermal Sciences (AREA)
  • Manufacturing & Machinery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mechanical Engineering (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to an ensemble-learning-based method and system for predicting the oxygen extraction rate of an air separation system, in the technical field of air separation prediction. The method comprises: collecting historical air separation system data and preprocessing it to determine target historical data; screening features by combining Lasso feature screening with the Pearson correlation coefficient method to form a target oxygen extraction rate influence factor set; constructing the primary learners and the secondary learner (meta learner) for Stacking ensemble learning based on the principles of the Boosting and Bagging ensemble algorithms; training the models on the target oxygen extraction rate influence factor set; and fusing the primary learners according to the Stacking ensemble learning method to determine the target oxygen extraction rate prediction model. Fusing several Boosting and Bagging ensemble models through Stacking improves the generalization capability and prediction accuracy of the air separation system oxygen extraction rate prediction model and supports oxygen extraction rate adjustment and analysis in air separation enterprises.

Description

Air separation system oxygen extraction rate prediction method and system based on ensemble learning
Technical Field
The application belongs to the technical field of air separation prediction, and particularly relates to a method and a system for predicting the oxygen extraction rate of an air separation system based on ensemble learning.
Background
With the development of the social economy, the metallurgical industry in China has developed rapidly, and air separation systems play an important role in that industry.
As society becomes more intelligent, traditional industry is gradually becoming intelligent as well. The various sensors in a factory collect massive production and operation data, which cannot be analyzed manually because doing so is too time-consuming and labor-intensive. Analyzing these data with data mining, machine learning, and similar technologies makes it possible to detect faults of industrial equipment quickly and accurately and to predict product quality and output. Depending on the problem to be handled, the content mined by data mining is divided into: correlation analysis, dependency models, cluster analysis, fault and outlier detection, classification and regression (classification prediction and regression prediction), and the like.
Regression prediction is divided into linear and nonlinear regression prediction. Linear regression prediction considers only the linear relation between the independent and dependent variables and is therefore limited, while nonlinear regression prediction considers the nonlinear relation between them. Current regression prediction methods mainly predict data by establishing a single model, but the prediction performance of a single linear or nonlinear regression model is limited.
Therefore, an ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system is provided. The method mainly comprises data acquisition, data preprocessing, characteristic variable screening, and prediction model establishment. Raw data from the actual air separation process are acquired directly from the factory through various sensors, and abnormal data are processed accordingly. Characteristic variable screening selects the variables strongly correlated with the oxygen extraction rate, which simplifies the establishment of the prediction model and can improve its accuracy. The prediction model is established by ensemble learning, which combines several prediction models to obtain better generalization performance and prediction accuracy.
Disclosure of Invention
The application provides a method and a system for predicting the oxygen extraction rate of an air separation system, to solve the technical problems that the oxygen extraction rate of existing air separation systems has been little studied and that existing prediction methods rely on a single model with insufficient generalization capability, so that the final prediction does not fit the actual values well.
In view of the above problems, the application provides a method and a system for predicting the oxygen extraction rate of an air separation system based on ensemble learning.
In a first aspect, the present application provides a method for predicting the oxygen extraction rate of an air separation system based on ensemble learning, the method comprising: collecting historical data of the air separation system and preprocessing it; analyzing the original data and screening characteristic variables, determining target historical data and constructing a target oxygen extraction rate influence factor set; establishing a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on GBDT (gradient boosting decision tree), a Bagging ensemble model based on support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on random forest; training the single prediction models in turn with the target oxygen extraction rate influence factor set as input to obtain a target model set; and fusing the target model set according to the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, through which the oxygen extraction rate is predicted.
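A minimal sketch of how the four-member single prediction model set named above could be assembled, assuming scikit-learn as the library. The helper name `build_model_set`, the number of bootstrap rounds, the tree counts, and the random seed are illustrative assumptions and are not values given in the application.

```python
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.svm import SVR


def build_model_set(n_bags=10, seed=0):
    """Return the four single prediction models keyed by a short name.

    n_bags (the number of bootstrap rounds) and seed are assumed values.
    """
    return {
        # Boosting: gradient-boosted decision trees (GBDT).
        "gbdt": GradientBoostingRegressor(random_state=seed),
        # Bagging with a support vector regressor as the base model.
        "bag_svm": BaggingRegressor(SVR(), n_estimators=n_bags, random_state=seed),
        # Bagging with ridge regression as the base model.
        "bag_ridge": BaggingRegressor(Ridge(), n_estimators=n_bags, random_state=seed),
        # Bagging with a random forest as the base model.
        "bag_rf": BaggingRegressor(
            RandomForestRegressor(n_estimators=20, random_state=seed),
            n_estimators=n_bags,
            random_state=seed,
        ),
    }
```

Each entry exposes the usual `fit` and `predict` methods, so the set can be trained in turn on the screened influence factors before fusion.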
In a second aspect, the present application provides an air separation system oxygen extraction rate prediction system based on ensemble learning, the system comprising: a data acquisition module for acquiring historical data of the air separation system to obtain target initial data; a data preprocessing module for preprocessing the acquired original data to obtain target historical data; a characteristic variable selection module for selecting the characteristic variables with a larger influence on the oxygen extraction rate, discarding irrelevant characteristic variables, and constructing a target oxygen extraction rate influence factor set from the screening result; a model set building module for establishing a single prediction model set based on algorithm principle analysis, namely the base learners and the meta learner of the Stacking model, wherein the single prediction model set comprises: a Boosting ensemble model based on GBDT (gradient boosting decision tree), a Bagging ensemble model based on support vector machine, a Bagging ensemble model based on ridge regression, and a Bagging ensemble model based on random forest; a model set training module for training the single prediction models in turn with the target oxygen extraction rate influence factor set as input to obtain a target single prediction model set; and a model fusion module for fusing the target single prediction model set according to the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, through which the oxygen extraction rate is predicted.
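The Bagging-based models named in both aspects rely on bootstrap sampling followed by averaging of the base model predictions. The sketch below illustrates that mechanism in plain Python, using a one-dimensional ridge fit as a toy base model. The function names, the ridge strength `lam`, the number of rounds, and the synthetic data are all assumptions made for illustration.

```python
import random


def fit_ridge_1d(xs, ys, lam=1.0):
    """Closed-form ridge fit of y = a + b*x (toy 1-D base model, intercept unpenalized)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # lam > 0 shrinks the slope toward 0
    return my - slope * mx, slope


def bagging_fit(xs, ys, n_rounds=10, seed=0):
    """Fit n_rounds base models, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    m = len(xs)
    models = []
    for _ in range(n_rounds):
        # Draw m indices with replacement: same size as the training set,
        # but different content in each round.
        idx = [rng.randrange(m) for _ in range(m)]
        models.append(fit_ridge_1d([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all base models."""
    return sum(a + b * x for a, b in models) / len(models)


# Synthetic data (assumed): y = 2x + 1 on x = 0..19.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
bagged = bagging_fit(xs, ys, n_rounds=5)
prediction = bagging_predict(bagged, 10.0)
```

Because every round sees a slightly different sample, the base models differ slightly, and averaging them reduces the variance of the final prediction.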
The beneficial effects of the application are as follows:
1. The application screens characteristic variables by combining Lasso with Pearson. The Pearson correlation coefficient method is simple and convenient, but screening covariates one by one using only the Pearson correlation coefficient is time-consuming for high-dimensional characteristic variables, and the variables finally selected may suffer from multicollinearity and autocorrelation. The Lasso algorithm is well suited to screening high-dimensional characteristic variables and can resolve collinearity among independent variables. The screening process is therefore divided into two parts: Lasso feature screening first compresses the variable dimension and resolves collinearity among the independent variables, and the Pearson correlation coefficient method then further screens the characteristic variables related to the oxygen extraction rate. The two parts complement each other to screen the air separation system characteristic variables better and effectively prevent spurious correlation between the screened characteristic variables and the oxygen extraction rate;
2. The application builds the prediction model using the idea of ensemble learning, which combines several models so that different learners correct one another's errors, yielding a strong supervised model and improving the accuracy and generalization capability of the final model;
3. The application combines several ensemble learning ideas: Boosting, Bagging, and Stacking. The Boosting training process is stepwise, continuously adjusting the importance of the data according to the learning result of the previous model; the GBDT algorithm based on the Boosting idea has strong supervised learning capability and high prediction accuracy, and can handle nonlinear data. The core idea of Bagging is bootstrap sampling: each base model is fitted on a slightly different training set, so that the base models differ from one another and have slightly different capabilities. Stacking trains with different models and can automatically fuse their advantages. Finally, the models based on the Boosting and Bagging ideas are fused using the Stacking idea to obtain a model that has the advantages of the different models, further improving the performance of the final model and the generalization capability and prediction accuracy of the air separation system oxygen extraction rate prediction model.
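The two-stage screening described in the first benefit can be sketched as follows, assuming scikit-learn and NumPy. The penalty coefficient 0.01 and the correlation threshold 0.2 are the values stated in the embodiment; the function name `screen_features`, the synthetic data, and the use of scikit-learn's `Lasso` (whose `alpha` parameter is scaled against a 1/(2m) squared-error term, which may differ from the application's exact cost function) are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso


def screen_features(X, y, alpha=0.01, min_abs_r=0.2):
    """Return column indices kept by Lasso and then by |Pearson r| >= min_abs_r."""
    # Stage 1: Lasso shrinks the coefficients of unimportant columns to exactly 0.
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    kept_by_lasso = np.flatnonzero(lasso.coef_ != 0)
    # Stage 2: keep only columns at least weakly correlated with the target.
    selected = []
    for j in kept_by_lasso:
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= min_abs_r:
            selected.append(int(j))
    return selected


# Synthetic data (assumed): only columns 0 and 1 drive the target.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.standard_normal(200)
selected = screen_features(X, y)
```

Running Lasso first keeps the number of Pearson coefficients to compute small, which is the stated reason for ordering the two stages this way.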
Table 1 below shows the mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) of the ensemble model for three screening schemes: using only Lasso characteristic variable screening; using Lasso combined with the Pearson correlation coefficient method and keeping the characteristic variables with moderate or stronger correlation with the oxygen extraction rate; and using Lasso combined with the Pearson correlation coefficient method and keeping the characteristic variables with weak or stronger correlation with the oxygen extraction rate.
Table 1 feature variable screening method comparison
According to the results of Table 1, screening characteristic variables by combining Lasso feature screening with the Pearson correlation coefficient method effectively improves the generalization capability of the model. Within that scheme, building the model from the characteristic variables with weak or stronger correlation with the oxygen extraction rate works better than building it from the characteristic variables with moderate or stronger correlation. The analysis and example verification show that the characteristic variable screening method combining Lasso with the Pearson correlation coefficient method effectively improves both the generalization capability and the prediction accuracy of the model.
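For reference, the three error measures compared in Table 1 can be computed as in the following sketch. The function name `regression_errors` and the sample values are illustrative assumptions and are not data from the application.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Hypothetical oxygen extraction rates and predictions.
errors = regression_errors([0.20, 0.21, 0.19], [0.21, 0.20, 0.19])
```

Lower values of all three measures indicate a closer fit between predicted and actual oxygen extraction rates.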
To further compare the performance of the single models with that of the Stacking ensemble model combining Boosting and Bagging, five-fold cross-validation is carried out, using the characteristic variables with weak or stronger correlation screened by Lasso and the Pearson correlation coefficient method, on each of the following: Model 1, the Boosting ensemble model based on GBDT (gradient boosting decision tree); Model 2, the Bagging ensemble model based on support vector machine; Model 3, the Bagging ensemble model based on ridge regression; and the Stacking model fusing the three models.
Table 2 Comparison of prediction model building methods
Model | Model 1 | Model 2 | Model 3 | Stacking model
Cross-validation result | 0.76561 | 0.43019 | 0.87913 | 0.88555
According to the five-fold cross-validation results in Table 2, the model obtained by fusing the three models with Stacking has the highest cross-validation result, indicating better generalization capability and prediction accuracy than any single model.
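A comparison of this kind could be set up as in the sketch below, assuming scikit-learn. The three base learners and the random-forest Bagging meta learner follow the description; the helper names, all hyperparameters, the synthetic data, and the use of the default R² score are assumptions, since the application does not state which score the cross-validation results in Table 2 represent.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def build_stacking(seed=0):
    """Stack three base learners under a random-forest Bagging meta learner."""
    base = [
        ("gbdt", GradientBoostingRegressor(n_estimators=50, random_state=seed)),
        ("bag_svm", BaggingRegressor(SVR(), n_estimators=5, random_state=seed)),
        ("bag_ridge", BaggingRegressor(Ridge(), n_estimators=5, random_state=seed)),
    ]
    meta = BaggingRegressor(
        RandomForestRegressor(n_estimators=10, random_state=seed),
        n_estimators=5,
        random_state=seed,
    )
    # The meta learner is trained on the (m, 3) matrix of base predictions.
    return StackingRegressor(estimators=base, final_estimator=meta, cv=5)


def five_fold_scores(model, X, y):
    """Five-fold cross-validation; the default scorer for regressors is R^2."""
    return cross_val_score(model, X, y, cv=5)


# Synthetic data (assumed) standing in for the screened influence factors.
rng = np.random.default_rng(0)
X = rng.random((120, 4))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.standard_normal(120)
scores = five_fold_scores(build_stacking(), X, y)
mean_score = scores.mean()
```

Calling `five_fold_scores` on each base learner in turn and on the stacked model yields one row of results comparable in layout to Table 2.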
Drawings
FIG. 1 is a schematic flow chart of a method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 2 is a schematic flow chart for obtaining the target oxygen extraction rate influence factor set in the method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 3 is a schematic flow chart for constructing the oxygen extraction rate prediction model in the method for predicting the oxygen extraction rate of an air separation system based on ensemble learning;
FIG. 4 is a schematic structural diagram of an air separation system oxygen extraction rate prediction system based on ensemble learning.
Detailed Description
The application provides an ensemble-learning-based method for predicting the oxygen extraction rate of an air separation system, to solve the technical problems that the oxygen extraction rate of existing air separation systems has been little studied and that existing prediction methods rely on a single model with insufficient generalization capability, so that the final prediction does not fit the actual values well.
Example 1
As shown in FIG. 1, the application provides a method for predicting the oxygen extraction rate of an air separation system based on ensemble learning, which comprises the following steps:
(1) Collecting air separation system historical data, meaning data collected in the past at a preset time interval, integrating the collected data, and labeling it by time sequence so that it can be identified and distinguished later; for example, the air separation system data are collected every 60 s, namely 1440 records in total every 24 hours, and the collected data cover 214 air separation system characteristic variables such as oxygen outlet tower flow, air inlet fractionating tower flow, lower tower resistance, upper tower resistance, and liquid oxygen evaporator pressure; meanwhile, the oxygen extraction rate y is obtained from the oxygen extraction rate formula: oxygen extraction rate = (oxygen outlet tower flow / air inlet fractionating tower flow);
(2) Preprocessing the air separation system historical data, wherein the preprocessing comprises filling missing values and handling abnormal values to obtain the target historical data: missing values are filled with the mean value; a box plot is used to calculate the minimum observed value, 25% quantile, median, 75% quantile, and maximum observed value of the oxygen extraction rate, and data above the maximum observed value or below the minimum observed value are regarded as abnormal values; because the data samples are plentiful, the air separation system data samples corresponding to the abnormal values are simply deleted to obtain the target historical data;
(3) Carrying out Max_Min normalization on the air separation system characteristic variables, which have different dimensions, mapping all data into the range 0 to 1 to make the data dimensionless, wherein the normalization formula is:
$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x$ is the original data, $x_{min}$ is the minimum value of the corresponding characteristic variable, and $x_{max}$ is the maximum value of the corresponding characteristic variable; Max_Min normalization maps data of different dimensions onto the same scale and eliminates the influence of dimension among indexes;
(4) For the normalized data $x_{new}$, the first round of characteristic variable screening uses the Lasso algorithm, which prevents overfitting by adding $L_1$ regularization and shrinks the coefficients of unimportant characteristic variables to 0, thereby realizing the first round of characteristic variable selection and obtaining an initial oxygen extraction rate influence factor set, wherein the cost function of the Lasso algorithm takes the standard form:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \theta^{T}x^{(i)}\right)^{2} + \lambda\sum_{j=1}^{n}\left|\theta_{j}\right|$$
where $y^{(i)}$ is the i-th oxygen extraction rate value, $\theta^{T}$ is the vector of characteristic variable coefficients, $x^{(i)}$ is the i-th vector of characteristic variable values, $m$ is the number of samples, $n$ is the number of characteristic variables, $\lambda$ is the penalty coefficient that controls the penalty strength, and $\lambda\sum_{j=1}^{n}|\theta_{j}|$ is the $L_1$ penalty term; in the Lasso characteristic variable screening of this method, the model performs best when the penalty coefficient is 0.01;
(5) The initial oxygen extraction rate influence factor set comprises a plurality of oxygen extraction rate influence factors; to further simplify model construction and improve the generalization capability of the model, the Pearson correlation coefficient between each influence factor and the oxygen extraction rate is calculated in turn, and the characteristic variables whose absolute Pearson correlation coefficient with the oxygen extraction rate is greater than or equal to 0.2, namely those with weak or stronger correlation, are retained to form the target oxygen extraction rate influence factor set, wherein the Pearson correlation coefficient is calculated as:
$$r = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i}(X_i - \bar{X})^{2}}\,\sqrt{\sum_{i}(Y_i - \bar{Y})^{2}}}$$
where $\sum_{i}(X_i - \bar{X})^{2}$ is the sum of squared deviations from the mean of a single characteristic variable X, $\sum_{i}(Y_i - \bar{Y})^{2}$ is the sum of squared deviations from the mean of the oxygen extraction rate Y, and $\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})$ is the sum of the products of the deviations of X and Y; by calculating the Pearson correlation coefficient between each characteristic variable and the oxygen extraction rate, the influence factors sufficiently related to the oxygen extraction rate are further screened out to construct the target oxygen extraction rate influence factor set;
(6) Dividing the target historical data into a training set and a test set in the ratio 7:3, wherein the training set is used to train the model and the test set is used to test the goodness of the established model;
(7) Constructing a Boosting ensemble model based on GBDT (gradient boosting decision tree), namely using a decision tree as the base regressor, applying a linear combination of base functions to the training set and continuously reducing the residuals produced during training, so that weak learners are progressively boosted into a strong learner; the resulting Boosting ensemble model is defined as H_1;
(8) Constructing a Bagging ensemble model based on support vector machine, namely using the support vector machine as the base model and performing T rounds of random sampling on the training set, drawing m samples in each round; the random sampling ensures that each sampling set has the same number of samples as the training set while the contents of the T sampling sets differ from one another; after sampling, the support vector machine is trained on each sampling set to obtain T base models, the T base models each predict the test data to obtain T prediction results, and the mean of the T prediction results is taken as the final prediction result of the model, which is defined as H_2;
(9) Constructing a Bagging ensemble model based on ridge regression, namely using the ridge regression algorithm as the base model and performing T rounds of random sampling on the training set, drawing m samples in each round; the random sampling ensures that each sampling set has the same number of samples as the training set while the contents of the T sampling sets differ from one another; the ridge regression algorithm is then trained on each of the T sampling sets to obtain T ridge regression base models, and following the Bagging idea the mean of the T prediction results is taken as the final prediction result of the model, which is defined as H_3;
(10) Constructing the meta learner, a Bagging ensemble model based on random forest, namely using the random forest as the base model and performing T rounds of random sampling on the training set, drawing m samples in each round; the random sampling ensures that each sampling set has the same number of samples as the training set while the contents of the T sampling sets differ from one another; after sampling, the random forest is trained on each sampling set to obtain T base models, the T base models each predict the test data to obtain T prediction results, and the mean of the T prediction results is taken as the final prediction result of the model, which is defined as MH;
(11) Referring to FIG. 3, following the Stacking ensemble learning algorithm, the models H_1, H_2 and H_3 serve as the primary learners of the Stacking prediction model; the primary learners first learn from the raw data, and their outputs for the raw data are then stacked column-wise to form new data of dimension (m, p), where m is the number of samples and p is the number of base learners (p is 3 in this method); the (m, p)-dimensional new data is fed as input to the meta learner MH of the Stacking ensemble model, which fuses the primary learners.
(12) Five-fold cross-validation is applied to each primary learner, to the meta learner, and to the Stacking ensemble model fused by the meta learner, as follows: the training set is divided into five folds; in each round one fold is used as the validation set and the other four folds are used as the training set; this is repeated five times so that every fold serves once as the validation set; five-fold cross-validation effectively reduces the risk of overfitting.
(13) Referring to fig. 4, the embodiment of the application also provides an air separation system oxygen extraction rate prediction system based on ensemble learning, which mainly comprises the following modules:
the data acquisition module is used for acquiring historical data of the air separation system, the historical data comprising: 214 air separation system characteristic variables, such as the oxygen outlet tower flow, the air inlet fractionating tower flow, the lower tower resistance, the upper tower resistance and the liquid oxygen evaporator pressure, together with the oxygen extraction rate, wherein the oxygen extraction rate is calculated from the oxygen outlet tower flow and the air inlet fractionating tower flow and serves as the final index value;
the data preprocessing module is used for preprocessing the collected raw data, the preprocessing comprising: handling missing values, handling abnormal values, and normalizing the data to a common dimensionless scale, so as to form the data set required for constructing the oxygen extraction rate prediction model of the air separation system;
the feature variable selection module is used for selecting feature variables that are strongly correlated with the oxygen extraction rate and filtering out feature variables that are irrelevant or only weakly correlated, which improves model-building efficiency and model accuracy; the feature screening comprises Lasso feature variable screening and the Pearson correlation coefficient method, and finally gives the feature importance of each retained feature variable with respect to the oxygen extraction rate;
the model set building module is used for building a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on the GBDT gradient boosting decision tree, a Bagging ensemble model based on the support vector machine, a Bagging ensemble model based on ridge regression and a Bagging ensemble model based on random forest;
the model set training module is used for training each single prediction model in turn, taking the target oxygen extraction rate influence factor set as input information, to obtain a target single prediction model set;
the model fusion module is used for fusing the target single prediction model set based on the principle of the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model;
and the oxygen extraction rate prediction module is used for predicting the oxygen extraction rate with the target air separation system oxygen extraction rate prediction model to obtain a target value.
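As a rough illustration of the preprocessing and feature selection modules listed above, the following sketch fills missing values with the column mean, flags abnormal values with the box-plot (1.5 x IQR) rule, applies min-max normalization, and screens features by their Pearson correlation with the oxygen extraction rate. All readings are synthetic, the column names are illustrative, and the 0.5 correlation threshold is an assumed value; the application does not state one. The Lasso screening round is omitted to keep the sketch dependency-free.

```python
import math
import statistics

def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]

def boxplot_outliers(column, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v < low or v > high for v in column]

def min_max_scale(column):
    """Rescale a column to the dimensionless range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Synthetic readings (not plant data); None marks a missing sensor value.
features = {
    "oxygen_outlet_flow": [10.0, 11.0, None, 13.0, 14.0, 15.0, 16.0, 17.0],
    "lower_tower_resistance": [5.0, 5.2, 4.9, 5.1, 5.0, 50.0, 5.3, 4.8],
}
extraction_rate = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94]

filled = {name: fill_missing_with_mean(col) for name, col in features.items()}
flags = [boxplot_outliers(col) for col in filled.values()]
# Keep a row only if no feature in that row is flagged as abnormal.
keep = [not any(row) for row in zip(*flags)]

def keep_rows(column):
    return [v for v, k in zip(column, keep) if k]

scaled = {name: min_max_scale(keep_rows(col)) for name, col in filled.items()}
target = keep_rows(extraction_rate)

THRESHOLD = 0.5  # assumed cut-off for illustration
correlations = {name: pearson(col, target) for name, col in scaled.items()}
selected = [name for name, r in correlations.items() if abs(r) >= THRESHOLD]
print(selected)
```

The surviving columns play the role of the target oxygen extraction rate influence factor set that is passed on to the model training module.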
The foregoing is only a preferred embodiment of the present application, and the scope of protection of the present application is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present application and according to its technical solution and inventive concept, shall be covered by the scope of protection of the present application.

Claims (10)

1. A method for predicting the oxygen extraction rate of an air separation system based on ensemble learning, characterized by comprising the following steps: collecting historical data of the air separation system, and preprocessing the historical data to obtain target historical data; analyzing the target historical data and screening feature variables, and constructing a target oxygen extraction rate influence factor set according to the screening results; building, based on the principles of the Boosting and Bagging ensemble learning algorithms, the base learners and the meta learner of Stacking ensemble learning, wherein the base learners comprise: a Boosting ensemble model based on the GBDT gradient boosting decision tree, a Bagging ensemble model based on the support vector machine, and a Bagging ensemble model based on ridge regression; and the meta learner is a Bagging ensemble model based on random forest; training the base learners and the meta learner in turn with the target oxygen extraction rate influence factor set as input to obtain target base learners and a target meta learner, respectively; and fusing the target base learners with the target meta learner based on the principle of the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with the prediction model.
2. The air separation system oxygen extraction rate prediction method based on ensemble learning according to claim 1, wherein: in preprocessing the historical data of the air separation system, missing values are filled by the mean filling method, abnormal values are identified with a box plot, and abruptly changing abnormal values are deleted.
3. The air separation system oxygen extraction rate prediction method based on ensemble learning according to claim 1, wherein: a first round of feature variable screening is performed by Lasso feature screening, and an initial oxygen extraction rate influence factor set comprising a plurality of oxygen extraction rate influence factors is constructed according to the analysis result; Pearson correlation coefficient analysis is then performed in turn between each oxygen extraction rate influence factor in the initial set and the oxygen extraction rate to obtain the target oxygen extraction rate influence factor set.
4. The air separation system oxygen extraction rate prediction method based on ensemble learning according to claim 1, wherein: the data corresponding to the target oxygen extraction rate influence factor set are divided into training data and test data at a ratio of 7:3, and the base learners and the meta learner are trained in turn with the training data as input information to obtain target base learners and a target meta learner, respectively.
5. The method for predicting the oxygen extraction rate of the air separation system based on ensemble learning according to claim 4, wherein: in the Boosting ensemble model based on the GBDT gradient boosting decision tree, a series of base learners are trained on sample subsets using the weak learner algorithm CART, the series of base learners are used to predict the target oxygen extraction rate, and the resulting series of prediction results are fused by weighting to generate the final Stacking primary learner model H_1.
6. The method for predicting the oxygen extraction rate of the air separation system based on ensemble learning according to claim 4, wherein: in the Bagging ensemble model based on the support vector machine, a support vector machine is used as the base model, the target oxygen extraction rate training data set is uniformly sampled with replacement, a plurality of base models are obtained from the different sample sets and each predicts the oxygen extraction rate to give a plurality of prediction results, the final result of the Bagging ensemble model based on the support vector machine is obtained by averaging, and the final Bagging primary learner model H_2 is generated.
7. The method for predicting the oxygen extraction rate of the air separation system based on ensemble learning according to claim 4, wherein: in the Bagging ensemble model based on ridge regression, the ridge regression algorithm is used as the base model, the target oxygen extraction rate training data set is uniformly sampled with replacement, a plurality of ridge regression weak learners are obtained from the different sample sets and each predicts the oxygen extraction rate to give a plurality of prediction results, the final result of the Bagging ensemble model based on ridge regression is obtained by averaging, and the final Stacking primary learner model H_3 is generated.
8. The method for predicting the oxygen extraction rate of the air separation system based on ensemble learning according to claim 4, wherein: in the Bagging ensemble model based on random forest, the random forest algorithm is used as the base model, the target oxygen extraction rate training data set is uniformly sampled with replacement, a plurality of random forest weak learners are obtained from the different sample sets and each predicts the oxygen extraction rate to give a plurality of prediction results, the final result of the Bagging ensemble model based on random forest is obtained by averaging, and the final Bagging secondary learner model MH is generated.
9. The method for predicting the oxygen extraction rate of the air separation system based on ensemble learning according to claim 4, wherein: the models H_1, H_2 and H_3 serve as the primary learners in the Stacking prediction model; the original data are first learned by the primary learners, and the outputs of the primary learners on the original data are then stacked column-wise to form new data of dimension (m, p), where m is the number of samples and p is the number of base learners (p = 3 in this method); the (m, p)-dimensional new data are then input to the meta learner in the Stacking ensemble model for fusion.
10. An air separation system oxygen extraction rate prediction system based on ensemble learning, comprising: a data acquisition module for acquiring historical data of each parameter variable from the air separation system sensors; a data preprocessing module for preprocessing the acquired raw data to obtain target historical data; a feature variable selection module for selecting feature variables with a larger influence on the oxygen extraction rate, filtering out irrelevant feature variables, and constructing a target oxygen extraction rate influence factor set according to the screening result; a model set building module for building a single prediction model set based on algorithm principle analysis, wherein the single prediction model set comprises: a Boosting ensemble model based on the GBDT gradient boosting decision tree, a Bagging ensemble model based on the support vector machine, a Bagging ensemble model based on ridge regression and a Bagging ensemble model based on random forest; a model set training module for training each single prediction model in turn, taking the target oxygen extraction rate influence factor set as input information, to obtain a target single prediction model set; and a model fusion module for fusing the target single prediction model set based on the principle of the Stacking ensemble learning method to obtain a target air separation system oxygen extraction rate prediction model, and predicting the oxygen extraction rate with the target air separation system oxygen extraction rate prediction model.
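The Stacking structure recited in the claims (three primary learners H_1 to H_3 whose outputs form (m, p) data for a random-forest meta learner, trained on a 7:3 split) could be approximated with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the applicants' implementation: the data are synthetic, all hyperparameters are placeholder values, and `RandomForestRegressor` (which already bags decision trees internally) stands in for the random-forest-based Bagging meta learner MH.

```python
import numpy as np
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the screened influence-factor data (not plant data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1, 0.4]) + 0.05 * rng.normal(size=200)

# 7:3 split into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Primary learners: GBDT boosting, bagged SVR, bagged ridge regression.
base_learners = [
    ("h1_gbdt", GradientBoostingRegressor(random_state=0)),
    ("h2_bagged_svr", BaggingRegressor(SVR(), n_estimators=10, random_state=0)),
    ("h3_bagged_ridge", BaggingRegressor(Ridge(), n_estimators=10, random_state=0)),
]
# Meta learner: a random forest (assumed stand-in for model MH).
meta_learner = RandomForestRegressor(n_estimators=50, random_state=0)

# cv=5 makes the meta learner train on out-of-fold primary predictions.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=meta_learner, cv=5)
stack.fit(X_train, y_train)

meta_features = stack.transform(X_test)  # primary outputs, shape (m, p), p = 3
predictions = stack.predict(X_test)
print(meta_features.shape, predictions.shape)
```

Here `stack.transform` exposes the column-stacked primary-learner outputs, which correspond to the (m, p)-dimensional new data that the meta learner receives.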
CN202310476394.7A 2023-04-28 2023-04-28 Air separation system oxygen extraction rate prediction method and system based on ensemble learning Pending CN117131970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310476394.7A CN117131970A (en) 2023-04-28 2023-04-28 Air separation system oxygen extraction rate prediction method and system based on ensemble learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310476394.7A CN117131970A (en) 2023-04-28 2023-04-28 Air separation system oxygen extraction rate prediction method and system based on ensemble learning

Publications (1)

Publication Number Publication Date
CN117131970A true CN117131970A (en) 2023-11-28

Family

ID=88858915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310476394.7A Pending CN117131970A (en) 2023-04-28 2023-04-28 Air separation system oxygen extraction rate prediction method and system based on ensemble learning

Country Status (1)

Country Link
CN (1) CN117131970A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574329A (en) * 2024-01-15 2024-02-20 南京信息工程大学 Nitrogen dioxide refined space distribution method based on ensemble learning
CN117574329B (en) * 2024-01-15 2024-04-30 南京信息工程大学 Nitrogen dioxide refined space distribution method based on ensemble learning

Similar Documents

Publication Publication Date Title
CN111337768B (en) Deep parallel fault diagnosis method and system for dissolved gas in transformer oil
CN115276006B (en) Load prediction method and system for power integration system
CN105678332B (en) Converter steelmaking end point judgment method and system based on flame image CNN recognition modeling
CN109614973A (en) Rice seedling and Weeds at seedling image, semantic dividing method, system, equipment and medium
CN108985380B (en) Point switch fault identification method based on cluster integration
CN111145042A (en) Power distribution network voltage abnormity diagnosis method adopting full-connection neural network
CN106815492A (en) A kind of bacterial community composition and the automatic mode of diversity analysis for 16SrRNA genes
CN112101480A (en) Multivariate clustering and fused time sequence combined prediction method
CN112330050A (en) Power system load prediction method considering multiple features based on double-layer XGboost
CN112756759B (en) Spot welding robot workstation fault judgment method
CN117131970A (en) Air separation system oxygen extraction rate prediction method and system based on ensemble learning
CN111008726B (en) Class picture conversion method in power load prediction
CN108334943A (en) The semi-supervised soft-measuring modeling method of industrial process based on Active Learning neural network model
CN111896495A (en) Method and system for discriminating Taiping Houkui production places based on deep learning and near infrared spectrum
CN112489497A (en) Airspace operation complexity evaluation method based on deep convolutional neural network
CN111046961A (en) Fault classification method based on bidirectional long-and-short-term memory unit and capsule network
CN105844334B (en) A kind of temperature interpolation method based on radial base neural net
CN115757103A (en) Neural network test case generation method based on tree structure
CN113657023A (en) Near-surface ozone concentration inversion method based on combination of machine learning and deep learning
CN106570514A (en) Automobile wheel hub classification method based on word bag model and support vector machine
CN111061151B (en) Distributed energy state monitoring method based on multivariate convolutional neural network
CN107808245A (en) Based on the network scheduler system for improving traditional decision-tree
CN111026075A (en) Error matching-based fault detection method for medium-low pressure gas pressure regulator
CN116304941A (en) Ocean data quality control method and device based on multi-model combination
CN113108949B (en) Model fusion-based sonde temperature sensor error prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination