CN112686296A - Octane loss value prediction method based on particle swarm optimization random forest parameters - Google Patents
- Publication number: CN112686296A (application CN202011587477.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses an octane loss value prediction method based on particle swarm optimization of random forest parameters: step 1, calculate the information gain values of the features relevant to the octane loss value and delete features with small influence on octane number loss; step 2, preprocess the remaining data; step 3, train a random forest algorithm on the training data set to obtain a training model; step 4, initialize the particle swarm algorithm parameters; step 5, using the root mean square error as the fitness function of the particle swarm algorithm, continuously solve for the optimal values of the number of decision trees and the tree depth in the training model, and introduce the optimal parameters into the training model to obtain the optimal prediction model; step 6, input a new test set, import it into the optimal prediction model, and test to obtain the prediction result. The method can be effectively used for predicting the octane loss value.
Description
Technical Field
The invention relates to an octane loss value prediction method based on particle swarm optimization random forest parameters, and belongs to the technical field of octane loss value prediction in a gasoline catalytic cracking process flow.
Background
In the gasoline catalytic cracking process, meeting the gasoline sulfur content required under the new national standard demands further improvement of desulfurization treatment, but excessive processing during desulfurization lowers the octane number of the gasoline. Octane number is the most important index of gasoline combustion performance; controlling the octane loss value in the process can effectively improve the economic benefit of production. Traditional chemical process modeling is mostly based on data association and mechanism modeling, but the actual oil refining process is highly complex and its control variables are highly nonlinear and strongly coupled; traditional modeling also demands extensive raw material analysis and responds slowly to process optimization, so its effect is not ideal.
Currently, the prediction of octane number in process production has been widely studied with good prediction results, mainly focused on predicting the octane component ratio in the finished oil: data collected from the product oil are analyzed with machine learning methods, and a machine learning model then performs the analysis and prediction.
Disclosure of Invention
The invention provides an octane loss value prediction method based on particle swarm optimization random forest parameters, which is used for predicting an octane loss value.
The technical scheme of the invention is as follows: an octane loss value prediction method based on particle swarm optimization of random forest parameters comprises the following steps:
step 1, calculating the information gain values of the features relevant to the octane loss value, and deleting features with small influence on octane number loss;
step 2, preprocessing the data remaining after the features with small influence on octane number loss are deleted, and dividing the preprocessed data into a training data set and a test data set;
step 3, training the random forest algorithm by adopting a training data set to obtain a training model, and verifying the training model by adopting a test data set;
step 4, initializing particle swarm algorithm parameters;
and step 5, adopting the root mean square error of the verified random forest training model as the fitness function of the particle swarm algorithm, continuously solving through the particle swarm algorithm for the optimal values of the number of decision trees n_estimators and the tree depth max_depth in the verified training model, and introducing the optimal parameters into the verified training model to obtain the optimal prediction model.
Further comprising:
and 6, inputting the data processed in the step 1 and the step 2 again as a new test set, importing the new test set into an optimal prediction model, and testing to obtain a prediction result.
In the step 1, the deletion condition is: whether the information gain value of a feature is smaller than the average information gain value of all features; features whose gain is below this average are deleted.
In the step 2, the preprocessing specifically comprises: filling null values and normalization.
The null-value filling is specifically: in the sample data set, when a single feature of a sample is null, the null is filled with the mean of the preceding and following data at that position; when two or more features of a sample are null, the sample is deleted.
The normalization specifically adopts min-max normalization, so that the result value is mapped into [0, 1].
In the step 4, the parameters are set as follows: population size, particle position inertia weight, particle learning factor, and particle dimension; the population size, particle position inertia weight, and particle learning factor are the main parameters influencing the particle swarm algorithm, and the particle dimension is the number of random forest parameters being optimized.
The invention has the beneficial effects that:
(1) Computing the information gain of the collected data makes it convenient to observe the size and distribution interval of each feature's information gain value and the degree of association between each feature and the octane loss value. Feature data with low association coupling are then deleted from the original data set while the effective information of the remaining features is retained, reducing the time required for model training and avoiding the overfitting that too much feature data would cause. Step 1 thus ensures that the feature data used for model training carry highly associated effective information, and reduces the economic and time cost of training.
(2) Null-value data are handled by different means. Data with two or more null values are deleted rather than filled, which effectively removes abnormal data and avoids fillings that differ from the other normal data and interfere with model training. A single null value is replaced with the mean of the data before and after its position; this mean reflects the local trend, effectively reduces the deviation from the actual true value, and so serves as a substitute for it. Normalization is also applied: it puts all indexes on the same order of magnitude, eliminates the influence of the different dimensions and dimensional units of different evaluation indexes, and makes the data indexes comparable, so that convergence during the subsequent parameter optimization is faster and the optimal solution is easier to reach. The normalized data are divided 1:1 by random sampling into a training data set and a test data set, avoiding large data differences between the two.
(3) After the processing of steps 1 and 2, the data avoid the weaknesses of the random forest algorithm (sensitivity to high noise, tendency to overfit, long training time), so its advantages can be further exploited: training the random forest on the processed training data set yields a model with a good training effect, and the test data set then demonstrates the model's prediction effect.
(4) Solving for the optimal parameter values of the training model with the initialized particle swarm algorithm avoids the uncertainty and deviation caused by setting algorithm parameters from manual experience. Feeding the optimal parameters found by the particle swarm algorithm into the training model promotes it to the optimal prediction model: suitable values of the number of decision trees n_estimators and the depth max_depth strengthen the model's prediction ability while reducing training time and improving generalization. Choosing exactly these two parameters as the particle dimensions avoids the high search dimensionality that more target values would cause, which would greatly increase the search running time of the particle swarm algorithm and lower its efficiency; it also avoids the instability of searching a single parameter, which is too simple. The two-dimensional search space guarantees that the parameter values required by the optimized training model can be found, with short search time and assured algorithm efficiency.
In conclusion, performing step 3 after steps 1 and 2 improves the training effect of the training model, and the particle swarm parameter optimization further improves its prediction ability from the parameter side, reducing training time and improving prediction performance. The experiments of the invention also show that the optimal prediction model predicts newly collected process feature data well, is highly stable, and can be effectively used for predicting the octane loss value.
Drawings
FIG. 1 shows a flow chart of the present invention;
FIG. 2 shows a comparative experimental verification of the superiority of the random forest algorithm on the data in the scenario of the invention;
FIG. 3 is a scatter plot of relative data distribution according to the method of the present invention;
FIG. 4 is a graph showing the prediction capability of the method of the present invention in real short-term data.
Detailed Description
Example 1: as shown in fig. 1, a method for predicting octane loss value based on particle swarm optimization random forest parameters comprises the following steps:
step 1, calculating the information gain values of the features relevant to the octane loss value, and deleting features with small influence on octane number loss;
step 2, preprocessing the data remaining after the features with small influence on octane number loss are deleted, and dividing the preprocessed data into a training data set and a test data set;
step 3, training the random forest algorithm by adopting a training data set to obtain a training model, and verifying the training model by adopting a test data set;
step 4, initializing particle swarm algorithm parameters;
and step 5, adopting the root mean square error of the verified random forest training model as the fitness function of the particle swarm algorithm, continuously solving through the particle swarm algorithm for the optimal values of the number of decision trees n_estimators and the tree depth max_depth in the verified training model, and introducing the optimal parameters into the verified training model to obtain the optimal prediction model.
Further, it may be provided that the method further includes: and 6, inputting the data processed in the step 1 and the step 2 again as a new test set, importing the new test set into an optimal prediction model, and testing to obtain a prediction result.
Further, in the step 1, the deletion condition may be: whether the information gain value of a feature is smaller than the average information gain value of all features; features whose gain is below this average are deleted.
Further, in the step 2, the preprocessing specifically comprises: filling null values and normalization.
Further, the null-value filling may be specifically set as: in the sample data set, when a single feature of a sample is null, the null is filled with the mean of the preceding and following data at that position; when two or more features of a sample are null, the sample is deleted.
Further, the normalization may specifically adopt min-max normalization, so that the result value is mapped into [0, 1].
Further, in the step 4, the parameters may be set as follows: population size, particle position inertia weight, particle learning factor, and particle dimension; the population size, particle position inertia weight, and particle learning factor are the main parameters influencing the particle swarm algorithm, and the particle dimension is the number of random forest parameters being optimized.
In the step 1, features with small influence on octane number loss are deleted to avoid the problem of overfitting. The specific deletion condition is: whether the feature's information gain value is smaller than the average information gain value of all features in the original data set; features below this average are deleted. The information gain formula is as follows:

H(C) = -\sum_{i=1}^{n} P(C_i)\log_2 P(C_i)

H(C|t_j) = -P(t_j)\sum_{i=1}^{n} P(C_i|t_j)\log_2 P(C_i|t_j) - P(\bar{t}_j)\sum_{i=1}^{n} P(C_i|\bar{t}_j)\log_2 P(C_i|\bar{t}_j)

IG(t_j) = H(C) - H(C|t_j)

In the formulas, the sample set is assumed to have n classes of labels, with the set C = (C_1, C_2, ..., C_n), i = 1, 2, ..., n, where C_i is the i-th class label in the label set C. The sample set is assumed to have m classes of features, with the set T = (t_1, t_2, ..., t_m), j = 1, 2, ..., m, where t_j is the j-th class feature in the feature set T. P(t_j) is the probability that feature t_j occurs and P(\bar{t}_j) the probability that it does not occur; P(C_i) is the proportion of class-C_i label data in the total data; P(C_i|t_j) is the probability that class-C_i data occur when feature t_j is present, and P(C_i|\bar{t}_j) the probability that they occur when it is absent. H(C) is the information entropy of the label set C; the smaller the entropy value, the lower the degree of randomness of the information. The conditional entropy H(C|t_j) of feature t_j in the feature set T is the entropy of the labels given feature t_j; the lower its value, the higher the degree of correlation between C_i and t_j. The information gain IG(t_j) is the information entropy minus the conditional entropy, so the larger the information gain, the higher the degree of correlation between the labels and feature t_j, the greater the ability of t_j to reduce the randomness of the total data, and the greater its value for classification.
An original data set is constructed from the features relevant to octane loss values, each sample consisting of m features and the corresponding label, and the information gain value of each feature is calculated. Information gain, rather than another criterion, is selected for the subsequent judgment because it displays the features numerically, making it convenient to observe the size and distribution interval of each feature's gain value and providing a convenient numerical means for accepting or rejecting features. Using "is the feature's information gain value smaller than the average information gain value of all features in the original data set" as the deletion condition effectively removes data weakly associated with the label categories, reduces calculation consumption, avoids model overfitting, and speeds up model training in the subsequent steps.
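The selection rule of step 1 can be sketched as follows on discrete toy data (the toy feature and label values below are hypothetical illustrations, not the patent's hydrodesulfurization data): compute each feature's information gain from entropies and keep only features whose gain is at least the average gain.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """IG(t) = H(C) - H(C|t) for one discrete feature column."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def select_features(data, labels):
    """Keep only features whose information gain is >= the average gain
    over all features (the step-1 deletion condition)."""
    gains = {name: info_gain(col, labels) for name, col in data.items()}
    avg = sum(gains.values()) / len(gains)
    return [name for name, g in gains.items() if g >= avg]
```

For example, a feature that perfectly separates the labels has gain 1 bit, while an unrelated feature has gain 0 and falls below the average, so it would be deleted.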
When null values are filled in step 2, each null is replaced with the mean of the data before and after its position; this fits the data's small-amplitude fluctuation between adjacent samples in continuous time and effectively reduces deviation.
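The two-branch null-value rule can be sketched as below, assuming samples are rows ordered in time and that the neighbours of a filled null are themselves non-null (an assumption of this sketch, not stated in the patent):

```python
def fill_or_drop(samples):
    """Apply the step-2 null rule: drop a sample with two or more null
    features; fill a single null with the mean of the previous and next
    values of that feature in adjacent samples."""
    cleaned = [row[:] for row in samples if sum(v is None for v in row) < 2]
    for i, row in enumerate(cleaned):
        for j, v in enumerate(row):
            if v is None:
                prev = cleaned[i - 1][j]  # assumed non-null neighbours
                nxt = cleaned[i + 1][j]
                row[j] = (prev + nxt) / 2
    return cleaned
```

A row such as [None, None] is dropped outright, while a lone null between 1.0 and 3.0 becomes 2.0.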
Step 2 preprocesses the data: the feature data are processed with a normalization function so that all indexes are on the same order of magnitude, eliminating the influence of the different dimensions and dimensional units of different evaluation indexes and making the data indexes comparable. The normalization specifically adopts min-max normalization, also called dispersion normalization, which is a linear transformation of the original data mapping the result value into [0, 1]; the conversion formula is as follows:

A^* = (A - min) / (max - min)

where max is the maximum value of the feature data, min is the minimum value, A is the data before normalization, and A^* is the normalized data.
In step 5, the particle swarm algorithm is used to optimize the random forest parameters. The parameters that determine model performance are mainly the number of decision trees n_estimators and the tree depth max_depth; searching for the optimal parameter combination with the particle swarm algorithm effectively improves the prediction precision of the model, reduces the deviation of the predicted value, and brings the predicted value closer to the true value, whereas most parameter settings of the traditional random forest algorithm are made manually and rely on empirical values. In the particle swarm optimization iteration, the position of the selected parameters is set as a two-dimensional vector S_q = (S_1, S_2), with S_1 = n_estimators and S_2 = max_depth. Each particle has two attributes, velocity and position. The best solution found by an individual particle is Pbest and the best solution found by the swarm is Gbest; during iterative optimization, each particle continuously updates its velocity and position through Pbest and Gbest:

v_{\lambda q}^{k+1} = \alpha v_{\lambda q}^{k} + c_1 r_1 (Pbest_{\lambda q} - x_{\lambda q}^{k}) + c_2 r_2 (Gbest_{\lambda q} - x_{\lambda q}^{k})

x_{\lambda q}^{k+1} = x_{\lambda q}^{k} + v_{\lambda q}^{k+1}

where v_{\lambda q}^{k} is the q-dimensional component of the velocity vector of particle \lambda at the k-th iteration, and x_{\lambda q}^{k} is the q-dimensional component of its position vector. \alpha is the inertia weight of the particle position, q is the particle dimension, c_1 and c_2 are the learning factors, and r_1 and r_2 are two random values in the range [0, 1] that increase search randomness. The position and velocity of the particles are generally limited to [X_MIN, X_MAX] and [V_MIN, V_MAX], where V_MIN and V_MAX are the minimum and maximum search velocities of the particle and X_MIN and X_MAX its minimum and maximum search positions, to ensure that the particles do not search blindly.
The random forest parameters in step 5 must take integer values, so the position and velocity values of each particle are rounded with a rounding function; when the model reaches the optimal solution, the optimal parameter combination output is a positive integer.
Example 2: aiming at a method for predicting octane loss value based on particle swarm optimization random forest parameters, the invention provides the following experimental data process:
step 1, extracting characteristic data influencing octane loss in a gasoline catalytic cracking process flow to serve as an original data set, calculating information gain values of all characteristics in the original data set, and deleting the characteristics with small influence on octane number loss to avoid the problem of overfitting. The method comprises the following specific steps:
1.1, in the gasoline catalytic cracking process, octane loss occurs because hydrodesulfurization during the desulfurization stage generates excess olefin substances whose reactions consume octane. Data collected every hour by the sensors of the hydrodesulfurization section are therefore selected as the original data set: only the section data that cause octane number loss in the process flow are locked and extracted, which effectively lowers the data collection cost and makes it convenient to extract the effective information in the feature data. The information gain values of the features in the original data set are then calculated, and features whose gain is below the average information gain value of all features are deleted; the deleted feature variables are shown in Table 1.
TABLE 1 Deleted-feature information gain table

Name of variable | Information gain
Feed unit feedstock sulfur content | 0.342
Stabilizing column pressure | 0.356
Recycle hydrogen to lockhopper dipleg flow | 1.073
Cumulative flow of waste hydrogen discharge | 1.039
Pressure of reducer | 1.98
Reactor top pressure | 1.919
Flow rate of light hydrocarbon out of device | 1.174
Fuel gas inlet pressure | 1.664
Flow of light naphtha into the device | 1.984
Regenerator pressure | 1.958
Flow of refined gasoline to feeding buffer tank | 1.846
Pressure of outlet mixed hydrogen point of circulating hydrogen compressor | 1.899
Middle temperature of R-101 bed | 1.927
Average information gain value of all features | 1.987674
The remaining features include: saturated hydrocarbon (alkane + cycloalkane) content, olefin content, aromatic hydrocarbon content, bromine number, raw material sulfur content, spent adsorbent coke content, spent adsorbent sulfur content, hydrogen-oil ratio, reducer fluidization hydrogen flow, reactor upper temperature, reactor bottom temperature, reactor top-bottom pressure difference, back-flushing hydrogen temperature, back-flushing hydrogen pressure, dry gas outlet device temperature, refined gasoline outlet device flow, refined gasoline outlet device sulfur content, steam inlet device pressure, steam inlet device flow, dry gas outlet device flow, fuel gas inlet device temperature, fuel gas inlet device flow, 1.0 MPa steam inlet device temperature, D107 converter line pressure difference, D107 lift nitrogen flow, catalytic gasoline inlet device total flow, 2# catalytic gasoline inlet device flow, 3# catalytic gasoline inlet device flow, raw material pump outlet flow, raw material inlet device flow, hydrogen mixing point hydrogen flow, heating furnace inlet temperature, heating furnace exhaust temperature, heating furnace circulating hydrogen outlet temperature, reactor inlet temperature, D104 destabilizing tower flow, reducer temperature, regeneration air flow, R102 regenerator lifting nitrogen flow, regenerator top-bottom differential pressure, regenerator top flue gas temperature, regenerator temperature, regeneration flue gas oxygen content, raw material inlet device flow, D-123 steam outlet flow, D-110 steam coil inlet flow, stabilizing tower lower temperature, stabilizing tower top outlet temperature, stabilizing tower bottom outlet temperature, regenerator top/regenerator receiver differential pressure, emergency hydrogen main pipe flow, emergency hydrogen R-101 flow, blocking hopper hydrocarbon content, blocking hopper charging line pressure, R-101 bed lower temperature, D-121 sulfur-containing sewage discharge capacity, hydrocracking light naphtha inlet device accumulated flow, 8.0 MPa hydrogen to recycle hydrogen compressor inlet flow, and 8.0 MPa hydrogen to back-flushing hydrogen compressor outlet flow.
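The deletion rule of step 1 — drop every feature whose information gain falls below the average gain over all features — can be sketched as follows. The entropy-based implementation and function names are illustrative assumptions (continuous features are discretized by equal-width binning), not the patent's code:

```python
import numpy as np

def information_gain(feature, target, bins=10):
    """Entropy-based information gain of one feature with respect to the
    octane-loss target; continuous values are discretized into `bins` bins."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    f = np.digitize(feature, np.histogram_bin_edges(feature, bins=bins)[1:-1])
    t = np.digitize(target, np.histogram_bin_edges(target, bins=bins)[1:-1])
    h_t = entropy(t)                                           # H(target)
    h_t_given_f = sum((f == v).mean() * entropy(t[f == v])     # H(target | feature)
                      for v in np.unique(f))
    return h_t - h_t_given_f                                   # IG = H(t) - H(t|f)

def select_features(X, y, names):
    """Keep only the features whose gain is at least the average gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    keep = gains >= gains.mean()
    return X[:, keep], [n for n, k in zip(names, keep) if k]
```

A feature strongly coupled to the octane loss value retains a high gain and survives the cut, while an unrelated feature falls below the average and is deleted.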
Step 2, preprocessing the data that remain after deleting the features with little influence on octane number loss, specifically: filling null values, normalizing, dividing the data into a training data set and a test data set, and extracting a data set for the comparison and verification experiments. The specific steps are as follows:
2.1, when two or more features of a piece of sample data have null values, that piece of data is deleted; when only a single feature of a sample is null, the null is replaced with the mean of the previous and next values at that position, which effectively reduces deviation.
2.2, normalizing all feature data to eliminate dimensional effects among the data.
2.3, randomly dividing the 2017–2019 data into a training data set and a test data set at a 1:1 ratio. Because the data were collected over a long period, the octane loss feature data built from this long historical record make the model's predictions more accurate.
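Steps 2.1–2.3 can be sketched in NumPy as follows; the function names, the neighbour-mean edge handling and the fixed split seed are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def fill_or_drop(X):
    """Step 2.1: drop samples with two or more nulls; fill a lone null with the
    mean of the previous and next values of the same feature."""
    X = X[np.isnan(X).sum(axis=1) < 2].copy()
    for i, j in zip(*np.where(np.isnan(X))):
        prev = X[i - 1, j] if i > 0 else X[i + 1, j]
        nxt = X[i + 1, j] if i < len(X) - 1 else X[i - 1, j]
        X[i, j] = (prev + nxt) / 2.0
    return X

def min_max_normalize(X):
    """Step 2.2: map every feature into [0, 1] to remove dimensional effects."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def split_half(X, y, seed=0):
    """Step 2.3: random 1:1 split into training and test sets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    half = len(X) // 2
    return X[idx[:half]], y[idx[:half]], X[idx[half:]], y[idx[half:]]
```

The three functions are applied in that order, so normalization and splitting operate only on complete, filled samples.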
Step 3, training the random forest algorithm with the training data set to obtain a training model, and verifying the training model with the test data set. The specific steps are as follows:
3.1, inputting the training data set and the test data set and performing comparison and verification experiments with different regression algorithms; the prediction results are shown in Fig. 2 and Fig. 3. Fig. 2 shows the prediction ability of the different regression algorithms on the test data set: the solid dots connected by dotted lines are the true values of each numbered sample in the test data set, and the solid inverted triangles are the values predicted for each numbered sample by the regression algorithms; the closer each predicted value is to the true value, the better the algorithm's prediction ability. Fig. 3 shows, for each regression algorithm, the relative positional deviation between the true and predicted values of each numbered sample in the test data set; the more linearly concentrated the data points, the better the regression algorithm performs. The regression models in Fig. 3 correspond one-to-one with those in Fig. 2, verifying the sample data processing effect of the different algorithms in the present invention. The correlation coefficient, root mean square error, mean square error and mean absolute error are selected as comparison evaluation indices, with the following formulas:
In the above formulas, $h(x_i)$ denotes the predicted value of the $i$-th data point, $y_i$ its true value, $\bar{y}$ the mean, and $M$ the number of data points. The numerator of the correlation coefficient is the sum of squared deviations between the true and predicted values, and the denominator is the sum of squared deviations between the true values and the mean; its range is $[0, 1]$, and the closer the value is to 1, the better the model fit. The mean square error is the mean of the squared differences between actual and predicted values on the test set, the root mean square error is its square root, and the mean absolute error is the mean of the absolute differences between predicted and actual values; for all three, the closer the value is to 0, the higher the model's prediction precision. The mean absolute error avoids errors cancelling one another out and better reflects the actual prediction error. The evaluation indices of the comparison experiments are shown in Table 2.
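The four indices follow directly from the definitions above; this is a minimal sketch (the function name is illustrative, and the correlation coefficient is implemented in its $1 - \mathrm{SS}_{res}/\mathrm{SS}_{tot}$ form):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and the correlation coefficient used as evaluation indices."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))        # mean of squared differences
    rmse = float(np.sqrt(mse))            # square root of the MSE
    mae = float(np.mean(np.abs(err)))     # mean of absolute differences
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": float(r2)}
```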
TABLE 2 evaluation index table for different regression models
The performance indices of the decision tree (DT), logistic regression (LR), support vector machine (SVM), K-nearest neighbors (KNN), AdaBoost, Bagging and BP neural network (BP) models are all weaker than those of the random forest (RF) model.
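For reference, the comparison experiment of step 3.1 can be sketched with scikit-learn; this runs on synthetic data rather than the refinery data set, uses illustrative default model settings rather than the patent's, and omits the LR and BP models:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor, BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy stand-in for the preprocessed feature matrix and octane-loss target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "SVM": SVR(),
    "KNN": KNeighborsRegressor(),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
}
# RMSE of each model on the held-out half, mirroring the Table 2 comparison.
rmse = {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te)) ** 0.5
        for name, m in models.items()}
```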
Step 4, initializing the particle swarm algorithm parameters, with the main parameters set as follows: population size, particle position inertia weight, particle learning factors and particle dimension. The population size, particle position inertia weight and particle learning factors are the main parameters influencing the particle swarm algorithm, and the particle dimension is the number of random forest parameters to be optimized. The specific steps are as follows:
4.1, when optimizing the parameters of the established random forest, the number of decision trees n_estimators and the tree depth max_depth among the core parameters are optimized. In theory, increasing the number of decision trees effectively reduces the variance of the prediction results but lengthens training, while a deeper decision tree gives the model stronger prediction ability but also lengthens training and makes overfitting more likely. Choosing suitable values of n_estimators and max_depth therefore effectively enhances the model's prediction ability while reducing the time required for training. Because these parameters are set manually, empirical values usually cannot yield the optimal prediction ability, so the particle swarm algorithm continuously updates the particle positions to determine the final optimal parameter combination.
4.2, initializing the population: population size w = 100, iteration count k = 100, optimization dimension q = 2, inertia weight α = 0.8, learning factors c1 = c2 = 2, particle search velocity V_min = 1, V_max = 5, and particle search position X_min = 1, X_max = 20.
4.3, setting a fitness function as the criterion for the optimal particle position; the model selects the root mean square error as the fitness function of the prediction model.
Step 5, adopting the root mean square error of the random forest training model as the fitness function of the particle swarm algorithm. The root mean square error is the square root of the mean of the squared differences between the true and predicted values on the test set; the smaller it is, the stronger the model's prediction performance. The particle swarm algorithm minimizes the root mean square error, determining the parameter values at which the random forest regression model performs best when the search condition is reached. The particle swarm thus continuously searches for the optimal values of the number of decision trees n_estimators and the tree depth max_depth in the random forest algorithm, and the optimal parameters are imported into the random forest algorithm to obtain the optimal prediction model. The specific steps are as follows:
5.1, calculating the fitness value of each particle for the trained random forest prediction model using the fitness function established in step 4.3.
5.2, after continuous iteration, counting the fitness of each particle, selecting the particles with smaller fitness, recording their positions, and continuing to iterate over a continuously narrowed range. When the iteration count is reached, the optimal fitness value and the position and velocity of the corresponding particle are obtained, the corresponding parameter values are output, and the optimal parameter combination is input into the random forest prediction model to obtain the optimal prediction model.
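The search of steps 4.2, 5.1 and 5.2 can be sketched as a minimal particle swarm optimizer using the step-4.2 settings. Here a toy quadratic stands in for the real fitness (the RMSE of a random forest trained with the candidate (n_estimators, max_depth)); the function name and the integer-rounding detail are assumptions:

```python
import numpy as np

def pso_minimize(fitness, dim=2, n_particles=100, iters=100, alpha=0.8,
                 c1=2.0, c2=2.0, x_bounds=(1.0, 20.0), v_max=5.0, seed=0):
    """Minimal PSO with the step-4.2 settings: alpha is the inertia weight,
    c1/c2 the learning factors. Positions are rounded to integers before
    evaluation, since n_estimators and max_depth are integer parameters."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(*x_bounds, size=(n_particles, dim))   # particle positions
    v = rng.uniform(1.0, v_max, size=(n_particles, dim))  # particle velocities
    pbest = x.copy()
    pbest_f = np.array([fitness(np.round(p).astype(int)) for p in x])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        v = alpha * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, *x_bounds)
        f = np.array([fitness(np.round(p).astype(int)) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        if f.min() < gbest_f:
            gbest, gbest_f = x[f.argmin()].copy(), f.min()
    return np.round(gbest).astype(int), gbest_f
```

In the patent's setting, `fitness` would train a random forest with the candidate integer pair (n_estimators, max_depth) on the training set and return its RMSE on the test set.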
Step 6, inputting newly processed data, extracted as a new test data set, into the optimal prediction model for testing to obtain a prediction result and verify the stability of the model. The specific steps are as follows:
6.1, the data from January and February 2020 are processed according to steps 1 and 2 but are not divided; all of the processed data serve as a new test data set.
6.2, importing the new test data set into the optimal prediction model based on particle swarm optimization of random forest parameters (PSO-RF) gives the prediction result distribution shown in Fig. 4, where the solid dots with lines are the true values of each numbered sample in the new test data set and the solid pentagons with lines are the corresponding predicted values. The evaluation indices of the prediction model are shown in Table 3. As can be seen from Fig. 4 and Table 3, the optimal prediction model predicts the new test data set well and stays within the actual error range.
Table 3 Test data prediction result evaluation table
Evaluation index | Numerical value |
MSE | 0.01881 |
MAE | 0.10302 |
RMSE | 0.13716 |
The experiments in this embodiment ran on a single machine with an Intel(R) Core(TM) i5-4590 CPU @ 3.3 GHz, 12 GB of RAM and the 64-bit Windows 7 Ultimate operating system; the programming language is Python.
The working principle of the invention is as follows: based on the relevant literature, equipment characteristic parameters related to the core desulfurization reaction are selected from the production process, narrowing the feature extraction range to the reactor, regenerator, raw material equipment, hydrogenation equipment, catalyst and other related equipment, and the factors influencing the octane loss value in each step and device are identified from the literature. For example, Zheng Yunfeng et al. ("Influence of raw material oil on octane number of catalytic cracking gasoline") studied the influence of the hydrogen-oil ratio, mass space velocity, spent adsorbent sulfur holding rate, carbon holding rate and other factors on octane loss, while Shang Bao et al. ("Strengthening process management and reducing octane number loss of refined gasoline of an S Zorb device") studied the influence of steam pressure, stabilizing tower top and bottom temperature, catalyst circulation capacity, olefin content, bromine number and other factors. In summary, the characteristic factors causing octane loss are extracted from an analysis of the principle of catalytic cracking of gasoline. Because the operating variables are coupled in complex relationships, the method computes the information gain of the operating-variable features selected from the factory data, with the octane loss value as the target variable, and deletes the features with little influence on octane number loss to avoid overfitting.
After the characteristic variables are determined and the remaining data preprocessed, the octane number loss trend is effectively predicted by a random forest prediction model whose parameters are optimized by particle swarm. Because the random forest is insensitive to high-dimensional features, handles unsaturated samples well, is not prone to overfitting and is insensitive to noisy data, comparison experiments verify that the method is superior to other traditional regression algorithms; the octane loss value prediction method based on particle swarm parameter optimization of random forests is therefore designed.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, it is not limited to those embodiments, and various changes may be made within the knowledge of those skilled in the art without departing from the spirit of the invention.
Claims (7)
1. An octane loss value prediction method based on particle swarm optimization of random forest parameters, characterized by comprising the following steps:
step 1, calculating the information gain values of the features related to the octane loss value, and deleting the features with little influence on octane number loss;
step 2, preprocessing the data remaining after deletion of the features with little influence on octane number loss, and dividing the preprocessed data into a training data set and a test data set;
step 3, training the random forest algorithm with the training data set to obtain a training model, and verifying the training model with the test data set;
step 4, initializing the particle swarm algorithm parameters;
step 5, adopting the root mean square error of the verified random forest training model as the fitness function of the particle swarm algorithm, continuously solving through the particle swarm algorithm for the optimal values of the number of decision trees n_estimators and the tree depth max_depth in the verified random forest training model, and importing the optimal parameters into the verified random forest training model to obtain the optimal prediction model.
2. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 1, characterized by further comprising:
step 6, inputting data newly processed according to steps 1 and 2 as a new test set, and importing the new test set into the optimal prediction model for testing to obtain a prediction result.
3. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 1 or 2, characterized in that in step 1 the deletion condition is: judging whether the information gain value of a feature is smaller than the average information gain value of all features, and deleting the features whose information gain is smaller than that average.
4. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 1 or 2, characterized in that in step 2 the preprocessing specifically comprises: filling null values and normalizing.
5. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 3, characterized in that the null filling specifically comprises: in the sample data set, when a single feature of a sample is null, the null is filled with the mean of the previous and next values at that position; when two or more features of a sample are null, the sample is deleted.
6. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 3, characterized in that the normalization specifically adopts min-max normalization, mapping the result values into the interval [0, 1].
7. The octane loss value prediction method based on particle swarm optimization of random forest parameters according to claim 1 or 2, characterized in that in step 4 the parameters set are: population size, particle position inertia weight, particle learning factor and particle dimension; the population size, particle position inertia weight and particle learning factor are the main parameters influencing the particle swarm algorithm, and the particle dimension is the number of random forest parameters to be optimized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011587477.6A CN112686296B (en) | 2020-12-29 | 2020-12-29 | Octane loss value prediction method based on particle swarm optimization random forest parameters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112686296A true CN112686296A (en) | 2021-04-20 |
CN112686296B CN112686296B (en) | 2022-07-01 |
Family
ID=75454768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011587477.6A Active CN112686296B (en) | 2020-12-29 | 2020-12-29 | Octane loss value prediction method based on particle swarm optimization random forest parameters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112686296B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113254435A (en) * | 2021-07-15 | 2021-08-13 | 北京电信易通信息技术股份有限公司 | Data enhancement method and system |
CN113408187A (en) * | 2021-05-15 | 2021-09-17 | 西安石油大学 | Optimization method for reducing gasoline octane number loss based on random forest |
CN116306321A (en) * | 2023-05-18 | 2023-06-23 | 湖南工商大学 | Particle swarm-based adsorbed water treatment scheme optimization method, device and equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017098862A1 (en) * | 2015-12-08 | 2017-06-15 | 国立研究開発法人物質・材料研究機構 | Fuel oil discrimination sensor equipped with receptor layer composed of hydrocarbon-group-modified microparticles, and fuel oil discrimination method |
CN109668856A (en) * | 2017-10-17 | 2019-04-23 | 中国石油化工股份有限公司 | The method and apparatus for predicting hydrocarbon system's composition of LCO hydrogenating materials and product |
CN110059852A (en) * | 2019-03-11 | 2019-07-26 | 杭州电子科技大学 | A kind of stock yield prediction technique based on improvement random forests algorithm |
CN110766222A (en) * | 2019-10-22 | 2020-02-07 | 太原科技大学 | Particle swarm parameter optimization and random forest based PM2.5 concentration prediction method |
CN111797674A (en) * | 2020-04-10 | 2020-10-20 | 成都信息工程大学 | MI electroencephalogram signal identification method based on feature fusion and particle swarm optimization algorithm |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408187A (en) * | 2021-05-15 | 2021-09-17 | 西安石油大学 | Optimization method for reducing gasoline octane number loss based on random forest |
CN113254435A (en) * | 2021-07-15 | 2021-08-13 | 北京电信易通信息技术股份有限公司 | Data enhancement method and system |
CN113254435B (en) * | 2021-07-15 | 2021-10-29 | 北京电信易通信息技术股份有限公司 | Data enhancement method and system |
CN116306321A (en) * | 2023-05-18 | 2023-06-23 | 湖南工商大学 | Particle swarm-based adsorbed water treatment scheme optimization method, device and equipment |
CN116306321B (en) * | 2023-05-18 | 2023-08-18 | 湖南工商大学 | Particle swarm-based adsorbed water treatment scheme optimization method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112686296B (en) | 2022-07-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112686296B (en) | Octane loss value prediction method based on particle swarm optimization random forest parameters | |
CN110379463B (en) | Marine algae cause analysis and concentration prediction method and system based on machine learning | |
CN112489733B (en) | Octane number loss prediction method based on particle swarm algorithm and neural network | |
CN109034260B (en) | Desulfurization tower oxidation fan fault diagnosis system and method based on statistical principle and intelligent optimization | |
US11820947B2 (en) | Method of reducing octane loss in catalytic cracking of gasoline in S-zorb plant | |
Alvarez et al. | An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization | |
CN111144609A (en) | Boiler exhaust emission prediction model establishing method, prediction method and device | |
CN112835570A (en) | Machine learning-based visual mathematical modeling method and system | |
CN112435720A (en) | Prediction method based on self-attention mechanism and multi-drug characteristic combination | |
CN115188429A (en) | Catalytic cracking unit key index modeling method integrating time sequence feature extraction | |
CN105740960B (en) | A kind of optimization method of industry hydrocracking reaction condition | |
Chu et al. | Co-training based on semi-supervised ensemble classification approach for multi-label data stream | |
CN114239400A (en) | Multi-working-condition process self-adaptive soft measurement modeling method based on local double-weighted probability hidden variable regression model | |
CN113111588B (en) | NO of gas turbine X Emission concentration prediction method and device | |
CN112420132A (en) | Product quality optimization control method in gasoline catalytic cracking process | |
Guo et al. | Optimization Modeling and Empirical Research on Gasoline Octane Loss Based on Data Analysis | |
CN112342050B (en) | Method and device for optimizing light oil yield of catalytic cracking unit and storage medium | |
CN113408187A (en) | Optimization method for reducing gasoline octane number loss based on random forest | |
CN116449691A (en) | Raw oil processing control method and device | |
Divine et al. | Enhancing biomass Pyrolysis: Predictive insights from process simulation integrated with interpretable Machine learning models | |
CN110389948A (en) | A kind of tail oil prediction technique of the hydrocracking unit based on data-driven | |
Hamedi et al. | Integrating artificial immune genetic algorithm and metaheuristic ant colony optimizer with two-dose vaccination and modeling for residual fluid catalytic cracking process | |
Hasibuan et al. | Bootstrap aggregating of classification and regression trees in identification of single nucleotide polymorphisms | |
CN117434911B (en) | Equipment running state monitoring method and device and electronic equipment | |
CN115497573B (en) | Carbon-based biological and geological catalytic material property prediction and preparation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||