CN118627418A - Method and device for predicting water inflow of underground water seal oil depot based on ensemble learning - Google Patents
- Publication number
- CN118627418A (application CN202410715070.9A)
- Authority
- CN
- China
- Prior art keywords
- learner
- learning model
- integrated learning
- stacking integrated
- subset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Embodiments of the invention relate to the field of underground water seal oil depots and provide an ensemble learning-based method and device for predicting the water inflow of an underground water seal oil depot. The method comprises: acquiring a target data set and extracting initial features from it; screening the initial features to obtain an optimal feature subset; constructing base learners, optimizing their hyperparameters with a Bayesian parameter optimization model, and training the optimized base learners to obtain optimized, trained base learners; and constructing a Stacking ensemble learning model, training it, and predicting the water inflow of the underground water seal oil depot with the trained model. In this way, the water inflow can be predicted while extracting as much useful information as possible from limited training data, which mitigates overfitting to some extent.
Description
Technical Field
The present invention relates generally to the field of underground water seal oil depots, and more particularly to an ensemble learning-based method and device for predicting the water inflow of an underground water seal oil depot.
Background
Underground water seal cavern depots are safe, environmentally friendly and low in cost, and are currently a widely adopted form of petroleum storage in China. Cavern water inflow is an important index for evaluating the overall technical and economic performance and the engineering quality and safety of such a depot, and predicting it has long been a major difficulty in depot construction and operation.
The water inflow prediction methods recommended by current standards and used in engineering practice, such as empirical formula analysis, numerical simulation analysis and hydrogeological analogy, are all solved on the basis of an equivalent continuous medium. Their predictions deviate considerably from the values measured after construction and therefore cannot meet engineering requirements. Groundwater conditions strongly influence cavern water inflow, and hydrological monitoring data from the depot area truly reflect changes in the groundwater level, which in turn affect the water inflow. The present method therefore collects real field data, preprocesses the data set to obtain an optimal data set, and then predicts the water inflow of the underground water seal oil depot accurately using ensemble learning.
Disclosure of Invention
According to embodiments of the invention, an ensemble learning-based scheme for predicting the water inflow of an underground water seal oil depot is provided. The scheme extracts as much useful information as possible from limited training data, effectively avoids becoming trapped in local minima, and mitigates overfitting to some extent.
In a first aspect of the invention, an ensemble learning-based method for predicting the water inflow of an underground water seal oil depot is provided. The method comprises the following steps:
acquiring a target data set, and extracting initial features from the target data set; the initial features include independent variable features and dependent variable features;
screening the initial features by using correlation analysis to obtain an optimal feature subset; dividing the optimal feature subset into a training set and a test set;
constructing base learners, optimizing the hyperparameters of the base learners by using a Bayesian parameter optimization model, and training the optimized base learners by the K-Fold method to obtain optimized, trained base learners; the base learners comprise a random forest model, an extreme gradient boosting tree and a light gradient boosting machine;
and constructing a Stacking ensemble learning model, training the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predicting the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
Further, screening the initial features by using correlation analysis to obtain an optimal feature subset includes:
calculating a Pearson correlation coefficient matrix for the independent variable features in the initial features to obtain a plurality of independent variable correlation coefficients;
and retaining the independent variable features whose independent variable correlation coefficient is greater than a preset importance threshold to generate the optimal feature subset.
Further, the Pearson correlation coefficient of each independent variable feature in the initial features is calculated as:
$$R = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $R$ is the independent variable correlation coefficient; $x_i$ is the i-th independent variable feature; $y_i$ is the water inflow of the underground water seal oil depot corresponding to the i-th independent variable feature; $\bar{x}$ and $\bar{y}$ are the means of $x_i$ and $y_i$; and $n$ is the dimension of the independent variable feature.
Further, the method further comprises preprocessing the optimal feature subset and using the preprocessed optimal feature subset as the training set and the test set.
Further, preprocessing the optimal feature subset includes:
filling missing data in the optimal feature subset by using a random forest regression algorithm to obtain a first subset;
drawing a box plot to detect and remove noise from the first subset, obtaining a second subset;
and normalizing the second subset to obtain the preprocessed optimal feature subset.
Further, the Stacking ensemble learning model comprises a primary learner and a secondary learner; the primary learner consists of the optimized, trained base learners; the secondary learner is a meta-learner; the input of the primary learner serves as the input of the Stacking ensemble learning model, the output of the primary learner serves as the input of the secondary learner, and the output of the secondary learner serves as the output of the Stacking ensemble learning model.
Further, the method further comprises: evaluating the generalization capability of the trained Stacking ensemble learning model, so that the water inflow of the underground water seal oil depot is predicted by a Stacking ensemble learning model that meets the evaluation requirement.
Further, the evaluation requirement includes:
the coefficient of determination of the Stacking ensemble learning model is greater than a coefficient of determination threshold; and
the mean absolute percentage error of the Stacking ensemble learning model is less than a mean absolute percentage error threshold; and
the root mean square error of the Stacking ensemble learning model is less than a root mean square error threshold.
In a second aspect of the invention, an ensemble learning-based device for predicting the water inflow of an underground water seal oil depot is provided. The device comprises:
an acquisition module, configured to acquire a target data set and extract initial features from the target data set; the initial features include independent variable features and dependent variable features;
a screening module, configured to screen the initial features by using correlation analysis to obtain an optimal feature subset, and to divide the optimal feature subset into a training set and a test set;
a first construction module, configured to construct base learners, optimize the hyperparameters of the base learners by using a Bayesian parameter optimization model, and train the optimized base learners by the K-Fold method to obtain optimized, trained base learners; the base learners comprise a random forest model, an extreme gradient boosting tree and a light gradient boosting machine;
a second construction module, configured to construct a Stacking ensemble learning model, train the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predict the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
In a third aspect of the invention, an electronic device is provided. The electronic device comprises at least one processor and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect of the invention.
In a fourth aspect of the invention, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect of the invention.
It should be understood that the description in this summary is not intended to identify key or essential features of the embodiments of the invention, nor is it intended to limit the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
The above and other features, advantages and aspects of embodiments of the present invention will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals denote like or similar elements, in which:
FIG. 1 shows a flow chart of an ensemble learning-based method for predicting water inflow of an underground water seal oil depot according to an embodiment of the invention;
FIG. 2 shows a block diagram of an ensemble learning-based device for predicting water inflow of an underground water seal oil depot according to an embodiment of the invention;
FIG. 3 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the invention;
wherein 300 is an electronic device, 301 is a computing unit, 302 is a ROM, 303 is a RAM, 304 is a bus, 305 is an I/O interface, 306 is an input unit, 307 is an output unit, 308 is a storage unit, 309 is a communication unit.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In addition, the term "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Embodiments of the invention adopt an ensemble learning method, which is simple and efficient and particularly suited to complex problems in which the theoretical model, geological parameters and the like are difficult to determine. Predicting the water inflow of an underground water seal oil depot with ensemble learning yields a prediction accuracy that meets the current requirements of domestic underground water seal oil depot engineering and helps solve the existing difficulty of predicting this water inflow.
Fig. 1 shows a flowchart of an ensemble learning-based method for predicting water inflow of an underground water seal oil depot according to an embodiment of the invention.
The method comprises the following steps:
S101, acquiring a target data set, and extracting initial features from the target data set; the initial features include independent variable features and dependent variable features.
In this embodiment, the target data set is a data set of measured records from the operating period of a petroleum cavern depot.
An example is the measured records from the operating period of the second-phase project of a national petroleum storage cavern depot in Huizhou. The initial features include 7 independent variable features and 1 dependent variable feature. The independent variable features are: groundwater level, on-site rainfall, topographic environment of the depot area, water curtain replenishment volume, oil-gas pressure above the cavern (operating pressure), oil storage state and cavern volume. The dependent variable feature is the water inflow of the underground water seal oil depot.
S102, screening the initial features by utilizing correlation analysis to obtain an optimal feature subset; the optimal feature subset is divided into a training set and a testing set.
In this embodiment, screening the initial features by using correlation analysis to obtain an optimal feature subset includes:
first, calculating a Pearson correlation coefficient matrix for the independent variable features in the initial features to obtain a plurality of independent variable correlation coefficients;
second, retaining the independent variable features whose independent variable correlation coefficient is greater than a preset importance threshold to generate the optimal feature subset.
Specifically, the Pearson correlation coefficient of each independent variable feature in the initial features is calculated as:
$$R = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $R$ is the independent variable correlation coefficient; $x_i$ is the i-th independent variable feature; $y_i$ is the water inflow of the underground water seal oil depot corresponding to the i-th independent variable feature; $\bar{x}$ and $\bar{y}$ are the means of $x_i$ and $y_i$; and $n$ is the dimension of the independent variable feature, so that if there are 7 independent variable features, n = 7.
In this embodiment, the importance threshold is preset, for example to 0.3. Each independent variable correlation coefficient R is compared with the threshold of 0.3: independent variable features with a coefficient greater than 0.3 are retained and those with a coefficient less than 0.3 are removed. Keeping the features with higher correlation coefficients and discarding those with lower ones yields the optimal feature subset, which is used for the subsequent model training.
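The screening step can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names, the toy feature names and values, and the use of the absolute value of R (so that strongly negative correlations are also retained) are assumptions; the standard Pearson formula is assumed for R.

```python
import math

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def screen_features(features, target, threshold=0.3):
    """Return the names of features whose |R| with the target exceeds the threshold."""
    scores = {name: pearson_r(col, target) for name, col in features.items()}
    # Assumption: compare |R|, not R, so strong negative correlations are kept too.
    kept = [name for name, r in scores.items() if abs(r) > threshold]
    return kept, scores

# Hypothetical toy data: two candidate features and the measured water inflow.
features = {
    "groundwater_level": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rainfall": [2.0, 1.0, 2.0, 1.0, 2.0],
}
inflow = [10.0, 20.0, 30.0, 40.0, 50.0]
kept, scores = screen_features(features, inflow, threshold=0.3)
print(kept)
```

With these toy values the first feature is perfectly correlated with the inflow and is kept, while the second is uncorrelated and is dropped.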
In this embodiment, the optimal feature subset is divided into a training set and a test set, for example, 80% of the optimal feature subset is used as the training set and 20% is used as the test set. The test set is used for testing after model training.
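An 80/20 division of this kind might look like the sketch below. The function name, the fixed seed and the decision to shuffle before splitting are assumptions; the text does not say whether the records are shuffled or split chronologically.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Split rows into (train, test), preserving the original order within each part."""
    indices = list(range(len(rows)))
    # Assumption: rows are shuffled with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

rows = list(range(10))  # stand-in for 10 measured records
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

For monitoring records with a strong time ordering, a chronological split (last 20% as the test set) may be the more cautious choice.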
Screening the initial features using correlation analysis can improve model performance, i.e., by selecting the most critical features, the model can more accurately capture patterns of data, improving predictive performance.
Screening the initial features by correlation analysis can reduce overfitting, namely, eliminating unimportant features can reduce overfitting of the model to training data and improve generalization capability of the model.
By training and reasoning with fewer features, computational efficiency is significantly improved, shortening the time for model training and prediction.
In one embodiment of the present invention, after the optimal feature subset is obtained, the method further includes preprocessing the optimal feature subset and using the preprocessed optimal feature subset as the training set and the test set.
Specifically, preprocessing the optimal feature subset includes:
filling missing data in the optimal feature subset by using a random forest regression algorithm to obtain a first subset;
drawing a box plot to detect and remove noise from the first subset, obtaining a second subset;
and normalizing the second subset to obtain the preprocessed optimal feature subset.
Specifically, the normalization is:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_i'$ is the normalized sample value; $x_i$ is the original sample value; $x_{\min}$ is the minimum of the samples; and $x_{\max}$ is the maximum of the samples.
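A min-max normalization consistent with the symbols x min and x max can be sketched as follows. The function name and the handling of a constant column (mapped to zeros) are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum maps to 1."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column carries no scale information, so map it to 0.
        return [0.0 for _ in values]
    span = x_max - x_min
    return [(x - x_min) / span for x in values]

print(min_max_normalize([10.0, 20.0, 30.0]))
```

In practice x min and x max would be taken from the training set only and reused for the test set, so that no information from the test records leaks into training.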
The random forest regression algorithm fills missing values by constructing a plurality of decision trees, so that the filled data retain randomness and uncertainty and can reflect the true distribution of the unknown data.
Because each branch node considers a random subset of the features rather than all of them when a decision tree is constructed, the approach applies well to filling high-dimensional data; the random forest algorithm also has good classification accuracy, which further ensures the accuracy and reliability of the filled values.
Detecting and removing noise from the first subset by drawing a box plot characterizes the dispersion of the data accurately and robustly without requiring the data to follow any specific distribution: the box plot is drawn from the real data, needs no prior assumption about the form of the distribution and places no restrictions on the data; it is simply a faithful visual representation of the shape of the data. Moreover, outliers do not affect how the quartiles are determined: the box plot's criterion for identifying outliers is based on the quartiles and the interquartile range, and quartiles are resistant, in that up to 25% of the data can lie arbitrarily far away without greatly disturbing them. Outliers therefore cannot distort the criterion, and the box plot's outlier identification is objective.
Normalization eliminates differences in dimension (units), avoids imbalance among feature weights and speeds up model convergence.
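The quartile-based criterion behind the box plot can be expressed without drawing anything. The sketch below assumes the conventional whisker factor of 1.5 times the interquartile range and the "inclusive" quartile method of Python's statistics module; neither choice is stated in the text, and the function names and sample values are hypothetical.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences used by a box plot to flag outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: the conventional factor k = 1.5 defines the whiskers.
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the box-plot fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

readings = [10, 12, 11, 13, 12, 11, 100]  # hypothetical column with one spike
print(remove_outliers(readings))
```

The single extreme reading barely moves the quartiles, which is the resistance property described above, so it falls outside the fences and is removed.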
S103, constructing base learners, optimizing the hyperparameters of the base learners by using a Bayesian parameter optimization model, and training the optimized base learners by the K-Fold method to obtain optimized, trained base learners; the base learners include a random forest model (RF), an extreme gradient boosting tree (XGB), and a light gradient boosting machine (LGB).
In this embodiment, the hyperparameters of the base learners are optimized with a Bayesian parameter optimization model: a parameter space is defined and the specified hyperparameters are searched within it to find the optimal parameter combination.
In this embodiment, training the optimized base learners by the K-Fold method includes:
based on K-fold cross-validation, dividing the preprocessed optimal feature subset evenly into K subsets A = {A_1, A_2, A_3, …, A_K}. Over K rounds, each subset in A is taken in turn as the prediction set B_C and the remaining subsets as the training set B_X; the training set B_X is input into the first-layer base learner models for training, and the prediction results for each subset in A are obtained. Because only one subset is selected as the prediction set in each round, the remaining K-1 subsets form the training set. The procedure is repeated K times so that each subset serves as the prediction set exactly once.
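The fold bookkeeping described above can be sketched as an index generator. The function name and the use of contiguous, unshuffled folds are assumptions; a library routine such as scikit-learn's KFold would serve the same purpose.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, prediction_indices) for each of the K rounds."""
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = list(range(start, start + size))  # prediction set for this round
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, held_out
        start += size

for train_idx, pred_idx in k_fold_indices(10, k=5):
    print(pred_idx, "held out;", len(train_idx), "samples used for training")
```

Each sample index appears in exactly one prediction set, so every record receives one out-of-fold prediction from a model that did not see it during training.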
S104, constructing a Stacking ensemble learning model, training the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predicting the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
In this embodiment, the Stacking ensemble learning model includes a primary learner and a secondary learner; the primary learner consists of the optimized, trained base learners; the secondary learner is a meta-learner; the input of the primary learner serves as the input of the Stacking ensemble learning model, the output of the primary learner serves as the input of the secondary learner, and the output of the secondary learner serves as the output of the Stacking ensemble learning model.
In this embodiment, to avoid overfitting, Linear Regression (LR), which has a simple structure and strong generalization ability, is selected as the meta-learner.
In some optional implementations of this embodiment, to verify whether the data set suits the model and yields high prediction accuracy, the generalization capability of the trained Stacking ensemble learning model is evaluated, so that the water inflow of the underground water seal oil depot is predicted by a Stacking ensemble learning model that meets the evaluation requirement.
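The two-level data flow can be illustrated with a deliberately small, dependency-free sketch. It does not use RF, XGB or LGB: two toy base learners (a 1-nearest-neighbour regressor and a one-variable least-squares line) stand in for them, and the meta-learner is a linear regression without intercept solved in closed form. All function names and the toy data are assumptions.

```python
def fit_nearest(xs, ys):
    """Toy base learner (stand-in): 1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_linear(xs, ys):
    """Toy base learner (stand-in): least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def out_of_fold_predictions(fitters, xs, ys, k=5):
    """Primary level: predict every sample with models that never saw it."""
    n = len(xs)
    oof = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        for j, fit in enumerate(fitters):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held:
                oof[i][j] = model(xs[i])
    return oof

def fit_meta_linear(rows, ys):
    """Secondary level: least-squares weights (w1, w2) for two base predictions."""
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * y for r, y in zip(rows, ys))
    t2 = sum(r[1] * y for r, y in zip(rows, ys))
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det

def fit_stacking(xs, ys, k=5):
    """Train base learners out-of-fold, fit the meta-learner, then refit bases on all data."""
    fitters = [fit_nearest, fit_linear]
    weights = fit_meta_linear(out_of_fold_predictions(fitters, xs, ys, k), ys)
    base_models = [fit(xs, ys) for fit in fitters]
    return (lambda x: sum(w * m(x) for w, m in zip(weights, base_models))), weights

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]  # hypothetical, exactly linear toy target
predict, weights = fit_stacking(xs, ys)
print(weights, predict(4.5))
```

Because the toy target is exactly linear, the meta-learner puts essentially all weight on the linear base learner, which shows how the secondary level learns which primary outputs to trust.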
Specifically, the evaluation requirement is that the following 3 conditions are satisfied simultaneously:
Condition 1: the coefficient of determination of the Stacking ensemble learning model is greater than a coefficient of determination threshold;
Condition 2: the Mean Absolute Percentage Error (MAPE) of the Stacking ensemble learning model is less than a mean absolute percentage error threshold;
Condition 3: the Root Mean Square Error (RMSE) of the Stacking ensemble learning model is less than a root mean square error threshold.
A threshold is set for each parameter used to evaluate the generalization capability of the Stacking ensemble learning model:
The coefficient of determination represents how accurately the model fits the data; for example, a model with a coefficient of determination greater than 0.5 is used. The coefficient of determination may be expressed as $R^2$, the square of the independent variable correlation coefficient R.
The Mean Absolute Percentage Error (MAPE) represents the average of the absolute percentage errors between predicted and true values; for example, a model with a mean absolute percentage error less than 5 is used.
The Root Mean Square Error (RMSE) reflects the degree of deviation of the model's predicted values from the measured values; here a model with a root mean square error less than 100 is used.
In this embodiment, the larger the coefficient of determination, the better the model performance; the smaller the Mean Absolute Percentage Error (MAPE), the better the model performance; the smaller the Root Mean Square Error (RMSE), the better the model performance.
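The three conditions can be checked together as in the sketch below. Several points are assumptions: the coefficient of determination is computed with the standard definition 1 - SS_res/SS_tot, MAPE is expressed in percent (so the example threshold of 5 is read as 5%), the true values are nonzero, and the function names and sample values are hypothetical.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination, standard definition 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape_percent(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def meets_requirements(y_true, y_pred, r2_min=0.5, mape_max=5.0, rmse_max=100.0):
    """True only if all three evaluation conditions hold at the same time."""
    return (
        r_squared(y_true, y_pred) > r2_min
        and mape_percent(y_true, y_pred) < mape_max
        and rmse(y_true, y_pred) < rmse_max
    )

measured = [100.0, 200.0, 300.0, 400.0]   # hypothetical measured inflow
predicted = [102.0, 198.0, 303.0, 396.0]  # hypothetical model output
print(meets_requirements(measured, predicted))
```

Note that the RMSE threshold depends on the unit of the water inflow, so a value such as 100 is only meaningful relative to the scale of the measured data.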
According to embodiments of the invention, prediction is performed with a machine-learning-based Stacking ensemble learning regression algorithm, and the optimal feature subset is selected through feature correlation analysis. A Bayesian optimization algorithm is used to optimize the hyperparameters of the constructed base models; combined with five-fold cross-validation, the algorithm extracts as much useful information as possible from limited training data, effectively avoids becoming trapped in local minima, and mitigates overfitting to some extent.
Through feature importance analysis of the XGB model, the importance of each feature is calculated from its degree of influence on the model output, and the most important factors affecting the model output are ranked in order, providing a reference for field staff.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
The foregoing describes the method embodiments; the scheme of the invention is further described below through device embodiments that share the same inventive concept as the method embodiments.
As shown in fig. 2, the device 200 includes:
an acquisition module 210, configured to acquire a target data set and extract initial features from the target data set; the initial features include independent variable features and dependent variable features;
a screening module 220, configured to screen the initial features by using correlation analysis to obtain an optimal feature subset, and to divide the optimal feature subset into a training set and a test set;
a first construction module 230, configured to construct base learners, optimize the hyperparameters of the base learners by using a Bayesian parameter optimization model, and train the optimized base learners by the K-Fold method to obtain optimized, trained base learners; the base learners comprise a random forest model, an extreme gradient boosting tree and a light gradient boosting machine;
a second construction module 240, configured to construct a Stacking ensemble learning model, train the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predict the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the described modules may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the technical scheme of the invention, the acquisition, storage, application and other handling of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.
According to an embodiment of the invention, the invention further provides an electronic device and a readable storage medium.
Fig. 3 shows a schematic block diagram of an electronic device 300 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
The electronic device 300 comprises a computing unit 301 that may perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic device 300 may also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Various components in the electronic device 300 are connected to the I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, etc.; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, an optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the electronic device 300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the respective methods and processes described above, for example, the methods S101 to S104. For example, in some embodiments, methods S101-S104 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 300 via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the methods S101 to S104 described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the methods S101-S104 by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; the present invention is not limited in this respect.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (11)
1. A method for predicting the water inflow of an underground water seal oil depot based on ensemble learning, characterized by comprising the following steps:
acquiring a target data set, and extracting initial features from the target data set; the initial features include independent variable features and dependent variable features;
screening the initial features by correlation analysis to obtain an optimal feature subset; dividing the optimal feature subset into a training set and a test set;
constructing a base learner, optimizing the hyperparameters of the base learner with a Bayesian parameter optimization model, and training the optimized base learner with a K-Fold method to obtain an optimized and trained base learner; the base learner comprises a random forest model, an extreme gradient boosting tree (XGBoost), and a light gradient boosting machine (LightGBM);
and constructing a Stacking ensemble learning model, training the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predicting the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
2. The method of claim 1, wherein screening the initial features by correlation analysis to obtain an optimal feature subset comprises:
calculating a Pearson correlation coefficient matrix for each independent variable feature among the initial features to obtain a plurality of independent variable correlation coefficients;
and selecting the independent variable features whose independent variable correlation coefficient is greater than a preset importance threshold to generate the optimal feature subset.
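The screening step of claim 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names, the sample values, and the threshold of 0.5 are hypothetical, and comparing the absolute value of the coefficient with the threshold is an assumption (the claim only says "greater than").

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def screen_features(features, target, threshold):
    """Keep the features whose |R| with the target exceeds the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return {
        name: values
        for name, values in features.items()
        if abs(pearson(values, target)) > threshold
    }


# Hypothetical data: three candidate features and the measured water inflow.
water_inflow = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "groundwater_level": [1.0, 2.0, 3.0, 4.0, 5.0],  # R = 1
    "sensor_noise": [3.0, 1.0, 4.0, 1.0, 5.0],       # weakly correlated
    "cavern_depth": [5.0, 4.0, 3.0, 2.0, 1.0],       # R = -1
}
optimal_subset = screen_features(candidates, water_inflow, threshold=0.5)
print(list(optimal_subset))
```

With the hypothetical values above, the weakly correlated column is dropped and the two strongly correlated columns form the feature subset.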
3. The method of claim 2, wherein calculating the Pearson correlation coefficient matrix for each independent variable feature among the initial features comprises:
wherein R is the independent variable correlation coefficient; x_i is the i-th independent variable feature; y_i is the water inflow of the underground water seal oil depot corresponding to the i-th independent variable feature; and n is the dimension of the independent variable feature.
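The formula that this claim refers to was an image and is absent from the extracted text. Assuming it is the standard Pearson correlation coefficient, which matches the symbols R, x_i, y_i, and n defined in the claim, it would read as follows. The means \bar{x} and \bar{y} are notation introduced here and do not appear in the extracted text.

```latex
R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```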
4. The method according to claim 1 or 2, further comprising preprocessing the optimal feature subset and using the preprocessed optimal feature subset as the training set and the test set.
5. The method of claim 4, wherein preprocessing the optimal feature subset comprises:
filling missing data in the optimal feature subset with a random forest regression algorithm to obtain a first subset;
drawing a box plot, and detecting and removing noise from the first subset to obtain a second subset;
and normalizing the second subset to obtain the preprocessed optimal feature subset.
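The last two preprocessing steps of claim 5 can be sketched with the Python standard library. The sketch rests on assumptions that the claim does not state: noise is detected with the usual box-plot rule (values beyond 1.5 times the interquartile range from the quartiles), normalization is min-max scaling to [0, 1], and the sample rows are hypothetical. The random forest imputation step is omitted here.

```python
from statistics import quantiles


def remove_boxplot_outliers(rows, col, whisker=1.5):
    """Drop rows whose value in column `col` lies outside the box-plot whiskers."""
    values = [row[col] for row in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [row for row in rows if low <= row[col] <= high]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


# Hypothetical "first subset": the last row holds an obvious spike in column 0.
first_subset = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 11.0],
    [4.0, 13.0],
    [100.0, 12.0],
]
second_subset = remove_boxplot_outliers(first_subset, col=0)
preprocessed = min_max_normalize(second_subset)
print(len(second_subset), preprocessed[0], preprocessed[-1])
```

In practice the outlier filter would be applied to each column of interest in turn; the example filters a single column to keep the sketch short.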
6. The method of claim 1, wherein the Stacking ensemble learning model comprises a primary learner and a secondary learner; the primary learner is the optimized and trained base learner; the secondary learner is a meta learner; the input of the Stacking ensemble learning model is the input of the primary learner, the output of the primary learner is the input of the secondary learner, and the output of the secondary learner is the output of the Stacking ensemble learning model.
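The two-level data flow of claim 6 can be illustrated with a small, dependency-free Python sketch. Everything in it is an assumption made for illustration: the primary learners are toy single-feature linear fits standing in for the random forest, XGBoost, and LightGBM models named in claim 1, the secondary learner is a two-weight least-squares combiner, the folds are interleaved, and the data are synthetic. It only shows how K-Fold out-of-fold predictions of the primary learners become the input of the secondary learner.

```python
from statistics import mean


class FeatureLine:
    """Toy primary learner: least-squares line fitted on one feature column."""

    def __init__(self, col):
        self.col = col
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, X, y):
        xs = [row[self.col] for row in X]
        mx, my = mean(xs), mean(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.col] for row in X]


def make_learners():
    """Fresh, unfitted primary learners (one per feature column)."""
    return [FeatureLine(0), FeatureLine(1)]


def fit_stacking(X, y, k=3):
    n = len(X)
    oof = [[0.0, 0.0] for _ in range(n)]  # out-of-fold predictions
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        val = [i for i in range(n) if i % k == fold]
        for j, learner in enumerate(make_learners()):
            learner.fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(val, learner.predict([X[i] for i in val])):
                oof[i][j] = p
    # Secondary (meta) learner: two least-squares weights, no intercept,
    # solved from the 2x2 normal equations on the out-of-fold predictions.
    a = sum(r[0] * r[0] for r in oof)
    b = sum(r[0] * r[1] for r in oof)
    d = sum(r[1] * r[1] for r in oof)
    e = sum(r[0] * t for r, t in zip(oof, y))
    f = sum(r[1] * t for r, t in zip(oof, y))
    det = a * d - b * b
    weights = ((d * e - b * f) / det, (a * f - b * e) / det)
    # Refit the primary learners on all training data for prediction time.
    learners = [learner.fit(X, y) for learner in make_learners()]
    return learners, weights


def predict_stacking(learners, weights, X):
    columns = [learner.predict(X) for learner in learners]
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(X))]


# Synthetic data: the target depends only on the first feature (y = 2*x0 + 1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 5.0], [4.0, 3.0],
     [5.0, 1.0], [6.0, 4.0], [7.0, 0.0], [8.0, 2.0]]
y = [2.0 * row[0] + 1.0 for row in X]
learners, weights = fit_stacking(X, y)
prediction = predict_stacking(learners, weights, [[10.0, 3.0]])[0]
print(weights, prediction)
```

Because the first toy learner reproduces the synthetic target exactly, the meta learner puts essentially all of its weight on it. With real base learners the weights would instead reflect how much each model contributes on held-out folds.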
7. The method of claim 1, further comprising: evaluating the generalization capability of the trained Stacking ensemble learning model, so that the water inflow of the underground water seal oil depot is predicted by a Stacking ensemble learning model that meets the evaluation requirement.
8. The method of claim 7, wherein the evaluation requirement comprises:
the coefficient of determination of the Stacking ensemble learning model is greater than a coefficient-of-determination threshold; and
the mean absolute percentage error of the Stacking ensemble learning model is less than a mean absolute percentage error threshold; and
the root mean square error of the Stacking ensemble learning model is less than a root mean square error threshold.
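The three-part evaluation requirement of claim 8 can be checked with a short Python sketch. The threshold values and the sample predictions below are hypothetical, and expressing the mean absolute percentage error in percent is an assumption; the claims do not give concrete numbers.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zero targets."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def meets_requirement(y_true, y_pred, r2_min, mape_max, rmse_max):
    """All three conditions of the evaluation requirement must hold."""
    return (r2_score(y_true, y_pred) > r2_min
            and mape(y_true, y_pred) < mape_max
            and rmse(y_true, y_pred) < rmse_max)


# Hypothetical test-set inflows and model predictions.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
accepted = meets_requirement(measured, predicted,
                             r2_min=0.9, mape_max=10.0, rmse_max=15.0)
print(r2_score(measured, predicted), mape(measured, predicted),
      rmse(measured, predicted), accepted)
```

A model is accepted only when all three conditions hold, which mirrors the "and" that joins the three limbs of the claim.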
9. A device for predicting the water inflow of an underground water seal oil depot based on ensemble learning, characterized by comprising:
an acquisition module, configured to acquire a target data set and extract initial features from the target data set; the initial features include independent variable features and dependent variable features;
a screening module, configured to screen the initial features by correlation analysis to obtain an optimal feature subset, and to divide the optimal feature subset into a training set and a test set;
a first building module, configured to construct a base learner, optimize the hyperparameters of the base learner with a Bayesian parameter optimization model, and train the optimized base learner with a K-Fold method to obtain an optimized and trained base learner; the base learner comprises a random forest model, an extreme gradient boosting tree (XGBoost), and a light gradient boosting machine (LightGBM);
and a second building module, configured to build a Stacking ensemble learning model, train the Stacking ensemble learning model with the training set to obtain a trained Stacking ensemble learning model, and predict the water inflow of the underground water seal oil depot with the trained Stacking ensemble learning model.
10. An electronic device, comprising at least one processor; and
a memory communicatively coupled to the at least one processor; characterized in that
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410715070.9A CN118627418A (en) | 2024-06-04 | 2024-06-04 | Method and device for predicting water inflow of underground water seal oil depot based on ensemble learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410715070.9A CN118627418A (en) | 2024-06-04 | 2024-06-04 | Method and device for predicting water inflow of underground water seal oil depot based on ensemble learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118627418A (en) | 2024-09-10 |
Family
ID=92604809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410715070.9A Pending CN118627418A (en) | 2024-06-04 | 2024-06-04 | Method and device for predicting water inflow of underground water seal oil depot based on ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118627418A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||