CN115099519B - Oil well yield prediction method based on multi-machine learning model fusion - Google Patents
- Publication number
- CN115099519B CN115099519B CN202210826531.0A CN202210826531A CN115099519B CN 115099519 B CN115099519 B CN 115099519B CN 202210826531 A CN202210826531 A CN 202210826531A CN 115099519 B CN115099519 B CN 115099519B
- Authority
- CN
- China
- Prior art keywords
- model
- production
- data
- training
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an oil well yield prediction method based on the fusion of multiple machine learning models, comprising the following steps. S1: collect production data of a target oil well and preprocess it to obtain a production dataset. S2: divide the production dataset into a first training dataset and a first test dataset. S3: construct and train a TCN-attention model, a CatBoost model and an ANFIS model. S4: obtain the first prediction outputs of the three models and divide them into a second training dataset and a second test dataset. S5: construct and train an RBF neural network. S6: obtain the second prediction output. S7: compare the second prediction output with the true values in the second test dataset and judge from the comparison whether the fusion model meets the prediction accuracy requirement: if not, retrain; if so, use the fusion model to predict the future yield of the target oil well. The invention predicts the daily oil production of an oil well more accurately and provides technical support for oilfield production.
Description
Technical Field
The invention relates to the technical field of productivity prediction for oilfield production wells, and in particular to an oil well yield prediction method based on multi-machine-learning-model fusion.
Background
Oil well yield prediction strongly influences the design of oilfield development schemes: by predicting yield, the working system of a well can be adjusted in time, making on-site deployment and workload allocation more scientific and reasonable and ensuring that planning targets are met. The most common prediction method at present is reservoir numerical simulation, which obtains relatively accurate yield predictions through geological modeling and history matching; its drawback is that, to guarantee accuracy, the modeling requires large amounts of geological data, petrophysical data and reservoir parameters, which makes the workload heavy and time-consuming.
Oilfields have now accumulated large volumes of production data with complex structure, so artificial-intelligence methods have attracted the attention of researchers in the oil and gas field. BP neural networks and traditional machine learning methods such as support vector machines (SVM) and random forests (RF) are widely used in oil and gas yield prediction. However, these methods ignore the time sequence of the well production data: they are point-to-point mappings that neglect the relations between earlier and later data. In statistics, linear models such as the autoregressive model (AR) and the autoregressive integrated moving average model (ARIMA) are mainly used for time-series data, but linear models struggle with nonlinear problems on large data volumes. Moreover, besides the characteristics of the time series itself, well yield is affected by production-dynamic factors such as formation pressure and production duration, so a single prediction model generally cannot meet the requirements of practical productivity prediction.
Disclosure of Invention
In view of these problems, the invention aims to provide an oil well yield prediction method based on multi-machine-learning-model fusion that predicts the daily oil production of a well more accurately.
The technical scheme of the invention is as follows:
an oil well yield prediction method based on multi-machine learning model fusion comprises the following steps:
S1: collecting production data of a target oil well, and preprocessing the production data to obtain a production data set;
S2: dividing the production dataset into a first training dataset and a first test dataset;
S3: constructing a TCN-attention model, a CatBoost model and an ANFIS model, and training each of the three models with the first training dataset to obtain the trained TCN-attention, CatBoost and ANFIS models;
S4: taking the first test dataset as the input of the trained TCN-attention, CatBoost and ANFIS models to obtain the first prediction outputs of the three models, and dividing these outputs into a second training dataset and a second test dataset;
S5: constructing an RBF neural network and training it with the second training dataset to obtain a trained RBF neural network;
S6: taking the second test dataset as the input of the trained RBF neural network to obtain the second prediction output of the RBF neural network;
S7: comparing the second prediction output with the true values in the second test dataset, and judging from the comparison whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model and the RBF neural network meets the prediction accuracy requirement:
if the prediction accuracy requirement is not met, repeating steps S2-S7 or steps S5-S7;
if the prediction accuracy requirement is met, predicting the future yield of the target oil well with the fusion model.
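The data flow of steps S2-S7 can be sketched in code. The sketch below is illustrative only: the three trained base models are replaced by hypothetical biased stand-in predictors, and the RBF meta-learner by a simple inverse-error weighted average, so that only the dataset splitting and fusion logic of the method is shown, not the invention's actual models.

```python
# Hedged sketch of the stacking flow S2-S7 with toy stand-ins for the
# TCN-attention / CatBoost / ANFIS base models and the RBF meta-learner.

def mape(y_true, y_pred):
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

# S1/S2: a toy "daily production" series, split 7:3 into train/test set 1
series = [50.0 + 0.5 * i for i in range(100)]
split1 = int(0.7 * len(series))
test1 = series[split1:]

# S3/S4: stand-in base models -> first prediction outputs on test set 1
base_preds = {
    "tcn_attention": [y * 1.02 for y in test1],  # slight over-prediction
    "catboost":      [y * 0.97 for y in test1],  # slight under-prediction
    "anfis":         [y + 1.5 for y in test1],   # constant bias
}

# S4: split the first prediction outputs 7:3 into train/test set 2
split2 = int(0.7 * len(test1))
truth2_train, truth2_test = test1[:split2], test1[split2:]

# S5: "train" the meta-learner; here, inverse-MAPE weights on train set 2
weights = {n: 1.0 / mape(truth2_train, p[:split2]) for n, p in base_preds.items()}
wsum = sum(weights.values())
weights = {n: w / wsum for n, w in weights.items()}

# S6: fused second prediction output on test set 2
fused = [sum(weights[n] * base_preds[n][split2 + i] for n in base_preds)
         for i in range(len(truth2_test))]

# S7: compare the fused output against the true values of test set 2
print(f"fusion MAPE: {mape(truth2_test, fused):.4f}")
```

The 7:3 split ratio used here matches the specific embodiment of the specification; in the actual method the meta-learner is the trained RBF network of step S5.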
Preferably, in step S1, the production data include daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure.
Preferably, in step S1, preprocessing the production data includes data removal, data completion and data normalization.
Preferably, in step S3, when training the TCN-attention model with the first training dataset, the input data of the TCN-attention model are constructed with a sliding window.
Preferably, in step S3, when training the CatBoost model and the ANFIS model with the first training dataset, the production data of the day before the prediction target date are used as their input data.
Preferably, in step S7, the mean absolute percentage error is used as the evaluation index when comparing the second prediction output with the true values in the second test dataset.
The beneficial effects of the invention are as follows:
The invention takes the production data of the target oil well (daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure) as the basic data for model training, so the influence of production-dynamic factors on well yield is considered and the trained model's predictions better match reality. In addition, combining the TCN-attention, CatBoost and ANFIS models through an RBF neural network yields a fusion model that accounts for the time sequence of well production, the nonlinearity of the time-series data, and the stability and complexity of the model; predicting the future yield of the well with this fusion model gives more accurate results.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for predicting oil well production based on multi-machine learning model fusion according to the present invention;
Fig. 2 is a schematic structural diagram of a TCN residual module;
FIG. 3 is a schematic structural diagram of a TCN-attention model;
FIG. 4 is a schematic structural diagram of an ANFIS model;
FIG. 5 is a schematic diagram of the structure of an RBF neural network;
FIG. 6 is a schematic diagram showing comparison of prediction results of different models according to one embodiment;
FIG. 7 is a graph showing a comparison of the predicted results of the fusion model of the present invention and the TCN-attention model alone in FIG. 6;
FIG. 8 is a graph showing the comparison result of the partial predicted target date in FIG. 7.
Detailed Description
The application is further described below with reference to the examples and figures. Where there is no conflict, the embodiments of the application and the technical features of the embodiments may be combined with each other. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the application belongs. The terms "comprising" or "including" cover the members or items listed after them and their equivalents without excluding other members or items.
As shown in fig. 1, the invention provides a method for predicting oil well yield based on multi-machine learning model fusion, which comprises the following steps:
S1: and collecting production data of the target oil producing well, and preprocessing the production data to obtain a production data set.
In a specific embodiment, the production data includes daily oil production, production time, daily water production, oil pressure, casing pressure, and back pressure. In this embodiment, the model is trained by incorporating the production date into the production dataset, enabling consideration of the chronology of the well production data; the oil pressure, the casing pressure and the back pressure are taken into the production data set to train the model, so that the influence of ecological dynamic factors on the oil well yield can be considered, and the predicted oil well yield result is more practical.
In a specific embodiment, preprocessing the production materials includes data removal, data complementation, and data normalization. The data removal is mainly to remove abnormal data in production data, the data complement can be completed by adopting methods such as average value or interpolation, and the like, and the data normalization is in the prior art, so that the influence of different characteristic data sizes can be eliminated.
In a specific embodiment, the normalization process is performed by adopting a dispersion normalization method, and a specific calculation formula is as follows:
wherein: x i' is the ith data after normalization; x i is the ith data in the original data; x max and X min are the maximum and minimum values in the raw data.
S2: the production dataset is divided into a training dataset one and a test dataset one.
In a specific embodiment, the data ratio of training data set one to test data set one is 7:3. It should be noted that the ratio may be changed to 8:2, 6:4, 9:1, etc. as required, and the specific division ratio may be adjusted according to the precision of the final fusion model.
S3: A TCN-attention model, a CatBoost model and an ANFIS model are constructed, and each of the three models is trained with the first training dataset to obtain the trained TCN-attention, CatBoost and ANFIS models.
The TCN-attention model is based on a temporal convolutional network (TCN) with an attention module introduced into its hidden layers. A TCN is a convolutional neural network specialized for time-series data; it consists of three parts: causal convolution, dilated convolution and residual connections. This network structure offers flexibly adjustable receptive fields, low memory occupation and parallel computation, and avoids the vanishing- or exploding-gradient problems of recurrent-network training. The essence of the attention mechanism, which imitates the attention mechanism of the human brain, is to screen features by changing weight values in the network. Neural network training usually involves large amounts of input feature data; an attention mechanism selects the features with significant influence on the output and increases their weights to improve the model's prediction accuracy.
In a specific embodiment, the construction of the TCN-attention model and the data processing flow specifically include the following sub-steps:
(1) Constructing an input dataset of a TCN-attention model
The TCN-attention model can extract time-sequence information from the oil production data and also accepts multi-feature input. In a specific embodiment the input dataset is built with a sliding window: to predict h oil-production values from the preceding t data points, the window length is set to t. The input of the first sample is X^(1) = [X_1, X_2, …, X_i, …, X_t]^T, where each X_i = [x_i^1, x_i^2, …, x_i^n] is a feature vector and n is the number of feature types (i.e. the 6 production variables such as oil pressure and casing pressure); the output is Y^(1) = [y_{t+1}, y_{t+2}, …, y_{t+h}]^T. The second sample has input X^(2) = [X_2, X_3, …, X_{t+1}]^T and output Y^(2) = [y_{t+2}, y_{t+3}, …, y_{t+h+1}]^T, and so on until all data are traversed.
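The sliding-window construction above can be sketched as follows; the window length t, horizon h and the toy feature values are illustrative:

```python
# Sliding-window sample construction: window length t, horizon h,
# n features per time step. Returns (input, target) pairs.
def sliding_windows(features, target, t, h):
    # features: list of per-day feature vectors; target: list of daily oil values
    samples = []
    for start in range(len(target) - t - h + 1):
        x = features[start:start + t]        # X^(k): t rows of n features
        y = target[start + t:start + t + h]  # Y^(k): the next h oil values
        samples.append((x, y))
    return samples

feats = [[float(i), float(i) * 2.0] for i in range(10)]  # n = 2 toy features
oil = [float(i) for i in range(10)]
samples = sliding_windows(feats, oil, t=3, h=2)
print(len(samples))  # 10 - 3 - 2 + 1 = 6 samples
```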
(2) Constructing the residual block of the TCN network
In a specific embodiment, the residual block is structured as shown in fig. 2: it consists of two parallel branches. One branch is the residual connection, a single one-dimensional convolution layer with weight normalization; the other consists of two one-dimensional convolution layers, each with a weight-normalization layer and a Dropout layer, with the activation function set to ReLU. The outputs of the two branches are finally added.
(3) Inputting data into the TCN network, where each layer's convolution output is passed on to the next layer
Assuming the input time series is X = [x_1, x_2, …, x_{t-1}, x_t] and each layer's convolution kernel is f = [f_1, f_2, …, f_{k-1}, f_k], the causal convolution at time t is:
F(x_t) = Σ_{j=1}^{k} f_j · x_{t-k+j}   (2)
(4) As the number of network layers increases, the kernel is applied with a dilation coefficient d, and the convolution becomes:
F(x_t) = Σ_{j=1}^{k} f_j · x_{t-(k-j)·d}   (3)
wherein the relation between d and the layer number r is:
d = 2^(r-1)   (4)
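A minimal sketch of one dilated causal convolution step under the standard TCN relation d = 2^(r-1); the kernel and series values are hypothetical, and out-of-range positions are zero-padded:

```python
# Dilated causal convolution at the last time step t:
# F(x_t) = sum_j f_j * x_{t-(k-j)*d}, with causal zero padding.
def dilated_causal_conv_at(x, f, d):
    k = len(f)
    t = len(x) - 1
    total = 0.0
    for j in range(1, k + 1):
        idx = t - (k - j) * d
        total += f[j - 1] * (x[idx] if idx >= 0 else 0.0)  # zero-pad the past
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
f = [0.5, 0.5]           # k = 2 averaging kernel
for r in (1, 2, 3):      # layers 1..3
    d = 2 ** (r - 1)     # dilation grows exponentially with depth
    print(r, d, dilated_causal_conv_at(x, f, d))
```

With this kernel, deeper layers average values that lie further apart, which is how the receptive field grows without extra parameters.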
(5) The data processed by the TCN network are output to the attention mechanism as the query matrix Q
In a specific embodiment, the attention mechanism adopts a self-attention structure:
O = softmax(Q·K^T / √L)·V   (5)
wherein: O is the output of the attention mechanism, L is the length of the input time series, and K and V are the key matrix and the value matrix, respectively.
In a specific embodiment, the key matrix and the value matrix are obtained by calculation from an original sequence, and the calculation formulas are respectively as follows:
K = I·W_k + b_k   (6)
V = I·W_V   (7)
wherein: I is the original input sequence, and W_k, b_k and W_V are parameters to be trained.
(6) After the output of the attention mechanism is obtained, the final output is computed through a fully connected layer
In the above embodiment, a flowchart of the TCN-attention model created by the above substeps is shown in fig. 3.
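The self-attention computation described above — scores from Q·K^T, scaling, row-wise softmax, then weighting of V — can be sketched in pure Python. The matrices below are small hypothetical examples, and the scaling by √L follows the formula in the text:

```python
# Self-attention sketch: O = softmax(Q K^T / sqrt(L)) V, with Q, K of shape
# L x d and V of shape L x 1.
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]  # shift for numerical stability
        s = sum(e)
        out.append([v / s for v in e])
    return out

def self_attention(q, k, v):
    L = len(q)
    scores = matmul(q, [list(col) for col in zip(*k)])   # Q K^T, L x L
    scaled = [[s / math.sqrt(L) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)               # weights . V

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v = [[1.0], [2.0], [3.0]]
o = self_attention(q, k, v)
print([round(row[0], 3) for row in o])  # one attended value per time step
```

Because each softmax row is a convex combination, every attended value lies between the smallest and largest entry of V.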
The CatBoost model is a gradient-boosting algorithm based on the gradient boosting decision tree (GBDT) framework; it replaces the gradient-estimation method of the original GBDT with ordered boosting, so that category features can be handled more efficiently. In the CatBoost structure, several base learners are integrated serially, and in each training round the sample weights are updated to reduce the prediction bias caused by noisy points. Compared with other gradient-boosting ensemble algorithms, CatBoost handles discrete feature data automatically and has clear advantages in regression problems with multi-feature input. The invention selects the CatBoost model for oil well yield prediction to exploit its capability on the multi-feature regression problem and effectively improve the prediction performance of the combined model.
In a specific embodiment, the construction of the CatBoost model and the data processing flow specifically include the following substeps:
(1) Constructing the input dataset of the CatBoost model
The CatBoost model works well on regression problems with multi-feature input. In a specific embodiment, the feature data of the day before the prediction target date are selected when constructing the input dataset, so that CatBoost learns the relations between short-term feature data and the predicted values.
In a specific embodiment, category features are processed with target statistics: after the sample data are input, their order is randomly shuffled to generate a new permutation σ = [σ_1, σ_2, …, σ_n]; the k-th feature value of sample σ_p is then replaced by
x̂_{σ_p,k} = ( Σ_{j=1}^{p-1} 1[x_{σ_j,k} = x_{σ_p,k}]·y_{σ_j} + β·P ) / ( Σ_{j=1}^{p-1} 1[x_{σ_j,k} = x_{σ_p,k}] + β )   (8)
wherein: P and β are the prior value and the prior weight introduced against the noise of low-frequency category data; the value of P is the mean of the outputs in the sample, and 1[·] is the indicator function.
(2) Tree structures are built with different split modes and the values of the leaf nodes are determined; each tree structure is then evaluated and scored to obtain the optimal tree model.
An adaptive network-based fuzzy inference system (ANFIS) is a prediction model with high stability and low complexity. ANFIS controls its parameters through an optimized fuzzy controller and performs the optimization with a BP neural network, so it inherits the advantages of both methods while overcoming their respective shortcomings. Compared with other machine learning algorithms, ANFIS needs no hyperparameter tuning and achieves high prediction accuracy with fast deployment.
In a specific embodiment, the construction of the ANFIS model and the data processing flow specifically include the following sub-steps:
(1) Constructing an input dataset of an ANFIS model
In a specific embodiment, the method of constructing the input dataset of the ANFIS model is consistent with the method of constructing the input dataset of the CatBoost model described above, again learning the relationship between short-term characteristic data and yield.
(2) The input data received by the model are fuzzified by the membership functions of the first layer of the model; for example, with a Gaussian membership function:
μ_i(x) = exp( -(x - c_i)² / (2·a_i²) )   (9)
wherein: a_i and c_i are the premise (condition) parameters.
(3) The rule layer of the model is its second layer; each node multiplies the data it receives, and the output is the firing strength (fitness) of the rule.
(4) The normalization layer of the model is its third layer; its main purpose is to normalize the rule firing strengths of the fuzzy inference system:
w̄_i = w_i / Σ_j w_j   (10)
wherein: w̄_i is the normalized value output by each node; w_i is the firing strength received by the node.
(5) The fourth layer of the model computes the result of each rule; its number of nodes equals that of the previous layer, which ensures that every datum takes part in the adaptive learning of the fuzzy inference:
O_i = w̄_i·(p_i·x + q_i·y + r_i)   (11)
wherein: O_i is the output of the fourth layer; p_i, q_i and r_i are model (consequent) parameters.
(6) The output layer, the last layer of the model, sums the outputs of all fourth-layer nodes to give the final prediction.
In the above embodiment, the model of the ANFIS model is shown in fig. 4.
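A hedged sketch of a two-input, two-rule Sugeno-type ANFIS forward pass through the layers described above (Gaussian membership, product rule firing, firing-strength normalization, linear consequents, summation); all rule parameters below are hypothetical:

```python
# ANFIS forward-pass sketch: two inputs, two rules, Sugeno consequents.
import math

def gauss(x, c, a):
    return math.exp(-((x - c) ** 2) / (2.0 * a ** 2))  # layer 1: membership

def anfis_forward(x1, x2, rules):
    # rules: list of (c1, a1, c2, a2, p, q, r) tuples (premise + consequent)
    w = [gauss(x1, c1, a1) * gauss(x2, c2, a2)          # layer 2: rule firing
         for c1, a1, c2, a2, _, _, _ in rules]
    wsum = sum(w)
    wbar = [wi / wsum for wi in w]                      # layer 3: normalization
    outs = [wb * (p * x1 + q * x2 + r)                  # layer 4: rule outputs
            for wb, (_, _, _, _, p, q, r) in zip(wbar, rules)]
    return sum(outs)                                    # layer 5: summation

rules = [
    (0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0),  # hypothetical parameters
    (1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0),
]
print(round(anfis_forward(0.5, 0.5, rules), 4))
```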
S4: The first test dataset is taken as the input of the trained TCN-attention, CatBoost and ANFIS models to obtain the first prediction outputs of the three models, and these outputs are divided into a second training dataset and a second test dataset.
In a specific embodiment, the first prediction outputs are likewise divided in the ratio 7:3.
S5: An RBF neural network is constructed and trained with the second training dataset to obtain a trained RBF neural network.
The RBF neural network is the radial basis function neural network (RBFNN). It comprises an input layer, a hidden layer and an output layer and trains efficiently. Its basic idea is to use radial basis functions as the "basis" of the hidden units to construct the hidden-layer space: the network transforms the input data into a high-dimensional space in which the data become linearly separable.
The invention uses the RBF neural network to fuse the prediction outputs of the TCN-attention, CatBoost and ANFIS models. The excellent fitting capability of the RBF neural network fuses the model results and outputs a more accurate prediction; at the same time, its simple structure does not add model complexity, so the fused model is more stable.
In a specific embodiment, as shown in fig. 5, the RBF neural network is configured as follows: the first layer is the input layer, with as many nodes as input values per sample; the second layer is the hidden layer, composed of radial-basis neurons that form a basis-function space mapping a linearly inseparable low-dimensional problem into a high-dimensional space where it becomes linearly separable; the third layer is the output layer with one node, which outputs the fused prediction of the RBFNN model.
In a specific embodiment, the selected radial basis function is the Gaussian kernel:
φ(x) = exp( -‖x - c‖² / (2·τ²) )   (12)
wherein: c is the center point of the class; x is the input data; τ is the decay rate of the Gaussian kernel.
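A minimal sketch of an RBF forward pass with this Gaussian kernel: hidden units compute φ_j(x) and the output layer linearly combines them. The centers and output weights are hypothetical, with the three base-model predictions imagined as the input vector:

```python
# RBF network forward-pass sketch: Gaussian hidden units, linear output.
import math

def rbf_forward(x, centers, weights, tau):
    # phi_j(x) = exp(-||x - c_j||^2 / (2 tau^2)) for each hidden center c_j
    phis = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * tau ** 2))
            for c in centers]
    return sum(w * p for w, p in zip(weights, phis))  # output-layer combination

# Three base-model predictions as input, two hypothetical hidden centers
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
weights = [0.3, 0.7]
x = [1.0, 1.0, 1.0]
print(round(rbf_forward(x, centers, weights, tau=1.0), 4))
```

An input lying exactly on a center activates that unit fully (φ = 1), while distant centers decay toward zero at a rate set by τ.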
S6: The second test dataset is taken as the input of the trained RBF neural network to obtain the second prediction output of the RBF neural network.
S7: The second prediction output is compared with the true values in the second test dataset, and whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model and the RBF neural network meets the prediction accuracy requirement is judged from the comparison:
if the prediction accuracy requirement is not met, steps S2-S7 or steps S5-S7 are repeated;
if the prediction accuracy requirement is met, the fusion model is used to predict the future yield of the target oil well.
In a specific embodiment, the mean absolute percentage error (MAPE) is used as the evaluation index when comparing the second prediction output with the true values in the second test dataset:
MAPE = (100%/m)·Σ_{i=1}^{m} |y_i - y_i'| / y_i   (13)
wherein: m is the number of samples; y_i is the true value of the well yield; y_i' is the value predicted by the well-yield model.
In this embodiment, a MAPE error threshold is set according to the prediction accuracy requirement; when the MAPE of the trained fusion model is less than or equal to the threshold, that model is taken as the final fusion model.
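The MAPE acceptance check of step S7 can be sketched as follows; the true/predicted values and the 5% threshold are hypothetical:

```python
# MAPE acceptance check: the fusion model is accepted only when its mean
# absolute percentage error falls at or below a preset threshold.
def mape_percent(y_true, y_pred):
    m = len(y_true)
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / m

y_true = [100.0, 110.0, 105.0, 95.0]
y_pred = [98.0, 113.0, 104.0, 97.0]
err = mape_percent(y_true, y_pred)
threshold = 5.0  # hypothetical accuracy requirement, in percent
print(f"MAPE = {err:.2f}%, accepted: {err <= threshold}")
```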
In a specific embodiment, the oil well yield prediction method based on multi-machine-learning-model fusion is used to predict the yield of an oil well in the Tarim area, and a classical RNN model is applied to the same well for comparison to verify the accuracy of the invention. In this embodiment, yield prediction with the invention specifically includes the following steps:
(1) Collecting production data of the target oil well and preprocessing them to obtain a production dataset; part of the production data of the well is shown in Table 1.
Table 1: partial production data of the target well
(2) And training according to the steps S2-S7 to obtain a fusion model meeting the prediction precision requirement, wherein the optimal super-parameter combination of each model is obtained by adopting a grid search method in the training process.
(3) And predicting the future oil production of the target oil production well by using the fusion model.
The prediction results of the invention and of the classical RNN, TCN-attention, CatBoost and ANFIS models used alone are shown in fig. 6. As can be seen from fig. 6, the prediction accuracy of each single model is inferior to that of the fusion model of the invention. The mean absolute percentage error of each model is shown in Table 2:
table 2 mean absolute percentage error results for each model
Evaluation index | Fusion model | RNN | TCN-attention | CatBoost | ANFIS
---|---|---|---|---|---
MAPE (%) | 4.29 | 7.04 | 5.38 | 9.64 | 30.04
As can be seen from Table 2, the fusion model of the present invention has the smallest error and the highest prediction accuracy; its error is 20.34% lower than that of the TCN-attention model alone. The prediction results of the present invention and of the TCN-attention model alone are shown in Figs. 7 and 8, from which it can also be seen that the predictions of the present invention lie closer to the actual production of the well.
In conclusion, using an RBF neural network to fuse the TCN-attention model, the CatBoost model, and the ANFIS model improves the accuracy of predicting future oil well production; compared with the prior art, the present invention represents a marked advance.
The present invention is not limited to the above-described embodiments; any modifications, equivalent substitutions, and improvements made without departing from the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
1. An oil well production prediction method based on multi-machine-learning-model fusion, characterized by comprising the following steps:
S1: collecting production data of a target oil well, and preprocessing the production data to obtain a production data set;
S2: dividing the production data set into training data set one and test data set one;
S3: constructing a TCN-attention model, a CatBoost model, and an ANFIS model respectively, and training the three models with training data set one to obtain a trained TCN-attention model, CatBoost model, and ANFIS model;
S4: taking test data set one as the input of the trained TCN-attention, CatBoost, and ANFIS models to obtain prediction output result one of the three models, and dividing prediction output result one into training data set two and test data set two;
S5: constructing an RBF neural network, and training it with training data set two to obtain a trained RBF neural network;
S6: taking test data set two as the input of the trained RBF neural network to obtain prediction output result two;
S7: comparing prediction output result two with the true values in test data set two, and judging from the comparison whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model, and the RBF neural network meets the prediction accuracy requirement:
if the prediction accuracy requirement is not met, repeating steps S2-S7 or steps S5-S7;
if the prediction accuracy requirement is met, predicting the future production of the target oil well with the fusion model.
2. The method for predicting oil well production based on multi-machine learning model fusion according to claim 1, wherein in step S1, the production data includes daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure.
3. The method for predicting oil well production based on multi-machine learning model fusion of claim 1, wherein in step S1, preprocessing the production data comprises data removal, data completion and data normalization.
4. The method for predicting oil well production based on multi-machine learning model fusion according to claim 1, wherein in step S3, when training the TCN-attention model by using the training dataset, the input data of the TCN-attention model is constructed by using a sliding window.
5. The method according to claim 1, wherein in step S3, when training the CatBoost model and the ANFIS model respectively using the training dataset, the production data of the day before the predicted target date is used as the input data of the CatBoost model and the ANFIS model.
6. The method for predicting oil well production based on multi-machine learning model fusion according to any one of claims 1 to 5, wherein in step S7, when comparing the predicted output result two with the true value in the test data set two, an average absolute percentage error is used as an evaluation index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210826531.0A CN115099519B (en) | 2022-07-13 | 2022-07-13 | Oil well yield prediction method based on multi-machine learning model fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115099519A CN115099519A (en) | 2022-09-23 |
CN115099519B true CN115099519B (en) | 2024-05-24 |
Family
ID=83297126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210826531.0A Active CN115099519B (en) | 2022-07-13 | 2022-07-13 | Oil well yield prediction method based on multi-machine learning model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115099519B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116050547A (en) * | 2023-01-12 | 2023-05-02 | 哈尔滨工业大学 | Landing gear performance prediction method based on self-attention integrated learning |
CN116225102B (en) * | 2023-05-06 | 2023-08-01 | 南方电网调峰调频发电有限公司信息通信分公司 | Mobile energy storage communication temperature rise automatic monitoring system and device |
CN116611556A (en) * | 2023-05-17 | 2023-08-18 | 西南石油大学 | Compact gas well single well yield prediction method based on hybrid neural network |
CN116451877B (en) * | 2023-06-16 | 2023-09-01 | 中国石油大学(华东) | Pipe network open-cut production prediction method based on computable semantic network |
CN116861800B (en) * | 2023-09-04 | 2023-11-21 | 青岛理工大学 | Oil well yield increasing measure optimization and effect prediction method based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096007A (en) * | 2015-08-27 | 2015-11-25 | 中国石油天然气股份有限公司 | Oil well yield prediction method and device based on improved neural network |
CN110400006A (en) * | 2019-07-02 | 2019-11-01 | 中国石油化工股份有限公司 | Oil well output prediction technique based on deep learning algorithm |
CN110969249A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Production well yield prediction model establishing method, production well yield prediction method and related device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096007A (en) * | 2015-08-27 | 2015-11-25 | 中国石油天然气股份有限公司 | Oil well yield prediction method and device based on improved neural network |
CN110969249A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Production well yield prediction model establishing method, production well yield prediction method and related device |
CN110400006A (en) * | 2019-07-02 | 2019-11-01 | 中国石油化工股份有限公司 | Oil well output prediction technique based on deep learning algorithm |
Non-Patent Citations (1)
Title |
---|
Application of a BP neural network compensation algorithm to coalbed methane well production prediction; Li Ping; Ji Yong; Xiong Jie; Liu Hao; Feng Wuxiang; Wang Dong; China Coalbed Methane; 2016-10-15 (No. 05); 41-45 *
Also Published As
Publication number | Publication date |
---|---|
CN115099519A (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115099519B (en) | Oil well yield prediction method based on multi-machine learning model fusion | |
Dong et al. | Electrical load forecasting: A deep learning approach based on K-nearest neighbors | |
CN111563706A (en) | Multivariable logistics freight volume prediction method based on LSTM network | |
Sun et al. | Prediction of stock index futures prices based on fuzzy sets and multivariate fuzzy time series | |
CN111027772B (en) | Multi-factor short-term load prediction method based on PCA-DBILSTM | |
Hassan et al. | A hybrid of multiobjective Evolutionary Algorithm and HMM-Fuzzy model for time series prediction | |
CN106600059A (en) | Intelligent power grid short-term load predication method based on improved RBF neural network | |
CN110826774B (en) | Bus load prediction method and device, computer equipment and storage medium | |
CN106529732A (en) | Carbon emission efficiency prediction method based on neural network and random frontier analysis | |
CN111738477A (en) | Deep feature combination-based power grid new energy consumption capability prediction method | |
CN115510963A (en) | Incremental equipment fault diagnosis method | |
CN115018193A (en) | Time series wind energy data prediction method based on LSTM-GA model | |
Akpinar et al. | Forecasting natural gas consumption with hybrid neural networks—Artificial bee colony | |
CN114118567A (en) | Power service bandwidth prediction method based on dual-channel fusion network | |
CN112801416A (en) | LSTM watershed runoff prediction method based on multi-dimensional hydrological information | |
CN113095484A (en) | Stock price prediction method based on LSTM neural network | |
CN116503118A (en) | Waste household appliance value evaluation system based on classification selection reinforcement prediction model | |
CN110110447B (en) | Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine | |
CN114817571A (en) | Method, medium, and apparatus for predicting achievement quoted amount based on dynamic knowledge graph | |
CN109697531A (en) | A kind of logistics park-hinterland Forecast of Logistics Demand method | |
CN113537556A (en) | Household short-term load prediction method based on state frequency memory network | |
CN117194918A (en) | Air temperature prediction method and system based on self-attention echo state network | |
CN112651499A (en) | Structural model pruning method based on ant colony optimization algorithm and interlayer information | |
CN114638421A (en) | Method for predicting requirement of generator set spare parts | |
CN103198357A (en) | Optimized and improved fuzzy classification model construction method based on nondominated sorting genetic algorithm II (NSGA- II) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||