CN115099519B - Oil well yield prediction method based on multi-machine learning model fusion - Google Patents
- Publication number
- CN115099519B CN115099519B CN202210826531.0A CN202210826531A CN115099519B CN 115099519 B CN115099519 B CN 115099519B CN 202210826531 A CN202210826531 A CN 202210826531A CN 115099519 B CN115099519 B CN 115099519B
- Authority
- CN
- China
- Prior art keywords
- model
- production
- data
- training
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an oil well yield prediction method based on the fusion of multiple machine learning models, comprising the following steps. S1: collect production data of a target oil well and preprocess it to obtain a production dataset. S2: divide the production dataset into a first training dataset and a first test dataset. S3: construct and train a TCN-attention model, a CatBoost model and an ANFIS model. S4: obtain the first prediction outputs of the three models and divide them into a second training dataset and a second test dataset. S5: construct and train an RBF neural network. S6: obtain the second prediction output. S7: compare the second prediction output with the true values in the second test dataset and judge from the comparison whether the fusion model meets the prediction accuracy requirement: if not, retrain; if so, use the fusion model to predict the future yield of the target oil well. The invention predicts the daily oil production of an oil well more accurately and provides technical support for oilfield production.
Description
Technical Field
The invention relates to the technical field of productivity prediction for oilfield production wells, and in particular to an oil well yield prediction method based on multi-machine-learning-model fusion.
Background
Oil well yield prediction strongly influences the design of oilfield development schemes: by predicting yield, the working system of a well can be adjusted in time, making on-site deployment and workload allocation more scientific and reasonable and ensuring that planning targets are met. The most common prediction method at present is reservoir numerical simulation, which obtains relatively accurate yield predictions through geological modeling and history matching; its drawback is that, to guarantee accuracy, the modeling requires large amounts of geological data, petrophysical data and reservoir parameters, which makes the workload heavy and time-consuming.
Oilfields have now accumulated large volumes of production data with complex structure, so artificial-intelligence methods have attracted the attention of researchers in the oil and gas field. BP neural networks and traditional machine learning methods such as support vector machines (SVM) and random forests (RF) are widely used in oil and gas yield prediction. However, these methods ignore the time sequence of the well production data: they are point-to-point mappings that neglect the relations between earlier and later data. In statistics, linear models such as the autoregressive model (AR) and the autoregressive integrated moving average model (ARIMA) are mainly used for time-series data, but linear models struggle with nonlinear problems on large data volumes. Moreover, besides the characteristics of the time series itself, well yield is affected by production-dynamic factors such as formation pressure and production duration, so a single prediction model generally cannot meet the requirements of practical productivity prediction.
Disclosure of Invention
In view of these problems, the invention aims to provide an oil well yield prediction method based on multi-machine-learning-model fusion that predicts the daily oil production of a well more accurately.
The technical scheme of the invention is as follows:
an oil well yield prediction method based on multi-machine learning model fusion comprises the following steps:
S1: collecting production data of a target oil well, and preprocessing the production data to obtain a production data set;
S2: dividing the production dataset into a first training dataset and a first test dataset;
S3: constructing a TCN-attention model, a CatBoost model and an ANFIS model, and training each of the three models with the first training dataset to obtain the trained TCN-attention, CatBoost and ANFIS models;
S4: taking the first test dataset as the input of the trained TCN-attention, CatBoost and ANFIS models to obtain the first prediction outputs of the three models, and dividing these outputs into a second training dataset and a second test dataset;
S5: constructing an RBF neural network and training it with the second training dataset to obtain a trained RBF neural network;
S6: taking the second test dataset as the input of the trained RBF neural network to obtain the second prediction output of the RBF neural network;
S7: comparing the second prediction output with the true values in the second test dataset, and judging from the comparison whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model and the RBF neural network meets the prediction accuracy requirement:
if the prediction accuracy requirement is not met, repeating steps S2-S7 or steps S5-S7;
if the prediction accuracy requirement is met, predicting the future yield of the target oil well with the fusion model.
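The data flow of steps S2-S7 can be sketched in code. The sketch below is illustrative only: the three trained base models are replaced by hypothetical biased stand-in predictors, and the RBF meta-learner by a simple inverse-error weighted average, so that only the dataset splitting and fusion logic of the method is shown, not the invention's actual models.

```python
# Hedged sketch of the stacking flow S2-S7 with toy stand-ins for the
# TCN-attention / CatBoost / ANFIS base models and the RBF meta-learner.

def mape(y_true, y_pred):
    return sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)

# S1/S2: a toy "daily production" series, split 7:3 into train/test set 1
series = [50.0 + 0.5 * i for i in range(100)]
split1 = int(0.7 * len(series))
test1 = series[split1:]

# S3/S4: stand-in base models -> first prediction outputs on test set 1
base_preds = {
    "tcn_attention": [y * 1.02 for y in test1],  # slight over-prediction
    "catboost":      [y * 0.97 for y in test1],  # slight under-prediction
    "anfis":         [y + 1.5 for y in test1],   # constant bias
}

# S4: split the first prediction outputs 7:3 into train/test set 2
split2 = int(0.7 * len(test1))
truth2_train, truth2_test = test1[:split2], test1[split2:]

# S5: "train" the meta-learner; here, inverse-MAPE weights on train set 2
weights = {n: 1.0 / mape(truth2_train, p[:split2]) for n, p in base_preds.items()}
wsum = sum(weights.values())
weights = {n: w / wsum for n, w in weights.items()}

# S6: fused second prediction output on test set 2
fused = [sum(weights[n] * base_preds[n][split2 + i] for n in base_preds)
         for i in range(len(truth2_test))]

# S7: compare the fused output against the true values of test set 2
print(f"fusion MAPE: {mape(truth2_test, fused):.4f}")
```

The 7:3 split ratio used here matches the specific embodiment of the specification; in the actual method the meta-learner is the trained RBF network of step S5.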
Preferably, in step S1, the production data include daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure.
Preferably, in step S1, preprocessing the production data includes data removal, data completion and data normalization.
Preferably, in step S3, when training the TCN-attention model with the first training dataset, the input data of the TCN-attention model are constructed with a sliding window.
Preferably, in step S3, when training the CatBoost model and the ANFIS model with the first training dataset, the production data of the day before the prediction target date are used as their input data.
Preferably, in step S7, the mean absolute percentage error is used as the evaluation index when comparing the second prediction output with the true values in the second test dataset.
The beneficial effects of the invention are as follows:
The invention takes the production data of the target oil well (daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure) as the basic data for model training, so the influence of production-dynamic factors on well yield is considered and the trained model's predictions better match reality. In addition, combining the TCN-attention, CatBoost and ANFIS models through an RBF neural network yields a fusion model that accounts for the time sequence of well production, the nonlinearity of the time-series data, and the stability and complexity of the model; predicting the future yield of the well with this fusion model gives more accurate results.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for predicting oil well production based on multi-machine learning model fusion according to the present invention;
Fig. 2 is a schematic structural diagram of a TCN residual module;
FIG. 3 is a schematic structural diagram of a TCN-attention model;
FIG. 4 is a schematic structural diagram of an ANFIS model;
FIG. 5 is a schematic diagram of the structure of an RBF neural network;
FIG. 6 is a schematic diagram showing comparison of prediction results of different models according to one embodiment;
FIG. 7 is a graph showing a comparison of the predicted results of the fusion model of the present invention and the TCN-attention model alone in FIG. 6;
FIG. 8 is a graph showing the comparison result of the partial predicted target date in FIG. 7.
Detailed Description
The application is further described below with reference to the examples and figures. Where there is no conflict, the embodiments of the application and the technical features of the embodiments may be combined with each other. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the application belongs. The terms "comprising" or "including" cover the members or items listed after them and their equivalents without excluding other members or items.
As shown in fig. 1, the invention provides a method for predicting oil well yield based on multi-machine learning model fusion, which comprises the following steps:
S1: and collecting production data of the target oil producing well, and preprocessing the production data to obtain a production data set.
In a specific embodiment, the production data includes daily oil production, production time, daily water production, oil pressure, casing pressure, and back pressure. In this embodiment, the model is trained by incorporating the production date into the production dataset, enabling consideration of the chronology of the well production data; the oil pressure, the casing pressure and the back pressure are taken into the production data set to train the model, so that the influence of ecological dynamic factors on the oil well yield can be considered, and the predicted oil well yield result is more practical.
In a specific embodiment, preprocessing the production materials includes data removal, data complementation, and data normalization. The data removal is mainly to remove abnormal data in production data, the data complement can be completed by adopting methods such as average value or interpolation, and the like, and the data normalization is in the prior art, so that the influence of different characteristic data sizes can be eliminated.
In a specific embodiment, the normalization process is performed by adopting a dispersion normalization method, and a specific calculation formula is as follows:
wherein: x i' is the ith data after normalization; x i is the ith data in the original data; x max and X min are the maximum and minimum values in the raw data.
S2: the production dataset is divided into a training dataset one and a test dataset one.
In a specific embodiment, the data ratio of training data set one to test data set one is 7:3. It should be noted that the ratio may be changed to 8:2, 6:4, 9:1, etc. as required, and the specific division ratio may be adjusted according to the precision of the final fusion model.
S3: A TCN-attention model, a CatBoost model and an ANFIS model are constructed, and each of the three models is trained with the first training dataset to obtain the trained TCN-attention, CatBoost and ANFIS models.
The TCN-attention model is based on a temporal convolutional network (TCN) with an attention module introduced into its hidden layers. A TCN is a convolutional neural network specialized for time-series data; it consists of three parts: causal convolution, dilated convolution and residual connections. This network structure offers flexibly adjustable receptive fields, low memory occupation and parallel computation, and avoids the vanishing- or exploding-gradient problems of recurrent-network training. The essence of the attention mechanism, which imitates the attention mechanism of the human brain, is to screen features by changing weight values in the network. Neural network training usually involves large amounts of input feature data; an attention mechanism selects the features with significant influence on the output and increases their weights to improve the model's prediction accuracy.
In a specific embodiment, the construction of the TCN-attention model and the data processing flow specifically include the following sub-steps:
(1) Constructing an input dataset of a TCN-attention model
The TCN-attention model can extract time-sequence information from the oil production data and also accepts multi-feature input. In a specific embodiment the input dataset is built with a sliding window: to predict h oil-production values from the preceding t data points, the window length is set to t. The input of the first sample is X^(1) = [X_1, X_2, …, X_i, …, X_t]^T, where each X_i = [x_i^1, x_i^2, …, x_i^n] is a feature vector and n is the number of feature types (i.e. the 6 production variables such as oil pressure and casing pressure); the output is Y^(1) = [y_{t+1}, y_{t+2}, …, y_{t+h}]^T. The second sample has input X^(2) = [X_2, X_3, …, X_{t+1}]^T and output Y^(2) = [y_{t+2}, y_{t+3}, …, y_{t+h+1}]^T, and so on until all data are traversed.
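The sliding-window construction above can be sketched as follows; the window length t, horizon h and the toy feature values are illustrative:

```python
# Sliding-window sample construction: window length t, horizon h,
# n features per time step. Returns (input, target) pairs.
def sliding_windows(features, target, t, h):
    # features: list of per-day feature vectors; target: list of daily oil values
    samples = []
    for start in range(len(target) - t - h + 1):
        x = features[start:start + t]        # X^(k): t rows of n features
        y = target[start + t:start + t + h]  # Y^(k): the next h oil values
        samples.append((x, y))
    return samples

feats = [[float(i), float(i) * 2.0] for i in range(10)]  # n = 2 toy features
oil = [float(i) for i in range(10)]
samples = sliding_windows(feats, oil, t=3, h=2)
print(len(samples))  # 10 - 3 - 2 + 1 = 6 samples
```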
(2) Constructing the residual block of the TCN network
In a specific embodiment, the residual block is structured as shown in fig. 2: it consists of two parallel branches. One branch is the residual connection, a single one-dimensional convolution layer with weight normalization; the other consists of two one-dimensional convolution layers, each with a weight-normalization layer and a Dropout layer, with the activation function set to ReLU. The outputs of the two branches are finally added.
(3) Inputting data into the TCN network, where each layer's convolution output is passed on to the next layer
Assuming the input time series is X = [x_1, x_2, …, x_{t-1}, x_t] and each layer's convolution kernel is f = [f_1, f_2, …, f_{k-1}, f_k], the causal convolution at time t is:
F(x_t) = Σ_{j=1}^{k} f_j · x_{t-k+j}   (2)
(4) As the number of network layers increases, the kernel is applied with a dilation coefficient d, and the convolution becomes:
F(x_t) = Σ_{j=1}^{k} f_j · x_{t-(k-j)·d}   (3)
wherein the relation between d and the layer number r is:
d = 2^(r-1)   (4)
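A minimal sketch of one dilated causal convolution step under the standard TCN relation d = 2^(r-1); the kernel and series values are hypothetical, and out-of-range positions are zero-padded:

```python
# Dilated causal convolution at the last time step t:
# F(x_t) = sum_j f_j * x_{t-(k-j)*d}, with causal zero padding.
def dilated_causal_conv_at(x, f, d):
    k = len(f)
    t = len(x) - 1
    total = 0.0
    for j in range(1, k + 1):
        idx = t - (k - j) * d
        total += f[j - 1] * (x[idx] if idx >= 0 else 0.0)  # zero-pad the past
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
f = [0.5, 0.5]           # k = 2 averaging kernel
for r in (1, 2, 3):      # layers 1..3
    d = 2 ** (r - 1)     # dilation grows exponentially with depth
    print(r, d, dilated_causal_conv_at(x, f, d))
```

With this kernel, deeper layers average values that lie further apart, which is how the receptive field grows without extra parameters.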
(5) The data processed by the TCN network are output to the attention mechanism as the query matrix Q
In a specific embodiment, the attention mechanism adopts a self-attention structure:
O = softmax(Q·K^T / √L)·V   (5)
wherein: O is the output of the attention mechanism, L is the length of the input time series, and K and V are the key matrix and the value matrix, respectively.
In a specific embodiment, the key matrix and the value matrix are obtained by calculation from an original sequence, and the calculation formulas are respectively as follows:
K = I·W_k + b_k   (6)
V = I·W_V   (7)
wherein: I is the original input sequence, and W_k, b_k and W_V are parameters to be trained.
(6) After the output of the attention mechanism is obtained, the final output is computed through a fully connected layer
In the above embodiment, a flowchart of the TCN-attention model created by the above substeps is shown in fig. 3.
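The self-attention computation described above — scores from Q·K^T, scaling, row-wise softmax, then weighting of V — can be sketched in pure Python. The matrices below are small hypothetical examples, and the scaling by √L follows the formula in the text:

```python
# Self-attention sketch: O = softmax(Q K^T / sqrt(L)) V, with Q, K of shape
# L x d and V of shape L x 1.
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]  # shift for numerical stability
        s = sum(e)
        out.append([v / s for v in e])
    return out

def self_attention(q, k, v):
    L = len(q)
    scores = matmul(q, [list(col) for col in zip(*k)])   # Q K^T, L x L
    scaled = [[s / math.sqrt(L) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)               # weights . V

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v = [[1.0], [2.0], [3.0]]
o = self_attention(q, k, v)
print([round(row[0], 3) for row in o])  # one attended value per time step
```

Because each softmax row is a convex combination, every attended value lies between the smallest and largest entry of V.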
The CatBoost model is a gradient-boosting algorithm based on the gradient boosting decision tree (GBDT) framework; it replaces the gradient-estimation method of the original GBDT with ordered boosting, so that category features can be handled more efficiently. In the CatBoost structure, several base learners are integrated serially, and in each training round the sample weights are updated to reduce the prediction bias caused by noisy points. Compared with other gradient-boosting ensemble algorithms, CatBoost handles discrete feature data automatically and has clear advantages in regression problems with multi-feature input. The invention selects the CatBoost model for oil well yield prediction to exploit its capability on the multi-feature regression problem and effectively improve the prediction performance of the combined model.
In a specific embodiment, the construction of the CatBoost model and the data processing flow specifically include the following substeps:
(1) Constructing the input dataset of the CatBoost model
The CatBoost model works well on regression problems with multi-feature input. In a specific embodiment, the feature data of the day before the prediction target date are selected when constructing the input dataset, so that CatBoost learns the relations between short-term feature data and the predicted values.
In a specific embodiment, category features are processed with target statistics: after the sample data are input, their order is randomly shuffled to generate a new permutation σ = [σ_1, σ_2, …, σ_n]; the k-th feature value of sample σ_p is then replaced by
x̂_{σ_p,k} = ( Σ_{j=1}^{p-1} 1[x_{σ_j,k} = x_{σ_p,k}]·y_{σ_j} + β·P ) / ( Σ_{j=1}^{p-1} 1[x_{σ_j,k} = x_{σ_p,k}] + β )   (8)
wherein: P and β are the prior value and the prior weight introduced against the noise of low-frequency category data; the value of P is the mean of the outputs in the sample, and 1[·] is the indicator function.
(2) Tree structures are built with different split modes and the values of the leaf nodes are determined; each tree structure is then evaluated and scored to obtain the optimal tree model.
An adaptive network-based fuzzy inference system (ANFIS) is a prediction model with high stability and low complexity. ANFIS controls its parameters through an optimized fuzzy controller and performs the optimization with a BP neural network, so it inherits the advantages of both methods while overcoming their respective shortcomings. Compared with other machine learning algorithms, ANFIS needs no hyperparameter tuning and achieves high prediction accuracy with fast deployment.
In a specific embodiment, the construction of the ANFIS model and the data processing flow specifically include the following sub-steps:
(1) Constructing an input dataset of an ANFIS model
In a specific embodiment, the method of constructing the input dataset of the ANFIS model is consistent with the method of constructing the input dataset of the CatBoost model described above, again learning the relationship between short-term characteristic data and yield.
(2) The input data received by the model are fuzzified by the membership functions of the first layer of the model; for example, with a Gaussian membership function:
μ_i(x) = exp( -(x - c_i)² / (2·a_i²) )   (9)
wherein: a_i and c_i are the premise (condition) parameters.
(3) The rule layer of the model is its second layer; each node multiplies the data it receives, and the output is the firing strength (fitness) of the rule.
(4) The normalization layer of the model is its third layer; its main purpose is to normalize the rule firing strengths of the fuzzy inference system:
w̄_i = w_i / Σ_j w_j   (10)
wherein: w̄_i is the normalized value output by each node; w_i is the firing strength received by the node.
(5) The fourth layer of the model computes the result of each rule; its number of nodes equals that of the previous layer, which ensures that every datum takes part in the adaptive learning of the fuzzy inference:
O_i = w̄_i·(p_i·x + q_i·y + r_i)   (11)
wherein: O_i is the output of the fourth layer; p_i, q_i and r_i are model (consequent) parameters.
(6) The output layer, the last layer of the model, sums the outputs of all fourth-layer nodes to give the final prediction.
In the above embodiment, the model of the ANFIS model is shown in fig. 4.
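A hedged sketch of a two-input, two-rule Sugeno-type ANFIS forward pass through the layers described above (Gaussian membership, product rule firing, firing-strength normalization, linear consequents, summation); all rule parameters below are hypothetical:

```python
# ANFIS forward-pass sketch: two inputs, two rules, Sugeno consequents.
import math

def gauss(x, c, a):
    return math.exp(-((x - c) ** 2) / (2.0 * a ** 2))  # layer 1: membership

def anfis_forward(x1, x2, rules):
    # rules: list of (c1, a1, c2, a2, p, q, r) tuples (premise + consequent)
    w = [gauss(x1, c1, a1) * gauss(x2, c2, a2)          # layer 2: rule firing
         for c1, a1, c2, a2, _, _, _ in rules]
    wsum = sum(w)
    wbar = [wi / wsum for wi in w]                      # layer 3: normalization
    outs = [wb * (p * x1 + q * x2 + r)                  # layer 4: rule outputs
            for wb, (_, _, _, _, p, q, r) in zip(wbar, rules)]
    return sum(outs)                                    # layer 5: summation

rules = [
    (0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0),  # hypothetical parameters
    (1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0),
]
print(round(anfis_forward(0.5, 0.5, rules), 4))
```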
S4: The first test dataset is taken as the input of the trained TCN-attention, CatBoost and ANFIS models to obtain the first prediction outputs of the three models, and these outputs are divided into a second training dataset and a second test dataset.
In a specific embodiment, the first prediction outputs are likewise divided in the ratio 7:3.
S5: An RBF neural network is constructed and trained with the second training dataset to obtain a trained RBF neural network.
The RBF neural network is the radial basis function neural network (RBFNN). It comprises an input layer, a hidden layer and an output layer and trains efficiently. Its basic idea is to use radial basis functions as the "basis" of the hidden units to construct the hidden-layer space: the network transforms the input data into a high-dimensional space in which the data become linearly separable.
The invention uses the RBF neural network to fuse the prediction outputs of the TCN-attention, CatBoost and ANFIS models. The excellent fitting capability of the RBF neural network fuses the model results and outputs a more accurate prediction; at the same time, its simple structure does not add model complexity, so the fused model is more stable.
In a specific embodiment, as shown in fig. 5, the RBF neural network is configured as follows: the first layer is the input layer, with as many nodes as input values per sample; the second layer is the hidden layer, composed of radial-basis neurons that form a basis-function space mapping a linearly inseparable low-dimensional problem into a high-dimensional space where it becomes linearly separable; the third layer is the output layer with one node, which outputs the fused prediction of the RBFNN model.
In a specific embodiment, the selected radial basis function is the Gaussian kernel:
φ(x) = exp( -‖x - c‖² / (2·τ²) )   (12)
wherein: c is the center point of the class; x is the input data; τ is the decay rate of the Gaussian kernel.
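A minimal sketch of an RBF forward pass with this Gaussian kernel: hidden units compute φ_j(x) and the output layer linearly combines them. The centers and output weights are hypothetical, with the three base-model predictions imagined as the input vector:

```python
# RBF network forward-pass sketch: Gaussian hidden units, linear output.
import math

def rbf_forward(x, centers, weights, tau):
    # phi_j(x) = exp(-||x - c_j||^2 / (2 tau^2)) for each hidden center c_j
    phis = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * tau ** 2))
            for c in centers]
    return sum(w * p for w, p in zip(weights, phis))  # output-layer combination

# Three base-model predictions as input, two hypothetical hidden centers
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
weights = [0.3, 0.7]
x = [1.0, 1.0, 1.0]
print(round(rbf_forward(x, centers, weights, tau=1.0), 4))
```

An input lying exactly on a center activates that unit fully (φ = 1), while distant centers decay toward zero at a rate set by τ.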
S6: The second test dataset is taken as the input of the trained RBF neural network to obtain the second prediction output of the RBF neural network.
S7: The second prediction output is compared with the true values in the second test dataset, and whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model and the RBF neural network meets the prediction accuracy requirement is judged from the comparison:
if the prediction accuracy requirement is not met, steps S2-S7 or steps S5-S7 are repeated;
if the prediction accuracy requirement is met, the fusion model is used to predict the future yield of the target oil well.
In a specific embodiment, the mean absolute percentage error (MAPE) is used as the evaluation index when comparing the second prediction output with the true values in the second test dataset:
MAPE = (100%/m)·Σ_{i=1}^{m} |y_i - y_i'| / y_i   (13)
wherein: m is the number of samples; y_i is the true value of the well yield; y_i' is the value predicted by the well-yield model.
In this embodiment, a MAPE error threshold is set according to the prediction accuracy requirement; when the MAPE of the trained fusion model is less than or equal to the threshold, that model is taken as the final fusion model.
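The MAPE acceptance check of step S7 can be sketched as follows; the true/predicted values and the 5% threshold are hypothetical:

```python
# MAPE acceptance check: the fusion model is accepted only when its mean
# absolute percentage error falls at or below a preset threshold.
def mape_percent(y_true, y_pred):
    m = len(y_true)
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / m

y_true = [100.0, 110.0, 105.0, 95.0]
y_pred = [98.0, 113.0, 104.0, 97.0]
err = mape_percent(y_true, y_pred)
threshold = 5.0  # hypothetical accuracy requirement, in percent
print(f"MAPE = {err:.2f}%, accepted: {err <= threshold}")
```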
In a specific embodiment, the oil well yield prediction method based on multi-machine-learning-model fusion is used to predict the yield of an oil well in the Tarim area, and a classical RNN model is applied to the same well for comparison to verify the accuracy of the invention. In this embodiment, yield prediction with the invention specifically includes the following steps:
(1) Collecting production data of the target oil well and preprocessing them to obtain a production dataset; part of the production data of the well is shown in Table 1.
Table 1: partial production data of the target well
(2) And training according to the steps S2-S7 to obtain a fusion model meeting the prediction precision requirement, wherein the optimal super-parameter combination of each model is obtained by adopting a grid search method in the training process.
(3) And predicting the future oil production of the target oil production well by using the fusion model.
The prediction results of the invention and of the classical RNN, TCN-attention, CatBoost and ANFIS models used alone are shown in fig. 6. As can be seen from fig. 6, the prediction accuracy of each single model is inferior to that of the fusion model of the invention. The mean absolute percentage error of each model is shown in Table 2:
table 2 mean absolute percentage error results for each model
Evaluation index | Fusion model | RNN | TCN-attention | CatBoost | ANFIS
---|---|---|---|---|---
MAPE (%) | 4.29 | 7.04 | 5.38 | 9.64 | 30.04
As can be seen from Table 2, the fusion model of the present invention has the smallest error and the highest prediction accuracy; its error is 20.34% lower than that of the TCN-attention model alone. The prediction results of the present invention and of the TCN-attention model alone are shown in Figs. 7 and 8, from which it can also be seen that the predictions of the present invention lie closer to the actual production of the well.
In conclusion, using an RBF neural network to fuse the TCN-attention model, the CatBoost model, and the ANFIS model improves the accuracy of predicting future oil well production; compared with the prior art, the present invention represents a marked advance.
The present invention is not limited to the above-described embodiments; any modifications, equivalent substitutions, and improvements made without departing from the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
1. An oil well production prediction method based on multi-machine-learning-model fusion, characterized by comprising the following steps:
S1: collecting production data of a target oil well, and preprocessing the production data to obtain a production data set;
S2: dividing the production data set into training data set one and test data set one;
S3: constructing a TCN-attention model, a CatBoost model, and an ANFIS model respectively, and training the three models with training data set one to obtain a trained TCN-attention model, CatBoost model, and ANFIS model;
S4: taking test data set one as the input of the trained TCN-attention, CatBoost, and ANFIS models to obtain prediction output result one of the three models, and dividing prediction output result one into training data set two and test data set two;
S5: constructing an RBF neural network, and training it with training data set two to obtain a trained RBF neural network;
S6: taking test data set two as the input of the trained RBF neural network to obtain prediction output result two;
S7: comparing prediction output result two with the true values in test data set two, and judging from the comparison whether the fusion model composed of the TCN-attention model, the CatBoost model, the ANFIS model, and the RBF neural network meets the prediction accuracy requirement:
if the prediction accuracy requirement is not met, repeating steps S2-S7 or steps S5-S7;
if the prediction accuracy requirement is met, predicting the future production of the target oil well with the fusion model.
2. The method for predicting oil well production based on multi-machine learning model fusion according to claim 1, wherein in step S1, the production data includes daily oil production, production time, daily water production, oil pressure, casing pressure and back pressure.
3. The method for predicting oil well production based on multi-machine learning model fusion of claim 1, wherein in step S1, preprocessing the production data comprises data removal, data completion and data normalization.
4. The method for predicting oil well production based on multi-machine learning model fusion according to claim 1, wherein in step S3, when training the TCN-attention model by using the training dataset, the input data of the TCN-attention model is constructed by using a sliding window.
5. The method according to claim 1, wherein in step S3, when training the CatBoost model and the ANFIS model respectively using the training dataset, the production data of the day before the predicted target date is used as the input data of the CatBoost model and the ANFIS model.
6. The method for predicting oil well production based on multi-machine learning model fusion according to any one of claims 1 to 5, wherein in step S7, when comparing the predicted output result two with the true value in the test data set two, an average absolute percentage error is used as an evaluation index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210826531.0A CN115099519B (en) | 2022-07-13 | 2022-07-13 | Oil well yield prediction method based on multi-machine learning model fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115099519A CN115099519A (en) | 2022-09-23 |
CN115099519B true CN115099519B (en) | 2024-05-24 |
Family
ID=83297126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210826531.0A Active CN115099519B (en) | 2022-07-13 | 2022-07-13 | Oil well yield prediction method based on multi-machine learning model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115099519B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116050547A (en) * | 2023-01-12 | 2023-05-02 | 哈尔滨工业大学 | Landing gear performance prediction method based on self-attention integrated learning |
CN116225102B (en) * | 2023-05-06 | 2023-08-01 | 南方电网调峰调频发电有限公司信息通信分公司 | Mobile energy storage communication temperature rise automatic monitoring system and device |
CN116611556A (en) * | 2023-05-17 | 2023-08-18 | 西南石油大学 | Compact gas well single well yield prediction method based on hybrid neural network |
CN116451877B (en) * | 2023-06-16 | 2023-09-01 | 中国石油大学(华东) | Pipe network open-cut production prediction method based on computable semantic network |
CN116861800B (en) * | 2023-09-04 | 2023-11-21 | 青岛理工大学 | Oil well yield increasing measure optimization and effect prediction method based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096007A (en) * | 2015-08-27 | 2015-11-25 | 中国石油天然气股份有限公司 | Oil well yield prediction method and device based on improved neural network |
CN110400006A (en) * | 2019-07-02 | 2019-11-01 | 中国石油化工股份有限公司 | Oil well output prediction technique based on deep learning algorithm |
CN110969249A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Production well yield prediction model establishing method, production well yield prediction method and related device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096007A (en) * | 2015-08-27 | 2015-11-25 | 中国石油天然气股份有限公司 | Oil well yield prediction method and device based on improved neural network |
CN110969249A (en) * | 2018-09-29 | 2020-04-07 | 北京国双科技有限公司 | Production well yield prediction model establishing method, production well yield prediction method and related device |
CN110400006A (en) * | 2019-07-02 | 2019-11-01 | 中国石油化工股份有限公司 | Oil well output prediction technique based on deep learning algorithm |
Non-Patent Citations (1)
Title |
---|
Application of a BP neural network compensation algorithm to coalbed methane well production prediction; Li Ping; Ji Yong; Xiong Jie; Liu Hao; Feng Wuxiang; Wang Dong; China Coalbed Methane; 2016-10-15 (No. 05); 41-45 *
Also Published As
Publication number | Publication date |
---|---|
CN115099519A (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115099519B (en) | Oil well yield prediction method based on multi-machine learning model fusion | |
Dong et al. | Electrical load forecasting: A deep learning approach based on K-nearest neighbors | |
CN111563706A (en) | Multivariable logistics freight volume prediction method based on LSTM network | |
Sun et al. | Prediction of stock index futures prices based on fuzzy sets and multivariate fuzzy time series | |
CN111027772B (en) | Multi-factor short-term load prediction method based on PCA-DBILSTM | |
Hassan et al. | A hybrid of multiobjective Evolutionary Algorithm and HMM-Fuzzy model for time series prediction | |
CN106600059A (en) | Intelligent power grid short-term load predication method based on improved RBF neural network | |
CN110826774B (en) | Bus load prediction method and device, computer equipment and storage medium | |
CN106529732A (en) | Carbon emission efficiency prediction method based on neural network and random frontier analysis | |
CN111738477A (en) | Deep feature combination-based power grid new energy consumption capability prediction method | |
CN115510963A (en) | Incremental equipment fault diagnosis method | |
CN115018193A (en) | Time series wind energy data prediction method based on LSTM-GA model | |
Akpinar et al. | Forecasting natural gas consumption with hybrid neural networks—Artificial bee colony | |
CN114118567A (en) | Power service bandwidth prediction method based on dual-channel fusion network | |
CN112801416A (en) | LSTM watershed runoff prediction method based on multi-dimensional hydrological information | |
CN113095484A (en) | Stock price prediction method based on LSTM neural network | |
CN116503118A (en) | Waste household appliance value evaluation system based on classification selection reinforcement prediction model | |
CN110110447B (en) | Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine | |
CN114817571A (en) | Method, medium, and apparatus for predicting achievement quoted amount based on dynamic knowledge graph | |
CN109697531A (en) | A kind of logistics park-hinterland Forecast of Logistics Demand method | |
CN113537556A (en) | Household short-term load prediction method based on state frequency memory network | |
CN117194918A (en) | Air temperature prediction method and system based on self-attention echo state network | |
CN112651499A (en) | Structural model pruning method based on ant colony optimization algorithm and interlayer information | |
CN114638421A (en) | Method for predicting requirement of generator set spare parts | |
CN103198357A (en) | Optimized and improved fuzzy classification model construction method based on nondominated sorting genetic algorithm II (NSGA- II) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||