CN113283174B

CN113283174B - Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control

Info

Publication number: CN113283174B
Application number: CN202110643996.8A
Authority: CN
Inventors: 周长林; 乐宏; 刘飞; 张华礼; 周朗; 陈伟华; 付艳; 吕泽飞; 曾嵘; 张曦; 王茜
Original assignee: Petrochina Co Ltd
Current assignee: Petrochina Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2022-08-30
Anticipated expiration: 2041-06-09
Also published as: CN113283174A

Abstract

The invention discloses a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control, relating to the technical field of reservoir development. The key points of the technical scheme are: preprocessing productivity test data to obtain a first feature set; ranking the first feature set with two interpretable intelligent algorithms to obtain a second feature set, and reducing the dimension of the first feature set with a first and a second manifold dimension reduction method to obtain a third and a fourth feature set; dividing the feature sets into a training set and a test set; establishing a final prediction model based on an inductive learning algorithm, an explainable boosting machine and LightGBM; and predicting the yield under a single factor and combining it with the partial dependence plot to obtain the reservoir productivity. The method can fully mine the latent relationship between the features and the prediction target and limits error propagation as far as possible; the feature set is enhanced and the model is self-organized, which effectively improves the generalization capability and robustness of the model.

Description

Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control

Technical Field

The invention relates to the technical field of reservoir development, in particular to a reservoir productivity prediction method, a reservoir productivity prediction system and a reservoir productivity prediction terminal based on algorithm integration and self-control.

Background

With the increasing demand for oil and gas energy and the continuous development of technology, more and more reservoirs are being developed. How to improve the degree of exploitation and the utilization value of a reservoir has become a key problem of reservoir development. Establishing a systematic and sound scheme and technology for predicting reservoir productivity is therefore of great significance for efficient production optimization and management.

At present, methods for predicting reservoir productivity fall into three categories: downhole testing, numerical simulation and intelligent algorithms. Although the safety of the instruments used in downhole testing has improved, downhole testing remains risky and time-consuming. Numerical simulation describes complex dynamics by building a refined geological model in order to predict productivity accurately, but because geological conditions are complex, establishing and solving the corresponding mathematical-physical model is difficult. With the arrival of the big data era, intelligent algorithms such as machine learning and fuzzy logic overcome the limitations of the former two methods and continue to help the energy industry unlock the potential of its operations.

However, existing applications of intelligent algorithms to reservoir productivity prediction have several shortcomings. First, intelligent algorithms in reservoir development are typically applied singly. If only one algorithm or model is used for prediction: a. the prediction method may fail to capture the mapping between complex data features and the prediction target, so prediction accuracy is poor; b. permeability, porosity and other properties differ markedly between reservoirs, and a single intelligent algorithm cannot guarantee effectiveness across multiple scenarios, so the model lacks generalization capability. In addition, reliance on a single algorithm makes it difficult to achieve both high prediction accuracy and interpretability: models with higher prediction accuracy, such as deep neural networks, are often not interpretable and therefore cannot guide production optimization and management, whereas interpretable algorithms, such as random forests, have relatively poor prediction performance. Losing either high prediction accuracy or interpretability makes an intelligent algorithm ineffective in real reservoir productivity prediction scenarios. Second, in reservoir productivity prediction, features are generally selected manually, so feature selection lacks intelligence, and a fixed feature set also leaves the model lacking generalization capability and robustness.

Therefore, how to research and design a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control is a problem which is urgently needed to be solved at present.

Disclosure of Invention

In order to solve the defects in the prior art, the invention aims to provide a reservoir productivity prediction method, a system and a terminal based on algorithm integration and self-control, and the reservoir productivity prediction method capable of realizing model self-organization control is established by integrating a plurality of intelligent algorithms. The method has high predictive performance, and simultaneously, the method has interpretability; in addition, the method has generalization capability and robustness, and can effectively avoid model overfitting.

The technical purpose of the invention is realized by the following technical scheme:

in a first aspect, a reservoir productivity prediction method based on algorithm integration and self-control is provided, which comprises the following steps:

acquiring productivity test data of a target reservoir, and preprocessing the productivity test data to obtain a first feature set;

ranking the first feature set using two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, respectively, to obtain a second feature set formed by main control factors, performing feature dimension reduction on the first feature set using a first manifold dimension reduction method to obtain a third feature set, and performing feature dimension reduction on the first feature set using a second manifold dimension reduction method to obtain a fourth feature set;

respectively dividing the data of the first, second, third and fourth feature sets, and fusing the results to obtain a training set and a test set;

sequentially establishing, from the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on the explainable boosting machine and a third prediction model based on LightGBM, and combining the first, second and third prediction models based on weight values to obtain a final prediction model;

drawing a partial dependence plot between the main control factors and the reservoir productivity based on an interpretable intelligent algorithm;

and predicting, from the test set and the final prediction model, the yield under a single factor, and combining it with the partial dependence plot to obtain the reservoir productivity.

Further, the obtaining process of the first feature set specifically includes:

acquiring the relevant parameters and productivity data of production layers for which productivity testing has been completed;

standardizing the acquired parameters and productivity data, storing them in a target database, and combining the standardized features as the first feature set;

and filling missing values in the acquired data using a generative adversarial network.

Further, the obtaining process of the second feature set specifically includes:

fitting the training set divided from the first feature set with each of the two interpretable intelligent algorithms, the explainable boosting machine and LightGBM, and ranking the first feature set according to each fit;

and calculating the importance score of every feature in the first feature set under each of the two interpretable models, re-ranking the importance of all features by performance weighting, and identifying the main control factors that influence reservoir productivity from the re-ranked result to obtain the second feature set.

Further, the first manifold dimension reduction method adopts the t-distributed stochastic neighbor embedding (t-SNE) algorithm, and the third feature set is obtained as follows:

calculating the similarity of the sample set in an observation space, and converting the Euclidean distance between two samples in the observation space into conditional probability;

representing the joint probability distribution among the data points in the same space by using the corresponding conditional probability distribution;

and minimizing the difference between the similarity of the high-dimensional sample and the similarity of the low-dimensional sample to obtain a third feature set after the optimal dimension reduction.

Further, the second manifold dimension reduction method adopts the locally linear embedding (LLE) algorithm, and the fourth feature set is obtained as follows:

calculating the k nearest neighbors of each sample in the sample set;

calculating the local covariance matrix of each sample, and solving for the corresponding weight coefficient vector;

combining the weight coefficient vectors of all samples into a weight coefficient matrix, and calculating a matrix M from it;

calculating the first N eigenvalues of the matrix M, and the eigenvectors corresponding to them;

and extracting the matrix formed by the second through Nth eigenvectors to obtain the low-dimensional sample set matrix.

Furthermore, the first feature set, the second feature set, the third feature set and the fourth feature set are divided into corresponding training sets and test sets by adopting a K-fold cross validation method.

Further, the process of obtaining the final prediction model specifically includes:

respectively constructing a first prediction model, a second prediction model and a third prediction model aiming at four feature combinations of a first feature set, a second feature set, a third feature set and a fourth feature set according to data in a training set;

respectively obtaining 12 loss values of the training process of the first prediction model, the second prediction model and the third prediction model, and distributing weight values for 12 base learners according to the loss values based on a performance weighting method;

and calculating to obtain the weight corresponding to each prediction model according to the weight distribution result of the base learner, and performing integrated calculation on each prediction model and the corresponding weight to obtain a final prediction model.

Furthermore, the partial dependence plot is drawn with whichever of the two interpretable intelligent algorithms, the explainable boosting machine or LightGBM, has the best fitting effect.

In a second aspect, there is provided a reservoir capacity prediction system based on algorithm integration and self-control, comprising:

the data acquisition module is used for acquiring productivity test data of a target reservoir and preprocessing the productivity test data to obtain a first feature set;

the data processing module is used for ranking the first feature set using two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, to obtain a second feature set formed by main control factors, performing feature dimension reduction on the first feature set using a first manifold dimension reduction method to obtain a third feature set, and performing feature dimension reduction on the first feature set using a second manifold dimension reduction method to obtain a fourth feature set;

the data division module is used for respectively dividing the data of the first, second, third and fourth feature sets, and fusing the results to obtain a training set and a test set;

the model building module is used for sequentially establishing, from the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on the explainable boosting machine and a third prediction model based on LightGBM, and combining the first, second and third prediction models based on weight values to obtain a final prediction model;

the drawing module is used for drawing a partial dependence plot between the main control factors and the reservoir productivity based on the interpretable intelligent algorithm;

and the prediction module is used for predicting, from the test set and the final prediction model, the yield under a single factor, and obtaining the reservoir productivity by combining it with the partial dependence plot.

In a third aspect, there is provided a computer terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for predicting reservoir capacity based on algorithm integration and self-control according to any one of the first aspect.

Compared with the prior art, the invention has the following beneficial effects:

1. The calculation is simple. Traditional reservoir production prediction based on numerical simulation requires a complex mathematical-physical model; because the real environment is complex, real conditions cannot be simulated accurately, the model is difficult to build, and the computational complexity is high. The invention predicts reservoir productivity by integrating intelligent algorithms; all data come from historical observations, and manual intervention is reduced.

2. The invention balances the accuracy and interpretability of reservoir productivity prediction. By integrating a high-accuracy intelligent algorithm (the inductive learning algorithm) with interpretable intelligent algorithms (the explainable boosting machine and LightGBM), it achieves high-accuracy prediction of reservoir productivity and can also combine several interpretable models to identify the main control factors that influence reservoir productivity, providing a basis for efficient production optimization and management.

3. The calculation method is advanced and the model generalizes well. The data self-organizing feature generation strategy, based on model-combined screening of main control factors, together with the inductive learning algorithm and the performance weighting strategy for the base learners, provides intelligent self-organizing control, so that the prediction model can adjust its structure in time when the prediction target or external conditions change in uncertain ways, which strengthens its generalization capability and robustness. The inductive learning algorithm adopted by the invention is effective on small-sample prediction problems, which offers a remedy for the poor prediction performance caused by scarce sample data in the early stage of reservoir development, so the method can be applied to a reservoir development project earlier and more effectively than other methods. In addition, the explainable boosting machine and LightGBM adopted by the invention are not only interpretable but also more accurate than the random forest and XGBoost algorithms. More importantly, their lower memory usage and faster training speed make them better suited to practical industrial applications.

4. The invention makes missing-data filling intelligent. For multivariate or arbitrary missing patterns, traditional filling methods must build several models to fill the missing values of different attributes, so the amount of computation is excessive. The invention fills data by using a generative adversarial network to fit the data distribution, and once training is finished it can fill a multivariate missing pattern in a single pass.

5. The invention makes feature engineering intelligent. Reservoir structures are complex and irregular, and manually designed features lack accuracy and generalization capability. The invention builds the enhanced feature set of the prediction model from two manifold dimension reduction methods (the t-distributed stochastic neighbor embedding algorithm and the locally linear embedding algorithm) and from data self-organizing feature generation based on model-combined screening of main control factors, so that productivity can be predicted well and the generalization capability and robustness of the model are improved.

In conclusion, the reservoir productivity prediction method based on algorithm integration and self-control can fully mine the potential relation between the characteristics and the prediction target, and avoid the expansion of errors to the maximum extent; by integrating the intelligent algorithm, the feature set is enhanced and the model is self-organized, so that the generalization capability and robustness of the model are effectively improved, the over-fitting phenomenon is effectively avoided, and the method is superior to other intelligent learning methods. By integrating the high-precision prediction model and the interpretable prediction model, a high-efficiency reservoir productivity prediction model is built, main control factors influencing productivity can be identified, and a new idea is provided for reservoir productivity prediction and production management optimization.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flow chart of an implementation in an embodiment of the invention;

FIG. 2 is a flowchart of the screening of the main control factors in an embodiment of the invention;

FIG. 3 is a schematic diagram of the missing-value filling framework in an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.

The embodiment is as follows: the reservoir productivity prediction method based on algorithm integration and self-control is specifically realized by the following steps as shown in fig. 1.

Step 1: Data collection and preprocessing.

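Step 11: Acquire the relevant parameters and productivity data of production layers for which productivity testing has been completed. Assume that the original dataset is a matrix A of dimension n × (m+1), expressed as:

$$A = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1m} & X_{1(m+1)} \\ X_{21} & X_{22} & \cdots & X_{2m} & X_{2(m+1)} \\ \vdots & \vdots & & \vdots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nm} & X_{n(m+1)} \end{pmatrix}$$

where each row corresponds to one reservoir, the first m elements of each row are the m factors influencing that reservoir's productivity, and the (m+1)-th element is the reservoir's productivity.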

Step 12: Apply max-min normalization to the acquired data and store the result in a target database. The combination of features obtained at this point is taken as feature set 1.

The max-min normalization formula is as follows:

$$x_{ab} = \frac{X_{ab} - X_{\min}}{X_{\max} - X_{\min}}$$

where $x_{ab}$ is the normalized value, $X_{ab}$ is the original value, $X_{\min}$ is the minimum value of the column containing $X_{ab}$, and $X_{\max}$ is the maximum value of that column.

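At this point, the raw data can be expressed as:

$$A_{scale} = \begin{pmatrix} x_{11} & \cdots & x_{1m} & x_{1(m+1)} \\ \vdots & & \vdots & \vdots \\ x_{n1} & \cdots & x_{nm} & x_{n(m+1)} \end{pmatrix}$$

where the first m columns form B, which is denoted as feature set 1.

B is expressed as:

$$B = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}$$

or $B = (x_1, x_2, \ldots, x_m)$, where $x_b$ denotes the b-th normalized feature column.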
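The max-min normalization of step 12 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `min_max_normalize`, the sample matrix and the handling of a constant column (mapped to 0) are assumptions made here.

```python
def min_max_normalize(rows):
    """Column-wise max-min normalization of a list of equal-length rows."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column (max == min) is mapped to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Hypothetical 3 x 3 dataset: two influencing factors plus productivity.
A = [[10.0, 0.1, 5.0], [20.0, 0.3, 9.0], [30.0, 0.2, 7.0]]
A_scale = min_max_normalize(A)
```

Every column of `A_scale` then lies in the interval [0, 1], with the column minimum mapped to 0 and the column maximum mapped to 1.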

Step 13: Fill the missing data using a generative adversarial network, as shown in FIG. 3, through the following steps.

1) After replacing each missing value NA with 0, concatenate the i-th sample $K_i$ of the real dataset containing missing values with its missing-position vector $V_i$ to obtain the input vector of the generator. The output of the generator is denoted $G_i$.

2) Calculate the final filling result: $G_i = G_i \odot (1 - V_i) + K_i \odot V_i$. This formula leaves the non-missing values unchanged and keeps the values produced by the generator as the fill values at the missing positions; it therefore implies that $V_i$ is 1 at observed positions and 0 at missing positions.

3) Input the dataset filled by the generator and the complete real dataset into a discriminator, which performs binary classification to distinguish the data sources.

4) During training, the generator aims to make the discriminator believe that the filled data come from the real dataset; its parameter-update gradient comes from the discriminator's evaluation of the generated samples, with the objective of minimizing $\log(1 - D(G_i))$. The discriminator aims to identify the difference between the real dataset and the filled dataset; its parameters are updated with the objective of maximizing $\log(D(R_i)) + \log(1 - D(G_i))$. Here $D(\cdot)$ is the discriminator output and $R$ is the complete real dataset. When the discriminator's input is complete real data, its output is close to 1; when the input contains filled data, its output is close to 0, which distinguishes the data sources.

5) Through the continued game between the generator and the discriminator, the discriminator eventually cannot identify the data source, yielding an effective generator for filling the missing values in the dataset.
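The combination rule in step 2) can be illustrated without training any network. The sketch below only shows how observed values and generated values are merged; the generator output is a hard-coded stand-in, and the names `combine_fill`, `raw`, `mask`, `observed` and `generated` are hypothetical. It assumes the mask is 1 at observed positions and 0 at missing positions, as the formula implies.

```python
def combine_fill(observed, mask, generated):
    """Keep observed entries (mask == 1) and take the generator output elsewhere."""
    return [g * (1 - v) + k * v for k, v, g in zip(observed, mask, generated)]

# Hypothetical sample: None marks a missing value in the raw record.
raw = [0.2, None, 0.7, None]
mask = [0 if x is None else 1 for x in raw]          # V_i
observed = [0.0 if x is None else x for x in raw]    # K_i with NA replaced by 0
generated = [0.25, 0.4, 0.6, 0.9]                    # stand-in for the generator output G_i
filled = combine_fill(observed, mask, generated)
```

Only the second and fourth entries of `filled` come from the generator; the observed entries pass through unchanged.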

It should be noted that subsequent processing may use the first feature set either after or before the missing data are filled.

Step 2: Feature engineering.

Step 21: Main control factor identification and partial dependence plot visualization.

Rank feature importance with each of the two interpretable intelligent algorithms (the explainable boosting machine and LightGBM), combine the two rankings by performance weighting to re-rank feature importance, identify the main control factors influencing reservoir productivity, and generate feature set 2. Then draw the partial dependence plots of the main control factors. As shown in FIG. 2, this is achieved through the following steps.

(1) After obtaining the normalized dataset $A_{scale}$ from step 1, divide the data into a training set and a test set (see step 3).

(2) Fit the training set with the explainable boosting machine and with LightGBM, and use R-squared ($R^2$) to evaluate each model's fit on the training set. The $R^2$ value computed after fitting the explainable boosting machine and the $R^2$ value computed after fitting LightGBM are each recorded.

(3) After model fitting is complete, calculate the importance score of each feature under the two interpretable models; the i-th feature thus has one importance score under each model.

The feature importance score is the Gini importance, calculated as follows:

a. Calculate the Gini index GI(a) of each node a:

$$GI(a) = 1 - \sum_{k=1}^{K} p^2(k \mid a)$$

where K is the number of categories and $p(k \mid a)$ is the proportion of category k in node a.

b. Compute the importance of feature $x_i$ at node a, i.e. the change in the Gini index before and after node a branches, recorded as $GID_{ia}$. It is the Gini index of the node minus the Gini indices of its child nodes:

$$GID_{ia} = GI(a) - GI(l) - GI(r)$$

where GI(l) and GI(r) are the Gini indices of the two new nodes after node a branches.

c. If the nodes at which feature $x_i$ appears in decision tree j form the set M, then the importance of $x_i$ in decision tree j is $score_{ij}$:

$$score_{ij} = \sum_{a \in M} GID_{ia}$$

d. Assuming the model contains t decision trees in total, the importance of feature $x_i$ is $score_i$:

$$score_i = \sum_{j=1}^{t} score_{ij}$$

e. Normalize the importance $score_i$ of feature $x_i$:

$$score_i^{norm} = \frac{score_i}{\sum_{c=1}^{m} score_c}$$

The normalized importance score lies between 0 and 1, and a larger value indicates a more important feature.
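The Gini importance steps a to e can be traced on a toy example. The following is a minimal sketch rather than the internals of any particular library: the function names, the class labels and the feature names `permeability` and `porosity` are hypothetical, and the Gini decrease uses the unweighted form written in step b.

```python
from collections import Counter

def gini_index(labels):
    """GI(a) = 1 - sum_k p(k|a)^2 for the labels that reach node a."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """GID = GI(a) - GI(l) - GI(r), the unweighted form given in step b."""
    return gini_index(parent) - gini_index(left) - gini_index(right)

def normalized_importance(decreases_by_feature):
    """Sum GID over all nodes and trees per feature, then scale so the scores sum to 1."""
    totals = {f: sum(v) for f, v in decreases_by_feature.items()}
    grand = sum(totals.values())
    return {f: s / grand for f, s in totals.items()}

# Hypothetical node with two classes, split perfectly into two pure children.
parent = ["high", "high", "low", "low"]
gid = gini_decrease(parent, ["high", "high"], ["low", "low"])

# Hypothetical per-feature lists of Gini decreases collected over all trees.
scores = normalized_importance({"permeability": [gid, 0.1], "porosity": [0.2]})
```

Here the perfect split gives a decrease of 0.5, and the normalized scores of the two features sum to 1.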

(4) Calculate the final importance score of each feature by performance weighting: the two importance scores of the i-th feature are combined, each weighted by the fitting performance of the corresponding model.

(5) Sort the features in descending order of final importance score and take the first N features as the main control factors.
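One plausible reading of the performance weighting in steps (4) and (5) is sketched below. The exact weighting formula is not recoverable from the text, so the rule used here (weights proportional to each model's R-squared) is an assumption, as are the function names, feature names and numeric values.

```python
def performance_weighted_scores(scores_a, scores_b, r2_a, r2_b):
    """Blend two importance dicts with weights proportional to each model's R^2 (assumed form)."""
    w_a = r2_a / (r2_a + r2_b)
    w_b = r2_b / (r2_a + r2_b)
    return {f: w_a * scores_a[f] + w_b * scores_b[f] for f in scores_a}

def top_n(scores, n):
    """Return the n features with the highest final importance score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical normalized importance scores from the two interpretable models.
ebm_scores = {"permeability": 0.5, "porosity": 0.3, "thickness": 0.2}
lgb_scores = {"permeability": 0.4, "porosity": 0.2, "thickness": 0.4}

final = performance_weighted_scores(ebm_scores, lgb_scores, r2_a=0.9, r2_b=0.6)
main_factors = top_n(final, 2)
```

With these made-up numbers the better-fitting model receives weight 0.6, and the two highest-ranked features become the main control factors that form feature set 2.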

(6) Compare the $R^2$ values of the two models and select the algorithm with the larger $R^2$, i.e. the interpretable intelligent algorithm with the best fitting effect. Draw the partial dependence plot between the main control factors and the reservoir productivity based on the selected algorithm.

The partial dependence plot shows how a given feature affects the prediction result. The specific steps are as follows:

a. Assume that the relationship between the influencing factors and the reservoir productivity in the intelligent algorithm is:

$$\hat{y}_i = f(x_{i1}, x_{i2}, \ldots, x_{im})$$

where $\hat{y}_i$ is the predicted value for sample i and $x_{ij}$ is the j-th feature value of sample i.

b. Assume that the k-th feature $x_k$ is a main control factor. Keep the feature values $x_{i1}, x_{i2}, \ldots, x_{i(k-1)}, x_{i(k+1)}, \ldots, x_{im}$ fixed, vary the feature value $x_{ik}$ from small to large, and calculate the partial dependence of reservoir productivity on feature $x_k$ at each value.

c. Draw the partial dependence plot with the value of feature $x_k$ on the abscissa and the partial dependence on the ordinate, visualizing how reservoir productivity changes with the main control factor $x_k$ to help the relevant personnel make management decisions. If the curve trends upward, increasing the value of the influencing factor tends to improve reservoir productivity, and vice versa.
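The partial dependence calculation can be sketched as an average of model predictions with one feature forced to each grid value. This follows the usual definition of partial dependence and is assumed here to match the lost formula; `toy_model` is a hypothetical stand-in for the fitted interpretable model, and the sample values are made up.

```python
def partial_dependence(predict, samples, k, grid):
    """Average prediction over all samples with feature k forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in samples:
            modified = list(row)
            modified[k] = value      # vary only feature k, keep the others fixed
            total += predict(modified)
        curve.append(total / len(samples))
    return curve

def toy_model(x):
    """Hypothetical stand-in for a fitted model's prediction function."""
    return 2.0 * x[0] + x[1] * x[1]

samples = [[0.1, 0.0], [0.5, 1.0], [0.9, 2.0]]
pd_curve = partial_dependence(toy_model, samples, k=0, grid=[0.0, 0.5, 1.0])
```

Plotting `pd_curve` against the grid values gives the partial dependence plot for feature 0; in this toy case the curve rises, which would suggest that increasing the factor tends to raise predicted productivity.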

Step 22: Use the manifold dimension reduction method t-distributed stochastic neighbor embedding (t-SNE) to reduce the m-dimensional feature set 1 to 2 dimensions, generating feature set 3. Given the sample set $D = (x_1, x_2, \ldots, x_n)$, the steps of the t-SNE algorithm are as follows:

(1) Calculate the similarity of the sample set D in the observation space. The Euclidean distance between two samples $x_i$ and $x_j$ in the observation space is converted into a conditional probability $p_{j|i}$:

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

where $\sigma_i$ is the standard deviation of the Gaussian distribution centered on $x_i$.

The points to which the two samples $x_i$ and $x_j$ are mapped in the low-dimensional space are denoted $y_i$ and $y_j$, and the conditional probability $q_{j|i}$ of their similarity is calculated.

(2) Represent the joint probability distributions $p_{ij}$ and $q_{ij}$ among data points in the same space by the corresponding conditional probability distributions.

(3) Minimize the difference between the high-dimensional similarity P and the low-dimensional similarity Q to obtain the optimal dimension reduction result. The difference is measured by the KL divergence:

$$KL(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
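The quantities in this step can be computed directly for a tiny example. The sketch below evaluates the cost that t-SNE minimizes; it does not perform the optimization. It assumes the standard t-SNE definitions (symmetrized $p_{ij}$ and a Student-t kernel for $q_{ij}$), uses a single shared `sigma` instead of a per-point $\sigma_i$, and all names and data are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_p(X, sigma=1.0):
    """P[i][j] = p_{j|i} with one shared sigma (assumption; t-SNE tunes sigma_i per point)."""
    n = len(X)
    P = []
    for i in range(n):
        w = [math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2)) if j != i else 0.0
             for j in range(n)]
        s = sum(w)
        P.append([v / s for v in w])
    return P

def joint_p(P):
    """p_ij = (p_{j|i} + p_{i|j}) / (2n), the standard symmetrization."""
    n = len(P)
    return [[(P[i][j] + P[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def joint_q(Y):
    """q_ij from a Student-t kernel with one degree of freedom."""
    n = len(Y)
    w = [[1.0 / (1.0 + sq_dist(Y[i], Y[j])) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def kl_divergence(P, Q):
    """KL(P || Q) summed over all pairs with p_ij > 0."""
    return sum(p * math.log(p / q)
               for rp, rq in zip(P, Q) for p, q in zip(rp, rq) if p > 0)

# Hypothetical 3-dimensional samples and a candidate 2-dimensional embedding.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [2.0, 2.0, 2.0]]
Y = [[0.0, 0.0], [0.1, 0.1], [3.0, 3.0]]
P = joint_p(conditional_p(X))
Q = joint_q(Y)
cost = kl_divergence(P, Q)
```

A full implementation would adjust `Y` by gradient descent until `cost` stops decreasing; the rows of the resulting `Y` would form feature set 3.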

Step 23: Use the manifold dimension reduction method locally linear embedding (LLE) to reduce the m-dimensional feature set 1 to 2 dimensions, generating feature set 4.

Given the sample set $D = (x_1, x_2, \ldots, x_n)$, the steps of the locally linear embedding algorithm are as follows:

1) Calculate the k nearest neighbors of each sample $x_i$: $K_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$.

2) For each sample $x_i$, calculate its local covariance matrix and solve for the corresponding weight coefficient vector $W_i$.

3) Combine the weight coefficient vectors $W_i$ of all samples into the weight coefficient matrix W, and calculate the matrix M:

$$M = (I - W)(I - W)^T$$

where I is the identity matrix.

4) Calculate the first 3 (smallest) eigenvalues of the matrix M and the eigenvectors corresponding to them.

5) Extract the matrix formed by the second and third eigenvectors, which is the required low-dimensional sample set matrix.
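Step 2) of the locally linear embedding procedure, solving for one sample's weight coefficient vector, can be sketched in plain Python. This is an illustration under assumptions: the local covariance matrix is built from the differences between the sample and its neighbors, a small regularization term is added to keep it invertible, and the names `solve` and `lle_weights` are hypothetical. The eigen-decomposition of M in steps 4) and 5) is left out.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights of x from its neighbors; the weights sum to 1."""
    k = len(neighbors)
    Z = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    # Local covariance (Gram) matrix of the neighbor differences.
    C = [[sum(a * b for a, b in zip(Z[p], Z[q])) for q in range(k)] for p in range(k)]
    trace = sum(C[p][p] for p in range(k))
    for p in range(k):
        C[p][p] += reg * trace   # regularization (assumption) keeps C invertible
    w = solve(C, [1.0] * k)
    total = sum(w)
    return [v / total for v in w]

# Hypothetical sample lying midway between its first two neighbors.
x = [0.5, 0.5]
neighbors = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
w = lle_weights(x, neighbors)
recon = [sum(wi * nb[d] for wi, nb in zip(w, neighbors)) for d in range(2)]
```

The weights reconstruct `x` almost exactly from the first two neighbors, with the third neighbor receiving a weight near zero. Stacking such vectors for all samples gives the matrix W used to form M.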

Step 3: Divide the data into a training set and a test set using K-fold cross-validation.
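A K-fold split can be sketched as follows. This is a minimal version without shuffling, which is an assumption; the function name `k_fold_indices` and the sample count are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
```

Each of the four feature sets would be split with the same indices so that the training and test rows stay aligned across feature sets.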

And 4, step 4: and constructing an intelligent algorithm integrated prediction model.

Step 41: according to data in the training set, an induction learning algorithm is respectively constructed for four feature combinations (feature set 1, feature set 2, feature set 3 and feature set 4), and an elevator and a LightGBM model can be explained.

The principle of the inductive learning algorithm is as follows:

the induction learning algorithm has a multilayer neuron network structure, and can establish a high-order polynomial relationship between independent variables and dependent variables to obtain a polynomial model with an explanatory capability on the dependent variables. The concrete implementation steps are as follows:

a. an initial network architecture is generated. D (d is the number of input variables) neurons of the first layer are generated, and an initial network structure is constructed.

b. And calculating the external criterion value of each intermediate candidate model, and selecting a part of models with better external criterion values to enter the next layer. Selecting a minimum deviation criterion as a neuron selection criterion, wherein the calculation formula is as follows:

where Y is the output value of the dependent variable, N is the sample set size, and Y_jkl is the predicted value of the k-th neuron in the j-th layer for the l-th sample.
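The layer-wise selection described in steps a and b can be illustrated with the sketch below. It rests on several assumptions that are not taken from the source: each candidate neuron is a quadratic polynomial of a pair of inputs, coefficients are fitted by least squares on one data subset, and the external criterion is swapped for the mean squared error on a separate checking subset, because the minimum deviation formula is not reproduced in this text. The names `quad_features` and `select_layer` are hypothetical.

```python
import itertools
import numpy as np

def quad_features(a, b):
    """Design matrix of a two-input quadratic polynomial neuron."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a ** 2, b ** 2])

def select_layer(X_fit, y_fit, X_chk, y_chk, keep=3):
    """Fit every pairwise candidate neuron and keep those with the best external criterion."""
    candidates = []
    for i, j in itertools.combinations(range(X_fit.shape[1]), 2):
        # Fit polynomial coefficients on the fitting subset.
        coef, *_ = np.linalg.lstsq(
            quad_features(X_fit[:, i], X_fit[:, j]), y_fit, rcond=None)
        # Assumption: external criterion = MSE on data not used for fitting.
        pred = quad_features(X_chk[:, i], X_chk[:, j]) @ coef
        crit = float(np.mean((y_chk - pred) ** 2))
        candidates.append((crit, (i, j), coef))
    candidates.sort(key=lambda c: c[0])       # smaller criterion is better
    return candidates[:keep]
```

The outputs of the retained neurons would serve as inputs to the next layer, and the process would repeat until the external criterion stops improving.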

Step 42: Assign weights to the 12 (= 3 × 4) base learners using a performance-based weighting method. Assume that the training loss values of the inductive learning algorithm, explainable boosting machine and LightGBM models are

respectively corresponding to the three intelligent algorithms applied to the four feature sets. The weight corresponding to the inductive learning algorithm trained on feature set i

can be calculated by the following formula:

Step 43: Combine the results of the base learners according to the weights to obtain the corresponding prediction result, thereby constructing the prediction model.
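One common way to turn training losses into performance-based weights is inverse-loss weighting, sketched below. This is an assumption for illustration and not necessarily the formula used in the patent, which is not reproduced in this text. The function name `performance_weights`, the `eps` guard, and the `(algorithm, feature_set)` keys are hypothetical.

```python
def performance_weights(losses, eps=1e-12):
    """Map {learner: training_loss} to weights that sum to 1 (lower loss, higher weight)."""
    # Assumption: weight is proportional to the inverse of the loss.
    inverse = {name: 1.0 / (loss + eps) for name, loss in losses.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

# Hypothetical usage with two of the 12 base learners:
example = performance_weights({("inductive", 1): 1.0, ("lightgbm", 1): 3.0})
```

In this example the learner with one third of the loss receives three times the weight, so the two weights are approximately 0.75 and 0.25.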

Step 5: Predict reservoir productivity based on algorithm integration and data self-organizing control.

Step 51: Using the test set obtained in step 3 and the intelligent-algorithm integrated prediction model from step 4, obtain the predicted yield corresponding to the test set. Assume that the prediction results of the 12 (= 3 × 4) base learners for sample k in the test set are

The final prediction result

can be calculated by the following formula:
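A plausible reading of this step is a weighted combination of the base learners' predictions for each test sample. The sketch below assumes a normalized weighted sum; the exact formula is not reproduced in this text, and the name `ensemble_predict` and the dictionary layout are hypothetical.

```python
def ensemble_predict(base_predictions, weights):
    """Combine per-learner prediction lists into one list via a weighted average.

    base_predictions: {learner: [prediction for each test sample]}
    weights:          {learner: non-negative weight}
    """
    total = sum(weights[name] for name in base_predictions)
    n_samples = len(next(iter(base_predictions.values())))
    return [
        sum(weights[name] * preds[k] for name, preds in base_predictions.items()) / total
        for k in range(n_samples)
    ]
```

Dividing by the total weight keeps the result on the same scale as the individual predictions even if the supplied weights do not sum exactly to 1.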

Step 52: Evaluate the prediction performance using the mean square error and the R-squared index.

The two evaluation indexes are calculated as follows:

Mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2$$

R-squared ($R^2$):

$$R^2 = 1 - \frac{\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2}{\sum_{n=1}^{N}\left(y_n - \bar{y}\right)^2}$$

where N is the test set sample size, $y_n$ is the true value, $\hat{y}_n$ is the predicted value, and $\bar{y}$ is the mean of the true values. The smaller the MSE and the larger the $R^2$, the better the prediction performance of the model.
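The two evaluation indexes can be computed directly from the true and predicted values using their standard definitions. The sketch below uses only plain Python; the function names `mse` and `r_squared` are chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives an MSE of 0 and an R-squared of 1, while always predicting the mean of the true values gives an R-squared of 0.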

Working principle: The reservoir productivity prediction method based on algorithm integration and self-control can fully mine the latent relationships between the features and the prediction target, and limits the propagation of errors as far as possible. By integrating intelligent algorithms, the feature set is enhanced and the model is self-organized, so that the generalization capability and robustness of the model are effectively improved, over-fitting is effectively avoided, and the method outperforms other intelligent learning methods. By integrating a high-precision prediction model with an interpretable prediction model, an efficient reservoir productivity prediction model is built that can identify the main controlling factors affecting productivity, providing a new approach for reservoir productivity prediction and production management optimization.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A reservoir productivity prediction method based on algorithm integration and self-control, characterized by comprising the following steps:

ranking the first feature set using two interpretable intelligent algorithms, namely an explainable boosting machine and LightGBM, to obtain a second feature set composed of the main controlling factors; performing feature dimension reduction on the first feature set using a first manifold dimension reduction method to obtain a third feature set; and performing feature dimension reduction on the first feature set using a second manifold dimension reduction method to obtain a fourth feature set;

2. The method for predicting reservoir productivity based on algorithm integration and self-control as claimed in claim 1, wherein the obtaining of the first feature set comprises:

standardizing the acquired related parameters and productivity data, storing the standardized related parameters and productivity data in a target database, and combining the standardized features to serve as the first feature set;

3. The method for predicting reservoir productivity based on algorithm integration and self-control as claimed in claim 1, wherein the obtaining of the second feature set comprises:

4. The method for predicting the productivity of the reservoir based on the algorithm integration and the self-control as claimed in claim 1, wherein the first manifold dimension reduction method adopts a distribution domain embedding algorithm, and the acquisition process of the third feature set specifically comprises the following steps:

5. The method for predicting reservoir productivity based on algorithm integration and self-control as claimed in claim 1, wherein the second manifold dimension reduction method adopts a local linear embedding algorithm, and the step of acquiring the fourth feature set specifically comprises:

calculating k nearest neighbors of each sample in the sample set;

6. The method for predicting productivity of a reservoir based on algorithm integration and self-control as claimed in any one of claims 1 to 5, wherein the first feature set, the second feature set, the third feature set and the fourth feature set are divided into corresponding training set and testing set by adopting a K-fold cross validation method.

7. The method for predicting the productivity of a reservoir based on the algorithm integration and the self-control as claimed in any one of claims 1 to 5, wherein the obtaining process of the final prediction model comprises:

8. The method as claimed in any one of claims 1 to 5, wherein the partial dependence plot is drawn using whichever of the two interpretable intelligent algorithms, the explainable boosting machine and LightGBM, has the best fitting effect.

9. A reservoir productivity prediction system based on algorithm integration and self-control, characterized by comprising:

a data processing module, used for ranking the first feature set using two interpretable intelligent algorithms, namely an explainable boosting machine and LightGBM, to obtain a second feature set composed of the main controlling factors, performing feature dimension reduction on the first feature set using a first manifold dimension reduction method to obtain a third feature set, and performing feature dimension reduction on the first feature set using a second manifold dimension reduction method to obtain a fourth feature set;

a data division module, used for dividing the first feature set, the second feature set, the third feature set and the fourth feature set respectively and fusing the results to obtain a training set and a test set;

a model building module, used for building, according to the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on an explainable boosting machine and a third prediction model based on LightGBM in sequence, and combining the first prediction model, the second prediction model and the third prediction model according to the weights to obtain a final prediction model;

and a prediction module, used for predicting the yield under a single factor according to the test set and the final prediction model, and obtaining the reservoir productivity in combination with a partial dependence plot.

10. A computer terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the reservoir productivity prediction method based on algorithm integration and self-control as claimed in any one of claims 1 to 7.