CN113283174B - Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control - Google Patents


Info

Publication number
CN113283174B
Authority
CN
China
Prior art keywords
feature set
prediction model
feature
reservoir
productivity
Legal status
Active
Application number
CN202110643996.8A
Other languages
Chinese (zh)
Other versions
CN113283174A (en)
Inventor
周长林
乐宏
刘飞
张华礼
周朗
陈伟华
付艳
吕泽飞
曾嵘
张曦
王茜
Current Assignee
Petrochina Co Ltd
Original Assignee
Petrochina Co Ltd
Application filed by Petrochina Co Ltd
Priority to CN202110643996.8A
Publication of CN113283174A
Application granted
Publication of CN113283174B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/22 Yield analysis or yield optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control, relating to the technical field of reservoir development. The key points of the technical scheme are: preprocessing productivity test data to obtain a first feature set; ranking the first feature set with two interpretable intelligent algorithms to obtain a second feature set, and reducing the dimensionality of the first feature set with a first and a second manifold dimension reduction method to obtain a third and a fourth feature set; dividing the feature sets into a training set and a test set; building a final prediction model based on an inductive learning algorithm, an explainable boosting machine (EBM) and LightGBM; and predicting the yield under a single factor and combining it with a partial dependence plot to obtain the reservoir productivity. The method can fully mine the latent relationships between the features and the prediction target and limits error propagation as far as possible; by enhancing the feature set and making the model self-organizing, it effectively improves the generalization capability and robustness of the model.

Description

Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control
Technical Field
The invention relates to the technical field of reservoir development, and in particular to a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control.
Background
With growing demand for oil and gas and continued technological progress, more and more reservoirs are being developed. Improving the degree of recovery and the utilization value of reservoirs has become a key problem in reservoir development. Establishing a systematic and sound scheme and technology for predicting reservoir productivity is therefore of great significance for efficient production optimization and management.
At present, reservoir productivity prediction methods fall into three categories: downhole testing, numerical simulation and intelligent algorithms. Although the safety of downhole test instruments has improved over earlier equipment, downhole testing remains risky and time-consuming. Numerical simulation predicts productivity accurately by building a refined geological model that describes complex reservoir dynamics, but because geological conditions are complex, building and solving the corresponding mathematical-physical model is difficult. In the big-data era, intelligent algorithms such as machine learning and fuzzy logic overcome the limitations of the first two approaches and continue to help the energy extraction industry unlock business potential.
However, existing applications of intelligent algorithms to reservoir productivity prediction have several shortcomings. First, reservoir development applications rely heavily on a single intelligent algorithm. Using only one algorithm or model for prediction may lead to the following: a. the method cannot effectively capture the mapping between complex data features and the prediction target, so prediction accuracy is poor; b. permeability, porosity and other properties differ markedly between reservoirs, and a single algorithm cannot guarantee effectiveness across all these situations, so the model lacks generalization capability. In addition, relying on a single algorithm makes it hard to achieve both high accuracy and interpretability: models with high prediction accuracy, such as deep neural networks, are often not interpretable and therefore cannot guide production optimization and management, while interpretable algorithms, such as random forests, predict relatively poorly. Losing either high accuracy or interpretability makes an intelligent algorithm inefficient in real reservoir productivity prediction. Second, in reservoir productivity prediction, features are usually selected manually; such feature selection lacks intelligence, and fixed features also leave the model short of generalization capability and robustness.
Therefore, researching and designing a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control is a problem that urgently needs to be solved.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a reservoir productivity prediction method, system and terminal based on algorithm integration and self-control, which integrates multiple intelligent algorithms into a reservoir productivity prediction method capable of self-organizing model control. The method offers high predictive performance together with interpretability; it also has generalization capability and robustness and effectively avoids model overfitting.
The technical purpose of the invention is realized by the following technical scheme:
In a first aspect, a reservoir productivity prediction method based on algorithm integration and self-control is provided, comprising the following steps:
acquiring productivity test data of a target reservoir and preprocessing the data to obtain a first feature set;
ranking the first feature set with each of two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, to obtain a second feature set composed of the main control factors; reducing the dimensionality of the first feature set with a first manifold dimension reduction method to obtain a third feature set; and reducing the dimensionality of the first feature set with a second manifold dimension reduction method to obtain a fourth feature set;
dividing each of the first, second, third and fourth feature sets and fusing the results to obtain a training set and a test set;
using the training set, building in turn a first prediction model based on an inductive learning algorithm, a second prediction model based on the explainable boosting machine and a third prediction model based on LightGBM, and combining the three models according to weight values to obtain a final prediction model;
drawing partial dependence plots between the main control factors and the reservoir productivity based on an interpretable intelligent algorithm;
and predicting with the test set and the final prediction model to obtain the predicted yield under a single factor, and combining it with the partial dependence plots to obtain the reservoir productivity.
Further, obtaining the first feature set specifically comprises:
acquiring the relevant parameters and productivity data of production layers that have completed productivity testing;
standardizing the acquired parameters and productivity data, storing them in a target database, and taking the combination of standardized features as the first feature set;
and filling missing values in the acquired data with a generative adversarial network.
Further, obtaining the second feature set specifically comprises:
fitting the training set divided from the first feature set with each of the two interpretable intelligent algorithms, the explainable boosting machine and LightGBM, and ranking the first feature set under each algorithm according to its fitting performance;
and calculating the importance score of every feature in the first feature set under each of the two interpretable models, re-ranking the features by performance weighting, and identifying the main control factors affecting reservoir productivity from the re-ranked result to obtain the second feature set.
Further, the first manifold dimension reduction method is the t-distributed stochastic neighbor embedding (t-SNE) algorithm, and obtaining the third feature set specifically comprises:
computing the similarities of the sample set in the observation space by converting the Euclidean distance between each pair of samples into a conditional probability;
expressing the joint probability distributions between data points in the same space through the corresponding conditional probability distributions;
and minimizing the difference between the high-dimensional and low-dimensional sample similarities to obtain the optimally reduced third feature set.
Further, the second manifold dimension reduction method is the locally linear embedding algorithm, and obtaining the fourth feature set specifically comprises:
finding the k nearest neighbors of each sample in the sample set;
computing the local covariance matrix of each sample and solving for the corresponding weight coefficient vector;
assembling the weight coefficient vectors of all samples into a weight coefficient matrix and computing the matrix M from it;
computing the N smallest eigenvalues of M and their corresponding eigenvectors;
and taking the matrix formed by the second through Nth eigenvectors as the low-dimensional sample set matrix.
Further, the first, second, third and fourth feature sets are each divided into corresponding training and test sets by K-fold cross-validation.
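A minimal sketch of the K-fold division, assuming that one shuffled index split is reused for all four feature sets so that their rows stay aligned (the text does not state exactly how the splits are fused). The function name `k_fold_indices` and the fixed seed are illustrative choices, not part of the patent.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n_samples-1 into k folds.

    Each fold serves once as the test set while the other k-1 folds form
    the training set. Reusing the returned indices for every feature set
    keeps the same reservoirs in the same split (an assumption).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((sorted(train), sorted(test)))
        start += size
    return splits
```

For example, `k_fold_indices(10, 3)` yields three (train, test) pairs whose test folds have sizes 4, 3 and 3 and together cover every sample exactly once.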
Further, obtaining the final prediction model specifically comprises:
building the first, second and third prediction models on each of the four feature sets (the first, second, third and fourth feature sets) from the training-set data;
obtaining the training loss values of the resulting 12 base learners (three model types on four feature sets), and assigning a weight to each of the 12 base learners according to its loss value by performance weighting;
and calculating the weight of each prediction model from the base-learner weight allocation, and integrating the prediction models with their corresponding weights to obtain the final prediction model.
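The text does not give the weighting formula at this point, so the sketch below assumes one common performance-weighting choice: each base learner's weight is proportional to the inverse of its training loss, and a prediction model's weight is the sum of the weights of its four base learners. The labels `"inductive"`, `"ebm"`, `"lgbm"` and `"fs1"` to `"fs4"` are hypothetical names for the three models and four feature sets.

```python
from typing import Dict, Tuple

# (prediction model, feature set) identifies one of the 3 x 4 = 12 base learners.
Key = Tuple[str, str]


def base_learner_weights(losses: Dict[Key, float]) -> Dict[Key, float]:
    """ASSUMPTION: weight each base learner by 1/loss, normalised to sum to 1."""
    if any(loss <= 0 for loss in losses.values()):
        raise ValueError("losses must be positive")
    inverse = {key: 1.0 / loss for key, loss in losses.items()}
    total = sum(inverse.values())
    return {key: value / total for key, value in inverse.items()}


def model_weights(weights: Dict[Key, float]) -> Dict[str, float]:
    """ASSUMPTION: a prediction model's weight is the sum over its base learners."""
    totals: Dict[str, float] = {}
    for (model, _feature_set), w in weights.items():
        totals[model] = totals.get(model, 0.0) + w
    return totals


def ensemble_predict(predictions: Dict[Key, float], weights: Dict[Key, float]) -> float:
    """Weighted sum of the base learners' predictions for one sample."""
    return sum(weights[key] * predictions[key] for key in weights)
```

Inverse-loss weighting rewards base learners that fit the training data better; any monotone decreasing function of the loss would serve the same role.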
Further, the partial dependence plots are drawn with whichever of the two interpretable intelligent algorithms, the explainable boosting machine or LightGBM, achieves the better fit.
In a second aspect, a reservoir productivity prediction system based on algorithm integration and self-control is provided, comprising:
a data acquisition module, configured to acquire productivity test data of a target reservoir and preprocess it to obtain a first feature set;
a data processing module, configured to rank the first feature set with two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, to obtain a second feature set composed of the main control factors, and to reduce the dimensionality of the first feature set with a first and a second manifold dimension reduction method to obtain a third and a fourth feature set, respectively;
a data division module, configured to divide each of the first, second, third and fourth feature sets and fuse the results into a training set and a test set;
a model building module, configured to build in turn, from the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on the explainable boosting machine and a third prediction model based on LightGBM, and to combine the three models according to weight values to obtain a final prediction model;
a plotting module, configured to draw partial dependence plots between the main control factors and the reservoir productivity based on an interpretable intelligent algorithm;
and a prediction module, configured to predict the yield under a single factor from the test set and the final prediction model and to obtain the reservoir productivity by combining the prediction with the partial dependence plots.
In a third aspect, a computer terminal is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the reservoir productivity prediction method based on algorithm integration and self-control according to any implementation of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
1. The calculation method is simple and convenient. Traditional reservoir yield prediction based on numerical simulation requires building a complex mathematical-physical model; because the real environment is complex, such a model cannot simulate actual conditions accurately, is difficult to build and is computationally expensive. The invention predicts reservoir productivity by integrating intelligent algorithms, with all data drawn from historical observations, which reduces manual intervention;
2. The invention balances the accuracy and interpretability of reservoir productivity prediction. By integrating a high-accuracy intelligent algorithm (the inductive learning algorithm) with interpretable intelligent algorithms (the explainable boosting machine and LightGBM), it achieves high-accuracy prediction of reservoir productivity and can also combine several interpretable models to identify the main control factors affecting productivity, providing a basis for efficient production optimization and management.
3. The calculation method is advanced and the model generalizes well. The data self-organizing feature generation strategy, which screens main control factors by combining models, together with the inductive learning algorithm and the performance weighting strategy for base learners, enables intelligent self-organizing control: when the prediction target or external conditions change unpredictably, the prediction model can promptly adjust its structure, which strengthens its generalization capability and robustness. The inductive learning algorithm used in the invention is effective for small-sample prediction problems, which addresses the poor predictive performance caused by scarce sample data in the early stage of reservoir development, so the method can be applied to current reservoir development projects earlier and more effectively than other methods. In addition, the explainable boosting machine and LightGBM used in the invention are not only interpretable but also more accurate than the random forest and XGBoost algorithms. More importantly, their low memory usage and faster training make them better suited to practical industrial applications.
4. The invention makes missing-data imputation intelligent. For multivariate or arbitrary missing patterns, traditional imputation methods must build several models to fill missing values of different attributes, which is computationally expensive. The invention fits the data distribution with a generative adversarial network for imputation; once trained, it fills all missing variables of a multivariate missing pattern in a single pass.
5. The invention makes feature engineering intelligent. Reservoir structures are highly complex and irregular, and manually chosen features lack accuracy and generalization capability. The method uses two manifold dimension reduction methods (t-SNE and locally linear embedding) together with data self-organizing feature generation, in which main control factors are screened by combining models, to build an enhanced feature set for the prediction model; this supports accurate productivity prediction and improves the generalization capability and robustness of the model.
In conclusion, the reservoir productivity prediction method based on algorithm integration and self-control can fully mine the latent relationships between the features and the prediction target and limits error propagation as far as possible. By integrating intelligent algorithms, enhancing the feature set and making the model self-organizing, it effectively improves the generalization capability and robustness of the model, effectively avoids overfitting, and outperforms other intelligent learning methods. By integrating high-accuracy and interpretable prediction models, it builds an efficient reservoir productivity prediction model that can identify the main control factors affecting productivity, offering a new approach to reservoir productivity prediction and production management optimization.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an implementation flowchart of an embodiment of the invention;
FIG. 2 is a flowchart of main control factor screening in an embodiment of the invention;
FIG. 3 is a schematic diagram of the missing-value imputation framework in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and the accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limiting the present invention.
Embodiment: as shown in FIG. 1, the reservoir productivity prediction method based on algorithm integration and self-control is implemented through the following steps.
Step 1: data collection and preprocessing.
Step 11: acquire the relevant parameters and productivity data of production layers that have completed productivity testing. Assume the original data set is an n × (m + 1) matrix A:
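$$A = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1m} & X_{1(m+1)} \\ X_{21} & X_{22} & \cdots & X_{2m} & X_{2(m+1)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nm} & X_{n(m+1)} \end{pmatrix}$$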
where each row corresponds to one reservoir, the first m elements of each row are the m factors influencing that reservoir's productivity, and the (m + 1)th element is the reservoir's productivity.
Step 12: apply max-min normalization to the acquired data and store the result in the target database. The resulting combination of features is taken as feature set 1.
The max-min normalization formula is:
$$x_{ab} = \frac{X_{ab} - X_{\min}}{X_{\max} - X_{\min}}$$
where $x_{ab}$ is the normalized value, $X_{ab}$ is the original value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum of the column containing $X_{ab}$.
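The normalized data can then be expressed as:
$$A_{\text{scale}} = \begin{pmatrix} x_{11} & \cdots & x_{1m} & x_{1(m+1)} \\ \vdots & \ddots & \vdots & \vdots \\ x_{n1} & \cdots & x_{nm} & x_{n(m+1)} \end{pmatrix} = \left( B,\; x_{m+1} \right)$$
where B, formed by the first m columns, is feature set 1:
$$B = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
or $B = (x_1, x_2, \ldots, x_m)$, where $x_j$ denotes the jth column.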
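A minimal pure-Python sketch of Step 12, assuming the data arrive as a list of rows with productivity in the last column. Mapping a constant column to 0 is an added assumption that avoids division by zero; the function names are illustrative.

```python
from typing import List, Tuple


def max_min_normalize(A: List[List[float]]) -> List[List[float]]:
    """Column-wise x_ab = (X_ab - X_min) / (X_max - X_min)."""
    n_cols = len(A[0])
    col_min = [min(row[b] for row in A) for b in range(n_cols)]
    col_max = [max(row[b] for row in A) for b in range(n_cols)]
    scaled = []
    for row in A:
        new_row = []
        for b, value in enumerate(row):
            span = col_max[b] - col_min[b]
            # Assumption: a constant column has no spread, so map it to 0.
            new_row.append((value - col_min[b]) / span if span else 0.0)
        scaled.append(new_row)
    return scaled


def split_features_target(A_scale: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Feature set 1 (B) is the first m columns; the last column is productivity."""
    B = [row[:-1] for row in A_scale]
    y = [row[-1] for row in A_scale]
    return B, y
```

For example, the column `[1, 3, 2]` is scaled to `[0.0, 1.0, 0.5]`.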
Step 13: fill missing data with a generative adversarial network, as shown in FIG. 3, through the following steps.
1) After replacing each missing value (NA) with 0, concatenate the ith sample $K_i$ of the real data set containing missing values with its corresponding missing-position vector $V_i$ to form the generator input $[K_i, V_i]$.
The generator output is denoted $G_i = G([K_i, V_i])$.
2) Compute the final imputation result $g_i = G_i \odot (1 - V_i) + K_i \odot V_i$, where $\odot$ denotes element-wise multiplication. For this formula to behave as described, $V_i$ takes the value 1 at observed entries and 0 at missing entries: observed values are left unchanged, and the generator's outputs are kept only at the missing positions as fill values.
3) Feed the generator-imputed data set and the complete real data set into the discriminator, which performs binary classification to distinguish the data sources.
4) During training, the generator tries to make the discriminator believe the imputed data come from the real data set; its parameter gradients come from the discriminator's evaluation of the generator's output, and it aims to minimize $\log(1 - D(G_i))$. The discriminator aims to identify the differences between the real data set and the imputed data set, and its parameters are updated to maximize $\log D(R_i) + \log(1 - D(G_i))$, where $D(\cdot)$ is the discriminator output and R is the real complete data set. When the input is the complete real data set, the discriminator output should be close to 1; when the input contains imputed data, it should be close to 0, which distinguishes the data sources.
5) Through the continuing adversarial game between the generator and the discriminator, the discriminator eventually cannot tell the data sources apart, yielding an effective generator for filling missing values in the data set.
Note that subsequent steps that use the first feature set may use either the version after missing-data imputation or the version before imputation.
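The generator and discriminator networks are not specified in the text, so this pure-Python sketch covers only the surrounding arithmetic: the input construction of step 1, the masking formula of step 2 and the two objectives of step 4, with discriminator outputs supplied as plain numbers in (0, 1). It assumes `None` marks NA and that `V_i` is 1 for observed entries, as the step 2 formula implies; all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def generator_input(k_i: Sequence[Optional[float]], v_i: Sequence[int]) -> List[float]:
    """Step 1: replace NA (None) by 0 and append the mask V_i to form [K_i, V_i]."""
    filled = [0.0 if x is None else float(x) for x in k_i]
    return filled + [float(v) for v in v_i]


def combine(g_i: Sequence[float], k_i: Sequence[Optional[float]], v_i: Sequence[int]) -> List[float]:
    """Step 2: g_i = G_i * (1 - V_i) + K_i * V_i, element-wise.

    With V_i = 1 where a value was observed and 0 where it was missing,
    observed values pass through and generator outputs fill only the gaps.
    """
    return [g * (1 - v) + (0.0 if k is None else k) * v for g, k, v in zip(g_i, k_i, v_i)]


def generator_objective(d_fake: Sequence[float]) -> float:
    """Step 4: mean of log(1 - D(G_i)); the generator minimises this."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)


def discriminator_objective(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Step 4: mean log D(R_i) + mean log(1 - D(G_i)); the discriminator maximises this."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term
```

For the sample `[1.5, NA, 3.0]` with mask `[1, 0, 1]` and a generator output `[9.0, 2.2, 9.0]`, `combine` returns `[1.5, 2.2, 3.0]`: only the missing middle entry takes the generated value.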
Step 2: feature engineering.
Step 21: main control factor identification and partial dependence plot visualization.
Rank feature importance with two interpretable intelligent algorithms (the explainable boosting machine and LightGBM), re-rank the features by combining the two algorithms through performance weighting, identify the main control factors affecting reservoir productivity, and generate feature set 2. Then draw partial dependence plots for the main control factors. As shown in FIG. 2, this is achieved through the following steps.
(1) After obtaining the normalized data set $A_{\text{scale}}$ in Step 1, divide it into a training set and a test set (Step 3).
(2) Fit the training set with the explainable boosting machine (EBM) and with LightGBM, and evaluate each model's fit on the training set by R². The R² obtained after fitting the training set with the explainable boosting machine is denoted $R^2_{\text{EBM}}$, and the R² obtained after fitting the training set with LightGBM is denoted $R^2_{\text{LGBM}}$.
(3) After fitting, compute the importance score of each feature under the two interpretable models; the importance scores of the ith feature under the two models are denoted $s_i^{\text{EBM}}$ and $s_i^{\text{LGBM}}$, respectively.
The feature importance score is the Gini importance, calculated as follows:
a. Compute the Gini index GI(a) of each node:
$$GI(a) = 1 - \sum_{k=1}^{K} p^2(k \mid a)$$
where K is the number of classes and $p(k \mid a)$ is the proportion of class k in node a.
b. Compute the importance of feature $x_i$ at node a, i.e., the change in the Gini index before and after node a is split, denoted $GID_{ia}$: the Gini index of the node minus the Gini indices of its two child nodes:
$$GID_{ia} = GI(a) - GI(l) - GI(r)$$
where GI(l) and GI(r) are the Gini indices of the two new nodes produced by splitting node a.
c. If the nodes at which feature $x_i$ appears in decision tree j form the set M, the importance of $x_i$ in tree j is:
$$score_{ij} = \sum_{a \in M} GID_{ia}$$
d. If the model contains t decision trees in total, the importance of feature $x_i$ is:
$$score_i = \sum_{j=1}^{t} score_{ij}$$
e. Normalize the importance $score_i$ of each feature $x_i$:
$$score_i = \frac{score_i}{\sum_{c=1}^{m} score_c}$$
After normalization the importance scores lie between 0 and 1, and a larger value indicates a more important feature.
(4) Compute the final importance score of each feature by performance weighting; for the ith feature:
Figure BDA0003108312020000083
(5) Sort the features in descending order of final importance score and take the top N features as the main control factors.
(6) Compare $R^2_{\text{EBM}}$ and $R^2_{\text{LGBM}}$ and select the algorithm with the larger R², i.e., the interpretable intelligent algorithm with the better fit. Based on the selected algorithm, draw partial dependence plots between the main control factors and reservoir productivity.
A partial dependence plot shows how a given feature affects the prediction result. The specific steps are as follows:
a. Assume that the intelligent algorithm relates the influencing factors to reservoir productivity as:
$$\hat{y}_i = f(x_{i1}, x_{i2}, \ldots, x_{im})$$
where $\hat{y}_i$ is the predicted value for sample i and $x_{ij}$ is the jth feature value of sample i.
b. Suppose the kth feature $x_k$ is a main control factor. Keep the other feature values $x_{i1}, \ldots, x_{i(k-1)}, x_{i(k+1)}, \ldots, x_{im}$ unchanged, vary the value of $x_k$ from small to large, and for each value compute the partial dependence $\hat{f}_k(x_k)$ of reservoir productivity on $x_k$:
$$\hat{f}_k(x_k) = \frac{1}{n} \sum_{i=1}^{n} f(x_{i1}, \ldots, x_{i(k-1)}, x_k, x_{i(k+1)}, \ldots, x_{im})$$
c. Draw the partial dependence plot with the value of $x_k$ on the abscissa and $\hat{f}_k(x_k)$ on the ordinate to visualize how reservoir productivity varies with the main control factor $x_k$ and to help the relevant personnel make management decisions. If the curve trends upward, the value of that factor should tend to be increased to improve reservoir productivity, and vice versa.
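The arithmetic of Step 21 can be sketched in pure Python as below. The Gini importance follows steps a to e literally, with each split supplied as per-class sample counts (a hypothetical input format); note that common implementations such as scikit-learn weight child impurities by their sample fractions, which the formula in step b does not. The source does not spell out the performance-weighting formula of step (4) in text form, so `weighted_importance` assumes an R²-weighted average. The partial dependence curve implements step b of the plotting procedure for any callable model `f`.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical input format: (feature index, parent counts, left-child counts, right-child counts)
Split = Tuple[int, Sequence[int], Sequence[int], Sequence[int]]


def gini(counts: Sequence[int]) -> float:
    """Step a: GI(a) = 1 - sum_k p(k|a)^2 from per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_importance(trees: Sequence[Sequence[Split]], m: int) -> List[float]:
    """Steps b-e: sum GID_ia over nodes and trees, then normalise to sum to 1."""
    score = [0.0] * m
    for tree in trees:                                   # step d: sum over the t trees
        for feature, parent, left, right in tree:        # step c: nodes where x_i splits
            score[feature] += gini(parent) - gini(left) - gini(right)  # step b: GID_ia
    total = sum(score)
    if total == 0:
        raise ValueError("no impurity decrease recorded")
    return [s / total for s in score]                    # step e


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination used to rate each model's fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def weighted_importance(s_ebm: Sequence[float], s_lgbm: Sequence[float],
                        r2_ebm: float, r2_lgbm: float) -> List[float]:
    """ASSUMPTION: R^2-weighted average of the two models' importance scores."""
    return [(r2_ebm * a + r2_lgbm * b) / (r2_ebm + r2_lgbm) for a, b in zip(s_ebm, s_lgbm)]


def top_n(scores: Sequence[float], n: int) -> List[int]:
    """Step (5): indices of the N highest-scoring features (main control factors)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def partial_dependence(f: Callable[[List[float]], float], X: Sequence[Sequence[float]],
                       k: int, grid: Sequence[float]) -> List[float]:
    """PDP step b: set x_k to each grid value in every sample and average f."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[k] = value                          # other features stay fixed
            total += f(modified)
        curve.append(total / len(X))
    return curve
```

With the toy model `f(x) = 2*x[0] + x[1]` and samples `[[1, 1], [2, 3]]`, the partial dependence on feature 0 at grid values 0 and 1 is `[2.0, 4.0]`, an upward trend that, per step c, suggests increasing that factor.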
Step 22: reduce the m-dimensional feature set 1 to 2 dimensions with the manifold dimension reduction method t-distributed stochastic neighbor embedding (t-SNE) to generate feature set 3. Given the sample set $D = \{x_1, x_2, \ldots, x_n\}$, t-SNE proceeds as follows:
(1) Compute the similarity of the samples in D in the observation space. The Euclidean distance between two samples $x_i$ and $x_j$ in the observation space is converted into a conditional probability $p_{j|i}$:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}$$
where $\sigma_i$ is the standard deviation of the Gaussian distribution centered on $x_i$.
The points onto which $x_i$ and $x_j$ are mapped in the low-dimensional space are denoted $y_i$ and $y_j$, and their similarity is expressed as the conditional probability $q_{j|i}$, computed with a Student t-distribution:
$$q_{j|i} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq i} \left(1 + \lVert y_i - y_k\rVert^2\right)^{-1}}$$
(2) The joint probability distributions $p_{ij}$ and $q_{ij}$ between data points in each space are obtained from the corresponding conditional probabilities:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$
$$q_{ij} = \frac{q_{j|i} + q_{i|j}}{2n}$$
(3) Minimize the difference between the high-dimensional similarity distribution P and the low-dimensional similarity distribution Q to obtain the optimal dimension-reduction result. The difference is measured with the KL divergence:
$$C = KL(P \parallel Q) = \sum_{i}\sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
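The quantities in steps (1) to (3) can be computed directly with NumPy. This is a minimal sketch, assuming a fixed bandwidth `sigma` per sample (standard t-SNE instead tunes each $\sigma_i$ by a perplexity search) and following the symmetrized-conditional form of step (2) for both P and Q; standard t-SNE normalizes $q_{ij}$ over all pairs instead. The gradient-descent loop that actually moves the $y_i$ is omitted, so this only evaluates the objective.

```python
import numpy as np

def sq_dists(Z):
    # pairwise squared Euclidean distances ||z_i - z_j||^2
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)

def cond_p(X, sigma):
    """p_{j|i}: Gaussian kernel on high-dimensional distances, normalized over j != i."""
    K = np.exp(-sq_dists(X) / (2.0 * sigma[:, None] ** 2))
    np.fill_diagonal(K, 0.0)
    return K / K.sum(axis=1, keepdims=True)

def cond_q(Y):
    """q_{j|i}: Student-t (one degree of freedom) kernel on low-dimensional distances."""
    K = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(K, 0.0)
    return K / K.sum(axis=1, keepdims=True)

def joint(C):
    # (c_{j|i} + c_{i|j}) / 2n; each row of C sums to 1, so the result sums to 1
    return (C + C.T) / (2.0 * C.shape[0])

def kl(P, Q):
    # KL(P || Q) over pairs with p_ij > 0 (q_ij > 0 off the diagonal)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))       # 20 samples, 5 features (illustrative)
Y = rng.normal(size=(20, 2))       # a random 2-D starting embedding
P = joint(cond_p(X, np.ones(20)))
Q = joint(cond_q(Y))
cost = kl(P, Q)
```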
Step 23: Reduce the m-dimensional feature set 1 to 2 dimensions with the manifold learning method locally linear embedding (LLE) to generate feature set 4.
Given the sample set $D = \{x_1, x_2, \ldots, x_n\}$, LLE proceeds as follows:
1) For each sample $x_i$, find its $k$ nearest neighbors $K_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$.
2) For each sample $x_i$, compute its local covariance matrix and solve for the corresponding weight coefficient vector $W_i$.
3) Combine the weight coefficient vectors $W_i$ of all samples into the weight coefficient matrix $W$ and compute the matrix M:
$$M = (I - W)(I - W)^T$$
where $I$ is the identity matrix.
4) Compute the 3 smallest eigenvalues of M and their corresponding eigenvectors.
5) The matrix formed by the 2nd and 3rd eigenvectors (the first, which belongs to the near-zero eigenvalue, is discarded) is the solved low-dimensional sample set matrix.
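Steps 1) to 5) can be sketched in NumPy as follows. This is a minimal sketch, assuming a column convention for W (column i holds $W_i$, so $x_i \approx \sum_j W_{ji} x_j$), which is what makes $M = (I - W)(I - W)^T$ the right matrix; the ridge term `reg` and the swiss-roll test data are assumptions added for numerical stability and illustration.

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Locally linear embedding following steps 1)-5) above."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]     # step 1: k nearest neighbours, skipping x_i itself
        Z = X[nbrs] - X[i]                               # neighbours centred on x_i
        C = Z @ Z.T                                      # step 2: local covariance (Gram) matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)     # ridge term in case C is singular
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[nbrs, i] = w / w.sum()                         # column i holds W_i; weights sum to 1
    I = np.eye(n)
    M = (I - W) @ (I - W).T                              # step 3
    vals, vecs = np.linalg.eigh(M)                       # step 4: eigenvalues in ascending order
    return vecs[:, 1:n_components + 1], W                # step 5: drop the first eigenvector

rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1 + 2 * rng.uniform(size=100))
h = rng.uniform(0, 10, size=100)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])  # 3-D swiss roll, 100 samples
Y, W = lle(X, n_neighbors=8)
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the embedding columns are orthonormal, matching the usual LLE constraint on the low-dimensional coordinates.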
Step 3: Divide the data into a training set and a test set using K-fold cross-validation.
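A K-fold split needs only a shuffled index array. This is a minimal sketch, assuming the same fold indices are reused for all four feature sets so that rows stay aligned across them (one plausible reading of the "fusing" step in claim 1); `kfold_indices` is a hypothetical helper, not part of the patent.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)          # k nearly equal folds
    for f in range(k):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(k) if g != f])
        yield train_idx, test_idx

# the same index pairs would be applied to feature sets 1-4
splits = list(kfold_indices(23, k=5))
```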
Step 4: Construct the integrated intelligent-algorithm prediction model.
Step 41: Using the data in the training set, construct an inductive learning algorithm model, an explainable boosting machine (EBM) model and a LightGBM model for each of the four feature combinations (feature set 1, feature set 2, feature set 3 and feature set 4).
The principle of the inductive learning algorithm is as follows:
The inductive learning algorithm has a multilayer network of neurons and can establish a high-order polynomial relationship between the independent and dependent variables, yielding a polynomial model that explains the dependent variable. It is implemented as follows:
a. Generate the initial network structure: create the d neurons of the first layer, where d is the number of input variables.
b. Compute the external criterion value of each intermediate candidate model and pass the subset of models with the best external criterion values to the next layer. The minimum deviation criterion is used as the neuron selection criterion:
[Formula not recoverable: minimum deviation criterion]
where Y is the output value of the dependent variable, N is the sample size, and $Y_{jkl}$ is the predicted value of the k-th neuron in the j-th layer for the l-th sample.
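One layer of this self-organizing polynomial network can be sketched as follows. This is a minimal sketch, assuming GMDH-style neurons (a quadratic polynomial in each pair of inputs) and assuming one common form of the minimum bias criterion, in which two models fitted on halves A and B of the data are compared over all N samples; the patent's exact criterion formula is not recoverable, so `gmdh_layer` and its scoring are illustrative only.

```python
import itertools
import numpy as np

def quad_terms(a, b):
    # quadratic partial description of one neuron: 1, a, b, ab, a^2, b^2
    return np.column_stack([np.ones_like(a), a, b, a * b, a ** 2, b ** 2])

def lstsq(F, y):
    return np.linalg.lstsq(F, y, rcond=None)[0]

def gmdh_layer(X, y, keep=3):
    """Build every pairwise neuron, score it, and keep the best `keep` for the next layer."""
    half = X.shape[0] // 2
    scored = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        F = quad_terms(X[:, i], X[:, j])
        coef_a = lstsq(F[:half], y[:half])      # fitted on subset A
        coef_b = lstsq(F[half:], y[half:])      # fitted on subset B
        # assumed minimum bias criterion: disagreement of the two fits over all N samples
        crit = np.sum((F @ coef_a - F @ coef_b) ** 2) / np.sum(y ** 2)
        scored.append((crit, (i, j), lstsq(F, y)))
    scored.sort(key=lambda s: s[0])
    best = scored[:keep]
    # outputs of the surviving neurons become the inputs of the next layer
    next_X = np.column_stack([quad_terms(X[:, i], X[:, j]) @ c for _, (i, j), c in best])
    return best, next_X

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 4))
y = X[:, 0] * X[:, 1]          # toy target that a single neuron can represent exactly
best, next_X = gmdh_layer(X, y)
```

Layers would be stacked by calling `gmdh_layer` again on `next_X` until the best criterion value stops improving.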
Step 42: Assign weights to the 12 (= 3 × 4) base learners using a performance weighting method. Let the training loss values of the inductive learning algorithm, EBM and LightGBM models be [symbols not recoverable], corresponding to the three intelligent algorithms trained on the four feature sets. The weight [symbol not recoverable] of the inductive learning model trained on feature set i can be calculated as:
[Formula not recoverable: performance-based weight]
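Because the weight formula is not recoverable, the sketch below assumes a common inverse-loss form of performance weighting, $w_i = (1/L_i) / \sum_j (1/L_j)$, normalized jointly over all 12 base learners; the loss values are hypothetical placeholders, not results from the patent.

```python
import numpy as np

def performance_weights(losses):
    """Assumed inverse-loss weighting: lower training loss -> larger weight, weights sum to 1."""
    inv = 1.0 / np.asarray(losses, dtype=float)
    return inv / inv.sum()

# hypothetical training losses: rows = inductive learning, EBM, LightGBM; columns = feature sets 1-4
losses = np.array([[0.20, 0.25, 0.40, 0.35],
                   [0.10, 0.15, 0.30, 0.30],
                   [0.12, 0.18, 0.28, 0.33]])
weights = performance_weights(losses.ravel()).reshape(losses.shape)
```

Normalizing per algorithm instead of across all 12 learners is an equally plausible reading; only the `ravel`/`reshape` step would change.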
Step 43: Combine the outputs of the base learners according to their weights to obtain the prediction model output, thereby constructing the prediction model.
Step 5: Predict reservoir productivity based on algorithm integration and data self-organizing control.
Step 51: Apply the integrated intelligent-algorithm prediction model from Step 4 to the test set obtained in Step 3 to obtain the predicted production for the test set. For sample k in the test set, let the predictions of the 12 (= 3 × 4) base learners be [symbols not recoverable]. The final prediction [symbol not recoverable] is calculated from these predictions and the weights obtained in Step 42:
[Formula not recoverable: weighted final prediction]
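A weighted combination of the 12 base-learner outputs can be sketched as below. This is a minimal sketch, assuming the final prediction is the weight-normalized sum of the base predictions (the exact formula is not recoverable); `ensemble_predict` and the toy inputs are hypothetical.

```python
import numpy as np

def ensemble_predict(base_preds, weights):
    """Weighted combination of base-learner predictions.

    base_preds: (12, n_test) array, one row per base learner.
    weights: 12 non-negative weights, renormalized here to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(base_preds, dtype=float)

# six learners predict 10, six predict 20, for 4 test samples
base_preds = np.vstack([np.full(4, 10.0)] * 6 + [np.full(4, 20.0)] * 6)
y_hat = ensemble_predict(base_preds, [1.0] * 6 + [3.0] * 6)   # 0.25*10 + 0.75*20 = 17.5
```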
Step 52: Evaluate the prediction performance using the mean squared error and R² metrics.
The two metrics are calculated as follows:
Mean squared error (MSE):
$$MSE = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2$$
Coefficient of determination (R²):
$$R^2 = 1 - \frac{\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^2}{\sum_{n=1}^{N}\left(y_n - \bar{y}\right)^2}$$
where N is the number of test set samples, $y_n$ is the true value, $\hat{y}_n$ is the predicted value and $\bar{y}$ is the mean of the true values. The smaller the MSE and the larger the R², the better the model's predictive performance.
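Both metrics follow directly from the formulas above. A short sketch with a hand-checkable example: residuals (0, 0, -1) give MSE = 1/3, and with $\bar{y} = 2$ the total sum of squares is 2, so R² = 1 - 1/2 = 0.5.

```python
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares around the mean
    return float(1.0 - ss_res / ss_tot)

print(mse([1, 2, 3], [1, 2, 4]), r2([1, 2, 3], [1, 2, 4]))  # → 0.3333333333333333 0.5
```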
Working principle: the reservoir productivity prediction method based on algorithm integration and self-control fully mines the latent relationships between the features and the prediction target while minimizing error propagation. Integrating intelligent algorithms, enhancing the feature set and self-organizing the model effectively improve the generalization ability and robustness of the model and avoid overfitting, making the method superior to other intelligent learning methods. Combining a high-precision prediction model with interpretable prediction models yields an efficient reservoir productivity prediction model that can also identify the main control factors affecting productivity, providing a new approach to reservoir productivity prediction and production management optimization.
The embodiments described above further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are merely exemplary embodiments and are not intended to limit the scope of the present invention; any modification, equivalent substitution, improvement or the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A reservoir productivity prediction method based on algorithm integration and self-control, characterized by comprising the following steps:
acquiring productivity test data of a target reservoir and preprocessing the productivity test data to obtain a first feature set;
ranking the first feature set with each of two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, to obtain a second feature set formed by the main control factors; performing feature dimension reduction on the first feature set with a first manifold dimension reduction method to obtain a third feature set; and performing feature dimension reduction on the first feature set with a second manifold dimension reduction method to obtain a fourth feature set;
dividing each of the first, second, third and fourth feature sets and fusing the results to obtain a training set and a test set;
establishing, from the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on an explainable boosting machine and a third prediction model based on LightGBM, and combining the first, second and third prediction models according to weight values to obtain a final prediction model;
drawing partial dependence plots between the main control factors and reservoir productivity based on an interpretable intelligent algorithm;
and predicting with the test set and the final prediction model to obtain the predicted production under a single factor, and obtaining the reservoir productivity in combination with the partial dependence plots.
2. The reservoir productivity prediction method based on algorithm integration and self-control according to claim 1, wherein obtaining the first feature set comprises:
acquiring relevant parameters and productivity data of production intervals that have completed productivity testing;
standardizing the acquired parameters and productivity data, storing them in a target database, and combining the standardized features into the first feature set;
and filling in missing values in the acquired data using a generative adversarial network.
3. The reservoir productivity prediction method based on algorithm integration and self-control according to claim 1, wherein obtaining the second feature set comprises:
fitting the training set divided from the first feature set with each of two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, and ranking the first feature set according to each fitting result;
and calculating the importance score of every feature in the first feature set under each of the two interpretable models, re-ranking the feature importances by performance weighting, and identifying the main control factors affecting formation productivity from the re-ranked result to obtain the second feature set.
4. The reservoir productivity prediction method based on algorithm integration and self-control according to claim 1, wherein the first manifold dimension reduction method is t-distributed stochastic neighbor embedding, and obtaining the third feature set specifically comprises:
calculating the similarity of the sample set in the observation space by converting the Euclidean distance between two samples in the observation space into a conditional probability;
representing the joint probability distribution between data points in each space by the corresponding conditional probability distributions;
and minimizing the difference between the high-dimensional and low-dimensional sample similarities to obtain the third feature set after optimal dimension reduction.
5. The reservoir productivity prediction method based on algorithm integration and self-control according to claim 1, wherein the second manifold dimension reduction method is locally linear embedding, and obtaining the fourth feature set specifically comprises:
calculating the k nearest neighbors of each sample in the sample set;
calculating the local covariance matrix of each sample and solving for the corresponding weight coefficient vector;
combining the weight coefficient vectors of all samples into a weight coefficient matrix and calculating a matrix M;
calculating the first N (smallest) eigenvalues of the matrix M and the eigenvectors corresponding to them;
and extracting the matrix formed by the second through N-th eigenvectors to obtain a low-dimensional sample set matrix.
6. The reservoir productivity prediction method based on algorithm integration and self-control according to any one of claims 1 to 5, wherein the first, second, third and fourth feature sets are each divided into a corresponding training set and test set by K-fold cross-validation.
7. The reservoir productivity prediction method based on algorithm integration and self-control according to any one of claims 1 to 5, wherein obtaining the final prediction model comprises:
constructing, from the data in the training set, the first, second and third prediction models for each of the four feature combinations, namely the first, second, third and fourth feature sets;
obtaining the 12 loss values from training the first, second and third prediction models, and assigning weight values to the 12 base learners according to the loss values using a performance weighting method;
and calculating the weight of each prediction model from the base-learner weight assignment, and integrating the prediction models with their corresponding weights to obtain the final prediction model.
8. The method according to any one of claims 1 to 5, wherein the partial dependence plots are drawn with whichever of the two interpretable intelligent algorithms, the explainable boosting machine and LightGBM, has the better fitting effect.
9. A reservoir productivity prediction system based on algorithm integration and self-control, characterized by comprising:
a data acquisition module, configured to acquire productivity test data of a target reservoir and preprocess the productivity test data to obtain a first feature set;
a data processing module, configured to rank the first feature set with each of two interpretable intelligent algorithms, an explainable boosting machine and LightGBM, to obtain a second feature set formed by main control factors, perform feature dimension reduction on the first feature set with a first manifold dimension reduction method to obtain a third feature set, and perform feature dimension reduction on the first feature set with a second manifold dimension reduction method to obtain a fourth feature set;
a data division module, configured to divide each of the first, second, third and fourth feature sets and fuse the results to obtain a training set and a test set;
a model building module, configured to build, from the training set, a first prediction model based on an inductive learning algorithm, a second prediction model based on an explainable boosting machine and a third prediction model based on LightGBM, and to combine the first, second and third prediction models according to weight values to obtain a final prediction model;
a plotting module, configured to draw partial dependence plots between the main control factors and reservoir productivity based on the interpretable intelligent algorithm;
and a prediction module, configured to predict the production under a single factor from the test set and the final prediction model and to obtain the reservoir productivity in combination with the partial dependence plots.
10. A computer terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the reservoir productivity prediction method based on algorithm integration and self-control according to any one of claims 1 to 7.
CN202110643996.8A 2021-06-09 2021-06-09 Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control Active CN113283174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110643996.8A CN113283174B (en) 2021-06-09 2021-06-09 Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control

Publications (2)

Publication Number Publication Date
CN113283174A CN113283174A (en) 2021-08-20
CN113283174B true CN113283174B (en) 2022-08-30

Family

ID=77283970

Country Status (1)

Country Link
CN (1) CN113283174B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant