CN110807544A

CN110807544A - Oil field residual oil saturation distribution prediction method based on machine learning

Info

Publication number: CN110807544A
Application number: CN201910951088.8A
Authority: CN
Inventors: 宋洪庆; 张启涛; 李正一; 都书一; 王九龙
Original assignee: University of Science and Technology Beijing USTB
Current assignee: Qingdao Dongkunwei Huashuzhi Energy Technology Co ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2020-02-18
Anticipated expiration: 2039-10-08
Also published as: CN110807544B

Abstract

The invention relates to a method for predicting the saturation distribution of residual oil in an oil field, comprising the following steps: (1) acquiring a sample data set from historical data of an oil field block, wherein the sample data set comprises dynamic sample data and static sample data; (2) normalizing the sample data set; (3) performing feature relevance compression on the static sample data in the normalized sample data set; (4) performing dimensionality reduction on the normalized and compressed sample data set while retaining the time dimension; (5) dividing the normalized, compressed and dimension-reduced sample data set into a training set and a test set; (6) constructing an input set of the training set and an input set of the test set; (7) training a weight matrix and a bias term on the training set input set using a machine learning method, and performing reinforced training on the key data units to obtain an optimal training model; (8) obtaining a test set output set from the optimal training model, and performing inverse normalization and dimension-increasing processing. The method can be applied to rapid prediction of the residual oil exploitation capacity under current complex geological conditions, and has high prediction accuracy and adaptability.

Description

Oil field residual oil saturation distribution prediction method based on machine learning

Technical Field

The invention belongs to the field of oilfield development, relates to a method for predicting the saturation distribution of residual oil in an oilfield, and particularly relates to a method for predicting the distribution of the residual oil based on a machine learning algorithm.

Background

During oil field development, reservoir heterogeneity and the production mode usually leave a large amount of residual oil in the reservoir, and predicting its distribution is of great value to oil field production. Accurate prediction of the reservoir's oil saturation distribution helps in formulating a reasonable development policy so that the residual oil potential can be better tapped, and the accuracy of the prediction directly affects the outcome of future development. Numerical simulation is widely used in the petroleum industry: reservoir numerical simulation solves, by numerical methods, a system consisting of nonlinear parabolic partial differential equations, auxiliary equations, boundary conditions and initial conditions, and thereby computes the oil saturation and pressure distribution of the reservoir. After decades of development, the theory of reservoir numerical calculation has gradually matured, and various simulation techniques and methods continue to be developed. However, for complex models, history matching and prediction calculations take a long time, so the cost of prediction is relatively high. With the rapid development of computer technology, machine learning and artificial intelligence are gradually being applied in the petroleum industry. Machine learning is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other subjects.
Machine learning methods are mainly applied to data analysis, pattern discovery and the prediction of target variables; the field studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its performance. In the petroleum industry, simple machine learning methods have so far only been borrowed as they are and do not address the problems encountered in petroleum exploitation in a targeted manner.

Disclosure of Invention

Aiming at the problem that conventional reservoir numerical simulation cannot rapidly solve complex oilfield exploitation prediction, the invention provides a residual oil saturation distribution prediction method based on machine learning. A system designed with the method can rapidly predict the residual oil exploitation capacity under current complex geological conditions, has high prediction accuracy and adaptability, computes quickly, avoids the weakening of sample data at key nodes of the oil exploitation process that occurs with general machine learning methods, and can solve complex oil field exploitation prediction problems well.

To achieve the above object, the invention uses a machine learning algorithm as the main method of the prediction calculation. The basic prediction process is as follows:

(1) Acquiring a sample data set from historical data of the oilfield block, wherein the sample data set comprises dynamic sample data and static sample data. Step (1) introduces multiple kinds of historical data of the oilfield block, so that the prediction draws on several data types and accounts for the potential influence of multiple factors.

(2) Normalizing the sample data set.

(3) Performing feature relevance compression on the static sample data in the normalized sample data set. Step (3) introduces a feature relevance compression method that compresses the static sample data while preserving prediction accuracy, which improves computational efficiency.

(4) Performing dimensionality reduction on the normalized and compressed sample data set while retaining the time dimension.

(5) Dividing the normalized, compressed and dimension-reduced sample data set into a training set and a test set.

(6) Constructing an input set of the training set and an input set of the test set.

(7) Training the weight matrix and the bias term on the training set input set using a machine learning method, and performing reinforced training on the key data units to obtain an optimal training model. Step (7) applies a reinforced training method so that the sample data of the key data units are not weakened by prolonged training.

(8) Obtaining a test set output set from the optimal training model, and performing inverse normalization and dimension-increasing processing to obtain the predicted oil field residual oil saturation distribution.

(9) Verifying the validity of the prediction result after inverse normalization and dimension-increasing processing using the average absolute relative error (AARD) method.

Wherein, the historical data of the oilfield block in the step (1) often comprises static data and dynamic data. More kinds of historical data may make the accuracy of the machine learning algorithm higher, but more historical data may also increase the computation time.

Further, in order to reduce the time consumption of calculation, the feature relevance compression in step (3) can improve the efficiency of machine learning. The feature relevance compression method is characterized in that static sample data which does not change along with time in the sample data is compressed through solving of a covariance matrix, so that the static sample data is changed into one-dimensional feature sample data capable of representing multi-dimensional static sample data, the feature sample data can represent original static sample data, the calculated amount in a machine learning cycle process can be reduced, and the calculation efficiency is further improved.

Further, in order to reduce the computation time, the dimension reduction process in step (4) may improve the efficiency of machine learning. The dimension reduction processing means that original high-dimension sample data is converted into a one-dimension vector form through a preprocessing reading method on the premise that the sample data is not lost, and the dimension reduction method can obviously improve the calculation efficiency.

The sample data of a key data node in step (7) differ markedly from the sample data around that node, but as the machine learning method iterates, the production data at the node can be weakened.

Weakening of the production data means that the difference between the sample data of the key node and the surrounding sample data becomes smaller than it is in the original sample data. This occurs because the machine learning method has no real physical conditions to correct it: under real physical conditions the sample data of key nodes should differ from the surrounding sample data, yet introducing physical laws into the machine learning method would markedly reduce its computational efficiency.

Further, to correct the weakening of key nodes, the reinforced training method in step (7) judges whether a key node has been weakened from the variation of its sample data in the time dimension, and then applies a reinforcing correction to the key node according to a first-order mean and a second-order mean in the time dimension. The key node then conforms better to real physical conditions, which improves the accuracy of machine learning without reducing its computational efficiency.

The beneficial technical effects of the invention are as follows:

1. The method designs a computational model framework based on a machine learning algorithm to predict production data in oil field development, and improves prediction accuracy by introducing multiple kinds of oil field production data.

2. The method provides a feature relevance compression method that compresses the dimension of the static data while keeping the prediction accuracy unchanged, improving computational efficiency.

3. The method introduces a reinforced training method for sample data: where the sample data of key nodes are weakened, a forced value increase along the time dimension ensures that the sample data of key nodes are not assimilated by the data of surrounding units, so that they better conform to physical meaning.

To solve the above technical problems and achieve the above beneficial technical effects, the specific technical scheme of the invention is as follows:

The invention relates to a method for predicting the saturation distribution of residual oil in an oil field, which comprises the following steps:

Step (1), collecting multiple production parameters of an oil field block in the time dimension to construct a sample data set comprising dynamic sample data and static sample data: 4 production parameters (historical residual oil saturation, formation pressure, oil production and water production) are extracted as the dynamic sample data, and 3 production parameters (porosity, permeability and residual water saturation) are extracted as the static sample data; generating a sample data set $F = (x_{i,j,k,n,t})$ in the form of a 5-dimensional matrix, wherein x is one sample datum, i is the i-th row of the sample data, j is the j-th column, and k is the k-th layer; n is the n-th production parameter, the production parameters being arranged in the order historical residual oil saturation, formation pressure, oil production, water production, porosity, permeability and residual water saturation; t denotes the sample data of the t-th month; the units of the sample data are dimensionless;

Step (2), normalizing the sample data set F of step (1): the sample data of the different production parameters in the sample data set F are each normalized, and all sample data are processed with the normalization method to obtain the normalized sample data set $\hat{F}$;

Step (3), performing feature relevance compression on the static sample data in the sample data set normalized in step (2). Classified by production parameter, the normalized sample data set can be expressed as:

$$\hat{F} = \{\hat{S}_o,\ \hat{P},\ \hat{Q}_o,\ \hat{Q}_w,\ \hat{\phi},\ \hat{K},\ \hat{S}_{wr}\}$$

wherein the dynamic sample data of the oilfield block comprise 4 kinds: historical remaining oil saturation $\hat{S}_o$, formation pressure $\hat{P}$, oil production $\hat{Q}_o$ and water production $\hat{Q}_w$; and the static sample data of the oilfield block comprise 3 kinds: porosity $\hat{\phi}$, permeability $\hat{K}$ and residual water saturation $\hat{S}_{wr}$. Feature relevance compression is performed on the 3 kinds of static sample data to obtain a one-dimensional feature sample data vector $\hat{Z}$, such that $\hat{Z}$ replaces the original multi-dimensional static sample data set $\{\hat{\phi}, \hat{K}, \hat{S}_{wr}\}$. This gives the feature-relevance-compressed sample data set $\hat{F}_c = \{\hat{S}_o, \hat{P}, \hat{Q}_o, \hat{Q}_w, \hat{Z}\}$; the sample data set is thereby compressed from the original 7 kinds of production data to 5 in the production data dimension;

Step (4), performing dimensionality reduction on the feature-relevance-compressed sample data set obtained in step (3) while retaining the time dimension, to obtain the dimension-reduced sample data set;

Step (5), dividing the dimension-reduced sample data set obtained in step (4) into a training set F_train and a test set F_test; preferably, the first 80% of the data volume in the time dimension of the dimension-reduced sample data set obtained in step (4) is used as the training set F_train, and the remaining 20% as the test set F_test;

Step (6), constructing an input set X_train of the training set F_train and an input set X_test of the test set F_test;

Step (7), training the weight matrix and the bias term on the input set X_train of the training set F_train obtained in step (6); preferably, a machine learning method is used to train the weight matrix and the bias term on the input set X_train, and the key data units are subjected to reinforced training to obtain an optimal training model;

Step (8), obtaining a test set output set from the optimal training model obtained in step (7), and performing inverse normalization and dimension-increasing processing to obtain the predicted oil field residual oil saturation distribution;

Step (9), verifying the validity of the prediction result after inverse normalization and dimension-increasing processing using the average absolute relative error (AARD) method.

In step (2), the normalization of each production parameter is expressed as:

$$\hat{x}_{i,j,k,n,t} = \frac{x_{i,j,k,n,t} - (x_{i,j,k,n,t})_{\min}}{(x_{i,j,k,n,t})_{\max} - (x_{i,j,k,n,t})_{\min}}$$

wherein $(x_{i,j,k,n,t})_{\min}$ is the minimum value in the data of that production parameter, $(x_{i,j,k,n,t})_{\max}$ is the maximum value in the data of that production parameter, and $\hat{x}_{i,j,k,n,t}$ is the normalized production parameter datum. All sample data are processed with this normalization method to obtain the normalized sample data set $\hat{F}$.

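In step (3), the covariance matrix C of the static sample data is solved; it is obtained by the following formula:

$$C = \begin{pmatrix} \mathrm{cov}(\hat{\phi},\hat{\phi}) & \mathrm{cov}(\hat{\phi},\hat{K}) & \mathrm{cov}(\hat{\phi},\hat{S}_{wr}) \\ \mathrm{cov}(\hat{K},\hat{\phi}) & \mathrm{cov}(\hat{K},\hat{K}) & \mathrm{cov}(\hat{K},\hat{S}_{wr}) \\ \mathrm{cov}(\hat{S}_{wr},\hat{\phi}) & \mathrm{cov}(\hat{S}_{wr},\hat{K}) & \mathrm{cov}(\hat{S}_{wr},\hat{S}_{wr}) \end{pmatrix}$$

wherein $\hat{\phi}$, $\hat{K}$ and $\hat{S}_{wr}$ are the normalized sample data of porosity, permeability and residual water saturation. The eigenvector matrix V and the eigenvalue vector $\vec{\lambda}$ of the covariance matrix C are then solved; they satisfy the equation

$$C\,V = V\,\mathrm{diag}(\vec{\lambda})$$

wherein the eigenvalue vector $\vec{\lambda}$ is a one-dimensional vector and the eigenvector matrix V is a 3 × 3 matrix. The maximum value $\lambda_{\max}$ of $\vec{\lambda}$ is selected, and the vector $V_{\max}$ in V corresponding to $\lambda_{\max}$ is found, where $V_{\max} = (v_1, v_2, v_3)$ and $v_1$, $v_2$, $v_3$ are the 3 components of $V_{\max}$. Multiplying the multi-dimensional static sample data set $\{\hat{\phi}, \hat{K}, \hat{S}_{wr}\}$ by $V_{\max}$ gives the one-dimensional feature sample data vector $\hat{Z}$ compressed by feature relevance, namely

$$\hat{Z} = v_1\,\hat{\phi} + v_2\,\hat{K} + v_3\,\hat{S}_{wr}$$

This gives the feature-relevance-compressed sample data set $\hat{F}_c$, in which $\hat{Z}$ takes the place of the three static parameters; the sample data set has been compressed from the original 7 kinds of production data to 5 in the production data dimension.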

In step (4), the time-dimension sub-sample data set $X_t$ is the four-dimensional matrix of sample data in month t, i.e.

$$X_t = (\hat{x}_{i,j,k,n})_t$$

The four-dimensional i × j × k × n matrix of each $X_t$ is then reduced to a one-dimensional vector of size 1 × (i × j × k × n), so that the normalized and compressed sample data set $\hat{F}_c$ can be reassembled into a new two-dimensional matrix, i.e. the dimension-reduced sample data set

$$\hat{F}_c = [X_1, X_2, \ldots, X_t]^T$$

wherein $[\ldots]^T$ denotes the transpose of the matrix in brackets; transposition is a matrix operation that reorders the sample data in the matrix without reducing their number.

In step (8), the test set input set X_test is weighted with the weight matrix and bias term of the optimal training model to obtain the test set output set Y_test; Y_test is then inverse-normalized to obtain physical parameter values of actual magnitude, giving the predicted true value Y_predict. At this point Y_predict is a t × (i × j × k × n) two-dimensional matrix, which is restored to the initial i × j × k × n × t five-dimensional form by dimension-increasing processing.

Drawings

FIG. 1 is a flow chart of the present invention for achieving residual oil saturation prediction based on a machine learning method.

FIG. 2 is a comparison diagram of the distribution of the residual oil saturation of a certain water injection and oil production block. Results from the sample dataset are on the left and results from machine learning training are on the right.

FIG. 3 is a comparison graph of pressure distribution of a water injection and oil production block. Results from the sample dataset are on the left and results from machine learning training are on the right.

Detailed Description

The invention is described in further detail below with reference to the figures and specific embodiments.

In the first step, a sample data set comprising dynamic sample data and static sample data is obtained from historical data of an oil field block. Taking a certain water injection and oil production block as an example, the block has a grid of 25 rows, 24 columns and 34 layers in the horizontal and vertical directions of the formation, giving 25 × 24 × 34 data units. Each data unit contains multiple months of various production data, including permeability, porosity, residual oil saturation, formation pressure, water production, oil production and so on. 4 production parameters (residual oil saturation, formation pressure, oil production and water production) are extracted as dynamic sample data, and 3 production parameters (porosity, permeability and residual water saturation) are extracted as static sample data. The block has 120 months of data in the time dimension, so a sample data set $F = (x_{i,j,k,n,t})$ in the form of a 5-dimensional matrix can be generated, where x is one sample datum; i is the i-th row of the sample data, i = 1, 2, 3, ..., 25; j is the j-th column, j = 1, 2, 3, ..., 24; k is the k-th layer, k = 1, 2, 3, ..., 34; n is the n-th production parameter, the production parameters being arranged in the order residual oil saturation, formation pressure, oil production, water production, porosity, permeability and residual water saturation, n = 1, 2, 3, ..., 7; and t denotes the sample data of the t-th month, t = 1, 2, 3, ..., 120. For example, the residual oil saturation in month 30 of the data unit in row 3, column 8, layer 12 is the sample datum $x_{3,8,12,1,30}$ of the sample data set.
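
The indexing convention of this step can be illustrated with a small NumPy sketch. The array name `F`, the helper `sample_at`, the parameter labels and the zero-filled placeholder values are assumptions made for illustration; the patent does not prescribe a storage layout.

```python
import numpy as np

# Dimensions of the example block: 25 rows, 24 columns, 34 layers,
# 7 production parameters, 120 months.
SHAPE = (25, 24, 34, 7, 120)

# Parameter order n = 1..7 as listed in the text (labels are hypothetical).
PARAMETERS = (
    "remaining_oil_saturation", "formation_pressure", "oil_production",
    "water_production", "porosity", "permeability",
    "residual_water_saturation",
)

def sample_at(F, i, j, k, n, t):
    """Return x_{i,j,k,n,t} using the 1-based indices of the text."""
    return F[i - 1, j - 1, k - 1, n - 1, t - 1]

# Placeholder data (zeros); a real data set would come from field history.
F = np.zeros(SHAPE, dtype=np.float32)
F[2, 7, 11, 0, 29] = 0.42  # hypothetical value stored for x_{3,8,12,1,30}

print(F.shape)
print(PARAMETERS[0], sample_at(F, 3, 8, 12, 1, 30))
```

Keeping a 1-based accessor next to the 0-based array avoids off-by-one mistakes when comparing code against the subscripts used in the description.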

In the second step, the sample data set is normalized: the sample data of the different production parameters in the sample data set F are each normalized. Taking this oil field as an example, the normalization of one production parameter is expressed as:

$$\hat{x}_{i,j,k,n,t} = \frac{x_{i,j,k,n,t} - (x_{i,j,k,n,t})_{\min}}{(x_{i,j,k,n,t})_{\max} - (x_{i,j,k,n,t})_{\min}}$$

wherein $(x_{i,j,k,n,t})_{\min}$ is the minimum value in the data of that production parameter, $(x_{i,j,k,n,t})_{\max}$ is the maximum value in the data of that production parameter, and $\hat{x}_{i,j,k,n,t}$ is the normalized production parameter datum. Similarly, all sample data are processed with the normalization method to obtain the normalized sample data set $\hat{F}$.
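
A minimal sketch of per-parameter normalization follows, assuming min-max scaling to [0, 1] computed separately for each production parameter. The function name, the small grid and the handling of a constant parameter are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def normalize_per_parameter(F, param_axis=3):
    """Min-max scale each production parameter to [0, 1] independently."""
    # Reduce over every axis except the production-parameter axis.
    reduce_axes = tuple(ax for ax in range(F.ndim) if ax != param_axis)
    x_min = F.min(axis=reduce_axes, keepdims=True)
    x_max = F.max(axis=reduce_axes, keepdims=True)
    span = x_max - x_min
    # Assumption: a constant parameter (span == 0) is mapped to 0.
    safe_span = np.where(span == 0, 1.0, span)
    return (F - x_min) / safe_span, x_min, x_max

rng = np.random.default_rng(0)
# Hypothetical small grid: 4 rows, 3 columns, 2 layers, 7 parameters, 6 months.
F = rng.uniform(1.0, 500.0, size=(4, 3, 2, 7, 6))
F_hat, x_min, x_max = normalize_per_parameter(F)

print(F_hat.min(), F_hat.max())
```

Returning `x_min` and `x_max` alongside the scaled data keeps the values needed to invert the scaling once predictions are available.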

In the third step, feature relevance compression is performed on the static sample data in the normalized sample data set. Taking a certain water injection and oil production block as an example, the normalized sample data set, classified by production parameter, can be expressed as:

$$\hat{F} = \{\hat{S}_o,\ \hat{P},\ \hat{Q}_o,\ \hat{Q}_w,\ \hat{\phi},\ \hat{K},\ \hat{S}_{wr}\}$$

Thus the dynamic sample data of the block comprise 4 kinds: residual oil saturation $\hat{S}_o$, formation pressure $\hat{P}$, oil production $\hat{Q}_o$ and water production $\hat{Q}_w$; and the static sample data comprise 3 kinds: porosity $\hat{\phi}$, permeability $\hat{K}$ and residual water saturation $\hat{S}_{wr}$. The covariance matrix C of the static sample data of the water-flooded oil field is obtained by the following formula:

$$C = \begin{pmatrix} \mathrm{cov}(\hat{\phi},\hat{\phi}) & \mathrm{cov}(\hat{\phi},\hat{K}) & \mathrm{cov}(\hat{\phi},\hat{S}_{wr}) \\ \mathrm{cov}(\hat{K},\hat{\phi}) & \mathrm{cov}(\hat{K},\hat{K}) & \mathrm{cov}(\hat{K},\hat{S}_{wr}) \\ \mathrm{cov}(\hat{S}_{wr},\hat{\phi}) & \mathrm{cov}(\hat{S}_{wr},\hat{K}) & \mathrm{cov}(\hat{S}_{wr},\hat{S}_{wr}) \end{pmatrix}$$

wherein $\hat{\phi}$ is the sample data of all porosities, $\hat{K}$ is the sample data of all permeabilities, $\hat{S}_{wr}$ is the sample data of all residual water saturations, and $\mathrm{cov}(A, B)$ is the covariance of the sample data sets A and B. The eigenvector matrix V and the eigenvalue vector $\vec{\lambda}$ of the covariance matrix C are then solved; they should satisfy the equation

$$C\,V = V\,\mathrm{diag}(\vec{\lambda})$$

wherein the eigenvalue vector $\vec{\lambda}$ is a one-dimensional vector and the eigenvector matrix V is a 3 × 3 matrix. The maximum value $\lambda_{\max}$ of $\vec{\lambda}$ is selected, and the vector $V_{\max}$ in V corresponding to $\lambda_{\max}$ is found, where $V_{\max} = (v_1, v_2, v_3)$ and $v_1$, $v_2$, $v_3$ are the 3 components of $V_{\max}$. Multiplying the multi-dimensional static sample data set $\{\hat{\phi}, \hat{K}, \hat{S}_{wr}\}$ by $V_{\max}$ gives the one-dimensional feature sample data vector $\hat{Z}$ compressed by feature relevance, namely

$$\hat{Z} = v_1\,\hat{\phi} + v_2\,\hat{K} + v_3\,\hat{S}_{wr}$$

This gives the feature-relevance-compressed sample data set $\hat{F}_c = \{\hat{S}_o, \hat{P}, \hat{Q}_o, \hat{Q}_w, \hat{Z}\}$, which has been compressed from the original 7 kinds of production data to 5 in the production data dimension.
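
The compression of the three static parameters into one feature can be sketched with NumPy as a projection onto the eigenvector of the covariance matrix that has the largest eigenvalue. The function name, the synthetic data and the assumed correlations between parameters are illustrative assumptions only.

```python
import numpy as np

def compress_static(static):
    """Project static data of shape (n_cells, 3) onto the eigenvector of
    their covariance matrix that belongs to the largest eigenvalue."""
    C = np.cov(static, rowvar=False)        # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh, because C is symmetric
    v_max = eigvecs[:, np.argmax(eigvals)]  # eigenvector for lambda_max
    return static @ v_max, v_max, eigvals

rng = np.random.default_rng(1)
n_cells = 1000
# Hypothetical normalized data: permeability and residual water saturation
# are made to correlate with porosity so that one direction dominates.
porosity = rng.uniform(0.0, 1.0, n_cells)
permeability = np.clip(porosity + rng.normal(0.0, 0.05, n_cells), 0.0, 1.0)
water_sat = np.clip(1.0 - porosity + rng.normal(0.0, 0.05, n_cells), 0.0, 1.0)
static = np.column_stack([porosity, permeability, water_sat])

feature, v_max, eigvals = compress_static(static)
print(feature.shape, v_max)
print("share of variance kept:", eigvals.max() / eigvals.sum())
```

The printed variance share indicates how much of the static data's spread the single feature retains; a low share would suggest that one feature is not enough for that data set.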

In the fourth step, dimensionality reduction is performed on the normalized and compressed sample data set while retaining the time dimension. Taking the water injection and oil production block as an example, the time-dimension sub-sample data set $X_t$ is the four-dimensional matrix of sample data in month t, i.e.

$$X_t = (\hat{x}_{i,j,k,n})_t, \quad t = 1, 2, \ldots, 120$$

Each $X_t$ is flattened into a one-dimensional vector of size 1 × (i × j × k × n), and the normalized and compressed sample data set $\hat{F}_c$ can then be reassembled into a new two-dimensional matrix, i.e.

$$\hat{F}_c = [X_1, X_2, \ldots, X_{120}]^T$$

wherein $[\ldots]^T$ denotes the transpose of the matrix in brackets; transposition is a matrix operation that reorders the sample data in the matrix without reducing their number.
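
The dimension reduction described in this step amounts to a reshape that keeps time as one axis and flattens the other four. The sketch below assumes a NumPy array ordered (i, j, k, n, t); the function name and the small grid are hypothetical.

```python
import numpy as np

def flatten_keep_time(F):
    """Turn an (i, j, k, n, t) array into a (t, i*j*k*n) matrix."""
    t = F.shape[-1]
    # Move the time axis to the front, then flatten the other four axes.
    return np.moveaxis(F, -1, 0).reshape(t, -1)

# Hypothetical small grid: 5 x 4 x 3 cells, 5 parameters (after compression),
# 12 months.
F = np.arange(5 * 4 * 3 * 5 * 12, dtype=float).reshape(5, 4, 3, 5, 12)
F2d = flatten_keep_time(F)

print(F2d.shape)
print(F2d.size == F.size)  # no sample datum is lost, only reordered
```

For the example block after compression, each monthly row would hold 25 × 24 × 34 × 5 = 102,000 values.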

In the fifth step, the normalized, compressed and dimension-reduced sample data set is divided into a training set and a test set. Typically, roughly the first 80% of the data volume of the sample data set $\hat{F}_c$ in the time dimension is taken as the training set F_train and the remaining 20% as the test set F_test. This balances two needs: the accuracy of the training result can only be verified against sample data, so at least a small part of the sample data must be set aside as the test set; a larger training set makes the training result more accurate, but a smaller test set weakens the subsequent verification. Taking the water injection and oil production block as an example, the data of the first 100 months in the time dimension are taken as the training set for the next step,

$$F_{\mathrm{train}} = [X_1, X_2, \ldots, X_{100}]^T$$

and the data of the remaining last 20 months are used as the test set,

$$F_{\mathrm{test}} = [X_{101}, X_{102}, \ldots, X_{120}]^T$$
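
A chronological split of this kind can be sketched as plain row slicing on the (months × features) matrix. The function name and the feature width of 8 are hypothetical; the 100/20 month split follows the example block.

```python
import numpy as np

def split_by_time(F2d, n_train):
    """Split a (t, features) matrix into its first n_train rows and the rest.

    The rows are not shuffled: later months must stay out of the training set.
    """
    return F2d[:n_train], F2d[n_train:]

t, n_features = 120, 8  # 120 months; 8 features is a hypothetical width
F2d = np.arange(t * n_features, dtype=float).reshape(t, n_features)
F_train, F_test = split_by_time(F2d, n_train=100)

print(F_train.shape, F_test.shape)
```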

In the sixth step, the input set of the training set and the input set of the test set are constructed. A window matrix $W_k$ is defined as a two-dimensional matrix of size d × (i × j × k × n), obtained from the normalized, compressed and dimension-reduced sample data set as

$$W_k = [X_k, X_{k+1}, \ldots, X_{k+d-1}]^T$$

where k = 1, 2, ..., t - d + 1 (dimensionless) and d is the search width (dimensionless). A smaller search width d increases the number of training passes but reduces the effectiveness of each pass, so the search width is typically about one tenth of the time dimension. Recombining part of the window matrices $W_k$ gives the training set input set X_train; similarly, the remaining window matrices $W_k$ are recombined into the test set input set X_test. Taking the water injection and oil production block as an example, the block has 120 months of sample data, so d may be taken as 10, and all window matrices of the block may be expressed as $W_1, W_2, \ldots, W_{111}$. From the training set F_train, the training set input set $X_{\mathrm{train}} = [W_1, W_2, W_3, \ldots, W_{100}]^T$ is obtained, and from the test set F_test, the test set input set $X_{\mathrm{test}} = [W_{101}, W_{102}, W_{103}, \ldots, W_{111}]^T$ is obtained.
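
The window construction can be sketched as a sliding window over the monthly rows. The sketch assumes that window W_k stacks the d consecutive months starting at month k, which matches the stated size d × (i × j × k × n) and the index range k = 1, ..., t - d + 1; the function name and the feature width of 6 are hypothetical.

```python
import numpy as np

def build_windows(F2d, d):
    """Return an array of shape (t - d + 1, d, features); entry k-1 is W_k."""
    t = F2d.shape[0]
    return np.stack([F2d[k:k + d] for k in range(t - d + 1)])

t, d, n_features = 120, 10, 6  # 6 features is a hypothetical small width
F2d = np.arange(t * n_features, dtype=float).reshape(t, n_features)
W = build_windows(F2d, d)

X_train = W[:100]  # W_1 ... W_100
X_test = W[100:]   # W_101 ... W_111
print(W.shape, X_train.shape, X_test.shape)
```

With t = 120 and d = 10 this yields 111 windows, in agreement with the index range given in the text.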

In the seventh step, a machine learning method is used to train the weight matrix and the bias term on the training set input set, and the key data units are subjected to reinforced training to obtain the optimal training model. The weight matrix is the training model: to obtain a predicted value at a future time, the known data at the current time are mapped through (multiplied by) the weight matrix, i.e. a weighted solution is computed, and the result is the predicted value; the weight matrix that predicts the result most accurately is the optimal training model. The bias term is the amount by which the weight matrix used in the previous cycle must be modified in each iteration; it determines how much the mapped values of the weight matrix differ between cycle steps. Taking the data of a certain block as an example, a weight matrix and a bias term in an initial state are set for the training set input set X_train, and X_train is then weighted with them to obtain the training set output set $Y_{\mathrm{train}} = [Y_{11}, Y_{12}, \ldots, Y_{100}]^T$, $Y_t = (y_{i,j,k,n})_t$. The training result at this point is not yet optimal: the difference between Y_train and the corresponding data of the training set F_train is computed element by element and averaged to obtain the loss function L. The first-order and second-order partial derivatives of L with respect to the weight matrix and the bias term matrix give the weight gradient and its update amount, according to which the weight matrix and the bias term matrix are updated to obtain a new weight matrix and a new bias term matrix.

To avoid weakening of the production data in the key data units, the production data of the key data units must be reinforced. Taking the residual oil saturation as an example, the positions of oil production wells are generally taken as key data units; the block has 6 oil production wells in total. Whether the residual oil saturation at a well position has been weakened is judged by two conditions (formula (2)) that involve the residual oil saturation of the well at the previous time step, the residual oil saturation of the well at the current time step, and the 8 sample data surrounding the well at the current time step, together with a fluctuation coefficient α (α = 1.1 to 1.2, generally 1.15) and a proximity coefficient β (β = 1.1 to 1.2, generally 1.15), both dimensionless. If the key data satisfy both conditions of formula (2) simultaneously, the production data are judged to be weakened. To reinforce the data, the first-order mean $m_t$ and the second-order mean $v_t$ of the residual oil saturation in the time dimension are taken, and the residual oil saturation $x_{t+1}$ of that well position is updated according to $m_t$ and $v_t$, where η is a correction coefficient (η = 0.01 to 0.1, generally 0.1). This completes the reinforced training of the key data units.
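
The weight-and-bias loop in the first half of this step can be sketched as follows. This is a minimal illustration that assumes a single linear layer, a mean-squared-error loss and plain gradient descent; the patent does not name the network type or the optimizer, and the reinforcement of key data units is not reproduced here. All names, data and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear(X, Y, lr=0.1, n_iter=500, seed=0):
    """Fit Y approximately equal to X @ W + b by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, size=(X.shape[1], Y.shape[1]))  # initial weights
    b = np.zeros(Y.shape[1])                                   # initial bias
    losses = []
    for _ in range(n_iter):
        residual = X @ W + b - Y                 # prediction minus target
        losses.append(float(np.mean(residual ** 2)))
        # Gradients of the mean squared error over all entries.
        grad_W = 2.0 * X.T @ residual / residual.size
        grad_b = 2.0 * residual.sum(axis=0) / residual.size
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b, losses

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 4))  # hypothetical flattened inputs
W_true = rng.normal(size=(4, 2))
Y = X @ W_true + 0.3                      # synthetic targets
W, b, losses = train_linear(X, Y)

print("initial loss:", losses[0])
print("final loss:", losses[-1])
```

The loss history plays the role of the loss function L in the text: training continues while it keeps falling, and the weights at its minimum correspond to the optimal training model.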

In the eighth step, the test set output set is obtained from the optimal training model, and inverse normalization and dimension-increasing processing are performed to obtain the predicted oil field residual oil saturation distribution. The test set input set X_test is weighted with the weight matrix and bias term of the optimal training model to obtain the test set output set Y_test, where $Y_{\mathrm{test}} = [Y_{111}, Y_{112}, \ldots, Y_{120}]^T$. Y_test is then inverse-normalized, i.e. the normalization of the second step is inverted, to obtain physical parameter values of actual magnitude, giving the predicted true value Y_predict. At this point Y_predict is a t × (i × j × k × n) two-dimensional matrix, which is restored to the initial i × j × k × n × t form by dimension-increasing processing. Taking the residual oil saturation data of the water injection and oil production block in month 120 as an example, the planar distribution of residual oil saturation in layer 17 is plotted; in FIGS. 2 and 3, the left side shows the original result from the sample data set and the right side shows the result obtained by training with the machine learning method.
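
Inverse normalization and dimension increase can be sketched as below, assuming min-max scaling was used and that the per-parameter minima and maxima were saved. For convenience the sketch restores the grid dimensions first so that the saved values broadcast along the parameter axis; the result is the same as inverting the scaling first. The function names, grid size and saved values are hypothetical.

```python
import numpy as np

def denormalize(Y_hat, x_min, x_max):
    """Invert min-max scaling: x = x_hat * (max - min) + min."""
    return Y_hat * (x_max - x_min) + x_min

def restore_dimensions(Y2d, grid_shape):
    """Turn a (t, i*j*k*n) matrix back into an (i, j, k, n, t) array."""
    t = Y2d.shape[0]
    return np.moveaxis(Y2d.reshape((t, *grid_shape)), 0, -1)

grid_shape = (5, 4, 3, 2)  # hypothetical (i, j, k, n)
rng = np.random.default_rng(3)
# Hypothetical normalized model output for 10 months.
Y_test = rng.uniform(0.0, 1.0, size=(10, int(np.prod(grid_shape))))

# Hypothetical per-parameter minima and maxima saved during normalization.
x_min = np.array([0.1, 5.0]).reshape(1, 1, 1, 2, 1)
x_max = np.array([0.9, 30.0]).reshape(1, 1, 1, 2, 1)

Y5d = restore_dimensions(Y_test, grid_shape)  # shape (5, 4, 3, 2, 10)
Y_predict = denormalize(Y5d, x_min, x_max)
print(Y_predict.shape)
```

A single layer and month can then be sliced out for plotting, for example `Y_predict[:, :, 2, 0, -1]` for the first parameter in the third layer at the last month.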

In the ninth step, the validity of the test set output set after inverse normalization and dimension-increasing processing is verified using the average absolute relative error (AARD) method, which is calculated as follows:

$$\mathrm{AARD} = \frac{1}{N} \sum \left| \frac{y_{i,j,k,n,t} - x_{i,j,k,n,t}}{x_{i,j,k,n,t}} \right| \times 100\%$$

wherein $y_{i,j,k,n,t}$ denotes the prediction data in the predicted true value Y_predict, $x_{i,j,k,n,t}$ denotes the sample data in the test set F_test at the time corresponding to Y_predict, and N is the total number of sample data in Y_predict (dimensionless). In general, a training result with AARD < 10% can be considered valid; the closer AARD is to 0, the smaller the relative deviation and the more accurate the predicted value. Taking this oilfield block as an example, to assess the accuracy of the predicted residual oil saturation, $y_{i,j,k,n,t}$ represents the residual oil saturation data of Y_predict from month 111 to month 120, and $x_{i,j,k,n,t}$ represents the residual oil saturation data of the test set F_test from month 111 to month 120. The AARD of the current prediction model is 4.83%; since AARD < 10%, the accuracy is high and the optimal training model is valid.
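
The AARD check can be sketched in a few lines, assuming the usual definition as the mean of |prediction - sample| / |sample| expressed in percent. The function name and the four sample values are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def aard(y_pred, x_true):
    """Average absolute relative deviation, in percent."""
    y_pred = np.asarray(y_pred, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    # Assumption: reference values are non-zero; zeros would need masking.
    return 100.0 * np.mean(np.abs(y_pred - x_true) / np.abs(x_true))

x_true = np.array([0.50, 0.40, 0.25, 0.80])  # hypothetical sample saturations
y_pred = np.array([0.52, 0.38, 0.25, 0.76])  # hypothetical predictions
score = aard(y_pred, x_true)

print(f"AARD = {score:.2f}%")
print("valid" if score < 10.0 else "not valid")
```

For these four values the relative errors are 4%, 5%, 0% and 5%, so the mean is 3.5%, below the 10% threshold stated in the text.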

Claims

1. A method for predicting the residual oil saturation distribution in an oil field, characterized by comprising the following steps:

Step (1), collecting multiple production parameters of an oil field block along the time dimension to construct a sample data set comprising dynamic sample data and static sample data, wherein the 4 production parameters historical residual oil saturation, formation pressure, oil production, and water production are extracted as the dynamic sample data, and the 3 production parameters porosity, permeability, and residual water saturation are extracted as the static sample data; generating a sample data set F = x_i,j,k,n,t in the form of a 5-dimensional matrix, wherein x is a sample datum, i is the row index, j is the column index, and k is the layer index of the sample datum; n is the index of the production parameter, the production parameters being ordered as historical residual oil saturation, formation pressure, oil production, water production, porosity, permeability, and residual water saturation; t is the month index of the sample datum; and all sample data are dimensionless;

Step (3), performing characteristic relevance compression on the static sample data in the sample data set normalized in step (2); the normalized sample data set is classified by production parameter, wherein the dynamic sample data of the oilfield block comprise the 4 parameters historical residual oil saturation, formation pressure, oil production, and water production, and the static sample data of the oilfield block comprise porosity, permeability, and residual water saturation; the compression is carried out such that a one-dimensional feature sample data vector replaces the original multi-dimensional static sample data set;

Step (5), dividing the dimensionality-reduced sample data set obtained in step (4) into a training set F_train and a test set F_test; preferably, the dimensionality-reduced sample data set obtained in step (4) is taken

2. The method according to claim 1, further comprising a step (9) of verifying the validity of the prediction result after inverse normalization and dimension raising using the average absolute relative error (AARD).

3. The method according to any one of claims 1 to 2, wherein in step (2) a normalization processing method is applied to each production parameter to yield the normalized data of that production parameter, and all the sample data are processed with this normalization method to obtain the normalized sample data set.
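
For illustration only, a per-parameter normalization and its inverse can be sketched in Python with NumPy. Min-max scaling to the interval [0, 1] is assumed here purely as a common choice; the text does not confirm that the claimed normalization takes this form. The function names normalize and denormalize and the random placeholder data are likewise hypothetical.

```python
import numpy as np

def normalize(F):
    """Min-max scale each production parameter of F to [0, 1].

    F has shape (i, j, k, n, t). The minimum and maximum are taken
    separately for each production parameter n (an assumption).
    """
    axes = (0, 1, 2, 4)  # every axis except the parameter axis n
    lo = F.min(axis=axes, keepdims=True)
    hi = F.max(axis=axes, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant parameters
    return (F - lo) / span, lo, span

def denormalize(F_norm, lo, span):
    """Inverse of normalize: recover values at their physical magnitude."""
    return F_norm * span + lo

# Placeholder data: 2 x 3 x 2 grid, 7 production parameters, 4 months.
rng = np.random.default_rng(0)
F = rng.uniform(1.0, 100.0, size=(2, 3, 2, 7, 4))

F_norm, lo, span = normalize(F)
F_back = denormalize(F_norm, lo, span)
```

Keeping lo and span alongside the normalized data is what makes the inverse normalization of step (8) possible.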

4. The method according to one of claims 1-3, wherein in step (3) a covariance matrix C is computed for the static sample data; the eigenvector matrix V and the vector of eigenvalues of the covariance matrix C are then solved for, the eigenvalues and the eigenvector matrix satisfying the eigenvalue equation of C; the maximum eigenvalue λ_max is found, and the vector V_max corresponding to λ_max is located in the eigenvector matrix V, wherein V_max = (v₁, v₂, v₃) and v₁, v₂, v₃ are the 3 components of the vector V_max; the multi-dimensional static sample data set is multiplied by the vector V_max to obtain the one-dimensional feature sample data vector compressed by relevance feature, and thereby the sample data set compressed by characteristic relevance.
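
For illustration only, the compression of claim 4 can be sketched in Python with NumPy. The function name compress_static, the use of np.cov and np.linalg.eigh, the arrangement of the static data as an (m, 3) array, and the random placeholder data are assumptions. In particular, the normalization factor of the covariance matrix and whether the data are centered before the multiplication are not specified here; the sketch uses the NumPy default for the covariance and multiplies the data without centering.

```python
import numpy as np

def compress_static(S):
    """Compress 3 static parameters into one feature per sample.

    S has shape (m, 3); the columns are assumed to be normalized
    porosity, permeability, and residual water saturation.
    """
    C = np.cov(S, rowvar=False)        # 3 x 3 covariance matrix C
    eigvals, V = np.linalg.eigh(C)     # columns of V are eigenvectors of C
    idx = np.argmax(eigvals)           # position of lambda_max
    v_max = V[:, idx]                  # V_max = (v1, v2, v3)
    feature = S @ v_max                # one-dimensional feature vector
    return feature, v_max, eigvals[idx]

# Placeholder static data for 50 grid cells.
rng = np.random.default_rng(0)
S = rng.random((50, 3))

feature, v_max, lam_max = compress_static(S)
```

The sign of an eigenvector returned by a numerical solver is arbitrary, so the compressed feature is determined only up to a global sign; this does not affect its use as a model input as long as the same V_max is used throughout.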

5. The method according to one of claims 1 to 4, wherein in step (4) the time-dimension data subset X_t is expressed as the four-dimensional matrix of sample data in month t, and the normalized and compressed sample data set can be reassembled into a new two-dimensional matrix, wherein [...]^T denotes the transpose of the matrix in brackets; transposition is a way of rearranging a matrix, so the number of sample data in the bracketed matrix is not reduced, only reordered.
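
For illustration only, the dimensionality reduction that retains the time dimension, together with the dimension raising that reverses it, can be sketched in Python with NumPy. The function names reduce_dims and raise_dims and the row-major ordering used to flatten each monthly block X_t are assumptions; the claims fix only the shapes t × (i × j × k × n) and i × j × k × n × t.

```python
import numpy as np

def reduce_dims(F):
    """Reshape a (i, j, k, n, t) array into a (t, i*j*k*n) matrix.

    Row t of the result is the flattened four-dimensional block X_t.
    """
    i, j, k, n, t = F.shape
    return np.moveaxis(F, -1, 0).reshape(t, i * j * k * n)

def raise_dims(X, shape):
    """Restore a (t, i*j*k*n) matrix to the original (i, j, k, n, t) shape."""
    i, j, k, n, t = shape
    return np.moveaxis(X.reshape(t, i, j, k, n), 0, -1)

# Placeholder data: 2 x 3 x 4 grid, 5 parameters, 6 months.
F = np.arange(2 * 3 * 4 * 5 * 6, dtype=float).reshape(2, 3, 4, 5, 6)

X = reduce_dims(F)
F_restored = raise_dims(X, F.shape)
```

As the claim states for the transposition, both operations only reorder the data: the number of entries is unchanged and the round trip reproduces the original array exactly.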

6. The method according to one of claims 1 to 5, wherein in step (8) the test set input set X_test is combined by weighted calculation with the weight matrix and bias term of the optimal training model to obtain a test set output set Y_test; the test set output set Y_test is then inverse-normalized to obtain physical parameter values at their actual magnitude, which gives the predicted true value Y_predict; Y_predict at this point is a t × (i × j × k × n) two-dimensional matrix, which is restored by dimension raising to the initial i × j × k × n × t five-dimensional matrix.