CN110807544A - Oil field residual oil saturation distribution prediction method based on machine learning - Google Patents


Info

Publication number
CN110807544A
Authority
CN
China
Prior art keywords
sample data
matrix
training
production
test
Legal status
Granted
Application number
CN201910951088.8A
Other languages
Chinese (zh)
Other versions
CN110807544B (en)
Inventor
宋洪庆
张启涛
李正一
都书一
王九龙
Current Assignee
Qingdao Dongkunwei Huashuzhi Energy Technology Co ltd
Original Assignee
University of Science and Technology Beijing USTB
Application filed by University of Science and Technology Beijing USTB
Priority to CN201910951088.8A
Publication of CN110807544A
Application granted
Publication of CN110807544B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining

Abstract

The invention relates to a method for predicting the distribution of residual oil saturation in an oil field, comprising the following steps: (1) acquiring a sample data set from the historical data of an oil field block, the sample data set comprising dynamic sample data and static sample data; (2) normalizing the sample data set; (3) performing feature relevance compression on the static sample data in the normalized sample data set; (4) performing dimensionality reduction on the normalized and compressed sample data set while retaining the time dimension; (5) dividing the normalized, compressed and dimension-reduced sample data set into a training set and a test set; (6) constructing an input set for the training set and an input set for the test set; (7) training a weight matrix and bias terms on the training-set input set with a machine learning method, with reinforced training of key data units, to obtain an optimal training model; and (8) obtaining the test-set output set from the optimal training model and applying inverse normalization and dimension increasing. The method can be applied to rapid prediction of residual oil production potential under complex geological conditions, with high prediction accuracy and adaptability.

Description

Oil field residual oil saturation distribution prediction method based on machine learning
Technical Field
The invention belongs to the field of oilfield development and relates to a method for predicting the residual oil saturation distribution of an oilfield, in particular to a method for predicting the residual oil distribution based on a machine learning algorithm.
Background
In oil field development, reservoir heterogeneity and the production mode usually leave a large amount of residual oil in the reservoir, so predicting the distribution of the residual oil is of great value for oil field production. Accurate prediction of the oil saturation distribution of the reservoir helps in formulating a reasonable development policy, so that the potential of the remaining oil can be better tapped; the accuracy of the prediction directly affects the outcome of future development. Numerical simulation is widely used in the petroleum industry: reservoir numerical simulation solves, by numerical methods, a system made up of nonlinear parabolic partial differential equations, auxiliary equations, boundary conditions and initial conditions, and thereby computes the oil saturation and pressure distribution of the reservoir. After decades of development, the theory of reservoir numerical computation has matured, and a variety of simulation techniques and methods continue to be developed. However, for complex models, history matching and prediction runs take a long time, so the cost of prediction is relatively high. With the rapid development of computer technology, machine learning and artificial intelligence are gradually being applied in the petroleum industry. Machine learning is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, computational complexity theory and other subjects.
Machine learning is mainly used to analyze data, search for patterns and predict target variables. It studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. At present, the petroleum industry mostly borrows generic machine learning methods, which cannot address the specific problems encountered in oil production in a targeted way.
Disclosure of Invention
Aiming at the problem that conventional reservoir numerical simulation cannot rapidly solve complex oilfield production prediction problems, the invention provides a residual oil saturation distribution prediction method based on machine learning. A system designed according to the method can be applied to rapid prediction of residual oil production potential under complex geological conditions. It offers high prediction accuracy, adaptability and computation speed, avoids the weakening of sample data at key nodes of the production process that occurs with general machine learning methods, and handles complex oilfield production prediction problems well.
To achieve the above object, the invention uses a machine learning algorithm as the main method of the prediction calculation. The basic prediction process is as follows:
(1) Acquire a sample data set from the historical data of the oilfield block, comprising dynamic sample data and static sample data. Step (1) introduces many kinds of historical data of the oilfield block, so that the prediction draws on multiple data sources and accounts for the potential influence of multiple factors.
(2) Normalize the sample data set.
(3) Perform feature relevance compression on the static sample data in the normalized sample data set. Step (3) introduces a feature relevance compression method that reduces the amount of static sample data while preserving prediction accuracy, which improves computational efficiency.
(4) Perform dimensionality reduction on the normalized and compressed sample data set while retaining the time dimension.
(5) Split the normalized, compressed and dimension-reduced sample data set into a training set and a test set.
(6) Construct an input set for the training set and an input set for the test set.
(7) Train the weight matrix and bias terms on the training-set input set with a machine learning method, and apply reinforced training to the key data units to obtain an optimal training model. Step (7) applies a reinforced training method that prevents the sample data of key data units from being weakened during long training.
(8) Obtain the test-set output set from the optimal training model and apply inverse normalization and dimension increasing to obtain the predicted residual oil saturation distribution of the oil field.
(9) Verify the validity of the prediction results after inverse normalization and dimension increasing with the average absolute relative error (AARD) method.
The historical data of the oilfield block in step (1) usually include both static data and dynamic data. More kinds of historical data can make the machine learning algorithm more accurate, but they also increase the computation time.
To reduce computation time, the feature relevance compression in step (3) improves the efficiency of machine learning. It compresses the static sample data, which do not change with time, by solving their covariance matrix, turning the multi-dimensional static sample data into one-dimensional feature sample data that represent them. Because the feature sample data stand in for the original static sample data, the amount of computation in each machine learning cycle is reduced and computational efficiency improves.
Likewise, the dimensionality reduction in step (4) improves the efficiency of machine learning. It converts the original high-dimensional sample data into one-dimensional vectors through a preprocessing (reading) step without losing any sample data, which noticeably improves computational efficiency.
The sample data at a key data node in step (7) differ markedly from the sample data around it, but as the machine learning iterations proceed, the production data can become weakened.
Weakening of production data means that the sample data of a key node differ less from their surroundings than the original sample data did. It arises because the machine learning method has no real physical condition to correct it: under real physical conditions the sample data of a key node should differ from the surrounding sample data, but introducing physical laws into the machine learning method would noticeably reduce its computational efficiency.
To correct the weakening of key nodes, the reinforced training method in step (7) judges whether a key node has been weakened from the variation of its sample data in the time dimension, and then applies a reinforcing correction to the key node based on the first-order and second-order mean values in the time dimension. This brings the key node closer to real physical conditions and improves the accuracy of machine learning without reducing the computational efficiency of the method.
The beneficial technical effects of the invention are as follows:
1. The method designs a computational model framework based on a machine learning algorithm to predict production data in oil field development, and improves prediction accuracy by introducing many kinds of oil field production data.
2. The method provides a feature relevance compression method that compresses the dimensions of the static data while keeping prediction accuracy unchanged, which improves computational efficiency.
3. The method introduces reinforced training of sample data: to counter the weakening of key-node sample data, it applies a forced value enhancement along the time dimension so that the sample data of key nodes are not assimilated by the data of surrounding units, which is more consistent with the physical meaning.
To solve the above technical problems and achieve the above beneficial technical effects, the specific technical scheme of the invention is as follows:
The method of the invention for predicting the residual oil saturation distribution of an oil field comprises the following steps:
Step (1): collect the production parameters of an oil field block along the time dimension to construct a sample data set comprising dynamic sample data and static sample data. Four production parameters, namely historical residual oil saturation, formation pressure, oil production and water production, are extracted as dynamic sample data, and three production parameters, namely porosity, permeability and residual water saturation, are extracted as static sample data. This generates a sample data set in the form of a 5-dimensional matrix, $F = \{x_{i,j,k,n,t}\}$, where $x$ is an individual sample value; $i$ is the row, $j$ the column and $k$ the layer of the sample; $n$ is the $n$-th production parameter, ordered as historical residual oil saturation, formation pressure, oil production, water production, porosity, permeability and residual water saturation; and $t$ is the month. All of these quantities are dimensionless;
Step (2): normalize the sample data set F from step (1). The sample data of each production parameter in F are normalized separately; applying the normalization to all sample data yields the normalized sample data set $\tilde{F} = \{\tilde{x}_{i,j,k,n,t}\}$;
Step (3): perform feature relevance compression on the static sample data in the normalized sample data set from step (2). Grouped by production parameter, the normalized sample data set is written as
$$\tilde{F} = \{\tilde{F}_1, \tilde{F}_2, \tilde{F}_3, \tilde{F}_4, \tilde{F}_5, \tilde{F}_6, \tilde{F}_7\}$$
where $\tilde{F}_n = \{\tilde{x}_{i,j,k,n,t}\}$ collects all normalized sample data of the $n$-th production parameter. The dynamic sample data of the oilfield block comprise 4 types: historical residual oil saturation $\tilde{F}_1$, formation pressure $\tilde{F}_2$, oil production $\tilde{F}_3$ and water production $\tilde{F}_4$. The static sample data comprise 3 types: porosity $\tilde{F}_5$, permeability $\tilde{F}_6$ and residual water saturation $\tilde{F}_7$. Feature relevance compression of these 3 types of static sample data yields a one-dimensional feature sample data vector $\tilde{F}_s$, which replaces the original multi-dimensional static sample data set $\{\tilde{F}_5, \tilde{F}_6, \tilde{F}_7\}$. This gives the feature-relevance-compressed sample data set $\tilde{F}^{c} = \{\tilde{F}_1, \tilde{F}_2, \tilde{F}_3, \tilde{F}_4, \tilde{F}_s\}$, which is compressed from the original 7 types of production data to 5 in the production data dimension;
Step (4): perform dimensionality reduction on the feature-relevance-compressed sample data set from step (3) while keeping the time dimension, obtaining the dimension-reduced sample data set $\tilde{F}^{r}$;
Step (5): divide the dimension-reduced sample data set from step (4) into a training set F_train and a test set F_test; preferably, the first 80% of the dimension-reduced sample data set $\tilde{F}^{r}$ in the time dimension is used as the training set F_train and the remaining 20% as the test set F_test;
Step (6): construct an input set X_train for the training set F_train and an input set X_test for the test set F_test;
Step (7): train the weight matrix and bias terms on the input set X_train of the training set F_train from step (6); preferably, a machine learning method is used for this training, and the key data units receive reinforced training, yielding an optimal training model;
Step (8): obtain the test-set output set from the optimal training model of step (7) and apply inverse normalization and dimension increasing to obtain the predicted residual oil saturation distribution of the oil field;
Step (9): verify the validity of the prediction result after inverse normalization and dimension increasing with the average absolute relative error (AARD) method.
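The text names the AARD check of step (9) but does not spell out its formula. A minimal sketch, assuming the common definition AARD = (100/N) Σ |predicted − actual| / |actual| over all compared cells, is shown below in Python with NumPy. The function name `aard` is hypothetical, and cells whose actual value is zero are skipped because the relative error is undefined there.

```python
import numpy as np

def aard(y_pred, y_true):
    """Average absolute relative error in percent (assumed common definition).

    Cells whose true value is zero are skipped, since the relative error is
    undefined there.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mask = y_true != 0
    rel = np.abs(y_pred[mask] - y_true[mask]) / np.abs(y_true[mask])
    return 100.0 * rel.mean()
```

For example, predictions (1.1, 1.8) against actual values (1.0, 2.0) are each off by 10%, so the function returns approximately 10.0.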
In step (2), the normalization of each production parameter is expressed as
$$\tilde{x}_{i,j,k,n,t} = \frac{x_{i,j,k,n,t} - (x_{i,j,k,n,t})_{\min}}{(x_{i,j,k,n,t})_{\max} - (x_{i,j,k,n,t})_{\min}}$$
where $(x_{i,j,k,n,t})_{\min}$ and $(x_{i,j,k,n,t})_{\max}$ are the minimum and maximum values in the data of that production parameter, and $\tilde{x}_{i,j,k,n,t}$ is the normalized value. Applying the normalization to all sample data yields the normalized sample data set $\tilde{F} = \{\tilde{x}_{i,j,k,n,t}\}$.
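The per-parameter min-max normalization of step (2) can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the patent's implementation: F is assumed to be stored as an array of shape (i, j, k, n, t), the function name `normalize_per_parameter` is hypothetical, and the guard for a constant parameter (max equal to min) is an added assumption. The minima and maxima are returned because the inverse normalization of step (8) needs them.

```python
import numpy as np

def normalize_per_parameter(F, param_axis=3):
    """Min-max normalize each production parameter separately.

    F has shape (i, j, k, n, t); the min and max are taken over every axis
    except the parameter axis, so each parameter n gets its own range.
    """
    other_axes = tuple(a for a in range(F.ndim) if a != param_axis)
    f_min = F.min(axis=other_axes, keepdims=True)
    f_max = F.max(axis=other_axes, keepdims=True)
    # Guard against constant parameters (max == min) to avoid division by zero.
    span = np.where(f_max > f_min, f_max - f_min, 1.0)
    F_norm = (F - f_min) / span
    return F_norm, f_min, f_max
```

Because `keepdims=True` is used, `f_min` and `f_max` have shape (1, 1, 1, n, 1) and broadcast directly against F.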
In step (3), the covariance matrix C of the static sample data is computed as
$$C = \begin{pmatrix} \operatorname{cov}(\tilde{F}_5,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_5,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_5,\tilde{F}_7) \\ \operatorname{cov}(\tilde{F}_6,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_6,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_6,\tilde{F}_7) \\ \operatorname{cov}(\tilde{F}_7,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_7,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_7,\tilde{F}_7) \end{pmatrix}$$
where $\tilde{F}_5$, $\tilde{F}_6$ and $\tilde{F}_7$ denote the normalized porosity, permeability and residual water saturation sample data. Next, the eigenvector matrix V and the eigenvalue vector $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3)$ of C are solved; they satisfy
$$C V = V \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$$
where $\boldsymbol{\lambda}$ is a one-dimensional vector and V is a 3 × 3 matrix. The largest eigenvalue $\lambda_{\max}$ is selected, and the column of V corresponding to $\lambda_{\max}$ is taken as the vector $V_{\max} = (v_1, v_2, v_3)$, whose 3 components are $v_1$, $v_2$ and $v_3$. Multiplying the multi-dimensional static sample data set $\{\tilde{F}_5, \tilde{F}_6, \tilde{F}_7\}$ by $V_{\max}$ gives the one-dimensional feature sample data vector after feature relevance compression:
$$\tilde{F}_s = v_1 \tilde{F}_5 + v_2 \tilde{F}_6 + v_3 \tilde{F}_7$$
This yields the feature-relevance-compressed sample data set $\tilde{F}^{c} = \{\tilde{F}_1, \tilde{F}_2, \tilde{F}_3, \tilde{F}_4, \tilde{F}_s\}$, with $\tilde{F}_1$ to $\tilde{F}_4$ the four dynamic parameters. The sample data set has thus been compressed from the original 7 types of production data to 5 in the production data dimension.
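The covariance and eigenvector steps above amount to projecting the three static parameters onto the eigenvector of the largest eigenvalue, i.e. onto the first principal component. A minimal NumPy sketch follows; the function name `compress_static` is hypothetical. Following the text, the normalized data are multiplied by V_max directly without first subtracting the mean (standard PCA would centre them), and since the sign of an eigenvector is arbitrary, V_max and −V_max are equally valid results.

```python
import numpy as np

def compress_static(porosity, permeability, swr):
    """Feature relevance compression of the 3 static parameters.

    Each input holds normalized values over all cells (same shape for all
    three). Returns the 1-D feature F_s, reshaped like the inputs, and the
    weight vector V_max = (v1, v2, v3).
    """
    S = np.stack([np.ravel(porosity), np.ravel(permeability), np.ravel(swr)])  # 3 x N
    C = np.cov(S)                       # 3 x 3 covariance matrix (rows are variables)
    eigvals, V = np.linalg.eigh(C)      # C is symmetric; columns of V are eigenvectors
    v_max = V[:, np.argmax(eigvals)]    # eigenvector of the largest eigenvalue
    F_s = v_max @ S                     # v1*porosity + v2*permeability + v3*swr per cell
    return F_s.reshape(np.shape(porosity)), v_max
```

`np.linalg.eigh` is used because C is symmetric; it returns the eigenvalues in ascending order with the matching unit-length eigenvectors as columns.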
In step (4), the time-dimension sub-sample data set $X_t$ is the four-dimensional matrix of the sample data of month $t$, i.e.
$$X_t = \{\tilde{x}_{i,j,k,n,t}\} \quad \text{for fixed } t$$
where $\tilde{x}_{i,j,k,n,t}$ are the elements of the compressed sample data set $\tilde{F}^{c}$. The $i \times j \times k \times n$ four-dimensional matrix $X_t$ is then reduced to a $1 \times (i \times j \times k \times n)$ one-dimensional row vector $\mathbf{x}_t$, and the normalized, compressed sample data set is reassembled into a new two-dimensional matrix, the dimension-reduced sample data set
$$\tilde{F}^{r} = [\mathbf{x}_1^{T}, \mathbf{x}_2^{T}, \dots, \mathbf{x}_t^{T}]^{T}$$
where $[\dots]^{T}$ denotes matrix transposition. The transposition does not reduce the number of sample data in the bracketed matrix; it only reorders them.
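In NumPy terms, the dimension reduction of step (4) is a reshape that moves time to the front and flattens the four remaining axes. A minimal sketch, assuming the compressed data set is stored as an array of shape (i, j, k, n, t); the function name `reduce_dimension` is hypothetical. Row t − 1 of the result (zero-based) holds the flattened sub-sample data set X_t of month t, and every sample value is kept, only reordered.

```python
import numpy as np

def reduce_dimension(F):
    """Flatten each monthly sub-array X_t (i x j x k x n) into one row.

    F has shape (i, j, k, n, t); the result has shape (t, i*j*k*n).
    """
    t = F.shape[-1]
    # Move time to the front, then flatten the remaining four axes in C order.
    return np.moveaxis(F, -1, 0).reshape(t, -1)
```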
In step (8), the test-set input set X_test is weighted with the weight matrix and bias terms of the optimal training model to obtain the test-set output set Y_test, which is then inverse-normalized to recover physical parameter values at their actual scale, giving the predicted values Y_predict. Y_predict is a $t \times (i \times j \times k \times n)$ two-dimensional matrix and is restored to the original $i \times j \times k \times n \times t$ five-dimensional matrix by dimension increasing.
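The inverse normalization and dimension increase of step (8) can be sketched as the reverse of the flattening and min-max scaling. This sketch assumes Y_predict is stored as a (t, i·j·k·n) NumPy array whose columns follow the same C-order flattening used for the dimension reduction, and that the per-parameter minima and maxima from the normalization step were kept; the function name `restore_prediction` is hypothetical. Inverse normalization only recovers a physical scale for the original parameters; the compressed static feature has no physical range of its own.

```python
import numpy as np

def restore_prediction(Y, f_min, f_max, grid_shape):
    """Inverse-normalize Y_predict and raise it back to (i, j, k, n, t).

    Y: (t, i*j*k*n) normalized predictions, flattened in C order;
    f_min, f_max: per-parameter minima and maxima, shape (n,);
    grid_shape: the tuple (i, j, k, n).
    """
    f_min = np.asarray(f_min, dtype=float)
    f_max = np.asarray(f_max, dtype=float)
    t = Y.shape[0]
    Y5 = Y.reshape((t,) + tuple(grid_shape))   # (t, i, j, k, n)
    Y5 = f_min + Y5 * (f_max - f_min)          # inverse min-max, broadcast over n
    return np.moveaxis(Y5, 0, -1)              # (i, j, k, n, t)
```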
Drawings
FIG. 1 is a flow chart of the present invention for achieving residual oil saturation prediction based on a machine learning method.
FIG. 2 is a comparison diagram of the distribution of the residual oil saturation of a certain water injection and oil production block. Results from the sample dataset are on the left and results from machine learning training are on the right.
FIG. 3 is a comparison graph of pressure distribution of a water injection and oil production block. Results from the sample dataset are on the left and results from machine learning training are on the right.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
Step one: obtain a sample data set from the historical data of an oil field block, comprising dynamic sample data and static sample data. Take a water injection and oil production block as an example. The block is discretized into 25 rows, 24 columns and 34 layers of grid cells in its three directions (two horizontal and one vertical), giving 25 × 24 × 34 data units. Each data unit contains many months of production data, including permeability, porosity, residual oil saturation, formation pressure, water production and oil production. Four production parameters (residual oil saturation, formation pressure, oil production and water production) are extracted as dynamic sample data, and three (porosity, permeability and residual water saturation) as static sample data. The block has 120 months of data in the time dimension, so a sample data set $F = \{x_{i,j,k,n,t}\}$ in the form of a 5-dimensional matrix can be generated, where $x$ is an individual sample value; $i = 1, 2, \dots, 25$ is the row; $j = 1, 2, \dots, 24$ is the column; $k = 1, 2, \dots, 34$ is the layer; $n = 1, 2, \dots, 7$ is the production parameter, ordered as residual oil saturation, formation pressure, oil production, water production, porosity, permeability and residual water saturation; and $t = 1, 2, \dots, 120$ is the month. For example, the residual oil saturation of the data unit in row 3, column 8, layer 12 in month 30 is the sample $x_{3,8,12,1,30}$.
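As an illustration of this layout, the sketch below (Python with NumPy) allocates the 5-dimensional sample array for the example block and maps the text's 1-based indices onto NumPy's 0-based ones. The parameter names, the `sample` helper and the stored value 0.42 are hypothetical placeholders, not data from the patent; real values would be loaded from the block's production history.

```python
import numpy as np

# Grid size and time span of the example block described in the text.
ROWS, COLS, LAYERS, MONTHS = 25, 24, 34, 120
PARAMS = ["residual_oil_saturation", "formation_pressure", "oil_production",
          "water_production", "porosity", "permeability", "residual_water_saturation"]

# F[i, j, k, n, t]; zeros stand in for the real history data.
F = np.zeros((ROWS, COLS, LAYERS, len(PARAMS), MONTHS), dtype=np.float32)

def sample(F, i, j, k, n, t):
    """Return x_{i,j,k,n,t} using the 1-based indices of the text."""
    return F[i - 1, j - 1, k - 1, n - 1, t - 1]

# Residual oil saturation of the cell in row 3, column 8, layer 12, month 30.
F[2, 7, 11, 0, 29] = 0.42  # illustrative value only
```

At float32 precision the full array holds 25 × 24 × 34 × 7 × 120 = 17,136,000 values, about 68.5 MB.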
Step two: normalize the sample data set. The sample data of the different production parameters in F are normalized separately. For this oil field, the normalization of a given production parameter is
$$\tilde{x}_{i,j,k,n,t} = \frac{x_{i,j,k,n,t} - (x_{i,j,k,n,t})_{\min}}{(x_{i,j,k,n,t})_{\max} - (x_{i,j,k,n,t})_{\min}}$$
where $(x_{i,j,k,n,t})_{\min}$ and $(x_{i,j,k,n,t})_{\max}$ are the minimum and maximum values in the data of that production parameter, and $\tilde{x}_{i,j,k,n,t}$ is the normalized value. Applying the same normalization to all sample data yields the normalized sample data set $\tilde{F} = \{\tilde{x}_{i,j,k,n,t}\}$.
Step three: perform feature relevance compression on the static sample data in the normalized sample data set. For the water injection and oil production block, the normalized sample data set grouped by production parameter is
$$\tilde{F} = \{\tilde{F}_1, \tilde{F}_2, \tilde{F}_3, \tilde{F}_4, \tilde{F}_5, \tilde{F}_6, \tilde{F}_7\}$$
where $\tilde{F}_n = \{\tilde{x}_{i,j,k,n,t}\}$. The dynamic data of the block's sample data set therefore comprise 4 types: residual oil saturation $\tilde{F}_1$, formation pressure $\tilde{F}_2$, oil production $\tilde{F}_3$ and water production $\tilde{F}_4$. The static sample data comprise porosity $\tilde{F}_5$, permeability $\tilde{F}_6$ and residual water saturation $\tilde{F}_7$.
The covariance matrix C of the static sample data of this water-flooded oil field is computed as
$$C = \begin{pmatrix} \operatorname{cov}(\tilde{F}_5,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_5,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_5,\tilde{F}_7) \\ \operatorname{cov}(\tilde{F}_6,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_6,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_6,\tilde{F}_7) \\ \operatorname{cov}(\tilde{F}_7,\tilde{F}_5) & \operatorname{cov}(\tilde{F}_7,\tilde{F}_6) & \operatorname{cov}(\tilde{F}_7,\tilde{F}_7) \end{pmatrix}$$
where $\tilde{F}_5$ is the set of all porosity sample data, $\tilde{F}_6$ the set of all permeability sample data and $\tilde{F}_7$ the set of all residual water saturation sample data, and $\operatorname{cov}(\tilde{F}_a, \tilde{F}_b)$ is the covariance of the sample data sets $\tilde{F}_a$ and $\tilde{F}_b$. Next, the eigenvector matrix V and the eigenvalue vector $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3)$ of C are solved; they satisfy
$$C V = V \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$$
where $\boldsymbol{\lambda}$ is a one-dimensional vector and V is a 3 × 3 matrix. The largest eigenvalue $\lambda_{\max}$ is selected, and the corresponding column of V is taken as the vector $V_{\max} = (v_1, v_2, v_3)$, with components $v_1$, $v_2$ and $v_3$. Multiplying the multi-dimensional static sample data set $\{\tilde{F}_5, \tilde{F}_6, \tilde{F}_7\}$ by $V_{\max}$ gives the one-dimensional feature sample data vector after feature relevance compression:
$$\tilde{F}_s = v_1 \tilde{F}_5 + v_2 \tilde{F}_6 + v_3 \tilde{F}_7$$
and hence the feature-relevance-compressed sample data set $\tilde{F}^{c} = \{\tilde{F}_1, \tilde{F}_2, \tilde{F}_3, \tilde{F}_4, \tilde{F}_s\}$, with $\tilde{F}_1$ to $\tilde{F}_4$ the four dynamic parameters. The sample data set has thus been compressed from the original 7 types of production data to 5 in the production data dimension.
Step four: perform dimensionality reduction on the normalized and compressed sample data set while retaining the time dimension. For the water injection and oil production block, the time-dimension sub-sample data set $X_t$ is the four-dimensional matrix of the sample data of month $t$, i.e.
$$X_t = \{\tilde{x}_{i,j,k,n,t}\} \quad \text{for fixed } t$$
where $\tilde{x}_{i,j,k,n,t}$ are the elements of the compressed sample data set $\tilde{F}^{c}$. The $i \times j \times k \times n$ four-dimensional matrix $X_t$ is reduced to a $1 \times (i \times j \times k \times n)$ one-dimensional row vector $\mathbf{x}_t$, and the normalized, compressed sample data set is reassembled into a new two-dimensional matrix, the dimension-reduced sample data set
$$\tilde{F}^{r} = [\mathbf{x}_1^{T}, \mathbf{x}_2^{T}, \dots, \mathbf{x}_t^{T}]^{T}$$
where $[\dots]^{T}$ denotes transposition of the matrix in brackets. Transposition is a matrix operation that does not reduce the number of sample data in the matrix; it only reorders them.
Step five: split the normalized, compressed and dimension-reduced sample data set into a training set and a test set. Typically, about the first 80% of the dimension-reduced sample data set $\tilde{F}^{r}$ in the time dimension is used as the training set F_train and the remaining 20% as the test set F_test. The split is a balance: the accuracy of the training result can only be verified against sample data, so at least a small part of the sample data must be held back as the test set. A larger training set makes the training result more accurate, but a smaller remaining test set weakens the subsequent verification. It is therefore generally recommended to use about the first 80% of the sample data in the time dimension for training and the rest for testing. For the water injection and oil production block, the data of the first 100 months are taken as the training set for the next step, F_train $= [\mathbf{x}_1^{T}, \dots, \mathbf{x}_{100}^{T}]^{T}$, and the data of the last 20 months as the test set, F_test $= [\mathbf{x}_{101}^{T}, \dots, \mathbf{x}_{120}^{T}]^{T}$, where $\mathbf{x}_t$ is the flattened row of month $t$.
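Because the test months must come after the training months, the split is a slice along the time axis rather than a random shuffle. A minimal sketch, assuming the dimension-reduced data are a NumPy array of shape (t, m) with one month per row; the function name `split_in_time` and its parameters are hypothetical.

```python
import numpy as np

def split_in_time(R, n_train=None, train_fraction=0.8):
    """Split the rows (months) of R into a leading training block and a trailing test block.

    If n_train is not given, it is taken as round(train_fraction * t).
    No shuffling is done, so every test month lies after every training month.
    """
    if n_train is None:
        n_train = int(round(train_fraction * R.shape[0]))
    return R[:n_train], R[n_train:]
```

For the example block, `split_in_time(R, n_train=100)` returns 100 training months and 20 test months; with the default 80% fraction, 120 months would split 96/24.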
Step six: construct the input sets of the training set and the test set. A window matrix $W_k$ is defined as a $d \times (i \times j \times k \times n)$ two-dimensional matrix taken from the normalized, compressed and dimension-reduced sample data set $\tilde{F}^{r}$, whose rows $\mathbf{x}_t$ are the flattened months:
$$W_k = [\mathbf{x}_k^{T}, \mathbf{x}_{k+1}^{T}, \dots, \mathbf{x}_{k+d-1}^{T}]^{T}, \quad k = 1, 2, \dots, t-d+1$$
where $d$ is the search width; $k$ and $d$ are dimensionless. A smaller search width $d$ increases the number of training passes but reduces the effectiveness of each pass, so $d$ is typically one tenth of the time dimension. Combining a subset of the window matrices $W_k$ gives the training-set input set X_train; likewise, the remaining window matrices form the test-set input set X_test. For the water injection and oil production block, which has 120 months of sample data, $d$ can be taken as 10, so all window matrices of the block are $\{W_1, W_2, \dots, W_{111}\}$. From the training set F_train, the training-set input set is X_train $= [W_1, W_2, W_3, \dots, W_{100}]^{T}$, and from the test set F_test, the test-set input set is X_test $= [W_{101}, W_{102}, W_{103}, \dots, W_{111}]^{T}$.
Step seven: train the weight matrix and bias terms on the training-set input set with a machine learning method, and apply reinforced training to the key data units to obtain an optimal training model. The weight matrix is the training model: to obtain the predicted value at a future moment, the known data at the current moment are mapped through the weight matrix, i.e. weighted, and the result is the predicted value; the weight matrix that predicts the result most accurately is the optimal training model. The bias term is the amount by which the weight matrix used in the previous cycle is modified at each iteration; it determines how much the mapped values of the weight matrix differ from one cycle step to the next. Taking the data of this block as an example, an initial weight matrix and bias terms are set for the training-set input set X_train, and X_train is weighted with them to obtain the training-set output set Y_train $= [Y_{11}, Y_{12}, \dots, Y_{100}]^{T}$, with $Y_t = (y_{i,j,k,n})_t$. At this stage the training result is not yet optimal, so the mapping difference between Y_train and the corresponding data of the training set F_train is computed and averaged to obtain the loss function L. The first-order and second-order partial derivatives of L with respect to the weight matrix and the bias matrix give the weight gradient and its update amount, and the weight matrix and bias matrix are updated by this amount to obtain a new weight matrix and a new bias matrix. To prevent the production data of key data units from being weakened, the production data of the key data units are reinforced. Taking remaining oil saturation as an example, the position of each production well is generally taken as a key data unit. This block has 6 production wells in total, and whether the remaining oil saturation at a well position has been weakened is judged by the following two formulas:
[Formula (2): image not reproduced]
where the three quantities in formula (2) are, respectively, the residual oil saturation of the well at the previous time step, the residual oil saturation of the well at the current time step, and the 8 sample data surrounding the well at the current time step; α is a fluctuation coefficient, α = 1.1 to 1.2, generally 1.15; β is a proximity coefficient, β = 1.1 to 1.2, generally 1.15; both are dimensionless. If the key data satisfy both conditions in formula (2) simultaneously, the production data are judged to have been weakened. For data reinforcement, the first-order mean m_t and the second-order mean v_t of the residual oil saturation in the time dimension are taken:
[formula image not reproduced]
The residual oil saturation x_{t+1} at that well position is then updated according to the first-order mean m_t and the second-order mean v_t:
[formula image not reproduced]
where η is a correction coefficient, η = 0.01 to 0.1, generally 0.1. This completes the reinforced training of the key data units.
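The text names a weight matrix, a bias term and an averaged loss L, but it does not fix the model family or the exact update rule. The sketch below therefore assumes a single linear mapping Y = XW + b trained by plain gradient descent on the mean squared error. It is a stand-in for illustration only, not the patent's optimizer, and it leaves out the key-data-unit reinforcement, whose conditions and update formulas appear only as images in the source. The function `train_linear` and the toy data are hypothetical.

```python
import numpy as np

def train_linear(x, y, lr=0.1, epochs=500):
    """Fit Y ~ X @ W + b by plain gradient descent on the mean squared error.

    x: (samples, features), e.g. flattened window matrices.
    y: (samples, outputs), e.g. the month that follows each window.
    Returns the weight matrix W, the bias vector b and the final loss L.
    """
    w = np.zeros((x.shape[1], y.shape[1]))
    b = np.zeros(y.shape[1])
    for _ in range(epochs):
        residual = x @ w + b - y                              # mapping difference
        grad_w = 2.0 * x.T @ residual / residual.size         # dL/dW
        grad_b = 2.0 * residual.sum(axis=0) / residual.size   # dL/db
        w -= lr * grad_w
        b -= lr * grad_b
    loss = np.mean((x @ w + b - y) ** 2)                      # averaged difference -> L
    return w, b, loss

# Hypothetical toy data: 100 samples, 5 features, 3 outputs, exactly linear.
rng = np.random.default_rng(0)
x = rng.random((100, 5))
y = x @ rng.random((5, 3)) + 0.5
w, b, loss = train_linear(x, y, lr=0.5, epochs=2000)
```

Because the toy targets are exactly linear in the inputs, the loss should fall close to zero; on real reservoir windows a linear map would only be a baseline.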
Eighth step: obtaining the test set output set from the optimal training model, and applying inverse normalization and dimension raising to obtain the prediction of the oil field residual oil saturation distribution. A weighted calculation is performed on the test set input set X_test with the optimal training model (weight matrix and bias term) to obtain the test set output set Y_test = [Y_111, Y_112, ..., Y_120]^T. Y_test is then inverse-normalized, i.e., the normalization of the second step is reversed, to recover physical parameter values of actual magnitude, which gives the predicted true value Y_predict. Y_predict at this point is a t × (i × j × k × n) two-dimensional matrix, and it is restored by dimension raising to the initial i × j × k × n × t five-dimensional matrix. Taking the residual oil saturation data of the water-injection oil-production block in month 120 as an example, the planar distribution of residual oil saturation in layer 17 is shown in Figs. 2-3: the left side is the original result from the sample data set, and the right side is the result obtained by training with the machine learning method.
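The inverse normalization and dimension raising of the eighth step can be sketched as below. The sketch assumes that step (4) flattened each month's i × j × k × n block in row-major (C) order and that the min-max normalization used one minimum and one maximum per production parameter; the same order must be used to undo it. The function `restore_prediction` and the array sizes are hypothetical. The compressed static feature has no single physical minimum and maximum, so only the dynamic channels (for example residual oil saturation) have a direct physical reading after this step.

```python
import numpy as np

def restore_prediction(y_norm_2d, grid_shape, p_min, p_max):
    """Undo the monthly flattening and the per-parameter min-max scaling.

    y_norm_2d: (t, i*j*k*n) normalized predictions, one row per month.
    grid_shape: (i, j, k, n).
    p_min, p_max: per-parameter minimum and maximum, each of shape (n,).
    Returns an (i, j, k, n, t) array in physical units.
    """
    t = y_norm_2d.shape[0]
    # Row-major reshape back to (t, i, j, k, n); assumes C-order flattening.
    y = y_norm_2d.reshape((t, *grid_shape))
    # Inverse min-max scaling, broadcast over the last (parameter) axis.
    y = y * (p_max - p_min) + p_min
    # Move time to the last axis -> (i, j, k, n, t).
    return np.moveaxis(y, 0, -1)

# Round trip on hypothetical data: i, j, k, n, t = 4, 3, 2, 5, 12.
rng = np.random.default_rng(1)
x = rng.random((4, 3, 2, 5, 12)) * 100.0
p_min = x.min(axis=(0, 1, 2, 4))
p_max = x.max(axis=(0, 1, 2, 4))
x_norm = (x - p_min[:, None]) / (p_max - p_min)[:, None]
y_2d = np.moveaxis(x_norm, -1, 0).reshape(12, -1)     # t x (i*j*k*n)
restored = restore_prediction(y_2d, (4, 3, 2, 5), p_min, p_max)
```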
Ninth step: verifying the validity of the test set output set after inverse normalization and dimension raising using the average absolute relative error (AARD) method, which is calculated as follows:
$$\mathrm{AARD} = \frac{1}{N}\sum_{i,j,k,n,t}\left|\frac{y_{i,j,k,n,t} - x_{i,j,k,n,t}}{x_{i,j,k,n,t}}\right| \times 100\%$$
where y_{i,j,k,n,t} is a predicted value in the predicted true value Y_predict, x_{i,j,k,n,t} is the sample datum in the test set F_test at the time corresponding to Y_predict, and N is the total number of sample data in Y_predict (dimensionless). In general, a training result with AARD < 10% can be considered valid; the closer AARD is to 0, the smaller the relative deviation and the more accurate the predicted value. Taking this oilfield block as an example, to evaluate the accuracy of the residual oil saturation prediction, y_{i,j,k,n,t} is the residual oil saturation in Y_predict from month 111 to month 120, and x_{i,j,k,n,t} is the residual oil saturation in the test set F_test from month 111 to month 120. The AARD of the current prediction model is 4.83%, which is below 10%, so the accuracy is high and the optimal training model is valid.
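The AARD check can be sketched as follows, assuming the usual definition: the mean of |y - x| / x over all N predicted values, expressed as a percentage. The function name `aard` and the sample numbers are hypothetical.

```python
import numpy as np

def aard(pred, actual):
    """Average absolute relative deviation, in percent.

    pred: predicted values y (e.g. Y_predict, any shape).
    actual: sample data x at the same positions and months (same shape).
    """
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if np.any(actual == 0):
        raise ValueError("AARD is undefined where the sample data are zero")
    return 100.0 * np.mean(np.abs((pred - actual) / actual))

# Hypothetical saturations: one 10% deviation and one exact match.
print(aard([0.55, 0.40], [0.50, 0.40]))  # approximately 5.0
```

A result below 10 would pass the validity threshold stated above.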

Claims (6)

1. A method for predicting the residual oil saturation distribution of an oil field, characterized by comprising the following steps:
step (1), collecting multiple production parameters of an oil field block over the time dimension to construct a sample data set comprising dynamic sample data and static sample data, wherein 4 production parameters, namely historical residual oil saturation, formation pressure, oil production and water production, are extracted as the dynamic sample data, and 3 production parameters, namely porosity, permeability and residual water saturation, are extracted as the static sample data; generating a sample data set F = {x_{i,j,k,n,t}} as a 5-dimensional matrix, wherein x is a sample datum, i is the i-th row of the sample data, j is the j-th column and k is the k-th layer; n is the n-th production parameter, the production parameters being ordered as historical residual oil saturation, formation pressure, oil production, water production, porosity, permeability and residual water saturation; t denotes the sample data of the t-th month; all sample data are dimensionless;
step (2), normalizing the sample data set F of step (1), wherein the sample data of the different production parameters in F are normalized separately and all sample data are processed with the normalization method, to obtain the normalized sample data set;
step (3), performing feature correlation compression on the static sample data in the normalized sample data set of step (2), wherein the normalized sample data set is classified by production parameter into dynamic and static sample data: the dynamic sample data of the oilfield block comprise 4 types, namely historical residual oil saturation, formation pressure, oil production and water production, and the static sample data of the oilfield block comprise 3 types, namely porosity, permeability and residual water saturation; feature correlation compression is performed on the 3 types of static sample data to obtain a one-dimensional feature sample data vector, which replaces the original multi-dimensional static sample data set, thereby obtaining the feature-correlation-compressed sample data set, in which the production-parameter dimension is compressed from the original 7 types of production data to 5 types;
step (4), performing dimensionality reduction on the feature-correlation-compressed sample data set of step (3) while preserving the time dimension, to obtain the dimension-reduced sample data set;
step (5), dividing the dimension-reduced sample data set of step (4) into a training set F_train and a test set F_test; preferably, the first 80% of the data volume of the dimension-reduced sample data set in the time dimension is taken as the training set F_train and the remaining 20% as the test set F_test;
step (6), constructing an input set X_train of the training set F_train and an input set X_test of the test set F_test;
step (7), training the weight matrix and the bias term on the input set X_train of the training set F_train obtained in step (6); preferably, a machine learning method is used to train the weight matrix and the bias term on X_train, and reinforced training is applied to the key data units to obtain an optimal training model;
step (8), obtaining a test set output set with the optimal training model of step (7), and performing inverse normalization and dimension raising to obtain the prediction of the oil field residual oil saturation distribution.
2. The method according to claim 1, further comprising a step (9) of verifying the validity of the prediction result after inverse normalization and dimension raising using the average absolute relative error (AARD) method.
3. The method according to claim 1 or 2, wherein in step (2) the normalization of each production parameter is expressed as follows:
$$\tilde{x}_{i,j,k,n,t} = \frac{x_{i,j,k,n,t} - (x_{i,j,k,n,t})_{\min}}{(x_{i,j,k,n,t})_{\max} - (x_{i,j,k,n,t})_{\min}}$$
wherein (x_{i,j,k,n,t})_min is the minimum value in the data of that production parameter, (x_{i,j,k,n,t})_max is the maximum value in the data of that production parameter, and \tilde{x}_{i,j,k,n,t} is the normalized data of that production parameter; all sample data are processed with this normalization method to obtain the normalized sample data set.
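As an illustrative sketch only (the claim prescribes no implementation), the per-parameter min-max normalization can be written in NumPy as below. It assumes F is stored as an (i, j, k, n, t) array and that each parameter's minimum and maximum are taken over all grid cells and months. The function name `normalize_per_parameter` and the guard for constant parameters are additions for illustration.

```python
import numpy as np

def normalize_per_parameter(f):
    """Min-max normalize a 5-D sample data set F of shape (i, j, k, n, t).

    Each production parameter n is scaled with its own minimum and maximum
    over all grid cells and months, mapping it onto [0, 1]. The per-parameter
    minima and maxima are returned for later inverse normalization.
    """
    f = np.asarray(f, dtype=float)
    axes = (0, 1, 2, 4)                                  # all axes except n
    p_min = f.min(axis=axes, keepdims=True)
    p_max = f.max(axis=axes, keepdims=True)
    span = np.where(p_max > p_min, p_max - p_min, 1.0)   # guard constant parameters
    return (f - p_min) / span, p_min.squeeze(), p_max.squeeze()

# Hypothetical data set: 3 x 3 grid, 2 layers, 7 parameters, 24 months.
rng = np.random.default_rng(3)
f = rng.random((3, 3, 2, 7, 24)) * 500.0
f_norm, p_min, p_max = normalize_per_parameter(f)
```

Keeping p_min and p_max is what makes the later inverse normalization possible.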
4. The method according to any one of claims 1 to 3, wherein in step (3) a covariance matrix C of the static sample data is solved, the covariance matrix C being obtained by the following formula:
[formula image not reproduced]
the eigenvector matrix V and the eigenvalue vector λ = (λ_1, λ_2, λ_3) of the covariance matrix C are then solved, and they satisfy the equation
$$C V = V \,\mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$$
wherein the eigenvalue vector λ is a one-dimensional vector and V is a 3 × 3 matrix; the maximum value λ_max of λ is selected, and the vector V_max corresponding to λ_max is found in V, wherein V_max = (v_1, v_2, v_3) and v_1, v_2, v_3 are the 3 components of V_max; multiplying the multi-dimensional static sample data set by V_max gives the one-dimensional feature sample data vector after correlation feature compression, i.e., at every grid position and month the compressed value is v_1 × porosity + v_2 × permeability + v_3 × residual water saturation; the feature-correlation-compressed sample data set is thereby obtained, and in the production-data dimension the sample data set has been compressed from the original 7 types of production data to 5.
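As an illustrative sketch only, the covariance and eigenvector compression of the three static parameters can be written with `np.cov` and `np.linalg.eigh`. Two choices here are assumptions, because the covariance formula is not reproduced: `np.cov` uses the sample (N - 1) normalization, and the data are projected without centering, following the claim's wording that the static data set is multiplied by V_max. The sign of an eigenvector is arbitrary, so the compressed feature may come out negated. The function name `compress_static` and the array sizes are hypothetical.

```python
import numpy as np

def compress_static(static):
    """Compress 3 static parameters into one feature via the top eigenvector.

    static: array of shape (..., 3); the last axis holds porosity,
    permeability and residual water saturation (already normalized).
    Returns an array of shape (...) holding v1*phi + v2*K + v3*Swc.
    """
    samples = static.reshape(-1, 3)
    c = np.cov(samples, rowvar=False)          # 3 x 3 covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(c)       # C @ V = V @ diag(eigvals)
    v_max = eigvecs[:, np.argmax(eigvals)]     # eigenvector for lambda_max
    return static @ v_max

# Hypothetical normalized set F: parameters 4-6 (axis n = 3) are static.
rng = np.random.default_rng(2)
f_norm = rng.random((3, 3, 2, 7, 24))
static = np.moveaxis(f_norm[:, :, :, 4:7, :], 3, -1)    # (i, j, k, t, 3)
feature = compress_static(static)                        # (i, j, k, t)
f_comp = np.concatenate([f_norm[:, :, :, :4, :], feature[:, :, :, None, :]], axis=3)
print(f_comp.shape)  # (3, 3, 2, 5, 24)
```

The concatenation keeps the 4 dynamic parameters and appends the single compressed static feature, which gives the 5 production-data types mentioned in the claim.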
5. The method according to any one of claims 1 to 4, wherein in step (4) the time-dimension sub-sample data set X_t is expressed as the four-dimensional i × j × k × n matrix of the sample data of month t; the sub-sample data set X_t is then reduced from the four-dimensional i × j × k × n matrix to a 1 × (i × j × k × n) one-dimensional vector, giving the dimension-reduced sample data set; the normalized and compressed sample data set can thereby be reassembled into a new two-dimensional matrix, i.e. [X_1, X_2, ..., X_t]^T, wherein [ … ]^T denotes transposing the matrix in brackets; transposition only reorders the matrix, so the number of sample data in the bracketed matrix is not reduced but merely rearranged.
6. The method according to any one of claims 1 to 5, wherein in step (8) a weighted calculation is performed on the test set input set X_test with the optimal training model of the weight matrix and the bias term to obtain a test set output set Y_test; the test set output set Y_test is then inverse-normalized to obtain physical parameter values of actual magnitude, giving the predicted true value Y_predict; Y_predict at this point is a t × (i × j × k × n) two-dimensional matrix, which is restored by dimension raising to the initial i × j × k × n × t five-dimensional matrix.
CN201910951088.8A 2019-10-08 2019-10-08 Oil field residual oil saturation distribution prediction method based on machine learning Active CN110807544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910951088.8A CN110807544B (en) 2019-10-08 2019-10-08 Oil field residual oil saturation distribution prediction method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910951088.8A CN110807544B (en) 2019-10-08 2019-10-08 Oil field residual oil saturation distribution prediction method based on machine learning

Publications (2)

Publication Number Publication Date
CN110807544A 2020-02-18
CN110807544B 2020-10-13

Family

ID=69488140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910951088.8A Active CN110807544B (en) 2019-10-08 2019-10-08 Oil field residual oil saturation distribution prediction method based on machine learning

Country Status (1)

Country Link
CN (1) CN110807544B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2539744A4 (en) * 2010-03-19 2017-11-22 Schlumberger Technology B.V. Uncertainty estimation for large-scale nonlinear inverse problems using geometric sampling and covariance-free model compression
CN109543828A (en) * 2018-12-28 2019-03-29 中国石油大学(华东) A kind of intake profile prediction technique based under condition of small sample
CN109763800A (en) * 2019-03-18 2019-05-17 哈尔滨理工大学 A kind of separated-zone water infection oil field amount prediction technique
CN109948841A (en) * 2019-03-11 2019-06-28 中国石油大学(华东) A kind of prediction technique of the waterflooding development oil field remaining oil distribution based on deep learning

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625922A (en) * 2020-04-15 2020-09-04 中国石油大学(华东) Large-scale oil reservoir injection-production optimization method based on machine learning agent model
US11713666B2 (en) 2020-05-11 2023-08-01 Saudi Arabian Oil Company Systems and methods for determining fluid saturation associated with reservoir depths
US20220147668A1 (en) * 2020-11-10 2022-05-12 Advanced Micro Devices, Inc. Reducing burn-in for monte-carlo simulations via machine learning
CN112508273A (en) * 2020-12-03 2021-03-16 中国石油大学(华东) Residual oil prediction method based on generation countermeasure network
CN112508273B (en) * 2020-12-03 2023-04-07 中国石油大学(华东) Residual oil prediction method based on generation countermeasure network
CN112819240A (en) * 2021-02-19 2021-05-18 北京科技大学 Method for predicting shale oil yield based on physical constraint LSTM model
CN114492213A (en) * 2022-04-18 2022-05-13 中国石油大学(华东) Wavelet neural operator network model-based residual oil saturation and pressure prediction method
CN114492213B (en) * 2022-04-18 2022-07-01 中国石油大学(华东) Wavelet neural operator network model-based residual oil saturation and pressure prediction method

Also Published As

Publication number Publication date
CN110807544B (en) 2020-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210512

Address after: Room 501, building B3, 2566 Jiaozhou Bay East Road, LINGSHANWEI sub district office, Huangdao District, Qingdao City, Shandong Province 266500

Patentee after: Qingdao dongkunwei huashuzhi Energy Technology Co.,Ltd.

Address before: 100083 No. 30, Haidian District, Beijing, Xueyuan Road

Patentee before: University OF SCIENCE AND TECHNOLOGY BEIJING