CN112861423A - Data-driven water-flooding reservoir optimization method and system - Google Patents


Info

Publication number
CN112861423A
Authority
CN
China
Prior art keywords
production
well
reservoir
value
oil
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110028282.6A
Other languages
Chinese (zh)
Other versions
CN112861423B (en)
Inventor
檀朝东
王春秋
赵小雨
牛会钊
宋文容
宋健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yadan Petroleum Technology Co ltd
China University of Petroleum Beijing
Original Assignee
Beijing Yadan Petroleum Technology Co ltd
China University of Petroleum Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yadan Petroleum Technology Co ltd, China University of Petroleum Beijing filed Critical Beijing Yadan Petroleum Technology Co ltd
Priority to CN202110028282.6A priority Critical patent/CN112861423B/en
Publication of CN112861423A publication Critical patent/CN112861423A/en
Application granted granted Critical
Publication of CN112861423B publication Critical patent/CN112861423B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • E FIXED CONSTRUCTIONS
    • E21 EARTH DRILLING; MINING
    • E21B EARTH DRILLING, e.g. DEEP DRILLING; OBTAINING OIL, GAS, WATER, SOLUBLE OR MELTABLE MATERIALS OR A SLURRY OF MINERALS FROM WELLS
    • E21B 43/00 Methods or apparatus for obtaining oil, gas, water, soluble or meltable materials or a slurry of minerals from wells
    • E21B 43/16 Enhanced recovery methods for obtaining hydrocarbons
    • E21B 43/20 Displacing by water
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/08 Probabilistic or stochastic CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/10 Numerical modelling
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A 10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The invention relates to a data-driven water-flooding reservoir optimization method and system. The method comprises the following steps: establishing a preliminary reservoir model based on static data and dynamic data acquired in real time; randomly selecting historical production parameters related to the reservoir parameters and running a numerical simulation on the preliminary reservoir model; in the case that the difference between the numerical simulation result of the reservoir model and the production history is smaller than a second threshold, taking the preliminary reservoir model as the reservoir model for subsequent water injection and drainage scheme optimization; and in the case that the difference between the numerical simulation result of the reservoir model and the production history is greater than the second threshold, selecting, through correlation analysis of the dynamic data against the numerical simulation result, at least one first parameter whose degree of correlation exceeds a first threshold, and adjusting the preliminary reservoir model by optimizing the at least one first parameter until the difference between the numerical simulation result and the production history falls below the second threshold.

Description

Data-driven water-flooding reservoir optimization method and system
Technical Field
The invention relates to the technical field of reservoir engineering and development, and in particular to a data-driven water-flooding reservoir optimization method and system.
Background
An intelligent oil field builds on a digital oil field. It makes full use of new-generation information technologies such as big data, machine learning and intelligent algorithms, so that the oil field gains the capabilities of observation (real-time acquisition of monitoring data), thinking (big-data analysis methods), decision (machine learning and intelligent optimization methods) and execution (intelligent oil and gas well regulation equipment). Intelligent oil field technology rests on all-round interconnection and effectively integrates all of the systems the oil field operates.
Manual design of water injection schemes relies mainly on reservoir engineering methods or numerical simulation technology; such designs are largely ad hoc, time- and labor-consuming, and easily miss the optimal scheme. Dynamic real-time reservoir production optimization theory is an important research hotspot in the design of automatic injection-production schemes for oil fields. It mainly combines numerical reservoir simulation with optimization theory: the design of injection-production parameters for oil and water wells is converted into an optimal control model, which is then solved automatically with adjoint-gradient, stochastic-perturbation-gradient, heuristic and similar algorithms to obtain an optimal working system. Inter-well connectivity is an important basis for water injection optimization design. Connectivity models based on dynamic injection-production data have gradually advanced from single-phase to oil-water two-phase and from single-layer to multi-layer prediction; they compute quickly, can quantitatively characterize inter-well connectivity, and are increasingly applied to reservoir scheme evaluation and design. However, when current connectivity models are used for oil-water dynamic prediction in complex reservoirs, the saturation tracking method has not been applied to layered injection-production optimization schemes, so the current reservoir development scheme can neither be genuinely optimized and improved nor can the reservoir development efficiency be raised.
For example, Chinese patent publication No. CN108868712B discloses a method and system for optimizing reservoir development and production based on a connectivity method. Taking the inter-well connectivity unit as its object, that method establishes an exact front-tracking scheme to calculate saturation and obtains the oil and water production dynamics of each layer at the well points; connectivity model parameters are automatically fitted and inverted against the historical reservoir dynamics to obtain the conductivity, flow split, water injection efficiency and related information between injection and production wells, on which basis the layered rate-allocation and injection-allocation design is automatically optimized by iterative calculation, reducing low-efficiency water-drive flow and easing injection-production contradictions. However, the reservoir development method and system of that patent do not take the heterogeneity of the reservoir into account. In particular, its technical scheme interprets and observes the reservoir through large volumes of layered water injection and oil production data acquired by wellbore devices; those data are highly heterogeneous in time and space, leaving great uncertainty in the reservoir geology and rock physical properties.
Furthermore, on the one hand, understandings differ among those skilled in the art; on the other hand, although the inventors studied a large number of documents and patents in making the present invention, space does not permit listing all of their details and contents. This by no means implies that the present invention lacks these prior-art features; on the contrary, the present invention may possess all of the features of the prior art, and the applicant reserves the right to add related prior art to the background.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a data-driven water injection oil reservoir optimization method, which comprises the following steps:
establishing a preliminary oil reservoir model based on the static data and the dynamic data acquired in real time;
randomly selecting historical production parameters related to oil reservoir parameters to carry out numerical simulation on the preliminary oil reservoir model;
and in the case that the difference between the numerical simulation result of the reservoir model and the production history is smaller than a second threshold, taking the preliminary reservoir model as the reservoir model for subsequent water injection and drainage scheme optimization. Preferably, in the case that the difference between the numerical simulation result of the reservoir model and the production history is greater than the second threshold, at least one first parameter whose degree of correlation is greater than a first threshold is selected based on correlation analysis of the dynamic data against the numerical simulation result, and the preliminary reservoir model is adjusted by optimizing the at least one first parameter so that the difference between the numerical simulation result and the production history becomes smaller than the second threshold. On the basis of analyzing and assessing a high-water-cut reservoir with static data and simulating it numerically, the invention continuously corrects the numerical reservoir simulation with the dynamic data of layered injection and production, thereby establishing a dynamic reservoir data model with high goodness of fit, reducing the uncertainty of remaining-oil distribution prediction, deepening the understanding of reservoir heterogeneity, and obtaining a reservoir fluid saturation and pressure-field evolution model constrained by real-time layered injection-production data.
Specifically, at least one first parameter that strongly affects the numerical simulation result of the reservoir model is selected through correlation analysis; the numerical simulation results of the corresponding reservoir models are calculated for the selected values of the first parameter; and the first parameter is adjusted according to the difference between those results and the actual production history so as to reduce that difference until it falls below the second threshold, i.e. until the reservoir model under the first parameter meets the fitting-rate requirement. This deepens the level of reservoir understanding by combining static and dynamic data, and accelerates the adjustment of the reservoir model so that it fits quickly.
According to a preferred embodiment, the step of optimizing at least one of said first parameters is as follows:
randomly selecting an initial value of a first parameter based on the historical production parameters;
predicting based on an initial value of at least one of the first parameters to generate numerical simulation results of a plurality of the reservoir models;
and adjusting the first parameter step by step based on the predicted difference between the numerical simulation results and the production history of the plurality of reservoir models so that the predicted difference between the numerical simulation results and the production history of the plurality of reservoir models is smaller than a second threshold value.
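The three steps above can be sketched as a simple tuning loop. Here `simulate` is a hypothetical stand-in for the numerical reservoir simulator, and the shrinking-step hill climb is only one illustrative way to "adjust the first parameter step by step"; the patent does not prescribe this particular update rule.

```python
import random

def mismatch(simulated, observed):
    """Root-mean-square error between simulated and observed production series."""
    n = len(observed)
    return (sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n) ** 0.5

def history_match(simulate, history, low, high, threshold=0.3, max_iter=200):
    """Stepwise-adjust a single first parameter until the mismatch between
    the simulated result and the production history drops below threshold."""
    value = random.uniform(low, high)            # random initial value
    step = (high - low) / 10.0
    best_err = mismatch(simulate(value), history)
    for _ in range(max_iter):
        if best_err < threshold:                 # fitting-rate requirement met
            break
        # try a small step in both directions, keep whichever improves the fit
        for cand in (value - step, value + step):
            cand = min(max(cand, low), high)
            err = mismatch(simulate(cand), history)
            if err < best_err:
                best_err, value = err, cand
        step *= 0.9                              # shrink the step size gradually
    return value, best_err
```

A usage sketch with a toy linear "simulator" whose true parameter is 0.5 converges well inside the 0.3 threshold.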
According to a preferred embodiment, the method further comprises:
acquiring physical properties of a reservoir layer based on the oil reservoir model, and determining a water injection scheme and a drainage scheme of layered injection and production by utilizing a deep learning algorithm/a machine learning algorithm so as to determine an integrated water injection scheme and drainage scheme in a manner that the layered water injection and the layered oil production are mutually synergistic;
and executing the integrated water injection scheme and drainage scheme based on a reinforcement learning/deep reinforcement learning algorithm in a manner that optimizes the layered water injection parameters and the layered oil production parameters simultaneously, thereby avoiding implementing layered water injection and layered oil production independently, and realizing injection-production balance and supply-drainage coordination.
According to a preferred embodiment, the steps of obtaining reservoir physical properties based on the reservoir model and determining a water injection scheme and a drainage scheme of layered injection and production by using a deep learning algorithm/a machine learning algorithm are as follows:
obtaining a second parameter related to reservoir fluid flow based on the reservoir model;
and determining the integrated water injection scheme and drainage scheme based on the second parameter. Preferably, the evaluation indexes of the water injection effect and the drainage effect are quantified based on the static data, the dynamic data and the second parameter. Preferably, the water injection scheme and the drainage scheme are obtained based on a deep learning algorithm/machine learning algorithm. Preferably, the second parameter includes one or more of reservoir petrophysical properties, single-sand-body extension and geometry, fault assemblage structure and sealing, injection and production profiles, perforations and pay zones in the water injection and oil production wells, and the relative positions of the water injection and oil production wells.
According to a preferred embodiment, the method further comprises optimizing the integrated water injection scheme and drainage scheme based on a deep reinforcement learning algorithm. Preferably, the optimization goal of the integrated water injection scheme and drainage scheme is to maximize the net present value. Changes in pressure and saturation distribution are acquired at least from the high-dimensional dynamic data measured in real time, and the high-dimensional pressure and saturation changes are taken as the input of the deep reinforcement learning algorithm. At least the water injection frequency and the oil recovery frequency are used as decision variables. Preferably, the optimized water injection and drainage schemes are adjusted based on the second parameter provided by the reservoir model, in a manner that supplements at least the reservoir petrophysical properties and the relative positions of the water injection and oil production wells.
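As a rough illustration of the net-present-value objective mentioned above, the sketch below discounts per-step cash flow (oil revenue minus injection cost). All prices, costs and the discount rate are assumed figures for illustration, not values from the patent.

```python
def net_present_value(oil_rates, water_inj_rates, oil_price=60.0,
                      inj_cost=5.0, discount=0.1):
    """Discounted revenue from produced oil minus water-injection cost,
    summed over control steps (all coefficients here are assumed)."""
    npv = 0.0
    for t, (q_o, q_w) in enumerate(zip(oil_rates, water_inj_rates)):
        cash = q_o * oil_price - q_w * inj_cost   # cash flow in step t
        npv += cash / (1.0 + discount) ** t       # discount back to t = 0
    return npv
```

In the deep reinforcement learning formulation, a function of this shape would serve as (the cumulative part of) the reward signal.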
According to a preferred embodiment, the steps of performing the integrated flooding scheme and drainage scheme based on reinforcement learning/deep reinforcement learning algorithm in a way that optimizes both the stratified flooding parameters and the stratified oil recovery parameters are as follows:
constructing a cost function related to the environment state and the execution action of the execution module;
until the cost function converges, while the optimization decision has not yet brought the environment state to the optimization goal, or while the corresponding injection-production equipment remains undamaged:
acquiring a first action under a corresponding environment state based on an epsilon-greedy strategy;
acquiring a new environment state and corresponding rewards after the first action is executed;
and performing a learning update based on the new environment state and the corresponding reward. Preferably, the learning update linearly superposes the loss function onto the previous value in the previous environment state, the loss function being constructed from the learning rate and the difference between the real value and the previous value in the previous environment state. Preferably, the real value includes a first real value learned online and a second real value learned offline. Preferably, after the update, the environment state is updated to the new environment state as the initial state of the next round of control.
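The epsilon-greedy action selection and learning update above can be sketched with a tabular Q-learning rule; the state/action encoding is hypothetical, and the bootstrapped target stands in for the "real value" of the description.

```python
import random

def epsilon_greedy(q, state, actions, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: the loss is the difference between the
    bootstrapped target ('real value') and the previous value, scaled by the
    learning rate alpha and linearly superposed onto the previous value."""
    target = reward + gamma * max(q.get((s_next, b), 0.0) for b in actions)
    prev = q.get((s, a), 0.0)
    q[(s, a)] = prev + alpha * (target - prev)
```

With `eps=0.0` the selection is purely greedy, which makes the behavior easy to check.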
According to a preferred embodiment, the loss function is configured as follows:
The loss function in the reinforcement learning/deep reinforcement learning algorithm is constructed by fusing online and offline learning on the basis of dividing the start-stop counts, well-opening times and well-closing times of the water injection and oil production wells. Preferably, the start-stop count, well-opening time and well-closing time of each single well within one equipment detection period are divided based on the state space, and then the first times at which each single well opens within different opening periods and the second times at which it closes within different shutdown periods are determined. Within the same first time/second time, the real value in the current state is obtained by linearly superposing, on the basis of the second real value corresponding to the current state, the difference between the real value in the previous state and the second real value in the current state, and the difference between the first real value and the second real value in the current state. Preferably, the difference between the real value in the previous state and the second real value in the current state is multiplied by a first weight, and the difference between the first real value and the second real value in the current state is multiplied by a second weight. The first real value is determined by maximizing the evaluation of the cost function in the new environment state. The second real value is determined from the value of the cost function in the new environment state among the historical values. Preferably, the historical values may be the corresponding entries in a value table. When deep reinforcement learning is used, the historical value is the value recorded for the environment state and the action in the previous environment state.
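One reading of this weighted superposition of the online (first) and offline (second) real values is the sketch below. The weights `w1` and `w2` and the exact combination are an assumed interpretation of the description, not the patent's verbatim formula.

```python
def blended_target(q1_online, q2_offline, prev_target, w1=0.5, w2=0.5):
    """Blend the online-learned value (first real value, q1) with the
    offline/historical value (second real value, q2): start from q2, add
    w1 times the drift from the previous state's real value and w2 times
    the online-offline gap. Weights are illustrative assumptions."""
    return (q2_offline
            + w1 * (prev_target - q2_offline)
            + w2 * (q1_online - q2_offline))
```

When online and offline estimates agree and match the previous value, the blend reduces to that common value, which is the sanity check one would expect.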
According to a preferred embodiment, the method further comprises: constructing the first times at which each single well opens within different opening periods and the second times at which it closes within different shutdown periods as a mixed-integer nonlinear programming model that minimizes energy consumption subject to the daily cumulative total output not decreasing, thereby obtaining optimal, dynamically changing start-stop counts, first times and second times while avoiding the local-optimum problem.
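In place of a full mixed-integer nonlinear programme, a toy exhaustive search over a few periods conveys the structure of the model: binary open/close decisions per period, an energy objective, and a minimum-cumulative-output constraint. The well rates and energy figures below are invented for illustration.

```python
from itertools import product

def best_schedule(rates, energy, periods, min_total):
    """Exhaustively search on/off schedules for one well over a small
    number of periods: minimize energy use subject to the cumulative
    production not falling below min_total. A toy stand-in for the
    mixed-integer nonlinear programme described above."""
    best, best_energy = None, float('inf')
    for plan in product((0, 1), repeat=periods):          # all binary schedules
        total = sum(on * r for on, r in zip(plan, rates))
        used = sum(on * e for on, e in zip(plan, energy))
        if total >= min_total and used < best_energy:     # feasible and cheaper
            best, best_energy = plan, used
    return best, best_energy
```

Real instances would use a MINLP or branch-and-bound solver; exhaustive search is only tractable for a handful of periods.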
The invention also provides a data-driven water injection oil reservoir optimization system which comprises an acquisition module, a control module and injection and production equipment. The control module is configured to:
establishing an oil reservoir model of the digital oil-water well based on the static data of the physical oil-water well and the dynamic data acquired by the acquisition module in real time;
and selecting at least one first parameter whose degree of correlation is greater than a first threshold based on correlation analysis of the dynamic data against the reservoir model simulation result, and adjusting the reservoir model by updating the at least one first parameter stage by stage so that the difference between the numerical simulation result predicted from the at least one first parameter and the production history is smaller than a second threshold.
The invention also provides a data-driven water-flooding reservoir optimization system comprising an acquisition module, a control module and injection-production equipment. When the control module, based on a reinforcement learning/deep reinforcement learning algorithm, controls the injection-production equipment to execute the reservoir-model-optimized integrated water injection scheme and drainage scheme in a manner that optimizes the layered water injection parameters and the layered oil production parameters of the injection-production equipment simultaneously, the control module is configured to construct the loss function in the reinforcement learning/deep reinforcement learning algorithm by fusing online and offline learning on the basis of dividing the start-stop counts, well-opening times and well-closing times of the water injection and oil production wells.
Drawings
FIG. 1 is a simplified block diagram of a preferred embodiment of the waterflood reservoir optimization system of the present invention;
FIG. 2 is a schematic flow chart of the steps of a preferred embodiment of the waterflood reservoir optimization method of the present invention.
List of reference numerals
100: acquisition module; 200: control module; 300: injection-production equipment
Detailed Description
The following detailed description is made with reference to the accompanying drawings.
Existing oil field development adopts layered injection and production, with at least one water injection well and at least one oil production well; the water injection well communicates with at least one oil production well. Longitudinally, the interval between the water injection well and the oil production well has a layered structure comprising a plurality of water injection layers, with an interlayer between each two adjacent layers. Preferably, the injection and production wells are provided with perforations. Perforating is an operation in which a shaped-charge device is run to a predetermined horizon of the borehole and detonated to open holes that allow fluid in the underground strata to enter the well; it is generally applied in oil-gas and coal fields, and sometimes in the development of water sources. Shaped-charge perforators are commonly used in most oil fields; bullet-type perforators were used historically, and water-jet perforators have been used by some large oil companies abroad. The principle of water-flooding development is to use water as the displacement agent: under a certain temperature and pressure, water injected at a certain rate displaces the crude oil in the reservoir through seepage. Under seepage, the oil-water displacement front advances toward the oil production well, pushing the oil layer toward it, and the crude oil is then gathered through the oil production well.
Preferably, after a water-flooding oil field enters the high-water-cut stage, parameters such as reservoir permeability and wettability change under the influence of the reservoir's three natural contradictions (intra-layer, areal and inter-layer) and long-term water flushing, so that injected water breaks through along high-permeability dominant channels, i.e. large channels and fractures. This causes low-efficiency or ineffective circulation of the injected water, reduces the swept volume of the water flood, and impairs the water-flooding development effect.
Example 1
This embodiment provides a data-driven water-flooding reservoir optimization method, as shown in FIG. 2, comprising the following steps:
s100: and establishing an oil reservoir model based on the static data and the dynamic data acquired in real time. Preferably, a preliminary reservoir model is established based on the static data and the dynamic data collected in real time;
randomly selecting historical production parameters related to oil reservoir parameters to carry out numerical simulation on the preliminary oil reservoir model;
and in the case that the difference between the numerical simulation result of the reservoir model and the production history is smaller than a second threshold, taking the preliminary reservoir model as the reservoir model for subsequent water injection and drainage scheme optimization. Preferably, in the case that the difference between the numerical simulation result of the reservoir model and the production history is greater than the second threshold, at least one first parameter whose degree of correlation is greater than a first threshold is selected based on correlation analysis of the dynamic data against the numerical simulation result, and the preliminary reservoir model is adjusted by optimizing the at least one first parameter so that the difference becomes smaller than the second threshold. Preferably, the correlation analysis may obtain the degree of correlation of each uncertain item of dynamic data with the numerical simulation result of the reservoir model through a grey relational rule mining algorithm. Preferably, the first threshold can be set specifically according to the layered injection and production dynamic data collected in real time and the established reservoir model; its value lies between 0 and 1, and in this embodiment between 0.4 and 1. Preferably, the first parameter may be one or more of porosity, permeability, reservoir horizontal permeability, reservoir vertical permeability, and the initial pressure at the oil-water advancing interface. Preferably, the second threshold characterizes how well the numerical reservoir simulation fits the actual production history; its value lies between 0 and 1. Preferably, in this embodiment, the difference between the numerical simulation result and the production history is the root-mean-square error, and the second threshold takes a value between 0 and 0.3.
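The degree of correlation used to rank dynamic-data items against the simulation result can be computed, in the classic Deng form of grey relational analysis, as below. This is a minimal sketch of the standard formula with resolution coefficient `rho`; the patent's exact grey relational rule mining algorithm may differ.

```python
def grey_relational_degree(reference, comparison, rho=0.5):
    """Deng's grey relational degree between a reference series (e.g. the
    numerical-simulation result) and a comparison series (one dynamic-data
    channel), after min-max normalisation. Returns a value in (0, 1]."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    r, c = norm(reference), norm(comparison)
    deltas = [abs(a - b) for a, b in zip(r, c)]
    dmin, dmax = min(deltas), max(deltas)
    if dmax == 0:                      # identical (normalised) series
        return 1.0
    coeffs = [(dmin + rho * dmax) / (d + rho * dmax) for d in deltas]
    return sum(coeffs) / len(coeffs)
```

A first parameter would then be retained when its channel's degree exceeds the first threshold (0.4 to 1 in this embodiment).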
The step of optimizing the at least one first parameter is as follows:
randomly selecting an initial value of a first parameter based on historical production parameters;
predicting based on the initial value of the at least one first parameter to generate numerical simulation results of a plurality of reservoir models;
and adjusting the first parameter step by step based on the difference between the predicted numerical simulation results and the production history of the plurality of reservoir models so that the difference between the predicted numerical simulation results and the production history of the plurality of reservoir models is smaller than a second threshold value.
This arrangement achieves the following beneficial effect:
traditional numerical reservoir simulation is limited by grid shape and resolution, and its time-consuming dynamic simulation hinders subsequent history-matching work. The present arrangement reduces the computational overhead and time cost when the number of first parameters is large.
Preferably, in the step-by-step adjustment of the first parameter based on the difference between the predicted numerical simulation results of the plurality of reservoir models and the production history,
the difference between the numerical simulation results of the plurality of reservoir models and the production history is characterized as an error covariance matrix, and
the first parameter is adjusted based on the error covariance matrix. Preferably, the observations in the actual production history are added to the error covariance matrix to augment it. Adjusting the first parameter through the error covariance matrix is a linear solution procedure, whereas updating the reservoir model by adjusting the first parameter involves a nonlinear problem because the reservoir is heterogeneous. By combining the observed values in the production history with the error covariance matrix, the invention reduces the error incurred in converting the nonlinear problem into a linear one.
S200: the method comprises the steps of obtaining physical properties of a reservoir based on an oil reservoir model, and determining a water injection scheme and a drainage scheme of layered injection and production by utilizing a deep learning algorithm/machine learning algorithm, so that the layered water injection and the layered oil production determine the integrated water injection scheme and drainage scheme in a synergistic manner. The method comprises the following steps of obtaining physical properties of a reservoir based on an oil reservoir model, and determining a water injection scheme and a drainage scheme of layered injection and production by utilizing a deep learning algorithm/a machine learning algorithm:
obtaining a second parameter related to reservoir fluid flow based on the reservoir model;
and determining an integrated water injection scheme and drainage scheme based on the second parameter. Preferably, the evaluation indexes of the water injection effect and the drainage effect are quantified based on the static data, the dynamic data and the second parameter. Preferably, the water injection scheme and the drainage scheme are obtained based on a deep learning algorithm/machine learning algorithm. Preferably, the second parameters include at least one or more of reservoir petrophysical properties, single sand body extension and geometry, fault assembly structure and sealing, injection and production profiles, perforations and stimulation zones in the water injection and production wells, and the relative positions of the water injection and production wells. Preferably, the value of the second parameter may be determined using the following criteria, specifically:
(1) for the same sand body, a communicated fluid flow path exists between the water injection well and the oil production well under a proper well pattern and a reasonable well spacing;
(2) for different sand bodies, the water injection well and the oil production well are not communicated;
(3) the water injection well or the oil production well drilled in the mudstone area is not communicated;
(4) the water injection well and the oil production well near the closed fault or the mudstone area are not mutually connected;
(5) for sand geometries that make the flow channel between the water injection well and the oil production well too long, there is no fluid flow, or only weak flow, between the two wells;
(6) under appropriate conditions, the injected water may bypass the barrier;
(7) the secondary oil production wells in the same direction are not affected;
(8) production wells may be affected in multiple directions;
(9) under proper angle and interval, one water injection well can affect a plurality of oil production wells;
(10) the water injection well and the oil production well are free of fluid flow in the formation that is not simultaneously perforated;
(11) the streamlines cannot cross each other.
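A few of the numbered criteria above can be encoded as simple connectivity rules; the well dictionaries and field names below are hypothetical:

```python
def connected(inj, prod):
    """Decide whether a fluid flow path is assumed between a water
    injection well and an oil production well by applying some of the
    numbered criteria above; the dict fields are hypothetical."""
    if inj["sand_body"] != prod["sand_body"]:        # criterion (2)
        return False
    if "mudstone" in (inj["zone"], prod["zone"]):    # criteria (3)/(4)
        return False
    if not inj["perforated"] & prod["perforated"]:   # criterion (10): no jointly perforated layer
        return False
    return True
```

A full implementation would add the geometric rules (well spacing, flow-channel length, streamline crossing) on top of these set checks.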
Preferably, the steps of obtaining the integrated water injection scheme and the drainage scheme based on the deep learning algorithm/the machine learning algorithm are as follows:
applying a machine learning algorithm to analyze the layered measures and adjustment directions of a single water injection well/oil production well;
applying a machine learning algorithm to evaluate the water injection effect and the drainage and production effect of a well group/interval;
applying a machine learning algorithm to determine the water injection adjustment direction and the drainage and production adjustment direction of a well group/interval;
and solving the optimal integrated water injection scheme and drainage and production scheme by using a deep learning algorithm.
Preferably, the integrated water injection scheme and drainage scheme are optimized based on a deep reinforcement learning algorithm. Preferably, the optimization goal of the integrated water injection scheme and drainage scheme is to maximize the net present value. At least the changes of pressure and saturation distribution are acquired based on the high-dimensional dynamic data measured in real time, and these high-dimensional pressure and saturation changes are used as the input of the deep reinforcement learning algorithm. At least the water injection frequency and the oil recovery frequency are used as decision variables. Preferably, the optimized water injection and drainage schemes are adjusted by supplementing at least the reservoir petrophysical properties and the relative positions of the water injection wells and production wells based on the second parameters provided by the reservoir model.
S300: an integrated water injection scheme and drainage scheme are executed by simultaneously optimizing the parameters of layered water injection and layered oil production based on a reinforcement learning/deep reinforcement learning algorithm, thereby avoiding implementing layered water injection and layered oil production independently and realizing injection-production balance and supply-drainage coordination.
According to a preferred embodiment, the steps of performing an integrated waterflooding scheme and drainage scheme based on reinforcement learning/deep reinforcement learning algorithm in a manner that optimizes both the layered waterflooding parameters and the layered oil recovery parameters are as follows:
constructing a cost function related to the environment state and the execution action of the execution module;
in the case where the cost function converges and the optimization decision does not bring the environmental state to the optimization goal, or in the case where the cost function converges and the corresponding injection and production facility 300 is not damaged,
acquiring a first action under a corresponding environment state based on an epsilon-greedy strategy;
acquiring a new environment state and corresponding rewards after the first action is executed;
The learning update is performed based on the new environmental state and the corresponding reward. Preferably, the learning update linearly superposes the loss function on the previous value in the previous environmental state, and the loss function is constructed from the learning rate and the difference between the realistic value and the previous value in the previous environmental state. Preferably, the realistic value includes a first realistic value learned online and a second realistic value learned offline. Preferably, after the update, the environmental state is updated to the new environmental state as the initial state of the next round of control.
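The steps above can be sketched as one round of tabular learning against a generic environment interface; the `env` object, the parameter values and the toy usage below are illustrative:

```python
import random

def run_episode(q, env, alpha=0.5, gamma=0.9, epsilon=0.5, steps=2000):
    """One round of the control loop above: pick an action with an
    epsilon-greedy strategy, observe the new environmental state and
    the corresponding reward, and update the value table."""
    s = env.reset()
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(len(q[s]))
        else:
            a = max(range(len(q[s])), key=lambda i: q[s][i])
        s_next, r = env.step(s, a)
        target = r + gamma * max(q[s_next])   # online realistic value
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q
```

On a toy chain environment whose reward lies at the last state, the table learns to prefer the action that moves toward the reward.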
Preferably, the injection and production equipment 300 may be controlled by proportional-integral-derivative (PID) control.
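A minimal discrete PID loop of the kind referred to here might look as follows; the gains, the setpoint and the first-order plant response are illustrative assumptions, not equipment parameters from the patent:

```python
class PID:
    """Discrete proportional-integral-derivative controller."""
    def __init__(self, kp, ki, kd, setpoint, dt=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint, self.dt = setpoint, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, measurement):
        err = self.setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# drive a simple integrating plant (standing in for, e.g., an injection
# rate) toward the setpoint; the plant model is a made-up stand-in
pid = PID(kp=0.8, ki=0.01, kd=0.05, setpoint=50.0)
rate = 0.0
for _ in range(300):
    rate += pid.step(rate) * 0.1
```

After a few hundred steps the controlled rate settles close to the setpoint; in practice the gains would be tuned to the dynamics of the actual injection and production equipment.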
Preferably, the reinforcement learning algorithm is first introduced. The basic process of reinforcement learning is a Markov decision process, which can be represented as a quadruple {s, a, p, r} of state s, action a, state transition probability p, and state transition reward r. For a discrete-time Markov decision process, the sets of states and actions are referred to as the state space S and the action space A, i.e. s_i ∈ S and a_i ∈ A. According to the action selected at step t, the state transfers from s_t to s_{t+1} with probability P(s_{t+1}, s_t, a_t). At the same time as the state transition, the decision-making body receives an instant reward R(s_{t+1}, s_t, a_t). Here s_t denotes the state at time t and a_t denotes the action at time t. The cumulative reward at the end of the above process is:
G_t = R_t + γR_{t+1} + γ²R_{t+2} + … + γ^k R_{t+k} = Σ_{k=0} γ^k R_{t+k}   (1)
In formula (1), G_t is the cumulative reward from time t. γ is the discount factor, with a value range between 0 and 1; it reduces the reward weight of long-term decisions. The ultimate goal of the decision is to maximize the cumulative reward while reaching the goal state.
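The cumulative reward of formula (1) can be computed recursively, since G_t = R_t + γ·G_{t+1}:

```python
def discounted_return(rewards, gamma):
    """Cumulative reward of formula (1): G_t = R_t + gamma * G_{t+1},
    accumulated backward over a list of per-step rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three rewards of 1.0 with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75.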
Preferably, during reinforcement learning, different environmental states and actions are recorded to build a value table. Preferably, the values corresponding to historical or previous environmental states and actions are recorded in the value table. The value is a discrete record of the cost function. Preferably, the cost function may be a set of unary quadratic functions of the first optimization objective; for example, the oil production takes the form −l(x − m)² + n, where the three coefficients l, m and n are set at least so that the oil production is positive during half of the production cycle. Preferably, the first action is obtained based on an ε-greedy strategy. Preferably, the first action is a random action. In the later stage of learning and training, the ε-greedy strategy selects the action corresponding to the maximum of the value function, and with a certain probability ε randomly selects an action to acquire the reward.
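The ε-greedy selection described above can be sketched as follows, where `q_row` holds the value-table entries for the current state (the names are illustrative):

```python
import random

def epsilon_greedy(q_row, epsilon):
    """With probability epsilon pick a random action; otherwise pick
    the action with the highest recorded value for the current state.
    q_row lists the value-table entries Q(s, a) over actions a."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With ε = 0 the choice is purely greedy; raising ε trades exploitation for exploration during early training.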
Preferably, the optimization decision is made by training and learning to approach the first optimization goal, based on the environmental state at the current time and the reward for the action performed in the environmental state at the previous time. Preferably, the first optimization objective includes maximizing net present value, oil recovery and production. Preferably, the first optimization goals may also include injection/production profile homogenization, energy consumption minimization, life maximization of the injection and production equipment 300, and the like. Since the first optimization objective includes net present value, oil recovery and production maximization, and life maximization of the injection and production equipment 300, the state space S can be selected from the properties directly related to production: the pump cycle of injection wells, the pump cycle of production wells, the water injection rate, the oil recovery rate, and so on. Preferably, the state space S is a multidimensional matrix, where the number of rows is the number of associated attributes and the number of columns is the number of water injection wells and oil production wells. Preferably, the parameters collected in real time at least include single-well flow and pressure, wellhead oil casing pressure, wellbore temperature and pressure distribution, pipeline pressure, pressurization and power of the injection equipment, lift and power of the lifting equipment, and the like. Preferably, the decision variables may be the operation frequency of the injection and production equipment 300, the opening degree of the water nozzle valve, the opening degree of the oil nozzle valve, and the opening degree of the ICD valve. Thus, the action space A of the execution module comprises the operation frequency of an injection well, the operation frequency of a production well, the opening degree of the water nozzle valve, the opening degree of the oil nozzle valve and the opening degree of the ICD valve.
Preferably, the action space A is also a multidimensional matrix, in which the number of rows is 5 and the number of columns is the number of corresponding water injection wells and oil production wells. The values of the action characteristic quantities in the action space A are described taking the operating frequency of a production well as an example. Preferably, the action characteristic quantity of the operating frequency v_i of the production well is:
v_i ∈ {1, 0, −1}   (2)
That is, v_i takes the value 1, 0 or −1. When the control module 200 feeds back 1, 0 or −1 to the execution module, the execution module respectively increases, keeps, or decreases the original frequency by Δv. It should be noted that the magnitude of Δv should be determined according to the actual situation: if Δv is too small, the convergence speed will be slow; if Δv is too large, the system will be unstable or even unable to converge.
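The 1/0/−1 feedback can be translated into a frequency change as follows; the step size `DELTA_V` is an illustrative value that, as noted above, must be tuned to the actual situation:

```python
DELTA_V = 0.5  # Hz; illustrative step size -- too small slows
               # convergence, too large destabilizes the system

def apply_frequency_action(freq, action):
    """Map the feedback 1 / 0 / -1 of formula (2) to an increase,
    hold, or decrease of the pump operating frequency by DELTA_V."""
    return freq + action * DELTA_V
```

So a pump running at 30 Hz moves to 30.5, stays at 30, or drops to 29.5 Hz depending on the feedback.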
Preferably, the reward function is constructed based on the environmental state fed back after the execution module performed the previous action. The maximum value of the reward function should correspond to the first optimization goal. For example, the reward function is a function of the action a performed by the execution module and the environmental state s collected by the acquisition module 100. The reward function R(a, s) is as follows:
R(a, s)   (3)   [the explicit form of the reward function appears only as an image in the original document]
Preferably, the value of the updated cost function is:
Q(s_{t+1}, a_{t+1}) = Q_o(s_t, a_t) + loss   (4)
In formula (4), Q(s_{t+1}, a_{t+1}) is the value of the updated cost function, and Q_o(s_t, a_t) is the previous value in the previous environmental state, i.e. the value stored in the value table for the previously recorded environmental state and action. loss is the loss function:
loss = α[Q_r(s_{t+1}, a_{t+1}) − Q_o(s_t, a_t)]   (5)
In formula (5), Q_r(s_{t+1}, a_{t+1}) is the realistic value. α is the learning rate, with a value between 0 and 1; α determines the rate at which the value table is updated.
Preferably, the realistic value includes a first realistic value learned online and a second realistic value learned offline. Preferably, the first realistic value is determined by maximally evaluating the value function in the new environmental state. Preferably, the first realistic value is:
Q_{r1}(s_t, a_t) = R(s_t, a_t) + γ max Q_o(s_{t+1}, a_{t+1})   (6)
In formula (6), Q_{r1}(s_t, a_t) is the first realistic value, R(s_t, a_t) is the reward received after the first action is executed, and max Q_o(s_{t+1}, a_{t+1}) is the maximum value in the value table for the new state reached after the action is executed. γ denotes the degree to which the value of taking action a_t in state s_t is attenuated with respect to the next state and action; its value ranges from 0 to 1.
Preferably, the second realistic value is determined based on the value of the cost function in the new environmental state recorded in the value table. Preferably, the second realistic value is:
Q_{r2}(s_t, a_t) = R(s_t, a_t) + γQ_o(s_{t+1}, a_{t+1})   (7)
In formula (7), Q_{r2}(s_t, a_t) represents the second realistic value.
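Formulas (4)–(7) amount to the following tabular update, where the online target corresponds to formula (6) and the offline target to formula (7); the dictionary-based value table is an illustrative representation:

```python
def q_update(q_table, s, a, r, s_next, a_next, alpha, gamma, online=True):
    """Tabular update of formulas (4)-(7). The online realistic value
    (formula (6)) evaluates the maximum over next actions; the offline
    realistic value (formula (7)) uses the recorded value of the action
    actually taken next."""
    if online:
        target = r + gamma * max(q_table[s_next])       # formula (6)
    else:
        target = r + gamma * q_table[s_next][a_next]    # formula (7)
    loss = alpha * (target - q_table[s][a])             # formula (5)
    q_table[s][a] += loss                               # formula (4)
```

With the same reward, the online target is at least as large as the offline one, which is the source of the more aggressive behavior discussed below.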
Preferably, in the training of reinforcement learning, different update strategies affect the learning rate, convergence rate, stability and computational complexity, and thus the training time and the maintenance period of the injection and production equipment 300. For example, learning rate, convergence rate and computational complexity are directly related to the learning and training time. When actions are selected and executed based on the ε-greedy strategy, updating with the first realistic value of online learning means maximally evaluating the value function and depending on real-time feedback of the environmental state; the resulting optimization decisions are more aggressive and the actions change more strongly, so that the mechanical motion of the injection and production equipment 300 is not smooth enough and the equipment may suffer repeated damage. Updating with the second realistic value of offline learning is conservative, so the learning and training time becomes too long. Therefore, combining online and offline learning shortens the learning and training time while keeping the optimization decisions gentle, so that the actions are smooth and do not fluctuate strongly.
According to a preferred embodiment, the loss function is configured as follows:
The loss function in the reinforcement learning/deep reinforcement learning algorithm is constructed by fusing online and offline learning on the basis of dividing the start-stop times, well opening times and well closing times of the water injection wells and oil production wells. Preferably, the number of start-stop events, the well opening time and the well closing time of each single well within one equipment inspection period are divided based on the state space, and then the first time of each single well related to well opening within different opening periods and the second time related to well closing within different shutdown periods are determined. Within the same first time/second time, the realistic value in the current state is obtained by linearly superposing, on the second realistic value corresponding to the current state, the difference between the realistic value in the previous state and the second realistic value in the current state, and the difference between the first realistic value and the second realistic value in the current state. Preferably, the difference between the realistic value in the previous state and the second realistic value in the current state is multiplied by a first weight, and the difference between the first realistic value and the second realistic value in the current state is multiplied by a second weight. The first realistic value is determined by maximally evaluating the value function in the new environmental state; the second realistic value is determined based on the value of the cost function in the new environmental state among the historical values.
Preferably, the sum of the first weight and the second weight is between 0 and 1. The first weight and the second weight may be set according to the value table, or according to the actual situation. Preferably, the second realistic value corresponding to the current state serves as the lower bound of the realistic value of the current state, which guarantees the basic time of the learning and training. The difference between the realistic value in the previous state and the second realistic value in the current state measures the degree of difference between the current state and the previous state. The difference between the first realistic value and the second realistic value in the current state measures how aggressive the current optimization strategy is compared with the corresponding record for the same past state in the value table. The beneficial effects achieved by this arrangement are as follows:
When the sum of the first weight and the second weight is 1, the decision for the current state is based on the second realistic value in the current state while considering the degree of difference between the current state and the previous state, so that the actions executed by the execution module in the two states remain stable while a certain degree of decision change is introduced. In addition, considering how aggressive the current optimization strategy is compared with the value-table record for the same past state further increases the degree of decision change and thus reduces the learning and training time.
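The weighted combination described above can be written as follows; the function name and the weight values in the usage line are illustrative:

```python
def blended_realistic_value(q_prev, qr1, qr2, w1, w2):
    """Combine the online value qr1 and offline value qr2 as described
    above: start from qr2, add w1 times the difference between the
    previous state's realistic value and qr2, and w2 times the
    online/offline gap qr1 - qr2."""
    return qr2 + w1 * (q_prev - qr2) + w2 * (qr1 - qr2)

# illustrative weights with w1 + w2 = 1, as in the preferred embodiment
target = blended_realistic_value(q_prev=1.0, qr1=3.0, qr2=2.0, w1=0.5, w2=0.5)
```

Setting both weights to zero recovers the pure offline target qr2, the conservative lower bound of the realistic value.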
Preferably, when a first time passes into an adjacent second time, or a second time passes into an adjacent first time, the difference between the first realistic value and the second realistic value corresponding to the current state, weighted by a third weight, is linearly superposed on the second realistic value of the current state. The value of the third weight is between 0 and 1. Because the well-open and well-shut states differ markedly, only the degree of change between the first and second realistic values is considered, so that the generated decision does not change too much and the injection and production equipment 300 is protected from damage.
Preferably, when the cost function has not converged, a parameter within the threshold for executing actions in the execution module is randomly selected, and the state corresponding to that parameter is taken as the initial state. The state at least comprises oil production, liquid supply, water injection and the like; a new round of control is then carried out. Preferably, the states in the present invention are referred to as environmental states.
Preferably, a deep reinforcement learning algorithm may also be employed. The deep reinforcement learning algorithm constructs the cost function based on the environmental state, the executed action and an update parameter, i.e. an update parameter θ is added to the value function Q(s_t, a_t). The value of θ is between 0 and 1. The value function of deep reinforcement learning is Q(s_t, a_t, θ_t). Preferably, the control module 200 is configured to perform the learning update by linearly superposing the loss function on the previous value in the previous environmental state. Preferably, the value of the updated cost function is:
Q(s_{t+1}, a_{t+1}, θ_{t+1}) = Q_o(s_t, a_t, θ_t) + loss   (8)
Preferably, the cost function may be a sine, cosine, exponential or similar curve. Preferably, the update problem of the cost function is converted into a function fitting problem: the cost function is fitted by a multi-order polynomial, and the optimum value is approached by updating the parameter θ. This arrangement can handle high-dimensional input, that is, the case where the state space S and the action space A are large.
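The conversion of the update problem into a fitting problem can be illustrated with a polynomial fit of a sampled sine-shaped value curve; the polynomial degree and the sampling grid are arbitrary choices for the sketch:

```python
import numpy as np

# approximate a sampled value curve (a sine here, one of the curve
# shapes suggested above) with a multi-order polynomial whose
# coefficients play the role of the update parameters theta
states = np.linspace(0.0, np.pi, 50)
values = np.sin(states)
theta = np.polyfit(states, values, 5)     # fitted parameters theta
approx = np.polyval(theta, states)
max_err = float(np.max(np.abs(approx - values)))
```

A degree-5 polynomial already tracks the sine closely over this interval, so updating a handful of coefficients stands in for updating a large discrete value table.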
According to a preferred embodiment, the method further comprises: constructing the first time of each single well related to well opening within different opening periods and the second time related to well closing within different shutdown periods as a mixed integer nonlinear programming model that minimizes energy consumption under the condition that the daily cumulative total output does not decrease, thereby obtaining the optimal, dynamically changing number of start-stop events, first time and second time while avoiding the local optimum problem. Preferably, the optimization goal of the mixed integer nonlinear programming model is energy minimization. The constraints of the mixed integer nonlinear programming model are as follows:
1. the daily cumulative total yield does not decrease;
2. minimum flow properties are met;
3. the integrity of the tubular string is greater than a minimum threshold.
Preferably, the decision variables of the mixed integer nonlinear programming model may be the operation frequency of the injection and production equipment 300, the water nozzle opening, the oil nozzle opening and the ICD valve opening. Preferably, the minimum flow performance and the lowest threshold of string integrity may be related to the operation frequency of the injection and production equipment 300, the water nozzle opening, the oil nozzle opening and the ICD valve opening. Preferably, the mathematical characterization of the minimum flow performance may be that each layered node satisfies the minimum critical liquid-carrying flow rate. The wellbore and the pipe string need to operate within a certain pressure range, so the pipe string must meet the strength requirements. Preferably, the integrity of the pipe string may also be characterized by the pressure range it withstands: the pressure on the string must be less than the highest threshold and greater than the lowest threshold. Preferably, the minimum critical liquid-carrying flow rate and the working pressure range of the pipe string are set according to the actual parameters of oilfield exploitation. Preferably, the above mixed integer nonlinear programming model may be solved with a mixed integer nonlinear programming solver.
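The structure of the programming model can be illustrated with a brute-force search over discretized decision variables; the rate and energy models and the frequency grid below are made-up placeholders for the real reservoir relations, and a genuine MINLP solver would replace the enumeration:

```python
from itertools import product

FREQS = [30, 40, 50]   # discretized operation frequencies, Hz

def daily_rate(f_inj, f_prod):
    return 0.8 * f_inj + 1.2 * f_prod      # hypothetical rate model

def energy(f_inj, f_prod):
    return f_inj ** 2 + f_prod ** 2        # hypothetical power model

def optimize(min_rate):
    """Minimize energy subject to the non-decreasing daily cumulative
    output constraint (constraint 1); flow and string constraints
    would be added the same way."""
    best = None
    for f_inj, f_prod in product(FREQS, repeat=2):
        if daily_rate(f_inj, f_prod) < min_rate:
            continue
        if best is None or energy(f_inj, f_prod) < energy(*best):
            best = (f_inj, f_prod)
    return best
```

The search returns the cheapest feasible frequency pair, or nothing when the rate requirement cannot be met on the grid.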
Example 2
As shown in fig. 2, the present invention provides a data-driven waterflooding reservoir optimization system, which includes an acquisition module 100, a control module 200, and an injection-production apparatus 300.
Preferably, the acquisition module 100 may include a pressure sensor, a temperature sensor, a voltage sensor, a current sensor. The acquisition module 100 also includes a meter that measures moisture content.
Preferably, the control module 200 may be a computer device, such as a mobile computing device, a desktop computing device, a server, or the like. The control module 200 may include a processor and a memory device. The storage device is used for storing instructions sent by the processor. The processor is configured to execute instructions stored by the storage device. Preferably, a storage device may be separately provided outside the control module 200. The Processor may be a Central Processing Unit (CPU), a general purpose Processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, transistor logic, hardware components, or any combination thereof.
Preferably, the control module 200 may carry an operating system, such as a Linux system, an Android system, an IOS operating system, or the like.
Preferably, the control module 200 may operate in a networked environment using logical connections to one or more remote computers, whether wired or wireless. The remote computer may be another computer, a tablet computer, a PDA, a server, a router, a network PC, a peer device or other common network node with respect to the control module 200, and typically includes some and/or all of the elements described above relative to the computer. Logical connections include local area networks, wide area networks, private networks, and the like, presented by way of example and not limitation. The control module 200 of the present invention can be used by the entities of the oil reservoir development personnel, departments, enterprises, etc. to perform operations such as remote query, modification, call operation, etc.
Preferably, the storage device may be a magnetic disk, a hard disk, an optical disk, a removable hard disk, a solid state disk, a flash memory, or the like.
Preferably, the control module 200 may be connected with the acquisition module 100 and the execution module of the injection and production equipment 300 in a wired or wireless manner. Preferably, the injection and production equipment 300 includes an injection well 1 and a production well. The injection well 1 at least comprises a measuring and adjusting water distributor and a wellhead water nozzle. The production well at least comprises a submersible motor and a wellhead choke. Preferably, the injection and production equipment 300 further includes an inflow control device (ICD) valve for horizontal wells. Preferably, the control module 200 controls the injection and production equipment 300 through the execution module. The execution module at least comprises a frequency converter and a valve opening adjustment mechanism. For example, the control module 200 controls the measuring and adjusting water distributor of the injection well and the submersible motor of the production well via the frequency converter.
Preferably, the control module 200 is configured to:
establishing an oil reservoir model of the oil-water well based on the static data of the oil-water well and the dynamic data acquired by the acquisition module 100 in real time;
and selecting at least one first parameter with the association degree larger than a first threshold value based on the association degree analysis of the dynamic data and the reservoir model simulation result, and adjusting the reservoir model in a mode of updating the at least one first parameter stage by stage so that the difference between the numerical simulation result of at least one predicted reservoir model generated according to the at least one first parameter and the production history is smaller than a second threshold value.
Preferably, the control module 200 is configured to: under the condition that the control module 200 controls the injection and production equipment 300 to simultaneously execute an integrated water injection scheme and a drainage scheme based on reservoir model optimization in a manner of simultaneously optimizing the layered water injection parameter and the layered oil production parameter of the injection and production equipment 300 based on a reinforcement learning/deep reinforcement learning algorithm, the control module 200 is configured to construct a loss function in the reinforcement learning/deep reinforcement learning algorithm in a manner of fusing online learning and offline learning on the basis of dividing the starting and stopping times, the well opening time and the well closing time of a water injection well and an oil production well.
Preferably, the control module 200 is configured to: the digital oil-water well is constructed in a manner of interactively mapping a real production environment and a virtual production environment based on static data of the oil-water well and dynamic data acquired in real time by the acquisition module 100. Through the setting mode, the method is used for constructing the oil reservoir model by combining static data and dynamic data, and then improving the oil reservoir recognition level through the dynamic data acquired by long-term monitoring of parameters such as underground layered flow, pressure, water content and the like, and providing accurate basis for fine oil reservoir analysis and potential excavation.
Preferably, the control module 200 is configured to determine the integrated water injection scheme and drainage scheme performed by the injection and production equipment 300 using a deep learning algorithm/machine learning algorithm based on the physical properties of the reservoir acquired through the digital oil-water well. Through this arrangement, the corresponding water injection scheme and drainage scheme can be determined in a dynamic-data-driven way by means of quantitative evaluation criteria based on the reservoir model and a deep learning or machine learning algorithm. More importantly, this arrangement realizes block-level well-group collaborative optimization of reservoir dynamics, layered oil production and layered water injection, that is, the integrated design of the water injection scheme and the drainage scheme, and thereby avoids the lagging regulation caused by separating the production scheme of the water injection well from that of the production well. Specifically, the physical properties of the reservoir and information such as the saturation and pressure of the fluid during injection and production can be predicted based on the reservoir model, so that the water injection scheme and the drainage scheme can be adjusted rapidly. However, when the water injection scheme and the drainage scheme are adjusted, the corresponding injection and production equipment lacks real-time adjustment capability; in this adjustment process, the equipment needs the capabilities of automatic optimization and quick learning.
Although the existing deep reinforcement learning algorithms can adapt to environmental changes through fast learning and training, in actual oilfield development the relations between injection wells and the oil production wells they affect are complex across the plane and the intervals, and the match between the geologically analyzed state and the actual state is poor. Because interval plugging is incomplete, many oil production wells show production without injection response and cross-flow between oil layers, so the reserves are unevenly produced and the development effect is poor. More importantly, layered water injection and layered oil production essentially belong to the same system, yet in application they are often implemented as independent process technologies without exploiting their synergy. As a result, during actual operation the related parameters of the water injection wells and the oil production wells are adjusted separately in real time: after the parameters of a water injection well are adjusted, the heterogeneity, pressure and other properties of the whole reservoir change accordingly, while the oil production parameters of the production wells are still adjusted based on the reservoir properties before the change, which causes the problem of lagging regulation. Moreover, owing to this separation of injection and production regulation, the existing related self-optimizing intelligent regulation technologies only consider the variables of the reinforcement learning model from the settings of the optimization target, the decision variables, the reward function and the cost function.
For example, a water injection well may consider only water injection rate, volume increase, volume decrease and well pressure, while a production well may consider only production, well pressure, fluid level and submersible motor speed. Therefore, the injection and production equipment is controlled, based on a reinforcement learning/deep reinforcement learning algorithm, to simultaneously execute the water injection scheme and the drainage scheme optimized on the reservoir model by simultaneously optimizing the zonal water injection parameters and the zonal oil production parameters of the injection and production equipment, so that zonal water injection and zonal oil recovery are no longer implemented independently and injection-production balance and supply-drainage coordination are achieved.
Preferably, the control module 200 is configured to control the injection and production equipment 300, based on the reinforcement learning/deep reinforcement learning algorithm, to simultaneously execute the integrated water injection scheme and drainage scheme by simultaneously optimizing the zonal water injection parameters and zonal oil production parameters of the injection and production equipment 300, so that zonal water injection and zonal oil recovery are not implemented independently, thereby realizing injection-production balance and supply-drainage coordination. With this arrangement, learning and training on the water injection scheme and the drainage scheme make the relevant parameters of the injection wells and the oil production wells automatically matched and synchronously adjusted, switching oilfield development from lagging regulation to real-time optimization.
Through the arrangement mode, the invention has the beneficial effects that:
according to the invention, through block-level cooperative application of zonal oil recovery and zonal water injection technology, the zonal oil recovery and zonal water injection schemes are designed as one integrated scheme, and the analysis of the downhole intervals at the production end and the injection end is strengthened. That is, continuous, long-term and abundant downhole monitoring data from multiple intervals of the injection end and the production end in the same block are used to develop big-data-driven fine geological modeling, yielding an evolution model of reservoir fluid saturation and the pressure field constrained by real-time zonal injection and production data. This deepens the understanding of reservoir heterogeneity and preferential flow channels and reduces the uncertainty of remaining-oil distribution prediction. Finally, a real-time regulation technology based on a deep reinforcement learning algorithm/reinforcement learning algorithm matches and adjusts the parameters of the injection end and the production end, so that development adjustment changes from "lagging regulation" to "real-time optimization", the well-pattern control degree and adjustment level are improved, natural decline and water-cut rise are controlled, and the producing degree and recovery factor are improved. Furthermore, on the basis of wellbore injection-production automation and mass data processing, an artificial-intelligence reservoir development analysis and optimization system is used to realize truly intelligent reservoir management.
According to a preferred embodiment, the control module 200 is configured to build a reservoir model of the digital oil-water well as follows:
establishing a preliminary oil reservoir model based on the static data of the physical oil-water well and the dynamic data acquired by the acquisition module 100 in real time;
randomly selecting historical production parameters related to oil reservoir parameters to carry out numerical simulation on the preliminary oil reservoir model;
under the condition that the difference between the numerical simulation result of the reservoir model and the production history is smaller than a second threshold value, taking the preliminary reservoir model as the reservoir model for subsequent water injection scheme and drainage scheme optimization;
and, under the condition that the difference between the numerical simulation result of the reservoir model and the production history is larger than the second threshold value, selecting at least one first parameter whose association degree is larger than a first threshold value based on association-degree analysis of the dynamic data and the numerical simulation result, and adjusting the preliminary reservoir model by optimizing the at least one first parameter until the difference between the numerical simulation result and the production history is smaller than the second threshold value.

Preferably, the association-degree analysis may obtain the association degree of each uncertain dynamic datum with the numerical simulation result of the reservoir model through a grey association rule mining algorithm. Preferably, the first threshold may be set specifically according to the zonal injection and oil production dynamic data collected in real time by the collection module 100 and the established reservoir model. The first threshold takes a value between 0 and 1; in this embodiment it is between 0.4 and 1. Preferably, the first parameter may be one or several of porosity, permeability, reservoir horizontal-direction permeability, reservoir vertical-direction permeability, and initial pressure at the oil-water advancing interface. Preferably, the second threshold characterizes how well the numerical simulation of the reservoir fits the actual production history, and takes a value between 0 and 1. Preferably, in this embodiment, the difference between the numerical simulation result and the production history is measured as a root-mean-square error, and the second threshold takes a value between 0 and 0.3. Preferably, the control module 200 is configured to optimize the at least one first parameter as follows:
randomly selecting initial values of the at least one first parameter based on historical production parameters;
predicting based on the initial value of the at least one first parameter to generate numerical simulation results of a plurality of reservoir models;
and adjusting the first parameter step by step based on the difference between the predicted numerical simulation results and the production history of the plurality of reservoir models so that the difference between the predicted numerical simulation results and the production history of the plurality of reservoir models is smaller than a second threshold value.
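The step-by-step adjustment above can be sketched as a simple search loop. The following is a minimal illustration, assuming a scalar first parameter, a stand-in forward simulator passed in as `simulate`, and a root-mean-square misfit; these stand-ins are illustrative and are not the patent's actual numerical simulator:

```python
import random

def rmse(sim, hist):
    # root-mean-square error between simulated and historical production
    return (sum((s - h) ** 2 for s, h in zip(sim, hist)) / len(hist)) ** 0.5

def history_match(simulate, history, bounds, second_threshold=0.3, max_iter=200):
    """Randomly initialise the first parameter, then adjust it step by step
    until the simulation/history misfit drops below the second threshold."""
    random.seed(0)                              # reproducible sketch
    lo, hi = bounds
    param = random.uniform(lo, hi)              # random initial value
    step = (hi - lo) / 20.0
    best = rmse(simulate(param), history)
    for _ in range(max_iter):
        if best < second_threshold:
            break
        improved = False
        for cand in (param - step, param + step):   # try a small move each way
            cand = min(max(cand, lo), hi)
            err = rmse(simulate(cand), history)
            if err < best:
                best, param, improved = err, cand, True
        if not improved:
            step /= 2.0                          # no improvement: refine the step
    return param, best

# toy forward model: production proportional to permeability, history from k = 2
history = [2.0, 4.0, 6.0]
matched, misfit = history_match(lambda k: [k * t for t in (1, 2, 3)], history, (0.5, 5.0))
```

The loop stops as soon as the misfit drops below the second threshold, mirroring the fitting criterion described above.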
This arrangement achieves the following beneficial effects:
The traditional reservoir numerical simulation method is limited by grid shape and resolution, and the time consumed by dynamic simulation hinders the subsequent history-matching work. The present arrangement reduces the computational overhead and time cost when the number of first parameters is large.
Preferably, in the step-by-step adjustment of the first parameter based on the predicted difference between the numerical simulation results and the production history of the plurality of reservoir models,
the difference between the numerical simulation results of the plurality of reservoir models and the production history is characterized as an error covariance matrix;
the first parameter is adjusted based on the error covariance matrix. Preferably, the observations in the actual production history are added to the error covariance matrix to augment it. Adjusting the first parameter through the error covariance matrix is a way of solving a linear problem, whereas updating the reservoir model by adjusting the first parameter involves a nonlinear problem because the reservoir is heterogeneous. According to the invention, by combining the observed values in the production history with the error covariance matrix, the error introduced when the nonlinear problem is converted into a linear problem can be reduced.
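One common way to realize such a covariance-based, observation-augmented update is an ensemble-style (Kalman-type) adjustment. The sketch below assumes a scalar first parameter, a linear forward response and a known observation error variance; it illustrates the technique in general, not the patent's exact formulation:

```python
import numpy as np

def covariance_update(params, sims, obs, obs_err_var):
    """Ensemble-style update of a first parameter: each ensemble member is
    pulled toward the observation using the cross-covariance between the
    parameter and the simulated response."""
    p_anom = params - params.mean()
    s_anom = sims - sims.mean()
    c_ps = np.mean(p_anom * s_anom)        # parameter/response cross-covariance
    c_ss = np.mean(s_anom * s_anom)        # response covariance, augmented below
    gain = c_ps / (c_ss + obs_err_var)     # by the observation error variance
    return params + gain * (obs - sims)

rng = np.random.default_rng(42)
true_perm = 2.0
ens = rng.normal(3.0, 1.0, size=50)        # prior ensemble of permeability
sims = 3.0 * ens                           # linear stand-in forward response
obs = 3.0 * true_perm                      # observation from the production history
updated = covariance_update(ens, sims, obs, obs_err_var=0.01)
```

After the update, the ensemble mean moves toward the value consistent with the observed history and the ensemble spread shrinks, which is the error-reduction effect described above.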
Preferably, after the reservoir model is adjusted to meet the fitting requirement, an injection-production-relation-optimized integrated water injection scheme and drainage scheme are obtained based on a deep learning algorithm/machine learning algorithm. Preferably, the control module 200 is configured to:
a second parameter regarding reservoir fluid flow is obtained based on the reservoir model. Preferably, the second parameters include at least one or more of reservoir petrophysical properties, single sand-body extension and geometry, fault assemblage and sealing, injection and production profiles, perforated and stimulated intervals in the water injection wells and oil production wells, and the relative positions of the water injection wells and oil production wells. Preferably, the value of the second parameter may be determined using the following criteria, specifically:
(1) for the same sand body, a communicated fluid flow path exists between the water injection well and the oil production well under a proper well pattern and a reasonable well spacing;
(2) for different sand bodies, the water injection well and the oil production well are not communicated;
(3) the water injection well or the oil production well drilled in the mudstone area is not communicated;
(4) the water injection well and the oil production well near the closed fault or the mudstone area are not mutually connected;
(5) for sand-body geometries that make the flow channel between the water injection well and the oil production well excessively long, there is no fluid flow, or only weak flow, between the water injection well and the oil production well;
(6) under appropriate conditions, the injected water may bypass the barrier;
(7) a secondary oil production well in the same direction is not affected;
(8) production wells may be affected in multiple directions;
(9) under proper angle and interval, one water injection well can affect a plurality of oil production wells;
(10) there is no fluid flow between a water injection well and an oil production well in an interval that is not perforated in both wells;
(11) the streamlines cannot cross each other.
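These criteria lend themselves to a rule-based screen of injector/producer pairs. The sketch below encodes a handful of them (criteria (1)-(5) and (10)); the well-attribute dictionaries and their field names are illustrative assumptions, not data structures defined in the patent:

```python
def connected(injector, producer, max_channel_len=1500.0):
    """Screen an injector/producer pair against connectivity criteria (1)-(5), (10)."""
    if injector["sand_body"] != producer["sand_body"]:      # (2) different sand bodies
        return False
    if injector["in_mudstone"] or producer["in_mudstone"]:  # (3) drilled in mudstone
        return False
    if injector["sealed_fault_between"]:                    # (4) sealed fault between wells
        return False
    if injector["channel_len_to"].get(producer["name"], 0.0) > max_channel_len:
        return False                                        # (5) overlong flow channel
    if not (set(injector["perforated_zones"]) & set(producer["perforated_zones"])):
        return False                                        # (10) no co-perforated interval
    return True                                             # (1) same sand body, in range

inj = {"name": "W1", "sand_body": "S2", "in_mudstone": False,
       "sealed_fault_between": False, "channel_len_to": {"P1": 800.0},
       "perforated_zones": ["L3", "L4"]}
prod = {"name": "P1", "sand_body": "S2", "in_mudstone": False,
        "perforated_zones": ["L4"]}
```

A real implementation would derive these attributes from the reservoir model rather than hand-entered dictionaries.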
Preferably, the integrated water injection scheme and drainage scheme are determined based on the second parameter. Preferably, the evaluation indexes of the water injection effect and the drainage effect of the injection and production equipment 300 are quantified based on the static data, the dynamic data and the second parameter. Preferably, an integrated water injection scheme and drainage scheme are obtained based on a deep learning algorithm/machine learning algorithm. Preferably, the control module 200 is configured to obtain the integrated water injection scheme and drainage scheme based on the deep learning algorithm/machine learning algorithm according to the following steps:
applying a machine learning algorithm to analyze the zonal adjustment measures and directions of a single water injection well/oil production well;
evaluating the water injection effect and the drainage and production effect of the well group/interval by applying a machine learning algorithm;
a machine learning algorithm is applied to determine the water injection adjusting direction and the drainage and production adjusting direction of the well group/interval;
and (4) solving the optimal integrated water injection scheme and drainage and extraction scheme by using a deep learning algorithm.
According to a preferred embodiment, the control module 200 is configured to optimize the integrated flooding and drainage schemes performed by the injection and production facility 300 based on a deep reinforcement learning algorithm. Preferably, the injection and production equipment 300 performs an integrated water injection scheme and drainage and production scheme with the optimization objective of maximum net present value. Preferably, at least the changes of the pressure and saturation distributions are obtained based on the high-dimensional dynamic data measured in real time by the acquisition module 100, and the high-dimensional pressure and saturation changes are used as the input of the deep reinforcement learning algorithm. Preferably, at least the water injection frequency and the oil production frequency of the injection and production equipment 300 are used as decision variables. Preferably, the optimized flooding and drainage schemes may be adjusted in a manner that supplements at least the reservoir petrophysical properties, the relative positions of the flooding and production wells based on the second parameters provided by the reservoir model.
According to a preferred embodiment, in the case that the control module 200 controls the injection and production equipment 300, based on the reinforcement learning/deep reinforcement learning algorithm, to simultaneously execute the integrated water injection scheme and drainage scheme by simultaneously optimizing the zonal water injection parameters and zonal oil production parameters of the injection and production equipment 300, the control module 200 is configured to construct the loss function in the reinforcement learning/deep reinforcement learning algorithm in a manner that fuses online learning and offline learning, based on the division of the start-stop counts, well-open times and well-shut times of the water injection wells and oil production wells. Preferably, the control module 200 is configured to control the injection and production equipment 300 via an execution module.
Preferably, the control module 200 may control the execution module by using a Proportional Integral Derivative (PID) control.
Preferably, the reinforcement learning algorithm is first introduced. The basic process of reinforcement learning is a Markov decision process. A Markov decision process can be formed as a quadruple (s, a, p, r) of state s, action a, state transition probability p, and state transition reward r. For a discrete-time Markov decision process, the sets of states and actions are referred to as the state space S and the action space A, with s_i ∈ S and a_i ∈ A. According to the action selected at step t, the state transfers from s_t to s_{t+1} with probability P(s_{t+1}, s_t, a_t). At the same time as the state transition, the decision-making agent obtains an instant reward R(s_{t+1}, s_t, a_t), where s_t denotes the state at time t and a_t denotes the action at time t. The accumulated reward at the end of the above process is:
G_t = R_t + γ·R_{t+1} + γ²·R_{t+2} + … + γ^k·R_{t+k} = Σ_{k=0} γ^k·R_{t+k}  (1)
In formula (1), G_t is the accumulated reward from time t onward. γ is the discount factor, with a value range between 0 and 1; it reduces the weight of rewards lying further in the future. The ultimate goal of the decision is to maximize the accumulated reward while reaching the goal state.
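Formula (1) can be evaluated directly; the short sketch below accumulates the discounted rewards of an arbitrary reward sequence:

```python
def discounted_return(rewards, gamma):
    """Accumulated reward G_t = sum_k gamma^k * R_{t+k} of formula (1)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# three unit rewards discounted at gamma = 0.5: 1 + 0.5 + 0.25
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```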
Preferably, the control module 200 is configured to make the optimization decision by training and learning to approach a first optimization objective, based on the environmental state at the current time and the reward obtained after the execution module performed its action in the environmental state at the previous time, both provided by the acquisition module 100. Preferably, the first optimization objective includes maximization of net present value, oil recovery and production. Preferably, the first optimization objectives may also include injection/production profile homogenization, energy consumption minimization, life maximization of the injection and production equipment 300, and the like. Preferably, the control module 200 constructs the state space S based on the environmental states provided by the acquisition module 100, and constructs the action space A of the execution module based on the optimization decisions it makes. Since the first optimization objective includes maximization of net present value, oil recovery and production, as well as life maximization of the injection and production equipment 300, directly related attributes such as yield, pump cycle of the injection well, pump cycle of the production well, water injection rate and oil recovery can be taken as the state space S. Preferably, the state space S is a multidimensional matrix in which the number of rows is the number of associated attributes and the number of columns is the number of water injection wells and oil production wells. Preferably, the parameters collected in real time by the collection module 100 at least include single-well flow and pressure, wellhead tubing/casing pressure, wellbore temperature and pressure distribution, pipeline pressure, injection-equipment boost pressure and power, and lift and power of the lifting device.
Preferably, the decision variables may be the operating frequency of the injection and production equipment 300, the water nozzle valve opening, the oil nozzle valve opening and the ICD valve opening. Therefore, the action space A of the execution module includes the operating frequency of the injection well, the operating frequency of the production well, the water nozzle valve opening, the oil nozzle valve opening and the ICD valve opening. Preferably, the action space A is also a multidimensional matrix, in which the number of rows is 5 and the number of columns is the number of corresponding water injection wells and oil production wells. The values of the corresponding action characteristic quantities in the action space A are described by taking the operating frequency of the production well as an example. Preferably, the action characteristic quantity of the operating frequency v_i of the production well is:
v_i ∈ {1, 0, −1}  (2)
v_i takes the value 1, 0 or −1. When the control module 200 gives feedback 1, 0 or −1 to the execution module, the execution module respectively increases the original frequency by Δv, keeps it unchanged, or decreases it by Δv. It should be noted that the magnitude of Δv should be determined according to the actual situation: if Δv is too small, convergence will be slow; if Δv is too large, the system will be unstable and may even fail to converge.
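The mapping of the action codes 1/0/−1 to frequency changes of Δv can be sketched as follows; the frequency limits and the Δv value are illustrative assumptions:

```python
def apply_actions(freqs, actions, dv=0.5, f_min=20.0, f_max=60.0):
    """Map action codes 1 / 0 / -1 to increase / hold / decrease of each
    well's operating frequency by dv, clamped to the equipment's range."""
    assert all(a in (1, 0, -1) for a in actions)
    return [min(max(f + a * dv, f_min), f_max) for f, a in zip(freqs, actions)]

# three production wells: increase, hold, decrease (last one clamped at f_min)
new_freqs = apply_actions([50.0, 35.0, 20.0], [1, 0, -1])
```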
Preferably, the control module 200 constructs a reward function based on the environmental state fed back by the collection module 100 after the execution module performed the previous action. The maximum of the reward function should correspond to the first optimization objective. For example, the reward function is a function of the action a performed by the execution module and the environmental state s collected by the collection module 100. The reward function R(a, s) is as follows:
R(a, s) = the reward obtained after action a is executed in environmental state s  (3)
preferably, the control module 200 is configured to make the optimization decision as follows:
and constructing a value function of the environmental state and the action executed by the execution module, and recording different environmental states and actions to build a value table. The value table is a discrete record of the value function. Preferably, the value function may be a set of quadratic functions of one variable with respect to the first optimization objective; for example, the oil production may take the form −l(x − m)² + n, where the three coefficients l, m and n are set at least so that the oil production is positive over half of the production cycle.
Preferably, in case the value function converges but the optimization decision of the control module 200 does not bring the environmental state to the optimization objective, or in case the value function converges and the system is undamaged, the control module 200 is configured to obtain a first action in the corresponding environmental state based on an ε-greedy policy. Preferably, the first action is a random action. The ε-greedy strategy makes the control module 200 select, in the later stage of learning training, the action corresponding to the maximum of the value function, while with a certain probability ε a random action is selected to obtain its reward.
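An ε-greedy selection over the value table can be sketched as follows, assuming the value table is stored as a dictionary keyed by (state, action) pairs (an assumption about the layout, not stated in the text):

```python
import random

def epsilon_greedy(value_table, state, actions, epsilon):
    """With probability epsilon explore a random action, otherwise exploit
    the action with the maximum recorded value for this state."""
    if random.random() < epsilon:
        return random.choice(actions)        # exploration
    return max(actions, key=lambda a: value_table.get((state, a), 0.0))

vt = {("s0", "up"): 1.2, ("s0", "down"): 0.4}
# with epsilon = 0 the choice is purely greedy
greedy_action = epsilon_greedy(vt, "s0", ["up", "down"], epsilon=0.0)
```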
Preferably, the execution module controls the injection and production equipment 300 based on the first-action information transmitted by the control module 200. The control module 200 obtains, through the collection module 100, the new environmental state and the corresponding reward after the execution module executes the first action, and performs a learning update based on them. Preferably, the control module 200 is configured to perform the learning update based on a linear superposition of the previous value in the previous environmental state and the loss function. The control module 200 is configured to construct the loss function in a manner that fuses online learning and offline learning. Preferably, after the update, the control module 200 takes the new environmental state as the initial state of the next round of control.
Preferably, the control module 200 is configured to construct the loss function based on the learning rate and the difference between the realistic value and the previous value in the previous environmental state. Preferably, the value of the updated value function is:
Q(s_{t+1}, a_{t+1}) = Q_o(s_t, a_t) + loss  (4)
In formula (4), Q(s_{t+1}, a_{t+1}) is the value of the updated value function, Q_o(s_t, a_t) is the previous value in the previous environmental state, i.e., the value stored in the value table, and loss is the loss function.
loss = α[Q_r(s_{t+1}, a_{t+1}) − Q_o(s_t, a_t)]  (5)
In formula (5), Q_r(s_{t+1}, a_{t+1}) is the realistic value. α is the learning rate, taking a value between 0 and 1; α determines the rate at which the value table is updated.
Preferably, the real-world value includes a first real-world value learned online and a second real-world value learned offline. Preferably, the control module 200 configures the first reality value of online learning as follows:
the first realistic value is determined based on the maximum evaluation of the value function in the new environmental state. Preferably, the first realistic value is:
Q_r1(s_t, a_t) = R(s_t, a_t) + γ·max Q_o(s_{t+1}, a_{t+1})  (6)
In formula (6), Q_r1(s_t, a_t) is the first realistic value, R(s_t, a_t) is the reward corresponding to the execution of the first action by the execution module, and max Q_o(s_{t+1}, a_{t+1}) is the maximum value in the value table corresponding to the new state reached after the action is executed. γ denotes the degree to which the value of taking action a_t in state s_t is attenuated with respect to the next state and action; its value range is 0 to 1.
Preferably, the control module 200 configures the second realistic value function for offline learning as follows:
preferably, the second realistic value is determined based on the value of the value function at the new environmental state in the value table. Preferably, the second realistic value is:
Q_r2(s_t, a_t) = R(s_t, a_t) + γ·Q_o(s_{t+1}, a_{t+1})  (7)
In formula (7), Q_r2(s_t, a_t) represents the second realistic value.
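Formulas (4)-(7) combine into a single tabular update step. The sketch below computes the first (online, formula (6)) or second (offline, formula (7)) realistic value and applies the update of formulas (4)-(5); the toy states and actions are illustrative:

```python
def update(value_table, s, a, reward, s_next, a_next, actions,
           alpha=0.1, gamma=0.9, online=True):
    """One update per formula (4): Q <- Q_o + alpha * (Q_r - Q_o),
    where Q_r follows formula (6) (online) or formula (7) (offline)."""
    q_old = value_table.get((s, a), 0.0)
    if online:   # formula (6): maximum evaluation over the new state's actions
        target = reward + gamma * max(value_table.get((s_next, b), 0.0) for b in actions)
    else:        # formula (7): value of the action actually taken next
        target = reward + gamma * value_table.get((s_next, a_next), 0.0)
    value_table[(s, a)] = q_old + alpha * (target - q_old)   # formula (5) as loss
    return value_table[(s, a)]

vt = {("s1", "hold"): 0.5, ("s1", "up"): 1.0}
q_online = update(dict(vt), "s0", "up", 1.0, "s1", "hold", ["up", "hold"], online=True)
q_offline = update(dict(vt), "s0", "up", 1.0, "s1", "hold", ["up", "hold"], online=False)
```

The online target uses the maximum over the new state's actions, which is why, as the text notes, it yields the more aggressive decisions.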
Preferably, in reinforcement learning training, different update strategies affect the learning rate, convergence rate, stability and computational complexity, and thus the training time and the maintenance period of the injection and production equipment 300. For example, the learning rate, convergence rate and computational complexity directly determine the learning-training time of the control module 200. When actions are selected based on the ε-greedy strategy and updates follow the first realistic value of online learning, the update uses the maximum evaluation of the value function and relies on real-time feedback of the environmental state from the acquisition module 100; the resulting optimization decisions are therefore more aggressive, the actions executed by the execution module change more strongly, the mechanical motion of the injection and production equipment 300 is not smooth enough, and the equipment may suffer considerable damage, so that the control module 200 may damage the injection and production equipment 300 many times during learning training. Conversely, when updates follow the second realistic value of offline learning, the update is conservative and the learning-training time of the control module 200 becomes too long. Therefore, by fusing online learning and offline learning, the optimization decisions of the control module 200 during learning training remain smooth while the learning-training time is shortened, and the actions executed by the execution module are smooth and free of large fluctuations.
Preferably, the control module 200 is configured to implement the fusion of online learning and offline learning according to the following steps:
1. The start-stop count, well-open time and well-shut time of each injection and production equipment 300 in one inspection period are divided based on the state space S, thereby determining, for each single well, the first times associated with well opening in different opening periods and the second times associated with well shutting in different shutdown periods. It should be noted that intermittent water injection and intermittent oil recovery are effective ways to reduce cost and increase efficiency; their purpose is to increase output while reducing cost. The key is to determine a reasonable intermittent pumping system, i.e., to set suitable well-open and well-shut times. Therefore, the invention can determine the intermittent pumping system based on the state space S and/or the value table, divide the optimization control of the execution module by the control module 200 into different stages according to the start-stop counts, well-open times and well-shut times in the intermittent pumping system, and optimize the learning training and decisions of the control module 200 for the different stages.
2. Within the same first time/second time, the realistic value in the current state is obtained by linearly superposing, on the basis of the second realistic value corresponding to the current state, the difference between the realistic value in the previous state and the second realistic value in the current state, and the difference between the first realistic value and the second realistic value in the current state. Preferably, the difference between the realistic value in the previous state and the second realistic value in the current state is multiplied by a first weight, and the difference between the first realistic value and the second realistic value in the current state is multiplied by a second weight. The first weight and the second weight each take values between 0 and 1, and their sum is 1. The first weight and the second weight may be set according to the value table or according to the actual situation. Preferably, the second realistic value corresponding to the current state serves as the minimum of the realistic value of the current state, so as to guarantee the basic time of the learning training of the control module 200. The difference between the realistic value in the previous state and the second realistic value in the current state measures the degree of difference between the current state and the previous state; the difference between the first realistic value and the second realistic value in the current state measures how aggressive the current optimization strategy is compared with the corresponding past identical state in the value table. This arrangement achieves the following beneficial effects:
Since the sum of the first weight and the second weight is 1, the decision for the current state is based on the second realistic value in the current state while taking into account the degree of difference between the current state and the previous state; the actions executed by the execution module in the two states therefore remain stable, while a certain degree of decision change is allowed. In addition, considering how aggressive the current optimization strategy is compared with the corresponding past identical state in the value table further increases the degree of decision change, thereby reducing the learning-training time of the control module 200.
Preferably, when a second time is entered from an adjacent first time, or a first time is entered from an adjacent second time, the control module 200 is configured to linearly superpose, on the basis of the second realistic value of the current state, the difference between the first realistic value and the second realistic value corresponding to the current state weighted by a third weight. The third weight takes a value between 0 and 1. Since the well-open and well-shut states differ markedly, only the degree of change between the first realistic value and the second realistic value is considered, so that the production decision of the control module 200 does not change too much and the injection and production equipment 300 is protected from damage.
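The fused realistic value described in steps 1-2 can be sketched as a weighted blend. The particular weight values and the exact blending formula below are assumptions consistent with the text, not a definitive reading of it:

```python
def fused_target(q_r1, q_r2, q_prev=None, w1=0.3, w2=0.7, w3=0.5, phase_change=False):
    """Blend the online (q_r1) and offline (q_r2) realistic values.
    Within the same well-open/well-shut phase the previous state's realistic
    value q_prev also contributes; across a phase change only the q_r1/q_r2
    difference is superposed, weighted by the third weight w3."""
    if phase_change or q_prev is None:
        return q_r2 + w3 * (q_r1 - q_r2)       # transition between first/second time
    assert abs((w1 + w2) - 1.0) < 1e-9         # first and second weights sum to 1
    return q_r2 + w1 * (q_prev - q_r2) + w2 * (q_r1 - q_r2)

t_same = fused_target(q_r1=2.0, q_r2=1.0, q_prev=1.5)            # within a phase
t_switch = fused_target(q_r1=2.0, q_r2=1.0, phase_change=True)   # across a phase
```

In both branches the second realistic value acts as the base, matching its role as the minimum of the realistic value.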
Preferably, in case the value function does not converge, the control module 200 is configured to randomly select, within a threshold, a parameter for the action executed by the execution module, and to take the state corresponding to that parameter as the initial state. The state at least comprises oil production and liquid supply; a new round of control is then carried out. Preferably, the states referred to in the invention are environmental states.
Preferably, a deep reinforcement learning algorithm may also be employed. The control module 200 is configured to construct the value function based on the environmental state, the executed action and an update parameter; that is, an update parameter θ is added to the value function Q(s_t, a_t). The value of θ is between 0 and 1. The value function of deep reinforcement learning is Q(s_t, a_t, θ_t). Preferably, the control module 200 is configured to perform the learning update based on a linear superposition of the previous value in the previous environmental state and the loss function. Preferably, the value of the updated value function is:
Q(s_{t+1}, a_{t+1}, θ_{t+1}) = Q_o(s_t, a_t, θ_t) + loss  (8)
preferably, the cost function may be a sine, cosine, exponential, etc. curve. Preferably, the control module 200 is configured to convert the update problem of the cost function into a function fitting problem. Preferably, the control module 200 is configured to fit the cost function by a multi-order polynomial. The control module 200 is configured to approximate the optimal value by updating the parameter θ. By adopting the setting mode, the problem of high-dimensional input, namely the problem of large state space S and action space A, can be solved.
Preferably, the control module 200 divides the start-stop count, well-open time and well-shut time of each injection and production equipment 300 in one inspection period based on the state space S, and then formulates the first times of each single well associated with well opening in different opening periods and the second times associated with well shutting in different shutdown periods as a mixed-integer nonlinear programming model that minimizes energy consumption under the condition that the daily cumulative total output does not decrease. Preferably, the optimization objective of the mixed-integer nonlinear programming model is energy minimization. The constraints of the mixed-integer nonlinear programming model are as follows:
1. the daily cumulative total yield does not decrease;
2. minimum flow properties are met;
3. the integrity of the tubular string is greater than a minimum threshold.
Preferably, the decision variables of the mixed integer nonlinear programming model may be the operating frequency of the injection and production equipment 300, the water nozzle opening, the oil nozzle opening and the ICD valve opening. Preferably, the minimum thresholds for minimum flow performance and string integrity may be related to the operating frequency of the injection and production equipment 300, the water nozzle opening, the oil nozzle opening and the ICD valve opening. Preferably, the mathematical characterization of the minimum flow performance may be that each hierarchical node satisfies the minimum critical liquid-carrying flow rate. The wellbore and the pipe string need to operate within a certain pressure range, and the pipe string therefore needs to meet strength requirements. Preferably, the integrity of the pipe string may also be characterized by the pressure range the pipe string is subjected to: the pressure experienced by the string is less than the highest threshold and greater than the lowest threshold. Preferably, the minimum critical liquid-carrying flow rate and the working pressure range of the tubular column during operation are set according to actual parameters of oilfield exploitation. Preferably, the control module 200 may solve the above mixed integer nonlinear programming model with a mixed integer nonlinear programming solver.
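The schedule optimization above can be illustrated with a deliberately small sketch. A real deployment would pass the mixed integer nonlinear model to an MINLP solver as the text states; here, purely for illustration, candidate daily open-hours per well are enumerated by brute force and the feasible schedule with the lowest energy is kept. The well count, rate and power coefficients, and the output floor are invented for the example, and the flow-performance and string-integrity constraints are stubbed out.

```python
from itertools import product

WELLS = 2
HOURS = range(4, 13, 4)            # candidate daily open-hours per well: 4, 8, 12
RATE = [3.0, 2.0]                  # output per open-hour for each well (assumed)
POWER = [5.0, 4.0]                 # energy per open-hour for each well (assumed)
MIN_DAILY_OUTPUT = 40.0            # constraint 1: daily cumulative total must not decrease

def feasible(schedule):
    # Constraint 1 only; minimum flow performance and string integrity
    # checks are omitted from this sketch for brevity.
    output = sum(RATE[i] * h for i, h in enumerate(schedule))
    return output >= MIN_DAILY_OUTPUT

# Energy-minimizing feasible schedule (open-hours per well).
best = min(
    (s for s in product(HOURS, repeat=WELLS) if feasible(s)),
    key=lambda s: sum(POWER[i] * h for i, h in enumerate(s)),
)
```

With these toy coefficients the search settles on opening both wells for 8 hours, which meets the output floor at the lowest energy among the enumerated candidates.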
The present description contains several inventive concepts, such as those introduced by "preferably", "according to a preferred embodiment" or "optionally", each indicating that the respective paragraph discloses a separate concept; the applicant reserves the right to submit divisional applications according to each inventive concept.
It should be noted that the above-mentioned embodiments are exemplary, and that those skilled in the art, having the benefit of the present disclosure, may devise various arrangements that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and do not limit the claims. The scope of the invention is defined by the claims and their equivalents.

Claims (10)

1. A data-driven waterflooding reservoir optimization method, characterized in that the method comprises:
establishing a preliminary oil reservoir model based on the static data and the dynamic data acquired in real time;
randomly selecting historical production parameters related to oil reservoir parameters to carry out numerical simulation on the preliminary oil reservoir model;
under the condition that the difference between the numerical simulation result of the oil reservoir model and the production history is smaller than a second threshold value, taking the preliminary oil reservoir model as the oil reservoir model for subsequent optimization of the water injection scheme and the drainage scheme;
and under the condition that the difference between the numerical simulation result and the production history of the reservoir model is larger than a second threshold value, at least one first parameter with the correlation degree larger than the first threshold value is selected based on the correlation degree analysis of the dynamic data and the numerical simulation result, and the preliminary reservoir model is adjusted in a mode of optimizing the at least one first parameter so that the difference between the numerical simulation result and the production history is smaller than the second threshold value.
2. The water flooding reservoir optimization method of claim 1, wherein the step of optimizing at least one of the first parameters is as follows:
randomly selecting an initial value of a first parameter based on the historical production parameters;
predicting based on an initial value of at least one of the first parameters to generate numerical simulation results of a plurality of the reservoir models;
and adjusting the first parameter step by step based on the predicted difference between the numerical simulation results and the production history of the plurality of reservoir models so that the predicted difference between the numerical simulation results and the production history of the plurality of reservoir models is smaller than a second threshold value.
3. The water flooding reservoir optimization method of claim 1 or 2, further comprising:
acquiring physical properties of a reservoir based on the oil reservoir model, and determining a water injection scheme and a drainage scheme of layered injection and production by utilizing a deep learning algorithm/a machine learning algorithm, wherein,
determining an integrated flooding scheme and drainage scheme in a manner such that stratified flooding and stratified oil production act synergistically with each other;
and executing the integrated water injection scheme and drainage scheme in a manner of simultaneously optimizing the parameters of the layered water injection and the parameters of the layered oil extraction based on the reinforcement learning/deep reinforcement learning algorithm, thereby avoiding implementing the layered water injection and the layered oil extraction independently, so as to realize injection-production balance and supply-drainage coordination.
4. The water-flooding reservoir optimization method of any one of the preceding claims, wherein the steps of obtaining reservoir physical properties based on the reservoir model and determining a water flooding scheme and a drainage scheme for stratified injection and production using a deep learning algorithm/machine learning algorithm are as follows:
obtaining a second parameter relating to reservoir fluid flow based on the reservoir model, wherein,
the second parameters at least comprise one or more of reservoir rock physical properties, single sand extension and geometry, fault set structure and closure, injection and production profile, perforation and production enhancement layers in the water injection well and the oil production well, and relative positions of the water injection well and the oil production well;
determining an integrated flooding scheme and drainage scheme based on the second parameter, wherein,
quantifying evaluation indexes of the water injection effect and the drainage and mining effect based on the static data, the dynamic data and the second parameter;
and acquiring a water injection scheme and a drainage scheme based on a deep learning algorithm/machine learning algorithm.
5. The water-flooding reservoir optimization method of any one of the preceding claims, further comprising optimizing the integrated water flooding scheme and drainage scheme based on a deep reinforcement learning algorithm, wherein,
the optimization target of the integrated water injection scheme and drainage and mining scheme is maximization of the net present value;
acquiring at least the changes of pressure and saturation distribution based on high-dimensional dynamic data measured in real time, and taking the high-dimensional pressure and saturation changes as the input of a deep reinforcement learning algorithm;
taking at least water injection frequency and oil extraction frequency as decision variables;
adjusting the optimized flooding and drainage schemes in a manner that supplements at least the reservoir petrophysical properties, the relative positions of the flooding and production wells based on the second parameters provided by the reservoir model.
6. The waterflood reservoir optimization method of any of the preceding claims, wherein the steps of performing the integrated waterflood injection scheme and drainage scheme based on reinforcement learning/deep reinforcement learning algorithm in a manner that optimizes both the parameters of waterflood stratification and the parameters of oil production stratification are as follows:
constructing a cost function related to the environment state and the execution action of the execution module;
in the case where the cost function converges and the optimization decision does not bring the environmental state to the optimization goal, or in the case where the cost function converges and the corresponding injection and production equipment (300) is not damaged,
acquiring a first action under a corresponding environment state based on an epsilon-greedy strategy;
acquiring a new environment state and corresponding rewards after the first action is executed;
a learning update is made based on the new environmental state and the corresponding reward, wherein,
updating learning based on a linear superposition of a previous cost and a loss function in a previous environmental state, and constructing the loss function based on a learning rate and a difference between a real cost and the previous cost in the previous environmental state,
the reality value comprises a first reality value of online learning and a second reality value of offline learning;
after the update, the environment state is updated to a new environment state as an initial state of the next round of control.
7. The water-flooding reservoir optimization method of any one of the preceding claims, wherein the loss function is configured as follows:
constructing a loss function in a reinforcement learning/deep reinforcement learning algorithm by integrating online learning and offline learning modes on the basis of dividing the starting and stopping times, the well opening time and the well closing time of a water injection well and a production well, wherein,
dividing the starting and stopping times, the well opening time and the well closing time of each single well in one equipment detection period based on the state space, and further determining the first time of each single well related to well opening in different opening time and the second time of each single well related to well closing in different shutdown time;
in the same first time/second time, the actual value in the current state is obtained by linearly superposing the difference between the actual value in the previous state and the second actual value in the current state and the difference between the first actual value and the second actual value in the current state on the basis of the second actual value corresponding to the current state,
multiplying the difference between the actual value in the previous state and the second actual value in the current state by a first weight;
multiplying the difference between the first practical value and the second practical value in the current state by a second weight;
determining a first reality value based on a mode of maximum evaluation of a value function in a new environment state;
a second reality worth value is determined based on the value of the cost function in the new environmental state in the historical worth values.
8. The water flooding reservoir optimization method of any one of the preceding claims, further comprising:
and constructing a first time related to well opening of each single well in different opening time and a second time related to well closing in different shutdown time as a mixed integer nonlinear programming model with minimized energy consumption under the condition that the daily cumulative total output does not decrease, and further obtaining the optimal and dynamically-changed start-stop times, the first time and the second time under the condition of avoiding the local optimal problem.
9. The data-driven water injection oil reservoir optimization system is characterized by comprising an acquisition module (100), a control module (200) and injection-production equipment (300), wherein the control module (200) is configured to:
establishing an oil reservoir model of the oil-water well based on the static data of the oil-water well and the dynamic data acquired by the acquisition module (100) in real time;
and selecting at least one first parameter with the correlation degree larger than a first threshold value based on the correlation degree analysis of the dynamic data and the reservoir model simulation result, and adjusting the reservoir model in a manner of updating the at least one first parameter stage by stage so as to enable the difference between the numerical simulation result and the production history of the reservoir model to be predicted according to at least one first parameter and be smaller than a second threshold value.
10. The data-driven water injection oil reservoir optimization system is characterized by comprising an acquisition module (100), a control module (200) and injection-production equipment (300), wherein the control module (200) is configured to: under the condition that the control module (200) controls the injection and production equipment (300) to simultaneously execute an integrated water injection scheme and a drainage scheme optimized based on a reservoir model in a mode of simultaneously optimizing the zonal water injection parameter and the zonal oil production parameter of the injection and production equipment (300) based on a reinforcement learning/deep reinforcement learning algorithm,
the control module (200) is configured to construct a loss function in a reinforcement learning/deep reinforcement learning algorithm by fusing online learning and offline learning modes on the basis of dividing the starting and stopping times, the well opening time and the well closing time of the water injection well and the oil production well.
CN202110028282.6A 2021-01-08 2021-01-08 Data-driven water-flooding reservoir optimization method and system Active CN112861423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110028282.6A CN112861423B (en) 2021-01-08 2021-01-08 Data-driven water-flooding reservoir optimization method and system


Publications (2)

Publication Number Publication Date
CN112861423A true CN112861423A (en) 2021-05-28
CN112861423B CN112861423B (en) 2023-06-02

Family

ID=76002115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110028282.6A Active CN112861423B (en) 2021-01-08 2021-01-08 Data-driven water-flooding reservoir optimization method and system

Country Status (1)

Country Link
CN (1) CN112861423B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114183106A (en) * 2021-11-29 2022-03-15 华鼎鸿基采油技术服务(北京)有限公司 Injection control method and system for chemical displacement fluid
CN114444402A (en) * 2022-04-08 2022-05-06 中国石油大学(华东) Oil reservoir injection-production optimization method based on deep reinforcement learning
CN114837633A (en) * 2022-05-07 2022-08-02 北京泰斯特威尔技术有限公司 Intelligent layered injection-production oil reservoir excavation and submergence method and system
CN115584952A (en) * 2022-10-13 2023-01-10 新疆敦华绿碳技术股份有限公司 Method and system for judging gas channeling of carbon dioxide flooding reservoir
WO2023154808A1 (en) * 2022-02-09 2023-08-17 Schlumberger Technology Corporation Integrated asset modeling for energy consumption and emission
CN117076956A (en) * 2023-10-16 2023-11-17 西安石油大学 Fracture-cavity oil reservoir physical model similarity criterion optimization method and device
US11905817B2 (en) 2021-12-16 2024-02-20 Saudi Arabian Oil Company Method and system for managing carbon dioxide supplies using machine learning
CN117684929A (en) * 2022-12-14 2024-03-12 中国科学院沈阳自动化研究所 Oil-water well system energy consumption global optimization control method based on inter-well connectivity

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050209912A1 (en) * 2004-03-17 2005-09-22 Schlumberger Technology Corporation Method system and program storage device for automatically calculating and displaying time and cost data in a well planning system using a Monte Carlo simulation software
CN109447532A (en) * 2018-12-28 2019-03-08 中国石油大学(华东) A kind of oil reservoir inter well connectivity based on data-driven determines method
CN110082280A (en) * 2019-06-17 2019-08-02 河南理工大学 Coalbed Methane Productivity change modeling test device and method caused by discontinuous mining
CN111625922A (en) * 2020-04-15 2020-09-04 中国石油大学(华东) Large-scale oil reservoir injection-production optimization method based on machine learning agent model
CN111985610A (en) * 2020-07-15 2020-11-24 中国石油大学(北京) System and method for predicting pumping efficiency of oil pumping well based on time sequence data


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENG ZHAN ET AL.: "Application of machine learning for production forecasting for unconventional resources", Unconventional Resources Technology Conference, pages 1-10 *
LIU PING, FAN JINDING, SUN JIANGHUA, YANG RUOGU, SONG JIAN: "Research on SmartConPetro oil and gas production big data mining system technology based on an intelligent oilfield cloud platform", Digital Design, no. 03, pages 47-52 *
SHANG MING, QIAO WENLONG, CAO JING: "Basic characteristics and development benefit evaluation of the Karamay conglomerate reservoir", Xinjiang Geology, no. 03, pages 312-316 *


Also Published As

Publication number Publication date
CN112861423B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN112861423B (en) Data-driven water-flooding reservoir optimization method and system
CN112836349B (en) Injection and production joint debugging intelligent decision method and system based on shaft parameters
Eshraghi et al. Optimization of miscible CO2 EOR and storage using heuristic methods combined with capacitance/resistance and Gentil fractional flow models
CN109002574A (en) A kind of stratified reservoir pulse period waterflooding extraction index prediction technique
Chen et al. Well placement optimization using an analytical formula-based objective function and cat swarm optimization algorithm
US11308413B2 (en) Intelligent optimization of flow control devices
WO2008055188A2 (en) System and method for performing oilfield simulation operations
NO20131134A1 (en) Method, system, apparatus and computer readable medium for field elevation optimization using slope control with distributed intelligence and single variable
US20230358123A1 (en) Reinforcement learning-based decision optimization method of oilfield production system
Hanssen et al. Closed-loop predictions in reservoir management under uncertainty
Saputelli et al. Real-time decision-making for value creation while drilling
Gu et al. Reservoir production optimization based on surrograte model and differential evolution algorithm
Prakasa et al. Novel application of capacitance-resistance model for reservoir characterisation and zonal, intelligent well control
CN116384554A (en) Method and device for predicting mechanical drilling speed, electronic equipment and computer storage medium
Ding et al. Optimizing vertical and deviated wells based on advanced initialization using new productivity potential map
Chen et al. Optimization of production performance in a CO2 flooding reservoir under uncertainty
Mirzaei-Paiaman et al. Iterative sequential robust optimization of quantity and location of wells in field development under subsurface, operational and economic uncertainty
Shbair et al. The value of reservoir surveillance-applications to fractured carbonates under waterflooding
Morooka et al. Development of intelligent systems for well drilling and petroleum production
US11501043B2 (en) Graph network fluid flow modeling
Rafiei Improved oil production and waterflood performance by water allocation management
Zhou et al. An online hybrid prediction model for mud pit volume in the complex geological drilling process
Mogollon et al. Comparative analysis of data-driven, physics-based and hybrid reservoir modeling approaches in waterflooding
Camargo et al. Advanced supervision of oil wells based on soft computing techniques
Markov et al. Methodology for Constructing Simplified Reservoir Models for Integrated Asset Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant