CN112861423B

CN112861423B - Data-driven water-flooding reservoir optimization method and system

Info

Publication number: CN112861423B
Application number: CN202110028282.6A
Authority: CN
Inventors: 檀朝东; 王春秋; 赵小雨; 牛会钊; 宋文容; 宋健
Original assignee: Beijing Yadan Petroleum Technology Co ltd; China University of Petroleum Beijing
Current assignee: Beijing Yadan Petroleum Technology Co ltd; China University of Petroleum Beijing
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2023-06-02
Anticipated expiration: 2041-01-08
Also published as: CN112861423A

Abstract

The invention relates to a data-driven water-flooding oil reservoir optimization method and a system, wherein the method comprises the following steps: establishing a preliminary oil reservoir model based on static data and dynamic data acquired in real time; randomly selecting historical production parameters related to oil reservoir parameters to perform oil reservoir model numerical simulation on the preliminary oil reservoir model; under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is smaller than a second threshold value, taking the preliminary oil reservoir model as the oil reservoir model with optimized subsequent water injection scheme and drainage scheme; and under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is larger than a second threshold value, selecting at least one first parameter with at least one degree of association larger than a first threshold value based on the degree of association analysis of the dynamic data and the numerical simulation result, and adjusting the preliminary oil reservoir model in a mode of optimizing at least one first parameter so that the difference between the numerical simulation result and the production history is smaller than the second threshold value.

Description

Data-driven water-flooding reservoir optimization method and system

Technical Field

The invention relates to the technical field of oil reservoir engineering development, in particular to a data-driven water-flooding oil reservoir optimization method and system.

Background

The intelligent oil field is based on the digital oil field, fully utilizes new generation information technologies such as big data, machine learning, intelligent algorithm and the like, enables the oil field to have the capabilities of observation (real-time acquisition of monitoring data), thinking (big data analysis method), decision (machine learning and intelligent optimization method) and execution (intelligent oil and gas well regulation and control equipment), and the intelligent oil field technology is based on omnibearing interconnection and intercommunication, so that each system of oil field operation is effectively integrated.

The optimal design of the manual water injection scheme mainly depends on an oil reservoir engineering method or a numerical simulation technology, the design scheme has strong randomness, time and labor are wasted, and the optimal scheme is easy to miss. The oil reservoir dynamic real-time production optimization theory is an important research hotspot of the current oil field automatic injection and production scheme design, and mainly aims to convert the oil reservoir numerical simulation technology and the optimization theory into optimal control model solution, and adopts maximization model functions such as accompanying gradient, random disturbance gradient algorithm, heuristic algorithm and the like to automatically solve an optimal working system, but because the gradient solution difficulty is higher, the digital-analog operand is larger, the dimension of the actual optimization problem is higher, and the inefficiency problem of the optimization algorithm is brought. The inter-well connectivity is an important basis for water injection optimization design, and the inter-well connectivity model based on injection and production dynamic data is gradually developed from single item to oil-water two item and single layer to new stage of multi-layer prediction, has the advantages of rapid calculation, capability of quantitatively representing inter-well communication relation and the like, and is gradually applied to oil reservoir scheme evaluation design. However, when the complex oil reservoir oil-water dynamic prediction is carried out on the current connectivity model, the saturation tracking method is not applied to the oil reservoir layered injection-production optimization scheme, so that the current oil reservoir exploitation scheme cannot be really optimized and improved, and the oil reservoir exploitation efficiency is improved.

For example, chinese patent publication No. CN108868712B discloses a method and system for optimizing oil reservoir development and production based on connectivity methods. According to the method, an accurate front tracking method is established by taking an inter-well communication unit as an object to calculate saturation, two output dynamics of each layer of oil and water at a well point are obtained, information such as conductivity, flow split and water injection efficiency among injection and production wells is obtained by automatically fitting inversion communication model parameters to reservoir history dynamics, and the reservoir is subjected to layered dynamic production allocation and injection allocation automatic optimization design by iterative calculation on the basis of the information, so that low-efficiency water drive direction flow is reduced, and the injection and production contradiction is improved. However, the oil reservoir development method and system provided by the patent do not take into account the heterogeneous nature of the oil reservoir, and in particular the technical solution of the patent document is to interpret and observe the oil reservoir with a large amount of water-injected oil-recovery stratification data acquired by the wellbore device, these data having a high degree of heterogeneity in time and space, and there being a great uncertainty in the geological and petrophysical properties of the oil reservoir.

Furthermore, there are differences in one aspect due to understanding to those skilled in the art; on the other hand, as the inventors studied numerous documents and patents while the present invention was made, the text is not limited to details and contents of all that are listed, but it is by no means the present invention does not have these prior art features, the present invention has all the prior art features, and the applicant remains in the background art to which the rights of the related prior art are added.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a data-driven water-flooding oil reservoir optimization method, which comprises the following steps:

establishing a preliminary oil reservoir model based on static data and dynamic data acquired in real time;

randomly selecting historical production parameters related to oil reservoir parameters to perform oil reservoir model numerical simulation on the preliminary oil reservoir model;

and under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is smaller than a second threshold value, taking the preliminary oil reservoir model as the oil reservoir model with optimized subsequent water injection scheme and drainage scheme. Preferably, in the case that the difference between the numerical simulation result and the production history of the reservoir model is greater than a second threshold, at least one first parameter having at least one degree of association greater than a first threshold is selected based on the degree of association analysis of the dynamic data with the numerical simulation result, and the preliminary reservoir model is adjusted in such a way that at least one of the first parameters is optimized so that the difference between the numerical simulation result and the production history is less than the second threshold. According to the invention, on the basis of utilizing static data analysis to judge a high-water-content reservoir to perform oil reservoir numerical simulation, dynamic data of layered injection and production are combined to continuously correct the oil reservoir numerical simulation, so that a dynamic and high-fitness oil reservoir data physical model is established, uncertainty of residual oil distribution prediction can be reduced, knowledge of oil reservoir heterogeneity is deepened, and oil reservoir fluid protection degree and pressure field evolution model under the constraint of layered injection and production real-time data are obtained. Specifically, at least one first parameter with larger influence on the numerical simulation result of the oil reservoir model is selected through association degree analysis, then the numerical simulation results of a plurality of corresponding oil reservoir models are calculated through the values of the selected first parameter, and the first parameter is adjusted according to the difference between the numerical simulation results of the plurality of oil reservoir models and the actual production history, so that the difference between the numerical simulation results and the actual production history is reduced, the data simulation results of the oil reservoir model are smaller than a second threshold, namely the oil reservoir model under the first parameter meets the requirement of fitting rate.

According to a preferred embodiment, the step of optimizing at least one of said first parameters is as follows:

randomly selecting an initial value of a first parameter based on the historical production parameters;

predicting based on an initial value of at least one of the first parameters to generate numerical simulation results for a plurality of the reservoir models;

the first parameters are adjusted step by step based on differences between the predicted numerical simulation results and the production history of the plurality of reservoir models such that the differences between the predicted numerical simulation results and the production history of the plurality of reservoir models are less than a second threshold.

According to a preferred embodiment, the method further comprises:

acquiring physical properties of a reservoir based on the reservoir model, and determining a water injection scheme and a drainage scheme of layered injection and production by using a deep learning algorithm/machine learning algorithm, so that the layered water injection and the layered production are in synergistic effect with each other to determine an integrated water injection scheme and drainage scheme;

the integrated water injection scheme and drainage scheme are executed in a mode of simultaneously optimizing the layered water injection parameter and the layered oil extraction parameter based on the reinforcement learning/deep reinforcement learning algorithm, so that independent implementation of layered water injection and layered oil extraction is avoided to realize injection balance and supply and drainage coordination.

According to a preferred embodiment, the steps of acquiring reservoir physical properties based on the reservoir model and determining a water injection scheme and a drainage scheme for stratified injection and production using a deep learning algorithm/machine learning algorithm are as follows: :

acquiring a second parameter relating to reservoir fluid flow based on the reservoir model;

and determining an integrated water injection scheme and drainage scheme based on the second parameter. Preferably, the evaluation indexes of the water injection effect and the drainage effect are quantized based on the static data, the dynamic data and the second parameter. Preferably, the water injection scheme and the drainage scheme are acquired based on a deep learning algorithm/machine learning algorithm. Preferably, the second parameter comprises at least one or more of reservoir rock physical properties, single sand extension and geometry, fault aggregation and closure, injection and production profile, perforation and production zone in the injection and production wells, and relative position of the injection and production wells.

According to a preferred embodiment, the method further comprises optimizing the integrated water injection and drainage scheme based on a deep reinforcement learning algorithm. Preferably, the optimization objective of the integrated water injection and drainage scheme is that the net present value is the greatest. And acquiring at least the change of pressure and saturation distribution based on the real-time measured high-dimensional dynamic data, and taking the high-dimensional pressure and saturation change as the input of a deep reinforcement learning algorithm. At least the water injection frequency and the oil recovery frequency are used as decision variables. Preferably, the optimized water injection and drainage schemes are adjusted in a manner that at least complements reservoir rock physical properties, relative positions of water injection and production wells based on said second parameters provided by the reservoir model.

According to a preferred embodiment, the integrated water injection and drainage scheme is performed in a manner that optimizes both the stratified water injection parameters and the stratified oil recovery parameters based on a reinforcement learning/deep reinforcement learning algorithm as follows:

constructing a cost function about the environmental state and the execution module executing the action;

in case the cost function converges and the optimization decision does not bring the environmental state to the optimization objective, or in case the cost function converges and the corresponding injection and production equipment is not damaged,

acquiring a first action under a corresponding environment state based on an epsilon-greedy strategy;

acquiring a new environment state after executing a first action and a corresponding reward;

learning updates are based on the new environmental status and the corresponding rewards. Preferably, the learning update is based on a linear superposition of the previous value and the loss function in the previous environmental state, and the loss function is constructed based on the learning rate and the difference between the actual value and the previous value in the previous environmental state. Preferably, the real-world value includes a first real-world value for online learning and a second real-world value for offline learning. Preferably, after the update, the environment state is updated to a new environment state as an initial state of the next round of control.

According to a preferred embodiment, the loss function is configured as follows:

and constructing a loss function in the reinforcement learning/deep reinforcement learning algorithm by combining an online learning mode and an offline learning mode on the basis of dividing the start-stop times, the well opening time and the well closing time of the water injection well and the oil extraction well. Preferably, the number of start-stops, the open time and the close time of each single well in one equipment detection period are divided based on the state space, and then the first time about the open of each single well in different open times and the second time about the close of each single well in different close times are determined. In the same first time/second time, the real value in the current state is the difference between the real value in the current state and the second real value in the current state and the difference between the first real value and the second real value in the current state which are linearly overlapped on the basis of the second real value corresponding to the current state. Preferably, the real value in the previous state is multiplied by the second real value in the current state by the first weight. The difference between the first real value and the second real value in the current state is multiplied by the second weight. The first real-world value is determined based on a manner of maximum evaluation of the cost function in the new environmental state. The second real-world value is determined based on the value of the cost function in the new environmental state in the historical value. Preferably, the historical value may be a corresponding value in a value table. In the process of using deep reinforcement learning, the historical value is the value corresponding to the environmental state and action recorded in the previous environmental state.

According to a preferred embodiment, the method further comprises: and constructing a mixed integer nonlinear programming model which determines that the energy consumption of each single well is minimized under the condition that the daily cumulative total yield does not drop in a first time about well opening in different opening time and a second time about well closing in different stopping time, so as to obtain the optimal and dynamically-changed starting and stopping times, the first time and the second time under the condition that the local optimal problem is avoided.

The invention also provides a data-driven water-injection oil reservoir optimization system, which comprises an acquisition module, a control module and injection and production equipment. The control module is configured to:

establishing an oil reservoir model of the digital oil-water well based on static data of the physical oil-water well and dynamic data acquired by the acquisition module in real time;

and selecting at least one first parameter with at least one degree of association larger than a first threshold value based on the degree of association analysis of the dynamic data and the oil reservoir model simulation result, and adjusting the oil reservoir model based on updating at least one first parameter step by step in a way that the difference between the numerical simulation result of the oil reservoir model and the production history, which is generated according to at least one first parameter, is smaller than a second threshold value.

The invention also provides a data-driven water-injection oil reservoir optimization system, which comprises an acquisition module, a control module and injection and production equipment. The control module is configured to: under the condition that the control module controls the injection and production equipment to simultaneously execute an integrated injection scheme and a drainage scheme which are optimized based on an oil reservoir model in a mode of simultaneously optimizing the layered injection parameters and the layered production parameters of the injection and production equipment based on a reinforcement learning/deep reinforcement learning algorithm, the control module is configured to construct a loss function in the reinforcement learning/deep reinforcement learning algorithm in a mode of fusing on-line learning and off-line learning based on the division of the start-stop times, the well opening time and the well closing time of the injection well and the production well.

Drawings

FIG. 1 is a simplified block diagram of a preferred embodiment of a water-filled reservoir optimization system of the present invention;

FIG. 2 is a schematic flow chart of the steps of a preferred embodiment of the water-flooding reservoir optimization method of the present invention.

List of reference numerals

100: acquisition module 200: control module 300: injection and production equipment

Detailed Description

The following detailed description refers to the accompanying drawings.

The existing oilfield development mode adopts layered injection and production and comprises at least one water injection well and at least one oil production. The water injection well is in communication with at least one oil recovery well. The water injection well and the oil production well are of a layered structure in the longitudinal direction. The layered structure includes a plurality of water-infused layers. An interlayer is arranged between two adjacent water injection layers. Preferably, the water injection well and the oil recovery well are provided with perforations. Perforation is an operation of explosion opening by using special energy gathering materials to enter a preset layer of a borehole to enable fluid in a downhole formation to enter the borehole, and is widely applied to oil and gas fields and coal fields, and sometimes also applied to exploitation of water sources. Shaped perforators are commonly used in most oil fields, gun-type perforators have been used in the history of perforation, and water-flow perforators are also used in some large petroleum companies abroad. The principle of water flooding development is to use water as a displacement object, and to adopt water with a certain flow rate under the action of a certain temperature and a certain pressure based on the osmosis action so as to replace crude oil in an oil field. Under the action of penetration, the oil-water propelling interface moves towards the oil production well to propel the oil layer to the oil production well, so that crude oil is collected through the oil production well.

Preferably, after the water injection development oil field enters the medium-high water-containing stage, the water injection development oil field is influenced by natural three contradictions (contradictions among layers, planes and layers) of a reservoir and long-term flushing of water injection, parameters such as permeability and wettability of the reservoir are changed, so that injected water is caused to intrude along a high-permeability dominant channel, namely a large pore canal and a crack part, thereby causing inefficient or invalid circulation of the injected water, causing the water injection to sweep down in volume and influencing the water flooding development effect.

Example 1

The embodiment provides a data-driven water-flooding oil reservoir optimization method, which comprises the following steps as shown in fig. 1:

s100: and establishing an oil reservoir model based on the static data and the dynamic data acquired in real time. Preferably, a preliminary reservoir model is established based on static data and dynamic data acquired in real time;

and under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is smaller than a second threshold value, taking the preliminary oil reservoir model as the oil reservoir model with optimized subsequent water injection scheme and drainage scheme. Preferably, in the case that the difference between the numerical simulation result and the production history of the reservoir model is greater than the second threshold, at least one first parameter having at least one degree of association greater than the first threshold is selected based on the degree of association analysis of the dynamic data and the numerical simulation result, and the preliminary reservoir model is adjusted by optimizing the at least one first parameter so that the difference between the numerical simulation result and the production history is less than the second threshold. Preferably, the association degree analysis can acquire the association degree of each uncertain dynamic data and the numerical simulation result of the oil reservoir model through a grey association rule mining algorithm. Preferably, the first threshold may be specifically set according to the real-time collected dynamic data of the separate layer water injection and oil extraction and the established reservoir model. The value of the first threshold is between 0 and 1. In this embodiment, the value of the first threshold is between 0.4 and 1. Preferably, the first parameter may be one or several of porosity, permeability, reservoir horizontal permeability, reservoir water quality direction permeability, initial tube pressure at the oil-water push interface. Preferably, the second threshold characterizes a fit of the reservoir value to actual historical production. The second threshold value is between 0 and 1. Preferably, in the present embodiment, the difference between the numerical simulation result and the production history is an error co-defense root. The second threshold value is in the range of 0-0.3.

The step of optimizing the at least one first parameter is as follows:

predicting based on the initial value of the at least one first parameter to generate numerical simulation results of the plurality of reservoir models;

the first parameter is adjusted step by step based on a difference between the numerical simulation results and the production history of the predicted plurality of reservoir models such that the difference between the numerical simulation results and the production history of the predicted plurality of reservoir models is less than a second threshold.

Through this setting method, the beneficial effect who reaches is:

the method comprises the steps of selecting an initial value of a first parameter based on prior information, then constructing an error covariance matrix through the difference between the predicted numerical simulation result and production history in a mode of predicting the numerical simulation result in parallel by different first parameters, and adjusting the value of the first parameter through the error covariance matrix to adjust the parameter of an oil reservoir model, so that the oil reservoir model reaches the history fitting standard. The setting mode can reduce the calculation cost and the time cost under the condition that the number of the first parameters is large.

Preferably, in the step-wise adjustment of the first parameter based on the difference between the numerical simulation results and the production history of the predicted plurality of reservoir models,

the differences between the numerical simulation results and the production histories of the plurality of oil reservoir models are characterized as error covariance matrixes;

the first parameter is adjusted based on the error covariance matrix. Preferably, observations in the actual production history are added to the error covariance matrix to expand the error covariance matrix. The way to adjust the first parameter by the error covariance matrix is a way to solve the linearity problem, and because the reservoir has non-uniformity, updating the reservoir model by the error covariance matrix adjusts the first parameter includes the non-linearity problem. The invention can reduce errors in the process of converting nonlinear problems into linear problems by combining the observed values in the production history with the error covariance matrix.

S200: and acquiring physical properties of the reservoir based on the reservoir model, and determining a water injection scheme and a drainage scheme of the layered injection and production by utilizing a deep learning algorithm/machine learning algorithm, so that the layered injection and the layered production determine an integrated water injection scheme and drainage scheme in a synergistic manner. The method comprises the following steps of obtaining physical properties of a reservoir based on an oil reservoir model and determining a water injection scheme and a drainage scheme of layered injection and production by using a deep learning algorithm/machine learning algorithm:

Obtaining a second parameter relating to reservoir fluid flow based on the reservoir model;

an integrated water injection scheme and drainage scheme is determined based on the second parameter. Preferably, the evaluation index of the water injection effect and the drainage effect is quantified based on the static data, the dynamic data and the second parameter. Preferably, the water injection scheme and the drainage scheme are acquired based on a deep learning algorithm/machine learning algorithm. Preferably, the second parameter comprises at least one or more of reservoir rock physical properties, single sand extension and geometry, fault aggregation and closure, injection and production profile, perforation and production zone in the injection and production wells, and relative position of the injection and production wells. Preferably, the value of the second parameter may be determined using the following criteria, in particular:

(1) For the same sand body, a fluid flow path communicated with a water injection well and an oil extraction well exists under a proper well pattern and a reasonable well distance;

(2) For different sand bodies, the water injection well and the oil extraction well are not communicated;

(3) The water injection well or the oil extraction well drilled in the mudstone area is not communicated;

(4) No connection exists between the water injection well and the oil extraction well near the closed fault or the mudstone area;

(5) For the geometry of the sand body which causes the overlong flow channel between the water injection well and the oil extraction well, no fluid flow or weak flow exists between the water injection well and the oil extraction well;

(6) Under appropriate conditions, the injected water can bypass the barrier;

(7) The two-stage oil production wells in the same direction are not affected;

(8) The production well may be affected by multiple directions;

(9) Under proper angles and intervals, one water injection well can affect a plurality of oil extraction wells;

(10) The water injection well and the oil extraction well have no fluid flow in the stratum which is not perforated at the same time;

(11) The streamlines cannot cross each other.

Preferably, the steps for acquiring the integrated water injection scheme and the drainage scheme based on the deep learning algorithm/the machine learning algorithm are as follows:

means for analyzing the layering direction of the single water injection well/oil extraction well by using a machine learning algorithm;

evaluating the well group/interval water injection effect and the drainage effect by using a machine learning algorithm;

qualitative well group/interval water injection adjustment direction and drainage adjustment direction by applying a machine learning algorithm;

and solving an optimal integrated water injection scheme and an optimal drainage scheme by applying a deep learning algorithm.

Preferably, the integrated water injection scheme and drainage scheme are optimized based on a deep reinforcement learning algorithm. Preferably, the optimization objective of the integrated water injection and drainage scheme is that the net present value is the greatest. And acquiring at least the change of pressure and saturation distribution based on the real-time measured high-dimensional dynamic data, and taking the high-dimensional pressure and saturation change as the input of a deep reinforcement learning algorithm. At least the water injection frequency and the oil recovery frequency are used as decision variables. Preferably, the optimized water injection and drainage schemes are adjusted in a manner that at least complements reservoir rock physical properties, relative positions of water injection and production wells based on second parameters provided by the reservoir model.

S300: the integrated water injection scheme and drainage scheme are executed in a mode of simultaneously optimizing the layered water injection parameter and the layered oil extraction parameter based on the reinforcement learning/deep reinforcement learning algorithm, so that independent implementation of layered water injection and layered oil extraction is avoided to realize injection balance and supply and drainage coordination.

According to a preferred embodiment, the steps of performing an integrated water injection scheme and drainage scheme based on a reinforcement learning/deep reinforcement learning algorithm in a manner that optimizes both the stratified water injection parameters and the stratified oil recovery parameters simultaneously are as follows:

in the event that the cost function converges and the optimization decision does not bring the environmental state to the optimization goal, or in the event that the cost function converges and the corresponding injection and production equipment 300 is not damaged,

Preferably, the injection and production apparatus 300 may be controlled by means of proportional-integral-derivative control (Proportional IntegralDerivative control, PID).

Preferably, the reinforcement learning algorithm is first described. The basic process of reinforcement learning is a markov decision process. The Markov decision process may form a quadruple representation { s, a, p, r } with states s, actions a, state transition probabilities p, state transition rewards, or rewards r. For a discrete-time markov decision process, the set of states and actions is referred to as state space S and action space a. Specifically expressed as state s _i ∈S，a _i E A. According to the action selected in step t, the state is determined according to the probability P (s _t+1 ，s _t ，a _t ) From s _t Transfer to s _t+1 . At the same time as the state transition, the decision body gets 1 instant prize R (s _t+1 ，s _t ，a _t ). S in the above expression _t Represented as the state at time t. a, a _t Denoted as action at time t. The awards accumulated at the end of the above process are:

G _t ＝R _t +γR _t+1 +γ ² R _t+2 +…+γ ^k R _t+k ＝∑ _k＝0 γ ^k R _t+k (1)

r in formula (1) _t Is the jackpot accumulated over time t. Gamma is a discount factor, and the value range is between 0 and 1. The discount factor is used to cut the bonus weight corresponding to the forward decision. The final goal of the decision is to maximize the jackpot while reaching the goal state.

Preferably, during reinforcement learning, different environmental states and actions are recorded to build the value table. Preferably recorded within the value table are historical or previous environmental states and actionsCorresponding value. The value represents a discrete record of the cost function. Preferably, the cost function may be a set of unitary quadratic functions with respect to the first optimization objective. For example, the oil production is-l (X-m) ² +n. The three coefficients of l, m and n are set at least to ensure that the oil production is positive in half of the production period. Preferably, the first action is obtained based on an ε -greedy policy. Preferably, the first action is a random action. The epsilon-greedy strategy selects the action corresponding to the maximum value of the cost function in the later stage of learning training, but a certain probability epsilon randomly selects one action to acquire rewards.

Preferably, the optimization decision is made in a manner that training learning is performed to approach the first optimization objective based on rewards after performing actions in the current time environmental state and the previous time environmental state. Preferably, the first optimization objective includes net present value, oil recovery, and yield maximization. Preferably, the first optimization objective may also include injection/production profile homogenization, energy consumption minimization, injection and production equipment 300 life maximization, etc. Since the first optimization objectives include net present value, oil recovery and production maximization, injection and production equipment 300 life maximization. Therefore, the state space may select, as the state space S, properties directly related to the production, the pump cycle of the injection well, the pump cycle of the production well, the water injection rate, the oil recovery rate, etc. Preferably, the state space S is a multidimensional matrix, wherein the number of rows is the number of related attributes, and the number of columns is the number of water injection wells and oil recovery wells. Preferably, the parameters collected in real time include at least flow and pressure of a single well, wellhead oil jacket pressure, wellbore temperature and pressure distribution, pipeline pressure, injection equipment pressurization and power, lifting equipment lift and power, and the like. Preferably, the decision variables may be the operating frequency of the injection and production device 300, the water nozzle valve opening, the oil nozzle valve opening, the ICD valve opening. Thus, the actuation space a of the execution module includes the operating frequency of the injection well, the operating frequency of the production well, the water nozzle valve opening, the oil nozzle valve opening, and the ICD valve opening. Preferably, the action space A is also a multidimensional matrix, wherein the number of rows is 5, and the number of columns is the number of corresponding water injection wells and oil extraction wells. The values of the corresponding operation characteristic amounts in the operation space a are described with the operation frequency of the oil production well. Preferably, the method comprises the steps of, Operating frequency v of oil production well _i The motion characteristic quantity of (a) is as follows:

v _i the value of (1) is 1,0, -1. When the control module 200 feeds back to the execution module, i.e., 1,0, -1, the execution module increases, does not change, and decreases Δv at the original frequency. It should be noted that the magnitude setting of Δv should be determined according to practical situations. If Deltav is too small, it will lead to slow convergence, and if it is too large, it will lead to unstable system operation and even failure to converge.

Preferably, the function for rewards is constructed based on the environmental status of the feedback after the previous execution module performed the action. The maximum value of the reward function should be equivalent to the first optimization objective. For example, the reward function is a function of the action a performed by the injection and production device 300 and the environmental state s collected by the collection module 100. The reward function R (a, s) is as follows:

preferably, the value of the updated cost function is:

Q(s _t+1 ，a _t+1 )＝Q _o (s _t ，a _t )+loss (4)

q(s) in formula (4) _t+1 ，a _t+1 ) Is the value of the updated cost function. Q (Q) _o (s _t ，a _t ) Previous value in previous environmental state. The previous value is the value stored in the value table, or the value under the corresponding environmental state and action previously recorded. loss is a loss function.

loss＝α[Q _r (s _t+1 ，a _t+1 )-Q _o (s _t ，a _t )] (5)

Q in (5) _r (s _t+1 ，a _t+1 ) Is a practical value. Alpha is the learning rate. Alpha has a value between 0 and 1. Alpha determines value table update Is a rate of (a).

Preferably, the real-world value includes a first real-world value for online learning and a second real-world value for offline learning. Preferably, the first real-world value is determined based on a way of maximum evaluation of the cost function in the new environment state. Preferably, the first real value is:

Q _r1 (s _t ，a _t )＝R(s _t ，a _t )+γmaxQ _o (s _t+1 ，a _t+1 ) (6)

q in (6) _r1 (s _t ，a _t ) Is a first real value. R(s) _t ，a _t ) To perform a corresponding reward after the first action. maxQ _o (s _t+1 ，a _t+1 ) In order to execute the new state after the action, the maximum value corresponding to the new state is in the value table. Gamma denotes the state s _t By action a _t The value of (2) is related to the degree of decay of the next state and action. The value range of gamma is between 0 and 1.

Preferably, the second real-world value is determined based on the value of the cost function in the new environmental state in the cost table. Preferably, the second real value is:

Q _r2 (s _t ，a _t )＝R(s _t ，a _t )+γQ _o (s _t+1 ，a _t+1 ) (7)

q in (7) _r2 (s _t ，a _t ) Representing a second real value.

Preferably, in the training of reinforcement learning, different updating strategies affect the learning rate, the convergence rate, the stability, the computational complexity, and the like, and further affect the training time and the maintenance period of the injection and production equipment 300. For example, the learning rate, convergence rate, and computational complexity are directly related to the learning training time. In the process of selecting and executing work based on the epsilon-greedy strategy, if the updating is performed based on the first real-time value of online learning, the updating is the maximum evaluation of the value function and depends on real-time feedback of the environment state, so that the generated optimization decision is more aggressive, the action change degree is larger, the mechanical movement process of the injection and production equipment 300 is not smooth enough, and larger damage to the injection and production equipment 300 is possibly generated, and the situation of multiple damages is caused. Under the condition of updating according to the second practical value of offline learning, the updating is relatively conservative, so that the learning training time is too long, and therefore, the method and the device enable the optimized decision in the learning training process to be gentle on the basis of shortening the learning training time based on the mode of fusing the online learning and the offline learning, so that the action is smooth, and larger fluctuation cannot be generated.

and constructing a loss function in the reinforcement learning/deep reinforcement learning algorithm by combining an online learning mode and an offline learning mode on the basis of dividing the start-stop times, the well opening time and the well closing time of the water injection well and the oil extraction well. Preferably, the number of start-stops, the open time and the close time of each single well in one equipment detection period are divided based on the state space, and then the first time about the open of each single well in different open times and the second time about the close of each single well in different close times are determined. In the same first time/second time, the real value in the current state is the difference between the real value in the current state and the second real value in the current state and the difference between the first real value and the second real value in the current state which are linearly overlapped on the basis of the second real value corresponding to the current state. Preferably, the real value in the previous state is multiplied by the second real value in the current state by the first weight. The difference between the first real value and the second real value in the current state is multiplied by the second weight. The first real-world value is determined based on a manner of maximum evaluation of the cost function in the new environmental state. The second real-world value is determined based on the value of the cost function in the new environmental state in the historical value.

Preferably, the sum of the first weight and the second weight is 0-1. The first weight and the second weight may be set according to a value table or according to actual conditions. Preferably, the second real value corresponding to the current state is used as the minimum value of the real value of the current state, so that the basic time of learning training is ensured. The actual value in the previous state and the second actual value in the current state are used for determining the difference degree between the current state and the previous state. The difference between the first real value and the second real value in the current state is used for measuring the corresponding degree of aggression in the same past state in the value table compared with the current optimization strategy. Through this setting method, the beneficial effect who reaches is:

because the sum of the first weight and the second weight is 1, namely the corresponding decision of the current state is mainly based on the second practical value in the current state, and the difference degree of the current state and the last state is considered, the execution action corresponding to the decision executed by the execution module in the two states can be stable, and the change degree of a certain decision is increased. In addition, the degree of decision change can be further increased by considering that the current optimization strategy is more aggressive than the corresponding past same state in the value table, so that the learning training time is reduced.

Preferably, in the case of entering the second time from the adjacent first time or entering the first time from the adjacent second time, the difference value between the first real value and the second real value corresponding to the current state under the third weight is linearly superimposed on the basis of the second real value of the current state. The third weight value is between 0 and 1. Because the states of opening and closing are significantly different, only the degree of change of the first real value and the second real value can be considered, so that the generated decision is not changed too much, and damage to the injection and production equipment 300 is avoided.

Preferably, in the case that the cost function does not converge, a parameter within a threshold value for executing an action in the execution module is randomly selected, and a state corresponding to the parameter is taken as an initial state. The state at least comprises oil production, liquid supply, water injection and the like, and then a new round of control is carried out. Preferably, the state of the invention is referred to as an environmental state.

Preferably, a deep reinforcement learning algorithm may also be employed. The deep reinforcement learning algorithm builds a cost function based on the environmental state, the execution actions, and the updated parameters. I.e. cost function Q (s _t ，a _t ) On the basis of which an update parameter theta is added. Theta is 0 to 1 And (3) the room(s). The cost function of deep reinforcement learning is Q (s _t ，a _t ，θ _t ). Preferably, the control module 200 is configured to learn updates based on a linear superposition of previous values and loss functions in previous environmental states. Preferably, the value of the updated cost function is:

Q(s _t+1 ，a _t+1 ，θ _t+1 )＝Q _o (s _t ，a _t ，θ _t )+loss (8)

preferably, the cost function may be a curve of sine, cosine, index, etc. Preferably, the update problem of the cost function is converted into a function fit problem. Preferably, the cost function is fitted by a polynomial of multiple order. The optimal value is approximated by updating the parameter θ. The problem that the state space S and the action space A are large can be solved by adopting the setting mode.

According to a preferred embodiment, the method further comprises: and constructing a mixed integer nonlinear programming model which determines that the energy consumption of each single well is minimized under the condition that the daily cumulative total yield does not drop in a first time about well opening in different opening time and a second time about well closing in different stopping time, so as to obtain the optimal and dynamically-changed starting and stopping times, the first time and the second time under the condition that the local optimal problem is avoided. Preferably, the optimization objective of the mixed integer nonlinear programming model is energy consumption minimization. Constraint conditions of the mixed integer nonlinear programming model are as follows:

1. The daily cumulative total yield does not drop;

2. meets the minimum flow performance;

3. the pipe string integrity is greater than the minimum threshold.

Preferably, the decision variables of the mixed integer nonlinear programming model may be the injection and production equipment 300 operating frequency, the water nozzle opening, the oil nozzle opening, and the ICD valve opening. Preferably, the lowest flow performance and the lowest threshold for string integrity may be formulated as the injection and production equipment 300 operating frequency, water nozzle opening, oil nozzle opening, and ICD valve opening. Preferably, the mathematical characterization of the most fluid performance may be that each hierarchical node meets a minimum critical fluid carrying flow. The wellbore and the string need to operate within a range of pressures, so the string needs to meet the strength requirements. Preferably, the integrity of the string may also be characterized as the pressure to which the string is subjected within a certain range. The tubular string is subjected to a pressure less than a highest threshold and greater than a lowest threshold. Preferably, the minimum critical carrier flow rate and the operating pressure range during operation of the string are set according to actual parameters of the oilfield production. Preferably, the above mixed integer nonlinear programming model may be solved based on a mixed integer nonlinear programming solver.

Example 2

As shown in fig. 2, the present invention provides a data-driven water-flooding reservoir optimization system, which includes an acquisition module 100, a control module 200, and an injection and production device 300.

Preferably, the acquisition module 100 may include a pressure sensor, a temperature sensor, a voltage sensor, a current sensor. The acquisition module 100 also includes a meter that measures moisture content.

Preferably, the control module 200 may be a computer device, such as a mobile computing device, a desktop computing device, a server, or the like. The control module 200 may include a processor and a memory device. The storage device is used for storing instructions sent by the processor. The processor is configured to execute instructions stored by the memory device. Preferably, the storage means may be separately provided outside the control module 200. The processor may be a central processing unit (Central Processing Unit, CPU), general purpose processor, digital signal processor (Digital Signal Processor, DSP), application-specific integrated circuit (ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, transistor logic device, hardware component, or any combination thereof.

Preferably, the control module 200 may carry an operating system, such as a Linux system, an Android system, an IOS operating system, and the like.

Preferably, the control module 200 may operate in a networked environment using logical connections to one or more remote computers, either through wires or wirelessly. The remote computer may be another computer, a tablet, a PDA, a server, a router, a network PC, a peer device or other common network node, relative to the control module 200, and typically includes some and/or all of the elements described above relative to the computer. Logical connections include local area networks, wide area networks, private networks, and the like that are presented by way of example and not limitation. The control module 200 of the invention can be used for remote inquiry, modification, calling and running and other operations by entities such as oil reservoir development personnel, departments, enterprises and the like.

Preferably, the storage device may be a magnetic disk, a hard disk, an optical disk, a mobile hard disk, a solid state disk, a flash memory, etc.

Preferably, the control module 200 may be connected to the acquisition module 100 and the execution module of the injection and production apparatus 300 by a wired or wireless means. Preferably, the injection and production apparatus 300 includes an injection well 1 and a production well. The injection well 1 at least comprises a measuring and adjusting water distributor and a wellhead water nozzle. The oil production well at least comprises an oil-submersible motor and a wellhead oil nozzle. Preferably, the injection and production apparatus 300 further comprises a horizontal well inflow control valve (Inflow Control Device, ICD). Preferably, the control module 200 controls the injection and production equipment 300 via an execution module. The execution module at least comprises a frequency converter and a valve opening degree adjusting mechanism. For example, the control module 200 controls the water injection well's survey and regulation water injection well and the oil production well's submersible motor through a frequency converter.

Preferably, the control module 200 is configured to:

establishing an oil reservoir model of the oil-water well based on the static data of the oil-water well and the dynamic data acquired by the acquisition module 100 in real time;

and selecting at least one first parameter with at least one degree of association larger than a first threshold value based on the degree of association analysis of the dynamic data and the oil reservoir model simulation result, and adjusting the oil reservoir model based on updating the at least one first parameter step by step in a manner that the difference between the numerical simulation result of the at least one predicted oil reservoir model generated according to the at least one first parameter and the production history is smaller than a second threshold value.

Preferably, the control module 200 is configured to: in the case where the control module 200 controls the injection and production equipment 300 to simultaneously perform the integrated injection and drainage schemes optimized based on the reservoir model in such a manner that the layered injection parameters and the layered production parameters of the injection and production equipment 300 are simultaneously optimized based on the reinforcement learning/deep reinforcement learning algorithm, the control module 200 is configured to construct a loss function in the reinforcement learning/deep reinforcement learning algorithm in such a manner that online learning and offline learning are fused based on the division of the number of start-stop times, the open-well time, and the close-well time of the injection and production wells.

Preferably, the control module 200 is configured to: the digital oil-water well is constructed in such a manner that the real production environment and the virtual production environment are interactively mapped based on the static data of the oil-water well and the dynamic data acquired in real time by the acquisition module 100. Through the arrangement mode, the method is used for combining static data and dynamic data to construct an oil reservoir model, and further improves the oil reservoir recognition level through the dynamic data obtained through long-term monitoring of parameters such as underground layering flow, pressure, water content and the like, and provides accurate basis for fine oil reservoir analysis and mining.

Preferably, the control module 200 is configured to determine the integrated water injection and drainage schemes performed by the injection and production equipment 300 using a deep learning algorithm/machine learning algorithm based on the reservoir physical properties acquired by the digital oil-water wells. Through the arrangement mode, the corresponding water injection scheme and the drainage scheme can be determined based on dynamic data driving by using a deep learning algorithm or a mechanical learning algorithm based on the oil reservoir model through quantitative evaluation standards. More importantly, the setting mode can realize the block well group collaborative optimization oil reservoir development of oil reservoir dynamic, layered oil extraction and layered water injection, namely, through the integrated design of a water injection scheme and a drainage scheme, the hysteresis regulation and control caused by the separation of a water injection well and an oil extraction well production scheme are avoided. Specifically, based on the oil reservoir model, the physical properties of the reservoir, the saturation of fluid in the injection and production process, the pressure and other information can be predicted, so that the injection scheme and the drainage scheme can be quickly adjusted. However, when the water injection scheme and the drainage scheme are adjusted, the corresponding injection and production equipment does not have real-time adjustment capability. In particular, in the process of adjusting the water injection scheme and the drainage scheme, the corresponding injection and production equipment needs to have the capabilities of automatic optimizing and quick learning. Although the existing deep reinforcement learning algorithm can adapt to environmental changes by quickly learning training, from the actual condition of oilfield development, the relation between an injection well and an effective oil extraction well on a plane and an interval is complex, the state of geological analysis is poor in conformity with the actual state, and because the layer system is not perfect in plugging, a plurality of oil extraction wells have the phenomena of injection or non-injection and oil layer strings, so that the oil layer is unbalanced, and the development effect is poor. More importantly, the layered water injection and the layered oil extraction belong to the same system in nature, but are often implemented according to independent process technology in application, and a synergistic effect is not exerted, so that in actual operation, the real-time adjustment and separation of relevant parameters of a water injection well and an oil extraction well are caused, namely, the water injection parameter of the water injection well and the oil extraction parameter of the oil extraction well are respectively and independently adjusted, and the problems of hysteresis regulation are caused by the fact that after the relevant parameters of the water injection well are adjusted, the non-uniformity, the pressure change and the like of the whole reservoir are correspondingly changed, and the oil extraction parameter adopted by the oil extraction well is also adjusted based on the reservoir property before the change. Based on the problem of injection and production separation regulation, the existing related self-optimizing intelligent regulation technology adopts a model of reinforcement learning algorithm, and only takes the self variables into consideration from the setting of optimization targets, decision variables, rewarding functions and cost functions. For example, the water injection well only considers water injection speed, injection increasing amount, emission reduction, water injection well pressure, etc., and the oil extraction well only considers yield, oil extraction well pressure, liquid level height, submersible motor speed, etc. Therefore, the method controls the injection and production equipment to simultaneously execute the injection scheme and the drainage scheme optimized based on the oil reservoir model in a mode of simultaneously optimizing the layered injection parameters and the layered production parameters of the injection and production equipment based on the reinforcement learning/deep reinforcement learning algorithm, thereby avoiding the independent implementation of layered injection and layered production to realize the balance of injection and production and coordination of supply and drainage.

Preferably, the control module 200 is configured to control the injection and production equipment 300 to simultaneously perform an integrated injection and drainage scheme in a manner that optimizes both the stratified injection parameters and the stratified production parameters of the injection and production equipment 300 based on a reinforcement learning/deep reinforcement learning algorithm, thereby avoiding the independent implementation of stratified injection and stratified production to achieve injection and production balance and supply and drainage coordination. Through the setting mode, the injection scheme and the drainage scheme are utilized for learning and training, so that the related parameters of the injection well and the oil production well are automatically matched and synchronously adjusted, and the delayed regulation and control of the oil field development is changed into real-time optimization.

Through the arrangement mode, the invention has the beneficial effects that:

according to the invention, through the block cooperative application of the layered oil extraction and layered water injection technology, the layered oil extraction and layered water injection scheme is integrally designed, the corresponding analysis of the underground intervals of the extraction end and the injection end is enhanced, namely, continuous, long-term and rich underground monitoring data of the multi-layer sections of the same block injection end and the extraction end are utilized to develop the fine geological modeling driven by big data, the reservoir fluid saturation and pressure field evolution model under the constraint of the layered injection real-time data is obtained, the knowledge of the reservoir heterogeneity and the flow strip is deepened, and the uncertainty of the residual oil distribution prediction is reduced. Finally, the parameters of the injection end and the extraction end are matched and adjusted by utilizing a real-time regulation and control technology based on a deep reinforcement learning algorithm/reinforcement learning algorithm, so that the development and adjustment are changed from 'lag regulation' to 'real-time optimization', the control degree and the adjustment level of a well pattern are improved, the natural decrease and the water content increase are controlled, and the utilization degree and the recovery ratio are improved. Furthermore, on the basis of automatic shaft injection and production and mass data processing, an oil reservoir analysis and optimization system is developed by utilizing artificial intelligence, so that the real intelligent oil reservoir management is realized.

According to a preferred embodiment, the control module 200 is configured to build a reservoir model of a digital oil-water well as follows:

establishing a preliminary oil reservoir model based on static data of the physical oil-water well and dynamic data acquired by the acquisition module 100 in real time;

under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is smaller than a second threshold value, taking the preliminary oil reservoir model as the oil reservoir model with optimized subsequent water injection scheme and drainage scheme;

and under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is larger than a second threshold value, selecting at least one first parameter with at least one degree of association larger than a first threshold value based on the degree of association analysis of the dynamic data and the numerical simulation result, and adjusting the preliminary oil reservoir model in a mode of optimizing the at least one first parameter so that the difference between the numerical simulation result and the production history is smaller than the second threshold value. Preferably, the association degree analysis can acquire the association degree of each uncertain dynamic data and the numerical simulation result of the oil reservoir model through a grey association rule mining algorithm. Preferably, the first threshold may be specifically set according to the stratified water injection, the dynamic data of oil recovery and the established reservoir model acquired by the acquisition module 100 in real time. The value of the first threshold is between 0 and 1. In this embodiment, the value of the first threshold is between 0.4 and 1. Preferably, the first parameter may be one or several of porosity, permeability, reservoir horizontal permeability, reservoir water quality direction permeability, initial tube pressure at the oil-water push interface 5. Preferably, the second threshold characterizes a fit of the reservoir value to actual historical production. The second threshold value is between 0 and 1. Preferably, in the present embodiment, the difference between the numerical simulation result and the production history is an error co-defense root. The second threshold value is in the range of 0-0.3. Preferably, the control module 200 is configured to optimize the at least one first parameter as follows:

Through this setting method, the beneficial effect who reaches is:

Preferably, after the reservoir model is adjusted to meet the fitting requirement, the injection and production relation optimization integrated injection and production scheme is obtained based on a deep learning algorithm and a learning algorithm thereof. Preferably, the control module 200 is configured to:

a second parameter relating to reservoir fluid flow is obtained based on the reservoir model. Preferably, the second parameter comprises at least one or more of reservoir rock physical properties, single sand extension and geometry, fault aggregation and closure, injection and production profile, perforation and production zone in the injection and production wells, and relative position of the injection and production wells. Preferably, the value of the second parameter may be determined using the following criteria, in particular:

(6) Under appropriate conditions, the injected water can bypass the barrier;

(7) The two-stage oil production wells in the same direction are not affected;

(8) The production well may be affected by multiple directions;

(11) The streamlines cannot cross each other.

Preferably, the integrated water injection scheme and the drainage scheme are determined based on the second parameter. Preferably, the evaluation index of the water injection effect and the drainage effect of the injection and production equipment 300 is quantified based on the static data, the dynamic data, and the second parameter. Preferably, the integrated water injection scheme and drainage scheme are acquired based on a deep learning algorithm/machine learning algorithm. Preferably, the control module 200 is configured to acquire the integrated water injection scheme and drainage scheme based on the deep learning algorithm/machine learning algorithm as follows:

According to a preferred embodiment, the control module 200 is configured to optimize the integrated water injection and drainage schemes performed by the injection and production equipment 300 based on a deep reinforcement learning algorithm. Preferably, the optimization objective of the integrated water injection and drainage schemes performed by the injection and production equipment 300 is to maximize the net present value. Preferably, at least the changes in pressure and saturation distribution are obtained based on the high-dimensional dynamic data measured in real time by the acquisition module 100, and the high-dimensional pressure and saturation changes are used as input to the deep reinforcement learning algorithm. Preferably, at least the water injection frequency and the oil recovery frequency of the injection and recovery device 300 are used as decision variables. Preferably, the optimized water injection and drainage schemes may be adjusted in a manner that at least complements reservoir rock physical properties, relative positions of water injection and production wells based on second parameters provided by the reservoir model.

According to a preferred embodiment, in the case that the control module 200 controls the injection and production equipment 300 to simultaneously perform the integrated injection and production schemes in a manner of simultaneously optimizing the layered injection parameters and the layered production parameters of the injection and production equipment 300 based on the reinforcement learning/deep reinforcement learning algorithm, the control module 200 is configured to construct the loss function in the reinforcement learning/deep reinforcement learning algorithm in a manner of integrating on-line learning and off-line learning based on the division of the start-stop times, the open-well time, and the close-well time of the injection well and the production well. Preferably, the control module 200 is configured to control the injection and production equipment 300 via an execution module.

Preferably, the control module 200 may control the execution module by means of proportional-integral-derivative control (Proportional Integral Derivative Control, PID).

Preferably, the control module 200 is configured to make an optimization decision based on the current time environmental state provided by the acquisition module 100 and rewards after the execution module performs the action in the previous time environmental state in a manner that approximates the first optimization objective. Preferably, the first optimization objective includes net present value, oil recovery, and yield maximization. Preferably, the first optimization objective may also include injection/production profile homogenization, energy consumption minimization, injection and production equipment 300 life maximization, etc. Preferably, the control module 200 builds the state space S based on the environmental state provided by the acquisition module 100. Preferably, the control module 200 builds the action space a of the execution module based on the optimization decisions it makes. Since the first optimization objectives include net present value, oil recovery and production maximization, injection and production equipment 300 life maximization. Therefore, the state space may select, as the state space S, properties directly related to the production, the pump cycle of the injection well, the pump cycle of the production well, the water injection rate, the oil recovery rate, etc. Preferably, the state space S is a multidimensional matrix, wherein the number of rows is the number of related attributes, and the number of columns is the number of water injection wells and oil recovery wells. Preferably, the parameters collected by the collection module 100 in real time at least comprise the flow rate and pressure of a single well and a wellhead oil jacket Pressure, wellbore temperature and pressure distribution, pipeline pressure, injection equipment pressurization and power, lifting equipment lift, power, and the like. Preferably, the decision variables may be the operating frequency of the injection and production device 300, the water nozzle valve opening, the oil nozzle valve opening, the ICD valve opening. Thus, the actuation space a of the execution module includes the operating frequency of the injection well, the operating frequency of the production well, the water nozzle valve opening, the oil nozzle valve opening, and the ICD valve opening. Preferably, the action space A is also a multidimensional matrix, wherein the number of rows is 5, and the number of columns is the number of corresponding water injection wells and oil extraction wells. The values of the corresponding operation characteristic amounts in the operation space a are described with the operation frequency of the oil production well. Preferably, the operating frequency v of the production well _i The motion characteristic quantity of (a) is as follows:

Preferably, the control module 200 constructs a function regarding rewards based on the environmental status fed back by the acquisition module 100 after the previous execution module performed the action. The maximum value of the reward function should be equivalent to the first optimization objective. For example, the reward function is a function of the action a performed by the injection and production device 300 and the environmental state s collected by the collection module 100. The reward function R (a, s) is as follows:

Preferably, the control module 200 is configured to make optimization decisions as follows:

constructing cost functions related to environmental states and executing actions by the execution module, and recording different environmental states and actionsAnd constructing a value table. The value represents a discrete record of the cost function. Preferably, the cost function may be a set of unitary quadratic functions with respect to the first optimization objective. For example, the oil production is-l (x-m) ² +n. The three coefficients of l, m and n are set at least to ensure that the oil production is positive in half of the production period.

Preferably, in case the cost function converges and the optimization decision of the control module 200 does not bring the environmental state to the optimization target, or in case the cost function converges and the system is not damaged, the control module 200 is configured to obtain the first action in the corresponding environmental state based on an epsilon-greedy strategy. Preferably, the first action is obtained based on an ε -greedy policy. Preferably, the first action is a random action. The epsilon-greedy strategy allows the control module 200 to select the action corresponding to the maximum value of the cost function during the late learning training period, but there is a certain probability epsilon that one action is randomly selected to obtain the reward.

Preferably, the execution module controls the injection and production equipment 300 based on the first action information communicated by the control module 200. The control module 200 obtains a new environmental state and a corresponding reward after the execution module performs the first action based on the acquisition module 100. The control module 200 learns updates based on the new environmental status and the corresponding rewards. Preferably, the control module 200 is configured to learn updates based on a linear superposition of previous values and loss functions in previous environmental states. The control module 200 is configured to construct a loss function based on a manner that fuses online learning and offline learning. Preferably, after the update, the control module 200 updates the environmental state to a new environmental state as an initial state of the next round of control.

Preferably, the control module 200 is configured to construct the loss function based on the learning rate and the difference between the real value and the previous value at the previous environmental state. Preferably, the value of the updated cost function is:

Q(s _t+1 ，a _t+1 )＝Q _o (s _t ，a _t )+loss (4)

q(s) in formula (4) _t+1 ，a _t+1 ) For updated cost functionsValue. Q (Q) _o (s _t ，a _t ) Previous value in previous environmental state. The previous value is the value stored in the value table. loss is a loss function.

loss＝α[Q _r (s _t+1 ，a _t+1 )-Q _o (s _t ，a _t )] (5)

Q in (5) _r (s _t+1 ，a _t+1 ) Is a practical value. Alpha is the learning rate. Alpha has a value between 0 and 1. Alpha determines the rate at which the value table is updated.

Preferably, the real-world value includes a first real-world value for online learning and a second real-world value for offline learning. Preferably, the control module 200 configures the first real-world value of online learning as follows:

the first real-world value is determined based on a manner of maximum evaluation of the cost function in the new environmental state. Preferably, the first real value is:

Q _r1 (s _t ，a _t )＝R(s _t ，a _t )+γmaxQ _o (s _t+1 ，a _t+1 ) (6)

q in (6) _r1 (s _t ，a _t ) Is a first real value. R(s) _t ，a _t ) Corresponding rewards after the first action is performed for the injection and production device 300. maxQ _o (s _t+1 ，a _t+1 ) In order to execute the new state after the action, the maximum value corresponding to the new state is in the value table. Gamma denotes the state s _t By action a _t The value of (2) is related to the degree of decay of the next state and action. The value range of gamma is between 0 and 1.

Preferably, the control module 200 configures the offline learned second reality function of cost as follows:

Q _r2 (s _t ，a _t )＝R(s _t ，a _t )+γQ _o (s _t+1 ，a _t+1 ) (7)

q in (7) _r2 (s _t ，a _t ) Representing a second real value.

Preferably, in the training of reinforcement learning, different updating strategies affect the learning rate, the convergence rate, the stability, the computational complexity, and the like, and further affect the training time and the maintenance period of the injection and production equipment 300. For example, the learning rate, convergence rate, and computational complexity are directly related to the learning training time of the control module 200. In the process of selecting the execution module based on the epsilon-greedy strategy, if the update is performed based on the first real-time value of online learning, the update is the maximum evaluation of the value function, and the real-time feedback of the acquisition module 100 to the environmental state is relied on, so that the generated optimization decision is more aggressive, the action executed by the injection and production equipment 300 is changed to a larger extent, the mechanical movement process of the injection and production equipment 300 is not smooth enough, and larger damage to the injection and production equipment 300 is possibly generated, so that the control module 200 controls the injection and production equipment 300 to learn and train in a process of multiple damage. Under the circumstance that the updating is performed according to the second practical value of the offline learning, the updating is relatively conservative, so that the learning training time of the control module 200 is too long, and therefore, the invention ensures that the optimization decision of the control module 200 in the learning training process is gentle on the basis of shortening the learning training time based on the mode of combining the online learning and the offline learning, and the action executed by the injection and production equipment 300 is smooth and does not generate larger fluctuation.

Preferably, the control module 200 is configured to implement a fusion of online learning and offline learning as follows:

1. the number of start-stops, the time to open the well, and the time to close the well for each injection and production equipment 300 in one inspection cycle are divided based on the state space S, and then a first time about opening the well and a second time about closing the well for each single well in different start-up times and different shut-in times are determined. It should be noted that intermittent water injection and intermittent oil extraction are one way of effectively reducing cost and increasing efficiency. The intermittent water injection and intermittent oil extraction aims to improve the yield and reduce the cost. The key is to determine a reasonable intermittent pumping system, namely, to make a proper well opening time and well closing time. Therefore, the invention can determine the intermittent pumping system based on the state space S and/or the value table, and then divide the optimal control of the control module 200 on the injection and production equipment 300 into different stages according to the start-stop times, the well opening time, the well closing time and the like in the intermittent pumping system, and optimize the learning training and decision of the control module 200 according to the different stages.

2. In the same first time/second time, the real value in the current state is the difference between the real value in the current state and the second real value in the current state and the difference between the first real value and the second real value in the current state which are linearly overlapped on the basis of the second real value corresponding to the current state. Preferably, the real value in the previous state is multiplied by the second real value in the current state by the first weight. The difference between the first real value and the second real value in the current state is multiplied by the second weight. The sum of the first weight and the second weight is 0-1. The first weight and the second weight may be set according to a value table or according to actual conditions. Preferably, the second real value corresponding to the current state is used as the minimum value of the real value of the current state, so as to ensure the basic time of learning training of the control module 200. The actual value in the previous state and the second actual value in the current state are used for determining the difference degree between the current state and the previous state. The difference between the first real value and the second real value in the current state is used for measuring the corresponding degree of aggression in the same past state in the value table compared with the current optimization strategy. Through this setting method, the beneficial effect who reaches is:

Because the sum of the first weight and the second weight is 1, namely the corresponding decision of the current state is mainly based on the second practical value in the current state, and the difference degree of the current state and the last state is considered, the execution action corresponding to the decision executed by the execution module in the two states can be stable, and the change degree of a certain decision is increased. In addition, it is contemplated that the current optimization strategy can further increase the degree of decision change over the corresponding degree of aggressiveness in the same state of the past in the value table, thereby reducing the learning training time of the control module 200.

Preferably, in the case of entering the second time from the adjacent first time or entering the first time from the adjacent second time, the control module 200 is configured to linearly superimpose the difference value between the first real value and the second real value corresponding to the current state under the third weight value on the basis of the second real value of the current state. The third weight value is between 0 and 1. Because the states of opening and closing are significantly different, only the degree of change of the first real value and the second real value can be considered, so that the decision produced by the control module 200 cannot be changed too much, and damage to the injection and production equipment 300 is avoided.

Preferably, in the case that the cost function does not converge, the control module 200 is configured to randomly select a parameter within a threshold value for performing an action in the execution module, and take a state corresponding to the parameter as an initial state. The state includes at least the oil production amount and the liquid supply amount, and then a new round of control is performed. Preferably, the state of the invention is referred to as an environmental state.

Preferably, a deep reinforcement learning algorithm may also be employed. The control module 200 is configured to construct a cost function based on the environmental state, the execution actions, and the updated parameters. I.e. cost function Q (s _t ，a _t ) On the basis of which an update parameter theta is added. The value of theta is between 0 and 1. The cost function of deep reinforcement learning is Q (s _t ，a _t ，θ _t ). Preferably, the control module 200 is configured to learn updates based on a linear superposition of previous values and loss functions in previous environmental states. Preferably, the value of the updated cost function is:

Q(s _t+1 ，a _t+1 ，θ _t+1 )＝Q _o (s _t ，a _t ，θ _t )+loss (8)

preferably, the cost function may be a curve of sine, cosine, index, etc. Preferably, the control module 200 is configured to convert the update problem of the cost function to a function fit problem. Preferably, the control module 200 is configured to fit the cost function by a polynomial of multiple order. The control module 200 is configured to approximate the optimal value by updating the parameter θ. By adopting the setting mode, the problem that the state space S and the action space A are large can be solved.

Preferably, the control module 200 divides the start-stop times, the open-well times and the close-well times of each injection and production device 300 in one inspection period based on the state space S, and further constructs the first time about the open-well and the second time about the close-well of each single well in different open-times into a mixed integer nonlinear programming model with minimized energy consumption under the condition that the daily cumulative total yield does not drop. Preferably, the optimization objective of the mixed integer nonlinear programming model is energy consumption minimization. Constraint conditions of the mixed integer nonlinear programming model are as follows:

1. the daily cumulative total yield does not drop;

2. meets the minimum flow performance;

3. the pipe string integrity is greater than the minimum threshold.

Preferably, the decision variables of the mixed integer nonlinear programming model may be the injection and production equipment 300 operating frequency, the water nozzle opening, the oil nozzle opening, and the ICD valve opening. Preferably, the lowest flow performance and the lowest threshold for string integrity may be formulated as the injection and production equipment 300 operating frequency, water nozzle opening, oil nozzle opening, and ICD valve opening. Preferably, the mathematical characterization of the most fluid performance may be that each hierarchical node meets a minimum critical fluid carrying flow. The wellbore and the string need to operate within a range of pressures, so the string needs to meet the strength requirements. Preferably, the integrity of the string may also be characterized as the pressure to which the string is subjected within a certain range. The tubular string is subjected to a pressure less than a highest threshold and greater than a lowest threshold. Preferably, the minimum critical carrier flow rate and the operating pressure range during operation of the string are set according to actual parameters of the oilfield production. Preferably, the control module 200 may solve the above mixed integer nonlinear programming model based on a mixed integer nonlinear programming solver.

The present specification contains several inventive concepts, and applicant reserves the right to issue a divisional application according to each of the inventive concepts. The description of the invention encompasses multiple inventive concepts, such as "preferably," "according to a preferred embodiment," or "optionally," all means that the corresponding paragraph discloses a separate concept, and that the applicant reserves the right to filed a divisional application according to each inventive concept.

It should be noted that the above-described embodiments are exemplary, and that a person skilled in the art, in light of the present disclosure, may devise various solutions that fall within the scope of the present disclosure and fall within the scope of the present disclosure. It should be understood by those skilled in the art that the present description and drawings are illustrative and not limiting to the claims. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A data-driven waterflooding reservoir optimization method, the method comprising:

under the condition that the difference between the numerical simulation result and the production history of the oil reservoir model is larger than a second threshold value, selecting at least one first parameter with at least one degree of association larger than a first threshold value based on the degree of association analysis of the dynamic data and the numerical simulation result, and adjusting the preliminary oil reservoir model in a mode of optimizing at least one first parameter so that the difference between the numerical simulation result and the production history is smaller than the second threshold value;

acquiring reservoir physical properties based on the reservoir model and determining a water injection scheme and a drainage scheme of the layered injection and production by using a deep learning algorithm/machine learning algorithm, wherein,

determining an integrated water injection scheme and drainage scheme in a manner such that the stratified water injection and stratified oil recovery act in concert with each other;

the integrated water injection scheme and drainage scheme are executed in a mode of simultaneously optimizing the layered water injection parameters and the layered oil extraction parameters based on a reinforcement learning/deep reinforcement learning algorithm, and the steps are as follows:

in case the cost function converges and the optimization decision does not bring the environmental state to the optimization objective, or in case the cost function converges and the corresponding injection and production equipment (300) is not damaged,

learning updates are based on the new environmental status and the corresponding rewards, wherein,

learning updates based on a linear superposition of previous values in previous environmental states and a loss function, and constructing the loss function based on a learning rate and a difference between a real value and a previous value in a previous environmental state, wherein,

the real values include a first real value for online learning and a second real value for offline learning;

after the update, the environment state is updated to a new environment state as an initial state of the next round of control.

2. The method of optimizing a water-flooding reservoir of claim 1, wherein optimizing at least one of said first parameters comprises:

3. The method of optimizing a water-injected reservoir of claim 1, wherein the steps of obtaining reservoir physical properties based on the reservoir model and determining a water injection scheme and a drainage scheme for stratified injection and production using a deep learning algorithm/machine learning algorithm are as follows:

a second parameter relating to reservoir fluid flow is obtained based on the reservoir model, wherein,

the second parameters at least comprise one or more of reservoir rock physical properties, single sand extension and geometry, fault aggregation structure and closure, injection and production profile, perforation and production increase layers in the water injection well and the production well, and relative positions of the water injection well and the production well;

determining an integrated water injection scheme and drainage scheme based on the second parameter, wherein,

quantifying evaluation indexes of water injection effect and drainage effect based on the static data, the dynamic data and the second parameter;

And acquiring a water injection scheme and a drainage scheme based on a deep learning algorithm/machine learning algorithm.

4. The method of optimizing a water-filled reservoir of claim 3, further comprising optimizing the integrated water-filling and drainage schemes based on a deep reinforcement learning algorithm, wherein,

the optimization target of the integrated water injection scheme and drainage scheme is that the net present value is the largest;

based on the real-time measured high-dimensional dynamic data, at least acquiring the change of pressure and saturation distribution, and taking the high-dimensional pressure and saturation change as the input of a deep reinforcement learning algorithm;

at least the water injection frequency and the oil extraction frequency are taken as decision variables;

the optimized water injection scheme and drainage scheme are adjusted in a manner that at least complements reservoir rock physical properties, relative positions of the water injection well and the production well based on the second parameters provided by the reservoir model.

5. The waterflooding reservoir optimization method of claim 1, wherein the loss function is configured as follows:

constructing a loss function in a reinforcement learning/deep reinforcement learning algorithm based on the mode of fusing on-line learning and off-line learning on the basis of dividing the start-stop times, the well opening time and the well closing time of a water injection well and an oil extraction well, wherein,

Dividing the start-stop times, the open time and the close time of each single well in one equipment detection period based on the state space, and further determining the first time about the open of each single well in different open times and the second time about the close of each single well in different close times;

in the same first time/second time, the real value in the current state is the difference between the real value in the current state and the second real value in the current state and the difference between the first real value and the second real value in the current state which are linearly overlapped on the basis of the second real value corresponding to the current state, wherein,

multiplying the real value in the previous state by the second real value in the current state by the first weight;

multiplying the difference between the first real value and the second real value with a second weight value in the current state;

determining a first real value based on a manner of maximum evaluation of a cost function in a new environmental state;

the second real-world value is determined based on the value of the cost function in the new environmental state in the historical value.

6. The method of optimizing a water-flooding reservoir of claim 1, further comprising:

and constructing a mixed integer nonlinear programming model which determines that the energy consumption of each single well is minimized under the condition that the daily cumulative total yield does not drop in a first time about well opening in different opening time and a second time about well closing in different stopping time, so as to obtain the optimal and dynamically-changed starting and stopping times, the first time and the second time under the condition that the local optimal problem is avoided.

7. The data-driven water injection oil reservoir optimization system is characterized by comprising an acquisition module (100), a control module (200) and injection and production equipment (300), wherein the control module (200) is configured to:

establishing an oil reservoir model of the oil-water well based on static data of the oil-water well and dynamic data acquired by the acquisition module (100) in real time;

selecting at least one first parameter with at least one degree of association greater than a first threshold based on a degree of association analysis of dynamic data and the reservoir model simulation results, and adjusting the reservoir model based on updating at least one of the first parameters step by step in such a way that a difference between at least one predicted numerical simulation result of the reservoir model generated from the at least one first parameter and a production history is less than a second threshold;

8. The data-driven water injection oil reservoir optimization system is characterized by comprising an acquisition module (100), a control module (200) and injection and production equipment (300), wherein the control module (200) is configured to: in the case where the control module (200) controls the injection and production equipment (300) to simultaneously perform an integrated injection and production scheme optimized based on a reservoir model in such a manner that the injection and production equipment (300) is simultaneously optimized for the stratified injection and production parameters based on a reinforcement learning/deep reinforcement learning algorithm,

The control module (200) is configured to construct a loss function in a reinforcement learning/deep reinforcement learning algorithm in a mode of merging on-line learning and off-line learning on the basis of dividing the start-stop times, the well opening time and the well closing time of the water injection well and the oil extraction well;