WO2021127186A1 - Methods and systems for subsurface modeling employing ensemble machine learning prediction trained with data derived from at least one external model

Info

Publication number
WO2021127186A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
machine learning
reservoir
ensemble
training data
Prior art date
Application number
PCT/US2020/065620
Other languages
French (fr)
Inventor
Colin Daly
Original Assignee
Schlumberger Technology Corporation
Schlumberger Canada Limited
Services Petroliers Schlumberger
Geoquest Systems B.V.
Priority date
Filing date
Publication date
Application filed by Schlumberger Technology Corporation, Schlumberger Canada Limited, Services Petroliers Schlumberger, Geoquest Systems B.V.
Priority to US17/757,657 (US20230358917A1)
Priority to EP20901603.9A (EP4078247A4)
Publication of WO2021127186A1

Classifications

    • G01V20/00
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V1/00 Seismology; Seismic or acoustic prospecting or detecting
    • G01V1/28 Processing seismic data, e.g. analysis, for interpretation, for correction
    • G01V1/30 Analysis
    • G01V1/306 Analysis for determining physical properties of the subsurface, e.g. impedance, porosity or attenuation profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/62 Physical property of subsurface
    • G01V2210/624 Reservoir parameters
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/62 Physical property of subsurface
    • G01V2210/624 Reservoir parameters
    • G01V2210/6244 Porosity
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/62 Physical property of subsurface
    • G01V2210/624 Reservoir parameters
    • G01V2210/6246 Permeability
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/64 Geostructures, e.g. in 3D data cubes
    • G01V2210/644 Connectivity, e.g. for fluid movement
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V2210/00 Details of seismic processing or analysis
    • G01V2210/60 Analysis
    • G01V2210/66 Subsurface modeling

Abstract

Methods and systems are provided that create one or more models of a subsurface geological formation (such as a reservoir characterization model of a hydrocarbon reservoir or a model of some other subsurface geological formation). The methods and systems are configured to extend a machine learning ensemble (for example, an ensemble tree-based machine learning model such as a random forest learning model) to use or embed data derived from one or more secondary models as part of both the training operations of the machine learning ensemble and the online use of the trained machine learning ensemble. Such data can provide information that supplements the information contained in the training data/input data.

Description

METHODS AND SYSTEMS FOR SUBSURFACE MODELING EMPLOYING ENSEMBLE MACHINE LEARNING PREDICTION TRAINED WITH DATA DERIVED FROM AT LEAST ONE EXTERNAL MODEL
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] The present disclosure claims priority from U.S. Provisional Patent Appl. No. 62/949,673, filed on December 18, 2019, herein incorporated by reference in its entirety.
FIELD
[0002] The present disclosure relates to methods and systems that employ machine learning prediction to create a reservoir characterization model.
BACKGROUND
[0003] A reservoir characterization model is a three-dimensional representation of a subsurface hydrocarbon reservoir, including the spatial distribution of one or more petrophysical, geological or geophysical properties or attributes of the hydrocarbon reservoir. The reservoir characterization model quantifies properties or attributes within a subsurface volume that encompasses the hydrocarbon reservoir. The attributes or properties typically include the structural shape and thicknesses of the formation layers within the subsurface volume being modeled, their lithologies, and the porosity and permeability distributions. These attributes are relatively stable over long periods of time and can, therefore, be considered static. Porosity and permeability often vary significantly from location to location within the volume, resulting in heterogeneity. However, porosity and permeability are stable in the near-geologic timeframe and do not change due to the movement of fluids or gases through any of the formation's pore spaces. The reservoir characterization model is also commonly referred to as a static model or geologic model.
[0004] The properties and attributes of the reservoir characterization model are typically defined by extrapolation from physical and chemical data related to the reservoir, including core data, well log data and seismic data. Computer-based methods and systems typically create a reservoir characterization model from relevant datasets that pertain to the reservoir. These datasets are compiled to create stratigraphic and structural frameworks that define the geometry of the reservoir. Using these frameworks, the facies, porosity, and permeability values are extrapolated horizontally and vertically throughout each layer. The facies rock types are typically modeled independently within each stratigraphic layer whereas the porosity of the model is dependent upon the facies model. Permeability is dependent upon both the facies and the porosity models. Several reservoir characterization models can be created from these attributes and then evaluated to select one or more “best” reservoir characterization models. The selected reservoir characterization model(s) can be used to simulate fluid flow in the reservoir during production. The fluid flow simulation can be used to plan and optimize production of hydrocarbons from the reservoir.
[0005] The current reservoir modeling methods and systems produce reservoir characterization models with uncertainty that can impact the accuracy of the follow-on reservoir simulation operations and the solutions developed therefrom.
SUMMARY
[0006] In embodiments, computer-based methods and systems are provided that create one or more models of a subsurface geological formation (such as a reservoir characterization model of a hydrocarbon reservoir or a model of some other subsurface geological formation). The model(s) (or parts thereof) can be output and displayed on a display screen to aid in understanding the spatial distribution of characteristics of the subsurface geological formation. The model(s) can be used to simulate fluid flow in the subsurface geological formation. The fluid flow simulation can be used to plan and optimize production of hydrocarbons from the subsurface geological formation, or plan and optimize other operations that involve fluid flow in the subsurface geological formation.
[0007] In embodiments, the methods and systems can combine traditionally ‘non-standard’ data combinations and provide results without the need to impose user-guided trends, variograms or other manual inputs, unlike traditional property modeling. The methods and systems can leverage artificial intelligence to determine the best and most likely property distribution characteristic of the subsurface geological formation. Furthermore, the methods and systems can also calculate probabilistic results that demonstrate the likely uncertainty in the model(s) based on the quality/quantity of the initial input data. The methods and systems can be significantly quicker than traditional property modeling techniques due to the minimal user input required.
[0008] In embodiments, the methods and systems can be configured to employ ensemble machine learning prediction for the spatial distribution of characteristics of the subsurface geological formation (e.g., hydrocarbon reservoir). The ensemble machine learning utilizes multiple machine learning models to obtain better predictive performance than could be obtained from any of the constituent learning models alone. For ensemble tree-based machine learning, the multiple machine learning models can employ decision trees as predictive models to go from observations about the subsurface geological formation represented in the branches of the decision trees to values or labels for the characteristics of the subsurface geological formation represented in the leaves of the decision trees. In one embodiment, the ensemble machine learning prediction can employ a random forest learning method.
[0009] In embodiments, the methods and systems can be configured to extend a machine learning ensemble (e.g. an ensemble tree-based machine learning model such as a random forest learning model) to use or embed data derived from one or more secondary models (also referred to as an external model(s)) as part of the training step. Such data can provide information that supplements the information contained in the training data.
[0010] This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The subject disclosure is further described in the detailed description which follows, in reference to the noted plurality of drawings by way of non-limiting examples of the subject disclosure, in which like reference numerals represent similar parts throughout the several views of the drawings, and wherein:
[0012] Figure 1 is a flowchart illustrating a workflow where data generated by an external model is combined and embedded into the training operations of a machine learning ensemble that predicts a value of a target variable for a location within a subsurface reservoir given an input data vector of observations for that location;
[0013] Figure 2 is a schematic diagram that illustrates training operations of an ensemble of machine learning models;
[0014] Figure 3 is a flowchart illustrating a workflow where data generated by an external model is combined and embedded into an input data vector that is used in conjunction with a trained machine learning ensemble to predict a value of a target variable for a location within a subsurface reservoir given an input data vector for that location;
[0015] Figure 4 is a schematic diagram illustrating operations of a trained machine learning ensemble as part of the workflow of Figure 3; and
[0016] Figure 5 is a functional block diagram of a computer processing system.
DETAILED DESCRIPTION
[0017] The particulars shown herein are by way of example and for purposes of illustrative discussion of the embodiments of the subject disclosure only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the subject disclosure. In this regard, no attempt is made to show structural details in more detail than is necessary for the fundamental understanding of the subject disclosure, the description taken with the drawings making apparent to those skilled in the art how the several forms of the subject disclosure may be embodied in practice. Furthermore, like reference numbers and designations in the various drawings indicate like elements.
[0018] In embodiments, a computer-based system is provided that creates one or more models of a subsurface geological formation (which can be a reservoir characterization model of a hydrocarbon reservoir, or a model of some other subsurface geological formation). The model(s) (or parts thereof) can be output and displayed on a display screen to aid in understanding the spatial distribution of characteristics of the subsurface geological formation. The model(s) can be used to simulate fluid flow in the subsurface geological formation. The fluid flow simulation can be used to plan and optimize production of hydrocarbons from the subsurface geological formation, or plan and optimize other operations that involve fluid flow in the subsurface geological formation.
[0019] In embodiments, the system can combine traditionally “non-standard” data combinations and provide results without the need to impose user-guided trends, variograms or other manual inputs, unlike traditional property modeling. The system can leverage artificial intelligence to determine the best and most likely property distribution characteristic of the subsurface geological formation. Furthermore, the system can also calculate probabilistic results that demonstrate the likely uncertainty in the model(s) based on the quality/quantity of the initial input data. The system can be significantly quicker than traditional property modeling techniques due to the minimal user input required.
[0020] In embodiments, the system can be configured to employ an ensemble machine learning prediction for the spatial distribution of characteristics of the subsurface geological formation. Ensemble machine learning utilizes multiple machine learning models to obtain better predictive performance than could be obtained from any of the constituent learning methods alone. For ensemble tree-based machine learning, the multiple learning models can employ decision trees as predictive models to go from observations about the subsurface geological formation represented in the branches of the decision trees to values or labels for the characteristics of the subsurface geological formation represented in the leaves of the decision trees. In one embodiment, the ensemble machine learning prediction can employ a random forest learning method.
[0021] As used herein, the term “subsurface geological formation” refers to rock formations, structures, and other features beneath the land or sea-floor surface, including but not limited to hydrocarbon or petroleum reservoirs, water and saline aquifers, rock formations used for carbon sequestration, and other structures or features beneath the land or sea-floor surface.
[0022] Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn based on training data. The computational analysis of machine learning models and their performance is a branch of theoretical computer science known as computational learning theory. The desired goal is to improve the machine learning models through experience (e.g., by applying the training data to the machine learning models in order to “train” the models). Machine learning, in other words, is the process of training computers to learn to perform certain functionalities. Typically, a machine learning model is designed and trained by applying the training data to the model. The model is adjusted (i.e., improved) based on how it responds to the training data.
[0023] A decision tree is a non-linear machine learning model that models a classification or regression problem as a series of binary “decisions” based on input features that leads to a final result stored in the tree's leaf nodes. Typically, thresholds for making decisions are selected for continuous variables to form binary decisions at each decision node while values for categorical variables may be mapped to each branch. Examples of machine learning algorithms for learning decision trees include the Iterative Dichotomiser 3 (ID3) algorithm, the C4.5 algorithm, CART, or other suitable algorithms.
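By way of non-limiting illustration, the series of binary threshold decisions described above can be sketched as a small tree traversal. The nested-dictionary layout, the feature names `seismic_amplitude` and `depth_m`, and all thresholds and leaf values are hypothetical and are not taken from the disclosure.

```python
def predict(node, features):
    """Walk a binary decision tree until a leaf value is reached."""
    while "value" not in node:
        # Continuous feature: compare against the node's threshold.
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


# Hypothetical regression tree whose leaves store porosity values.
tree = {
    "feature": "seismic_amplitude",
    "threshold": 0.5,
    "left": {"value": 0.08},
    "right": {
        "feature": "depth_m",
        "threshold": 2000.0,
        "left": {"value": 0.25},
        "right": {"value": 0.15},
    },
}

shallow_bright = predict(tree, {"seismic_amplitude": 0.9, "depth_m": 1500.0})
```

An algorithm such as ID3, C4.5 or CART would learn the features and thresholds from training data; here they are fixed by hand only to show how a trained tree is evaluated.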
[0024] While decision trees have many appealing properties, one significant disadvantage is that they are often prone to over-fitting, leading to increased generalization error. To overcome this problem, ensemble learning methods have been developed that combine collections of decision tree models (or an ensemble of decision trees or decision tree ensemble), with bootstrap sampling and other elements of randomization to produce models with higher degrees of accuracy and precision. For example, one well-known ensemble method for decision trees is the random forest learning method, which may be used for regression-type and classification-type problems. The random forest learning method employs a collection of decision trees and outputs the target variable value that is the mean of the output generated by the individual decision trees (for regression-type analysis) or the mode of the class labels output by the individual decision trees (for classification-type analysis). Other examples of decision tree ensembles include bagging decision trees, boosted trees, and rotation forest. Ensemble learning methods use multiple machine learning models to obtain better predictive performance than can be obtained from any of the constituent models.
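As a minimal sketch of the aggregation rule described above (mean for regression-type analysis, mode for classification-type analysis), the following assumes the per-tree outputs are already available as plain lists. The function names and the sample values are illustrative assumptions only.

```python
from statistics import mean, mode


def aggregate_regression(tree_outputs):
    """Regression-type analysis: average the per-tree predictions."""
    return mean(tree_outputs)


def aggregate_classification(tree_labels):
    """Classification-type analysis: most common per-tree class label."""
    return mode(tree_labels)


# Hypothetical outputs of four decision trees for one location.
porosity_votes = [0.18, 0.22, 0.20, 0.24]
facies_votes = ["sand", "shale", "sand", "sand"]

ensemble_porosity = aggregate_regression(porosity_votes)
ensemble_facies = aggregate_classification(facies_votes)
```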
[0025] In embodiments, the system can be configured to extend a machine learning ensemble (e.g. an ensemble tree-based machine learning model such as a random forest learning model) to use or embed data derived from one or more external models as part of the training step. Such data can provide information that supplements the information contained in the training data.
[0026] For example, consider the case where the machine learning ensemble is being trained to estimate a spatial variable (target variable) such as porosity or permeability using available secondary variable training data such as seismic attributes at different true vertical depths (TVD) and stratigraphic depths of a reservoir. These secondary variables are assumed to be known at the training data locations as well as at other locations where it is desired to estimate the target variable. In a naive implementation, the machine learning ensemble can be trained using the vector of training data together with observations (or labels) of the target variable (such as porosity or permeability) at different locations to construct a model of the distribution of the target variable of the reservoir as a function of spatial location. This trained machine learning model can then be applied to make estimates of the target variable (such as porosity or permeability) at all required locations using the secondary variables which are known at those locations. Knowledge of the secondary variables at those locations together with the trained machine learning model is sufficient to make the prediction. However, an important control on the spatial distribution of the target variable is the continuity or correlation length of the target variable. This control can be provided by several types of external petrophysical models (such as the Kriging model for porosity) that estimate the spatial distribution of the target variable in a reservoir. Note that this control is not in the form of data, so it cannot be used directly in the machine learning algorithm. Because the control nevertheless carries useful information, it must be generated by the external petrophysical model and the results embedded into the training and prediction operations of the machine learning ensemble.
[0027] In embodiments, data generated by an external model(s) can be combined and embedded into the training operations of the machine learning ensemble as follows. For each respective machine learning model (e.g., decision tree) of the machine learning ensemble, and for each training data location used in the construction of the respective machine learning model (e.g., decision tree), one or more external models (such as a petrophysical model, for example the Kriging model for porosity) are used to predict or estimate the value of the target variable (e.g., porosity) at that training location by cross validation (i.e., not using the sample itself). This means that at each training location, in addition to the observed values of the secondary variables, we now have an estimate made by the external model of the value of the target variable (e.g., porosity) at that location. This numerical estimate tells us about the predictive behavior of the external model at the training location in the same way as the numerical observations of the secondary variables tell us about the predictive behavior of those secondary variables at the training location. By considering this numerical estimate as an extra variable observed at the training location, we can construct an extended vector of training data for the training location that includes the secondary variable observations at the training location and the numerical estimate of the target variable at the training location made by the external model. Machine learning code (such as a Random Forest) can use its standard algorithm in conjunction with samples of the extended vector of training data for multiple training locations to train each machine learning model (e.g., decision tree) of the machine learning ensemble to predict the target variable (e.g., porosity) given input data corresponding to the extended vector of training data. Hence, the machine learning code learns about the relative qualities of the secondary variables and the embedded target variable predictions made by the external model and combines them to produce a better predictor of the target variable (e.g., porosity).
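The construction of the extended vector of training data can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: an inverse-distance-weighted average (`idw_estimate`, a hypothetical helper) stands in for the Kriging model, the cross validation is performed once over all training locations rather than per decision tree, and the coordinates and values are invented.

```python
import math


def idw_estimate(target_xy, sample_xys, sample_values, power=2.0):
    """Stand-in external model (NOT Kriging): inverse-distance-weighted average."""
    weight_sum, weighted_values = 0.0, 0.0
    for xy, value in zip(sample_xys, sample_values):
        distance = math.dist(target_xy, xy)
        if distance == 0.0:
            return value  # target coincides with a sample
        weight = 1.0 / distance ** power
        weight_sum += weight
        weighted_values += weight * value
    return weighted_values / weight_sum


def extended_training_vectors(locations, secondary, target):
    """Append a leave-one-out external estimate to each secondary-variable vector."""
    rows = []
    for i, xy in enumerate(locations):
        # Cross validation: exclude sample i when estimating at location i.
        other_xys = locations[:i] + locations[i + 1:]
        other_values = target[:i] + target[i + 1:]
        estimate = idw_estimate(xy, other_xys, other_values)
        rows.append(list(secondary[i]) + [estimate])
    return rows


# Hypothetical training locations, one seismic attribute each, observed porosity.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
secondary = [[0.5], [0.7], [0.6], [0.9]]
porosity = [0.10, 0.20, 0.20, 0.30]

X_ext = extended_training_vectors(locations, secondary, porosity)
```

Each row of `X_ext` pairs with the observed porosity at the same location as its label, and the rows can then be passed to any standard ensemble trainer. Because the sample itself is left out, the appended estimate generally differs from the observed value, which lets the ensemble learn how reliable the external model is.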
[0028] At a location where an estimate of the target variable is required, observations of all the secondary variables for that location are obtained, and the external model can be used to make an estimate of the target variable at that location. Together these provide an input data vector that is fed to the trained machine learning ensemble (e.g., decision trees) to estimate the target variable at the required location in the usual way appropriate for the machine learning algorithm. For example, for a Random Forest, the estimate of the target variable (e.g., porosity) as a function of spatial location in the subsurface reservoir can be determined from the mean of all the target variable predictions produced by the ensemble of decision trees from the input data vector pertaining to a location in the subsurface reservoir.
[0029] In cases where the ensemble of target variable predictions produced by the ensemble of machine learning models (e.g., decision trees) can be considered to give a good estimate of the conditional distribution at a target location (as is the case for the Random Forest, for example), the estimate of the conditional distribution at the target location can be used to provide at least one additional product selected from the group which includes:
1) an uncertainty estimate.
2) a stochastic modeling algorithm capable of building realizations of the spatial target variable exploring the uncertainty space.
3) non-linear estimates of exceedance probabilities, P[Z(x) > c], the probability that the spatial target variable Z at location x is bigger than a cutoff c.
[0030] The additional products of 1) and 3) can be determined by using the set of predictions as an estimate of the conditional distribution and extracting uncertainty and exceedance probabilities from that. The additional product 2) can be determined using a novel approach. Since the conditional distribution has been estimated at each target location, a method known in the literature as P Field Simulation can be modified to perform a conditional uniform distribution realization of a Random Function model with a prescribed variogram. However, since there will often be a non-unique solution to the quantile that matches the observed target value at the well locations, one of these is sampled at each well location in a consistent manner. A variant of a Monte Carlo algorithm by Xavier Freulon gives an appropriate solution. In this algorithm, a first step selects appropriate quantile values from the conditional distributions estimated at the well locations using a Monte Carlo sampling such that the sampled values match the target variable observed at the well locations and follow the prescribed variogram. A second step produces a conditional realization of a uniform spatial random field at the set of target locations with the prescribed variogram and which matches the sampled uniform values at the well locations. A third step samples from the family of conditional distributions with the conditional uniform random field at the target locations to give the realization of the spatial target variable exploring the uncertainty space.
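As a hedged sketch of products 1) and 3), and of the final sampling step of product 2), the following treats the per-tree predictions at one location as an empirical conditional distribution. The function names, the use of the population standard deviation as the uncertainty measure, and the sample values are assumptions. Generating the spatially correlated uniform field (the first and second steps above) is not shown; `u` is simply taken as given.

```python
from statistics import pstdev


def uncertainty(tree_predictions):
    """Product 1): spread of the per-tree predictions (standard deviation)."""
    return pstdev(tree_predictions)


def exceedance_probability(tree_predictions, cutoff):
    """Product 3): empirical P[Z(x) > c] as the fraction of predictions above c."""
    above = sum(1 for z in tree_predictions if z > cutoff)
    return above / len(tree_predictions)


def sample_quantile(tree_predictions, u):
    """Third step of product 2): read the empirical quantile at uniform value u."""
    ordered = sorted(tree_predictions)
    index = min(int(u * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical per-tree porosity predictions at a single target location.
preds = [0.15, 0.18, 0.20, 0.22, 0.25]

spread = uncertainty(preds)
p_above = exceedance_probability(preds, cutoff=0.19)
realized = sample_quantile(preds, u=0.5)
```

In a full simulation, `u` would vary from location to location according to the conditional uniform random field, so that neighboring locations draw correlated quantiles from their respective conditional distributions.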
[0031] Figure 1 illustrates an example workflow where data generated by an external model is combined and embedded into the training operations of a machine learning ensemble that predicts a value of a target variable (such as porosity or permeability) for a location within a subsurface reservoir given an input data vector of observations for that location.
[0032] The workflow begins in block 101 by collecting training data for a particular location (or training location) in a reservoir. The training data includes a value for one or more secondary variables that characterize a geophysical attribute or property (such as seismic attributes) at the particular location in the reservoir as well as a ground truth label (known value) for a target variable that characterizes a geophysical attribute or property (such as porosity or permeability) at the particular location in the reservoir. The training data, including the value for one or more secondary variables and the ground truth label (known value) for the target variable, can be measured by surveys, test well analysis and interpretation, rock and fluid sampling and analysis, or other methods of reservoir characterization.
[0033] In block 103, an external model is used to predict and store a value for the target variable at the particular location in the reservoir. For example, in an embodiment where the target variable represents porosity of the reservoir, a petrophysical model such as the Kriging model for porosity can be used to predict and store a value for the target variable porosity at the particular location in the reservoir.
[0034] In block 105, a training data vector associated with a particular location in the reservoir is generated or built. The training data vector includes the secondary variable training data of 101 and the predicted value for the target variable of 103. The training data vector is associated with the target variable ground truth label (known value) at the particular location in the reservoir.
[0035] In block 107, the operations check whether the operations of 101 to 105 should be repeated for additional locations in the same reservoir. If so (e.g., for the case where sufficient training data has not yet been collected), the operations revert back to 101 to repeat the operations of 101 to 105 for additional locations in the same reservoir. If not (e.g., for the case where sufficient training data has been collected), the operations continue to block 109.
[0036] In block 109, the training data vector samples and associated target variable ground truth labels of 105 for different locations (training locations) in the same reservoir are collected and stored for training the ensemble of machine learning models.
[0037] In block 111, the training data vector samples and associated target variable ground truth labels that are collected and stored in 109 are used to train an ensemble of machine learning models (e.g., an ensemble of random forest decision tree models) to predict a value for the target variable (e.g., porosity or permeability) and optionally associated uncertainty (or other product) given an input data vector corresponding to the training data vector samples.
[0038] Figure 2 illustrates training operations of a machine learning ensemble. The machine learning ensemble includes a set of two or more machine learning models (labeled “ML Model 1”, “ML Model 2”, etc.) that are trained using the training data vector samples and associated target variable ground truth labels that are collected and stored in 109. In embodiments, the machine learning ensemble includes an ensemble of random forest decision tree models. In this embodiment, the training operations can employ random sampling of the training data vector samples that are collected and stored in 109, such as through randomized feature bagging or other means. Such random sampling is used to reduce the correlation between the random forest decision tree models that result from the training operations.
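One way such random sampling might be arranged is sketched below. It is a simplification under stated assumptions: each model receives one bootstrap sample of the rows and one fixed random subset of feature columns (a random-subspace style of feature bagging), whereas a standard random forest typically redraws the feature subset at every split. All function names and sizes are hypothetical.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def feature_subset(n_features, n_keep, rng):
    """Pick n_keep distinct feature columns at random."""
    return sorted(rng.sample(range(n_features), n_keep))


def draw_training_views(n_samples, n_features, n_models, n_keep, seed=0):
    """Return one (row indices, feature columns) pair per model in the ensemble."""
    rng = random.Random(seed)
    return [
        (bootstrap_indices(n_samples, rng), feature_subset(n_features, n_keep, rng))
        for _ in range(n_models)
    ]


# Six training vectors with four features each, three models, two features per model.
views = draw_training_views(n_samples=6, n_features=4, n_models=3, n_keep=2)
```

Because each model sees different rows and columns, the resulting trees make less correlated errors, which is what allows averaging to improve on any single tree.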
[0039] Figure 3 illustrates an example workflow where data generated by an external model is combined and embedded into an input data vector that is used in conjunction with a trained machine learning ensemble to predict a value of a target variable for a location within a subsurface reservoir given an input data vector for that location.
[0040] The workflow begins in block 301 by obtaining input data for a particular location in a reservoir where the value of a target variable (e.g., porosity or permeability) at the particular location is unknown. The input data includes a value for one or more secondary variables that characterize a geophysical attribute or property (such as seismic attributes) at the particular location in the reservoir. The input data can be measured by surveys, test well analysis and interpretation, rock and fluid sampling and analysis, or other methods of reservoir characterization.
[0041] In block 303, an external model is used to predict and store a value for the target variable at the particular location in the reservoir. For example, in an embodiment where the target variable represents porosity of the reservoir, a petrophysical model such as the Kriging model for porosity can be used to predict and store a value for the target variable porosity at the particular location in the reservoir.
[0042] In block 305, an input data vector associated with a particular location in the reservoir is generated or built. The input data vector includes the secondary variable input data of 301 and the predicted value for the target variable of 303.
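By way of non-limiting illustration, blocks 301 to 305 can be sketched as a simple concatenation of the secondary variable values and the external model estimate into one vector. The variable names (`acoustic_impedance`, `distance_to_fault`, `kriged_porosity`), the numeric values, and the helper `build_input_vector` are hypothetical and serve only to show a fixed ordering of the vector entries, which is assumed to match the ordering used for the training data vector samples.

```python
def build_input_vector(secondary, external_estimates, secondary_names, external_names):
    """Concatenate secondary variables and external-model estimates in a fixed order."""
    vector = [float(secondary[name]) for name in secondary_names]
    vector += [float(external_estimates[name]) for name in external_names]
    return vector


# Hypothetical column layout, assumed identical to the training layout.
SECONDARY_NAMES = ["acoustic_impedance", "distance_to_fault"]
EXTERNAL_NAMES = ["kriged_porosity"]

# Block 301: secondary variable input data at one reservoir location (made-up values).
secondary = {"acoustic_impedance": 7450.0, "distance_to_fault": 320.0}
# Block 303: value predicted by the external model at the same location (made-up value).
external = {"kriged_porosity": 0.17}
# Block 305: the input data vector passed to the trained ensemble.
input_vector = build_input_vector(secondary, external, SECONDARY_NAMES, EXTERNAL_NAMES)
print(input_vector)
```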
[0043] In block 307, the input data vector of 305 is used as input to the trained ensemble of machine learning models (e.g., trained ensemble of random forest decision tree models) of 109, which is configured to predict a value for the target variable (e.g., porosity or permeability) and optionally associated uncertainty (or other product) given the input data vector. The sampling of the input data vector of 305 for input to the respective machine learning models of the ensemble follows the sampling scheme of the training operations (109). For example, in embodiments where the ensemble of machine learning models includes an ensemble of random forest decision tree models, the sampling of the input data vector of 305 can employ the same random sampling scheme of the training data vector samples used in the training operations (109).
[0044] In block 309, the predicted value of the target variable (e.g., porosity or permeability) and optionally associated uncertainty (or other product) of 307 are incorporated into a reservoir model that characterizes spatial distribution of geophysical properties of the reservoir.
[0045] Figure 4 illustrates operations of a trained machine learning ensemble as part of block 307. The machine learning ensemble includes a set of two or more machine learning models (labeled “ML Model 1”, “ML Model 2”, etc.) that are trained as described herein (Figures 1 and 2). The trained machine learning models output respective predictions (values) for the target variable (e.g., porosity or permeability) at a particular reservoir location given observations from the input data vector corresponding to that particular location. The predictions (values) for the target variable (e.g., porosity or permeability) are combined (such as by averaging or other statistical analysis) to generate the predicted value of the target variable (e.g., porosity or permeability) and optionally associated uncertainty (or other product) as part of block 307. The sampling of the observations of the input data vector for input to the respective machine learning models of the ensemble follows the sampling scheme of the training operations (109). For example, in embodiments where the ensemble of machine learning models includes an ensemble of random forest decision tree models, the sampling of the observations of the input data vector can employ the same random sampling scheme of the training data vector samples used in the training operations (109).
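By way of non-limiting illustration, the combination of the respective model predictions can be sketched as follows. The sketch uses averaging for the predicted value, as mentioned above, and uses the sample standard deviation across the models as one possible measure of associated uncertainty. The use of the standard deviation, the function name `combine_predictions`, and the example values are assumptions made for illustration.

```python
import statistics


def combine_predictions(member_predictions):
    """Average the members' predictions and report their spread."""
    mean = statistics.fmean(member_predictions)
    # The sample standard deviation needs at least two members.
    spread = statistics.stdev(member_predictions) if len(member_predictions) > 1 else 0.0
    return mean, spread


# Hypothetical porosity outputs of ML Model 1 to ML Model 4 at one location.
member_predictions = [0.18, 0.21, 0.19, 0.22]
porosity, uncertainty = combine_predictions(member_predictions)
print(f"predicted porosity = {porosity:.3f}, spread = {uncertainty:.3f}")
```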
[0046] In embodiments, the operations of Figures 3 and 4 can be carried out for a number of different locations in the reservoir to predict a value of the target variable (e.g., porosity or permeability) and optionally associated uncertainty (or other product) for the different reservoir locations. Such predicted target variable values and products can be incorporated into the reservoir model that characterizes spatial distribution of geophysical properties of the reservoir.
[0047] In one embodiment, the system can employ a machine learning model that belongs to the class of Conditional Random Fields (CRF) as described in Lafferty et al., “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” Proceedings ICML-2001, 2001. The form of CRF that is used in this embodiment can accommodate and embed existing spatial models using a Markovian hypothesis. Let $Z(x)$ be a target variable of interest at the location $x$, and let $Y(x)$ be a vector of secondary or auxiliary variables observed at $x$. Let $\{Z_i, Y_i\}$ be observations of the target and secondary variables observed in the field at the locations $x_i$, and finally let $Z_e^*(x) = f(\{Z_i, Y_i\})$ be a vector of pre-existing estimators of $Z(x)$. Then the Markov hypothesis that we require is that the conditional distribution of $Z(x)$ given all available data, $F_{Z(x)\mid \mathrm{All}}(z)$, satisfies
$$F_{Z(x)\mid \mathrm{All}}(z) = P\left[Z(x) \le z \mid Y(x), \{Z_i, Y_i\}\right] = P\left[Z(x) \le z \mid Y(x), Z_e^*(x)\right]$$
This states that the conditional distribution of $Z(x)$ given all the secondary values observed at $x$ and given all the remote observations $\{Z_i, Y_i\}$ reduces to the far simpler conditional distribution of $Z(x)$ given all the secondary values observed at $x$ and the vector of model predictions at $x$.
[0048] In this embodiment, the target variable $Z(x)$ can be porosity, and the secondary variables can be seismic attributes, stratigraphic coordinates, true earth coordinates, zone information and distance to faults. Variables representing estimated porosity from two simple kriging models are also embedded into the CRF. Rather than spend time working on the inference of such models, we simply choose a long-range and a short-range model and allow the contribution of these models to be determined together with that of the secondary variables during construction of $F_{Z(x)\mid \mathrm{All}}(z)$. This is motivated by the empirical observation that the main contribution of the embedded kriging porosity estimates in the new algorithm is to provide information about lateral continuity of the target variable. This choice allows this simplified version of the estimation process to be fully automated.
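By way of non-limiting illustration, a long-range and a short-range simple kriging estimate of porosity can be sketched as follows. The sketch assumes a known mean, an exponential covariance of the form sill·exp(−3h/range), no nugget effect, two-dimensional well coordinates, and ranges of 500 and 50 distance units. All of these choices, together with the well locations and porosity values, are hypothetical and are not taken from the disclosure.

```python
import math


def exp_cov(h, sill, corr_range):
    """Exponential covariance; the factor 3 makes corr_range a practical range (assumption)."""
    return sill * math.exp(-3.0 * h / corr_range)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def simple_kriging(x0, xs, zs, mean, sill, corr_range):
    """Simple kriging estimate at x0: mean + sum_i w_i (z_i - mean), with C w = c0."""
    cov = [[exp_cov(math.dist(a, b), sill, corr_range) for b in xs] for a in xs]
    rhs = [exp_cov(math.dist(a, x0), sill, corr_range) for a in xs]
    weights = solve(cov, rhs)
    return mean + sum(w * (z - mean) for w, z in zip(weights, zs))


# Hypothetical well locations and porosity observations.
wells = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
porosity = [0.10, 0.20, 0.15, 0.25]
MEAN = 0.175

target = (30.0, 20.0)
long_range = simple_kriging(target, wells, porosity, MEAN, 1.0, 500.0)
short_range = simple_kriging(target, wells, porosity, MEAN, 1.0, 50.0)
print(f"long-range estimate = {long_range:.4f}, short-range estimate = {short_range:.4f}")
```

The two estimates can then be appended to the vector of secondary variables, leaving it to the ensemble training to determine how much weight each one receives.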
[0049] As described in Lafferty et al., an advantage of the CRF compared to a generative Bayesian model is that no effort is expended on establishing relationships between the predictor variables. In spatial models these involve stringent hypotheses such as the stationarity of the property of interest (perhaps coupled with some simple model of trend) and the stationarity of the relationship between the target variables and the explanatory variables (e.g., the hypothesis that the relationship between porosity and seismic attributes does not change spatially). One might object that our embedded models $Z_e^*(x)$ are constructed with such hypotheses. This is true, but their influence is mitigated in two ways. Firstly, the Markov hypothesis removes any direct influence of the construction of $Z_e^*(x)$, instead weighing its influence on the final estimate in an entirely symmetric way with the secondary variables, simply on their ability to predict the target distribution. Secondly, the principal impact of stationarity in the classic model is seen in stochastic realizations, which need to invoke the full multivariate distribution and therefore lean heavily on the hypotheses. This can be largely avoided in the current proposal.
[0050] In embodiments, a highly successful non-parametric paradigm for estimating $F_{Z(x)\mid \mathrm{All}}(z)$ can be employed based on Meinshausen, N., “Quantile Regression Forests,” Journal of Machine Learning Research 2006, 7, 983-999. The inference problem is complicated by the dependency on the embedded models $Z_e^*(x)$. If these estimators make use of $Z(x)$ in the estimation of $Z(x)$, we clearly have introduced a bias. We avoid this bias by the simple expedient of training the decision forest on cross validated estimates. Thus, the training data set for each tree is $\{Z_i, Y_i, Z_{cv,e}^*(x_i)\}$, where $Z_{cv,e}^*(x)$ are cross validated model estimates at $x$. With the estimates of $F_{Z(x)\mid \mathrm{All}}(z)$ at all target locations $x$, conditional realizations of the reservoir model can be produced. A modified conditional P field simulation can be used which honors data at the well locations, allows the final result to track any discontinuous shifts in the distribution (e.g., when zone boundaries are crossed) and follows the local heteroscedasticity observed in the conditional distribution as well as the spatially varying relationship between conditioning variables and target.
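By way of non-limiting illustration, the use of cross validated estimates can be sketched with leave-one-out cross validation. To keep the sketch short, an inverse-distance-weighted interpolator stands in for the embedded kriging model; this substitution, the leave-one-out scheme, the function names, and the data values are assumptions made for illustration. The sketch shows that an exact interpolator evaluated at a well simply returns the observed value (leaking the label into the training vector), whereas the cross validated estimate does not.

```python
import math


def idw(x0, xs, zs, power=2.0):
    """Inverse-distance-weighted estimate at x0 (stand-in for the external model)."""
    num = den = 0.0
    for xi, zi in zip(xs, zs):
        d = math.dist(x0, xi)
        if d == 0.0:
            return zi  # exact interpolator: reproduces the datum at a data location
        w = d ** -power
        num += w * zi
        den += w
    return num / den


def leave_one_out(xs, zs, estimator):
    """Estimate each z_i from all the other observations only."""
    estimates = []
    for i in range(len(xs)):
        xs_train = xs[:i] + xs[i + 1:]
        zs_train = zs[:i] + zs[i + 1:]
        estimates.append(estimator(xs[i], xs_train, zs_train))
    return estimates


# Hypothetical well locations and porosity observations.
wells = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
porosity = [0.10, 0.20, 0.15, 0.25]

naive = [idw(x, wells, porosity) for x in wells]   # uses Z(x_i) to estimate Z(x_i)
cv = leave_one_out(wells, porosity, idw)           # cross validated estimates

for z, n, c in zip(porosity, naive, cv):
    print(f"observed={z:.3f} naive={n:.3f} cross-validated={c:.3f}")
```

The cross validated column is what would be placed alongside the secondary variables in each training data vector.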
[0051] In embodiments, the system can be configured to construct a reservoir characterization model of a subsurface hydrocarbon reservoir, which includes the spatial distribution of one or more properties (target variables) of a subsurface hydrocarbon reservoir. Applications to other spatial modeling problems can be considered where an external physical model carries information relevant to the target variable that is not solely contained in observed data. In still other applications, the system can be configured to perform non-spatial regression when relevant information is available through external models.
[0052] Advantageously, the system as described herein allows for the use of many secondary variables and allows for non-linear relationships between them. This is an improvement over the prior art which allows for only linear interactions between secondary variables and even then, necessitates great care to ensure that the data will not become collinear leading to singular matrices and failed solutions.
[0053] Moreover, the system needs little to no manual human interaction in many, or most cases, which can reduce the amount of time and manual user interaction required to construct the model.
[0054] Furthermore, while some ‘hyperparameters’ can exist in the system, it should be possible to do without them to give the user a simple workflow experience as compared to the prior art.
[0055] Finally, the system as described herein can be configured to provide a simple user interface experience which requires users to simply define the target variable they wish to predict, the secondary variables they want to use to calculate that prediction, the external models they want to additionally use in the prediction, and a grid specifying the locations at which the predictions are to be made.
[0056] Figure 5 illustrates an example device 2500, with a processor 2502 and memory 2504 that can be configured to implement various embodiments of the methods and systems for reservoir modeling as discussed in this disclosure. Memory 2504 can also host one or more databases and can include one or more forms of volatile data storage media such as random-access memory (RAM), and/or one or more forms of nonvolatile storage media (such as read only memory (ROM), flash memory, and so forth).
[0057] Device 2500 is one example of a computing device or programmable device and is not intended to suggest any limitation as to scope of use or functionality of device 2500 and/or its possible architectures. For example, device 2500 can comprise one or more computing devices, programmable logic controllers (PLCs), etc.
[0058] Further, device 2500 should not be interpreted as having any dependency relating to one or a combination of components illustrated in device 2500. For example, device 2500 may include one or more of computers, such as a laptop computer, a desktop computer, a mainframe computer, etc., or any combination or accumulation thereof.
[0059] Device 2500 can also include a bus 2508 configured to allow various components and devices, such as processors 2502, memory 2504, and local data storage 2510, among other components, to communicate with each other.
[0060] Bus 2508 can include one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. Bus 2508 can also include wired and/or wireless buses.
[0061] Local data storage 2510 can include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., a flash memory drive, a removable hard drive, optical disks, magnetic disks, and so forth).
[0062] One or more input/output (I/O) device(s) 2512 may also communicate via a user interface (UI) controller 2514, which may connect with I/O device(s) 2512 either directly or through bus 2508.
[0063] In one possible implementation, a network interface 2516 may communicate outside of device 2500 via a connected network.
[0064] A media drive/interface 2518 can accept removable tangible media 2520, such as flash drives, optical disks, removable hard drives, software products, etc. In one possible implementation, logic, computing instructions, and/or software programs comprising elements of module 2506 may reside on removable media 2520 readable by media drive/interface 2518. Various processes of the present disclosure or parts thereof can be implemented by instructions and/or software programs that are elements of module 2506. Such instructions and/or software programs may reside on removable media 2520 readable by media drive/interface 2518 as is well known in the computing arts.
[0065] In one possible embodiment, input/output device(s) 2512 can allow a user (such as a human annotator) to enter commands and information to device 2500, and also allow information to be presented to the user and/or other components or devices. Examples of input device(s) 2512 include, for example, sensors, a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, and any other input devices known in the art. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, and so on.
[0066] Various processes of the present disclosure may be described herein in the general context of software or program modules, or the techniques and modules may be implemented in pure computing hardware. Software generally includes routines, programs, objects, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. An implementation of these modules and techniques may be stored on or transmitted across some form of tangible computer-readable media. Computer-readable media can be any available data storage medium or media that is tangible and can be accessed by a computing device. Computer readable media may thus comprise computer storage media. “Computer storage media” designates tangible media, and includes volatile and non-volatile, removable and non-removable tangible media implemented for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible medium which can be used to store the desired information, and which can be accessed by a computer. Some of the methods and processes described above can be performed by a processor. The term “processor” should not be construed to limit the embodiments disclosed herein to any particular device type or system. The processor may include a computer system. The computer system may also include a computer processor (e.g., a microprocessor, microcontroller, digital signal processor, or general-purpose computer) for executing any of the methods and processes described above.
[0067] Some of the methods and processes described above can be implemented as computer program logic for use with the computer processor. The computer program logic may be embodied in various forms, including a source code form or a computer executable form. Source code may include a series of computer program instructions in a variety of programming languages (e.g., an object code, an assembly language, or a high-level language such as C, C++, or JAVA). Such computer instructions can be stored in a non-transitory computer readable medium (e.g., memory) and executed by the computer processor. The computer instructions may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over a communication system (e.g., the Internet or World Wide Web).
[0068] Alternatively or additionally, the processor may include discrete electronic components coupled to a printed circuit board, integrated circuitry (e.g., Application Specific Integrated Circuits (ASICs)), and/or programmable logic devices (e.g., Field Programmable Gate Arrays (FPGAs)). Any of the methods and processes described above can be implemented using such logic devices.
[0069] Although only a few example embodiments have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the example embodiments without materially departing from this invention. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the following claims. In the claims, means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus, although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures. It is the express intention of the applicant not to invoke 35 U.S.C. § 112, paragraph 6 for any limitations of any of the claims herein, except for those in which the claim expressly uses the words ‘means for’ together with an associated function.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method for creating a model of a subsurface geological formation, comprising training a machine learning ensemble using training data as well as additional data derived from at least one external model, wherein such additional data provides information that supplements the information contained in the training data; and using the trained machine learning ensemble to construct the model based on input data pertaining to the subsurface geological formation.
2. A method according to claim 1, further comprising: constructing a plurality of training data vectors, wherein each training data vector includes the training data and the additional data derived from the at least one external model; and using the plurality of training data vectors to train the machine learning ensemble.
3. A method according to claim 2, wherein: the training data of each training data vector includes at least one observation that characterizes a reservoir property or attribute.
4. A method according to claim 3, wherein: the at least one observation is measured by a survey, test well analysis and interpretation, rock and fluid sampling and analysis, or other methods of reservoir characterization.
5. A method according to claim 1, wherein: the machine learning ensemble comprises a plurality of tree-based machine learning models.
6. A method according to claim 5, wherein: the plurality of tree-based machine learning models comprises a plurality of random forest decision-tree learning models.
7. A method according to claim 1, wherein: the model comprises a reservoir characterization model of a subsurface hydrocarbon reservoir.
8. A method according to claim 7, wherein: the reservoir characterization model includes a spatial distribution of at least one property of the subsurface hydrocarbon reservoir.
9. A method according to claim 8, wherein: the at least one property of the subsurface hydrocarbon reservoir comprises porosity or permeability.
10. A method according to claim 1, further comprising: outputting the model or parts thereof for display on a display screen to aid in understanding spatial distribution of characteristics of the subsurface geological formation.
11. A method according to claim 1, further comprising: using the model to simulate fluid flow in the subsurface geological formation.
12. A method according to claim 1, further comprising: calculating probabilistic results that demonstrate uncertainty in the model based on the quality and/or quantity of the input data.
13. A method according to claim 1, further comprising: determining at least one additional product based on data generated by the trained machine learning ensemble.
14. A method according to claim 13, wherein: the at least one additional product is selected from the group consisting of i) an uncertainty estimate, ii) a stochastic modeling algorithm that builds a realization of a spatial target variable of the model exploring its uncertainty space, and iii) non-linear estimates of exceedance probabilities, P[Z(x) > c], which is the probability that the spatial target variable Z at location x is bigger than a cutoff c.
15. A computer system including computer memory storing a sequence of instructions that are executable on a processor, wherein the sequence of instructions is configured to carry out the method of claim 1.
16. A non-transitory computer readable medium storing a sequence of instructions that are executable on a processor, wherein the sequence of instructions is configured to carry out the method of claim 1.
PCT/US2020/065620 2019-12-18 2020-12-17 Methods and systems for subsurface modeling employing ensemble machine learning prediction trained with data derived from at least one external model WO2021127186A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/757,657 US20230358917A1 (en) 2019-12-18 2020-12-17 Methods and systems for subsurface modeling employing ensemble machine learning prediction trained with data derived from at least one external model
EP20901603.9A EP4078247A4 (en) 2019-12-18 2020-12-17 Methods and systems for subsurface modeling employing ensemble machine learning prediction trained with data derived from at least one external model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962949673P 2019-12-18 2019-12-18
US62/949,673 2019-12-18

Publications (1)

Publication Number Publication Date
WO2021127186A1 true WO2021127186A1 (en) 2021-06-24

Family

ID=76478554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/065620 WO2021127186A1 (en) 2019-12-18 2020-12-17 Methods and systems for subsurface modeling employing ensemble machine learning prediction trained with data derived from at least one external model

Country Status (3)

Country Link
US (1) US20230358917A1 (en)
EP (1) EP4078247A4 (en)
WO (1) WO2021127186A1 (en)

Also Published As

Publication number Publication date
US20230358917A1 (en) 2023-11-09
EP4078247A4 (en) 2024-01-03
EP4078247A1 (en) 2022-10-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901603

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020901603

Country of ref document: EP

Effective date: 20220718