GB2584449A

GB2584449A - Apparatus method and computer-program product for calculating a measurable geological metric

Info

Publication number: GB2584449A
Application number: GB1907861.7A
Authority: GB
Inventors: Cameron Johnson Luke; Macgregor Lucy; Stewart Michael; Ross James; Benallack Keegan
Original assignee: COGNITIVE GEOLOGY Ltd
Current assignee: COGNITIVE GEOLOGY Ltd
Priority date: 2019-06-03
Filing date: 2019-06-03
Publication date: 2020-12-09
Anticipated expiration: 2039-06-03
Also published as: GB2584449B; GB201907861D0

Abstract

Apparatus, method and computer program product are provided to receive or generate, for a measurable geological metric (e.g. volume of oil) comprising a physical property of the Earth's interior, a set of candidate model functions. Each candidate model function defines a predicted variation of a geological input variable (e.g. porosity, permeability) upon which the metric depends along a respective geological direction. A representative subset of candidate model functions is determined by performing iterative refinement to calculate (414) values of the metric for a first, incomplete proportion of the cells of a 3D grid of cells. The calculation uses the set of candidate model functions and performs model clustering (416) based on values of the iterative refinement. A computation is then performed on a second, larger proportion of cells to calculate values of the measurable geological metric for that larger proportion, using the identified (418) representative subset of candidate model functions as an input to the metric computation. The number of cells in the incomplete proportion may be iteratively increased after each model clustering as the number of candidate model functions is progressively reduced. The second proportion may comprise all the cells in the grid.

Description

APPARATUS METHOD AND COMPUTER-PROGRAM PRODUCT FOR CALCULATING A MEASURABLE GEOLOGICAL METRIC

FIELD OF THE INVENTION

The present invention relates to an apparatus, a computer-program product and a method for calculating a predicted value for a measurable geological metric relating to a physical property of the Earth's interior.

BACKGROUND OF THE INVENTION

Prediction of measurable geological metrics such as, for example, the volume of oil in place (OIP) in a given volume of the Earth's interior is exceptionally difficult because of uncertainties in the values of the geological input variables involved in calculating the metric. For example, an OIP metric may be expressed as a product of the geological input variables porosity (φ), net-to-gross (NTG) and water saturation (Sw), thus:

$$\mathrm{OIP} = \mathrm{NTG} \times \phi \times (1 - S_w) \qquad \text{(equation 1)}$$

Other examples of measurable geological metrics include a permeability-thickness product, a net present value of oil in the physical volume of the Earth and a Dykstra-Parsons coefficient of permeability variation. The geological input variables tend to vary in value in different ways along different geological directions within the given interior Earth volume. Examples of different geological directions are: a distance from an ancient shoreline; a vertical distance within a single depositional unit; a burial depth in a Cartesian coordinate system; a 2-dimensional map area; and a true stratigraphic thickness. Furthermore, because the Earth's interior volume (e.g. a volume corresponding to a potential hydrocarbon reservoir) is inaccessible for physical measurement, the predictions are inevitably made with sparsely sampled empirical data. Examples of the types of sparsely sampled data that may be used to constrain the geological model simulation include seismic images, acoustic images, resistivity and/or conductivity measurements, nuclear measurements and rock composition samples. The predictions made by known geological algorithms are complex and may depend on a large number of different input variables such as porosity, permeability, rock type and saturation level. Often there are inter-dependencies between the different input geological variables, which can be difficult to disentangle.
As a consequence, error bounds associated with any computational prediction of a measurable geological metric can be difficult to quantify and yet the magnitude of the error is likely to be high.
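As a minimal sketch of how equation 1 might be evaluated cell by cell, the example below computes the OIP fraction for each cell and sums it over a small grid. The `Cell` type, its field names and the multiplication by a per-cell bulk volume are assumptions added for illustration; equation 1 itself only gives the fraction of rock volume occupied by oil.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """Hypothetical per-cell inputs; field names are illustrative only."""
    ntg: float          # net-to-gross, fraction 0..1
    porosity: float     # phi, fraction 0..1
    sw: float           # water saturation, fraction 0..1
    bulk_volume: float  # assumed cell volume (e.g. m^3); not part of equation 1


def oip_fraction(cell: Cell) -> float:
    """Equation 1: OIP = NTG * phi * (1 - Sw), a fraction of the cell volume."""
    return cell.ntg * cell.porosity * (1.0 - cell.sw)


def total_oip(cells: List[Cell]) -> float:
    """Sum the per-cell fraction scaled by each cell's (assumed) bulk volume."""
    return sum(oip_fraction(c) * c.bulk_volume for c in cells)


cells = [Cell(0.5, 0.2, 0.3, 1000.0), Cell(0.8, 0.25, 0.4, 1000.0)]
print(total_oip(cells))  # approximately 190 (70 + 120), up to float rounding
```

Every uncertain input (NTG, φ, Sw) enters multiplicatively, so an error in any one of them scales the result directly, which is one reason the error bounds are hard to control.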

Commonly applied approaches to this problem of calculating a measurable geological metric, given sparse sampling and difficult-to-quantify variability of the geological input variables, use a simplification that tests a select subset of geological input variables (typically high, mid and low cases) thought to have the greatest impact on the outcome. A set of candidate model functions corresponding to the measurable geological metric may be defined, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction. However, the commonly applied approaches test the candidate model functions only locally around a single realisation of the sub-surface model. A single candidate model function of the full ensemble may be selected and parameters varied only around this model, disregarding all or most of the other models. If this initial model is inaccurate, as it almost certainly will be, the uncertainty in the result will also be mis-characterised because only a small part of the complete model space is considered. Unexpected, but potentially important, solutions may also be missed. The problem is further compounded because a low-mid-high outcome can only be defined in relation to the individual input geological variables one at a time; what is high for a single geological input variable does not necessarily result in a high measurable geological metric.

A more effective approach is to perform a global search of the possible model space and run every plausible scenario through to compute the geological metric. In this way a complete range of scenarios, including those well away from pre-conceived ideas of the geology, can be considered, leading to a better characterisation of the uncertainty and its impact on the accuracy of the prediction for the measurable geological metric. However, this can also be challenging because even for relatively simple problems, the size of the model space can be extremely large.

The size of the model space may depend on the number of input geological variables on which a given geological metric depends, and each of those input geological variables may itself vary along a number of different geological directions. A given input variable whose values vary along four different geological directions results in 4! = 24 different models, corresponding to the different permutations (orderings of the directions) according to which statistical fits can be performed on the sampled data to characterise the variability. Non-stationary effects of a given geological variable can be removed using a mathematical function fitted to the sparse data corresponding to the variable, but this is done with respect to each geological direction in turn, and the order in which the four geological directions are treated may change the outcome. For example, for the OIP metric calculation the number of models may be 4! × 4! × 4!, because there are 4! models for each of the three input variables in the OIP metric, giving a total of 13,824 models, each of which may have a candidate model function associated with it.
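The counting argument above can be reproduced directly by enumerating orderings. This sketch assumes four hypothetical direction labels (picked from the example directions listed earlier) and assumes that the ordering for each of the three OIP input variables is chosen independently.

```python
from itertools import permutations

# Hypothetical labels for four geological directions; the choice of these
# four from the example list is an assumption for illustration.
DIRECTIONS = ("shoreline_distance", "depositional_unit_depth",
              "burial_depth", "stratigraphic_thickness")

# Each ordering in which the directions are detrended is a distinct model.
orderings = list(permutations(DIRECTIONS))
models_per_variable = len(orderings)  # 4! = 24

# OIP depends on NTG, porosity and Sw; assuming each variable's ordering is
# chosen independently, the ensemble size is the product over variables.
INPUT_VARIABLES = ("NTG", "porosity", "Sw")
ensemble_size = models_per_variable ** len(INPUT_VARIABLES)

print(models_per_variable, ensemble_size)  # 24 13824
```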

Calculation of a more complex metric on each cell of a 3D grid of cells representing an interior volume of the Earth may seem to be an intractable problem, not least because, in addition to dealing with the 13,824 models, the 3D grid is likely to contain at least 500,000 cells and probably many millions.
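As a rough illustration of scale, using only the figures quoted above and assuming one metric evaluation per model per cell, the lower bound on a brute-force workload is:

$$13\,824 \ \text{models} \times 5 \times 10^{5} \ \text{cells} = 6.912 \times 10^{9} \ \text{cell evaluations}$$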

BRIEF SUMMARY OF THE DISCLOSURE

According to a first aspect, the present invention provides a computer program product embodied on a computer-readable medium comprising program instructions, configured such that when executed by processing circuitry, cause the processing circuitry to: receive or generate (i.e. obtain), for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; define a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; identify a representative subset of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and perform a computation on a second proportion of cells of the 3D grid, larger than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the larger proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.
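A minimal sketch of the refine-then-compute flow, assuming each candidate model can be represented as a function returning one cell's contribution to the metric. The clustering step here is a deliberately simple stand-in (rank models by their subset-estimated total, split into equal-count groups, keep the member nearest each group median); it is not presented as the clustering used in the invention, and all function names are hypothetical.

```python
import random
from statistics import median
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a candidate model maps a grid-cell index
# (0..n_cells-1) to that cell's contribution to the metric (e.g. cell OIP).
Model = Callable[[int], float]


def estimate_total(model: Model, cells: Sequence[int], n_cells: int) -> float:
    """Estimate the full-grid total from a subset of cells (sample mean scaled up)."""
    return n_cells * sum(model(c) for c in cells) / len(cells)


def representative_subset(models: Dict[str, Model], n_cells: int,
                          sample_fraction: float, k: int,
                          seed: int = 0) -> List[str]:
    """Refine on an incomplete proportion of cells, then cluster the models.

    Clustering stand-in (an assumption): rank models by their estimated
    total, split the ranking into k equal-count groups and keep the member
    closest to each group's median.
    """
    rng = random.Random(seed)
    n_sample = max(1, int(sample_fraction * n_cells))
    sample = rng.sample(range(n_cells), n_sample)
    estimates = {name: estimate_total(m, sample, n_cells)
                 for name, m in models.items()}
    ranked = sorted(estimates, key=estimates.__getitem__)
    reps: List[str] = []
    for g in range(k):
        group = ranked[g * len(ranked) // k:(g + 1) * len(ranked) // k]
        if group:
            mid = median(estimates[n] for n in group)
            reps.append(min(group, key=lambda n: abs(estimates[n] - mid)))
    return reps


def full_compute(models: Dict[str, Model], names: List[str],
                 n_cells: int) -> Dict[str, float]:
    """Compute step: evaluate every cell, but only for the representatives."""
    return {n: sum(models[n](c) for c in range(n_cells)) for n in names}


# Usage with 100 synthetic models on a 10,000-cell grid.
models = {f"m{i}": (lambda c, s=i + 1: 0.01 * s * (1 + (c % 7) / 7))
          for i in range(100)}
reps = representative_subset(models, n_cells=10_000, sample_fraction=0.02, k=5)
full = full_compute(models, reps, n_cells=10_000)
```

In this toy case the expensive full-grid pass touches 5 models instead of 100, while the cheap pass over 2% of the cells is what decides which 5 to keep.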

In some embodiments the identification of the representative subset of candidate models comprises iteratively increasing a number of cells in the first incomplete portion of cells after each corresponding model clustering iteration such that as the number of candidate model functions in the representative subset is progressively reduced via the model clustering, the number of cells in the first incomplete portion of cells is progressively increased.
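One way to picture this schedule is a fixed work budget: each clustering pass discards models, and the freed budget buys a larger cell sample. The budget rule, the halving factor and the minimum model count below are all assumptions chosen for illustration, not values taken from this document.

```python
from typing import Iterator, Tuple


def refinement_schedule(n_models: int, n_cells: int, start_cells: int,
                        shrink: float = 0.5,
                        min_models: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield (models kept, cells sampled) for each refine iteration.

    Assumed budget rule: models * sampled cells stays at or below the cost of
    the first iteration, so each clustering pass that discards models frees
    budget for a larger (more accurate) cell sample.
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be between 0 and 1")
    budget = n_models * start_cells
    models, cells = n_models, start_cells
    while True:
        yield models, cells
        if models <= min_models or cells >= n_cells:
            return
        models = max(min_models, int(models * shrink))
        cells = min(n_cells, budget // models)


schedule = list(refinement_schedule(13_824, 500_000, start_cells=100))
print(schedule[:3])  # [(13824, 100), (6912, 200), (3456, 400)]
```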

In some embodiments the iterative refinement comprises determining an appropriate number of cells forming the first incomplete portion of cells by checking for a minimum number of cells required to achieve convergence of the calculated values of the measurable geological metric and progressing to the model clustering when convergence has been achieved.
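A minimal sketch of a convergence check for choosing the size of the incomplete proportion of cells. The nested random sampling, the doubling growth factor and the relative-change tolerance are assumptions; the document only states that a minimum number of cells giving convergence is sought.

```python
import random
from typing import Sequence, Tuple


def converged_sample_size(cell_values: Sequence[float], start: int = 50,
                          growth: float = 2.0, tol: float = 0.01,
                          seed: int = 0) -> Tuple[int, float]:
    """Grow a nested random sample of cells until the sample mean of the
    metric changes by no more than `tol` (relative) between iterations.

    Returns (cells needed, estimated mean). Falls back to every cell if the
    estimate never settles.
    """
    if not cell_values:
        raise ValueError("no cells to sample")
    rng = random.Random(seed)
    order = list(range(len(cell_values)))
    rng.shuffle(order)  # each larger sample contains the previous one
    size = min(start, len(order))
    prev = sum(cell_values[i] for i in order[:size]) / size
    while size < len(order):
        size = min(len(order), max(size + 1, int(size * growth)))
        mean = sum(cell_values[i] for i in order[:size]) / size
        if abs(mean - prev) <= tol * abs(prev):
            return size, mean
        prev = mean
    return size, prev
```

For a perfectly uniform property the check stops at the second iteration (100 cells with the defaults); a heterogeneous property needs more cells before consecutive estimates agree.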

In some embodiments the second proportion of the 3D grid cells comprises a full resolution calculation involving all of the cells.

In some embodiments the computation on the second proportion of 3D grid cells is performed for the first time after the set of candidate model functions has been reduced in number to a minimum representative subset.

In some embodiments the computer program product comprises instructions that, when executed by processing circuitry, cause the processing circuitry to generate the set of candidate model functions based on performing at least one of line or curve fits to a set of sparsely sampled measurement data to parameterise the sparsely sampled measurement data, wherein the sparsely sampled measurement data is empirically derived by making measurements of at least one physical property of the Earth's interior.

In some embodiments the sparsely sampled measurement data is fitted along a first geological direction to determine at least one fit parameter with respect to the first geological direction, wherein the at least one fit parameter represents a best-fit line or curve.

In some embodiments the computer program is configured to calculate a first set of residuals between the sparsely sampled measurement data values and corresponding first functional values from the best-fit line or curve.

Some embodiments comprise iteratively performing at least one further statistical line or curve fit to a set of residuals of a previous best fit with respect to a further geological direction different from the first geological direction and different from any previously iterated geological directions.

In some embodiments a result of the iteratively performed statistical fits for each of a plurality of geological directions provides a number of best-fit parameters characterising the candidate model functions for the measurable geological metric, wherein the best-fit parameters comprise parameters from each iteration of the statistical fitting and provide a basis function for a respective candidate model function, the basis function enabling values of the measurable geological metric to be calculated in any cell of the 3D grid of cells.
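The fit-then-fit-the-residuals procedure described above can be sketched with ordinary least-squares straight lines. Restricting the fits to straight lines, the additive combination of per-direction trends into a basis function, and the direction names `x` and `z` are all assumptions for illustration; the document also allows curve fits.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Coords = Dict[str, float]                # position of a sample along each direction
Sample = Tuple[Coords, float]            # (coordinates, measured value)
Params = List[Tuple[str, float, float]]  # (direction, intercept, slope)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def sequential_fit(samples: List[Sample],
                   order: Sequence[str]) -> Tuple[Params, List[float]]:
    """Fit the data along order[0], then fit each set of residuals along the
    next direction; returns the per-direction parameters and final residuals."""
    residuals = [value for _, value in samples]
    params: Params = []
    for d in order:
        xs = [coords[d] for coords, _ in samples]
        a, b = fit_line(xs, residuals)
        params.append((d, a, b))
        residuals = [r - (a + b * x) for r, x in zip(residuals, xs)]
    return params, residuals


def basis_function(params: Params) -> Callable[[Coords], float]:
    """Sum of the fitted trends; can be evaluated at any grid-cell position."""
    return lambda coords: sum(a + b * coords[d] for d, a, b in params)


# Usage: nine samples on a 3 x 3 layout with a known trend in each direction.
samples = [({"x": x, "z": z}, 2 + 0.5 * x + 0.1 * z)
           for x in range(3) for z in range(3)]
params, residuals = sequential_fit(samples, ["x", "z"])
f = basis_function(params)  # f({"x": 1, "z": 2}) is approximately 2.7
```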

In some embodiments the characteristic best-fit parameters and the sparse measurement data are used by the processing circuitry to populate at least a subset of values of the 3D grid of cells.

In some embodiments the model clustering preferentially allocates to the representative subset of models those candidate model functions associated with input geological variables to which the measurable geological metric being determined is more sensitive, relative to other input geological variables to which the metric is less sensitive.

In some embodiments the set of candidate model functions of the measurable geological metric from which the representative subset of models is identified has, for each candidate model function, a parameter set describing both stationary and any non-stationary aspects of the candidate model function.

In some embodiments any non-stationary behaviour of one or more of the input geological variables is removed from the set of candidate model functions before the identification of the representative subset of models is performed.

In some embodiments at least one of the set of candidate model functions input to the identification of the representative subset of models has a parameter set providing an indication of an ordering according to which two or more transformations to reduce spatial bias in respective input geological variables have been performed.

In some embodiments the set of candidate model functions comprises a full ensemble of candidate model functions representing all of a first number of input geological variables and a second number of branch factors, the branch factors representing possible different values of a respective input geological variable.
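Assuming a full-factorial ensemble, in which each input variable independently takes one of `branch_factor` possible values, the ensemble size is the branch factor raised to the number of variables. This is consistent with the 13,824 figure given earlier (branch factor 24, three variables). The function name is hypothetical; the variable list is the one given in this document.

```python
def scenario_count(n_variables: int, branch_factor: int) -> int:
    """Scenarios in a full-factorial ensemble: branch_factor ** n_variables."""
    return branch_factor ** n_variables


# The eight example input geological variables listed in this document.
VARIABLES = ["grid", "facies", "net-to-gross", "porosity",
             "XY permeability", "Kv/Kh", "Sw", "Swirr"]

counts = {b: scenario_count(len(VARIABLES), b) for b in (2, 3, 5)}
print(counts)  # {2: 256, 3: 6561, 5: 390625}
```

The exponential growth with the number of variables is what makes exhaustive evaluation on a full grid costly and motivates reducing the ensemble before the full-resolution pass.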

In some embodiments the model clustering reduces the full ensemble of candidate model functions to the representative subset of candidate model functions depending on an estimate of how closely a cumulative distribution function for the measurable geological metric of the representative subset of candidate model functions represents a cumulative distribution function for the measurable geological metric of the full ensemble of candidate model functions.
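One possible closeness measure between the two cumulative distribution functions is the largest vertical gap between their empirical CDFs (the Kolmogorov-Smirnov statistic). This is a swapped-in, hypothetical choice; the document does not specify the measure, and this sketch treats the subset's values as unweighted.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(values: Sequence[float]) -> Callable[[float], float]:
    """Right-continuous empirical CDF: fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_cdf_gap(full: Sequence[float], subset: Sequence[float]) -> float:
    """Kolmogorov-Smirnov statistic between two samples of metric values.

    Both CDFs are step functions, so the largest gap occurs at one of the
    observed values; checking those points is sufficient.
    """
    f, g = empirical_cdf(full), empirical_cdf(subset)
    return max(abs(f(x) - g(x)) for x in set(full) | set(subset))


# Usage: metric totals from a full ensemble versus a clustered subset.
full_ensemble = [1.0, 2.0, 3.0, 4.0]
representatives = [2.0, 3.0]
gap = max_cdf_gap(full_ensemble, representatives)  # 0.25
```

A clustering loop could, for instance, keep reducing the subset while this gap stays below a chosen tolerance.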

In some embodiments the at least one input geological variable comprises at least one of: grid, facies, net-to-gross, porosity, XY permeability, Kv/Kh, Sw and Swirr.

In some embodiments the geological direction comprises at least one of a true stratigraphic thickness, a distance from an ancient shoreline, a vertical distance within a single depositional unit, a burial depth in a Cartesian coordinate system and a two-dimensional map area.

In some embodiments the measurable geological metric comprises at least one of: an amount of oil in place (OIP); a permeability-thickness product; a net present value of oil in the physical volume of the Earth; and a Dykstra-Parsons coefficient of permeability variation.

According to a second aspect, the present invention provides a data processing apparatus comprising: a memory; and processing circuitry to: receive or generate, for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; define a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; identify a representative subset of the set of candidate model functions that excludes at least one of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and perform a computation on a second proportion of 3D grid cells comprising at least a larger proportion of cells of the 3D grid than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the second proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.

According to a third aspect, the present invention provides a method comprising: receiving or generating, for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; defining a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; identifying a representative subset of the set of candidate model functions that excludes at least one of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and performing a computation on a second proportion of 3D grid cells comprising at least a larger proportion of cells of the 3D grid than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the second proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.

According to a fourth aspect, the present invention provides a computer program product embodied on a computer-readable medium comprising program instructions, configured such that when executed by processing circuitry, cause the processing circuitry to: receive sparsely sampled measurement data for an input geological variable; determine a plurality of best-fit lines or curves by fitting data derived from the sparsely sampled measurement data in a respective plurality of geological directions, the plurality of best fits being performed in an ordered sequence of the geological directions; generate a basis function for the input geological variable using at least a subset of the fit parameters for the best-fit lines or curves for each of the plurality of geological directions; and use the basis function to calculate a measurable geological metric that depends on the input geological variable.

In some embodiments the ordered sequence of the plurality of best-fits comprises all possible permutations of the plurality of geological directions and wherein a plurality of basis functions are created for the input geological variable corresponding to a respective plurality of the possible permutations.

In some embodiments a first of the plurality of best fit lines or curves is fit to the sparsely sampled measurement data and wherein each subsequent best-fit line or curve of the sequence is fit to a residual data set of an immediately preceding fit in the ordered sequence.
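The sketch below enumerates every ordering of the geological directions and produces one parameter set (one candidate basis function) per ordering, each built by fitting the data first and the residuals thereafter. Straight-line least-squares fits and the two direction names are assumptions; the usage data are synthetic and chosen so that the directions are correlated, which shows why the ordering changes the outcome.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Dict[str, float], float]  # (coordinates per direction, value)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def ordered_fit(samples: List[Sample], order: Sequence[str]):
    """First fit is to the data; each later fit is to the previous residuals."""
    residuals = [v for _, v in samples]
    params = []
    for d in order:
        xs = [coords[d] for coords, _ in samples]
        a, b = fit_line(xs, residuals)
        params.append((d, a, b))
        residuals = [r - (a + b * x) for r, x in zip(residuals, xs)]
    return params


def basis_sets(samples: List[Sample], directions: Sequence[str]):
    """One parameter set (candidate basis function) per ordering of directions."""
    return {order: ordered_fit(samples, order) for order in permutations(directions)}


# Usage: x and z are correlated, so the fitted x trend depends on the ordering.
samples = [({"x": 0, "z": 0}, 0.0), ({"x": 1, "z": 1}, 2.0),
           ({"x": 2, "z": 1}, 3.0), ({"x": 3, "z": 3}, 5.0)]
sets = basis_sets(samples, ["x", "z"])
slope_x_first = sets[("x", "z")][0][2]   # 1.6
slope_x_second = sets[("z", "x")][1][2]  # 17/95, about 0.179
```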

Some embodiments comprise generating at least one basis function for each of one or more further input geological variables.

Some embodiments comprise inputting to a computation of a geological metric on a 3D grid of cells corresponding to an interior volume of the Earth, the basis function(s) for at least one input geological variable upon which the measurable geological metric depends, the input basis functions representing a full ensemble of candidate basis functions.

Some embodiments comprise instructions that, when executed by processing circuitry, cause the processing circuitry to identify a representative subset of the full ensemble of candidate basis functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the full ensemble of candidate basis functions and performing model clustering based on the calculated values of the iterative refinement; and to perform a computation on a second proportion of cells of the 3D grid, larger than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the larger proportion of cells using the identified representative subset of basis functions as an input to the measurable geological metric computation.

In some embodiments the model clustering is performed to reduce the full ensemble of candidate basis functions to the representative subset of basis functions without fully populating the 3D grid of cells with values for the ensemble of candidate basis functions.

According to a fifth aspect, the present invention provides a method of predicting variation of an input geological variable corresponding to a hydrocarbon reservoir, the method comprising: receiving sparsely sampled measurement data for an input geological variable; determining a plurality of best-fit lines or curves by fitting data derived from the sparsely sampled measurement data in a respective plurality of geological directions, the plurality of best fits being performed in an ordered sequence of geological directions; generating a basis function for the input geological variable using at least a subset of the fit parameters for the best-fit lines or curves for each of the plurality of geological directions; and using the basis function to calculate a measurable geological metric that depends on the input geological variable.

According to a sixth aspect, the present invention provides a data processing apparatus having processing circuitry configured to: receive sparsely sampled measurement data for an input geological variable; determine a plurality of best-fit lines or curves by fitting data derived from the sparsely sampled measurement data in a respective plurality of geological directions, the plurality of best fits being performed in an ordered sequence of geological directions; generate a basis function for the input geological variable using at least a subset of the fit parameters for the best-fit lines or curves for each of the plurality of geological directions; and use the basis function to calculate a measurable geological metric that depends on the input geological variable.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:
Figure 1 schematically illustrates an example data processing apparatus according to some embodiments;
Figure 2 schematically illustrates a method of computation of values of a geological metric;
Figure 3 schematically illustrates contours for a number of scenarios that can be generated from a given number of input geological variables (y-axis) against a branch factor (x-axis) representing a given number of possible values for each input geological variable;
Figure 4 is a flow chart schematically illustrating in more detail a refine phase of the geological metric calculation algorithm of Figure 2;
Figure 5 schematically illustrates how convergence of sub-cell sampling is achieved in a first iteration of the iterative refinement phase of Figure 4;
Figure 6A is a flow chart schematically illustrating an expansion phase of the geological metric calculation algorithm;
Figure 6B is a flow chart schematically illustrating how dimensionality reduction may be used to determine a number of reduced basis functions for an ensemble of candidate model functions for each input geological variable;
Figure 6C is a flow chart schematically illustrating process elements comprising a refine phase of the geological metric calculation algorithm;
Figure 7 schematically illustrates an example set of reduced basis functions generated via dimensionality reduction, each basis function representing a 'gene' of a full ensemble of candidate model functions for a given geological input variable;
Figure 8 schematically illustrates Cumulative Distribution Functions (CDFs) for an OIP metric for each of a full candidate ensemble and a representative subset of models obtained via clustering;
Figure 9 schematically illustrates property model combinations forming a selection from the full ensemble of candidate models that may be used to obtain an accurate CDF; and
Figure 10 schematically illustrates a cross plot of CDF values between a CDF calculated with the full candidate model ensemble (x-axis) and the representative subset of models (y-axis).

DETAILED DESCRIPTION

Figure 1 schematically illustrates an example data processing apparatus according to some embodiments. System 100 includes one or more processor(s) 140, system control logic 120 coupled with at least one of the processor(s) 140, system memory 110 coupled with system control logic 120, a non-volatile memory (NVM)/storage 130 coupled with system control logic 120, and Input/Output devices 150 coupled to the system control logic 120.

Processor(s) 140 may include one or more single-core or multi-core processors. Processor(s) 140 may include any combination of general-purpose processors and dedicated processors (e.g., graphics processors, application processors, baseband processors, etc.). Processors 140 may be operable to carry out the methods described herein by executing suitable instructions or programs on the processor or other logic. The instructions may be stored in system memory 110, as system memory portion (measurable geological parameter code) 115, or additionally or alternatively may be stored in NVM/storage 130, as NVM instruction portion (measurable geological parameter code) 135. The measurable geological parameter code 115 and/or 135 may include program instructions to cause a processor 140 to calculate a value of a measurable geological metric.

Processor(s) 140 may be configured to execute the embodiments described herein. The processor(s) 140 can comprise one or more of 3D grid definition circuitry 142, representative subset identification circuitry 144 and measurable geological metric calculation circuitry 146. It will be appreciated that the 3D grid definition functionality, the representative subset identification functionality and the measurable geological metric calculation functionality may be distributed or allocated in different ways across the system involving one or more of the processor(s) 140, system memory 110 and NVM/storage 130.

The system memory 110 may be used to store a set of candidate model functions 112 corresponding to a given measurable geological metric, each candidate model function of the candidate set defining a predicted variation of a geological input variable upon which the given measurable geological metric depends. The set of candidate model functions 112 may be generated from sparsely sampled measurement data 114 obtained by physical measurements (i.e. empirically) on a hydrocarbon reservoir. The sparsely sampled measurement data 114 may also be stored in the system memory 110.

System control logic 120 for one embodiment may include any suitable interface controllers to provide for any suitable interface to at least one of the processor(s) 140 and/or to any suitable device or component in communication with system control logic 120.

System control logic 120 for one embodiment may include one or more memory controller(s) to provide an interface to system memory 110. System memory 110 may be used to load and store data and/or instructions, for example, the measurable geological metric calculation program instructions according to the present technique. System memory 110 for one embodiment may include any suitable volatile memory, such as suitable dynamic random access memory (DRAM), for example.

NVM/storage 130 may include one or more tangible, non-transitory or transitory computer-readable media used to store data and/or instructions, for example. NVM/storage 130 may include any suitable non-volatile memory, such as flash memory, for example, and/or may include any suitable non-volatile storage device(s), such as one or more hard disk drive(s) (HDD(s)), one or more compact disk (CD) drive(s), and/or one or more digital versatile disk (DVD) drive(s), for example.

The NVM/storage 130 may include a storage resource physically part of the data processing apparatus 100 or it may be accessible by, but not necessarily a part of, the device.

For example, the NVM/storage 130 may be accessed over a network via a network interface (not shown).

System memory 110 and NVM/storage 130 may respectively include, in particular, temporary and persistent copies of, for example, the instruction portions 115 and 135. Program code segments 115 and 135 may include instructions that when executed by at least one of the processor(s) 140 result in the data processing apparatus 100 implementing one or more of the methods of any embodiment, as described herein.

For one embodiment, at least one of the processor(s) 140 may be packaged together with logic for one or more controller(s) of system control logic 120. For one embodiment, at least one of the processor(s) 140 may be packaged together with logic for one or more controllers of system control logic 120 to form a System in Package (SiP). For one embodiment, at least one of the processor(s) 140 may be integrated on the same die with logic for one or more controller(s) of system control logic 120. For one embodiment, at least one of the processor(s) 140 may be integrated on the same die with logic for one or more controller(s) of system control logic 120 to form a System on Chip (SoC). Each of the processors 140 may include an input for receiving data and an output for outputting data.

The one or more processor(s) comprise general purpose or special purpose processing circuitry comprising circuitry 142 to define a grid of 3D cells on which to perform calculation of a measurable geological metric, circuitry 144 to reduce the set of candidate model functions 112 to a more manageable representative subset of models that excludes potentially redundant models by performing a calculation on a relatively small proportion of 3D grid cells and circuitry 146 to calculate the measurable geological metric for the reduced number of model functions of the representative subset on a relatively larger proportion of the 3D grid cells or up to 100% of the 3D grid cells.

In various embodiments, the I/O devices 150 may include user interfaces designed to enable user interaction with the data processing apparatus 100 and peripheral component interfaces designed to enable peripheral component interaction with the data processing apparatus 100.

Figure 2 schematically illustrates a method of computation of values of a geological metric. The method comprises a setup phase 210, a define phase 220, an expand phase 230, a refine phase 240 and a compute phase 250.

In the setup phase 210, data to be used in the metric computation is defined. In some examples, the sparsely sampled data may be collected by making empirical measurements in the field in a hydrocarbon reservoir. In alternative examples, the sparsely sampled data for a given hydrocarbon reservoir or an associated Earth interior volume may be obtained from one or more remote or local data repositories. In the setup phase 210, computer resources estimated to be required for the calculation may be allocated. Compute hardware and memory capacity may limit the number of cells that may be simulated on a 3D grid of cells representing the interior volume of the Earth being simulated and/or the number of models that can be incorporated in the calculation.

Within the define phase 220, at process element 222 the input geological variables to be used to calculate a given measurable geological metric are defined. Examples of these input geological variables are porosity, NTG, facies, XY permeability, Kv/Kh, Sw and Swirr. These example input geological variables are defined as follows:

Facies: A facies is a body of rock that is distinct from adjacent bodies of rock and usually formed in a different geological environment. For example, an interior volume of the Earth to be modeled may comprise a channel full of sand (a sand facies) encased in shale (a shale facies). The facies input variable is a convenient way to distinguish rock types based on their bulk properties and on how the rocks were deposited.

Net-to-Gross (NTG): This is the total thickness of pay in a reservoir interval, divided by the total thickness of the reservoir interval. For example, a reservoir interval of 100m may consist of interbedded sands and shales. If in that interval there is a total of 50m of oil charged sand (and the rest is shale or water wet sand) then the NTG is 0.5.

Porosity (φ): Defined as the capacity of reservoir rocks to hold fluids.

XY Permeability: Permeability is a measure of a rock's ability to transmit fluid, i.e. how easily fluid can flow through the rock, and is typically measured in millidarcies. This quantity is often highly heterogeneous, so different values in the X and Y directions might reasonably be expected.

Kv/Kh: Permeability is usually denoted by K. The Kv/Kh ratio is therefore the ratio of the vertical permeability to the horizontal permeability, that is, a measure of the permeability anisotropy.

Swirr: Stands for irreducible water saturation, the minimum water saturation that can be found in a reservoir. A hydrocarbon saturation of 100% is not seen in practice because capillary pressure in the interior volume of the Earth makes it impossible to eliminate all of the water.

Sw: The fraction of fluid in the rock that is water. Sw can be larger than Swirr, but not smaller.

For the OIP metric discussed above, the relevant input variables are porosity, NTG and water saturation (Sw). At 224 in the define phase of Figure 2, the mathematical formula via which the given metric is to be calculated may also be defined, e.g. equation (1).
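Equation (1) is not reproduced in this section. As a minimal sketch only, assuming the common volumetric form in which each cell contributes its bulk volume multiplied by NTG, porosity and hydrocarbon saturation (1 - Sw), with any formation volume factor omitted, a cell-wise OIP calculation might look as follows. The function name `oil_in_place` is hypothetical, and the actual equation (1) may differ.

```python
import numpy as np

def oil_in_place(cell_volume, ntg, porosity, sw):
    """Assumed volumetric OIP: sum over cells of V * NTG * phi * (1 - Sw).

    Arguments are per-cell arrays (or scalars that broadcast). This form is an
    assumption standing in for equation (1), not a reproduction of it.
    """
    v, n, phi, s = (np.asarray(a, dtype=float) for a in (cell_volume, ntg, porosity, sw))
    return float(np.sum(v * n * phi * (1.0 - s)))

# Two cells of 1000 m^3 each; the first uses the NTG of 0.5 from the example above.
print(oil_in_place(1000.0, [0.5, 1.0], [0.2, 0.1], [0.3, 0.5]))
```

Because the per-cell contributions simply add, the same function can be applied to any subset of cells, which is what the sub-sampling described below relies on.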

In the expand phase 230, for each of the geological input variables selected in the define phase at 222, a set of candidate model functions is generated to describe the behavior of each respective geological input variable with respect to at least one geological direction. The generation of the set of candidate model functions may be performed in any one of a number of different ways. For example, the method for making geological predictions as described in granted UK patent GB 2532734B may be used to generate the set of candidate model functions.

Known geological (geostatistical) modelling algorithms such as Sequential Gaussian Simulation rely upon an idealistic assumption that the geological parameters being distributed do not have any spatial bias in their distribution. In reality, observed geological properties almost always have some component of spatial dependence in their distributions, due to variations in the geological conditions across a given rock volume. This means that all observational datasets (such as porosity or permeability) may be investigated for any dependency on a spatial vector such as burial depth or distance from a locus of sedimentation. Accordingly, any observed or otherwise expected spatial bias in the input data can be defined mathematically and removed, or at least reduced, using inverse mathematical transformations. This principle was exploited in GB2532734B. However, some example embodiments may use alternative methods of expansion to define a full ensemble of characteristic model functions or to define values for a grid of 3D cells for each candidate model for an input geological variable. The method of GB2532734B prescribes both how to determine a characteristic model function in which any non-stationary properties have been reduced and how to determine a statistical weighting associated with each candidate model function for a given geological input variable with respect to a geological distance or volume measure.

In the refine phase 240 of Figure 2, a dimensionality reduction 242 is implemented in order to reduce the amount of memory that might otherwise be required to store data corresponding to the set of candidate model functions generated in the expand phase 230; the reduced data is then used to perform calculations on the 3D grid. Rather than storing numerical values of the input geological parameter for each cell of the 3D computational grid representing the interior Earth volume of interest, each candidate model for each input variable is parameterized to define in functional form any trends present in the data and to specify in parametric statistical form the degree of heterogeneity in each input variable. Thus, rather than storing numerical values of a given input geological variable for each cell of the 3D grid, which could comprise up to ten million cells, resulting in perhaps tens of millions of data values as input to the computation, the number of data values may be reduced to, for example, 50-100 parameters for each input geological variable. Furthermore, the characteristic parameters for each input geological variable may be stored independently of the 3D grid. These 50-100 parameters obtained via dimensionality reduction for each input variable may be denoted a "basis function" for a respective 3D candidate model function, the basis function enabling values of the measurable geological metric to be calculated in any cell of a 3D volume. Each basis function could be considered conceptually as a gene for a given candidate model for a given input geological variable, in the sense that it encapsulates (or encodes) all cell values for the candidate model function.

As already indicated, the dimensionality reduction 242 may be performed using the candidate model functions as calculated in GB2532734B. The method described in GB2532734B generates a suite of possible candidate model functions by sequentially applying de-trending routines (e.g. to cancel variations) along geological directions, for example: depth in real world coordinates; two orthogonal map directions (for example dip and strike); and/or model layer increment 'k' in the vertical direction (stratigraphy). For each case a function is fit to the sparsely sampled data for the geological input variable concerned to provide a function characterizing a "trend" in the sampled data. For each property, the trends can be removed in 24 different ways (for four trend directions corresponding respectively to four geological directions, there are 24 possible orders of these trends). This results in 24 different "models" (candidate model functions) for each geological input variable.

However, the method prescribed in GB 2532734B is not the only way to determine the reduced basis representation. Any one of a number of known techniques may be used to produce a reduced basis representation. For example, machine-learning approaches such as variational auto-encoders (VAE) or Uniform Manifold Approximation and Projection (UMAP) may be used to perform the dimensionality reduction. Alternatively, mathematical approaches as described in any one of the following publications may be used:
* "Comparison of Sparse-grid geometric random sampling methods in nonlinear inverse solution uncertainty estimation", Tompkins, M., Martinez, J.L.F. & Muniz, Z.F., 2013, Geophys. J. Int., 61, 28-41.
* "Scalable uncertainty estimation for non-linear inverse problems using parameter reduction, constraint mapping and geometric sampling: Marine CSEM examples", Tompkins, M., Martinez, J.L.F., Alumbaugh, D.L. & Mukerji, T., 2011, Geophysics, 76, F263-F281.
* "A model compression scheme for nonlinear electromagnetic inversions", Abubakar, A., Habashy, T., Lin, Y. & Li, M., 2012, Geophysics, 77, E379-E389.
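To illustrate the general route in which values are first generated on the grid and then compressed, the sketch below uses principal component analysis (via a singular value decomposition) as a simple, standard linear stand-in. PCA is not one of the reduction techniques named above; it is swapped in here only because it is compact and well known. The function names `pca_reduce` and `pca_reconstruct` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reduce(realizations, n_components):
    """Compress gridded realizations (n_models, n_cells) to a few coefficients each."""
    x = np.asarray(realizations, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis = vt[:n_components]            # (n_components, n_cells), shared by all models
    coeffs = (x - mean) @ basis.T        # (n_models, n_components), one short code per model
    return mean, basis, coeffs

def pca_reconstruct(mean, basis, coeffs):
    """Rebuild approximate cell values from the reduced representation."""
    return mean + coeffs @ basis

rng = np.random.default_rng(0)
# 24 synthetic "realizations" on 500 cells that vary along only two hidden modes.
models = 0.2 + rng.normal(size=(24, 2)) @ rng.normal(size=(2, 500)) * 0.01
mean, basis, coeffs = pca_reduce(models, n_components=2)
```

Note that this route still requires every candidate model to be populated on the grid before compression, which is the requirement the GB 2532734B route avoids.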

The mathematical methods specified in the above three academic papers involve generating values for each candidate model on the 3D grid before performing the dimensionality reduction. However, the method of generating candidate model functions specified in GB 2532734B eliminates any requirement to generate the values for the entire 3D grid as a prerequisite to achieving the dimensionality reduction. In the GB 2532734B technique, a side effect of defining the set of candidate model functions and their associated statistical weightings is that a set of parameters corresponding to a reduced basis representation is obtained for each candidate model function. This is a new realization according to the present technique. Figure 6B, described below, gives a more specific example of how dimensionality reduction may be performed by implementing the method of candidate model function determination specified by GB2532734B.

In one example, a single model for a single input geological variable may consist of a 3D grid of 640,000 cells, where each cell of the grid is assigned a respective geological variable value. However, a much more limited set of parameters is sufficient to reconstruct the 3D model uniquely. Using porosity as an example, trends in the porosity may be calculated from sparsely sampled data (for example at well locations) as a function of dip, strike, depth and stratigraphy using the method described above. Each of these trends can be represented by a function whose parameters describe the trend. Since the order in which the trends are removed is important, multiple porosity scenarios may be generated by altering the sequence in which the trends are applied. As a result, to recreate any of the porosity models, only the parameters of the trend functions and the order in which the trends are to be applied are needed. Based on these and the sparsely sampled data set, the 3D model can be fully constructed. Thus, rather than storing 640,000 values to represent the model, the same model can be represented using only 100-200 parameters (depending on the nature of the model and property). This makes manipulating and analysing large numbers of candidate models a more tractable problem.
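The storage argument above can be sketched as follows, assuming a hypothetical record format in which each removed trend is kept as (direction, knot positions, knot values), and using piecewise-linear interpolation as a stand-in for the spline trends. With three knots each for depth, strike and dip and 76 knots for the layer index k, the record holds 170 numbers, yet the trend can be evaluated at any cell of the grid. The residual (heterogeneity) part, rebuilt from the sparse data, is deliberately left out of this sketch; `AXIS`, `count_parameters` and `evaluate_trend` are illustrative names only.

```python
import numpy as np

# Hypothetical reduced-basis record for one candidate porosity model: one
# (direction, knot positions, knot values) entry per removed trend.
# np.interp (piecewise-linear) stands in for the spline trends in the text.
AXIS = {"depth": 0, "strike": 1, "dip": 2, "k": 3}

def count_parameters(model):
    """Number of stored numbers (knot positions plus knot values)."""
    return sum(len(xs) + len(ys) for _, xs, ys in model)

def evaluate_trend(model, cells):
    """Sum the trend contributions of `model` at any cells.

    cells: (n_cells, 4) array of depth, strike, dip and layer index k.
    The residual part rebuilt from the sparse data is not included here.
    """
    total = np.zeros(len(cells))
    for direction, knot_x, knot_y in model:
        total += np.interp(cells[:, AXIS[direction]], knot_x, knot_y)
    return total

model = [
    ("depth", [2000.0, 2250.0, 2500.0], [0.25, 0.22, 0.18]),
    ("strike", [0.0, 2500.0, 5000.0], [0.0, 0.01, 0.0]),
    ("dip", [0.0, 2500.0, 5000.0], [0.0, 0.0, 0.0]),
    ("k", np.arange(1, 77), np.zeros(76)),
]
print(count_parameters(model))  # 170
```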

Still considering the refine phase 240 of Figure 2, after the dimensionality reduction 242, an iterative grid refinement 244 may be performed. In previously known techniques, the complexity of calculating a measurable geological metric in up to ten million cells of a 3D grid volume was often addressed by reducing the full ensemble of candidate model functions at the outset but still performing the calculation on most or all of the grid cells. By way of contrast, according to the present technique, all or at least the majority of the candidate model functions obtained in the expand phase 230 are retained at the outset of the calculation, and instead the number of 3D grid cells on which the geological metric is calculated is considerably reduced to make the initial stage of the computation more tractable. Thus, for example, the metric calculation may be performed for the full ensemble of candidate model functions on a first incomplete proportion of the cells of the 3D grid (for example, 1% to 10% of the cells). This effectively sub-samples the values of the measurable geological metric across the 3D grid. The calculated values from the grid sub-sampling may be evaluated for reliability by gradually increasing (and/or gradually decreasing) the proportion of grid cells being sub-sampled to determine what proportion of cells gives convergence to a stable value. Once an incomplete proportion of the cells that gives a stable estimate for the geological metric has been determined, the full set of candidate model functions can be used to determine a value of the geological metric on that relatively small proportion of the cells, and a model clustering technique 246 may then be performed on the results of that calculation.
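The convergence test on sub-sampled cells might be sketched as below. This assumes the metric is a sum of per-cell contributions (as with a volumetric OIP), so a sub-sample sum can be scaled up to the full grid; the relative tolerance, the nested random sampling and the names `subsampled_metric` and `converged_fraction` are hypothetical choices, not details from the source.

```python
import numpy as np

def subsampled_metric(cell_values, n_sample, order):
    """Scale the sum over the first n_sample cells of `order` to the full grid.

    cell_values: (n_models, n_cells) per-cell metric contributions (hypothetical).
    """
    n_cells = cell_values.shape[1]
    idx = order[:n_sample]
    return cell_values[:, idx].sum(axis=1) * (n_cells / n_sample)

def converged_fraction(cell_values, fractions, tol=0.05, seed=0):
    """Smallest fraction whose estimates are all within tol of the previous ones."""
    rng = np.random.default_rng(seed)
    n_cells = cell_values.shape[1]
    order = rng.permutation(n_cells)   # nested samples: each one extends the last
    previous = None
    for f in sorted(fractions):
        n_sample = max(1, int(round(f * n_cells)))
        estimate = subsampled_metric(cell_values, n_sample, order)
        if previous is not None and np.all(np.abs(estimate - previous) <= tol * np.abs(previous)):
            return f
        previous = estimate
    return None                        # no stable value within the given fractions

rng = np.random.default_rng(1)
cells = rng.uniform(0.5, 1.5, size=(5, 100_000))   # 5 models, 100,000 cells
fractions = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
print(converged_fraction(cells, fractions))
```

Checking every model against the tolerance (rather than an average) matches the idea that the chosen proportion should be reliable for the full ensemble.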

The model clustering 246 comprises identifying, computationally, particular models of the full set of candidate model functions having unique (or characteristic) properties that lead to unique (or characteristic) behaviour or metric values. Clustering techniques are implemented to determine how many unique model classes (i.e. clusters of candidate model functions that give the same calculated metric) exist in the dataset. This allows the computationally expensive process of simulation on large numbers of 3D grid cells to be concentrated on a reduced number of candidate model functions. Thus a "representative subset" of candidate model functions is identified, which is smaller in number than the full ensemble and yet accurately characterizes all candidate models of the full ensemble.

There are many alternative approaches to clustering data that may be implemented in different embodiments. For example, the model clustering 246 may be implemented using one or more of boosted trees, K-means, UMAP and VAE. According to the present technique, the model clustering 246 is used to determine the most useful parameter combinations to make the search of the model space more efficient, and to identify which input models (or parts thereof) and/or which input geological variables have the greatest effect on the predicted values of the measurable geological metric of interest.
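As one illustration of the K-means option listed above, the sketch below clusters candidate models by a feature vector per model and keeps the member closest to each centroid as that cluster's representative. The choice of features (for example, metric values computed on the sub-sampled cells), the deterministic farthest-point seeding and the function names are assumptions for illustration; this is plain Lloyd's K-means written in NumPy, not a specific library's API.

```python
import numpy as np

def farthest_point_init(x, k):
    """Deterministic seeding: start at the first model, then add the farthest one."""
    chosen = [0]
    dist = np.linalg.norm(x - x[0], axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(x - x[nxt], axis=1))
    return x[chosen].copy()

def kmeans_representatives(features, k, n_iter=100):
    """Lloyd's K-means; return one model index (closest to its centroid) per cluster."""
    x = np.asarray(features, dtype=float)
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        labels = np.linalg.norm(x[:, None, :] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.linalg.norm(x[:, None, :] - centroids[None], axis=2).argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            closest = np.linalg.norm(x[members] - centroids[j], axis=1).argmin()
            reps.append(int(members[closest]))
    return sorted(reps)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.1, size=(20, 2)) for c in centers])
print(kmeans_representatives(features, k=3))
```

In practice the number of clusters would itself have to be chosen, for example by checking how well the representatives reproduce the metric distribution.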

Redundant models are thus eliminated from the input set of candidate model functions and, at each iteration of model clustering, advantage is taken of the reduction in the number of candidate model functions by increasing the proportion of 3D grid cells used for the metric calculation in the next iteration. By gradually increasing the number of cells sub-sampled in the calculation whilst simultaneously reducing the number of candidate model functions in the ensemble via the model clustering, the geological metric calculation according to the present technique remains both efficient and computationally tractable.

Also in the refine phase 240 of Figure 2, a scenario ranking 248 of the candidate model functions is performed. In some embodiments the scenario ranking of the candidate model functions may be implemented by the creation of a cumulative distribution function (CDF). The CDF describes the probability that a real-valued random variable V with a given probability distribution will be found to have a value less than or equal to v. In particular, the CDF, $F_V$, of a real-valued random variable V is the function given by $F_V(v) = P(V \le v)$. The candidate scenarios are arranged in ascending order according to the calculated metric (OIP in this case), and this ordered list is then integrated to produce the CDF. An example of a CDF for an OIP metric is described below with reference to Figure 8. Construction of the CDF is useful to quantify the distribution of the predicted values for the geological metric as well as to obtain a prediction for its actual value. The CDF allows the error bounds of a prediction to be determined. For example, P10, P50 and P90 geological metric values may be determined from a CDF, where (as an example) the P50 value is the value of the metric for which there is a 50% probability that the true value will be this given value or smaller. As a further example, P10 represents a 10% probability that the true value of the geological metric will be lower than this value, and so on. These CDF values may quantify the probability of success of, for example, extracting oil from a particular hydrocarbon reservoir.
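A minimal sketch of the ranking step, assuming each scenario carries equal weight: sorting the metric values and accumulating equal probabilities gives an empirical CDF, and the P10, P50 and P90 values (using the "value or smaller" convention above) are then the first sorted values whose cumulative probability reaches 10%, 50% and 90%. The OIP values and function names are hypothetical.

```python
import numpy as np

def empirical_cdf(values):
    """Sort scenario metric values ascending and attach cumulative probabilities."""
    v = np.sort(np.asarray(values, dtype=float))
    f = np.arange(1, len(v) + 1) / len(v)   # equal weight per scenario, accumulated
    return v, f

def p_value(v, f, p):
    """Smallest metric value whose cumulative probability reaches p percent."""
    return v[np.searchsorted(f, p / 100.0)]

oip = [7.0, 3.0, 10.0, 1.0, 5.0, 9.0, 2.0, 8.0, 4.0, 6.0]   # hypothetical OIP per scenario
v, f = empirical_cdf(oip)
print(p_value(v, f, 10), p_value(v, f, 50), p_value(v, f, 90))  # 1.0 5.0 9.0
```

If scenarios carry unequal statistical weightings, the equal increments in `f` would be replaced by the cumulative sum of the normalized weights.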

Figure 3 schematically illustrates contours of the number of scenarios that can be generated from a given number of input geological variables (y axis), with each input property having a given number of possible alternative values (the branch factor, x axis). The values contoured are $\log_{10}$(total number of scenarios). For a realistic case there may be seven input geological variables (y axis): grid, facies, NTG, porosity, XY perm, Kv/Kh and Swirr. There may be a branching factor of nine for each property on the x axis, for example the percentiles P10 to P90 in steps of 10. The total number of possible scenarios in this simple example case is $9^7 \approx 4.8$ million, as shown at data point 310 on the graph of Figure 3. The scale of the problem is compounded by the size of the 3D cell grid: each property and scenario might consist of a grid of up to ten million cells.
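The scenario count is simply the branch factor raised to the number of input variables, since each variable independently takes one of its alternative values. The short calculation below reproduces the 4.8 million figure and builds the kind of $\log_{10}$ surface that Figure 3 contours; the axis ranges chosen here are illustrative assumptions, not those of the figure.

```python
import numpy as np

def n_scenarios(n_variables, branch_factor):
    """Each variable independently takes one of branch_factor alternative values."""
    return branch_factor ** n_variables

total = n_scenarios(7, 9)   # seven variables, nine alternatives each

# log10(total number of scenarios) for variables (rows) x branch factors (columns);
# the ranges 1-10 and 2-10 are illustrative only.
n_vars = np.arange(1, 11)
branch = np.arange(2, 11)
log_total = n_vars[:, None] * np.log10(branch[None, :])
print(f"{total:,}", round(float(log_total[6, 7]), 2))  # 4,782,969 6.68
```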

Figure 4 is a flow chart schematically illustrating in more detail the refine phase 240 of Figure 2 (denoted 410 in Figure 4) and its position after the expand phase 420 and prior to the compute phase 430. In the refine phase 410, a full ensemble of candidate model functions generated in the expand phase 420 is received and dimensionality reduction is performed on the candidate model functions, either by generating the values of the corresponding input geological variable on the 3D grid and then parameterizing the model functions mathematically for more efficient storage, or, if the method described in GB2532734B is used, by taking the fit parameters directly from the functional fits performed in the expand phase 420.

Consider a specific example of calculating an OIP metric according to equation (1) above, such that the relevant input geological variables are: (i) porosity; (ii) NTG; and (iii) water saturation. Once the full ensemble of candidate model functions has been parameterized by dimensionality reduction at 412, an iterative refinement is performed, which involves calculating the OIP value in an incomplete proportion (i.e. < 100%) of the cells of the 3D grid, starting at, say, 0.001% and gradually increasing the proportion of cells sub-sampled until the calculated value of the OIP metric converges to a stable value. Suppose, for example, that the metric is determined to be stable when at least 1% of the cells are sub-sampled at 414. Then, after the first iteration with this minimum proportion of 1% of the cells, the calculation proceeds to a clustering routine 416, whereupon clustering techniques are used to identify the classes of candidate model functions that provide unique results (i.e. results different from those of other candidate model functions or other clustered groups of candidate model functions). As an example of the model clustering 416, to calculate the CDF accurately, clustering approaches may be used to find a subset of models that accurately represents this CDF. Approaches such as principal component analysis, K-means clustering or other known clustering techniques may be used for this purpose. Alternatively, approaches to clustering based on the distribution of models in the CDF bins may be applied in example embodiments to ensure that the statistical distribution of models is maintained through the clustering process.
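One possible reading of clustering based on CDF bins is sketched below: models are ranked by their metric, split into bins of (almost) equal count, so that each bin covers an equal slice of probability, and the model nearest each bin's median is kept. Keeping one model per equal-probability bin roughly preserves the shape of the CDF. The binning rule, the bin count of 366 (borrowed from the example result purely for illustration) and the synthetic log-normal metric are all assumptions.

```python
import numpy as np

def cdf_bin_representatives(metric, n_bins):
    """Pick one model per equal-probability bin of the metric's empirical CDF."""
    metric = np.asarray(metric, dtype=float)
    order = np.argsort(metric)                  # models ranked along the CDF
    reps = []
    for members in np.array_split(order, n_bins):
        if members.size:
            vals = metric[members]
            # keep the member whose metric is nearest the bin median
            reps.append(int(members[np.argmin(np.abs(vals - np.median(vals)))]))
    return np.array(reps)

metric = np.random.default_rng(2).lognormal(mean=3.0, sigma=0.4, size=13824)
reps = cdf_bin_representatives(metric, n_bins=366)
print(len(reps), np.median(metric), np.median(metric[reps]))
```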

Via the model clustering algorithm 416, the full ensemble of N candidate model functions, simulated on the subset of the 3D grid cells in a first iteration that yields a converged value of the OIP metric, is reduced in number to eliminate at least one candidate model function. The reduced subset of candidate model functions then cycles back to the iterative refinement of 414. At least the first iteration is extremely efficient computationally despite including the full ensemble of candidate models, because only around 1% to 10% of the cells of the 3D grid are included in the calculation. On a second iteration of the iterative refinement 414, the number of cells sub-sampled in the grid may be increased, because the number of candidate model functions to be evaluated on the 3D grid has been reduced, releasing some computational capacity to include more grid cells. In some embodiments, at 414 there is a test for convergence of the OIP on iterations other than the first iteration. In other embodiments, once a proportion of cells sufficient to achieve convergence has been determined, it is assumed that an increased number of cells sub-sampled for the reduced number of candidate model functions will also converge, without specifically verifying that convergence has been achieved. The iterative refinement and clustering are iterated over such that, as the number of candidate model functions is progressively reduced via the clustering at 416, the proportion of 3D grid cells sub-sampled to calculate the OIP metric is simultaneously progressively increased.

The iteration of 414 and 416 may continue, step-wise increasing the number of cells sub-sampled until the full set of 3D grid cells or a high but incomplete proportion of the grid cells are included in the simulation. This is reflected at 418 where the representative subset of candidate model functions identified via the clustering is used to perform a full-scale computation on all or at least most of the millions of cells of the 3D grid. After completion of the refinement phase 410, the simulation has identified which subset of the full ensemble of candidate model functions for the input geological variables are representative of the full set. The representative subset of candidate model functions that are output from the refinement process may then be input to a full-resolution calculation of the measurable geological metric of interest and/or may be used to run other reservoir simulation models.

Cyclical iteration of the iterative refinement 414 and the clustering 416, progressively eliminating more of the candidate model functions from the representative subset, may output a minimum representative subset. This corresponds to the most reduced subset achievable via iterations in which the representative subset is progressively reduced in size and the number of cells upon which the metric is calculated is progressively increased. In the simple example of calculating the OIP metric with three input geological variables, taking into account variations in four geological directions, the full ensemble comprises $N = 4! \times 4! \times 4! = 13824$ candidate model functions. Simulations of the present technique have shown that this full candidate model set can be reduced to a minimum set of only 366 representative model functions, a reduction in the number of models to approximately N/38. The subset of just 366 models can correctly define a CDF of the OIP within acceptable error margins.

Figure 5 schematically illustrates how convergence of the cell sub-sampling is achieved in a first iteration of the iterative refinement 414 of Figure 4. The graph is a plot of the magnitude of the calculated OIP (y axis) against the percentage of 3D grid cells sub-sampled for the calculation (x axis). It can be seen from Figure 5 that the OIP metric values diverge when between 0.001% and about 0.1% of the cells are included in the calculation. However, from 0.1% to 1% and above, the values for each of the 100 scenarios plotted converge to stable values. The 100 scenarios each correspond to a candidate model function of the full ensemble. Note that the different candidate model functions give different predicted values for the calculated OIP. Based on the information in the graph of Figure 5, including only 0.1 to 1% of the cells is sufficient to calculate the metric robustly.

Figure 6A is a flow chart schematically illustrating an expansion phase (230 in Figure 2 or 420 in Figure 4) of the geological metric calculation algorithm. At 610, sparsely sampled measurement data for a hydrocarbon reservoir is obtained either directly from the field or from a data repository. The sparsely sampled measurement data provides values of one or more input geological parameters within an internal volume of the Earth whose properties are to be predicted. At 612, a measurable geological metric of interest is defined. One example goal may be to use OIP as a metric to understand the amount of oil present in a reservoir. Another example metric may be used to understand the behaviour of a reservoir model in a simulator. Note that OIP is just one example, and OIP alone is unlikely to be the best metric for this aim. In practice it is likely that groups of metrics encompassing both static and dynamic behaviour will be used. These may include, for example, permeability-thickness product, net present value, rate of return, Dykstra-Parsons coefficient of permeability variation, or other metrics.

For the simple case of OIP, the three input geological variables of equation (1) are specified at 612. At 614, a variable n that is used to index the input geological variables is initialised, and at 616 a set of candidate model functions is generated for the input geological variable n. Optionally, at 616, any non-stationary properties with respect to one or more geological directions may be removed (e.g. via inverse mathematical transformations) for the given geological variable n. This selective removal of non-stationary properties is described in more detail in GB2532734B. Note that where there is non-stationary behaviour in more than one geological direction, each permutation of non-stationary property removal may result in a respective different candidate model function. At 620, a determination is made as to whether or not n=k, with k being the number of input variables upon which the metric of interest depends. If not, then n is incremented at 622 and the process returns to 616. However, if n=k at 620, then at 624 a full set of candidate model functions for all k input geological variables is collated and output.

Figure 6B is a flow chart schematically illustrating how dimensionality reduction according to the example method of GB2532734B may be used to determine a number of reduced basis functions (see Figure 7 illustration) for an ensemble of candidate model functions for each input geological variable. In this example, the expansion and dimensionality reduction may be performed simultaneously using the statistical fitting techniques described in more detail in GB2532734B. A useful side-effect of implementing the candidate model function generation according to GB2532734B is that parameterization of the candidate model functions may be achieved without the need to generate values for each candidate function for each cell of the 3D grid. Thus dimensionality reduction and candidate model expansion may, in this example, be performed in parallel.

At 652 an index, x, representing the input geological parameters is initialized to one, and at 654 sparsely sampled measurement data from a hydrocarbon reservoir is obtained for the given geological variable $v_x$. Then at 656 an index i representing a geological direction is initialized to one, and subsequently at 658 a trend function such as a spline is fitted to the sparsely sampled data for the variable $v_x$ along the geological direction $d_i$. For example, the first geological direction may be depth and the best-fit function may be described by three spline knot points. Next, at 660, values of the ith trend function corresponding to the best fit found at process element 658 are calculated at each of a plurality of data points, and an ith residual data set is calculated by subtracting the best-fit function value from the corresponding sparsely sampled data value.

Next, at 662, an (i+1)th trend function is fitted to the ith residual data set along a different geological direction $d_{i+1}$, and at 664 an (i+1)th residual data set is obtained by subtracting the functional value of the (i+1)th trend function from the ith residual data set. The second geological direction may be, for example, strike, and the best-fit spline for the first residual data set may again be represented by a few spline parameters. Then the geological direction index, i, is incremented at 666, and process elements 662, 664 and 666 are repeated until all geological directions have been cycled through in turn, whereupon $i = i_{max}$ at process element 668, and the reduced basis functions for the candidate geological models 1, 2, ... N for the given geological input variable $v_x$ are output. In one example there may be four geological directions cycled through, corresponding to depth, strike, stratigraphy and dip. Then the geological variable index, x, is incremented at 672 and the process returns to 654 to analyse the sparsely sampled data for the next geological variable, and the whole process is repeated.
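The fit-and-subtract loop of process elements 658 to 668 can be sketched for one variable and one ordering of directions as follows. A low-order polynomial (`np.polyfit`) stands in for the spline fits described in the text, and the synthetic porosity samples, coordinate ranges and function name are hypothetical. Each step fits a trend to the current residuals along one direction and subtracts it; the list of fitted coefficients is the reduced basis for that candidate model.

```python
import numpy as np

def sequential_detrend(samples, coords, ordering, degree=2):
    """Fit and remove one trend per geological direction, in the given order.

    samples:  (n,) sparsely sampled values of one input variable.
    coords:   dict mapping direction name -> (n,) coordinate of each sample.
    ordering: direction names in de-trending order.
    A low-order polynomial stands in for the spline fits in the text.
    """
    residual = np.asarray(samples, dtype=float).copy()
    basis = []
    for direction in ordering:
        x = np.asarray(coords[direction], dtype=float)
        coeffs = np.polyfit(x, residual, degree)      # trend fitted to current residuals
        residual = residual - np.polyval(coeffs, x)   # next residual data set
        basis.append((direction, coeffs))
    return basis, residual

rng = np.random.default_rng(0)
n = 50
coords = {
    "depth": rng.uniform(2000.0, 2500.0, n),
    "strike": rng.uniform(0.0, 5000.0, n),
    "dip": rng.uniform(0.0, 5000.0, n),
    "k": rng.integers(1, 80, n),
}
porosity = 0.25 - 1e-4 * (coords["depth"] - 2000.0)   # synthetic depth trend only
basis, residual = sequential_detrend(porosity, coords, ("depth", "strike", "k", "dip"))
```

Because each fit acts on the residuals left by the previous one, the fitted coefficients depend on the ordering, which is why re-ordering the directions produces different candidate models.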

The flow chart illustrated in Figure 6B is for creating a single reduced basis representation of a model corresponding to one "realization" of one input geological variable. The result may be a representation of the 3D model in terms of a number of parameters describing the functions fitted to the data at each step. In the case of porosity, the trends in the dip, strike and depth directions can each be represented using 3 knot points describing the spline function, and between 75 and 78 points in the vertical stratigraphy direction (where the variation in properties is more rapid). To get the full ensemble of N candidate models, all of the spline fitting and residual calculating steps 658, 660, 662, 664, 666, 668 can be repeated to remove the trends in a different ordering of geological directions. In particular, a first ordering might comprise Depth-Strike-Stratigraphy-Dip, and this corresponds to a single realization, or a single permutation of an ordered sequence of geological directions. The process may then be repeated for another ordering of the sequence, say Depth-Dip-Stratigraphy-Strike, and this gives a second realization. The second realization/permutation is expected to be different from the first because the model depends on the de-trending sequence. There are 4! = 24 different ways of ordering four geological directions, so twenty-four candidate models are generated by re-ordering the de-trending sequence for the four geological directions. In other words, there are 24 different permutations for a sequence of best-fit functions involving four geological directions. For any input variables that are related to another variable distributed in 3D space, a fifth vector may be defined to give 5! (120) rather than 4! different models.
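The orderings that generate the candidate models for one variable can be enumerated directly with `itertools.permutations`, which yields each ordering of the directions exactly once. The name `related_variable` for the optional fifth vector is a hypothetical placeholder.

```python
import itertools
import math

DIRECTIONS = ("depth", "strike", "stratigraphy", "dip")

# Each ordering of the de-trending sequence yields one candidate model (realization).
orderings = list(itertools.permutations(DIRECTIONS))
print(len(orderings))   # 24
print(orderings[0])     # ('depth', 'strike', 'stratigraphy', 'dip')

# Hypothetical fifth vector for a variable related to another 3D-distributed variable.
five = list(itertools.permutations(DIRECTIONS + ("related_variable",)))
print(len(five), math.factorial(5))   # 120 120
```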

The permutation process may be repeated for each input geological variable (three in this example), which gives $24^3 = 13824$ possible OIP models, that is, 13824 candidate model functions. Based upon the residual values after application of all the desired functions (i.e. the remaining difference between the observed values and those expected from the defined functions), the models that best match the data can be 'scored' using a quality-of-fit metric. This may then be used to select the highest-scoring scenarios (candidate model functions) and to weight their respective probabilities.
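The ensemble size follows from one ordering choice per variable, so it is $(4!)^3$. The scoring rule is not specified here; as a purely hypothetical sketch, the code below scores each scenario by the residual sum of squares (RSS) left after de-trending, keeps the best `top_k`, and assigns weights proportional to 1/RSS, normalized to sum to one. Both the score and the weighting are illustrative assumptions, not the method of the source.

```python
import math
import numpy as np

N_VARIABLES = 3    # porosity, NTG and Sw for OIP
N_DIRECTIONS = 4   # depth, strike, stratigraphy, dip
n_models = math.factorial(N_DIRECTIONS) ** N_VARIABLES   # 24**3

def score_and_weight(residual_ss, top_k):
    """Hypothetical quality-of-fit scoring from residual sums of squares.

    Lower RSS means a better match to the sparse data. Returns the indices of
    the top_k scenarios and weights proportional to 1/RSS, summing to one.
    """
    rss = np.asarray(residual_ss, dtype=float)
    best = np.argsort(rss)[:top_k]
    inv = 1.0 / rss[best]
    return best, inv / inv.sum()

best, weights = score_and_weight([4.0, 1.0, 2.0, 8.0], top_k=2)
print(n_models, best, weights)
```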

Figure 6C is a flow chart schematically illustrating process elements comprising a refine phase of the geological metric calculation algorithm. The Figure 6C flow chart receives as input the output of the expansion phase flow chart of Figure 6A, which corresponds to a set of candidate model functions. At 682 a 3D structural grid of cells is defined as a computation volume, and then at 684 dimensionality reduction is performed to identify and parameterize functional trends in a storage-efficient way, such that the characteristic parameters for each candidate model function may be stored independently of the 3D structural grid. In some examples, the dimensionality reduction may involve first populating each cell of the 3D structural grid with values. In other examples, the dimensionality reduction may be performed simultaneously with the candidate model function expansion without populating the 3D structural grid.

At 686 a variable, m, indicating a proportion of the cells of the grid to be subsampled is initialized. For example, m may be initialized to be in a range between 1% and 10%. At 688, iterative grid refinement is performed to calculate the geological metric on m% of the cells for the full ensemble of candidate model functions. At 690 an optional determination is made as to whether or not the geological metric converges in the m% of cells. This determination may be bypassed if it is known in advance what proportion of the cells can be used to ensure convergence. If convergence has not been achieved at 690 then the proportion of cells is increased via looping over process element 692 and 690 until the values have in fact converged.

Once convergence has been found at 690, model clustering is performed at 698 to eliminate some of the candidate model functions from the next stage of the 3D grid cell computation. With the reduced set of n<N candidate model functions (N being the number of candidate model functions in the input set), the process proceeds to 699, where the percentage of cells sub-sampled is increased by a suitable increment and a further stage of iterative grid refinement is performed. Process elements 688, 698 and 699 are iterated over to progressively reduce the number of candidate model functions in the representative subset and to progressively increase the proportion of sub-sampled cells as more candidate model functions are eliminated via clustering. These iterations can be repeated until m=100% at 694, whereupon the remaining subset of the input set of candidate model functions, denoted the minimum "representative subset", is output at 696.
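The interleaving of refinement (688), clustering (698) and sub-sample growth (699) can be sketched as below. The specification does not name a clustering algorithm, so the greedy one-dimensional grouping of models by their metric estimate used here is an assumed stand-in, as are the doubling schedule for m and all function and parameter names. Each pass keeps one representative per cluster, so later, larger subsamples are evaluated for fewer models.

```python
import random


def cluster_representatives(estimates, ids, rel_tol):
    """Assumed greedy 1-D clustering: sort by estimate, start a new cluster when a
    model differs from the cluster's first member by more than rel_tol, and keep
    the member nearest each cluster's mean as its representative."""
    order = sorted(range(len(ids)), key=lambda j: estimates[j])
    clusters, current = [], [order[0]]
    for j in order[1:]:
        anchor = estimates[current[0]]
        if abs(estimates[j] - anchor) <= rel_tol * abs(anchor):
            current.append(j)
        else:
            clusters.append(current)
            current = [j]
    clusters.append(current)
    reps = []
    for members in clusters:
        mean = sum(estimates[j] for j in members) / len(members)
        reps.append(ids[min(members, key=lambda j: abs(estimates[j] - mean))])
    return reps


def refine(model_cell_values, m0=0.05, growth=2.0, rel_tol=0.02, seed=0):
    """Alternate sub-sampled metric evaluation and clustering until m reaches 100%."""
    n_cells = len(model_cell_values[0])
    order = list(range(n_cells))
    random.Random(seed).shuffle(order)
    ids = list(range(len(model_cell_values)))
    m = m0
    while True:
        k = max(1, round(m * n_cells))
        cells = order[:k]
        est = [sum(model_cell_values[i][c] for c in cells) * n_cells / k for i in ids]
        ids = cluster_representatives(est, ids, rel_tol)  # drop near-duplicate models
        if m >= 1.0:
            return sorted(ids)  # the remaining "representative subset"
        m = min(1.0, m * growth)  # enlarge the sub-sample for the next pass
```

For example, `refine([[1.0] * 100, [1.0] * 100, [5.0] * 100])` keeps models 0 and 2, because model 1 produces the same metric as model 0 on every subsample.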

Figure 7 schematically illustrates reduced basis functions, such as a basis function 710 corresponding to model number 24. Each illustrated basis function represents a 'gene' of the full 3D model. The shaded bands in each illustrated basis function represent the value of the model parameter (input geological variable), which can be reconstructed solely from this information and the original sparse data, for each of the twenty-four models produced. The "model parameters" illustrated to the right-hand side of the graph of Figure 7 schematically illustrate the "trend direction", which refers to variations in the values of the given input geological variable with respect to each of the four selected geological directions. In this example, the geological directions are dip, strike, depth and stratigraphy. A minimum, mid and maximum value of the input geological variable is provided for each of the four geological directions, together with six pairs of layer index and corresponding parameter value for the stratigraphy.
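A compact record of one such 'gene' could be held as in the sketch below, which shows why the stored parameters do not depend on the size of the 3D grid. The field layout follows the description (minimum, mid and maximum per direction, plus six layer-index/value pairs for stratigraphy), but the class name, the numeric values and the piecewise-linear interpolation between minimum, mid and maximum are illustrative assumptions, not the reconstruction method of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrendModel:
    # Hypothetical record: direction name -> (minimum, mid, maximum) value.
    trends: Dict[str, Tuple[float, float, float]]
    # Hypothetical (layer index, value) pairs for the stratigraphic direction.
    strat_pairs: List[Tuple[int, float]] = field(default_factory=list)

    def n_parameters(self) -> int:
        """Number of stored numbers; independent of how many grid cells exist."""
        return 3 * len(self.trends) + 2 * len(self.strat_pairs)

    def trend_value(self, direction: str, t: float) -> float:
        """Piecewise-linear trend through (0, min), (0.5, mid), (1, max); assumed form."""
        lo, mid, hi = self.trends[direction]
        t = min(max(t, 0.0), 1.0)  # clamp to the normalised range
        if t <= 0.5:
            return lo + (mid - lo) * (t / 0.5)
        return mid + (hi - mid) * ((t - 0.5) / 0.5)


# Illustrative values only.
model = TrendModel(
    trends={d: (0.0, 1.0, 3.0) for d in ("dip", "strike", "depth", "stratigraphy")},
    strat_pairs=[(layer, 0.5) for layer in range(1, 7)],
)
print(model.n_parameters())  # 24 stored numbers for this record
print(model.trend_value("dip", 0.75))  # 2.0
```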

Figure 8 schematically illustrates two CDFs for the OIP metric, one CDF for a full candidate ensemble and another CDF for a representative subset of models obtained via clustering. The graph illustrates a first result of calculating the CDF with the full model ensemble (a total of 13824 candidate model functions in this example), and a second result of calculating the CDF with a representative model subset, consisting of two porosity models, seven NTG models and all twenty-four Sw models (totalling 366 models in this example, a reduction by a factor of about 38 in the total number of candidate model functions to be considered). The similarity of the two CDF profiles in Figure 8 shows that the representative subset of candidate models (x symbols) accurately reproduces the CDF of the full ensemble (o symbols) of candidate model functions. Note that each of the symbols plotted in Figure 8 represents a CDF value calculated cumulatively for all of the cells of the 3D grid and then averaged over all of the models of either the full ensemble (o symbols) or the representative subset (x symbols).
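One way to quantify how closely a subset's CDF reproduces the full-ensemble CDF is sketched below. The comparison statistic here, the largest vertical gap between two empirical CDFs (a Kolmogorov-Smirnov-style distance), is a swapped-in choice for illustration; the specification compares the curves graphically and does not prescribe a statistic. Both input lists are assumed to hold one metric value per model.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return F(x) = fraction of values <= x."""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def max_cdf_gap(full_values, subset_values):
    """Largest vertical gap between two empirical CDFs (KS-style statistic).

    Both CDFs are step functions that only jump at data points, so checking
    every data point covers the whole real line.
    """
    f_full = empirical_cdf(full_values)
    f_sub = empirical_cdf(subset_values)
    points = sorted(set(full_values) | set(subset_values))
    return max(abs(f_full(x) - f_sub(x)) for x in points)
```

A small gap indicates that the subset preserves the spread of outcomes of the full ensemble; a tolerance on this gap could serve as one possible stopping criterion for the clustering.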

Figure 9 schematically illustrates property model combinations that may be used to obtain an accurate CDF according to simulations performed using the present technique. Crosses show the full set of twenty-four candidate models (four geological directions) for each of the three input geological variables (NTG, porosity and Sw). The complete candidate model ensemble (e.g. 13824 models) is constructed by joining every cross to every other cross.

However, in this example a reduced subset of only 366 representative models (connected black circles) is known via simulation to be sufficient to obtain an accurate prediction for the measurable geological metric.

The OIP metric calculation is found to be most sensitive to the Sw value, so all Sw models may be included. This can be seen in Figure 10, which schematically illustrates a cross plot of the CDF values calculated with the full candidate model ensemble (x-axis) against those calculated with the representative subset of models (y-axis). A perfect match between the two CDFs would fall along the straight line y=x, which is labelled 1010. The effect of reducing the number of Sw models is shown by the triangles (only the odd-numbered Sw models included) and crosses (only the even-numbered Sw models included): in both of these cases, the plot of Figure 10 shows that the resulting subset of the model ensemble produces an inaccurate CDF.
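Which input variable the metric is most sensitive to could be estimated before clustering, for example with a main-effect variance as in the sketch below. This is an assumed technique, not one given in the specification: for each variable, the ensemble metric values are grouped by that variable's model index, and the variance of the group means is taken as its sensitivity. Variables ranked highest would be candidates for keeping all of their models, as is done for Sw above. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import mean, pvariance


def main_effect_variance(combos, metric, var_index):
    """Variance of the metric's group means when grouped by one variable's model index."""
    groups = defaultdict(list)
    for combo, value in zip(combos, metric):
        groups[combo[var_index]].append(value)
    return pvariance([mean(vals) for vals in groups.values()])


def rank_sensitivity(combos, metric, names):
    """Order variable names from most to least influential on the metric."""
    scores = {name: main_effect_variance(combos, metric, i) for i, name in enumerate(names)}
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Toy ensemble: 3 models per variable; the metric depends strongly on the third variable.
combos = list(product(range(3), repeat=3))
metric = [10 * c[2] + c[0] for c in combos]
print(rank_sensitivity(combos, metric, ("NTG", "porosity", "Sw")))
```

In this toy ensemble the ranking is Sw, then NTG, then porosity. The toy metric is invented purely to exercise the function and is not an OIP model.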

Embodiments of the present invention differ from GB2532734 in that the present technique analyses the input data and the candidate model functions representing all possible variations in each of a plurality of input geological variables for patterns that will identify clusters of similar behaviour in metrics that have not yet been computed. GB2532734 specifies a mechanism for constructing and weighting an ensemble suite of candidate model functions representing a spread in the possible values of an input geological variable on which a given measurable geological metric may depend, but does not indicate use of the candidate model functions and associated parameters to simplify the calculation of a measurable geological metric via clustering and iterative refinement comprising sub-sampling of a grid of cells, so as to narrow down the full range of scenarios to a considerably reduced subset that is accurately representative of the diversity within the full ensemble of candidate models.

Some previously known geological metric calculation algorithms analyse an ensemble suite for similarities in flow patterns, and then collapse the model space. However, in contrast to the present technique, such previously known methods are backward-looking methods in the sense that they analyse simulation results in each geological cell of an already completed ensemble simulation suite of grid cells. By way of contrast, embodiments of the present invention may operate before a 3D grid simulation has taken place, by analysing the model building instructions (i.e. the ensemble of candidate model functions) for patterns that will inevitably lead to clustered metric behaviour after the simulation has been run. In this regard, the present technique is a forward-looking method that does not require the ensemble simulation to be run in order to simplify and reduce the volume of data defining the full range of values of input geological variables upon which a given measurable geological metric depends.

Although some previously known methods may use dimensionality reduction techniques to cluster ensemble scenario outcomes, thereby enabling a subset of the model ensemble to be extracted and analysed further, these too are only known in the context of a backward-looking method in which an analysis of a completed ensemble simulation is performed. Embodiments of the present invention enable clustering to be performed in a metric space of candidate model functions for an input geological variable and yet obviate any requirement to first simulate the entire model suite (i.e. whole set of candidate model functions) on at least a majority of the 3D grid cells in order to identify the subset of models that are characteristic of the behaviour of the full ensemble of models. Embodiments retain the diversity of the full ensemble of candidate model functions by seeking convergence of a metric on a sub-set of 3D grid cells. This computation is very efficient because only a small proportion of the grid cells are used (sub-sampled).

Claims

1. A computer program product embodied on a computer-readable medium comprising program instructions which, when executed by processing circuitry, cause the processing circuitry to: receive or generate, for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; define a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; and identify a representative subset of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and perform a computation on a second proportion of cells of the 3D grid, larger than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the larger proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.
2. The computer-program product of claim 1, wherein the identification of the representative subset of candidate models comprises iteratively increasing a number of cells in the first incomplete portion of cells after each corresponding model clustering iteration such that as the number of candidate model functions in the representative subset is progressively reduced via the model clustering, the number of cells in the first incomplete portion of cells is progressively increased.
3. The computer-program product of claim 1 or claim 2, wherein the iterative refinement comprises determining an appropriate number of cells forming the first incomplete portion of cells by checking for a minimum number of cells required to achieve convergence of the calculated values of the measurable geological metric and progressing to the model clustering when convergence has been achieved.
4. The computer-program product of any one of claims 1 to 3, wherein the second proportion of the 3D grid cells comprises a full resolution calculation involving all of the cells.
5. The computer-program product of claim 4, wherein the computation on the second proportion of 3D grid cells is performed for the first time after the set of candidate model functions has been reduced in number to a minimum representative subset.
6. The computer-program product of any one of claims 1 to 5, wherein the computer program product comprises instructions which, when executed by processing circuitry, cause the processing circuitry to generate the set of candidate model functions based on performing at least one of line or curve fits to a set of sparsely sampled measurement data to parameterise the sparsely sampled measurement data, wherein the sparsely sampled measurement data is empirically derived by making measurements of at least one physical property of the Earth's interior.
7. The computer-program product of claim 6, wherein the sparsely sampled measurement data is fitted along a first geological direction to determine at least one fit parameter with respect to the first geological direction, wherein the at least one fit parameter represents a best-fit line or curve.
8. The computer-program product of claim 7, comprising calculating a first set of residuals between values of the sparsely sampled measurement data and corresponding first functional values from the best-fit line or curve.
9. The computer-program product of claim 8, comprising iteratively performing at least one further statistical line or curve fit to a set of residuals of a previous best-fit with respect to a further geological direction different from the first geological direction and different from any previously iterated geological directions.
10. The computer-program product of claim 9, wherein a result of the iteratively performed statistical fits for each of a plurality of geological directions provides a number of best-fit parameters characterising the candidate model functions for the measurable geological metric, wherein the number of best-fit parameters comprises parameters from each iteration of the statistical fitting and provides a basis function for a respective candidate model function, the basis function enabling values of the measurable geological metric to be calculated in any cell of the 3D grid of cells.
11. The computer-program product of claim 10, wherein the characteristic best-fit parameters and the sparse measurement data are used by the processing circuitry to populate at least a subset of values of the 3D grid of cells.
12. The computer-program product of any one of claims 1 to 11, wherein the model clustering preferentially allocates to the representative subset of models those of the set of candidate model functions that are associated with input geological variables to whose variations the measurable geological metric being determined is more sensitive, relative to other input geological variables to whose variations the measurable geological metric being determined is less sensitive.
13. The computer-program product of any one of claims 1 to 12, wherein the set of candidate model functions of the measurable geological metric from which the representative subset of models is identified has, for each candidate model function, a parameter set describing both stationary and any non-stationary aspects of the candidate model function.
14. The computer-program product of claim 13, wherein any non-stationary behaviour of one or more of the input geological variables is removed from the set of candidate model functions before the identification of the representative subset of models is performed.
15. The computer program product of claim 14, wherein at least one of the set of candidate model functions input to the identification of the representative subset of models has a parameter set providing an indication of an ordering according to which two or more transformations to reduce spatial bias in respective input geological variables have been performed.
16. The computer-program product of any one of claims 1 to 15, wherein the set of candidate model functions comprises a full ensemble of candidate model functions representing all of a first number of input geological variables and a second number of branch factors, the branch factors representing possible different values of a respective input geological variable.
17. The computer-program product of claim 16, wherein the model clustering reduces the full ensemble of candidate model functions to the representative subset of candidate model functions depending on an estimate of how closely a cumulative distribution function for the measurable geological metric of the representative subset of candidate model functions represents a cumulative distribution function for the measurable geological metric of the full ensemble of candidate model functions.
18. The computer-program product of any one of claims 1 to 17, wherein the at least one input geological variable comprises at least one of: grid, facies, net-to-gross, porosity, XY permeability, Kv/Kh, Sw and Swirr.
19. The computer-program product of any one of claims 1 to 18, wherein the geological direction comprises at least one of: a true stratigraphic thickness, a distance from an ancient shoreline, a vertical distance within a single depositional unit, a burial depth in a cartesian coordinate system and a two-dimensional map area.
20. The computer-program product of any one of claims 1 to 19, wherein the measurable geological metric comprises at least one of: amount of oil in place, OIP; a permeability-thickness product, a net present value of oil in the physical volume of the Earth; and a Dykstra-Parsons coefficient of permeability variation.
21. A data processing apparatus comprising: a memory; and processing circuitry to: receive or generate, for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; define a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; and identify a representative subset of the set of candidate model functions that excludes at least one of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and perform a computation on a second proportion of 3D grid cells comprising at least a larger proportion of cells of the 3D grid than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the second proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.
24. A method of processing geological data, the method comprising: receiving or generating, for a measurable geological metric comprising a physical property of the Earth's interior, a set of candidate model functions corresponding to the measurable geological metric, each candidate model function of the set defining a predicted variation of a geological input variable upon which the measurable geological metric depends along a respective geological direction; defining a computation volume comprising a three-dimensional, 3D, grid of cells representing a volume of the Earth's interior for which predicted values of the measurable geological metric are to be calculated; and identifying a representative subset of the set of candidate model functions that excludes at least one of the set of candidate model functions by performing iterative refinement to calculate values of the measurable geological metric for a first incomplete proportion of the cells of the 3D grid of cells using the set of candidate model functions and performing model clustering based on the calculated values of the iterative refinement; and performing a computation on a second proportion of 3D grid cells comprising at least a larger proportion of cells of the 3D grid than the first incomplete proportion of cells, the computation to calculate values of the measurable geological metric for the second proportion of cells using the identified representative subset of candidate model functions as an input to the metric computation.
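Claims 6 to 10 describe generating candidate model functions by fitting sparse measurements along a first geological direction and then fitting the residuals along each further direction in turn. A minimal sketch of that sequence is given below, assuming straight-line ordinary least-squares fits and measurements supplied as dictionaries of per-direction coordinates; curve fits, the choice of directions and the function names are assumptions for illustration. The accumulated (intercept, slope) pairs act as a basis function that can be evaluated at any cell's coordinates.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0  # no spread along this direction -> flat fit
    return my - b * mx, b


def sequential_fit(samples, values, directions):
    """Fit along each direction in turn, each time to the residuals of the previous fits.

    samples: list of dicts mapping direction name -> coordinate of that measurement.
    """
    residuals = list(values)
    params = []
    for d in directions:
        x = [s[d] for s in samples]
        a, b = fit_line(x, residuals)
        params.append((d, a, b))
        residuals = [r - (a + b * xi) for r, xi in zip(residuals, x)]
    return params, residuals


def predict(params, point):
    """Evaluate the fitted basis function at any cell, given its coordinates."""
    return sum(a + b * point[d] for d, a, b in params)


# Toy sparse data: value = 2 + 3*depth + 0.5*dip on a 3 x 3 sample pattern.
samples = [{"depth": d, "dip": p} for d in range(3) for p in range(3)]
values = [2 + 3 * s["depth"] + 0.5 * s["dip"] for s in samples]
params, res = sequential_fit(samples, values, ["depth", "dip"])
```

The order in which directions are fitted can change the parameters when the directions are correlated, which is one reason a candidate model might record the ordering of its transformations (compare claim 15).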