WO2022140717A9

WO2022140717A9 - Seismic embeddings for detecting subsurface hydrocarbon presence and geological features

Info

Publication number: WO2022140717A9
Application number: PCT/US2021/072282
Authority: WO
Inventors: Huseyin DENLI; Michael H. KOVALSKI; Cody MACDONALD; Jacquelyn S. DAVES
Original assignee: Exxonmobil Upstream Research Company
Priority date: 2020-12-21
Filing date: 2021-11-08
Publication date: 2022-11-24
Also published as: WO2022140717A1

Abstract

A computer-implemented method for identifying one or more geological features of interest from seismic data is disclosed. Hydrocarbon prospecting attempts to accurately model subsurface geologic structures and to detect fluid presence in those structures. Typically, seismic data of the subsurface is analyzed in order to accurately model the subsurface geologic structures. However, modeling in seismic space can be limiting. As such, a machine learning framework is used to learn a structured and compositional representation space, such as embedding space, where the distinctive features of interests, such as DHI, traps, seals, reservoirs, migration paths, or the like, are separated. In practice, an embedding model is generated, and thereafter tailored, such as by modifying the embedding model or refining the machine learning model using retraining. In this way and in contrast to seismic space, embedding space may better represent the features of interest and measure the adjacency or compositional nature (e.g., distance) of the features from one to another, thereby better modeling subsurface geologic structures.

Description

SEISMIC EMBEDDINGS FOR DETECTING SUBSURFACE HYDROCARBON PRESENCE AND GEOLOGICAL FEATURES

CROSS-REFERENCE TO RELATED APPLICATION [0001] This application claims the benefit of U.S. Provisional Patent Application 63/199347 filed 21 December 2020 entitled SEISMIC EMBEDDINGS FOR DETECTING SUBSURFACE HYDROCARBON PRESENCE AND GEOLOGICAL FEATURES, the entirety of which is incorporated by reference herein.

FIELD OF THE INVENTION

[0001] The present application relates generally to the field of hydrocarbon production. Specifically, the disclosure relates to a methodology for learning seismic embeddings in order to detect subsurface hydrocarbon presence and geological features.

BACKGROUND OF THE INVENTION

[0002] This section is intended to introduce various aspects of the art, which may be associated with exemplary embodiments of the present disclosure. This discussion is believed to assist in providing a framework to facilitate a better understanding of particular aspects of the present disclosure. Accordingly, it should be understood that this section should be read in this light, and not necessarily as admissions of prior art.

[0003] One step of hydrocarbon prospecting is to accurately model subsurface geologic structures and detect fluid presence in those structures. For example, a geologic model, which may comprise a computer-based representation, such as a two-dimensional (“2D”) representation or a three-dimensional (“3D”) representation, of a region beneath the earth’s surface. Such models may be used to model a petroleum reservoir, a depositional basin, or other regions which may have valuable mineral resources. Once the model is constructed, it may be used for various purposes, many of which are intended to facilitate efficient and economical recovery of the valuable resources. For example, the geologic model may be used as an input to simulations of petroleum reservoir fluid flows during production operations, which are used to plan well placements and predict hydrocarbon production from a petroleum reservoir over time.

[0004] In particular, a seismic survey may be gathered and processed to create a mapping (e.g. subsurface images such as 2-D or 3-D partially-stacked migration images presented on a display) of the subsurface region. The processed data may then be examined (e.g., by performing an analysis of seismic images) with a goal of identifying subsurface structures that may contain hydrocarbons. Some of those geologic structures, particularly hydrocarbonbearing reservoirs, may be directly identified by comparing pre- or partially-stacked seismic images (e.g. near-, mid- and far- stack images).

[0005] One quantitative way of comparing the stack images is based on analysis of amplitude changes with offset or angle (amplitude versus offset (AVO) or amplitude versus angle (AVA)). Examples of AVO and AVA are disclosed in US Patent Application Publication No. 2003/0046006, US Patent Application Publication No. 2014/0278115, US Patent Application Publication No. 2020/0132873, and US Patent No. 8,706,420, each of which is incorporated by reference herein in their entirety.

[0006] Typically, the relationship among the pre- or partially-stacked images (e.g., transition from near-stack to far-stack images) are considered to be multimodal (e.g., exhibiting multiple maxima) due to the offset-dependent responses of the geological structures and fluids (e.g., amplitude-versus-offset responses of hydrocarbon bearing sand, water-bearing sand, shale facies or salt facies can be different). It may be easier to detect such AVO changes in clastic reservoirs than ones in carbonate reservoirs. At reflection regimes, the relations among the stack images (AVO) may be explained by the Zoeppritz equation that describes the partitioning of seismic wave energy at an interface, a boundary between two different rock layers. Typically, the Zoeppritz equation is simplified for the pre-critical narrow-angle seismic reflection regimes and range of subsurface rock properties (e.g., Shuey approximation), and may be reduced to:

(θ) = A + Bsin²(θ) (1) where R is the reflectivity, Q is the incident angle, A is the reflectivity coefficient at zero incident angle ( Q = 0), and B is the AVO gradient. The stack images may be used to determine A and B coefficients. These coefficients may be estimated over each pixel over the seismic image, over a surface (boundaries along the formations) or over a geobody (e.g., performing mean and standard deviations of A and B values over a geobody region). AVO is not the only indicator of fluid presence and may not be the most reliable indicator because the fluid effects may be obscured due to the inaccuracies in seismic processing, the seismic resolution, and presence of noise or the seismic interference of thin beds. Other hydrocarbon indicators may be useful for derisking hydrocarbon presence include: amplitude terminations; anomaly consistency; lateral amplitude contrast; fit to structure; anomaly strength; and fluid contact reflection. Thus, distributions of A and B values may be interpreted to distinguish the AVO anomalies from the background as an AVO response of hydrocarbon presence is expected to be an anomaly. Further, this AVO analysis may be combined with the other indicators to increase the confidence around fluid presence.

SUMMARY OF THE INVENTION

[0007] In one or some embodiments, a computer-implemented method for identifying one or more geological features of interest from seismic data is disclosed. The method includes; accessing the seismic data; generating, based on machine learning with the seismic data an embedding model, the embedding model representing a plurality of geological features and indicating adjacency or compositional nature of the plurality of geological features from one to another in embedding space; tailoring the embedding model in order to identify the one or more geological features of interest, the one or more geological features of interest comprising a subset or combination of the geological features; and using the one or more geological features of interest for hydrocarbon management.

BRIEF DESCRIPTION OF THE DRAWINGS [0008] The present application is further described in the detailed description which follows, in reference to the noted plurality of drawings by way of non-limiting examples of exemplary implementations, in which like reference numerals represent similar parts throughout the several views of the drawings. In this regard, the appended drawings illustrate only exemplary implementations and are therefore not to be considered limiting of scope, for the disclosure may admit to other equally effective embodiments and applications. The seismic images (partial stacks) used in the illustrations are provided to the public by New Zealand Petroleum and Minerals. These publicly available seismic images are accessible through New

[0009] FIG. 1 A is a scatter plot of A and B values from pixels of a seismic image.

[0010] FIG. IB is a scatter plot of A and B mean values from a set of geobodies.

[0011] FIG. 2A shows a block diagram of an autoencoder.

[0012] FIG. 2B shows a block diagram of an autoencoder architecture.

[0013] FIG. 3A shows a sequence of transitions between the various spaces, including seismic space, embedding space, and target task-based space.

[0014] FIG. 3B shows a block diagram of a methodology for identifying seismic embeddings in order to perform seismic interpretation.

[0015] FIG. 4 is an illustration for training an autoencoder (e.g., encoder+decoder) in order to disentangle classes of features at embedding space z.

[0016] FIG. 5 A is an illustration for training an encoder network (e.g., a CNN) in order to learn the representation of distinctive features of images at embedding space z using contrastive loss.

[0017] FIG. 5B illustrates training of a mapping model from the embedding space (e.g., output of the encoder trained in FIG. 5 A) to the semantic segmentation of the geobodies. [0018] FIG. 6 is an illustration of stratigraphic zones detected from a seismic image volume by a manual pick, with various shades of color representing a different stratigraphic zone. [0019] FIG. 7 illustrates distributions of seismic image pixels in AB space versus in a learned embedding space.

[0020] FIG. 8 illustrates a stratigraphic zone, unlabeled and labeled data points (e.g., with labels DHI and non-DHI) for a given pre-stack seismic data.

[0021] FIG. 9 illustrates a scatter plot of the data points with calibration labels with a first inset indicating an area recognized as DHI and a second inset indicating an area recognized as non-DHI.

[0022] FIG. 10 illustrates examples for training a classifier of the embedded patches (e.g., 32x32 patched are represented in embedding space by a 512 dimensional vector) and predictions at the rest of the image (middle and right columns), with the classifier being trained only with the label in the left column of the first row, and with the first row showing the labeled DHI geobodies and the second row showing the predictions of the DHI geobodies with the trained classifier.

[0023] FIG. 11 illustrates a workflow for detecting seismic features by mapping images to embedding space to disentangle one or more geobody features.

[0024] FIG. 12 is a diagram of an exemplary computer system that may be utilized to implement the methods described herein.

DETAILED DESCRIPTION OF THE INVENTION [0025] The methods, devices, systems, and other features discussed below may be embodied in a number of different forms. Not all of the depicted components may be required, however, and some implementations may include additional, different, or fewer components from those expressly described in this disclosure. Variations in the arrangement and type of the components may be made without departing from the spirit or scope of the claims as set forth herein. Further, variations in the processes described, including the addition, deletion, or rearranging and order of logical operations, may be made without departing from the spirit or scope of the claims as set forth herein.

[0026] It is to be understood that the present disclosure is not limited to particular devices or methods, which may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” include singular and plural referents unless the content clearly dictates otherwise. Furthermore, the words “can” and “may” are used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term “include,” and derivations thereof, mean “including, but not limited to.” The term “coupled” means directly or indirectly connected. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. The term “uniform” means substantially equal for each sub-element, within about ±10% variation.

[0027] The term “seismic data” as used herein broadly means any data received and/or recorded as part of the seismic surveying and interpretation process, including displacement, velocity and/or acceleration, pressure and/or rotation, wave reflection, and/or refraction data. “Seismic data” is also intended to include any data (e.g., seismic image, migration image, reverse-time migration image, pre-stack image, partially-stack image, full-stack image, poststack image or seismic attribute image) or interpretation quantities, including geophysical properties such as one or more of: elastic properties (e.g., P and/or S wave velocity, P- Impedance, S-Impedance, density, attenuation, anisotropy and the like); and porosity, permeability or the like, that the ordinarily skilled artisan at the time of this disclosure will recognize may be inferred or otherwise derived from such data received and/or recorded as part of the seismic surveying and interpretation process. Thus, this disclosure may at times refer to “seismic data and/or data derived therefrom,” or equivalently simply to “seismic data.” Both terms are intended to include both measured/recorded seismic data and such derived data, unless the context clearly indicates that only one or the other is intended. “Seismic data” may also include data derived from traditional seismic (e.g., acoustic) data sets in conjunction with other geophysical data, including, for example, gravity plus seismic; gravity plus electromagnetic plus seismic data, etc. For example, joint-inversion utilizes multiple geophysical data types. [0028] The term “geophysical data” as used herein broadly includes seismic data, as well as other data obtained from non-seismic geophysical methods such as electrical resistivity. In this regard, examples of geophysical data include, but are not limited to, seismic data, gravity surveys, magnetic data, electromagnetic data, well logs, image logs, radar data, or temperature data.

[0029] The term “geological features” (interchangeably termed geo-features) as used herein broadly includes attributes associated with a subsurface, such as any one, any combination, or all of: subsurface geological structures (e.g., channels, volcanos, salt bodies, geological bodies, geological layers, etc.); boundaries between subsurface geological structures (e.g., a boundary between geological layers or formations, etc.); or structure details about a subsurface formation (e.g., subsurface horizons, subsurface faults, mineral deposits, bright spots, salt welds, distributions or proportions of geological features (e.g., lithotype proportions, facies relationships, distribution of petrophysical properties within a defined depositional facies), etc.). In this regard, geological features may include one or more subsurface features, such as subsurface fluid features, that may be hydrocarbon indicators (e.g., Direct Hydrocarbon Indicator (DHI)). Examples of geological features include, without limitation salt, fault, channel, environment of deposition (EoD), facies, carbonate, rock types (e.g., sand and shale), horizon, stratigraphy, or geological time, and are disclosed in US Patent Application Publication No. 2010/0186950 Al, incorporated by reference herein in its entirety.

[0030] The terms “velocity model,” “density model,” “physical property model,” or other similar terms as used herein refer to a numerical representation of parameters for subsurface regions. Generally, the numerical representation includes an array of numbers, typically a 2-D or 3-D array, where each number, which may be called a “model parameter,” is a value of velocity, density, or another physical property in a cell, where a subsurface region has been conceptually divided into discrete cells for computational purposes. For example, the spatial distribution of velocity may be modeled using constant-velocity units (layers) through which ray paths obeying Snell’s law can be traced. A 3-D geologic model (particularly a model represented in image form) may be represented in volume elements (voxels), in a similar way that a photograph (or 2-D geologic model) is represented by picture elements (pixels). Such numerical representations may be shape-based or functional forms in addition to, or in lieu of, cell-based numerical representations.

[0031] The term “subsurface model” as used herein refer to a numerical, spatial representation of a specified region or properties in the subsurface. [0032] The term “geologic model” as used herein refer to a subsurface model that is aligned with specified geological feature such as faults and specified horizons.

[0033] The term “reservoir model” as used herein refer to a geologic model where a plurality of locations have assigned properties including any one, any combination, or all of rock type, EoD, subtypes of EoD (sub-EoD), porosity, clay volume, permeability, fluid saturations, etc.

[0034] For the purpose of the present disclosure, subsurface model, geologic model, and reservoir model are used interchangeably unless denoted otherwise.

[0035] Stratigraphic model is a spatial representation of the sequences of sediment, formations and rocks (rock types) in the subsurface. Stratigraphic model may also describe the depositional time or age of formations.

[0036] Structural model or framework results from structural analysis of reservoir or geobody based on the interpretation of 2D or 3D seismic images. For examples, the reservoir framework comprises horizons, faults and surfaces inferred from seismic at a reservoir section. [0037] As used herein, “hydrocarbon management” or “managing hydrocarbons” includes any one, any combination, or all of the following: hydrocarbon extraction; hydrocarbon production, (e.g., drilling a well and prospecting for, and/or producing, hydrocarbons using the well; and/or, causing a well to be drilled, e.g., to prospect for hydrocarbons); hydrocarbon exploration; identifying potential hydrocarbon-bearing formations; characterizing hydrocarbon -bearing formations; identifying well locations; determining well injection rates; determining well extraction rates; identifying reservoir connectivity; acquiring, disposing of, and/or abandoning hydrocarbon resources; reviewing prior hydrocarbon management decisions; and any other hydrocarbon-related acts or activities, such activities typically taking place with respect to a subsurface formation. The aforementioned broadly include not only the acts themselves (e.g., extraction, production, drilling a well, etc.), but also or instead the direction and/or causation of such acts (e.g., causing hydrocarbons to be extracted, causing hydrocarbons to be produced, causing a well to be drilled, causing the prospecting of hydrocarbons, etc.). Hydrocarbon management may include reservoir surveillance and/or geophysical optimization. For example, reservoir surveillance data may include, well production rates (how much water, oil, or gas is extracted over time), well injection rates (how much water or CO2 is injected over time), well pressure history, and time-lapse geophysical data. As another example, geophysical optimization may include a variety of methods geared to find an optimum model (and/or a series of models which orbit the optimum model) that is consistent with observed/measured geophysical data and geologic experience, process, and/or observation.

[0038] As used herein, “obtaining” data generally refers to any method or combination of methods of acquiring, collecting, or accessing data, including, for example, directly measuring or sensing a physical property, receiving transmitted data, selecting data from a group of physical sensors, identifying data in a data record, and retrieving data from one or more data libraries.

[0039] As used herein, terms such as “continual” and “continuous” generally refer to processes which occur repeatedly over time independent of an external trigger to instigate subsequent repetitions. In some instances, continual processes may repeat in real time, having minimal periods of inactivity between repetitions. In some instances, periods of inactivity may be inherent in the continual process.

[0040] If there is any conflict in the usages of a word or term in this specification and one or more patent or other documents that may be incorporated herein by reference, the definitions that are consistent with this specification should be adopted for the purposes of understanding this disclosure.

[0041] Seismic data may be analyzed to identify one or both of fluid or geologic elements in the subsurface. As one example, seismic data may be analyzed to detect the presence of hydrocarbon in order to perform hydrocarbon management. As another example, seismic data may be analyzed to categorize different rock types or lithology (e.g., seismic inversion produces derivative volumes such as Vclay and porosity, whose relationship may be used to categorize different rock types or lithology). In this regard, various fluids or geologic elements in the subsurface may be recognized based on analysis of one or more types of seismic input derivative volumes.

[0042] As discussed in the background, A VO analysis may be used to detect hydrocarbon in a subsurface. In particular, the input derivative volumes may be derived from the offset cubes, such as the A VO intercept and gradient volumes (e g., interpreting distributions of A and B values to distinguish the AVO anomalies from the background as an AVO response of hydrocarbon presence is expected to be an anomaly). These relationships may potentially show presence of fluid and may also show⁷ confidence in the anomaly by its signature with respect to the background. However, there are several drawbacks in the typical AVO analysis. As a general matter, A and B values may not be naturally clustered to separate the classes (see FIGS 1 A-B, which illustrate a scatter plot 100 of A and B values from pixels of a seismic image and a scatter plot 150 of A and B mean values from a set of geobodies, respectively). In this regard, A and B values may not be clustered to recognize hydrocarbon-bearing zones or geological features of interest. Thus, the AVO analysis may be combined with the other indicators to increase the confidence around fluid presence.

[0043] Separate from this, the AVO analysis may be subject various other deficiencies. As one example, the Zoeppritz equation is a crude approximation to the relationships of the field pre- and partially stacked images because of the complex interactions of seismic waves, noise and inaccuracies in processing and migration imaging. Such equations may be useful for reasoning about how seismic waves interact with the subsurface; however, these equations may be insufficient to process the data. As another example, flat reflections may be caused by a change in stratigraphy and may be misinterpreted as a fluid contact. As still another example, rocks with low impedance could be mistaken for hydrocarbons, such as coal beds, low density shale, ash, mud volcano, volcanoclastics, high porosity and wet sands, etc. As yet another example, polarity of the images may be incorrect, causing a bright amplitude in a high impedance zone. As another example, AVO responses may be obscured by superposition of seismic reflections and tuning effects. Finally, signals may be contaminated with systematic or acquisition noise.

[0044] In order to overcome any one, any combination, or all of the deficiencies in the art, in one or some embodiments, a methodology is disclosed that transforms, such as via a machine learning network from seismic data space (e.g., seismic image space) to a structured and compositional representation space where one or more geological features of interest are separated (e.g. separated by planes or hyperplanes). In one or some embodiments, the structured and compositional representation space comprises embedding space, which represents the one or more geological features of interest and measures the adjacency (e.g., distance) or compositional nature of the geological features from one to another. Various geological features of interest (e.g., DHI, play elements such as traps, seals, reservoirs, source or migration paths, rock types such as high porosity sand, shale, cement etc.) are contemplated, including any aspect that defines the subsurface, as discussed above. The discussion below describes the methodology in the context of detecting DHIs (e.g., AVO classes). The methodology may likewise be used with regard to detecting other geological features. In this regard, any discussion regarding detecting DHIs may likewise be applied to detecting other geological features. With regard to detecting DHIs, the methodology may take a set of partially- stack images or pre-stack images along with calibration data (e.g., a pair of seismic images and DHI label(s) in the image(s)) describing a feature of interest on the images and may output a model that maps the images to a new space where the features of interest are separated and classified with the calibration data. The calibration data may be generated in real-time during the interpretation step. For instance, the interpreter may manually segment a geological feature of interest.

[0045] Various types of calibration data are contemplated. Merely by way of example, the calibration data may be dense or sparse labels of the features or may be annotations limited to an isolated portion of the image volumes. Alternatively, or in addition, the calibration data may be obtained interactively from annotations of domain experts, from physics-based simulations (e.g., seismic simulations, rock physics models process stratigraphy) or may be extracted from available databases where samples of the futures of interest exist.

[0046] As discussed in more detail below, in one or some embodiments, the methodology may include an unsupervised learning approach. Various methods of unsupervised learning are contemplated. Example unsupervised learning methods for extracting features from images may be based on clustering methods (k-means), generative-adversarial networks (GANs), transformer-based networks, normal izing-flow networks, recurrent networks, or autoencoders, such as illustrated in the block diagram 200 in FIG. 2 A. An autoencoder 210 may learn a latent representation Z while reconstructing the image along with the following two functions: (1) an encoding function (performed by encoder 220) parameterized by Q that takes in image x as an input and outputs the values of latent variable z = (x); and (2) a decoding function (performed by decoder 230) parameterized by m that takes in the values of latent variables and outputs an image, x' = (z).

[0047] Training of an autoencoder may determine Q and m by solving the following optimization problem:

[0048] After training, one may use the learned encoding function (x) to map an image (or a portion of image such as a patch) to its latent space representation in Z. This representation of the image may be used for the image analysis. An example of an autoencoder architecture, which may include encoder 260 and decoder 270, is shown in the block diagram 250 in FIG. 2B.

[0049] The latent space typically captures the high-level features in the image x and has dimension much smaller than that of x. It is often difficult to interpret the latent space because the mapping from x to z is nonlinear and no structure over this space is enforced during the training. One approach may be to compare the images in this space with reference images (or patches) using a distance function measuring similarity between the pair (e.g.

- Z_Reference II). See A. Veillard, O. Morere, M. Grout and J. Gruffeille, “Fast 3D Seismic Interpretation with Unsupervised Deep Learning: Application to a Potash Network in the North Sea”, EAGE, 2018. There are two challenges for detecting DHIs with such an approach. First, DHIs are anomalous features in seismic images and autoencoders are designed to represent salient features, not the anomalous features. Anomalous features are typically treated as statistically not meaningful or significant for reconstructing images. Second, an autoencoder cannot guarantee to cluster image features in latent space and they may not be separable in the latent space.

[0050] Typically, identifying fluid presence from seismic data depends heavily on engineered image attributes, which may lead to unduly noisy results and may not lend itself to automation, but may nevertheless help the interpreter to select the fluids presence. Conversely, machine learning approaches based on supervised learning do not rely on the feature engineering, but require an abundant amount of labelled examples to train. The requirement of a large amount of labelled training data is a challenge for many important interpretation tasks, particularly for DHIs, One reason is that generating labelled data for fluid detection is a labor- intensive process. Another reason is that DHIs are often difficult to pick, particularly subtle Dills. Still another reason is that labeling for DHLs may introduce interpreter’s bias and/or a bias for what is known to be positive DHFs (as only so many DHFs may actually be drilled). This limits the amount of training data that may be generated even if with unlimited resources. Thus, detecting DHIs and other features of interest from seismic images may be difficult due to these limitations and due to the labor-intensive labelling of data.

[0051] In one or some embodiments, the methodology is configured to map the seismic data (e.g., the seismic images) to a structured representation space (e.g., embedding space) where geological features of interest (e.g,, DHIs or the other geological features of interests) are separated to be detected. In this way, the one or more geological features of interest may be identified.

[0052] The methodology may be performed in a one or more stages of machine learning. For example, in one or some embodiments, a single machine learning stage (such as from seismic space to embedding space) is used. Alternatively, multiple machine learning stages may be used, wherein the machine learning from a first or initial stage is used in the machine learning in a subsequent or second stage and wherein the first or initial stage and the second or subsequent stage are different in one or more aspects (e.g., any one, any combination, or all of: un supervised learning versus supervised learning; larger dataset, used versus smaller dataset used; different datasets used; or an initial set of geological features considered versus a subset of the initial set; etc.). In one particular example, the initial or first stage may comprise unsupervised (or self-supervised) machine learning with a larger dataset that is configured for pre-training (e.g., thereby generating a higher dimensional pre-trained model, such as an embedding model discussed below, that is directed to a larger set of geological features, such as an N-dimensional pre-trained model that is directed to a set of geological features N in number) and the subsequent or second stage may comprise supervised machine learning with a smaller labelled dataset that is configured to tailor to a subset of geological features of interest (e.g., thereby generating a lower dimensional tailored model, such as target task-based model discussed below, directed to the subset of geological features of interest, such as an M- dimensional tailored model that is directed to the subset of geological features of interest that is M in number, with M being less than N, such as with M being at least one order of magnitude, at least two orders of magnitude, etc. less than N), In another particular example, the initial or first stage may comprise the unsupervised machine learning with a larger dataset that is configured for pre-training and the subsequent or second stage may comprise unsupervised machine learning with a smaller dataset that is configured to tailor to a subset of geological features of interest. In this regard, the initial or first stage may comprise a pre-training stage directed to a general set of geological features and the subsequent or second stage may comprise one or more tailored stages in which the general set of geological features may be tailored to a subset of the general set of geological features (e.g., one subsequent stage may tailor to one subset of geological features of interest and another subsequent stage may tailor to another subset of geological features of interest, with the one subset of geological features of interest being different from the another subset of geological features of interest). In particular, a much smaller labeled dataset may be used in the subsequent or second stage, thereby necessitating less labeling of data. Further, the two-stage approach (e.g., unsupervised and then supervised) may balance the interests of specificity in providing a labeled dataset while reducing the overall burden typically associated with supervised learning. As the model size (e.g., number of parameters such as weights and biases of the machine learning model) increases, the chances of separating geological features in the embedding space increase as well. The larger sized pre-trained models may learn a specific task (e.g. segmenting DHI geobodies) with fewer examples (e.g. qualifying them as few- shot learners).

[0053] As discussed in more detail below', the transition from seismic space to embedding space enables representation of a set of the geological features (such as a genera! set of geological features), with analysis of at least one aspect of the representation in embedding space indicative of similarity or difference. Various ways of transitioning from seismic space to embedding space are contemplated. In particular, seismic data, such as two image patches, may be compared with one another to determine an indicator of similarity or dissimilarity. As one example, embedding space may be constructed by taking an image or a set of images, performing a transformation on the image or the set of images (such as rotating the image, cropping the image, etc.), and representing the image/transformed image or the set of images/set of transformed images in latent space (e.g., embedding space) as similar to one another because the transformed image or images are derived from the image. As another example, images (such as image patches) with at least one difference (such as from different stacks) may be represented in embedding space as dissimilar to one another because two images are originated from the different part of the image volume. Various mathematical descriptors are contemplated that manifest the indication of similarity or dissimilarity. As merely one example, distance may comprise an indicator (e.g., an indicator with smaller dimensionality may be illustrated in a scatter plot, discussed further below, in which a smaller distance in the scatter plot is indicative of more similarity versus a larger distance indicative of less similarity). [0054] Referring back to the figures, FIG. 3 A shows a sequence of transitions 300 between the various spaces, including seismic space 310, general embedding space 312, and targeted embedding space 314. Seismic space 310 may comprise compiling or accessing seismic data. In the transition to embedding space 312 (e.g., via machine learning with the seismic data), the generated (or learned) embedding model (interchangeably termed an embedding space model) may be highly dimensional and may be directed to a generalized set of geological features. After which, the transition to target task-based space 314 may be performed in which the embedding model is tailored to generate a target task-based model. Various methodologies are contemplated in which to tailor the embedding model. As one example, a different set of data (such as labelled data on a subset of the data used to train the embedding space or on a different set of data) may be used to tailor the embedding model. As another example, the machine learning model, used to generate the embedding model, may be refined using retraining. In this regard, tailoring of the embedding model may be performed in one of several ways, such as by modifying the embedding model and/or refining the machine learning model using retraining. Space 314 may include a classification or categorization space (e.g. DHI or no-DHI) possibly with multiple classes (e.g., A VO Class I, A VO Class II, A VO Class Up, A VO Class III, or A VO Class IV) or a segmentation space where each pixel assigned to a class. Space 314 may thus be dependent on the specific task considered.

[0055] FIG. 3B shows a block diagram 320 of a methodology for identifying seismic embeddings in order to perform seismic interpretation. As shown, a pre-training stage comprises using a pre-training dataset 222, which may be unlabeled. Further, machine learning 324 may be performed in an unsupervised or a self-supervised manner. In one example, the machine learning 324 may perform self-supervised learning as a contrastive task. In particular, the machine learning 324 may access an image patch, modify it in some manner (e.g., delete a part of the image patch; rotate the image patch; rotate the image phases in the frequency domain; augment the image frequencies such as boosting low frequencies; low-pass or high- pass images; generate new images from the same reflection data by modifying velocity volumes used to generate the images; etc.), such that the representation of the image patch and the modified image patch are represented in embedding space as similar or close to each other in the embedding space. Alternatively, the machine learning 324 may access two separate images that are dissimilar in one or more aspects (e.g., from different stacks or different geographical locations or stratigraphic zones), and represent the two separate images in embedding space as dissimilar or apart from each other in the embedding space. In this way, the self-supervised process may create artificial labels in order to pre-train the model, which may be high-dimensional due to the large number of geologic features modeled. The self- supervised task may be based on an auto-regressive reconstructive or generative task. This generative task may predict an image patch from its adjacent image patches (e.g., adjacent in spatial coordinate system) in a seismic volume. Various image patches are contemplated. As one example, the patch here may be as small as a pixel or sequence of pixels (e.g., 25 pixels in depth direction may be used to predict the adjacent 26^th pixel in depth direction). The adjacent patch could be the next patch in the depth or the time domain or in the lateral geographical direction. The patches sampled in the depth direction may be considered as generated by the depositional system (or process stratigraphy). The depositiona! systems (or process stratigraphy) may be considered as auto-regressive systems, which may be interrupted by the fault or folding processes. The auto-regressive reconstructive or generative tasks may be used to learn the features and context of the depositional process by generating a patch from a neighboring (e.g., adjacent) patch. The learned features may then be used in the specific target tasks. The auto-regressive generative tasks may also be applied to the adjacent images in pre- or partial-stack axes. For instance, the pre-training process may predict mid-stack image from a near-stack image. While generating those mid-stack images, it may learn the image features and learn how images are modified with the offset (such as A VO effects). Subsequently, those learned features may later be transferred for learning a specific target task. Such auto-regressive generative pre-training tasks on partial-stack images may straightforwardly be extended to the pre-stack images,

[0056] The pre-trained model may then be used in one or more ways. In one way, the pre- trained model may be used without additional clustering (e.g., the high-dimensional pre-trained model is used without further modification or combination). In another way, the pre-trained model may be combined with another model (e.g., the pre-trained model may be combined with a separately determined or already-existing model). In still another way, the pre-trained model may be further refined using additional self-supervised learning, such as at 326. In particular, machine learning 326 may focus on a subset or combination of the general set of geological features, with the subset or combination being composed of the geological features of interest, thereby generating a lower-dimensional model directed to the geological features of interest.

[0057] In yet another way, the pre-trained model may be further refined (e.g., tuned) using additional supervised learning with a labeled dataset. The architecture of the pre-trained model may be modified to output prediction in the target space, such as, for instance, by adding a multilayer perception model as a classifier to map embedding space to the target class space. As merely one example, a fine-tuning dataset 328, which may or may not be a subset or combination of 322, may include labels that are directed to the geological features of interest. Additional machine learning model 330 may use the fine-tuning dataset 328 and the labeled dataset 332 in order to fine tune the pre-trained model or its modified version for the specific target task. In one or some embodiments, the machine learning model 330 may be trained in real-time interactively using the fine-tuning dataset 328 which may be much smaller (e.g., labels may be created in real-time, thereby adjusting the parameters of the model 330 in real- time) than the pre-training dataset. The pre-trained model 324 may also be fine-timed along with the model 330. This may or may not be achieved in real time due to update on the parameters of the larger sized pre-trained model 324. The example in FIG. 3B show's images in which salt is labeled in the image patch. In this regard, the labeled dataset 332 may be represented as no salt = 0 (or black) and salt = 1 (or white). In this way, the pre-trained model may be combined with a simpler model (e.g., a linear model) for a very specific task (e.g., identifying salt regions).

[0058] Further, in some instances, it may be simpler to interpret the embedding space rather than the image space because the target features of interests may naturally be clustered in the embedding space learned through the pre-training, in particular, clustering, as an unsupervised method, may be used to understand embedding within embedding space, thereby assisting users to identify notable clusters so that the user may interpret the clustered space directly. As one example, textures subject to classification may be difficult to explore (e.g., there may be source wavelet led blurring of layers). As such, it may be easier to explore/label in embedding space instead of in seismic space (e.g., on the image itself). This may accelerate labeling so that transitioning from the high-dimensional embedding space to the lower dimensional target space may be simpler through training a simpler model (e.g., a linear classifier). This type of methodology may be particular applicable with regard to A VO, where clustering may be difficult, particularly in the context of outliers.

[0059] By way of example, the one or more geological features of interest may include fluid presence or other geological feature(s) (e.g., any one, any combination, or all of: hydrocarbon trap; seal; stratigraphic zone; facies; geological reservoir; fluid-fluid contact; environment of deposition; rock types such as sand, shale cement; etc.). To that end, pre- or partially-stack images and the relationships in pre-stack images or in partially-stack images may be analyzed.

[0060] Referring to FIG. 4, illustration 400 is for training an autoencoder (e.g., encoder+decoder) in order to disentangle classes of features at embedding space z 410. Images 402 and calibration data 404 are used to generate class batches 406 of pairs of similar and dissimilar geological features, as input to encoder 408 in order to generate embedding space z 410. In turn, decoder 412 may be used to generate images 414. For instance, class 1 batch includes all the segmented bag of pixels belong to the same class 1. When the autoencoder model is trained with in-class samples (e.g., sampled from class 1), the parameters of the model may be updated to reduce a distance among the samples representations in the embedding space (z) 410. When the model is trained with the out-of-class samples, it may be updated to increase the distance among the samples representations in the embedding space (z) 410.

[0061] Without loss of generality, one may assume to have a pre-stack seismic image X in which each value is represented by X_0ijk where index o refers to offset axis, index i refers to depth or depth-equivalent time axis, and indices j and k refer to in-line and cross-line axes respectively for a subsurface volume. The offset dimension size may be on the order of at least tens or at least hundreds for the pre-stack images, however, it is typically at least two or three (near stack, middle stack and far stack) for the partially-stack images. The partially-stack images may stack pre-stack images for ranges of various offsets between source and receiver pairs (the offset can also be equivalently translated into incident angles of primary waves). In this regard, the partial-stack images may be constructed from moveout corrected, pre-stack gathers containing various offsets between source and receiver pairs. The image has a size of N = N_Q X Ni x N_j x N_k where N_a refers to the number cells along axis α (α E { o , i,j, k}). The amplitude changes in a spatio-temporal-offset context may indicate the fluid type (e.g. hydrocarbon or water) or a geological feature type. For instance, when the semblance metric (coherence of seismic energy from multiple offset channels normalized by the total energy from all channels) among the neighboring cells combined with the amplitude changes along offset direction can be a stronger indicator of the fluid presence.

[0062] x may comprise a patch or subset image (ID, 2D, 3D or 4D) extracted from X , where x e X, y is a mask describing class that x belongs at each pixel in x. y may also include empty set when labels are not available for some x. For example, y may take integer values 0, 1, 2 for no-label, non-DHI, and DHI classes respectively. Samples of patches may be drawn from random locations of the image volumes to form a batch which may have the same class, different class or no-label samples as shown in FIG. 4. The patches may be augmented to form new samples. The augmentation methods may be based on seismic, petrophysics or handcrafted synthetic datasets and may alter the class of the sample.

[0063] In cases of a limited calibration dataset (e.g., only one DHI geobody label), one may use self-supervised learning which uses data augmentation to learn an effective representation (e.g., embedding) of the distinctive features that makes up the seismic images (pre- or partially- stack images). A geology and geophysics-aware data augmentation takes an image (pre-stack or partially stack) x_a and transforms into another images x_p or x_n . During the training of machine learning model translating images to their embeddings, if images x_a and x_p (positive pair or in-class pair) are related, the distance between their representations in the embedding space is minimized; otherwise, for other images (e.g., negative pair x_a and x_n or out-of-class pair), the distance is maximized.

[0064] Various data augmentation methods are contemplated. Example data augmentation methods include: (1) cropping of an image and resizing back to its original size; (2) image downsampling and resampling to its original size (e.g., zoom in and out); (3) image cutout, amplitude modifications in the offset axis which may produce positive or negative pairs (modifications from DHI to non-DHI class or vice versa or modifications among AVO classes); (4) image frequency augmentation (blurring or smoothing); or (5) phase rotations. The augmentation may be expanded to an already augmented image x_p to generate a new augmented image x_pp or x_pn.

[0065] The augmented samples in the batch may then be passed through the encoder to calculate their corresponding z vector values and the decoder function may input z values to reconstruct the associated images x' . The parameters of the encoder and decoder functions, q,m, may be determined by minimizing the following functional (Loss) based on multiple objectives:

Loss = (reconstruction loss)

+a(distance between the same class samples in z — distance between the different class samples in z) + b (regularization over z) (3)

[0066] The loss functional may take several forms, such as according to the following:

where indexes a, p ,n refer to the reference sample, the in-class sample, out-of-class sample respectively, y_a, y_p, y_n are class labels for samples a, p and n, N_B is the batch size, D_xy() is a distance measure such as ||x — between two samples x and y either in space (x) or (z),

R(z) is the regularization term over latent space z (e.g., Kullback-Leiber divergence for enforcing z distribution to be a standard normal distribution over the batch, KC(N(m_z, s_z)||N(0,1)) ), m, α and β are hyper-parameters tuned by the domain expert or a meta-learning algorithms (e.g., Bayesian optimization methods). In Equations (3) and (4), if label y is null (or, for example, 0 for the practical applications) for sample a, the distance calculations in z (terms multiplied with α ) may be omitted.

[0067] The objective functional in Equation (4) may be simplified by dropping the reconstruction and regularization terms. This simplification eliminates a need for the decoder and may depend on an encoder network. Equation (4) may further be simplified by avoiding explicit negative sampling. As such, the loss function for a positive pair of examples (α, p) may be defined as contrastive learning loss [see A. V. D. Oord, Y. Li and O. Vinyals, Representation learning with contrastive predictive coding. ArXiv preprint, arXiv: 1807.03748, 2018]:

||) may also be a cosine distance between z_a and z_p , and t is a hyperparameter scaling the distances between pairs to keep the distance values finite. In some cases, a projected embedding space may be preferred for the loss function. In such cases, D_a,p(z) will take a form ofD_a,p(g(z)) with g(z^') = W₁a(W₂z) where s is a rectified linear unit function (ReLU), and W_x and W₂ are matrices to be learned during the training which minimizes functional in Equation (5).

[0068] FIG. 5A is a diagram 500 for training an encoder network (e.g., CNN) in order to learn the representation of distinctive features of images at embedding space z using contrastive loss given in Equation (5). Various ways to measure loss, other than contrastive loss, are contemplated (e.g., triplet loss). As shown in FIG. 5A, CNNs 510, 512, 514, 516 may input various image patches (such as image patch 520, 522 from image patch 1 and image patch 524, 526 from image patch 2) in order to determine similarity or dissimilarity of the different image patches. As shown, image patch 522 is a subsection of image patch 520, thus resulting in a similarity determination. Further, an image patch, which is rotated from another image patch, may be deemed similar as well. Likewise, image patch 524 is a subsection of image patch 526, thus resulting in a similarity determination. Conversely, image patch 522 is different from image patch 524, resulting in a dissimilarity determination.

[0069] FIG. 5B is a diagram for training of a mapping model (identified as Model 566) from the embedding space (represented by 564 and which is the output of the encoder trained in FIG. 5 A) to the semantic segmentation of the geobodies. Thus, an image patch 560 may be input to CNN 562 to generate embedding 564, which is used to train Model 566, which is used to identify one or more geobodies of interest, such as illustrated by output 568.

[0070] The objective functional in Equations (4) or (5) may be minimized by using a gradient-based optimization method (e.g., stochastic gradient descent, ADAM, etc.) over a dataset associated with a stratigraphic zone or multiple analog stratigraphic zones. The stratigraphic zones may determine the seismic and geological context where the distinctive features are separated. In one or some embodiments, the stratigraphic zone(s) may be used as input to the methodology because subtle DHIs (e.g., lower porosity deep reservoirs) may be more difficult to recognize when they are taken out of a stratigraphic zone (or seismic context) or when they are combined with the apparent DHIs (e.g., shallow high porosity reservoirs). The stratigraphic zones (or segments) may be determined in a variety of ways, such as by using another machine learning model trained with or without a labelled dataset. Alternatively, the stratigraphic zones may be determined by a domain expert. An example of stratigraphic zones segmented by a domain expert are shown in the illustration 600 in FIG. 6. Each of the different shades represents a different stratigraphic zone, such as zone 1, zone 2, zone 3, or zone 4. [0071] The parameters ( θ and μ ) of the encoder (E_θ) and decoder (D_μ) may be determined by solving Equation (3). The encoder may be used to calculate the embedding values z of new data points within the same or analog stratigraphic zones. This inference is illustrated in the distributions 700, 710 in FIG. 7. Specifically, the data points are represented in AB space (shown in distribution 700) where both classes (DHI and non-DHI) are tangled. This is also illustrated in FIGS. 1 A-B. When these data points are mapped to z space (e.g., two dimensional z space with components z_x and z₂ shown in distribution 710) using the trained encoder, they may be separated and clustered. Some of the unlabeled data points are also separated as DHI or non-DHIs. The inference over the new data points may be performed using a distance measure between clusters and the data point or using a clustering method (e.g., K-means, support-vector-machines (SVM), spectral clustering methods, kernel methods and etc.). In other cases, the inference can be performed using a classifier network which maps the embedding z values to class identifier (e.g., 0, 1, 2, ...). The augmentation strategies (e.g., image cropping and/or rotation) in 500 may be altered or dropped to maximize the classifier network performance. The choice of the augmentation methods is a hyperparameter that may be optimized using a Bayesian optimization method or meta-learning algorithm.

[0072] In one or some embodiments, performance of the target-task-specific model may depend on the target specific task. For classification tasks, the performance may be based on FI score. The training objective (loss) function used for fmetuning the model for the target- specific task may also be the metric for evaluating the performance. For instance, for a binary segmentation task, binary cross entropy loss may be used as a training loss and for evaluating the performance of the model for this target specific task. In turn, application of the augmentation strategy (e.g., whether applied at all or what type of augmentation strategy) may depend on the determined performance of the target-task-specific model. For example, when the augmentation method is used for the pre-training process, and the determined target- specific-task performance of the mode is better than the instance of not using the augmentation method, then the augmentation method is included for the pre-straining stage. Conversely, responsive to determining that the performance has not improved, the augmentation method is excluded from the pre-training stage.

[0073] If the simplified objective functional in Equation (5) is used, the training may determine parameters Q of the encoder (E_θ) and matrices W₁ and W₂ by solving Equation (5). Once training of the encoder is performed, a downstream task (e.g., segmentation or classification) can be performed using another machine learning model, which may map embedding values to the desired outputs (e.g., class probabilities - DHI or non-DHI, or semantic segmentation probabilities) of the downstream tasks.

[0074] Embedding space may thus be used in a variety of contexts, such as analyzing different types of datasets (e.g., seismic images (including real or synthetic data), dialogs, etc.), tracking, or cataloging. With regard to analyzing different datasets, one may use embedding space to learn across different datasets (e.g., an unclassified dataset may be compared with one or more known or classified datasets in order to determine the similarity/dissimilarity of the unclassified dataset with the one or more known or classified datasets; meta information or other types of tagged information associated with an unclassified dataset may be used to retrieve similar datasets).

[0075] One example with a real pre-stack dataset is carried out with labels shown in the illustration 800 in FIG. 8. In particular, a pre-determined stratigraphic zone, and a set of calibration labels (regions with DHI or Non-DHI labels) are plotted in FIG. 8. The pre-stack dataset is used to train an autoencoder model with the loss functional of Equation (4). The size of an input patch is 64x3x3x1 (N₀ x N_i x N_j x N_k) and the size of the latent space is 32.

The output of the decoder is 64x3x3x1, but also may be 64x1x1x1 for the reconstruction of the pre-stack data only at the middle cell of the 3X3 input patch.

[0076] The training is performed by a random selection of 64x3x3x1 patches from the stratigraphic zone. Following the training, the pre-stack seismic data is mapped to z space. The mapped data is plotted over the first two principal axis of the z space for visualization. A clear clustering of the dataset is shown in the mapping 900 in FIG. 9 with some overlapped labels. As shown, section 910 corresponds to an area recognized as DHI 930 and section 920 corresponds to an area recognized as non-DHI 940. A portion of the DHI labels (shown within section 920 as 960 within a larger mass of non-DHI labels 950) are recognized as a non-DHI data points by this framework. When calibration labels are reanalyzed, there is huge uncertainty around these areas regarding whether they belong to the DHI anomaly or not. This approach, illustrated in FIG. 9, suggests that they may belong to the non-DHI cluster. In this regard, the data points clustered with the learned embedding suggests that a part of the DHI labeled data points are non-DHI data points (as illustrate by the different shaded data points). These DHI labeled non-DHI data points may mostly belong to the lower part of the anomaly where the domain expert may not be confident whether they are actually DHI data points.

[0077] Another example of applying embedding space is illustrated in FIG. 10. In particular, the embeddings learned by solving Equation (5) over the same stratigraphic zone using a supervised training were examined to detect the DHI geobodies. First, an encoder was trained using the contrastive loss given by Equation (5), such as illustrated in FIG. 5A. Then, a linear model was trained using a single DHI geobody label to learn a mapping from the embedding space to the class identifier (e.g., 1 or 0) of DHI presence. For this test, a predefined threshold was set to determine what percentage of the segmented DHI needed to be present in the image to classify an image patch as a positive DHI image. It is demonstrated that the network may successfully detect all the labeled DHI attributes, as well as other plausible DHI geobodies as shown in FIG. 10, using merely a single label as calibration data. In particular, FIG. 10 illustrates examples for training a classifier of the embedded patches (1000, 1030, which are 32x32 patches that are represented in embedding space by a 512 dimensional vector) and predictions at the rest of the image (1010, 1020, 1040, 1050). The classifier is trained with the label as illustrated in 1000, 1030. 1000, 1010, 1020 show the labeled DHI geobodies and 1030, 1040, 1050 display the predictions of the DHI geobodies with the trained classifier. In this way, the methodology may predict similar image patches, where the similarity criteria is focused (e.g., a specific task of detecting DHIs may be used to determine whether one image, such as 1000 is similar to other image patches).

[0078] An example workflow 1100 for the methodology is summarized in FIG. 11. Stratigraphic zones 1110, seismic images 1120 and calibration data 1130 (labels describing the presence of fluid or a geological feature of interest) are gathered as inputs to the workflow. A machine learning model that learns seismic embeddings 1140 is trained with these data to learn a mapping function from seismic image to embedding space where the response of the fluid or geological feature of interest are separated. The embedding is used to classify the unlabeled section or other seismic cross sections of the same stratigraphic zone, thereby detecting seismic features with the learned embeddings 1150.

[0079] In all practical applications, the present technological advancement must be used in conjunction with a computer, programmed in accordance with the disclosures herein. For example, FIG. 12 is a diagram of an exemplary computer system 1200 that may be utilized to implement methods described herein. A central processing unit (CPU) 1202 is coupled to system bus 1204. The CPU 1202 may be any general-purpose CPU, although other types of architectures of CPU 1202 (or other components of exemplary computer system 1200) may be used as long as CPU 1202 (and other components of computer system 1200) supports the operations as described herein. Those of ordinary skill in the art will appreciate that, while only a single CPU 1202 is shown in FIG. 12, additional CPUs may be present. Moreover, the computer system 1200 may comprise a networked, multi-processor computer system that may include a hybrid parallel CPU/GPU system. The CPU 1202 may execute the various logical instructions according to various teachings disclosed herein. For example, the CPU 1202 may execute machine-level instructions for performing processing according to the operational flow described.

[0080] The computer system 1200 may also include computer components such as non- transitory, computer-readable media. Examples of computer-readable media include computer- readable non-transitory storage media, such as a random access memory (RAM) 1206, which may be SRAM, DRAM, SDRAM, or the like. The computer system 1200 may also include additional non-transitory, computer-readable storage media such as a read-only memory (ROM) 1208, which may be PROM, EPROM, EEPROM, or the like. RAM 1206 and ROM 1208 hold user and system data and programs, as is known in the art. The computer system 1200 may also include an input/output (EO) adapter 1210, a graphics processing unit (GPU) 1214, a communications adapter 1222, a user interface adapter 1224, a display driver 1216, and a display adapter 1218.

[0081] The EO adapter 1210 may connect additional non-transitory, computer-readable media such as storage device(s) 1212, including, for example, a hard drive, a compact disc (CD) drive, a floppy disk drive, a tape drive, and the like to computer system 1200. The storage device(s) may be used when RAM 1206 is insufficient for the memory requirements associated with storing data for operations of the present techniques. The data storage of the computer system 1200 may be used for storing information and/or other data used or generated as disclosed herein. For example, storage device(s) 1212 may be used to store configuration information or additional plug-ins in accordance with the present techniques. Further, user interface adapter 1224 couples user input devices, such as a keyboard 1228, a pointing device 1226 and/or output devices to the computer system 1200. The display adapter 1218 is driven by the CPU 1202 to control the display on a display device 1220 to, for example, present information to the user such as subsurface images generated according to methods described herein.

[0082] The architecture of computer system 1200 may be varied as desired. For example, any suitable processor-based device may be used, including without limitation personal computers, laptop computers, computer workstations, and multi-processor servers. Moreover, the present technological advancement may be implemented on application specific integrated circuits (ASICs) or very large scale integrated (VLSI) circuits. In fact, persons of ordinary skill in the art may use any number of suitable hardware structures capable of executing logical operations according to the present technological advancement. The term “processing circuit” encompasses a hardware processor (such as those found in the hardware devices noted above), ASICs, and VLSI circuits. Input data to the computer system 1200 may include various plug- ins and library files. Input data may additionally include configuration information.

[0083] Preferably, the computer is a high-performance computer (HPC), known to those skilled in the art. Such high-performance computers typically involve clusters of nodes, each node having multiple CPU’s and computer memory that allow parallel computation. The models may be visualized and edited using any interactive visualization programs and associated hardware, such as monitors and projectors. The architecture of system may vary and may be composed of any number of suitable hardware structures capable of executing logical operations and displaying the output according to the present technological advancement. Those of ordinary skill in the art are aware of suitable supercomputers available from Cray or IBM or other cloud computing based vendors such as Microsoft, Amazon.

[0084] The above-described techniques, and/or systems implementing such techniques, can further include hydrocarbon management based at least in part upon the above techniques, including using the one or more generated geological models in one or more aspects of hydrocarbon management. For instance, methods according to various embodiments may include managing hydrocarbons based at least in part upon the one or more generated geological models and data representations (e.g., seismic images, feature probability maps, feature objects, etc.) constructed according to the above-described methods. In particular, such methods may include drilling a well, and/or causing a well to be drilled, based at least in part upon the one or more generated geological models and data representations discussed herein (e.g., such that the well is located based at least in part upon a location determined from the models and/or data representations, which location may optionally be informed by other inputs, data, and/or analyses, as well) and further prospecting for and/or producing hydrocarbons using the well.

[0085] It is intended that the foregoing detailed description be understood as an illustration of selected forms that the invention can take and not as a definition of the invention. It is only the following claims, including all equivalents which are intended to define the scope of the claimed invention. Further, it should be noted that any aspect of any of the preferred embodiments described herein may be used alone or in combination with one another. Finally, persons skilled in the art will readily recognize that in preferred implementation, some or all of the steps in the disclosed method are performed using a computer so that the methodology is computer implemented. In such cases, the resulting physical properties model may be downloaded or saved to computer storage.

REFERENCES:

[0086] The following references are hereby incorporated by reference herein in their entirety, to the extent they are consistent with the disclosure of the present invention:

[0087] LeCun, Y., Bengio, Y., & Hinton, G., “Deep Learning.”, Nature 521, 436-444, 2015.

[0088] K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large- Scale Image Recognition. arXiv technical report, 2014.

[0089] J. Long, E. Shelhamer and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 2015.

[0090] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, LNCS, Vol.9351: 234-241, 2015.

[0091] C. Zhang, C. Frogner and T. Poggio, Automated Geophysical Feature Detection with Deep Learning. GPU Technology Conference, 2016.

[0092] Y. Jiang and B. Wulff, Detecting prospective structures in volumetric geo-seismic data using deep convolutional neural networks, Poster presented on November 15, 2016 at the annual foundation council meeting of the Bonn-Aachen International Center for Information Technology (b-it).

[0093] J. Mun, W. D. Jang, D. J. Sung and C. S. Kim, Comparison of objective functions in CNN-based prostate magnetic resonance image segmentation. 2017 IEEE International Conference on Image Processing (ICIP), Beijing, pp. 3859-3863, 2017. [0094] K.H. Zou, S.K. Warfield, A. Bharatha, C.M.C. Tempany, M.R. Kaus, S.J. Haker, W.M. Wells III, F.A. Jolesz, R. Kikinis, Statistical validation of image segmentation quality based on a spatial overlap index” Acad. Radiol., 11 (2), pp. 178-189, 2004.

[0095] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, Generative adversarial nets. In Proceedings of NIPS, pp. 2672- 2680, 2014.

[0096] M. G. Bellemare, I. Danihelka, W. Dabney, S. Mohamed, B. Lakshminarayanan, S. Hoyer and R. Munos, The cramer distance as a solution to biased Wasserstein gradients. arXiv: 1705.10743, 2017. [0097] A. Veillard, O. Morere, M. Grout and J. Gruffeille, Fast 3D Seismic Interpretation with Unsupervised Deep Learning: Application to a Potash Network in the North Sea. EAGE, 2018.

[0098] A. V. D. Oord, Y. Li and O. Vinyals, Representation learning with contrastive predictive coding. arXiv preprint, arXiv: 1807.03748, 2018. [0099] Ting Chen and Simon Kornblith and Mohammad Norouzi and Geoffrey Hinton, A

Simple Framework for Contrastive Learning of Visual Representations. arXiv: 2002.05709, 2020

Claims

CLAIMS What is claimed is:

1. A computer-implemented method for identifying one or more geological features of interest from seismic data, the method comprising; accessing the seismic data; generating, based on machine learning with the seismic data an embedding model, the embedding model representing a plurality of geological features and indicating adjacency or compositional nature of the plurality of geological features from one to another in embedding space; tailoring the embedding model in order to identify the one or more geological features of interest, the one or more geological features of interest comprising a subset or combination of the geological features; and using the one or more geological features of interest for hydrocarbon management.

2. The method of claim 1, wherein generating the embedding model comprises performing unsupervised or self-supervised machine learning; and wherein tailoring the embedding model comprises performing supervised machine learning.

3. The method of claim 2, wherein an amount of the seismic data accessed for performing unsupervised machine learning is at least an order of magnitude greater than the amount of the seismic data accessed for performing supervised machine learning.

4. The method of claim 1 , wherein the one or more geological features of interest comprise direct hydrocarbon indicators (DHI) or subsurface fluids.

5. The method of claim 4, wherein the one or more geological features comprise any one, any combination, or all of: salt; fault; channel; environment of deposition (EoD); facies; carbonate; rock types; horizon; stratigraphy; or geological time.

6. The method of claim 4, wherein tailoring the embedding model in order to identify the one or more geological features of interest comprises: accessing fully-stack images, partially-stack images or pre-stack images and calibration data that describes the one or more geological features of interest on the fully-stack images, partially-stack images or pre-stack images; and generating a model that maps the fully-stack images, partially-stack images or pre-stack images to a new space where the one or more geological features of interest are separated and classified with the calibration data.

7. The method of claim 6, wherein the calibration data comprises one or more of: labels of the one or more geological features of interest in the fully-stack images, partially-stack images or pre-stack images; annotations limited to an isolated portion of image volumes; or information obtained interactively from annotations of domain experts or from physics- based simulations.

8. The method of claim 7, wherein generating the embedding model includes: augmenting seismic images; pairing the seismic images and the augmented seismic images; and training the embedding model using the paired seismic images and the augmented seismic images for learning mapping from a seismic image domain to an embedding space; and wherein tailoring the embedding model in order to identify the one or more geological features of interest comprises: training a target-task-specific model to infer hydrocarbon presence and the one or more geological features of interest from the embedding space using the calibration data; and inferring hydrocarbon presence and geological features outside the calibration data or at an analog data.

9. The method of claim 8, wherein augmenting is based on at least one of: cropping/resizing; downsampling/upsampling; image cutouts; amplitude modifications in an offset axis; frequency manipulations; or new image generations based on the modified velocity models or phase rotations.

10. The method of claim 9, further comprising: determining a performance of the target-task-specific model; and determining one or more augmentation strategies depending on the performance of the target-task-specific model.

11. The method of claim 8, further comprising determining a stratigraphic zone; and wherein inferring the hydrocarbon presence is based on the determined stratigraphic zone.

12. The method of claim 8, wherein the calibration data is based on multiple classes of objects including fluid and play element types such as geological seal, trap and container.

13. The method of claim 8, wherein generating the embedding model is based on one or more of the following architectures: encoder network; autoencoder network; variational autoencoder network; generative network; classifier network; discriminator network; generative-adversarial network; transformer-based network; normalizing-flow network; or recurrent network.

14. The method of claim 8, wherein the embedding model is modified with a target-task- specific model to map the embedding space to target feature space; wherein only the target-task-specific model is trained with labelled data; or the target-task-specific model and the embedding model are trained simultaneously with the labelled data.

15. The method of claim 1, wherein tailoring the embedding model is based on one or more of the following architectures: decoder network; classifier network; segmentation network; transformer-based network; or recurrent network.

16. The method of claim 1, wherein tailoring the embedding model includes using a clustering method based on a distance functional which measures distance between a data point and each of a plurality of clusters in embedding space and identifies a class of the data point based on a respective cluster the data point is closest to.

17. The method of claim 1 , wherein generating the embedding model comprises performing unsupervised machine learning; and wherein tailoring the embedding model comprises performing additional unsupervised machine learning.

18. The method of claim 1, wherein the one or more geological features of interest comprise play elements including any one, any combination, or all of traps, seals, reservoirs, source or migration paths.

19. The method of claim 1, wherein generating the embedding model comprises performing self-supervised machine learning; and wherein tailoring the embedding model comprises performing a supervised machine learning.

20. The method of claim 1, wherein the embedding model is used to perform at least one of: determine seismic analogs or generate a vector representation of meta information or other types of tagged information associated with a dataset which may be used to retrieve similar datasets.