EP4251853A1 - Concentration prediction in produced water

Info

Publication number: EP4251853A1
Application number: EP20853517.9A
Authority: EP
Inventors: Frank DESPINOIS; Najate OCHBOUK; Yann HALLOUARD
Original assignee: TotalEnergies Onetech SAS
Current assignee: TotalEnergies Onetech SAS
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2023-10-04
Also published as: US20240133293A1; WO2022144565A1

Abstract

The invention notably relates to a computer-implemented method of machine-learning a predictive model configured for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production. The method comprises providing a dataset comprising values of one or more geoscience well-wise variables. Each value corresponds to a respective well of hydrocarbon production in the hydrocarbon reservoir other than the given well. Each value that corresponds to a respective well is associated to a respective ground truth value representing a concentration of the element in the respective well. The method further comprises learning the predictive model based on the dataset. This forms an improved solution for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir.

Description

CONCENTRATION PREDICTION IN PRODUCED WATER

FIELD OF THE INVENTION

The invention relates to the field of computer programs and systems, and more specifically to a method, device and program related to machine-learning a predictive model configured for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production.

BACKGROUND

Hydrocarbon wells for extraction of natural gas and oil also produce a large amount of water, the so-called "produced water", as a by-product of the extraction process. On average three barrels of water are produced for one barrel of extracted oil. The produced water is either discharged into the environment or used to maintain pressure in the produced reservoirs, therefore reinjected into producing fields. As this water comes from aquifers in direct contact with hydrocarbons, it is rich in various metals and minerals and it is unsuitable for consumption or being discharged directly in the environment. Thus, the treating of the produced water in order to recover the metals and minerals optimizes the production process. In this direction, it is desired to estimate beforehand the quantity of the extractable metals and minerals in the produced water.

Most methods rely on direct well-wise measurement of the concentration of metals and minerals for the wells of hydrocarbon production which are the subjects of produced water treatment. The direct measurement may be performed, for example using Inductively coupled plasma (ICP) techniques. This can be cumbersome, in particular in the context of shale gas production where a reservoir may comprise thousands of production wells.

Document JP2013019815A relates to a method for estimating the elution amount of heavy metal elements into a water system for a long period of time. The method estimates changes in water quality including at least pH in a water system based on the content ratio of plural normative minerals which are possibly contained in soil, and estimates the elution amount of heavy metal elements based thereon. Within this context, there is still a need for an improved solution for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir.

SUMMARY OF THE INVENTION

It is therefore provided a computer-implemented method (also referred to as "machine-learning method" in the following) of machine-learning a predictive model configured for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production. The method comprises providing a dataset comprising values of one or more geoscience well-wise variables. Each value corresponds to a respective well of hydrocarbon production in the hydrocarbon reservoir other than the given well. Each value that corresponds to a respective well is associated to a respective ground truth value representing a concentration of the element in the respective well. The method also comprises learning the predictive model based on the dataset.

The method may comprise one or more of the following:
- the one or more geoscience variables comprise one or more geochemical variables, one or more geological variables, one or more well design variables, and/or one or more production variables;
- the one or more geochemical variables include a pH, an amount of dissolved salt, a water density, a CO2 concentration, a bicarbonate concentration, a chloride concentration, a sulphate concentration, a sodium concentration, a potassium concentration, a magnesium concentration, a calcium concentration, a strontium concentration, a barium concentration, an iron concentration, a hydrogen sulfide concentration, a manganese concentration, and/or a zinc concentration;
- the one or more geological variables include a water saturation, and/or an identifier for fault presence;
- the one or more well design variables include one or more dimensional variables, one or more positional variables, an identifier of a connected pipeline, and/or one or more elevation variables;
- the one or more production variables include a date of first production, a gas production value, a volume of injected fracking water, and/or a volume of injected propane;
- the dataset comprises missing values of one or more variables for a number of wells of hydrocarbon production in the hydrocarbon reservoir, and the method comprises determining a respective filling value for each missing value;
- the one or more variables include the following variables which each have at least one missing value: a fault presence, an identifier of a connected pipeline, a volume of injected fracking water, a volume of injected propane, and/or one or more elevation variables;
- the number of missing values of the dataset is lower than 30% of the total number of values in the dataset, for example lower than 20% of the total number of values in the dataset;
- determining a given respective filling value for a given missing value of a given variable for the given well comprises inferring the given missing value from one or more values of the given variable in the dataset;
- the inferring is performed from historical values of the given variable for the given well, and/or values of the given variable for neighbouring wells;
- the inferring comprises computing a mean, or applying a machine-learnt inference model;
- the method further comprises, prior to the learning, analysing the dataset to obtain a set of key variables from the one or more geoscience variables, the learning being then based on a restriction of the dataset to the key variables associated to the respective ground truth value; and/or
- the element is one of lithium, cobalt, nickel, or cadmium.

It is further provided a computer-implemented method (also referred to as "prediction method" in the following) for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production. The method comprises providing a predictive model learnt according to the machine-learning method. The method also comprises predicting the concentration of the element in the given well by applying the predictive model to values of the one or more geoscience well-wise variables corresponding to the given well.

It is further provided a computer program comprising instructions for performing the machine-learning method and/or the prediction method.

It is further provided a data structure representing a predictive model learnt according to the machine-learning method. The predictive model learnt thus has parameter (e.g. weight) values and/or hyperparameter values that have been set during the machine-learning method, and is thus adapted for accurate prediction. Indeed, during the training, the model learnt patterns from data that make it adapted for accurate prediction.

It is further provided the above dataset comprising values of one or more geoscience well-wise variables, wherein each value corresponds to a respective well of hydrocarbon production in the hydrocarbon reservoir other than the given well, and each value that corresponds to a respective well is associated to a respective ground truth value representing a concentration of the element in the respective well.

It is further provided a device comprising a data storage medium having recorded thereon the computer program, data structure and/or the dataset. The device may form or serve as a non-transitory computer-readable medium on a SaaS (Software as a service) or other server, or a cloud based platform, or the like. The device may alternatively comprise a processor coupled to the data storage medium. The device may thus form a system in whole or in part (e.g., the device is a subsystem of the overall system). The system may further comprise a graphical user interface coupled to the processor.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described, by way of non-limiting example, and in reference to the accompanying drawings, where:

FIG. 1 shows an example of the system; and

FIGS. 2 to 15 illustrate the method.

DETAILED DESCRIPTION OF THE INVENTION

It is hereby proposed a computer-implemented method of machine-learning a predictive model configured for predicting a concentration of an element in produced water of a given well of hydrocarbon production (also referred to as "production well") in a hydrocarbon reservoir having wells of hydrocarbon production. The method comprises providing a dataset comprising values of one or more geoscience well-wise variables (i.e. each value of such a variable represents a geoscience quantity which defines a respective well). Each value corresponds to a respective well of hydrocarbon production in the hydrocarbon reservoir other than the given well. Each value that corresponds to a respective well is associated to a respective ground truth value representing a concentration of the element in the respective well. The method of machine learning further comprises learning the predictive model based on the dataset.

Such a method forms an improved solution for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir.

In particular, the method obtains a predictive model for predicting said concentration by learning said predictive model on a known dataset available for other wells of the reservoir. The wells of the reservoir may be physical wells in the real world that have been already drilled. The method thereby allows obtaining a particularly accurate predictive model. It has indeed been found that values of geoscience well-wise variables can be accurate predictors of the concentration of the element in produced water of hydrocarbon production wells. It has further been found that one can reach a particularly accurate predictive model by machine-learning the model based on values of the geoscience well-wise variables which are available for certain wells of a same hydrocarbon reservoir and are associated to respective ground truth values representing the (true) concentration of the element in those wells. It is thought that the geoscience variable values of a well can explain the concentration of the element in produced water of the well, according to a theoretic geological model which is identical or similar enough among wells of a same hydrocarbon reservoir.

The machine-learning method may comprise performing well-wise measurements to obtain at least part of the values of the one or more geoscience well-wise variables and the respective ground truth values, and/or retrieving at least part of the values of the one or more geoscience well-wise variables and the respective ground truth values from one or more databases. The databases may have been populated upon performing well-wise measurements. Any well-wise measurement herein may have been obtained during a hydrocarbon production process performed on the hydrocarbon reservoir.

It is also proposed a method for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production. The method comprises providing a predictive model learnt according to the machine-learning method. The method further comprises predicting the concentration of the element in the given well by applying the predictive model to values of the one or more geoscience well-wise variables corresponding to the given well.

The method for predicting forms an improved solution for estimating the concentration of an element in a well without measuring it.

This is in particular of interest for a hydrocarbon reservoir before exploitation, or when the direct measurement of the concentration is difficult or impossible. Notably, such predictions are applicable in shale gas production where a reservoir may have many production wells (up to several thousands) and it is cumbersome and costly to measure the concentration in all production wells. In examples, the reservoir may be a shale gas reservoir and/or comprise more than 100, 500 or 1000 production wells.

The prediction method may comprise the method of machine-learning, or it may be performed afterwards. The prediction method may comprise performing measurements respective to the given well to obtain at least part of the values of the one or more geoscience well-wise variables corresponding to the given well, and/or retrieving at least part of the values of the one or more geoscience well-wise variables corresponding to the given well from one or more databases, for example the same one or more databases as before. The databases may have been populated upon performing well-wise measurements. Any measurement respective to the given well may have been obtained during a hydrocarbon production process performed on the hydrocarbon reservoir, for example the same hydrocarbon production process as before or a subsequent one.

The predicted concentration may be a concentration of the element at a certain point in time, i.e. an instantaneous value. The predicted concentration may be interpreted as a concentration at a current time, i.e. at the time when the prediction method is performed, or at a time corresponding to the values of the one or more geoscience well-wise variables corresponding to the given well, for example a time of measurement or of applicability of said values. In case the values correspond to different times, the predicted concentration may be interpreted as a concentration at a time which is any function of those times, for example their average.

It is further proposed such a hydrocarbon production process, comprising producing hydrocarbon with production wells of a hydrocarbon reservoir. The hydrocarbon production process also comprises performing the machine-learning method based on a plurality of production wells of the reservoir, or, alternatively, retrieving/receiving a corresponding machine-learnt predictive model from (e.g. distant) hardware memory. The hydrocarbon process further comprises applying the prediction method for one or more production wells of the reservoir, to predict a concentration of the element for each such production well. The hydrocarbon production process may further comprise any measurement and/or data retrieval mentioned above.

The hydrocarbon process may further comprise adjusting the hydrocarbon production based on the result of the prediction. For example, the selection of a production well for production may amount to selecting a well having a highest (or lowest) predicted value of the concentration of the element, or a value higher (or lower) than a predetermined threshold. Alternatively, the selection may be performed according to a function which depends increasingly (or decreasingly) on the value of the concentration of the element, and optionally other parameters.
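The two selection rules described above (highest predicted value, or value above a predetermined threshold) can be sketched as follows. This is only an illustrative sketch: the well identifiers, the concentrations and the function names `select_wells` and `best_well` are hypothetical and do not come from the described method.

```python
from typing import Dict, List


def select_wells(predicted: Dict[str, float], threshold: float) -> List[str]:
    """Return the wells whose predicted concentration exceeds the threshold,
    ordered from the richest to the poorest."""
    chosen = [well for well, conc in predicted.items() if conc > threshold]
    return sorted(chosen, key=lambda well: predicted[well], reverse=True)


def best_well(predicted: Dict[str, float]) -> str:
    """Return the well with the highest predicted concentration."""
    return max(predicted, key=lambda well: predicted[well])


# Hypothetical predicted concentrations, in mg/L.
predicted_mg_per_l = {"W-001": 12.4, "W-002": 48.9, "W-003": 31.0}

selected = select_wells(predicted_mg_per_l, threshold=30.0)
richest = best_well(predicted_mg_per_l)
```

Selecting the lowest value, or a value below a threshold, would follow the same pattern with the comparison reversed.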

The reservoir may be a shale gas reservoir and/or comprise more than 100, 500 or 1000 production wells. The ground truth value may be available for only part of the production wells. In that case, said part may serve to form the dataset and perform the machine-learning method, while the prediction method may allow an easy estimation of the concentration of the element for the other wells, that is, those wells for which no ground truth value was available.

The hydrocarbon process may further comprise, aside from production of hydrocarbon (i.e., oil and/or gas), extraction of the element from the produced water. The element may be sought to be extracted for its value, i.e., for the sake of production of said element. The element may be hazardous and be sought to be extracted to improve the quality of the produced water. The hydrocarbon process comprises applying the prediction method for one or more production wells of the reservoir, to predict a concentration of the element for each such production well for an extraction feasibility study. The extraction feasibility study may decide, based on the result of the prediction, whether the concentration of the element in the produced water of the production well passes a threshold necessary for the element to be extractable. Alternatively, or additionally, the extraction feasibility study may decide, based on the result of the prediction for the one or more production wells of the reservoir, on a best geographical location for one or more water treatment plants for the reservoir to extract the element. The best geographical locations may be chosen such that the one or more water treatment plants are as close as possible to one or more production wells with produced water richest in the element. This minimizes the costs of transporting and piping.
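One possible way to make the location choice concrete is sketched below. The text does not specify a placement criterion beyond closeness to the richest wells, so the scoring used here (a concentration-weighted sum of straight-line distances to the wells passing the threshold) is an assumption, as are the planar coordinates, the candidate sites and the function name `best_plant_site`.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def best_plant_site(candidates: Dict[str, Point],
                    wells: Dict[str, Point],
                    predicted: Dict[str, float],
                    threshold: float) -> str:
    """Pick the candidate site closest, in a concentration-weighted sense,
    to the wells whose predicted concentration passes the threshold."""
    rich = [w for w, conc in predicted.items() if conc >= threshold]
    if not rich:
        raise ValueError("no well passes the extraction threshold")

    def cost(site: Point) -> float:
        # Assumed criterion: richer wells pull the plant towards them.
        return sum(
            predicted[w] * math.hypot(site[0] - wells[w][0], site[1] - wells[w][1])
            for w in rich
        )

    return min(candidates, key=lambda name: cost(candidates[name]))


# Hypothetical planar coordinates (arbitrary units) and predictions (mg/L).
well_xy = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (100.0, 100.0)}
predicted_conc = {"A": 50.0, "B": 40.0, "C": 5.0}
sites = {"S1": (2.0, 0.0), "S2": (90.0, 90.0)}

chosen_site = best_plant_site(sites, well_xy, predicted_conc, threshold=20.0)
```

A real study would likely use pipeline routes and transport costs rather than straight-line distances.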

According to some examples, a time profile of the produced water of said one or more production wells for a time period may be provided. The amount of produced water may be measured in barrels per day (bpd). The prediction method may thus be followed by an estimation of a value of the total amount of the element in the produced water of said one or more production wells during said time period. For example, the predicted concentration may be multiplied by the total quantity of produced water over said time period (e.g. obtained as a function of the integral of the time profile). The feasibility study and the best geographical locations may depend on the estimated value of the total amount of the element in said one or more production wells.
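The estimation described above can be illustrated with a short sketch that integrates a water production profile with the trapezoidal rule and multiplies the result by the predicted concentration. The sample profile, the concentration and the function names are hypothetical; the conversion assumes a US oil barrel of about 158.987 litres.

```python
from typing import Sequence

LITRES_PER_BARREL = 158.987  # approximate volume of one US oil barrel


def total_water_barrels(days: Sequence[float], rate_bpd: Sequence[float]) -> float:
    """Trapezoidal integral of a water rate profile (bpd) over time (days)."""
    total = 0.0
    for i in range(1, len(days)):
        total += 0.5 * (rate_bpd[i] + rate_bpd[i - 1]) * (days[i] - days[i - 1])
    return total


def total_element_kg(conc_mg_per_l: float,
                     days: Sequence[float],
                     rate_bpd: Sequence[float]) -> float:
    """Total mass of the element (kg), assuming a constant concentration."""
    litres = total_water_barrels(days, rate_bpd) * LITRES_PER_BARREL
    return conc_mg_per_l * litres / 1e6  # mg -> kg


# Hypothetical profile: 100 bpd for 10 days, then declining to 50 bpd.
profile_days = [0.0, 10.0, 20.0]
profile_bpd = [100.0, 100.0, 50.0]

barrels = total_water_barrels(profile_days, profile_bpd)       # 1750 barrels
lithium_kg = total_element_kg(40.0, profile_days, profile_bpd)  # about 11.13 kg
```

The sketch treats the predicted concentration as constant over the period, which is a simplification.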

The element may be one of lithium, cobalt, nickel, or cadmium. The concentration of the element in the produced water, i.e., the corresponding ground truth, may be measured by any known method, for example an Inductively Coupled Plasma (ICP) mass spectrometry. The concentration of the element may be measured, e.g., in mg/L.

Any of the methods is computer-implemented. This means that steps (or substantially all the steps) of the method are executed by at least one computer, or any system alike. Thus, steps of the method are performed by the computer, possibly fully automatically, or, semi-automatically. In examples, the triggering of at least some of the steps of the method may be performed through user-computer interaction. The level of user-computer interaction required may depend on the level of automatism foreseen and put in balance with the need to implement user's wishes. In examples, this level may be user-defined and/or pre-defined.

A typical example of computer-implementation of a method is to perform the method with a system adapted for this purpose. The system may comprise a processor coupled to a memory and a graphical user interface (GUI), the memory having recorded thereon a computer program comprising instructions for performing the method. The memory may also store a database. The memory is any hardware adapted for such storage, possibly comprising several physical distinct parts (e.g. one for the program, and possibly one for the database).

By "database", it is meant any collection of data (i.e. information) organized for search and retrieval (e.g. a relational database, e.g. based on a predetermined structured language, e.g. SQL). When stored on a memory, the database allows a rapid search and retrieval by a computer. Databases are indeed structured to facilitate storage, retrieval, modification, and deletion of data in conjunction with various data-processing operations. The database may consist of a file or set of files that can be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage. Users may retrieve data primarily through queries. Using keywords and sorting commands, users can rapidly search, rearrange, group, and select the field in many records to retrieve or create reports on particular aggregates of data according to the rules of the database management system being used.

FIG. 1 shows an example of the system, wherein the system is a client computer system, e.g. a workstation of a user.

The client computer of the example comprises a central processing unit (CPU) 1010 connected to an internal communication BUS 1000, a random-access memory (RAM) 1070 also connected to the BUS. The client computer is further provided with a graphical processing unit (GPU) 1110 which is associated with a video random access memory 1100 connected to the BUS. Video RAM 1100 is also known in the art as frame buffer. A mass storage device controller 1020 manages accesses to a mass memory device, such as hard drive 1030. Mass memory devices suitable for tangibly embodying computer program instructions and data include all forms of nonvolatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks 1040. Any of the foregoing may be supplemented by, or incorporated in, specially designed ASICs (application-specific integrated circuits). A network adapter 1050 manages accesses to a network 1060. The client computer may also include a haptic device 1090 such as a cursor control device, a keyboard or the like. A cursor control device is used in the client computer to permit the user to selectively position a cursor at any desired location on display 1080. In addition, the cursor control device allows the user to select various commands, and input control signals. The cursor control device includes a number of signal generation devices for inputting control signals to the system. Typically, a cursor control device may be a mouse, the button of the mouse being used to generate the signals. Alternatively or additionally, the client computer system may comprise a sensitive pad, and/or a sensitive screen.

The computer program may comprise instructions executable by a computer, the instructions comprising means for causing the above system to perform the method. The program may be recordable on any data storage medium, including the memory of the system. The program may for example be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The program may be implemented as an apparatus, for example a product tangibly embodied in a machine-readable storage device for execution by a programmable processor. Method steps may be performed by a programmable processor executing a program of instructions to perform functions of the method by operating on input data and generating output. The processor may thus be programmable and coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. The application program may be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired. In any case, the language may be a compiled or interpreted language. The program may be a full installation program or an update program. Application of the program on the system results in any case in instructions for performing the method.

The method of machine-learning comprises providing a dataset comprising values of one or more geoscience well-wise variables. The dataset may be provided upon a user action, for example by loading the dataset in a memory. By "geoscience variables" it is meant the variables of interest in the field of geoscience. The dataset is "well-wise", i.e., the values of one or more geoscience variables are in relation to a well of hydrocarbon production in a hydrocarbon reservoir. The hydrocarbon reservoir may comprise a plurality of hydrocarbon production wells. Each value corresponding to a respective well is associated to a respective ground truth value. The ground truth value represents a concentration of the element in the respective well. By "each value corresponding to a respective well", it is meant the value that quantifies the variable for the respective well, i.e., the respective well has that value of said variable. In an efficient implementation of the method of machine-learning, all values of a respective well are grouped, and the group is associated to the ground truth value representing the measured concentration of the element, for example, as a single line of a database.
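The grouping of all values of a well with its ground truth value, as a single line, can be pictured with the following sketch. The class name `WellSample`, the variable names and the numerical values are hypothetical and only illustrate one possible in-memory layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class WellSample:
    """One line of the dataset: all values of a well plus its ground truth."""
    well_id: str
    variables: Dict[str, float]      # geoscience well-wise variable values
    concentration_mg_per_l: float    # measured concentration (ground truth)


def to_feature_matrix(samples: Sequence[WellSample],
                      columns: Sequence[str]) -> Tuple[List[List[float]], List[float]]:
    """Split the samples into model inputs X and ground truth values y."""
    X = [[s.variables[c] for c in columns] for s in samples]
    y = [s.concentration_mg_per_l for s in samples]
    return X, y


# Hypothetical samples for two wells of the same reservoir.
samples = [
    WellSample("W-001", {"ph": 6.8, "chloride_mg_per_l": 41000.0}, 38.0),
    WellSample("W-002", {"ph": 7.1, "chloride_mg_per_l": 52000.0}, 55.0),
]
X, y = to_feature_matrix(samples, ["ph", "chloride_mg_per_l"])
```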

The method further comprises learning the predictive model based on the dataset. The predictive model may take as input the one or more geoscience well-wise variables, and output a value for the concentration of the element. Each predictive model may be defined by one or more model parameters which define the relationship between the input and the output of the model. The value for the one or more model parameters may be obtained during the learning. In some examples, the predictive model may further comprise one or more hyperparameters. A hyperparameter is used to control the learning of the predictive model, i.e., to obtain the value for the one or more parameters. The value of the one or more (e.g. all) hyperparameters may be set by a user or obtained automatically by the method. The predictive model may be any of the known predictive models in the field of machine-learning. According to some examples of the method, the predictive model may be one of a linear regression learnt with a stochastic gradient descent (SGD), a k-nearest neighbours (KNN) algorithm, a random forest (RF) algorithm, or a multilayer perceptron (MLP). Hereinafter, MLP may be equivalently referred to as neural network (NN). In examples, the RF algorithm may comprise a number of decision trees, for example more than 10, 50, or 100. Each decision tree may have a maximum depth, for example smaller than 100, 50 or 20. The number of decision trees and their depth may be set according to any known method in the literature. In other examples, the structure of the NN may be set according to any known method in the literature.
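As an illustration of one of the listed model families, the sketch below implements a minimal k-nearest neighbours regression in plain Python. It is not the implementation used in the tested examples; the function name `knn_predict`, the two input variables and all numerical values are hypothetical, and the number of neighbours `k` plays the role of a hyperparameter.

```python
import math
from typing import Sequence


def knn_predict(X_train: Sequence[Sequence[float]],
                y_train: Sequence[float],
                x: Sequence[float],
                k: int = 3) -> float:
    """Predict a concentration as the mean ground truth of the k nearest wells."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance in variable space; in practice the variables would
    # usually be rescaled first so that no single unit dominates the distance.
    by_distance = sorted(
        (math.dist(row, x), target) for row, target in zip(X_train, y_train)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical training wells: (pH, water density) -> concentration in mg/L.
X_train = [[7.0, 1.10], [6.5, 1.05], [7.2, 1.12], [5.9, 1.01]]
y_train = [40.0, 20.0, 44.0, 10.0]

prediction = knn_predict(X_train, y_train, [7.1, 1.11], k=2)  # mean of 40 and 44
```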

The predictive model may be learnt according to any known machine-learning method. For that, each data piece of the dataset including values of the one or more geoscience variables in relation to the respective ground truth value forms a respective potential training sample. The training samples represent the diversity of the situations where the predictive model is to be used after being learnt. Any dataset referred to herein (e.g. before or after restriction) may comprise a number of (e.g. potential) training samples higher than 1000, 10000, or 100000.

The learning may comprise a supervised training, which is inputted with at least part of the dataset.

In first examples, said at least part of the dataset is the restriction of the dataset to the values corresponding to a subset of the samples included in the dataset (i.e. above-mentioned grouped variables in the dataset). In other words, not all lines of the dataset are inputted to the training, only a subset thereof.

In second examples, which are combinable with the first examples, said at least part of the dataset is the restriction of the dataset to one or more geoscience well-wise variables. In other words, not all columns of the dataset are inputted to the training, only a subset thereof.

The part of the one or more geoscience variables on which the supervised training is performed may be equivalently called the "key variables". The key variables may comprise the variables most representing the concentration of the element or the variables most significant for a study. The learnt predictive model may then be inputted with all the one or more geoscience variables or only the key variables.

In tested examples of the first examples, of the second examples, and of the combination of the first and second examples, the at least part of the dataset comprised at most 80% of the values or the group of values in the dataset.
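The two kinds of restriction (a subset of lines and a subset of columns) can be combined as in the sketch below. The random choice of lines, the fixed seed, the variable names and the function name `restrict` are assumptions made for illustration; the text only states that at most 80% of the values or groups of values were used.

```python
import random
from typing import Dict, List, Sequence

Row = Dict[str, float]


def restrict(rows: Sequence[Row],
             key_variables: Sequence[str],
             target: str,
             fraction: float = 0.8,
             seed: int = 0) -> List[Row]:
    """Keep a random subset of lines and only the key-variable columns."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)                 # fixed seed for reproducibility
    n_keep = int(len(rows) * fraction)        # never more than `fraction` of the lines
    kept = rng.sample(list(rows), n_keep)
    columns = list(key_variables) + [target]  # the ground truth stays attached
    return [{c: row[c] for c in columns} for row in kept]


# Hypothetical dataset of 10 wells with three variables and a ground truth.
dataset = [
    {"ph": 6.5 + 0.1 * i, "water_saturation": 0.30 + 0.01 * i,
     "depth_m": 2500.0 + 10.0 * i, "lithium_mg_per_l": 30.0 + i}
    for i in range(10)
]
training_part = restrict(dataset, ["ph", "water_saturation"], "lithium_mg_per_l")
```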

Learning the model based on a part of the variables in the dataset forms an improved method of machine-learning, as it enables building a predictive model using specific variables for different stages of production, and thus for different levels of knowledge of the field, i.e., before the values of all geoscience variables in the dataset are filled. For example, before chemical measurements are performed on the produced water, the provided dataset may only comprise values for one or more geographical variables.

As known from the field of machine-learning, each training may comprise iteratively processing a respective dataset, for example mini-batch-by-mini-batch, and modifying model parameters of the predictive model along the iterative processing. This may be performed according to a stochastic gradient descent. The model parameters may be initialized in any arbitrary manner for each training, for example randomly or each to the zero value. The learning may comprise minimizing a loss function, wherein the loss function represents a disparity between the concentration of the element of the training samples in the dataset, i.e., the ground truth value, and the value outputted by the predictive model from the one or more geoscience well-wise variables corresponding to the ground truth value.
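The iterative training described above can be sketched for the simplest case, a linear regression trained by mini-batch stochastic gradient descent. The mean squared error is assumed here as the loss function, since the text does not name a specific one; the learning rate, number of epochs, batch size and toy data are likewise hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sgd_linear_regression(X: Sequence[Sequence[float]],
                          y: Sequence[float],
                          lr: float = 0.05,
                          epochs: int = 1000,
                          batch_size: int = 2,
                          seed: int = 0) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features   # parameters initialized to the zero value
    b = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                # Disparity between model output and ground truth value.
                err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                for j in range(n_features):
                    grad_w[j] += 2.0 * err * X[i][j] / len(batch)
                grad_b += 2.0 * err / len(batch)
            # Gradient step on the mini-batch loss.
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


# Toy data following y = 2x + 1 exactly, so the fit should recover w=2, b=1.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [1.0, 3.0, 5.0, 7.0]
weights, bias = sgd_linear_regression(X_toy, y_toy)
```

With real well data the variables would typically be standardized before training so that a single learning rate suits all of them.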

The method for predicting a concentration of an element comprises providing a predictive model which is, as discussed, learnt based on the dataset. The method then predicts the concentration of the element by inputting the values of the one or more geoscience well-wise variables to the predictive model for the given well to output a predicted value for the concentration of the element in produced water of the given well. In some embodiments, when the predictive model accepts only the key variables, the method for predicting may restrict the inputted values to the corresponding values of the key variables.
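The restriction of the inputs to the key variables at prediction time can be sketched as follows. The learnt model is represented by a plain callable, and the toy coefficients, variable names and function name `predict_for_well` are hypothetical stand-ins rather than values from the described method.

```python
from typing import Callable, Dict, Sequence


def predict_for_well(model: Callable[[Sequence[float]], float],
                     well_values: Dict[str, float],
                     key_variables: Sequence[str]) -> float:
    """Apply a learnt model to the key-variable values of the given well."""
    missing = [k for k in key_variables if k not in well_values]
    if missing:
        raise KeyError(f"missing key variables for this well: {missing}")
    # Keep only the key variables, in the order the model was trained on.
    features = [well_values[k] for k in key_variables]
    return model(features)


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a learnt model (hypothetical coefficients)."""
    return 2.0 * features[0] + 0.25 * features[1]


given_well = {"ph": 7.0, "chloride_g_per_l": 120.0, "depth_m": 2600.0}
predicted = predict_for_well(toy_model, given_well, ["ph", "chloride_g_per_l"])
```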

In examples, the one or more geoscience variables comprise one or more geochemical variables, one or more geological variables, one or more well design variables, and/or one or more production variables. Any of the one or more geoscience well-wise variables may be measured by any known method in engineering and/or chemistry.

According to some examples, the dataset may further comprise one or more identifiers assigned to each well to relate the well to the respective values of one or more geoscience variables. The identifier may be a number assigned to the well. The method may implement any other manner to perform such relation.

According to some examples, the provided dataset may comprise several sets of values for one or more of the one or more geoscience well-wise variables. Each set of values may relate to a different time. According to some specific examples, all of the one or more values in the dataset relate to a same time instant. According to these examples, the dataset may further comprise at least one sampling date. This allows enriching the dataset, while contextualizing the data appropriately.

The one or more production variables may include a date of first production, a gas production value, a volume of injected fracking water, and/or a volume of injected propane. The gas production value may be an average of the gas production of the production well during the year preceding the date of measurement, e.g., in mcf/d. The volume of injected fracking water may be measured in lb/gal relative to the gallons of produced water. The volume of injected propane may be measured in lb/gal relative to the gallons of produced water.

The one or more well design variables may include one or more dimensional variables, one or more positional variables, an identifier of a connected pipeline, and/or one or more elevation variables. The one or more dimensional variables may include a total measured depth of the well which is measured, e.g., in meters, and/or a stage length or a length of completion of the well which is measured, e.g., in ft. The one or more positional variables may include a geographical label (e.g., state and/or country) and/or geographical location data (e.g., longitude and/or latitude measured, e.g., in ft). The geographical location data may be in the conical coordinate systems. The identifier for a connected pipeline may be the name of the pipeline connected to the well. The one or more elevation variables may comprise a Kelly Bushing elevation variable and/or Rotary Table elevation variable. The one or more elevation variables may be measured in ft.

The one or more geological variables may include one or more of a water saturation, and an identifier for fault presence. The water saturation may be a maximum of water saturation in the completed interval of the well. The identifier for fault presence may be a binary number to identify the presence or absence of a fault crossing the well path. In examples, the binary number is 1 to indicate the presence of a fault and is 0 to indicate the absence of a fault.

According to some examples, the one or more geological variables may also include a variable representing a fault density around each well. In some examples, the dataset may further comprise one or more saturation indices for each well. The one or more saturation indices may comprise a pressure of the reservoir, a temperature of the formation, and/or a percentage for one or more minerals in the formation.

The one or more geochemical variables may directly define the composition of the produced water. The one or more geochemical variables may include a variable whose value represents a hardness of the produced water. Alternatively, the one or more geochemical variables may include one or more variables whose value(s), in combination, represent a hardness of the produced water. The one or more geochemical variables may include a pH, an amount of dissolved salt, a water density, a CO2 concentration, a bicarbonate concentration, a chloride concentration, a sulphate concentration, a sodium concentration, a potassium concentration, a magnesium concentration, a calcium concentration, a strontium concentration, a barium concentration, an iron concentration, a hydrogen sulfide concentration, a manganese concentration, and/or a zinc concentration. The density of water may be measured in lb/ft³. The pH may be an indicator of the acidity of the produced water and represents the potential of hydrogen. The amount of dissolved salt may be a value of total dissolved salt (TDS) in the produced water measured in mg/L. The CO2 and bicarbonate concentrations may be measured in percentage. Any of the chloride concentration, the sulphate concentration, the sodium concentration, the potassium concentration, the magnesium concentration, the calcium concentration, the strontium concentration, the barium concentration, the iron concentration, the hydrogen sulfide concentration, the manganese concentration, and/or the zinc concentration may be measured in mg/L. Any of the bicarbonate concentration, the chloride concentration, the sulphate concentration, the sodium concentration, the potassium concentration, the magnesium concentration, the calcium concentration, the strontium concentration, the barium concentration, the iron concentration, the hydrogen sulfide concentration, the manganese concentration, and/or the zinc concentration may be measured by ICP mass spectrometry. The ICP mass spectrometry may involve measurement uncertainties of up to 10%.

In a tested example which provided a particularly accurate predictive model, the one or more geoscience well-wise variables comprise at least or consist of the following exact list: a well identifier, a sampling date, a water density, a pH, a CO2 concentration, a bicarbonate concentration, a value of total dissolved salt, a chloride concentration, a sulphate concentration, a sodium concentration, a potassium concentration, a magnesium concentration, a calcium concentration, a strontium concentration, a barium concentration, an iron concentration, a manganese concentration, a zinc concentration, a maximum water saturation, a stage length, a total measured depth, a state, a country, a fault presence, a date of first production, a gas production value, an identifier of a connected pipeline, a volume of injected fracking water, a latitude and a longitude, a volume of injected propane, a Kelly Bushing elevation variable, and/or a Rotary Table elevation variable.

The provided dataset may comprise missing values of one or more variables for a number of wells of hydrocarbon production in the hydrocarbon reservoir. This may be due to the fact that hydrocarbon production processes involve numerous players and actions, such that databases are huge and most often comprise such missing values. According to some examples, the method may comprise determining a respective filling value for each missing value, that is, a value that fills the gap left by the missing value in the dataset. The determined filling values may be inserted in the dataset. The determination of a respective filling value may be the first step of the method of machine-learning after providing the dataset. Other steps of the method, including the learning of the predictive model, may thus be based on a "prepared" (i.e. pre-processed) dataset. By "prepared dataset", it is meant the provided dataset after the filling values have been determined and inserted.

The one or more variables may include the following variables which each have at least one missing value: a fault presence, an identifier of a connected pipeline, a volume of injected fracking water, a volume of injected propane, and/or one or more elevation variables. These variables are known to often have missing values in the field of hydrocarbon production. Yet, it has been found that values of these variables participate quite well in an accurate prediction of the concentration of elements in produced water. Thus, by providing filling values for these variables, the method avoids excluding them from the dataset. In this way, the machine-learning can benefit from the valuable information conveyed by the present (i.e. non-missing) values of these variables.

The number of missing values of the dataset may be lower than 30% of the total number of values in the dataset, for example lower than 20% of the total number of values in the dataset. According to some examples, the number of missing values for each one of the one or more geoscience well-wise variables may be lower than 30% of the total number of values of the respective variable in the dataset, for example lower than 20%. According to particularly accurate examples, the number of missing values in the dataset and/or the number of missing values corresponding to each variable may be lower than 2%, for example lower than 1%.
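As an illustration only, the proportion of missing values may be checked as in the following Python sketch. The function name `missing_fraction`, the representation of the dataset as a list of dictionaries and the use of `None` to mark a missing value are assumptions made for the example.

```python
def missing_fraction(rows, variables):
    """Return (overall fraction, per-variable fraction) of missing values.

    rows is a list of dicts, one per group of values; a missing value is
    represented by None or by an absent key (assumption of this sketch).
    """
    per_variable = {}
    total_missing = 0
    for name in variables:
        missing = sum(1 for row in rows if row.get(name) is None)
        per_variable[name] = missing / len(rows)
        total_missing += missing
    overall = total_missing / (len(rows) * len(variables))
    return overall, per_variable
```

The returned fractions can then be compared to the thresholds discussed above, for example 0.30 or 0.20 for the whole dataset and for each variable.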

Determining a given respective filling value for a given missing value of a given variable for the given well may comprise setting the given filling value to a predefined value. According to some examples, a missing value for a fault presence is set to 0, indicating the absence of a fault.

Determining a given respective filling value for a given missing value of a given variable for the given well may comprise inferring the given missing value from one or more values of the given variable in the dataset. The one or more values of the given variable in the dataset may comprise the values corresponding to the values of one or more groups of variables of the neighbouring wells. By "neighbouring wells", it is meant the wells with the closest geographical location to the well with the missing value. In some examples, the neighbouring wells may be the one or more wells closer to the given well than a geographical distance threshold. The geographical distance threshold for choosing the neighbouring wells may be defined by the method or predefined by the user. The threshold may be set for example to 20 km, such that the wells located less than 20 km from the given well are considered as neighbouring wells. The threshold may take any value between 2 km and 50 km. In other examples, the neighbouring wells may be a number of closest wells to the given well. The number may be set to be smaller than 50, for example 10; then the 10 closest wells in the reservoir to the given well are the neighbouring wells.
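The two ways of choosing neighbouring wells (distance threshold or number of closest wells) may be sketched as follows. This Python example is illustrative only: it assumes that well positions are available as latitude and longitude in decimal degrees and uses a great-circle (haversine) distance, whereas the dataset may store positions in another coordinate system. The names `haversine_km` and `neighbouring_wells` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, sufficient for a sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def neighbouring_wells(given_id, positions, max_distance_km=None, k=None):
    """Return the identifiers of the other wells, closest first.

    positions maps a well identifier to (latitude, longitude).
    max_distance_km keeps only wells closer than the threshold;
    k keeps only the k closest wells. Both filters are optional.
    """
    lat0, lon0 = positions[given_id]
    ranked = sorted(
        (haversine_km(lat0, lon0, lat, lon), well_id)
        for well_id, (lat, lon) in positions.items()
        if well_id != given_id
    )
    if max_distance_km is not None:
        ranked = [(d, w) for d, w in ranked if d < max_distance_km]
    if k is not None:
        ranked = ranked[:k]
    return [w for _, w in ranked]
```

For instance, `neighbouring_wells("W1", positions, max_distance_km=20)` corresponds to the 20 km threshold example, and `neighbouring_wells("W1", positions, k=10)` to the 10 closest wells.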

According to some examples, the inferring may be performed from historical values of the given variable for the given well, and/or values of the given variable for neighbouring wells. The historical values for inferring may be the most recent ones to the corresponding sampling date of the missing value. The historical values may be retrieved from one or more databases.

According to some examples, the inferring may comprise computing a mean, or applying a machine-learnt inference model. The mean may be taken over one or more historical values of the given variable for the given well, and/or one or more values of the given variable for neighbouring wells. The machine-learnt inference model may be a regression, for example a linear, a quadratic, or a logistic regression.
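A minimal sketch of the mean-based inference, given for illustration only, is shown below. The function name `fill_missing_with_mean` and the use of `None` for unavailable values are assumptions of the example.

```python
from statistics import mean

def fill_missing_with_mean(historical_values, neighbour_values):
    """Return the mean of the available values, or None if none is available.

    historical_values: earlier values of the variable for the given well.
    neighbour_values: values of the same variable for neighbouring wells.
    Entries equal to None are treated as missing and ignored.
    """
    available = [
        v for v in list(historical_values) + list(neighbour_values) if v is not None
    ]
    return mean(available) if available else None
```

Either argument may be an empty list, so the same function covers a mean over historical values only, over neighbouring wells only, or over both.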

Such examples of pre-processing the dataset allow determining consistent filling values, thus limiting noise introduced in the dataset. As a result, the predictive model remains accurate.

The method of machine-learning may further comprise, prior to the learning, analysing the dataset to obtain a set of key variables from the one or more geoscience variables. Thus, the learning is based on a restriction of the dataset to the key variables associated to the respective ground truth value, by keeping only the values of the dataset corresponding to the key variables. As mentioned above, the key variables may be the variables of the dataset which are most representative of the concentration of the element. According to some examples of the method, the key variables are the variables most correlated with the concentration of the element. The correlation between the variables may be computed based on the corresponding values of the variables in the dataset and according to any known method for computing the correlation.
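As one possible illustration of a correlation-based selection of key variables, the following Python sketch ranks variables by the absolute value of their Pearson correlation with the concentration of the element. The Pearson coefficient is only one of the known methods for computing a correlation, and the names `pearson` and `select_key_variables` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant variable carries no correlation information.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_key_variables(columns, target, n_keep):
    """Return the n_keep variable names most correlated (in absolute value) with target.

    columns maps a variable name to its list of values, aligned with target.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]
```

The absolute value is used so that strongly inversely correlated variables are retained as well as positively correlated ones.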

According to some examples, when the dataset comprises missing values of one or more variables for a number of wells, the analysing may be performed after determining a respective filling value for each missing value as explained above.

Selecting the key variables restricts the variables used in the learning to the most explanatory ones. This may increase efficiency.

In examples, the analysing of the dataset to obtain the set of key variables may comprise performing at least one of principal component analysis (PCA), correspondence analysis (CA), multiple correspondence analysis (MCA), and classification by clustering. The analysing may comprise performing a PCA as a dimensional reduction technique and choosing fewer than five important variables (e.g., two important variables) as known in the literature. In other examples, the analysing of the dataset to obtain the set of key variables may comprise an analysis of the correlation of the variables with lithium. The correlations with barium and strontium are consistent with the measured lithium isotopy d7Li = 2, which points to a crustal origin via hydrothermal fluids.

Implementations of the methods that were tested are now discussed with reference to FIGs. 2 to 15.

In these tested implementations, the element of interest is lithium and the method for prediction predicts its concentration in the produced water of a given well belonging to a test sample taken from the provided dataset.

The dataset comprises 21000 groups of values corresponding to the variables of the exact list defined earlier. Each group of values is associated with a ground truth value representing the concentration of lithium for one of 1300 production wells in a reservoir. The concentration may be predicted in mg/L.

The provided dataset comprises several measurements for each well; thus, each well identifier is not unique in the dataset. In this case, optionally only one of the measurements for each well, for example the latest one, or an average of all of them may be used in training the predictive model. This option for reducing the dataset, i.e., unifying well identifiers such that each well has an equal representation, provides 1300 groups of values, each corresponding to the variables associated to one of the production wells in the reservoir. This subset of the dataset with unique well identifiers may be referred to as a reduced dataset.
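The option of keeping only the latest measurement for each well may be sketched as follows. This Python example is illustrative only; the record layout (dictionaries with hypothetical `well_id` and `sampling_date` keys) and the function name `reduce_to_latest` are assumptions.

```python
from datetime import date

def reduce_to_latest(records):
    """Keep, for each well identifier, only the record with the latest sampling date.

    records is a list of dicts with at least the keys "well_id" and
    "sampling_date" (a datetime.date); other keys hold variable values.
    """
    latest = {}
    for rec in records:
        well = rec["well_id"]
        if well not in latest or rec["sampling_date"] > latest[well]["sampling_date"]:
            latest[well] = rec
    return list(latest.values())

# Hypothetical example: two measurements for well 1, one for well 2.
example = [
    {"well_id": 1, "sampling_date": date(2019, 1, 1), "li_mg_per_l": 30.0},
    {"well_id": 1, "sampling_date": date(2020, 1, 1), "li_mg_per_l": 35.0},
    {"well_id": 2, "sampling_date": date(2018, 6, 1), "li_mg_per_l": 80.0},
]
reduced = reduce_to_latest(example)  # one record per well
```

The averaging option would instead group the records by well identifier and replace each numeric variable with its mean over the group.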

The use of the provided dataset and the use of the reduced dataset are compared in the results below.

The method of machine-learning may start by performing a PCA on geochemical variables to obtain one or more dimensions, i.e., axes. Each of the dimensions obtained in PCA may be a linear combination of the variables of the dataset analyzed by PCA. In reference to FIG. 2, the percentage on each axis represents the percentage of variance explained by that dimension (represented by columns). The variance of the dataset explained by the first two axes is 56.2%.

Alternatively, the machine-learning may be inputted with all the available data in the dataset, that is, with no restriction of the dataset to the key variables. Tests for this alternative were also performed and provided better results (FIGs. 7 and 8).

FIG. 3 represents the correlation of each of the geochemical variables with the first and second axes of FIG. 2, i.e., the two best axes. The correlation circle is the visual representation of the contributions of the variables on the two best axes. Each arrow represents one variable. Its direction represents its affinity with each axis and its length represents its contribution (i.e., importance) on the axis. Parallel variables are correlated: those in the same direction positively, those in the opposite direction negatively. The projection on axis 2 of the concentration of lithium is correlated to that of barium, strontium and inversely to that of sulfate. Further, the projections on axis 2 of the density and concentration of chloride, sodium, borate, calcium, magnesium, potassium, and hardness are correlated and they are all inversely correlated to the projection on axis 2 of the pH of the water.

FIG. 4 presents a projection of the dataset points on the two best axes. The points were separated into three classes according to their lithium concentration: higher than 100 mg/L (A), higher than 40 mg/L (B), and lower than 40 mg/L (C). The directions of correlation are the same as in the correlation circle of FIG. 3. The points with a high lithium concentration are found in the same direction as in the correlation circle. The majority of the points spread on axis 1 and a part on axis 2 approximately in the direction of the correlation with lithium. The wells with a high lithium concentration are mostly clustered at the top.

The model is trained on restrictions of the dataset to three sets of variables. There is a first group I composed of the three variables most correlated to lithium (barium, strontium and sulfate), a second group II composed of all the geochemical variables of the exact list defined earlier, and a third group III composed of all variables of the exact list defined earlier.

Further, the implementations comprise five different predictive models to train on the dataset and the sub-dataset: a linear regression optimized with stochastic gradient descent (SGD), a k-nearest neighbours (KNN) algorithm, a random forest (RF) algorithm and a neural network (NN). The RF algorithm comprises 100 decision trees with a maximum depth of 10.

In tested examples, the supervised learning was performed on the values of the dataset corresponding to at most 80% of the group of values in the dataset/reduced dataset, and the associated ground truth values.

For each model and for each dataset or sub-dataset, the implementations may train a group of models, e.g., five groups of models. Training a group of models on the same dataset is a technique called cross-validation. It consists of randomly choosing N Training-Validation pairs in the original dataset, training one model on each training set, and calculating its performance on the corresponding validation set. Means, standard deviations or any other type of analysis can then be computed on the results in order to obtain a more complete and stable analysis of the models' behavior. Cross-validation may be performed according to the known methods in the literature.

In reference to FIG. 5, the details of cross-validation are presented. Each dataset may be divided into five Training-Validation pairs, with training representing 80% of the information (i.e., group of variables in the dataset/reduced dataset and their associated ground truth values) and validation on the remaining 20%. The class in FIG. 5 refers to the classes A to C presented in FIG. 4.

When using the dataset with more than one measurement per well, the implementations may prevent the measurements from the same well from being divided between the training set and the validation set, in order to avoid over-fitting. In reference to FIG. 6, a splitting method may be used to keep all the values corresponding to a well in a same set. The class in FIG. 6 refers to the classes A to C presented in FIG. 4, and the sample index on the horizontal axis is presented in percentage.
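A splitting method that keeps all the values of a well in a same set may be sketched as follows. This Python example is an illustrative assumption and not the splitting method of FIG. 6: the function name `group_folds` is hypothetical, and wells are simply distributed in turn over the folds after a random shuffle, so that with five folds each validation set holds roughly 20% of the wells.

```python
import random

def group_folds(well_ids, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for cross-validation.

    well_ids gives the well identifier of each group of values in the dataset.
    All groups of values of a given well fall in the same fold, so no well
    appears in both the training set and the validation set of a pair.
    """
    unique_wells = sorted(set(well_ids))
    random.Random(seed).shuffle(unique_wells)
    # Assign wells to folds in turn after shuffling.
    fold_of_well = {w: i % n_folds for i, w in enumerate(unique_wells)}
    for fold in range(n_folds):
        val = [i for i, w in enumerate(well_ids) if fold_of_well[w] == fold]
        train = [i for i, w in enumerate(well_ids) if fold_of_well[w] != fold]
        yield train, val
```

Because the folds are balanced by number of wells and not by number of measurements, the training share is only approximately 80% when wells have unequal numbers of measurements.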

The error used to train and validate the models may be the mean absolute error (MAE), i.e. the mean, over the samples, of the absolute distance between the actual value of the concentration of the element and the prediction. This error has the advantage of being easily comparable to a real application, since it is expressed in the same unit as the concentration.
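For illustration, the MAE may be computed as in the following Python sketch; the function name `mean_absolute_error` is chosen for the example.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all samples (same unit as the inputs)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

With concentrations expressed in mg/L, the returned error is also in mg/L.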

A comparison of different predictive models and of different sets of variables is now explained in reference to FIGs. 7 and 8.

FIG. 7 presents the results obtained using the reduced dataset and FIG. 8 presents the results for the whole dataset, for each group I to III (from the top to the bottom). The number recorded for each model is the mean MAE and the black bar is the variance of the cross-validation errors over the five groups of the cross-validation. The values at 0 indicate results that are too irrelevant to be displayed. The "Mean" entry in FIGs. 7 and 8 refers to using the mean of the ground truth values as the predicted concentration.

According to the results of FIGs. 7 and 8, the RF algorithm trained on the reduced dataset performs the best for all three sets of variables available. The use of a reduced dataset containing each well only once induces better performance than the use of the complete dataset. It is thought that a reduced dataset eliminates the possible problem of over-representation, i.e., over-fitting, and introduces less noise in the data, such that the predictor is eventually more accurate.

FIGs. 9 to 14 present the results of this RF algorithm trained on all variables of the reduced dataset (group III).

FIG. 9 presents a scatter plot on the validation set between actual and predicted concentrations. The points below the red line are underestimated and those above are overestimated. The distribution of points shows that the prediction model correctly estimates the real concentrations. The conical shape of the distribution shows that the error increases for higher lithium concentrations. This is due to the under-representation of wells with high lithium concentrations in the dataset, see class A in FIG. 4.

FIG. 10 presents a confusion matrix on the validation set between actual and predicted concentration classes. The small number of points present in category A (concentrations greater than 100 mg/L) makes their error proportionally small compared to the other groups, although 5 of the 6 points were misclassified as category B. This is due to the fact that the objective of the model is to minimize the distance to the actual concentrations (see MAE) rather than minimizing the error by class.
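The construction of such a confusion matrix between actual and predicted concentration classes may be illustrated by the following Python sketch. The class boundaries follow the classes A to C (above 100 mg/L, above 40 mg/L, otherwise); assigning a value of exactly 40 mg/L to class C is an assumption of the example, as are the function names.

```python
def concentration_class(value_mg_per_l):
    """Map a lithium concentration in mg/L to class "A", "B" or "C"."""
    if value_mg_per_l > 100:
        return "A"
    if value_mg_per_l > 40:
        return "B"
    return "C"  # assumption: a value of exactly 40 mg/L falls in class C

def confusion_matrix(actual, predicted, labels=("A", "B", "C")):
    """Count samples per (actual class, predicted class) pair.

    Returns a nested dict: matrix[actual_class][predicted_class] -> count.
    """
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[concentration_class(a)][concentration_class(p)] += 1
    return matrix
```

In this sketch the off-diagonal entry `matrix["A"]["B"]` counts wells of class A whose predicted concentration falls in class B.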

FIG. 11 presents a histogram comparing the distributions between the set of actual (blue) known concentrations and the set of predicted (orange) concentrations whose actual value is not measured. The histogram shows that the distributions of actual and predicted lithium concentrations both follow a normal distribution, with the expected value, i.e., the mean, of the predicted values slightly higher than that of the actual values. The variance of the predicted concentration values is lower than the variance of the actual data. The model therefore tends to predict values closer to the mean.

FIG. 12 presents a histogram comparing the distributions in the validation set between actual and predicted concentrations. In practice, only wells with a lithium (Li) concentration higher than 40 mg/L may be considered exploitable based on the selected extraction technologies. The lithium in the produced water may be found in the form of lithium carbonate, where one gram of lithium may be extracted from 5.323 grams of lithium carbonate. FIG. 13 presents a lithium carbonate forecast profile, proportional to the forecasted water production. FIG. 14 presents a forecasted lithium carbonate production (tons) year by year for wells with a predicted concentration above 40 mg/L (right).
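As a worked illustration of the conversion between a predicted lithium concentration and a lithium carbonate equivalent, the following Python sketch applies the factor of 5.323 to a forecasted volume of produced water. The function name, the use of cubic metres and metric tonnes, and the assumption that all lithium is recovered are hypothetical simplifications.

```python
LI2CO3_PER_LI = 5.323  # grams of lithium carbonate per gram of lithium

def lithium_carbonate_tonnes(li_mg_per_l, water_m3):
    """Lithium carbonate equivalent, in metric tonnes, of the lithium contained
    in water_m3 cubic metres of produced water at li_mg_per_l mg/L.

    Assumes full recovery of the lithium (simplification of this sketch).
    """
    # mg/L * m3 = g, since 1 m3 = 1000 L and 1 g = 1000 mg.
    li_grams = li_mg_per_l * water_m3
    return li_grams * LI2CO3_PER_LI / 1e6  # grams to tonnes
```

For example, 100000 m3 of produced water at 50 mg/L contains 5 tonnes of lithium, i.e. about 26.6 tonnes of lithium carbonate equivalent under these assumptions.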

The methods were also tested when there exist some missing values for the variables of the exact list defined earlier and for a subset of the variables in the dataset. The subset of variables may be selected to have the maximum completeness, i.e., fewest missing values. In the second example PCA is not applied. The missing values may be filled by inferring from the values of the neighboring wells and/or from historical values. The filling method ensures that the dataset does not introduce a bias or noise into the model's training.

The method trains an RF algorithm as described above on the values of the dataset corresponding to 1300 wells and obtains similarly good performance. FIGs. 15A-B present (superpositions of) scatter plots and confusion matrices on the validation set between actual and predicted concentrations and concentration classes, for a confusion threshold equal to 40 (FIG. 15A) and a confusion threshold equal to 100 (FIG. 15B). The values on the plots present the frequency percentage of each of the prediction regions of the validation set. The superposition of a confusion matrix with the comparison of predictions with their ground truth enables a quick review of the misclassified elements, i.e., the False Positives and False Negatives.

Claims

CLAIMS

1. A computer-implemented method of machine-learning a predictive model configured for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production, the method comprising: providing a dataset comprising values of one or more geoscience well-wise variables, each value corresponding to a respective well of hydrocarbon production in the hydrocarbon reservoir other than the given well, each value that corresponds to a respective well being associated to a respective ground truth value representing a concentration of the element in the respective well; and learning the predictive model based on the dataset.

2. The method of claim 1, wherein the one or more geoscience variables comprise: one or more geochemical variables, one or more geological variables, one or more well design variables, and/or one or more production variables.

3. The method of any of claims 1 or 2, wherein: the one or more geochemical variables include: a pH, an amount of dissolved salt, a water density, a CO2 concentration, a bicarbonate concentration, a chloride concentration, a sulphate concentration, a sodium concentration, a potassium concentration, a magnesium concentration, a calcium concentration, a strontium concentration, a barium concentration, an iron concentration, a hydrogen sulfide concentration, a manganese concentration, and/or a zinc concentration; the one or more geological variables include: a water saturation, and/or an identifier for fault presence; the one or more well design variables include: one or more dimensional variables, one or more positional variables, an identifier of a connected pipeline, and/or one or more elevation variables; and/or the one or more production variables include: a date of first production, a gas production value, a volume of injected fracking water, a volume of injected propane.

4. The method of any of claims 1 to 3, wherein the dataset comprises missing values of one or more variables for a number of wells of hydrocarbon production in the hydrocarbon reservoir, and the method comprises determining a respective filling value for each missing value.

5. The method of claim 4, wherein the one or more variables include the following variables which each have at least one missing value: a fault presence, an identifier of a connected pipeline, a volume of injected fracking water, a volume of injected propane, and/or one or more elevation variables.

6. The method of any of claims 4 and 5, wherein the number of missing values of the dataset is lower than 30% of the total number of values in the dataset, for example lower than 20% of the total number of values in the dataset.

7. The method of any of claims 4 to 6, wherein determining a given respective filling value for a given missing value of a given variable for the given well comprises inferring the given missing value from one or more values of the given variable in the dataset.

8. The method of claim 7, wherein the inferring is performed from: historical values of the given variable for the given well, and/or values of the given variable for neighbouring wells.

9. The method of claim 7 or 8, wherein the inferring comprises: computing a mean; or applying a machine-learnt inference model.

10. The method of any of claims 1 to 9, wherein the method further comprises, prior to the learning, analysing the dataset to obtain a set of key variables from the one or more geoscience variables, the learning being then based on a restriction of the dataset to the key variables associated to the respective ground truth value.

11. The method of any of claims 1 to 11, wherein the element is one of lithium, cobalt, nickel, or cadmium.

12. A computer-implemented method for predicting a concentration of an element in produced water of a given well of hydrocarbon production in a hydrocarbon reservoir having wells of hydrocarbon production, the method comprising: providing a predictive model learnt according to the method of any one of claims 1 to 11; and predicting the concentration of the element in the given well by applying the predictive model to values of the one or more geoscience well-wise variables corresponding to the given well.

13. A computer program comprising instructions for performing the method according to any one of claims 1 to 11 and/or the method according to claim 12.

14. A device including a computer readable storage medium having recorded thereon the computer program of claim 13.

15. The device of claim 14, wherein the device further comprises a processor coupled to a memory and a graphical user interface, the memory having recorded thereon the computer program of claim 13.