CN114723095A

CN114723095A - Missing well logging curve prediction method and device

Info

Publication number: CN114723095A
Application number: CN202110008586.6A
Authority: CN
Inventors: 冯周; 武宏亮; 徐彬森; 王克文; 刘鹏; 李雨生
Original assignee: Petrochina Co Ltd
Current assignee: Petrochina Co Ltd
Priority date: 2021-01-05
Filing date: 2021-01-05
Publication date: 2022-07-08

Abstract

The invention discloses a method and a device for predicting a missing well logging curve. The method comprises the following steps: acquiring logging curves, horizon data, well position data and geological information of a research area; selecting the logging curve combination used for establishing a machine learning network model according to the correlation, in well sections of the research area with complete logging curves, between a preset curve to be predicted and the other logging curves; determining the weights of the sample well data used for machine learning model training according to the logging curves, horizon data, well position data and geological information of the research area; constructing a machine learning network model; training and verifying the machine learning network model with the sample well data; and predicting with the trained and verified machine learning network model, from the known logging curves of a well to be processed in the research area, to obtain the missing logging curve of the well to be processed. The method and the device can improve the accuracy of missing-curve prediction by machine learning.

Description

Missing well logging curve prediction method and device

Technical Field

The invention relates to the technical field of logging curve prediction of complex lithologic reservoirs such as carbonate rock, volcanic rock, shale and the like, in particular to a missing logging curve prediction method and a missing logging curve prediction device.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Well logging is an important technical means for oil and gas exploration and development. By processing and analyzing measured curves of the geophysical properties of subsurface rock (acoustic, radioactive, electrical and the like), oil and gas reservoirs can be identified qualitatively and their parameters calculated quantitatively, which provides key data for comprehensive reservoir evaluation. However, because subsurface conditions are complex, problems that are hard to predict or avoid, such as borehole enlargement and instrument failure, arise during measurement. Together with human factors such as improper logging operation and economic considerations, this means that in practice the logging data of some well sections is often distorted or missing. Missing sections, or even entire missing logging curves, pose great challenges to reservoir logging evaluation, and curve prediction is a common technical approach to the problem.

Traditional missing logging curve prediction relies directly on the internal relationships among the various logging data. For example, an empirical relationship between the curve to be predicted and one or more known curves is determined by methods such as cross plots and multiple regression. However, because subsurface conditions are complex and strongly heterogeneous, logging data often exhibit strongly nonlinear relationships, the mapping between the data is extremely complex, and the practical results are poor. In recent years, with the wide application of machine learning in science and engineering, many researchers have proposed data-driven methods for geological problems, such as logging curve prediction using support vector machines (SVM), fuzzy logic models (FLM) and artificial neural networks (ANN). These methods essentially construct a point-to-point or depth-sequence mapping. They do not consider how the sample data used to build the prediction model relate to, and differ from, the well to be predicted in terms of reservoir geological structure, formation lithology changes and the like. This runs contrary to practical geological analysis experience and geological reasoning, so the accuracy of the predicted logging curves is low.

Disclosure of Invention

The embodiment of the invention provides a method for predicting a missing well logging curve, which is used for improving the accuracy of predicting and generating the well logging curve and comprises the following steps:

acquiring a logging curve, horizon data, well position data and geological information of a research area, wherein the geological information comprises geological structures or sedimentary facies or lithofacies paleogeography;

selecting a logging curve combination used for establishing a machine learning network model according to the correlation, in well sections of the research area with complete logging curves, between a preset curve to be predicted and the other logging curves;

determining the weights of sample well data used for machine learning model training according to the logging curves, horizon data, well position data and geological information of the research area, wherein a sample well is a well in the research area whose logging curves are complete, and the sample well data are the logging curve values at different depth positions in the sample well;

constructing a machine learning network model;

training and verifying the constructed machine learning network model by using sample well data;

and predicting the missing well logging curve by using the trained and verified machine learning network model according to the known well logging curve of the well to be processed in the research area to obtain the missing well logging curve of the well to be processed.

The embodiment of the invention also provides a device for predicting the missing well logging curve, which is used for improving the accuracy of predicting and generating the well logging curve, and comprises the following components:

an information acquisition module, used for acquiring logging curves, horizon data, well position data and geological information of a research area, wherein the geological information comprises geological structure or sedimentary facies or lithofacies paleogeography;

a curve combination selection module, used for selecting a logging curve combination used for establishing a machine learning network model according to the correlation, in well sections of the research area with complete logging curves, between a preset curve to be predicted and the other logging curves;

a weight determination module, used for determining the weights of sample well data used for machine learning model training according to the logging curves, horizon data, well position data and geological information of the research area, wherein a sample well is a well in the research area whose logging curves are complete, and the sample well data are the logging curve values at different depth positions in the sample well;

the machine learning network model building module is used for building a machine learning network model;

the training and verifying module is used for training and verifying the machine learning network model by using the sample well data;

and the prediction module is used for predicting the missing well logging curve by utilizing the trained and verified machine learning network model according to the known well logging curve of the well to be processed in the research area to obtain the missing well logging curve of the well to be processed.

The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the missing well logging curve prediction method when executing the computer program.

Embodiments of the present invention further provide a computer-readable storage medium, where a computer program for executing the above-mentioned missing log prediction method is stored in the computer-readable storage medium.

Compared with the prior art, the embodiment of the invention provides a method for weighting the sample data set used for establishing the logging curve prediction model, and the weight of the sample well data used for training the machine learning model is determined according to the logging curve, the horizon data, the well location data and the geological information of the research area, so that the contribution degree of the training sample well data in the prediction model training is increased, and the precision of predicting the missing curve by using the machine learning method is obviously improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts. In the drawings:

FIG. 1 is a flow chart of a missing log prediction method in an embodiment of the present invention;

FIG. 2 is a more detailed flow chart of missing log prediction in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an LSTM unit in an embodiment of the present invention;

FIG. 4 is a diagram of a data information transfer process corresponding to five layers of LSTMs in an embodiment of the present invention;

FIG. 5 is a schematic illustration of a weight calculation based on well distance in an embodiment of the present invention;

FIG. 6 is a schematic diagram of weight calculation based on geologic properties according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating the effect of missing curve prediction according to the present invention;

FIG. 8 is a comparison of the predicted effect of the method of the present invention compared to the conventional method in the embodiment of the present invention;

FIG. 9 is a block diagram of a missing log prediction apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

In order to achieve the above object, the present invention provides the following technical solutions, as shown in fig. 1 and fig. 2:

Step 102: acquiring logging curves, horizon data, well position data and geological information of the research area, wherein the geological information comprises geological structure or sedimentary facies or lithofacies paleogeography. Data such as core experiment analyses may also be included.

Specifically, the logging curves may include natural gamma, gamma spectrum, resistivity, three-porosity curves and the like; the well position data are the coordinates of each well in the work area and can be used to determine the positions of all wells and the distances between them; the horizon data are the geological stratification of each well; geological structure, sedimentary facies and lithofacies paleogeography refer to the geological structure maps, sedimentary facies maps of each layer system, lithofacies paleogeographic maps and the like produced by geological research in the research area; and the results of the invention can be further verified with core experiment analysis data.

Step 104: selecting a logging curve combination used for establishing a machine learning network model according to the correlation, in well sections of the research area with complete logging curves, between a preset curve to be predicted and the other logging curves.

Specifically, common well logging curves include acoustic time difference (AC), neutron (CNL), density (DEN), natural gamma (GR), uranium-free gamma (CGR), deep resistivity (RD), shallow resistivity (RS), photoelectric absorption cross-section index (PE) and the like, and are denoted as curves $X_{i_1}$. The curves to be predicted are denoted $Y_{j_1}$. Here $i_1$ is the index of a logging curve ($1 \le i_1 \le n$, where $n$ is the total number of logging curves) and $j_1$ is the index of a curve to be predicted ($1 \le j_1 \le k_1$, where $k_1$ is the total number of curves to be predicted).

In the well sections with complete logging curves, suppose that for any $X_i$ and $Y_j$ the Pearson correlation coefficient

$$\rho_{ij} = \frac{E\left[(X_i - E(X_i))\,(Y_j - E(Y_j))\right]}{\sqrt{D(X_i)}\,\sqrt{D(Y_j)}}$$

exists. The $n \times k_1$ matrix with elements $\rho_{ij}$ is then called the correlation matrix and is denoted $R$:

$$R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1k_1} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2k_1} \\ \vdots & \vdots & & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & \rho_{nk_1} \end{pmatrix}$$

wherein $E(X_i)$ and $E(Y_j)$ denote the expectations of the variables $X_i$ and $Y_j$, and $D(X_i)$ and $D(Y_j)$ denote their variances.

The correlation between each logging curve and the curve to be predicted can be evaluated directly and quantitatively from the coefficient in the corresponding row and column of the correlation matrix. In general, the magnitude of the correlation coefficient indicates whether the degree of correlation between two curves is low, moderate or high. By setting a correlation coefficient threshold, the logging curve combination with high correlation can therefore be selected for predicting each target curve. The correlation coefficient threshold may be set to 0.3.
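The selection in step 104 can be illustrated with a minimal Python sketch. It computes the Pearson correlation coefficient between each candidate curve and the curve to be predicted and keeps the curves whose absolute coefficient exceeds the threshold of 0.3. The function names `pearson` and `select_curves` and the toy values are hypothetical and are not taken from the patent.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def select_curves(inputs, target, threshold=0.3):
    """Keep the input curves whose |rho| with the target exceeds the threshold."""
    rho = {name: pearson(values, target) for name, values in inputs.items()}
    selected = [name for name, r in rho.items() if abs(r) > threshold]
    return selected, rho

# Hypothetical toy data sampled at five depth points (not from the patent).
logs = {
    "AC": [1.0, 2.0, 3.0, 4.0, 5.0],
    "DEN": [5.0, 4.0, 3.5, 2.0, 1.0],
    "PE": [1.0, -1.0, 0.0, -1.0, 1.0],
}
dts = [2.1, 3.9, 6.2, 8.0, 9.8]
selected, rho = select_curves(logs, dts)
```

Comparing the absolute value with the threshold keeps strongly anti-correlated curves (such as density in this toy data) as well as positively correlated ones.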

Step 106: determining weights of sample well data used for machine learning model training according to the well logging curves, horizon data, well location data and geological information of the research area, wherein the sample well data is an optimal well logging curve combination.

The existing method for predicting the missing curve by using machine learning is mainly developed from the aspects of a machine learning method and a network structure, and does not consider the relevance and the difference between sample data used for establishing a prediction model and a well to be predicted, such as the geological structure of an oil and gas reservoir, the lithological change of a stratum and the like, so that the applicability is limited, and the prediction precision is low. The invention provides a train sample weight changing idea and a method aiming at the problem, and optionally, the weights of data points in sample wells for machine learning model training can be determined by one or more of the following methods.

The sample wells mentioned below refer to typical wells in the study area whose well logs are in the same well, while the sample well data refer to well log values at different depth locations in the sample well.

The well to be treated is a well with a missing log.

1) The weight of each sample well data point is determined using the distance between the well to be processed and the sample well.

In general, within an oil field block the distance between two wells reflects their similarity. Particularly in an area where the structure is relatively undeveloped and deposition is stable, the smaller the distance between two wells, the smaller the differences in lithology, physical properties and oil and gas bearing properties within the same stratum. Therefore, when predicting a missing curve, the sample data of wells adjacent to the well to be processed may be given a larger weight, and the sample data of wells far from the well to be processed a smaller weight. Let the well position coordinates of well A to be processed be $(x_0, y_0)$ and those of sample well B be $(x_b, y_b)$. When the sample data points of well B are used in training the machine learning model, the weight coefficients of all of its sample data points are uniformly set to a single value $w_{db}$ determined by the distance between the two wells, wherein $w_{db}$ is the weight coefficient of the data points of well B, $0 \le w_{db} \le 1$; $a$ and $b$ are the weight calculation coefficients of the work area, generally $a = 1$ and $b = 0$; and $L_m$ is the maximum well spacing of the work area.

After the weight coefficient $w_d$ of every sample well participating in training has been calculated, the weights of all sample data points of each sample well are set to the $w_d$ of that well.
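The distance-based weighting can be illustrated with a minimal Python sketch. The weight formula itself is not reproduced in this text, so the linear form $w = a\,(1 - d/L_m) + b$, clipped to $[0, 1]$, is an assumption, chosen only because it is consistent with the stated constraints ($0 \le w_{db} \le 1$, default $a = 1$ and $b = 0$, $L_m$ the maximum well spacing). The function name `distance_weight` is hypothetical.

```python
import math

def distance_weight(target_xy, sample_xy, max_spacing, a=1.0, b=0.0):
    """Weight of a sample well from its distance to the well to be processed.

    ASSUMED form (not confirmed by the source): w = a * (1 - d / L_m) + b,
    clipped to the interval [0, 1].
    """
    d = math.hypot(sample_xy[0] - target_xy[0], sample_xy[1] - target_xy[1])
    w = a * (1.0 - d / max_spacing) + b
    return min(1.0, max(0.0, w))

# A sample well 5 units away in a work area with a maximum well spacing of 10.
w_db = distance_weight((0.0, 0.0), (3.0, 4.0), max_spacing=10.0)
```

Under this assumed form a co-located well gets weight 1 and a well at the maximum spacing gets weight 0.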

2) Determining the weight of each sample well data point from the geological attributes of the well to be processed and the sample well.

In an oil field area with stable deposition and little structural change, the distance between wells generally represents their similarity. In an area with large depositional changes and complex tectonic movement, however, the formation characteristics and logging response characteristics can change markedly even between two wells that are very close together, and the similarity is also influenced by the distribution of sedimentary facies and by the regional structure. Therefore, when predicting a missing curve, sample data whose geological attributes are close to those of the well to be processed may be given a larger weight, and sample data with clearly different geological attributes a smaller weight.

Suppose the sedimentary facies of a given stratum in a block are divided into N types. First, several typical wells are selected in the different sedimentary facies regions of the work area, and the number of typical wells in each sedimentary facies region is recorded as $num_i$ (where $i$ is the index of the sedimentary facies belt). The sedimentary facies region in which the well to be processed is located is taken as the reference region, and for each other sedimentary facies belt the average similarity of the preset target curve to be predicted is computed between its typical wells and the typical wells of the reference region. The typical wells in each sedimentary facies region are chosen such that the preset target curve to be predicted has actually been measured in them, and the curve similarity coefficients between these wells are used to determine the weights. The similarity between two curves can be evaluated by the accumulated distance $\gamma$ calculated with the dynamic time warping (DTW) method: the smaller $\gamma$ is, the more similar the two curves are.

Denote the similarity of the preset target curve to be predicted between two wells as $s$. The reciprocal of the exponential function of the accumulated distance $\gamma$ with base $e$ is taken as the similarity evaluation index:

$$s = e^{-\gamma};$$

the average similarity of a sedimentary facies belt to the reference region is then:

$$\bar{s}_i = \frac{1}{num_{refer}\, num_i} \sum_{l=1}^{num_{refer}} \sum_{k=1}^{num_i} s_{lk}$$

wherein $num_{refer}$ is the number of typical wells in the reference region; $num_i$ is the number of typical wells in the sedimentary facies belt; $s_{lk}$ is the similarity between the $l$-th well in the reference region and the $k$-th well in the sedimentary facies belt; $\bar{s}_i$ is the average similarity of the sedimentary facies belt to the reference region; $i$ is the index of the sedimentary facies belt; and $l$ and $k$ are well indices. Each $s_{lk}$ is calculated according to the formula $s = e^{-\gamma}$.

Accordingly, the weight coefficient $w_f$ of a layer of a sample well can be set according to the sedimentary facies region in which that layer lies. If the sample well and the well to be processed belong to the same sedimentary facies region, $w_f = 1$. After the weight coefficient $w_f$ of a given layer has been calculated for every sample well participating in training, the weights of all sample data points in that layer of each sample well are set to the corresponding $w_f$.

Geological structure and lithofacies paleogeography are treated similarly: the block in which the well to be processed is located is taken as the reference region, and the average similarity between the typical wells of the reference region and the typical wells of the other regions is determined.
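The similarity measure used in this method can be illustrated with a minimal Python sketch. It computes the accumulated DTW distance $\gamma$ between two curves, converts it to $s = e^{-\gamma}$, and averages $s$ over all pairs of reference-region and facies-belt wells. The absolute-difference local cost, the plain arithmetic mean over pairs, and all function names are assumptions made for illustration. The sketch also assumes the curves have been normalized beforehand, since with raw log values $\gamma$ can be large and $s$ approaches 0.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW distance between two 1-D curves.

    Local cost is the absolute difference (an assumption of this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

def similarity(a, b):
    """s = exp(-gamma), where gamma is the accumulated DTW distance."""
    return math.exp(-dtw_distance(a, b))

def average_similarity(reference_wells, facies_wells):
    """Mean similarity over every (reference well, facies-belt well) pair."""
    total = sum(similarity(r, f) for r in reference_wells for f in facies_wells)
    return total / (len(reference_wells) * len(facies_wells))
```

Because DTW allows one sample to align with several, a curve and a locally stretched copy of it have an accumulated distance of 0 and therefore a similarity of 1.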

3) Determining the weight of each sample well data point from the curve similarity between the well to be processed and the sample well.

Besides the distance between the well to be processed and the sample well and the relationship between their geological attributes, the weight of each sample well data point can also be determined from the similarity of the corresponding input curve combinations of the two wells.

Denote the input logging curve combination determined in step 104 for establishing the machine learning model as $X_{i'}$, the input logging curve combination of a given sample well A as $X_{Ai'}$, and the corresponding logging curve combination of the well to be processed as $X_{0i'}$. The similarity of each pair of corresponding curves within a given interval is calculated by the dynamic time warping method and denoted $s_{i'}$. The weight coefficient $w_{As}$ of that interval of well A is then determined from the following quantities: $\rho_{i'}$ is the weight assigned to the $i'$-th curve, given by the correlation coefficient between that logging curve and the curve to be predicted from step 104; $s_{i'}$ is the similarity between the $i'$-th curve of well A and the corresponding curve of the well to be processed; and $N$ is the total number of curves in the input logging curve combination.

After the weight coefficient $w_s$ of every sample well participating in training has been calculated, the weights of all sample data points in the corresponding interval of each sample well are set to the $w_s$ of that well.

4) Determining the weight of each sample well data point by combining the weights calculated by the above methods.

Let the weight coefficient of a well determined from the distance between the well to be processed and the sample well be $w_d$, the weight coefficient determined from the geological attributes of the two wells be $w_f$, and the weight coefficient determined from the curve similarity of the two wells be $w_s$. The weight coefficient $w_c$ of the sample data points of that well is finally determined by combining the weight coefficients given by one or more of the methods:

$$w_c = (w_d + w_f + w_s)/3.$$

The above formula applies when all three methods are used; if two are used, the two corresponding weights are added and divided by 2.
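The combination rule can be written as a short Python sketch that averages whichever of the three weights were actually computed. The function name `combined_weight` and the use of `None` to mark an unused method are illustrative choices, not part of the patent.

```python
def combined_weight(w_d=None, w_f=None, w_s=None):
    """Arithmetic mean of the weights that were computed (None means unused)."""
    used = [w for w in (w_d, w_f, w_s) if w is not None]
    if not used:
        raise ValueError("at least one weight is required")
    return sum(used) / len(used)

w_three = combined_weight(w_d=0.6, w_f=0.9, w_s=0.3)  # all three methods
w_two = combined_weight(w_d=0.5, w_s=1.0)             # only two methods
```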

Step 108: constructing a machine learning network model.

Specifically, common machine learning models include linear regression, support vector machine (SVM), multilayer neural network (MLP) and the like. The model structure can be represented as an input layer, hidden layers and an output layer: the input layer receives the signals, the hidden layers decompose and process the data, and the final result is gathered in the output layer.

Optionally, the invention uses a long short-term memory network (LSTM) to establish the curve prediction model. The LSTM is an improved recurrent neural network that addresses the inability of conventional recurrent neural networks to handle long-range dependencies, and it is therefore suited to predicting logging curves, which are meaningful as depth sequences.

The model adopts a seven-layer network structure, and the LSTM model is built with TensorFlow, wherein:

the first layer is the input layer, which receives the sample point data. The data structure is a three-dimensional tensor of the form [Batch_size, Sequence_length, Data_dim], where Batch_size is the number of input data blocks, equal to the total number of sample points; Sequence_length is the length of the sequence information contained in each data block, which depends on the sampling interval of the input logging curves and is generally 10 to 80 (the larger the sampling interval of the input logging curves, the smaller Sequence_length); and Data_dim is the curve dimension of a data point, i.e. the number of curves in the logging curve combination determined in step 104. Each data block therefore contains a Sequence_length × Data_dim data volume and corresponds to one target output value Y_output of the curve to be predicted and one target weight Weight_Y.
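The input structure described above can be sketched in Python as a sliding window over depth. Each block holds Sequence_length consecutive depth samples of Data_dim curves and is paired with one target value and one weight. Pairing each block with the target and weight at the block's last depth point is an assumption of this sketch (the text does not state which depth point the target belongs to), and the function name `make_blocks` is hypothetical.

```python
def make_blocks(curves, target, weights, sequence_length):
    """Build [Batch_size, Sequence_length, Data_dim] blocks from depth samples.

    curves  : list of depth samples, each a list of Data_dim curve values
    target  : target curve value at each depth sample
    weights : sample weight at each depth sample
    ASSUMPTION: each block is paired with the target and weight of its
    last depth point.
    """
    blocks, y_output, weight_y = [], [], []
    for end in range(sequence_length, len(curves) + 1):
        blocks.append(curves[end - sequence_length:end])
        y_output.append(target[end - 1])
        weight_y.append(weights[end - 1])
    return blocks, y_output, weight_y

# Hypothetical toy data: 6 depth samples of 2 curves, Sequence_length = 4.
curves = [[float(i), 10.0 * i] for i in range(6)]
target = [100.0 + i for i in range(6)]
blocks, y_output, weight_y = make_blocks(curves, target, [0.5] * 6, 4)
```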

The second to sixth layers of the network are standard LSTM layers which, following the principle of stacked recurrent neural networks, extract progressively more abstract features. Each LSTM unit is shown in FIG. 3, and FIG. 4 shows the data information transfer process of the five LSTM layers.

Several gates are arranged inside the LSTM unit, and the working process of each can be expressed as follows:

Unit input formula: $$z^t = g(W_z x^t + R_z y^{t-1} + b_z);$$

Input gate formula: $$i^t = \sigma(W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i);$$

Forget gate formula: $$f^t = \sigma(W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f);$$

Cell state formula: $$c^t = i^t \odot z^t + f^t \odot c^{t-1};$$

Output gate formula: $$o^t = \sigma(W_o x^t + R_o y^{t-1} + p_o \odot c^t + b_o);$$

Unit output formula: $$y^t = o^t \odot h(c^t).$$

wherein $z^t$ is the input module of the LSTM unit; $x^t$ is the feature data input at time $t$; $W_z$ is the parameter matrix between the input data and the input module; $y^{t-1}$ and $y^t$ are the outputs at times $t-1$ and $t$ (in practice, the hidden state); $g$ is the activation function of the input module of the LSTM unit, and $h$ is the activation function applied to the cell state at the unit output; $R_z$, $R_i$, $R_f$, $R_o$ are the weights of the hidden units and represent the degree to which the prediction result of the previous depth point is used; $b_z$, $b_i$, $b_f$, $b_o$ are bias parameters; $p_i$, $p_f$, $p_o$ are the weights applied to the memory cell state in the input, forget and output gates; $c^{t-1}$ is the vector value of the memory cell at time $t-1$ and $c^t$ is the cell state at time $t$; $i^t$, $f^t$, $o^t$ are the activation vectors of the input gate, forget gate and output gate of a node of the LSTM neural network at time $t$; $W_i$ is the parameter matrix between the input module and the hidden layer cell unit; $W_f$ is the parameter matrix between the input data and the forget module cell unit; and $\odot$ denotes the element-wise product of two vectors.
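A single step of an LSTM unit with cell-state terms in its gates can be sketched in plain Python as follows. The sketch assumes tanh for both activation functions g and h and the logistic sigmoid for σ, none of which is specified in the text. The helper names, the parameter dictionary layout and `zero_params` are hypothetical, and a practical model would use a library implementation rather than nested lists.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def vadd(*vectors):
    """Element-wise sum of several equal-length vectors."""
    return [sum(items) for items in zip(*vectors)]

def vmul(u, v):
    """Element-wise product of two vectors."""
    return [a * b for a, b in zip(u, v)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def lstm_step(x, y_prev, c_prev, p):
    """One time (depth) step; g and h are ASSUMED to be tanh."""
    z = tanh(vadd(matvec(p["W_z"], x), matvec(p["R_z"], y_prev), p["b_z"]))
    i = sigmoid(vadd(matvec(p["W_i"], x), matvec(p["R_i"], y_prev),
                     vmul(p["p_i"], c_prev), p["b_i"]))
    f = sigmoid(vadd(matvec(p["W_f"], x), matvec(p["R_f"], y_prev),
                     vmul(p["p_f"], c_prev), p["b_f"]))
    c = vadd(vmul(i, z), vmul(f, c_prev))           # new cell state
    o = sigmoid(vadd(matvec(p["W_o"], x), matvec(p["R_o"], y_prev),
                     vmul(p["p_o"], c), p["b_o"]))  # output gate sees new c
    y = vmul(o, tanh(c))
    return y, c

def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero parameters, for checking shapes only."""
    p = {}
    for gate in "zifo":
        p["W_" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["R_" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    for gate in "ifo":
        p["p_" + gate] = [0.0] * n_hidden
    return p
```

With all parameters zero, every gate evaluates to 0.5 and the unit input to 0, so the new cell state is half the previous one. This gives a simple sanity check of the update order.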

The seventh layer of the network is the output layer. Each unit in the hidden layers and the output layer has multiple inputs connected to the neurons of the preceding layer, and the output layer calculates the predicted value of the network.

Step 110: training and verifying the constructed machine learning network model with the sample well data.

Specifically, because logging principles and physical response characteristics differ, different logging curves have different dimensions and response values, so the input curves participating in model computation are normalized before the machine learning model is trained, tested and applied. The normalized target interval is [0, 1], and the transformation is:

$$x_i^* = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$

wherein $x_i$ is the value of a curve at a given depth point before normalization; $x_i^*$ is the normalized value; and $x_{max}$ and $x_{min}$ are the maximum and minimum response values of the logging curve over the whole well section.
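The normalization to the interval [0, 1] can be sketched in Python as a min-max transform over the whole well section. The function name `normalize` is hypothetical, and raising an error for a constant curve is a design choice of this sketch rather than something the text prescribes.

```python
def normalize(curve):
    """Min-max normalize one logging curve to [0, 1].

    Returns the normalized values together with x_min and x_max so the
    same extremes can be reused later.
    """
    x_min, x_max = min(curve), max(curve)
    span = x_max - x_min
    if span == 0:
        raise ValueError("constant curve cannot be min-max normalized")
    return [(x - x_min) / span for x in curve], x_min, x_max

# Hypothetical toy values for one curve.
normalized, x_min, x_max = normalize([50.0, 100.0, 150.0])
```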

After the data have been normalized, the machine learning model is trained and verified with sample data from the well sections with complete logging curves. Model training essentially uses the known data of a target curve and of the other logging curves to determine the parameters of the machine learning model (i.e. to establish the correspondence between the target curve data and the other logging curve data), so that where the target curve is missing, a reasonable curve can be predicted from the other logging curves.

During model training, a proportion η of the training data may be randomly set aside to verify the degree of convergence of the model (generally η = 0.2). For a neural network model, gradient updates are usually performed with the back-propagation algorithm to obtain the weights and bias parameters of the hidden layers and the output layer, so that the difference between the measured target curve and the curve predicted by the model is minimized. The objective function is expressed in terms of the following quantities: $Loss(Y_{pred}, Y_{data})$ is the cost function representing the difference between the measured target curve and the curve predicted by the model; $Y_{pred}$ is the curve value predicted by the model; $Y_{data}$ is the measured value of the target curve; and $w_t$ is the weight coefficient of each sample point determined in step 106.

Generally, if $Loss(Y_{pred}, Y_{data})$ is below a certain threshold and no longer decreases as the number of training iterations increases, the model is considered to have converged, and the weights and bias parameters obtained can be regarded as the optimal parameters from model training.
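A sample-weighted objective and the convergence criterion can be sketched in Python as follows. The weighted mean of squared errors is an assumed form of the cost, since the text only states that each sample point carries the weight determined in step 106. The `patience` and `tol` parameters of the convergence check and both function names are likewise hypothetical.

```python
def weighted_loss(y_pred, y_data, weights):
    """ASSUMED cost: weighted mean of squared errors over the sample points."""
    total = sum(w * (p - d) ** 2 for p, d, w in zip(y_pred, y_data, weights))
    return total / sum(weights)

def has_converged(loss_history, threshold, patience=3, tol=1e-6):
    """True if the last `patience` losses are below the threshold and the
    loss has stopped decreasing (improvement smaller than tol)."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-patience:]
    earlier = loss_history[-patience - 1]
    return max(recent) < threshold and earlier - min(recent) < tol

# The second sample point has three times the influence of the first.
loss = weighted_loss([1.0, 2.0], [0.0, 2.0], [1.0, 3.0])
```

A sample point with weight 0 contributes nothing to this cost, which is how data from dissimilar wells would be down-weighted during training.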

In the model verification stage, the trained weights and bias parameters are used to obtain the target curve predicted by the model, and its difference from the input measured logging curve is calculated. When the difference is within an acceptable range and its trend is similar to that in the training stage, the model parameters are considered stable and free of overfitting, and the model can be used to predict subsequent actual data.

Step 112: and predicting the missing well logging curve by using the trained and verified machine learning network model according to the known well logging curve of the well to be processed in the research area to obtain the missing well logging curve of the well to be processed.

Specifically, in the well section to be processed, a predicted curve result is obtained by utilizing the trained machine learning model. The operation is similar to the model verification stage in step 110, and at this time, other well logging curve values are known and input into the trained machine learning model, so that a corresponding target curve prediction result can be obtained. It should be noted that the logging curve data input into the machine learning model also requires the normalization process described in step 110, and the prediction result output by the model needs to be denormalized to recover the prediction curve of the normal target logging response distribution range. The denormalization process formula is described as follows:

where the model output is the prediction result of the machine learning model for a given target curve at a given depth point; y_max and y_min are the maximum and minimum response values of the logging curve given in step 110; and the denormalized value is the predicted curve value restored to the normal logging response distribution range.
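As a minimal sketch of the denormalization step, the following assumes that step 110 uses min-max normalization to the range [0, 1]; that assumption, the function names and the numeric range are illustrative and are not confirmed by the text here.

```python
def normalize(y, y_min, y_max):
    """Min-max normalization to [0, 1] (assumed form of the step 110 scaling)."""
    return (y - y_min) / (y_max - y_min)


def denormalize(y_norm, y_min, y_max):
    """Inverse of `normalize`: restore a model output to the logging response range."""
    return y_norm * (y_max - y_min) + y_min


# Hypothetical response range for a target curve (illustrative numbers only)
Y_MIN, Y_MAX = 100.0, 300.0
restored = denormalize(0.5, Y_MIN, Y_MAX)
```

The same y_min and y_max used to normalize the training data must be reused here; otherwise the restored curve is shifted or rescaled.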

Examples

1. Take as an example the prediction of the transverse wave time difference from conventional logging curves in a shale oil reservoir in a certain block of the Daqing oil field. The conventional logging curves in the block include acoustic longitudinal wave time difference (AC), neutron (CNL), density (DEN), natural gamma (GR), deep resistivity (RD), shallow resistivity (RS) and the like, and the curve to be predicted is the acoustic transverse wave time difference (DTS).

2. Using the six conventional logging curves of the well section, namely acoustic longitudinal wave time difference (AC), neutron (CNL), density (DEN), natural gamma (GR), deep resistivity (RD) and shallow resistivity (RS), together with the acoustic transverse wave time difference (DTS) obtained by processing the array acoustic logging data, the correlation coefficient matrix of the curves is calculated according to the formula in step 104. The result is shown in Table 1:

TABLE 1

Curve	Correlation coefficient with DTS
AC	0.79
CNL	0.78
GR	0.4
RS	-0.38
RD	-0.42
DEN	-0.52

According to step 104, if the correlation coefficient threshold is set to 0.3, Table 1 shows that the absolute values of the correlation coefficients between the acoustic transverse wave time difference (DTS) and the six curves AC, CNL, GR, RS, RD and DEN are 0.79, 0.78, 0.4, 0.38, 0.42 and 0.52, respectively. All exceed the threshold of 0.3, indicating that these six curves correlate well with DTS. Accordingly, the conventional logging curve combination selected for predicting the acoustic transverse wave time difference (DTS) consists of six curves: acoustic longitudinal wave time difference (AC), neutron (CNL), natural gamma (GR), deep resistivity (RD), shallow resistivity (RS) and density (DEN).
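The selection rule can be illustrated with a short sketch. It assumes that the correlation in step 104 is the Pearson coefficient (the formula itself is not shown in this part of the text); `pearson` and `select_curves` are hypothetical helper names, and the dictionary simply restates Table 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences
    (assumed to be the correlation measure of step 104)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def select_curves(corr_with_target, threshold):
    """Keep the curves whose absolute correlation with the target exceeds the threshold."""
    return [name for name, r in corr_with_target.items() if abs(r) > threshold]


# Correlation of each conventional curve with DTS, as listed in Table 1
table_1 = {"AC": 0.79, "CNL": 0.78, "GR": 0.4, "RS": -0.38, "RD": -0.42, "DEN": -0.52}
selected = select_curves(table_1, threshold=0.3)
```

Because the comparison uses the absolute value, negatively correlated curves such as density and the resistivities are retained alongside the positively correlated ones.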

3. Weights are determined for data points in sample wells used for machine learning model training.

FIG. 5 is a schematic diagram of weight calculation based on well distance. In FIG. 5, well A is the target well, with well location coordinates (18551675, 3349782); wells B, C and D are sample wells, with well location coordinates (18555039, 3351527), (18550893, 3358441) and (18541819, 3354836), respectively. With a = 1, b = 0 and the maximum distance of the work area set to L_m = 20 km, the formula in step 106 gives:

w_db = 0.811;

w_dc = 0.565;

w_dd = 0.446.

Thus, the weight coefficient for all sample data points in sample well B may be set to 0.811, that for all sample data points in well C to 0.565, and that for all sample data points in well D to 0.446.
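The three values can be reproduced with a short sketch. The step 106 formula is not reproduced in this part of the text, so the form w = a·(1 − d/L_m) + b used below is an assumption; it is consistent with the quoted results for a = 1 and b = 0, but the exact placement of a and b in the original formula cannot be confirmed from this example. Coordinates are taken to be in metres.

```python
import math


def distance_weight(target_xy, sample_xy, l_max, a=1.0, b=0.0):
    """Distance-based sample weight, assumed form w = a * (1 - d / l_max) + b,
    clipped to [0, 1]. `l_max` is the maximum well spacing of the work area."""
    d = math.dist(target_xy, sample_xy)  # Euclidean distance between well locations
    w = a * (1.0 - d / l_max) + b
    return min(1.0, max(0.0, w))


well_a = (18551675, 3349782)  # target well
samples = {
    "B": (18555039, 3351527),
    "C": (18550893, 3358441),
    "D": (18541819, 3354836),
}
L_MAX = 20_000.0  # 20 km expressed in metres

weights = {name: round(distance_weight(well_a, xy, L_MAX), 3) for name, xy in samples.items()}
```

The clipping to [0, 1] reflects the stated bound on the weight coefficient; a sample well farther away than L_m would simply receive a weight of 0 under this assumed form.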

FIG. 6 is a sedimentary facies diagram of a region, in which the sedimentary facies of a certain stratum can be divided into four categories: shallow lake, coastal lake, diversion bay and dike river. Suppose the well A to be processed is located in the dike river facies zone. This zone is designated as the reference facies zone, and typical wells A1, A2 and A3 are selected in it; typical wells B1, B2 and B3 are selected in the shallow lake facies zone, typical wells C1 and C2 in the coastal lake facies zone, and typical well D1 in the diversion bay facies zone. The transverse wave time difference (DTS) curves of wells B1, B2, B3, C1, C2 and D1 are then compared for similarity with those of wells A1, A2 and A3. The similarity between two curves can be evaluated by calculating the cumulative distance γ with the dynamic time warping (DTW) method. The similarity of the curves of a given stratum between two wells is denoted s, and the reciprocal of the exponential function of the cumulative distance γ with base e, i.e. s = e^(-γ), is used as the similarity evaluation index. Taking the similarity of the transverse wave time difference (DTS) curves of wells B1 and A1 in a certain stratum as an example, and assuming that the cumulative distance calculated by dynamic time warping is γ = 0.89, the similarity of the two curves is:

s_B1A1 = e^(-0.89) = 0.41;

Similarly, the similarities between wells B1, B2, B3 and wells A1, A2, A3 can be calculated, and the similarity of the shallow lake facies zone to the reference zone is then evaluated from them. The similarities of the coastal lake facies zone and of the diversion bay facies zone to the reference zone are determined in the same way.

Accordingly, the sample data point weights for this interval of each sample well can be set. For example, well B in FIG. 6 is located in the shallow lake facies zone, so its weight coefficient is set to 0.43; the sample data point weight for this interval of well C is set to 0.67, and that of well D is set to 0.52.
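The facies-based similarity can be sketched as follows. This is a textbook dynamic time warping implementation with an absolute-difference local cost, followed by the mapping s = e^(-γ) and a plain average over well pairs. How the curves are scaled before warping, and whether the cumulative distance is normalized by path length, are not stated here, so those details are assumptions; all function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(gamma):
    """Similarity index s = e^(-gamma) for a cumulative distance gamma."""
    return math.exp(-gamma)


def zone_similarity(pairwise_similarities):
    """Average of the pairwise similarities between reference-zone typical wells
    and the typical wells of another facies zone."""
    return sum(pairwise_similarities) / len(pairwise_similarities)


s_b1a1 = round(similarity(0.89), 2)  # the B1/A1 example from the text
```

Identical curves give γ = 0 and therefore s = 1, while s decays towards 0 as the curves diverge, so the index can be used directly as a weight between 0 and 1.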

Taking a certain block of the Daqing oil field as an example, the input logging curve combination of the machine learning model is acoustic longitudinal wave time difference (AC), neutron (CNL), natural gamma (GR), deep resistivity (RD), shallow resistivity (RS) and density (DEN). Using the dynamic time warping method described above, the similarities between each curve of well B and the corresponding curve of the well to be processed in a certain interval are calculated as s_AC = 0.83, s_CNL = 0.66, s_GR = 0.81, s_RD = 0.76, s_RS = 0.75 and s_DEN = 0.61. The weight coefficient of this interval of well B is then:

w_Bs = (0.79×0.83 + 0.78×0.66 + 0.4×0.81 + 0.38×0.76 + 0.42×0.75 + 0.52×0.61)/6 = 0.43

Similarly, the data point weights of the other sample wells can be determined from the similarity of each of their curves to the corresponding curve of the well to be processed in a given interval.
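The expression for w_Bs can be written as a small helper. The sketch below takes the per-curve weights and similarities in the order in which they appear in the expression above; the function name is hypothetical. Evaluated with the rounded inputs as printed, the expression gives roughly 0.40, so the quoted value of 0.43 may reflect unrounded inputs.

```python
def curve_similarity_weight(rho, s):
    """Average of per-curve weight times per-curve similarity over the N curves
    of the logging curve combination."""
    if not rho or len(rho) != len(s):
        raise ValueError("rho and s must be non-empty and of equal length")
    return sum(abs(r) * v for r, v in zip(rho, s)) / len(rho)


# Values in the order used in the w_Bs expression
rho = [0.79, 0.78, 0.4, 0.38, 0.42, 0.52]
s = [0.83, 0.66, 0.81, 0.76, 0.75, 0.61]
w_bs = curve_similarity_weight(rho, s)
```

Curves that correlate strongly with the target contribute more to the interval weight, so a sample well is up-weighted mainly when its most informative curves resemble those of the well to be processed.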

After the weight coefficients of the sample well data have been calculated by the above methods, the final weight coefficient of the sample data points of a well can be determined by combining the weight coefficients from one or more of the methods. For example, for well B, the weight coefficient determined from the distance between the well to be processed and the sample well is 0.81, the weight coefficient determined from the geological properties of the well to be processed and the sample well is 0.43, and the weight coefficient determined from the curve similarity between the well to be processed and the sample well is 0.43, so the final weight coefficient of the sample data points of this well is:

w_cb = (0.81 + 0.43 + 0.43)/3 = 0.56.
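A minimal sketch of the combination step, assuming that a plain arithmetic mean is used whenever more than one method contributes, as in the example for well B. The function name is hypothetical.

```python
def combine_weights(*method_weights):
    """Arithmetic mean of the weights produced by one or more weighting methods."""
    if not method_weights:
        raise ValueError("at least one weight is required")
    return sum(method_weights) / len(method_weights)


# Distance, geological and curve-similarity weights of well B from the text
w_cb = round(combine_weights(0.81, 0.43, 0.43), 2)
```

With a single method the function returns that method's weight unchanged, which covers the case where only one of the approaches is applied.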

4. Establish the machine learning network model.

The invention uses a long short-term memory (LSTM) network to build the curve prediction model. The model adopts a 7-layer network structure and is built with TensorFlow. The model hyper-parameters are set as follows:

number of input logging curves: data_dim = 6;

sequence length: seq_length = 20;

number of neurons in each hidden layer: hidden_dim = 49;

output curve dimension: output_dim = 1;

number of LSTM layers: n_layers = 5;

dropout ratio: Dropout_rate = 0.2;

learning rate: learning_rate = 0.005;

batch size: BATCH_SIZE = 640;

number of iterations over all data: EPOCHS = 30.
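The hyper-parameters above can be collected in a small configuration object, together with one possible way of cutting a depth-ordered well log into input sequences of length seq_length. The sliding-window scheme with a step of one depth sample is an assumption, since the text does not say how sequences are formed; `LstmConfig` and `make_windows` are hypothetical names, and the field names are lower-cased versions of the identifiers listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LstmConfig:
    data_dim: int = 6            # number of input logging curves
    seq_length: int = 20         # depth samples per input sequence
    hidden_dim: int = 49         # neurons per hidden layer
    output_dim: int = 1          # one predicted curve (DTS)
    n_layers: int = 5            # stacked LSTM layers
    dropout_rate: float = 0.2
    learning_rate: float = 0.005
    batch_size: int = 640
    epochs: int = 30


def make_windows(rows, seq_length):
    """Slide a window of `seq_length` depth samples over `rows` with a step of 1.
    Returns an empty list when there are fewer rows than `seq_length`."""
    return [rows[i:i + seq_length] for i in range(len(rows) - seq_length + 1)]


cfg = LstmConfig()
# 25 illustrative depth samples, each holding data_dim curve values
rows = [[float(depth)] * cfg.data_dim for depth in range(25)]
windows = make_windows(rows, cfg.seq_length)
```

Each window has shape (seq_length, data_dim), which is the per-sample input shape an LSTM layer expects once a batch dimension is added.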

5. The machine learning model is trained and validated using sample well data.
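One way the sample data point weights could enter training is through a weighted loss, so that higher-weight samples contribute more to the parameter updates. The text does not state how the weights are applied, so the weighted mean squared error below is an assumption and the function name is hypothetical.

```python
def weighted_mse(y_pred, y_true, weights):
    """Mean squared error in which each sample's squared error is scaled by its
    weight and the sum is normalized by the total weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * (p - t) ** 2 for p, t, w in zip(y_pred, y_true, weights))
    return weighted_sum / total_weight


# With equal weights this reduces to the ordinary mean squared error
equal = weighted_mse([1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```

In a TensorFlow/Keras workflow, a comparable effect can usually be obtained by passing per-sample weights through the `sample_weight` argument of `Model.fit` rather than writing the loss by hand.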

6. At the interval to be processed, the predicted curve results are obtained using the machine learning model trained in step 108.

In the well section to be processed, the predicted curve is obtained using the trained machine learning model. The operation is similar to the model verification stage in step 110: the other logging curve values are known and are input into the trained machine learning model to obtain the corresponding target curve prediction.

FIG. 7 compares the prediction obtained with the sample data point weights set according to the invention against that of the conventional method. Both predictions are based on a long short-term memory (LSTM) network, and the hyper-parameters of the network model are kept the same, as given in step 108, which ensures that the results are comparable. In the figure, the first track is the borehole diameter curve, the second track is the natural gamma curve, the third track is depth, the fourth track is the stratigraphic horizon, the fifth track is the deep and shallow resistivity curves, the sixth track is the density curve, the seventh track is the compensated neutron curve, the eighth track is the longitudinal wave time difference curve, and the ninth track is the transverse wave time difference curve measured by array acoustic logging. The tenth track compares the transverse wave time difference curve predicted by the conventional method with equal sample data point weights against the measured transverse wave time difference curve. The eleventh track compares the measured curve against the transverse wave time difference curve predicted with sample data point weights obtained by averaging the weights calculated by the three methods, as described in step 106 (i.e., method 4 of step 106). The figure shows that the transverse wave time difference curve predicted by the method of the invention agrees more closely with the measured curve in both variation trend and amplitude, and the prediction accuracy is higher.

FIG. 8 compares the prediction of the method of the invention with that of the conventional method. The left plot compares the transverse wave time difference curve predicted by the conventional method with equal sample data point weights against the measured transverse wave time difference curve. The right plot compares the measured curve against the transverse wave time difference curve predicted with sample data point weights set according to the invention, obtained by averaging the weights calculated by the three methods in step 106 (i.e., method 4 of step 106). The comparison shows that the data points of the conventional prediction versus the measured curve are scattered, with a correlation coefficient of about 0.37, whereas the data points of the method of the invention versus the measured curve are more concentrated, with a correlation coefficient of about 0.50, indicating higher accuracy.

The embodiment of the invention also provides a device for predicting the missing log, which is described in the following embodiment. Because the principle of solving the problems of the device is similar to that of the missing logging curve prediction method, the implementation of the device can refer to the implementation of the missing logging curve prediction method, and repeated parts are not described again.

Fig. 9 is a block diagram of a missing log prediction apparatus according to an embodiment of the present invention, and as shown in fig. 9, the missing log prediction apparatus includes:

the information acquisition module 02 is used for acquiring a logging curve, horizon data, well position data and geological information of a research area, wherein the geological information comprises geological structures or sedimentary facies or lithofacies paleogeography;

the curve combination optimization module 04 is used for optimizing a logging curve combination used for establishing a machine learning network model according to the correlation between a preset curve to be predicted and other logging curves of a complete well section of the logging curve of the research area;

the weight determining module 06 is configured to determine the weight of sample well data used for machine learning model training according to the logging curves, horizon data, well position data and geological information of the research area, where a sample well is a well in the research area with complete logging curves, and the sample well data are the logging curve values at different depth positions in the sample well;

a machine learning network model construction module 08, configured to construct a machine learning network model;

the training and verifying module 10 is used for training and verifying the machine learning network model by using sample well data;

and the prediction module 12 is configured to perform missing well log prediction by using the trained and verified machine learning network model according to a known well log of a well to be processed in the research area, so as to obtain a missing well log of the well to be processed.

In an embodiment of the present invention, the curve combination optimization module is specifically configured to:

calculating the correlation between a preset curve to be predicted of a complete well section of a logging curve of a research area and other logging curves;

and comparing the correlation with a preset correlation coefficient threshold, and selecting corresponding logging curves to form a logging curve combination when the correlation exceeds the preset correlation coefficient threshold.

In an embodiment of the present invention, the weight determining module is specifically configured to:

determining weights for sample well data used for machine learning model training using one or more of the following combinations:

determining the weight of sample well data used for training a machine learning model according to well data;

or determining the weight of sample well data used for training a machine learning model according to geological information, a logging curve and horizon data of a research area;

or, determining weights for sample well data used for machine learning model training based on the well log and horizon data for the region of interest.

weights for sample well data used for machine learning model training are determined from well data and horizon data as follows:

calculating the distance between the well to be processed and the sample well according to the well position data of the well to be processed and the well position data of the sample well, and determining the weight of the sample well data used for training the machine learning model according to the distance.

determining weights for sample well data for machine learning model training based on the distances according to the following formula:

where w_db is the weight coefficient of the sample well data, 0 ≤ w_db ≤ 1; a and b are the weight calculation coefficients of the work area, generally set to a = 1 and b = 0; L_m is the maximum well spacing of the work area; (x_0, y_0) are the well location coordinates of the well to be processed, and (x_b, y_b) are the well location coordinates of the sample well.

determining weights for sample well data for machine learning model training from geological information, well logs, and horizon data for a study region as follows:

dividing a plurality of blocks based on geological information of a research area;

selecting a plurality of typical wells in each block;

taking the block where the well to be processed is located as a reference area, and determining the average similarity of the typical well of the reference area and the typical wells of other blocks;

the average similarity is used as a weight for sample well data for machine learning model training.

and taking the block where the well to be processed is located as a reference area, and determining the average similarity of the reference area typical well and other block typical wells as follows:

where num_refer represents the number of typical wells in the reference zone; num_i represents the number of typical wells in the i-th sedimentary facies zone; s_lk represents the similarity between the l-th well in the reference zone and the k-th well in the i-th sedimentary facies zone; the computed value represents the average similarity between the i-th sedimentary facies zone and the reference zone; i is the index of the sedimentary facies zone; and l and k are well indices.

weights for sample well data used for machine learning model training are determined from the well log and horizon data for the study region as follows:

determining a preferred logging curve combination of the sample well and a preferred logging curve combination of the well to be processed;

determining the similarity of the logging curves corresponding to each layer based on the optimal logging curve combination of the sample well and the optimal logging curve combination of the well to be processed;

determining weights for sample well data used for machine learning model training based on the similarities.

determining weights for sample well data used for machine learning model training based on the similarities according to the following formula:

$$w_{As} = \frac{1}{N}\sum_{i'=1}^{N} \rho_{i'}\, s_{i'}$$

where $\rho_{i'}$ represents the weight assigned to the $i'$-th curve; $s_{i'}$ is the similarity between the $i'$-th curve of a given interval of the sample well and the corresponding curve of the well to be processed; $w_{As}$ is the weight coefficient of the sample well in the corresponding interval; and $N$ is the total number of curves in the logging curve combination.

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the missing well log prediction method is stored in the computer-readable storage medium.

In the embodiment of the invention, compared with conventional processing methods, the logging curve prediction method provided by the invention weights the sample data set used to build the logging curve prediction model. The weights of data points from different wells and different intervals in the sample data set during model training can be set according to well position distance, the reservoir geological model, curve shape characteristics and the like. Sample well data that are close to the well to be processed, have similar geological characteristics and have similar curve shapes are given higher weights, which increases their contribution to the training of the prediction model and thereby improves the accuracy of predicting the missing curve with a machine learning method.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for missing log prediction, comprising:

acquiring a logging curve, horizon data, well position data and geological information of a research area, wherein the geological information comprises geological structures or sedimentary facies or lithofacies paleogeography;

constructing a machine learning network model;

2. The missing log prediction method of claim 1, wherein selecting the logging curve combination used to build the machine learning network model, according to the correlation between a preset curve to be predicted and the other logging curves in well sections of the research area with complete logging curves, comprises:

3. The method of missing log prediction of claim 1 wherein determining weights for sample well data for machine learning model training based on logs, horizon data, well site data, and geological information for the study area comprises:

determining weights of sample well data used for training a machine learning model according to well data;

or determining the weight of sample well data used for machine learning model training according to geological information, well logging curves and horizon data of a research area;

4. The missing log prediction method of claim 3, wherein determining weights for sample well data used for machine learning model training from well data comprises:

and calculating the distance between the well to be processed and the sample well according to the well position data of the well to be processed and the well position data of the sample well, and determining the weight of the sample well data used for machine learning model training according to the distance.

5. The missing log prediction method of claim 4 wherein determining weights for sample well data used for machine learning model training based on the distances is determined according to the following equation:

6. The missing log prediction method of claim 3 wherein determining weights for sample well data used for machine learning model training based on geological information, logs, and horizon data for the study area comprises:

selecting a plurality of typical wells in each block;

7. The missing log prediction method of claim 6, wherein the block in which the well to be processed is located is taken as a reference zone, and the average similarity between the reference zone representative well and other block representative wells is determined as follows:

where num_refer represents the number of typical wells in the reference zone; num_i represents the number of typical wells in the i-th sedimentary facies zone; s_lk represents the similarity between the l-th well in the reference zone and the k-th well in the i-th sedimentary facies zone;

8. The missing log prediction method of claim 3 wherein determining weights for sample well data used for machine learning model training from the log and horizon data for the region of interest comprises:

9. The missing log prediction method of claim 8, wherein weights for sample well data used for machine learning model training are determined based on the similarities according to the following formula:

$$w_{As} = \frac{1}{N}\sum_{i'=1}^{N} \rho_{i'}\, s_{i'}$$

where $\rho_{i'}$ represents the weight assigned to the $i'$-th curve; $s_{i'}$ is the similarity between the $i'$-th curve of a given horizon of the sample well and the corresponding curve of the well to be processed; $w_{As}$ is the weight coefficient of the sample well at the corresponding horizon; and $N$ is the total number of curves in the logging curve combination.

10. A missing log prediction device, comprising:

the information acquisition module is used for acquiring a logging curve, horizon data, well position data and geological information of a research area, wherein the geological information comprises geological structures or sedimentary facies or lithofacies paleogeography;

the curve combination optimization module is used for optimizing a logging curve combination used for establishing a machine learning network model according to the correlation between a preset curve to be predicted and other logging curves of a complete well section of the logging curve of the research area;

11. The missing log prediction device of claim 10 wherein the curve combination optimization module is specifically configured to:

12. The missing log prediction device of claim 10 wherein the weight determination module is specifically configured to:

13. The missing log prediction device of claim 12 wherein the weight determination module is specifically configured to:

weights for sample well data used for machine learning model training are determined from well data as follows:

14. The missing log prediction device of claim 13 wherein the weight determination module is specifically configured to:

where w_db is the weight coefficient of the sample well data, 0 ≤ w_db ≤ 1; a and b are the weight calculation coefficients of the work area, generally set to a = 1 and b = 0; L_m is the maximum well spacing of the work area; (x_0, y_0) are the well location coordinates of the well to be processed, and (x_b, y_b) are the well location coordinates of the sample well.

15. The missing log prediction device of claim 12 wherein the weight determination module is specifically configured to:

selecting a plurality of typical wells in each block;

16. The missing log prediction device of claim 15 wherein the weight determination module is specifically configured to:

17. The missing log prediction device of claim 12 wherein the weight determination module is specifically configured to:

18. The missing log prediction device of claim 17, wherein the weight determination module is specifically configured to:

19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 9 when executing the computer program.

20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 9.