Disclosure of Invention
Accordingly, it is an object of the present invention to overcome the above-mentioned deficiencies of the prior art and to provide a method of training a model for reconstructing well log data, comprising:
1) carrying out dimensionality reduction on the collected data of the logging attributes;
2) feeding dummy data for the logging attributes, generated by a generating network G based on its current parameters, into a discrimination network D as one input, with the dimensionality-reduction result as the other input, and adjusting the parameters of the generating network G until the discrimination network D judges the dummy data to be really acquired data;
the generation network G is a convolutional neural network, and the dummy data corresponds to a plane coordinate.
Preferably, according to the method, the discrimination network D is a binary classifier that judges whether the difference between the result of the dimensionality reduction and the dummy data is smaller than a set threshold.
Preferably, according to the method, the discrimination network D uses the mean square error to evaluate said difference.
Preferably, according to the method, wherein step 2) comprises:
2-1) carrying out normalization processing on the result of the dimensionality reduction processing;
2-2) when the mean square error is less than 1.5, taking the current parameters of the generating network G as the parameters of a model for reconstructing logging data.
Preferably, according to the method, the generating network G takes plane coordinates as input for generating logging property values corresponding to the plane coordinates based on its current parameters.
Preferably, according to the method, wherein step 2) comprises:
inputting a random value generated from noise into the generating network G as the plane coordinate, so that the generating network G generates, based on its current parameters and the plane coordinate, a logging attribute value corresponding to that coordinate as the dummy data.
Preferably, according to the method, wherein step 1) comprises: reducing the dimension of the data of one logging attribute at different depths based on the PCA algorithm.
Preferably, according to the method, wherein step 1) comprises: down-sampling the data of one logging attribute at different depths.
Preferably, according to the method, the logging attribute of step 1) is selected from the group consisting of: CARB, CLLB, VAC, VAF90, VAT10, VAT20, VAT30, VAT60, VAT90, VCA, VCILD, VGR, VKRO, VKRW, VPERM, VPOR, VSH, VSP, VSPC, VSW, VSWIR, VSXO, DEN, SPC, RT, RM, SW, SOR, POR, PORT.
A method of reconstructing well log data based on a model generated by any one of the methods above, comprising:
1) inputting the plane coordinates of the area to be predicted into the obtained generation network G;
2) generating, by the generation network G, a logging attribute value corresponding to the plane coordinates of the area to be predicted.
A computer-readable storage medium, in which a computer program is stored which, when executed, is adapted to carry out the method of any of the above.
A system for reconstructing a model of well log data, comprising:
a storage device and a processor;
wherein the storage means is adapted to store a computer program which, when executed by the processor, is adapted to carry out the method of any of the above.
Compared with the prior art, the embodiment of the invention has the advantages that:
the neural network model is trained using logging attributes rather than seismic data as samples, which avoids the situation in which suitable seismic data cannot be obtained because mining began early; moreover, logging attributes are relatively less disturbed by noise, so a neural network model trained in this way predicts better. To make the logging attribute data usable for training, the original logging attribute data, whose volume is huge, is subjected to dimensionality reduction, and the portion of the data that is most beneficial for reconstructing the logging data and can express the nonlinear relationship is retained as training samples. The method trains the model for reconstructing logging data with an adversarial network: through the adversarial game, the discrimination network D is made unable to easily distinguish whether a logging attribute value generated by the generation network G is really explored data, so that the resulting generation network G can generate prediction results closest to the real data.
Detailed Description
The logging data in the present invention mainly include various physical parameters recorded for the resource reservoir, such as spontaneous potential, resistivity, acoustic velocity, and rock bulk density, as well as geological information obtained by processing the directly acquired data, such as lithology, shale content, water saturation, and permeability. These data are called logging attributes, and a curve reflecting how a logging attribute changes with depth is called a logging attribute curve.
Fig. 1 shows the logging attribute curve for a certain logging attribute; it can be seen that the information obtained by repeated detection at different depths accurately reflects how the attribute varies over depth.
Typically, well log attributes for oil exploration comprise over 160 attributes which, according to the national standard specifications, should be sampled once every 0.125 meters of depth in order to accurately characterize the curve variation of one well log attribute. It can be understood that, for each log, the values of the various logging attributes at different depths are acquired, so the data volume is huge. For this reason, existing machine-learning-based log data reconstruction solutions use seismic data rather than logging attributes.
Against this background, and addressing the difficulty of training a machine model directly on logging attributes, the inventors propose to perform dimensionality reduction on the logging attributes and to train an adversarial network model with the processed real data. The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
According to one embodiment of the invention, the machine model used is a generative adversarial network (GAN) model, an unsupervised deep learning model comprising a discrimination network D and a generation network G. GAN theory does not require that both the generation network G and the discrimination network D be neural networks; the main principle is to generate data with the generation network G, have the discrimination network D determine whether the data generated by G is real, and, through the adversarial game, obtain a generator whose output is very similar to the real data.
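The adversarial principle just described can be sketched with a deliberately tiny stand-in, not the convolutional networks of the invention: G's parameters are nudged until a simple D-like statistic can no longer separate generated data from real data. The function names, the moment-matching statistic, and the update rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" data: samples from a fixed distribution.
real = rng.normal(loc=2.0, scale=0.5, size=1000)

def generator(noise, theta):
    """Toy generator G: maps noise to data using parameters theta = (shift, scale)."""
    return theta[0] + theta[1] * noise

def discriminator(fake, real):
    """Toy stand-in for D: a statistic measuring how distinguishable the
    fake batch is from the real batch (a real GAN trains D jointly)."""
    return abs(fake.mean() - real.mean()) + abs(fake.std() - real.std())

# Adversarial-style loop: adjust G's parameters until D can no longer
# separate generated data from real data.
theta = np.array([0.0, 1.0])
for _ in range(500):
    noise = rng.normal(size=1000)
    fake = generator(noise, theta)
    score = discriminator(fake, real)
    if score < 0.05:          # D can no longer tell fake from real
        break
    # crude parameter update nudging G toward the real statistics
    theta[0] += 0.1 * (real.mean() - fake.mean())
    theta[1] += 0.1 * (real.std() - fake.std())
```

After the loop, `theta` has moved close to the real distribution's shift and scale, which is the sense in which the generator "wins" the game.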
Fig. 2 illustrates an adversarial network model according to one embodiment of the invention. As shown in fig. 2, for the application scenario of predicting logging attributes, the discrimination network D employs a binary classifier that takes as inputs the real data acquired from logging and the false data generated by the generation network G from high-order noise data, and outputs a prediction label of "true" when the difference between the real and false data is smaller than the set threshold, and "false" otherwise.
The following describes how to train a model for reconstructing well log data based on the above adversarial network.
Referring to FIG. 3, in accordance with an embodiment of the present invention, there is provided a method of training a model for reconstructing well log data, comprising:
Step 1, screening the logging attributes of a log. Based on existing national standards, 160 logging attributes may be collected for a single log. The inventors found that 30 logging attributes are particularly beneficial for reconstructing logging data; their curve codes are: CARB, CLLB, VAC, VAF90, VAT10, VAT20, VAT30, VAT60, VAT90, VCA, VCILD, VGR, VKRO, VKRW, VPERM, VPOR, VSH, VSP, VSPC, VSW, VSWIR, VSXO, DEN, SPC, RT, RM, SW, SOR, POR, PORT. Thus, when training the model, these logging attributes may be filtered out of the raw well log data for use in training.
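A minimal sketch of the screening in step 1, assuming the raw log is held as a mapping from curve code to sampled values (the data layout, function name, and sample numbers are illustrative, not part of the invention):

```python
# The 30 curve codes the description identifies as beneficial for reconstruction.
SELECTED = {
    "CARB", "CLLB", "VAC", "VAF90", "VAT10", "VAT20", "VAT30", "VAT60",
    "VAT90", "VCA", "VCILD", "VGR", "VKRO", "VKRW", "VPERM", "VPOR",
    "VSH", "VSP", "VSPC", "VSW", "VSWIR", "VSXO", "DEN", "SPC", "RT",
    "RM", "SW", "SOR", "POR", "PORT",
}

def screen_attributes(raw_log):
    """Keep only the selected logging-attribute curves from a raw log,
    represented here as a dict mapping curve code -> list of samples."""
    return {code: vals for code, vals in raw_log.items() if code in SELECTED}

# Hypothetical raw log with one kept curve and one discarded curve.
raw = {"VGR": [103.03, 120.46], "UNUSED": [1.0, 2.0]}
print(screen_attributes(raw))   # {'VGR': [103.03, 120.46]}
```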
Step 2, selecting data of a plurality of logs, and down-sampling the data of each logging attribute of each log at different depths.
Since the logging attributes of one log only reflect the geological conditions near that log, data from a plurality of logs is preferably used in order to recover the geological characteristics of a wide area. Preferably, the selected logs are evenly distributed.
The reason for down-sampling in this step is that the logging attribute data obtained under the national standard is sampled once every 0.125 m of depth; down-sampling both reduces the amount of data along the depth axis and filters out near-duplicate data between adjacent sampling points. For example, the AC attribute value at the 0.125 m sample point is 119.6 and at the 0.250 m sample point is 121.4, which are very similar. Preferably, down-sampling retains one sample value per 0.5 m of depth.
The amount of logging attribute data can be greatly reduced through down-sampling, which facilitates training of the model.
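Under the stated source interval (0.125 m) and target interval (0.5 m), the down-sampling of step 2 amounts to keeping every fourth sample; a sketch (the function name and list layout are assumptions):

```python
def downsample(samples, src_step=0.125, dst_step=0.5):
    """Keep one sample per dst_step metres from a series sampled every src_step metres."""
    stride = round(dst_step / src_step)   # 4 for 0.125 m -> 0.5 m
    return samples[::stride]

# Sample depths at the 0.125 m national-standard interval.
depths = [901.500, 901.625, 901.750, 901.875, 902.000, 902.125]
print(downsample(depths))   # [901.5, 902.0]
```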
Taking the two attributes VGR and VAC as an example, the result of down-sampling over depth is shown in Table 1:
TABLE 1
Depth (m) | VGR    | VAC    | Keep/Discard | Normalized VGR | Normalized VAC
901.500   | 103.03 | 323.74 | Keep         | 0.3535         | 0.7374
901.625   | 120.46 | 323.65 | Discard      |                |
901.750   | 138.55 | 323.36 | Discard      |                |
901.875   | 146.82 | 322.90 | Discard      |                |
902.000   | 140.99 | 322.46 | Keep         | 0.6066         | 0.4831
902.125   | 125.84 | 322.25 | Discard      |                |
Here, the logging attribute value at every 0.5 m depth interval is retained and the remaining values are discarded. In addition, given that the values of many logging attributes are relatively large and differ between attributes, the retained values may be normalized for subsequent processing: for example, find the maximum value of VGR and take the ratio of the VGR value at the current depth to that maximum as the normalized result. Normalization helps reduce the influence of differing dimensions and data scales on the subsequent training of the adversarial network.
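The normalization just described, dividing each retained value by the curve's maximum, can be sketched as follows (the helper name and toy input are illustrative; the normalized values in Table 1 come from the inventors' full data set and are not reproduced by this toy input):

```python
def normalize(values):
    """Scale retained attribute values by the curve's maximum, as described above."""
    m = max(values)
    return [v / m for v in values]

# Hypothetical retained VGR values at 0.5 m intervals.
print(normalize([103.03, 140.99]))
```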
Step 3, performing dimensionality reduction on each logging attribute obtained above using the Principal Component Analysis (PCA) method.
PCA is a dimensionality reduction algorithm commonly used in image processing; it is very useful for removing the linearly dependent portions of data and finding the linearly independent portions. The core idea of the algorithm is: given a dissimilarity matrix between high-dimensional data points, find corresponding data in a low-dimensional space such that the dissimilarity matrix between the low-dimensional point pairs matches the given high-dimensional one.
Assume that the dissimilarity matrix between pairs of points in the high-dimensional data set X is Σ = (σ_ij)_{n×n}. PCA aims to find low-dimensional data Y = {y_i | i = 1, 2, …, n} such that the distance d_ij between pairs of points in the low-dimensional data set is as close as possible to σ_ij. If the Euclidean metric is adopted:

σ_ij² = d_ij² = ‖x_i − x_j‖² = x_iᵀx_i − 2 x_iᵀx_j + x_jᵀx_j

where x_i and x_j represent the i-th and j-th attribute values in a well log attribute data set, and d_ij represents the distance between the i-th and j-th log attributes.
The data Y of the logging attributes after PCA dimensionality reduction can be calculated as follows:

Y = S^(1/2) · U_d

where S is a diagonal matrix of the data values x_1, x_2, …, x_n of the logging attribute to be processed, arranged in descending order, and U_d is formed from the corresponding d unit vectors e. According to empirical values, d ∈ [0, 30].
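The computation these formulas describe matches classical scaling: form the doubly centered matrix from squared pairwise distances, eigendecompose it, and keep the top d components (Y = S^(1/2) · U_d). A numpy sketch under that reading; the function name and test data are illustrative assumptions:

```python
import numpy as np

def classical_mds(X, d):
    """Classical scaling: find a d-dimensional Y whose pairwise distances
    approximate those of the high-dimensional rows of X (cf. Y = S^(1/2) U_d)."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared pairwise distances
    J = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    B = -0.5 * J @ D2 @ J                                 # doubly centered Gram matrix
    w, U = np.linalg.eigh(B)                              # eigenvalues ascending
    idx = np.argsort(w)[::-1][:d]                         # indices of top-d eigenvalues
    S = np.sqrt(np.clip(w[idx], 0.0, None))               # S^(1/2) on the diagonal
    return U[:, idx] * S                                  # n x d low-dimensional data

# Four coplanar 3-D points: a 2-D embedding preserves their distances exactly.
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
Y = classical_mds(X, 2)
```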
Assuming that the values of VAC are distributed between 250 and 350 and the values of VGR are distributed between 50 and 200, the one-dimensional data shown in Table 2 is obtained after PCA processing:
TABLE 2
Well log attribute | Depth (m) | Normalized value
VAC                | 901.500   | 0.5425
VGR                | 902.000   | 0.7248
Steps 1, 2, and 3 above provide three operations for reducing the dimensionality of the logging attribute data, yielding low-dimensional data for subsequent model training.
Step 4, the real logging data obtained through step 3 is input into the discrimination network D; the generation network G takes randomly generated values as plane coordinate positions and generates the logging attribute values corresponding to those positions, i.e., the false data, which are also input into the discrimination network D. The discrimination network D evaluates whether the mean square error between the real data and the false data is smaller than a set threshold. If not, the parameters of the generation network G are adjusted and this step is repeated until the discrimination network D cannot identify whether the false data is real, i.e., the mean square error is smaller than the set threshold; the parameters of the generation network G at that point are taken as the parameters of the trained model.
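Step 4's stopping logic — generate from random coordinates, measure the discrepancy, adjust G, repeat until the error falls below a threshold — can be sketched with a toy linear generator in place of the convolutional G and a plain MSE in place of the trained discriminator. All names, sizes, and the 0.015 toy threshold are assumptions (the description uses 1.5 on normalized data):

```python
import numpy as np

rng = np.random.default_rng(1)

coords = rng.normal(size=(64, 3))                     # random plane-coordinate inputs
true_w = np.array([0.3, -0.2, 0.5])                   # hidden relation for the toy data
real = coords @ true_w + 0.05 * rng.normal(size=64)   # stand-in for real, reduced data

def G(c, w):
    """Toy generator: a linear map in place of the convolutional network."""
    return c @ w

w = np.zeros(3)
mse = np.inf
for _ in range(2000):
    fake = G(coords, w)
    mse = float(np.mean((fake - real) ** 2))  # discrepancy measure used by D
    if mse < 0.015:                           # stop: D can no longer separate them
        break
    grad = 2.0 * coords.T @ (fake - real) / len(real)
    w -= 0.05 * grad                          # adjust G's parameters and iterate
```

The parameters `w` at the break are the toy analogue of taking the generation network G's parameters as the trained model.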
The generation network G adopts a convolutional neural network model. Preferably, the pooling operations in the generation network G are replaced by deconvolution layers; the deconvolution operation essentially applies a filter that reverses the convolution process, and it extracts feature values better than pooling does. The parameters of the deconvolution layers change during the back-propagation of each iteration, subject to the constraints of the parameters of the associated convolution layers; for example, in the reverse of the convolution, the stride and padding pattern of a deconvolution layer change with each iteration.
In this embodiment, the input of the generation network G is a plane coordinate position generated from noise; it may be generated randomly or by a trained neural network model. In each iteration of adjusting the parameters of the generation network G, the generation network G generates a predicted logging attribute value corresponding to its current parameters and the input plane coordinate position.
The discrimination network D is a binary classifier: it outputs a prediction label of "true" when the difference between the real data acquired from logging and the false data generated by the generation network G from high-order noise data is smaller than the set threshold, and otherwise outputs "false". In other embodiments of the present invention, other models can serve as the discrimination network D, as long as the model can identify the difference between the real data and the dummy data.
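The decision rule of D can be sketched as a threshold test on the mean square error; the function name and batches are illustrative, while 1.5 is the threshold the description uses for normalized attributes:

```python
def discriminate(real_batch, fake_batch, threshold=1.5):
    """Binary classifier D: label 'true' when the MSE between the batches
    is below the set threshold, 'false' otherwise."""
    mse = sum((r - f) ** 2 for r, f in zip(real_batch, fake_batch)) / len(real_batch)
    return "true" if mse < threshold else "false"

print(discriminate([0.5, 0.6], [0.4, 0.7]))   # MSE = 0.01 -> "true"
```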
According to an embodiment of the invention, the mean square error (MSE) is used to measure the difference between the real data and the false data. At the initial stage of training the adversarial network, the MSE can reach tens of thousands, and it decreases continuously toward an ideal range as training progresses. When the logging attributes are normalized, if the mean square error between the network-generated data and the original data is smaller than 1.5, training is considered to have reached the expected target; training stops and the parameters of the generation network G are taken as the training result.
Through the above training process, the logging attribute data generated by the generation network G is sufficient to fool the discrimination network D, so that the discrimination network D cannot distinguish whether logging attribute data from the generation network G was collected from a real log. Such a generation network G can then be used to reconstruct well log data for unknown regions without production logs.
When the model obtained by the method is used to reconstruct well log data for an unknown region without production logging, only the generation network G is used; the discrimination network D is not required. The plane coordinates of the area to be predicted are input into the obtained generation network G, which generates the logging attribute values corresponding to those coordinates.
In this embodiment, the neural network model is trained using logging attributes rather than seismic data as samples, which avoids the situation in which suitable seismic data cannot be obtained because mining began early; moreover, logging attributes are relatively less disturbed by noise, so a neural network model trained in this way predicts better. To make the logging attribute data usable for training, the original logging attribute data, whose volume is huge, is subjected to dimensionality reduction, and the portion of the data that is most beneficial for reconstructing the logging data and can express the nonlinear relationship is retained as training samples. The method trains the model for reconstructing logging data with an adversarial network: through the adversarial game, the discrimination network D is made unable to easily distinguish whether a logging attribute value generated by the generation network G is really explored data, so that the resulting generation network G can generate prediction results closest to the real data.
To verify the effect of the method of the present invention, the inventors conducted simulation experiments. FIG. 4a shows the result of predicting the GR logging attribute using the generation network model obtained by the training method of the present invention, where the ordinate is the GR logging attribute value and the abscissa is the depth value. The model in FIG. 4a was trained for 20,000 iterations, the stopping condition being that training ends when the iteration count reaches 20,000. More iterations might yield a slightly better result, but the model had already converged at 20,000 iterations, so training was ended there. FIG. 4b shows the actually detected logging attribute values corresponding to FIG. 4a. For reconstruction of well log data, the reconstructed curve is usually not an exact fit; a good prediction gives values as close to the true values as possible at the corresponding depths. It can be seen that the depths of the peaks and valleys in FIG. 4a substantially match the values at the corresponding depths in FIG. 4b, meaning that, when exploiting petroleum with reference to the prediction results of the present invention, the situation of drilling at a depth where a predicted peak indicates resources but no petroleum is obtained substantially does not occur; the method provided by the present invention is therefore very beneficial for petroleum logging exploration.
To further quantify the prediction effect, the inventors also calculated the signal-to-noise ratio of the predictions made with the generation network model of the present invention and compared it with the conventional MSE algorithm. The signal-to-noise ratio measures the quality of the data reconstruction result through the ratio between the original values and the error of the predicted values (in the present invention, the reconstructed values). The signal-to-noise ratio calculated here is defined as:

SNR = 10 · log₁₀( ‖z‖² / ‖z − ẑ‖² )

where SNR represents the signal-to-noise ratio, z represents the raw logging data, and ẑ represents the reconstructed logging data (i.e., the predicted values).
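A sketch of the SNR computation, assuming the customary reconstruction-quality form 10 · log₁₀(Σz² / Σ(z − ẑ)²) in decibels; the function name and sample data are illustrative:

```python
import math

def snr(z, z_hat):
    """Signal-to-noise ratio of a reconstruction, in dB (assumed form):
    10 * log10( sum(z^2) / sum((z - z_hat)^2) )."""
    signal = sum(v * v for v in z)
    noise = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    return 10.0 * math.log10(signal / noise)

print(round(snr([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]), 2))   # -> 28.45
```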
Based on the above calculation, the conventional MSE algorithm gives a signal-to-noise ratio of 20.42, while the method of the present invention gives 21.73. The generation model obtained by the method thus yields a better, more ideal prediction result. It should be noted that not all steps described in the above embodiments are necessary; those skilled in the art may make appropriate substitutions, replacements, and modifications according to actual needs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.