CN118171122A - Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters - Google Patents


Info

Publication number
CN118171122A
CN118171122A (application CN202211532836.7A)
Authority
CN
China
Prior art keywords
data
class
sample
calculation
value
Prior art date
Legal status
Pending
Application number
CN202211532836.7A
Other languages
Chinese (zh)
Inventor
李萍
王建龙
于琛
贾培娟
罗洁
刘轩
范永涛
Current Assignee
China National Petroleum Corp
CNPC Bohai Drilling Engineering Co Ltd
Original Assignee
China National Petroleum Corp
CNPC Bohai Drilling Engineering Co Ltd
Priority date
Filing date
Publication date
Application filed by China National Petroleum Corp, CNPC Bohai Drilling Engineering Co Ltd filed Critical China National Petroleum Corp
Priority to CN202211532836.7A
Publication of CN118171122A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for detecting and correcting data outliers in drilling pressure parameter calculation. The method overcomes the loss of calculation accuracy in subsequent calculation modules caused by anomalies in the original drilling data and by errors produced by intermediate calculation modules, can accurately locate the well depth of each data outlier and correct it, and provides a basis for scientifically and reasonably guiding drilling construction. A machine-learning gradient descent fitting algorithm is applied to the calculation of pressure and related parameters during drilling, data outliers are detected in the input data and calculation results of each calculation module, and optimal calculation results for the various pressures and parameters are given.

Description

Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters
Technical Field
The invention relates to the field of oil and gas drilling, in particular to a method for detecting and correcting abnormal points of data in drilling pressure parameter calculation.
Background
During drilling, data such as overburden pressure, rock mechanical parameters, formation pore pressure, ground stress, formation collapse pressure, formation fracture pressure and formation leakage pressure of different formations are calculated from logging data, and pressure prediction is sometimes also required for other wells in the same formation of the same region. The logging data are the most basic data and generally comprise well name, well depth, well diameter, neutron, density, natural gamma, natural potential, deep lateral resistivity, shallow lateral resistivity, longitudinal wave time difference, transverse wave time difference and the like. In general, logging data comprehensively reflect the conditions of the studied well at different depths; however, for various practical reasons a small portion of the data may be abnormal, causing deviations or even errors in the subsequent calculation of the various pressure parameters, and guiding production with such data could lead to safety accidents. It is therefore important to detect and repair outliers both in the raw logging data and in the various pressure values produced by the calculations.
Abnormal values, also called outliers, are commonly detected by: the quartile box plot, the 3σ principle, model-based detection and similarity-based detection. Similarity-based detection can be further divided into proximity-based outlier detection, density-based outlier detection, clustering-based methods and the like.
The method of detecting abnormal values by the interquartile range (IQR) of the box plot is essentially a manual inspection of the observed values. The 3σ principle computes the mean μ and standard deviation σ; the probability that the data fall within the interval (μ − 3σ, μ + 3σ) is about 99.7%, so data outside this interval can be regarded as abnormal. The 3σ principle requires the data to follow a normal distribution; if they do not, the multiple of the standard deviation can be customized according to experience and the actual situation.
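For illustration, a minimal Python sketch of the 3σ rule described above is given below; the function name and the sample readings are illustrative only, and the multiple k can be customized as noted above.

```python
import numpy as np

def three_sigma_outliers(values, k=3.0):
    """Return indices of values outside the interval (mu - k*sigma, mu + k*sigma).

    k defaults to 3 (the classical 3-sigma rule) and can be tuned when the
    data do not follow a normal distribution, as noted above."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    return np.flatnonzero(np.abs(values - mu) > k * sigma)

# a smooth series of readings with one spike at the end
readings = [2.3] * 20 + [999.0]
print(three_sigma_outliers(readings))   # -> [20]
```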
In model-based detection, a probability distribution model is defined, the probability that a sample value conforms to the model is calculated, and objects with low probability are regarded as outliers. If the model is a collection of clusters, an anomaly is an object that does not significantly belong to any cluster; if the model is a regression, an anomaly is an object relatively far from the predicted value. The approach has a solid statistical foundation, and such tests can be very effective when there are sufficient data and knowledge of the distribution type being tested; however, detection performance is poor for multivariate and high-dimensional data.
Compared with model-based detection, methods based on the proximity idea are easier to apply because no distribution model needs to be determined; three such methods are common. The first, proximity-based outlier detection, is also called the k-NN method, where k specifies the neighborhood. If k is too small, a small group of nearby outliers may receive low outlier scores; if k is too large, all objects in clusters with fewer than k points may become outliers. To make the scheme more robust to the choice of k, the average distance of the k nearest neighbors can be used. The method is sensitive to the value of k, computationally expensive, and unsuitable for data sets containing regions of different density.
The second, density-based outlier detection, takes a density perspective: outliers are objects in low-density regions. Density-based detection is closely related to proximity-based detection because density is usually defined in terms of proximity; it may be defined as the inverse of the average distance to the k nearest neighbors, or as the number of objects within a specified distance d of an object. Like the distance-based approach, this method has high time complexity, and parameter selection is difficult.
The clustering-based approach groups the data into classes; if a test value does not strongly belong to any class, it is an outlier. This approach is typically implemented with the k-means algorithm, where k is the number of clusters; the validity of the clustering structure is sensitive to outliers, which strongly affects the outlier judgment.
In the field of outlier data repair, polynomial interpolation and Lagrange polynomial interpolation are commonly used. Polynomial interpolation finds a polynomial that fits the data sample points and, with depth as the independent variable, predicts or updates the values at the outliers from that polynomial. The method is intuitive and its algorithmic properties are clear. However, the coefficient matrix of the linear system to be solved is a Vandermonde matrix, which is generally ill-conditioned, so large errors arise when the system is actually solved.
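The ill-conditioning mentioned above can be observed directly: the following sketch (the depth values are illustrative) prints the condition number of the Vandermonde coefficient matrix as the polynomial degree grows.

```python
import numpy as np

depths = np.linspace(3000.0, 3100.0, 12)   # illustrative well depths in metres

for degree in (3, 6, 9, 11):
    # columns are depths**0, depths**1, ..., depths**degree
    V = np.vander(depths, N=degree + 1, increasing=True)
    print(degree, f"{np.linalg.cond(V):.2e}")
```

Even for a dozen depth points the condition number explodes with the degree, which is the source of the large errors noted above; centering and scaling the depth variable is the usual remedy.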
Lagrange polynomial interpolation constructs a set of Lagrange basis functions of degree not exceeding n and then combines them linearly to obtain the interpolating polynomial. The Lagrange interpolation formula has a neat, compact structure and is convenient for theoretical analysis; however, whenever an interpolation point is added or removed, all the basis polynomials must be recalculated. In addition, when there are many interpolation points, the degree of the Lagrange interpolation polynomial may be high and its evaluation becomes numerically unstable.
In view of the above, the conventional mathematical methods have various drawbacks in abnormal data detection and data repair. In the age of big data and artificial intelligence, the characteristics of the data themselves should be fully utilized, and new problem-solving methods should be sought from the data perspective.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for detecting and correcting data outliers in drilling pressure parameter calculation, which detects abnormal values in the logging data acquired during drilling and in the various quantities, such as pressures, calculated from them, thereby reducing the influence of errors in the basic data on the calculation results and improving the timeliness and accuracy of data processing.
In order to solve the above technical problem, the invention adopts the following technical scheme: a method for detecting and correcting data outliers in drilling pressure parameter calculation comprises the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
The data outlier detection of step B comprises the following: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
In step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered; this win value is called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
The termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
In step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
The beneficial effects of the invention are as follows: the anomaly detection and correction method for the logging data used in formation pressure parameter calculation achieves data optimization and provides a theoretical basis for guiding production in real time during drilling.
Drawings
FIG. 1 is a diagram of a drilling pressure parameter calculation framework;
FIG. 2 is a flow chart of data outlier detection;
FIG. 3 is a data class splitting flow diagram;
FIG. 4 is a data outlier correction flow chart;
FIG. 5 is a graph of the rock mechanical parameter calculation results before optimization;
FIG. 6 is a graph of the rock mechanical parameter calculation results after optimization (n = total amount of data / 4);
FIG. 7 is a graph of the rock mechanical parameter calculation results after optimization (n = total amount of data / 8).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments, and that all other embodiments obtained by persons of ordinary skill in the art without making creative efforts based on the embodiments in the present invention are within the protection scope of the present invention.
The invention relates to a method for detecting and correcting abnormal points of data in drilling pressure parameter calculation, which comprises the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
The data outlier detection of step B comprises the following: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
In step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered; this win value is called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
The termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
In step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
The method can be used for outlier detection and abnormal data correction during the processing of logging data and the calculation of drilling-related pressures. Logging data preprocessing is the initial step, followed by the various pressure parameter calculation modules used during drilling. The order of the calculation modules is shown in FIG. 1.
Data preprocessing and the calculation modules proceed layer by layer: the ground stress calculation requires the formation pore pressure result, the formation pore pressure calculation depends on the rock mechanical parameter result, and so on. Deviations or errors in the results of an earlier module are therefore propagated layer by layer to the subsequent modules and strongly affect their results. Data outlier detection and correction can be performed on the input data before each calculation module in FIG. 1 starts, and also on the result of each calculation module.
In the process of data anomaly detection and data repair, the following assumptions are made:
The input of each calculation module is referred to as an input data set, also called the sample point set or sample sequence D, denoted D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)". The well depth js is in metres and may be floating-point data with decimals; the value v is either input data of a calculation module or a calculation result of the module, with consistent units. The number of samples in D is n, and the samples are arranged in order of increasing well depth.
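For illustration, the "(well depth js, value v)" binary group can be represented in Python as follows; the type and field names are illustrative assumptions.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    js: float   # well depth in metres, may carry decimal places
    v: float    # module input datum or module calculation result (uniform unit)

# a sample sequence D arranged in order of increasing well depth
D: List[Sample] = [
    Sample(js=3000.000, v=2.31),
    Sample(js=3000.125, v=2.35),
    Sample(js=3000.250, v=2.33),
]
```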
The method of the invention consists of two steps. The first step is data outlier detection; the second step is outlier data repair.
First step: data outlier detection.
The n values v of the sample sequence D, formed in order of increasing well depth, may or may not be stationary over the whole depth space. In a local depth space, however, the value v at a given depth is strongly correlated with the values at adjacent well-depth points; the sequence is not white noise and is therefore worth analysing. The basic idea of data outlier detection is thus: first define an initial sample space and group the data of the sample sequence into classes by a clustering method; when a new sample arrives, select the data class nearest to it, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier.
The data outlier detection flow chart is shown in fig. 2:
The calculations involved in FIG. 2 are as follows:
the mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th data class and v_m is the m-th value in the k-th data class;
the distance from sample x_i to the center of each data class is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values about the class mean u_k, where u_k is the mean of the values in the k-th data class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2. The class-splitting judgment flow is:
α in the flow chart of FIG. 3 is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
Accordingly, the program implementation of the data outlier detection method is as follows: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K. For each input sample x_i, its distance d_k to each data class center is calculated and the data class C_min with the smallest distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is an outlier and is marked as such for use in the second step, otherwise x_i is clustered into the data class C_min. After each data class k changes, its mean u_k and intra-class divergence S_k are updated. To control the rapid growth of the number of samples in a data class, it is necessary to judge whether a class should be separated into two; the judgment flow is shown in FIG. 3.
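A minimal Python sketch of this detection flow is given below. It assumes that the class center is the class mean u_k, that the intra-class divergence S_k takes the standard-deviation form, and that a single class is created from the first sample as the initial sample space; class splitting (FIG. 3) is omitted, and all names are illustrative.

```python
import math

class DataClass:
    """One data class k: running mean u_k and intra-class divergence S_k."""

    def __init__(self, v):
        self.values = [v]
        self.mean = v
        self.s = 0.0     # divergence of a single-sample class

    def add(self, v):
        self.values.append(v)
        m = len(self.values)
        self.mean = sum(self.values) / m          # u_k = (1/M) * sum(v_m)
        # assumed form of S_k: standard deviation of the class values
        self.s = math.sqrt(sum((x - self.mean) ** 2 for x in self.values) / m)

def detect_outliers(samples):
    """samples: list of (js, v) tuples in order of increasing well depth.
    Returns indices of suspected data outliers, marked for correction in step C."""
    classes, outliers = [], []
    for i, (_js, v) in enumerate(samples):
        if not classes:
            classes.append(DataClass(v))          # initial sample space
            continue
        nearest = min(classes, key=lambda c: abs(v - c.mean))
        d = abs(v - nearest.mean)                 # d_k = |v_i - u_k|
        if nearest.s > 0.0 and d > 3.0 * nearest.s:
            outliers.append(i)                    # data outlier, keep for step C
        else:
            nearest.add(v)                        # cluster into C_min, update u_k, S_k
    return outliers
```

Applied to a sample sequence such as D above, detect_outliers returns the indices of samples whose values deviate from the nearest class by more than three intra-class divergences.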
The first step of data outlier detection has two features.
(1) A condition for judging data outliers is given according to the characteristics of the application: when the distance d_k from a sample point x_i to the nearest data class k is greater than 3 times the intra-class divergence S_k, x_i is a data outlier.
(2) To control the rapid growth of the number of samples in a data class, it is necessary to judge whether one data class should be split into two. As shown in FIG. 3, the judgment condition is S_0 ≤ α·(S_1 + S_2), where S_0 is the intra-class divergence of the class before splitting, S_1 and S_2 are the intra-class divergences of the two candidate sub-classes, and α is a hyperparameter whose empirical value range is determined by parameter tuning (a sketch of this check follows).
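The splitting condition can be evaluated as in the sketch below. How the candidate sub-classes C_k1 and C_k2 are formed is not spelled out here, so cutting the class at its mean and using the standard-deviation form of the divergence are assumptions of this sketch; the decision taken when the condition holds follows the flow of FIG. 3.

```python
import math

def _divergence(xs):
    """Assumed intra-class divergence: standard deviation of the values."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def split_condition_holds(values, alpha=1.15):
    """Evaluate S_0 <= alpha * (S_1 + S_2) for one data class.

    S_0 is the divergence of the whole class; S_1 and S_2 are the divergences
    of the two candidate sub-classes, formed here (as an assumption) by
    cutting the class at its mean.  alpha is the hyperparameter in [1.1, 1.2]."""
    mean = sum(values) / len(values)
    low = [x for x in values if x <= mean]
    high = [x for x in values if x > mean]
    return _divergence(values) <= alpha * (_divergence(low) + _divergence(high))
```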
Second step: data outlier correction.
The data outlier correction uses the data outliers marked in the first step; the marked points are the data outliers to be corrected.
When correcting a data outlier, a total of win samples before and after the depth of the outlier are considered; this value win is called the correction window.
The basic idea of correcting data outliers is to use a gradient descent fitting algorithm from machine learning and to determine the degree of the fitting polynomial adaptively. The fitting polynomial within the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved. To prevent over-fitting, the maximum polynomial degree is set to 6. Conventionally, 80% of the data in the range win are sampled as training data and the remaining 20% are used as test data. The polynomial of each order is trained by the gradient descent fitting method with the loss function designed for training, the learning rate being set to 0.005 after tuning. When the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped, and the polynomial order i and the polynomial parameters θ at that moment are recorded before exiting.
Among these polynomials, the one with the smallest test error err_test is taken as the fitting polynomial; evaluating it with the depth as the independent variable gives the corrected value of the data outlier.
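A Python sketch of this correction step is given below under stated assumptions: a mean-squared-error loss, min-max scaling of the depth inside the window for numerical stability, and a fixed iteration budget in place of the adaptive |err_train − err_test| stopping rule; all names are illustrative.

```python
import numpy as np

def correct_outlier(depths, values, outlier_depth,
                    max_degree=6, lr=0.005, n_iter=5000, seed=0):
    """Fit polynomials of degree 1..max_degree to the window samples by
    gradient descent, keep the one with the smallest test error, and return
    its value at the outlier depth as the corrected value."""
    x = np.asarray(depths, dtype=float)
    y = np.asarray(values, dtype=float)
    x0, x1 = x.min(), x.max()
    xs = (x - x0) / (x1 - x0)                     # scale depth into [0, 1]
    xq = (outlier_depth - x0) / (x1 - x0)

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(xs))
    n_train = int(0.8 * len(xs))                  # 80 % train / 20 % test split
    tr, te = idx[:n_train], idx[n_train:]

    best_err, best_deg, best_theta = np.inf, None, None
    for degree in range(1, max_degree + 1):       # degree capped at 6
        Xtr = np.vander(xs[tr], degree + 1, increasing=True)
        Xte = np.vander(xs[te], degree + 1, increasing=True)
        theta = np.zeros(degree + 1)
        for _ in range(n_iter):                   # fixed budget replaces the
            resid = Xtr @ theta - y[tr]           # adaptive stopping rule
            theta -= lr * Xtr.T @ resid / len(tr) # gradient step on the MSE loss
        err_test = float(np.mean((Xte @ theta - y[te]) ** 2))
        if err_test < best_err:                   # smallest test error wins
            best_err, best_deg, best_theta = err_test, degree, theta
    xq_row = np.vander(np.array([xq]), best_deg + 1, increasing=True)
    return float(xq_row @ best_theta)             # corrected value at that depth
```

Re-introducing the |err_train − err_test| criterion only changes when each inner loop stops; the polynomial with the smallest test error still provides the corrected value.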
The data outlier correction flow chart is as shown in fig. 4:
the second step of data outlier correction has two features.
(1) The termination condition in FIG. 4 is given: when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|, the iterative gradient descent fitting algorithm is terminated.
(2) An adaptive method for adjusting the parameter win is provided. Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time (a sketch of this update follows).
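The window update alone can be sketched as follows, reading β as the hyperparameter and r as the window size used at the previous fitting; the over-fitting test itself is not reproduced here.

```python
def enlarge_window(win, beta, r):
    """Adaptive correction-window update: win <- (1 + beta / r) * win."""
    return int(round((1.0 + beta / r) * win))

# example: previous window of 200 samples, beta = 50 -> new window of 250
print(enlarge_window(200, beta=50.0, r=200))   # -> 250
```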
The data anomaly detection and anomaly data correction method for well drilling pressure parameter calculation achieves the effect of data optimization and provides a theoretical basis for guiding production in real time in the well drilling process.
Examples.
The original logging data contain 14,755 records, each comprising well name, measured depth (m), well diameter (in), neutron, density (g/cm3), natural gamma, natural potential, deep lateral resistivity, shallow lateral resistivity, longitudinal wave time difference (us/ft) and transverse wave time difference (us/ft).
It is assumed that the overburden pressure has already been calculated from the raw logging data, and the next calculation module computes the rock mechanical parameters.
According to the mechanical formulas, the rock mechanical parameters are calculated with the overburden pressure result as input. The calculation results comprise 6 kinds of data: Poisson's ratio, elastic modulus, cohesion, internal friction angle, compressive strength and tensile strength. These results, referred to here as the raw data because no outlier detection or correction has been applied, are shown in FIG. 5, where the ordinate is the well depth in m.
In the first step, the data outlier detection algorithm, n is set to the total amount of data / 4; the smaller the value of n, the faster earlier data are forgotten. The detection results are: 1072 Poisson's ratio outliers, 1339 elastic modulus outliers, 836 cohesion outliers, 1146 internal friction angle outliers, 728 compressive strength outliers and 911 tensile strength outliers. After applying the data outlier correction of the second step, the corrected results are shown in FIG. 6.
Compared with FIG. 5, the outliers of Poisson's ratio, elastic modulus, cohesion, compressive strength and tensile strength in FIG. 6 are significantly reduced; the internal friction angle has 1146 outliers, but because these data show no obvious regularity, the effect of the correction is not apparent.
Setting n to the total amount of data / 8, outlier detection is performed on the raw data again. The detection results are: 1048 Poisson's ratio outliers, 1107 elastic modulus outliers, 844 cohesion outliers, 1010 internal friction angle outliers, 707 compressive strength outliers and 853 tensile strength outliers. After applying the outlier correction of the second step, the corrected results are shown in FIG. 7.
Compared with FIG. 5, the outliers of Poisson's ratio, elastic modulus, cohesion, compressive strength and tensile strength in FIG. 7 are significantly reduced. Compared with FIG. 6, the number of detected data outliers is smaller because the value of n is reduced.
The invention overcomes the anomalies present in the original drilling data as well as the loss of calculation accuracy in subsequent calculation modules caused by errors produced by the intermediate calculation modules in FIG. 1; it can accurately locate the well depth of each data outlier and correct it, providing a basis for scientifically and reasonably guiding drilling construction. A machine-learning gradient descent fitting algorithm is applied to the calculation of pressure and related parameters during drilling, data outliers are detected in the input data and calculation results of each calculation module, and optimal calculation results for the various pressures and parameters are given.
In view of the foregoing, the present invention is not limited to the above-described embodiments, and other embodiments can be easily proposed by those skilled in the art within the scope of the technical teaching of the present invention, but such embodiments are included in the scope of the present invention.

Claims (7)

1. A method for detecting and correcting data outliers in drilling pressure parameter calculation, characterized by comprising the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
2. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 1, wherein the data outlier detection of step B comprises: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
3. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein, if the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
4. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein in step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered, this win value being called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
5. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein the termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
6. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting and the win value should be increased; the adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
7. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 4, wherein in step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
CN202211532836.7A 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters Pending CN118171122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211532836.7A CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211532836.7A CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Publications (1)

Publication Number Publication Date
CN118171122A true CN118171122A (en) 2024-06-11

Family

ID=91353628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211532836.7A Pending CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Country Status (1)

Country Link
CN (1) CN118171122A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination