CN118171122A - Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters - Google Patents


Info

Publication number
CN118171122A
CN118171122A (application CN202211532836.7A)
Authority
CN
China
Prior art keywords
data
class
sample
calculation
value
Prior art date
Legal status
Pending
Application number
CN202211532836.7A
Other languages
Chinese (zh)
Inventor
李萍
王建龙
于琛
贾培娟
罗洁
刘轩
范永涛
Current Assignee
China National Petroleum Corp
CNPC Bohai Drilling Engineering Co Ltd
Original Assignee
China National Petroleum Corp
CNPC Bohai Drilling Engineering Co Ltd
Priority date
Filing date
Publication date
Application filed by China National Petroleum Corp, CNPC Bohai Drilling Engineering Co Ltd filed Critical China National Petroleum Corp
Priority to CN202211532836.7A
Publication of CN118171122A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for detecting and correcting data outliers in drilling pressure parameter calculation. The method overcomes the loss of calculation accuracy in subsequent calculation modules caused by anomalies in the original drilling data and by errors produced by intermediate calculation modules, can accurately locate the well depth of each data outlier and correct it, and provides a basis for scientifically and reasonably guiding drilling construction. A machine-learning gradient descent fitting algorithm is applied to the calculation of pressure and related parameters during drilling, data outliers are detected in the input data and calculation results of each calculation module, and optimal calculation results for the various pressures and parameters are given.

Description

Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters
Technical Field
The invention relates to the field of oil and gas drilling, in particular to a method for detecting and correcting abnormal points of data in drilling pressure parameter calculation.
Background
During drilling, data such as overburden pressure, rock mechanical parameters, formation pore pressure, ground stress, formation collapse pressure, formation fracture pressure and formation leakage pressure of different formations are calculated from logging data, and pressure prediction is sometimes also required for other wells in the same formation of the same region. The logging data are the most basic data and generally comprise well name, well depth, well diameter, neutron, density, natural gamma, natural potential, deep lateral resistivity, shallow lateral resistivity, longitudinal wave time difference, transverse wave time difference and the like. In general, logging data comprehensively reflect the conditions of the studied well at different depths; however, for various practical reasons a small portion of the data may be abnormal, causing deviations or even errors in the subsequent calculation of the various pressure parameters, and guiding production with such data could lead to safety accidents. It is therefore important to detect and repair outliers both in the raw logging data and in the various pressure values produced by the calculations.
Abnormal values, also called outliers, are commonly detected by: the quartile box plot, the 3σ principle, model-based detection and similarity-based detection. Similarity-based detection can be further divided into proximity-based outlier detection, density-based outlier detection, clustering-based methods and the like.
The method of detecting abnormal values by the interquartile range (IQR) of the box plot is essentially a manual inspection of the observed values. The 3σ principle computes the mean μ and standard deviation σ; the probability that the data fall within the interval (μ − 3σ, μ + 3σ) is about 99.7%, so data outside this interval can be regarded as abnormal. The 3σ principle requires the data to follow a normal distribution; if they do not, the multiple of the standard deviation can be customized according to experience and the actual situation.
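For illustration, a minimal Python sketch of the 3σ rule described above is given below; the function name and the sample readings are illustrative only, and the multiple k can be customized as noted above.

```python
import numpy as np

def three_sigma_outliers(values, k=3.0):
    """Return indices of values outside the interval (mu - k*sigma, mu + k*sigma).

    k defaults to 3 (the classical 3-sigma rule) and can be tuned when the
    data do not follow a normal distribution, as noted above."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    return np.flatnonzero(np.abs(values - mu) > k * sigma)

# a smooth series of readings with one spike at the end
readings = [2.3] * 20 + [999.0]
print(three_sigma_outliers(readings))   # -> [20]
```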
In model-based detection, a probability distribution model is defined, the probability that a sample value conforms to the model is calculated, and objects with low probability are regarded as outliers. If the model is a collection of clusters, an anomaly is an object that does not significantly belong to any cluster; if the model is a regression, an anomaly is an object relatively far from the predicted value. The approach has a solid statistical foundation, and such tests can be very effective when there are sufficient data and knowledge of the distribution type being tested; however, detection performance is poor for multivariate and high-dimensional data.
Compared with model-based detection, methods based on the proximity idea are easier to apply because no distribution model needs to be determined; three such methods are common. The first, proximity-based outlier detection, is also called the k-NN method, where k specifies the neighborhood. If k is too small, a small group of nearby outliers may receive low outlier scores; if k is too large, all objects in clusters with fewer than k points may become outliers. To make the scheme more robust to the choice of k, the average distance of the k nearest neighbors can be used. The method is sensitive to the value of k, computationally expensive, and unsuitable for data sets containing regions of different density.
The second, density-based outlier detection, takes a density perspective: outliers are objects in low-density regions. Density-based detection is closely related to proximity-based detection because density is usually defined in terms of proximity; it may be defined as the inverse of the average distance to the k nearest neighbors, or as the number of objects within a specified distance d of an object. Like the distance-based approach, this method has high time complexity, and parameter selection is difficult.
The clustering-based approach groups the data into classes; if a test value does not strongly belong to any class, it is an outlier. This approach is typically implemented with the k-means algorithm, where k is the number of clusters; the validity of the clustering structure is sensitive to outliers, which strongly affects the outlier judgment.
In the field of outlier data repair, polynomial interpolation and Lagrange polynomial interpolation are commonly used. Polynomial interpolation finds a polynomial that fits the data sample points and, with depth as the independent variable, predicts or updates the values at the outliers from that polynomial. The method is intuitive and its algorithmic properties are clear. However, the coefficient matrix of the linear system to be solved is a Vandermonde matrix, which is generally ill-conditioned, so large errors arise when the system is actually solved.
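The ill-conditioning mentioned above can be observed directly: the following sketch (the depth values are illustrative) prints the condition number of the Vandermonde coefficient matrix as the polynomial degree grows.

```python
import numpy as np

depths = np.linspace(3000.0, 3100.0, 12)   # illustrative well depths in metres

for degree in (3, 6, 9, 11):
    # columns are depths**0, depths**1, ..., depths**degree
    V = np.vander(depths, N=degree + 1, increasing=True)
    print(degree, f"{np.linalg.cond(V):.2e}")
```

Even for a dozen depth points the condition number explodes with the degree, which is the source of the large errors noted above; centering and scaling the depth variable is the usual remedy.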
Lagrange polynomial interpolation constructs a set of Lagrange basis functions of degree not exceeding n and then combines them linearly to obtain the interpolating polynomial. The Lagrange interpolation formula has a neat, compact structure and is convenient for theoretical analysis; however, whenever an interpolation point is added or removed, all the basis polynomials must be recalculated. In addition, when there are many interpolation points, the degree of the Lagrange interpolation polynomial may be high and its evaluation becomes numerically unstable.
In view of the above, the conventional mathematical methods have various drawbacks in abnormal data detection and data repair. In the age of big data and artificial intelligence, the characteristics of the data themselves should be fully utilized, and new problem-solving methods should be sought from the data perspective.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for detecting and correcting data outliers in drilling pressure parameter calculation, which detects abnormal values in the logging data acquired during drilling and in the various quantities, such as pressures, calculated from them, thereby reducing the influence of errors in the basic data on the calculation results and improving the timeliness and accuracy of data processing.
In order to solve the above technical problem, the invention adopts the following technical scheme: a method for detecting and correcting data outliers in drilling pressure parameter calculation comprises the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
The data outlier detection of step B comprises the following: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
In step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered; this win value is called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
The termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
In step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
The beneficial effects of the invention are as follows: the anomaly detection and correction method for the logging data used in formation pressure parameter calculation achieves data optimization and provides a theoretical basis for guiding production in real time during drilling.
Drawings
FIG. 1 is a diagram of a drilling pressure parameter calculation framework;
FIG. 2 is a flow chart of data outlier detection;
FIG. 3 is a data class splitting flow diagram;
FIG. 4 is a data outlier correction flow chart;
FIG. 5 is a graph of the rock mechanical parameter calculation results before optimization;
FIG. 6 is a graph of the rock mechanical parameter calculation results after optimization (n = total amount of data / 4);
FIG. 7 is a graph of the rock mechanical parameter calculation results after optimization (n = total amount of data / 8).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention; it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments, and that all other embodiments obtained by persons of ordinary skill in the art without making creative efforts based on the embodiments in the present invention are within the protection scope of the present invention.
The invention relates to a method for detecting and correcting abnormal points of data in drilling pressure parameter calculation, which comprises the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
The data outlier detection of step B comprises the following: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
In step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered; this win value is called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
The termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
In step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
The method can be used for outlier detection and abnormal data correction during the processing of logging data and the calculation of drilling-related pressures. Logging data preprocessing is the initial step, followed by the various pressure parameter calculation modules used during drilling. The order of the calculation modules is shown in FIG. 1.
Data preprocessing and the calculation modules proceed layer by layer: the ground stress calculation requires the formation pore pressure result, the formation pore pressure calculation depends on the rock mechanical parameter result, and so on. Deviations or errors in the results of an earlier module are therefore propagated layer by layer to the subsequent modules and strongly affect their results. Data outlier detection and correction can be performed on the input data before each calculation module in FIG. 1 starts, and also on the result of each calculation module.
In the process of data anomaly detection and data repair, the following assumptions are made:
The input of each calculation module is referred to as an input data set, also called the sample point set or sample sequence D, denoted D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)". The well depth js is in metres and may be floating-point data with decimals; the value v is either input data of a calculation module or a calculation result of the module, with consistent units. The number of samples in D is n, and the samples are arranged in order of increasing well depth.
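For illustration, the "(well depth js, value v)" binary group can be represented in Python as follows; the type and field names are illustrative assumptions.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    js: float   # well depth in metres, may carry decimal places
    v: float    # module input datum or module calculation result (uniform unit)

# a sample sequence D arranged in order of increasing well depth
D: List[Sample] = [
    Sample(js=3000.000, v=2.31),
    Sample(js=3000.125, v=2.35),
    Sample(js=3000.250, v=2.33),
]
```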
The method of the invention consists of two steps. The first step is data outlier detection; the second step is outlier data repair.
First step: data outlier detection.
The n values v of the sample sequence D, formed in order of increasing well depth, may or may not be stationary over the whole depth space. In a local depth space, however, the value v at a given depth is strongly correlated with the values at adjacent well-depth points; the sequence is not white noise and is therefore worth analysing. The basic idea of data outlier detection is thus: first define an initial sample space and group the data of the sample sequence into classes by a clustering method; when a new sample arrives, select the data class nearest to it, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier.
The data outlier detection flow chart is shown in fig. 2:
The calculations involved in FIG. 2 are as follows:
the mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th data class and v_m is the m-th value in the k-th data class;
the distance from sample x_i to the center of each data class is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values about the class mean u_k, where u_k is the mean of the values in the k-th data class.
If the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2. The class-splitting judgment flow is:
α in the flow chart of FIG. 3 is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
Accordingly, the program implementation of the data outlier detection method is as follows: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K. For each input sample x_i, its distance d_k to each data class center is calculated and the data class C_min with the smallest distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is an outlier and is marked as such for use in the second step, otherwise x_i is clustered into the data class C_min. After each data class k changes, its mean u_k and intra-class divergence S_k are updated. To control the rapid growth of the number of samples in a data class, it is necessary to judge whether a class should be separated into two; the judgment flow is shown in FIG. 3.
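A minimal Python sketch of this detection flow is given below. It assumes that the class center is the class mean u_k, that the intra-class divergence S_k takes the standard-deviation form, and that a single class is created from the first sample as the initial sample space; class splitting (FIG. 3) is omitted, and all names are illustrative.

```python
import math

class DataClass:
    """One data class k: running mean u_k and intra-class divergence S_k."""

    def __init__(self, v):
        self.values = [v]
        self.mean = v
        self.s = 0.0     # divergence of a single-sample class

    def add(self, v):
        self.values.append(v)
        m = len(self.values)
        self.mean = sum(self.values) / m          # u_k = (1/M) * sum(v_m)
        # assumed form of S_k: standard deviation of the class values
        self.s = math.sqrt(sum((x - self.mean) ** 2 for x in self.values) / m)

def detect_outliers(samples):
    """samples: list of (js, v) tuples in order of increasing well depth.
    Returns indices of suspected data outliers, marked for correction in step C."""
    classes, outliers = [], []
    for i, (_js, v) in enumerate(samples):
        if not classes:
            classes.append(DataClass(v))          # initial sample space
            continue
        nearest = min(classes, key=lambda c: abs(v - c.mean))
        d = abs(v - nearest.mean)                 # d_k = |v_i - u_k|
        if nearest.s > 0.0 and d > 3.0 * nearest.s:
            outliers.append(i)                    # data outlier, keep for step C
        else:
            nearest.add(v)                        # cluster into C_min, update u_k, S_k
    return outliers
```

Applied to a sample sequence such as D above, detect_outliers returns the indices of samples whose values deviate from the nearest class by more than three intra-class divergences.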
The first step of data outlier detection has two features.
(1) A condition for judging data outliers is given according to the characteristics of the application: when the distance d_k from a sample point x_i to the nearest data class k is greater than 3 times the intra-class divergence S_k, x_i is a data outlier.
(2) To control the rapid growth of the number of samples in a data class, it is necessary to judge whether one data class should be split into two. As shown in FIG. 3, the judgment condition is S_0 ≤ α·(S_1 + S_2), where S_0 is the intra-class divergence of the class before splitting, S_1 and S_2 are the intra-class divergences of the two candidate sub-classes, and α is a hyperparameter whose empirical value range is determined by parameter tuning (a sketch of this check follows).
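The splitting condition can be evaluated as in the sketch below. How the candidate sub-classes C_k1 and C_k2 are formed is not spelled out here, so cutting the class at its mean and using the standard-deviation form of the divergence are assumptions of this sketch; the decision taken when the condition holds follows the flow of FIG. 3.

```python
import math

def _divergence(xs):
    """Assumed intra-class divergence: standard deviation of the values."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def split_condition_holds(values, alpha=1.15):
    """Evaluate S_0 <= alpha * (S_1 + S_2) for one data class.

    S_0 is the divergence of the whole class; S_1 and S_2 are the divergences
    of the two candidate sub-classes, formed here (as an assumption) by
    cutting the class at its mean.  alpha is the hyperparameter in [1.1, 1.2]."""
    mean = sum(values) / len(values)
    low = [x for x in values if x <= mean]
    high = [x for x in values if x > mean]
    return _divergence(values) <= alpha * (_divergence(low) + _divergence(high))
```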
Second step: data outlier correction.
The data outlier correction uses the data outliers marked in the first step; the marked points are the data outliers to be corrected.
When correcting a data outlier, a total of win samples before and after the depth of the outlier are considered; this value win is called the correction window.
The basic idea of correcting data outliers is to use a gradient descent fitting algorithm from machine learning and to determine the degree of the fitting polynomial adaptively. The fitting polynomial within the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved. To prevent over-fitting, the maximum polynomial degree is set to 6. Conventionally, 80% of the data in the range win are sampled as training data and the remaining 20% are used as test data. The polynomial of each order is trained by the gradient descent fitting method with the loss function designed for training, the learning rate being set to 0.005 after tuning. When the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped, and the polynomial order i and the polynomial parameters θ at that moment are recorded before exiting.
Among these polynomials, the one with the smallest test error err_test is taken as the fitting polynomial; evaluating it with the depth as the independent variable gives the corrected value of the data outlier.
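A Python sketch of this correction step is given below under stated assumptions: a mean-squared-error loss, min-max scaling of the depth inside the window for numerical stability, and a fixed iteration budget in place of the adaptive |err_train − err_test| stopping rule; all names are illustrative.

```python
import numpy as np

def correct_outlier(depths, values, outlier_depth,
                    max_degree=6, lr=0.005, n_iter=5000, seed=0):
    """Fit polynomials of degree 1..max_degree to the window samples by
    gradient descent, keep the one with the smallest test error, and return
    its value at the outlier depth as the corrected value."""
    x = np.asarray(depths, dtype=float)
    y = np.asarray(values, dtype=float)
    x0, x1 = x.min(), x.max()
    xs = (x - x0) / (x1 - x0)                     # scale depth into [0, 1]
    xq = (outlier_depth - x0) / (x1 - x0)

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(xs))
    n_train = int(0.8 * len(xs))                  # 80 % train / 20 % test split
    tr, te = idx[:n_train], idx[n_train:]

    best_err, best_deg, best_theta = np.inf, None, None
    for degree in range(1, max_degree + 1):       # degree capped at 6
        Xtr = np.vander(xs[tr], degree + 1, increasing=True)
        Xte = np.vander(xs[te], degree + 1, increasing=True)
        theta = np.zeros(degree + 1)
        for _ in range(n_iter):                   # fixed budget replaces the
            resid = Xtr @ theta - y[tr]           # adaptive stopping rule
            theta -= lr * Xtr.T @ resid / len(tr) # gradient step on the MSE loss
        err_test = float(np.mean((Xte @ theta - y[te]) ** 2))
        if err_test < best_err:                   # smallest test error wins
            best_err, best_deg, best_theta = err_test, degree, theta
    xq_row = np.vander(np.array([xq]), best_deg + 1, increasing=True)
    return float(xq_row @ best_theta)             # corrected value at that depth
```

Re-introducing the |err_train − err_test| criterion only changes when each inner loop stops; the polynomial with the smallest test error still provides the corrected value.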
The data outlier correction flow chart is as shown in fig. 4:
the second step of data outlier correction has two features.
(1) The termination condition in FIG. 4 is given: when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|, the iterative gradient descent fitting algorithm is terminated.
(2) An adaptive method for adjusting the parameter win is provided. Whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting, and the win value should be increased. The adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time (a sketch of this update follows).
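The window update alone can be sketched as follows, reading β as the hyperparameter and r as the window size used at the previous fitting; the over-fitting test itself is not reproduced here.

```python
def enlarge_window(win, beta, r):
    """Adaptive correction-window update: win <- (1 + beta / r) * win."""
    return int(round((1.0 + beta / r) * win))

# example: previous window of 200 samples, beta = 50 -> new window of 250
print(enlarge_window(200, beta=50.0, r=200))   # -> 250
```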
The data anomaly detection and anomaly data correction method for well drilling pressure parameter calculation achieves the effect of data optimization and provides a theoretical basis for guiding production in real time in the well drilling process.
Examples.
The original logging data contain 14,755 records, each comprising well name, measured depth (m), well diameter (in), neutron, density (g/cm3), natural gamma, natural potential, deep lateral resistivity, shallow lateral resistivity, longitudinal wave time difference (us/ft) and transverse wave time difference (us/ft).
It is assumed that the overburden pressure has already been calculated from the raw logging data, and the next calculation module computes the rock mechanical parameters.
According to the mechanical formulas, the rock mechanical parameters are calculated with the overburden pressure result as input. The calculation results comprise 6 kinds of data: Poisson's ratio, elastic modulus, cohesion, internal friction angle, compressive strength and tensile strength. These results, referred to here as the raw data because no outlier detection or correction has been applied, are shown in FIG. 5, where the ordinate is the well depth in m.
In the first step, the data outlier detection algorithm, n is set to the total amount of data / 4; the smaller the value of n, the faster earlier data are forgotten. The detection results are: 1072 Poisson's ratio outliers, 1339 elastic modulus outliers, 836 cohesion outliers, 1146 internal friction angle outliers, 728 compressive strength outliers and 911 tensile strength outliers. After applying the data outlier correction of the second step, the corrected results are shown in FIG. 6.
Compared with FIG. 5, the outliers of Poisson's ratio, elastic modulus, cohesion, compressive strength and tensile strength in FIG. 6 are significantly reduced; the internal friction angle has 1146 outliers, but because these data show no obvious regularity, the effect of the correction is not apparent.
Setting n to the total amount of data / 8, outlier detection is performed on the raw data again. The detection results are: 1048 Poisson's ratio outliers, 1107 elastic modulus outliers, 844 cohesion outliers, 1010 internal friction angle outliers, 707 compressive strength outliers and 853 tensile strength outliers. After applying the outlier correction of the second step, the corrected results are shown in FIG. 7.
Compared with FIG. 5, the outliers of Poisson's ratio, elastic modulus, cohesion, compressive strength and tensile strength in FIG. 7 are significantly reduced. Compared with FIG. 6, the number of detected data outliers is smaller because the value of n is reduced.
The invention overcomes the anomalies present in the original drilling data as well as the loss of calculation accuracy in subsequent calculation modules caused by errors produced by the intermediate calculation modules in FIG. 1; it can accurately locate the well depth of each data outlier and correct it, providing a basis for scientifically and reasonably guiding drilling construction. A machine-learning gradient descent fitting algorithm is applied to the calculation of pressure and related parameters during drilling, data outliers are detected in the input data and calculation results of each calculation module, and optimal calculation results for the various pressures and parameters are given.
In view of the foregoing, the present invention is not limited to the above-described embodiments, and other embodiments can be easily proposed by those skilled in the art within the scope of the technical teaching of the present invention, but such embodiments are included in the scope of the present invention.

Claims (7)

1. A method for detecting and correcting data outliers in drilling pressure parameter calculation, characterized by comprising the following steps:
A. For the data requiring outlier detection and correction, the input of each calculation module is called a sample sequence D, expressed as D = {x_1, x_2, ..., x_n}; each sample x_i in D is represented as a binary group "(well depth js, value v)", where the well depth js is in metres and may be floating-point data with decimals, and the value v is either input data of the calculation module or a calculation result of the module, with consistent units; the number of samples in D is n, and the samples are arranged in order of increasing well depth;
B. Data outlier detection: an initial sample space is defined and the data of the sample sequence are grouped into classes by a clustering method; when a new sample arrives, the data class nearest to it is selected, and if the degree to which the sample deviates from that class is greater than 3 times the intra-class divergence of the class, the new sample is judged to be a data outlier and marked as such;
C. Data outlier correction: a gradient descent fitting algorithm from machine learning is adopted and the degree of the fitting polynomial is determined by an adaptive method; when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped; the fitting polynomial is then evaluated with the depth of the data outlier as the independent variable, and the resulting value is the corrected value of the data outlier.
2. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 1, wherein the data outlier detection of step B comprises: a depth space is given, the number of samples is set to n, and value-adaptive clustering is performed on the samples; the maximum number of data classes is K, the mean value of each data class k is u_k and its intra-class divergence is S_k, k ∈ K; for each input sample x_i, the distance d_k from x_i to each data class center is calculated and the data class C_min with the minimum distance is selected; if the distance is not within 3 times the intra-class divergence, the sample x_i is marked as an outlier for use in step C, otherwise x_i is clustered into the data class C_min; after each data class k changes, its mean u_k and intra-class divergence S_k are updated;
The mean value of each data class is calculated as u_k = (1/M) Σ_{m=1}^{M} v_m, where M is the number of samples in the k-th class and v_m is the m-th value in that class;
the distance from sample x_i to each data class center is calculated as d_k = |v_i − u_k|, where v_i is the value of sample x_i;
the intra-class divergence S_k of a data class is calculated from the deviations of the class values v_m about the class mean u_k, where u_k is the mean of the values of the k-th class.
3. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein, if the number of samples in a certain data class C_k grows too fast, to a certain extent the class is split into two sub-classes C_k1 and C_k2; in order to control the rapid growth of the number of samples in a class, it is necessary to judge whether one data class should be separated into two; the condition used for this judgment is S_0 ≤ α·(S_1 + S_2), where α is a hyperparameter whose value is taken between 1.1 and 1.2 according to the tuning results.
4. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein in step C, when correcting a data outlier, a total of win samples before and after the depth of the data outlier are considered, this win value being called the correction window;
C1. The fitting polynomial in the local range win is taken as y = θ_0 + θ_1·x + ... + θ_N·x^N, i.e. y = θ^T·x with x = (x^0, x^1, ..., x^N)^T, where θ = (θ_0, θ_1, ..., θ_N)^T is the parameter vector to be solved;
C2. A loss function for use in training is set;
C3. 80% of the data in the range win are sampled as training data, and the remaining 20% are used as test data;
C4. Iterative computation is carried out with the gradient descent fitting algorithm, the learning rate being set to 0.005 after tuning; during the iteration, when the difference |err_train − err_test| between the current training error err_train and the test error err_test satisfies a certain condition, the iteration of the gradient descent algorithm is stopped and the polynomial order i and the polynomial parameters θ at that moment are saved; taking the depth as the independent variable, the fitting polynomial y = θ^T·x is evaluated, and the result is the corrected value of the data outlier.
5. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein the termination condition of the gradient descent fitting algorithm is that the iteration is stopped when |err_train − err_test| of the current iteration is greater than (1/(2.7·n))·Σ_i |err_i^train − err_i^test|.
6. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 2, wherein whether over-fitting has occurred is judged by conventional machine-learning methods; when over-fitting occurs, more data are needed for fitting and the win value should be increased; the adaptive adjustment of the parameter win is win = (1 + β/r)·win, where β is a hyperparameter and r is the win value used at the last fitting; in practical use an initial win value is given, and the larger the win value, the more accurate the correction of the data outliers, but the longer the fitting time.
7. The method for detecting and correcting data outliers in drilling pressure parameter calculation of claim 4, wherein in step C1, in order to prevent over-fitting, the maximum polynomial degree is set to 6.
CN202211532836.7A 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters Pending CN118171122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211532836.7A CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211532836.7A CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Publications (1)

Publication Number Publication Date
CN118171122A true CN118171122A (en) 2024-06-11

Family

ID=91353628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211532836.7A Pending CN118171122A (en) 2022-12-01 2022-12-01 Method for detecting and correcting abnormal points of data in calculation of drilling pressure parameters

Country Status (1)

Country Link
CN (1) CN118171122A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination