CN113268803B - Method for generating drilling overflow diagnosis model, drilling overflow diagnosis method and device - Google Patents

Info

Publication number
CN113268803B
Authority
CN
China
Prior art keywords
data
overflow
drilling
neural network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110636358.3A
Other languages
Chinese (zh)
Other versions
CN113268803A (en)
Inventor
岳元龙
李仙琳
左信
高小永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum Beijing
Priority to CN202110636358.3A
Publication of CN113268803A
Application granted
Publication of CN113268803B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/13 Architectural design, e.g. computer-aided architectural design [CAAD] related to design of buildings, bridges, landscapes, production plants or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/08 Thermal analysis or thermal optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/10 Noise analysis or noise optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Architecture (AREA)
  • Civil Engineering (AREA)
  • Structural Engineering (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application disclose a method for generating a drilling overflow diagnosis model, a drilling overflow diagnosis method, and corresponding devices. The model generation method comprises: acquiring multiple groups of historical data for a well in which an overflow event occurred, where some groups were collected before the overflow event and the rest during it, and each group comprises multidimensional field data collected at the same moment that characterize conditions at the drilling site; acquiring label data for only some of the groups of historical data; and training a neural network model with the groups of historical data and the corresponding label data as training samples to obtain the drilling overflow diagnosis model. The multidimensional field data in each training sample serve as the model input, and the label indicating whether the well is overflowing serves as the model output. Because training does not require every training sample to have label data, labor and time costs are reduced.

Description

Method for generating drilling overflow diagnosis model, drilling overflow diagnosis method and device
Technical Field
The application relates to the technical field of petroleum drilling, and in particular to a method for generating a drilling overflow diagnosis model, a drilling overflow diagnosis method, and corresponding devices.
Background
Overflow occurs when the drilling fluid density is too low to balance the formation fluid pressure, so that oil, gas, and water from the formation are pressed into the wellbore and drilling fluid overflows from it. If the pump is stopped, drilling fluid flows out of the wellhead by itself; if the drilling fluid is circulating, the volume pumped into the wellbore is smaller than the volume returned, so an overflow can be detected by measuring the drilling fluid volume in the circulation tank. Overflow is often the precursor of a blowout. If only a small amount of oil, gas, and water has entered the wellbore and the overflow is found in time, pressure balance can be re-established by increasing the drilling fluid density. If it is not found in time, the wellbore pressure keeps falling, formation fluids enter faster and in greater quantity, and they eventually reach the surface and cause a blowout. A blowout preventer is normally installed at the wellhead so that the wellhead can be closed and further control measures applied. However, if the formation fluid pressure is too high, the wellhead pressure after shut-in may exceed the rated pressure of the blowout preventer; the wellhead then has to be reopened, formation fluid is ejected, and subsequent control becomes very difficult. Observing the fluid level and detecting overflow in time is therefore an important task in drilling operations.
To this end, technicians collect and analyze data from the drilling site. The data types include well depth, bit depth, weight on bit, rotary speed, rate of penetration, torque, standpipe pressure, hook load, hook height, pump strokes, drilling fluid inlet and outlet flow rate, drilling fluid inlet and outlet temperature, drilling fluid inlet and outlet conductivity, drilling fluid inlet and outlet density, total pit volume, and so on. The data are collected once per second, so half an hour yields 1800 collections. At present, diagnosing overflow from field data generally requires an experienced expert to judge, from the field incident report and the data collected at each moment, whether overflow is present at each moment, that is, to label the data for every moment. Because the data volume is huge, this work consumes a great deal of labor and time.
Disclosure of Invention
The embodiments of the application aim to provide a method for generating a drilling overflow diagnosis model, a drilling overflow diagnosis method, and corresponding devices, so as to solve the prior-art problem that labeling multidimensional historical data collected at a drilling site is time-consuming and labor-intensive.
The embodiments of this specification provide a method for generating a drilling overflow diagnosis model, comprising: acquiring multiple groups of historical data for a well in which an overflow event occurred, where some groups were collected before the overflow event and the rest during it, and each group comprises multidimensional field data collected at the same moment that characterize conditions at the drilling site; acquiring label data for some of the groups of historical data, where each label indicates whether the well was overflowing at the moment the corresponding group was collected; and training a neural network model with the groups of historical data and the corresponding label data as training samples to obtain the drilling overflow diagnosis model. The multidimensional field data in each training sample serve as the model input, and the label indicating whether the well is overflowing serves as the model output.
The embodiments of this specification also provide a drilling overflow diagnosis method, comprising: obtaining a drilling overflow diagnosis model generated by the above method; acquiring at least one group of multidimensional field data for a well; inputting the group of multidimensional field data into the drilling overflow diagnosis model to obtain an output result; and determining from the output result whether the well is overflowing at the moment corresponding to the multidimensional field data.
With the model generation method provided by the embodiments of this specification, the drilling overflow diagnosis model can be trained from multiple groups of multidimensional historical data together with label data, available for only some of the groups, indicating whether the well was overflowing. Not every training sample needs a label, which greatly reduces the work of manually labeling the collected historical data.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a comparison of the accuracy of neural network models trained with the U-ELM, PCA-ELM, and KPCA-ELM methods;
FIG. 2 shows a comparison of the accuracy and recall of neural network models trained with the KPCA-ELM and KPCA-SSELM methods;
FIG. 3 shows a comparison of the accuracy of neural network models trained with the KPCA-ELM and KPCA-SSELM methods for different proportions of data carrying overflow labels;
FIG. 4 shows a flow chart of the method for generating a drilling overflow diagnosis model provided by the embodiments of this specification;
FIG. 5 shows a flow chart of a method of training a neural network model with the semi-supervised extreme learning machine (SSELM);
FIG. 6 shows a flow chart of a method of evaluating the performance of a trained neural network model and optimizing the model accordingly;
FIG. 7 shows a flow chart of a method for reducing the dimensionality of data with KPCA;
FIG. 8 shows a flow chart of the drilling overflow diagnosis method provided by the embodiments of this specification;
FIG. 9 shows a schematic block diagram of the apparatus for generating a drilling overflow diagnosis model provided by the embodiments of this specification;
FIG. 10 shows a functional block diagram of the drilling overflow diagnosis apparatus provided by the embodiments of this specification;
FIG. 11 shows a schematic diagram of the electronic device provided by the embodiments of this specification.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of protection of the present application.
In one scenario embodiment, drilling site data (also called logging data) are collected once per second for each well during drilling. Each collection yields one group of data comprising 19 dimensions, including well depth, bit depth, weight on bit, rotary speed, rate of penetration, torque, standpipe pressure, hook load, hook height, pump strokes, drilling fluid inlet and outlet flow rate, drilling fluid inlet and outlet temperature, drilling fluid inlet and outlet conductivity, drilling fluid inlet and outlet density, and total pit volume. This example considers same-day logging data from 4 wells in a certain area, in which a total of 6 overflow events occurred during drilling, and takes as raw data a total of 10800 groups of data from the half-hour periods before and during the overflow events. All of these raw data have corresponding label data indicating whether the well was overflowing; a label and its corresponding 19-dimensional data come from the same well at the same moment. The label may be -1 or 1, where -1 indicates that no overflow event has occurred and 1 indicates that an overflow event has occurred. Labels are typically assigned by experienced experts on the basis of field incident reports and the logging data collected at each moment, or may be derived from actual phenomena at the drilling site that indicate overflow. For large amounts of historical data, however, corresponding labels usually cannot be obtained; in that case the label may be filled with 0.
During data acquisition, equipment failure, human factors, and other causes may leave dirty data in the collected field data: missing values, noise, outliers, and large differences in variation range between dimensions. The data can therefore be preprocessed as follows.
(1) Outlier processing
Because of sensor performance, the installation environment, and downhole working conditions, collected data may fall outside the normally acceptable fluctuation range; such severely deviating data are called outliers. Unlike noise, outliers can lead to false alarms. During drilling, however, operations such as normal drilling, making a connection, and tripping alternate, and the data can fluctuate strongly when switching between them. Such fluctuations are not outliers, so knowledge of the drilling conditions must be taken into account during processing. The embodiments of this specification provide the following method to eliminate outliers: determine the length of a sliding window, which limits the number of data points considered in one pass, the data points in a window being collected at adjacent moments; calculate the mean and standard deviation of the data in the window; check in turn whether each data point in the window differs from the mean by more than 3 standard deviations; if it does, the data point is an outlier and is replaced by the mean; if it does not, the data point is left unchanged.
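The 3-sigma rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `replace_outliers`, the window length of 30, and the use of non-overlapping windows instead of a window that slides one sample at a time are assumptions made for brevity, and the test signal is invented.

```python
import numpy as np

def replace_outliers(values, window=30, k=3.0):
    """Replace values farther than k standard deviations from their window mean."""
    x = np.asarray(values, dtype=float)
    out = x.copy()
    # Assumption: consecutive, non-overlapping windows of adjacent samples.
    for start in range(0, len(x), window):
        segment = x[start:start + window]
        mean, std = segment.mean(), segment.std()
        is_outlier = np.abs(segment - mean) > k * std
        # Outliers are replaced by the window mean, other points are kept.
        out[start:start + window][is_outlier] = mean
    return out

# Hypothetical signal: a constant reading with one spike.
signal = np.full(30, 10.0)
signal[12] = 100.0
cleaned = replace_outliers(signal, window=30)
```

In practice the window would also have to respect operation changes (drilling, connection, tripping), which this sketch does not model.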
(2) Missing value handling
Failure of the acquisition equipment and other causes may make data storage fail, leaving missing or empty values. How to handle them can be chosen according to the distribution of the data and its importance for subsequent modeling. For example, when a dimension has a high missing rate and low importance, it can be removed outright; when the missing rate is low, the gaps can be filled according to the data distribution.
Specifically, the collected drilling site data are first analyzed to determine the missing proportion of each dimension, and deletion or filling is then chosen according to the importance of the data. For example, the torque dimension was found to have missing values, with a missing proportion of 1.5%. Because torque reflects the drilling condition of the bit and formation changes, mean filling was chosen: each missing value is filled with the mean of a predetermined number of data points near it.
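A minimal sketch of the local mean filling described above, assuming missing values are encoded as NaN. The function name `fill_missing_with_local_mean`, the `half_width` parameter (how many neighbours on each side count as "near"), and the torque values are hypothetical.

```python
import numpy as np

def fill_missing_with_local_mean(values, half_width=5):
    """Fill each NaN with the mean of the valid samples within half_width of it."""
    x = np.asarray(values, dtype=float)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        lo, hi = max(0, i - half_width), min(len(x), i + half_width + 1)
        neighbours = x[lo:hi]
        valid = neighbours[~np.isnan(neighbours)]
        if valid.size:  # leave the gap if no valid neighbour exists
            out[i] = valid.mean()
    return out

# Hypothetical torque readings with one missing sample.
torque = np.array([4.0, 4.2, np.nan, 4.4, 4.6])
filled = fill_missing_with_local_mean(torque, half_width=2)
```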
In addition, some parameters in the collected drilling site data, such as tripping-out swab pressure, tripping-in surge pressure, and some pit volumes, are often recorded as invalid values such as -9.999 or -9999; these are deleted directly.
(3) Noise processing
Data measured by sensors are inevitably noisy, so the collected data are often not smooth. Noise processing should reduce noise as much as possible while preserving the original distribution of the data. The main method for smoothing noise is based on a sliding window: the data are smoothed by averaging the values within the window.
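The sliding-window averaging can be sketched as a centred moving average. The window length of 5 and the edge padding (repeating the first and last samples so the output keeps the input length) are assumptions; the text does not specify either.

```python
import numpy as np

def moving_average(values, window=5):
    """Smooth a 1-D signal by averaging over a centred sliding window."""
    x = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # Assumption: pad with edge values so the output has the input length.
    left = window // 2
    right = window - 1 - left
    padded = np.pad(x, (left, right), mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical noisy reading around a constant level.
rng = np.random.default_rng(0)
noisy = 10.0 + rng.normal(size=200)
smoothed = moving_average(noisy, window=5)
```

A longer window removes more noise but also flattens genuine changes, so the window length has to be tuned against the data.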
(4) Large differences in variation range between dimensions
In the 19-dimensional data, different dimensions have different physical units and different variation ranges. For example, weight on bit varies over 50-200 while rate of penetration varies over 5-20. If the data are not converted into dimensionless quantities, dimensions with a large variation range may dominate the result and dimensions with a small variation range may be neglected. Therefore, in addition to the preprocessing above, the collected data need to be normalized: dimensional data are converted into dimensionless quantities and the variation ranges of all dimensions are unified into one interval. Z-score normalization can be used:
$$z = \frac{x - \bar{x}}{\sigma}$$
where $\bar{x}$ is the mean of a given dimension in the raw data, $\sigma$ is the standard deviation of that dimension, $x$ is the value to be normalized in that dimension, and $z$ is the normalized value. That is, after normalization the mean of each dimension becomes 0.
For the 19-dimensional data provided by the embodiments of this specification, every dimension is normalized in this way. The accompanying label data indicating whether the well is overflowing are not normalized.
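Column-wise Z-score normalization can be sketched as below. The two example columns reuse the ranges mentioned in the text (50-200 and 5-20); the function name `zscore` and the guard that maps a zero standard deviation to 1 are assumptions added for robustness.

```python
import numpy as np

def zscore(X):
    """Normalize each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # assumption: leave constant columns unscaled
    return (X - mean) / std, mean, std

# Two hypothetical dimensions with very different ranges.
X_raw = np.column_stack([np.linspace(50, 200, 11), np.linspace(5, 20, 11)])
Z, mu, sigma = zscore(X_raw)
```

The returned `mu` and `sigma` would be kept so that new field data can be normalized with the same statistics at diagnosis time.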
Because the collected drilling site data have many dimensions, the dimensions are often highly redundant and strongly nonlinearly correlated, which makes the data volume in model training, evaluation, and application too large. Feature extraction and dimensionality reduction can therefore be applied to the preprocessed data. PCA (Principal Component Analysis) could be used for this; the embodiments of this specification use KPCA (Kernel Principal Component Analysis) for feature extraction and dimensionality reduction, with the following specific steps.
The preprocessed drilling site data form a matrix $X = [x_1, x_2, \ldots, x_n]^T$, where $n = 10800$ is the number of data groups and the dimension $m$ of each group is 19. The method provided by the embodiments of this specification maps the data points $x_1, x_2, \ldots, x_n$ through a nonlinear mapping $\Phi(x_j)$ into a space of another (for example, higher) dimension and then reduces the dimension of the mapped data. However, this nonlinear mapping $\Phi(x_j)$ is not easy to find. To make the dimension reduction more convenient to compute, the method sets $\Phi(x_i)^T \Phi(x_j) = K_{ij}$, where the kernel matrix $K$ is an $n \times n$ (10800 × 10800) matrix. The element $K_{ij}$ in row $i$ and column $j$ is computed from the $i$-th and $j$-th input data with a Gaussian radial basis function (RBF), i.e.
$$K_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
where $\sigma$ is the width parameter of the Gaussian kernel.
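A sketch of computing the Gaussian RBF kernel matrix, assuming the common form exp(-||x_i - x_j||^2 / (2 sigma^2)). The kernel width `sigma` and the random demo data are hypothetical. Note that for n = 10800 the full matrix holds about 1.2e8 entries, roughly 0.9 GB in double precision.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Return the n x n Gaussian RBF kernel matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X**2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative rounding errors
    return np.exp(-sq_dists / (2.0 * sigma**2))

# Hypothetical data: 6 groups of 19-dimensional samples.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 19))
K_demo = rbf_kernel_matrix(X_demo, sigma=2.0)
```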
Next, the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the kernel matrix $K$ and the corresponding eigenvectors $u_1, u_2, \ldots, u_n$ are computed, i.e. the following equation holds:
$$K U = U \Lambda$$
where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of the kernel matrix $K$, and $U$ is the matrix formed by the eigenvectors corresponding to those eigenvalues.
Then the eigenvalues are sorted in descending order and the corresponding eigenvectors are reordered accordingly, giving $\lambda'_1, \lambda'_2, \ldots, \lambda'_n$ and the corresponding eigenvectors $u'_1, u'_2, \ldots, u'_n$. The eigenvectors are further converted by Gram-Schmidt orthogonalization into orthonormal eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_n$.
Then the cumulative contribution rates $C_1, C_2, \ldots, C_n$ of the eigenvalues $\lambda'_1, \lambda'_2, \ldots, \lambda'_n$ are computed in order, the index $l$ at which the cumulative contribution rate first exceeds a preset threshold is determined, and the first $l$ orthonormal eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_l$ are selected. The cumulative contribution rate is computed as
$$C_i = \frac{\sum_{k=1}^{i} \lambda'_k}{\sum_{k=1}^{n} \lambda'_k}$$
In the embodiments of this specification the preset threshold is 90%. The following table lists, for each eigenvalue, its index, its value, its individual contribution rate, and the cumulative contribution rate of the eigenvalues up to and including it. As the table shows, $l$ is 5, and the cumulative contribution rate at the 5th eigenvalue is the sum of the contribution rates of the first 5 eigenvalues.

Table 1. Contribution rate of each principal component

| No. | Eigenvalue | Contribution rate (%) | Cumulative contribution rate (%) |
|-----|------------|-----------------------|----------------------------------|
| 1   | 8.2684     | 32.92                 | 32.92                            |
| 2   | 4.6425     | 18.48                 | 51.40                            |
| 3   | 3.7991     | 15.12                 | 66.52                            |
| 4   | 3.1716     | 12.63                 | 79.15                            |
| 5   | 2.9219     | 11.63                 | 90.78                            |
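The choice of l can be checked directly from the contribution rates listed in the table. The snippet below only reuses the five percentages given there; the variable names are illustrative.

```python
import numpy as np

# Individual contribution rates (%) of the first five eigenvalues, from the table.
contribution = np.array([32.92, 18.48, 15.12, 12.63, 11.63])
cumulative = np.cumsum(contribution)

threshold = 90.0
# Index (1-based) of the first eigenvalue whose cumulative rate exceeds the threshold.
l = int(np.argmax(cumulative > threshold)) + 1
```

With these values the cumulative rate stays below 90% through the fourth eigenvalue (79.15%) and first exceeds it at the fifth (90.78%).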
Finally, the projection of the kernel matrix $K$ onto the first $l$ eigenvectors is computed, which gives the drilling site data after KPCA dimension reduction. The 19-dimensional drilling site data thus become 5-dimensional: the data before dimension reduction form a 10800 × 19 matrix, and the data after dimension reduction form a 10800 × 5 matrix.
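The eigendecomposition, sorting, component selection, and projection steps can be sketched together as follows. This is an illustration under assumptions: `numpy.linalg.eigh` already returns orthonormal eigenvectors for a symmetric matrix, so no separate Gram-Schmidt step is coded; kernel centring, which standard KPCA usually applies, is not described in the text and is omitted; the demo data and kernel width are invented.

```python
import numpy as np

def kpca_reduce(K, threshold=0.90):
    """Project kernel matrix K onto its leading eigenvectors.

    Keeps the smallest number l of components whose cumulative
    contribution rate first exceeds `threshold`.
    """
    eigvals, eigvecs = np.linalg.eigh(K)       # ascending order, orthonormal vectors
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    l = int(np.argmax(cumulative > threshold)) + 1
    return K @ eigvecs[:, :l], l

# Hypothetical demo: 50 groups of 19-dimensional data, Gaussian kernel width 5.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 19))
sq = np.sum(X_demo**2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X_demo @ X_demo.T, 0.0)
K_demo = np.exp(-d2 / (2.0 * 5.0**2))
reduced, l = kpca_reduce(K_demo)
```

The number of retained components depends on the data and the kernel width, so the value 5 reported in the text should not be expected from this random demo.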
After the dimension-reduced data are obtained, the sample set can be divided into a training sample set S, a validation sample set V, and a test sample set T; every data sample in these sets has corresponding label data indicating whether the well is overflowing. The training sample set S is used to train the neural network model, the validation sample set V is used to optimize the parameters of the trained network, and the test sample set T is used to verify the performance of the model. The sample set can be divided by stratified sampling. For example, the data of each well are numbered 1, 2, 3, ... in time order and each number is divided by 3: data with remainder 0 go into the training sample set S, data with remainder 1 into the validation sample set V, and data with remainder 2 into the test sample set T. Stratified sampling ensures that in each sample set the distribution of data between the overflow and non-overflow states is approximately the same as in the original logging sample set, and that the two states remain relatively balanced.
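The remainder-based assignment in the example can be sketched as follows; the function name `split_by_remainder` and the sample count of 1800 are illustrative.

```python
import numpy as np

def split_by_remainder(n_samples):
    """Assign time-ordered samples numbered 1..n to train/validation/test by number mod 3."""
    numbers = np.arange(1, n_samples + 1)
    train = np.flatnonzero(numbers % 3 == 0)  # remainder 0 -> training set S
    val = np.flatnonzero(numbers % 3 == 1)    # remainder 1 -> validation set V
    test = np.flatnonzero(numbers % 3 == 2)   # remainder 2 -> test set T
    return train, val, test

# Hypothetical well with 1800 time-ordered samples; indices are 0-based positions.
train_idx, val_idx, test_idx = split_by_remainder(1800)
```

This particular rule yields three sets of equal size; other proportions between the sets would need a different assignment rule.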
In the embodiment provided in this specification, the training sample set S contains 6000 groups of data, the validation sample set V contains 1800 groups, and the test sample set T contains 3000 groups.
For the training sample set S, the method provided in this specification further extracts part of the samples by stratified sampling and removes their labels indicating whether the well is overflowing, forming an unlabeled data subset U; the remaining data form a labeled data subset L. Unlike labeled data, unlabeled data generally cannot be used directly to train a neural network model. The embodiments of this specification provide a training method that uses the labeled data subset L and the unlabeled data subset U together, that is, a semi-supervised learning method, in which the number of data in the labeled subset L is far smaller than in the unlabeled subset U. The neural network model is a semi-supervised extreme learning machine.
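Forming the labeled and unlabeled subsets can be sketched as masking labels per class, using the convention stated earlier that a missing label is filled with 0. The function name `mask_labels`, the labeled fraction of 10%, the random seed, and the class counts are assumptions for illustration.

```python
import numpy as np

def mask_labels(y, labeled_fraction=0.1, seed=0):
    """Keep a stratified fraction of the labels (-1 / 1) and set the rest to 0."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    y_semi = np.zeros_like(y)
    for cls in (-1, 1):
        idx = np.flatnonzero(y == cls)
        if idx.size == 0:
            continue
        # Same fraction from each class, so the labeled subset stays stratified.
        n_keep = max(1, int(round(labeled_fraction * idx.size)))
        keep = rng.choice(idx, size=n_keep, replace=False)
        y_semi[keep] = cls
    return y_semi

# Hypothetical training labels: 40 non-overflow and 20 overflow samples.
y_full = np.concatenate([-np.ones(40, dtype=int), np.ones(20, dtype=int)])
y_semi = mask_labels(y_full, labeled_fraction=0.1)
```

Samples with `y_semi == 0` play the role of the unlabeled subset U, the others the labeled subset L.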
The Semi-Supervised Extreme Learning Machine (SSELM) is an improvement on the Extreme Learning Machine (ELM). To make SSELM easier to understand, the training principle and process of ELM are briefly described first.
For an ELM, the input weights and biases can be initialized randomly and the corresponding output weights then derived. Suppose there are $N$ arbitrary samples $(X_1, t_1), (X_2, t_2), \ldots, (X_i, t_i), \ldots, (X_N, t_N)$, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T$ is a group of $n$-dimensional data and $t_i = [t_{i1}, t_{i2}, \ldots, t_{in}]^T$ is likewise a group of $n$-dimensional data. A single-hidden-layer neural network with $L$ hidden nodes can be expressed as
$$\sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) = o_j, \quad j = 1, \ldots, N$$
where $g(x)$ is the activation function, $W_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ is the input weight of the $i$-th hidden node, $\beta_i$ is its output weight, $b_i$ is its bias, and $o_j$ is the network output for the $j$-th sample. $W_i \cdot X_j$ denotes the inner product of $W_i$ and $X_j$.
The goal of training a single-hidden-layer neural network is to minimize the output error, which can be expressed as
$$\sum_{j=1}^{N} \| o_j - t_j \| = 0$$
That is, there exist $\beta_i$, $W_i$, and $b_i$ such that
$$\sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, N$$
In matrix form this is $H\beta = T$, where $H$ is the output of the hidden-layer nodes, $\beta$ is the output weight, and $T$ is the desired output.
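To train the single-hidden-layer neural network, we wish to obtain $\hat{W}_i$, $\hat{b}_i$, and $\hat{\beta}$, with $i = 1, \ldots, L$, such that
$$\left\| H(\hat{W}_i, \hat{b}_i)\,\hat{\beta} - T \right\| = \min_{W_i,\, b_i,\, \beta} \left\| H(W_i, b_i)\,\beta - T \right\|$$
where $i = 1, \ldots, L$. This is equivalent to minimizing the loss function
$$E = \sum_{j=1}^{N} \left\| \sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) - t_j \right\|^2$$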
Conventional gradient-descent algorithms can solve this problem, but a gradient-based learning algorithm must adjust all parameters iteratively. In the ELM algorithm, by contrast, once the input weights $W_i$ and the hidden-layer biases $b_i$ are randomly determined, the hidden-layer output matrix $H$ is uniquely determined. Training the single-hidden-layer neural network then reduces to solving the linear system $H\beta = T$, and the output weight matrix is determined as
$$\hat{\beta} = H^{+} T$$
where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$. It can be proved that the norm of the solution $\hat{\beta}$ is the smallest and that the solution is unique.
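The ELM training procedure described above (random input weights and biases, then output weights from the Moore-Penrose pseudoinverse) can be sketched as follows. The sigmoid activation, the 20 hidden nodes, the scalar targets, and the two-cluster toy data are assumptions for illustration, not details taken from the source.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elm_train(X, T, n_hidden=20, seed=0):
    """Train a single-hidden-layer ELM: random W, b, then beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.normal(size=n_hidden)                # random hidden biases, never updated
    H = sigmoid(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Hypothetical toy data: two well-separated clusters with targets -1 and 1.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
                    rng.normal(2.0, 0.5, size=(100, 2))])
T_demo = np.concatenate([-np.ones(100), np.ones(100)])
W, b, beta = elm_train(X_demo, T_demo)
accuracy = np.mean(np.sign(elm_predict(X_demo, W, b, beta)) == T_demo)
```

No iteration is involved: the only fitted quantity is `beta`, obtained in one linear-algebra step.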
SSELM adds manifold regularization to ELM, using the intrinsic manifold structure of the data as a penalty term that constrains the objective function of the extreme learning machine. A large amount of unlabeled data can thus be fully exploited and the labeling workload reduced, while the advantages of the extreme learning machine, namely no iteration and efficient model execution, are retained.
The SSELM objective function is expressed as
$$\min_{\beta} \; \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\left\| C^{1/2}(Y - H\beta) \right\|^2 + \frac{\lambda}{2}\operatorname{Tr}\left(\beta^T H^T \mathcal{L} H \beta\right)$$
where
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}$$
Here $x_1, x_2, \ldots, x_N$ are the input data of the neural network model, $N$ is the number of input data, $g(x)$ is the activation function of the hidden layer, $m$ is the dimension of the input data, $w_1, w_2, \ldots, w_{\tilde{N}}$ are the input weights of the hidden-layer nodes, $\tilde{N}$ is the number of hidden-layer nodes, and $b_1, b_2, \ldots, b_{\tilde{N}}$ are the biases of the hidden-layer nodes. $H$ represents the output of the hidden-layer nodes, and $\beta$ is the output weight. $Y$ is the desired output, i.e. in the embodiments of this specification the label data indicating whether the well is overflowing. $\mathcal{L}$ is the graph Laplacian matrix constructed from the labeled and unlabeled data, and $\lambda$ is the regularization factor of the manifold regularization.
The penalty matrix C addresses the imbalance between the amount of input data carrying the overflow mark (i.e., the mark data corresponding to historical data collected during an overflow event) and the amount of input data carrying the non-overflow mark (i.e., the mark data corresponding to historical data collected when no overflow event was occurring). Assigning different penalty parameters to these two classes of input data alleviates the overfitting problem. $C = \mathrm{diag}(C_1, C_2, \dots, C_l, 0, \dots)$ is a diagonal matrix whose diagonal elements are the penalty parameters. There are l + u diagonal elements, where l is the number of marked data and u is the number of unmarked data; unmarked data need no penalty, so the corresponding diagonal elements are 0. For the nonzero elements, $C_i = C_0 / N_y$, where y is the mark indicating whether an overflow event has occurred, $N_y$ is the number of samples carrying mark y, $C_i$ is the penalty parameter set for the data corresponding to mark y, and $C_0$ is a user-defined regularization parameter. For example, if, among the multi-dimensional historical data that have mark data, 2000 groups carry the overflow mark and 4000 groups carry the non-overflow mark, then the penalty parameter for each input datum with the overflow mark is $C_0/2000$, and the penalty parameter for each input datum with the non-overflow mark is $C_0/4000$.
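The construction of the penalty matrix can be sketched as follows. The mark encoding (+1 for overflow, -1 for non-overflow, 0 for unmarked), the value of $C_0$, the 1000 unmarked samples and the name `penalty_diagonal` are assumptions for illustration. Only the diagonal is stored, since a dense (l + u) × (l + u) matrix is mostly zeros.

```python
import numpy as np

def penalty_diagonal(marks, c0):
    """Diagonal of C: C_i = c0 / N_y for marked samples, 0 for unmarked ones.

    Assumed encoding: +1 overflow, -1 non-overflow, 0 unmarked.
    """
    marks = np.asarray(marks)
    diag = np.zeros(marks.shape[0])
    for y in (1, -1):
        mask = marks == y
        n_y = int(mask.sum())          # N_y: number of samples carrying mark y
        if n_y:
            diag[mask] = c0 / n_y
    return diag

# 2000 overflow and 4000 non-overflow samples as in the text, plus 1000 unmarked (assumed).
marks = np.concatenate([np.ones(2000), -np.ones(4000), np.zeros(1000)])
c_diag = penalty_diagonal(marks, c0=100.0)
```

With this weighting each marked class contributes the same total penalty $C_0$, regardless of how many samples it contains; `np.diag(c_diag)` gives the dense matrix if one is needed.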
Solving the SSELM objective function for β by setting its gradient to 0 gives
$\beta = (I + H^{T} C H + \lambda H^{T} L H)^{-1} H^{T} C Y$
where I is the identity matrix.
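The closed-form expression for β can be checked with a short derivation. The first line assumes that the objective function has the commonly used SSELM form (the equation image of the objective is not reproduced in this text), so it should be read as an assumption rather than as the exact expression of the embodiment.

```latex
\begin{aligned}
J(\beta) &= \tfrac{1}{2}\lVert \beta \rVert^{2}
          + \tfrac{1}{2}\bigl\lVert C^{1/2}\,(Y - H\beta) \bigr\rVert^{2}
          + \tfrac{\lambda}{2}\,\operatorname{Tr}\!\bigl(\beta^{T} H^{T} L H \beta\bigr) \\
\nabla_{\beta} J &= \beta - H^{T} C\,(Y - H\beta) + \lambda H^{T} L H \beta = 0 \\
\bigl(I + H^{T} C H + \lambda H^{T} L H\bigr)\,\beta &= H^{T} C Y \\
\beta &= \bigl(I + H^{T} C H + \lambda H^{T} L H\bigr)^{-1} H^{T} C Y
\end{aligned}
```

Because C is diagonal with non-negative entries and L is positive semidefinite, the matrix in parentheses is the identity plus a positive semidefinite matrix and is therefore always invertible.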
Based on the above description, the following describes specific training steps of the neural network model provided in the embodiments of the present specification.
1. Determining the number of nodes of a single hidden layer of the neural network model, selecting an activation function, and randomly initializing the input weight and bias of each node of the hidden layer to obtain an output matrix H of each node of the hidden layer.
The number of hidden-layer nodes may be chosen freely, for example 10, 15, 20 or 30.
The hidden-layer matrix H is composed from the input weights, the activation function and the bias parameters, as shown below. The output weights form a separate matrix β, and Hβ gives the output values of the model.
Figure BDA0003105382280000091
2. Selecting a similarity measurement function
Figure BDA0003105382280000092
and establishing a graph Laplacian matrix L based on the input data of the training samples.
The graph Laplacian matrix is L = D - W, where W is the adjacency matrix of the graph, an N × N matrix whose elements
Figure BDA0003105382280000093
express the similarity between the i-th and j-th input data, i.e., the similarity between two elements is measured by their Euclidean distance. D is the degree matrix of the graph, which is also an N × N matrix; it is diagonal, and each diagonal element is the sum of the elements of the corresponding column of W.
3. Randomly selecting two values from the series of powers of ten $10^{-6}, 10^{-5}, \dots, 10^{6}$ as the regularization parameter $C_0$ and the manifold regularization parameter λ.
4. Solving for the output weights of the hidden layer according to the equation $\beta = (I + H^{T} C H + \lambda H^{T} L H)^{-1} H^{T} C Y$ to obtain the neural network model $Y = H\beta$; where I is the identity matrix, λ is the pre-selected manifold regularization parameter,
Figure BDA0003105382280000094
Figure BDA0003105382280000095
$x_1, x_2, \dots, x_N$ are the input data of the neural network model, $y_1, y_2, \dots, y_N$ are the output data of the neural network model, and
Figure BDA0003105382280000096
is the solved output weight of the hidden layer.
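Training steps 1 to 4 can be put together in a minimal sketch. The sigmoid activation, the Gaussian similarity function with width `sigma`, the mark encoding (+1 overflow, -1 non-overflow, 0 unmarked), the fixed values of $C_0$ and λ, the synthetic data and the name `train_sselm` are all assumptions for illustration; the embodiment selects $C_0$ and λ from the series of powers of ten.

```python
import numpy as np

def train_sselm(X, marks, n_hidden=20, c0=1.0, lam=1e-2, sigma=1.0, seed=0):
    """Sketch of SSELM training; marks use the assumed encoding +1 / -1 / 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Step 1: random input weights and biases fix the hidden-layer output matrix H.
    W_in = rng.standard_normal((m, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    # Step 2: Gaussian similarity on Euclidean distance (assumed), then L = D - W.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W_adj = np.exp(-sq_dist / (2.0 * sigma ** 2))
    L = np.diag(W_adj.sum(axis=0)) - W_adj      # degree = column sums of W
    # Penalty diagonal: C_i = C_0 / N_y for marked samples, 0 for unmarked ones.
    c_diag = np.zeros(n)
    for y in (1, -1):
        mask = marks == y
        if mask.any():
            c_diag[mask] = c0 / mask.sum()
    # Step 4: beta = (I + H^T C H + lam * H^T L H)^{-1} H^T C Y.
    HtC = H.T * c_diag                          # same as H.T @ np.diag(c_diag)
    A = np.eye(n_hidden) + HtC @ H + lam * H.T @ L @ H
    beta = np.linalg.solve(A, HtC @ marks.astype(float))
    return {"W_in": W_in, "b": b, "beta": beta, "H": H, "L": L, "c_diag": c_diag}

# Synthetic data: two clusters of 30 samples, only 10 of the 60 samples are marked.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (30, 3)), rng.normal(2.0, 1.0, (30, 3))])
true_marks = np.array([-1] * 30 + [1] * 30)
marks = np.zeros(60, dtype=int)
marked_idx = rng.choice(60, size=10, replace=False)
marks[marked_idx] = true_marks[marked_idx]
model = train_sselm(X, marks, n_hidden=20, c0=1.0, lam=1e-2)
```

The pairwise-distance array needs memory proportional to N², so for data sets of the size used in the embodiment a sparse or nearest-neighbour graph would probably be preferable; that variation is not shown here.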
Because many parameters are determined randomly during training, the randomly determined parameters may not yield a well-performing neural network. For this reason, the method provided in the embodiments of this specification further uses the verification sample set V to verify the trained neural network model and confirm whether its performance meets the requirements.
Specifically, the accuracy, the recall rate, or a combination of the two may be used to verify the performance of the model. The various types of drilling field data in each group of samples in the verification sample set V are taken as input data of the model and processed by the trained neural network model to obtain output data; it is then determined which of the following situations applies to the output data and the mark data corresponding to the input data in the sample set V:
(1) the output data and the mark data both indicate that an overflow event has occurred, i.e., the input historical data are correctly identified as data collected when an overflow event has occurred; such data are also called True Positive (TP) data;
(2) the output data and the mark data both indicate that no overflow event has occurred, i.e., the input historical data are correctly identified as data collected when no overflow event has occurred; such data are also called True Negative (TN) data;
(3) the output data indicate that an overflow event has occurred and the mark data indicate that no overflow event has occurred, i.e., the input historical data are erroneously identified as data collected when an overflow event has occurred; such data are also called False Positive (FP) data;
(4) the output data indicate that no overflow event has occurred and the mark data indicate that an overflow event has occurred, i.e., the input historical data are erroneously identified as data collected when no overflow event has occurred; such data are also called False Negative (FN) data.
The accuracy and/or the recall rate is then calculated according to the following formulas.
The accuracy is calculated as
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
and the recall rate is calculated as
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
In the two formulas, TP is the number of groups of historical data for which the output data and the mark data both indicate that an overflow event has occurred; TN is the number of groups for which the output data and the mark data both indicate that no overflow event has occurred; FP is the number of groups for which the output data indicate that an overflow event has occurred while the mark data indicate that none has occurred; and FN is the number of groups for which the output data indicate that no overflow event has occurred while the mark data indicate that one has occurred.
Accuracy measures how well the model correctly identifies whether the input data correspond to a state in which an overflow event has or has not occurred. In the field of drilling, more attention is paid to how many of the input data actually collected in the overflow state are correctly diagnosed by the model, so the recall rate can also be used to evaluate the performance of the model.
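The four situations and the two measures can be computed as in the following sketch. The example outputs and marks are made-up values, and the function names are illustrative; `True` stands for "an overflow event has occurred".

```python
def confusion_counts(outputs, marks):
    """Count TP, TN, FP, FN from boolean model outputs and boolean mark data."""
    pairs = list(zip(outputs, marks))
    tp = sum(o and m for o, m in pairs)                # both say overflow
    tn = sum((not o) and (not m) for o, m in pairs)    # both say no overflow
    fp = sum(o and (not m) for o, m in pairs)          # output says overflow, mark says none
    fn = sum((not o) and m for o, m in pairs)          # output says none, mark says overflow
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn)

# Made-up outputs and marks for eight groups of historical data.
outputs = [True, True, False, False, True, False, False, False]
marks = [True, False, False, False, True, True, False, False]
tp, tn, fp, fn = confusion_counts(outputs, marks)
acc = accuracy(tp, tn, fp, fn)   # 6 of 8 groups identified correctly
rec = recall(tp, fn)             # 2 of 3 actual overflow groups detected
```

In this example the accuracy is 0.75 while the recall is only 2/3, which shows why the recall rate is the more telling measure when missed overflow events are the main concern.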
When the accuracy and/or the recall rate does not reach a predetermined value, the regularization parameter $C_0$ and/or the manifold regularization parameter λ is adjusted and the output weights of the hidden layer are recalculated to obtain a new neural network model. Correspondingly, when the accuracy and/or the recall rate reaches the predetermined value, the neural network model obtained by the last training is taken as the final training result.
After the final neural network model was obtained by the above method, the inventors used the verification sample set V and the test sample set T to evaluate the performance advantages of the neural network model obtained by the KPCA-SSELM-based training method provided in this specification.
For this reason, the effect of KPCA on the performance of neural network models was first verified. Specifically, the various types of drilling site data in the verification sample set V and the test sample set T are respectively input into a neural network model obtained by training based on U-ELM, PCA-ELM and KPCA-ELM methods, the accuracy is calculated, and the result is shown in FIG. 1. It can be seen from the figure that the KPCA dimension reduction method can improve the accuracy of the neural network model obtained by training.
Then, the effect of SSELM on the performance of neural network models was verified. Specifically, 90% of data in the verification sample set V is extracted, labeled data corresponding to the data is removed, the various types of drilling site data in the verification sample set V after the labels are removed are respectively input into a neural network model obtained based on the training of the KPCA-ELM and KPCA-SSELM methods, and the accuracy and the recall rate are respectively calculated, and the result is shown in fig. 2. As can be seen from the figure, the SSELM training method can greatly improve the accuracy of the neural network model.
Finally, part of the historical data in the test sample set T was extracted five times and the corresponding mark data were removed, yielding test sample sets T1, T2, T3, T4 and T5 in which the historical data that still have mark data account for 10%, 20%, 30%, 40% and 50%, respectively. The various types of drilling field data in the sample sets T1 to T5 were input into the neural network models trained with the KPCA-ELM method and the KPCA-SSELM method, the accuracy was calculated for each, and the result is shown in FIG. 3. As can be seen from the figure, whatever the proportion of historical data with mark data, the accuracy of the neural network model obtained by the KPCA-SSELM method is high and stable; and the lower the proportion, the greater the advantage of the model obtained by the KPCA-SSELM method over the one obtained by the KPCA-ELM method. Therefore, for the situation common in petroleum drilling, where much data is collected but little of it is marked, a model trained by the KPCA-SSELM method can greatly reduce the recognition error rate and better guarantee the safety of the drilling site.
The above embodiment reflects the method for generating the drilling overflow diagnosis model according to the embodiments of the present invention, which is shown in fig. 4.
S10: acquiring multiple groups of historical data of a well with an overflow event, wherein part of the multiple groups of historical data are collected before the overflow event occurs, and the rest parts of the multiple groups of historical data are collected during the overflow event, and each group of historical data comprises multidimensional field data which are collected at the same time and are used for representing the condition of a drilling field.
A "set of historical data" for a well is data for multiple dimensions acquired at the same time. The multidimensional data may include well depth, bit depth, weight on bit, rotational speed, rate of penetration, torque, riser pressure, hook load, hook height, pump stroke, drilling fluid inlet and outlet flow, drilling fluid inlet and outlet temperature, drilling fluid inlet and outlet conductivity, drilling fluid inlet and outlet density, total sump volume, and the like. These sets of historical data may be from one well or from multiple wells.
During an overflow event, overflow persists at the drilling site until it is handled manually, and the multidimensional historical data collected at any moment in this period correspond to the overflow-event state.
In order to enable the trained neural network model to give more accurate diagnosis results, historical data before and during the occurrence of the overflow event should be included in the training sample.
S20: acquiring mark data corresponding to some of the groups of historical data, where the mark data indicate whether an overflow event had occurred in the well at the moment the group of historical data was collected.
The mark data may typically be a mark determined by an experienced expert from the field incident reports and the logging data collected at each moment, or a mark given according to whether overflow was actually observed at the drilling site.
S30: training a neural network model by taking a plurality of groups of historical data and corresponding labeled data as a plurality of training samples to obtain a drilling overflow diagnosis model; and the multi-dimensional field data in each training sample is used as input data of the model, and the marking data of whether the well has overflowed is used as output data of the model.
Since some historical data have no mark data, before the neural network model is trained in step S30 these historical data are usually filled with third mark data, which may be any value other than the mark data indicating whether the well has overflowed. For example, in the above embodiment the third mark data is 0.
Because only some of the groups of historical data have corresponding mark data, training the neural network model with the multiple groups of historical data and the corresponding mark data as training samples means that some of the training data have no output data. Training a neural network with such samples is a semi-supervised learning method. The trained neural network model is essentially a binary classifier.
With the above method for generating a drilling overflow diagnosis model, the model can be trained from multiple groups of multidimensional historical data together with mark data that correspond to only some of the groups and indicate whether the well has overflowed. Not all training samples need corresponding mark data, which greatly reduces the work of manually marking the collected historical data.
The method for training the neural network can adopt an ELM extreme learning machine, an SSELM semi-supervised extreme learning machine, a BP algorithm, an SVM support vector machine, a KMSE kernel minimum square error and other methods. The above-described embodiment employs the SSELM semi-supervised extreme learning machine method. The following describes a method for training a neural network model by using an SSELM semi-supervised extreme learning machine, and the steps are shown in FIG. 5.
S31: determining the number of nodes of a single hidden layer of the neural network model, selecting an activation function, and randomly initializing the input weight and bias of each node of the hidden layer to obtain an output matrix H of each node of the hidden layer.
S32: selecting a similarity measurement function, and establishing a graph Laplacian matrix L based on a matrix formed by the input data of the training samples.
S33: selecting a regularization parameter $C_0$ and calculating a penalty matrix C; the penalty matrix C is a diagonal matrix in which each nonzero diagonal element corresponds to a training sample with mark data, and the element in row i, column i is $C_i = C_0 / N_y$; when the mark data of the training sample corresponding to $C_i$ indicate that an overflow event has occurred, $N_y$ is the number of training samples whose mark data indicate that an overflow event has occurred; when the mark data of the training sample corresponding to $C_i$ indicate that no overflow event has occurred, $N_y$ is the number of training samples whose mark data indicate that no overflow event has occurred.
S34: solving for the output weights of the hidden layer according to the equation $\beta = (I + H^{T} C H + \lambda H^{T} L H)^{-1} H^{T} C Y$ to obtain the neural network model $Y = H\beta$; where I is the identity matrix, λ is the pre-selected manifold regularization parameter,
Figure BDA0003105382280000131
Figure BDA0003105382280000132
$x_1, x_2, \dots, x_N$ are the input data of the neural network model, N is the number of input data,
Figure BDA0003105382280000133
g(x) is the activation function of the hidden layer, m is the dimension of the input data,
Figure BDA0003105382280000134
are the input weights of the respective hidden-layer nodes of the neural network,
Figure BDA0003105382280000135
is the number of hidden-layer nodes,
Figure BDA0003105382280000136
are the biases of the respective hidden-layer nodes of the neural network; H denotes the output of each node of the hidden layer,
Figure BDA0003105382280000137
is the solved output weight of the hidden layer; and $y_1, y_2, \dots, y_N$ are the output data of the neural network model.
The description of the steps S31 to S34 may refer to the above embodiments.
The embodiments of the present specification also provide a method for evaluating the performance of the trained neural network model and optimizing the neural network model accordingly, which is shown in fig. 6.
S41: acquiring multiple sets of historical data of a well in which an overflow event occurred, wherein part of the multiple sets of historical data were collected before the overflow event occurred and the rest were collected during the overflow event, and each set of historical data comprises multidimensional field data which were collected at the same time and represent the condition of the drilling field.
In the above embodiment, all the raw data are preprocessed and reduced in dimension, and then the sample set is divided, the neural network model is trained once through all the data of the training sample set S, and the parameters of the neural network model are optimized through the verification sample set V.
The above "additionally acquiring sets of historical data of at least one well" is intended to express that the samples used to optimize the neural network model are different from the samples used to train the neural network model for the first time.
S42: acquiring mark data corresponding to each of the multiple groups of historical data, where the mark data indicate whether an overflow event had occurred in the well at the moment the corresponding group of historical data was collected.
S43: sequentially inputting the multiple groups of historical data into the trained neural network model to obtain the corresponding output data.
S44: determining which of the following situations applies to the output data and the mark data corresponding to each group of historical data: the output data and the mark data both indicate that an overflow event has occurred; the output data and the mark data both indicate that no overflow event has occurred; the output data indicate that an overflow event has occurred and the mark data indicate that no overflow event has occurred; the output data indicate that no overflow event has occurred and the mark data indicate that an overflow event has occurred.
S45: calculating the accuracy and/or the recall rate according to the situations corresponding to the multiple groups of historical data.
S46: when the accuracy and/or the recall rate does not reach a predetermined threshold, adjusting the regularization parameter $C_0$ and/or the manifold regularization parameter λ and recalculating the output weights of the hidden layer to obtain a new neural network model.
The accuracy is calculated by the formula
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
and the recall rate is calculated by the formula
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
TP indicates the number of sets of output data and flag data corresponding to each indicating that an overflow event has occurred, TN indicates the number of sets of output data and flag data corresponding to each indicating that an overflow event has not occurred, FP indicates the number of sets of output data corresponding to each indicating that an overflow event has occurred and flag data indicating that an overflow event has not occurred, and FN indicates the number of sets of output data corresponding to each indicating that an overflow event has not occurred and flag data indicating that an overflow event has occurred.
The description of the steps S41 to S46 may refer to the above embodiments.
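The adjustment of steps S45 and S46 can be sketched as a loop that redraws $C_0$ and λ until the verification score reaches the threshold. How the parameters are "adjusted" is not specified in this text, so random redrawing from the series of powers of ten is an assumption, as are the names `tune` and `train_and_score`, the trial limit and the toy scoring function, which merely stands in for "train on the training set, then compute the recall rate on the verification set".

```python
import numpy as np

GRID = [10.0 ** k for k in range(-6, 7)]   # 1e-6, 1e-5, ..., 1e6

def tune(train_and_score, threshold, max_trials=50, seed=0):
    """Redraw (C_0, lambda) from GRID until the score reaches the threshold."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(max_trials):
        c0, lam = (float(v) for v in rng.choice(GRID, size=2))
        score = train_and_score(c0, lam)
        if best is None or score > best[0]:
            best = (score, c0, lam)
        if score >= threshold:              # requirement met: keep this model
            break
    return best

def toy_score(c0, lam):
    # Hypothetical stand-in for the verification recall; peaks at c0 = 1e2, lam = 1e-3.
    return 1.0 / (1.0 + abs(np.log10(c0) - 2.0) + abs(np.log10(lam) + 3.0))

score, c0, lam = tune(toy_score, threshold=0.5)
```

If no draw reaches the threshold within `max_trials`, the sketch returns the best pair seen so far; a real implementation would need its own policy for that case.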
Because drilling field data have many dimensions, training the neural network model requires a large amount of computation and is slow. For this reason, the multidimensional drilling field data used as input data of the neural network model may be reduced in dimension before the model is trained. The dimension reduction may use principal component analysis (PCA), kernel principal component analysis (KPCA), Missing Values Ratio, Low Variance Filter, High Correlation Filter, Backward Feature Elimination, Forward Feature Construction and the like. The specific embodiment above uses KPCA; the steps for reducing the dimension of the data with KPCA are shown in fig. 7 and described below.
S51: a kernel matrix is computed based on the selected kernel function, the kernel matrix being equivalent to the product of the transpose of the mapping matrix and the mapping matrix, the mapping matrix being used to map the data to a high-dimensional feature space.
S52: calculating the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ of the kernel matrix and the corresponding eigenvectors $u_1, u_2, \dots, u_n$.
S53: obtaining unit orthogonal eigenvectors $\alpha_1, \alpha_2, \dots, \alpha_n$ by the Schmidt orthogonalization method.
S54: calculating in turn the cumulative contribution rates $C_1, C_2, \dots, C_n$ of the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ by the following formula:
$C_i = \dfrac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{n} \lambda_j}$
S55: determining the subscript l of the eigenvalue at which the cumulative contribution rate first exceeds a predetermined threshold, and selecting the first l eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_l$ and the corresponding unit orthogonal eigenvectors $\alpha_1, \alpha_2, \dots, \alpha_l$.
S56: calculating the projection of the kernel matrix onto the first l unit orthogonal eigenvectors $\alpha_1, \alpha_2, \dots, \alpha_l$; this projection is the input data after dimension reduction.
The description of the steps S51 to S56 may refer to the above embodiments.
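Steps S51 to S56 can be sketched as follows. The Gaussian radial basis kernel with parameter `gamma`, the threshold of 0.85, the random stand-in data and the names `rbf_kernel` and `kpca_fit` are assumptions for illustration. Centering of the kernel matrix is not mentioned in this text and is therefore not applied.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian radial basis kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, gamma=0.01, threshold=0.85):
    K = rbf_kernel(X, X, gamma)                     # S51: kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K)            # S52: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)    # remove tiny negative round-off
    alphas = eigvecs[:, order]                      # S53: eigh already returns orthonormal vectors
    contrib = np.cumsum(eigvals) / eigvals.sum()    # S54: cumulative contribution rates
    l = int(np.argmax(contrib > threshold)) + 1     # S55: first subscript exceeding the threshold
    Z = K @ alphas[:, :l]                           # S56: projection of the kernel matrix
    return Z, alphas[:, :l], eigvals[:l], contrib

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 19))                   # 50 groups of 19-dimensional stand-in data
Z, alphas, eigvals, contrib = kpca_fit(X)
```

Because `np.linalg.eigh` returns orthonormal eigenvectors for a symmetric matrix, the Schmidt orthogonalization of step S53 requires no separate code in this sketch.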
After the sample data have been reduced in dimension, the step of training a neural network model with the multiple groups of historical data and the corresponding mark data as multiple training samples to obtain the drilling overflow diagnosis model correspondingly becomes: taking the data obtained by reducing the dimension of the multiple groups of historical data, together with the corresponding mark data, as multiple training samples and training a neural network model to obtain the drilling overflow diagnosis model, where the dimension-reduced multidimensional field data in each training sample serve as input data of the model and the mark data indicating whether an overflow event has occurred in the well serve as output data of the model. Correspondingly, step S43 becomes: sequentially inputting the data obtained by reducing the dimension of the multiple groups of historical data into the trained neural network model to obtain the corresponding output data.
The embodiment of the specification further provides a drilling overflow diagnosis method, as shown in fig. 8, which includes the following steps.
S61: generating a drilling overflow diagnosis model by the method illustrated in fig. 4.
S62: acquiring at least one group of multidimensional field data of a well.
S63: inputting a group of the multidimensional field data into the drilling overflow diagnosis model to obtain an output result.
S64: determining, according to the output result, whether an overflow event has occurred in the well at the moment corresponding to the multidimensional field data.
In some embodiments, the data is normalized before training the neural network, and the data obtained in step S62 also needs to be normalized by the same formula.
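Applying "the same formula" means reusing the statistics computed from the training data rather than recomputing them on the new data. The sketch below assumes min-max normalization, which is not stated in this section; the two columns and their values are hypothetical.

```python
import numpy as np

def fit_minmax(X_train):
    """Compute per-dimension minimum and range on the training data only."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return lo, span

def apply_minmax(X, lo, span):
    """Normalize any data with the stored training statistics."""
    return (X - lo) / span

# Hypothetical two-dimensional field data (for example well depth and hook load).
X_train = np.array([[1000.0, 10.0], [1500.0, 30.0], [2000.0, 20.0]])
lo, span = fit_minmax(X_train)
X_new = np.array([[1250.0, 25.0]])           # data acquired in step S62
X_new_scaled = apply_minmax(X_new, lo, span)
```

New values outside the training range fall outside [0, 1] under this scheme; whether they should be clipped is a design choice this text does not address.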
In some embodiments, the data are reduced in dimension before the neural network is trained, and the data obtained in step S62 must be reduced in dimension in the same way. That is, the acquired data, together with the input data used to calculate the kernel matrix when the neural network model was last trained, are substituted into the kernel function to obtain the kernel matrix of the data acquired in step S62, and the projection of this kernel matrix onto the first l eigenvectors is calculated to obtain the dimension-reduced input data. For example, based on the specific embodiment above, when the data obtained in step S62 are 5 groups of 19-dimensional field data, the 5 groups of data and the original 10800 groups of data are substituted into the Gaussian radial basis kernel function to obtain a 5 × 10800 kernel matrix, which is also referred to as "kernelizing" the 5 groups of data; the projection of the 5 × 10800 kernel matrix onto the unit orthogonal eigenvectors $\alpha_1, \alpha_2, \dots, \alpha_5$ is then calculated to obtain the dimension-reduced data, which are input into the drilling overflow diagnosis model in step S63.
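The dimension reduction of newly acquired data can be sketched as follows. To keep the example small, 200 random training groups stand in for the 10800 groups of the embodiment; the kernel parameter `gamma` and the name `project_new` are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian radial basis kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def project_new(X_new, X_train, alphas, gamma):
    """Kernelize X_new against the training inputs, then project onto stored eigenvectors."""
    K_new = rbf_kernel(X_new, X_train, gamma)   # shape (n_new, n_train), e.g. 5 x 10800
    return K_new @ alphas                       # shape (n_new, l)

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 19))        # stand-in for the 10800 training groups
gamma = 0.05
_, eigvecs = np.linalg.eigh(rbf_kernel(X_train, X_train, gamma))
alphas = eigvecs[:, ::-1][:, :5]                # 5 leading unit orthogonal eigenvectors
X_new = rng.standard_normal((5, 19))            # 5 groups of 19-dimensional field data
Z_new = project_new(X_new, X_train, alphas, gamma)
```

The training inputs and the eigenvectors must therefore be stored together with the diagnosis model, because every new group of data is compared against all training groups.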
The embodiment of the present specification provides a device for generating a drilling overflow diagnostic model, which can be used to implement the method for generating a drilling overflow diagnostic model described in fig. 4. As shown in fig. 9, the generating means includes a first acquiring module 11, a second acquiring module 12 and a training module 13.
The first obtaining module 11 is configured to obtain multiple sets of historical data of a drilling well in which an overflow event occurs, where a part of the multiple sets of historical data are collected before the overflow event occurs, and the rest of the multiple sets of historical data are collected during the overflow event, where each set of historical data includes multidimensional field data that is collected at the same time and is used for representing the condition of the drilling field.
The second obtaining module 12 is configured to obtain mark data corresponding to some of the groups of historical data, where the mark data indicate whether an overflow event had occurred in the well at the moment the group of historical data was collected.
The training module 13 is used for training a neural network model by using a plurality of groups of historical data and corresponding labeled data as a plurality of training samples to obtain a drilling overflow diagnosis model; and the multi-dimensional field data in each training sample is used as input data of the model, and the marking data of whether the well has an overflow event or not is used as output data of the model.
The embodiment of the specification provides a drilling overflow diagnosis device which can be used for realizing the drilling overflow diagnosis method shown in fig. 8. As shown in fig. 10, the diagnosis apparatus includes the generation apparatus 10 of the drilling overflow diagnosis model shown in fig. 9, a third acquisition module 20, a calculation module 30, and a determination module 40.
The third acquisition module 20 is for acquiring at least one set of multi-dimensional field data for a well.
The calculation module 30 is used to input a set of multidimensional field data into the drilling overflow diagnostic model to obtain an output result.
The determining module 40 is configured to determine whether an overflow event has occurred in the drilling at a time corresponding to the multidimensional field data according to the output result.
An embodiment of the present application further provides an electronic device. As shown in fig. 11, the electronic device may include a processor 1101 and a memory 1102, which may be connected by a bus or in another manner; fig. 11 takes connection by a bus as an example.
Processor 1101 may be a Central Processing Unit (CPU). The processor 1101 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, a discrete hardware component, or any combination thereof.
The memory 1102, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the method for generating a drilling overflow diagnosis model in the embodiments of the present application (e.g., the first acquisition module 11, the second acquisition module 12, and the training module 13 shown in fig. 9). The processor 1101 runs the non-transitory software programs, instructions and modules stored in the memory 1102 to execute the various functional applications and data processing of the processor, that is, to implement the method for generating a drilling overflow diagnosis model and the drilling overflow diagnosis method in the above method embodiments.
The memory 1102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor 1101, and the like. Further, the memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected to the processor 1101 by a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 1102 and when executed by the processor 1101, implement the method for generating a drilling overflow diagnostic model and the method for drilling overflow diagnostic in the above-described embodiments.
The details of the electronic device may be understood by referring to the related descriptions and effects shown in the above embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods described above can be implemented. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (9)

1. A method of generating a drilling overflow diagnostic model, comprising:
acquiring multiple groups of historical data of a well with an overflow event, wherein part of the multiple groups of historical data are acquired before the overflow event occurs, and the rest of the multiple groups of historical data are acquired during the overflow event, and each group of historical data comprises multidimensional field data which are acquired at the same time and are used for representing the condition of a drilling field;
acquiring mark data corresponding to part of the groups of historical data respectively, wherein the mark data are used for indicating whether the well drilling has an overflow event at the moment when one group of historical data is acquired;
training a neural network model by taking the multiple groups of historical data and the corresponding marked data as multiple training samples to obtain a drilling overflow diagnosis model; the multi-dimensional field data in each training sample is used as input data of the model, and the marking data of whether the well is subjected to the overflow event is used as output data of the model; the step of training a neural network model by taking the multiple sets of historical data and the corresponding labeled data as multiple training samples to obtain a drilling overflow diagnosis model comprises the following steps:
determining the number of nodes of a single hidden layer of a neural network model, selecting an activation function, and randomly initializing the input weight and bias of each node of the hidden layer to obtain an output matrix H of each node of the hidden layer;
selecting a similarity measurement function, and establishing a graph Laplacian matrix L based on a matrix formed by input data of a training sample;
selecting a regularization parameter $C_0$ and calculating a penalty matrix $C$; the penalty matrix $C$ is a diagonal matrix in which each diagonal element corresponds to a training sample having labeled data, and the element in the $i$-th row and $i$-th column is $C_i = C_0 / N_y$; when the labeled data in the training sample corresponding to $C_i$ indicates that an overflow event has occurred, $N_y$ is the number of training samples whose labeled data indicates that an overflow event has occurred; when the labeled data in the training sample corresponding to $C_i$ indicates that no overflow event has occurred, $N_y$ is the number of training samples whose labeled data indicates that no overflow event has occurred;
solving the output weights of the hidden layer according to the equation
$$\beta = \left(I + H^{T} C H + \lambda H^{T} L H\right)^{-1} H^{T} C Y$$
to obtain the neural network model $Y = H\beta$; wherein $I$ is an identity matrix and $\lambda$ is a pre-selected manifold regularization parameter,
Figure FDA0003786148840000011
Figure FDA0003786148840000012
$x_1, x_2, \ldots, x_N$ are the input data of the neural network model, $N$ is the number of the input data, $\tilde{N}$ is the number of nodes of the hidden layer, $g(x)$ is the activation function of the hidden layer, $m$ is the dimension of the input data, $w_1, w_2, \ldots, w_{\tilde{N}}$ are the input weights of the respective nodes of the hidden layer of the neural network, $b_1, b_2, \ldots, b_{\tilde{N}}$ are the biases of the respective nodes of the hidden layer of the neural network; $H$ denotes the output of each node of the hidden layer, $\beta_1, \beta_2, \ldots, \beta_{\tilde{N}}$ are the solved output weights of the hidden layer; and $y_1, y_2, \ldots, y_N$ are the output data of the neural network model.
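The training steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patentees' implementation: the function name `train_overflow_model`, the sigmoid activation, the Gaussian similarity with width `sigma`, the 0/1 label encoding, and the convention that unlabeled samples get a zero penalty and a zero target row are all assumptions made for the example.

```python
import numpy as np

def train_overflow_model(X, y, labeled, n_hidden=20, C0=10.0, lam=0.1,
                         sigma=1.0, seed=0):
    """X: (N, m) field data; y: (N,) labels in {0, 1}, ignored where
    labeled is False; labeled: (N,) boolean mask. All names are hypothetical."""
    rng = np.random.default_rng(seed)
    N, m = X.shape

    # Randomly initialize input weights and biases of the single hidden layer.
    W = rng.normal(size=(m, n_hidden))
    b = rng.normal(size=n_hidden)
    # Hidden-layer output matrix H, assuming a sigmoid activation g.
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))

    # Graph Laplacian L = D - S from an assumed Gaussian similarity function.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    L = np.diag(S.sum(axis=1)) - S

    # Diagonal penalty matrix: C_i = C0 / N_y, with N_y the size of the
    # class of sample i; unlabeled samples keep a zero entry (assumption).
    c = np.zeros(N)
    pos = labeled & (y == 1)
    neg = labeled & (y == 0)
    if pos.any():
        c[pos] = C0 / pos.sum()
    if neg.any():
        c[neg] = C0 / neg.sum()
    C = np.diag(c)

    # Targets: zero rows for unlabeled samples (assumption).
    Y = np.where(labeled, y, 0).astype(float).reshape(-1, 1)

    # beta = (I + H^T C H + lam * H^T L H)^{-1} H^T C Y
    A = np.eye(n_hidden) + H.T @ C @ H + lam * (H.T @ L @ H)
    beta = np.linalg.solve(A, H.T @ C @ Y)
    return W, b, beta
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the matrix being solved is the identity plus two positive semi-definite terms, so it is always invertible.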
2. The method of generating a drilling overflow diagnostic model of claim 1, further comprising:
additionally acquiring multiple groups of historical data of the well drilling with the overflow event, wherein part of the multiple groups of historical data are acquired before the overflow event occurs, and the rest parts of the multiple groups of historical data are acquired during the overflow event, and each group of historical data comprises multidimensional field data which are acquired at the same time and are used for representing the condition of the well drilling field;
acquiring marking data corresponding to each group of historical data, wherein the marking data is used for indicating whether the well is overflowed at the moment when the corresponding group of historical data is collected;
sequentially inputting the multiple groups of historical data into the trained neural network model to respectively obtain corresponding output data;
determining, for each group of historical data, which of the following cases applies to the output data and the corresponding mark data: both the output data and the mark data indicate that an overflow event has occurred; both the output data and the mark data indicate that no overflow event has occurred; the output data indicate that an overflow event has occurred while the mark data indicate that no overflow event has occurred; or the output data indicate that no overflow event has occurred while the mark data indicate that an overflow event has occurred;
calculating the accuracy and/or recall rate according to the situations corresponding to the multiple groups of historical data;
when the accuracy and/or the recall rate does not reach a predetermined threshold, adjusting the regularization parameter $C_0$ and/or the manifold regularization parameter $\lambda$, and recalculating the output weights of the hidden layer to obtain a new neural network model;
wherein the accuracy is calculated according to the formula
$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
the recall rate is calculated according to the formula
$$\text{recall} = \frac{TP}{TP + FN}$$
TP is the number of groups for which both the output data and the mark data indicate that an overflow event has occurred, TN is the number of groups for which both the output data and the mark data indicate that no overflow event has occurred, FP is the number of groups for which the output data indicate that an overflow event has occurred while the mark data indicate that no overflow event has occurred, and FN is the number of groups for which the output data indicate that no overflow event has occurred while the mark data indicate that an overflow event has occurred.
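The evaluation in claim 2 can be sketched in a few lines of Python. The sketch assumes the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN) and recall = TP / (TP + FN); the function names and the truthy/falsy encoding of "overflow occurred" are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count TP, TN, FP, FN; truthy values mean 'overflow occurred'."""
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1          # model and mark data both report overflow
        elif not p and not a:
            tn += 1          # both report no overflow
        elif p and not a:
            fp += 1          # false alarm
        else:
            fn += 1          # missed overflow
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0
```

If either metric falls below its chosen threshold, $C_0$ and/or $\lambda$ would be adjusted and the output weights recomputed, as the claim describes.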
3. The method of generating a drilling overflow diagnostic model of claim 1, further comprising: performing dimensionality reduction on the acquired historical data;
correspondingly, the step of training a neural network model by using the multiple groups of historical data and the corresponding marked data as multiple training samples to obtain a drilling overflow diagnosis model comprises the following steps: training a neural network model by taking the data obtained after dimensionality reduction of the multiple groups of historical data and the corresponding marked data as a plurality of training samples to obtain a drilling overflow diagnosis model; data obtained after dimensionality reduction of multi-dimensional field data in each training sample is used as input data of the model, and marking data of whether the well has an overflow event or not is used as output data of the model.
4. The method of generating a drilling overflow diagnostic model of claim 3, wherein the historical data is reduced in dimension by:
calculating a kernel matrix based on the selected kernel function, wherein the kernel matrix is equivalent to the product of the transposition of a mapping matrix and the mapping matrix, and the mapping matrix is used for mapping data to a high-dimensional feature space;
calculating the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the kernel matrix and the corresponding eigenvectors $u_1, u_2, \ldots, u_n$;
obtaining unit orthogonal eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_n$ by the Schmidt orthogonalization method;
sequentially calculating the cumulative contribution rates $C_1, C_2, \ldots, C_n$ of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ by the following formula:
$$C_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{n} \lambda_j}, \quad k = 1, 2, \ldots, n$$
determining the subscript $l$ of the eigenvalue at which the cumulative contribution rate exceeds a predetermined threshold for the first time, and selecting the first $l$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_l$ and the corresponding unit orthogonal eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_l$;
calculating the projection of the kernel matrix onto the first $l$ unit orthogonal eigenvectors $\alpha_1, \alpha_2, \ldots, \alpha_l$, the projection being the input data after dimension reduction.
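A minimal NumPy sketch of the dimension reduction in claim 4 follows. It makes several assumptions not stated in the claim: a Gaussian kernel with parameter `gamma`, eigenvalues sorted in descending order, no centering of the kernel matrix, and the hypothetical function name `kernel_reduce`. `numpy.linalg.eigh` already returns orthonormal eigenvectors for a symmetric matrix, so it stands in for the explicit Schmidt orthogonalization step.

```python
import numpy as np

def kernel_reduce(X, gamma=0.5, threshold=0.85):
    """Return (Z, contrib, l): reduced data, cumulative contribution
    rates, and the number of retained components."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")

    # Kernel matrix from an assumed Gaussian kernel function.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-gamma * sq_dist)

    # Eigen-decomposition of the symmetric kernel matrix; eigenvectors
    # come back orthonormal, replacing explicit orthogonalization.
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]           # largest eigenvalue first
    vals = np.clip(vals[order], 0.0, None)   # drop tiny negative round-off
    vecs = vecs[:, order]

    # Cumulative contribution rate C_k = sum_{i<=k} lambda_i / sum_j lambda_j.
    contrib = np.cumsum(vals) / np.sum(vals)

    # First index at which the cumulative rate exceeds the threshold.
    l = int(np.argmax(contrib > threshold)) + 1

    # Project the kernel matrix onto the first l eigenvectors.
    Z = K @ vecs[:, :l]
    return Z, contrib, l
```

The rows of `Z` would then replace the raw multidimensional field data as the model input, as claim 3 describes.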
5. A drilling overflow diagnosis method, comprising:
generating a drilling overflow diagnostic model by the method of any of claims 1 to 4;
acquiring at least one group of multi-dimensional field data of a well;
inputting the at least one group of multi-dimensional field data into the drilling overflow diagnostic model to obtain an output result;
and determining whether the well has an overflow event at the moment corresponding to the multidimensional field data according to the output result.
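As a hedged illustration of the diagnosis step in claim 5, the sketch below applies a trained single-hidden-layer model to new field data. The sigmoid activation, the 0/1 label encoding, the 0.5 decision threshold, and the function name `diagnose` are assumptions; the claim itself does not specify how the output result is mapped to a decision.

```python
import numpy as np

def diagnose(x, W, b, beta, threshold=0.5):
    """x: one sample (m,) or several samples (k, m); W, b, beta: trained
    input weights, biases, and output weights. Returns (flags, scores)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    H = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # hidden-layer output
    scores = (H @ beta).ravel()              # model output Y = H beta
    return scores > threshold, scores        # True means overflow is flagged
```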
6. An apparatus for generating a drilling overflow diagnostic model, comprising:
the first acquisition module is used for acquiring multiple groups of historical data of a well with an overflow event, wherein part of the multiple groups of historical data is acquired before the overflow event occurs, and the rest of the multiple groups of historical data is acquired during the overflow event, and each group of historical data comprises multidimensional field data which are acquired at the same time and are used for representing the conditions of a drilling field;
the second acquisition module is used for acquiring mark data corresponding to part of groups of the plurality of groups of historical data respectively, and the mark data is used for indicating whether the well is overflowed at the moment when one group of historical data is acquired;
the training module is used for training a neural network model by taking the multiple groups of historical data and the corresponding marked data as multiple training samples to obtain a drilling overflow diagnosis model; the multi-dimensional field data in each training sample is used as input data of the model, and the marking data of whether the well has an overflow event or not is used as output data of the model;
the step of training a neural network model by taking the multiple sets of historical data and the corresponding labeled data as multiple training samples to obtain a drilling overflow diagnosis model comprises the following steps:
determining the number of nodes of a single hidden layer of a neural network model, selecting an activation function, and randomly initializing the input weight and bias of each node of the hidden layer to obtain an output matrix H of each node of the hidden layer;
selecting a similarity measurement function, and establishing a graph Laplacian matrix L based on a matrix formed by input data of a training sample;
selecting a regularization parameter $C_0$ and calculating a penalty matrix $C$; the penalty matrix $C$ is a diagonal matrix in which each diagonal element corresponds to a training sample having labeled data, and the element in the $i$-th row and $i$-th column is $C_i = C_0 / N_y$; when the labeled data in the training sample corresponding to $C_i$ indicates that an overflow event has occurred, $N_y$ is the number of training samples whose labeled data indicates that an overflow event has occurred; when the labeled data in the training sample corresponding to $C_i$ indicates that no overflow event has occurred, $N_y$ is the number of training samples whose labeled data indicates that no overflow event has occurred;
solving the output weights of the hidden layer according to the equation
$$\beta = \left(I + H^{T} C H + \lambda H^{T} L H\right)^{-1} H^{T} C Y$$
to obtain the neural network model $Y = H\beta$; wherein $I$ is an identity matrix and $\lambda$ is a pre-selected manifold regularization parameter,
Figure FDA0003786148840000041
Figure FDA0003786148840000042
$x_1, x_2, \ldots, x_N$ are the input data of the neural network model, $N$ is the number of the input data, $\tilde{N}$ is the number of nodes of the hidden layer, $g(x)$ is the activation function of the hidden layer, $m$ is the dimension of the input data, $w_1, w_2, \ldots, w_{\tilde{N}}$ are the input weights of the respective nodes of the hidden layer of the neural network, $b_1, b_2, \ldots, b_{\tilde{N}}$ are the biases of the respective nodes of the hidden layer of the neural network; $H$ denotes the output of each node of the hidden layer, $\beta_1, \beta_2, \ldots, \beta_{\tilde{N}}$ are the solved output weights of the hidden layer; and $y_1, y_2, \ldots, y_N$ are the output data of the neural network model.
7. A drilling overflow diagnostic device, comprising:
the apparatus for generating a drilling overflow diagnostic model of claim 6;
the third acquisition module is used for acquiring at least one group of multi-dimensional field data of a well;
the calculation module is used for inputting the group of multidimensional field data into the drilling overflow diagnosis model to obtain an output result;
and the determining module is used for determining whether the well has an overflow event at the moment corresponding to the multidimensional field data according to the output result.
8. An electronic device, comprising:
a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor performing the steps of the method of any one of claims 1 to 5 by executing the computer instructions.
9. A computer-readable storage medium storing computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110636358.3A CN113268803B (en) 2021-06-08 2021-06-08 Method for generating drilling overflow diagnosis model, drilling overflow diagnosis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110636358.3A CN113268803B (en) 2021-06-08 2021-06-08 Method for generating drilling overflow diagnosis model, drilling overflow diagnosis method and device

Publications (2)

Publication Number Publication Date
CN113268803A CN113268803A (en) 2021-08-17
CN113268803B true CN113268803B (en) 2022-11-29

Family

ID=77234666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110636358.3A Active CN113268803B (en) 2021-06-08 2021-06-08 Method for generating drilling overflow diagnosis model, drilling overflow diagnosis method and device

Country Status (1)

Country Link
CN (1) CN113268803B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114000862B (en) * 2021-10-26 2023-07-18 中国地质大学(武汉) Geological drilling process drilling rate intelligent control system based on dynamic optimization
CN114997485B (en) * 2022-05-26 2023-10-03 中国石油天然气集团有限公司 Overflow condition prediction model training method and device and overflow condition prediction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852018A (en) * 2019-10-21 2020-02-28 中国石油集团长城钻探工程有限公司 PSO drilling parameter optimization method based on neural network
CN111191836A (en) * 2019-12-27 2020-05-22 东软集团股份有限公司 Well leakage prediction method, device and equipment
CN111827982A (en) * 2019-04-17 2020-10-27 中国石油天然气集团有限公司 Method and device for predicting overflow and leakage working conditions of drilling well

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
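Simulation modeling method for drilling engineering parameters oriented to early overflow monitoring (面向早期溢流监测的钻井工程参数仿真建模方法); Zhang Bo et al.; 《录井工程》 (Mud Logging Engineering); 2019-12-25 (No. 04); full text *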

Also Published As

Publication number Publication date
CN113268803A (en) 2021-08-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant