CN117171700A - Drilling overflow prediction combined model based on deep learning and timely silent model updating and transfer learning method - Google Patents


Info

Publication number: CN117171700A
Application number: CN202311000615.XA
Authority: CN (China)
Prior art keywords: model, IPSO, prediction, convolution, LSTM
Legal status: Pending (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 张秋实, 任星兆, 王家骏, 王胡振, 穆朗枫
Current assignee: Liaoning Shihua University (listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Liaoning Shihua University
Application filed by Liaoning Shihua University
Priority to: CN202311000615.XA
Publication of: CN117171700A

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 — Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping


Abstract

The invention discloses a drilling overflow prediction combined model based on deep learning, together with methods for timely silent model updating and transfer learning. Four improved deep-learning optimization models are trained; each is then validated on new sample data, and a combined model is obtained by fusing the four with weights assigned by a dominance matrix method according to their prediction performance. Overflow prediction on the sample data to be predicted is then performed together with a classification prediction model. A timely silent update method for the combined model, based on a timer trigger and multithreading, keeps overflow prediction persistently stable and accurate. A transfer learning method gives the model a "train once" property: after the model has been fully trained on a large volume of data, it can be rapidly deployed on other wells after fine-tuning with only a small amount of their data.

Description

Drilling overflow prediction combined model based on deep learning and timely silent model updating and transfer learning method
Technical Field
The invention relates to the fields of deep learning and drilling engineering, and in particular to a drilling overflow prediction combined model based on deep learning together with methods for timely silent model updating and transfer learning.
Background
During oil drilling and production, accidents such as wellhead leakage, overflow, kick, blowout, and borehole wall collapse increase drilling costs, pollute the environment, and can even cause casualties. Among these, overflow is the most frequent type of drilling accident and an important complex event affecting the safety of drilling operations. Overflow can severely damage the reservoir, increase drilling and development costs, and reduce production efficiency; if not handled in time, it can escalate into more serious disasters such as borehole wall collapse or blowout. It is therefore highly desirable to develop an accurate, fast, field-deployable model that predicts the risk of downhole overflow during drilling.
The traditional approach to avoiding overflow relies on personnel manually monitoring changes in the logging data and judging from the log curves, by experience, whether overflow has occurred downhole or is imminent. This depends on the experience and diligence of the monitoring personnel; compared with automatic machine recognition it is prone to misjudgment and involves more uncontrollable factors.
Techniques that learn from manually selected logging parameters using Bayesian networks, neural networks, and similar methods have the following limitations:
(1) Most current overflow "prediction" research is in fact real-time monitoring: logging data collected in the oilfield are analyzed as they arrive or after the fact, which is not the advance prediction of the present invention. Although judging whether overflow is occurring downhole through real-time monitoring or post-hoc analysis has practical value, oilfields clearly need, beyond that, the ability to predict the probability of overflow in a future time period from the current well's drilling and logging data, so that measures can be taken in time to avoid overflow and even serious disasters such as blowout;
(2) The accuracy of most current models in monitoring early overflow still needs improvement, and false alarms are frequent. The present invention therefore builds on four improved deep-learning optimization models and combines them with a dominance matrix method according to each model's prediction performance on the well in question, giving higher accuracy and reliability than any single model;
(3) In existing papers and practice, network models are trained by splitting a single sample data set into a training set and a test set. Because everything is still drawn from the same sample data, this risks overfitting and data leakage. The present invention instead evaluates the predictive ability of each method in the fusion model on a separate set of new log data, giving the model comparatively higher credibility and generalization ability;
(4) The model methods in existing papers and practice are trained and used for prediction on one particular well; when moved to another well, the changed logging and drilling data can seriously degrade the prediction performance;
(5) The present invention uses a timely silent model updating method, which corrects the model promptly according to its prediction performance without interfering with its normal prediction task, so that stable and accurate predictions can be output over long periods.
Disclosure of Invention
To overcome the above deficiencies of the prior art, the invention provides a drilling overflow prediction combined model based on deep learning, together with methods for timely silent model updating and transfer learning. This advance-prediction overflow model is stable, adaptive, and continuously accurate. It helps operators understand downhole overflow conditions, judges, identifies, and warns accurately before overflow occurs, and reminds operators to take timely measures to avoid overflow accidents, thereby preventing overflow from evolving into a more serious blowout disaster.
The technical scheme of the invention is as follows:
a well drilling overflow prediction combined model based on deep learning and a model timely silence updating and migration learning method comprise the following parts:
data preprocessing and model training part: firstly, carrying out data preprocessing operation on original logging data, and respectively training four optimization models based on deep learning, namely an attention mechanism time sequence convolution Long-Term Memory network model IPSO-TCN-LSTM-2DAttention, IPSO improved particle swarm optimization and an attention mechanism convolution Long-Term Memory network (LSTM) model IPSO-ConvLSTM-2DAttention, IPSO improved particle swarm optimization, which are optimized by an IPSO-improved particle swarm algorithm (Improved Particle Swarm Optimization, IPSO) by using the logging data after processing to respectively train an attention mechanism-based convolution cyclic neural network model IPSO-1DCNN-LSTM-2DAttention, IPSO improved particle swarm algorithm.
Building and timely silence updating part of the combined model: the method mainly comprises the steps of deciding and fusing the four optimization models based on an advantage matrix method, constructing a combined model according to respective advantages (the average absolute error value MAE (Mean Absolute Error) of the respective prediction results of the four optimization models), realizing that a timer automatically calculates the average absolute error value MAE between the prediction value of the combined model and the corresponding real value of the on-site logging data at intervals of a designated time, and finally judging whether to trigger the timely silence updating operation of the combined model according to whether the MAE value reaches a designated threshold value;
overflow prediction and visualization section: performing overflow classification prediction on the basis of the regression prediction result of the combined model by using an improved particle swarm optimization (IPSO-BP) neural network model, and visually displaying the regression prediction result of the combined model and the overflow prediction result of the IPSO-BP classification model;
model migration learning section: and (3) based on a migration learning strategy, the combined model and the classification model are subjected to fine adjustment and then are applied to overflow prediction of an adjacent well or other oilfield wells.
Further, the data preprocessing step includes: first, outlier handling, missing-value filling, and z-score standardization are applied to the original logging data for initial cleaning and dimensional unification, to prevent distortion of the subsequent component analysis; PCA principal component analysis is then performed on the cleaned X-dimensional logging data to extract the key features affecting overflow; the data set is then split into training, test, and validation sets; finally, the SMOTE method for imbalanced data is used to address the scarcity of overflow samples in the data set.
Outlier handling and missing-value filling means processing outliers, noise values, sentinel fills such as 9999, -9999, or infinitesimal values, and missing values in the original logging data; each such value is replaced by the mean of the five data points before and the five after it.
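The filling rule above can be sketched as follows (a minimal illustration, not the patent's implementation; the function and variable names are ours):

```python
import numpy as np

def fill_with_neighbor_mean(values, bad_mask, window=5):
    """Replace flagged values with the mean of up to `window` valid
    values before and `window` values after each flagged index."""
    x = values.astype(float).copy()
    x[bad_mask] = np.nan                  # treat flagged entries as missing
    filled = x.copy()
    for i in np.where(bad_mask)[0]:
        neighbors = np.concatenate([x[max(0, i - window):i],
                                    x[i + 1:i + 1 + window]])
        neighbors = neighbors[~np.isnan(neighbors)]
        if neighbors.size:
            filled[i] = neighbors.mean()
    return filled

raw = np.array([1.0, 2.0, 9999.0, 4.0, 5.0, 6.0])
mask = np.isin(raw, [9999.0, -9999.0])    # flag sentinel fills as outliers
filled = fill_with_neighbor_mean(raw, mask)
print(filled)  # the 9999.0 becomes (1+2+4+5+6)/5 = 3.6
```

Near the ends of the series fewer than five neighbors exist on one side, so the sketch averages whatever valid neighbors fall inside the window.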
The PCA principal component analysis operation comprises sampling the sample matrix, centering it (subtracting the mean), computing the sample covariance matrix, computing its eigenvalues and eigenvectors, and selecting the principal components.
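The steps above can be sketched in NumPy (a minimal illustration with made-up data shapes, not the patent's implementation):

```python
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)            # centering (subtract the mean)
    cov = np.cov(Xc, rowvar=False)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues and eigenvectors
    order = np.argsort(vals)[::-1]     # sort by descending variance
    components = vecs[:, order[:k]]    # keep the top-k principal components
    return Xc @ components             # project onto the components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # 100 samples, 6 logging features
Z = pca(X, k=3)
print(Z.shape)  # (100, 3)
```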
Further, the model training part uses the preprocessed original logging data to train the four deep-learning-based optimization models respectively: the attention-mechanism convolutional recurrent neural network model IPSO-1DCNN-LSTM-2DAttention, the attention-mechanism temporal convolutional long short-term memory network model IPSO-TCN-LSTM-2DAttention, the attention-mechanism convolutional long short-term memory network model IPSO-ConvLSTM-2DAttention, and the two-dimensional multi-channel non-heterogeneous convolutional recurrent neural network model IPSO-2DCRNN.
All four deep-learning-based optimization models use the IPSO improved particle swarm algorithm, characterized by the following optimizations over the basic PSO particle swarm algorithm:
(1) Nonlinear updating of the individual learning factor C1 and the social learning factor C2: early in the iteration C1 is large and C2 is small, ensuring particle diversity and a wide random search of the search space; late in the iteration C1 is small and C2 is large, helping the model converge to the globally optimal location quickly. Compared with the fixed assignment C1 = C2 = 2, this improves particle search performance, speeds up convergence, and dynamically balances global and local search;
(2) The inertia weight update in the particle velocity formula of the original particle swarm algorithm is improved: comparing the linearly decreasing update with two nonlinear decreasing methods (A and B) and a randomly generated inertia weight, the type-B nonlinear decreasing method is preferred and markedly improves convergence speed;
(3) Two random search coefficients r1 and r2 further increase the diversity and randomness of the search, improve the particles' global search ability, and help prevent them from becoming trapped in a local optimum;
(4) During the global search of each iteration round, the IPSO algorithm exits the round once a specified error is reached, and terminates once the set maximum number of iterations is reached, improving model training efficiency;
(5) The fitness function is designed as the prediction error of the target model returned after each particle's hyperparameter set is used to train that model; the fitness value is the mean absolute error MAE returned by the target model.
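The IPSO update rules above can be sketched as follows. The exact nonlinear schedules are not given in the text, so the quadratic forms for C1/C2 and the inertia weight below are assumptions chosen only to match the described behavior (C1 large early and small late, C2 the reverse, inertia weight decreasing nonlinearly):

```python
import numpy as np

def learning_factors(t, t_max, c_start=2.5, c_end=0.5):
    # assumed nonlinear schedules: C1 decays, C2 grows with iteration t
    decay = (1 - t / t_max) ** 2
    c1 = c_end + (c_start - c_end) * decay   # large early, small late
    c2 = c_start - (c_start - c_end) * decay # small early, large late
    return c1, c2

def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    # assumed nonlinear (quadratic) decrease from w_max to w_min
    return w_min + (w_max - w_min) * (1 - (t / t_max) ** 2)

def velocity_update(v, x, pbest, gbest, t, t_max, rng):
    c1, c2 = learning_factors(t, t_max)
    w = inertia_weight(t, t_max)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)  # two random search coefficients
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

rng = np.random.default_rng(0)
v_new = velocity_update(np.zeros(3), np.zeros(3), np.ones(3), np.ones(3),
                        t=10, t_max=100, rng=rng)
print(v_new.shape)  # (3,)
```

Each particle's position here would encode one candidate hyperparameter set; evaluating the fitness means training the target model with that set and returning its MAE.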
three of the four optimization models based on deep learning use an Attention mechanism, and the optimization model is characterized in that the Attention layer is realized by a response full-connection layer with an activation function of softmax, and the front Attention-LSTM mode provided by the invention is easier to notice key information causing overflow compared with the design mode after the Attention mechanism is applied to the LSTM layer because the output of the rear Attention-full-connection layer is more abstract and is unfavorable for the Attention mechanism to learn useful information. Aiming at multidimensional well logging data, the invention correspondingly designs the Attention mechanism into a multichannel, namely the Attention mechanism of n channels corresponding to n-dimensional features, and can pay more Attention to key information of each feature dimension compared with a shared weight mode of only setting a group of Attention weights. And the invention innovatively uses the attention mechanism in both the time dimension and the feature dimension, and can simultaneously improve the roles of important time steps and important features in LSTM (least squares) compared with the situation that only the time dimension or the feature dimension is set in other paper researches, thereby further reducing the model prediction error.
The attention mechanism guides the model to focus on the parts that positively influence the result by learning a weight distribution and applying it to the features; this effectively alleviates the excessive complexity caused by a traditional neural network having to memorize too much information. The attention weights are computed as:

Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V

where Q (query), K (key), and V (value) are three matrices derived from the same input and d_k is the dimension of the Q or K vectors; the scores are normalized by the softmax function and then multiplied by the matrix V.
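The scaled dot-product attention described here can be written in a few lines of NumPy (matrix sizes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # one weight distribution per query
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension d_k = 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, w = attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 6)
```

Each row of `w` sums to 1, which is the learned weight distribution the text refers to.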
The IPSO-optimized attention-mechanism convolutional recurrent neural network model IPSO-1DCNN-LSTM-2DAttention is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyperparameters affecting the regression prediction of the 1DCNN-LSTM-2DAttention model: the learning rate, the number of LSTM layers, the number of neurons in each LSTM layer, the number of convolution kernels in the CNN convolutional part, the sliding stride of those kernels, the pooling size of the pooling layer, the sliding stride of the pooling layer, the number of training epochs per round of the LSTM long short-term memory network, and the batch size used each time the LSTM network updates its weight parameters.
The convolutional layers use the Leaky ReLU activation function (a variant of ReLU), which has several advantages: it solves the "dying ReLU" problem, since with a small positive slope α it provides a non-zero gradient even for negative inputs, so the weights keep updating; and its linear nature is computationally efficient, accelerating training. The Leaky ReLU is computed as:

LeakyReLU(x) = x if x ≥ 0, αx if x < 0

(2) The two-dimensional attention mechanism 2DAttention is placed before the multi-layer LSTM and after the CNN layers, so attention weights can be assigned simultaneously over the feature dimension and the time-step dimension extracted by the CNN, improving the overall prediction accuracy of the model;
(3) The one-dimensional convolutional neural network (1DCNN) part of the model reads the preprocessed logging data input and learns its features automatically; the extracted features are importance-weighted by the two-dimensional attention mechanism 2DAttention and then passed to the multi-layer LSTM for interpretation and further learning. The input structure of the CNN part is identical to that of the LSTM part. The CNN part has two convolutional layers: the first reads the input sequence and projects the result onto a feature map, and the second performs the same operation on that feature map, amplifying its salient features; the stacked convolutional layers can capture long-range dependencies. The pooling layer simplifies and reduces the dimensionality of the extracted features by downsampling, retaining the key features.
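The conv-conv-pool-attention-LSTM pipeline described above can be sketched in Keras. All shapes and layer sizes below are assumptions standing in for the values IPSO would search, and the Dense-softmax product is a simplified stand-in for the patent's 2DAttention layer:

```python
from tensorflow.keras import layers, models

n_steps, n_feats = 30, 8                        # assumed window of logging data
inp = layers.Input(shape=(n_steps, n_feats))
x = layers.Conv1D(32, 3, padding="same")(inp)   # first conv reads the input sequence
x = layers.LeakyReLU(0.01)(x)
x = layers.Conv1D(32, 3, padding="same")(x)     # second conv amplifies salient features
x = layers.LeakyReLU(0.01)(x)
x = layers.MaxPooling1D(2)(x)                   # downsample, keep key features
att = layers.Dense(32, activation="softmax")(x) # softmax attention weights
x = layers.Multiply()([x, att])                 # importance-weight the features
x = layers.LSTM(64)(x)                          # interpretation and further learning
out = layers.Dense(1)(x)                        # regression output
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mae")
print(model.output_shape)  # (None, 1)
```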
The IPSO-optimized attention-mechanism temporal convolutional long short-term memory network model IPSO-TCN-LSTM-2DAttention is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyperparameters affecting the regression prediction of the TCN-LSTM-2DAttention model: the learning rate, the number of LSTM layers, the number of neurons in each LSTM layer, the convolution kernel size in the TCN temporal convolutional network, the number of residual blocks, the Dropout rate within the residual unit, the number of training epochs per round of the LSTM long short-term memory network, and the batch size used each time the LSTM network updates its weight parameters. The TCN temporal convolutional layers use the Leaky ReLU activation function;
(2) In the IPSO-TCN-LSTM-2DAttention model, the TCN temporal convolutional network is a recently proposed convolutional network with sequence-processing ability. Building on the CNN, it replaces ordinary convolution with causal convolution, and consists of dilated layers with equal input and output lengths, causal 1D fully convolutional layers, and a residual structure. The causal convolution models the logging data; the dilated convolution expands the receptive field as much as possible without adding pooling layers, increasing the memory of historical logging data; and the residual module improves prediction over long time steps. These modules give the TCN structure parallelism, a flexible receptive field, variable input length, freedom from vanishing or exploding gradients, and a small memory footprint. After the input logging data pass through the TCN for feature extraction, the two-dimensional attention mechanism 2DAttention importance-weights the extracted features, improving the model's feature extraction and classification ability, and the result then enters the multi-layer LSTM for interpretation and further learning. The input format of the TCN is the same as that of the CNN, with two convolutional layers;
(3) The convolutions used in the TCN are one-dimensional. For the multidimensional features of logging data, the invention adopts multi-channel feature extraction: the n feature dimensions use n corresponding sub-models, and, so as not to lose features at different levels of abstraction, the features of the n channels extracted by the n sub-models are concatenated into a new feature vector that serves as the result of the TCN's convolutional feature extraction.
The IPSO-optimized attention-mechanism convolutional long short-term memory network model IPSO-ConvLSTM-2DAttention is characterized in that:
(1) The IPSO algorithm globally optimizes 8 key hyperparameters affecting the regression prediction of the ConvLSTM-2DAttention model: the number of ConvLSTM convolutional long short-term memory layers, the number of LSTM-layer neurons, the number of ConvLSTM convolution kernels, the kernel size, the kernel stride, the number of training epochs per round of the ConvLSTM model, and the batch size used each time the model updates its weight parameters. The ConvLSTM convolutional layers use the Leaky ReLU activation function;
(2) In the IPSO-ConvLSTM-2DAttention model, ConvLSTM abstracts features from the multidimensional logging data while retaining its temporal information; the extracted result is fed into the IPSO-optimized LSTM for learning, the two-dimensional attention mechanism 2DAttention then computes weights over the multidimensional features to allocate attention, and finally a fully connected layer integrates the information and produces the output;
(3) ConvLSTM differs from CNN-LSTM or TCN-LSTM in that the former uses convolution directly as part of reading the input within the LSTM cell, whereas the latter two apply a convolution operation at each LSTM time step. In code, the Keras library provides the ConvLSTM2D class; although built for two-dimensional data, it can also be configured for one-dimensional multivariate time series prediction. ConvLSTM2D requires input of the form (samples, timesteps, rows, cols, channels): the number of samples, the time steps, the one-dimensional shape of each subsequence (its number of rows), the number of columns in each subsequence, and the number of channels (a concept from image recognition tasks, corresponding to the number of features in the time series prediction task studied in the invention).
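The reshape into the (samples, timesteps, rows, cols, channels) form described above can be sketched as follows; the subsequence split (4 subsequences of 15 rows) and the layer sizes are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras import layers, models

# 16 samples of a 60-step series with 8 logging features,
# split into 4 subsequences of 15 rows x 1 col, with 8 channels
n_samples, n_sub, sub_len, n_feats = 16, 4, 15, 8
X = np.random.rand(n_samples, n_sub * sub_len, n_feats)
X5d = X.reshape(n_samples, n_sub, sub_len, 1, n_feats)

model = models.Sequential([
    layers.Input(shape=(n_sub, sub_len, 1, n_feats)),
    layers.ConvLSTM2D(filters=16, kernel_size=(3, 1)),  # conv inside the LSTM cell
    layers.Flatten(),
    layers.Dense(1),                                    # regression output
])
pred = model.predict(X5d, verbose=0)
print(pred.shape)  # (16, 1)
```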
The IPSO-optimized two-dimensional multi-channel non-heterogeneous convolutional recurrent neural network model IPSO-2DCRNN is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyperparameters affecting the regression prediction of the 2DCRNN model: the learning rate, the number of RNN recurrent layers, the number of neurons in each hidden layer, the dropout rate, the number of convolutional layers, the convolution kernel size, the number of convolution kernels, the number of training epochs per round of the 2DCRNN network, and the batch size used each time the RNN updates its weight parameters. The 2DCRNN convolutional layers use the Leaky ReLU activation function;
(2) In the IPSO-2DCRNN model, "two-dimensional" means that the convolution kernels move along both the time-step dimension and the feature dimension; "multi-channel" means that, for the logging data's multidimensional features, each feature dimension has its own channel for feature extraction and learning, after which the information learned by all channels is concatenated and passed to the next module; "non-heterogeneous" means that rather than giving each channel its own convolution kernel, the convolutional layer in the IPSO-2DCRNN model uses the same kernel for feature extraction on every channel, with each input sequence read into a separately maintained set of feature maps, in essence mining each input time series variable a second time for feature information;
(3) The convolutional layers in the IPSO-2DCRNN model use the optimal depth found by the IPSO global search to fully mine abstract features from the multidimensional logging data, which are then fed into the recurrent neural network for further learning. The main structure comprises the convolutional layers, the pooling layer, the recurrent layers, and the output layer.
Further, the prediction results of the four deep-learning-based optimization models are weight-fused by the dominance matrix method into a combined model. Let W1, W2, W3, W4 be the weight coefficients of the four models, f_it the prediction of the i-th model at time t within the specified interval, and F_t the prediction of the combined model at time t within that interval. Then

F_t = W1·f1t + W2·f2t + W3·f3t + W4·f4t

and the dominance matrix method determines each weight coefficient from the models' win counts:

Wi = Zi / (Z1 + Z2 + Z3 + Z4), i = 1, 2, 3, 4

where Z1 is the number of times within the specified interval that model 1 predicts better than the other models, and Z2, Z3, Z4 are likewise the win counts of models 2, 3, and 4. "Predicts better" refers to the mean absolute error MAE between a model's predictions and the true values on the new, cleaned logging data: the smaller the error, the better the model's prediction. The specified interval can be set freely and defaults to the time span covered by the new logging data.
Furthermore, a multithreaded method continuously and in parallel backtracks data over a specified time step without affecting the model's normal prediction task, computing the error between the model's predictions and the recorded true values; if the error is too large, a thread retrains the combined model on the recorded true values to keep prediction accuracy continuously high. The timely silent update method for the combined model has the following characteristics:
(1) A timer performs a specified operation at the specified interval, in two steps: first, query the new predictions of the combined model constructed by the dominance matrix method, over the specified time step, for the well under prediction (after data cleaning), along with the newly recorded true logging values for that well; second, use Python multithreading to create a thread that computes the mean absolute error MAE between the predicted and true values, without affecting the model's normal prediction task;
(2) Whether the MAE exceeds the specified threshold is checked; when it does, a trigger performs the specified operation: a thread is created with Python multithreading, and the combined model is retrained on the newly recorded logging data of the well under prediction, again without affecting the model's normal prediction task;
(3) When the retraining task completes, the new model is saved as a local file and enabled for the next time step's prediction task, while the previous, higher-error combined model file is deleted.
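The timer-and-thread mechanism can be sketched as follows. The threshold, interval, and `retrain`/`get_recent` hooks are illustrative assumptions, not values from the patent:

```python
import threading
import numpy as np

MAE_THRESHOLD = 0.5   # assumed trigger threshold
CHECK_INTERVAL = 600  # assumed seconds between silent checks

def mae(pred, actual):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(actual))))

def retrain(recorded_actuals):
    # placeholder: retrain the combined model on newly recorded
    # logging data and save it to a local file
    pass

def silent_check(get_recent):
    """Compare recent predictions with recorded true values; if the
    MAE is too large, retrain in a worker thread so the model's
    normal prediction task is never blocked."""
    pred, actual = get_recent()
    triggered = mae(pred, actual) > MAE_THRESHOLD
    if triggered:
        threading.Thread(target=retrain, args=(actual,), daemon=True).start()
    # re-arm the timer for the next silent check
    t = threading.Timer(CHECK_INTERVAL, silent_check, args=(get_recent,))
    t.daemon = True
    t.start()
    return triggered

print(silent_check(lambda: ([1.0, 2.0, 3.0], [1.2, 1.8, 3.4])))  # False (MAE ≈ 0.27)
```

Using daemon threads means the pending timer and any retraining worker never keep the process alive after the main prediction loop exits.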
Further, the IPSO improved particle swarm algorithm globally optimizes 6 hyperparameters affecting the classification performance of the BP neural network: the number of hidden layers, the number of neurons in each layer (which may differ between layers), the initial weights W and thresholds b of the neurons in each layer (which may differ between layers), the number of training epochs per round, and the batch size used each time the network updates its weight parameters.
Furthermore, keeping the overall structure of the combined regression prediction model and the IPSO-BP classification prediction model unchanged (but discarding the output layer), a specified number of Dense fully connected layers (two by default) plus a new output layer are appended to the end of each model, and the migrated model is trained with logging data, preprocessed in the same way, from adjacent wells or wells in other oilfields. Specifically, the migrated model consists of the original model with its output layer discarded, plus the newly added fully connected layers and output layer. The purpose of transfer learning is to exploit a fully trained model: with only fine-tuning and a small amount of training time it can be rapidly applied to overflow prediction on adjacent wells or wells in other oilfields, saving substantial time compared with retraining and solving the problem that many new wells lack the drilling data, or the volume of logging data, needed for training.
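The discard-the-head, freeze, and append-new-Dense-layers procedure can be sketched in Keras. The base architecture and all layer sizes are assumed stand-ins for the trained combined model:

```python
import numpy as np
from tensorflow.keras import layers, models

# assumed stand-in for the trained source-well model
base = models.Sequential([
    layers.Input(shape=(30, 8)),
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, name="old_head"),
])
# ... base would be fully trained on the source well here ...

trunk = models.Model(base.input, base.layers[-2].output)  # discard old output layer
trunk.trainable = False                                   # freeze learned weights
transfer = models.Sequential([
    trunk,
    layers.Dense(16, activation="relu"),  # new fully connected layers
    layers.Dense(16, activation="relu"),
    layers.Dense(1),                      # new output layer
])
transfer.compile(optimizer="adam", loss="mae")
y = transfer.predict(np.zeros((2, 30, 8), dtype="float32"), verbose=0)
print(y.shape, len(transfer.trainable_weights))

# fine-tuning then needs only a small amount of target-well data:
# transfer.fit(X_new, y_new, epochs=5, batch_size=32)
```

Only the three new Dense layers are trainable (6 weight tensors), which is what makes fine-tuning on a small target-well data set fast.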
The invention has the following advantages:
(1) Overflow prediction is realized on the basis of real-time overflow monitoring, and the prediction result is displayed visually, so that operators learn the situation at the earliest moment;
(2) Multiple deep learning models are used, with various improvements and innovations built upon them, and their predictions are weighted and combined according to each model's prediction performance, maximizing the generalization capability and prediction accuracy of the model;
(3) Sample data from two different wells are used in the training and prediction-verification stages of the model, completely avoiding the risks of data leakage and overfitting;
(4) A timely silent update mechanism for the combined model ensures that the model's continuous prediction capability remains stable and reliable over long time spans;
(5) The transfer learning technique achieves the effect of "train once, use on many wells", solves the problem that many new wells lack sufficient data to train a machine learning or deep learning model, greatly reduces model retraining time, and enables rapid deployment with only a small amount of training data.
Drawings
In order to more clearly illustrate the embodiments herein or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments herein and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic diagram showing the overall structural design of a well drilling overflow prediction combined model based on deep learning and a model timely silence updating and transfer learning method according to an embodiment of the invention;
FIG. 2 is a general flow chart of a well overflow prediction combination model based on deep learning and a model timely silence updating and transfer learning method according to an embodiment of the invention;
FIG. 3 is a flow chart of a data preprocessing method in an embodiment of the invention;
FIG. 4 is a flowchart of a method for updating a combined model in real time according to an embodiment of the present invention;
fig. 5 is a flowchart of a method for migration learning according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. When a system or apparatus product in practice is executed, it may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings.
It should be noted that, the well drilling overflow prediction combined model and the model timely silence updating and transferring learning method based on deep learning can be applied to the prediction problem of overflow working conditions in the field of well drilling engineering, and can also be applied to other problems except the field of well drilling engineering and overflow working condition prediction.
Example 1
The invention provides a deep-learning-based drilling overflow prediction combined model and a method for timely silent model updating and transfer learning. The method realizes advance prediction of overflow conditions; trains multiple optimized and improved deep learning models on the original logging data and new logging data; evaluates and verifies their prediction accuracy; merges them into a combined regression prediction model and a classification prediction model capable of accurately predicting the overflow condition of the well to be predicted; visualizes the model prediction results; and realizes timely silent updating of the combined model to keep its prediction performance stable over the long term.
Fig. 1 is a schematic diagram of the overall structural design of the deep-learning-based drilling overflow prediction combined model and the timely silent model updating and transfer learning method according to an embodiment of the invention, comprising four parts: data preprocessing and model training; combined model construction and timely silent updating; overflow prediction and visualization; and model transfer learning.
Fig. 2 is a general flowchart of a drilling overflow prediction combination model based on deep learning and a model timely silence updating and migration learning method according to an embodiment of the invention, wherein the method specifically comprises the following steps:
step 201, performing data preprocessing operation on the original logging data. In the step, the data preprocessing operation comprises three steps of firstly carrying out outlier processing, missing value filling and z-score standardization on original logging data, and carrying out preliminary cleaning and dimension unification on the original logging data so as to prevent the influence on subsequent component analysis; performing PCA principal component analysis operation on the preliminarily cleaned X-dimensional logging data to extract main features affecting overflow; finally, the data set is divided into a training set and a testing set, and then the actual situation that overflow samples in the data set are too few is solved by using the SMOTE unbalanced data processing method.
The outlier processing and missing value filling operation refers to handling outliers, noise values, entries filled with 9999, -9999 or infinitesimal placeholder values, and missing values in the original logging data; each such value is replaced with the mean of the five data points before and after it.
Wherein the z-score standardization is calculated as:

z = (X - μ) / σ

where z is the standard score, X is the value of a specific data point, μ is the mean of the data column, and σ is the standard deviation of the data column, calculated as:

σ = sqrt( (1/N) Σ_{i=1}^{N} (x_i - μ)² )

where N represents the number of data records in the logging data and x_i represents the i-th data point. Compared with min-max normalization, z-score standardization better preserves sample distances and avoids most samples being squeezed together by extreme values; with min-max normalization, newly added samples can change the global maximum-minimum range, so its robustness is relatively poor.
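As an illustrative sketch (not part of the patent; the function name is hypothetical), the z-score standardization above can be written in plain python:

```python
def z_score(column):
    """Standardize one log feature column: z = (x - mu) / sigma."""
    n = len(column)
    mu = sum(column) / n
    # population standard deviation with the 1/N convention used above
    sigma = (sum((x - mu) ** 2 for x in column) / n) ** 0.5
    return [(x - mu) / sigma for x in column]

z = z_score([10.0, 12.0, 14.0, 16.0, 18.0])
# the standardized column has zero mean and unit variance
```

Each feature column of the logging data would be standardized independently in this way before PCA.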
The principal component analysis operation of PCA can extract the most critical and representative abstract features in the input data, and can reduce data items on the premise of ensuring the data quality, and the process is as follows:
1) Sample matrix: taking the preprocessed m rows of n-dimensional logging data as a sample matrix, wherein m represents the number of samples distributed according to a time sequence, and n represents the dimension of the samples or the number of sample characteristics;
2) Center the sample matrix (subtract the mean). Centering subtracts the mean μ_i of column i from each element of that column, where i is the column (dimension) index and the mean is calculated as:

μ_i = (1/m) Σ_{j=1}^{m} x_{j,i}

3) Calculate the covariance matrix of the sample: the covariance matrix represents the degree of linear relationship between the variables in the data; element C(i, j) of the covariance matrix C is the covariance between the i-th and j-th variables:

C(i, j) = (1/(m-1)) Σ_{k=1}^{m} (x_{k,i} - μ_i)(x_{k,j} - μ_j)
4) Calculating eigenvalues and eigenvectors: by performing feature decomposition on the covariance matrix, feature values and corresponding feature vectors can be obtained. The eigenvalues represent the variance of the data on each principal component, and the eigenvectors represent the principal components corresponding to these variances.
5) And selecting main components: according to the magnitude of the eigenvalues, the first k largest eigenvalues and their corresponding eigenvectors are selected as principal components, where k is the dimension that is desired to be preserved.
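The five PCA steps above can be sketched with numpy (an illustrative implementation, not the patent's code; the random test matrix is hypothetical):

```python
import numpy as np

def pca(X, k):
    """Minimal PCA following steps 1)-5): center, covariance, eigen-decompose, project."""
    Xc = X - X.mean(axis=0)             # 2) centering: subtract each column mean
    C = np.cov(Xc, rowvar=False)        # 3) covariance matrix of the features
    vals, vecs = np.linalg.eigh(C)      # 4) eigenvalues / eigenvectors (C is symmetric)
    order = np.argsort(vals)[::-1]      # sort by descending explained variance
    W = vecs[:, order[:k]]              # 5) keep the k largest principal components
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))           # 100 samples, 6 standardized features
X[:, 3] = X[:, 0] * 2.0                 # one redundant feature, as in correlated logs
scores = pca(X, k=3)                    # 100 samples reduced to 3 components
```

The first component carries the largest variance, the second the next largest, and so on, which is what makes truncating to k columns a lossy but information-preserving compression.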
The original logging data, the new logging data and the logging data to be predicted come from different adjacent wells; this makes the evaluation of the trained model's prediction performance more objective and accurate, and prevents data leakage from producing high training accuracy but low prediction-verification accuracy. The original logging data and the new logging data may contain different specific data items, but should generally include data such as the time record column, mechanical drilling speed, outlet flow, inlet flow, pump stroke rate, standpipe pressure, total pit volume, hook load, torque and outlet conductivity.
Step 202, training four deep-learning-based optimization models, IPSO-1DCNN-LSTM-2DAttention, IPSO-TCN-LSTM-2DAttention, IPSO-ConvLSTM-2DAttention and IPSO-2DCRNN, using the logging data. In this step, the four models are trained respectively on the original logging data that has undergone the data preprocessing operation. All four models use the IPSO improved particle swarm algorithm for global hyper-parameter optimization, and three of them use a two-dimensional attention mechanism (2DAttention) to assign attention weight coefficients of different magnitudes to the features learned by the models, helping them capture key local features. The invention applies the attention mechanism in both the time dimension and the feature dimension, so that weights are assigned to different time steps and physical quantities before the data enter the LSTM model for learning, making the model's learning and prediction results better reflect the physical essence;
the principle of the IPSO improved particle swarm algorithm is that a group of basic particles are used, each particle carries some super parameters of a model to be optimized as self attributes, and iterative optimization is carried out until the maximum cycle number or a specified error range is reached with the aim of minimum prediction error or highest prediction accuracy returned by the model after the super parameters are applied;
Specifically, the four optimization models based on deep learning are designed to be continuously and circularly updated by setting a fitness function by using an IPSO improved particle swarm algorithm with the aim of achieving an optimal fitness value, and calculating the fitness value of each round;
Specifically, the IPSO algorithm takes as the basic individual attributes of the optimization algorithm the 9 key hyper-parameters that influence the regression prediction of the IPSO-1DCNN-LSTM-2DAttention model: the learning rate, the number of LSTM layers, the number of neurons in each LSTM layer, the number of convolution kernels in the CNN convolutional part, the sliding stride of those convolution kernels, the pooling size of the pooling layer, the sliding stride of the pooling layer, the number of training iterations (epoch value) of the LSTM network, and the number of samples (batch size) used for each weight update. For the IPSO-TCN-LSTM-2DAttention model, the IPSO algorithm likewise takes the 9 key hyper-parameters influencing its regression prediction: the learning rate, the number of LSTM layers, the number of neurons in each LSTM layer, the convolution kernel size in the TCN temporal convolutional network, the number of residual blocks, the Dropout size in the residual structure unit, the epoch value of the LSTM network, and the batch size used for each weight update. For the IPSO-ConvLSTM-2DAttention model it takes 8 key hyper-parameters: the number of ConvLSTM convolutional long short-term memory layers, the number of neurons in the LSTM layers, the number of convolution kernels in the ConvLSTM layers, the convolution kernel size, the convolution kernel stride, the epoch value of each training iteration of the ConvLSTM model, and the batch size used for each weight update. For the IPSO-2DCRNN model it takes 9 key hyper-parameters: the learning rate, the number of RNN recurrent layers, the number of neurons in each hidden layer, the dropout value, the number of convolution layers, the convolution kernel size, the number of convolution kernels, the epoch value of each training iteration of the 2DCRNN two-dimensional multi-channel non-homogeneous convolutional recurrent network, and the batch size used by the RNN for each weight update. The mean absolute error (MAE) between the four models' predicted values and the sample true values is used as the fitness value of the IPSO particle swarm optimization algorithm to design the fitness function.
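As a hedged illustration of the particle swarm principle described above, a baseline PSO loop is sketched below. The patent's specific IPSO improvements are not detailed here, so only the standard update rule is shown, and a quadratic stand-in fitness replaces the validation-MAE fitness the patent uses:

```python
import random

def pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Baseline particle swarm: each particle carries a candidate hyper-parameter
    vector; positions drift toward the personal and global bests each iteration."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:              # update personal best
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:             # update global best
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# stand-in fitness: in the patent this would be the trained model's validation MAE
best, best_f = pso(lambda p: sum(x * x for x in p), dim=2)
```

In the patent's setting, each particle's position vector would encode the model hyper-parameters listed above, and evaluating the fitness means training the model with those hyper-parameters and returning its MAE.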
Step 203, performing the data sorting operation on the new logging data. In this step, the data sorting operation comprises three steps: outlier processing, missing value filling (using the mean of the five entries before and after the missing value), and extraction of the overflow key features. The overflow key features are the principal components obtained by the PCA principal component analysis of the original logging data in step 201; the specific data contents of the new logging data and the original logging data may differ.
Step 204, predicting the new logging data after the data sorting operation with the four deep-learning-based optimization models, and collecting each model's prediction results and prediction-error evaluation (MAE) values. In this step, the mean absolute error MAE is calculated as:

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|

where n is the number of samples, ŷ_i is the predicted value and y_i is the true value of the i-th sample.
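A minimal sketch of the MAE evaluation (an illustrative helper, not the patent's code):

```python
def mae(y_true, y_pred):
    """Mean absolute error: MAE = (1/n) * sum(|y_hat_i - y_i|)."""
    assert len(y_true) == len(y_pred)
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

err = mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])   # (0.5 + 0.0 + 1.0) / 3 = 0.5
```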
and 205, determining weight values of the four optimization models by using a dominant matrix method, constructing a regression prediction combination model and storing the regression prediction combination model locally. In this step, the prediction results of the four optimization models based on deep learning are weighted and combined into one model by using the dominant matrix method: let W 1 、W 2 、W 3 、W 4 The weight coefficients of the four models are respectively f it F, as a prediction result of the ith model at the moment t in a designated time interval t Is to refer toThe dominant matrix method determines the weight coefficient W according to the prediction result of the combined model at the moment t in the fixed time interval i The method of (2) is as follows:
F t =W 1 *f 1t +W 2 *f 2t +W 3 *f 3t +W 4 *f 4t
/>
wherein Z is 1 For the number of times that the model 1 has better prediction effect than other models in a specified time interval, Z 2 For the number of times that the model 2 has better prediction effect than other models in a specified time interval, Z 3 Z for specifying the number of times of model 3 prediction effect over other models in the time interval 4 The number of times the model 4 is predicted to be better than the other models for a given time interval. The prediction effect refers to the average absolute error value MAE of the predicted result and the real result after the model predicts the new well logging data subjected to data arrangement, and the smaller the error value is, the better the prediction effect of the model is. The designated time interval can be freely set, and the time interval related to the new logging data is defaulted.
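The dominance-matrix weighting can be sketched as follows (an illustrative reading of the method in which Z_i counts the time steps at which model i has the smallest error; function names are hypothetical):

```python
def dominance_weights(errors_per_model):
    """errors_per_model[i][t] is model i's absolute error at time step t.
    Z_i counts the steps where model i beats all others; W_i = Z_i / sum(Z)."""
    n_models = len(errors_per_model)
    n_steps = len(errors_per_model[0])
    wins = [0] * n_models
    for t in range(n_steps):
        best = min(range(n_models), key=lambda i: errors_per_model[i][t])
        wins[best] += 1
    total = sum(wins)
    return [z / total for z in wins]

def combine(weights, preds_t):
    """F_t = W1*f_1t + W2*f_2t + W3*f_3t + W4*f_4t."""
    return sum(w * f for w, f in zip(weights, preds_t))

# toy errors over 3 time steps: models 1-3 each win one step, model 4 never wins
errs = [[0.1, 0.3, 0.2], [0.2, 0.1, 0.4], [0.3, 0.2, 0.1], [0.4, 0.4, 0.5]]
W = dominance_weights(errs)
F = combine(W, [1.0, 2.0, 3.0, 4.0])
```

A model that never outperforms the others receives weight zero, so the combination naturally down-weights consistently weak members.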
Combining multiple deep learning optimization models with the dominance matrix method allows the models' advantages to complement each other, and the resulting combined model has higher prediction accuracy and reliability than any single model.
Step 206, performing the data sorting operation on the logging data of the well to be predicted, and loading the constructed combined regression prediction model to predict on it. In this step, the data sorting operation comprises outlier processing, missing value filling (using the mean of the five entries before and after the missing value) and extraction of the overflow key features. The overflow key features are the principal components obtained by the PCA principal component analysis of the original logging data in step 201; the specific data contents of the logging data to be predicted and the original logging data may differ.
Step 207, at specified intervals the timer automatically queries and calculates the MAE of the combined regression prediction model; if the error exceeds a specified threshold, the trigger executes a combined-model update task. The silent update of the combined model is realized with multithreading, and the updated model is saved locally to be enabled in the next prediction round (the detailed flow is shown in fig. 4). In this step:
(1) Specifically, the timer uses the datetime component in python to read the system time and count down, and a fixed time can be set, for example starting the query and MAE-calculation task at midnight every day;
(2) Specifically, the timer and the trigger use python multithreading to run silently in the background while executing their tasks, without affecting the model's normal prediction tasks. The multithreading is implemented with python's threading module: threading.Thread(target=timer) followed by timer.start() creates the timer and starts a new thread for the query and calculation operations, and threading.Thread(target=trigger) followed by trigger.start() creates the trigger and starts a new thread that retrains the combined model with the logging data of the well to be predicted newly recorded within the specified time-step range;
(3) Furthermore, it should be made clear that the timer calculates the MAE between the combined model's predicted values and the true values of the logging data newly recorded in the well to be predicted within the specified time step, and when that error exceeds the specified threshold, the trigger executes the combined model's timely silent update task in a separate thread. "Timely" in "timely silent update" means that if the calculated error does not exceed the threshold, the model is not updated (no operation is performed); "silent" means that the update runs in the background via multithreading, so normal prediction tasks are not affected;
(4) Further, the "specified time step" defaults to the length of time since the timer was last activated, and may also be set separately.
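A minimal sketch of the timer/trigger pattern described in (1)-(4), using python's threading module (the threshold, data and join calls are illustrative simplifications; a production version would sleep until the scheduled time and would not join the threads):

```python
import threading

THRESHOLD = 0.5
retrained = threading.Event()

def retrain():
    """Trigger task: would retrain the combined model and save it locally;
    here it only records that the silent update ran."""
    retrained.set()

def query_and_check(y_true, y_pred):
    """Timer task: compute the MAE over the last interval; if it exceeds the
    threshold, start the retrain trigger on its own background thread."""
    mae = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)
    if mae > THRESHOLD:
        trigger = threading.Thread(target=retrain)   # silent: off the main thread
        trigger.start()
        trigger.join()   # joined here only to keep the demo deterministic
    return mae

timer = threading.Thread(target=query_and_check, args=([1.0, 2.0], [2.0, 3.5]))
timer.start()
timer.join()   # MAE = 1.25 > 0.5, so the update task fired
```

Because both tasks run on their own threads, the main thread that serves predictions is never blocked by the check or the retraining.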
Step 208, training the BP neural network model IPSO-BP optimized by the improved particle swarm algorithm, and performing overflow classification prediction on the basis of the combined model's regression prediction result. In this step:
(1) Training the BP neural network model IPSO-BP optimized by the improved particle swarm algorithm by using the new well logging data subjected to the data arrangement operation, and storing the trained IPSO-BP model into a local file;
(2) The IPSO improved particle swarm algorithm performs global optimization on six hyper-parameters affecting the classification prediction performance of the BP neural network: the number of hidden layers, the number of neurons in each layer (which may differ between layers), the initial weight value W and threshold b of the neurons in each layer (which may differ between layers), the number of training iterations (epoch value) of the neural network, and the number of samples (batch size) used by the network for each weight update.
Step 209, after fine-tuning, the most recently saved combined regression prediction model and the IPSO-BP classification prediction model are transferred to an adjacent well or applied to wells in other oil fields. In this step, transfer learning means keeping the overall structures of the combined regression prediction model and the IPSO-BP classification prediction model unchanged (discarding their output layers), adding a specified number (two by default) of Dense fully connected layers and an output layer to the end of each model, and training the migrated models with logging data of adjacent wells, or of wells in other oil fields, that has undergone the same data preprocessing.
Specifically, the migration model refers to an original model with the output layer discarded and a new model composed of a plurality of fully connected layers and the output layer newly added.
In particular, the purpose of transfer learning is to reuse a fully trained model: with only fine-tuning and a small amount of training time, it can quickly be applied to overflow prediction on other similar wells, saving a great deal of time compared with retraining and solving the problem that many new wells cannot train a model for lack of drilling data or of a large volume of logging data.
Step 210, visually displaying the regression prediction result of the combined model and the overflow prediction result of the IPSO-BP classification model. In the step, the regression prediction result of the combined model is visualized by displaying all predicted overflow characteristic data in the same image in the form of curves with different colors and distinguishing historical data from predicted future data; the overflow prediction result visualization mode of the IPSO-BP classification model is a curve formed by overflow occurrence probability values corresponding to each time step in a specified time step from the current time point.
Fig. 3 is a flowchart of the data preprocessing section in the embodiment of the present invention. All data preprocessing operations are performed on the original logging data, while the new logging data and the logging data to be predicted undergo three operations: outlier processing, missing value filling and extraction of the overflow key features. Furthermore, the overflow key features of the new logging data and the logging data to be predicted are the principal feature components obtained from the PCA principal component analysis of the original logging data. Specifically, fig. 3 comprises the following steps:
Step 301, outlier processing and missing value filling. This step performs the preliminary cleaning of the data: logging data from a production environment inevitably contains values that were mis-recorded or not recorded because of instrument failure or other unexpected situations, typically appearing as missing entries or as invalid fill values such as -9999 or 0. If only a small amount of data is missing, or a small number of outliers and invalid values are present, mean filling is used (each gap is filled with the mean of the 5 values before and after it); if a large block of data is abnormal, that part of the data is deleted directly.
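The mean-filling rule in this step can be sketched as follows (an illustrative helper; the invalid-value test is an assumption covering the -9999/9999 placeholders mentioned above):

```python
def fill_with_neighbor_mean(values, bad=lambda v: v is None or v in (9999, -9999)):
    """Replace each invalid entry with the mean of up to five valid values
    before and after it, following the filling rule described in the text."""
    filled = list(values)
    for i, v in enumerate(values):
        if bad(v):
            window = [x for x in values[max(0, i - 5):i] + values[i + 1:i + 6]
                      if not bad(x)]
            filled[i] = sum(window) / len(window) if window else 0.0
    return filled

row = [10.0, 11.0, -9999, 13.0, 14.0]
clean = fill_with_neighbor_mean(row)   # -9999 replaced by mean(10, 11, 13, 14) = 12.0
```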
Step 302, data standardization processing. The z-score standardization in this step is calculated as:

z = (X - μ) / σ

where z is the standard score, X is the value of a specific data point, μ is the mean of the data column, and σ is the standard deviation of the data column:

σ = sqrt( (1/N) Σ_{i=1}^{N} (x_i - μ)² )

where N represents the number of data records in the logging data and x_i represents the i-th data point. Further, standardization centers the data on each feature and scales its variance so that every feature has zero mean and unit variance; its purpose is to eliminate dimensional differences between features so that their contributions are fair when the covariance matrix is computed. In contrast, normalization typically scales the data to a specific range such as (0, 1) or (-1, 1) without regard to the distribution of the features. Normalization suits certain algorithms or requirements, such as feature scaling in a neural network, but is not well suited to PCA principal component analysis; moreover, newly added samples can change the global maximum-minimum range, so its robustness is relatively poor. The invention therefore standardizes rather than normalizes the data before the PCA operation, bringing it to a mean-centered, unit-variance state; this captures the correlation structure in the data better, preserves sample spacing, and avoids most samples being squeezed together by extreme values.
And 303, analyzing principal components of PCA, and extracting key characteristics of overflow. The step is to select the most critical feature causing overflow in the logging data, and is an important means for compressing the data dimension and improving the training efficiency, wherein each main component is a linear combination of the original data, and the main components are mutually independent, so that most of information of the original data is reserved.
Step 304, the data set is divided into a training set, a test set and a validation set. This step guards against overfitting of the model training results. In this step, the split ratio of training set, validation set and test set is 7:3:3. The validation set verifies the training effect of the model, and the test set evaluates the trained model's prediction performance on held-out data. Further, the new logging data and the logging data to be predicted are not split in this way.
Step 305, SMOTE unbalanced data processing. This step addresses the severe class imbalance in the data: overflow occurs rarely in actual production, so the ratio of non-overflow to overflow records in logging data can exceed 10:1. Such imbalance biases the training result of an overflow prediction model toward the class with more data (no overflow).
Specifically, the oversampling-based SMOTE method is applied only to the training set; the original class proportions of the test set and validation set are kept unchanged.
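A minimal SMOTE sketch consistent with this description (illustrative only; production implementations such as imbalanced-learn's SMOTE offer many more options):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE: each synthetic sample lies on the line segment between
    a minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((m for m in minority if m is not base),
                           key=lambda m: dist2(base, m))[:k]
        nb = rng.choice(neighbors)
        lam = rng.random()   # interpolation factor in [0, 1)
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, nb)))
    return synthetic

overflow = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]   # the rare overflow samples
new_samples = smote(overflow, n_new=5)
```

Because the synthetic points interpolate between real minority samples, they enrich the overflow class without simply duplicating records.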
fig. 4 is a flowchart of timely silence update of a combined model according to an embodiment of the present invention. The method specifically comprises the following steps of:
Step 401, setting the activation time and sleep interval of the timer. In this step, since python has no built-in timer suited to this purpose, the threading module is used to implement a timer that supports multithreaded operation with a custom trigger time and interval. As an implementation example, the timer reads the system time via the time module and begins the query and calculation tasks when the system time reaches the set 00:00; the sleep interval after each activation is 48 hours, i.e., the timer fires once every two days at midnight. At each activation, an independent background thread is created to execute the query and calculation tasks, and the thread is destroyed when the task completes.
Step 402, one minute before the specified time, a countdown prompt begins to be output; when the specified time is reached, the combined model's predicted values ŷ and the corresponding logging data (true values y) within a time interval of the specified length are read and the error is calculated. In this step, the time interval of specified length generally defaults to the time elapsed since the last timer activation, and may be set to another length.
Step 403, a task is triggered when the calculated error value exceeds the set threshold: a separate thread retrains the combined model on the logging data within the time interval of specified length, and the updated combined model file is saved locally, replacing the original combined model file. During this period, the original combined model continues to perform prediction normally, and the updated combined model takes effect in the next prediction round. Here too, the time interval of specified length generally defaults to the time elapsed since the last timer activation and may be set to another length.
Step 404, regardless of whether the combined model was updated, the regression prediction result of the combined model is input into the BP neural network model IPSO-BP optimized by the improved particle swarm algorithm, which performs overflow classification prediction based on that regression result. In this step, the regression prediction result refers to the time-series prediction of the overflow key feature data in the logging data to be predicted made by the combined model; the classification prediction refers to machine learning classification by the trained IPSO-BP model based on the predicted values of the overflow key features. The classification result of the IPSO-BP model is the probability of overflow at each time-step point within a future period of specified length, where one time step is usually 1 second.
Fig. 5 is a flowchart of the transfer learning in the embodiment of the present invention. Specifically, the migration learning refers to that a pre-trained model trained from a source domain is subjected to fine tuning and then is migrated and applied to a target domain, and a similar relationship often exists between the source domain and the target domain. Specifically, one application of the transfer learning is as follows: the pre-training model for training an image recognition by using tens of millions of images is transferred to a new image field, and the same effect as tens of millions of images can be obtained by using tens of thousands of pictures in the new field. Furthermore, the application of the transfer learning has a plurality of benefits in the invention, such as the pre-training model which is completely trained by a large amount of data can be utilized, the method can be rapidly applied to overflow prediction on other similar wells by performing fine tuning operation and a small amount of training time, and compared with retraining, a large amount of time can be saved, and the problem that a plurality of new wells cannot be trained by the model due to lack of drilling data or a large amount of logging data can be solved. Wherein, fig. 5 specifically includes the following steps:
Step 501: load the stored model files of the latest regression-prediction combined model and the classification-prediction model IPSO-BP, read and migrate the structural information and model parameters of the original models to a new model, and append fully-connected layers and an output layer at the end. Here, "reading and migrating the structural information and model parameters of the original model to a new model" means copying the framework structure and all hyper-parameter values of the trained combined model and the IPSO-BP model, removing the output layers of both, and splicing on two fully-connected layers and one output layer to form the new model structure. The new model (hereinafter the "migrated model") is then trained with well-log data from an adjacent well or another oil field, after which the migrated model can perform the overflow prediction task.
Step 502: train the migrated model with well-log data from an adjacent well or another oil field that has undergone the same data-preprocessing operations as in the present invention, and save the new model. In this step, the "same data-preprocessing operations as the present invention" are the data-preprocessing flow of Fig. 3.
Step 503: load the migrated model for prediction, and enable the timely silent model-update function so that the migrated model retains stable and accurate prediction capability. This step addresses deployment of the migrated model: for the overflow prediction task of an adjacent well or another oilfield well, the migrated model is not only loaded for prediction, but the timely-update function of the combined-model part is also enabled, ensuring that the migrated model continues to output stable and accurate overflow prediction results.
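The fine-tuning mechanics of steps 501 and 502 can be caricatured without any deep-learning library. Below, a frozen "pretrained" feature layer plays the role of the migrated model, and a newly appended head is fitted on a small synthetic target-well dataset; the least-squares fit stands in for a few epochs of gradient training on the added layers. All names, sizes, and data are illustrative assumptions, not the patent's actual models:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Source domain": pretend these weights came from a fully trained model.
W_feat = rng.normal(size=(5, 3))      # frozen feature-extraction layer

def features(X):
    return np.tanh(X @ W_feat)        # pretrained part, kept fixed

# "Target domain": a small adjacent-well dataset (synthetic).
X_new = rng.normal(size=(40, 5))
y_new = X_new @ rng.normal(size=5) + 0.1 * rng.normal(size=40)

# Fine-tuning = fit only the newly appended head; least squares here
# stands in for brief gradient training of the added Dense layers.
Phi = features(X_new)
head, *_ = np.linalg.lstsq(Phi, y_new, rcond=None)

pred = Phi @ head
mae = float(np.mean(np.abs(pred - y_new)))
```

Because only the small head is trained, the fit needs far less data and time than retraining the whole model — the efficiency argument made for transfer learning above.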
Example two
The innovation points are as follows:
1. The invention applies several improvement mechanisms to the original models:
a) The basic particle swarm optimization algorithm is improved with nonlinear weight updating, iteration limiting and other mechanisms, and the resulting IPSO improved particle swarm algorithm performs global hyper-parameter optimization of the models, preventing them from falling into local optima;
b) Compared with using no such mechanism, or placing it after the recurrent network, the chosen placement of the Attention mechanism effectively assigns high attention weights to important abstract features and helps the model fully learn the key features;
c) The Attention mechanism uses multiple groups of weights rather than one shared group, fully accounting for the multi-dimensional feature information of well-log data;
2. The Attention mechanism is innovatively applied to both the time dimension and the abstract-feature dimension of well-log data: this two-dimensional attention mechanism, 2DAttention, rather than attention over a single dimension, can focus on more of the important information, such as the nonlinear relations among abstract features when entering an overflow section and how those features change over time;
3. One-dimensional CNN convolutional neural networks are generally used for time-series data, but they cannot fully mine the information of multi-featured well-log data; the invention therefore switches to a two-dimensional CNN, whose convolution kernels learn and extract abstract information in the time dimension and the feature dimension simultaneously. A kernel thus learns both the nonlinear relations among abstract features and their variation over time, capturing richer content than one-dimensional convolution;
4. Because different wells provide different information and different models learn a given information structure with different success, the invention adopts a combined model to cover these situations: the models' predictions are weighted by their prediction performance, improving the reliability and accuracy of the final prediction;
5. Considering that prediction errors grow during long-horizon prediction as gradients vanish, the invention creatively proposes a timely silent model-update method based on a timer trigger and multithreading: the model is retrained in time according to its current prediction performance, without interfering with the normal prediction task, so that an accurate prediction effect is always maintained;
6. The method combines a regression-prediction combined model with a classification-prediction model, fully exploiting the respective strengths of the different models (feature extraction by the convolutional neural network, regression prediction by the recurrent neural network, and classification by the classical BP neural network);
7. For the convenience of users at the oilfield site, the invention visualizes the regression predictions of the combined model and the predictions of the classification model, marking the important information so that users can quickly grasp the model output;
8. When evaluating the prediction capability of each model, the invention uses fresh well-log data, giving the models higher credibility and generalization ability and, more importantly, avoiding the data leakage and overfitting that can arise from using the same sample data for both training and validation;
9. The invention creatively uses transfer learning after initial model training: when applied to an adjacent well or another oilfield well, only fine-tuning and a small amount of training are needed to deploy the overflow prediction task rapidly. This solves the problem that many new wells cannot train a model for lack of drilling data or of a large volume of well-log data, and saves substantial time compared with gathering a large dataset and retraining a new model for every new well.
Specific examples are set forth herein to illustrate the principles and embodiments of the invention; they merely explain the method and its core ideas. Many variations in the specific embodiments and in the scope of application will be apparent to those of ordinary skill in the art in light of the teachings herein, and nothing in this specification should be construed as limiting the invention.

Claims (7)

1. A deep-learning-based drilling overflow prediction combined model and a timely silent model updating and transfer learning method, characterized by comprising the following parts:
Data preprocessing and model training part: first perform data-preprocessing operations on the original well-log data, then use the processed data to train four deep-learning-based optimization models: the attention-based convolutional recurrent neural network model IPSO-1DCNN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm; the attention-based temporal convolutional long short-term memory network model IPSO-TCN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm; the attention-based convolutional long short-term memory network model IPSO-ConvLSTM-2DAttention optimized by the IPSO improved particle swarm algorithm; and the two-dimensional multi-channel non-heterogeneous convolutional neural network model IPSO-2DCRNN optimized by the IPSO improved particle swarm algorithm;
Combined-model construction and timely silent updating part: the four optimization models are fused by decision based on a dominance matrix method, and a combined model is built according to their respective strengths, i.e., the mean absolute error MAE of each model's predictions; a timer then automatically computes, at specified intervals, the MAE between the combined model's predictions and the corresponding true values recorded in the field well-log data; finally, whether the timely silent update of the combined model is triggered is judged according to whether the MAE reaches a specified threshold;
Overflow prediction and visualization part: perform overflow classification prediction on the basis of the combined model's regression predictions using the BP neural network model IPSO-BP optimized by the improved particle swarm algorithm, and visualize the combined model's regression predictions together with the IPSO-BP classification model's overflow predictions;
Model transfer-learning part: based on a transfer-learning strategy, the combined model and the classification model are fine-tuned and then applied to overflow prediction on an adjacent well or another oilfield well.
2. The deep-learning-based drilling overflow prediction combined model and the timely silent model updating and transfer learning method according to claim 1, wherein the data preprocessing comprises the following steps: first performing outlier handling, missing-value filling and z-score standardization on the original well-log data, providing a preliminary cleaning and unified dimensions so as not to affect the subsequent component analysis; then performing PCA principal component analysis on the preliminarily cleaned X-dimensional well-log data to extract the key features affecting overflow; then splitting the data set into a training set, a test set and a validation set; and finally using the SMOTE imbalanced-data processing method to address the practical situation that overflow samples in the data set are too few;
The outlier handling and missing-value filling operations process outliers, noise values, values filled with 9999, -9999 or infinitesimal placeholders, and missing values in the original well-log data; the processing method is to fill each outlier, noise value or missing value with the mean of the five data points before and the five after it;
The PCA principal component analysis comprises sampling the sample matrix, centering it (subtracting the mean), computing the sample covariance matrix, computing eigenvalues and eigenvectors, and selecting the principal components.
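The preprocessing steps above — neighbor-mean filling of flagged/missing values, z-score standardization, and PCA via the covariance eigendecomposition — can be sketched in NumPy as follows (the flag values and window size follow the claim; the data and dimensions are synthetic):

```python
import numpy as np

def fill_flagged(col, flags=(9999.0, -9999.0), k=5):
    """Replace flagged or missing entries with the mean of up to k
    valid samples before and after, per the claimed filling rule."""
    col = col.astype(float).copy()
    bad = np.isin(col, flags) | np.isnan(col)
    for i in np.flatnonzero(bad):
        lo, hi = max(0, i - k), min(len(col), i + k + 1)
        window = col[lo:hi][~bad[lo:hi]]
        col[i] = window.mean() if window.size else 0.0
    return col

def zscore(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(X, n_components):
    """Centre, covariance, eigendecomposition, keep leading components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]

rng = np.random.default_rng(1)
raw = rng.normal(size=(200, 6))
raw[10, 0] = 9999.0                    # one flagged fill value
cleaned = np.column_stack([fill_flagged(raw[:, j]) for j in range(6)])
reduced = pca(zscore(cleaned), n_components=3)
```

The reduced matrix keeps the leading principal directions of the standardized logs, corresponding to the "key features affecting overflow" the claim extracts.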
3. The deep-learning-based drilling overflow prediction combined model and the timely silent model updating and transfer learning method according to claim 1, wherein the model training part trains the four deep-learning-based optimization models with the preprocessed original well-log data: the attention-based convolutional recurrent neural network model IPSO-1DCNN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm, the attention-based temporal convolutional long short-term memory network model IPSO-TCN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm, the attention-based convolutional long short-term memory network model IPSO-ConvLSTM-2DAttention optimized by the IPSO improved particle swarm algorithm, and the two-dimensional multi-channel non-heterogeneous convolutional neural network model IPSO-2DCRNN optimized by the IPSO improved particle swarm algorithm;
All four deep-learning-based optimization models use the IPSO improved particle swarm algorithm, which makes the following improvements to the basic PSO particle swarm algorithm:
(1) Nonlinear updating of the individual learning factor C1 and the social learning factor C2: early in the particle iterations C1 is large and C2 is small, preserving particle diversity and encouraging a wide random search of the space; late in the iterations C1 is small and C2 is large, helping the model converge to the globally optimal position as soon as possible. Compared with directly fixing C1 = C2 = 2, this improves particle search performance, accelerates convergence, and dynamically balances global and local search;
(2) An improved inertia-weight update for the particle-velocity formula of the original particle swarm algorithm: comparing the linearly decreasing update, the type-A and type-B nonlinear decreasing methods, and randomly generated inertia weights, the type-B nonlinear decreasing method is preferred and markedly improves convergence speed;
(3) Two random search coefficients r1 and r2 are set, further increasing the diversity and randomness of the search, improving the particles' global search capability and further preventing them from falling into a local optimum;
(4) During each round of iterative global optimization, the IPSO improved particle swarm algorithm exits the round once a specified error is reached, and terminates once the set maximum number of iterations is reached, improving model-training efficiency;
(5) The fitness function is designed as the prediction error returned by the target model to be optimized after each particle's hyper-parameter set is used to train it; the fitness value is the mean absolute error MAE returned by the target model;
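A toy rendering of the IPSO mechanics listed above — nonlinearly varying C1/C2, a nonlinearly decreasing inertia weight, random coefficients r1/r2, and early exit at a target error — on a simple sphere fitness. The exact nonlinear schedules are assumptions, since the claim does not give closed forms:

```python
import numpy as np

def ipso_minimize(fitness, dim, n_particles=20, max_iter=100,
                  err_target=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5, 5, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()

    for t in range(max_iter):
        frac = t / max_iter
        # Assumed nonlinear schedules: C1 shrinks, C2 grows,
        # inertia weight decays nonlinearly ("type-B" style curve).
        c1 = 2.5 - 2.0 * frac**2
        c2 = 0.5 + 2.0 * frac**2
        w = 0.9 - 0.5 * frac**2
        # Random search coefficients r1, r2 add search diversity.
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
        if pbest_val.min() < err_target:   # early exit at target error
            break
    return g, float(pbest_val.min())

best, best_val = ipso_minimize(lambda x: float(np.sum(x**2)), dim=3)
```

In the patent's setting the fitness call would train the target model with the particle's hyper-parameters and return its MAE; the sphere function here is only a stand-in.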
Three of the four deep-learning-based optimization models use an Attention mechanism, with the Attention layer implemented as a Dense fully-connected layer with a softmax activation. Placed before the LSTM layers, this design notices the key information causing overflow more readily than placing the Attention mechanism after the LSTM layers, whose more abstract output hinders the mechanism from learning useful information. For the multi-dimensional well-log data, the invention designs the Attention mechanism as multi-channel, i.e., n channels corresponding to n feature dimensions, which attends to the key information of each feature dimension better than a shared-weight scheme with a single set of attention weights. The invention innovatively applies attention in both the time dimension and the feature dimension; compared with other published research that sets attention in only the time dimension or only the feature dimension, this simultaneously amplifies important time steps and important features in the LSTM, further reducing the model's prediction error;
The role of the Attention mechanism is to guide the model to attend to the parts with a positive influence on the result by learning a weight distribution and applying it to the features; this effectively alleviates the excessive complexity caused by the large amount of information a traditional neural network must memorize. The Attention mechanism computes its weights as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where Q (query), K (key) and V (value) are three matrices derived from the same input and d_k is the dimension of the Q or K vectors; the result is normalized by the softmax function and then multiplied by the matrix V;
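The weight computation described here is the standard scaled dot-product attention, which can be sketched in NumPy (matrix shapes are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(7)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V)
```

Each row of the weight matrix is a probability distribution over the input positions, which is exactly the learned weight distribution the text describes being applied to the features.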
The attention-based convolutional recurrent neural network model IPSO-1DCNN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyper-parameters affecting the regression prediction of the IPSO-1DCNN-LSTM-2DAttention model: the learning rate of the 1DCNN-LSTM-2DAttention model, the number of LSTM layers, the number of neurons in each LSTM layer, the number of convolution kernels in the CNN convolutional part, the sliding stride of those convolution kernels, the pooling size of the pooling layer, the sliding stride of the pooling layer, the number of training epochs per iteration of the LSTM long short-term memory network, and the batch size of samples used for each weight update of the LSTM network;
The convolution layers use the Leaky ReLU (a variant of the ReLU activation function) as the activation, which has several advantages: it solves the "dying ReLU" problem, because with a small positive slope α the Leaky ReLU yields a non-zero gradient even for negative input, so the weights keep updating; and its linear nature makes it computationally efficient, accelerating training. The Leaky ReLU is computed as:

f(x) = x,   x > 0
f(x) = αx,  x ≤ 0
(2) In the invention, the two-dimensional attention mechanism 2DAttention is placed before the multi-layer LSTM and after the CNN layers, so that attention weights can be assigned simultaneously over the feature dimension and the time-step dimension extracted by the CNN model, improving overall prediction accuracy;
(3) The one-dimensional convolutional neural network (1DCNN) part of the IPSO-1DCNN-LSTM-2DAttention model reads the preprocessed well-log input and automatically learns its features; the extracted features are importance-weighted by the two-dimensional attention mechanism 2DAttention and then passed to the multi-layer LSTM model for interpretation and further learning. The input structure of the CNN model is identical to that of the LSTM model. The CNN part has two convolutional layers in total: the first convolution reads the input sequence and projects the result onto a feature map; the second performs the same operation on the feature map created by the first, amplifying its salient features, and the stacked CNN convolution layers can capture long-range dependencies. The pooling layer simplifies and reduces the dimensionality of the extracted features by down-sampling, retaining the key features;
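The Leaky ReLU activation used by the convolution layers above is a one-liner; a minimal NumPy version, with the default slope α = 0.01 chosen here for illustration:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """f(x) = x for x > 0, alpha * x otherwise: the small positive
    slope keeps a non-zero gradient for negative inputs."""
    return np.where(x > 0, x, alpha * x)

vals = leaky_relu(np.array([-2.0, 0.0, 3.0]))
```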
The attention-based temporal convolutional long short-term memory network model IPSO-TCN-LSTM-2DAttention optimized by the IPSO improved particle swarm algorithm is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyper-parameters affecting the regression prediction of the IPSO-TCN-LSTM-2DAttention model: the learning rate of the model, the number of LSTM layers, the number of neurons in each LSTM layer, the convolution-kernel size in the TCN temporal convolutional network, the number of residual blocks, the Dropout rate in the residual structure units, the number of training epochs per round of the LSTM long short-term memory network, and the batch size of samples used for each weight update of the LSTM network; the TCN temporal convolution layers use the Leaky ReLU as the activation function;
(2) In the IPSO-TCN-LSTM-2DAttention model, the TCN temporal convolutional network is a recently proposed convolutional network with time-series processing capability. Building on the CNN, it replaces ordinary convolution with causal convolution, and consists of dilated layers with equal input and output lengths, causal 1-D fully-convolutional layers, and a residual structure. The causal convolution layers model the well-log data; the dilated convolution layers expand the receptive field as much as possible without adding pooling layers, increasing the memory of historical well-log data; and the residual modules improve prediction over long time horizons. These additions give the TCN structure parallelism, a flexible receptive field, variable input length, freedom from vanishing or exploding gradients, and a smaller memory footprint. After feature extraction by the TCN temporal convolutional network, the two-dimensional attention mechanism 2DAttention importance-weights the extracted features, improving the model's feature-extraction and classification capability; the result then enters the multi-layer LSTM model for interpretation and further learning. The input format of the TCN network is the same as that of the CNN network, with two convolution layers;
(3) The convolution used in the TCN temporal convolutional network is one-dimensional; to address the multi-dimensional features of well-log data, the invention adopts multi-channel feature extraction, i.e., n feature dimensions use n corresponding sub-models. So as not to lose features of different abstraction levels, the features of the n channels extracted by the n sub-models are concatenated into a new feature vector, which serves as the result of the TCN convolution-layer feature extraction;
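The causal, dilated convolutions at the heart of the TCN described above can be sketched directly. This is a minimal NumPy version — real TCNs stack many filters and add the residual connections the claim mentions — showing that the output length equals the input length and that no future samples leak into the output:

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: y[t] uses only x[t], x[t-d],
    x[t-2d], ... (left zero-padding), so output length equals input
    length and causality is preserved."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
y1 = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=1)  # x[t]+x[t-1]
y2 = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)  # x[t]+x[t-2]
```

Increasing the dilation widens the receptive field without any pooling layer, which is exactly how the dilated layers "increase the historical memory" of the logs.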
The attention-based convolutional long short-term memory network model IPSO-ConvLSTM-2DAttention optimized by the IPSO improved particle swarm algorithm is characterized in that:
(1) The IPSO algorithm globally optimizes 8 key hyper-parameters affecting the regression prediction of the IPSO-ConvLSTM-2DAttention model: the number of ConvLSTM convolutional long short-term memory network layers, the number of LSTM-layer neurons, the number of ConvLSTM-layer convolution kernels, the convolution-kernel size, the convolution-kernel stride, the number of training epochs per round of the ConvLSTM model, and the batch size of samples used for each weight update of the model; the ConvLSTM convolution layers use the Leaky ReLU as the activation function;
(2) In the IPSO-ConvLSTM-2DAttention model, ConvLSTM performs abstract feature extraction on the multi-dimensional well-log data while retaining temporal information; the extracted result is fed to the IPSO-optimized LSTM model for learning; the two-dimensional attention mechanism 2DAttention part then computes attention weights over the multi-dimensional features; finally a fully-connected layer integrates the information and produces the output;
(3) ConvLSTM differs from CNN-LSTM and TCN-LSTM in that the former uses convolution directly as part of reading the LSTM cell input, whereas the latter two apply convolution operations to each time step fed into the LSTM. In implementation, the Keras library provides the ConvLSTM2D class; although this ConvLSTM model supports two-dimensional data, it can also be configured for one-dimensional multivariate time-series prediction. The ConvLSTM2D class requires input data in the shape [samples, timesteps, rows, cols, channels] by default, respectively: the number of samples, the number of time steps, the one-dimensional shape of each sub-sequence (its number of rows), the number of columns in each sub-sequence, and the number of channels (a concept from image-recognition tasks, corresponding to the number of features in the time-series prediction task studied in the invention);
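The [samples, timesteps, rows, cols, channels] layout expected by ConvLSTM2D can be produced from a flat multivariate series with a plain reshape. The window and sub-sequence sizes below are illustrative, not the patent's settings:

```python
import numpy as np

# A single multivariate well-log series: 600 time points, 4 features
# (the feature count is hypothetical).
series = np.random.default_rng(3).normal(size=(600, 4))

# Split into windows of 25 steps, each window into 5 sub-sequences of
# 5 steps, giving the [samples, timesteps, rows, cols, channels]
# layout ConvLSTM2D expects: rows = 1 for a 1-D series, cols = steps
# per sub-sequence, channels = number of features.
window, subseqs = 25, 5
steps = window // subseqs
n_windows = series.shape[0] // window
X = series[: n_windows * window].reshape(
    n_windows, subseqs, 1, steps, series.shape[1])
```

This is the one-dimensional multivariate configuration described above: a two-dimensional layer made to read 1-D sequences by fixing the row dimension to 1.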
The two-dimensional multi-channel non-heterogeneous convolutional neural network model IPSO-2DCRNN optimized by the IPSO improved particle swarm algorithm is characterized in that:
(1) The IPSO algorithm globally optimizes 9 key hyper-parameters affecting the regression prediction of the IPSO-2DCRNN model: the learning rate of the model, the number of RNN recurrent-network layers, the number of neurons in each hidden layer, the dropout value, the number of convolution layers, the convolution-kernel size, the number of convolution kernels, the number of training epochs per iteration of the 2DCRNN two-dimensional multi-channel non-heterogeneous convolutional recurrent neural network, and the batch size of samples used for each weight update of the RNN; the 2DCRNN convolution layers use the Leaky ReLU as the activation function;
(2) In the IPSO-2DCRNN model, "two-dimensional" means that the convolution kernels in the convolution layers move along both the time-step dimension and the feature dimension; "multi-channel" means that, for the multi-dimensional well-log data, each feature dimension gets a separate channel for feature extraction and learning, after which the information learned by all channels is concatenated and passed to the next module for further learning; "non-heterogeneous" means that, rather than providing each channel with its own convolution kernel, the convolution layers of the IPSO-2DCRNN model use the same convolution kernels for feature extraction, with each input sequence read into a separately set group of feature maps, in essence secondary mining of feature information from each input time-series variable;
(3) The convolution layers in the IPSO-2DCRNN model use the optimal depth obtained by IPSO global optimization to fully mine the abstract features of the multi-dimensional well-log data, which are then fed to the recurrent neural network for further learning; the main structure comprises the convolution layers, the pooling layer, the recurrent-neural-network layer, and the output layer.
4. The deep-learning-based drilling overflow prediction combined model and the timely silent model updating and transfer learning method according to claim 1, wherein the predictions of the four deep-learning-based optimization models are weight-fused by a dominance matrix method to form the combined model: let W1, W2, W3 and W4 be the weight coefficients of the four models, f_it the prediction of the i-th model at time t within the specified interval, and F_t the prediction of the combined model at time t within the specified interval; the dominance matrix method determines the weight coefficients W_i as:

W_i = Z_i / (Z_1 + Z_2 + Z_3 + Z_4), i = 1, 2, 3, 4

F_t = W_1*f_1t + W_2*f_2t + W_3*f_3t + W_4*f_4t

where Z_1 is the number of times within the specified interval that model 1 predicts better than the other models, Z_2 the number of times model 2 predicts better than the other models, Z_3 the number of times model 3 predicts better than the other models, and Z_4 the number of times model 4 predicts better than the other models; the prediction performance is the mean absolute error MAE between a model's predictions on the newly arranged well-log data and the true values, a smaller error meaning a better prediction; the specified interval can be set freely and defaults to the time span covered by the new well-log data.
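Assuming the natural reading that each Z_i counts the time points at which model i has the smallest absolute error, the dominance-matrix weighting and the combined prediction F_t can be sketched as follows (synthetic predictions with different noise levels stand in for the four models):

```python
import numpy as np

def dominance_weights(preds, truth):
    """preds: (4, T) predictions of the four models over the interval.
    W_i = Z_i / sum(Z), with Z_i the number of time points at which
    model i has the smallest absolute error (assumed reading of the
    dominance-count rule)."""
    errors = np.abs(preds - truth)       # (4, T) per-point errors
    winners = errors.argmin(axis=0)      # best model at each time point
    Z = np.bincount(winners, minlength=preds.shape[0])
    return Z / Z.sum()

rng = np.random.default_rng(5)
truth = np.sin(np.linspace(0, 6, 50))
# Four synthetic "models" with increasing noise (model 0 is best).
preds = truth + rng.normal(scale=[[0.05], [0.1], [0.2], [0.4]],
                           size=(4, 50))
W = dominance_weights(preds, truth)
combined = W @ preds                     # F_t = sum_i W_i * f_it
```

Models that win more often over the interval receive proportionally larger weights in the fused prediction.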
5. The timely silent updating method for the combined model according to claim 1, characterized in that: on the premise that the model executes its normal prediction task, the method continuously and in parallel traces back data over a specified number of time steps, compares the model's predicted values with the recorded true values, and computes the error; if the error is too large, a separate thread retrains the combined model with the recorded true values, ensuring continuously high prediction accuracy. Specifically, the timely silent updating method for the combined model has the following features:
(1) A timer performs a specified operation at specified intervals, in two steps: first, query the latest predictions, over the specified time steps, of the combined model built by the dominance matrix method for the well to be predicted (after the data-arrangement operation), together with the newly recorded true well-log values of that well; second, create a thread with Python's multithreading facilities to compute the mean absolute error MAE between the predicted and true values, without affecting the model's normal prediction task;
(2) Judge whether the error MAE exceeds the specified threshold; when it does, a trigger executes the specified operation: a thread is created with Python multithreading to retrain the combined model on the newly recorded well-log data of the well to be predicted, again without affecting the model's normal prediction task;
(3) When retraining of the combined model completes, the model is saved as a local file and enabled for the next time step's prediction task, while the previous, higher-error combined-model file is deleted.
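A minimal sketch of the timer-callback logic described in this claim: compute MAE over the traced-back window and, if it exceeds a threshold, retrain on a worker thread so prediction is not blocked. The threshold value and the retrain stand-in are illustrative; in deployment the callback would be rescheduled with `threading.Timer`:

```python
import threading
import numpy as np

RETRAIN_THRESHOLD = 0.5      # illustrative MAE threshold
events = []

def retrain(y_true):
    # Stand-in for retraining the combined model on newly recorded data.
    events.append(("retrain", len(y_true)))

def check_and_update(y_pred, y_true):
    """Timer callback: compute MAE over the traced-back time steps
    and, if it exceeds the threshold, retrain on a separate thread so
    the normal prediction task is not blocked."""
    mae = float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))
    if mae > RETRAIN_THRESHOLD:
        worker = threading.Thread(target=retrain, args=(y_true,))
        worker.start()
        worker.join()        # joined here only to keep the demo deterministic
    return mae

# Simulated traced-back window where the model has drifted badly.
mae = check_and_update(y_pred=[1.0, 2.0, 3.0], y_true=[2.0, 4.0, 6.0])
# In deployment the check would be rescheduled periodically, e.g.:
# threading.Timer(interval, check_and_update, args=(pred, true)).start()
```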
6. The overflow classification prediction performed on the combined model's regression predictions by the BP neural network model IPSO-BP optimized by the improved particle swarm algorithm according to claim 1, characterized in that: the IPSO improved particle swarm algorithm globally optimizes 6 hyper-parameters affecting the classification performance of the BP neural network: the number of hidden layers of the BP neural network, the number of neurons in each layer, the initial weights W and thresholds b of the neurons in each layer, the number of training epochs per iteration of the network, and the batch size of samples used for each weight update of the network.
7. The model transfer-learning part according to claim 1, characterized in that: keeping the overall structures of the combined regression-prediction model and the IPSO-BP classification-prediction model unchanged, a specified number of Dense fully-connected layers and an output layer are appended to the ends of the two models, and the migrated model is trained with identically preprocessed well-log data from an adjacent well or a well in another oil field; specifically, the migrated model is the new model formed from the original models without their output layers plus the newly added fully-connected layers and output layer; the purpose of transfer learning is to reuse a fully trained model: with only a fine-tuning operation and a small amount of training time it can be rapidly applied to overflow prediction on an adjacent well or another oilfield well, saving substantial time compared with retraining and solving the problem that many new wells cannot train a model for lack of drilling data or of a large volume of well-log data.
CN202311000615.XA 2023-08-10 2023-08-10 Drilling overflow prediction combined model based on deep learning and model timely silence updating and migration learning method Pending CN117171700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311000615.XA CN117171700A (en) 2023-08-10 2023-08-10 Drilling overflow prediction combined model based on deep learning and model timely silence updating and migration learning method

Publications (1)

Publication Number Publication Date
CN117171700A true CN117171700A (en) 2023-12-05

Family

ID=88928957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311000615.XA Pending CN117171700A (en) 2023-08-10 2023-08-10 Drilling overflow prediction combined model based on deep learning and model timely silence updating and migration learning method

Country Status (1)

Country Link
CN (1) CN117171700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910550A (en) * 2024-03-19 2024-04-19 江苏海拓宾未来工业科技集团有限公司 Oil-free ultrahigh-speed centrifugal compressor automatic optimization system based on deep learning
CN117910550B (en) * 2024-03-19 2024-06-07 江苏海拓宾未来工业科技集团有限公司 Oil-free ultrahigh-speed centrifugal compressor automatic optimization system based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020104000A4 (en) * 2020-12-10 2021-02-18 Guangxi University Short-term Load Forecasting Method Based on TCN and IPSO-LSSVM Combined Model
US20220197233A1 (en) * 2020-12-18 2022-06-23 Wuhan University Wind power prediction method and system for optimizing deep transformer network
CN114781277A (en) * 2022-05-25 2022-07-22 河北工业大学 Supercapacitor residual service life prediction method based on LSTM-SVR

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAIFENG ZHENG ET AL.: "A Hybrid Deep Learning Model With Attention-Based Conv-LSTM Networks for Short-Term Traffic Flow Prediction", IEEE Transactions on Intelligent Transportation Systems, 30 June 2020 (2020-06-30) *
JUNZHE QIN ET AL.: "Short-term traffic flow prediction based on the combined model of SSA-VMD decomposition and IPSO-LSTM-ATTENTION", 2022 International Conference on Computers, Information Processing and Advanced Education, 31 December 2022 (2022-12-31) *
LIU HAN ET AL.: "Prediction of aircraft ground air-conditioning energy consumption based on IPSO-AM-LSTM", Journal of Beijing University of Aeronautics and Astronautics, 4 January 2023 (2023-01-04) *
YANG YINGMING; LI QUANSHENG; FANG JIE: "Prediction and analysis of coal consumption based on the optimal weighted combination model", Coal Engineering, no. 05, 20 May 2018 (2018-05-20), pages 1-3 *

Similar Documents

Publication Publication Date Title
CN110148285B (en) Intelligent oil well parameter early warning system based on big data technology and early warning method thereof
CN110738360B (en) Method and system for predicting residual life of equipment
CN109471698B (en) System and method for detecting abnormal behavior of virtual machine in cloud environment
CN105900022A (en) Method and system for artificially intelligent model-based control of dynamic processes using probabilistic agents
CN112966714B (en) Edge time sequence data anomaly detection and network programmable control method
CN110751186B (en) Cross-project software defect prediction method based on supervised expression learning
US20200372342A1 (en) Systems and methods for predictive early stopping in neural network training
CN114282443B (en) Residual service life prediction method based on MLP-LSTM supervised joint model
CN114218872B (en) DBN-LSTM semi-supervised joint model-based residual service life prediction method
Ye et al. A deep learning-based method for automatic abnormal data detection: Case study for bridge structural health monitoring
CN116457802A (en) Automatic real-time detection, prediction and prevention of rare faults in industrial systems using unlabeled sensor data
CN111881299A (en) Outlier event detection and identification method based on duplicate neural network
Qin et al. Remaining useful life prediction using temporal deep degradation network for complex machinery with attention-based feature extraction
Geekiyanage et al. Feature selection for kick detection with machine learning using laboratory data
Li et al. Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks
CN117171700A (en) Drilling overflow prediction combined model based on deep learning and model timely silence updating and migration learning method
US20230409916A1 (en) Methods and systems for training a machine learning model with measurement data captured during manufacturing process
KR20210126378A (en) Real-time sliding window based anomaly detection system for multivariate data generated by manufacturing equipment
CN115203970B (en) Diagenetic parameter prediction model training method and prediction method based on artificial intelligence algorithm
Aldosari et al. Generative adversarial neural network and genetic algorithms to predict oil and gas pipeline defect lengths
KR102636461B1 (en) Automated labeling method, device, and system for learning artificial intelligence models
CN116957166B (en) Tunnel traffic condition prediction method and system based on Hongmon system
CN117372753A (en) Deepwater open-circuit drilling kick identification method based on convolutional neural network
Hu et al. Application of Deep Residual Network in Fault Diagnosis of Wellbore
CN117669795A (en) Deep learning-based tight gas reservoir yield prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination