CN112183568A

CN112183568A - Submarine pipeline corrosion grade classification method based on DBN and SVM

Info

Publication number: CN112183568A
Application number: CN201910608632.9A
Authority: CN
Inventors: 刘颖; 王立凡
Original assignee: Tianjin University of Science and Technology
Current assignee: Tianjin University of Science and Technology
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2021-01-05

Abstract

The invention belongs to the field of corrosion grade classification for submarine pipelines, and relates to a submarine pipeline corrosion grade classification method based on a DBN (deep belief network) and an SVM (support vector machine). The main technical feature is that the DBN and the SVM are combined through a pipeline mechanism, which connects several algorithm models in series. First, the DBN processes the data and extracts representative features from the data set, and these features become the basis for the subsequent classification; the unsupervised pre-training specific to the DBN adjusts the connection weights of the network to appropriate initial values. The SVM classifier then classifies the pipeline corrosion grade. Based on the corrosion grade classification, workers can understand the corrosion condition of the pipeline, maintain it appropriately, and reduce unnecessary losses.

Description

Submarine pipeline corrosion grade classification method based on DBN and SVM

Technical Field

The invention belongs to the field of corrosion grade classification for submarine pipelines, and relates to a submarine pipeline corrosion grade classification method based on a DBN (deep belief network) and an SVM (support vector machine).

Background

After 30 years of exploration and development, offshore oil has become an important part of China's petroleum industry. The submarine pipelines between offshore oil and gas fields and onshore terminals are the lifeline of offshore oil and gas transmission and play an important role in ensuring the security and stability of China's petroleum energy supply. Submarine pipeline construction has also reached a peak: by 2016, the total length of China's long-distance oil and gas pipelines had reached about 126,000 km, of which natural gas pipelines accounted for about 74,300 km, crude oil pipelines for about 26,200 km, and refined oil pipelines for about 25,500 km. A number of further submarine pipelines are planned. A long-distance submarine pipeline is a costly permanent installation and is generally required to operate normally for 20 to 30 years without maintenance (except for vulnerable parts such as the riser and the shore-approach section). However, because the environment of a submarine pipeline is particularly complex and its interior is subject to corrosion, the probability of accidents is high. For old pipelines that have been in service for a long time in harsh environments in particular, internal corrosion damage readily causes leaks; repairing them consumes large amounts of manpower and material, causes huge economic losses to oil companies, and pollutes the marine environment. According to one statistical report, among Russian natural gas pipeline accidents from 1980 to 2013, external corrosion accounted for 33% of all failures and internal corrosion for 6.9%.
On 20 April 2010, an explosion caused by a methane gas leak occurred at a drilling rig of a UK oil company, killing 11 workers on the spot; the rig sank and was completely destroyed, and millions of gallons of oil leaked. In China, more than 60% of in-service oil and gas pipelines have been in service for more than 25 years, and a large number of pipelines have entered a period of frequent accidents. Accidents caused by corrosion, cracks, mechanical damage and the like occur frequently every year; corrosion failure is the main failure mode of submarine pipelines, accounting for 35% of failures. The submarine pipelines of China National Offshore Oil Corporation experienced 38 failures from 1995 to 2012, of which 11 (28.9%) were caused by internal corrosion, and the trend is rising. Corrosion of submarine pipelines causes serious economic loss and environmental pollution. The loss and harm caused by corrosion failure are particularly prominent, seriously affect the safety and economic targets of submarine pipelines in the operation stage, and are a serious and common problem in oil and natural gas transportation in China. Pipelines must therefore be analyzed in order to prevent pipeline failure accurately.

Machine learning is one of the most intelligent and cutting-edge research fields of artificial intelligence. Data-based machine learning is an important part of data mining technology: it simulates the human ability to learn by induction from examples, seeks regularities in observed data, and uses those regularities to classify future or unobservable data. One of the important theoretical foundations shared by existing machine learning methods is statistics. Traditional statistics studies the asymptotic theory that holds as the number of samples tends to infinity, but in practical problems the number of samples is often limited, so some learning methods that are excellent in theory perform poorly in practice.

The support vector machine is a machine learning method developed in the mid-1990s on the basis of statistical learning theory. It trains the learning machine using the structural risk minimization criterion. Its main advantages are that it is designed specifically for the finite-sample case, aiming at the optimal solution given the available information rather than the optimum as the number of samples tends to infinity, and that the algorithm is ultimately converted into a quadratic optimization problem, so that in theory the solution obtained is the global optimum, which avoids the local-extremum problem that is unavoidable in neural network methods. The DBN (deep belief network) is a kind of neural network. It can be used for unsupervised learning, similar to an autoencoder, and also for supervised learning, as a classifier. A DBN can be used not only to identify features and classify data, but also to generate data.

Submarine pipeline corrosion has many causes, and classifying the corrosion grade of a submarine pipeline provides a reference for its subsequent safe operation.

Disclosure of Invention

The invention classifies submarine pipeline corrosion grade data. The pipeline-related data comprise basic data, design data, internal inspection data, routing data, historical process data, historical production data and the like, which are collated into a data set for classifying the pipeline corrosion grade.

Based on the collated data set, pipeline corrosion is divided by grade into mild corrosion, moderate corrosion and severe corrosion. Classification is carried out with machine learning and deep learning methods, so that the corrosion condition of each part of a pipeline section can be determined, providing a basis for the maintenance and protection of the pipeline and making its operation safer. To this end, the invention combines a DBN and an SVM to classify the pipeline corrosion grade.

The DBN is a probabilistic generative model. In contrast to traditional neural networks, which are discriminative models, a generative model establishes a joint distribution between observations and labels and evaluates both P(Observation | Label) and P(Label | Observation), whereas a discriminative model evaluates only the latter. A DBN is composed of multiple layers of Restricted Boltzmann Machines (RBMs). Each RBM is "restricted" to a visible layer and a hidden layer, with connections between the layers but none between units within a layer. The hidden units are trained to capture the higher-order correlations of the data presented at the visible layer. The classic DBN structure is a deep neural network built from several stacked RBMs. Because an RBM can be trained quickly, this framework avoids the high complexity of training the DBN directly as a whole and reduces the training of the DBN to the training of multiple RBMs. After this layer-wise training, the network can be fine-tuned with a traditional global learning algorithm (such as the BP algorithm) so that the model converges to a local optimum; in this way a deep network can be trained efficiently. A common application of deep neural networks is feature extraction. The idea is simple: the features of the data are extracted by the deep network structure formed by the stacked RBMs. During supervised fine-tuning, a forward propagation pass first computes the output from the input, and the backpropagation algorithm is then used to update the weights and biases of the network.
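For reference, the layer structure described above (connections between the visible and hidden layers, none within a layer) corresponds to the standard formulation of an RBM with binary units. The patent does not state the unit type or give these equations, so the binary-unit form and the symbols below (visible units v, hidden units h, weights W, biases a and b) are assumptions used only for illustration.

```latex
% Energy of a joint configuration (v, h) of a binary RBM
E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j

% Because there are no intra-layer connections, the conditionals factorize:
P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad
P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

The factorized conditionals are what make a single RBM cheap to train, which is why the layer-wise scheme trains one RBM at a time and feeds its hidden activations to the next layer.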

The classifier is a Support Vector Machine (SVM). An SVM is a generalized linear classifier that performs binary classification on data by supervised learning; its decision boundary is the maximum-margin hyperplane solved from the training samples.

An SVM separates the classes by finding the optimal hyperplane in n-dimensional space. Optimal here means that the distance between the hyperplane and the nearest sample points is maximized. This margin maximization distinguishes the SVM from the perceptron, and with the kernel trick the SVM is in effect a non-linear classifier. The SVM uses a kernel function to map a low-dimensional space into a high-dimensional space, which to a certain extent solves the problem of data that are linearly inseparable in the low-dimensional space. For example, if the original one-dimensional feature $x$ is mapped to the two-dimensional feature space $(x^2, x)$, the separating hyperplane $x^2 - x = 0$ can be found. When $x^2 - x < 0$ the instance is assigned to class 1, and when $x^2 - x > 0$ it is assigned to class 0. Mapping $x^2 - x = 0$ back to the original feature space shows that instances between 0 and 1 belong to class 1, and instances in the remaining region (less than 0 or greater than 1) belong to class 0.
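The one-dimensional example above can be checked with a few lines of Python. This is a minimal sketch of the explicit feature map only, not of SVM training; the function names are hypothetical, and assigning points that lie exactly on the hyperplane to class 0 is an arbitrary choice made here.

```python
from typing import List, Tuple


def feature_map(x: float) -> Tuple[float, float]:
    """Map a 1-D feature x to the 2-D feature space (x^2, x)."""
    return (x * x, x)


def classify(x: float) -> int:
    """Return class 1 if x^2 - x < 0, otherwise class 0."""
    x_squared, x_linear = feature_map(x)
    # Separating hyperplane in the mapped space: x_squared - x_linear = 0
    score = x_squared - x_linear
    return 1 if score < 0 else 0


samples: List[float] = [-1.0, 0.25, 0.5, 0.75, 2.0]
labels = [classify(x) for x in samples]
print(labels)  # [0, 1, 1, 1, 0]
```

Only the samples between 0 and 1 receive class 1, matching the interval obtained when the hyperplane is mapped back to the original space.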

A submarine pipeline corrosion grade classification method based on a DBN and an SVM comprises the following steps:

Step 1: handling missing and invalid data, and dividing the data into a training set and a test set;

Step 2: reading the pipeline data with pandas and converting the read data into a matrix to facilitate training;

Step 3: configuring parameters and initial values of the DBN neural network;

Step 4: configuring parameters of the SVM classifier;

Step 5: classifying the corrosion grade using the DBN and the SVM together;

Step 6: inputting the test data to test the model and verify the classification effect.

Drawings

FIG. 1 is a diagram of the deep network structure of the DBN.

FIG. 2 is an accuracy diagram of the classification results of data processed by different classification methods.

FIG. 3 is a precision diagram of the classification results of data processed by different classification methods.

FIG. 4 is a recall diagram of the classification results of data processed by different classification methods.

FIG. 5 is an F1-score diagram of the classification results of data processed by different classification methods.

Detailed Description

Step 1: handling missing and invalid data, and dividing the data into a training set and a test set;

In this example, the data used are corrosion data for a pipeline in an oil field, 4015 valid records in total. Some values in the data are missing and some are invalid; missing and invalid values are replaced with the average. The first 2018 records are used as the training set for model training, and the remaining records are used as the test set to verify the feasibility of the model for classifying the pipeline corrosion grade.
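The data preparation described here can be sketched as follows. The patent says only that missing and invalid values are replaced with an average, so per-column mean imputation is an assumption, as are the function names and the toy values; invalid entries are represented as `None` for simplicity.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def impute_with_column_mean(rows: List[Row]) -> List[List[float]]:
    """Replace None entries with the mean of the observed values in the same column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def split_head(rows: List[List[float]], n_train: int) -> Tuple[list, list]:
    """Use the first n_train rows for training and the remainder for testing."""
    return rows[:n_train], rows[n_train:]


# Toy data (hypothetical values); the patent uses 4015 records with n_train = 2018.
raw = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0]]
clean = impute_with_column_mean(raw)
train, test = split_head(clean, n_train=2)
print(clean)  # [[1.0, 6.0], [3.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
```

Splitting by position rather than at random mirrors the "first 2018 records" wording; it is only appropriate if the record order carries no systematic bias.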

Step 2: reading the pipeline data with pandas and converting the read data into a matrix to facilitate training;

For the raw data read with pandas, both the training set and the test set are converted into matrices that can be used as input. Because the value ranges of the individual dimensions differ greatly, the data must be normalized to reduce errors and improve classification accuracy. The invention uses min-max normalization, which maps the data values into [0, 1]. The formula is as follows:

$$x^* = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is a sample value, $x_{\min}$ is the minimum value of the sample data, $x_{\max}$ is the maximum value of the sample data, and $x^*$ is the data to be input to the neural network.
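A minimal sketch of min-max normalization for a single feature column is given below. The function name and sample values are hypothetical, and mapping a constant column to 0 is an assumption made to avoid division by zero; the patent does not say how that case is handled, nor whether the minimum and maximum are taken from the training set only.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Map the values of one feature column into [0, 1]."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


values = [2.0, 4.0, 6.0, 10.0]  # hypothetical feature values
normalized = min_max_normalize(values)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```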

Step 3: configuring parameters and initial values of the DBN neural network;

The DBN uses a three-layer network structure with 64, 32 and 4 neurons in the respective layers. The number of training iterations is 25, the learning rate is 0.1, the number of samples fed in per training step (the batch size) is 20, and the activation function of the model is ReLU.
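The hyperparameters listed above can be collected into a small configuration, from which the number of weight updates follows by simple arithmetic. The key names are hypothetical, and reading the 25 training iterations as 25 full passes over the 2018 training records is an assumption; the patent does not say whether the figure applies to pre-training, fine-tuning, or both.

```python
import math

# Hypothetical key names; the values are those stated in the description.
dbn_config = {
    "hidden_layer_sizes": [64, 32, 4],
    "n_epochs": 25,
    "learning_rate": 0.1,
    "batch_size": 20,
    "activation": "relu",
}

n_train = 2018  # size of the training set in this example

# The last batch of each pass is smaller than 20, hence the ceiling.
batches_per_epoch = math.ceil(n_train / dbn_config["batch_size"])
total_updates = batches_per_epoch * dbn_config["n_epochs"]
print(batches_per_epoch, total_updates)  # 101 2525
```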

Step 4: configuring parameters of the SVM classifier;

The RBF kernel is selected as the kernel function of the SVM model. It handles linearly inseparable data and works well when the feature dimension is small, the number of samples is moderate, and no prior knowledge is available. The gamma value is 0.1 and C is 10000.
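To illustrate what the gamma parameter controls, the sketch below evaluates the Gaussian RBF kernel in its usual form, K(x, y) = exp(-gamma * ||x - y||^2), with gamma = 0.1. The patent names the kernel but does not write out the formula, so this standard form is an assumption, and the function name and sample vectors are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.1) -> float:
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


k_same = rbf_kernel([0.2, 0.4], [0.2, 0.4])  # identical points give 1.0
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0])   # squared distance 25 gives exp(-2.5)
print(k_same, round(k_far, 4))
```

A larger gamma makes the similarity fall off faster with distance, while a large C such as 10000 penalizes training errors heavily; together they push the classifier toward fitting the training data closely.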

Step 5: classifying the corrosion grade using the DBN and the SVM together;

The DBN and the SVM are combined through a pipeline mechanism, which connects several algorithm models in series. First, the DBN processes the data and extracts representative features from the data set, and these features become the basis for the subsequent classification; the unsupervised pre-training specific to the DBN adjusts the connection weights of the network to appropriate initial values. The SVM classifier then classifies the pipeline corrosion grade.
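The idea of connecting models in series can be sketched in plain Python. The code below shows only the chaining pattern: each feature step is fitted and applied in turn, and the final classifier is trained on the output of the last step. The two steps are deliberately simple stand-ins (a min-max scaler in place of the DBN, a nearest-centroid classifier in place of the RBF-kernel SVM), and all class names and data are hypothetical. Libraries such as scikit-learn provide a `Pipeline` class for this pattern; whether the patent uses it is an assumption.

```python
class MinMaxStep:
    """Stand-in feature step (the patent uses a DBN at this position)."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        return self

    def transform(self, X):
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, self.lo, self.hi)]
            for row in X
        ]


class NearestCentroid:
    """Stand-in classifier (the patent uses an RBF-kernel SVM at this position)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {k: [s / counts[k] for s in acc] for k, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda k: sq_dist(row, self.centroids[k]))
                for row in X]


class SimplePipeline:
    """Chain feature steps in series, ending with a classifier."""

    def __init__(self, steps, classifier):
        self.steps = steps
        self.classifier = classifier

    def fit(self, X, y):
        for step in self.steps:
            X = step.fit(X, y).transform(X)  # output of one step feeds the next
        self.classifier.fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps:
            X = step.transform(X)  # reuse parameters learned during fit
        return self.classifier.predict(X)


X_train = [[1.0, 10.0], [2.0, 12.0], [8.0, 40.0], [9.0, 42.0]]  # hypothetical data
y_train = ["mild", "mild", "severe", "severe"]
model = SimplePipeline([MinMaxStep()], NearestCentroid()).fit(X_train, y_train)
predictions = model.predict([[1.5, 11.0], [8.5, 41.0]])
print(predictions)  # ['mild', 'severe']
```

The benefit of the pattern is that the test data pass through exactly the same fitted transformations as the training data, so the classifier never sees features computed differently from those it was trained on.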

Step 6: inputting the test data to test the model and verify the classification effect.

The method provided by the invention is compared with a standalone SVM, a LightGBM model and a Keras model. A confusion matrix is obtained from the classification results. TP (true positive) refers to positive tuples correctly classified by the classifier; TN (true negative) refers to negative tuples correctly classified by the classifier; FP (false positive) refers to negative tuples incorrectly labeled as positive; FN (false negative) refers to positive tuples incorrectly labeled as negative. The classification results are evaluated by accuracy, precision, recall and F1-score.

Accuracy is the percentage of tuples correctly classified by the classifier. It is also called the overall recognition rate of the classifier, that is, it reflects how well the classifier recognizes tuples of all classes, and it is most informative when the class distribution is relatively balanced: accuracy = (TP + TN)/(P + N), where P and N are the total numbers of positive and negative tuples.

Precision can be regarded as a measure of exactness (the percentage of tuples labeled as positive that are actually positive): precision = TP/(TP + FP).

Recall, also called sensitivity, is a measure of completeness (the percentage of positive tuples that are labeled as positive): recall = TP/(TP + FN).

F1-score = (2 × P × R)/(P + R), where P here denotes precision and R denotes recall.
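The four evaluation indexes can be computed directly from the confusion-matrix counts. The sketch below treats one class as positive and the rest as negative; the function name and the counts are hypothetical and are not results from the patent. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1-score from confusion counts."""
    total = tp + tn + fp + fn  # equals P + N
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a three-grade problem (mild, moderate, severe), these quantities would typically be computed once per grade and then averaged; the patent does not state which averaging it uses.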

Claims

Step 3: configuring parameters and initial values of the DBN neural network;

Step 4: configuring parameters of the SVM classifier;