CN117313041A - Method for predicting pitting corrosion of outer surface of buried pipeline and analyzing factors - Google Patents
- Publication number
- CN117313041A (application CN202311346348.1A)
- Authority
- CN
- China
- Prior art keywords
- pipeline
- data
- pitting
- soil
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)
Abstract
The invention relates to the field of corrosion protection of oil and gas pipelines and discloses a method for predicting pitting corrosion on the outer surface of a buried pipeline and analyzing its influencing factors. The method comprises the following steps: selecting soil characteristics and pipeline characteristics related to pitting of the outer surface of the buried pipeline as input features; obtaining sample data of soil types and pipeline coating types; preprocessing the sample data to form a corrosion data set; constructing a regression prediction model of the pitting depth of the outer surface of the buried pipeline based on the XGBoost algorithm; randomly selecting 80% of the corrosion data set as a training set for optimizing and training the model, and using the remaining 20% as a test set for evaluating the prediction performance of the trained model; and inputting the sample data into the trained model to obtain the importance of the soil and pipeline characteristics and the predicted pitting depth. The invention converts categorical features into numerical variables for the pitting depth regression prediction model and achieves accurate prediction of the pipeline pitting depth.
Description
Technical Field
The invention relates to the technical field of corrosion protection of oil and gas pipelines, in particular to a method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors.
Background
Oil and gas pipelines are often called the arteries of the energy supply and play an important role in ensuring energy security. After a buried oil and gas pipeline has been in service for some time, corrosion failure can occur through the interaction of various media with the environment. This leads to oil and gas leakage, disrupts normal operation of the transmission system, and causes environmental pollution, economic loss, and even safety accidents. Predicting pitting corrosion of buried oil and gas pipelines is therefore urgently needed as a basis for their inspection and maintenance.
At present, methods for predicting the pitting depth of buried oil and gas pipelines mainly include empirical calculation models, corrosion mechanism models, grey system theory, and neural network models. However, these models predict the pitting depth of the outer surface of buried oil and gas pipelines with poor accuracy and are poorly interpretable: they cannot determine the importance of the various pitting factors or the way each factor acts. As a result, their predictions cannot directly guide improvements in pipeline materials and construction processes.
To meet the practical need for accurate prediction of pitting corrosion on the outer surface of oil and gas pipelines and for quantitative analysis of the influencing factors, a prediction model based on interpretable machine learning is urgently required, together with an in-depth analysis and evaluation of how each factor acts.
Disclosure of Invention
To address the above problems, the invention provides a method for predicting pitting corrosion on the outer surface of a buried pipeline and analyzing its influencing factors.
The invention adopts the following technical scheme:
A method for predicting pitting corrosion on the outer surface of a buried pipeline and analyzing its influencing factors comprises the following steps:
Step 1: selecting soil characteristics and pipeline characteristics related to pitting of the outer surface of the buried pipeline as input features;
Step 2: obtaining sample data of soil types and pipeline coating types;
Step 3: preprocessing the sample data obtained in step 2 to form a corrosion data set;
Step 4: constructing a regression prediction model of the pitting depth of the outer surface of the buried pipeline based on the extreme gradient boosting algorithm XGBoost;
Step 5: randomly selecting 80% of the corrosion data set as a training set for optimizing and training the XGBoost model, and using the remaining 20% as a test set for evaluating the prediction performance of the trained model;
Step 6: inputting the sample data into the trained model to obtain the importance of the soil and pipeline characteristics and the predicted pitting depth;
Step 7: calculating the accumulated local effect (ALE) value to analyze how the soil characteristics and pipeline characteristics act on the pitting depth and to identify their thresholds.
Further, the soil characteristics in step 1 include soil type, resistance, water content, HCO₃⁻ content, Cl⁻ content, SO₄²⁻ content, redox potential, pH, soil resistivity, and soil-to-pipeline potential; the pipeline characteristics include pipeline coating type and service life.
Further, the sample data preprocessing in step 3 includes the following steps:
Step 3.1: outlier handling: detecting outliers in the sample data from step 2 using the interquartile range (IQR) of a box plot, and replacing each outlier with the mode;
Step 3.2: data conversion: converting the soil type and pipeline coating type variables in the sample data from step 2 into numerical data using one-hot encoding.
Further, step 5 includes the following steps:
Step 5.1: inputting the training set into the model constructed in step 4;
Step 5.2: performing feature selection for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline;
Step 5.3: performing hyperparameter optimization for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline.
Further, the feature selection in step 5.2 proceeds as follows: the Pearson correlation coefficient r between each pair of features is calculated with the formula below. Two features whose correlation coefficient exceeds 0.7 are considered highly linearly correlated, and one of them is removed as redundant. The remaining features are input into the model for training to obtain the optimal model:
$$ r = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} $$
where $x_i$ and $y_i$ are the values of the i-th sample for features x and y respectively, $\bar{x}$ and $\bar{y}$ are the means of the two features, and N is the total number of samples.
Further, the hyperparameter optimization in step 5.3 uses grid search with cross-validation, and specifically includes:
Step 5.3.1: setting a parameter grid, that is, a list of candidate values for each hyperparameter;
Step 5.3.2: setting the number of folds K for K-fold cross-validation: the training set is divided into K parts, each part in turn serves as the validation set while the remaining K-1 parts serve as the training set, and the average evaluation result of the K models on their validation sets is taken as the cross-validation error;
Step 5.3.3: calculating the cross-validation result of the model for each hyperparameter combination, and outputting the combination with the smallest cross-validation error as the optimal hyperparameters;
Step 5.3.4: retraining the model with the optimal hyperparameter combination and validating it on the held-out test data to obtain the optimal model.
Further, in step 6, the importance of a soil or pipeline feature is measured by the total number of times the feature is used as the splitting attribute across all decision trees; a feature whose total exceeds a set threshold is determined to be an important feature.
Further, the accumulated local effect (ALE) value in step 7 is calculated as follows: the distribution range of each feature is divided into K subintervals, and the ALE value of the feature in each subinterval is calculated with the formula below. The calculated ALE values are plotted as a line graph, which shows how the pitting depth changes with the feature and where the threshold lies:
$$ \mathrm{ALE}(j,x)=\sum_{k=1}^{k_j(x)}\frac{1}{n_j(k)}\sum_{i:\,x_j^{(i)}\in N_j(k)}\left[f\left(z_{k,j},x_{\setminus j}^{(i)}\right)-f\left(z_{k-1,j},x_{\setminus j}^{(i)}\right)\right] $$
where ALE(j, x) denotes the uncentered ALE value of feature j at value x; f denotes the trained prediction model; $x_{\setminus j}$ denotes the features other than feature j; $k_j(x)$ is the number of intervals of feature j up to x; $n_j(k)$ is the number of samples in the k-th interval $N_j(k)$; i indexes the sample points in the k-th interval; $z_{k,j}$ is the upper boundary of feature j in the k-th interval; $z_{k-1,j}$ is the lower boundary of feature j in the k-th interval; and $x^{(i)}$ denotes the i-th sample point in the k-th interval.
The beneficial effects of the invention are as follows:
1. The method converts categorical features into numerical variables for the XGBoost-based pitting depth regression prediction model and achieves accurate prediction of the pipeline pitting depth.
2. The invention screens the features and ranks them by importance, improving the prediction performance and generalization ability of the model.
3. The method quantitatively analyzes how soil characteristics and pipeline characteristics influence pitting corrosion and identifies the threshold of each characteristic. This improves the understanding of the pipeline pitting mechanism and supports pitting prediction, analysis of pitting behavior, and the development of corresponding corrosion protection methods.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings of the embodiments are briefly described below. The drawings described here relate only to some embodiments of the present invention and do not limit the present invention.
FIG. 1 is a schematic diagram of an operational framework of the present invention;
FIG. 2 is a diagram showing the Pearson correlation coefficients of the input features obtained after preprocessing in the present invention;
FIG. 3 is a schematic view of the feature importance result of the feature importance evaluation calculation according to the present invention;
FIG. 4 is a schematic diagram of the predicted result of the pitting depth of the outer surface of the oil and gas pipeline according to the invention;
FIG. 5 is a graph schematically illustrating the concentration of chloride ions versus the depth of pitting corrosion calculated by the cumulative local effect value according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
The invention will be further described with reference to the drawings and examples.
As shown in FIG. 1, the method for predicting pitting corrosion on the outer surface of a buried pipeline and analyzing its influencing factors comprises the following steps:
Step 1: Select soil characteristics and pipeline characteristics associated with pitting of the outer surface of the buried pipeline as input features.
The soil characteristics in step 1 include soil type, resistance, water content, HCO₃⁻ content, Cl⁻ content, SO₄²⁻ content, redox potential, pH, soil resistivity, and soil-to-pipeline potential; the pipeline characteristics include pipeline coating type and service life.
Step 2: Obtain sample data of soil types and pipeline coating types.
Step 3: Preprocess the sample data obtained in step 2 to form a corrosion data set.
The preprocessing in step 3 comprises the following steps:
Step 3.1: Outlier handling: detect outliers in the sample data from step 2 using the interquartile range (IQR) of a box plot, and replace each outlier with the mode.
Step 3.2: Data conversion: convert the soil type and pipeline coating type variables in the sample data from step 2 into numerical data using one-hot encoding.
Step 4: Construct a regression prediction model of the pitting depth of the outer surface of the buried pipeline based on the extreme gradient boosting algorithm XGBoost.
Step 5: Randomly select 80% of the corrosion data set as a training set for optimizing and training the XGBoost model, and use the remaining 20% as a test set for evaluating the prediction performance of the trained model.
Step 5 comprises the following steps:
Step 5.1: Input the training set into the model constructed in step 4.
Step 5.2: Perform feature selection for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline.
The feature selection in step 5.2 proceeds as follows: the Pearson correlation coefficient r between each pair of features is calculated with the formula below. Two features whose correlation coefficient exceeds 0.7 are considered highly linearly correlated, and one of them is removed as redundant. The remaining features are input into the model for training to obtain the optimal model:
$$ r = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} $$
where $x_i$ and $y_i$ are the values of the i-th sample for features x and y respectively, $\bar{x}$ and $\bar{y}$ are the means of the two features, and N is the total number of samples.
Step 5.3: Perform hyperparameter optimization for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline.
The hyperparameter optimization in step 5.3 uses grid search with cross-validation, and specifically includes:
Step 5.3.1: Set a parameter grid, that is, a list of candidate values for each hyperparameter.
Step 5.3.2: Set the number of folds K for K-fold cross-validation: the training set is divided into K parts, each part in turn serves as the validation set while the remaining K-1 parts serve as the training set, and the average evaluation result of the K models on their validation sets is taken as the cross-validation error.
Step 5.3.3: Calculate the cross-validation result of the model for each hyperparameter combination, and output the combination with the smallest cross-validation error as the optimal hyperparameters.
Step 5.3.4: Retrain the model with the optimal hyperparameter combination and validate it on the held-out test data to obtain the optimal model.
Step 6: Input the sample data into the trained model to obtain the importance of the soil and pipeline characteristics and the predicted pitting depth.
In step 6, the importance of a soil or pipeline feature is measured by the total number of times the feature is used as the splitting attribute across all decision trees; a feature whose total exceeds a set threshold is determined to be an important feature.
Step 7: Calculate the accumulated local effect (ALE) value to analyze how the soil characteristics and pipeline characteristics act on the pitting depth.
The ALE value in step 7 is calculated as follows: the distribution range of each feature is divided into K subintervals, and the ALE value of the feature in each subinterval is calculated with the formula below. The calculated ALE values are plotted as a line graph, which shows how the pitting depth changes with the feature and where the threshold lies. Specifically, the pitting depth increases as the feature value (for example, an ion concentration) increases, and it increases rapidly once the value exceeds the threshold (the abrupt-change point).
$$ \mathrm{ALE}(j,x)=\sum_{k=1}^{k_j(x)}\frac{1}{n_j(k)}\sum_{i:\,x_j^{(i)}\in N_j(k)}\left[f\left(z_{k,j},x_{\setminus j}^{(i)}\right)-f\left(z_{k-1,j},x_{\setminus j}^{(i)}\right)\right] $$
where ALE(j, x) denotes the uncentered ALE value of feature j at value x; f denotes the trained prediction model; $x_{\setminus j}$ denotes the features other than feature j; $k_j(x)$ is the number of intervals of feature j up to x; $n_j(k)$ is the number of samples in the k-th interval $N_j(k)$; i indexes the sample points in the k-th interval; $z_{k,j}$ is the upper boundary of feature j in the k-th interval; $z_{k-1,j}$ is the lower boundary of feature j in the k-th interval; and $x^{(i)}$ denotes the i-th sample point in the k-th interval.
Examples
The method for predicting pitting corrosion on the outer surface of a buried pipeline and analyzing its influencing factors, as provided by the embodiment of the invention, is described in detail below.
Step 1: The maximum pitting depth of the outer surface of the pipeline is selected as the indicator of pitting corrosion and used as the output feature of the model. The input features are soil type, resistance, water content, HCO₃⁻ content, Cl⁻ content, SO₄²⁻ content, redox potential, pH, soil resistivity, soil-to-pipeline potential, pipeline coating type, and service life.
Step 2: Sample data of soil types and pipeline coating types are obtained. A total of 259 samples were collected by on-site measurement of the maximum pitting depth on the outer surface of in-service oil and gas pipelines at different locations, together with the surrounding soil; each sample contains all input and output features listed in step 1.
Step 3: The 259 samples obtained in step 2 are preprocessed to form a corrosion data set. The specific process is as follows:
Step 3.1: Outlier handling: outliers in the sample data from step 2 are detected with a box plot whose whiskers are based on the interquartile range. An outlier is defined as a value greater than Q3 + 1.5 IQR or less than Q1 - 1.5 IQR, and each outlier is replaced with the mode of the corresponding feature. Here Q1 is the lower quartile of the data, Q3 is the upper quartile, and IQR is the interquartile range, that is, the difference between Q3 and Q1.
Step 3.2: Data conversion: the soil type and pipeline coating type variables in the sample data from step 2 are converted into numerical data using one-hot encoding. Specifically, the number m of categories is counted for the soil type and for the pipeline coating type, so that each variable is given an m-bit code; for the i-th category, the i-th bit is set to 1 and all other bits are set to 0. The converted soil type and pipeline coating type data of the example are shown in Table 1:
table 1 results table after soil and pipe coating type data conversion
Note: Soil type: C (clay), SYCL (sandy clay loam), CL (clay loam), SCL (silty clay loam), SC (silty clay), SL (silty loam).
Coating type: NC (uncoated), AEC (asphalt enamel coating), WTC (tape coating), CTC (coal tar coating), FBE (fusion-bonded epoxy coating).
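The two preprocessing steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the patent's implementation: the function names, the sample pH values, and the choice of the "inclusive" quartile method are assumptions, since the text does not say how the quartiles are interpolated.

```python
import statistics

# Soil type codes taken from the note to Table 1.
SOIL_TYPES = ["C", "SYCL", "CL", "SCL", "SC", "SL"]


def iqr_fences(values):
    """Return the box-plot fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # Assumption: "inclusive" quartiles; the source does not fix the method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def replace_outliers_with_mode(values):
    """Replace every value outside the fences with the mode of the feature."""
    lower, upper = iqr_fences(values)
    mode = statistics.mode(values)
    return [mode if (v < lower or v > upper) else v for v in values]


def one_hot(labels, categories):
    """Encode each label as an m-bit list with a single 1 at its category index."""
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical pH readings; 12.5 lies above the upper fence.
ph = [7.1, 7.1, 7.3, 6.9, 7.0, 7.1, 12.5]
print(replace_outliers_with_mode(ph))
print(one_hot(["CL", "C"], SOIL_TYPES))
```

For strongly continuous features the mode may not be unique; `statistics.mode` then returns the first most common value, which is one reasonable tie-break among several.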
Step 4: Construct a regression prediction model of the pitting depth of the outer surface of the buried pipeline based on the extreme gradient boosting algorithm XGBoost.
FIG. 1 shows the framework of the XGBoost-based regression prediction model of the pitting depth of the outer surface of the buried pipeline. The XGBoost model is built with the machine learning library Scikit-learn, and its input variables are the input features preprocessed in step 3. The model itself is an existing, known model.
Step 5: Randomly select 80% of the data set as a training set for optimizing and training the XGBoost model, and use the remaining 20% as a test set for evaluating the prediction performance of the trained model. Training, optimization, and evaluation are carried out with the XGBoost model framework constructed in step 4. The specific process is as follows:
Step 5.1: Input the training set into the model constructed in step 4.
Step 5.2: Perform feature selection for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline.
The feature selection in step 5.2 proceeds as follows: the Pearson correlation coefficient r between each pair of features is calculated with the formula below. Two features whose correlation coefficient exceeds 0.7 are considered highly linearly correlated, and one of them is removed as redundant. The remaining features are input into the model for training to obtain the optimal model:
$$ r = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}} $$
where $x_i$ and $y_i$ are the values of the i-th sample for features x and y respectively, $\bar{x}$ and $\bar{y}$ are the means of the two features, and N is the total number of samples.
The Pearson correlation coefficients of the input features obtained after preprocessing in the example are shown in FIG. 2. The features are the encoded soil types (clay (ct_C), sandy clay loam (ct_SYCL), clay loam (ct_CL), silty clay loam (ct_SCL), silty clay (ct_SC), silty loam (ct_SL)), water content (wc), HCO₃⁻ content (bc), Cl⁻ content (cc), SO₄²⁻ content (sc), redox potential (rp), pH (pH), soil resistivity (re), soil-to-pipe potential (pp), the encoded pipeline coating types (no coating (class_NC), asphalt enamel coating (class_AEC), tape coating (class_WTC), coal tar coating (class_CTC), fusion-bonded epoxy coating (class_FBE)), and service life (t).
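The correlation filter of step 5.2 can be illustrated with the sketch below. It is a minimal example under stated assumptions: the patent does not say which member of a correlated pair is dropped or whether the 0.7 limit applies to the absolute value, so the sketch keeps the feature that appears first and compares |r| with the threshold. The helper names and the toy values for cc, re, and pH are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.7):
    """Return the names of features kept after removing highly correlated ones.

    `features` maps a feature name to its list of sample values.
    Assumption: of each correlated pair, the later feature is removed.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "re" is perfectly anti-correlated with "cc" and gets removed.
toy = {
    "cc": [1, 2, 3, 4, 5],
    "re": [10, 8, 6, 4, 2],
    "pH": [7, 6, 8, 6, 7],
}
print(drop_redundant(toy))
```

Keeping the earlier feature is an arbitrary rule; in practice one might instead keep whichever feature correlates more strongly with the pitting depth.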
Step 5.3: Perform hyperparameter optimization for the XGBoost-based prediction model of pitting on the outer surface of the buried pipeline.
The hyperparameter optimization in step 5.3 uses grid search with cross-validation, and specifically includes:
Step 5.3.1: Set a parameter grid, that is, a list of candidate values for each hyperparameter. The candidate values for the hyperparameters of the XGBoost model in this embodiment are: number of base estimators n_estimators: [10, 20, 50, 100, 200, 300]; subsampling rate subsample: [1.0, 0.9, 0.8]; maximum tree depth max_depth: [5, 9, 15, 20, 25]; learning rate learning_rate: [0.05, 0.1, 0.2, 0.4, 0.8, 1]. The remaining hyperparameters keep their default values.
Step 5.3.2: Set the number of folds K for K-fold cross-validation: the training set is divided into K parts, each part in turn serves as the validation set while the remaining K-1 parts serve as the training set, and the average evaluation result of the K models on their validation sets is taken as the cross-validation error. The number of folds is set to 5 in the example.
Step 5.3.3: Calculate the cross-validation result of the model for each hyperparameter combination, and output the combination with the smallest cross-validation error as the optimal hyperparameters.
Step 5.3.4: After the optimal hyperparameter combination is obtained, retrain the model and validate it on the held-out test data to obtain the optimal model.
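Steps 5.3.1 to 5.3.4 can be sketched generically as follows. To stay dependency-free, the sketch swaps the XGBoost regressor for a tiny nearest-neighbour regressor (`fit_knn`, a hypothetical stand-in), and it uses mean squared error as the cross-validation error, which the patent does not specify. With the grid listed in step 5.3.1 the same loop would cover 6 × 3 × 5 × 6 = 540 combinations, that is, 2700 model fits with 5 folds.

```python
import itertools
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(fit, X, y, param_grid, k=5, seed=0):
    """Return (best_params, best_cv_error) over all grid combinations.

    `fit(X_train, y_train, **params)` must return a predict(x) function.
    """
    folds = k_fold_indices(len(X), k, seed)
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        fold_errors = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            predict = fit([X[i] for i in train_idx],
                          [y[i] for i in train_idx], **params)
            mse = sum((predict(X[i]) - y[i]) ** 2 for i in val_idx) / len(val_idx)
            fold_errors.append(mse)
        cv_error = sum(fold_errors) / k  # average over the K validation folds
        if cv_error < best_error:
            best_params, best_error = params, cv_error
    return best_params, best_error


def fit_knn(X_train, y_train, n_neighbors):
    """Stand-in model (NOT XGBoost): 1-D k-nearest-neighbour regression."""
    pairs = list(zip(X_train, y_train))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:n_neighbors]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


# Hypothetical toy data: depth grows linearly with a single feature.
X = [float(i) for i in range(20)]
y = [2.0 * x for x in X]
best_params, best_error = grid_search_cv(fit_knn, X, y, {"n_neighbors": [1, 3, 15]})
final_model = fit_knn(X, y, **best_params)  # step 5.3.4: retrain on all training data
print(best_params, best_error)
```

Passing the model as a `fit` callable keeps the search loop independent of the learner, so a real gradient-boosting regressor could be plugged in without changing the cross-validation logic.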
Step 6: After model prediction, the importance of the soil and pipeline characteristics and the predicted pitting depth are obtained.
In step 6, the importance of a soil or pipeline feature is measured by the total number of times the feature is used as the splitting attribute across all decision trees; a feature whose total exceeds a set threshold is determined to be an important feature.
The feature importance values calculated in the example are shown in FIG. 3. The features are the encoded soil types (clay (ct_C), sandy clay loam (ct_SYCL), clay loam (ct_CL), silty clay loam (ct_SCL), silty clay (ct_SC), silty loam (ct_SL)), water content (wc), HCO₃⁻ content (bc), Cl⁻ content (cc), SO₄²⁻ content (sc), redox potential (rp), pH (pH), soil resistivity (re), soil-to-pipe potential (pp), the encoded pipeline coating types (no coating (class_NC), asphalt enamel coating (class_AEC), tape coating (class_WTC), coal tar coating (class_CTC), fusion-bonded epoxy coating (class_FBE)), and service life (t).
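The split-count importance described in step 6 can be illustrated on a toy ensemble. In this sketch each tree is a nested dictionary, which is a hypothetical representation chosen for readability rather than the internal format of any library; the feature names reuse the abbreviations above, and the threshold of 1 is arbitrary.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count how often each feature is used as a split attribute."""
    if "feature" in node:  # internal node; leaves only carry a "value"
        counts[node["feature"]] += 1
        count_splits(node["left"], counts)
        count_splits(node["right"], counts)
    return counts


def important_features(trees, threshold):
    """Sum split counts over all trees and keep features above the threshold."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    selected = {name: n for name, n in counts.items() if n > threshold}
    return selected, counts


# Two hypothetical trees of an ensemble.
tree_1 = {"feature": "cc",
          "left": {"feature": "t", "left": {"value": 0.5}, "right": {"value": 1.0}},
          "right": {"value": 2.0}}
tree_2 = {"feature": "cc",
          "left": {"value": 0.1},
          "right": {"feature": "pH", "left": {"value": 0.2}, "right": {"value": 0.4}}}

selected, counts = important_features([tree_1, tree_2], threshold=1)
print(counts)    # split counts per feature
print(selected)  # features whose count exceeds the threshold
```

In the xgboost Python package this kind of count corresponds to the "weight" importance type; whether the patent's implementation used exactly that option is not stated in the text.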
In the embodiment, the test-set results predicted by the optimal model are shown in FIG. 4. The predicted values are close to the true values, which verifies the reliability of the model.
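A small sketch of the random 80/20 split and of a numerical comparison between predicted and true values is given below. The metrics RMSE and R² are assumptions for illustration only, since the text does not name the evaluation metric behind FIG. 4; the function names, the seed, and the toy depths are likewise hypothetical.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Randomly split sample indices into (train, test) index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination R^2."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# With the 259 samples of the example, 20% rounds to 52 test samples.
train_idx, test_idx = split_indices(259)
print(len(train_idx), len(test_idx))

# Hypothetical true and predicted pitting depths (mm) for four test samples.
depth_true = [1.0, 2.0, 3.0, 4.0]
depth_pred = [1.5, 2.0, 3.0, 3.5]
print(rmse(depth_true, depth_pred), r2(depth_true, depth_pred))
```

Fixing the seed makes the split reproducible, which matters when comparing hyperparameter settings on the same held-out samples.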
The accumulated local effect (ALE) value in step 7 is calculated as follows: the distribution range of each feature is divided into K subintervals, and the ALE value of the feature in each subinterval is calculated with the formula below. The calculated ALE values are plotted as a line graph, which shows how the pitting depth changes with the feature and where the threshold lies.
$$ \mathrm{ALE}(j,x)=\sum_{k=1}^{k_j(x)}\frac{1}{n_j(k)}\sum_{i:\,x_j^{(i)}\in N_j(k)}\left[f\left(z_{k,j},x_{\setminus j}^{(i)}\right)-f\left(z_{k-1,j},x_{\setminus j}^{(i)}\right)\right] $$
where ALE(j, x) denotes the uncentered ALE value of feature j at value x; f denotes the trained prediction model; $x_{\setminus j}$ denotes the features other than feature j; $k_j(x)$ is the number of intervals of feature j up to x; $n_j(k)$ is the number of samples in the k-th interval $N_j(k)$; i indexes the sample points in the k-th interval; $z_{k,j}$ is the upper boundary of feature j in the k-th interval; $z_{k-1,j}$ is the lower boundary of feature j in the k-th interval; and $x^{(i)}$ denotes the i-th sample point in the k-th interval.
In the embodiment, K for the ALE calculation was set to 20. The calculated accumulated local effect plot of chloride ion concentration versus pitting depth is shown in FIG. 5: the pitting depth increases as the chloride ion concentration increases, and it increases rapidly once the concentration exceeds the threshold of 30 ppm.
The foregoing description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any simple modification, equivalent variation, or adaptation made to the above embodiment according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.
Claims (8)
1. A method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors, characterized by comprising the following steps:
Step 1: selecting soil features and pipeline features related to pitting of the outer surface of the buried pipeline as input features;
Step 2: obtaining sample data of soil types and pipeline coating types;
Step 3: preprocessing the sample data obtained in step 2 to form a corrosion data set;
Step 4: constructing a regression prediction model of the pitting depth of the outer surface of the buried pipeline based on the extreme gradient boosting (XGBoost) algorithm;
Step 5: randomly selecting 80% of the data in the corrosion data set as a training set for training and optimizing the model, and using the remaining 20% as a test set for evaluating the predictive performance of the trained model;
Step 6: inputting the sample data into the trained model to obtain the importance of the soil features and pipeline features and the pitting depth prediction result;
Step 7: calculating the accumulated local effect (ALE) value to analyze how the soil features and pipeline features affect the pitting depth.
2. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 1, wherein the soil features in step 1 include soil type, water content, HCO₃⁻ content, Cl⁻ content, SO₄²⁻ content, redox potential, pH, soil resistivity, and soil-to-pipe potential; and the pipeline features include pipeline coating type and service life.
3. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 1, wherein the preprocessing in step 3 comprises the following steps:
Step 3.1: outlier processing: detecting outliers in the sample data of step 2 using the interquartile range of a box plot, and replacing each outlier with the mode;
Step 3.2: data conversion: converting the soil type and pipeline coating type variables in the sample data of step 2 into numerical data using one-hot encoding.
4. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 1, wherein step 5 comprises the following steps:
Step 5.1: inputting the training set into the model constructed in step 4;
Step 5.2: performing feature optimization on the XGBoost-based model for predicting pitting of the outer surface of the buried pipeline;
Step 5.3: performing hyperparameter optimization on the XGBoost-based model for predicting pitting of the outer surface of the buried pipeline.
5. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 4, wherein the feature optimization in step 5.2 is specifically as follows: the Pearson correlation coefficient r between features is calculated using the following formula; two features whose correlation coefficient exceeds 0.7 are regarded as highly linearly correlated, one of them is removed as a redundant feature, and the remaining features are input into the model for training to obtain the optimal model;

$$
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
$$

where $x_i$ and $y_i$ are the values of the $i$-th sample for features $x$ and $y$, $\bar{x}$ and $\bar{y}$ are the means of the two features, and $N$ is the total number of samples.
6. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 4, wherein the hyperparameter optimization in step 5.3 uses grid search with cross-validation, and specifically comprises:
Step 5.3.1: setting a hyperparameter grid, with a list of candidate values for each hyperparameter;
Step 5.3.2: setting the number of folds K for K-fold cross-validation, dividing the training set into K parts, using each part in turn as the validation set and the remaining K-1 parts as the training set, and taking the average evaluation result of the K models on their validation sets as the cross-validation error;
Step 5.3.3: calculating the cross-validation results of the model under the different hyperparameter combinations, and outputting the combination with the smallest cross-validation error as the optimal hyperparameters;
Step 5.3.4: retraining the model with the optimal hyperparameter combination and validating it on the held-out test data to obtain the optimal model.
7. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 1, wherein the importance of each soil feature and pipeline feature in step 6 is the total number of times the feature is used as a split attribute across all decision trees, and a feature whose total exceeds a set threshold is judged to be an important feature.
8. The method for predicting pitting corrosion of the outer surface of a buried pipeline and analyzing factors according to claim 1, wherein the accumulated local effect (ALE) value in step 7 is calculated as follows: the distribution range of each feature is divided into K subintervals, and the ALE value of the feature in each subinterval is calculated using the following formula; a line graph is drawn from the calculated ALE values, which reflects how the pitting depth changes with the feature;
where $\mathrm{ALE}(j, x)$ is the uncentered ALE value of feature $j$; $x$ denotes the features other than feature $j$; $k_j(x)$ is the number of intervals into which feature $j$ is divided; $n_j(k)$ is the number of samples in the $k$-th interval; $i$ denotes the $i$-th sample point in the $k$-th interval; $Z_{k,j}$ is the upper boundary value of feature $j$ in the $k$-th interval; $Z_{k-1,j}$ is the lower boundary value of feature $j$ in the $k$-th interval; and $i: x_j^{i} \in n_j(k)$ denotes the $i$-th sample point in the $k$-th interval.
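The preprocessing recited in steps 3.1 and 3.2 can be sketched with the Python standard library alone. The function names, the 1.5 × IQR whisker rule, the choice to compute the mode over the non-outlier values, and the sample numbers are assumptions made for illustration; the claims do not fix these details.

```python
import statistics


def replace_outliers_with_mode(values, whisker=1.5):
    """Flag values outside [Q1 - w*IQR, Q3 + w*IQR] and replace them with the mode."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    inliers = [v for v in values if low <= v <= high]
    mode = statistics.mode(inliers)  # mode taken over inliers only (an assumption)
    return [v if low <= v <= high else mode for v in values]


def one_hot(values, prefix):
    """Turn a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


# Made-up pH readings; 14.9 is the value the box-plot rule should flag.
ph = [6.1, 6.3, 6.1, 6.5, 6.2, 6.1, 6.4, 14.9]
cleaned = replace_outliers_with_mode(ph)

# Made-up soil type codes, encoded with the "ct" prefix used for soil types.
encoded = one_hot(["C", "CL", "C", "SC"], prefix="ct")
```

Here `cleaned` keeps the seven plausible readings and replaces the last one with the mode 6.1, and each entry of `encoded` holds one indicator per soil type.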
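The redundancy filter of step 5.2 can be sketched as follows. The greedy rule (keep a feature unless it is highly correlated with one already kept), the use of the absolute value of r, and the sample columns are assumptions; the claim only states that one feature of a highly correlated pair is removed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_redundant(columns, threshold=0.7):
    """Keep a feature only if |r| with every already kept feature is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson_r(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up columns: "sc" is almost proportional to "cc", "pH" is not.
data = {
    "cc": [10.0, 20.0, 30.0, 40.0, 50.0],
    "sc": [12.0, 19.0, 33.0, 41.0, 48.0],
    "pH": [6.0, 7.5, 5.5, 7.0, 6.5],
}
kept = drop_redundant(data)
```

With these numbers `sc` is dropped because its correlation with `cc` is far above 0.7, while `pH` is kept. Which member of a correlated pair to remove is a design choice; this sketch simply keeps whichever column appears first.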
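The 80%/20% split of step 5 and the grid search with K-fold cross-validation of steps 5.3.1 to 5.3.4 can be sketched as below. To keep the example self-contained, a closed-form ridge regression stands in for the XGBoost model, and the `alpha` grid, the synthetic data, and all function names are hypothetical.

```python
import itertools

import numpy as np


def fit_ridge(X, y, alpha):
    """Stand-in model (NOT XGBoost): closed-form ridge regression weights."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def rmse(X, y, w):
    """Root-mean-square error of the linear prediction X @ w."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))


def grid_search_cv(X, y, grid, k=5, seed=0):
    """Return the parameter combination with the smallest K-fold CV error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    best_params, best_error = None, np.inf
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        errors = []
        for f in range(k):
            val = folds[f]  # each part serves once as the validation set
            trn = np.concatenate([folds[g] for g in range(k) if g != f])
            w = fit_ridge(X[trn], y[trn], **params)
            errors.append(rmse(X[val], y[val], w))
        cv_error = float(np.mean(errors))  # average over the K models
        if cv_error < best_error:
            best_params, best_error = params, cv_error
    return best_params, best_error


# Synthetic regression data used only to exercise the procedure.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

order = rng.permutation(200)
train, test = order[:160], order[160:]  # random 80% / 20% split
best_params, cv_error = grid_search_cv(
    X[train], y[train], {"alpha": [0.01, 1.0, 100.0]}
)
w = fit_ridge(X[train], y[train], **best_params)  # retrain on the full training set
test_rmse = rmse(X[test], y[test], w)             # evaluate on the held-out test set
```

The test set is touched only once, after the hyperparameters have been chosen, so `test_rmse` remains an independent estimate of predictive performance.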
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311346348.1A CN117313041A (en) | 2023-10-17 | 2023-10-17 | Method for predicting pitting corrosion of outer surface of buried pipeline and analyzing factors |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117313041A (en) | 2023-12-29 |
Family ID: 89286452
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311346348.1A Pending CN117313041A (en) | 2023-10-17 | 2023-10-17 | Method for predicting pitting corrosion of outer surface of buried pipeline and analyzing factors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117313041A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117589663A (en) * | 2024-01-18 | 2024-02-23 | 西南石油大学 | Residual life prediction method for nonmetallic pipeline of oil-gas field |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109284877A (en) * | 2018-11-19 | 2019-01-29 | 福州大学 | Based on AIGA-WLSSVM Buried Pipeline rate prediction method |
US20210319265A1 (en) * | 2020-11-02 | 2021-10-14 | Zhengzhou University | Method for segmentation of underground drainage pipeline defects based on full convolutional neural network |
CN116227166A (en) * | 2023-01-13 | 2023-06-06 | 中国地质大学(武汉) | Method and device for calculating explosion vibration speed of corrosion metal pipelines in different operating years |
Non-Patent Citations (1)
Title |
---|
YUHUI SONG等: "Interpretable machine learning for maximum corrosion depth and influence factor analysis", MATERIALS DEGRADATION, 28 February 2023 (2023-02-28), pages 1 - 15 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117589663A (en) * | 2024-01-18 | 2024-02-23 | 西南石油大学 | Residual life prediction method for nonmetallic pipeline of oil-gas field |
CN117589663B (en) * | 2024-01-18 | 2024-03-19 | 西南石油大学 | Residual life prediction method for nonmetallic pipeline of oil-gas field |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||