CN117349612A - Drainage pipeline maximum corrosion depth prediction method based on LightGBM - Google Patents


Info

Publication number
CN117349612A
Authority
CN
China
Prior art keywords
pipeline
lightgbm
data
drainage pipeline
corrosion depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311159187.5A
Other languages
Chinese (zh)
Inventor
方宏远
王念念
宋留洋
李斌
翟科杰
杜威仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University
Priority to CN202311159187.5A
Publication of CN117349612A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/14 Pipes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Testing Resistance To Weather, Investigating Materials By Mechanical Methods (AREA)

Abstract

The invention discloses a method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM, which comprises the following steps: acquiring related data of a drainage pipeline in service; preprocessing the collected drainage pipeline data to improve the quality of the data set; performing feature dimension reduction on the preprocessed drainage pipeline data set by principal component analysis (PCA), and comprehensively extracting features that reflect the condition of the pipeline; constructing a LightGBM model for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM algorithm; optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA), and selecting the hyperparameter combination with the highest model prediction accuracy; predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result. The invention provides technical support for the safe maintenance of drainage pipelines.

Description

Drainage pipeline maximum corrosion depth prediction method based on LightGBM
Technical Field
The invention relates to the interdisciplinary technical field of machine learning and pipeline engineering, in particular to a method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM.
Background
The drainage pipeline is an important component of urban infrastructure and is used for discharging sewage and wastewater and maintaining urban environmental sanitation. However, due to long-term use and influence of external environment, the drainage pipeline is easy to corrode, age and the like, so that the service life and normal drainage function of the drainage pipeline are influenced, even serious pipeline leakage and collapse accidents are caused, and the normal life of people is influenced.
The corrosion depth of the drainage pipeline is an important index for measuring the degree of damage to the pipeline: the greater the corrosion depth, the lower the strength and sealing performance of the pipeline, which can even lead to pipeline breakage and water leakage. Therefore, timely and accurate prediction of the corrosion depth of the drainage pipeline is of great significance for the safe operation and repair of the pipeline.
In the prior art, the maximum corrosion depth of a drainage pipeline is mostly predicted with finite element calculation or a traditional BP neural network. These traditional algorithms have large errors and are computationally complex, so the predicted maximum corrosion depth is easily larger or smaller than the actual value. As a result, the pipeline is overhauled too early, causing unnecessary economic loss, or too late, leading to pipeline accidents.
Disclosure of Invention
The invention aims to provide a method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM. The method models with the machine learning algorithm LightGBM and optimizes its hyperparameters with the whale optimization algorithm, which improves the prediction accuracy of the machine learning model and addresses the problems described above.
In order to achieve the above purpose, the invention adopts the following technical scheme: a method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM, comprising the following steps:
acquiring related data of a drainage pipeline in service;
preprocessing the collected drainage pipeline data to improve the quality of the data set;
performing feature dimension reduction on the preprocessed drainage pipeline data set by principal component analysis (PCA), and comprehensively extracting features that reflect the condition of the pipeline;
constructing a LightGBM model for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM algorithm;
optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA), and selecting the hyperparameter combination with the highest model prediction accuracy;
and predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result.
As a further improvement of the present invention, the drainage pipeline related data includes construction and maintenance records of the pipeline, basic data, corrosion data, internal monitoring data and external environment data, and specifically includes:
pipeline construction and maintenance records: building year, material and maintenance records;
pipeline base data: pipe diameter, wall thickness, burial depth and design service life;
pipeline corrosion data: corrosion length, corrosion width, maximum corrosion depth;
pipeline internal monitoring data: water pressure and strain within a set time;
external environment data: temperature, humidity, rainfall and groundwater level within a set time.
As a further improvement of the invention, the data preprocessing of the collected drainage pipeline related data is specifically as follows:
performing preliminary analysis and processing on the collected pipeline data set, including missing value processing, outlier processing and data standardization.
As a further improvement of the invention, the feature dimension reduction of the drainage pipeline data set after pretreatment by utilizing the Principal Component Analysis (PCA) algorithm is specifically as follows:
while retaining as much of the original information about the drainage pipeline as possible, the dimension of the pipeline-related feature variables is reduced: the eigenvectors corresponding to the N largest eigenvalues are retained, and the original pipeline feature variables are transformed into the new space spanned by these N eigenvectors, completing the dimension reduction of the data set.
As a further improvement of the invention, a LightGBM model for constructing the drain pipeline maximum corrosion depth prediction based on the LightGBM algorithm is specifically as follows:
randomly dividing the pipeline data set after PCA dimension reduction into a training set and a test set according to a certain proportion;
and constructing a model with the LightGBM algorithm to predict the maximum corrosion depth of the drainage pipeline.
As a further improvement of the invention, the hyperparameters of the LightGBM model are optimized with the whale optimization algorithm (WOA), specifically as follows:
searching the hyperparameters of LightGBM with the whale optimization algorithm, namely max_depth, learning_rate, n_estimators, num_leaves and feature_fraction, to find the combination of hyperparameter values that minimizes the model prediction error on the training set.
As a further improvement of the invention, the optimized LightGBM model is used for predicting the maximum corrosion depth of the pipeline and carrying out error assessment on the predicted result, specifically as follows:
predicting the maximum corrosion depth of the drainage pipeline on the test set with the optimal hyperparameter combination found by the search;
performing error analysis on the prediction results for the test set and calculating 6 error indexes, namely the explained variance (EV), the coefficient of determination (R^2), the adjusted coefficient of determination (adjusted_R^2), the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE).
The beneficial effects of the invention are as follows:
The prediction of the maximum corrosion depth of the drainage pipeline based on LightGBM is realized by means of data preprocessing, PCA dimension reduction, LightGBM modeling, the whale optimization algorithm and error analysis. The invention aims to provide a high-precision prediction method that accurately predicts the corrosion state of a buried drainage pipeline, has a working process that is easy to understand, comprehensively evaluates the working condition of the pipeline, and makes the prediction of the maximum corrosion depth efficient and intelligent. It overcomes the shortcomings of current corrosion prediction methods for buried drainage pipelines. On this basis, the invention has the following advantages:
(1) The working process is simple and clear, the understanding is convenient, and the calculated amount is moderate. The work flow of the drainage pipeline maximum corrosion depth prediction method based on the LightGBM is divided into 6 steps in total, each step is simple and clear, the method is easy to understand, and complicated calculation steps and huge calculation amount are not needed.
(2) The working condition of the pipeline is evaluated comprehensively. Multiple kinds of drainage pipeline data are collected, including construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data. The pipeline-related variables include construction year, material, maintenance records, pipe diameter, wall thickness, burial depth, design service life, corrosion length, corrosion width, water pressure, strain, temperature, humidity, rainfall, groundwater level and maximum corrosion depth, covering all aspects of the drainage pipeline.
(3) The calculation is efficient and intelligent. After the data set is collected, the model only needs to be programmed in Python. Data preprocessing, PCA dimension reduction, construction of the LightGBM model, the whale optimization algorithm and the final error analysis are all realized through programming, so no tedious manual calculation is required.
(4) The prediction accuracy is high and the generalization error is small. PCA dimension reduction projects the original data onto the principal components to obtain a new data set, which reduces the dimension and complexity of the data while retaining most of its information and improves calculation accuracy to a certain extent. The maximum corrosion depth prediction model of the buried drainage pipeline is built with the LightGBM algorithm, which performs well in machine learning prediction tasks, and the addition of the whale optimization algorithm for hyperparameter tuning allows the model to reach high prediction accuracy and low generalization error.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
fig. 2 is a flowchart of the LightGBM model predictive process according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1 and fig. 2, a method for predicting the maximum corrosion depth of a drain pipeline based on a LightGBM can accurately predict the corrosion depth of the drain pipeline in service, and scientifically evaluate the corrosion state of the pipeline, and includes the following steps:
S1, acquiring relevant data of a drainage pipeline in service, including construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline;
S2, preprocessing the collected drainage pipeline data to improve the quality of the data set;
S3, performing feature dimension reduction on the preprocessed drainage pipeline data set with the PCA (principal component analysis) algorithm, and comprehensively extracting features that reflect the condition of the pipeline;
S4, constructing a drainage pipeline maximum corrosion depth prediction model based on the LightGBM algorithm;
S5, optimizing the hyperparameters of the LightGBM model with the Whale Optimization Algorithm (WOA), and selecting the hyperparameter combination with the highest model prediction accuracy;
and S6, predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result.
The step S1 of acquiring relevant data of the service drainage pipeline, including construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline, includes:
(1) checking the working condition and corrosion condition of the drainage pipeline in service;
(2) performing an in-situ survey of the drainage pipeline in the sampling area, and collecting the pipeline data set comprises:
construction and maintenance records of the pipeline: collecting construction year, material and maintenance records of the pipeline;
pipeline base data: collecting each item of basic data of the pipeline, wherein the basic data comprise pipe diameter, wall thickness, burial depth and design service life;
pipeline corrosion data: collecting the data of the corrosion condition of the pipeline, wherein the data comprise corrosion length, corrosion width and maximum corrosion depth;
pipeline internal monitoring data: collecting the water pressure and strain in the pipeline within a set time;
external environment data: and collecting the temperature, humidity, rainfall and underground water level outside the environment where the pipeline is located.
The step S2 of performing data preprocessing on the collected drainage pipeline related data to improve the quality of the data set includes:
(1) missing value processing: filling the missing values, which can be replaced by the mean, median or mode;
(2) outlier processing: identifying outliers in the data set with simple statistics, the 3σ rule or the box plot method, and then either deleting them or filling them in the same way as missing values;
(3) data standardization: standardizing the variables so that their numerical ranges are consistent, which facilitates model optimization and evaluation.
The feature dimension reduction performed in step S3 on the preprocessed drainage pipeline data set with the PCA (principal component analysis) algorithm, which comprehensively extracts features that reflect the pipeline condition, includes:
(1) calculating a covariance matrix;
(2) calculating eigenvalues and eigenvectors of the covariance matrix;
(3) sorting the eigenvalues to select principal components;
(4) and converting the data to realize characteristic dimension reduction compression.
The constructing a drainage pipeline maximum corrosion depth prediction model based on the LightGBM algorithm in step S4 includes:
(1) the pipeline data set after PCA dimension reduction is randomly divided into a training set and a test set at a ratio of 8:2;
(2) a LightGBM regression prediction model is established in Python to predict the maximum corrosion depth of the drainage pipeline, wherein the input variables are the N variables extracted by PCA dimension reduction and the output variable is the maximum corrosion depth of the pipeline.
In step S5, optimizing the hyperparameters of the LightGBM model with the Whale Optimization Algorithm (WOA) and selecting the hyperparameter combination with the highest model prediction accuracy includes:
(1) searching the hyperparameters of LightGBM with the whale optimization algorithm, namely max_depth, learning_rate, n_estimators, num_leaves and feature_fraction, with a specific search range set for each hyperparameter;
(2) after several iterations, finding the combination of hyperparameter values that minimizes the model prediction error (RMSE) on the training set;
the predicting the maximum corrosion depth of the pipeline by using the optimized LightGBM model and performing error assessment on the predicted result in step S6 includes:
(1) predicting the maximum corrosion depth of the pipeline by using the optimal super-parameter combination searched by using a whale optimization algorithm on the test set;
(2) error analysis is carried out on the prediction results for the test set, and the error indexes are calculated, namely the explained variance (EV), the coefficient of determination (R^2), the adjusted coefficient of determination (adjusted_R^2), the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE).
In this embodiment, the machine learning algorithm LightGBM is used to predict the maximum corrosion depth of a drainage pipeline in service. Combining it with PCA and the whale optimization algorithm improves the prediction accuracy of the model and enables accurate prediction of the maximum corrosion depth of the drainage pipeline.
Example 2
As shown in fig. 1 and fig. 2, a method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM includes: collecting related data of the drainage pipeline, preprocessing the data, reducing the dimension by PCA, constructing a LightGBM prediction model, optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm, and analyzing the errors of the prediction results. The specific implementation is as follows:
s1: and acquiring relevant data of the drainage pipeline in service, wherein the relevant data comprise construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline.
S11: by referring to the data, the factors related to the corrosion of the drainage pipeline mainly comprise pipeline construction factors, pipeline self basic information, pipeline internal conditions, pipeline external environment factors and the like;
S12: determining a suitable sampling area and sampling pipeline, and collecting the pipeline-related variables, including:
construction and maintenance records of the pipeline: searching data of the initial stage of pipeline construction, and collecting construction year, material and maintenance record of the pipeline;
pipeline base data: collecting each item of basic data of the pipeline, wherein the basic data comprise pipe diameter, wall thickness, burial depth and design service life;
pipeline corrosion data: collecting the data of the corrosion condition of the pipeline, wherein the data comprise corrosion length, corrosion width and maximum corrosion depth;
pipeline internal monitoring data: the method comprises the steps that a plurality of sensors are arranged in a pipeline, each sensor comprises a water pressure sensor and a strain sensor, and water pressure and strain in the pipeline are collected within a set time;
external environment data: and acquiring data of the area where the pipeline is located in terms of climate, hydrology, geology and the like, and acquiring temperature, humidity, rainfall and groundwater level outside the environment where the pipeline is located.
S13: 300 groups of different pipeline data are collected in a sampling area and used as an original data set of the experiment.
S2: and data preprocessing is carried out on the collected drainage pipeline related data, so that the quality of a data set is improved.
S21: The data set is stored in csv or xlsx format and imported into Python for processing; missing value processing, outlier processing and data standardization are then carried out in turn.
S22: The missing value processing method is as follows: the missing values are filled, and can be replaced by the mean, median or mode.
The outlier processing method is as follows: outliers in the data set are identified with simple statistics, the 3σ rule or the box plot method, and are then either deleted or filled in the same way as missing values. The 3σ rule requires the data to follow a normal distribution; a value that deviates from the mean by more than 3 standard deviations is regarded as an outlier. The box plot method detects outliers using the interquartile range (IQR) of the box plot.
The data standardization method is as follows: standardization eliminates the influence of large differences in scale between features, scaling every feature dimension to the same standard so that different data become comparable. After the missing values and outliers have been processed, the z-score method is used to standardize the data, scaling each feature to a distribution centered on 0 with a standard deviation of 1. This method preserves the information in the original data and does not change the type of the original distribution. The z-score formula is shown below:
$$X_{new} = \frac{X_{old} - \mu}{\sigma} \quad (1)$$
where μ is the vector of the means of the column features of the original data set, μ = mean(X_old), and σ is the vector of the standard deviations of the column features of the original data set, σ = std(X_old).
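The preprocessing in S21 and S22 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the median is used as the fill value (the text equally allows the mean or mode), the IQR multiplier 1.5 is the conventional box plot choice rather than a value given in the text, and every column is assumed to have non-zero standard deviation.

```python
import numpy as np

def fill_missing_with_median(X):
    """Replace NaN entries in each column with that column's median."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def outliers_3sigma(X):
    """Boolean mask of values more than 3 standard deviations from the column mean."""
    X = np.asarray(X, dtype=float)
    return np.abs(X - X.mean(axis=0)) > 3 * X.std(axis=0)

def outliers_iqr(X, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (box plot rule)."""
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return (X < q1 - k * iqr) | (X > q3 + k * iqr)

def zscore(X):
    """Column-wise z-score: (X_old - mean) / std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical toy data: two features, one missing value.
raw = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
clean = fill_missing_with_median(raw)
standardized = zscore(clean)
```

Flagged outliers can then be deleted, or set to NaN and passed through `fill_missing_with_median` again, which mirrors the "fill in the same way as missing values" option.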
S3: and (3) performing characteristic dimension reduction on the drainage pipeline data set after pretreatment by using a PCA (principal component analysis) algorithm, and comprehensively extracting characteristics capable of reflecting the pipeline condition.
S31: the pre-processed pipe dataset was imported in IBM SPSS software for PCA dimension reduction. First, a covariance matrix is calculated: a covariance matrix is calculated on the normalized data, with the elements in the matrix representing the correlation between the two features. The element on the diagonal of the covariance matrix is the variance of each feature and the element on the off-diagonal is the covariance between the two features.
S32: calculating eigenvalues and eigenvectors: and carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvalues and eigenvectors. The eigenvector represents the principal direction of the data, the eigenvalue represents the magnitude of the variance of the data in this direction, and a larger eigenvalue describes the more important features.
S33: Selecting the principal components: the eigenvectors are sorted by their corresponding eigenvalues in descending order, and the first k eigenvectors are selected as the principal components, where k is the dimension after dimension reduction.
S34: converting data: and projecting the original data onto the k principal components selected in the last step to obtain a new data set with the dimension of k.
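Steps S31 to S34 can be sketched with NumPy as follows. The embodiment performs this step in IBM SPSS, so the code is only an illustrative equivalent, not the procedure actually used; the function name `pca_reduce`, the synthetic data and the choice k = 3 are assumptions.

```python
import numpy as np

def pca_reduce(X, k):
    """PCA by eigen-decomposition of the covariance matrix (S31-S34)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # S31: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # S32: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]        # S33: sort by eigenvalue, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]              # first k eigenvectors
    explained_ratio = eigvals[:k].sum() / eigvals.sum()
    return Xc @ components, components, explained_ratio  # S34: projection

# Synthetic stand-in for a standardized pipeline data set (300 samples, 6 features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
Z_demo, W_demo, ratio_demo = pca_reduce(X_demo, k=3)
```

The returned `explained_ratio` is the share of total variance kept by the k components, which is one common way to choose k.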
S4: and constructing a drainage pipeline maximum corrosion depth prediction model based on a LightGBM algorithm.
S41: The new pipeline data set after PCA dimension reduction is randomly divided into a training set and a test set at a ratio of 8:2, containing 240 groups and 60 groups of data respectively.
S42: The model is built in PyCharm, and the required libraries are installed, including pandas, numpy, scikit-learn, etc. The pipeline data set in xlsx format is imported into the model; the input variables of the model are the k principal components after PCA dimension reduction, and the output variable is the maximum corrosion depth of the pipeline.
S43: The prediction of the maximum corrosion depth of a drainage pipeline is a regression problem. Let the maximum corrosion depth of the drainage pipeline be y and the corrosion-related variables be $X_1, X_2, X_3, \ldots, X_N$; the input-output relationship of the model can be expressed as:
$$f(X_1, X_2, X_3, \ldots, X_N) = y \quad (2)$$
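The weak learner (regression tree) of the LightGBM model can be represented as $T_{q(x)}$, $q(x) \in \{1, 2, 3, \ldots, J\}$, where $T$ is the weight vector of the leaf nodes, $q(x)$ maps a sample to its leaf, and $J$ is the number of leaves in the regression tree. The final fitted model obtained by integrating K regression trees can be expressed as:
$$F_K(x) = \sum_{m=1}^{K} f_m(x), \quad f_m(x) = T_{q(x)} \quad (3)$$
According to the forward stagewise algorithm, the t-th tree is generated using the information of the preceding t-1 trees. After t iterations, the objective function can be expressed as:
$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) + \sum_{m=1}^{t} \Omega\left(f_m(x)\right) \quad (4)$$
where $L$ is the loss function, $n$ is the number of training samples, and $\Omega(f_m(x))$ is the regularization term, which represents model complexity and is added to avoid overfitting of the model. Applying a second-order Taylor expansion to the objective function, the empirical loss term can be expressed as:
$$L\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) \approx L\left(y_i, F_{t-1}(x_i)\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \quad (5)$$
Dropping the terms that do not depend on $f_t$, the objective function after expansion is:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega\left(f_t(x)\right) \quad (6)$$
where, for $j = 1, 2, 3, \ldots, J$, the sample set contained in the j-th leaf node of the t-th regression tree is:
$$I_j = \{ i \mid q(x_i) = j \} \quad (7)$$
With the structure of the regression tree given by q(x) and the regularization term taken in its usual form $\Omega(f_t) = \gamma J + \frac{1}{2} \lambda \sum_{j=1}^{J} T_j^2$, the corresponding objective function is:
$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{J} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma J \quad (8)$$
This value is attained at $T_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$, the optimal weight score of each leaf node, and it is the quantity the model seeks to minimize. Here $g_i$ is the first derivative of the loss function and $h_i$ is its second derivative, both taken with respect to the current prediction $F_{t-1}(x_i)$. The split gain of the leaf nodes of the regression tree is evaluated over multiple iterations, and splitting continues until the stopping condition is met, so that the split with maximum gain is found. The information gain after splitting can be expressed as:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \quad (9)$$
where K is the total number of regression trees in the final model, L and R denote the left and right child nodes produced by the split, and $G$ and $H$ are the sums of $g_i$ and $h_i$ over the samples in the corresponding node.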
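To make the split-gain idea concrete, the sketch below evaluates the standard second-order gain used by gradient-boosted trees, Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ, where G and H are sums of first and second derivatives in a node. Several things here are assumptions for illustration: the function names, the L2 penalty λ and per-leaf penalty γ, the squared-error loss L = ½(y − ŷ)² (so g = ŷ − y and h = 1), and the four-sample toy data. It does not reproduce LightGBM's histogram-based implementation.

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight -G / (H + lambda) for the samples in one leaf."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Second-order gain of splitting a node into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    GL, HL = g[left_mask].sum(), h[left_mask].sum()
    GR, HR = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Toy example: targets form two clear groups, current prediction is 0 everywhere.
y = np.array([1.0, 1.0, 5.0, 5.0])
y_hat = np.zeros_like(y)
g = y_hat - y            # first derivative of 0.5 * (y - y_hat)^2
h = np.ones_like(y)      # second derivative of the same loss

good_split = np.array([True, True, False, False])   # separates the two groups
bad_split = np.array([True, False, True, False])    # mixes them

gain_good = split_gain(g, h, good_split, lam=0.0)
gain_bad = split_gain(g, h, bad_split, lam=0.0)
left_value = leaf_weight(g[good_split], h[good_split], lam=0.0)
```

With λ = 0 the split that separates the two groups has a positive gain while the mixed split has zero gain, and the left leaf's optimal weight equals the mean residual of its samples.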
S5: and optimizing and selecting the super parameters of the LightGBM model by using a Whale Optimization Algorithm (WOA), and preferably selecting the super parameter combination with the highest model prediction accuracy.
S51: the basic principle of the whale optimization algorithm is to solve the optimization problem by simulating predation behaviors of whale groups, and larger whales in the groups have higher detection capability and are easier to find food. Each individual is regarded as whale, and the information of the better whale individual is used for guiding the searching direction and distance, so that a better optimizing effect is achieved.
S52: The whale optimization algorithm comprises the following steps:
(1) Initializing the population: N individuals are randomly generated as the initial population.
(2) Calculating the fitness: the fitness of each whale individual is evaluated.
(3) Setting parameters: the parameters required by the algorithm are set, including the maximum number of iterations $t_{\max}$, the search range, etc.
(4) Updating the optimal solution and the positions according to fitness: the individual with the best fitness in the current population is selected as the global optimal solution, and new positions are calculated.
(5) Updating the search range: the search range is updated according to the current iteration number t and the maximum number of iterations $t_{\max}$.
(6) Judging whether the termination condition is satisfied: if yes, the algorithm ends; otherwise, return to step (4).
Through continuous iteration, the WOA algorithm keeps improving the positions of the individuals and finally finds the optimal solution.
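The steps above can be sketched as follows. This is a minimal illustration that assumes the commonly published WOA update rules (encircling the best whale, random exploration, and a spiral move); the function name `woa_minimize`, the spiral constant `b`, and the toy sphere fitness are assumptions and do not come from the text. For hyperparameter search, the fitness function would instead decode a position into LightGBM settings and return a prediction error.

```python
import math
import random


def woa_minimize(fitness, bounds, n_whales=20, t_max=100, b=1.0, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic whale optimization loop."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x, d):
        return min(max(x, bounds[d][0]), bounds[d][1])

    # (1) initialize the population, (2) evaluate the fitness
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    scores = [fitness(w) for w in whales]
    best_i = min(range(n_whales), key=scores.__getitem__)
    best, best_score = whales[best_i][:], scores[best_i]

    for t in range(t_max):
        # (5) the search range coefficient shrinks linearly from 2 to 0
        a = 2.0 * (1.0 - t / t_max)
        for i, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # encircle the best whale when |A| < 1, otherwise explore around a random whale
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[d] - A * abs(C * ref[d] - w[d]) for d in range(dim)]
            else:
                # spiral move toward the best whale
                l = rng.uniform(-1.0, 1.0)
                new = [abs(best[d] - w[d]) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + best[d]
                       for d in range(dim)]
            whales[i] = [clip(new[d], d) for d in range(dim)]
        # (4) update the global optimal solution
        for w in whales:
            s = fitness(w)
            if s < best_score:
                best, best_score = w[:], s
    # (6) terminate after t_max iterations
    return best, best_score


def sphere(x):
    return sum(v * v for v in x)


best, score = woa_minimize(sphere, [(-5.0, 5.0)] * 2)
print(best, score)
```

Because the best score is only replaced by a strictly better one, the returned score can never be worse than the best individual of the initial population.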
S53: The optimal hyperparameter values of the LightGBM model found by the whale optimization algorithm are: max_depth=6, learning_rate=0.0142, n_estimators=350, num_leaves=10, feature_fraction=1.
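For reference, the values reported in S53 can be collected into a parameter dictionary using the standard LightGBM parameter spellings. The variable name `lgbm_params` is hypothetical, and passing the dictionary to `lightgbm.LGBMRegressor` is an assumption about how such a model might be built, not a statement of the authors' implementation.

```python
# Hyperparameter values reported in S53, keyed by standard LightGBM parameter names.
lgbm_params = {
    "max_depth": 6,
    "learning_rate": 0.0142,
    "n_estimators": 350,
    "num_leaves": 10,
    "feature_fraction": 1.0,
}

# Sanity check: a tree limited to depth d can hold at most 2**d leaves,
# so num_leaves should not exceed that bound.
max_leaves = 2 ** lgbm_params["max_depth"]
print(lgbm_params["num_leaves"] <= max_leaves)

# Assumed usage (requires the lightgbm package):
#   model = lightgbm.LGBMRegressor(**lgbm_params)
```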
S6: Predict the maximum corrosion depth of the pipeline using the optimized LightGBM model, and evaluate the error of the prediction results.
S61: Use the optimized LightGBM model to predict the maximum corrosion depth for the 60 samples of the test set.
S62: The explained variance (EV), goodness of fit (R²), adjusted coefficient of determination (Adjusted_R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as evaluation indices for the model prediction results. EV, R² and Adjusted_R² reflect how well the model predictions fit the sample values, while RMSE, MAE and MAPE measure the magnitude of the prediction error.
S63: The mathematical expression of each error index is as follows:
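Explained variance: $EV = 1 - \dfrac{\sum_{i=1}^{n}\left[(y_i - \hat{y}_i) - (\bar{y} - \bar{\hat{y}})\right]^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, value range: $EV \in [0, 1]$;
Coefficient of determination: $R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, value range: $R^2 \in [0, 1]$;
Adjusted coefficient of determination: $Adjusted\_R^2 = 1 - \dfrac{(1 - R^2)(n - 1)}{n - p - 1}$, value range: $Adjusted\_R^2 \in [0, 1]$;
Root mean square error: $RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, value range: $RMSE \in [0, +\infty)$;
Mean absolute error: $MAE = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$, value range: $MAE \in [0, +\infty)$;
Mean absolute percentage error: $MAPE = \dfrac{100\%}{n}\sum_{i=1}^{n}\left|\dfrac{y_i - \hat{y}_i}{y_i}\right|$, value range: $MAPE \in [0, +\infty)$.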
Here n represents the total number of samples; p represents the total number of features; $y_i$ represents the true value of the sample; $\hat{y}_i$ represents the model predicted value; $\bar{y}$ represents the arithmetic mean of the true values; $\bar{\hat{y}}$ represents the arithmetic mean of the predicted values.
For EV, R² and Adjusted_R², larger values within their respective ranges are better and indicate higher model accuracy;
For RMSE, MAE and MAPE, smaller values are better; smaller values indicate a lower generalization error and better robustness of the model.
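The six indices can be computed as in the sketch below, which assumes the standard textbook definitions and reports MAPE in percent. The function name `regression_metrics` and the toy data are illustrative assumptions and are not results from the patent.

```python
import math


def regression_metrics(y_true, y_pred, p):
    """Return EV, R2, Adjusted_R2, RMSE, MAE and MAPE (in percent).

    p is the number of features used by the model; y_true must contain no zeros
    for MAPE to be defined.
    """
    n = len(y_true)
    y_bar = sum(y_true) / n
    res = [yt - yp for yt, yp in zip(y_true, y_pred)]  # residuals y_i - y_hat_i
    res_bar = sum(res) / n                             # equals y_bar minus mean prediction
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    ss_res = sum(r * r for r in res)

    r2 = 1.0 - ss_res / ss_tot
    return {
        "EV": 1.0 - sum((r - res_bar) ** 2 for r in res) / ss_tot,
        "R2": r2,
        "Adjusted_R2": 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in res) / n,
        "MAPE": 100.0 * sum(abs(r / yt) for r, yt in zip(res, y_true)) / n,
    }


# Toy data: every prediction is too high by 0.5.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5], p=1)
for name, value in metrics.items():
    print(name, round(value, 4))
```

A constant bias leaves EV unchanged but lowers R², which is one reason to report both indices.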
S64: Table 1 below lists the error index scores of the model's prediction results. The errors of the model are small and its prediction accuracy is high, so the method can predict the corrosion depth of drainage pipelines well.
Table 1. Error index values
The foregoing examples merely illustrate specific embodiments of the invention; although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the invention.

Claims (7)

1. A method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM, characterized by comprising the following steps:
acquiring related data of a drainage pipeline in service;
preprocessing the collected drainage pipeline related data to improve the quality of the data set;
performing feature dimension reduction on the preprocessed drainage pipeline data set using principal component analysis (PCA), and extracting features that comprehensively reflect the condition of the pipeline;
constructing a LightGBM model for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM algorithm;
optimizing the hyperparameters of the LightGBM model using the whale optimization algorithm (WOA), and selecting the hyperparameter combination with the highest model prediction accuracy;
predicting the maximum corrosion depth of the pipeline using the optimized LightGBM model, and performing error evaluation on the prediction result.
2. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 1, wherein the drainage pipeline related data include pipeline construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data, specifically:
pipeline construction and maintenance records: year of construction, material and maintenance records;
pipeline basic data: pipe diameter, wall thickness, burial depth and design service life;
pipeline corrosion data: corrosion length, corrosion width and maximum corrosion depth;
pipeline internal monitoring data: water pressure and strain over a set time period;
external environment data: temperature, humidity, rainfall and groundwater level over a set time period.
3. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 1, wherein the data preprocessing of the collected drainage pipeline related data is specifically:
performing preliminary analysis and processing on the collected pipeline data set, including missing value processing, outlier processing and data standardization.
4. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 1, wherein the feature dimension reduction of the preprocessed drainage pipeline data set using principal component analysis (PCA) is specifically:
while ensuring that the original information content of the drainage pipeline data is not lost, reducing the dimension of the pipeline-related feature variables: the eigenvectors corresponding to the N largest eigenvalues are retained, and the original pipeline-related feature variables are transformed into the new space spanned by these N eigenvectors, completing the dimension reduction of the data set.
5. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 1 or 4, wherein constructing the LightGBM model for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM algorithm is specifically:
randomly dividing the pipeline data set after PCA dimension reduction into a training set and a test set according to a certain proportion;
constructing an intelligent model using the LightGBM algorithm to predict the maximum corrosion depth of the drainage pipeline.
6. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 5, wherein optimizing the hyperparameters of the LightGBM model using the whale optimization algorithm (WOA) is specifically:
searching the hyperparameters of the LightGBM model using the whale optimization algorithm, namely max_depth, learning_rate, n_estimators, num_leaves and feature_fraction, to find the combination of hyperparameter values that minimizes the model prediction error on the training set.
7. The method for predicting the maximum corrosion depth of a drainage pipeline based on LightGBM according to claim 6, wherein predicting the maximum corrosion depth of the pipeline using the optimized LightGBM model and performing error evaluation on the prediction result is specifically:
predicting the maximum corrosion depth of the drainage pipeline on the test set using the optimal hyperparameter combination found by the search;
performing error analysis on the model's test set predictions and calculating 6 error indices, namely the explained variance EV, the goodness of fit R², the adjusted coefficient of determination Adjusted_R², the root mean square error RMSE, the mean absolute error MAE and the mean absolute percentage error MAPE.
CN202311159187.5A 2023-09-09 2023-09-09 Drainage pipeline maximum corrosion depth prediction method based on LightGBM Pending CN117349612A (en)

Publications (1)

Publication Number Publication Date
CN117349612A true CN117349612A (en) 2024-01-05

Family

ID=89370060

Country Status (1)

Country Link
CN (1) CN117349612A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination