CN117349612A

CN117349612A - Drainage pipeline maximum corrosion depth prediction method based on LightGBM

Info

Publication number: CN117349612A
Application number: CN202311159187.5A
Authority: CN
Inventors: 方宏远; 王念念; 宋留洋; 李斌; 翟科杰; 杜威仪
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2023-09-09
Filing date: 2023-09-09
Publication date: 2024-01-05

Abstract

The invention discloses a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline, comprising the following steps: acquiring relevant data of an in-service drainage pipeline; preprocessing the collected data to improve the quality of the data set; reducing the feature dimension of the preprocessed data set by principal component analysis (PCA), extracting features that comprehensively reflect the condition of the pipeline; constructing a LightGBM model for predicting the maximum corrosion depth of the drainage pipeline; optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) and selecting the hyperparameter combination that gives the highest prediction accuracy; and predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result. The invention provides technical support for the safe maintenance of drainage pipelines.

Description

Drainage pipeline maximum corrosion depth prediction method based on LightGBM

Technical Field

The invention relates to the interdisciplinary technical field of machine learning and pipeline engineering, and in particular to a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline.

Background

Drainage pipelines are an important component of urban infrastructure; they discharge sewage and wastewater and maintain urban environmental sanitation. However, owing to long-term use and the influence of the external environment, drainage pipelines are prone to corrosion and aging, which shortens their service life and impairs their drainage function, and can even lead to serious leakage and collapse accidents that disrupt people's daily lives.

The corrosion depth of a drainage pipeline is an important index of pipeline damage: the greater the corrosion depth, the lower the strength and sealing performance of the pipeline, which can eventually lead to pipe breakage and water leakage. Timely and accurate prediction of the corrosion depth is therefore of great significance for the safe operation and repair of the pipeline.

In the prior art, the maximum corrosion depth of a drainage pipeline is mostly predicted by finite element calculation or a traditional BP neural network. These approaches have large errors and are computationally complex, and the predicted maximum corrosion depth tends to be larger or smaller than the actual value, so that the pipeline is overhauled too early or too late, causing unnecessary economic loss or pipeline accidents.

Disclosure of Invention

The invention aims to provide a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline. The method builds the model with the machine learning algorithm LightGBM and optimizes its hyperparameters with the whale optimization algorithm, improving the prediction accuracy of the model and thereby solving the problems described above.

To achieve the above purpose, the invention adopts the following technical scheme: a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline, comprising the following steps:

acquiring relevant data of an in-service drainage pipeline;

preprocessing the collected drainage pipeline data to improve the quality of the data set;

reducing the feature dimension of the preprocessed drainage pipeline data set by principal component analysis (PCA), extracting features that comprehensively reflect the condition of the pipeline;

constructing, based on the LightGBM algorithm, a LightGBM model for predicting the maximum corrosion depth of the drainage pipeline;

optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) and selecting the hyperparameter combination that gives the highest prediction accuracy;

and predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result.

As a further improvement of the present invention, the drainage pipeline related data includes construction and maintenance records of the pipeline, basic data, corrosion data, internal monitoring data and external environment data, and specifically includes:

pipeline construction and maintenance records: construction year, material and maintenance records;

pipeline basic data: pipe diameter, wall thickness, burial depth and design service life;

pipeline corrosion data: corrosion length, corrosion width and maximum corrosion depth;

pipeline internal monitoring data: water pressure and strain within a set time period;

external environment data: temperature, humidity, rainfall and groundwater level within the set time period.

As a further improvement of the invention, the data preprocessing of the collected drainage pipeline related data is specifically as follows:

performing preliminary analysis and processing of the collected pipeline data set, including missing value processing, outlier processing and data standardization.

As a further improvement of the invention, the feature dimension reduction of the drainage pipeline data set after pretreatment by utilizing the Principal Component Analysis (PCA) algorithm is specifically as follows:

while preserving as much of the original information in the drainage pipeline data as possible, the dimension of the pipeline feature variables is reduced: the eigenvectors corresponding to the N largest eigenvalues are retained, and the original pipeline feature variables are transformed into the new space spanned by these N eigenvectors, completing the dimension reduction of the data set.

As a further improvement of the invention, constructing the LightGBM model for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM algorithm is specifically as follows:

randomly dividing the pipeline data set after PCA dimension reduction into a training set and a test set in a given proportion;

and constructing a model with the LightGBM algorithm to predict the maximum corrosion depth of the drainage pipeline.

As a further improvement of the invention, optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) is specifically as follows:

searching the LightGBM hyperparameters max_depth, learning_rate, n_estimators, num_leaves and feature_fraction with the whale optimization algorithm to find the combination of hyperparameter values that minimizes the model prediction error on the training set.

As a further improvement of the invention, predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result is specifically as follows:

predicting the maximum corrosion depth of the drainage pipeline on the test set using the optimal hyperparameter combination found by the search;

performing error analysis on the model's test-set predictions and calculating 6 error indexes: explained variance (EV), goodness of fit (R²), adjusted coefficient of determination (adjusted_R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).

The beneficial effects of the invention are as follows:

The invention predicts the maximum corrosion depth of a drainage pipeline with LightGBM by combining data preprocessing, PCA dimension reduction, LightGBM modeling, the whale optimization algorithm and error analysis. It aims to provide a high-precision, efficient and intelligent prediction method that accurately predicts the corrosion state of a buried drainage pipeline, has an easily understood workflow and comprehensively evaluates the working condition of the pipeline, thereby overcoming the shortcomings of current corrosion prediction methods for buried drainage pipelines. On this basis, the invention has the following advantages:

(1) The workflow is simple, clear and easy to understand, and the amount of calculation is moderate. The workflow of the LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline is divided into 6 steps; each step is simple and clear, and no complicated calculation steps or heavy computation are required.

(2) The working condition of the pipeline is evaluated comprehensively. Multiple kinds of drainage pipeline data are collected, including the construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline. The pipeline variables include construction year, material, maintenance records, pipe diameter, wall thickness, burial depth, design service life, corrosion length, corrosion width, water pressure, strain, temperature, humidity, rainfall, groundwater level and maximum corrosion depth, covering all aspects of the drainage pipeline.

(3) The calculation is efficient and automated. Once the data set has been collected, the model only needs to be programmed in Python: the data preprocessing, PCA dimension reduction, construction of the LightGBM model, whale optimization algorithm and final error analysis are all implemented in code, and no tedious manual calculation is required.

(4) The prediction accuracy is high and the generalization error is small. PCA dimension reduction projects the original data onto the principal components to obtain a new data set, which reduces the dimension and complexity of the data while retaining most of its information and improves the calculation accuracy to a certain extent. The maximum corrosion depth prediction model of the buried drainage pipeline is built with the LightGBM algorithm, which is well suited to machine learning prediction tasks, and the addition of the whale optimization algorithm allows the model to achieve high prediction accuracy and low generalization error.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a flow chart of the LightGBM model prediction process according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Example 1

As shown in FIG. 1 and FIG. 2, a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline can accurately predict the corrosion depth of an in-service drainage pipeline and scientifically evaluate its corrosion state. The method includes the following steps:

S1, acquiring relevant data of the in-service drainage pipeline, including the construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline;

S2, preprocessing the collected drainage pipeline data to improve the quality of the data set;

S3, reducing the feature dimension of the preprocessed drainage pipeline data set with the PCA (principal component analysis) algorithm, extracting features that comprehensively reflect the condition of the pipeline;

S4, constructing a drainage pipeline maximum corrosion depth prediction model based on the LightGBM algorithm;

S5, optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) and selecting the hyperparameter combination that gives the highest prediction accuracy;

S6, predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result.

Acquiring the relevant data of the in-service drainage pipeline in step S1, including the construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline, includes:

(1) checking the working condition and corrosion condition of the in-service drainage pipeline;

(2) performing an on-site survey of the drainage pipelines in the sampling area and collecting the pipeline data set, comprising:

construction and maintenance records of the pipeline: the construction year, material and maintenance records of the pipeline;

pipeline basic data: the pipe diameter, wall thickness, burial depth and design service life;

pipeline corrosion data: the corrosion length, corrosion width and maximum corrosion depth;

pipeline internal monitoring data: the water pressure and strain in the pipeline within a set time period;

external environment data: the temperature, humidity, rainfall and groundwater level of the environment surrounding the pipeline.

Preprocessing the collected drainage pipeline data in step S2 to improve the quality of the data set includes:

(1) missing value processing: filling each missing value with the mean, median or mode;

(2) outlier processing: identifying outliers in the data set using simple descriptive statistics, the 3σ rule or the box plot method, and either deleting them or filling them in the same way as missing values;

(3) data standardization: standardizing the data so that their numerical ranges are consistent, which facilitates model optimization and evaluation.

Reducing the feature dimension of the preprocessed drainage pipeline data set with the PCA (principal component analysis) algorithm in step S3, and extracting features that comprehensively reflect the condition of the pipeline, includes:

(1) calculating the covariance matrix;

(2) calculating the eigenvalues and eigenvectors of the covariance matrix;

(3) sorting the eigenvalues and selecting the principal components;

(4) transforming the data to achieve feature dimension reduction.

Constructing the drainage pipeline maximum corrosion depth prediction model based on the LightGBM algorithm in step S4 includes:

(1) randomly dividing the pipeline data set after PCA dimension reduction into a training set and a test set in a ratio of 8:2;

(2) establishing a LightGBM regression prediction model in Python to predict the maximum corrosion depth of the drainage pipeline, where the input variables are the N variables extracted by PCA dimension reduction and the output variable is the maximum corrosion depth of the pipeline.

Optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) in step S5, and selecting the hyperparameter combination that gives the highest prediction accuracy, includes:

(1) searching the LightGBM hyperparameters max_depth, learning_rate, n_estimators, num_leaves and feature_fraction with the whale optimization algorithm, with a specific search range set for each hyperparameter;

(2) after several iterations, finding the combination of hyperparameter values that minimizes the model prediction error (RMSE) on the training set.

Predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result in step S6 includes:

(1) predicting the maximum corrosion depth of the pipeline on the test set using the optimal hyperparameter combination found by the whale optimization algorithm;

(2) performing error analysis on the model's test-set predictions and calculating the error indexes: explained variance (EV), goodness of fit (R²), adjusted coefficient of determination (adjusted_R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).

In this embodiment, the machine learning algorithm LightGBM is used to predict the maximum corrosion depth of an in-service drainage pipeline and is combined with principal component analysis (PCA) and the whale optimization algorithm, which improves the prediction accuracy of the model and achieves accurate prediction of the maximum corrosion depth of the drainage pipeline.

Example 2

As shown in FIG. 1 and FIG. 2, a LightGBM-based method for predicting the maximum corrosion depth of a drainage pipeline includes: collecting drainage pipeline data, preprocessing the data, reducing the dimension by PCA, constructing a LightGBM prediction model, optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm, and analyzing the errors of the prediction results. The specific implementation is as follows:

S1: Acquiring relevant data of the in-service drainage pipeline, including the construction and maintenance records, basic data, corrosion data, internal monitoring data and external environment data of the pipeline.

S11: According to the literature, the factors related to drainage pipeline corrosion mainly comprise pipeline construction factors, basic information of the pipeline itself, conditions inside the pipeline and external environmental factors;

S12: Determining a suitable sampling area and sampling pipelines, and collecting the pipeline-related variables, including:

construction and maintenance records of the pipeline: consulting the records from the initial stage of pipeline construction and collecting the construction year, material and maintenance records of the pipeline;

pipeline internal monitoring data: installing several sensors in the pipeline, including water pressure sensors and strain sensors, and collecting the water pressure and strain in the pipeline within a set time period;

external environment data: obtaining climatic, hydrological and geological data for the area where the pipeline is located, and collecting the temperature, humidity, rainfall and groundwater level of the environment surrounding the pipeline.

S13: 300 groups of different pipeline data are collected in the sampling area and used as the original data set of the experiment.

S2: Preprocessing the collected drainage pipeline data to improve the quality of the data set.

S21: The data set is stored in csv or xlsx format and imported into Python, where missing value processing, outlier processing and data standardization are carried out in turn.

S22: The missing value processing method is as follows: each missing value is filled with the mean, median or mode.

The outlier processing method is as follows: outliers in the data set are identified using simple descriptive statistics, the 3σ rule or the box plot method, and are either deleted or filled in the same way as missing values. The 3σ rule requires the data to follow a normal distribution; a value that deviates from the mean by more than 3 standard deviations is regarded as an outlier. The box plot method detects outliers using the interquartile range (IQR) of the box plot.
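The missing value and outlier handling described in S22 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the inventors' code: the function names, the use of `None` to mark a missing value, the population standard deviation and the conventional box plot multiplier `k=1.5` are all assumptions.

```python
import statistics

def fill_missing(values, strategy="median"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill if v is None else v for v in values]

def outliers_3sigma(values):
    """Indices of values more than 3 standard deviations away from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]

def outliers_iqr(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (box plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

The 3σ rule can only flag a point when the sample is reasonably large (in a sample of 7 values no point can lie more than about 2.27 population standard deviations from the mean), so the box plot rule may be the more practical choice for short columns.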

The data standardization processing method is as follows: standardization eliminates the influence of large scale differences between features by scaling the features of every dimension to the same standard, making the different data comparable. After the missing values and outliers have been processed, the z-score method is applied, which rescales each feature to a distribution centered at 0 with a standard deviation of 1; this retains the information in the original data and does not change the type of its distribution. The z-score formula is:

$$X_{new} = \frac{X_{old} - \mu}{\sigma} \tag{1}$$

where μ is the vector of column-wise means of the original data set, μ = mean(X_old), and σ is the vector of column-wise standard deviations of the original data set.
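As an illustration of the z-score step, the column-wise standardization can be written in a few lines. This sketch assumes a row-major table of numeric features in which no column is constant (so no standard deviation is zero) and uses the population standard deviation; the function name is hypothetical.

```python
import statistics

def zscore_columns(rows):
    """Standardize each column of a row-major table to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mu = [statistics.mean(col) for col in columns]       # column-wise means
    sigma = [statistics.pstdev(col) for col in columns]  # column-wise standard deviations
    return [
        [(x - m) / s for x, m, s in zip(row, mu, sigma)]
        for row in rows
    ]
```

For example, a pipe diameter column in millimetres and a wall thickness column in centimetres end up on the same scale after this transformation, so neither dominates the covariance matrix used in the PCA step.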

S3: Reducing the feature dimension of the preprocessed drainage pipeline data set with the PCA (principal component analysis) algorithm, extracting features that comprehensively reflect the condition of the pipeline.

S31: The preprocessed pipeline data set is imported into IBM SPSS for PCA dimension reduction. First, the covariance matrix is calculated on the standardized data; each element of the matrix describes the relationship between a pair of features. The diagonal elements of the covariance matrix are the variances of the individual features, and the off-diagonal elements are the covariances between pairs of features.

S32: Calculating eigenvalues and eigenvectors: eigenvalue decomposition is performed on the covariance matrix to obtain its eigenvalues and eigenvectors. Each eigenvector represents a principal direction of the data and its eigenvalue represents the variance of the data along that direction; the larger the eigenvalue, the more important the corresponding direction.

S33: Selecting principal components: the eigenvectors are sorted by their eigenvalues in descending order, and the first k eigenvectors are selected as the principal components, where k is the dimension after reduction.

S34: Transforming the data: the original data are projected onto the k principal components selected in the previous step to obtain a new data set of dimension k.
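Steps S31 to S34 map directly onto a few linear algebra calls. The sketch below uses NumPy rather than IBM SPSS, which the embodiment names, so it illustrates the same procedure and is not the inventors' workflow; the function name and the returned explained-variance ratio are assumptions added for illustration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project data X (n_samples x n_features) onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center the data (no-op if already z-scored)
    cov = np.cov(Xc, rowvar=False)           # S31: covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # S32: eigen-decomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]        # S33: sort eigenvalues, largest first
    components = eigvecs[:, order[:k]]       # eigenvectors of the k largest eigenvalues
    explained = eigvals[order[:k]] / eigvals.sum()
    return Xc @ components, explained        # S34: projected data, variance share per component
```

The explained-variance ratios give one common way to choose k, for example by keeping enough components to reach a target cumulative share; the patent does not state which criterion was used.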

S4: Constructing a drainage pipeline maximum corrosion depth prediction model based on the LightGBM algorithm.

S41: The new pipeline data set obtained by PCA dimension reduction is randomly divided into a training set and a test set in a ratio of 8:2, giving 240 groups of training data and 60 groups of test data.

S42: The model is built in PyCharm. The required libraries, including pandas, numpy and scikit-learn, are installed, and the pipeline data set in xlsx format is imported. The input variables of the model are the k principal components obtained by PCA dimension reduction, and the output variable is the maximum corrosion depth of the pipeline.
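The random 8:2 division of S41 can be sketched without any third-party library. The function name, the fixed seed and the rounding rule are assumptions made for reproducibility; in practice scikit-learn's `train_test_split` performs the same job.

```python
import random

def split_train_test(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)          # seeded shuffle for a repeatable split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to a data set of 300 groups, this yields 240 training groups and 60 test groups.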

S43: Predicting the maximum corrosion depth of a drainage pipeline is a regression problem. Let the maximum corrosion depth be y and the corrosion-related variables be $X_1, X_2, X_3, \dots, X_N$; the input-output relationship of the model can then be expressed as:

$$f(X_1, X_2, X_3, \dots, X_N) = y \tag{2}$$

The weak learner (regression tree) of the LightGBM model may be represented as $T_{q(x)}$, $q(x) \in \{1, 2, 3, \dots, J\}$, where T is the sample weight vector of the leaf nodes and J is the number of leaves in the regression tree. The final fitted model obtained by integrating K regression trees can be expressed as:

According to the forward stagewise algorithm, the t-th tree is generated using the information of the preceding t-1 trees. After t iterations, the objective function can be expressed as:

where Ω represents the model complexity and $g_i$ represents the first derivative of the loss function. $\Omega(f_m(x))$ is the regularization term, which is added to avoid overfitting of the model. Applying a second-order Taylor expansion to the objective function, the corresponding empirical loss term can be expressed as:

The objective function after expansion is:

where j = 1, 2, 3, …, J, and the sample set contained in the j-th leaf node of the t-th regression tree is:

$$I_j = \{\, i \mid q(x_i) = j \,\} \tag{7}$$

With the structure of the regression tree represented by q(x), the corresponding objective function is:

The above expression involves the optimal weight score of each leaf node and is the optimization problem the model must solve, where $g_i$ represents the first derivative of the loss function and $h_i$ represents its second derivative. The split gain of the leaf nodes of the regression tree is computed over multiple iterations, which continue until the stopping condition is met, so as to find the maximum split gain. The information gain after a split can be expressed as:

where K represents the total number of regression trees in the final model, L denotes the left subtree and R denotes the right subtree.
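The description in S43 follows the standard second-order gradient boosting argument, which can be written out as a sketch. The leaf weight symbol $T_j$, the leaf count J and the tree count K follow the text. The regularization form $\Omega(f)=\tfrac{\lambda}{2}\sum_j T_j^2$, the coefficient λ, the shorthand $G_j$ and $H_j$, and the equation numbers (chosen to fit around (2) and (7)) are assumptions rather than the patent's exact notation.

```latex
% Additive model of K regression trees
F_K(x) = \sum_{k=1}^{K} f_k(x) \tag{3}

% Objective after t iterations (loss L plus regularization \Omega)
\Gamma_t = \sum_{i=1}^{n} L\bigl(y_i,\; F_{t-1}(x_i) + f_t(x_i)\bigr) + \sum_{m=1}^{t} \Omega(f_m) \tag{4}

% Second-order Taylor expansion of the loss term, constants dropped
\sum_{i=1}^{n} L\bigl(y_i,\; F_{t-1}(x_i) + f_t(x_i)\bigr)
  \approx \sum_{i=1}^{n} \Bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \Bigr] \tag{5}

% Grouping samples by leaf, with G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i
\Gamma_t = \sum_{j=1}^{J} \Bigl[ G_j T_j + \tfrac{1}{2}\,(H_j + \lambda)\, T_j^{2} \Bigr] \tag{6}

% Optimal leaf weight and the resulting objective for a fixed tree structure q(x)
T_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\Gamma^{*} = -\frac{1}{2} \sum_{j=1}^{J} \frac{G_j^{2}}{H_j + \lambda} \tag{8}

% Gain of splitting a node into a left part L and a right part R
\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda}
  - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] \tag{9}
```

Setting the derivative of each bracket in (6) with respect to $T_j$ to zero gives the optimal weight in (8), and (9) is the decrease in $\Gamma^{*}$ obtained when one leaf is replaced by its two children.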

S5: Optimizing the hyperparameters of the LightGBM model with the whale optimization algorithm (WOA) and selecting the hyperparameter combination that gives the highest prediction accuracy.

S51: The basic principle of the whale optimization algorithm is to solve an optimization problem by simulating the predation behavior of a group of whales, in which the larger whales have a stronger detection capability and find food more easily. Each candidate solution is regarded as a whale, and the information of the better whales is used to guide the search direction and distance of the others, so that a better optimum is reached.

S52: The whale optimization algorithm comprises the following steps:

(1) Initializing the population: N individuals are randomly generated as the initial population.

(2) Calculating the fitness: the fitness of each whale is evaluated.

(3) Setting parameters: the parameters required by the algorithm are set, including the maximum number of iterations $t_{max}$, the search range, and so on.

(4) Updating the optimal solution and the positions according to fitness: the individual with the best fitness in the current population is selected as the global optimal solution, and new positions are calculated.

(5) Updating the search range: the search range is updated according to the current iteration number t and the maximum number of iterations $t_{max}$.

(6) Judging whether the termination condition is satisfied: if so, the algorithm ends; otherwise, it returns to step (4).

Through continuous iteration, the WOA continuously improves the positions of the individuals and finally finds the optimal solution.
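The six steps of S52 can be illustrated with a compact implementation of the commonly published WOA update rules (encircling the best whale, exploring around a random whale, and the spiral bubble-net move). This is a sketch under stated assumptions, not the inventors' code: the function signature, the population size, the spiral constant `b`, the use of one random coefficient pair per whale and the clipping to the search box are illustrative choices. In the patent's setting, `fitness` would be the training-set RMSE of a LightGBM model built from the candidate hyperparameters; here it is any function to minimize.

```python
import math
import random

def whale_optimize(fitness, bounds, n_whales=30, max_iter=200, b=1.0, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic whale optimization algorithm."""
    rng = random.Random(seed)

    def clip(pos):
        # keep every coordinate inside its search range
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(pos, bounds)]

    # (1) initialize the population and (2) evaluate its fitness
    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=fitness)[:]
    best_fit = fitness(best)

    for t in range(max_iter):                      # (3) max_iter plays the role of t_max
        a = 2.0 * (1 - t / max_iter)               # (5) search range shrinks from 2 towards 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # encircle the best whale when |A| < 1, otherwise explore around a random whale
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # spiral (bubble-net) move around the best whale
                l = rng.uniform(-1, 1)
                new = [abs(bj - xi) * math.exp(b * l) * math.cos(2 * math.pi * l) + bj
                       for bj, xi in zip(best, x)]
            whales[i] = clip(new)
        # (4) update the global best with the best whale of this iteration
        cand = min(whales, key=fitness)
        cand_fit = fitness(cand)
        if cand_fit < best_fit:
            best, best_fit = cand[:], cand_fit
    return best, best_fit                          # (6) stop after max_iter iterations
```

Integer hyperparameters such as max_depth, n_estimators and num_leaves would need to be rounded inside the fitness function before the model is built, since the whales move in a continuous space.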

S53: The optimal hyperparameter values of the LightGBM model found by the whale optimization algorithm are: max_depth=6, learning_rate=0.0142, n_estimators=350, num_leaves=10, feature_fraction=1.
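A minimal sketch of how the values reported in S53 could be passed to LightGBM's scikit-learn interface is given below. The constant and function names are hypothetical, and the `lightgbm` package is assumed to be installed; `feature_fraction` is accepted by `LGBMRegressor` as an alias of `colsample_bytree`.

```python
# Hyperparameter values reported in S53.
WOA_BEST_PARAMS = {
    "max_depth": 6,
    "learning_rate": 0.0142,
    "n_estimators": 350,
    "num_leaves": 10,
    "feature_fraction": 1.0,
}

def build_tuned_model(params=WOA_BEST_PARAMS):
    """Return an unfitted LightGBM regressor configured with the given hyperparameters."""
    import lightgbm as lgb  # imported lazily; requires the lightgbm package
    return lgb.LGBMRegressor(**params)
```

The returned model would then be fitted on the training principal components with `model.fit(X_train, y_train)` and applied to the test set with `model.predict(X_test)`, where `X_train`, `y_train` and `X_test` are placeholders for the arrays produced by the earlier steps.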

S6: Predicting the maximum corrosion depth of the pipeline with the optimized LightGBM model and evaluating the error of the prediction result.

S61: The optimized LightGBM model is used to predict the maximum corrosion depth for the 60 groups in the test set.

S62: Explained variance (EV), goodness of fit (R²), adjusted coefficient of determination (adjusted_R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as evaluation indexes of the model's predictions. EV, R² and adjusted_R² represent how well the predictions fit the sample values, while RMSE, MAE and MAPE represent the magnitude of the prediction error.

S63: The mathematical expression of each error index is as follows:

Explained variance: $EV = 1 - \dfrac{\sum_{i=1}^{n}\left[(y_i-\hat{y}_i)-(\bar{y}-\bar{\hat{y}})\right]^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$, value range: EV ∈ [0, 1];

Coefficient of determination: $R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$, value range: R² ∈ [0, 1];

Adjusted coefficient of determination: $adjusted\_R^2 = 1 - \dfrac{(1-R^2)(n-1)}{n-p-1}$, value range: adjusted_R² ∈ [0, 1];

Root mean square error: $RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$, value range: RMSE ∈ [0, +∞);

Mean absolute error: $MAE = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$, value range: MAE ∈ [0, +∞);

Mean absolute percentage error: $MAPE = \dfrac{100\%}{n}\sum_{i=1}^{n}\left|\dfrac{y_i-\hat{y}_i}{y_i}\right|$, value range: MAPE ∈ [0, +∞).

where n is the total number of samples; p is the total number of features; $y_i$ is the true value of sample i; $\hat{y}_i$ is the model prediction; $\bar{y}$ is the arithmetic mean of the true values; and $\bar{\hat{y}}$ is the arithmetic mean of the predicted values.

For EV, R² and adjusted_R², larger values within their respective ranges are better and indicate higher model accuracy;

for RMSE, MAE and MAPE, smaller values are better and indicate lower generalization error and better robustness of the model.
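The six indexes of S62 and S63 can be computed together from the true and predicted values. The sketch below uses the standard definitions of these metrics with only the Python standard library; the function name and the dictionary keys are assumptions, and MAPE is returned in percent and requires all true values to be non-zero.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Compute EV, R2, adjusted R2, RMSE, MAE and MAPE (in percent) for a regression model."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    resid = [yt - yp for yt, yp in zip(y_true, y_pred)]    # prediction errors
    resid_mean = sum(resid) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)      # total sum of squares
    ss_res = sum(r ** 2 for r in resid)                    # residual sum of squares
    r2 = 1 - ss_res / ss_tot
    return {
        "EV": 1 - sum((r - resid_mean) ** 2 for r in resid) / ss_tot,
        "R2": r2,
        "adjusted_R2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in resid) / n,
        "MAPE": 100 * sum(abs(r / yt) for r, yt in zip(resid, y_true)) / n,
    }
```

Here `n_features` corresponds to p, which for this method would be the number k of principal components fed to the model. EV and R² coincide whenever the mean prediction error is zero and differ only when the predictions are biased.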

S64: Table 1 below lists the error index values of the model's predictions. The errors are small and the prediction accuracy is high, showing that the method predicts the corrosion depth of drainage pipelines well.

Table 1. Error index values

The foregoing examples merely illustrate specific embodiments of the invention, which are described in greater detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention.

Claims

1. The method for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM is characterized by comprising the following steps of:

acquiring related data of a drainage pipeline in service;

2. The method for predicting the maximum corrosion depth of a drain pipe based on the LightGBM according to claim 1, wherein the drain pipe related data includes a construction and maintenance record of the pipe, basic data, corrosion data, internal monitoring data and external environment data, and specifically includes:

pipeline internal monitoring data: water pressure and strain within a set time period;

3. The method for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM according to claim 1, wherein the data preprocessing of the collected drainage pipeline related data is specifically as follows:

4. The method for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM according to claim 1, wherein the feature dimension reduction of the drainage pipeline data set after pretreatment by using a principal component analysis algorithm PCA is specifically as follows:

5. The method for predicting the maximum corrosion depth of the drainage pipeline based on the LightGBM according to claim 1 or 4, wherein the LightGBM model for constructing the drainage pipeline maximum corrosion depth prediction based on the LightGBM algorithm is specifically as follows:

6. The method for predicting the maximum corrosion depth of a drain pipeline based on the LightGBM according to claim 5, wherein the optimization selection of the super parameters of the LightGBM model by using whale optimization algorithm WOA is specifically as follows:

7. The method for predicting the maximum corrosion depth of the drain pipeline based on the LightGBM according to claim 6, wherein the optimized LightGBM model is used for predicting the maximum corrosion depth of the pipeline and performing error assessment on the predicted result is specifically as follows: