CN115130343A

CN115130343A - Pipeline defect type identification method based on GA deep optimization machine learning

Info

Publication number: CN115130343A
Application number: CN202210729035.3A
Authority: CN
Inventors: 潘建华; 高伦; 赵冬军
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-09-30

Abstract

The invention discloses a pipeline defect type identification method based on GA deep optimization machine learning. Triaxial magnetic flux leakage signals are acquired for defects of different types, and signal characteristic parameters are extracted from them. A sample set is built from six identification parameters of the defect type:

- the peak-to-valley spacing of the axial component differential signal;
- the median peak-to-valley spacing of the circumferential component;
- the peak-to-valley value of the radial component;
- the peak-to-valley spacing of the radial component;
- the waveform area of the axial component;
- the number of sensors on the magnetic flux leakage sensor that acquire the defect's leakage signal.

A neural network is constructed whose input is these identification parameters and whose output is the defect type, and it is trained on the sample set. To identify an unknown defect, its identification parameters are fed into the trained network, which predicts and outputs the defect type. The method can accurately identify the type of pipeline defects and has considerable engineering significance and good application prospects.

Description

Pipeline defect type identification method based on GA deep optimization machine learning

Technical Field

The invention relates to the technical field of pipeline defect detection, in particular to a pipeline defect type identification method based on GA deep optimization machine learning.

Background

In pipeline safety engineering, pipeline inspection is a basic means of ensuring pipeline safety. Among the various pipeline inspection technologies, magnetic flux leakage detection is the most widely applied and technically mature. However, conventional magnetic flux leakage detection cannot accurately identify the type of a pipeline defect during operation. It can only estimate the external dimensions and the position of the defect from the leakage signal.

Identifying the type of a pipeline defect is nevertheless of great engineering significance. Different defect types damage the pipeline in different ways and call for different maintenance methods. Identifying the defect type can therefore reduce pipeline maintenance costs to a certain extent.

Disclosure of Invention

To overcome these shortcomings of the prior art, the invention provides a pipeline defect type identification method based on GA deep optimization machine learning. The method can accurately identify the pipeline defect type and has great engineering significance and good application prospects.

To achieve this purpose, the invention adopts the following technical scheme:

A pipeline defect type identification method based on GA deep optimization machine learning comprises the following steps:

S1, acquiring triaxial magnetic flux leakage signals of defects of different types;

The triaxial magnetic flux leakage signal of a defect has three parts: the axial leakage signal (the axial component), the radial leakage signal (the radial component) and the circumferential leakage signal (the circumferential component);

The axial direction runs along the length of the pipeline, the radial direction is perpendicular to the inner wall of the pipeline, and the circumferential direction runs around the circumference of the pipeline;

The circumferential component is selected as follows: determine the peak position of the radial component on the axial path, then take the values of the radial component on the circumferential path through that peak position as the circumferential component;

S2, extracting signal characteristic parameters from the triaxial magnetic flux leakage signal of the defect, namely:

- the axial component differential signal peak-to-valley spacing DS_xp-p;
- the axial component waveform area S_x;
- the circumferential component peak-to-valley median spacing S_y-50%;
- the radial component peak-to-valley value B_zp-p;
- the radial component peak-to-valley spacing S_zp-p.

The axial component differential signal peak-to-valley spacing DS_xp-p is extracted as follows: differentiate the axial component to obtain the axial component differential signal, then measure the spacing DS_xp-p between its peak and its valley;

The circumferential component peak-to-valley median spacing S_y-50% is extracted as follows:

- extract the crest and trough of the circumferential component;
- take the middle value between them, i.e. the level at 50% of the crest-to-trough difference, as the peak-to-valley median;
- the distance between the points on the circumferential component at which the signal equals this median is S_y-50%.

S3, constructing a sample set. The identification parameters of the defect type are:

- the axial component differential signal peak-to-valley spacing DS_xp-p;
- the circumferential component peak-to-valley median spacing S_y-50%;
- the radial component peak-to-valley value B_zp-p;
- the radial component peak-to-valley spacing S_zp-p;
- the axial component waveform area S_x;
- the number N of sensors on the magnetic flux leakage sensor that acquire the defect's leakage signal.

Each sample in the sample set comprises the identification parameter values of a defect and the corresponding defect type;

S4, constructing a neural network whose input is the identification parameter values of a defect and whose output is the defect type, and training the network on the sample set;

S5, identifying the type of an unknown defect, as follows:

S51, extracting the signal characteristic parameters from the triaxial magnetic flux leakage signal of the unknown defect. Also count the number N of sensors on the magnetic flux leakage sensor that acquire the unknown defect's leakage signal. Together these give the identification parameter values of the unknown defect;

S52, inputting the identification parameter values of the unknown defect into the neural network obtained in step S4, which predicts and outputs the defect type of the unknown defect.
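Steps S1 to S5 centre on a neural network that maps the six identification parameters to a defect type. Below is a minimal pure-Python sketch of a single-hidden-layer BP network. It has 6 inputs, h sigmoid hidden nodes and one linear output, and is trained by plain gradient descent on the squared error. It assumes the four defect types are encoded as the numbers 1 to 4 and that the scalar output is rounded to the nearest code. The class name `BPNet`, the learning rate and the stopping tolerance are illustrative assumptions, not values from the patent. The genetic-algorithm optimization of step S4 is not included: here the initial weights are simply random.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """Hypothetical m-h-1 BP network: sigmoid hidden layer, linear scalar output."""

    def __init__(self, m, h, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(h)]  # h x m
        self.b1 = [rng.uniform(-1, 1) for _ in range(h)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(h)]  # weights into the single output node
        self.b2 = rng.uniform(-1, 1)

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * a for w, a in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def classify(self, x, n_classes=4):
        # round the scalar output to the nearest class code 1..n_classes
        return min(max(round(self.predict(x)), 1), n_classes)

    def train(self, X, y, lr=0.05, epochs=2000, tol=1e-4):
        """Online gradient descent on 0.5*(out - target)^2; stops once the epoch MSE < tol."""
        epoch_mse = float("inf")
        for _ in range(epochs):
            sq_err = 0.0
            for x, target in zip(X, y):
                hidden, out = self.forward(x)
                err = out - target
                sq_err += err * err
                for j, a in enumerate(hidden):
                    # hidden-node gradient uses the output weight before it is updated
                    grad_h = err * self.w2[j] * a * (1.0 - a)
                    self.w2[j] -= lr * err * a
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                    self.b1[j] -= lr * grad_h
                self.b2 -= lr * err
            epoch_mse = sq_err / len(X)
            if epoch_mse < tol:
                break
        return epoch_mse


if __name__ == "__main__":
    # toy data: only the first of the six features varies with the class code
    X = [[v] + [0.5] * 5 for v in (0.0, 0.33, 0.67, 1.0)]
    y = [1, 2, 3, 4]
    net = BPNet(m=6, h=5)
    net.train(X, y)
    print([net.classify(x) for x in X])
```

In practice the six features have very different units, so they would normally be scaled (for example to [0, 1]) before training.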

Preferably, in step S4, the neural network is deeply optimized by a genetic algorithm during training. The genetic-algorithm optimization and training of the BP neural network proceed as follows:

S41, initializing the relevant parameters of the neural network: the number of hidden-layer nodes, the hyper-parameters, the initial weights and the initial biases;

S42, iteratively optimizing the number of hidden-layer nodes and the hyper-parameters with a genetic algorithm. The search stays within the set hyper-parameter range and the range given by the hidden-layer node empirical formula. Once the optimal values are found, they are introduced into the neural network model, which fixes its topology;

S43, iteratively optimizing the initial weights and initial biases with a genetic algorithm, and introducing the optimized values into the neural network model;

S44, training the neural network on the sample set:

- calculate the error between the predicted and true defect types;
- check whether the error meets the stopping condition;
- if not, update the weights and biases and continue;
- if so, output the training result to obtain the trained neural network.
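Steps S42 and S43 both use a genetic algorithm to minimize an objective over a box of real-valued variables. The patent does not name its selection, crossover or mutation operators. The following is therefore only a minimal real-coded GA sketch, assuming tournament selection, arithmetic crossover, Gaussian mutation and elitism. The function `genetic_minimize` and all of its default settings are hypothetical. Under the greedy two-stage scheme it would be called twice:

- first with bounds such as [(3, 20), (0, 1), (0, 1)] for (h, σ, λ), rounding h to an integer inside the objective;
- then over the Num initial weights and biases, with h fixed.

```python
import random


def genetic_minimize(objective, bounds, pop_size=30, generations=60,
                     crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise objective(vector) over the box bounds = [(lo, hi), ...] with a simple GA."""
    rng = random.Random(seed)

    def clip(ind):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, bounds)]

    def tournament(pop, fits, k=3):
        # the best of k randomly drawn individuals becomes a parent
        winner = min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])
        return pop[winner]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fits = [objective(ind) for ind in pop]
    for _ in range(generations):
        best = min(range(pop_size), key=lambda i: fits[i])
        children = [pop[best][:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits), tournament(pop, fits)
            if rng.random() < crossover_rate:
                alpha = rng.random()  # arithmetic crossover
                child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation with a step of 10% of each variable's range
            child = [v + rng.gauss(0, 0.1 * (hi - lo)) if rng.random() < mutation_rate else v
                     for v, (lo, hi) in zip(child, bounds)]
            children.append(clip(child))
        pop = children
        fits = [objective(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]
```

Because of elitism, the best objective value never gets worse from one generation to the next. In both stages the objective would be the network's MSE on the sample set.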

Preferably, in step S42, the empirical formula for the number of hidden-layer nodes is:

h = \sqrt{m + n} + a

where:

- m and n are the numbers of nodes in the input and output layers, with m = 6 and n = 1;
- a is an adjusting constant between 1 and 10;
- h is the number of hidden-layer nodes, which is optimized within the range 3 to 20.
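Assume the empirical formula takes the common form h = √(m + n) + a with integer a from 1 to 10. The formula does not survive legibly in the source text, so this form is an assumption. The candidate hidden-layer sizes can then be enumerated directly. The helper name is hypothetical.

```python
import math


def hidden_node_candidates(m, n, a_values=range(1, 11)):
    """Integer hidden-layer sizes from the assumed rule h = sqrt(m + n) + a."""
    return sorted({round(math.sqrt(m + n) + a) for a in a_values})


print(hidden_node_candidates(6, 1))  # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

For m = 6 and n = 1 the rule gives 4 to 13 nodes, which lies inside the wider range of 3 to 20 that the GA searches in step S42.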

Preferably, in step S42, the algorithm parameters of the neural network, i.e. the hyper-parameters, are σ (sigma) and λ (lambda); both are optimized within the range 0 to 1.

Preferably, in step S42, the objective function for the genetic-algorithm optimization of the number of hidden-layer nodes and the hyper-parameters is the mean square error MSE between the predicted output values and the true values of the neural network:

MSE = \frac{1}{Z}\sum_{z=1}^{Z}\left(p_z - \hat{p}_z\right)^2

where Z is the number of samples, p_z is the true value and \hat{p}_z is the predicted output value.
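The MSE objective translates directly into code. This is a small sketch, and the function name `mse` is only illustrative.

```python
def mse(true_values, predicted_values):
    """MSE = (1/Z) * sum over z of (p_z - p_hat_z)^2 for Z samples."""
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("need two non-empty sequences of equal length")
    z = len(true_values)
    return sum((p - q) ** 2 for p, q in zip(true_values, predicted_values)) / z


print(mse([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.25
```

In the GA, a candidate (hidden-node count and hyper-parameters in S42, or initial weights and biases in S43) would be scored by building or initializing the network, predicting the sample set and returning this value as the fitness to minimize.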

Preferably, in step S43, the objective function for the genetic-algorithm optimization of the initial weights and initial biases is likewise the MSE between the predicted output values and the true values of the neural network. The number of independent variables of this objective function follows from the numbers of initial weights and biases:

Num = w_1 + b_1 + w_2 + b_2

w_1 = m \times h

b_1 = h

w_2 = h \times n

b_2 = n

where:

- Num is the total number of independent variables;
- w_1 is the number of weights from the input layer to the hidden layer;
- b_1 is the number of hidden-layer biases;
- w_2 is the number of weights from the hidden layer to the output layer;
- b_2 is the number of output-layer biases;
- h is the number of hidden-layer nodes;
- m and n are the numbers of input-layer and output-layer nodes.

The optimization interval of the initial weights and initial biases is set to -2.
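The count of independent variables can be checked in code. The sketch below also splits a flat GA chromosome into the weight matrices and bias vectors. The gene order (W1 row by row, then b1, then W2, then b2) is an assumed convention, not one stated in the patent, and both function names are hypothetical.

```python
def num_variables(m, h, n):
    """Num = w_1 + b_1 + w_2 + b_2 for an m-h-n network."""
    w1, b1, w2, b2 = m * h, h, h * n, n
    return w1 + b1 + w2 + b2


def decode(chromosome, m, h, n):
    """Split a flat chromosome into W1 (h x m), b1 (h), W2 (n x h) and b2 (n)."""
    if len(chromosome) != num_variables(m, h, n):
        raise ValueError("chromosome length must equal Num")
    genes = iter(chromosome)
    W1 = [[next(genes) for _ in range(m)] for _ in range(h)]
    b1 = [next(genes) for _ in range(h)]
    W2 = [[next(genes) for _ in range(h)] for _ in range(n)]
    b2 = [next(genes) for _ in range(n)]
    return W1, b1, W2, b2


print(num_variables(6, 10, 1))  # 81
```

With m = 6 and n = 1, a hidden layer of h = 10 nodes gives 60 + 10 + 10 + 1 = 81 variables for the second GA stage.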

Preferably, in step S1, the defect types include cracks, surface-layer detachment, pits and air holes.

Preferably, each defect type is defined by the following ranges:

- a crack is 5-30 mm long, 0.3-1 mm wide and 1-8 mm deep;
- a surface-layer detachment is 20-50 mm long, 20-50 mm wide and 1-3 mm deep;
- a pit is 5-15 mm long, 5-30 mm wide and 2-9 mm deep;
- an air hole has a radius of 1-3 mm and lies 2-10 mm from the inner wall of the pipeline.

Preferably, in step S1, a three-dimensional finite element model of the pipeline defects is established. ANSYS Maxwell is used to simulate the magnetic flux leakage of the defects in this model, which yields the triaxial magnetic flux leakage signals of defects of different types.

The invention has the following advantages:

(1) The invention accurately identifies the type of a pipeline defect from its magnetic flux leakage signal. Magnetic flux leakage detection can therefore identify defect types during pipeline operation, which gives the method great engineering significance and good application prospects.

(2) The invention analyzes the relationship between the magnetic flux leakage signal and the defect size, extracts corresponding identification parameters, and evaluates how well each parameter inverts different defect dimensions. This greatly reduces the workload and improves the accuracy of pipeline defect identification and quantification, which is of great significance for defect detection.

(3) The invention deeply optimizes the BP neural network with a genetic algorithm, using the genetic algorithm to optimize the relevant parameters. Among the mainstream optimization algorithms currently in use, the genetic algorithm offers a strong optimization capability.

A greedy strategy is adopted. First, the number of hidden-layer nodes and the hyper-parameters are optimized iteratively, within the set hyper-parameter range and the range given by the empirical formula. The initial weights and biases are then optimized iteratively, so that the training capability of the neural network reaches its best.

The greedy strategy splits the overall optimization of the BP neural network into two sub-problems. Each sub-problem is solved for its local optimum, and these optima are combined into a solution of the original problem. Optimizing everything at once would involve many independent variables and heavy computation. The greedy strategy avoids the time an exhaustive one-step search for the global optimum would take, while still achieving a relatively good optimization result.

(4) The BP neural network model after GA deep optimization classifies the sample data very clearly and can effectively determine the defect type; the identification accuracy of the optimized BP neural network reaches 100%. The GA-deep-optimized BP neural network also verifies that the selected identification parameters are effective for identifying the defect type.

(5) Based on actual pipeline conditions, the invention uniformly defines and classifies defects by differences in their size and relative position. Pipeline defects are divided into four types: cracks, surface-layer detachment, pits and air holes.

(6) A three-dimensional finite element model of the pipeline defects is established in which each type of defect is uniformly distributed within its defined size range and relative position. ANSYS Maxwell is used to simulate the magnetic flux leakage of the defects, which yields the triaxial magnetic flux leakage signals of defects of different types.

(7) The invention extracts six identification parameters that express the defect type. In the embodiment, K-means clustering optimized by a genetic algorithm visualizes the unsupervised clustering result of the multi-dimensional data. Principal component analysis reduces the dimensionality of the data, so that the quality of defect type identification can be checked and predicted. Finally, a BP neural network deeply optimized by a genetic algorithm verifies that the selected characteristic parameters are effective for identifying the defect type.

(8) Sample data acquired under different working scenarios, equipment, materials and magnetization conditions will differ, but the overall type prediction method applies in all of these situations.

Drawings

FIG. 1 is a flow chart of a pipeline defect type identification method based on GA deep optimization machine learning.

Fig. 2 is a leakage flux plot of an axial component differential signal.

Fig. 3 is a graph of leakage flux of the circumferential component.

Fig. 4 is a graph of leakage flux for the radial component.

Fig. 5 is a graph of leakage flux of the axial component.

Fig. 6 is a two-dimensional scatter diagram of the clustering result visualization.

FIG. 7 is a flow chart of the GA_BP algorithm.

FIG. 8 shows the defect type identification result of the BP neural network before optimization.

FIG. 9 shows the defect type identification result of the BP neural network after optimization.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the pipeline defect type identification method based on GA deep optimization machine learning comprises the following steps:

S1, acquiring triaxial magnetic flux leakage signals of defects of different types.

The defect types include cracks, surface-layer detachment, pits and air holes.

The circumferential component is built from the values of the radial component along a circumferential path of the pipeline. Determine the crest position of the radial component on the axial path, then take the values of the radial component on the circumferential path through that crest position as the circumferential component.

In step S1, a three-dimensional finite element model of the pipeline defects is established. ANSYS Maxwell is used to simulate the magnetic flux leakage of the defects in this model, which yields the triaxial magnetic flux leakage signals of defects of different types.

S2, extracting signal characteristic parameters from the triaxial magnetic flux leakage signal of the defect, namely:

- the axial component differential signal peak-to-valley spacing DS_xp-p;
- the axial component waveform area S_x;
- the circumferential component peak-to-valley median spacing S_y-50%;
- the radial component peak-to-valley value B_zp-p;
- the radial component peak-to-valley spacing S_zp-p.

The circumferential component peak-to-valley median spacing S_y-50% is extracted as follows:

- extract the crest and trough of the circumferential component;
- take the middle value between them, i.e. 50% of the crest-to-trough difference S_max, as the peak-to-valley median;
- the distance between the points on the circumferential component at which the signal equals this median is S_y-50%.

The axial component differential signal peak-to-valley spacing DS_xp-p, the axial component waveform area S_x, the circumferential component peak-to-valley median spacing S_y-50%, the radial component peak-to-valley value B_zp-p and the radial component peak-to-valley spacing S_zp-p are shown in FIGS. 2-5.

S3, constructing a sample set. The identification parameters of the defect type are DS_xp-p, S_y-50%, B_zp-p, S_zp-p, S_x, and the number N of sensors on the magnetic flux leakage sensor that collect the defect's leakage signal. Each sample comprises the identification parameter values of a defect and the corresponding defect type.

S4, constructing a neural network whose input is the identification parameters of the defect type and whose output is the defect type, and training the network on the sample set.

In step S4, as shown in FIG. 7, the neural network is deeply optimized by a genetic algorithm during training. The genetic-algorithm optimization and training of the BP neural network proceed as follows:

S41, initializing the relevant parameters of the neural network: the number of hidden-layer nodes, the hyper-parameters, the initial weights and the initial biases.

S42, iteratively optimizing the number of hidden-layer nodes and the hyper-parameters with a genetic algorithm, within the set hyper-parameter range and the range given by the hidden-layer node empirical formula. Once the optimal values are found, they are introduced into the neural network model, which fixes its topology.

In step S42, the empirical formula for the number of hidden-layer nodes is:

h = \sqrt{m + n} + a

where:

- m and n are the numbers of nodes in the input and output layers, with m = 6 and n = 1;
- a is an adjusting constant between 1 and 10;
- h is the number of hidden-layer nodes, which is optimized within the range 3 to 20.

The algorithm parameters of the neural network, i.e. the hyper-parameters, are σ (sigma) and λ (lambda); both are optimized within the range 0 to 1.

The objective function for the genetic-algorithm optimization of the number of hidden-layer nodes and the hyper-parameters is the mean square error MSE between the predicted output values and the true values of the neural network:

MSE = \frac{1}{Z}\sum_{z=1}^{Z}\left(p_z - \hat{p}_z\right)^2

where Z is the number of samples, p_z is the true value and \hat{p}_z is the predicted output value.

S43, iteratively optimizing the initial weights and initial biases with a genetic algorithm, and introducing the optimized values into the neural network model.

In step S43, the objective function for the genetic-algorithm optimization of the initial weights and initial biases is likewise the MSE between the predicted output values and the true values of the neural network. The number of independent variables of this objective function follows from the numbers of initial weights and biases:

Num = w_1 + b_1 + w_2 + b_2

w_1 = m \times h

b_1 = h

w_2 = h \times n

b_2 = n

where:

- Num is the total number of independent variables;
- w_1 is the number of weights from the input layer to the hidden layer;
- b_1 is the number of hidden-layer biases;
- w_2 is the number of weights from the hidden layer to the output layer;
- b_2 is the number of output-layer biases;
- h is the number of hidden-layer nodes;
- m and n are the numbers of input-layer and output-layer nodes.

The optimization interval of the initial weights and initial biases is set to -2.

S5, identifying the type of an unknown defect, as follows:

S51, extracting the signal characteristic parameters from the triaxial magnetic flux leakage signal of the unknown defect. Also count the number N of sensors on the magnetic flux leakage sensor that acquire its leakage signal. Together these give the identification parameters of the unknown defect;

S52, inputting the identification parameters of the unknown defect into the neural network obtained in step S4, which predicts and outputs the defect type of the unknown defect.

This embodiment addresses the difficulty of identifying defect types in magnetic flux leakage inspection of oil and gas pipelines. The work proceeds as follows:

- Common pipeline defects are first classified into cracks, surface-layer detachment, pits and air holes.
- A three-dimensional finite element model simulating the magnetic flux leakage of pipeline defects is established. Maxwell electromagnetic simulation is run on 400 groups of defects of different sizes and types to obtain their leakage signals.
- The relationship between defect type and size and the leakage signal is then analyzed. By comparing different signal characteristic parameters, six identification parameters expressing the defect type are extracted.
- K-means clustering optimized by a genetic algorithm visualizes the unsupervised clustering result of the multi-dimensional data. Principal component analysis reduces the dimensionality, so that the quality of defect type identification can be checked and predicted.
- Finally, a BP neural network deeply optimized by a genetic algorithm verifies that the selected characteristic parameters are effective for identifying the defect type.

In this embodiment, pipeline defects are uniformly defined and classified by differences in their size and relative position into four types: cracks, surface-layer detachment, pits and air holes. The size and relative-position ranges of each type are given in Table 1:

TABLE 1

Defect type	Crack	Surface-layer detachment	Pit	Air hole
Length/mm	5～30	20～50	5～15	-
Width/mm	0.3～1	20～50	5～15	-
Depth/mm	1～8	1～3	2～9	-
Radius/mm	-	-	-	1～3
Depth from pipeline inner wall/mm	-	-	-	2～10
Number of groups	100	100	100	100
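Table 1 states that each defect type is uniformly distributed within its defined ranges, which suggests a simple parameter sweep for building the 400 simulation cases. The sketch below draws 100 random geometries per type uniformly from the Table 1 ranges. Uniform random sampling is an assumption, since the patent may have used a regular grid. The names `DEFECT_RANGES` and `sample_defects` are hypothetical. Pit width follows Table 1 (5 to 15 mm) rather than the 5 to 30 mm given in the earlier "Preferably" paragraph. Passing each geometry to the ANSYS Maxwell finite element model is not shown.

```python
import random

# Dimension ranges in mm, taken from Table 1.
DEFECT_RANGES = {
    "crack": {"length": (5, 30), "width": (0.3, 1), "depth": (1, 8)},
    "surface_layer_detachment": {"length": (20, 50), "width": (20, 50), "depth": (1, 3)},
    "pit": {"length": (5, 15), "width": (5, 15), "depth": (2, 9)},
    "air_hole": {"radius": (1, 3), "depth_from_inner_wall": (2, 10)},
}


def sample_defects(groups_per_type=100, seed=0):
    """Draw defect geometries uniformly at random within each type's ranges."""
    rng = random.Random(seed)
    samples = []
    for defect_type, ranges in DEFECT_RANGES.items():
        for _ in range(groups_per_type):
            dims = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            samples.append((defect_type, dims))
    return samples


cases = sample_defects()
print(len(cases))  # 400
```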

A three-dimensional finite element model of the pipeline defects is established in which each type of defect is uniformly distributed within its defined size range and relative position. ANSYS Maxwell simulates the magnetic flux leakage of the defects and yields three leakage signals: the axial leakage signal (the axial component), the radial leakage signal (the radial component) and the circumferential leakage signal (the circumferential component). A sample set is then constructed in which each sample comprises the identification parameter values of a defect and the corresponding defect type.

The axial direction runs along the length of the pipeline, the radial direction is perpendicular to the inner wall of the pipeline, and the circumferential direction runs around the circumference of the pipeline. The circumferential component is built from the values of the radial component along a circumferential path of the pipeline. First the crest position of the radial component on the axial path is determined; the values of the radial component on the circumferential path through that crest position are then taken as the circumferential component.
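The circumferential-component construction can be sketched on a sampled grid of radial values. This assumes the radial component is available as a 2D list `bz_grid[c][a]` (circumferential index c, axial index a), and that `axial_row` is the circumferential index of the axial path through the defect. Both the data layout and the function name are hypothetical.

```python
def circumferential_component(bz_grid, axial_row):
    """Return the radial values on the circumferential path through the radial crest.

    bz_grid[c][a] holds the radial component at circumferential index c and axial
    index a; axial_row is the circumferential index of the axial path through the defect.
    """
    axial_path = bz_grid[axial_row]
    a_peak = max(range(len(axial_path)), key=axial_path.__getitem__)  # crest position
    return [row[a_peak] for row in bz_grid]


grid = [[0, 1, 2, 1, 0],
        [0, 2, 5, 2, 0],
        [0, 1, 3, 1, 0]]
print(circumferential_component(grid, axial_row=1))  # [2, 5, 3]
```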

The relationship between the magnetic flux leakage signal and the defect size is analyzed, the corresponding signal characteristic parameters are extracted, and the effectiveness of each parameter for inverting different defect dimensions is evaluated. This greatly reduces the workload and improves the accuracy of pipeline defect identification and quantification, which is of great significance for defect detection.

The specific analysis is as follows:

The radial component peak-to-valley spacing S_zp-p and the axial component differential signal peak-to-valley spacing DS_xp-p of the leakage signal are both linearly proportional to the defect length. This makes them suitable characteristic quantities for evaluating defect length.

DS_xp-p is extracted as follows: differentiate the axial component to obtain the axial component differential signal, then measure the spacing DS_xp-p between its peak and its valley.

DS_xp-p is also unaffected by the defect width, the defect depth and changes in the sensor lift-off value, so it is highly stable. When these interfering quantities change, DS_xp-p does not change with them. Finite element simulation confirms this, with a signal fluctuation error within 10%. DS_xp-p is therefore a typical signal characteristic of defect length; it is shown in FIG. 2.
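The DS_xp-p extraction can be sketched for a uniformly sampled axial component `bx` with sample spacing `dx`. The sketch assumes a forward difference for the derivative and takes the spacing as the distance between the samples where the derivative is largest and smallest. The patent does not state its differentiation scheme, and the function name is hypothetical.

```python
def axial_diff_peak_valley_spacing(bx, dx):
    """DS_xp-p: distance between the maximum and minimum of the differentiated axial component."""
    dbx = [(bx[i + 1] - bx[i]) / dx for i in range(len(bx) - 1)]  # forward difference
    i_max = max(range(len(dbx)), key=dbx.__getitem__)
    i_min = min(range(len(dbx)), key=dbx.__getitem__)
    return abs(i_max - i_min) * dx


bx = [0, 0, 1, 3, 4, 3, 1, 0, 0]  # a single axial peak sampled every 0.5 mm
print(axial_diff_peak_valley_spacing(bx, 0.5))  # 1.5
```

On real, noisy signals some smoothing before differentiation would probably be needed; that step is omitted here.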

When the magnetic flux leakage detector applies axial excitation, the sensors that detect the leakage signal are arranged circumferentially. The circumferential extent over which they detect the leakage signal is approximately equal to the defect width. The number N of sensors receiving the defect leakage signal is therefore approximately proportional to the defect width, which makes N an important index for quantifying defect width.

The same property gives rise to the signal characteristic parameter S_y-50%, the circumferential component peak-to-valley median spacing. To obtain the circumferential component, first find the maximum of the radial component on the axial path, i.e. its crest position. The radial component along the circumferential path through that crest position is then extracted as the circumferential component. S_y-50% is extracted as follows: extract the crest and trough of the circumferential component and take the middle value between them, i.e. 50% of the crest-to-trough difference, as the peak-to-valley median. The distance between the points on the circumferential component at which the signal equals this median is S_y-50%.
Like DS_xp-p, S_y-50% is stable: it does not change with the defect depth or the lift-off distance. Finite element simulation confirms that its fluctuation error is small. S_y-50% is shown in FIG. 3.
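Both width-related parameters can be sketched on sampled data. S_y-50% is computed here as the distance between the outermost samples at or above the level midway between trough and crest. No interpolation between samples is done, and the choice of the midway level is an assumption. N is counted as the number of circumferential sensor channels whose amplitude exceeds a detection threshold. The threshold value and both function names are hypothetical.

```python
def median_level_spacing(by, dy):
    """S_y-50%: distance between the outermost samples at or above the 50% crest-trough level."""
    crest, trough = max(by), min(by)
    level = trough + 0.5 * (crest - trough)  # assumed: midpoint between trough and crest
    above = [i for i, v in enumerate(by) if v >= level]
    return (above[-1] - above[0]) * dy


def sensors_detecting(channel_amplitudes, threshold):
    """N: number of circumferential sensor channels whose amplitude exceeds threshold."""
    return sum(1 for amp in channel_amplitudes if amp > threshold)


print(median_level_spacing([0, 1, 3, 5, 3, 1, 0], 2.0))       # 4.0
print(sensors_detecting([0.01, 0.2, 0.5, 0.3, 0.02], 0.1))   # 3
```

Multiplying N by the circumferential sensor pitch would give a rough width estimate, consistent with the near-proportional relation described above.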

Radial component peak-to-valley spacing S of leakage signal _zp-p Sum axial component waveform area S _x Both signal characteristic parameters are equal toThe defect depth is close to the direct proportion relation, and the larger the defect depth is, the stronger the signal characteristic is. Thus, these two signal features are somewhat representative, the radial component peak-to-valley spacing S _zp-p Sum axial component waveform area S _x As shown in fig. 4 and 5.

Accordingly, six quantities are selected as the identification parameters of the defect type: the axial component differential signal peak-to-valley spacing DS_xp-p, the axial component waveform area S_x, the circumferential component peak-to-valley median spacing S_y-50%, the radial component peak-to-valley value B_zp-p, the radial component peak-to-valley spacing S_zp-p, and the number N of sensors that acquire the defect leakage signal.
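The remaining signal features can be sketched in a similar way. This sketch assumes that the axial component `bx` and the radial component `bz` are sampled along the same axial path `x`. The differential signal is taken as the numerical derivative `np.gradient(bx, x)`, and the waveform area as the area between `bx` and a baseline estimated from the two end samples. The source specifies neither choice, and `mfl_features` is a hypothetical helper.

```python
import numpy as np


def _trapezoid(y, x):
    # Trapezoidal rule written out to avoid NumPy-version differences (trapz vs trapezoid).
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mfl_features(x, bx, bz):
    """Return (DS_xp-p, S_x, B_zp-p, S_zp-p) from the axial (bx) and radial (bz)
    components sampled along the same axial path x."""
    x = np.asarray(x, dtype=float)
    bx = np.asarray(bx, dtype=float)
    bz = np.asarray(bz, dtype=float)

    # DS_xp-p: distance between the peak and the valley of the axial differential signal.
    dbx = np.gradient(bx, x)
    ds_xpp = abs(x[np.argmax(dbx)] - x[np.argmin(dbx)])

    # S_x: area of the axial waveform relative to an assumed baseline
    # (mean of the two end samples, taken as the defect-free background).
    baseline = 0.5 * (bx[0] + bx[-1])
    s_x = _trapezoid(np.abs(bx - baseline), x)

    # B_zp-p and S_zp-p: peak-to-valley value and peak-to-valley spacing of the radial component.
    b_zpp = bz.max() - bz.min()
    s_zpp = abs(x[np.argmax(bz)] - x[np.argmin(bz)])
    return ds_xpp, s_x, b_zpp, s_zpp
```

Together with S_y-50% and the sensor count N, these values would form one six-element identification-parameter vector per defect.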

In this embodiment, six identification parameters are extracted for each defect from 400 groups of defect samples of different types. The correspondence between the identification parameters and the defect types cannot be observed directly from the data alone.

In this embodiment, the identification parameters are first analyzed by unsupervised classification using GA-optimized K-means clustering. Clustering is a technique for discovering such internal structure. It groups and organizes the members of a data set that are similar in some respect, and is commonly referred to as unsupervised learning.

The data are divided into K groups as follows. K objects are first selected at random as initial cluster centers. The squared Euclidean distance from every data sample to each cluster center is then computed, and each sample is assigned to the nearest cluster center. A cluster center together with the samples assigned to it constitutes a cluster. Each time samples are assigned to a cluster, its center is updated from the samples it currently contains. This process repeats until all samples are clustered, the cluster centers no longer move, and the sum of squared errors reaches a local minimum.

Principal component analysis is applied to reduce the dimensionality of the multidimensional data. The clustering result is then visualized in a two-dimensional plane by computing composite influence factors.
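The clustering loop described above can be sketched as a batch (Lloyd-style) K-means. This sketch uses batch updating: all samples are reassigned before the centers are recomputed. The text instead describes an incremental update each time samples are assigned. Both versions stop when the centers no longer move. Passing K randomly chosen samples as `init_centers` reproduces the random initialization. The function name is hypothetical.

```python
import numpy as np


def kmeans(data, init_centers, max_iter=100):
    """Batch (Lloyd-style) K-means starting from the given initial centers.

    Returns (labels, centers, sse), where sse is the sum of squared Euclidean
    distances from each sample to its assigned center.
    """
    X = np.asarray(data, dtype=float)
    centers = np.array(init_centers, dtype=float)  # copy: caller's array untouched
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every center: (samples, K).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)  # assign each sample to its nearest center
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers no longer move: converged
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, sse
```

Because the result depends on `init_centers`, different random choices can converge to different local minima of the SSE. This is the instability that the next paragraph addresses.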

Because the K initial cluster centers of the K-means algorithm are chosen at random, results vary considerably from run to run and the algorithm is unstable. Poorly chosen initial centers increase the number of iterations and the computational workload. The final result may also not be the desired one, which increases the clustering error. In this embodiment, K-means clustering is therefore optimized with a genetic algorithm. The genetic algorithm searches for the optimal initial cluster centers so that the algorithm reaches the best clustering result. The number of optimized independent variables is:

$$N = K \times n$$

In this formula, K is the number of clusters (cluster groups), with K = 4 corresponding to the 4 defect types. n is the dimension of the sample variables, with n = 6 corresponding to the 6 defect type identification parameters. N is the number of independent variables, so N = 24. The objective function optimized by the genetic algorithm is the sum of the Euclidean distances from each data sample to its nearest cluster center. The Euclidean distance is calculated as:

$$d = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^2}$$

where $d$ is the Euclidean distance, $n$ is the dimension of the sample variables, and $x_i$ and $y_i$ are the coordinates of the two points in the $i$-th dimension.
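The following sketch shows the GA search for initial centers. It assumes a real-coded chromosome of N = K × n genes, which is reshaped into K candidate centers. The objective is the one stated above: the sum of Euclidean distances from each sample to its nearest center. The source does not specify the GA operators, population size or generation count. The choices here (binary tournament selection, arithmetic crossover, Gaussian mutation and single-individual elitism) are assumptions. The returned centers would then be passed as the initial centers to K-means.

```python
import numpy as np


def objective(chrom, X, k):
    """Sum of Euclidean distances from each sample to its nearest candidate center.
    The chromosome is a flat vector of N = k * n genes (k centers, n features each)."""
    centers = chrom.reshape(k, X.shape[1])
    d = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return float(d.min(axis=1).sum())


def ga_initial_centers(X, k, pop_size=40, generations=60, mut_sigma=0.1, seed=0):
    """Minimal real-coded GA (tournament selection, arithmetic crossover,
    Gaussian mutation, one-individual elitism). Returns (centers, history)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Gene bounds: each center coordinate stays within the data range of its feature.
    lo, hi = np.tile(X.min(axis=0), k), np.tile(X.max(axis=0), k)
    pop = rng.uniform(lo, hi, size=(pop_size, k * n))
    fit = np.array([objective(c, X, k) for c in pop])
    history = [fit.min()]
    for _ in range(generations):
        children = [pop[fit.argmin()].copy()]  # elitism: keep the best individual
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            i, j = rng.integers(pop_size, size=2)
            p1 = pop[i] if fit[i] < fit[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            p2 = pop[i] if fit[i] < fit[j] else pop[j]
            alpha = rng.random()
            child = alpha * p1 + (1 - alpha) * p2  # arithmetic crossover
            child += rng.normal(0.0, mut_sigma * (hi - lo))  # Gaussian mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        fit = np.array([objective(c, X, k) for c in pop])
        history.append(fit.min())
    return pop[fit.argmin()].reshape(k, n), history
```

With K = 4 and n = 6, the chromosome has 24 genes, which matches N = 24 in the text. Elitism keeps the best objective value non-increasing from one generation to the next.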

The optimal initial cluster centers are determined by minimizing the sum of Euclidean distances. They are then passed to the K-means model to obtain the best clustering result. In this embodiment, this method is used to determine whether the selected defect type identification parameters achieve an effective defect classification, and the clustering result is shown in Fig. 6. In the dimensionality reduction, influence factor 1 and influence factor 2 are the two principal components used to compute the composite score of the weighted multidimensional data.

As can be seen from Fig. 6, the overall clustering effect is very pronounced. The data points within each cluster are tightly grouped and strongly correlated, while points in different clusters are well separated and easy to distinguish. Among the 4 defect types, only the boundary between cluster 2 and cluster 3 is less distinct. This is because the edge data of these two clusters have similar structure and the corresponding defects are of about the same size.

Identifying the defect type by unsupervised machine learning thus verifies that the 6 extracted identification parameters are clearly effective for identifying and quantifying the defect type, which meets the expected effect.
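The two-dimensional view used for Fig. 6 can be approximated with principal component analysis computed via the singular value decomposition. This sketch treats "influence factor 1" and "influence factor 2" as the scores on the first two principal components. It also standardizes the six identification parameters first, because they have different units. Both choices are assumptions, and `pca_2d` is a hypothetical name.

```python
import numpy as np


def pca_2d(X):
    """Project samples onto their first two principal components.

    Features are standardized first (assumption: the six identification
    parameters have different units). Returns (scores, explained_ratio).
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave a constant feature unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T  # coordinates in the plane of components 1 and 2
    explained = s ** 2 / np.sum(s ** 2)  # share of total variance per component
    return scores, explained[:2]
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by the K-means labels, gives a plot of the kind shown in Fig. 6.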

In this embodiment, the defect type is identified by a BP neural network deeply optimized with a genetic algorithm (GA). Many parameters affect the training effect of a BP neural network. These include the algorithm parameter settings, the number of hidden layer nodes, and the randomly selected initial weights and initial offsets. Different settings of these parameters strongly influence the training result, so they are optimized by the genetic algorithm. The optimization capability of the genetic algorithm compares favorably with that of the mainstream optimization algorithms currently in use.

The genetic algorithm deeply optimizes the BP neural network using a greedy strategy. First, the number of hidden layer nodes and the hyper-parameters are optimized iteratively within the ranges set for the hyper-parameters and by the empirical formula. Then the initial weights and offsets are optimized iteratively, so that the training capability of the network is optimized.

The greedy strategy divides the overall BP optimization problem into two subproblems. Each subproblem is solved for its locally optimal solution, and these solutions are combined into a solution of the original problem. Optimizing everything at once would involve a large number of independent variables and a heavy computational load. The greedy approach avoids the time that an exhaustive one-step search for the global optimum would require, while still giving a relatively good optimization result.
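The greedy two-stage decomposition can be sketched independently of the optimizer. To keep the example short, uniform random search stands in for the genetic algorithm at both stages. This is a deliberately swapped-in technique, not the source's method. The fitness callables and `stage2_bounds_for` are hypothetical placeholders. Stage 2 receives the stage-1 optimum as a fixed constant, and this is what makes the scheme greedy.

```python
import numpy as np


def random_search(fitness, lo, hi, n_iter=200, seed=0):
    """Stand-in optimizer (uniform random search) used only to keep the sketch
    short; the source uses a genetic algorithm at both stages."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    best, best_f = None, np.inf
    for _ in range(n_iter):
        cand = rng.uniform(lo, hi)
        f = fitness(cand)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f


def greedy_two_stage(stage1_fitness, stage1_bounds, stage2_fitness, stage2_bounds_for):
    """Stage 1 fixes the structural variables; stage 2 searches only the
    remaining variables while holding the stage-1 optimum constant."""
    s1, _ = random_search(stage1_fitness, *stage1_bounds)
    lo2, hi2 = stage2_bounds_for(s1)  # stage-2 search space may depend on stage 1
    s2, f2 = random_search(lambda g: stage2_fitness(s1, g), lo2, hi2)
    return s1, s2, f2
```

In the patent's setting, stage 1 would search (h, sigma, lambda) and stage 2 the initial weights and offsets. The number of stage-2 variables depends on the h chosen in stage 1.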

As shown in Fig. 7, the genetic algorithm deeply optimizes and trains the BP neural network as follows:

S41, initialize the relevant parameters of the BP neural network: the algorithm parameters (hyper-parameters), the number of hidden layer nodes, the initial weights and the initial offsets;

S42, use the genetic algorithm to iteratively optimize the hyper-parameters and the number of hidden layer nodes. The search stays within the hyper-parameter ranges and within the range given by the empirical formula for hidden layer nodes.

The empirical formula for setting the hidden layer nodes is as follows:

$$h = \sqrt{m + n} + a$$

where $h$ is the number of hidden layer nodes, $m$ and $n$ are the numbers of input and output layer nodes, respectively, and $a$ is an adjusting constant between 1 and 10. The designed BP neural network has 6 input layer nodes and 1 output layer node. The number $h$ of hidden layer nodes is therefore optimized over the integers from 3 to 20.

trainscg is a network training function that updates the weights and biases according to the scaled conjugate gradient method. The BP neural network is trained with the trainscg training parameters, and when these are left untouched, training uses their default values. Most default values have little influence on the training of different neural networks. The two hyper-parameters sigma and lambda, however, are easily affected by the training data. sigma determines the change in weight for the second-derivative approximation, with a default value of 5 × 10^-5. lambda regulates the indefiniteness of the Hessian, with a default value of 5 × 10^-7. The two hyper-parameters sigma and lambda are therefore optimized within the range 0 to 1.
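The following sketch shows one way a stage-1 chromosome could be decoded into usable values. It assumes real-valued genes for (h, sigma, lambda). h is rounded to an integer and clipped to the 3 to 20 range, and sigma and lambda are clipped to the 0 to 1 range given in the text. The rounding rule and the names are assumptions.

```python
import numpy as np

H_RANGE = (3, 20)         # integer search range for hidden-layer nodes (from the text)
HYPER_RANGE = (0.0, 1.0)  # search range for sigma and lambda (from the text)


def decode_stage1(genes):
    """Map a real-valued stage-1 chromosome [h, sigma, lambda] to usable values."""
    h_gene, sigma, lam = (float(g) for g in genes)
    h = int(np.clip(round(h_gene), *H_RANGE))  # hidden nodes must be an integer
    sigma = float(np.clip(sigma, *HYPER_RANGE))
    lam = float(np.clip(lam, *HYPER_RANGE))
    return h, sigma, lam
```

Clipping keeps mutated or crossed-over genes inside the stated search ranges without rejecting the individual.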

The objective function of the genetic algorithm is the mean square error (MSE) of the BP neural network on the test set:

$$MSE = \frac{SSE}{Z} = \frac{1}{Z}\sum_{z=1}^{Z}\left(p_z - \hat{p}_z\right)^2$$

where $SSE$ is the sum of squared errors, $Z$ is the number of samples, $p_z$ is the true value, and $\hat{p}_z$ is the predicted output value. With the initial weights and initial offsets set randomly by the system, the genetic algorithm optimizes three independent variables: the hyper-parameters sigma and lambda and the number of hidden layer nodes.
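The GA objective above reduces to a few lines. `mse` is a hypothetical helper name.

```python
import numpy as np


def mse(p_true, p_pred):
    """Mean square error: the sum of squared errors (SSE) divided by the number of samples Z."""
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    sse = np.sum((p_true - p_pred) ** 2)
    return float(sse / p_true.size)
```

For example, `mse([1, 2, 3, 4], [1, 2, 3, 5])` returns 0.25.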

S43, after the hyper-parameters and the number of hidden layer nodes have been optimized, their optimal values are passed to the algorithm model to fix the topology of the BP neural network. The initial weights and initial offsets are then optimized with the genetic algorithm, whose objective function is again the MSE of the BP neural network on the test set. The number of independent variables is composed of the initial weights and initial offsets as follows:

$$Num = w_1 + b_1 + w_2 + b_2$$

$$w_1 = m \times h$$

$$b_1 = h$$

$$w_2 = h \times n$$

$$b_2 = n$$

In these formulas, $Num$ is the total number of independent variables and $h$ is the number of hidden layer nodes. $m$ and $n$ are the numbers of input and output layer nodes, respectively. $w_1$ is the number of weights from the input layer to the hidden layer, and $b_1$ is the number of offsets of the hidden layer. $w_2$ is the number of weights from the hidden layer to the output layer, and $b_2$ is the number of offsets of the output layer. The upper and lower optimization limits of the initial weights and initial offsets are set to -2.
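As a worked check of the variable count, take the final topology of this embodiment: m = 6, h = 15 and n = 1. This gives w_1 = 90, b_1 = 15, w_2 = 15 and b_2 = 1, so Num = 121. The sketch below computes Num and splits a flat stage-2 chromosome into weight matrices and offset vectors. The gene order (W1, b1, W2, b2) and the function names are assumptions.

```python
import numpy as np


def n_genes(m, h, n):
    """Num = w1 + b1 + w2 + b2 for an m-h-n network."""
    w1, b1, w2, b2 = m * h, h, h * n, n
    return w1 + b1 + w2 + b2


def unpack(genes, m, h, n):
    """Split a flat chromosome into W1 (h x m), b1 (h,), W2 (n x h), b2 (n,)."""
    g = np.asarray(genes, dtype=float)
    if g.size != n_genes(m, h, n):
        raise ValueError("chromosome length must equal Num")
    i = 0
    W1 = g[i:i + m * h].reshape(h, m)  # input -> hidden weights
    i += m * h
    b1 = g[i:i + h]                    # hidden-layer offsets
    i += h
    W2 = g[i:i + h * n].reshape(n, h)  # hidden -> output weights
    i += h * n
    b2 = g[i:i + n]                    # output-layer offsets
    return W1, b1, W2, b2
```

`n_genes(6, 15, 1)` returns 121.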

S44, finally, the optimal number of hidden layer nodes, the optimal hyper-parameters sigma and lambda, and the optimal initial weights and offsets are loaded into the BP neural network, which is then trained on the sample set. The error between the predicted output and the true value is computed to update the weights and offsets, and the error is checked against the stopping condition. If the condition is not met, iterative feedback continues and the weights and offsets are updated again. If the condition is met, the training result is output.
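The following is a sketch of the S44 training loop. It covers the forward pass, the check of the error against a goal, and the back-propagated updates of the weights and offsets. Plain batch gradient descent is swapped in here for the scaled conjugate gradient method of trainscg. The sketch also assumes a tanh hidden layer and a linear output. The learning rate, goal and epoch limit are illustrative values, not values from the source.

```python
import numpy as np


def train_bp(X, y, W1, b1, W2, b2, lr=0.05, goal=1e-3, max_epochs=2000):
    """Train a one-hidden-layer network (tanh hidden, linear output) by batch
    gradient descent on the MSE until the error goal or the epoch limit is met.
    Returns the updated parameters and the per-epoch MSE history."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    W1, b1, W2, b2 = (np.array(a, dtype=float) for a in (W1, b1, W2, b2))  # copies
    history = []
    for _ in range(max_epochs):
        H = np.tanh(X @ W1.T + b1)  # hidden activations, shape (samples, h)
        out = H @ W2.T + b2         # network output, shape (samples, 1)
        err = out - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse <= goal:
            break                   # error condition met: output the training result
        # Back-propagate the gradient of the MSE.
        g_out = 2.0 * err / len(X)
        gW2 = g_out.T @ H
        gb2 = g_out.sum(axis=0)
        g_hid = (g_out @ W2) * (1.0 - H ** 2)  # tanh derivative
        gW1 = g_hid.T @ X
        gb1 = g_hid.sum(axis=0)
        # Update weights and offsets, then iterate again.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history
```

In the GA-optimized setting, the starting `W1, b1, W2, b2` would come from the stage-2 chromosome instead of a random draw.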

In this embodiment, the data set category labels for cracks, surface layer peeling, pits and air holes are set to 1, 2, 3 and 4, respectively. The samples of each defect type are divided into 60 for training, 20 for validation and 20 for testing. The BP neural network has 6 input layer nodes, 1 output layer node and, after optimization, 15 hidden layer nodes, and its output is rounded to the nearest integer. Figs. 8 and 9 compare the defect type identification of the BP neural network before and after deep optimization by the genetic algorithm. Fig. 8 shows the result before optimization and Fig. 9 the result after optimization. In this embodiment, the defect type identification results of the BP neural network before and after GA deep optimization are shown in Table 2 below:

TABLE 2

As can be seen from Table 2 and Figs. 8 and 9, the optimized BP neural network model classifies the sample data very well and effectively distinguishes the defect types, reaching a recognition accuracy of 100%. Deep optimization of the BP neural network by the genetic algorithm also verifies that the selected identification parameters are effective for recognizing the defect type.
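The per-class 60/20/20 split and the rounding of the single network output to a class label can be sketched as follows. The text states only that the output is rounded. The random shuffling within each class and the clipping of rounded outputs to the label range 1 to 4 are assumptions, and the function names are hypothetical.

```python
import numpy as np


def stratified_split(labels, n_train=60, n_val=20, n_test=20, seed=0):
    """Per-class split into training, validation and test index arrays."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle within the class
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:n_train + n_val + n_test])
    return np.array(train), np.array(val), np.array(test)


def to_class(outputs):
    """Round the single network output to the nearest label 1-4
    (1 crack, 2 surface layer peeling, 3 pit, 4 air hole)."""
    return np.clip(np.rint(outputs), 1, 4).astype(int)
```

With 100 samples per type, this yields 240 training, 80 validation and 80 test samples, which accounts for all 400 defect samples.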

The invention is not to be considered as limited to the specific embodiments shown and described, but is to be understood to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A pipeline defect type identification method based on GA deep optimization machine learning is characterized by comprising the following steps:

the circumferential component is selected as follows: the peak position of the radial component on the axial path is determined, and the values of the radial component on the circumferential path through that peak position are taken as the values of the circumferential component;

S2, extracting signal characteristic parameters from the triaxial magnetic flux leakage signal of the defect, including: the axial component differential signal peak-to-valley spacing DS_xp-p, the axial component waveform area S_x, the circumferential component peak-to-valley median spacing S_y-50%, the radial component peak-to-valley value B_zp-p, and the radial component peak-to-valley spacing S_zp-p;

the circumferential component peak-to-valley median spacing S_y-50% is extracted as follows: the peak and the trough of the circumferential component are extracted, and the value lying 50% of the way between them (50% of the peak-to-trough difference) is taken as the peak-valley median; the distance between the peak-valley median points on the circumferential component is the circumferential component peak-to-valley median spacing S_y-50%;

S3, constructing a sample set by taking the following as the identification parameters of the defect type: the axial component differential signal peak-to-valley spacing DS_xp-p, the circumferential component peak-to-valley median spacing S_y-50%, the radial component peak-to-valley value B_zp-p, the radial component peak-to-valley spacing S_zp-p, the axial component waveform area S_x, and the number N of sensors on the magnetic flux leakage sensor that acquire the defect leakage signal;

S51, extracting each signal characteristic parameter from the triaxial magnetic flux leakage signal of the unknown defect, and obtaining the number N of sensors on the magnetic flux leakage sensor that acquire the leakage signal of the unknown defect, so as to obtain each identification parameter value of the unknown defect;

2. The GA deep optimization machine learning-based pipeline defect type identification method of claim 1, wherein in step S4, the neural network is deeply optimized based on a genetic algorithm during its training; the process by which the genetic algorithm deeply optimizes and trains the BP neural network is specifically as follows:

s41, initializing relevant parameters of the neural network, including: the number of hidden layer nodes, the hyper-parameter, the initial weight and the initial offset;

S42, within the ranges set for the hyper-parameters and by the empirical formula for hidden layer nodes, iteratively optimizing the number of hidden layer nodes and the hyper-parameters with a genetic algorithm to determine their optimal values, which are then passed to the neural network model to determine its topology;

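3. The GA deep optimization machine learning-based pipeline defect type identification method of claim 2, wherein in step S42, the empirical formula for the hidden layer nodes is:

$$h = \sqrt{m + n} + a$$

where $h$ is the number of hidden layer nodes, $m$ and $n$ are the numbers of input and output layer nodes, respectively, and $a$ is an adjusting constant between 1 and 10.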

4. The method for identifying pipeline defect types based on GA deep optimization machine learning of claim 2, wherein in step S42, the algorithm parameters of the neural network, i.e. the hyper-parameters, comprise sigma and lambda, and the two hyper-parameters sigma and lambda are optimized within the range 0 to 1.

5. The method for identifying pipeline defect types based on GA deep optimization machine learning of claim 2, wherein in step S42, the objective function for iteratively optimizing the number of hidden layer nodes and the hyper-parameters with a genetic algorithm is the mean square error MSE between the predicted output values and the true values of the neural network:

$$MSE = \frac{SSE}{Z} = \frac{1}{Z}\sum_{z=1}^{Z}\left(p_z - \hat{p}_z\right)^2$$

where $SSE$ is the sum of squared errors, $Z$ is the number of samples, $p_z$ is the true value, and $\hat{p}_z$ is the predicted output value.

6. The method for identifying pipeline defect types based on GA deep optimization machine learning of claim 5, wherein in step S43, the objective function for iteratively optimizing the initial weights and initial offsets with the genetic algorithm is likewise the mean square error MSE between the predicted output values and the true values of the neural network, and the number of independent variables of this objective function is related to the initial weights and initial offsets as follows:

$$Num = w_1 + b_1 + w_2 + b_2$$

$$w_1 = m \times h$$

$$b_1 = h$$

$$w_2 = h \times n$$

$$b_2 = n$$

where $Num$ is the total number of independent variables, $w_1$ is the number of weights from the input layer to the hidden layer, $b_1$ is the number of offsets of the hidden layer, $w_2$ is the number of weights from the hidden layer to the output layer, $b_2$ is the number of offsets of the output layer, $h$ is the number of hidden layer nodes, and $m$ and $n$ are the numbers of input and output layer nodes, respectively; the optimization intervals of the initial weights and initial offsets are set to -2.

7. The GA deep optimization machine learning-based pipeline defect type identification method according to claim 1, wherein in step S1, the defect types comprise: cracks, surface layer peeling, pits and air holes.

8. The GA deep optimization machine learning-based pipeline defect type identification method according to claim 7, wherein classification information of each defect type is as follows:

pits: length 5-15 mm, width 5-30 mm, depth 2-9 mm;

9. The method for identifying pipeline defect types based on GA deep optimization machine learning of claim 1, 7 or 8, wherein in step S1, a three-dimensional finite element model of the pipeline defect is established, and magnetic flux leakage simulation of the pipeline defect is performed in this model using ANSYS Maxwell to obtain the corresponding leakage signals, thereby obtaining the triaxial magnetic flux leakage signals of the defect for the different defect types.