CN113379737A - Intelligent pipeline defect detection method based on image processing and deep learning and application - Google Patents

Intelligent pipeline defect detection method based on image processing and deep learning and application

Info

Publication number
CN113379737A
Authority
CN
China
Prior art keywords
detection
layer
image
detection method
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110792859.0A
Other languages
Chinese (zh)
Inventor
王兵
肖斌
乐红霞
赵春兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University filed Critical Southwest Petroleum University
Priority to CN202110792859.0A priority Critical patent/CN113379737A/en
Publication of CN113379737A publication Critical patent/CN113379737A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0002: Image analysis; Inspection of images, e.g. flaw detection
    • G06F 18/23: Pattern recognition; Clustering techniques
    • G06N 3/045: Neural networks; Combinations of networks
    • G06N 3/084: Neural network learning methods; Backpropagation, e.g. using gradient descent
    • G06T 7/13: Segmentation; Edge detection
    • G06T 7/136: Segmentation; Edge detection involving thresholding
    • G06T 2207/10016: Image acquisition modality; Video; Image sequence
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent pipeline defect detection method based on image processing and deep learning, and an application thereof. The intelligent detection method comprises the following steps: preprocessing pipeline image data, including edge detection; and inputting the preprocessed image into a trained detection model for defect detection, the detection model being constructed on a Darknet53 backbone network and a multi-scale feature extraction network. The detection method accurately identifies and detects different pipeline defects, yields a model that meets industrial application requirements, offers high detection efficiency, and is broadly applicable across environments.

Description

Intelligent pipeline defect detection method based on image processing and deep learning and application
Technical Field
The invention relates to the technical field of Internet of things and artificial intelligence.
Background
Pipelines are the medium most commonly used in the petroleum industry to transport liquids and gases. Owing to the complexity of the working environment and of the transported materials, pipelines are prone to corrosion, blockage and even cracking in service, so they must be inspected regularly to ensure their service life and safety. Traditional manual pipeline defect inspection is time-consuming and prone to false and missed detections, which makes intelligent defect detection the better choice.
Existing intelligent petroleum pipeline defect detection methods involve computer vision and embedded hardware. On the hardware side, a camera and a platform that can perceive the environment, analyse the scene and react accordingly are required. Because of the complexity of the detection environment, the demands on a hardware platform that performs automatic scene detection through a small camera keep rising, which raises new challenges: how to judge petroleum pipeline defects correctly; how to deploy the detection system on a platform with limited computing power and memory; and how to balance real-time performance against accuracy.
Part of the prior art addresses these challenges with deep learning, for example by building a detection framework on a convolutional neural network (CNN) to obtain an intelligent detection method with a reasonable trade-off between speed and accuracy. Current CNN-based object detectors mainly include SPP-Net, Faster R-CNN, Mask R-CNN, RetinaNet, SSD, YOLO, YOLOv2 (YOLO9000) and YOLOv3. They fall into two classes, two-stage and one-stage detectors, according to whether an additional region-proposal module is required. A two-stage detector uses a proposal generator to produce a sparse set of candidate boxes, extracts features from each candidate box and classifies each candidate region with a region classifier; a one-stage detector predicts classes directly at every position of the feature map, without a region-classification step. In general, two-stage detectors deliver better detection performance and hold the state of the art on public benchmarks, while one-stage detectors are faster and better suited to real-time detection. Both kinds of detector can extract targets of interest from images or video and are applied to blind-guidance systems, pedestrian detection, traffic-sign detection, vehicle detection and so on, but they generally depend on a high-performance computing platform with a large running memory to maintain good performance.
To cope with defects that are hard to distinguish under special conditions, part of the prior art adds image processing to strengthen the visual appearance of the target and improve the judgement of the relevant image features.
For example, the prior art document "an expansion circular crack detection algorithm based on image processing" proposes an algorithm that models a crack as a planar region filled with a number of circles, and builds and restores a geometric model of the crack from the coordinates of those circles and the diameter of a circle that approximately represents the crack width.
As another example, Chinese patent application CN111325738A discloses an intelligent detection method and system for transverse perforation peripheral cracks. The detection method inputs an ultrasonic image of the transverse perforation to be detected into a crack detection model to obtain a detection result; the crack data set is processed with the K-Means++ and K-medoids clustering algorithms, and the category and position information of the transverse perforation crack data set are trained and tested with the YOLOv3 algorithm. The method can identify crack defects quickly and accurately, but it is difficult to apply to corrosion-type defects, it requires a special application scenario, and its detection efficiency in ordinary scenes may be poor.
Existing steel pipe defect detection systems and methods therefore still have many shortcomings: a plain deep-learning algorithm struggles to distinguish steel pipe defects; the corresponding object detection methods need a high-performance computing platform and a large running memory; when image processing is combined, the resolution of the raw defect images is low, so detection and training suffer; simple geometric image modelling cannot detect corrosion-type defects; and existing image acquisition means lack environmental universality.
Disclosure of Invention
The invention aims to provide an intelligent detection method that combines image processing with a deep learning network, accurately identifies and detects different pipeline defects, and yields a model that meets industrial application requirements. The method has high detection efficiency and can accurately detect various pipeline defects, including cracks, corrosion, peeling and scratches, in a wide range of application environments.
The invention also aims to provide an application of the intelligent detection method.
The invention firstly discloses the following technical scheme:
the intelligent pipeline defect detection method based on image processing and deep learning comprises the following steps:
S1: obtaining image data from a video of the pipeline;
S2: performing preprocessing, including edge detection, on the image data;
S3: inputting the preprocessed image into a trained detection model for defect detection;
the detection model is constructed based on a Darknet53 backbone network and a multi-scale feature extraction network.
According to some embodiments of the present invention, the edge detection comprises obtaining the gradient modulus values of the grayscaled image in four directions (horizontal, vertical and the two diagonals) and screening candidate pixels against a high threshold and a low threshold according to the obtained gradient modulus values.
According to some embodiments of the invention, the pre-treating comprises:
S21: converting the obtained image data into grayscale image data;
S22: performing edge detection on the obtained grayscale image data to obtain an edge detection map, i.e. the preprocessed image, as follows:
S221: obtaining the differences of the grayscale image in the horizontal and vertical directions through a first-order differential operator;
S222: obtaining the gradient modulus and the corresponding gradient angle of each candidate pixel from the obtained horizontal and vertical differences;
S223: partitioning the obtained gradient angles θ at the dividing lines ±iπ/8, i = 1, 3, 5, 7, into regions corresponding to the four directions of the image (horizontal, vertical and the two diagonals), the gradient modulus on each dividing line representing the gradient modulus of the image's candidate pixels in the corresponding direction;
S224: dividing the gradient moduli of the candidate pixels in the four directions by a high threshold and a low threshold, suppressing pixels whose gradient modulus is larger than the high threshold or smaller than the low threshold, and retaining pixels whose gradient modulus lies between the two thresholds, to obtain the edge detection map.
According to some embodiments of the present invention, the Darknet53 backbone network of the detection model includes a first convolution unit that performs convolution, batch regularization and activation processing on the input image, wherein the penalty function of the batch regularization is set as follows:

\tilde{J}(\omega; X, y) = J(\omega; X, y) + \alpha \Omega(\omega)

wherein X represents an input image, y represents its corresponding label, ω represents the weight coefficient vector, J(·) represents the optimization function of the layer, α represents a strength control parameter, Ω(ω) represents the penalty term, and:

\Omega(\omega) = \frac{1}{2} \sum_{i} \lVert \omega_{i} \rVert_{2}^{2}

where i represents the layer index.
According to some embodiments of the invention, the activation process is implemented by a LeakyReLU activation function.
According to some embodiments of the invention, the multi-scale feature extraction network of the detection model comprises prediction units that predict target bounding boxes at three different scales. Each prediction unit comprises a detection head (three detection heads in total) and one of three convolution layers respectively connected to the three detection heads: the first convolution layer is connected to the output of the fifth residual block group of the Darknet53 backbone network, the second convolution layer to the output of the fourth residual block group, and the third convolution layer to the output of the third residual block group. An activation layer and a Dropout layer are connected to the outputs of the first, second and third convolution layers; in addition, the first convolution layer is connected to the second convolution layer, and the second convolution layer to the third convolution layer.
According to some embodiments of the invention, based on the finding that steel pipe crack classes differ considerably from generic target classes, the invention preferably analyses the steel pipe crack data with a clustering algorithm to compute anchor box ratios better suited to steel pipe crack detection.
More specifically, the three anchor box ratios of the small-scale detection head are preferably (10, 13), (16, 23), (33, 23); those of the medium-scale detection head are (30, 61), (62, 35), (55, 110); and those of the large-scale detection head are (115, 90), (150, 190), (373, 320).
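A minimal sketch of how such anchor ratios could be derived, assuming plain k-means over the labelled box widths and heights; the patent only states that a clustering algorithm is used, so the Euclidean distance metric and the scikit-learn dependency are assumptions (YOLO-style clustering often uses a 1 - IoU distance instead).

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(box_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """box_wh: array of shape (N, 2) holding the width and height of each labelled box."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(box_wh)
    anchors = km.cluster_centers_
    # Sort by area so the first three anchors go to the small-scale head,
    # the next three to the medium-scale head, and the last three to the large-scale head.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```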
According to some embodiments of the present invention, the outputs of a detection head comprise the 4 boundary coordinates of the target, 1 objectness value and C class prediction values. The 4 boundary coordinates may be two coordinate pairs, such as the upper-left and lower-right corners of the detected object, describing the position of the target; the objectness value is the confidence of the target, i.e. the probability that the target is a positive sample, and the higher the value, the more reliable the prediction; C represents the number of defect categories.
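For concreteness, the per-scale output depth implied by this description is 3 × (4 + 1 + C) channels per grid cell; the value C = 2 below corresponds to the two corrosion classes labelled in Example 1 and is only an illustrative choice.

```python
def head_output_channels(num_anchors: int = 3, num_classes: int = 2) -> int:
    # 4 boundary coordinates + 1 objectness value + C class scores, per anchor
    return num_anchors * (4 + 1 + num_classes)

print(head_output_channels())  # 21 channels per detection scale when C = 2
```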
According to some embodiments of the invention, the activation function of the activation layer is the Maxout function:

h_{i}(X) = \max_{j \in [1, k]} \left( X^{T} W_{\cdot i j} + b_{i j} \right)

wherein W represents a three-dimensional weight matrix of size d × m × k, m represents the number of hidden layer nodes, k represents the number of hidden pieces of each hidden layer node, i indexes the i-th output unit, j indexes the neurons (pieces) of that unit, X represents the feature vector input to the layer, and b represents a two-dimensional offset matrix of size m × k.
The invention further provides an application of the intelligent detection method, which is used for detecting the surface defects of the petroleum pipeline.
According to some embodiments of the invention, in this application the defect types comprise one or more of cracks, fissures, corrosion, scratches, the presence of foreign matter and/or plaque, and the presence of inclusions.
The beneficial effects of the invention include:
Compared with traditional steel pipe crack inspection, the disclosed detection method makes the defect features more salient through image processing, which helps improve the detection effect.
By combining a deep learning method, the model detects the input data automatically, saving a large amount of labour and detection time and judging more quickly and accurately whether the pipeline is defective.
The invention does not depend on specific hardware conditions and has good engineering practicability.
The detection method achieves efficient and accurate pipeline defect detection on small hardware storage devices or platforms with relatively low computing performance and relatively little running memory.
The detection method accurately identifies various pipeline defects of regular or irregular shape.
The detection method has environmental universality.
The detection method performs particularly well on pipelines, such as petroleum pipelines, whose defects are small and hard to distinguish, whose environments are complex, and whose defect types are varied.
Drawings
Fig. 1 is a schematic flow chart of a detection method according to an embodiment.
Fig. 2 is a schematic diagram of a structure of a Darknet53 backbone network according to an embodiment.
Fig. 3 is a schematic diagram of a multi-scale feature extraction network structure according to a specific embodiment.
FIG. 4 is a data set illustration as described in example 1.
Fig. 5 is an interface diagram of the labelImg tool described in example 1.
FIG. 6 is a comparison of the image processing described in example 1.
Detailed Description
According to the technical scheme of the invention, a specific implementation mode comprises the following steps:
S1: pipeline video stream data in avi format are obtained through a camera at a set resolution of 1280 × 760; video content containing pipeline defects is captured and saved as jpg images at a rate of one image every 5 frames, and content without pipeline defects at a rate of one image every 125 frames, which yields the initial training image data.
More specifically, as shown in fig. 1, when the camera device obtains the video stream normally, the images obtained from it are preprocessed; when the video stream cannot be obtained, an error log is saved and a warning is output to indicate that the device is abnormal.
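A minimal sketch of this frame-capture step, assuming OpenCV; the sampling interval, file paths and logging behaviour below are illustrative, and the defect/non-defect sampling rates of S1 would simply be two different values of every_n_frames.

```python
import logging
import cv2

def extract_frames(video_path: str, out_dir: str, every_n_frames: int = 5) -> int:
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        # Mirror the described behaviour: log the error and warn the operator.
        logging.error("Cannot open video stream: %s", video_path)
        return 0
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```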
S2 performs edge detection preprocessing on the resultant image data.
More specifically, it may further include:
s21 converts the image obtained by the video stream into a gray scale image, for example, the obtained image is an RGB image, and it is subjected to a graying conversion by the following formula to obtain a gray scale image:
Gray=0.299R+0.587G+0.114B,
wherein Gray represents Gray, R represents an R value of an image, G represents a G value of an image, and B represents a B value of an image.
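A small sketch of the weighted conversion above; cv2.cvtColor with cv2.COLOR_RGB2GRAY applies the same coefficients to an RGB-ordered array, and the explicit form is shown only to mirror the formula.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    # Gray = 0.299 R + 0.587 G + 0.114 B, applied per pixel
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
```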
S22, carrying out edge detection on the obtained gray-scale image to obtain an edge detection image;
the present invention further provides the following preferred edge detection embodiments:
S221: the differences of the grayscale image in the horizontal and vertical directions are obtained with a first-order differential operator:

G_{x} = D_{x} * A, \qquad G_{y} = D_{y} * A

wherein G_x represents the difference in the horizontal direction, G_y the difference in the vertical direction, A the obtained grayscale image, D_x and D_y the horizontal and vertical first-order differential kernels, and * the convolution operation;

S222: the gradient modulus and gradient angle of the candidate pixels of the grayscale image are obtained from the horizontal and vertical differences according to the following formulas:

G = \sqrt{G_{x}^{2} + G_{y}^{2}}, \qquad \theta = \operatorname{atan2}(G_{y}, G_{x}) \quad (1)

wherein G represents the gradient modulus and θ the gradient angle, with -π ≤ θ ≤ π;
S223: the gradient angles θ obtained above are partitioned at the dividing lines ±iπ/8, i = 1, 3, 5, 7, into regions corresponding to the four directions of the image (0°, 45°, 90° and 135°, i.e. horizontal, vertical and the two diagonals), and the gradient modulus on each dividing line represents the gradient modulus of the image's candidate pixels in the corresponding direction;
S224: the gradient moduli of the candidate pixels in the four directions are divided by a high threshold and a low threshold: pixels whose gradient modulus is larger than the high threshold or smaller than the low threshold are suppressed, and pixels whose gradient modulus lies between the two thresholds are retained, giving the edge detection map;
preferably, the high threshold is set to 1.5 times the mean gradient modulus of all candidate pixels in the image, and the low threshold to 0.5 times that mean.
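A minimal sketch of S221 to S224, assuming a 3 × 3 Sobel kernel as the first-order differential operator (the patent does not name a specific kernel); the direction quantisation and the 1.5×/0.5× mean thresholds follow the text above.

```python
import cv2
import numpy as np

def edge_map(gray: np.ndarray) -> np.ndarray:
    # S221: horizontal and vertical differences (Sobel kernel assumed).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    # S222: gradient modulus and gradient angle of every candidate pixel.
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # S223: quantise the angle into the four directions bounded by +/- i*pi/8.
    direction = np.zeros_like(ang, dtype=np.uint8)      # 0 degrees by default
    direction[(ang >= 22.5) & (ang < 67.5)] = 45
    direction[(ang >= 67.5) & (ang < 112.5)] = 90
    direction[(ang >= 112.5) & (ang < 157.5)] = 135
    # (direction would drive any per-direction handling of the modulus values.)
    # S224: double threshold at 1.5x / 0.5x the mean modulus; keep pixels in between.
    high, low = 1.5 * mag.mean(), 0.5 * mag.mean()
    keep = (mag >= low) & (mag <= high)
    return keep.astype(np.uint8) * 255
```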
S3, inputting the preprocessed image into a detection model which is trained to perform defect identification, wherein the detection model is constructed on the basis of a Darknet53 backbone network and a multi-scale feature extraction network;
as shown in fig. 2, the Darknet53 backbone network includes a main network layer formed by 52 convolution layers, and 1 fully-connected layer containing 1 × 1 convolution. Wherein, the main network layer specifically includes: the convolutional encoder comprises a first convolutional unit (convolutional) containing 1 two-dimensional convolutional layer and 5 groups of residual error blocks (resblock _ body), wherein each group of residual error blocks comprises 1 single convolutional layer (convolutional) and a residual error processing group consisting of 1 to a plurality of residual error processing elements (res _ unit _ n), each residual error processing element comprises 2 convolutional layers, and specifically, the residual error processing elements in the first group to the fifth group of residual error blocks are respectively 1, 2, 8 and 4. This results in a host network of Darknet53 totaling: 1+ (1+1 × 2) + (1+2 × 2) + (1+8 × 2) + (1+8 × 2) + (1+4 × 2) ═ 52 layers.
As shown in fig. 3, the multi-scale feature extraction network comprises prediction units that predict target bounding boxes at three different scales. Each prediction unit comprises a detection head, which performs convolution operations and predicts against three preset anchor boxes, and one of 3 convolution layers respectively connected to the 3 detection heads: the first convolution layer is connected to the output of the fifth residual block group of the Darknet53 backbone network, the second convolution layer to the output of the fourth residual block group, and the third convolution layer to the output of the third residual block group; in addition, the first convolution layer is connected to the second convolution layer, and the second convolution layer to the third convolution layer.
On the basis of the structure, the training process of the model is as follows:
S31: the training sample image is input into the Darknet53 backbone network; the image is convolved by the two-dimensional convolution layer of the first convolution unit, and the resulting convolution image is batch-regularized within the first convolution unit and activated by a LeakyReLU activation function;
more specifically, the batch regularization can preferably be in an L2 regularization mode, and the learning rate is preferably 5 e-4.
More specifically, the penalty function of the batch regularization process is preferably set as follows:

\tilde{J}(\omega; X, y) = J(\omega; X, y) + \alpha \Omega(\omega)

wherein X represents a training sample, y represents its corresponding label, ω represents the weight coefficient vector, J(·) represents the optimization function of the layer, Ω(ω) represents the penalty term, and α represents a strength control parameter, set to 0.001; and wherein:

\Omega(\omega) = \frac{1}{2} \sum_{i} \lVert \omega_{i} \rVert_{2}^{2}

where i represents the layer index.
The inventors further show that this regularization does not produce sparsity, as follows.

Let the optimal solution of the original objective function J(ω) be the weight coefficient vector ω*, and assume J is twice differentiable there. The second-order Taylor expansion of J(ω) about ω* is

\hat{J}(\omega) = J(\omega^{*}) + \frac{1}{2} (\omega - \omega^{*})^{T} H (\omega - \omega^{*})

wherein H represents the Hessian matrix of J(ω) at ω*, with eigenvalues λ_j.

\hat{J}(\omega) attains its minimum where its gradient vanishes:

\nabla_{\omega} \hat{J}(\omega) = H(\omega - \omega^{*}) = 0.

Because L2 regularization adds the term \frac{\alpha}{2} \omega^{T} \omega to J(ω), the minimizer \tilde{\omega} of the regularized objective satisfies

\alpha \tilde{\omega} + H(\tilde{\omega} - \omega^{*}) = 0, \qquad \tilde{\omega} = (H + \alpha I)^{-1} H \omega^{*}.

With the eigendecomposition H = Q \Lambda Q^{T}, this becomes

\tilde{\omega} = Q (\Lambda + \alpha I)^{-1} \Lambda Q^{T} \omega^{*},

i.e. the component of ω* along each eigenvector of H is scaled by the factor λ_j / (λ_j + α). Since this factor is strictly positive, no component is driven exactly to zero, and therefore L2 regularization does not produce sparsity.
More specifically, the LeakyReLU activation function is set as follows:

y_{i} = \begin{cases} x_{i}, & x_{i} \ge 0 \\ x_{i} / a_{i}, & x_{i} < 0 \end{cases}

wherein a_i represents a fixed parameter in the interval (1, +∞) and x_i represents the input value.
Compared with the ReLU function, which sets all negative values to zero, LeakyReLU assigns a non-zero slope to negative inputs. This overcomes the problem that, with a large learning rate, a large gradient update can leave a ReLU neuron with a gradient of zero thereafter, so that half of the neurons in the network may never activate on any data again.
Preferably, the slope of the LeakyReLU activation function is set to 0.1.
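A one-line functional equivalent of the activation above with the preferred 0.1 slope; torch.nn.LeakyReLU(0.1) is the ready-made library form.

```python
import torch

def leaky_relu(x: torch.Tensor, negative_slope: float = 0.1) -> torch.Tensor:
    # y = x for x >= 0, y = negative_slope * x otherwise (slope = 1 / a_i)
    return torch.where(x >= 0, x, negative_slope * x)
```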
S32: the image obtained by the first convolution unit is input into the residual blocks; the residual blocks ensure that the model still propagates backward normally even when the gradient is small.
S33: the image features extracted from 3 different residual block groups of the Darknet53 backbone network are input into the multi-scale feature extraction network for multi-scale training, giving 3 trained feature output maps;
S34: the feature output maps are processed through the Maxout activation function and dropout to obtain a fused detection result.
Wherein the Maxout activation function is set as follows:

h_{i}(X) = \max_{j \in [1, k]} \left( X^{T} W_{\cdot i j} + b_{i j} \right)

wherein W represents a three-dimensional weight matrix of size d × m × k, m represents the number of hidden layer nodes, k represents the number of hidden pieces of each hidden layer node, i indexes the i-th output unit, j indexes the neurons (pieces) of that unit, X represents the feature vector input to the layer, and b represents a two-dimensional offset matrix of size m × k.
In the above process, the three-dimensional weight matrix W and the two-dimensional offset matrix b are obtained by learning.
When the Maxout function is used for activation, the three feature maps can be fused into a single feature map of the same dimension. The feature maps share the weights of every layer before the RoI pooling layer, and RoIs of different scales are forward-propagated to the target RoI pooling layer to obtain feature maps of fixed resolution. This improves the capability to extract pipeline defects.
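A minimal sketch of a generic Maxout layer with the dimensions used above (W of size d × m × k, b of size m × k); this is a standard Maxout unit rather than the patent's exact fusion code, so the class name and the single nn.Linear realisation are assumptions.

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    def __init__(self, d: int, m: int, k: int):
        super().__init__()
        self.m, self.k = m, k
        self.linear = nn.Linear(d, m * k)   # realises W (d x m x k) and b (m x k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x).view(-1, self.m, self.k)
        return z.max(dim=2).values          # keep the maximum over the k hidden pieces
```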
S35: the model is optimized.
More specifically, detection performance in the target detection task is sensitive to the data, and the model can be optimized by fine-tuning; for example, a suitable model can be obtained by doubling the number of training rounds relative to ordinary training and by adjusting the learning rate and the weight decay to values between 0.001 and 0.0001 according to the scale of the data set.
The model optimization operations described above may also be further performed on the trained model to compensate for temporarily degraded accuracy and potential performance degradation of the algorithm.
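An illustrative fine-tuning set-up reflecting the ranges just described; the concrete learning rate, weight decay and epoch count below are assumptions chosen inside those ranges, and the momentum value is taken from Example 1.

```python
import torch

def build_finetune_schedule(model, base_epochs: int = 100):
    epochs = 2 * base_epochs              # roughly double the usual training length
    lr = 5e-4                             # inside the 0.001 .. 0.0001 range
    weight_decay = 5e-4                   # inside the 0.001 .. 0.0001 range
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.8, weight_decay=weight_decay)
    return optimizer, epochs
```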
The present invention is described in detail below with reference to the following embodiments and the attached drawings, but it should be understood that the embodiments and the attached drawings are only used for the illustrative description of the present invention and do not limit the protection scope of the present invention in any way. All reasonable variations and combinations that fall within the spirit of the invention are intended to be within the scope of the invention.
Example 1
Obtaining a defect detection model by:
data acquisition
The data used in this example consist of two parts: a self-made data set and the NEU public data set. The self-made data set is derived from a video data set of an oil extraction site; part of one of its video frames is shown in fig. 4. Because the collected field data mainly come from short videos and therefore show a certain regularity, the collected videos are sampled by frame extraction to obtain image data and avoid this problem.
The method is used to detect whether a petroleum pipeline has defects, and the NEU public data set is added to supplement data diversity and achieve a better detection effect. The original labels of the public data set are: mill scale (RS), patches (Pa), cracks (Cr), pitted surface (PS), inclusions (Is) and scratches (Sc), for a total of 6 different types of surface defects.
Data pre-processing
Because the extracted data are video-format files, they are converted into image-format files through the OpenCV development library as the primary screening data; using OpenCV also alleviates problems such as blurred data samples and irregular sizes. Meanwhile, the picture data acquired from the captured videos and the data crawled from the network have different resolutions, so normalization is performed to rescale the pictures to an appropriate size; in this embodiment the resolution of the extracted frames is 4032 × 3024, and the pictures are reduced proportionally by 160 × 160.
Data screening
Pictures containing no steel pipe are removed from the frames obtained by frame extraction, and the parts containing steel pipes are selected as the data set to be labelled.
Data marking
The self-defined targets in the images of the labelled data set are marked with the labelImg tool as severe corrosion (seriously) or general corrosion (lightly), and the corresponding configuration files are generated automatically; the tool interface is shown in fig. 5.
Data normalization
The xml data generated by the labelImg tool are converted into txt documents in the VOC standard data format, after which the data are divided into a validation set and a training set at a ratio of 1:9.
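A sketch of this conversion and split, assuming YOLO-style txt lines (class index plus normalised box coordinates) and the two corrosion labels from the marking step; class names, file layout and the 1:9 split mechanics are illustrative.

```python
import random
import xml.etree.ElementTree as ET

CLASSES = ["seriously", "lightly"]   # severe / general corrosion labels from the marking step

def xml_to_txt(xml_path: str, txt_path: str) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # normalised centre-x, centre-y, width, height
        lines.append(f"{cls} {(xmin + xmax) / 2 / w:.6f} {(ymin + ymax) / 2 / h:.6f} "
                     f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))

def split_dataset(image_paths, val_ratio=0.1, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_ratio)
    return paths[n_val:], paths[:n_val]   # training set, validation set (9:1)
```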
Image pre-processing
Fig. 6 compares the edge detection map obtained by applying the edge detection method of the specific embodiment to a training set image with the corresponding grayscale map.
Establishing a detection model
Multi-scale context information fusion is added between the 5th and 6th convolution layers in front of the three detection heads so that the model can extract more small features; the smallest scale in the multi-scale context is the original output length and width of that layer, the length and width of the medium scale are twice those of the smallest scale, and the length and width of the largest scale are four times those of the smallest scale. The detection model is trained with stochastic gradient descent (SGD) with momentum 0.8 and weight decay 0.0005, and the input image size is randomly adjusted by a scale factor to realize multi-scale training.
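A sketch of the training loop implied above: SGD with momentum 0.8 and weight decay 0.0005, with the input resolution redrawn at random every few batches. The scale set, the 10-batch switching interval, and the model / train_loader / compute_loss names are placeholders, not details given in the patent.

```python
import random
import torch
import torch.nn.functional as F

def train_multiscale(model, train_loader, compute_loss, epochs: int = 100):
    # model, train_loader and compute_loss are hypothetical stand-ins for the
    # detection network, the labelled data pipeline and the YOLO-style loss.
    optimizer = torch.optim.SGD(model.parameters(), lr=5e-4,
                                momentum=0.8, weight_decay=0.0005)
    scales = [320, 352, 384, 416, 448, 480, 512]   # assumed multi-scale set
    size = 416
    for _ in range(epochs):
        for step, (images, targets) in enumerate(train_loader):
            if step % 10 == 0:                     # redraw the input size periodically
                size = random.choice(scales)
            images = F.interpolate(images, size=(size, size),
                                   mode="bilinear", align_corners=False)
            loss = compute_loss(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```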
In the anchor box predictions of the three detection heads, the three anchor box ratios of the small-scale head are (10, 13), (16, 23), (33, 23); those of the medium-scale head are (30, 61), (62, 35), (55, 110); and those of the large-scale head are (115, 90), (150, 190), (373, 320).
In this embodiment, 100 rounds of sparse training are performed in total, the global threshold of the channel pruning module is set, and the model is pruned accordingly; the remaining hyperparameters of sparse training are the same as those of normal training. Finally, the model is optimized and the algorithm is iteratively retrained with the same hyperparameters as normal training.
Deploying the trained model online
The trained model is serialized, an API is constructed through Flask, a lightweight web framework in Python, and the model is deployed online.
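A minimal sketch of such an online deployment: a Flask endpoint that accepts an uploaded pipeline image and returns the model's detections as JSON. The load_model and detect names stand in for the serialised model loader and its inference routine; they are hypothetical placeholders, not APIs defined in the patent.

```python
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = load_model("pipeline_defect_detector.pt")   # hypothetical loader for the serialised model

@app.route("/detect", methods=["POST"])
def detect_defects():
    file = request.files["image"]
    image = np.array(Image.open(io.BytesIO(file.read())).convert("RGB"))
    boxes = detect(model, image)                     # hypothetical inference call
    return jsonify({"detections": boxes})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```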
The above examples are merely preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples. All technical schemes belonging to the idea of the invention belong to the protection scope of the invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention, and such modifications and embellishments should also be considered as within the scope of the invention.

Claims (10)

1. An intelligent pipeline defect detection method based on image processing and deep learning, characterized by comprising the following steps:
S1: obtaining image data from a video of the pipeline;
S2: performing preprocessing, including edge detection, on the image data;
S3: inputting the preprocessed image into a trained detection model for defect detection;
the detection model is constructed based on a Darknet53 backbone network and a multi-scale feature extraction network.
2. The intelligent detection method according to claim 1, characterized in that: the edge detection comprises the steps of obtaining gradient module values of the grayed image in the horizontal direction, the vertical direction and the two diagonal directions, and screening candidate pixel points according to the obtained gradient module values through a high threshold value and a low threshold value.
3. The intelligent detection method according to claim 2, wherein: the pretreatment comprises the following steps:
S21: converting the obtained image data into grayscale image data;
S22: performing edge detection on the obtained grayscale image data to obtain an edge detection map, i.e. the preprocessed image, as follows:
S221: obtaining the differences of the grayscale image in the horizontal and vertical directions through a first-order differential operator;
S222: obtaining the gradient modulus and the corresponding gradient angle of each candidate pixel from the obtained horizontal and vertical differences;
S223: partitioning the obtained gradient angles θ at the dividing lines ±iπ/8, i = 1, 3, 5, 7, into regions corresponding to the four directions of the image (horizontal, vertical and the two diagonals), the gradient modulus on each dividing line representing the gradient modulus of the image's candidate pixels in the corresponding direction;
S224: dividing the gradient moduli of the candidate pixels in the four directions by a high threshold and a low threshold, suppressing pixels whose gradient modulus is larger than the high threshold or smaller than the low threshold, and retaining pixels whose gradient modulus lies between the two thresholds, to obtain the edge detection map.
4. The intelligent detection method according to claim 1, characterized in that: the Darknet53 backbone network of the detection model comprises a first convolution unit for performing convolution, batch regularization and activation processing on an input image, wherein the penalty function of the batch regularization processing is set as follows:

\tilde{J}(\omega; X, y) = J(\omega; X, y) + \alpha \Omega(\omega)

wherein X represents an input image, y represents its corresponding label, ω represents the weight coefficient vector, J(·) represents the optimization function of the layer, α represents a strength control parameter, Ω(ω) represents the penalty term, and:

\Omega(\omega) = \frac{1}{2} \sum_{i} \lVert \omega_{i} \rVert_{2}^{2}

where i represents the layer index.
5. The intelligent detection method according to claim 3, wherein: the activation process is implemented by the LeakyReLU activation function.
6. The intelligent detection method according to claim 1, characterized in that: the multi-scale feature extraction network of the detection model comprises prediction units for predicting target bounding boxes at three different scales, wherein each prediction unit comprises a detection head, which performs convolution operations and predicts against three anchor boxes of different scales (three detection heads in total), and one of three convolution layers respectively connected to the three detection heads, the first convolution layer being connected to the output of the fifth residual block group of the Darknet53 backbone network, the second convolution layer to the output of the fourth residual block group, and the third convolution layer to the output of the third residual block group; the prediction units further comprise an activation layer and a Dropout layer connected to the outputs of the first, second and third convolution layers; and the first convolution layer is respectively connected to the second convolution layer and the third convolution layer.
7. The intelligent detection method according to claim 6, wherein: the detection content of the detection head comprises: 4 boundary coordinates of the target, the confidence coefficient of the target and the number of defect types corresponding to the target.
8. The intelligent detection method according to claim 6, wherein: the activation function of the activation layer is the Maxout function:

h_{i}(X) = \max_{j \in [1, k]} \left( X^{T} W_{\cdot i j} + b_{i j} \right)

wherein W represents a three-dimensional weight matrix of size d × m × k, m represents the number of hidden layer nodes, k represents the number of hidden pieces of each hidden layer node, i indexes the i-th output unit, j indexes the neurons (pieces) of that unit, X represents the feature vector input to the layer, and b represents a two-dimensional offset matrix of size m × k.
9. Use of the intelligent detection method of any one of claims 1-8 in petroleum pipeline defect detection.
10. Use according to claim 9, characterized in that: the defect types include one or more of cracks, fissures, corrosion, scratches, the presence of foreign matter and/or plaque, the presence of inclusions.
CN202110792859.0A 2021-07-14 2021-07-14 Intelligent pipeline defect detection method based on image processing and deep learning and application Pending CN113379737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110792859.0A CN113379737A (en) 2021-07-14 2021-07-14 Intelligent pipeline defect detection method based on image processing and deep learning and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110792859.0A CN113379737A (en) 2021-07-14 2021-07-14 Intelligent pipeline defect detection method based on image processing and deep learning and application

Publications (1)

Publication Number Publication Date
CN113379737A (en) 2021-09-10

Family

ID=77582095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110792859.0A Pending CN113379737A (en) 2021-07-14 2021-07-14 Intelligent pipeline defect detection method based on image processing and deep learning and application

Country Status (1)

Country Link
CN (1) CN113379737A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262235B1 (en) * 2018-02-26 2019-04-16 Capital One Services, Llc Dual stage neural network pipeline systems and methods
CN109800824A (en) * 2019-02-25 2019-05-24 中国矿业大学(北京) A kind of defect of pipeline recognition methods based on computer vision and machine learning
CN112488986A (en) * 2019-09-12 2021-03-12 河海大学常州校区 Cloth surface flaw identification method, device and system based on Yolo convolutional neural network
CN111062915A (en) * 2019-12-03 2020-04-24 浙江工业大学 Real-time steel pipe defect detection method based on improved YOLOv3 model
CN110992349A (en) * 2019-12-11 2020-04-10 南京航空航天大学 Underground pipeline abnormity automatic positioning and identification method based on deep learning
CN111401202A (en) * 2020-03-11 2020-07-10 西南石油大学 Pedestrian mask wearing real-time detection method based on deep learning
CN111695482A (en) * 2020-06-04 2020-09-22 华油钢管有限公司 Pipeline defect identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马小陆 et al.: "Research on the application of YOLOv3 in safety helmet wearing detection", Journal of Hebei University of Engineering (Natural Science Edition) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228672A (en) * 2023-01-04 2023-06-06 哈尔滨岛田大鹏工业股份有限公司 Metal processing surface defect detection system and detection method based on shape characteristics
CN115937212A (en) * 2023-02-08 2023-04-07 成都数之联科技股份有限公司 Glass plate crack detection method, device, equipment and medium
CN116664561A (en) * 2023-07-28 2023-08-29 风凯换热器制造(常州)有限公司 Intelligent detection system and method for welding quality AI of heat exchanger tube head
CN116664561B (en) * 2023-07-28 2023-10-17 风凯换热器制造(常州)有限公司 Intelligent detection system and method for welding quality AI of heat exchanger tube head
CN116715560A (en) * 2023-08-10 2023-09-08 吉林隆源农业服务有限公司 Intelligent preparation method and system of controlled release fertilizer
CN116715560B (en) * 2023-08-10 2023-11-14 吉林隆源农业服务有限公司 Intelligent preparation method and system of controlled release fertilizer

Similar Documents

Publication Publication Date Title
Kou et al. Development of a YOLO-V3-based model for detecting defects on steel strip surface
Sony et al. A systematic review of convolutional neural network-based structural condition assessment techniques
CN113379737A (en) Intelligent pipeline defect detection method based on image processing and deep learning and application
Tan et al. Automatic detection of sewer defects based on improved you only look once algorithm
CN108876780B (en) Bridge crack image crack detection method under complex background
CN113436169B (en) Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN109671071B (en) Underground pipeline defect positioning and grade judging method based on deep learning
CN114627383B (en) Small sample defect detection method based on metric learning
CN111611861B (en) Image change detection method based on multi-scale feature association
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN109840483B (en) Landslide crack detection and identification method and device
KR102346676B1 (en) Method for creating damage figure using the deep learning-based damage image classification of facility
CN113989257A (en) Electric power comprehensive pipe gallery settlement crack identification method based on artificial intelligence technology
CN111652295A (en) Railway wagon coupler yoke key joist falling fault identification method
Fondevik et al. Image segmentation of corrosion damages in industrial inspections
Jiang et al. A robust bridge rivet identification method using deep learning and computer vision
Zhao et al. Research on detection method for the leakage of underwater pipeline by YOLOv3
CN115294541A (en) Local feature enhanced Transformer road crack detection method
Reghukumar et al. Vision based segmentation and classification of cracks using deep neural networks
CN114926400A (en) Fan blade defect detection method based on improved YOLOv5
Chu et al. Deep learning method to detect the road cracks and potholes for smart cities
CN111179278B (en) Image detection method, device, equipment and storage medium
CN113516652A (en) Battery surface defect and adhesive detection method, device, medium and electronic equipment
CN113762247A (en) Road crack automatic detection method based on significant instance segmentation algorithm
CN115294392B (en) Visible light remote sensing image cloud removal method and system based on network model generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210910