CN110349134B - Pipeline disease image classification method based on multi-label convolutional neural network - Google Patents

Pipeline disease image classification method based on multi-label convolutional neural network

Info

Publication number
CN110349134B
CN110349134B CN201910565360.9A
Authority
CN
China
Prior art keywords
pipeline
layer
neural network
convolutional neural
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910565360.9A
Other languages
Chinese (zh)
Other versions
CN110349134A (en)
Inventor
唐露新
张宇维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianhe College of Guangdong Polytechnic Normal University
Original Assignee
Tianhe College of Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianhe College of Guangdong Polytechnic Normal University filed Critical Tianhe College of Guangdong Polytechnic Normal University
Priority to CN201910565360.9A priority Critical patent/CN110349134B/en
Publication of CN110349134A publication Critical patent/CN110349134A/en
Application granted granted Critical
Publication of CN110349134B publication Critical patent/CN110349134B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Endoscopes (AREA)

Abstract

The invention discloses a pipeline disease image classification method based on a multi-label convolutional neural network, comprising the following steps. Step 1: collect a pipeline endoscopic inspection video and extract the image frames in the video. Step 2: calculate the timestamp feature of each image. Step 3: feed a portion of the image frames collected in step 1 into a multi-label convolutional neural network model for training, obtaining a model that correctly classifies the pipeline disease types. Step 4: detect the endoscopic image of the pipeline under inspection with the trained multi-label convolutional neural network model; the model outputs a one-hot code, from which the existing pipeline disease types are determined. A multi-label classification layer is added on top of the existing Inception-ResNet-v2 network, realizing classification of multiple kinds of pipeline disease images.

Description

Pipeline disease image classification method based on multi-label convolutional neural network
Technical Field
The invention relates to the fields of computer digital image processing and convolutional-neural-network-based deep learning, and in particular to a pipeline disease image classification method based on a multi-label convolutional neural network.
Background
There are two commonly used methods for acquiring endoscopic images of a pipeline: one is pipeline inspection robot technology based on closed-circuit television (CCTV) inspection, and the other is pipeline periscope technology based on quick-view (QV) inspection.
In the prior art, the acquired images are still mainly processed by manual observation followed by classification and identification. Manual observation presents the following difficulties:
The growing length of constructed pipelines produces enormous volumes of endoscopic video and image data. For example, a pipeline inspection robot operates at a maximum speed of 2.5-3 m/s, so a 1000 m pipeline requires at least 5 minutes of video, and 5 minutes shot at a frame rate of 30 fps yields 9000 pictures. Manually inspecting and screening such a huge quantity of data is unavoidable under this approach, and factors such as work fatigue and subjective judgment make the detection results inaccurate. A computer digital image processing technology is therefore needed to automatically classify and identify the diseases in pipeline endoscopic images.
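The data-volume figure above can be reproduced with a short calculation (the helper name is ours; the speed, length, duration and frame-rate figures come from the text):

```python
def frame_count(pipe_length_m: float, speed_m_s: float, fps: int) -> int:
    """Estimate the frames produced by one inspection run: the robot must
    traverse the whole pipe, so the video lasts at least
    pipe_length_m / speed_m_s seconds at the given frame rate."""
    return int(pipe_length_m / speed_m_s * fps)

# 5 minutes of footage at 30 fps is already 9000 pictures to screen.
frames_five_minutes = 5 * 60 * 30

# A 1000 m pipe at the 2.5 m/s maximum speed yields 400 s of video.
frames_1000m = frame_count(1000, 2.5, 30)
```

Either way, the per-pipe frame count runs into the thousands, which motivates the automated classification below.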
Pipelines are widely distributed and of many types, and the diseases they develop are correspondingly complex, yet pipeline inspection is an important means of keeping pipelines in normal operation. From manual diving inspection, to detection with professional instruments such as ground-penetrating radar and sonar, to the current practice of acquiring endoscopic images with a pipeline robot or pipeline periscope, the safety and universality of inspection have greatly improved, and digital image processing technology has already achieved automatic identification and monitoring of several pipeline diseases.
At present, two main schemes are available for classifying endoscopic images of pipelines by using a computer digital image processing technology:
(1) subjectively acquiring image features of specific disease types, and classifying by using a traditional machine learning classifier:
<1> for images of cracking, dislocation and disjointing diseases in the pipeline, extract horizontal and diagonal wavelet components, image entropy and co-occurrence correlation as features, and combine them into a feature vector;
<2> for images of dirt deposition and tree-root diseases in the pipeline, extract the roundness, compactness and roughness of the contour as features, and combine them into a feature vector;
<3> for disease images such as blockage and illegal pipeline connections, extract angular second moment, image entropy, correlation, dissimilarity, contrast and homogeneity as features, and combine them into a feature vector;
<4> feed the feature vectors into a support vector machine, a random forest or a fully connected neural network for training, to realize classification of the disease types.
(2) Objectively extracting abstract image features with a convolutional neural network for classification:
<1> normalize all captured pipeline endoscopic images to a resolution of 256 × 256;
<2> feed the pictures into the convolutional neural network shown in figure 3 for training, targeting three common pipeline diseases: root intrusion, dirt deposition and pipe-wall breakage. In the figure, 'input layer × 3' indicates that the input image is a color image with three channels (red, green and blue); the size marked on each convolution layer indicates the size and number of channels of the feature map produced by that convolution operation. For example, 128 × 128 × 32 means that the first convolution operation turns the original image into an image with a resolution of 128 × 128 and 32 data channels; such an image is called a feature map.
<3> the number in the fully connected layer indicates the dimension of the vector the feature map is converted into; for example, 1024 means the feature map becomes a 1024-dimensional vector. The 3 in the output layer represents the category vector, identified with one-hot coding;
<4> for example, if a disease image with tree-root intrusion is input, the output layer outputs a vector of the form [0,0,1]; similarly, dirt deposition outputs [0,1,0] and pipe-wall breakage outputs [1,0,0].
However, the existing computer digital image processing techniques require subjectively chosen features of the disease images and depend too heavily on the quality of the captured pictures; for the inner wall of a pipeline in a dark, damp and complex environment, obtaining high-quality images is difficult.
The image features of different disease types differ in how sensitively they can be distinguished, and selecting features as the classification basis demands practical pipeline inspection experience, which limits the development of image detection technology.
In addition, existing convolutional neural networks have simple models and few classification categories; they cannot solve the complex classification of pipeline disease types, and their classification accuracy is low.
Disclosure of Invention
The invention adds a multi-label classification layer on top of the existing Inception-ResNet-v2 network, realizing classification of multiple kinds of pipeline disease images.
A pipeline disease image classification method based on a multi-label convolutional neural network comprises the following steps:
Step 1: collecting a pipeline endoscopic inspection video and extracting its image frames;
Step 2: calculating the timestamp feature of each image;
Step 3: feeding a portion of the image frames collected in step 1 into a multi-label convolutional neural network model for training, to obtain a multi-label convolutional neural network model capable of correctly classifying the pipeline disease types;
Step 4: detecting the endoscopic image of the pipeline under inspection with the trained multi-label convolutional neural network model, which outputs a one-hot code from which the existing pipeline disease types are determined.
Preferably, the multi-label convolutional neural network model comprises an upper-layer Inception-ResNet-v2 network structure and a lower-layer multi-label classification layer, and the upper-layer Inception-ResNet-v2 network structure further comprises a dropout (random deactivation) layer.
Preferably, the processing steps of the lower multi-label classification layer are as follows:
First step: append the timestamp feature TimeFeature to the feature vector output by the dropout layer of the upper-layer Inception-ResNet-v2 network structure;
Second step: perform dimension-reduction and activation on the feature vector with the appended timestamp feature, obtaining an intermediate-dimension value;
Third step: continue reducing the dimension of the intermediate value from the second step, finally outputting the one-hot code.
Preferably, the timestamp feature TimeFeature is calculated as:
TimeFeature = Current_Frame_Index / All_Frames_Num        (formula 1)
where Current_Frame_Index is the position index of the endoscopic image under inspection within the whole pipeline endoscopic inspection video, and All_Frames_Num is the total number of frames of that video.
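A minimal sketch of this formula (the function name is ours):

```python
def time_feature(current_frame_index: int, all_frames_num: int) -> float:
    """Timestamp feature: the relative position of a frame in the whole
    inspection video, a value in the range 0 to 1 (formula 1)."""
    if all_frames_num <= 0:
        raise ValueError("the video must contain at least one frame")
    return current_frame_index / all_frames_num
```

A frame halfway through a 9000-frame video gets the feature 0.5; the first frame gets 0.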
Preferably, the pipeline disease types comprise: standing water, tree roots, silt deposition, debris, scaling, blockage, mild corrosion, moderate corrosion, severe corrosion, disjointing, fracture, misalignment, deformation, intrusion, mild leakage, moderate leakage and severe leakage.
The beneficial effects obtained by the invention are as follows:
(1) by adopting a convolutional neural network with a complex model, features need not be extracted subjectively, and the accuracy of disease classification is improved;
(2) the timestamp feature of the image under inspection is incorporated, enriching the classification features; the timestamp feature helps distinguish non-disease frames and irrelevant frames in the inspection video (such as irrelevant images shot while workers lower the pipeline periscope or inspection robot into a well or retrieve it);
(3) 17 kinds of pipeline diseases can be distinguished, covering almost all disease types of current urban pipelines, so the method has a wide application range.
Drawings
The invention will be further understood from the following description in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments; like reference numerals designate corresponding parts throughout the different views.
FIG. 1 is a block diagram of a network architecture of the present invention;
FIG. 2 is a process flow diagram of the lower multi-label classification layer of the present invention;
fig. 3 is a diagram illustrating a convolutional neural network structure in the prior art.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to embodiments thereof; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. Other systems, methods, and/or features of the present embodiments will become apparent to those skilled in the art upon review of the following detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims. Additional features of the disclosed embodiments are described in, and will be apparent from, the detailed description that follows.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", etc. based on the orientation or positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but it is not intended to indicate or imply that the device or component referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present patent, and the specific meaning of the terms described above will be understood by those of ordinary skill in the art according to the specific circumstances.
The first embodiment is as follows:
a pipeline disease image classification method based on a multi-label convolutional neural network comprises the following steps:
Step 1: collecting a pipeline endoscopic inspection video and extracting its image frames;
Step 2: calculating the timestamp feature of each image;
Step 3: feeding a portion of the image frames collected in step 1 into a multi-label convolutional neural network model for training, to obtain a multi-label convolutional neural network model capable of correctly classifying the pipeline disease types;
Step 4: detecting the endoscopic image of the pipeline under inspection with the trained multi-label convolutional neural network model, which outputs a one-hot code from which the existing pipeline disease types are determined.
The multi-label convolutional neural network model comprises an upper-layer Inception-ResNet-v2 network structure and a lower-layer multi-label classification layer, and the upper-layer Inception-ResNet-v2 network structure further comprises a dropout (random deactivation) layer.
The processing steps of the lower multi-label classification layer are as follows:
First step: append the timestamp feature TimeFeature to the feature vector output by the dropout layer of the upper-layer Inception-ResNet-v2 network structure;
Second step: perform dimension-reduction and activation on the feature vector with the appended timestamp feature, obtaining an intermediate-dimension value;
Third step: continue reducing the dimension of the intermediate value from the second step, finally outputting the one-hot code.
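A minimal numerical sketch of these three steps, assuming the dimensions used in the embodiments (a 1792-dimensional dropout output, a 1024-unit intermediate layer and 17 output labels); the weights are random stand-ins for trained parameters and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weight matrices standing in for trained parameters.
W_x_h = rng.normal(scale=0.01, size=(1793, 1024))  # X layer -> H layer
W_h_c = rng.normal(scale=0.01, size=(1024, 17))    # H layer -> C layer

def multi_label_head(dropout_features, timestamp):
    # First step: append the timestamp feature to the 1792-dim vector.
    x = np.append(dropout_features, timestamp)   # -> 1793 dims
    # Second step: dimension-reduction activation to the intermediate value.
    h = sigmoid(x @ W_x_h)                       # -> 1024 dims
    # Third step: reduce again and output the 17-dim label code.
    return sigmoid(h @ W_h_c)                    # -> 17 dims

c_out = multi_label_head(rng.normal(size=1792), 0.5)
```

With sigmoid on every output unit, each of the 17 elements is an independent score in (0, 1), which is what makes the head multi-label rather than a softmax single-label classifier.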
The timestamp feature TimeFeature is calculated as:
TimeFeature = Current_Frame_Index / All_Frames_Num        (formula 1)
where Current_Frame_Index is the position index of the endoscopic image under inspection within the whole pipeline endoscopic inspection video, and All_Frames_Num is the total number of frames of that video.
The pipeline disease types comprise: standing water, tree roots, silt deposition, debris, scaling, blockage, mild corrosion, moderate corrosion, severe corrosion, disjointing, fracture, misalignment, deformation, intrusion, mild leakage, moderate leakage and severe leakage.
Example two:
In this embodiment, the content of the above embodiment is further described so that those skilled in the art can understand the implementation of the invention more clearly. A multi-label classification layer is added on top of the existing Inception-ResNet-v2 network to realize classification of multiple kinds of pipeline disease images: the invention replaces the SoftMax classifier with the multi-label classification layer, giving Inception-ResNet-v2 a multi-label classification capability so that every disease type present can be detected, as shown in fig. 2.
The dropout (random deactivation) layer of the upper-layer Inception-ResNet-v2 network structure outputs a 1792-dimensional feature vector, corresponding to layer X in fig. 2. The invention appends a one-dimensional feature to this vector, shown as the light gray box in fig. 2; this feature is the timestamp feature of the image under inspection, computed with formula 1:
TimeFeature = Current_Frame_Index / All_Frames_Num        (formula 1)
Current_Frame_Index in the formula is the position index of the endoscopic image under inspection within the whole pipeline endoscopic inspection video, All_Frames_Num is the total number of frames of that video, and the resulting TimeFeature value lies in the range 0 to 1.
The H and C layers in fig. 2 are the main components of the multi-label classification layer. The H layer contains 1024 activation units, drawn as circles; the C layer outputs the 17 disease type identifiers in one-hot coding form, which from left to right are:
C = [standing water, tree roots, silt deposition, debris, scaling, blockage, mild corrosion, moderate corrosion, severe corrosion, disjointing, fracture, misalignment, deformation, intrusion, mild leakage, moderate leakage, severe leakage].
For example, if the image contains tree-root and silt-deposition diseases, the C layer outputs the vector [0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0]; if no disease exists, it outputs [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].
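Decoding the C-layer vector back into disease names can be sketched as follows; the English label names are our renderings of the 17 categories in C-vector order, and the 0.5 threshold is illustrative:

```python
DISEASE_TYPES = [
    "standing water", "tree roots", "silt deposition", "debris", "scaling",
    "blockage", "mild corrosion", "moderate corrosion", "severe corrosion",
    "disjointing", "fracture", "misalignment", "deformation", "intrusion",
    "mild leakage", "moderate leakage", "severe leakage",
]

def decode(c_vector, threshold=0.5):
    """Return the disease names whose C-layer score reaches the threshold."""
    return [name for name, score in zip(DISEASE_TYPES, c_vector)
            if score >= threshold]

roots_and_silt = decode([0, 1, 1] + [0] * 14)  # the example vector above
no_disease = decode([0] * 17)
```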
Example three:
In this embodiment, the content of the above embodiments is further described so that those skilled in the art can understand the implementation of the invention more clearly. A multi-label classification layer is added on top of the existing Inception-ResNet-v2 network to realize classification of multiple kinds of pipeline disease images: the invention replaces the SoftMax classifier with the multi-label classification layer, giving Inception-ResNet-v2 a multi-label classification capability so that every disease type present can be detected.
As shown in fig. 2, the X layer is the output vector of the dropout layer in the upper-layer Inception-ResNet-v2 network structure.
The dropout layer of the original Inception-ResNet-v2 network structure outputs only the first 1792 dimensions of information; the invention adds one further dimension of time information on this basis.
The conversion from the X layer to the H layer is a forward computation of a fully connected neural network; its purpose is to reduce the (1792+1)-dimensional vector to 1024 dimensions. For convenience of description, several symbols are added to the figure: X1 to X1793 denote the output of the original Inception-ResNet-v2 dropout layer plus the newly added dimension; H1 to H1024 denote the values obtained after the 1793-dimensional vector is activated and reduced to 1024 dimensions in the H layer; C1 to C17 denote the 17-dimensional one-hot code obtained after C-layer processing; W(X->H) is the weight matrix, represented by the lines connecting the X layer and the H layer, with 1793 × 1024 dimensions in total, because every dimension of the X layer is connected to every dimension of the H layer (hence "fully connected"). The calculation process is as follows:
Taking H1 in the H layer as an example, as shown in formula 2:
H1 = sigmoid( Σ (i = 1 to 1793) W(Xi->H1) × Xi )        (formula 2)
where W(Xi->H1) is the connection weight between the i-th element of the X layer and H1 of the H layer, and sigmoid is a function that maps any real number to a number between 0 and 1:
sigmoid(x) = 1 / (1 + e^(-x))
Its graph is S-shaped, with a value range between 0 and 1. H2 to H1024 are calculated in the same manner as formula 2.
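The sigmoid activation and the per-unit weighted sum of formula 2 can be sketched as follows (a tiny 3-input stand-in replaces the full 1793 X values; the weights are illustrative):

```python
import math

def sigmoid(x: float) -> float:
    """Map any real number into (0, 1); the graph is S-shaped."""
    return 1.0 / (1.0 + math.exp(-x))

# One H-layer unit per formula 2: weighted sum of its inputs, then sigmoid.
weights = [0.2, -0.4, 0.1]   # stand-ins for W(Xi->H1)
x_values = [1.0, 0.5, 2.0]   # stand-ins for X1..X3
h1 = sigmoid(sum(w * x for w, x in zip(weights, x_values)))
```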
The conversion from the H layer to the C layer is similar, using formula 3; taking the calculation of C1 as an example:
C1 = sigmoid( Σ (j = 1 to 1024) W(Hj->C1) × Hj )        (formula 3)
After all Ci have been calculated, by the property of sigmoid the C layer outputs a 17-dimensional vector whose element values lie between 0 and 1, for example the vector C = {0 (i.e. C1 = 0), 0.8 (i.e. C2 = 0.8), 0.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}.
According to C = [standing water, tree roots, silt deposition, debris, scaling, blockage, mild corrosion, moderate corrosion, severe corrosion, disjointing, fracture, misalignment, deformation, intrusion, mild leakage, moderate leakage, severe leakage], this endoscopic image has a tree-root disease with 80% probability and a silt-deposition disease with 90% probability.
The original Inception-ResNet-v2 network structure described in the invention is prior art. Following the prior art, it can be divided into an input layer, a Stem initial convolution layer, 5 × Inception-ResNet-A residual blocks, a Reduction-A reduction layer, 10 × Inception-ResNet-B residual blocks, a Reduction-B reduction layer, 5 × Inception-ResNet-C residual blocks, a global average pooling layer, a dropout layer and a Softmax classifier. The upper layer described in the invention comprises the input layer, the Stem initial convolution layer, the 5 × Inception-ResNet-A residual blocks, the Reduction-A reduction layer, the 10 × Inception-ResNet-B residual blocks, the Reduction-B reduction layer, the 5 × Inception-ResNet-C residual blocks, the global average pooling layer and the dropout layer. The specific function and relationship of each layer follow the prior art and are not repeated here.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications may be made without departing from the scope of the invention. That is, the methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For example, in alternative configurations, the methods may be performed in an order different than described, and/or various components may be added, omitted, and/or combined. Moreover, features described with respect to certain configurations may be combined in various other configurations, as different aspects and elements of the configurations may be combined in a similar manner. Further, elements therein may be updated as technology evolves, i.e., many elements are examples and do not limit the scope of the disclosure or claims.
Specific details are given in the description to provide a thorough understanding of the exemplary configurations including implementations. However, configurations may be practiced without these specific details, for example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configuration of the claims. Rather, the foregoing description of the configurations will provide those skilled in the art with an enabling description for implementing the described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
In conclusion, it is intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that these examples are illustrative only and are not intended to limit the scope of the invention. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (3)

1. A pipeline disease image classification method based on a multi-label convolutional neural network, characterized by comprising the following steps:
step 1: collecting a pipeline endoscopic inspection video and extracting the image frames it contains;
step 2: calculating the timestamp feature of each image frame;
step 3: feeding a portion of the image frames collected in step 1 into a multi-label convolutional neural network model for training, to obtain a multi-label convolutional neural network model capable of correctly classifying pipeline disease types;
step 4: detecting an endoscopic image of the pipeline to be inspected with the trained multi-label convolutional neural network model, which outputs a one-hot code, and determining the types of pipeline diseases present according to that code;
the multi-label convolutional neural network model comprises an upper-layer Inception-ResNet-v2 network structure and a lower multi-label classification layer, wherein the upper-layer Inception-ResNet-v2 network structure further comprises a dropout (random inactivation) layer;
the processing steps of the lower multi-label classification layer are as follows:
first step: appending the timestamp feature Timefeature to the feature vector output by the dropout layer of the upper-layer Inception-ResNet-v2 network structure;
second step: performing dimension-reduction and activation processing on the feature vector with the appended timestamp feature, to obtain an intermediate-dimension value;
third step: continuing to reduce the dimension of the intermediate-dimension value from the second step, and finally outputting the one-hot code.
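The three-step classification head described above can be sketched as follows. This is a minimal pure-Python illustration, not the patented implementation: the layer sizes, the random weights, the ReLU/sigmoid activation choices, and the 0.5 decision threshold are all assumptions, and the claim's "one-hot code" is interpreted here as an independent per-class binary code, as is usual for multi-label outputs.

```python
import math
import random

def classification_head(features, time_feature, w1, w2, threshold=0.5):
    """Sketch of the lower multi-label classification layer:
    1) append the timestamp feature to the backbone feature vector,
    2) reduce dimension with a dense layer + ReLU activation,
    3) reduce again, squash each unit with a sigmoid, and threshold
       independently to obtain the output disease code."""
    x = features + [time_feature]  # first step: append Timefeature
    mid = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)))  # second step
           for row in w1]
    out = [1.0 / (1.0 + math.exp(-sum(wi * mi for wi, mi in zip(row, mid))))
           for row in w2]          # third step: dense + sigmoid
    return [1 if p > threshold else 0 for p in out]

# Toy dimensions: 4 backbone features -> 3 intermediate units -> 2 classes.
random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
w2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
code = classification_head([0.2, 0.7, 0.1, 0.9], 0.5, w1, w2)
print(code)
```

In a real model the dense layers would be trained jointly with the Inception-ResNet-v2 backbone; only the forward shape of the head is shown here.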
2. The pipeline disease image classification method based on a multi-label convolutional neural network as claimed in claim 1, wherein the timestamp feature is calculated as:
Timefeature = Current_Frame_Index / All_Frames_Num
where Current_Frame_Index is the index of the position of the endoscopic image to be detected within the whole pipeline endoscopic inspection video, and All_Frames_Num is the total number of frames of that video.
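The formula of claim 2 amounts to the frame's relative position in the inspection video; a minimal sketch (the function name and the guard against a non-positive frame count are illustrative additions):

```python
def time_feature(current_frame_index, all_frames_num):
    """Timestamp feature of claim 2: the frame's relative position
    in the inspection video (0.0 = start, approaching 1.0 = end)."""
    if all_frames_num <= 0:
        raise ValueError("all_frames_num must be positive")
    return current_frame_index / all_frames_num

print(time_feature(150, 300))  # frame halfway through the video -> 0.5
```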
3. The pipeline disease image classification method based on a multi-label convolutional neural network as claimed in claim 1, wherein the pipeline disease types comprise: swabbing, tree roots, accumulated mud, debris, scaling, plugging, mild corrosion, moderate corrosion, severe corrosion, dislocation, fracture, disjoint, deformation, intrusion, mild leakage, moderate leakage, and severe leakage.
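Decoding the network's binary output code back into the disease types of claim 3 is a simple lookup. The class order and exact label strings below are assumptions for illustration; the patent does not specify the index assigned to each type.

```python
# Seventeen disease classes from claim 3 (index order assumed).
DISEASE_TYPES = [
    "swabbing", "tree roots", "accumulated mud", "debris", "scaling",
    "plugging", "mild corrosion", "moderate corrosion", "severe corrosion",
    "dislocation", "fracture", "disjoint", "deformation", "intrusion",
    "mild leakage", "moderate leakage", "severe leakage",
]

def decode(code):
    """Map the model's binary output code to disease type names."""
    if len(code) != len(DISEASE_TYPES):
        raise ValueError("code length must match the number of disease types")
    return [name for bit, name in zip(code, DISEASE_TYPES) if bit == 1]

code = [0] * 17
code[1] = 1   # tree roots present
code[10] = 1  # fracture present
print(decode(code))  # -> ['tree roots', 'fracture']
```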
CN201910565360.9A 2019-06-27 2019-06-27 Pipeline disease image classification method based on multi-label convolutional neural network Active CN110349134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910565360.9A CN110349134B (en) 2019-06-27 2019-06-27 Pipeline disease image classification method based on multi-label convolutional neural network


Publications (2)

Publication Number Publication Date
CN110349134A CN110349134A (en) 2019-10-18
CN110349134B (en) 2022-12-09

Family

ID=68183348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910565360.9A Active CN110349134B (en) 2019-06-27 2019-06-27 Pipeline disease image classification method based on multi-label convolutional neural network

Country Status (1)

Country Link
CN (1) CN110349134B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209794A (en) * 2019-12-11 2020-05-29 浙江省交通运输科学研究院 Underground pipeline identification method based on ground penetrating radar image
CN111191593A (en) * 2019-12-30 2020-05-22 成都云尚物联环境科技有限公司 Image target detection method and device, storage medium and sewage pipeline detection device
CN113963212A (en) * 2021-10-25 2022-01-21 郑州大学 Pipeline disease image classification method and device based on increment-Resnet neural network
CN114549402B (en) * 2022-01-05 2024-08-06 江苏海洋大学 Underwater image quality comparison method without reference image

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108830287A (en) * 2018-04-18 2018-11-16 哈尔滨理工大学 The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method
CN108898595A (en) * 2018-06-27 2018-11-27 慧影医疗科技(北京)有限公司 A kind of construction method of thoracopathy detection model and application
CN109716108A (en) * 2016-12-30 2019-05-03 同济大学 A kind of Asphalt Pavement Damage detection system based on binocular image analysis
CN109800824A (en) * 2019-02-25 2019-05-24 中国矿业大学(北京) A kind of defect of pipeline recognition methods based on computer vision and machine learning
CN109871906A (en) * 2019-03-15 2019-06-11 西安获德图像技术有限公司 A kind of classification method of the coil open defect based on depth convolutional neural networks

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7215811B2 (en) * 2000-11-22 2007-05-08 Osama Moselhi Method and apparatus for the automated detection and classification of defects in sewer pipes


Non-Patent Citations (4)

Title
Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques; Jack C.P. Cheng et al.; Automation in Construction 95 (2018); 2018-08-27; pp. 155-171 *
Multi-label image classification based on convolutional neural networks; Su Yue; China Masters' Theses Full-text Database (Information Science and Technology); 2019-01-15; full text *
Multi-label scene classification based on convolutional neural networks; Chen Zhi; China Masters' Theses Full-text Database (Information Science and Technology); 2016-02-15; full text *
Research on key technologies of image classification based on deep learning; Du Ke; China Masters' Theses Full-text Database (Information Science and Technology); 2019-01-15; full text *


Similar Documents

Publication Publication Date Title
CN110349134B (en) Pipeline disease image classification method based on multi-label convolutional neural network
CN109961019B (en) Space-time behavior detection method
JP6759475B2 (en) Ship detection methods and systems based on multidimensional features of the scene
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN113139489B (en) Crowd counting method and system based on background extraction and multi-scale fusion network
CN112561876A (en) Image-based pond and reservoir water quality detection method and system
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN112967227B (en) Automatic diabetic retinopathy evaluation system based on focus perception modeling
Zhou et al. Deep multi-scale features learning for distorted image quality assessment
CN114529839A (en) Unmanned aerial vehicle routing inspection-oriented power transmission line hardware anomaly detection method and system
CN115497015A (en) River floating pollutant identification method based on convolutional neural network
CN116386081A (en) Pedestrian detection method and system based on multi-mode images
CN115170985B (en) Remote sensing image semantic segmentation network and segmentation method based on threshold attention
CN110599458A (en) Underground pipe network detection and evaluation cloud system based on convolutional neural network
CN115810123A (en) Small target pest detection method based on attention mechanism and improved feature fusion
CN115861210A (en) Transformer substation equipment abnormity detection method and system based on twin network
CN111767826A (en) Timing fixed-point scene abnormity detection method
CN117671597B (en) Method for constructing mouse detection model and mouse detection method and device
CN110827375A (en) Infrared image true color coloring method and system based on low-light-level image
CN117392668A (en) Wheat scab state evaluation method and system and electronic equipment
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN116823638A (en) Image defogging method based on improved GAN
CN110599460A (en) Underground pipe network detection and evaluation cloud system based on hybrid convolutional neural network
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant