CN114648534A - Pipe network defect intelligent identification method and device based on video frame clustering, and medium

Pipe network defect intelligent identification method and device based on video frame clustering, and medium

Info

Publication number
CN114648534A
CN114648534A
Authority
CN
China
Prior art keywords
video frame
characteristic value
centroid
value
category
Prior art date
Legal status
Pending
Application number
CN202210566909.8A
Other languages
Chinese (zh)
Inventor
周政瀚
罗标
肖淼文
张雪
侯智焱
Current Assignee
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202210566909.8A
Publication of CN114648534A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

A pipe network defect intelligent identification method based on video frame clustering, and a device and medium therefor, are provided. The method comprises the following steps: extracting a feature value from each video frame in the video data to generate a feature value set, and presetting a threshold; establishing a first category and taking the first video frame feature value in the set as the centroid of the first category; for each subsequent video frame feature value, calculating in order the Euclidean distance to the centroid of each of the existing i categories and comparing it with the preset threshold, so that the feature value is either assigned to one of the existing i categories or used to establish a new (i+1)-th category, of which it becomes the centroid; and selecting the video frame feature value closest to the corresponding centroid in each category as the key frame feature value. The invention improves the efficiency and accuracy of automatic pipe network defect detection, reduces the labor intensity of workers, and has great value for popularization and application in pipe network video defect detection.

Description

Pipe network defect intelligent identification method and device based on video frame clustering, and medium
Technical Field
The invention relates to the technical field of intelligent detection of pipe network defects, and in particular to a method, device, and medium for intelligent identification of pipe network defects based on video frame clustering.
Background
The underground drainage pipe network is an important component of urban drainage. As its service life increases, defects such as deformation, damage, corrosion, fracture, and leakage gradually appear, leading to serious hazards such as pipeline bursts, waterlogging, and pavement collapse, and causing great economic loss and personal injury.
Current underground pipe network inspection mainly comprises manual inspection and closed-circuit television (CCTV) robot inspection. The two methods differ in how video data is acquired: manually in the former, and by a robot-mounted camera in the latter. In both cases the acquired video is interpreted manually, the pipeline defects are then assessed, and an industry inspection report is generated. Both methods rely on experienced operators at the defect assessment stage, yet operators in the industry vary widely in skill, turn over frequently, and often have an insufficient grasp of industry regulations and standards. In addition, both methods require workers to inspect the equipment and environment on site, which is time-consuming, demands many personnel, and yields low efficiency and poor accuracy. The existing pipe network defect detection methods therefore have great limitations and much room for improvement.
Disclosure of Invention
The invention provides a pipe network defect intelligent identification method and device based on video frame clustering, which can improve both the efficiency and the accuracy of pipe network defect detection.
The specific technical scheme of the invention is as follows:
according to a first technical scheme of the invention, a method for intelligently identifying pipe network defects based on video frame clustering is provided, the method comprising the following steps: extracting the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and presetting a threshold; establishing a first category and taking the first video frame feature value in the set as the centroid of the first category; calculating in order, for each subsequent video frame feature value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories, comparing it with the preset threshold, and thereby either assigning the video frame feature value to one of the existing i categories or establishing a new (i+1)-th category with the feature value as its centroid;

and selecting, in each category, the video frame feature value closest to the corresponding centroid as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, taking their arithmetic mean as the key frame feature value of that category.
According to a second technical scheme of the invention, an intelligent pipe network defect identification device based on video frame clustering is provided, comprising a computation unit configured to: extract the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and preset a threshold;

determine a first category and take the first video frame feature value in the set as the centroid of the first category; calculate in order, for each subsequent video frame feature value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories, compare it with the preset threshold, and thereby either assign the video frame feature value to one of the existing i categories or establish a new (i+1)-th category with the feature value as its centroid. Specifically: the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category (1 ≤ j ≤ i) is calculated; if the distance is less than the preset threshold, the feature value is assigned to the j-th category and the j-th centroid is updated as the arithmetic mean of all video frame feature values in that category; if the distance is greater than or equal to the preset threshold and j ≠ i, the Euclidean distance to the (j+1)-th centroid is calculated and compared with the preset threshold again; and if the distance is greater than or equal to the preset threshold and j = i, the (i+1)-th category is established with the feature value as its centroid;

and select, in each category, the video frame feature value closest to the corresponding centroid as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, take their arithmetic mean as the key frame feature value of that category.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method according to any one of the embodiments of the present invention.
The pipe network defect intelligent identification method, device, and medium based on video frame clustering according to the embodiments of the invention improve the efficiency and accuracy of automatic pipe network defect detection, reduce the labor intensity of workers, and have great value for popularization and application in pipe network video defect detection.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 shows a flowchart of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the present invention.
Fig. 2 shows a flow chart of a video frame clustering method according to an embodiment of the present invention.
Fig. 3 shows a clustering result diagram of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the invention.
Fig. 4 shows a network structure diagram of an AlexNet network model according to an embodiment of the present invention.
FIG. 5 shows a schematic diagram of a classifier according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will now be further described with reference to the accompanying drawings.
Fig. 1 shows a flowchart of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the present invention. As shown in fig. 1, the method starts with step S100: extracting the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and presetting a threshold.
In step S200, a first category is determined and the first video frame feature value in the set is taken as the centroid of the first category; then, for each subsequent video frame feature value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories is calculated in order and compared with the preset threshold, so that the feature value is either assigned to one of the existing i categories or used to establish a new (i+1)-th category, of which it becomes the centroid.
Step S200 is specifically implemented as follows: the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category (1 ≤ j ≤ i) is calculated; if the distance is less than the preset threshold, the feature value is assigned to the j-th category and the j-th centroid is updated as the arithmetic mean of all video frame feature values in that category; if the distance is greater than or equal to the preset threshold and j ≠ i, the Euclidean distance to the (j+1)-th centroid is calculated and compared with the preset threshold again; if the distance is greater than or equal to the preset threshold and j = i, the (i+1)-th category is established and the feature value becomes its centroid. This loop is repeated for the k-th video frame feature value until it either belongs to one of the existing i categories or seeds a new category as its centroid. Processing every feature value in the set X in this way yields a number of categories and their centroids, and hence a cluster map of all video frame feature values.
In step S300, the video frame feature value closest to the corresponding centroid in each category is selected as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, their arithmetic mean is taken as the key frame feature value of that category. The key frame feature values essentially capture the different types of defect features of the pipe network, which improves the efficiency of automatic defect detection.
The Euclidean distance is calculated as shown in formula (1):

$$d(x_k, c_j) = \sqrt{\sum_{l=1}^{m} \left( x_{k,l} - c_{j,l} \right)^2} \qquad (1)$$

where $d(x_k, c_j)$ is the Euclidean distance between a video frame feature value and a centroid (the centroids including the first-category centroid and any new-category centroids); $x_k$ is the point in m-dimensional Euclidean space at which a video frame feature value lies; $c_j$ is the point in m-dimensional Euclidean space at which a centroid lies; $m$ is the number of features contained in a video frame feature value; $x_{k,l}$ is the value of the $l$-th dimension of the k-th video frame feature value; and $c_{j,l}$ is the value of the $l$-th dimension of the j-th category centroid.
The centroid of the j-th category is updated by taking the arithmetic mean of the feature values of all video frames in that category, as shown in formula (2):

$$c_j' = \frac{1}{\lvert S_j \rvert} \sum_{x \in S_j} x \qquad (2)$$

where $c_j'$ denotes the updated j-th category centroid, $S_j$ denotes the set of feature values belonging to the j-th category, and $x$ denotes a video frame feature value belonging to $S_j$.
Illustratively, as shown in fig. 2, the video frame feature value set X = {x1, …, xk, …, xn} is input and a threshold is preset. A first category is established and the first video frame feature value in the set is taken as the centroid of the first category; for each subsequent video frame feature value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories is calculated in order and compared with the preset threshold, so that the feature value is either assigned to one of the existing i categories or used to establish a new (i+1)-th category, of which it becomes the centroid. Specifically, the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category (1 ≤ j ≤ i) is calculated; if the distance is less than the preset threshold, the feature value is assigned to the j-th category and the j-th centroid is updated as the arithmetic mean of all video frame feature values in that category; if the distance is greater than or equal to the preset threshold and j ≠ i, the Euclidean distance to the (j+1)-th centroid is calculated and compared with the preset threshold again; if the distance is greater than or equal to the preset threshold and j = i, the (i+1)-th category is established with the feature value as its centroid. This loop is repeated for the k-th feature value until it belongs to an existing category or seeds a new one. Processing every feature value in X in this way yields the categories, their centroids, and the cluster map of all video frame feature values. As shown in fig. 3, the video frame feature value closest to the centroid of each category is output as the key frame feature value; if several feature values in a category are equally closest to the centroid, their arithmetic mean is taken as the key frame feature value of that category.
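To make the clustering loop concrete, the following Python sketch (our own illustration, not code from the patent; the function names and the NumPy feature representation are assumptions) implements the procedure of fig. 2, including the tie-averaging key frame selection of fig. 3:

```python
import numpy as np

def cluster_frames(features, threshold):
    """Sequential threshold clustering of video frame feature values.

    features : (n, m) array, one m-dimensional feature value per frame.
    threshold: the preset Euclidean-distance threshold.
    """
    centroids = [features[0].astype(float)]  # first feature value seeds the first centroid
    members = [[0]]                          # frame indices belonging to each category
    labels = [0]
    for k in range(1, len(features)):
        x = features[k]
        for j, c in enumerate(centroids):    # compare against existing centroids in order
            if np.linalg.norm(x - c) < threshold:                 # formula (1)
                members[j].append(k)
                centroids[j] = features[members[j]].mean(axis=0)  # formula (2)
                labels.append(j)
                break
        else:                                  # beyond the threshold from every centroid:
            centroids.append(x.astype(float))  # open category i+1 with x as its centroid
            members.append([k])
            labels.append(len(centroids) - 1)
    return np.asarray(labels), np.asarray(centroids)

def key_frame_features(features, labels, centroids):
    """Per category, the feature value closest to the centroid; ties are averaged."""
    keys = []
    for j, c in enumerate(centroids):
        cluster = features[labels == j]
        dist = np.linalg.norm(cluster - c, axis=1)
        nearest = cluster[np.isclose(dist, dist.min())]  # all frames tied for the minimum
        keys.append(nearest.mean(axis=0))                # arithmetic mean when several tie
    return np.asarray(keys)
```

A caller would run `labels, centroids = cluster_frames(X, threshold)` followed by `keys = key_frame_features(X, labels, centroids)`; note that, unlike k-means, the number of categories is not fixed in advance but is governed by the preset threshold.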
In some embodiments, as shown in fig. 1, the method further includes, after step S300, step S400, in which a classifier is used to perform secondary classification on the key frame feature values and the pipe network defect identification result is output.
In some embodiments, the pipe network defect identification process takes pipeline closed-circuit television video as input. The video is first segmented into consecutive image frames, each frame is fed into a trained AlexNet network for feature extraction, the feature value of each video frame is extracted to generate the video frame feature value set X = {x1, …, xk, …, xn}, and a threshold is preset. The feature values are then clustered as described above for step S200: each subsequent feature value is compared in order, by Euclidean distance, against the centroids of the existing i (1 ≤ i ≤ n) categories and against the preset threshold, being assigned to the j-th category (whose centroid is then updated as the arithmetic mean of its members) when the distance is below the threshold, moving on to the (j+1)-th centroid when the distance is greater than or equal to the threshold and j ≠ i, and seeding a new (i+1)-th category when the distance is greater than or equal to the threshold and j = i, until every feature value in X has been processed and the cluster map of all video frames is obtained. Finally, the video frame feature value closest to the corresponding centroid in each category is selected as the key frame feature value; if several feature values in a category are equally closest to the centroid, their arithmetic mean is taken as the key frame feature value of that category.
The network model in the embodiment of the invention may be an AlexNet neural network, which mainly comprises convolutional layers, max-pooling layers, activation functions, local response normalization (LRN), and fully connected layers. The classifier may be a support vector machine (SVM). The AlexNet network and the SVMs can be trained on a data set by the following method, so as to realize, respectively, the extraction of the feature value of each video frame in the video data and the secondary classification of the key frame feature values, and to output the pipe network defect identification result.
Specifically, the AlexNet neural network model performs feature extraction on the input video frame images: a complete AlexNet network is built, trained with the training data, and the trained model is saved. The training data are then fed through the model to obtain feature values, which in turn serve as input data for training the SVM classifiers. At test time, the video frame feature values are extracted and the result is predicted by the SVMs, and the optimal model of each SVM is saved.
The AlexNet neural network model is trained as follows. Establishing a data set: video frames are extracted from historical pipe network inspection data and from videos of drainage pipe networks, and a data set is established. Each frame in the data set is labeled according to the requirements of the drainage pipe network inspection and evaluation regulations, forming a pipeline image set S and an image label set L; for each image X(n) in S there is a corresponding label in L, where L(1), L(2), …, L(o) denote the pipeline anomaly types. The data set S is divided into a training set S1, a validation set S2, and a test set S3 in the proportions 60%, 20%, and 20%. The images of the training set S1 are cropped, and the AlexNet network is then trained on S1.
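As a minimal illustration of the 60/20/20 split (the array names `images` and `labels` are hypothetical, and scikit-learn is our own choice of tooling, not mandated by the patent):

```python
from sklearn.model_selection import train_test_split

# `images` and `labels` (hypothetical names) hold the annotated pipeline image set S
# and its label set L. Carve off the 60% training set S1 first, then split the
# remaining 40% in half to obtain the 20% validation set S2 and the 20% test set S3.
x_s1, x_rest, y_s1, y_rest = train_test_split(
    images, labels, train_size=0.6, stratify=labels, random_state=0)
x_s2, x_s3, y_s2, y_s3 = train_test_split(
    x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
```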
The AlexNet network structure shown in fig. 4 mainly comprises 5 convolutional layers, 3 pooling layers, and 3 fully connected layers. The first convolutional layer uses 96 kernels of size 11 × 11 × 3 with stride = 4 and padding pad = 0, followed by the ReLU activation function and local response normalization (LRN), and then max pooling with pooling size 3 × 3 and stride = 2. The second convolutional layer uses 256 kernels of size 5 × 5 × 48 with stride = 1 and pad = 2, followed by ReLU and LRN, and then max pooling with pooling size 3 × 3 and stride = 2. The third convolutional layer uses 384 kernels of size 3 × 3 × 256 with stride = 1 and pad = 1, followed by ReLU. The fourth convolutional layer uses 384 kernels of size 3 × 3 × 256 with stride = 1 and pad = 1, followed by ReLU. The fifth convolutional layer uses 256 kernels of size 3 × 3 × 256 with stride = 1 and pad = 1, followed by ReLU and then max pooling of size 3 × 3 with stride = 2. The sixth layer is a fully connected layer with 4096 neurons; its ReLU activation produces 4096 values. The seventh layer is a fully connected layer with 4096 neurons; its ReLU activation produces 4096 values. The eighth layer is the output layer: the 4096 outputs of the seventh layer are fully connected to the 1000 neurons of the eighth layer, which output the feature value.
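The following PyTorch sketch of such a feature extractor is our own reconstruction, not code from the patent: it follows the single-stream AlexNet layout with the hyper-parameters listed above, and it returns the seventh-layer (fc7) activations as the frame feature value, as in the device embodiment below; the class and method names are assumptions.

```python
import torch
import torch.nn as nn

class AlexNetFeatures(nn.Module):
    """Single-stream AlexNet variant; fc7 activations serve as the frame feature value."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),    # conv1: 96 kernels
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # 3x3 pool, stride 2
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # conv2: 256 kernels
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # conv3: 384 kernels
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # conv4: 384 kernels
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # conv5: 256 kernels
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.fc6 = nn.Sequential(nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True))
        self.fc7 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(inplace=True))
        self.fc8 = nn.Linear(4096, num_classes)  # 8th (output) layer, used during training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)               # a 227x227 RGB frame maps to a 256x6x6 volume
        x = torch.flatten(x, 1)
        return self.fc7(self.fc6(x))   # 4096-dimensional video frame feature value
```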
In a neural network, the ReLU function is generally used to introduce a nonlinear factor; introducing nonlinearity effectively alleviates the vanishing-gradient problem and increases the expressive power of the network. The ReLU function is shown in formula (3):

$$f(x) = \max(0, x) \qquad (3)$$

where $x$ denotes the output of the preceding layer of the network.
In the neural network, the activation function maps the neuron outputs nonlinearly; to prevent gradient explosion and to improve the generalization ability of the model, the result obtained by the ReLU is additionally normalized. The normalization formula is shown in formula (4):

$$b^{t}_{g,h} = a^{t}_{g,h} \bigg/ \left( k + \alpha \sum_{d=\max(0,\, t-z/2)}^{\min(N-1,\, t+z/2)} \left( a^{d}_{g,h} \right)^{2} \right)^{\beta} \qquad (4)$$

where $b^{t}_{g,h}$ is the normalized value; $a^{t}_{g,h}$ is the output value of the activation function, with $a$ denoting the convolution kernel to be computed, $t$ the $t$-th channel, and $g, h$ the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; $a^{d}_{g,h}$ is the feature of that convolution kernel on the $d$-th channel, where $d$ runs from $\max(0, t-z/2)$ to $\min(N-1, t+z/2)$ and $z$ is the neighborhood range (padded with 0 at the boundaries); $N$ is the total number of convolution kernels; $k$ is a constant that prevents division by zero; and $\alpha$, $\beta$ are constants; $k$, $\alpha$, $\beta$, and $z$ are all manually set hyper-parameters.
The feature values of the image frames extracted by the AlexNet network are used as a training set to train support vector machines (SVMs), which perform the secondary classification of the extracted key frames; the number of SVMs is determined by the number of pipe network defect types.
Specifically, the data and the learning target are first input, where the data are the feature vectors of the image frames, which constitute the feature space, and the learning target is set as a binary variable y ∈ {−1, +1} indicating the non-defective class and the defective class (having a certain type of defect). In the feature space of the input data, a decision-boundary hyperplane separates the learning targets into the non-defective class and the defective class.
The decision boundary of the separating hyperplane is calculated as:

$$w^{T}X + f = 0 \qquad (5)$$

where $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample.

Once the normal vector $w$ and the intercept $f$ are determined, a separating hyperplane is uniquely determined. The decision boundary divides the feature space into two sets: the classifier assigns all points on one side of the decision boundary to one class and all points on the other side to the other class.
The distance from a point on either side of the hyperplane to the hyperplane is calculated as:

$$d = \frac{\left| w^{T}X + f \right|}{\lVert w \rVert} \qquad (6)$$

where $d$ is the distance from a point on either side of the hyperplane to the hyperplane, $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample.
The trained SVMs are verified with the validation set S2 to judge whether an optimal hyperplane has been obtained, and the model is brought to its optimal state by tuning hyper-parameters. The optimal model is then tested with the test set S3 to estimate its generalization ability; the model with the strongest generalization ability is selected, the pipe network image to be inspected is input into it, defects are detected, and the model's output is obtained. As shown in fig. 5, a plurality of SVM classifiers are used to output the defect classes of the video frames.
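A minimal training sketch for this stage (our own illustration: the array names are hypothetical, a linear kernel is assumed, and scikit-learn stands in for whatever SVM implementation the embodiment used) might look as follows:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical arrays: `train_feats`/`train_labels` hold fc7 feature values and defect
# labels for the S1 frames; `val_feats`/`val_labels` come from the validation set S2.
defect_types = np.unique(train_labels)
classifiers = {}
for defect in defect_types:                       # one binary SVM per defect type (fig. 5)
    y_train = np.where(train_labels == defect, 1, -1)
    y_val = np.where(val_labels == defect, 1, -1)
    best_svm, best_acc = None, -1.0
    for C in (0.1, 1.0, 10.0):                    # tune the penalty hyper-parameter on S2
        svm = SVC(kernel="linear", C=C).fit(train_feats, y_train)
        acc = svm.score(val_feats, y_val)         # accuracy on the validation set
        if acc > best_acc:
            best_svm, best_acc = svm, acc
    classifiers[defect] = best_svm                # keep the optimal model of each SVM
```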
In summary, the pipe network defect intelligent identification method based on video frame clustering provided by the embodiments of the invention extracts features from each image frame with the AlexNet neural network, clusters the frames by Euclidean distance, and finally extracts key frames and classifies their defects with SVMs, realizing accurate judgment of drainage pipeline defect categories. It improves the efficiency and accuracy of automatic pipe network defect detection, reduces the labor intensity of workers, and has great value for popularization and application in pipe network video defect detection.
The embodiment of the invention also provides an intelligent pipe network defect identification device based on video frame clustering, comprising a computation unit configured to: extract the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and preset a threshold;

establish a first category and take the first video frame feature value in the set as the centroid of the first category; calculate in order, for each subsequent video frame feature value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories and compare it with the preset threshold, so that the feature value is either assigned to one of the existing i categories or used to establish a new (i+1)-th category, of which it becomes the centroid. Specifically: the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category (1 ≤ j ≤ i) is calculated; if the distance is less than the preset threshold, the feature value is assigned to the j-th category and the j-th centroid is updated as the arithmetic mean of all video frame feature values in that category; if the distance is greater than or equal to the preset threshold and j ≠ i, the Euclidean distance to the (j+1)-th centroid is calculated and compared with the preset threshold again; and if the distance is greater than or equal to the preset threshold and j = i, the (i+1)-th category is established with the feature value as its centroid. This loop is repeated for the k-th feature value until it belongs to one of the existing i categories or seeds a new category as its centroid. Processing every feature value in the set X in this way yields a number of categories and their centroids, and hence the cluster map of all video frame feature values;

and select, in each category, the video frame feature value closest to the corresponding centroid as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, take their arithmetic mean as the key frame feature value of that category.
In some embodiments, the Euclidean distance is calculated as shown in formula (1):

$$d(x_k, c_j) = \sqrt{\sum_{l=1}^{m} \left( x_{k,l} - c_{j,l} \right)^2} \qquad (1)$$

where $d(x_k, c_j)$ is the Euclidean distance between a video frame feature value and a centroid (the centroids including the first-category centroid and any new-category centroids); $x_k$ is the point in m-dimensional Euclidean space at which a video frame feature value lies; $c_j$ is the point in m-dimensional Euclidean space at which a centroid lies; $m$ is the number of features contained in a video frame feature value; $x_{k,l}$ is the value of the $l$-th dimension of the k-th video frame feature value; and $c_{j,l}$ is the value of the $l$-th dimension of the j-th category centroid.
In some embodiments, the computation unit is further configured to update the centroid of the j-th category by arithmetically averaging the feature values of all video frames in that category, as shown in formula (2):

$$c_j' = \frac{1}{\lvert S_j \rvert} \sum_{x \in S_j} x \qquad (2)$$

where $c_j'$ denotes the updated j-th category centroid, $S_j$ denotes the set of feature values belonging to the j-th category, and $x$ denotes a video frame feature value belonging to $S_j$.
In some embodiments, the apparatus further comprises a network model configured to extract a feature value for each video frame in the video data.
In some embodiments, the network model comprises 5 convolutional layers and 2 fully connected layers connected in sequence; pooling layers follow the first, second, and fifth convolutional layers; each layer has an activation function; the input video frame is propagated forward, and the features of the 7th (fully connected) layer are taken as the output to obtain the corresponding video frame feature value. The activation function is shown in formula (3):

$$f(x) = \max(0, x) \qquad (3)$$

where $x$ is the output of the preceding layer of the network.
In some embodiments, the activation outputs of the first and second convolutional layers are normalized using the local response normalization of formula (4):

$$b^{t}_{g,h} = a^{t}_{g,h} \bigg/ \left( k + \alpha \sum_{d=\max(0,\, t-z/2)}^{\min(N-1,\, t+z/2)} \left( a^{d}_{g,h} \right)^{2} \right)^{\beta} \qquad (4)$$

where $b^{t}_{g,h}$ is the normalized value; $a^{t}_{g,h}$ is the output value of the activation function, with $a$ denoting the convolution kernel to be computed, $t$ the $t$-th channel, and $g, h$ the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; $a^{d}_{g,h}$ is the feature of that convolution kernel on the $d$-th channel, where $d$ runs from $\max(0, t-z/2)$ to $\min(N-1, t+z/2)$ and $z$ is the neighborhood range (padded with 0 at the boundaries); $N$ is the total number of convolution kernels; $k$ is a constant that prevents division by zero; and $\alpha$, $\beta$ are constants; $k$, $\alpha$, $\beta$, and $z$ are all manually set hyper-parameters.
In some embodiments, the apparatus further includes a classifier configured to perform secondary classification on the key frames and output the pipe network defect identification result.
In some embodiments, the classifier is trained by:

based on the input data and the learning target, separating, in the feature space of the input data, the learning targets into a non-defective class and a defective class (having a certain defect) by means of a decision-boundary hyperplane; the input data are the feature vectors of the video frames, and the learning target is set as a binary variable representing the non-defective class and the defective class;
the decision boundary of the separating hyperplane is calculated as:

$$w^{T}X + f = 0 \qquad (5)$$

where $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample;

the distance from the points on either side of the hyperplane to the hyperplane is calculated by formula (6):

$$d = \frac{\left| w^{T}X + f \right|}{\lVert w \rVert} \qquad (6)$$

where $d$ is the distance from a point on either side of the hyperplane to the hyperplane, $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample;

and the hyperplane is evaluated with the validation set, continuously updating the normal vector $w$ and the intercept $f$ to determine the optimal hyperplane.
The pipe network defect intelligent identification device based on video frame clustering provided by the embodiment of the invention has basically the same technical effect as the method explained in the foregoing, and the description is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method according to any of the embodiments of the present invention.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. While the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being covered by the appended claims and their equivalents.

Claims (10)

1. A pipe network defect intelligent identification method based on video frame clustering is characterized by comprising the following steps:
extracting the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and presetting a threshold;
determining a first category and taking the first video frame feature value in the video frame feature value set as the centroid of the first category, sequentially calculating the Euclidean distance between each subsequent video frame feature value and the centroid of each of the existing i categories, comparing it with the preset threshold, and thereby assigning the video frame feature value to one of the existing i categories, or establishing a new (i+1)-th category with the video frame feature value as its centroid, as follows:

calculating the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category; if the Euclidean distance is less than the preset threshold, assigning the video frame feature value to the j-th category and updating the j-th centroid as the arithmetic mean of all video frame feature values in the j-th category; if the Euclidean distance is greater than or equal to the preset threshold and j ≠ i, calculating the Euclidean distance between the k-th video frame feature value and the (j+1)-th centroid and comparing it with the preset threshold again; and if the calculated Euclidean distance is greater than or equal to the preset threshold and j = i, establishing the (i+1)-th category and taking the video frame feature value as the centroid of the (i+1)-th category;

and selecting the video frame feature value closest to the corresponding centroid in each category as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, taking their arithmetic mean as the key frame feature value of that category.
2. The method of claim 1, wherein the Euclidean distance is calculated as shown in formula (1):

$$d(x_k, c_j) = \sqrt{\sum_{l=1}^{m} \left( x_{k,l} - c_{j,l} \right)^2} \qquad (1)$$

where $d(x_k, c_j)$ is the Euclidean distance between a video frame feature value and a centroid (the centroids including the first-category centroid and any new-category centroids); $x_k$ is the point in m-dimensional Euclidean space at which a video frame feature value lies; $c_j$ is the point in m-dimensional Euclidean space at which a centroid lies; $m$ is the number of features contained in a video frame feature value; $x_{k,l}$ is the value of the $l$-th dimension of the k-th video frame feature value; and $c_{j,l}$ is the value of the $l$-th dimension of the j-th category centroid.
3. The method of claim 1, wherein the centroid of the j-th category is updated by arithmetically averaging all video frame feature values in that category according to formula (2):

$$c_j' = \frac{1}{\lvert S_j \rvert} \sum_{x \in S_j} x \qquad (2)$$

where $c_j'$ denotes the updated j-th category centroid, $S_j$ denotes the set of feature values belonging to the j-th category, and $x$ denotes a video frame feature value belonging to $S_j$.
4. The method of claim 1, wherein the extracting the feature value of each video frame in the video data comprises:
and extracting the characteristic value of each video frame in the video data by using an AlexNet neural network model.
5. The method of claim 4, wherein the AlexNet neural network model comprises 5 convolutional layers and 2 fully connected layers connected in sequence; pooling layers follow the first, second, and fifth convolutional layers; each layer has an activation function; the input video frame is propagated forward, and the features of the 7th (fully connected) layer are taken as the output to obtain the corresponding video frame feature value; the activation function is shown in formula (3):

$$f(x) = \max(0, x) \qquad (3)$$

where $x$ is the output of the preceding layer of the network.
6. The method of claim 5, wherein the activation outputs of the first and second convolutional layers are normalized using the local response normalization of formula (4):

$$b^{t}_{g,h} = a^{t}_{g,h} \bigg/ \left( k + \alpha \sum_{d=\max(0,\, t-z/2)}^{\min(N-1,\, t+z/2)} \left( a^{d}_{g,h} \right)^{2} \right)^{\beta} \qquad (4)$$

where $b^{t}_{g,h}$ is the normalized value; $a^{t}_{g,h}$ is the output value of the activation function, with $a$ denoting the convolution kernel to be computed, $t$ the $t$-th channel, and $g, h$ the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; $a^{d}_{g,h}$ is the feature of that convolution kernel on the $d$-th channel, where $d$ runs from $\max(0, t-z/2)$ to $\min(N-1, t+z/2)$ and $z$ is the neighborhood range (padded with 0 at the boundaries); $N$ is the total number of convolution kernels; $k$ is a constant that prevents division by zero; and $\alpha$, $\beta$ are constants; $k$, $\alpha$, $\beta$, and $z$ are all manually set hyper-parameters.
7. The method according to any one of claims 1-6, wherein after selecting the video frame feature value closest to the corresponding centroid in each category as the key frame feature value, the method further comprises:
and performing secondary classification on the key frame characteristic values by using a classifier, and outputting a pipe network defect identification result.
8. The method of claim 7, wherein the classifier is trained by:
based on the input data and the learning target, separating, in the feature space of the input data, the learning targets into a non-defective class and a defective class by means of a decision-boundary hyperplane; the input data are the feature values of the video frames, and the learning target is a binary variable y ∈ {−1, +1} indicating the non-defective class and the defective class;

the decision boundary of the separating hyperplane is calculated as:

$$w^{T}X + f = 0 \qquad (5)$$

where $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample;

the distance from the points on either side of the hyperplane to the hyperplane is calculated by formula (6):

$$d = \frac{\left| w^{T}X + f \right|}{\lVert w \rVert} \qquad (6)$$

where $d$ is the distance from a point on either side of the hyperplane to the hyperplane, $w$ denotes the normal vector of the hyperplane, $T$ denotes transposition, $f$ denotes the intercept of the hyperplane, and $X$ denotes the feature value of an input training sample.
9. A pipe network defect intelligent identification device based on video frame clustering is characterized by comprising a computing unit, wherein the computing unit is configured to:
extracting the feature value of each video frame in the video data to generate a video frame feature value set X = {x1, …, xk, …, xn}, and presetting a threshold;

establishing a first category and taking the first video frame feature value in the video frame feature value set as the centroid of the first category; calculating in order the Euclidean distance between each subsequent video frame feature value and the centroid of each of the existing i categories, comparing it with the preset threshold, and thereby assigning the video frame feature value to one of the existing i categories or establishing a new (i+1)-th category with the video frame feature value as its centroid; specifically: calculating the Euclidean distance between the k-th video frame feature value and the centroid of the j-th category; if the Euclidean distance is less than the preset threshold, assigning the video frame feature value to the j-th category and updating the j-th centroid as the arithmetic mean of all video frame feature values in the j-th category; if the Euclidean distance is greater than or equal to the preset threshold and j ≠ i, calculating the Euclidean distance between the k-th video frame feature value and the (j+1)-th centroid and comparing it with the preset threshold again; and if the calculated Euclidean distance is greater than or equal to the preset threshold and j = i, establishing the (i+1)-th category and taking the video frame feature value as the centroid of the (i+1)-th category;

and selecting the video frame feature value closest to the corresponding centroid in each category as the key frame feature value; if several video frame feature values in a category are equally closest to the centroid, taking their arithmetic mean as the key frame feature value of that category.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-8.
CN202210566909.8A 2022-05-24 2022-05-24 Pipe network defect intelligent identification method and device based on video frame clustering, and medium Pending CN114648534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210566909.8A CN114648534A (en) 2022-05-24 2022-05-24 Pipe network defect intelligent identification method and device based on video frame clustering, and medium


Publications (1)

Publication Number Publication Date
CN114648534A true CN114648534A (en) 2022-06-21

Family

ID=81996677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210566909.8A Pending CN114648534A (en) 2022-05-24 2022-05-24 Pipe network defect intelligent identification method and device based on video frame clustering, and medium

Country Status (1)

Country Link
CN (1) CN114648534A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020062433A1 (en) * 2018-09-29 2020-04-02 初速度(苏州)科技有限公司 Neural network model training method and method for detecting universal grounding wire
CN109800824A (en) * 2019-02-25 2019-05-24 中国矿业大学(北京) A kind of defect of pipeline recognition methods based on computer vision and machine learning
CN110910021A (en) * 2019-11-26 2020-03-24 上海华力集成电路制造有限公司 Method for monitoring online defects based on support vector machine
CN111695482A (en) * 2020-06-04 2020-09-22 华油钢管有限公司 Pipeline defect identification method
WO2022016328A1 (en) * 2020-07-20 2022-01-27 深圳大学 Metal foreign body detection method and apparatus, and terminal device
CN112070044A (en) * 2020-09-15 2020-12-11 北京深睿博联科技有限责任公司 Video object classification method and device
CN113221710A (en) * 2021-04-30 2021-08-06 深圳市水务工程检测有限公司 Neural network-based drainage pipeline defect identification method, device, equipment and medium
CN113766330A (en) * 2021-05-26 2021-12-07 腾讯科技(深圳)有限公司 Method and device for generating recommendation information based on video

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
He Jiang et al., "AlexNet-based coal gangue detection system using machine vision" (基于机器视觉的AlexNet网络煤矸石检测系统), Coal Technology (《煤炭技术》) *
Zhang Yuhui et al., "Research on feature filtering preprocessing based on the KNN algorithm" (基于KNN算法的特征过滤预处理研究), Modern Information Technology (《现代信息科技》) *
Wang Dongxue, "Research on multi-view facial expression recognition based on multiple features" (基于多特征的多视角人脸表情识别研究), China Masters' Theses Full-text Database, Information Science and Technology series *

Similar Documents

Publication Publication Date Title
CN108830188B (en) Vehicle detection method based on deep learning
CN109784203B (en) Method for inspecting contraband in weak supervision X-ray image based on layered propagation and activation
WO2023155069A1 (en) Deep-learning-based surface defect detection method for mobile phone battery
CN112949572B (en) Slim-YOLOv 3-based mask wearing condition detection method
CN107944396B (en) Knife switch state identification method based on improved deep learning
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN106845421B (en) Face feature recognition method and system based on multi-region feature and metric learning
CN110569738B (en) Natural scene text detection method, equipment and medium based on densely connected network
Hassanin et al. A real-time approach for automatic defect detection from PCBs based on SURF features and morphological operations
EP3478728A1 (en) Method and system for cell annotation with adaptive incremental learning
CN112200121B (en) Hyperspectral unknown target detection method based on EVM and deep learning
CN111612784A (en) Steel plate surface defect detection method based on classification-first YOLO network
CN111242144B (en) Method and device for detecting abnormality of power grid equipment
CN111242899B (en) Image-based flaw detection method and computer-readable storage medium
CN111582126A (en) Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion
CN114926391A (en) Perspective transformation method based on improved RANSAC algorithm
CN116740652B (en) Method and system for monitoring rust area expansion based on neural network model
CN114694178A (en) Method and system for monitoring safety helmet in power operation based on fast-RCNN algorithm
CN117315380B (en) Deep learning-based pneumonia CT image classification method and system
Mandyartha et al. Global and adaptive thresholding technique for white blood cell image segmentation
CN113781483B (en) Industrial product appearance defect detection method and device
CN116051539A (en) Diagnosis method for heating fault of power transformation equipment
CN116977859A (en) Weak supervision target detection method based on multi-scale image cutting and instance difficulty
Sulistyaningrum et al. Classification of damaged road types using multiclass support vector machine (SVM)
Fujita et al. Fine-tuned pre-trained mask R-CNN models for surface object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination