CN114648534A - Pipe network defect intelligent identification method and device based on video frame clustering, and medium - Google Patents
- Publication number
- CN114648534A (application CN202210566909.8A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- characteristic value
- centroid
- value
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0004 — Industrial image inspection (G06T7/00 Image analysis; G06T7/0002 Inspection of images, e.g. flaw detection)
- G06F18/22 — Matching criteria, e.g. proximity measures
- G06F18/23 — Clustering techniques
- G06F18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
- G06F18/243 — Classification techniques relating to the number of classes
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06T2207/10016 — Video; image sequence
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
A pipe network defect intelligent identification method, device, and medium based on video frame clustering are provided. The method comprises the following steps: extracting a characteristic value of each video frame in the video data to generate a characteristic value set; presetting a threshold value; dividing a first category and taking the first video frame characteristic value in the set as the first category centroid; calculating, in order, the Euclidean distance between each subsequent video frame characteristic value and the centroids of the existing i categories and comparing it with the preset threshold value, so as to assign the characteristic value to one of the existing i categories or to divide a new (i+1)-th category with that characteristic value as its centroid; and selecting the video frame characteristic value closest to the corresponding centroid in each category as a key frame characteristic value. The invention improves the efficiency and accuracy of automatic pipe network defect detection, reduces the labor intensity of workers, and has great popularization and application value in pipe network video defect detection.
Description
Technical Field
The invention relates to the technical field of intelligent detection of pipe network defects, and in particular to a pipe network defect intelligent identification method, device, and medium based on video frame clustering.
Background
The underground drainage pipe network is an important component of urban drainage. As its service life increases, the drainage pipe network gradually develops defects such as deformation, damage, corrosion, fracture, and leakage, which lead to serious hazards such as pipeline burst, waterlogging, and pavement collapse, causing great economic loss and personal harm.
Current underground pipe network detection mainly includes manual detection and closed-circuit television (CCTV) robot detection. The two methods differ only in how the video data is acquired: in manual detection the video is captured by a worker, while in CCTV robot detection it is captured by a robot camera. In both cases the acquired video is interpreted manually, the pipeline defects are assessed, and an industry detection report is generated. At the defect detection stage, both methods rely on the judgment of experienced operators, yet operators in the industry vary widely in skill, turn over frequently, and often have an insufficient grasp of industry regulations and standards. In addition, both methods require workers to inspect the equipment and the environment on site, which is time-consuming, personnel-intensive, inefficient, and inaccurate. The existing pipe network defect detection methods therefore have great limitations and great room for improvement.
Disclosure of Invention
The invention provides a pipe network defect intelligent identification method and device based on video frame clustering, which can improve both the efficiency and the accuracy of pipe network defect detection.
The specific technical scheme of the invention is as follows:
According to a first technical scheme of the invention, a method for intelligently identifying pipe network defects based on video frame clustering is provided. The method comprises the following steps: extracting the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and presetting a threshold value; dividing a first category and taking the first video frame characteristic value in the set as the first category centroid; calculating, in order, the Euclidean distance between each subsequent video frame characteristic value and the centroid of each of the existing i (1 ≤ i ≤ n) categories and comparing it with the preset threshold value, so as to assign the characteristic value to one of the existing i categories or to divide a new (i+1)-th category with that characteristic value as the new category centroid;
and selecting the video frame closest to the corresponding centroid in each category as a key frame; if several video frame characteristic values in a category are equally close to the centroid, taking their arithmetic mean as the key frame characteristic value of that category.
According to a second technical scheme of the invention, an intelligent pipe network defect identification device based on video frame clustering is provided, comprising a calculation unit configured to: extract the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and preset a threshold value;
determine a first category and take the first video frame characteristic value in the set as the first category centroid; calculate, in order, the Euclidean distance between each subsequent video frame characteristic value and the centroid of each of the existing i (1 ≤ i ≤ n) categories and compare it with the preset threshold value, so as to assign the characteristic value to one of the existing i categories or to divide a new (i+1)-th category with that characteristic value as the new category centroid. Specifically, the Euclidean distance between the characteristic value of the k-th video frame and the centroid of the j-th category (1 ≤ j ≤ i) is calculated. If the distance is less than the preset threshold value, the characteristic value is classified into the j-th category, and the centroid of the j-th category is updated as the arithmetic mean of all video frame characteristic values in that category. If the distance is greater than or equal to the preset threshold value and j ≠ i, the Euclidean distance between the characteristic value and the centroid of the (j+1)-th category is calculated and compared with the threshold value in the same way. If the distance is greater than or equal to the preset threshold value and j = i, a new (i+1)-th category is divided with the characteristic value as its centroid;
and select the video frame characteristic value closest to the corresponding centroid in each category as a key frame characteristic value; if several video frame characteristic values in a category are equally close to the centroid, take their arithmetic mean as the key frame characteristic value of that category.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method according to any one of the embodiments of the present invention.
The pipe network defect intelligent identification method, device, and medium based on video frame clustering according to the embodiments of the invention can improve the efficiency and accuracy of automatic pipe network defect detection, reduce the labor intensity of workers, and have great popularization and application value in pipe network video defect detection.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 shows a flowchart of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the present invention.
Fig. 2 shows a flow chart of a video frame clustering method according to an embodiment of the present invention.
Fig. 3 shows a clustering result diagram of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the invention.
Fig. 4 shows a network structure diagram of an AlexNet network model according to an embodiment of the present invention.
FIG. 5 shows a schematic diagram of a classifier according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention will now be further described with reference to the accompanying drawings.
Fig. 1 shows a flowchart of an intelligent pipe network defect identification method based on video frame clustering according to an embodiment of the present invention. As shown in fig. 1, the method starts with step S100: extracting the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and presetting a threshold value.
In step S200, a first category is determined and the first video frame characteristic value in the set is taken as the first category centroid. Then, for each subsequent video frame characteristic value, the Euclidean distance to the centroid of each of the existing i (1 ≤ i ≤ n) categories is calculated in order and compared with the preset threshold value, so that the characteristic value is either assigned to one of the existing i categories or divides a new (i+1)-th category with itself as the new category centroid.
Step S200 is specifically implemented as follows. The Euclidean distance between the characteristic value of the k-th video frame and the centroid of the j-th category (1 ≤ j ≤ i) is calculated. If the distance is less than the preset threshold value, the characteristic value is classified into the j-th category, and the centroid of the j-th category is updated as the arithmetic mean of all video frame characteristic values in that category. If the distance is greater than or equal to the preset threshold value and j ≠ i, the Euclidean distance between the characteristic value and the centroid of the (j+1)-th category is calculated and compared with the threshold value in the same way. If the distance is greater than or equal to the preset threshold value and j = i, a new (i+1)-th category is divided and the characteristic value is taken as the centroid of the (i+1)-th category. This loop is repeated for the k-th video frame characteristic value until it either belongs to one of the existing i categories or founds a new category with itself as the centroid. Processing every characteristic value in the set X in the same way yields a number of categories together with their centroids, and hence a cluster map of all the video frame characteristic values.
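As a minimal sketch (an illustration for this description, not code taken from the patent), the single-pass clustering of step S200 can be written as follows, treating each frame's characteristic value as a numeric vector; the function name `cluster_frames` and the data are hypothetical:

```python
import numpy as np

def cluster_frames(features, threshold):
    """Single-pass threshold clustering of video frame feature vectors.

    features: (n, m) array, one m-dimensional characteristic value per frame.
    Returns the member indices of each category and the final centroids.
    """
    clusters = [[0]]                          # the first frame founds the first category
    centroids = [features[0].astype(float)]   # and serves as its centroid
    for k in range(1, len(features)):
        x = features[k]
        for j, c in enumerate(centroids):     # compare with centroids in order j = 1..i
            if np.linalg.norm(x - c) < threshold:          # Euclidean distance test
                clusters[j].append(k)
                centroids[j] = features[clusters[j]].mean(axis=0)  # update centroid
                break
        else:                                 # farther than the threshold from every centroid:
            clusters.append([k])              # divide a new (i+1)-th category
            centroids.append(x.astype(float)) # with x as the new centroid
    return clusters, centroids
```

For example, four frames with feature values (0, 0), (0.1, 0), (5, 5), (5.1, 5) and a threshold of 1.0 yield two categories with centroids (0.05, 0) and (5.05, 5).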
In step S300, the video frame characteristic value closest to the corresponding centroid in each category is selected as the key frame characteristic value; if several characteristic values in a category are equally close to the centroid, their arithmetic mean is taken as the key frame characteristic value of that category. The key frame characteristic values essentially reflect the different types of defect characteristics of the pipe network, thereby improving the efficiency of automatic defect detection.
The Euclidean distance calculation formula is shown as formula (1):

$$d(x_k, c_j) = \sqrt{\sum_{l=1}^{m} \left( x_{kl} - c_{jl} \right)^2} \quad (1)$$

where $d(x_k, c_j)$ is the Euclidean distance between a video frame characteristic value and a centroid (either the first category centroid or a new category centroid), $x_k = (x_{k1}, \ldots, x_{km})$ is the point in m-dimensional Euclidean space where the k-th video frame characteristic value lies, $c_j = (c_{j1}, \ldots, c_{jm})$ is the point in m-dimensional Euclidean space where the j-th category centroid lies, m is the number of features contained in a video frame characteristic value, $x_{kl}$ is the value of the l-th dimension of the k-th video frame characteristic value, and $c_{jl}$ is the value of the l-th dimension of the j-th category centroid.
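A direct transcription of formula (1), with illustrative (hypothetical) names:

```python
import math

def euclidean(xk, cj):
    """Formula (1): distance between a frame feature value xk and a centroid cj,
    both sequences of length m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xk, cj)))
```

For instance, `euclidean([0, 0], [3, 4])` is 5.0.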
The centroid of the j-th category is updated as the arithmetic mean of all video frame characteristic values in that category, as shown in formula (2):

$$c_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x \quad (2)$$

where $c_j$ is the updated j-th category centroid, $C_j$ is the set of characteristic values belonging to the j-th category, $\lvert C_j \rvert$ is the number of characteristic values in $C_j$, and $x$ is a video frame characteristic value belonging to $C_j$.
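The centroid update of formula (2) is simply a component-wise arithmetic mean; a small illustrative helper (names are hypothetical):

```python
def update_centroid(cluster_features):
    """Formula (2): new centroid = component-wise arithmetic mean of the
    feature values currently in category C_j."""
    n = len(cluster_features)            # |C_j|
    m = len(cluster_features[0])         # dimensionality of each feature value
    return [sum(f[l] for f in cluster_features) / n for l in range(m)]
```

For example, averaging (0, 2) and (2, 4) gives the centroid (1.0, 3.0).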
Illustratively, as shown in fig. 2, the video frame characteristic value set X = {x1, …, xk, …, xn} and a preset threshold value are input. A first category is divided and the first video frame characteristic value in the set is taken as the first category centroid. The Euclidean distance between each subsequent characteristic value and the centroid of each of the existing i (1 ≤ i ≤ n) categories is then calculated in order and compared with the preset threshold value, so that the characteristic value is either assigned to one of the existing i categories or divides a new (i+1)-th category with itself as the new category centroid. Specifically, the Euclidean distance between the characteristic value of the k-th video frame and the centroid of the j-th category (1 ≤ j ≤ i) is calculated. If the distance is less than the preset threshold value, the characteristic value is classified into the j-th category and the centroid of the j-th category is updated as the arithmetic mean of all characteristic values in that category. If the distance is greater than or equal to the preset threshold value and j ≠ i, the distance to the (j+1)-th category centroid is calculated and compared with the threshold value in the same way; if the distance is greater than or equal to the preset threshold value and j = i, a new (i+1)-th category is divided with the characteristic value as its centroid.
This loop is repeated for the k-th characteristic value until it belongs to one of the existing i categories or founds a new category with itself as the centroid. Processing every characteristic value in X in this way yields a number of categories and their centroids, and hence the cluster map of all video frame characteristic values. As shown in fig. 3, the video frame characteristic value closest to each category centroid is output as the key frame characteristic value; if several characteristic values in a category are equally close to the centroid, their arithmetic mean is taken as the key frame characteristic value of that category.
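The key-frame selection, including the tie case where several members are equally close to the centroid, can be sketched as follows (an illustration under the same vector-feature assumption as above; the names are hypothetical):

```python
import numpy as np

def key_frame_features(features, clusters, centroids):
    """Per category, pick the feature value nearest its centroid (step S300).
    If several members tie for the minimum distance, their arithmetic mean
    is taken as the key frame feature value of that category."""
    keys = []
    for members, c in zip(clusters, centroids):
        d = np.linalg.norm(features[members] - c, axis=1)  # distances to centroid
        nearest = np.isclose(d, d.min())                   # tie-aware minimum
        keys.append(features[members][nearest].mean(axis=0))
    return keys
```

In a category whose two members (1, 0) and (−1, 0) are equidistant from the centroid (0, 0), the key frame feature value is their mean, (0, 0).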
In some embodiments, as shown in fig. 1, after step S300, step S400 is further included, in which a classifier is used to perform secondary classification on the keyframe, and a pipe network defect identification result is output.
In some embodiments, the pipe network defect identification process takes a pipeline closed-circuit television video as input. The video is first segmented into continuous image frames, and each frame is fed into a trained AlexNet network for feature extraction, so that the characteristic value of each video frame is extracted, the video frame characteristic value set X = {x1, …, xk, …, xn} is generated, and a threshold value is preset. A first category is divided and the first video frame characteristic value in the set is taken as the first category centroid; the Euclidean distance between each subsequent characteristic value and the centroid of each of the existing i (1 ≤ i ≤ n) categories is calculated in order and compared with the preset threshold value, so that the characteristic value is either assigned to one of the existing i categories or divides a new (i+1)-th category with itself as the new category centroid.
Specifically, the Euclidean distance between the characteristic value of the k-th video frame and the centroid of the j-th category (1 ≤ j ≤ i) is calculated. If the distance is less than the preset threshold value, the characteristic value is classified into the j-th category and the centroid of the j-th category is updated as the arithmetic mean of all characteristic values in that category. If the distance is greater than or equal to the preset threshold value and j ≠ i, the distance to the (j+1)-th category centroid is calculated and compared with the threshold value in the same way; if the distance is greater than or equal to the preset threshold value and j = i, a new (i+1)-th category is divided with the characteristic value as its centroid. This loop is repeated for the k-th characteristic value until it belongs to one of the existing i categories or founds a new category with itself as the centroid. Processing every characteristic value in X in this way yields a number of categories and their centroids, and hence the cluster map of all video frames. The characteristic value closest to the corresponding centroid in each category is then selected as the key frame characteristic value; if several characteristic values in a category are equally close to the centroid, their arithmetic mean is taken as the key frame characteristic value of that category.
The network model in the embodiments of the invention may be an AlexNet neural network, which mainly comprises convolutional layers, max pooling layers, activation functions, local response normalization (LRN), and fully connected layers. The classifier may be a support vector machine (SVM). The AlexNet neural network and the SVMs can be trained on a data set by the following method, so as to realize, respectively, the extraction of the characteristic value of each video frame in the video data and the secondary classification of the key frame characteristic values, and to output the pipe network defect identification result.
Specifically, a complete AlexNet neural network model is established for feature extraction of the input video frame images, trained with the training data, and the trained model is saved. The training data are then fed into the model to obtain characteristic values, which serve as input data to train the SVM classifiers. During testing, the video frame characteristic values are extracted and the results are predicted by the SVMs, and the optimal model of each SVM is saved.
The process of training the AlexNet neural network model is as follows. Establishing a data set: video frames are extracted from historical pipe network detection data and drainage pipe network videos to establish a data set. Each frame image in the data set is labeled according to the requirements of the drainage pipe network detection and evaluation regulations, forming a pipeline image set S and an image label set L, where for each image X(n) in S the corresponding label in L (L(1), L(2), … L(o)) represents the pipeline abnormality type of the n-th image. The data set S is divided into a training set S1, a verification set S2, and a test set S3 in the proportions 60%, 20%, and 20%. The images of the training set S1 are cropped, and then the AlexNet network is trained with S1.
The AlexNet neural network model structure shown in fig. 4 mainly includes 5 convolutional layers, 3 pooling layers, and 3 fully connected layers:
- First convolutional layer: 96 kernels of size 11 × 11 × 3, stride = 4, pad = 0; ReLU activation and local response normalization (LRN); then max pooling of size 3 × 3 with stride = 2.
- Second convolutional layer: 256 kernels of size 5 × 5 × 48, stride = 1, pad = 2; ReLU activation and LRN; then max pooling of size 3 × 3 with stride = 2.
- Third convolutional layer: 384 kernels of size 3 × 3 × 256, stride = 1, pad = 1; ReLU activation.
- Fourth convolutional layer: 384 kernels of size 3 × 3 × 256, stride = 1, pad = 1; ReLU activation.
- Fifth convolutional layer: 256 kernels of size 3 × 3 × 256, stride = 1, pad = 1; ReLU activation, then max pooling of size 3 × 3 with stride = 2.
- Sixth layer: fully connected, 4096 neurons; the ReLU activation function produces 4096 values.
- Seventh layer: fully connected, 4096 neurons; the ReLU activation function produces 4096 values.
- Eighth layer: output layer; the 4096 values output by the seventh layer are fully connected to 1000 neurons, which output the characteristic value.
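As a sanity check on the layer parameters above, the standard conv/pool size formula out = ⌊(in + 2·pad − kernel)/stride⌋ + 1 can trace the spatial resolution through the network. The 227 × 227 input size is an assumption of this sketch (a common choice for AlexNet; the patent does not state the input resolution):

```python
def out_size(size, kernel, stride, pad):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k)/s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Trace the spatial resolution through the five conv stages listed above,
# assuming a 227x227 input image.
s = 227
s = out_size(s, 11, 4, 0)   # conv1: 11x11, stride 4, pad 0 -> 55
s = out_size(s, 3, 2, 0)    # pool1: 3x3, stride 2          -> 27
s = out_size(s, 5, 1, 2)    # conv2: 5x5, stride 1, pad 2   -> 27
s = out_size(s, 3, 2, 0)    # pool2: 3x3, stride 2          -> 13
s = out_size(s, 3, 1, 1)    # conv3: 3x3, stride 1, pad 1   -> 13
s = out_size(s, 3, 1, 1)    # conv4                         -> 13
s = out_size(s, 3, 1, 1)    # conv5                         -> 13
s = out_size(s, 3, 2, 0)    # pool5: 3x3, stride 2          -> 6
```

With 256 channels, this gives a 256 × 6 × 6 volume (9216 values) feeding the first fully connected layer.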
In a neural network, the ReLU function is generally used to introduce a nonlinear factor; introducing nonlinearity effectively alleviates the vanishing gradient problem and increases the expressive power of the network, as shown in formula (3):

$$f(x) = \max(0, x) \quad (3)$$

where x represents the output of the connected previous network layer.
In the neural network, the activation function maps the neuron outputs non-linearly; to prevent gradient explosion and improve the generalization capability of the model, the result obtained by the ReLU is normalized, and the normalization formula is shown as formula (4):

b^t_{g,h} = a^t_{g,h} / ( c + α · Σ_{d = max(0, t − z/2)}^{min(N−1, t + z/2)} (a^d_{g,h})² )^β   (4)

In the formula, b^t_{g,h} is the value after normalization; a^t_{g,h} represents the output value of the activation function, where a denotes the convolution kernel to be calculated, t denotes the t-th channel, and g, h denote the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; a^d_{g,h} represents the feature of the convolution kernel to be calculated in the d-th channel, where d ranges from max(0, t − z/2) to min(N − 1, t + z/2), and z denotes the size of the channel neighborhood (positions beyond the boundary are padded with 0); N represents the total number of convolution kernels; c is a constant that prevents division by 0; and c, α, β are all adjustable, manually set hyper-parameters.
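An illustrative NumPy sketch of the cross-channel normalization of formula (4); the function name is ours, and the default hyper-parameter values are the ones from the original AlexNet paper, not values stated in the patent:

```python
import numpy as np

def local_response_norm(a: np.ndarray, z: int = 5, c: float = 2.0,
                        alpha: float = 1e-4, beta: float = 0.75) -> np.ndarray:
    """Cross-channel local response normalization per formula (4).

    a: activations of shape (N, H, W) -- N channels (kernels), H x W spatial.
    Each value is divided by (c + alpha * sum of squares over a window of
    z neighboring channels) ** beta; out-of-range channels contribute 0.
    """
    n_channels = a.shape[0]
    b = np.empty_like(a, dtype=float)
    for t in range(n_channels):
        lo = max(0, t - z // 2)
        hi = min(n_channels - 1, t + z // 2)
        denom = (c + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[t] = a[t] / denom
    return b

feat = np.ones((8, 4, 4))
out = local_response_norm(feat)
print(out.shape)  # (8, 4, 4)
```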
The feature values of the image frames extracted by the AlexNet network are used as a training set to train support vector machines (SVMs), which perform binary classification on the extracted key frames; the number of SVMs is determined by the number of pipe network defect types.
Specifically, data and a learning target are first input, where the data are the feature vectors of the image frames, which constitute a feature space, and the learning target is set as a binary variable indicating the non-defective class and the class having a certain type of defect. In the feature space where the input data are located, a decision boundary (a hyperplane) is learned to separate the non-defective samples from the defective ones (those having a certain type of defect).
The above-mentioned separating-hyperplane decision boundary calculation formula is formula (5):

w^T X + f = 0   (5)

In the formula, w represents the normal vector of the hyperplane, T represents transposition, f represents the intercept of the hyperplane, and X represents the feature value of a certain input training sample.
As long as the normal vector w and the intercept f are determined, a separating hyperplane can be uniquely determined. The decision boundary divides the feature space into two sets; the classifier assigns all points on one side of the decision boundary to one class and all points on the other side to the other class.
The distance from a point on either side of the hyperplane to the hyperplane is calculated as formula (6):

d = |w^T X + f| / ||w||   (6)

In the formula, d represents the distance from a point on either side of the hyperplane to the hyperplane, w represents the normal vector of the hyperplane, T represents transposition, f represents the intercept of the hyperplane, and X represents the feature value of a certain input training sample;
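The point-to-hyperplane distance of formula (6) can be computed directly; this small Python sketch is illustrative, with made-up numbers:

```python
import math

def hyperplane_distance(w, f, x):
    """Distance from point x to the hyperplane w^T x + f = 0 (formula (6))."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + f) / norm

# Example: the line 3x + 4y - 5 = 0 and the origin -> |−5| / 5 = 1
print(hyperplane_distance([3.0, 4.0], -5.0, [0.0, 0.0]))  # 1.0
```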
The trained SVM is verified with the S2 verification set to judge whether it has an optimal hyperplane, and the hyper-parameters are adjusted to bring the model to an optimal state. The optimal model is then tested with the S3 test set to estimate its generalization capability; the model with strong generalization capability is selected, the pipe network image to be detected is input into the model, the defect is detected, and the output result of the model is obtained. A plurality of SVM classifiers are used to output the defect classes of the video frames, as shown in fig. 5.
In summary, according to the pipe network defect intelligent identification method based on video frame clustering provided by the embodiment of the invention, feature extraction is performed on each frame image through the AlexNet neural network, the frames are clustered using Euclidean distance, and finally key frames are extracted and classified by SVMs, thereby realizing accurate judgment of the drainage pipeline defect category. The method can improve the efficiency and accuracy of automatic pipe network defect detection and reduce the labor intensity of workers, and has great popularization and application value in pipe network video defect detection.
The embodiment of the invention also provides a pipe network defect intelligent identification device based on video frame clustering, which comprises a computing unit, wherein the computing unit is configured to: extract the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and preset a threshold;
divide a first category and take the first video frame characteristic value in the video frame characteristic value set as the first category centroid; sequentially calculate the Euclidean distance between each subsequent video frame characteristic value and the centroid of each of the existing i (1 ≤ i ≤ n) categories, compare it with the preset threshold, and thereby either assign the video frame characteristic value to one of the existing i categories or divide a new (i + 1)-th category and take the characteristic value as the centroid of the new category. Specifically, the Euclidean distance between the k-th video frame characteristic value and the centroid of the j-th category (1 ≤ j ≤ i) is calculated; if the Euclidean distance is smaller than the preset threshold, the video frame characteristic value is classified into the j-th category, and all video frame characteristic values in the j-th category are arithmetically averaged to update the j-th category centroid; if the Euclidean distance is greater than or equal to the preset threshold and j is not equal to i, the Euclidean distance between the k-th video frame characteristic value and the centroid of the (j + 1)-th category is calculated and again compared with the preset threshold; and if the calculated Euclidean distance is greater than or equal to the preset threshold and j is equal to i, the (i + 1)-th category is divided and the video frame characteristic value is taken as the (i + 1)-th category centroid. This is repeated for the k-th video frame characteristic value until it either belongs to one of the existing i categories or a new category is divided with the characteristic value as its centroid.
By analogy, each feature value in the video frame feature value set X is processed as above, so that a plurality of categories and centroids of the categories can be obtained, and further a cluster map of all the video frame feature values is obtained;
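The single-pass, threshold-driven clustering described above can be sketched as follows; the function and variable names are illustrative, not from the patent:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def threshold_cluster(features, threshold):
    """Single-pass clustering: assign each feature to the first category whose
    centroid is within `threshold`, updating that centroid to the arithmetic
    mean of its members; otherwise open a new category with the feature as
    its centroid."""
    categories = []   # list of member lists, one per category
    centroids = []    # running centroid per category
    for x in features:
        for j, c in enumerate(centroids):
            if euclidean(x, c) < threshold:
                categories[j].append(x)
                m = len(categories[j])
                centroids[j] = [sum(v[i] for v in categories[j]) / m
                                for i in range(len(x))]
                break
        else:  # no centroid was close enough: divide a new category
            categories.append([x])
            centroids.append(list(x))
    return categories, centroids

cats, cents = threshold_cluster([[0.0], [0.2], [5.0]], threshold=1.0)
print(len(cats))  # 2
```

With the toy one-dimensional features above, the first two frames merge into one category (centroid 0.1) and the third opens a second category.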
and select, in each category, the video frame characteristic value closest to the corresponding centroid as the key frame characteristic value; if a plurality of video frame characteristic values in a category are equally closest to the centroid, their arithmetic mean is taken as the key frame characteristic value of that category.
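The key-frame selection rule for one category can be sketched as below (function name and tie-breaking tolerance are our own illustrative choices):

```python
import math

def key_frame_feature(members, centroid):
    """Select the key frame feature of one category: the member closest to
    the centroid, or the arithmetic mean of all equally-closest members."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    dists = [dist(x, centroid) for x in members]
    dmin = min(dists)
    closest = [x for x, d in zip(members, dists) if math.isclose(d, dmin)]
    n = len(closest)
    return [sum(v[i] for v in closest) / n for i in range(len(centroid))]

# Two members tie at distance 2 from the centroid, so their mean is returned.
print(key_frame_feature([[0.0, 2.0], [0.0, -2.0], [3.0, 0.0]], [0.0, 0.0]))
# [0.0, 0.0]
```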
In some embodiments, the Euclidean distance calculation formula is shown as formula (1):

d_E(x_k, c_j) = √( Σ_{i=1}^{m} (x_{ki} − c_{ji})² )   (1)

In the formula, d_E(x_k, c_j) is the Euclidean distance between a certain video frame characteristic value and a certain centroid, wherein the centroids include the first category centroid and the new category centroids; x_k = (x_{k1}, …, x_{km}) is the point in m-dimensional Euclidean space where a certain video frame characteristic value is located; c_j = (c_{j1}, …, c_{jm}) is the point in m-dimensional Euclidean space where a certain centroid is located; m is the number of features contained in a video frame characteristic value; x_{ki} is the value of the i-th dimension of the k-th video frame characteristic value in m-dimensional Euclidean space; and c_{ji} is the value of the i-th dimension of the j-th category centroid characteristic value in m-dimensional Euclidean space.
In some embodiments, the computing unit is further configured to arithmetically average all video frame characteristic values in the j-th category to update the j-th category centroid by the following formula (2):

c_j = (1 / |C_j|) Σ_{x ∈ C_j} x   (2)

In the formula, c_j represents the updated j-th category centroid, C_j represents the set of characteristic values belonging to the j-th category, and x is the characteristic value of a certain video frame belonging to C_j.
In some embodiments, the apparatus further comprises a network model configured to extract a feature value for each video frame in the video data.
In some embodiments, the network model comprises 5 convolutional layers and 2 fully-connected layers connected in sequence; a pooling layer is respectively arranged after the first, second, and fifth convolutional layers, each layer is provided with an activation function, the input video frame is propagated forward, and the features of the 7th (fully-connected) layer are taken as the output to obtain the corresponding video frame characteristic value; the activation function is shown in formula (3):

f(x) = max(0, x)   (3)

In the formula, x is the output of the connected previous-layer network structure.
In some embodiments, the activation outputs of the first and second convolutional layers are normalized using the local response normalization of formula (4):

b^t_{g,h} = a^t_{g,h} / ( c + α · Σ_{d = max(0, t − z/2)}^{min(N−1, t + z/2)} (a^d_{g,h})² )^β   (4)

In the formula, b^t_{g,h} is the value after normalization; a^t_{g,h} represents the output value of the activation function, where a denotes the convolution kernel to be calculated, t denotes the t-th channel, and g, h denote the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; a^d_{g,h} represents the feature of the convolution kernel to be calculated in the d-th channel, where d ranges from max(0, t − z/2) to min(N − 1, t + z/2), and z denotes the size of the channel neighborhood (positions beyond the boundary are padded with 0); N represents the total number of convolution kernels; c is a constant that prevents division by 0; and c, α, β are all adjustable, manually set hyper-parameters.
In some embodiments, the apparatus further includes a classifier configured to perform binary classification on the key frames and output the pipe network defect identification result.
In some embodiments, the classifier is trained by:
based on the input data and the learning target, a decision-boundary hyperplane in the feature space where the input data are located is used to separate the learning target into a non-defective class and a defective class (having a certain defect); the input data are the feature vectors of the video frames, and the learning target is set as a binary variable representing the non-defective class and the defective class (having a certain defect);
the decision boundary of the separating hyperplane is calculated as formula (5):

w^T X + f = 0   (5)

In the formula, w represents the normal vector of the hyperplane, T represents transposition, f represents the intercept of the hyperplane, and X represents the characteristic value of a certain input training sample;
calculating the distance from a point on either side of the hyperplane to the hyperplane by formula (6):

d = |w^T X + f| / ||w||   (6)

In the formula, d represents the distance from a point on either side of the hyperplane to the hyperplane, w represents the normal vector of the hyperplane, T represents transposition, f represents the intercept of the hyperplane, and X represents the characteristic value of a certain input training sample;
and evaluating the hyperplane with a verification set, continuously updating the normal vector w and the intercept f to determine an optimal hyperplane.
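As a minimal illustration of iteratively updating a normal vector w and intercept f from labelled data, the sketch below uses the classical perceptron rule rather than a full SVM solver (a real SVM would additionally maximize the margin); all names and data are illustrative:

```python
def train_linear(samples, labels, epochs=20, lr=0.1):
    """Perceptron-style updates of the normal vector w and intercept f.
    (A stand-in for a real SVM solver, shown only to illustrate the
    w / f parametrization of the separating hyperplane.)"""
    w = [0.0] * len(samples[0])
    f = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):   # y in {-1, +1}
            # Misclassified (or on the boundary): nudge w and f toward x.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + f) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                f += lr * y
    return w, f

X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
w, f = train_linear(X, y)
sign = lambda v: 1 if v > 0 else -1
print([sign(sum(wi * xi for wi, xi in zip(w, x)) + f) for x in X])
# [1, 1, -1, -1]
```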
The pipe network defect intelligent identification device based on video frame clustering provided by the embodiment of the invention has basically the same technical effect as the method explained in the foregoing, and the description is not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to perform the method according to any of the embodiments of the present invention.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being covered by the appended claims and their equivalents.
Claims (10)
1. A pipe network defect intelligent identification method based on video frame clustering is characterized by comprising the following steps:
extracting the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and presetting a threshold;
determining a first category and taking the first video frame characteristic value in the video frame characteristic value set as the first category centroid; sequentially calculating the Euclidean distance between each subsequent video frame characteristic value and the centroid of each of the existing i categories, comparing it with the preset threshold, and thereby either assigning the video frame characteristic value to one of the existing i categories or dividing a new (i + 1)-th category and taking the characteristic value as the centroid of the new category:
calculating the Euclidean distance between the k-th video frame characteristic value and the centroid of the j-th category; if the Euclidean distance is smaller than the preset threshold, classifying the video frame characteristic value into the j-th category and arithmetically averaging all video frame characteristic values in the j-th category to update the j-th category centroid; if the Euclidean distance is greater than or equal to the preset threshold and j is not equal to i, calculating the Euclidean distance between the k-th video frame characteristic value and the centroid of the (j + 1)-th category and again comparing it with the preset threshold; and if the calculated Euclidean distance is greater than or equal to the preset threshold and j is equal to i, dividing the (i + 1)-th category and taking the video frame characteristic value as the (i + 1)-th category centroid;
and selecting, in each category, the video frame characteristic value closest to the corresponding centroid as the key frame characteristic value; if a plurality of video frame characteristic values in a category are equally closest to the centroid, taking their arithmetic mean as the key frame characteristic value of that category.
2. The method of claim 1, wherein the Euclidean distance is calculated as shown in formula (1):

d_E(x_k, c_j) = √( Σ_{i=1}^{m} (x_{ki} − c_{ji})² )   (1)

In the formula, d_E(x_k, c_j) is the Euclidean distance between a certain video frame characteristic value and a certain centroid, wherein the centroids include the first category centroid and the new category centroids; x_k = (x_{k1}, …, x_{km}) is the point in m-dimensional Euclidean space where a certain video frame characteristic value is located; c_j = (c_{j1}, …, c_{jm}) is the point in m-dimensional Euclidean space where a certain centroid is located; m is the number of features contained in a video frame characteristic value; x_{ki} is the value of the i-th dimension of the k-th video frame characteristic value in m-dimensional Euclidean space; and c_{ji} is the value of the i-th dimension of the j-th category centroid characteristic value in m-dimensional Euclidean space.
3. The method of claim 1, wherein all the video frame characteristic values in the j-th category are arithmetically averaged to update the j-th category centroid by the following formula (2):

c_j = (1 / |C_j|) Σ_{x ∈ C_j} x   (2)

In the formula, c_j represents the updated j-th category centroid, C_j represents the set of characteristic values belonging to the j-th category, and x is the characteristic value of a certain video frame belonging to C_j.
4. The method of claim 1, wherein the extracting the feature value of each video frame in the video data comprises:
extracting the characteristic value of each video frame in the video data by using an AlexNet neural network model.
5. The method of claim 4, wherein the AlexNet neural network model comprises 5 convolutional layers and 2 fully-connected layers connected in sequence; a pooling layer is respectively arranged after the first, second, and fifth convolutional layers, each layer is provided with an activation function, the input video frame is propagated forward, and the features of the 7th (fully-connected) layer are taken as the output to obtain the corresponding video frame characteristic value; the activation function is shown in formula (3):

f(x) = max(0, x)   (3)

In the formula, x is the output of the connected previous-layer network structure.
6. The method of claim 5, wherein the activation outputs of the first and second convolutional layers are normalized using the local response normalization of formula (4):

b^t_{g,h} = a^t_{g,h} / ( c + α · Σ_{d = max(0, t − z/2)}^{min(N−1, t + z/2)} (a^d_{g,h})² )^β   (4)

In the formula, b^t_{g,h} is the value after normalization; a^t_{g,h} represents the output value of the activation function, where a denotes the convolution kernel to be calculated, t denotes the t-th channel, and g, h denote the position coordinates in the width and height dimensions of the value to be normalized, which do not exceed the width and height of the image after convolution; a^d_{g,h} represents the feature of the convolution kernel to be calculated in the d-th channel, where d ranges from max(0, t − z/2) to min(N − 1, t + z/2), and z denotes the size of the channel neighborhood (positions beyond the boundary are padded with 0); N represents the total number of convolution kernels; c is a constant that prevents division by 0; and c, α, β are all adjustable, manually set hyper-parameters.
7. The method according to any one of claims 1-6, wherein after selecting the video frame feature value closest to the corresponding centroid in each category as the key frame feature value, the method further comprises:
performing binary classification on the key frame characteristic values by using classifiers, and outputting the pipe network defect identification result.
8. The method of claim 7, wherein the classifier is trained by:
based on the input data and the learning target, a decision-boundary hyperplane in the feature space where the input data are located is used to separate the learning target into a non-defective class and a defective class; the input data are the characteristic values of the video frames, and the learning target is a binary variable representing the non-defective class and the defective class;
the separating-hyperplane decision boundary calculation formula is formula (5):

w^T X + f = 0   (5)

In the formula, w represents the normal vector of the hyperplane, T represents transposition, f represents the intercept of the hyperplane, and X represents the characteristic value of a certain input training sample;
calculating the distance from a point on either side of the hyperplane to the hyperplane by formula (6):

d = |w^T X + f| / ||w||   (6)
9. A pipe network defect intelligent identification device based on video frame clustering is characterized by comprising a computing unit, wherein the computing unit is configured to:
extracting the characteristic value of each video frame in the video data to generate a video frame characteristic value set X = {x1, …, xk, …, xn}, and presetting a threshold;
dividing a first category and taking the first video frame characteristic value in the video frame characteristic value set as the first category centroid; sequentially calculating the Euclidean distance between each subsequent video frame characteristic value and the centroid of each of the existing i categories, comparing it with the preset threshold, and thereby either assigning the video frame characteristic value to one of the existing i categories or dividing a new (i + 1)-th category and taking the characteristic value as the centroid of the new category; specifically, calculating the Euclidean distance between the k-th video frame characteristic value and the centroid of the j-th category; if the Euclidean distance is smaller than the preset threshold, classifying the video frame characteristic value into the j-th category and arithmetically averaging all video frame characteristic values in the j-th category to update the j-th category centroid; if the Euclidean distance is greater than or equal to the preset threshold and j is not equal to i, calculating the Euclidean distance between the k-th video frame characteristic value and the centroid of the (j + 1)-th category and again comparing it with the preset threshold; and if the calculated Euclidean distance is greater than or equal to the preset threshold and j is equal to i, dividing the (i + 1)-th category and taking the video frame characteristic value as the (i + 1)-th category centroid;
and selecting, in each category, the video frame characteristic value closest to the corresponding centroid as the key frame characteristic value; if a plurality of video frame characteristic values in a category are equally closest to the centroid, taking their arithmetic mean as the key frame characteristic value of that category.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210566909.8A CN114648534A (en) | 2022-05-24 | 2022-05-24 | Pipe network defect intelligent identification method and device based on video frame clustering, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114648534A true CN114648534A (en) | 2022-06-21 |
Family
ID=81996677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210566909.8A Pending CN114648534A (en) | 2022-05-24 | 2022-05-24 | Pipe network defect intelligent identification method and device based on video frame clustering, and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114648534A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800824A (en) * | 2019-02-25 | 2019-05-24 | 中国矿业大学(北京) | A kind of defect of pipeline recognition methods based on computer vision and machine learning |
CN110910021A (en) * | 2019-11-26 | 2020-03-24 | 上海华力集成电路制造有限公司 | Method for monitoring online defects based on support vector machine |
WO2020062433A1 (en) * | 2018-09-29 | 2020-04-02 | 初速度(苏州)科技有限公司 | Neural network model training method and method for detecting universal grounding wire |
CN111695482A (en) * | 2020-06-04 | 2020-09-22 | 华油钢管有限公司 | Pipeline defect identification method |
CN112070044A (en) * | 2020-09-15 | 2020-12-11 | 北京深睿博联科技有限责任公司 | Video object classification method and device |
CN113221710A (en) * | 2021-04-30 | 2021-08-06 | 深圳市水务工程检测有限公司 | Neural network-based drainage pipeline defect identification method, device, equipment and medium |
CN113766330A (en) * | 2021-05-26 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Method and device for generating recommendation information based on video |
WO2022016328A1 (en) * | 2020-07-20 | 2022-01-27 | 深圳大学 | Metal foreign body detection method and apparatus, and terminal device |
Non-Patent Citations (3)
Title |
---|
He Jiang et al.: "Machine-vision-based AlexNet network coal gangue detection system", Coal Technology *
Zhang Yuhui et al.: "Research on feature filtering preprocessing based on the KNN algorithm", Modern Information Technology *
Wang Dongxue: "Research on multi-view facial expression recognition based on multiple features", China Master's Theses Full-text Database, Information Science and Technology *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||