CN112750125B - Glass insulator piece positioning method based on end-to-end key point detection
- Publication number: CN112750125B
- Application number: CN202110118779.7A
- Authority: CN (China)
- Prior art keywords: glass insulator, key point, image, loss, model
- Legal status: Active (assumed; not a legal conclusion)
Classifications
- G06T7/0004: Industrial image inspection
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- G06T7/12: Edge-based segmentation
- G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
- G06T2207/10004: Still image; photographic image
- G06T2207/10052: Images from lightfield camera
- G06T2207/20081: Training; learning
- G06T2207/30108: Industrial image inspection
- G06T2207/30164: Workpiece; machine component
Abstract
The invention discloses a glass insulator piece positioning method based on end-to-end key point detection, which comprises the following steps: 1) constructing and labeling an instance segmentation data set of glass insulators from power inspection images; 2) performing data expansion with a data enhancement algorithm; 3) training an instance segmentation model, and cropping the image of the minimum bounding polygon of the region where each glass insulator string is located to serve as the data set for key point detection in the next step; 4) labeling the key point detection data set and performing data expansion; 5) designing an end-to-end key point detection model and continuously tuning and training it; 6) running the trained instance segmentation model and key point detection model in series: the glass insulator picture to be detected is input into the trained instance segmentation model, the segmented region is cropped and input into the trained key point detection model, and the coordinate value of the key point of each glass insulator piece in the picture is obtained. The invention improves both the speed and the accuracy of glass insulator piece positioning.
Description
Technical Field
The invention relates to the technical field of image pattern recognition and computer vision, and in particular to a glass insulator piece positioning method based on end-to-end key point detection.
Background
The insulator is a very important and common component in the power transmission network, providing insulation, mechanical support and other functions. Once an insulator fails, it can cause contact between transmission lines or between a line and a tower, resulting in power supply interruption and serious impact on the economy and on people's daily life. Among the insulators currently used on transmission lines, glass insulators account for close to one third: a defective glass insulator eliminates itself by self-explosion, so no zero-value measurement is needed, and the glass has excellent aging resistance. Nevertheless, glass insulators still present troublesome problems during operation and maintenance, such as an excessively high self-explosion rate. Correctly identifying and positioning glass insulators, counting their pieces, finding self-explosion defects and taking remedial measures in time is therefore an important part of transmission line operation and maintenance.
The traditional method of glass insulator self-explosion detection is manual inspection, which relies on human interpretation. This approach is inefficient and requires a large amount of labor, and because glass insulators are mostly installed in high-voltage substations or on transmission lines in harsh field environments, manual inspection also threatens the personal safety of workers. In recent years, under the trend towards the 'smart grid', unmanned aerial vehicles have been widely applied to power grid inspection, greatly improving inspection efficiency and accumulating large numbers of images of power equipment; how to use these images reasonably and effectively has become an important step towards further intelligent operation and maintenance.
At present, most related research on glass insulator self-explosion detection adopts traditional image processing techniques, segmenting and extracting the glass insulator based on the prior knowledge that it appears light green with an elliptical outline in the image. This premise greatly limits the universality of such algorithms, which cannot be applied to real environments with complex backgrounds. Deep learning methods have also been studied; they are typically divided into three stages. In the first stage, the whole glass insulator string is located and identified with a target detection or instance segmentation algorithm. In the second stage, the position of each individual insulator piece is further located by applying instance segmentation again to the ROI of the whole string obtained in the first stage. In the third stage, a heuristic method calculates the distance between adjacent glass insulator pieces to identify existing self-explosion defects. Dividing the whole pipeline into three stages is reasonable and feasible, but in the second stage, in order to accurately segment the glass insulator pieces in various scenes and obtain the position of each piece, instance segmentation is used once more, so the model complexity is too high to meet real-time requirements, and labeling the outline of every glass insulator piece in a large image data set requires a great deal of manpower and time.
In order to simplify the positioning of each insulator piece in the second stage, the invention provides a glass insulator piece positioning method that is suitable for complex backgrounds and meets real-time and high-accuracy requirements, and therefore has high practical application value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a glass insulator piece positioning method based on end-to-end key point detection.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a glass insulator piece positioning method based on end-to-end key point detection, comprising the following steps:
1) collecting visible light images of power equipment shot during power inspection, sorting out the images whose main content is the glass insulator, namely glass insulator images, constructing an instance segmentation data set from them, and manually marking the region of each whole glass insulator string with an image labeling tool;
2) performing data expansion on all original data sets with a data enhancement algorithm to increase the data volume;
3) training an instance segmentation model on the instance segmentation data set to obtain a model capable of accurately segmenting the region where the whole glass insulator string is located, segmenting each picture in the instance segmentation data set with this model, cropping and saving the segmented regions, and collecting them into a key point detection data set;
4) manually marking, with an image labeling tool, the center point of the outer-ring glass edge of each glass insulator in the key point detection data set as the key point of the glass insulator piece, and performing data expansion with a data enhancement algorithm;
5) designing an end-to-end key point detection model for the key point detection data set, setting different parameters to tune and train the model, and saving the key point detection model that performs best on the validation set;
6) running the trained instance segmentation model and key point detection model in series: the glass insulator picture to be detected is input into the trained instance segmentation model, the region of the segmentation result, namely the local glass insulator region, is cropped and input into the trained key point detection model, and the coordinate values of the key point of each glass insulator piece in the picture are obtained, so that the glass insulator pieces are accurately positioned.
In step 1), the collected glass insulator pictures are high-definition visible light pictures with a width of 8688 and a height of 5792; because training directly on such high-definition images occupies a large amount of memory, all pictures are resized, i.e. interpolated, to a width of 1448 and a height of 965 while preserving clarity, and are then labeled and used for training; in addition, so that the region of the glass insulator cropped in step 3) contains as little background as possible, the labeling range is the minimum circumscribed polygon of the whole glass insulator string.
In step 2), data expansion is performed on the images with a data enhancement algorithm, including the following operations:
a. random image rotation: the rotation angle is randomly selected between -20 and +20 degrees;
b. random image cropping: a region four fifths the size of the whole image is cropped at random;
c. random horizontal flipping;
d. random contrast and color transformation.
In step 3), the adopted instance segmentation model is Yolact++, an improved version of the one-stage real-time instance segmentation model Yolact that further raises the running speed and segmentation accuracy of the overall model. The model divides instance segmentation into two parallel subtasks: first, a set of prototype masks is generated; second, the coefficient of each mask is predicted; finally, the prototype masks and the mask coefficients are linearly combined to generate the instance masks. Segmenting each picture in the instance segmentation data set with this model yields the mask of the region where each glass insulator string is located and the image part corresponding to the minimum bounding rectangle of that mask, denoted R; the angle of the binary mask image, namely the orientation angle of the glass insulator in the image, is calculated and R is rotated by this angle so that the glass insulator in R lies in the horizontal direction; finally, the minimum bounding rectangle region where the glass insulator is located is cropped out according to the mask and serves as the image data for the key point detection part.
In step 4), the image labeling tool labelme is used to manually mark the center point of the outer-ring glass edge of each glass insulator in the key point detection data set as the key point of the glass insulator piece; the labeling type is set to point, and the labeling order strictly follows left to right, row by row, so that the key point coordinate regression computed later corresponds one-to-one with the predicted coordinate points. To avoid model overfitting and improve the generalization capability of the model, data expansion is performed with a data enhancement algorithm, including:
a. random horizontal or vertical flipping;
b. random contrast and color transformation.
In step 5), an end-to-end key point detection model suitable for the glass insulator is designed; the specific details are as follows:
a. network architecture
The input image is resized and interpolated to a fixed width and height so that both are multiples of 16, and the specification of the image input to the network is set to [b × 3 × h × w], where b is the batch size, 3 is the number of RGB (red, green, blue) channels of the image, h is the image height and w is the image width;
the whole network structure is divided into 3 main parts: the method comprises the steps of extracting characteristics of a backbone network backbone, a key point coordinate regression branch core _ head and a probability heat map prediction branch headmap _ head, wherein the core _ head and the headmap _ head are parallel structures, and the specific meanings and detailed structures of all parts are as follows;
the function of the backbone network for feature extraction is to extract features, the HourglassNet which is good in effect in human body posture estimation and widely applied is referred to, the HourglassNet is formed by serially connecting Hourglass modules of basic modules of a down-sampling-up-sampling structure similar to a Hourglass, and feature extraction is carried out through symmetrical structures of the Hourglass modules, so that the network continuously repeats feature transfer processes from bottom to top and from top to bottom, local and global information is integrated, features under different scales are captured, and relative relations between key points are better acquired; the feature extraction backbone network comprises two parts, wherein the first part is mainly used for converting the channel number of an input image from 3 to 128, two-dimensional convolution with 1 x 1 of convolution kernels of two layers is adopted in specific network design, the channel number is changed into 64 through the first layer of convolution, the channel number is changed into 128 through the second layer of convolution, namely after the two-dimensional convolution of the first part is carried out, the specification of a data stream in the network is changed into [ b x 128 x h x w ]; the second part uses a classical Hourglass module, the left structure of the down-sampling of the module is mainly a convolution layer and a down-sampling pooling layer, and the purpose is to extract image features and continuously reduce the resolution of a feature map so as to filter redundant features and learn feature information with stronger robustness; the right upper sampling structure mainly comprises a convolution layer and an upper sampling layer to restore the resolution which is the same as that of the left characteristic diagram, so that the information fusion of bottom layer and high layer characteristics can be realized through the characteristic diagram fusion with the same size, and the precision of key point positioning is greatly improved; after passing through the Hourglass module of the second part, the data stream specification in the network is still [ b × 128 × h × w ];
the key point coordinate regression branch coord _ head is used for directly regressing to obtain the coordinates of key points in the graph and the confidence coefficient of each coordinate prediction as the key points; the branch needs to set a hyper-parameter N, wherein N is the most possible number of glass insulator pieces in the glass insulator picture; the branch takes the output of a feature extraction backbone network backbone as input, and the network structure mainly comprises a first convolution layer, a first maximum pooling layer, a second convolution layer, a second maximum pooling layer and an average pooling layer; the parameters of the first convolutional layer are that the number of output channels is 64, the convolutional kernel is 3 multiplied by 3, the step length is 1, and the padding is 1, namely the data stream specification is changed into [ b multiplied by 64 multiplied by h multiplied by w ] after the first convolutional layer is passed; the data flow specification becomes [ b × 64 × h/2 × w/2] after passing through the first maximum pooling layer; the number of output channels of the second convolution layer is set to be N x 3, so that the data flow specification becomes [ b x (N x 3) x h/2 x w/2] after passing through the second convolution layer; the data flow specification becomes [ b × (N × 3) × h/4 ×/4] after passing through the second maximum pooling layer; after the last average pooling layer, the data stream specification becomes [ b × (N × 3) × 1 × 1 ]; that is, each graph will have N × 3 predicted outputs, where the first 2 × N outputs represent the coordinates of N keypoints and the next N outputs are the confidence levels that the corresponding keypoint coordinates are predicted as keypoints;
the function of the probability heat map prediction branch heat map _ head is to predict the probability heat map of each pixel point in the input image as a key pointpredI.e. the heatmap of the branch outputpredThe size is consistent with the size of the image input into the network; the branch also takes the output of the backhaul as input, and mainly comprises three convolutional layers, the number of output channels is 64, 32 and 1 in sequence, the sizes of the convolutional cores are all 3 multiplied by 3, and the step length and the padding are all 1, namely after the three convolutional layers, the data stream specification is changed into [ b multiplied by 1 multiplied by h multiplied by w](ii) a Before final output, the output is normalized to be between (0,1) through a sigmoid function, and thus the value of each pixel point represents the probability value that the coordinate position is the key point;
b. loss function design
The multi-task loss during training consists of three parts, defined as Heatmap_loss, Coord_loss and Score_loss respectively; each loss is calculated as follows:
Heatmap_loss is the loss between the predicted heatmap_pred and the annotated Gaussian heat map heatmap_gen. The Gaussian heat map heatmap_gen of a picture P is generated as follows: after the key points of picture P are labeled with labelme, an annotation file containing the key point coordinate list [(pointx_1, pointy_1), ..., (pointx_k, pointy_k), ..., (pointx_n, pointy_n)] is produced, where n is the number of key points; a Gaussian heat map is generated for each key point, and the corresponding positions of all the Gaussian heat maps are summed to obtain the final Gaussian heat map, namely:

heatmap_gen(i, j) = Σ_{k=1..n} heatmap_gen_k(i, j)

where i is the horizontal coordinate, j is the vertical coordinate, 0 ≤ i < w, 0 ≤ j < h, w is the width of picture P, h is the height of picture P, and heatmap_gen_k is the Gaussian heat map generated by the k-th key point (pointx_k, pointy_k), defined as:

heatmap_gen_k(i, j) = exp(-((i - pointx_k)² + (j - pointy_k)²) / (2δ²))

where δ is the Gaussian adjusting factor;
Heatmap_loss = Binary_Cross_Entropy(heatmap_pred, heatmap_gen)
Coord_loss is the loss of the predicted key point coordinates, with reference to the Smooth-L1 loss adopted in Fast R-CNN. Let (tx_k, ty_k) denote the k-th predicted coordinate and (vx_k, vy_k) the corresponding annotated key point coordinate; summing over the labeled key points:

Coord_loss = Σ_k [ smooth_L1(tx_k - vx_k) + smooth_L1(ty_k - vy_k) ]

where the smooth_L1 function is defined as follows, with argument x:

smooth_L1(x) = 0.5 x², if |x| < 1; smooth_L1(x) = |x| - 0.5, otherwise
Score_loss is the cross entropy loss between the confidence p^ of each predicted key point coordinate and the value p* at the corresponding position in heatmap_gen:

Score_loss = -p* log(p^)
The total training loss is the sum of the three losses with equal weights of 1, namely:

Loss = Heatmap_loss + Coord_loss + Score_loss
c. setting training parameters
All layers of the model adopt Kaiming parameter initialization, the optimizer is set to Adam, and the initial learning rate is set to 0.001.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The output of a conventional key point detection network, e.g. for human pose, is multi-channel, with each channel representing the heat map of one key point. In a glass insulator picture, every glass insulator piece looks almost identical locally, so different key points cannot be treated as different categories when designing the network output and computing the loss; the invention solves this problem well by outputting a heat map with only one channel.
2. In addition to the probability heat map that matches the size of the input picture, another branch is added to directly predict the key point coordinates and their confidences. Fusing the losses of the two branches makes the overall model predict more accurately, and the specific position coordinates of the key points are obtained end to end, without secondary computation on the probability heat map to finally determine the predicted key point coordinates.
3. Compared with positioning the glass insulator pieces by instance segmentation, the model based on key point detection has lower complexity and faster computation, easily meets real-time requirements, and does not need to compute the specific position of each glass insulator piece from the segmented mask.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of end-to-end keypoint detection model training and testing.
FIG. 3 is a diagram of segmentation and detection effects; in the figure, (a) is a segmentation effect diagram of an example of the glass insulator, and (b) is a key point detection effect diagram of the glass insulator.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1 to fig. 3, the glass insulator piece positioning method based on end-to-end key point detection provided by this embodiment comprises the following steps:
1) Among the power inspection images accumulated by the power company, which cover various kinds of power equipment, only the visible light images whose main target is the glass insulator are sorted out and used in the invention. The collected glass insulator pictures are high-definition visible light pictures with a width of 8688 and a height of 5792; directly training on such high-definition images occupies too much memory, so, while preserving clarity, all pictures are interpolated to a width of 1448 and a height of 965 and then labeled and used for training. The minimum circumscribed polygon of the whole glass insulator string is marked as the labeling range, so that the region of the glass insulator cropped in step 3) contains as little background as possible, yielding the instance segmentation data set.
2) And data expansion is carried out on all original data sets by using a data enhancement algorithm, so that the data volume is increased.
To avoid model overfitting and improve the generalization capability of the model, data expansion is performed on the images with a data enhancement algorithm, including the following operations, each applied with probability 0.5 (a minimal sketch follows this list):
a. random image rotation: the rotation angle is randomly selected between -20 and +20 degrees;
b. random image cropping: a region four fifths the size of the whole image is cropped at random;
c. random horizontal flipping;
d. random contrast and color transformation.
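The four augmentations can be sketched roughly as follows with torchvision's functional transforms; the crop size (four fifths interpreted as 4/5 of each side), the jitter ranges and the handling of the accompanying annotations are assumptions, and in practice the masks and key point labels must be transformed together with the image.

```python
import random
import torchvision.transforms.functional as TF
from PIL import Image

def augment(img: Image.Image) -> Image.Image:
    """Apply each of the four augmentations of step 2 with probability 0.5 (sketch)."""
    if random.random() < 0.5:  # a. random rotation in [-20, +20] degrees
        img = TF.rotate(img, random.uniform(-20.0, 20.0))
    if random.random() < 0.5:  # b. random crop; "four fifths" taken here as 4/5 of each side
        w, h = img.size
        cw, ch = int(w * 0.8), int(h * 0.8)
        left, top = random.randint(0, w - cw), random.randint(0, h - ch)
        img = TF.crop(img, top, left, ch, cw)
    if random.random() < 0.5:  # c. random horizontal flip
        img = TF.hflip(img)
    if random.random() < 0.5:  # d. random contrast and color transformation (factors assumed)
        img = TF.adjust_contrast(img, random.uniform(0.8, 1.2))
        img = TF.adjust_saturation(img, random.uniform(0.8, 1.2))
    return img
```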
3) The instance segmentation data set is trained with an instance segmentation model to obtain a model capable of accurately segmenting the region where the whole glass insulator string is located. Each picture in the instance segmentation data set is segmented with this model, and the segmented regions are cropped, saved and collected into the key point detection data set.
The instance segmentation model adopted in this embodiment is Yolact++, an improved version of the one-stage real-time instance segmentation model Yolact that further raises the running speed and segmentation accuracy of the overall model, with indexes such as mAP reaching a level close to that of the two-stage instance segmentation model Mask R-CNN. The model divides instance segmentation into two parallel subtasks: first, a set of prototype masks is generated; second, the coefficient of each mask is predicted; finally, the prototype masks and the mask coefficients are linearly combined to generate the instance masks. Segmenting each picture in the instance segmentation data set with this model yields the mask of the region where each glass insulator string is located and the image part corresponding to the minimum bounding rectangle of that mask (denoted R); the angle of the binary mask image (i.e. the orientation angle of the glass insulator in the image) is calculated and R is rotated by this angle so that the glass insulator in R lies in the horizontal direction; finally, the minimum bounding rectangle region where the glass insulator is located is cropped out according to the mask and serves as the data for the key point detection part. The advantage of doing so is that the cropped image region contains as little background as possible and the glass insulators all lie in the horizontal direction, which greatly reduces the scene complexity that the key point detection model has to learn.
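The rotation-and-crop step described above could look roughly like the OpenCV sketch below; the angle handling is a simplification (cv2.minAreaRect angle conventions differ between OpenCV versions) and the mask is assumed to be one binary array per insulator string.

```python
import cv2
import numpy as np

def crop_horizontal_insulator(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Rotate the image so the masked insulator string lies horizontally,
    then crop the bounding rectangle of the rotated mask (sketch of step 3)."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)            # largest connected region
    (cx, cy), (rw, rh), angle = cv2.minAreaRect(contour)    # oriented bounding box
    if rw < rh:                                             # put the long side horizontal
        angle -= 90.0
    h, w = mask.shape[:2]
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    rot_img = cv2.warpAffine(image, M, (w, h))
    rot_mask = cv2.warpAffine(mask.astype(np.uint8), M, (w, h))
    ys, xs = np.nonzero(rot_mask)                           # crop to the rotated mask
    return rot_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```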
4) The image labeling tool labelme is used to manually mark the center point of the outer-ring glass edge of each glass insulator in the key point detection data set as the key point of the glass insulator piece; the labeling type is set to point, and the labeling order strictly follows left to right, row by row, so that the key point coordinate regression computed later corresponds one-to-one with the predicted coordinate points. Likewise, to avoid model overfitting and improve the generalization capability of the model, data expansion is performed with a data enhancement algorithm, each operation applied with probability 0.5:
a. random horizontal or vertical flipping;
b. random contrast and color transformation.
The center point of the outer-ring glass edge of each glass insulator is chosen as the key point of the glass insulator piece during manual labeling for the following reason: even when the unmanned aerial vehicle does not photograph the glass insulator head-on but at a certain oblique angle, the outer-ring glass edge of each glass insulator remains clearly visible in the two-dimensional image and is not occluded by adjacent glass insulator pieces.
5) With reference to the existing network models PPGNet and HourglassNet, an end-to-end key point detection model suitable for the glass insulator is designed; the specific details are as follows:
a. network architecture
Statistics show that the average width and height of the glass insulator pictures in the data set are about (640, 140). To ensure that the width and height of the image input to the network are both multiples of 16, so that feature map sizes correspond during down-sampling and up-sampling fusion, the input image is resized and interpolated to a width of 640 and a height of 144; the specification of the image input to the network is therefore [b × 3 × 144 × 640], where b is the batch size.
The whole network structure is divided into 3 main parts: the feature extraction backbone network (backbone), the key point coordinate regression branch (coord_head) and the probability heat map prediction branch (heatmap_head), where coord_head and heatmap_head are parallel structures; the specific meaning and detailed structure of each part are as follows;
The main role of the backbone network is to extract features. It follows HourglassNet, which performs well and is widely applied in human pose estimation; HourglassNet is formed by serially connecting Hourglass modules, basic modules with a down-sampling / up-sampling structure shaped like an hourglass. Feature extraction through the symmetric structure of the Hourglass module lets the network repeatedly pass features bottom-up and top-down, integrating local and global information, capturing features at different scales, and better acquiring the relative relationships between key points. In the invention, the feature extraction backbone network comprises two parts. The main role of the first part is to convert the number of channels of the input image from 3 to 128: the network uses two layers of two-dimensional convolution with 1 × 1 kernels, the first convolution changing the number of channels to 64 and the second to 128, so that after the two-dimensional convolutions of the first part, the specification of the data stream in the network becomes [b × 128 × 144 × 640]. The second part uses a classical Hourglass module: the left, down-sampling structure of the module mainly consists of convolution layers and down-sampling pooling layers, aiming to extract image features while continuously reducing the resolution of the feature map so as to filter redundant features and learn more robust feature information; the right, up-sampling structure mainly consists of convolution layers and up-sampling layers that restore the same resolution as the left feature maps, so that fusing feature maps of the same size fuses the information of low-level and high-level features and greatly improves the precision of key point positioning. After the Hourglass module of the second part, the data stream specification in the network is still [b × 128 × 144 × 640].
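A rough PyTorch sketch of this backbone follows: two 1 × 1 convolutions (3 → 64 → 128 channels) ahead of one Hourglass module. The ReLU activations and the Hourglass implementation itself are assumptions; the module is treated here as a black box mapping [b, 128, h, w] to the same shape.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Stem (3 -> 64 -> 128 channels via 1x1 convs) followed by one Hourglass module."""
    def __init__(self, hourglass: nn.Module):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=1),    # first 1x1 convolution: 3 -> 64 channels
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=1),  # second 1x1 convolution: 64 -> 128 channels
            nn.ReLU(inplace=True),
        )
        self.hourglass = hourglass              # assumed: [b, 128, h, w] -> [b, 128, h, w]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.hourglass(self.stem(x))
```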
The key point coordinate regression branch coord_head directly regresses the coordinates of the key points in the image and the confidence with which each coordinate is predicted to be a key point. The branch requires a hyper-parameter N, the maximum possible number of glass insulator pieces in a glass insulator picture; in this embodiment N = 52. The branch takes the output of the backbone as input, and its structure mainly consists of a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer and an average pooling layer. The parameters of the first convolution layer are 64 output channels, a 3 × 3 kernel, stride 1 and padding 1, so after the first convolution layer the data stream specification becomes [b × 64 × 144 × 640]; after the first max pooling layer it becomes [b × 64 × 72 × 320]; the number of output channels of the second convolution layer is set to N × 3, so after the second convolution layer the data stream specification becomes [b × (N × 3) × 72 × 320]; after the second max pooling layer it becomes [b × (N × 3) × 36 × 160]; after the final average pooling layer it becomes [b × (N × 3) × 1 × 1]. That is, each image yields N × 3 predicted outputs, where the first 2 × N outputs represent the coordinates of the N key points and the remaining N outputs are the confidences with which the corresponding key point coordinates are predicted to be key points.
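A corresponding sketch of coord_head, following the layer sizes given above; the sigmoid on the confidence outputs is an assumption added so that they can be compared with heat map values in (0, 1).

```python
import torch
import torch.nn as nn

class CoordHead(nn.Module):
    """Regresses N (x, y) key point coordinates plus N confidences from backbone features."""
    def __init__(self, num_points: int = 52, in_channels: int = 128):
        super().__init__()
        self.num_points = num_points
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)   # [b, 64, h/2, w/2]
        self.conv2 = nn.Conv2d(64, num_points * 3, kernel_size=3, stride=1, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)   # [b, N*3, h/4, w/4]
        self.avgpool = nn.AdaptiveAvgPool2d(1)               # [b, N*3, 1, 1]

    def forward(self, feat: torch.Tensor):
        x = self.pool1(self.conv1(feat))
        x = self.pool2(self.conv2(x))
        x = self.avgpool(x).flatten(1)                       # [b, N*3]
        coords = x[:, :2 * self.num_points]                  # first 2N values: coordinates
        scores = torch.sigmoid(x[:, 2 * self.num_points:])   # last N values: confidences (sigmoid assumed)
        return coords, scores
```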
The probability heat map prediction branch heatmap_head predicts the probability heat map heatmap_pred in which each pixel of the input image is scored as a key point, i.e. the heatmap_pred output by the branch has the same size as the image input to the network. The branch also takes the output of the backbone as input and mainly consists of three convolution layers with 64, 32 and 1 output channels in turn, all with 3 × 3 kernels, stride 1 and padding 1, so after the three convolution layers the data stream specification becomes [b × 1 × 144 × 640]; before the final output, the result is normalized to (0, 1) by a sigmoid function, which enhances interpretability.
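A sketch of heatmap_head with the three 3 × 3 convolutions and the final sigmoid; the intermediate ReLU activations are an assumption.

```python
import torch
import torch.nn as nn

class HeatmapHead(nn.Module):
    """Predicts a single-channel per-pixel key point probability map the size of the input."""
    def __init__(self, in_channels: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.layers(feat))   # [b, 1, h, w], values in (0, 1)
```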
b. Loss function design
The multi-task loss during training consists of three parts, defined as Heatmap_loss, Coord_loss and Score_loss respectively; each loss is calculated as follows:
Heatmap_loss is the loss between the predicted heatmap_pred and the annotated Gaussian heat map heatmap_gen. The generation of a Gaussian heat map is illustrated by generating heatmap_gen for a picture P: after the key points of picture P are labeled with labelme, an annotation file containing the key point coordinate list [(pointx_1, pointy_1), ..., (pointx_k, pointy_k), ..., (pointx_n, pointy_n)] is produced (n is the number of key points); a Gaussian heat map is generated for each key point, and the corresponding positions of all the Gaussian heat maps are summed to obtain the final Gaussian heat map, namely:

heatmap_gen(i, j) = Σ_{k=1..n} heatmap_gen_k(i, j)

where i is the horizontal coordinate, j is the vertical coordinate, 0 ≤ i < w, 0 ≤ j < h, w is the width of picture P and h is the height of picture P. heatmap_gen_k is the Gaussian heat map generated by the k-th key point (pointx_k, pointy_k), defined as:

heatmap_gen_k(i, j) = exp(-((i - pointx_k)² + (j - pointy_k)²) / (2δ²))

where δ is the Gaussian adjusting factor; the invention takes δ = 3.
Heatmap_loss = Binary_Cross_Entropy(heatmap_pred, heatmap_gen)
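Generating the Gaussian target heat map heatmap_gen can be sketched as follows; the unnormalized Gaussian form and the clipping of overlapping peaks to 1 are assumptions consistent with using the map as a binary cross entropy target.

```python
import numpy as np

def gaussian_heatmap(points, h, w, delta=3.0):
    """Sum one Gaussian per annotated key point into a single (h, w) target map."""
    jj, ii = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # jj: vertical, ii: horizontal
    heatmap = np.zeros((h, w), dtype=np.float32)
    for px, py in points:  # (pointx_k, pointy_k)
        heatmap += np.exp(-((ii - px) ** 2 + (jj - py) ** 2) / (2.0 * delta ** 2))
    return np.clip(heatmap, 0.0, 1.0)  # keep values valid as BCE targets where Gaussians overlap
```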
Coord_loss is the loss of the predicted key point coordinates, with reference to the Smooth-L1 loss adopted in Fast R-CNN. Let (tx_k, ty_k) denote the k-th predicted coordinate and (vx_k, vy_k) the corresponding annotated key point coordinate; summing over the labeled key points:

Coord_loss = Σ_k [ smooth_L1(tx_k - vx_k) + smooth_L1(ty_k - vy_k) ]

where the smooth_L1 function is defined as follows, with argument x:

smooth_L1(x) = 0.5 x², if |x| < 1; smooth_L1(x) = |x| - 0.5, otherwise
Score_loss is the cross entropy loss between the confidence p^ of each predicted key point coordinate and the value p* at the corresponding position in heatmap_gen:

Score_loss = -p* log(p^)
The total training loss is the sum of the three losses with equal weights of 1, namely:

Loss = Heatmap_loss + Coord_loss + Score_loss
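Putting the three terms together, the total loss might be computed as in the sketch below; gt_scores stands for the values p* read from heatmap_gen at the annotated key point positions (how they are sampled is left out), and the epsilon and mean reductions are assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_heatmap, gen_heatmap, pred_coords, gt_coords, pred_scores, gt_scores):
    """Loss = Heatmap_loss + Coord_loss + Score_loss, all weighted equally (sketch)."""
    # Heatmap_loss: binary cross entropy between predicted and Gaussian target maps.
    heatmap_loss = F.binary_cross_entropy(pred_heatmap, gen_heatmap)
    # Coord_loss: Smooth-L1 between predicted and annotated key point coordinates.
    coord_loss = F.smooth_l1_loss(pred_coords, gt_coords)
    # Score_loss: -p* log(p^), with p* = gt_scores sampled from heatmap_gen at the key points.
    score_loss = -(gt_scores * torch.log(pred_scores + 1e-8)).mean()
    return heatmap_loss + coord_loss + score_loss
```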
c. setting training parameters
All layers of the model adopt Kaiming parameter initialization, the optimizer is set to Adam, the initial learning rate is set to 0.001, and the batch size is 16.
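A sketch of this training setup; build_keypoint_model and keypoint_dataset are hypothetical placeholders for the network and data set defined above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def init_weights(module: nn.Module) -> None:
    """Kaiming initialization for every convolutional layer."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = build_keypoint_model()          # hypothetical: backbone + coord_head + heatmap_head
model.apply(init_weights)               # Kaiming parameter initialization
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loader = DataLoader(keypoint_dataset, batch_size=16, shuffle=True)  # hypothetical data set
```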
d. Setting the training completion criterion

Training is considered complete when the specified number of iterations is reached.
e. Preservation model
After training is finished, the structure and weights of the key point detection model are saved so that the model can be loaded when inference needs to be performed on a test glass insulator picture.
6) The trained instance segmentation model and key point detection model work in series: the glass insulator picture to be detected is input into the trained instance segmentation model, the region of the segmentation result (i.e. the local glass insulator region) is cropped and input into the trained end-to-end key point detection model, and the coordinate values of the key point of each glass insulator piece in the picture are obtained, realizing accurate positioning of the glass insulator pieces.
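The serial pipeline of step 6 could look roughly like the sketch below; seg_model.predict and kp_model.predict are assumed interfaces for the trained models, crop_horizontal_insulator is the helper sketched in step 3, and the 0.5 confidence threshold is an assumption.

```python
import cv2

def locate_insulator_pieces(image_path: str, seg_model, kp_model, conf_thresh: float = 0.5):
    """Two-stage inference: instance segmentation, crop and rotate, then key point detection."""
    image = cv2.imread(image_path)
    results = []
    for mask in seg_model.predict(image):              # assumed: one binary mask per insulator string
        crop = crop_horizontal_insulator(image, mask)  # helper sketched in step 3 above
        crop = cv2.resize(crop, (640, 144))            # fixed network input size (w, h)
        coords, scores = kp_model.predict(crop)        # assumed: N (x, y) pairs and N confidences
        results.append([(x, y) for (x, y), s in zip(coords, scores) if s > conf_thresh])
    return results
```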
The above embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; any change made according to the shape and principle of the present invention shall be covered by its protection scope.
Claims (5)
1. A glass insulator piece positioning method based on end-to-end key point detection is characterized by comprising the following steps:
1) collecting visible light images of power equipment shot during power inspection, sorting out the images whose main content is the glass insulator, namely glass insulator images, constructing an instance segmentation data set from them, and manually marking the region of each whole glass insulator string with an image labeling tool;
2) performing data expansion on all original data sets with a data enhancement algorithm to increase the data volume;
3) training an instance segmentation model on the instance segmentation data set to obtain a model capable of accurately segmenting the region where the whole glass insulator string is located, segmenting each picture in the instance segmentation data set with this model, cropping and saving the segmented regions, and collecting them into a key point detection data set;
4) manually marking, with an image labeling tool, the center point of the outer-ring glass edge of each glass insulator in the key point detection data set as the key point of the glass insulator piece, and performing data expansion with a data enhancement algorithm;
5) designing an end-to-end key point detection model for the key point detection data set, setting different parameters to tune and train the model, and saving the model that performs best on the validation set; the end-to-end key point detection model suitable for the glass insulator is designed as follows:
a. network architecture
The input image is resized and interpolated to a fixed width and height so that both are multiples of 16, and the specification of the image input to the network is set to [b × 3 × h × w], where b is the batch size, 3 is the number of RGB (red, green, blue) channels of the image, h is the image height and w is the image width;
the whole network structure is divided into 3 main parts: the method comprises the steps of extracting characteristics of a backbone network backbone, a key point coordinate regression branch core _ head and a probability heat map prediction branch headmap _ head, wherein the core _ head and the headmap _ head are parallel structures, and the specific meanings and detailed structures of all parts are as follows;
the role of the feature extraction backbone network is to extract features; it follows HourglassNet, which performs well and is widely applied in human pose estimation; HourglassNet is formed by serially connecting Hourglass modules, basic modules with a down-sampling / up-sampling structure shaped like an hourglass; feature extraction through the symmetric structure of the Hourglass module lets the network repeatedly pass features bottom-up and top-down, integrating local and global information, capturing features at different scales, and better acquiring the relative relationships between key points; the feature extraction backbone network comprises two parts: the main role of the first part is to convert the number of channels of the input image from 3 to 128, using two layers of two-dimensional convolution with 1 × 1 kernels, the first convolution changing the number of channels to 64 and the second to 128, so that after the two-dimensional convolutions of the first part, the specification of the data stream in the network becomes [b × 128 × h × w]; the second part uses a classical Hourglass module, whose left, down-sampling structure mainly consists of convolution layers and down-sampling pooling layers, aiming to extract image features while continuously reducing the resolution of the feature map so as to filter redundant features and learn more robust feature information, and whose right, up-sampling structure mainly consists of convolution layers and up-sampling layers that restore the same resolution as the left feature maps, so that fusing feature maps of the same size fuses the information of low-level and high-level features and greatly improves the precision of key point positioning; after the Hourglass module of the second part, the data stream specification in the network is still [b × 128 × h × w];
the key point coordinate regression branch coord_head directly regresses the coordinates of the key points in the image and the confidence with which each coordinate is predicted to be a key point; the branch requires a hyper-parameter N, the maximum possible number of glass insulator pieces in a glass insulator picture; the branch takes the output of the feature extraction backbone network as input, and its structure mainly consists of a first convolution layer, a first max pooling layer, a second convolution layer, a second max pooling layer and an average pooling layer; the parameters of the first convolution layer are 64 output channels, a 3 × 3 kernel, stride 1 and padding 1, so after the first convolution layer the data stream specification becomes [b × 64 × h × w]; after the first max pooling layer it becomes [b × 64 × h/2 × w/2]; the number of output channels of the second convolution layer is set to N × 3, so after the second convolution layer the data stream specification becomes [b × (N × 3) × h/2 × w/2]; after the second max pooling layer it becomes [b × (N × 3) × h/4 × w/4]; after the final average pooling layer it becomes [b × (N × 3) × 1 × 1]; that is, each image yields N × 3 predicted outputs, where the first 2 × N outputs represent the coordinates of the N key points and the remaining N outputs are the confidences with which the corresponding key point coordinates are predicted to be key points;
the probability heat map prediction branch heatmap_head predicts the probability heat map heatmap_pred in which each pixel of the input image is scored as a key point, i.e. the heatmap_pred output by the branch has the same size as the image input to the network; the branch also takes the output of the backbone as input and mainly consists of three convolution layers with 64, 32 and 1 output channels in turn, all with 3 × 3 kernels, stride 1 and padding 1, so after the three convolution layers the data stream specification becomes [b × 1 × h × w]; before the final output, the result is normalized to (0, 1) by a sigmoid function, so that the value of each pixel represents the probability that this coordinate position is a key point;
b. loss function design
The multi-task loss during training consists of three parts, defined as Heatmap_loss, Coord_loss and Score_loss respectively; each loss is calculated as follows:
Heatmap_loss is the loss between the predicted heatmap_pred and the annotated Gaussian heat map heatmap_gen; the Gaussian heat map heatmap_gen of a picture P is generated as follows: after the key points of picture P are labeled with labelme, an annotation file containing the key point coordinate list [(pointx_1, pointy_1), ..., (pointx_k, pointy_k), ..., (pointx_n, pointy_n)] is produced, where n is the number of key points; a Gaussian heat map is generated for each key point, and the corresponding positions of all the Gaussian heat maps are summed to obtain the final Gaussian heat map, namely:

heatmap_gen(i, j) = Σ_{k=1..n} heatmap_gen_k(i, j)

where i is the horizontal coordinate, j is the vertical coordinate, 0 ≤ i < w, 0 ≤ j < h, w is the width of picture P, h is the height of picture P, and heatmap_gen_k is the Gaussian heat map generated by the k-th key point (pointx_k, pointy_k), defined as:

heatmap_gen_k(i, j) = exp(-((i - pointx_k)² + (j - pointy_k)²) / (2δ²))

where δ is the Gaussian adjusting factor;
Heatmap_loss = Binary_Cross_Entropy(heatmap_pred, heatmap_gen)
Coord_loss is the loss of the predicted key point coordinates, with reference to the Smooth-L1 loss adopted in Fast R-CNN; let (tx_k, ty_k) denote the k-th predicted coordinate and (vx_k, vy_k) the corresponding annotated key point coordinate; summing over the labeled key points:

Coord_loss = Σ_k [ smooth_L1(tx_k - vx_k) + smooth_L1(ty_k - vy_k) ]

where the smooth_L1 function is defined as follows, with argument x:

smooth_L1(x) = 0.5 x², if |x| < 1; smooth_L1(x) = |x| - 0.5, otherwise
Score_loss is the cross entropy loss between the confidence p^ of each predicted key point coordinate and the value p* at the corresponding position in heatmap_gen:

Score_loss = -p* log(p^)
the total training loss is the sum of the three losses with equal weights of 1, namely:

Loss = Heatmap_loss + Coord_loss + Score_loss
c. setting training parameters
All layers of the model adopt Kaiming parameter initialization, the optimizer is set to Adam, and the initial learning rate is set to 0.001;
6) the trained instance segmentation model and key point detection model work in series: the glass insulator picture to be detected is input into the trained instance segmentation model, the region of the segmentation result, namely the local glass insulator region, is cropped and input into the trained key point detection model, and the coordinate values of the key point of each glass insulator piece in the picture are obtained, so that the glass insulator pieces are accurately positioned.
2. The glass insulator piece positioning method based on end-to-end key point detection according to claim 1, wherein in step 1), the collected glass insulator pictures are all high-definition visible light pictures with a width of 8688 and a height of 5792; since directly training on such high-definition images occupies a large amount of memory, all pictures are resized, i.e. interpolated, while preserving clarity and without changing the aspect ratio, into pictures with a width of 1448 and a height of 965, and are then labeled and used for training; in addition, so that the region of the glass insulator cropped in step 3) contains as little background as possible, the labeling range is the minimum circumscribed polygon of the whole glass insulator string.
3. The method of claim 1, wherein in step 2), data expansion is performed on the images with a data enhancement algorithm, comprising:
a. random image rotation: the rotation angle is randomly selected between -20 and +20 degrees;
b. random image cropping: a region four fifths the size of the whole image is cropped at random;
c. random horizontal flipping;
d. random contrast and color transformation.
4. The method as claimed in claim 1, wherein in step 3), the instance segmentation model is Yolact++, an improved version of the one-stage real-time instance segmentation model Yolact that further raises the running speed and segmentation accuracy of the overall model, and it divides instance segmentation into two parallel subtasks: first, a set of prototype masks is generated; second, the coefficient of each mask is predicted; finally, the prototype masks and the mask coefficients are linearly combined to generate the instance masks; segmenting each picture in the instance segmentation data set with this model yields the mask of the region where each glass insulator string is located and the image part corresponding to the minimum bounding rectangle of that mask, denoted R; the angle of the binary mask image, namely the orientation angle of the glass insulator in the image, is calculated and R is rotated by this angle so that the glass insulator in R lies in the horizontal direction; finally, the minimum bounding rectangle region where the glass insulator is located is cropped out according to the mask and serves as the image data for the key point detection part.
5. The glass insulator piece positioning method based on end-to-end key point detection according to claim 1, wherein in step 4), the image labeling tool labelme is used to manually mark the center point of the outer-ring glass edge of each glass insulator in the key point detection data set as the key point of the glass insulator piece; the labeling type is set to point, and the labeling order strictly follows left to right, row by row, so that the key point coordinate regression computed later corresponds one-to-one with the predicted coordinate points; to avoid model overfitting and improve the generalization capability of the model, data expansion is performed with a data enhancement algorithm, comprising:
a. random horizontal or vertical flipping;
b. random contrast and color transformation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110118779.7A CN112750125B (en) | 2021-01-28 | 2021-01-28 | Glass insulator piece positioning method based on end-to-end key point detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112750125A CN112750125A (en) | 2021-05-04 |
CN112750125B (en) | 2022-04-15
Family
ID=75653317
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110118779.7A Active CN112750125B (en) | 2021-01-28 | 2021-01-28 | Glass insulator piece positioning method based on end-to-end key point detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750125B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449718A (en) * | 2021-06-30 | 2021-09-28 | 平安科技(深圳)有限公司 | Method and device for training key point positioning model and computer equipment |
CN113902733B (en) * | 2021-10-29 | 2024-08-02 | 广东电网有限责任公司江门供电局 | Spacer defect detection method based on key point detection |
CN116109635B (en) * | 2023-04-12 | 2023-06-16 | 中江立江电子有限公司 | Method, device, equipment and medium for detecting surface quality of composite suspension insulator |
CN116213962B (en) * | 2023-05-10 | 2023-08-11 | 杭州乾瑭云科技有限公司 | Metal plate cutting control method and system based on state prediction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111047554A (en) * | 2019-11-13 | 2020-04-21 | 华南理工大学 | Composite insulator overheating defect detection method based on instance segmentation |
CN111402247A (en) * | 2020-03-23 | 2020-07-10 | 华南理工大学 | Machine vision-based method for detecting defects of suspension clamp on power transmission line |
CN111462057A (en) * | 2020-03-23 | 2020-07-28 | 华南理工大学 | Transmission line glass insulator self-explosion detection method based on deep learning |
CN112001294A (en) * | 2020-08-19 | 2020-11-27 | 福建船政交通职业学院 | YOLACT + + based vehicle body surface damage detection and mask generation method and storage device |
CN112233092A (en) * | 2020-10-16 | 2021-01-15 | 广东技术师范大学 | Deep learning method for intelligent defect detection of unmanned aerial vehicle power inspection |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111563452B (en) * | 2020-05-06 | 2023-04-21 | 南京师范大学镇江创新发展研究院 | Multi-human-body gesture detection and state discrimination method based on instance segmentation |
- 2021-01-28: application CN202110118779.7A filed in China; granted as patent CN112750125B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN112750125A (en) | 2021-05-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |