Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for constructing a substation insulator defect detection neural network with adaptive receptive field selection. It is a one-stage method that combines a ResNeSt (Split-Attention Networks) backbone with an improved FPN structure and a receptive field adaptive selection module, improving accuracy without significantly increasing the number of parameters and thereby enabling defect detection for substation insulators.
The invention adopts the following technical solution.
The method for constructing the receptive-field-adaptive substation insulator defect detection neural network comprises the following steps:
step 1, collecting inspection images of a substation in a real scene, and extracting insulator images from the inspection images;
step 2, based on a defect simulation technique, performing data augmentation on the insulator images to generate defective insulator sample images; in the defective insulator sample images, the ratio of the number of defective insulator sheets to the number of normal insulator sheets is 1:1; constructing a data set for substation insulator defect detection from the defective insulator sample images;
step 3, constructing a defect detection neural network comprising: a backbone network based on ResNeSt residual modules, an FPN network using channel concatenation, a receptive field adaptive selection module, and a detection head;
step 4, training the defect detection neural network with the data set constructed in step 2, and taking the trained network as the receptive-field-adaptive substation insulator defect detection neural network.
Preferably, in step 1, the inspection image contains images of multiple pieces of power equipment; insulator images are obtained by cropping the inspection image, and an original data set is constructed from the insulator images.
Preferably, step 2 comprises:
step 2.1, based on a statistical method, taking the pixel color with the largest proportion in the insulator image as a defect filling color;
step 2.2, calculating the maximum value of the between-class variance of the background and the target in the insulator image according to the following relational expression:
where
the maximum value of the between-class variance of the background and the target in the insulator image is taken over the gray value $t$, and the gray value $t$ at which the maximum is attained is used as the segmentation threshold between the background and the target in the insulator image;
$P_0(t)$ and $P_1(t)$ are the proportions of background pixels and target pixels, respectively, in the total number of pixels of the insulator image, and $H_0(t)$ and $H_1(t)$ are the average gray values of the background and the target, respectively;
step 2.3, the gray value range of the insulator image is [0, m], and the segmentation threshold $t$ divides this range into a first interval [0, t] and a second interval [t + 1, m];
step 2.4, segmenting the background and the target in the insulator image according to the first interval and the second interval by the following relational expression:
where
$f(i,j)$ is the gray value at row $i$, column $j$ of the insulator image before segmentation,
$P(i,j)$ is the gray value at row $i$, column $j$ of the insulator image after segmentation;
the region satisfying $P(i,j) = 1$ is the insulator region image, i.e. the target segmented from the insulator image;
step 2.5, based on the Radon transform, computing line integrals of the insulator region image along straight lines with different directions and intercepts, and taking the straight line corresponding to the maximum line integral as the principal axis of the insulator region;
step 2.6, drawing a parallel line on each side of the principal axis at a distance dist from it within the insulator region, and defining the areas outside the two parallel lines as defect generation areas; wherein dist is 0.5 times the radius of the insulator sheet;
step 2.7, randomly generating defects in the defect generation areas and filling them with the defect filling color to obtain a defective insulator image; wherein 3 to 7 defects are randomly generated in the defect generation areas;
step 2.8, smoothing the defective insulator image with a Gaussian filter to generate the defective insulator sample image.
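Steps 2.2 to 2.4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the standard Otsu form of the between-class variance, $P_0(t)\,P_1(t)\,(H_0(t) - H_1(t))^2$, which matches the symbol definitions above, and it assumes that pixels in the second interval [t + 1, m] are the target. The function names `otsu_threshold` and `binarize` and the toy image are hypothetical.

```python
def otsu_threshold(gray, m=255):
    """Return (t, variance): the gray value t in [0, m) that maximises the
    between-class variance P0 * P1 * (H0 - H1)^2 (assumed Otsu form)."""
    pixels = [v for row in gray for v in row]
    total = len(pixels)
    hist = [0] * (m + 1)
    for v in pixels:
        hist[v] += 1

    best_t, best_var = 0, -1.0
    for t in range(m):  # background = [0, t], target = [t + 1, m]
        n0 = sum(hist[: t + 1])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, so the variance is undefined
        p0, p1 = n0 / total, n1 / total
        h0 = sum(g * hist[g] for g in range(t + 1)) / n0
        h1 = sum(g * hist[g] for g in range(t + 1, m + 1)) / n1
        var = p0 * p1 * (h0 - h1) ** 2
        if var > best_var:  # strict comparison keeps the smallest maximiser
            best_t, best_var = t, var
    return best_t, best_var


def binarize(gray, t):
    """P(i, j) = 1 for gray values in [t + 1, m], else 0 (assumed polarity)."""
    return [[1 if v > t else 0 for v in row] for row in gray]


# Toy 3 x 4 image: dark background, bright target.
image = [
    [10, 12, 11, 200],
    [9, 210, 205, 198],
    [11, 207, 202, 10],
]
t, _ = otsu_threshold(image)
mask = binarize(image, t)
```

Every threshold between the darkest target pixel and the brightest background pixel yields the same split; the sketch returns the smallest such value.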
Preferably, in step 2, the data set for substation insulator defect detection is divided into a training set and a test set at a ratio of 4:1.
Preferably, in the defect detection neural network, the backbone network based on ResNeSt residual modules extracts features from the defective insulator sample image to obtain three feature maps with different resolutions; the fusion mode of the FPN network is improved, namely the three feature maps with different resolutions are fused by channel concatenation to obtain a first, a second, and a third feature fusion map, while the lowest-resolution feature map of the three is successively reduced in dimension to obtain a fourth and a fifth feature fusion map;
the five feature fusion maps are input into the receptive field adaptive selection module to obtain receptive field adaptive feature maps; and the receptive field adaptive feature maps are input into the detection head to obtain the category prediction result and the bounding-box regression prediction result for the insulator defect detection bounding boxes.
Preferably, in step 3, the backbone network based on ResNeSt residual modules includes N layers of ResNeSt residual modules, and each residual module includes a direct connection path and an indirect connection path; the direct connection path passes the features output by the previous residual module to the current residual module without processing them; the indirect connection path applies a series of convolution operations to the features output by the previous residual module and passes the resulting feature residual to the current residual module; the final output of the backbone network is the sum of the features carried by the direct connection path and the feature residual carried by the indirect connection path.
Preferably, the backbone network comprises 33 layers of ResNeSt residual modules;
the first feature map is output by the layer-7 residual module; the second feature map is output by the layer-30 residual module; and the third feature map is output by the layer-33 residual module.
Preferably, the resolution of the first feature map is 40 × 40 × 512, the resolution of the second feature map is 20 × 20 × 1024, and the resolution of the third feature map is 10 × 10 × 2048.
Preferably, in the FPN network in step 3, the feature maps are concatenated along the channel dimension according to the following relation:
where
$P_l$ is the $l$-th feature fusion map obtained by channel concatenation, $l = 1, 2, 3$; i.e., $P_1$ is the first feature fusion map, $P_2$ is the second feature fusion map, and $P_3$ is the third feature fusion map;
$\mathrm{Conv}_{1\times1}$ is a 1 × 1 convolution operation,
$\mathrm{Conv}_{3\times3}$ is a 3 × 3 convolution operation,
concat is the concatenation operation,
Up is a 2× upsampling operation by linear interpolation,
$C_l$ is the $l$-th dimension-reduced feature map after channel dimension reduction, $l = 1, 2, 3$, and satisfies the following relation:
$C_l = \mathrm{Conv}_{1\times1}(F_l)$
where $F_l$ is the $l$-th feature map output by the backbone network; i.e., $F_1$ is the first feature map, $F_2$ is the second feature map, and $F_3$ is the third feature map.
Preferably, in the FPN network of step 3, a 3 × 3 convolution operation is also performed on the third feature map $F_3$ output by the backbone network to obtain a fourth feature fusion map $P_4$, and a 3 × 3 convolution operation is then performed on the fourth feature fusion map $P_4$ to obtain a fifth feature fusion map $P_5$.
Preferably, in step 3, the processing steps of the receptive field adaptive selection module and the detection head include:
step 3.1, reducing the dimension of each of the first to fifth feature fusion maps by a 1 × 1 convolution operation to obtain first to fifth dimension-reduced feature fusion maps;
step 3.2, extracting a first receptive field feature from the first to fifth dimension-reduced feature fusion maps by a 1 × 1 convolution operation; extracting a second receptive field feature from the first to fifth dimension-reduced feature fusion maps by a first dilated convolution operation; extracting a third receptive field feature from the first to fifth dimension-reduced feature fusion maps by a second dilated convolution operation; wherein the first dilated convolution operation is a 3 × 3 convolution with a dilation rate of 2, and the second dilated convolution operation is a 3 × 3 convolution with a dilation rate of 4;
step 3.3, aggregating the first, second, and third receptive field features by channel concatenation to obtain a first aggregated receptive field feature;
step 3.4, applying a 1 × 1 convolution operation to the first aggregated receptive field feature to fuse multi-channel information and obtain a first fused receptive field feature;
step 3.5, extracting a fourth receptive field feature from the first fused receptive field feature by a 1 × 1 convolution operation; meanwhile, extracting a fifth receptive field feature from the first fused receptive field feature by a third dilated convolution operation, and extracting a sixth receptive field feature from the first fused receptive field feature by a fourth dilated convolution operation; wherein the third dilated convolution operation is a 3 × 3 convolution with a dilation rate of 6, and the fourth dilated convolution operation is a 3 × 3 convolution with a dilation rate of 12;
step 3.6, aggregating the fourth, fifth, and sixth receptive field features by channel concatenation to obtain a second aggregated receptive field feature;
step 3.7, fusing the multi-channel information of the second aggregated receptive field feature by a 1 × 1 convolution operation; then transforming the number of channels of the fused second aggregated receptive field feature by a 1 × 1 convolution operation so that it matches that of the first to fifth feature fusion maps;
step 3.8, adding the fused and channel-transformed second aggregated receptive field feature to the first to fifth feature fusion maps to obtain the receptive field adaptive feature maps;
step 3.9, activating the receptive field adaptive feature maps with the ReLU activation function and outputting them to the detection head, which outputs the category prediction result and the bounding-box regression prediction result for the insulator defect detection bounding boxes;
the detection head comprises a classification branch and a regression branch; the classification branch outputs the category prediction result, and the regression branch outputs the bounding-box regression prediction result.
Preferably, in step 3.9, the regression branch and the classification branch each include four convolutional layers; the convolution kernel size in each convolutional layer is 3, and the stride is 1;
and parameters are not shared between the regression branch and the classification branch.
Preferably, step 3 further comprises:
step 3.10, setting the intersection-over-union (IoU) threshold for the insulator defect detection bounding boxes output by the detection head to 0.5, and screening out redundant insulator defect detection bounding boxes using the Soft-NMS non-maximum suppression method, which satisfies the following relational expression:
where
$S_d$ is the confidence of the $d$-th insulator defect detection bounding box, $d = 1, 2, \dots, D$, and $D$ is the total number of insulator defect detection bounding boxes,
$b_d$ is the $d$-th insulator defect detection bounding box,
$m$ is the insulator defect detection bounding box with the highest confidence,
$\mathrm{IoU}(m, b_d)$ is the IoU between the $d$-th insulator defect detection bounding box and the insulator defect detection bounding box with the highest confidence;
when $\mathrm{IoU}(m, b_d) > 0.5$, the confidence of the $d$-th insulator defect detection bounding box is attenuated exponentially.
Preferably, in step 4, the defect detection neural network is trained using the stochastic gradient descent algorithm with learning rate warm-up;
during the first 500 iterations, the learning rate is gradually increased; at the 16th and 22nd epochs, the learning rate is reduced to 10% of its initial value; the batch size is set to 8, and training continues until the loss falls below 0.1.
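The schedule described above can be sketched as a single function. This is an illustrative reading, not the authors' training code: the base learning rate `base_lr`, the warm-up starting ratio `warmup_ratio`, and the linear warm-up shape are hypothetical values that the text does not specify, and the two reductions are interpreted here as a step decay in which each milestone epoch multiplies the current rate by 0.1. The text could also be read as a single drop to 10% of the initial value.

```python
def learning_rate(iteration, epoch, base_lr=0.01, warmup_iters=500,
                  warmup_ratio=0.001, decay_epochs=(16, 22), gamma=0.1):
    """Learning rate for a given global iteration (0-based) and epoch (1-based).

    base_lr, warmup_ratio and gamma are assumed values for illustration.
    """
    if iteration < warmup_iters:
        # Linear warm-up from base_lr * warmup_ratio up to base_lr.
        alpha = iteration / warmup_iters
        return base_lr * (warmup_ratio * (1 - alpha) + alpha)
    # Step decay: one factor of gamma per milestone epoch already reached.
    steps = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * gamma ** steps


# A few sample points along the schedule.
schedule = [
    learning_rate(0, 1),        # start of warm-up
    learning_rate(250, 1),      # halfway through warm-up
    learning_rate(500, 1),      # warm-up finished
    learning_rate(10_000, 16),  # after the first reduction
    learning_rate(20_000, 22),  # after the second reduction
]
```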
Compared with the prior art, the invention has the following advantages: it achieves a good balance between accuracy and speed and realizes fine-grained detection of insulator sheets in substation scenes; it learns and attends to important features while suppressing irrelevant ones, which speeds up network convergence and improves detection accuracy; and it adaptively adjusts the receptive field, achieving higher accuracy for targets of different sizes and enhancing the generalization ability of the network.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
Referring to Fig. 1, the method for constructing a substation insulator defect detection neural network with an adaptive receptive field includes steps 1 to 4.
Step 1, collecting inspection images of a substation in a real scene, and extracting insulator images from the inspection images.
Specifically, in step 1, the inspection image contains images of multiple pieces of power equipment; insulator images are obtained by cropping the inspection image, and an original data set is constructed from the insulator images.
Step 2, based on a defect simulation technique, performing data augmentation on the insulator images to generate defective insulator sample images; in the defective insulator sample images, the ratio of the number of defective insulator sheets to the number of normal insulator sheets is 1:1; and constructing a data set for substation insulator defect detection from the defective insulator sample images.
Specifically, step 2 comprises:
step 2.1, based on a statistical method, taking the pixel color with the largest proportion in the insulator image as a defect filling color;
step 2.2, calculating the maximum value of the between-class variance of the background and the target in the insulator image according to the following relational expression:
where
the maximum value of the between-class variance of the background and the target in the insulator image is taken over the gray value $t$, and the gray value $t$ at which the maximum is attained is used as the segmentation threshold between the background and the target in the insulator image;
$P_0(t)$ and $P_1(t)$ are the proportions of background pixels and target pixels, respectively, in the total number of pixels of the insulator image, and $H_0(t)$ and $H_1(t)$ are the average gray values of the background and the target, respectively;
step 2.3, the gray value range of the insulator image is [0, m], and the segmentation threshold $t$ divides this range into a first interval [0, t] and a second interval [t + 1, m];
step 2.4, segmenting the background and the target in the insulator image according to the first interval and the second interval by the following relational expression:
where
$f(i,j)$ is the gray value at row $i$, column $j$ of the insulator image before segmentation,
$P(i,j)$ is the gray value at row $i$, column $j$ of the insulator image after segmentation;
the region satisfying $P(i,j) = 1$ is the insulator region image, i.e. the target segmented from the insulator image;
step 2.5, based on the Radon transform, computing line integrals of the insulator region image along straight lines with different directions and intercepts, and taking the straight line corresponding to the maximum line integral as the principal axis of the insulator region;
step 2.6, drawing a parallel line on each side of the principal axis at a distance dist from it within the insulator region, and defining the areas outside the two parallel lines as defect generation areas; wherein dist is 0.5 times the radius of the insulator sheet;
step 2.7, randomly generating defects in the defect generation areas and filling them with the defect filling color to obtain a defective insulator image; wherein 3 to 7 defects are randomly generated in the defect generation areas;
step 2.8, smoothing the defective insulator image with a Gaussian filter to generate the defective insulator sample image.
In the preferred embodiment of the present invention, in order to soften the burrs at the edges of the circular defects and make the transition between the target and the background more natural, the generated defect image is finally smoothed by a Gaussian filter with a 5 × 5 kernel.
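Steps 2.1, 2.6, and 2.7 can be sketched as follows, as a rough illustration under stated assumptions rather than the embodiment itself. The principal axis is taken as given (a point and an angle) instead of being computed with the Radon transform, defects are drawn as small circles whose radius `defect_radius` is a hypothetical value, only the defect centres are constrained to the defect generation area, and the final Gaussian smoothing is omitted. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def fill_colour(image):
    """Step 2.1: the most frequent pixel colour in the insulator image."""
    return Counter(px for row in image for px in row).most_common(1)[0][0]


def distance_to_axis(x, y, point, angle):
    """Perpendicular distance from (x, y) to the line through `point`
    whose direction is (cos(angle), sin(angle))."""
    dx, dy = x - point[0], y - point[1]
    return abs(dx * math.sin(angle) - dy * math.cos(angle))


def add_defects(image, mask, point, angle, sheet_radius, rng, defect_radius=2):
    """Steps 2.6-2.7: paint 3 to 7 circular defects centred in the
    defect generation area (insulator pixels farther than dist from the axis)."""
    dist = 0.5 * sheet_radius
    colour = fill_colour(image)
    candidates = [(x, y)
                  for y, row in enumerate(mask)
                  for x, inside in enumerate(row)
                  if inside and distance_to_axis(x, y, point, angle) > dist]
    count = min(rng.randint(3, 7), len(candidates))
    centres = rng.sample(candidates, count)
    out = [list(row) for row in image]  # copy so the input stays untouched
    for cx, cy in centres:
        for y in range(len(out)):
            for x in range(len(out[0])):
                if (x - cx) ** 2 + (y - cy) ** 2 <= defect_radius ** 2:
                    out[y][x] = colour
    return out, centres


# Toy example: a bright 7 x 11 insulator region on a dark background,
# with a horizontal principal axis through row 4.
height, width = 9, 21
mask = [[1 if 1 <= y <= 7 and 5 <= x <= 15 else 0 for x in range(width)]
        for y in range(height)]
image = [[200 if mask[y][x] else 0 for x in range(width)]
         for y in range(height)]
defective, centres = add_defects(image, mask, point=(0, 4), angle=0.0,
                                 sheet_radius=3, rng=random.Random(0))
```

Because the background is the most frequent colour in the toy image, the painted defects look like pieces missing from the insulator sheet.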
Specifically, in step 2, the data set for substation insulator defect detection is divided into a training set and a test set at a ratio of 4:1.
It should be noted that in the preferred embodiment of the present invention, the division ratio of the training set and the test set is a non-limiting preferred choice, and those skilled in the art can select different division ratios according to the training requirements of the model and the size of the data set.
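A 4:1 split of this kind can be sketched as below. The shuffling, the fixed seed, and the file names are assumptions made for illustration; the text only specifies the ratio.

```python
import random


def split_dataset(samples, train_parts=4, test_parts=1, seed=0):
    """Shuffle the samples and split them train:test = train_parts:test_parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * train_parts // (train_parts + test_parts)
    return items[:n_train], items[n_train:]


# Hypothetical file names standing in for defective insulator sample images.
names = [f"insulator_{i:03d}.jpg" for i in range(10)]
train, test = split_dataset(names)
```

Changing `train_parts` and `test_parts` gives the other division ratios mentioned above.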
Step 3, constructing a defect detection neural network comprising: a backbone network based on ResNeSt residual modules, an FPN network using channel concatenation, a receptive field adaptive selection module, and a detection head.
In the defect detection neural network shown in Fig. 2, the backbone network based on ResNeSt residual modules extracts features from the defective insulator sample image to obtain three feature maps $F_1$, $F_2$, $F_3$ with different resolutions; the element-wise addition fusion mode of the FPN network is improved, namely the three feature maps $F_1$, $F_2$, $F_3$ with different resolutions are fused by channel concatenation to obtain a first feature fusion map $P_1$, a second feature fusion map $P_2$, and a third feature fusion map $P_3$, while the third feature map $F_3$, which has the lowest resolution of the three, is successively reduced in dimension to obtain a fourth feature fusion map $P_4$ and a fifth feature fusion map $P_5$;
the five feature fusion maps are input into the receptive field adaptive selection module to obtain receptive field adaptive feature maps; and the receptive field adaptive feature maps are input into the detection head to obtain the category prediction result and the bounding-box regression prediction result for the insulator defect detection bounding boxes.
Specifically, in step 3, the backbone network of the defect detection neural network includes N layers of ResNeSt residual modules, and each residual module includes a direct connection path and an indirect connection path; the direct connection path passes the features output by the previous residual module to the current residual module without processing them; the indirect connection path applies a series of convolution operations to the features output by the previous residual module and passes the resulting feature residual to the current residual module; the final output of the backbone network is the sum of the features carried by the direct connection path and the feature residual carried by the indirect connection path.
In the preferred embodiment of the present invention, a ResNeSt network is constructed following the ResNet structural paradigm, and the network parameter configuration is shown in Table 1.
The third column of Table 1 lists the convolution layer parameters, and each bracketed group represents one ResNeSt residual module. The split-attention module is the core module of the ResNeSt network; it replaces the second convolution operation in the ResNet block and realizes information interaction across feature-map groups by combining grouped convolution with an attention mechanism. The residual module comprises two paths: one is the direct connection path of the input features, and the other obtains the residual of the input features by applying the three convolution operations in the brackets to the input features; finally, the features on the two paths are added to obtain the output of the ResNeSt network. In Table 1, the first convolutional layer, layer0, has no ResNeSt residual module and uses 64 convolution kernels of size 7 × 7 with a stride of 2; the second convolutional layer, layer1, includes 3 ResNeSt residual modules, in each of which the first convolution layer has 64 kernels of size 1 × 1, the second is a split-attention module, and the third has 256 kernels of size 1 × 1; max pool in layer1 denotes a max pooling operation with a 3 × 3 pooling window and a stride of 2.
Table 1 Parameters of the ResNeSt network according to an embodiment of the present invention
As can be seen from Table 1, in the preferred embodiment of the present invention, the backbone network includes 33 layers of ResNeSt residual modules; the first feature map $F_1$ is output by the layer-7 residual module; the second feature map $F_2$ is output by the layer-30 residual module; and the third feature map $F_3$ is output by the layer-33 residual module.
Specifically, the first feature map $F_1$ has a resolution of 40 × 40 × 512, the second feature map $F_2$ has a resolution of 20 × 20 × 1024, and the third feature map $F_3$ has a resolution of 10 × 10 × 2048.
Specifically, in the FPN network shown in Fig. 3, the feature maps are concatenated along the channel dimension according to the following relation:
where
$P_l$ is the $l$-th feature fusion map obtained by channel concatenation, $l = 1, 2, 3$; i.e., $P_1$ is the first feature fusion map, $P_2$ is the second feature fusion map, and $P_3$ is the third feature fusion map;
$\mathrm{Conv}_{1\times1}$ is a 1 × 1 convolution operation,
$\mathrm{Conv}_{3\times3}$ is a 3 × 3 convolution operation,
concat is the concatenation operation,
Up is a 2× upsampling operation by linear interpolation,
$C_l$ is the $l$-th dimension-reduced feature map after channel dimension reduction, $l = 1, 2, 3$, and satisfies the following relation:
$C_l = \mathrm{Conv}_{1\times1}(F_l)$
where $F_l$ is the $l$-th feature map output by the backbone network; i.e., $F_1$ is the first feature map, $F_2$ is the second feature map, and $F_3$ is the third feature map.
In the preferred embodiment of the invention, the channel dimension of the three feature maps is reduced to 256, generating the first dimension-reduced feature map $C_1$ (256 × 40 × 40), the second dimension-reduced feature map $C_2$ (256 × 20 × 20), and the third dimension-reduced feature map $C_3$ (256 × 10 × 10). The third feature map $F_3$ then passes through a 3 × 3 convolution to generate the third feature fusion map $P_3$. The third feature fusion map $P_3$ is upsampled 2× by linear interpolation so that its resolution equals that of the second dimension-reduced feature map $C_2$; $P_3$ and $C_2$ are then concatenated along the channel dimension, and the second feature fusion map $P_2$ is obtained through a 1 × 1 convolution. Applying a 1 × 1 convolution to each concatenated feature map yields a linear combination of the information at the same pixel position across channels, realizing cross-channel information interaction.
In the method provided by the invention, the element-wise addition fusion used in the FPN network is replaced by channel concatenation, yielding the first to third feature fusion maps and addressing the problems in the prior art that the backbone network lacks cross-channel interaction information and that feature fusion is inefficient.
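One form of the concatenation relation that is consistent with the symbol definitions and the embodiment described above is given here as an assumption, not as the original formula. In particular, the 3 × 3 convolution is assumed to act on the dimension-reduced map $C_3$, whereas the embodiment names $F_3$ as its input; the text does not settle which is intended.

```latex
P_3 = \mathrm{Conv}_{3\times3}(C_3), \qquad
P_l = \mathrm{Conv}_{1\times1}\bigl(\mathrm{concat}\bigl(C_l,\ \mathrm{Up}(P_{l+1})\bigr)\bigr),
\quad l = 1, 2, \qquad
C_l = \mathrm{Conv}_{1\times1}(F_l),\quad l = 1, 2, 3.
```

Under this reading, concatenating $C_2$ (256 channels) with $\mathrm{Up}(P_3)$ doubles the channel count when $P_3$ also has 256 channels, and the 1 × 1 convolution both mixes the channels and brings the count back down.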
Specifically, in the FPN network of step 3, a 3 × 3 convolution operation is also performed on the third feature map $F_3$ output by the backbone network to obtain a fourth feature fusion map $P_4$, and a 3 × 3 convolution operation is then performed on the fourth feature fusion map $P_4$ to obtain a fifth feature fusion map $P_5$.
Preferably, in step 3, as shown in Fig. 4, the processing steps of the receptive field adaptive selection module and the detection head include:
step 3.1, reducing the dimension of each of the first to fifth feature fusion maps by a 1 × 1 convolution operation to obtain first to fifth dimension-reduced feature fusion maps; in Fig. 4, the input X denotes the first to fifth feature fusion maps;
step 3.2, extracting a first receptive field feature from the first to fifth dimension-reduced feature fusion maps by a 1 × 1 convolution operation; extracting a second receptive field feature from the first to fifth dimension-reduced feature fusion maps by a first dilated convolution operation; extracting a third receptive field feature from the first to fifth dimension-reduced feature fusion maps by a second dilated convolution operation; wherein the first dilated convolution operation is a 3 × 3 convolution with a dilation rate (rate) of 2, and the second dilated convolution operation is a 3 × 3 convolution with a dilation rate of 4;
step 3.3, aggregating the first, second, and third receptive field features by channel concatenation to obtain a first aggregated receptive field feature;
step 3.4, applying a 1 × 1 convolution operation to the first aggregated receptive field feature to fuse multi-channel information and obtain a first fused receptive field feature;
step 3.5, extracting a fourth receptive field feature from the first fused receptive field feature by a 1 × 1 convolution operation; meanwhile, extracting a fifth receptive field feature from the first fused receptive field feature by a third dilated convolution operation, and extracting a sixth receptive field feature from the first fused receptive field feature by a fourth dilated convolution operation; wherein the third dilated convolution operation is a 3 × 3 convolution with a dilation rate of 6, and the fourth dilated convolution operation is a 3 × 3 convolution with a dilation rate of 12;
step 3.6, aggregating the fourth, fifth, and sixth receptive field features by channel concatenation to obtain a second aggregated receptive field feature;
step 3.7, fusing the multi-channel information of the second aggregated receptive field feature by a 1 × 1 convolution operation; then transforming the number of channels of the fused second aggregated receptive field feature by a 1 × 1 convolution operation so that it matches that of the first to fifth feature fusion maps;
step 3.8, adding the fused and channel-transformed second aggregated receptive field feature to the first to fifth feature fusion maps to obtain the receptive field adaptive feature maps;
step 3.9, activating the receptive field adaptive feature maps with the ReLU activation function and outputting them to the detection head, which outputs the category prediction result and the bounding-box regression prediction result for the insulator defect detection bounding boxes; in Fig. 4, the output Y denotes the receptive field adaptive feature maps after ReLU activation;
the detection head comprises a classification branch and a regression branch; the classification branch outputs the category prediction result, and the regression branch outputs the bounding-box regression prediction result.
The ReLU activation function is expressed as $f(x) = \max(0, x)$.
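To see which receptive field sizes the two-stage module in steps 3.2 to 3.6 makes available, the short calculation below uses the standard relation that a $k \times k$ convolution with dilation rate $r$ covers $k + (k - 1)(r - 1)$ pixels per side. It is an arithmetic sketch only: it assumes stride 1 throughout, so that stacking two layers with extents $a$ and $b$ gives $a + b - 1$, and the function name is hypothetical.

```python
def effective_kernel(kernel, dilation):
    """Side length covered by a kernel x kernel convolution with the given dilation."""
    return kernel + (kernel - 1) * (dilation - 1)


# Stage 1 branches (step 3.2): 1x1, 3x3 rate 2, 3x3 rate 4.
stage1 = [effective_kernel(1, 1), effective_kernel(3, 2), effective_kernel(3, 4)]
# Stage 2 branches (step 3.5): 1x1, 3x3 rate 6, 3x3 rate 12.
stage2 = [effective_kernel(1, 1), effective_kernel(3, 6), effective_kernel(3, 12)]

# With stride 1, a stage-1 branch followed by a stage-2 branch sees a + b - 1 pixels.
combined = sorted({a + b - 1 for a in stage1 for b in stage2})
print(stage1, stage2, combined)
```

Under these assumptions the nine branch combinations cover side lengths from 1 to 33 pixels in steps of 4, which is the range of scales the later 1 × 1 convolutions can weight.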
Preferably, in step 3.9, the regression branch and the classification branch each include four convolutional layers; the convolution kernel size in each convolutional layer is 3, and the stride is 1;
and parameters are not shared between the regression branch and the classification branch.
Preferably, step 3 further comprises:
step 3.10, setting the intersection-over-union (IoU) threshold for the insulator defect detection bounding boxes output by the detection head to 0.5, and screening out redundant insulator defect detection bounding boxes using the Soft-NMS non-maximum suppression method, which satisfies the following relational expression:
where
$S_d$ is the confidence of the $d$-th insulator defect detection bounding box, $d = 1, 2, \dots, D$, and $D$ is the total number of insulator defect detection bounding boxes,
$b_d$ is the $d$-th insulator defect detection bounding box,
$m$ is the insulator defect detection bounding box with the highest confidence,
$\mathrm{IoU}(m, b_d)$ is the IoU between the $d$-th insulator defect detection bounding box and the insulator defect detection bounding box with the highest confidence;
when $\mathrm{IoU}(m, b_d) > 0.5$, the confidence of the $d$-th insulator defect detection bounding box is attenuated exponentially.
The detection head uses the Soft-NMS non-maximum suppression method, which handles repeated detection boxes by reducing their confidence rather than discarding them outright, so that the detection performance of the network on dense targets can be improved.
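The screening of step 3.10 can be sketched as follows. The text states only that the confidence is exponentially attenuated when the overlap exceeds 0.5; the Gaussian form exp(-IoU² / sigma), the value sigma = 0.5 and the final score cut-off score_thresh are assumptions made for this sketch, as are the function names and the example boxes.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, iou_thresh=0.5, sigma=0.5, score_thresh=0.001):
    """Return (box, score) pairs after Soft-NMS.

    sigma and score_thresh are hypothetical; the decay form is an assumption.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select m, the remaining box with the highest confidence.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        m, s_m = remaining.pop(best)
        kept.append((m, s_m))
        rescored = []
        for b_d, s_d in remaining:
            overlap = iou(m, b_d)
            if overlap > iou_thresh:
                # Attenuate the confidence S_d instead of deleting the box.
                s_d *= math.exp(-(overlap ** 2) / sigma)
            if s_d >= score_thresh:
                rescored.append((b_d, s_d))
        remaining = rescored
    return kept


# Hypothetical example: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
for box, score in kept:
    print(box, round(score, 4))
```

Here the second box overlaps the first with an IoU of about 0.82, so it is kept with a reduced confidence and ranked last, whereas hard NMS would have removed it.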
In a preferred embodiment of the present invention, the defect detection neural network shown in Fig. 2 includes 5 detection heads; the first to fifth feature fusion maps, which have different receptive field features, are each input to one detection head, and the parameters of the 5 detection heads are shared. Each detection head consists of a regression branch and a classification branch, each composed of four 3 × 3 convolutional layers; parameters are not shared between the regression branch and the classification branch. The classification branch predicts the category of each anchor box at every pixel, and the box regression branch predicts the offsets of the center-point coordinates, width and height relative to the anchor box. For feature maps of different sizes, the detection heads use anchor boxes of different sizes and shapes: the low feature layers have small receptive fields and are suited to detecting small targets, the high feature layers have large receptive fields and are suited to detecting large targets, and the anchor box size increases with the level. To ensure that the anchor boxes can adapt to targets of different sizes and shapes, 3 scales and 3 aspect ratios are set, so that each pixel generates 9 anchor boxes. The specific parameter settings are given in Table 2:
Table 2 Anchor box parameter configuration for one embodiment of the present invention
In the preferred embodiment of the invention, multiple detection heads simultaneously detect features of multiple different receptive fields, which solves the technical problem in the prior art that a detection head can only extract features of a fixed receptive field from the feature map at each resolution, so that missed detections and false detections readily occur when detecting insulator defects of different scales.
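The anchor scheme described above (3 scales × 3 aspect ratios, giving 9 anchor boxes per pixel, with sizes growing across the five feature levels) can be sketched as follows. The scale, ratio and base-size values below are hypothetical placeholders borrowed from common RetinaNet-style defaults; the actual settings of the embodiment are those of Table 2.

```python
import math


def anchors_at(cx, cy, base_size, scales, ratios):
    """Return one (x1, y1, x2, y2) anchor per scale/ratio pair, centred at (cx, cy)."""
    boxes = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:                  # ratio = height / width
            w = math.sqrt(area / ratio)
            h = w * ratio
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Hypothetical values; Table 2 of the embodiment gives the actual settings.
SCALES = (1.0, 2 ** (1 / 3), 2 ** (2 / 3))
RATIOS = (0.5, 1.0, 2.0)
BASE_SIZES = (32, 64, 128, 256, 512)          # one per feature level, growing with depth

per_level = [anchors_at(0.0, 0.0, size, SCALES, RATIOS) for size in BASE_SIZES]
for size, anchors in zip(BASE_SIZES, per_level):
    print(size, len(anchors))
```

Each scale fixes the anchor area and each ratio redistributes that area between width and height, so the nine anchors at a pixel cover the same location with different sizes and shapes.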
Step 4, training the defect detection neural network by using the data set constructed in step 2, and taking the trained defect detection neural network as the transformer substation insulator defect detection neural network based on receptive field self-adaption.
Specifically, in step 4, the defect detection neural network is trained by using a stochastic gradient descent algorithm and a learning rate warm-up method;
the training optimizer is the SGD optimizer, and the initial value of the learning rate is set to 1 × 10^-3; using a linear growth strategy, the learning rate increases over the first 500 iterations; at the 16th and 22nd epochs, the learning rate is decreased to 10% of the initial value of the learning rate; the batch size is set to 8, and training stops once the loss is less than 0.1.
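The learning rate schedule can be sketched as a plain function of the iteration and epoch counters. Several details are assumptions of this sketch rather than statements of the embodiment: the warm-up start factor warmup_start is hypothetical, epochs are counted from 1, and each milestone is read as multiplying the current learning rate by 0.1 (the usual multi-step convention), which is only one possible reading of the text.

```python
def learning_rate(iteration, epoch, base_lr=1e-3, warmup_iters=500,
                  warmup_start=0.001, milestones=(16, 22), gamma=0.1):
    """Learning rate at a given global iteration and 1-based epoch."""
    # Step decay: multiply by gamma for every milestone epoch already reached.
    lr = base_lr * gamma ** sum(epoch >= m for m in milestones)
    if iteration < warmup_iters:
        # Linear warm-up from warmup_start * lr up to lr over warmup_iters iterations.
        progress = iteration / warmup_iters
        lr *= warmup_start + (1.0 - warmup_start) * progress
    return lr


# Print a few sample points of the schedule.
for it, ep in [(0, 1), (250, 1), (500, 1), (100000, 16), (100000, 22)]:
    print(it, ep, learning_rate(it, ep))
```

The stopping rule (loss below 0.1) and the batch size of 8 belong to the training loop and are not modelled here.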
The backbone network is initialized from a pre-trained model; the parameters of stage 1 are frozen and receive no gradient updates. Data augmentation is used during network training to reduce overfitting of the detection model, namely random mirror flipping with a probability of 0.5. The software configuration of the system is Python 3.7.10, PyTorch 1.8.1, CUDA 11.3 and PyCharm 2020.3; the hardware configuration is an Intel Core i7-10750H CPU (6 cores, 12 threads) and a single NVIDIA GeForce RTX 2060 graphics card.
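The random mirror flipping used for data augmentation can be sketched as follows. The image is represented as a list of pixel rows and the annotations as (x1, y1, x2, y2) boxes in continuous pixel coordinates with x in [0, width]; this representation, the function name and the rng parameter are assumptions of the sketch.

```python
import random


def random_horizontal_flip(image, boxes, prob=0.5, rng=random):
    """Mirror the image and its boxes left-right with probability prob.

    image: list of pixel rows; boxes: list of (x1, y1, x2, y2) in pixels.
    """
    if rng.random() >= prob:
        return image, boxes                    # leave the sample unchanged
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the left and right edges of every box.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes


# Hypothetical 2 x 3 image with one annotated box; prob=1.0 forces the flip.
image = [[1, 2, 3], [4, 5, 6]]
boxes = [(0, 0, 1, 2)]
flipped_image, flipped_boxes = random_horizontal_flip(image, boxes, prob=1.0)
print(flipped_image, flipped_boxes)
```

The bounding boxes must be mirrored together with the pixels, otherwise the defect annotations would no longer match the flipped insulator image.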
1688 images are used for training and 422 images for testing. The detection model obtained in this embodiment is compared with the classic algorithms SSD and RetinaNet, and the per-class Average Precision (AP) and the mean Average Precision (mAP) are recorded, as shown in Table 3.
Table 3 Comparison of algorithm performance
As can be seen from the data in Table 3, the VGG16-based SSD network has low accuracy; the classic RetinaNet network improves the detection accuracy only slightly; and the method of the present invention improves on the RetinaNet network by a further 2.16%, thereby obtaining the highest detection accuracy.
To verify the effectiveness of each improvement, each improved part of the method of the invention is compared with the original algorithm, as shown in Table 4.
Table 4 Comparison of the improved parts
As can be seen from the data in Table 4, replacing the backbone network improves the mAP by 1.83%, because ResNest learns to focus on important features and suppress irrelevant ones and, compared with ResNet, extracts more effective features without increasing the number of parameters. After the improved FPN structure is used on the basis of ResNest, the mAP increases by 0.13%, which shows that the improved FPN improves the expressive ability of the network through information interaction across feature layers. When the improved FPN structure and the receptive field self-adaptive selection module are combined on the basis of ResNest, the mAP improves by 0.2%, which verifies the excellent performance of the adaptive receptive field selection technique on the multi-scale target detection problem.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.