CN113469992B

CN113469992B - Power equipment image defect detection method based on enhancement of different-level feature representation

Info

Publication number: CN113469992B
Application number: CN202110804090.XA
Authority: CN
Inventors: 徐中满; 刘术娟
Original assignee: Hefei Zhongke Rongdao Intelligent Technology Co ltd
Current assignee: Hefei Zhongke Rongdao Intelligent Technology Co ltd
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2022-02-18
Anticipated expiration: 2041-07-16
Also published as: CN113469992A

Abstract

The invention relates to a method for detecting defects in power equipment images based on enhancing feature representations at different levels. It addresses the low defect detection rate in the prior art that is caused by large differences in the proportion of the image occupied by the defect area. The invention comprises the following steps: acquiring training data; constructing a power equipment image defect detection network; training the power equipment image defect detection network; acquiring an image of the power equipment to be detected; and obtaining the image defect detection result for the power equipment. The method fully considers the enhancement of different-level feature representations in power equipment images: the classification features are enhanced using the positioning features, and the positioning features are enhanced using the classification features, so that high-level features beneficial to classification and low-level features beneficial to positioning are combined automatically. Feature enhancement exploits the correlation among the features of different levels, which improves the defect detection rate and accuracy for power equipment images.

Description

Power equipment image defect detection method based on enhancement of different-level feature representation

Technical Field

The invention relates to the technical field of electrical equipment informatization, in particular to an electrical equipment image defect detection method based on enhancement of different-level feature representation.

Background

In substation defect image detection, images captured in real substations are affected by various environmental factors, so detection accuracy is low. An effective approach to this problem is to collect a large amount of image data of substation defects in different states and then improve detection accuracy by training a deep-learning-based detection model until it meets the requirements of practical application.

However, during collection the same scene may contain both far shots and near shots of a defect. As shown in fig. 2a and 2b, a gauge damage defect occupies a small part of the far-view image and a large part of the near-view image; as further shown in fig. 3a and 3b, a bird's nest defect is likewise small in the far-view image but large in the near-view image. The proportion of the image occupied by a power equipment defect therefore varies widely: detection relies more on classification information when the defect appears large and more on positioning information when it appears small.

Therefore, how to handle these cases effectively so as to improve the defect detection rate for power equipment images has become an urgent technical problem.

Disclosure of Invention

The invention aims to overcome the low defect detection rate for power equipment images in the prior art, which is caused by large differences in the proportion of the image occupied by the defect area, and provides a power equipment image defect detection method based on enhancing different-level feature representations to solve this problem.

In order to achieve the purpose, the technical scheme of the invention is as follows:

a power equipment image defect detection method based on enhancement of different-level feature representation comprises the following steps:

acquisition of training data: acquiring and preprocessing the electric power equipment image with the marked defects to form a training data set;

constructing an image defect detection network of the power equipment: constructing an image defect detection network of the power equipment, which consists of different level feature representation networks, different level feature enhancement representation networks and a defect detection network;

training of the power equipment image defect detection network: inputting a training data set into an image defect detection network of the power equipment for training;

acquiring an image of the to-be-detected power equipment: acquiring an image of the electric power equipment to be detected, and preprocessing the image;

obtaining an image defect detection result of the power equipment: inputting the preprocessed image of the power equipment to be detected into the trained image defect detection network of the power equipment, and detecting a defect result in the image of the power equipment.

The construction of the power equipment image defect detection network comprises the following steps:

constructing different hierarchical feature representation networks based on a shallow convolutional neural network and a deep convolutional neural network, wherein the different hierarchical feature representation networks comprise convolution operation, pooling operation, activation operation and batch normalization operation;

the convolutions of the shallow convolutional neural network and the deep convolutional neural network apply a preset initial convolution kernel to the action domain, multiplying element-wise and then summing to obtain the convolution result; the pooling operation slides a preset pooling kernel step by step over the feature map and takes either the maximum value or the average value within the action domain as the pooling value;

constructing different hierarchical feature enhanced representation networks based on the feature pyramid network, and transmitting semantic information in high-level features to low-level features from top to bottom by adopting a laterally connected hierarchical structure;

the feature pyramid network is set to include two parts: a bottom-up different-level feature representation network, and a top-down training process in which adjacent layers are interpolated and added; feature enhancement is set on the basis of the features of each level;

constructing a defect detection network: the defect detection network comprises 2 branches, a classification branch of 1 × C convolution uses softmax loss, a positioning branch of 1 × 4 convolution uses smoothL1 loss, and final detection is carried out according to the enhanced characteristics of each hierarchy.

The training of the power equipment image defect detection network comprises the following steps:

inputting a picture X: W × H from the training data set into the different-level feature representation network of the power equipment image defect detection network, wherein W represents the width of the picture and H represents the height of the picture;

setting a first unit of the different-level feature representation network: stride 2, convolution kernel size 7 × 7, 64 channels; batch normalization (BN) and the nonlinear activation function ReLU are added after each convolution, and the output is denoted s1: W/2 × H/2 × 256;

setting a second unit of the different-level feature representation network: stride 2, kernel size 3 × 3, pooling by the max pooling operation; 3 convolution blocks are then stacked, each containing a 1 × 1 convolution with 64 channels, a 3 × 3 convolution with 64 channels and a 1 × 1 convolution with 256 channels; the output, which is the sum of the input and output of the convolution block, is denoted s2: W/4 × H/4 × 256;

setting a third unit of the different-level feature representation network: stacking 4 convolution blocks, each containing a 1 × 1 convolution with 128 channels, a 3 × 3 convolution with 128 channels and a 1 × 1 convolution with 512 channels; the output, which is the sum of the input and output of the convolution block, is denoted s3: W/8 × H/8 × 256;

setting a fourth unit of the different-level feature representation network: stacking 23 convolution blocks, each containing a 1 × 1 convolution with 256 channels, a 3 × 3 convolution with 256 channels and a 1 × 1 convolution with 1024 channels; the output, which is the sum of the input and output of the convolution block, is denoted s4: W/16 × H/16 × 256;

setting a fifth unit of the different-level feature representation network: stacking 3 convolution blocks, each containing a 1 × 1 convolution with 512 channels, a 3 × 3 convolution with 512 channels and a 1 × 1 convolution with 2048 channels; the output, which is the sum of the input and output of the convolution block, is denoted s5: W/32 × H/32 × 256;
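The spatial sizes given for the five units follow from halving the resolution at each unit. A minimal Python sketch tabulates them for a given input size; the helper name `feature_map_sizes` is hypothetical, and floor division for sizes that do not divide evenly is an assumption, since the text only states W/2 through W/32.

```python
def feature_map_sizes(width, height, levels=5):
    """Return {name: (w, h)} for s1..s5, where level i is downsampled by 2**i.

    Floor division is an assumption; the text only gives W/2 ... W/32.
    """
    sizes = {}
    for i in range(1, levels + 1):
        stride = 2 ** i  # cumulative stride: 2, 4, 8, 16, 32
        sizes[f"s{i}"] = (width // stride, height // stride)
    return sizes


if __name__ == "__main__":
    for name, (w, h) in feature_map_sizes(1024, 768).items():
        print(name, w, h)
```

For a 1024 × 768 picture this gives 512 × 384 for s1 down to 32 × 24 for s5.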

obtaining feature maps of 5 levels, s1, s2, s3, s4 and s5, from the different-level feature representation network and inputting them into the different-level feature enhancement representation network; the feature maps, which have different channel counts, are each channel-normalized by a 1 × 256 convolution; starting from the highest level, each feature map is upsampled and added to the next lower level, layer by layer, to obtain the feature map of the current level; a 3 × 3 convolution then removes the aliasing effect to give the output of the feature pyramid network, fusing semantic information into the low-level features; the different-level feature representation enhancement network is trained as follows:

for feature map s1: W1 × H1 × 256, wherein W1 = W/2 represents the width of feature map s1 and H1 = H/2 represents its height, global averaging is performed at the channel level, that is, all channels of s1 are added directly and averaged; feature maps s2, s3, s4 and s5 are likewise globally averaged, and the processing formula is as follows:

si^average＝(1/num)*Σ si_channel,channel∈[1,num],i∈[1,5]

where num represents the number of channels in each layer's feature map, here 256, and si^average represents the result of averaging the 256 channel feature maps si_channel, channel∈[1,256], i∈[1,5];

the channel-averaged feature map of each layer is upsampled by a factor of 2 to obtain si^a, i∈[1,5], the first step of enhancing the different-level feature representation:

si^a＝si^average+2*upsampling(s(i+1)^average),i＝1,2,3,4

s5^a＝s5^average

the sizes of si^a, i＝1,2,3,4,5, are normalized to the size of s3^a to obtain the intermediate operands si^b, i＝1,2,3,4,5, and the second step of feature enhancement is then performed on the feature-enhanced channel-averaged feature map of each layer,

that is, the different-level feature representation after the second step of feature enhancement is

wherein sj^c*si^c is a matrix

element of the intermediate operation after the softmax normalization operation; sj^d is thus the feature after the second enhancement, that is, the fused enhanced features are recombined according to the relationships between the features, so that not only do the low levels contain high-level features but the high levels also contain low-level features, and semantic information and position information are fully fused according to the relationships between the features;

for the 5 levels of features sj^d, j∈{1,2,3,4,5}, each level undergoes a 1 × 256 convolution to obtain the 256-dimensional channel features sj^e finally used for detection:

The 1 × C convolution classification branch of the defect detection network uses softmax loss and the 1 × 4 convolution positioning branch uses smoothL1 loss; the enhanced features sj^e, j∈{1,2,3,4,5}, are input into the defect detection network and back propagation is carried out.

The obtaining of the image defect detection result of the power equipment comprises the following steps:

inputting the preprocessed image Y: W × H of the power equipment to be detected into the trained different-level feature representation network, wherein W represents the width of the picture and H represents the height of the picture, to obtain s1, s2, s3, s4 and s5 respectively;

5 levels of feature maps are thus output by the different-level feature representation network,

wherein for s1: W1 × H1 × 256, where W1 = W/2 represents the width of feature map s1 and H1 = H/2 represents its height, global averaging is performed at the channel level, that is, all channels of s1 are added directly and averaged; feature maps s2, s3, s4 and s5 are likewise globally averaged:

si^average＝(1/num)*Σ si_channel,channel∈[1,num],i∈[1,5]

where num represents the number of channels in each layer's feature map, here 256, and si^average represents the result of averaging the 256 channel feature maps si_channel, channel∈[1,256], i∈[1,5].
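The channel-level global averaging described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `channel_average` and the nested-list layout `[row][col][channel]` are assumptions, not part of the method as stated.

```python
def channel_average(feature_map):
    """Average an H x W x C feature map over its C channels.

    feature_map is a nested list indexed [row][col][channel]; the result is
    an H x W map, corresponding to si^average in the text's notation.
    """
    return [[sum(pixel) / len(pixel) for pixel in row] for row in feature_map]


if __name__ == "__main__":
    # A 2 x 2 map with 4 channels (the text uses num = 256 channels).
    s = [[[1, 2, 3, 4], [0, 0, 0, 0]],
         [[2, 2, 2, 2], [4, 0, 4, 0]]]
    print(channel_average(s))
```

Each position keeps its spatial location while the channels collapse to a single value, so a W1 × H1 × 256 map becomes a W1 × H1 map.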

The channel-averaged feature map of each layer is then upsampled by a factor of 2 to obtain si^a, i∈[1,5], the first step of enhancing the different-level feature representation,

si^a＝si^average+2*upsampling(s(i+1)^average),i＝1,2,3,4

s5^a＝s5^average
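A minimal sketch of this first enhancement step follows. It reads "2*upsampling" as upsampling by a factor of 2 (matching the wording "2 times of upsampling"), not as multiplication by 2. Nearest-neighbour interpolation and the function names are assumptions, since the text does not specify the interpolation method.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (interpolation is assumed)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def first_step_enhance(averages):
    """averages: [s1_avg, ..., s5_avg], each level half the size of the previous.

    Returns [s1_a, ..., s5_a] with si_a = si_avg + upsample2x(s(i+1)_avg)
    for all but the last level, and the last level copied unchanged.
    """
    enhanced = []
    for i, current in enumerate(averages[:-1]):
        up = upsample2x(averages[i + 1])
        enhanced.append([[a + b for a, b in zip(r1, r2)]
                         for r1, r2 in zip(current, up)])
    enhanced.append([row[:] for row in averages[-1]])  # top level is unchanged
    return enhanced


if __name__ == "__main__":
    print(first_step_enhance([[[1, 1], [1, 1]], [[5]]]))
```

As in the formula, each level adds the upsampled average of the level directly above it rather than that level's already-enhanced map.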

then the sizes of si^a, i＝1,2,3,4,5, are normalized to the size of s3^a to obtain the intermediate operands si^b, i＝1,2,3,4,5, and the second step of feature enhancement is performed on the feature-enhanced channel-averaged feature map of each layer,

that is, the different-level feature representation after the second step of feature enhancement is

wherein sj^c*si^c is a matrix

element of the intermediate operation after the softmax normalization operation; sj^d is thus the feature after the second enhancement, that is, the fused enhanced features are recombined according to the relationships between the features;
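The size normalization that precedes the second step can be sketched as follows. The text does not say how the maps are resized to the size of s3^a, so nearest-neighbour sampling is an assumption, and `resize_nearest` and `normalize_to_reference` are hypothetical names.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a 2-D map to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize_to_reference(maps, ref_index=2):
    """Resize every map to the size of maps[ref_index] (s3 when five levels
    are passed in order), giving the same-sized maps si^b."""
    ref_h, ref_w = len(maps[ref_index]), len(maps[ref_index][0])
    return [resize_nearest(m, ref_h, ref_w) for m in maps]


if __name__ == "__main__":
    big = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(normalize_to_reference([big, [[1, 2], [3, 4]], [[9]]], ref_index=1))
```

Larger maps (s1, s2) are downsampled and smaller maps (s4, s5) are upsampled, so all five levels can be compared position by position.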

for the 5 levels of features sj^d, j∈{1,2,3,4,5}, each level undergoes a 1 × 256 convolution to obtain the 256-dimensional channel features finally used for target detection:

for the enhanced features sj^e, j∈{1,2,3,4,5}, the detection result is obtained directly using the trained defect detection network.

Advantageous effects

Compared with the prior art, the method for detecting power equipment image defects based on enhancing different-level feature representations fully considers the enhancement of different-level feature representations in power equipment images. It enhances the classification features using the positioning features and the positioning features using the classification features, and automatically combines the high-level features beneficial to classification with the low-level features beneficial to positioning. Feature enhancement exploits the correlation among the features of different levels: the low-level positioning features are blended into the high-level classification features, and the high-level classification features are blended into the low-level positioning features, so that the features best suited to both tasks, classification and positioning, are found, improving the defect detection rate and accuracy for power equipment images.

Drawings

FIG. 1 is a sequence diagram of the method of the present invention;

FIG. 2a is a far view of a gauge breakage defect in the prior art;

FIG. 2b is a close view of a gauge breakage defect in the prior art;

FIG. 3a is a far view of a bird's nest defect in the prior art;

FIG. 3b is a close view of a bird's nest defect in the prior art.

Detailed Description

So that the above features of the present invention can be readily understood, a more particular description of the invention, briefly summarized above, is given by reference to embodiments, some of which are illustrated in the appended drawings, wherein:

as shown in fig. 1, the method for detecting image defects of electrical equipment based on enhancing different hierarchical feature representations according to the present invention includes the following steps:

First step, acquiring training data: power equipment images with marked defects are acquired and preprocessed to form a training data set.

Second step, constructing the power equipment image defect detection network: a power equipment image defect detection network is constructed, consisting of a different-level feature representation network, a different-level feature enhancement representation network and a defect detection network. The specific steps are as follows:

(1) constructing different hierarchical feature representation networks based on a shallow convolutional neural network and a deep convolutional neural network, wherein the different hierarchical feature representation networks comprise convolution operation, pooling operation, activation operation and batch normalization operation.

The convolutions of the shallow convolutional neural network and the deep convolutional neural network apply a preset initial convolution kernel to the action domain, multiplying element-wise and then summing to obtain the convolution result; the pooling operation slides a preset pooling kernel step by step over the feature map and takes either the maximum value or the average value within the action domain as the pooling value.

The different-level feature representation network refers to a convolutional neural network. Whether shallow or deep, it mainly comprises convolution, pooling, activation and batch normalization operations. The convolution applies a preset initial convolution kernel to the action domain, multiplying element-wise and then summing to obtain the convolution result; its purpose is to extract features, and it also enlarges the receptive field. The pooling operation slides a preset pooling kernel step by step over the feature map and takes the maximum value or the average value within the action domain as the pooling value; this reduces feature redundancy and the amount of computation, and also enlarges the receptive field. The activation operation is needed because convolution and pooling are linear operations, so a nonlinear factor must be introduced to improve the approximation capability of the network. The batch normalization operation mainly serves to speed up training.
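The convolution, pooling and activation operations described above can be illustrated with a small pure-Python sketch. It uses the standard CNN convolution, which multiplies kernel and window element-wise before summing. The function names, the absence of padding and the single-channel inputs are simplifying assumptions for illustration only.

```python
def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution (cross-correlation, as in CNNs): multiply the
    kernel with each window element-wise and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def pool2d(image, size=2, stride=2, mode="max"):
    """Slide a size x size window over the map and keep its max or its mean."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out


def relu(image):
    """Element-wise ReLU, the nonlinearity applied after convolution."""
    return [[max(0, v) for v in row] for row in image]


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(conv2d(img, [[1, 0], [0, 1]]))
    print(pool2d(img, mode="max"), pool2d(img, mode="avg"))
```

Both pooling modes shrink a 4 × 4 map to 2 × 2, which is the reduction in redundancy and computation that the paragraph refers to.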

(2) Constructing the different-level feature enhancement representation network based on the feature pyramid network, which uses a laterally connected hierarchical structure to pass semantic information in the high-level features down to the low-level features from top to bottom.

The feature pyramid network is set to include two parts: a bottom-up different-level feature representation network, and a top-down training process in which adjacent layers are interpolated and added; feature enhancement is set on the basis of the features of each level.

The traditional different-level feature enhancement representation network is typified by the feature pyramid network, which uses a laterally connected hierarchical structure to pass semantic information in the high-level features down to the low-level features. The feature pyramid structure mainly includes two parts: the bottom-up different-level feature representation network, and the top-down process in which adjacent layers are interpolated and added. However, the enhancement in the feature pyramid network comes mainly from the top-down interpolation and addition of adjacent layers, which blends semantic information into the low-level features but does not blend spatial information into the high-level features. The invention therefore performs feature enhancement on the basis of the features of each level, using the correlation among them to recombine the features of each level, so that the low levels contain high-level semantic information and the high levels also contain low-level position information.

(3) Constructing a defect detection network: the defect detection network comprises 2 branches, a classification branch of 1 × C convolution uses softmax loss, a positioning branch of 1 × 4 convolution uses smoothL1 loss, and final detection is carried out according to the enhanced characteristics of each hierarchy.

In practical applications, the defect detection network may be a one-stage detection network or a two-stage detection network. A one-stage detection network performs final detection directly on the enhanced features of each level. A two-stage detection network first performs preliminary detection on the enhanced features of each level and then performs final detection by mapping the preliminary targets back onto the enhanced features of each level.
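The two losses named for the detection branches can be written out as a short sketch. The softmax loss is taken here to be cross-entropy over softmax probabilities, and smooth L1 uses the common threshold `beta = 1.0`. Both choices, and the function names, are assumptions, since the text names the losses without giving their formulas.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, target):
    """Cross-entropy of the softmax probabilities against the true class index."""
    return -math.log(softmax(logits)[target])


def smooth_l1_loss(pred_box, true_box, beta=1.0):
    """Smooth L1 summed over the 4 box coordinates (beta=1.0 is an assumption)."""
    loss = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        loss += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return loss


if __name__ == "__main__":
    print(softmax_loss([2.0, 0.5, 0.1], target=0))
    print(smooth_l1_loss([0, 0, 0, 0], [0.5, 2, 0, 0]))
```

The classification branch would apply `softmax_loss` to its C class scores and the positioning branch `smooth_l1_loss` to its 4 box outputs.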

Third step, training the power equipment image defect detection network: the training data set is input into the power equipment image defect detection network for training. The specific steps are as follows:

(1) inputting a picture X: W × H from the training data set into the different-level feature representation network of the power equipment image defect detection network;

A1) setting a first unit of the different-level feature representation network: stride 2, convolution kernel size 7 × 7, 64 channels; batch normalization (BN) and the nonlinear activation function ReLU are added after each convolution, and the output is denoted s1: W/2 × H/2 × 256;

A2) setting a second unit of the different-level feature representation network: stride 2, kernel size 3 × 3, pooling by the max pooling operation; 3 convolution blocks are then stacked, each containing a 1 × 1 convolution with 64 channels, a 3 × 3 convolution with 64 channels and a 1 × 1 convolution with 256 channels; the output, which is the sum of the input and output of the convolution block, is denoted s2: W/4 × H/4 × 256;

A3) setting a third unit of the different-level feature representation network: stacking 4 convolution blocks, each containing a 1 × 1 convolution with 128 channels, a 3 × 3 convolution with 128 channels and a 1 × 1 convolution with 512 channels; the output, which is the sum of the input and output of the convolution block, is denoted s3: W/8 × H/8 × 256;

A4) setting a fourth unit of the different-level feature representation network: stacking 23 convolution blocks, each containing a 1 × 1 convolution with 256 channels, a 3 × 3 convolution with 256 channels and a 1 × 1 convolution with 1024 channels; the output, which is the sum of the input and output of the convolution block, is denoted s4: W/16 × H/16 × 256;

A5) setting a fifth unit of the different-level feature representation network: stacking 3 convolution blocks, each containing a 1 × 1 convolution with 512 channels, a 3 × 3 convolution with 512 channels and a 1 × 1 convolution with 2048 channels; the output, which is the sum of the input and output of the convolution block, is denoted s5: W/32 × H/32 × 256.
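The block counts and channel widths of units two to five can be collected in a small table and tallied. The sketch below is only bookkeeping over the numbers stated above; the names `UNITS`, `count_conv_layers` and `output_channels` are hypothetical. The total of 100 convolutions coincides with the convolutional part of the widely used ResNet-101 layout, but the text does not name that architecture, so this is an observation rather than a statement of the method.

```python
# (number of stacked blocks, channels of the 1x1, 3x3 and 1x1 convolutions)
UNITS = {
    "s2": (3, (64, 64, 256)),
    "s3": (4, (128, 128, 512)),
    "s4": (23, (256, 256, 1024)),
    "s5": (3, (512, 512, 2048)),
}


def count_conv_layers(units):
    """Count convolutions: the 7x7 first-unit convolution plus three per block."""
    return 1 + sum(blocks * len(channels) for blocks, channels in units.values())


def output_channels(units):
    """Channel count leaving each unit (the last 1x1 convolution of its blocks)."""
    return {name: channels[-1] for name, (blocks, channels) in units.items()}


if __name__ == "__main__":
    print(count_conv_layers(UNITS))
    print(output_channels(UNITS))
```

The differing channel counts leaving the blocks (256, 512, 1024, 2048) are what the subsequent 1 × 256 convolution normalizes to 256 channels.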

(2) Feature maps of 5 levels, s1, s2, s3, s4 and s5, are obtained from the different-level feature representation network and input into the different-level feature enhancement representation network. The feature maps, which have different channel counts, are each channel-normalized by a 1 × 256 convolution; starting from the highest level, each feature map is upsampled and added to the next lower level, layer by layer, to obtain the feature map of the current level; a 3 × 3 convolution then removes the aliasing effect to give the output of the feature pyramid network, fusing semantic information into the low-level features. Because upsampling and addition proceed layer by layer from high to low, semantic information is merged into the low-level features, but the spatial information of the low levels is not merged into the high-level features. The different-level feature representation enhancement network is therefore trained as follows:

B1) for feature map s1: W1 × H1 × 256, where W1 = W/2 and H1 = H/2, global averaging is performed at the channel level, that is, all channels of s1 are added directly and averaged; feature maps s2, s3, s4 and s5 are likewise globally averaged, and the processing formula is as follows:

si^average＝(1/num)*Σ si_channel,channel∈[1,num],i∈[1,5],num＝256

B2) the channel-averaged feature map of each layer is upsampled by a factor of 2 and added top-down to perform the first step of enhancing the different-level feature representation, specifically:

si^a＝si^average+2*upsampling(s(i+1)^average),i＝1,2,3,4

s5^a＝s5^average

B3) the sizes of si^a, i＝1,2,3,4,5, are normalized to the size of s3^a to obtain si^b, i＝1,2,3,4,5, and the second step of feature enhancement is then performed on the feature-enhanced channel-averaged feature map of each layer,

that is, the feature after the second step of feature enhancement is

wherein sj^c*si^c is a matrix

element after the softmax normalization operation; sj^d is thus the enhanced feature, that is, the fused enhanced features are recombined according to the relationships between the features, so that not only do the low levels contain high-level features but the high levels also contain low-level features, and semantic information and position information are fully fused according to the relationships between the features;
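The formula for the second enhancement step is not reproduced in the text, so the sketch below shows only one possible reading of "recombining features according to their relationships after softmax normalization". It is not the patented formula. It assumes that the correlation between two levels is the dot product of their flattened, same-sized maps, that the weights are a softmax over those correlations, and that each output is the weighted sum of all levels. All names are hypothetical.

```python
import math


def flatten(fmap):
    """Flatten a 2-D map into a single list of values."""
    return [v for row in fmap for v in row]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recombine(levels):
    """levels: same-sized 2-D maps (the si^b). Returns hypothetical sj^d maps.

    ASSUMPTION: correlation = dot product of flattened maps; weights = softmax
    over i; sj^d = sum_i weight[j][i] * si^b. None of this is given in the text.
    """
    flat = [flatten(m) for m in levels]
    h, w = len(levels[0]), len(levels[0][0])
    out = []
    for fj in flat:
        weights = softmax([sum(a * b for a, b in zip(fj, fi)) for fi in flat])
        mixed = [sum(wt * fi[k] for wt, fi in zip(weights, flat))
                 for k in range(h * w)]
        out.append([mixed[r * w:(r + 1) * w] for r in range(h)])
    return out


if __name__ == "__main__":
    print(recombine([[[1.0, 0.0]], [[0.0, 1.0]]]))
```

Under this reading every output level is a mixture of all input levels, which is consistent with the statement that low levels come to contain high-level features and high levels come to contain low-level features.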

(3) for the 5 levels of features sj^d, j∈{1,2,3,4,5}, each level undergoes a 1 × 256 convolution to obtain 256-dimensional channel features:

(4) the 1 × C convolution classification branch of the defect detection network uses softmax loss and the 1 × 4 convolution positioning branch uses smoothL1 loss; the enhanced features sj^e, j∈{1,2,3,4,5}, are input into the defect detection network and back propagation is carried out.

Fourth step, acquiring an image of the power equipment to be detected: an image of the power equipment to be detected is acquired and preprocessed.

Fifth step, obtaining the image defect detection result for the power equipment: the preprocessed image of the power equipment to be detected is input into the trained power equipment image defect detection network, and the defects in the power equipment image are detected. The specific steps are as follows:

(1) inputting the preprocessed image Y: W × H of the power equipment to be detected into the trained different-level feature representation network to obtain s1, s2, s3, s4 and s5 respectively;

(2) 5 levels of feature maps are thus output by the different-level feature representation network,

wherein for s1: W1 × H1 × 256, where W1 = W/2 and H1 = H/2, global averaging is performed at the channel level, that is, all channels of s1 are added directly and averaged; feature maps s2, s3, s4 and s5 are likewise globally averaged:

si^average＝(1/num)*Σ si_channel,channel∈[1,num],i∈[1,5],num＝256

the channel-averaged feature map of each layer is then upsampled by a factor of 2 and added top-down to perform the first step of enhancing the different-level feature representation,

si^a＝si^average+2*upsampling(s(i+1)^average),i＝1,2,3,4

s5^a＝s5^average

then the sizes of si^a, i＝1,2,3,4,5, are normalized to the size of s3^a to obtain si^b, i＝1,2,3,4,5, and the second step of feature enhancement is performed on the feature-enhanced channel-averaged feature map of each layer,

that is, the feature after the feature enhancement is

wherein sj^c*si^c is a matrix

element after the softmax normalization operation; sj^d is thus the enhanced feature, that is, the fused enhanced features are recombined according to the relationships between the features;

(4) for the enhanced features sj^e, j∈{1,2,3,4,5}, the detection result is obtained directly using the trained defect detection network.

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A method for detecting image defects of electric equipment based on enhancement of different-level feature representations is characterized by comprising the following steps:

11) acquisition of training data: acquiring and preprocessing the electric power equipment image with the marked defects to form a training data set;

12) constructing an image defect detection network of the power equipment: constructing an image defect detection network of the power equipment, which consists of different level feature representation networks, different level feature enhancement representation networks and a defect detection module;

13) training of the power equipment image defect detection network: inputting a training data set into an image defect detection network of the power equipment for training;

131) inputting pictures X, W and H in a training data set into different hierarchical feature representation networks of the power equipment image defect detection network, wherein W represents the width of the pictures and H represents the height of the pictures;

1311) setting a first unit of a different hierarchy feature representation network: step size is 2, convolution kernel size is 7 × 7, channel number is 64, batch normalization BN and nonlinear activation function Relu are added after each convolution, and output is recorded as s 1: w/2 × H/2 × 256;

1312) setting a second unit of the different-level feature representation network: step size 2, convolution kernel size 3 × 3, pooling using maximum pooling operation, then stacking 3 convolution blocks, each convolution block containing 1 × 1 convolution kernel of 64 channels, 3 × 3 convolution of 64 channels, 1 × 1 convolution of 256 channels, respectively, the output being the sum of the input and output of the convolution block denoted s 2: w/4 × H/4 × 256;

1313) setting a third unit of the different-level feature representation network: stacking 4 convolution blocks, each convolution block containing a1 × 1 convolution kernel of 128 channels, a3 × 3 convolution of 128 channels, and a1 × 1 convolution of 512 channels, respectively, the sum of the input and output of which is a convolution block is denoted as s 3: w/8 × H/8 × 256;

1314) setting a fourth unit of the different-level feature representation network: stacking 23 convolution blocks, each convolution block containing 1 × 1 convolution kernel of 256 channels, 3 × 3 convolution of 256 channels, 1 × 1 convolution of 1024 channels, respectively, and the sum of the input and output of which is a convolution block is denoted as s 4: w/16 × H/16 × 256;

1315) setting a fifth unit of the different-level feature representation network: stacking 3 convolution blocks, each convolution block containing a 1 × 1 convolution of 512 channels, a 3 × 3 convolution of 512 channels and a 1 × 1 convolution of 2048 channels; the output of each convolution block is the sum of its input and its convolution output, and the output of the unit is denoted s5: W/32 × H/32 × 256;
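As an illustration only, the spatial sizes of s1 to s5 recited above can be tabulated with a short Python sketch. The function name `backbone_feature_sizes`, the dictionary `BLOCKS_PER_UNIT` and the 512 × 384 input are hypothetical; the sketch assumes that W and H are divisible by 32 and that each unit halves the resolution exactly. The block counts 3, 4, 23 and 3 with these channel widths appear to coincide with the widely used ResNet-101 layout, although the claim does not name it.

```python
def backbone_feature_sizes(width, height, num_units=5):
    """Return the (width, height) of s1..s5, assuming each unit halves the resolution."""
    sizes = []
    for level in range(1, num_units + 1):
        stride = 2 ** level  # total downsampling factor after `level` units
        sizes.append((width // stride, height // stride))
    return sizes


# Number of stacked convolution blocks recited for units 2 to 5.
BLOCKS_PER_UNIT = {2: 3, 3: 4, 4: 23, 5: 3}

# Hypothetical 512 x 384 input picture.
sizes = backbone_feature_sizes(512, 384)
for level, (w, h) in enumerate(sizes, start=1):
    print(f"s{level}: {w} x {h}")
```

Under these assumptions the five levels shrink from W/2 × H/2 down to W/32 × H/32, which is the size relationship the later up-sampling steps rely on.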

132) obtaining feature maps of 5 levels, s1, s2, s3, s4 and s5, through the different-level feature representation network, and inputting the feature maps of the 5 levels into the different-level feature enhancement representation network; performing channel normalization on the feature maps, which have different numbers of channels, through a 1 × 256 convolution; starting from the highest level, up-sampling each feature map layer by layer and adding it to the next lower level to obtain the feature map of the current level; applying a 3 × 3 convolution to remove the aliasing effect and obtain the output of the feature pyramid network, so that semantic information is fused into the low-level features; wherein the different-level feature representation enhancement network is trained as follows:

1321) for the feature map s1: W1 × H1 × 256, wherein W1 = W/2 and H1 = H/2 represent the width and the height of the feature map s1, performing global average processing at the channel level, that is, directly adding all channels of s1 and averaging them; the feature maps s2, s3, s4 and s5 are likewise subjected to global average processing, and the processing formula is as follows:
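A minimal sketch of the channel-level averaging described in this step, using NumPy. The function name `channel_average`, the channels-last array layout and the small 2 × 3 × 4 test array are assumptions made for illustration, not part of the claim.

```python
import numpy as np


def channel_average(feature_map):
    """Average a channels-last (rows, cols, C) feature map over its channel axis."""
    return feature_map.mean(axis=-1)


# Hypothetical feature map with 4 channels instead of 256, to keep the example small.
s1 = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
s1_average = channel_average(s1)  # shape (2, 3): one averaged value per position
```

Each spatial position keeps a single value, the mean of its channel vector, so a W1 × H1 × 256 map becomes a W1 × H1 map.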

1322) performing 2-times up-sampling on the channel-averaged feature map of each layer to obtain si^a, i ∈ [1,5], which represents the first step of enhancing the different-level feature representation:

si^a = si^average + 2*upsampling(s(i+1)^average), i = 1,2,3,4

s5^a = s5^average
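The first enhancement step can be sketched as follows, based on the formula recited in claim 3, si^a = si^average + 2*upsampling(s(i+1)^average) for i = 1 to 4 and s5^a = s5^average. The sketch reads "2*upsampling" as up-sampling by a factor of 2, which is an interpretation, and uses nearest-neighbour interpolation, which the claim does not specify. The names `upsample2x` and `first_enhancement` are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a 2-D map (assumed interpolation mode)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def first_enhancement(averages):
    """averages: [s1_avg, ..., s5_avg], each level half the size of the previous one."""
    enhanced = []
    last = len(averages) - 1
    for i, s in enumerate(averages):
        if i == last:
            enhanced.append(s.copy())  # s5^a = s5^average
        else:
            # si^a = si^average + up-sampled s(i+1)^average
            enhanced.append(s + upsample2x(averages[i + 1]))
    return enhanced


# Hypothetical constant maps of sizes 32, 16, 8, 4, 2 with values 1..5.
averages = [np.full((32 // 2**k, 32 // 2**k), float(k + 1)) for k in range(5)]
enhanced = first_enhancement(averages)
```

Note that each level is combined with the plain average of the next level, not with its already enhanced version, which is how the recited formula is written.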

1323) normalizing the size of si^a, i = 1,2,3,4,5, to the size of s3^a to obtain the intermediate features si^b, i = 1,2,3,4,5, and then carrying out the second step of feature enhancement on the averaged, feature-enhanced channel feature map of each layer,

wherein sj^c*si^c is a matrix
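Only the size normalization part of step 1323) is sketched here, since the formula of the second enhancement step is not reproduced in this text. The sketch assumes nearest-neighbour resampling for both enlarging and shrinking, which the claim does not specify; `resize_nearest` and `normalize_to_third_level` are hypothetical names.

```python
import numpy as np


def resize_nearest(x, out_shape):
    """Resize a 2-D map to out_shape by nearest-neighbour index lookup (assumed method)."""
    rows = np.arange(out_shape[0]) * x.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * x.shape[1] // out_shape[1]
    return x[np.ix_(rows, cols)]


def normalize_to_third_level(enhanced):
    """Bring every si^a to the spatial size of s3^a, giving the intermediate si^b."""
    target = enhanced[2].shape
    return [resize_nearest(s, target) for s in enhanced]


# Hypothetical enhanced maps of sizes 64, 32, 16, 8, 4.
rng = np.random.default_rng(0)
enhanced = [rng.random((64 // 2**k, 64 // 2**k)) for k in range(5)]
normalized = normalize_to_third_level(enhanced)
```

After this step all five maps share one spatial size, so they can be combined element by element or by matrix operations in the subsequent enhancement.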

133) for the 5 layers of features sj^d, j ∈ {1,2,3,4,5}, each layer performs a 1 × 256 convolution to obtain the 256-dimensional channel features sj^e finally used for detection:
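A hedged sketch of the channel projection in this step, reading the "1 × 256 convolution" as a convolution with a 1 × 1 kernel and 256 output channels, which is an interpretation of the recited wording. The input channel count of 16, the random weights and the name `pointwise_conv` are hypothetical.

```python
import numpy as np


def pointwise_conv(feature_map, weight, bias):
    """Apply a 1x1 convolution: the same linear map at every spatial position.

    feature_map: (rows, cols, C_in), weight: (C_in, C_out), bias: (C_out,)
    """
    return feature_map @ weight + bias


rng = np.random.default_rng(0)
s_d = rng.standard_normal((8, 8, 16))           # hypothetical enhanced feature sj^d
weight = rng.standard_normal((16, 256)) * 0.01  # hypothetical learned weights
bias = np.zeros(256)
s_e = pointwise_conv(s_d, weight, bias)         # 256 channels per position
```

Because the kernel covers a single position, the operation changes only the channel dimension and leaves the spatial size of each level untouched.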

134) the classification branch of the defect detection network, a 1 × C convolution, uses softmax loss, and the positioning branch, a 1 × 4 convolution, uses smooth L1 loss; the enhanced features sj^e, j ∈ {1,2,3,4,5}, are input into the defect detection network and back propagation is carried out;
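The two losses named in this step can be sketched in plain Python for a single prediction. The sketch uses the common definitions of softmax cross-entropy and of smooth L1 with its transition point at 1; the claim does not state the transition point or how the two losses are weighted, so those details are assumptions, and the function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target_class):
    """Softmax loss for one prediction: -log(softmax(logits)[target_class])."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target_class]


def smooth_l1(predicted_box, target_box):
    """Smooth L1 loss summed over the 4 box coordinates (transition point at 1)."""
    total = 0.0
    for p, t in zip(predicted_box, target_box):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


# Hypothetical example with C = 3 classes and one box.
cls_loss = softmax_cross_entropy([0.0, 0.0, 0.0], target_class=1)
box_loss = smooth_l1([0.5, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

The quadratic region keeps gradients small near a correct box, while the linear region limits the influence of badly mislocalized boxes during back propagation.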

14) acquiring an image of the to-be-detected power equipment: acquiring an image of the electric power equipment to be detected, and preprocessing the image;

15) obtaining an image defect detection result of the power equipment: inputting the preprocessed image of the power equipment to be detected into the trained image defect detection network of the power equipment, and detecting a defect result in the image of the power equipment.

2. The method for detecting the image defects of the electric power equipment based on the enhancement of the different-level feature representations according to claim 1, wherein the construction of the network for detecting the image defects of the electric power equipment comprises the following steps:

21) constructing different hierarchical feature representation networks based on a shallow convolutional neural network and a deep convolutional neural network, wherein the different hierarchical feature representation networks comprise convolution operation, pooling operation, activation operation and batch normalization operation;

22) constructing different hierarchical feature enhanced representation networks based on the feature pyramid network, and transmitting semantic information in high-level features to low-level features from top to bottom by adopting a laterally connected hierarchical structure;

23) constructing a defect detection network: the defect detection network comprises 2 branches, a classification branch of 1 × C convolution uses softmax loss, a positioning branch of 1 × 4 convolution uses smoothL1 loss, and final detection is carried out according to the enhanced characteristics of each hierarchy.

3. The method for detecting the image defect of the electric power equipment based on the enhancement of the different-level feature representation according to claim 1, wherein the obtaining of the image defect detection result of the electric power equipment comprises the following steps:

31) inputting the preprocessed image Y, of size W × H, of the electric power equipment to be detected into the trained different-level feature representation network, wherein W represents the width of the picture and H represents the height of the picture, to obtain s1, s2, s3, s4 and s5 respectively;

32) the feature maps of the 5 levels output by the different-level feature representation network are processed as follows,

si^a = si^average + 2*upsampling(s(i+1)^average), i = 1,2,3,4

s5^a = s5^average

wherein sj^c*si^c is a matrix

33) for the 5 layers of features sj^d, j ∈ {1,2,3,4,5}, each layer performs a 1 × 256 convolution to obtain the 256-dimensional channel features that are finally used for target detection:

34) for the enhanced features sj^e, j ∈ {1,2,3,4,5}, a detection result is obtained directly by using the trained defect detection network.