CN115909093A

CN115909093A - Power equipment fault detection method based on unmanned aerial vehicle inspection and infrared image semantic segmentation

Info

Publication number: CN115909093A
Application number: CN202211290849.8A
Authority: CN
Inventors: 李阳; 孙鹏; 刘涛; 曲文涛; 辛鸿远; 张书瑄; 宫铁光; 王犇
Original assignee: Qianguo Fuhui Wind Energy Co ltd
Current assignee: Qianguo Fuhui Wind Energy Co ltd
Priority date: 2022-10-21
Filing date: 2022-10-21
Publication date: 2023-04-04

Abstract

The invention relates to the technical field of power equipment fault detection and provides a power equipment fault detection method based on unmanned aerial vehicle inspection and infrared image semantic segmentation. The method comprises the following steps: the unmanned aerial vehicle inspects the whole power grid at a fixed height along a fixed flight path, a thermal infrared imager acquires infrared images of the power equipment by photographing and video recording, and the targets are labeled; the infrared images are preprocessed and enhanced to improve the quality of the data set and to prepare for improving the generalization capability of the subsequent semantic segmentation network; a Unet semantic segmentation algorithm is selected to segment the power equipment in the infrared images, the network is trained on the data set, and the hyper-parameters are adjusted until the total loss converges; the Unet network is improved to raise the identification precision; and cluster analysis is performed to judge, for each piece of power equipment, whether a heating fault has occurred.

Description

Power equipment fault detection method based on unmanned aerial vehicle inspection and infrared image semantic segmentation

Technical Field

The invention relates to the technical field of power equipment fault detection and provides a power equipment fault detection method based on unmanned aerial vehicle inspection and infrared image semantic segmentation.

Background

The power industry has always been an important industry supporting the development of the national economy in China. In a key period of rapid scientific and technological development, electricity is both an important driving force and the basis for the stable operation of society, and the nation and the people require power producers to supply high-quality electric energy. The transmission lines of the power grid are widely distributed and pass through complex and varied natural environments. Transmission lines and tower equipment are exposed to wind, sun and rain erosion for long periods and are prone to damage such as corrosion, wear and spontaneous insulator breakage, which poses a serious hidden danger to the safe and stable operation of the transmission lines. For the power system to operate safely and reliably, the power equipment must be inspected periodically. In traditional manual inspection, an inspector usually walks to a position near a power transmission line and then inspects the power equipment with instruments such as a telescope, a thermal infrared imager or a corona detection camera. However, the power grid is usually erected in mountainous areas with poor access and complex terrain, so manual inspection of transmission lines is slow, inefficient and labor-intensive, and the equipment is difficult to inspect in extreme weather. Inspectors sometimes have to climb the power towers to check the equipment, which easily leads to safety accidents.

Unmanned aerial vehicle inspection combined with image recognition is a real-time fault monitoring technology that has developed along with deep learning algorithms and photographic imaging technology. The unmanned aerial vehicle inspects the power grid automatically, has a wide viewing angle without visual blind areas, is safe and reliable, and achieves satisfactory fault recognition precision. The technology is widely applied to the operation and maintenance of systems with high reliability requirements, such as hydropower stations and large transformer substations. Inspecting the power equipment in real time with an unmanned aerial vehicle reduces the workload of field operation and maintenance personnel and effectively helps to guarantee the safe and reliable operation of the equipment.

Disclosure of Invention

An image semantic segmentation algorithm is the main means by which the unmanned aerial vehicle judges whether the power equipment has a fault. During inspection, the unmanned aerial vehicle carries a thermal infrared imager that records images and video of the power equipment. When a heating fault appears in the data, the semantic segmentation algorithm can locate the specific high-temperature target and its contour. The high-accuracy positioning system of the unmanned aerial vehicle provides the longitude and latitude coordinates of the fault point to the ground station, which facilitates the maintenance of the whole power grid and the rapid clearing of faults.

The technical scheme adopted by the invention is as follows:

S1, an unmanned aerial vehicle completes the inspection of the whole power grid at a fixed height along a fixed flight path, a thermal infrared imager acquires infrared image information of the power equipment by photographing and video recording, and the targets are labeled.

S2, the infrared images are preprocessed and enhanced to improve the quality of the data set and to prepare for improving the generalization capability of the subsequent semantic segmentation network.

S3, a Unet semantic segmentation algorithm is selected to segment the power equipment in the infrared images, the network is trained on the data set obtained in S2, and the hyper-parameters are adjusted until the total loss converges.

S4, the Unet network is improved to raise the identification precision.

S5, cluster analysis is performed to judge, for each piece of power equipment, whether a heating fault has occurred.

Further, the step S1 of acquiring data specifically includes the following steps:

S101, before the inspection, an inspection route and a flying height are designed for the unmanned aerial vehicle. Because the power grid contains many overhead wires and high-voltage towers, the flying height is chosen to keep the unmanned aerial vehicle as far as possible from the wires and towers so that it does not scrape the electrical equipment during inspection.

S102, the inspection route of the unmanned aerial vehicle is designed with a track planning algorithm, and the staff control the camera and the pan-tilt head so that, as far as possible, there are no dead angles or blind areas in the field of view.

S103, during data acquisition, the infrared image and the high-definition color image can be captured simultaneously. Because the infrared image has low resolution and poor contrast, the power equipment currently being photographed can be located using the longitude and latitude coordinates of the color image.

S104, the images are labeled with labelme software, different class names are given to different classes of power equipment, and finally a mask label map is generated for training the subsequent semantic segmentation network.

Further, step S2 of improving the quality of the data set specifically includes the following steps:

S201, preprocessing the image. Because the gray distribution of an infrared image is uneven, the image is first processed with a gray histogram equalization algorithm, which equalizes the gray histogram. In image identification, increasing the gray contrast highlights the important features of the image.

First consider the continuous case and let the variable r represent the gray level of the image to be enhanced. Let r be normalized to [0,1], with r = 0 for black and r = 1 for white.

For the continuous case, assume that the transformation function is:

$$s = T(r), \quad 0 \le r \le 1$$

In the original image, one gray value s corresponds to each r. The transformation function satisfies the following conditions:

1. T(r) is required to be single-valued and monotonically increasing on [0,1].

2. When 0 ≤ r ≤ 1, 0 ≤ T(r) ≤ 1, so that the output gray level is limited to the same range as the input gray level.

Rewriting the formula s = T(r) gives:

$$r = T^{-1}(s), \quad 0 \le s \le 1$$

Let $P_r(r)$ and $P_s(s)$ represent the probability density functions of the random variables r and s, respectively. A basic result from probability theory gives:

$$P_s(s) = P_r(r)\left|\frac{dr}{ds}\right|$$

The probability density function of the transformed variable s is thus determined by the gray-level probability density function of the input image and the selected transformation function.

Selecting the transformation function:

$$s = T(r) = \int_0^r P_r(w)\,dw$$

Further, there is:

$$\frac{ds}{dr} = \frac{dT(r)}{dr} = P_r(r)$$

Substituting the above into the expression for $P_s(s)$ gives:

$$P_s(s) = P_r(r)\left|\frac{1}{P_r(r)}\right| = 1, \quad 0 \le s \le 1$$
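The derivation above is for continuous gray levels. In the discrete case the integral becomes a cumulative sum of the normalized histogram. The following is a minimal sketch under that assumption; the function name `equalize_histogram`, the 256-level default and the synthetic test frame are illustrative choices and are not taken from the patent.

```python
import numpy as np

def equalize_histogram(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Discrete histogram equalization of an integer grayscale image."""
    # Histogram of gray levels, i.e. an estimate of P_r(r)
    hist = np.bincount(image.ravel(), minlength=levels)
    # Cumulative distribution: the discrete counterpart of T(r), in [0, 1]
    cdf = np.cumsum(hist) / image.size
    # Map s = T(r) back to the integer gray range as a lookup table
    lut = np.round(cdf * (levels - 1)).astype(image.dtype)
    return lut[image]

# Synthetic low-contrast frame standing in for an infrared image:
# all gray values are squeezed into the narrow band 100..131.
rng = np.random.default_rng(0)
frame = rng.integers(100, 132, size=(64, 64), dtype=np.uint8)
equalized = equalize_histogram(frame)
```

After equalization the gray values of the frame are spread over nearly the full range, which is the contrast-stretching effect that step S201 relies on.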

S202, denoising the image on the basis of S201. Because the infrared image has poor contrast and low resolution, filtering is applied to suppress noise and enhance the image quality.

The noise is removed with a bilateral filtering algorithm. Bilateral filtering is a nonlinear edge-preserving filtering method that denoises while preserving edges by taking into account both the spatial proximity and the pixel-value similarity of the image. It is non-iterative, simple and edge-preserving. The principle is as follows:

$$\omega(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\lVert I(i,j) - I(k,l)\rVert^2}{2\sigma_r^2}\right)$$

where $\omega(i,j,k,l)$ is the product of the spatial-domain kernel and the pixel-domain kernel, $\sigma_d$ and $\sigma_r$ are manually set smoothing parameters, $I(i,j)$ is the gray value of the central pixel of the sliding window, and $I(k,l)$ is the gray value of any other pixel in the window. Finally, the gray value $I_D(i,j)$ of the denoised pixel is calculated:

$$I_D(i,j) = \frac{\sum_{k,l} I(k,l)\,\omega(i,j,k,l)}{\sum_{k,l} \omega(i,j,k,l)}$$
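To make the weighting concrete, here is a minimal NumPy sketch of a bilateral filter that loops over window offsets and accumulates the numerator and denominator of $I_D(i,j)$. The window radius, the values of `sigma_d` and `sigma_r`, the reflective border handling and the synthetic step-edge image are all assumptions chosen for illustration.

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_d=2.0, sigma_r=0.1):
    """Naive bilateral filter for a 2-D grayscale image (float intensities)."""
    img = image.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="reflect")  # assumed border handling
    num = np.zeros_like(img)  # sum of I(k,l) * omega
    den = np.zeros_like(img)  # sum of omega
    for dk in range(-radius, radius + 1):
        for dl in range(-radius, radius + 1):
            # Neighbour I(k,l) at offset (dk, dl) for every centre pixel (i,j)
            shifted = padded[radius + dk: radius + dk + h,
                             radius + dl: radius + dl + w]
            spatial = np.exp(-(dk**2 + dl**2) / (2 * sigma_d**2))
            similarity = np.exp(-((img - shifted) ** 2) / (2 * sigma_r**2))
            weight = spatial * similarity
            num += weight * shifted
            den += weight
    return num / den

# Synthetic step edge with mild noise: left half cold (0), right half hot (1)
rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0
noisy = clean + rng.normal(0.0, 0.02, clean.shape)
smoothed = bilateral_filter(noisy)
```

Because pixels across the step differ by much more than `sigma_r`, they receive almost zero weight, so the noise within each half is averaged out while the edge itself stays sharp.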

S203, because few data sets of power equipment are available, the data set is expanded by rotation, flipping, cropping and similar operations, which increases the number of images and reduces the risk of the network overfitting.
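For semantic segmentation, each geometric transform has to be applied identically to the image and to its mask label map. The sketch below illustrates this with rotation by multiples of 90 degrees, horizontal flipping and random cropping; the function name `augment`, the crop ratio of 0.8 and the flip probability of 0.5 are hypothetical choices, not values from the patent.

```python
import numpy as np

def augment(image, mask, rng, crop=0.8):
    """Apply the same random rotation, flip and crop to an image and its mask."""
    k = int(rng.integers(0, 4))              # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                   # random horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    h, w = image.shape[:2]
    ch, cw = int(h * crop), int(w * crop)    # crop window size
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    window = (slice(top, top + ch), slice(left, left + cw))
    return image[window], mask[window]

rng = np.random.default_rng(0)
image = np.arange(100).reshape(10, 10)   # toy "infrared" image
mask = image > 50                        # toy label mask derived from it
aug_image, aug_mask = augment(image, mask, rng)
```

Since the toy mask is defined from the pixel values, checking that `aug_mask` still equals `aug_image > 50` confirms that image and label stayed aligned.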

Further, the specific step of S3 is as follows:

S301, the deep-learning-based Unet network is optimized with stochastic gradient descent with first-order momentum (SGD + Momentum), which introduces first-order momentum on top of SGD and thereby adds inertia to the updates:

$$v_{t+1} = \beta v_t + (1-\beta)\,g_t$$

$$x_{t+1} = x_t - \alpha v_{t+1}$$

In the above formulas, the hyper-parameter $\alpha$ is the learning rate; $\beta$ is the momentum factor, usually set to 0.9 by experience; $x_t$ is the current position and $x_{t+1}$ is the next position; $v_t$ is the previously accumulated gradient and $v_{t+1}$ is the currently accumulated gradient; and $g_t$ is the gradient at the current moment.

The accumulated gradient is an exponentially weighted (exponentially decaying) average: the decay coefficient $\beta$ controls how much historical information is retained, and $v_{t+1}$ is approximately the average of the gradients over the most recent $1/(1-\beta)$ steps.
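The update rule can be tried on a one-dimensional toy problem. The sketch below assumes the exponentially weighted form of momentum, with the accumulated gradient started at zero; the quadratic objective, the learning rate and the step count are illustrative only.

```python
def sgd_momentum(grad_fn, x0, lr=0.1, beta=0.9, steps=300):
    """Minimise a scalar function with SGD + first-order momentum."""
    x, v = float(x0), 0.0
    for _ in range(steps):
        g = grad_fn(x)                 # gradient at the current position
        v = beta * v + (1 - beta) * g  # exponentially weighted accumulated gradient
        x = x - lr * v                 # move against the accumulated gradient
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); minimum at x = 3
x_min = sgd_momentum(lambda x: 2 * (x - 3), x0=0.0)
```

With `beta = 0.9` each update averages roughly the last ten gradients, which is the inertia referred to above.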

S302, selecting an activation function:

The mathematical expression of the ReLU function is:

$$f(x) = \max(0, x)$$

As can be seen from the expression, the ReLU function avoids exponentiation: when x > 0 it maps the input to itself, and when x < 0 it maps the input directly to 0, which speeds up network training. Its derivative is:

$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$$

It can be seen from the derivative that when x > 0 the derivative is constantly equal to 1, which alleviates the vanishing-gradient problem during back propagation, allows neural networks to be designed deeper, and improves their capacity for sparse representation. The ReLU function is linear for x > 0, so its gradient is simpler to compute than that of other nonlinear functions, yet it is nonlinear over the whole interval, so its expressive power is stronger than that of a linear function. The ReLU function also has a drawback: when the input is negative the output is 0, so a neuron can become permanently inactive and its weights can no longer be updated. In response to this drawback, researchers have developed other activation functions based on ReLU, such as the Leaky ReLU function, whose mathematical expression is as follows:

$$f(x) = \max(0.01x, x)$$

As can be seen from the expression, when the input x is less than 0 the output is no longer 0, which solves the problem of complete inactivation of the neuron.
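The difference between the two activations is easiest to see numerically. This is a small sketch using NumPy; the slope 0.01 follows the expression above, and the value returned for the derivative exactly at x = 0 (where it is undefined) is a convention assumed here.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for x > 0, 0 otherwise (value at x == 0 is a convention)
    return (x > 0).astype(float)

def leaky_relu(x, slope=0.01):
    return np.maximum(slope * x, x)

def leaky_relu_grad(x, slope=0.01):
    # The gradient never vanishes completely for negative inputs
    return np.where(x > 0, 1.0, slope)

x = np.array([-2.0, 0.5])
relu_out, relu_g = relu(x), relu_grad(x)
leaky_out, leaky_g = leaky_relu(x), leaky_relu_grad(x)
```

For the negative input, ReLU passes back a zero gradient while Leaky ReLU passes back 0.01, so the corresponding weights can still be updated.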

S303, selecting a loss function:

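$$E = -\sum_{x \in \Omega} w(x)\,\log\left(p_{l(x)}(x)\right)$$

$$w(x) = w_c(x) + w_0 \exp\left(-\frac{(d_1(x) + d_2(x))^2}{2\sigma^2}\right)$$

In the formulas, $p_{l(x)}(x)$ is the predicted probability of the true class of pixel x, where $l(x)$ indicates which class the pixel belongs to, i.e. $l: \Omega \to \{1, \dots, K\}$.

$w(x)$ is the weight of each pixel on the feature map, $w_c(x)$ is a weight map for balancing the class frequencies, $d_1$ is the distance from a background pixel to the nearest target boundary, $d_2$ is the distance from a background pixel to the second nearest target boundary, and $w_0$ and $\sigma$ are constants.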
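A pixel-weighted cross-entropy of this kind can be sketched as follows. The sketch assumes the weight map formulation of the original U-Net paper, including its constants `w0 = 10` and `sigma = 5`, which the patent text does not state; the distance maps `d1` and `d2` are taken as precomputed inputs, and class labels are indexed from 0 rather than 1.

```python
import numpy as np

def weight_map(w_c, d1, d2, w0=10.0, sigma=5.0):
    """Pixel weights: class-balancing term plus a boundary-emphasis term."""
    return w_c + w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma**2))

def weighted_cross_entropy(probs, labels, weights):
    """probs: (K, H, W) softmax outputs; labels: (H, W) ints in 0..K-1."""
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    p_true = probs[labels, rows, cols]   # p_{l(x)}(x) for every pixel x
    return float(-(weights * np.log(p_true)).sum())

# Toy case: two classes, uniform predictions, unit weights
probs = np.full((2, 2, 2), 0.5)
labels = np.zeros((2, 2), dtype=int)
loss = weighted_cross_entropy(probs, labels, np.ones((2, 2)))
boundary_weight = weight_map(1.0, 0.0, 0.0)
```

Pixels lying in narrow gaps between two targets have small `d1 + d2` and therefore receive the largest weights, which pushes the network to separate adjacent pieces of equipment.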

S304, the labeled data are input into the debugged semantic segmentation network model to start training, and part of the network parameters may be frozen after a certain number of epochs. Whether the validation loss and the total loss converge, and how fast they decrease, are observed, and the hyper-parameters and the optimization algorithm are adjusted according to the preliminary prediction results until the segmentation precision reaches the expected target.

Further, the improvement of the Unet network in S4 is realized by the following steps:

S401, the network is trained with transfer learning and a frozen backbone network, using weights pre-trained on a public data set, so that the network loss converges faster and less computing power is consumed.

S402, introducing an attention mechanism module:

The Convolutional Block Attention Module (CBAM) is a simple and efficient attention module designed for convolutional neural networks. For a feature map generated by a convolutional neural network, CBAM computes attention maps along two dimensions, channel and space, and then multiplies each attention map with the input feature map to perform adaptive feature refinement. CBAM is a lightweight general-purpose module that can be integrated into various convolutional neural networks for end-to-end training.

The Unet semantic segmentation network performs end-to-end prediction on the infrared image and classifies every pixel, so an attention mechanism module can be added to the model to further improve the segmentation result. Assume that the input feature map is $F \in \mathbb{R}^{C \times H \times W}$. CBAM derives in sequence a one-dimensional channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ and a two-dimensional spatial attention map $M_S \in \mathbb{R}^{1 \times H \times W}$. The overall attention mechanism can be summarized as:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

In the formulas, $\otimes$ denotes element-wise multiplication, $F'$ is the result of processing the input feature layer with the channel attention module, and $F''$ is the refined feature, i.e. the final feature layer.

The channel attention module uses the inter-channel relationships of the features to generate a channel attention map. Since each channel of a feature layer can be regarded as a feature detector, channel attention focuses on which feature channels are meaningful for a given input image, i.e. each channel is given a different weight to characterize its importance. To compute the channel attention efficiently, the spatial dimension of the input feature map is compressed, using AvgPool (average pooling) and MaxPool (max pooling) at the same time, which has been shown experimentally to give a more expressive descriptor than either pooling method alone.

First, the spatial information of the feature map is aggregated by average pooling and max pooling to obtain the descriptors $F^c_{avg}$ and $F^c_{max}$, respectively. Then $F^c_{avg}$ and $F^c_{max}$ are forwarded to a shared MLP network with only one hidden layer, whose number of neurons is C/r, where r (the reduction ratio) is a hyper-parameter. The two channel attention maps obtained from max pooling and average pooling each have dimension C×1×1; they are added element-wise and passed through a sigmoid function to give the final channel attention output $M_C$. The specific algorithm is as follows:

$$M_C(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right)$$

where $\sigma$ is the sigmoid function.

The spatial attention map is generated from the spatial relationships between features. Unlike channel attention, spatial attention focuses on where the informative parts are located and is complementary to channel attention. To compute the spatial attention, average pooling and max pooling are first applied along the channel axis and the results are concatenated into an effective feature descriptor. That is, the channel information of a feature layer is aggregated by the two pooling operations into two two-dimensional feature maps, $F^s_{avg} \in \mathbb{R}^{1 \times H \times W}$ and $F^s_{max} \in \mathbb{R}^{1 \times H \times W}$, which represent the average-pooled and max-pooled features across the channels. The two maps are then concatenated and convolved by a standard convolution layer to obtain the two-dimensional spatial attention map, calculated as follows:

$$M_S(F) = \sigma\left(f^{7 \times 7}\left(\mathrm{CONCAT}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F))\right)\right)$$

In the formula, $\sigma$ is the sigmoid function, $f^{7 \times 7}$ denotes a convolution with a 7×7 kernel, and CONCAT denotes concatenation along the depth (channel) dimension.
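The two attention steps can be traced in a small NumPy forward pass. This is only a sketch: the shared MLP is assumed to use a ReLU hidden activation and no biases, the 7×7 convolution is written as a zero-padded cross-correlation, and the weights `W1`, `W2` and `kernel` are random placeholders rather than trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """F: (C, H, W); W1: (C//r, C) and W2: (C, C//r) are the shared MLP weights."""
    avg = F.mean(axis=(1, 2))                        # AvgPool over space -> (C,)
    mx = F.max(axis=(1, 2))                          # MaxPool over space -> (C,)
    mlp = lambda v: W2 @ np.maximum(0.0, W1 @ v)     # one hidden layer of size C//r
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]  # M_C: (C, 1, 1)

def spatial_attention(F, kernel):
    """F: (C, H, W); kernel: (2, k, k) convolution weights, k odd."""
    desc = np.stack([F.mean(axis=0), F.max(axis=0)])  # CONCAT -> (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1:]
    out = np.zeros((H, W))
    for i in range(k):                                # k x k cross-correlation
        for j in range(k):
            out += (kernel[:, i, j, None, None]
                    * padded[:, i:i + H, j:j + W]).sum(axis=0)
    return sigmoid(out)[None]                         # M_S: (1, H, W)

def cbam(F, W1, W2, kernel):
    F1 = channel_attention(F, W1, W2) * F             # F'  = M_C(F)  * F
    return spatial_attention(F1, kernel) * F1         # F'' = M_S(F') * F'

rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))
W2 = rng.normal(size=(C, C // r))
kernel = rng.normal(size=(2, 7, 7))
M_C = channel_attention(F, W1, W2)
refined = cbam(F, W1, W2, kernel)
```

Both attention maps lie strictly between 0 and 1, so the module rescales the features without changing the shape of the feature map, which is why it can be dropped into an existing Unet block.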

S403, learning rate adjustment strategy:

When a gradient descent algorithm is used to optimize the objective function, the learning rate should become smaller as the loss approaches its global minimum, so that the model can get as close to that minimum as possible. Cosine annealing reduces the learning rate along a cosine function. Over the interval from 0 to π, the value of the cosine function first decreases slowly as x increases, then decreases rapidly, and finally decreases slowly again. This descent pattern matches the learning rate adjustment requirement well.

In the warm restart method, a warm restart is triggered after $T_i$ epochs have been executed, where the index i is the number of the restart. A restart here does not mean starting from scratch; rather, the annealing process is simulated by raising the learning rate again, and the old $x_t$ is used as the initial solution after the restart. Here $x_t$ is the solution of the loss function obtained by gradient descent, i.e. the weights of the neural network. Since the purpose of the restart is to jump out of local optima by raising the learning rate, $x_t$ keeps its old value.

The principle of cosine annealing is as follows:

$$\eta_t = \eta^i_{min} + \frac{1}{2}\left(\eta^i_{max} - \eta^i_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

Meaning of the symbols in the expression: i is the index of the run (the number of restarts so far).

$\eta^i_{max}$ and $\eta^i_{min}$ are the maximum and minimum values of the learning rate and define its range. $T_{cur}$ is the number of epochs executed so far in the current run; it is updated after every batch, so its value may be fractional before an epoch has finished. For example, if there are 100 samples in total and each batch holds 4 samples, one epoch consists of 25 batches, so $T_{cur}$ is 1/25 = 0.04 after the first batch of the first epoch, and so on. $T_i$ is the total number of epochs in the i-th run.
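The schedule can be reproduced in a few lines. The following sketch generates the per-batch learning rates for two runs; the values of `eta_min` and `eta_max` and the run lengths of 2 and 4 epochs are illustrative assumptions, while the 25 batches per epoch follow the 100-sample, batch-size-4 example above.

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min=1e-5, eta_max=1e-2):
    """Learning rate after t_cur (possibly fractional) epochs of a run of t_i epochs."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

def sgdr_schedule(cycle_epochs, batches_per_epoch, eta_min=1e-5, eta_max=1e-2):
    """Per-batch learning rates; each entry of cycle_epochs is one run T_i."""
    lrs = []
    for t_i in cycle_epochs:
        for step in range(t_i * batches_per_epoch):
            t_cur = step / batches_per_epoch   # advances by 1/25 = 0.04 per batch here
            lrs.append(cosine_annealing_lr(t_cur, t_i, eta_min, eta_max))
        # Warm restart: the next run starts again at eta_max, weights are kept
    return lrs

# 100 samples with batch size 4 -> 25 batches per epoch
lrs = sgdr_schedule(cycle_epochs=[2, 4], batches_per_epoch=100 // 4)
```

The learning rate starts at `eta_max`, decays to nearly `eta_min` at the end of the first two-epoch run, and jumps back to `eta_max` at the first batch of the second run.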

S5, segmenting the hyperspectral image under the characteristic spectrum by adopting a clustering algorithm, and specifically comprising the following steps:

The images are clustered with the K-means clustering algorithm. The input of the K-means clustering algorithm is a data set $D = \{x_1, x_2, \dots, x_N\}$ containing N data objects, and the output is k mutually independent clusters $C = \{C_1, C_2, \dots, C_k\}$. The specific steps are as follows:

Step 1, randomly selecting k data objects from the input data set D as the initial cluster center points;

Step 2, calculating the similarity between each data object in the data set D and the k cluster center points, and assigning each data object to the cluster whose center point has the highest similarity;

Step 3, collecting the data objects in each cluster, taking their mean as the new cluster center point, and updating the cluster center point information;

Step 4, iteratively executing Step 2 and Step 3 until the cluster center points no longer change.

For two data objects containing m attributes, $x = \{x_1, x_2, \dots, x_m\}$ and $y = \{y_1, y_2, \dots, y_m\}$, the similarity is calculated with the Pearson correlation coefficient as follows:

$$r(x, y) = \frac{\sum_{i=1}^{m}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of the attributes of x and y.

After the cluster center of each target is obtained, whether the power equipment has a heating fault can be judged from the temperature of the surrounding pixels.
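Steps 1 to 4 with Pearson similarity can be sketched as below. The optional `init_centers` argument is a hypothetical addition that makes the example reproducible; when it is omitted, the centers are drawn at random as in Step 1. The six four-attribute rows are made-up data, and the final fault judgement is not covered because the text gives no threshold for it.

```python
import numpy as np

def pearson_similarity(X, centers):
    """Pearson correlation between every row of X and every cluster centre."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Cc = centers - centers.mean(axis=1, keepdims=True)
    num = Xc @ Cc.T
    den = (np.linalg.norm(Xc, axis=1)[:, None]
           * np.linalg.norm(Cc, axis=1)[None, :])
    return num / np.maximum(den, 1e-12)      # guard against constant rows

def kmeans(X, k, rng=None, init_centers=None, max_iter=100):
    if init_centers is None:                 # Step 1: random initial centres
        rng = rng or np.random.default_rng()
        init_centers = X[rng.choice(len(X), size=k, replace=False)]
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        # Step 2: assign each object to the most similar centre
        labels = pearson_similarity(X, centers).argmax(axis=1)
        # Step 3: new centre = mean of the objects in the cluster
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Step 4: stop once the centres no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Three rising and three falling profiles
X = np.array([[1, 2, 3, 4], [2, 3, 4, 5.5], [0, 1, 2.5, 3],
              [4, 3, 2, 1], [5, 4, 2.5, 2], [3, 2, 1, 0.5]], dtype=float)
labels, centers = kmeans(X, k=2, init_centers=X[[0, 3]])
```

Pearson similarity groups objects by the shape of their attribute profile rather than by absolute magnitude, so rows that rise together fall into one cluster even when their levels differ.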

Drawings

FIG. 1: power equipment fault detection method flow chart based on unmanned aerial vehicle inspection and infrared image semantic segmentation

Detailed Description

The invention introduces a power equipment fault detection method based on unmanned aerial vehicle inspection and infrared image semantic segmentation, and a specific implementation flow chart is shown in fig. 1.

The specific embodiment of the invention is as follows:

S1, an unmanned aerial vehicle completes the inspection of the whole power grid at a fixed height along a fixed flight path, a thermal infrared imager acquires infrared image information of the power equipment by photographing and video recording, and the targets are labeled.

S2, the infrared images are preprocessed and enhanced to improve the quality of the data set and to prepare for improving the generalization capability of the subsequent semantic segmentation network.

S3, a Unet semantic segmentation algorithm is selected to segment the power equipment in the infrared images, the network is trained on the data set obtained in S2, and the hyper-parameters are adjusted until the total loss converges.

S4, the Unet network is improved to raise the identification precision.

S103, during data acquisition, the infrared image and the high-definition color image can be captured simultaneously. Because the infrared image has low resolution and poor contrast, the power equipment currently being photographed can be located using the longitude and latitude coordinates of the color image.

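For the continuous case, assume that the transformation function is:

$$s = T(r), \quad 0 \le r \le 1$$

1. T(r) is required to be single-valued and monotonically increasing on [0,1].

Rewriting the formula s = T(r) gives:

$$r = T^{-1}(s), \quad 0 \le s \le 1$$

From basic probability theory:

$$P_s(s) = P_r(r)\left|\frac{dr}{ds}\right|$$

Therefore, the probability density function of the transformed variable s is determined by the gray-level probability density function of the input image and the selected transformation function.

Selecting the transformation function:

$$s = T(r) = \int_0^r P_r(w)\,dw$$

Further, there is:

$$\frac{ds}{dr} = \frac{dT(r)}{dr} = P_r(r)$$

Substituting the above into the expression for $P_s(s)$ gives:

$$P_s(s) = P_r(r)\left|\frac{1}{P_r(r)}\right| = 1, \quad 0 \le s \le 1$$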

$$\omega(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\lVert I(i,j) - I(k,l)\rVert^2}{2\sigma_r^2}\right)$$

where $\omega(i,j,k,l)$ is the product of the spatial-domain kernel and the pixel-domain kernel, $\sigma_d$ and $\sigma_r$ are manually set smoothing parameters, $I(i,j)$ is the gray value of the central pixel of the sliding window, and $I(k,l)$ is the gray value of any other pixel in the window. Finally, the gray value $I_D(i,j)$ of the denoised pixel is calculated:

$$I_D(i,j) = \frac{\sum_{k,l} I(k,l)\,\omega(i,j,k,l)}{\sum_{k,l} \omega(i,j,k,l)}$$

Further, the specific step of S3 is as follows:

S301, the deep-learning-based Unet network is optimized with stochastic gradient descent with first-order momentum (SGD + Momentum), which introduces first-order momentum on top of SGD and thereby adds inertia to the updates:

$$v_{t+1} = \beta v_t + (1-\beta)\,g_t$$

$$x_{t+1} = x_t - \alpha v_{t+1}$$

In the above formulas, the hyper-parameter $\alpha$ is the learning rate; $\beta$ is the momentum factor, usually set to 0.9 by experience; $x_t$ is the current position and $x_{t+1}$ is the next position; $v_t$ is the previously accumulated gradient and $v_{t+1}$ is the currently accumulated gradient; and $g_t$ is the gradient at the current moment.

S302, activating function selection:

The mathematical expression of the ReLU function is:

$$f(x) = \max(0, x)$$

As can be seen from the expression, the ReLU function avoids exponentiation: when x > 0 it maps the input to itself, and when x < 0 it maps the input directly to 0, which speeds up network training. Its derivative is:

$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$$

The Leaky ReLU function, a variant of ReLU, has the expression:

$$f(x) = \max(0.01x, x)$$

S303, selecting a loss function:

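$$E = -\sum_{x \in \Omega} w(x)\,\log\left(p_{l(x)}(x)\right)$$

In the formula, $p_{l(x)}(x)$ is the predicted probability of the true class of pixel x, where $l(x)$ indicates which class the pixel belongs to, i.e. $l: \Omega \to \{1, \dots, K\}$, and $w(x)$ is the weight of each pixel.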

S304, inputting the labeling data into the debugged semantic segmentation network model to start training, and selecting to freeze a part of parameters of the network after a certain epoch. And observing whether the verification loss and the total loss are converged and the descending speed of the verification loss and the total loss, and adjusting the hyper-parameters and the optimization algorithm according to the preliminary prediction result until the segmentation precision reaches an expected index.

Further, the improvement of the Unet network in S4 is realized by the following steps:

S401, the network is trained with transfer learning and a frozen backbone network, using weights pre-trained on a public data set, so that the network loss converges faster and less computing power is consumed.

S402, introducing an attention mechanism module:

The Convolutional Block Attention Module (CBAM) is a simple and efficient attention module designed for convolutional neural networks. For a feature map generated by a convolutional neural network, CBAM computes attention maps along two dimensions, channel and space, and then multiplies each attention map with the input feature map to perform adaptive feature refinement. CBAM is a lightweight general-purpose module that can be integrated into various convolutional neural networks for end-to-end training.

The Unet semantic segmentation network performs end-to-end prediction on the infrared image and classifies every pixel, so an attention mechanism module can be added to the model to further improve the segmentation result. Assume that the input feature map is $F \in \mathbb{R}^{C \times H \times W}$. CBAM derives in sequence a one-dimensional channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ and a two-dimensional spatial attention map $M_S \in \mathbb{R}^{1 \times H \times W}$. The overall attention mechanism can be summarized as:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

In the formulas, $\otimes$ denotes element-wise multiplication, $F'$ is the result of processing the input feature layer with the channel attention module, and $F''$ is the refined feature, i.e. the final feature layer.

The average-pooled and max-pooled channel descriptors, $F^c_{avg}$ and $F^c_{max}$, are then forwarded to a shared MLP network with only one hidden layer, whose number of neurons is C/r, where r (the reduction ratio) is a hyper-parameter. The two channel attention maps obtained from max pooling and average pooling each have dimension C×1×1; they are added element-wise and passed through a sigmoid function to give the final channel attention output $M_C$. The specific algorithm is as follows:

$$M_C(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right)$$

The spatial attention map is calculated as follows:

$$M_S(F) = \sigma\left(f^{7 \times 7}\left(\mathrm{CONCAT}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F))\right)\right)$$

S403, learning rate adjustment strategy:

when we use a gradient descent algorithm to optimize the objective function, the learning rate should become smaller as we get closer to the global minimum of the Loss value to make the model as close to this as possible, and Cosine annealing (Cosine annealing) can reduce the learning rate by a Cosine function. The function value of the cosine function firstly slowly decreases with the increase of x, then rapidly decreases, and slowly decreases again. This descending pattern satisfies the adjustment requirement of the learning rate well.

In the warm restart method, a warm restart is triggered after $T_i$ epochs have been executed, where the index i is the number of the restart. A restart here does not mean starting from scratch; rather, the annealing process is simulated by raising the learning rate again, and the old $x_t$ is used as the initial solution after the restart. Here $x_t$ is the solution of the loss function obtained by gradient descent, i.e. the weights of the neural network. Since the purpose of the restart is to jump out of local optima by raising the learning rate, $x_t$ keeps its old value.

The principle of cosine annealing is as follows:

$$\eta_t = \eta^i_{min} + \frac{1}{2}\left(\eta^i_{max} - \eta^i_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

$\eta^i_{max}$ and $\eta^i_{min}$ are the maximum and minimum values of the learning rate and define its range. $T_{cur}$ is the number of epochs executed so far in the current run; it is updated after every batch, so its value may be fractional before an epoch has finished. For example, if there are 100 samples in total and each batch holds 4 samples, one epoch consists of 25 batches, so $T_{cur}$ is 1/25 = 0.04 after the first batch of the first epoch, and so on. $T_i$ is the total number of epochs in the i-th run.

The images are clustered with the K-means clustering algorithm. The input of the K-means clustering algorithm is a data set $D = \{x_1, x_2, \dots, x_N\}$ containing N data objects, and the output is k mutually independent clusters $C = \{C_1, C_2, \dots, C_k\}$.

For two data objects containing m attributes, $x = \{x_1, x_2, \dots, x_m\}$ and $y = \{y_1, y_2, \dots, y_m\}$, the similarity is calculated with the Pearson correlation coefficient as follows:

$$r(x, y) = \frac{\sum_{i=1}^{m}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of the attributes of x and y.

After the cluster center of each target is obtained, whether the power equipment has a heating fault can be judged from the temperature of the pixels around the target.

Claims

1. A power equipment fault detection method based on unmanned aerial vehicle routing inspection and infrared image semantic segmentation is characterized by comprising the following steps:

S1, an unmanned aerial vehicle completes the inspection of the whole power grid at a fixed height along a fixed flight path, a thermal infrared imager collects infrared image information of the power equipment by photographing and video recording, and the targets are labeled;

1) Before the inspection, an inspection route and a flying height are designed for the unmanned aerial vehicle; because the power grid contains many overhead wires and high-voltage towers, the flying height is chosen to keep the unmanned aerial vehicle as far as possible from the wires and towers so that it does not scrape the electrical equipment during inspection;

2) The inspection route of the unmanned aerial vehicle is designed with a track planning algorithm, and the staff control the camera and the pan-tilt head so that, as far as possible, there are no dead angles or blind areas in the field of view;

3) During data acquisition, the infrared image and the high-definition color image can be captured simultaneously; because the infrared image has low resolution and poor contrast, the power equipment currently being photographed can be located using the longitude and latitude coordinates of the color image;

4) The images are labeled with labelme software, different class names are given to different classes of power equipment, and finally a mask label map is generated for training the subsequent semantic segmentation network;

S2, preprocessing and image enhancement are carried out on the infrared image, the quality of the data set is improved, and preparation is made for the improvement of the generalization capability of a subsequent semantic segmentation network;

1) Image preprocessing: because the gray-level distribution of an infrared image is uneven, the image is processed with a gray histogram equalization algorithm; increasing the gray contrast highlights the important features of the image during image recognition; histogram equalization changes the distribution of pixels over the gray levels so that each gray level holds approximately the same number of pixels, thereby spreading the image evenly over the whole dynamic range of gray levels, improving its brightness distribution and enhancing its visual effect;

first, consider a continuous function and let the variable r represent the gray level of the image to be enhanced; let r be normalized to [0,1], where r = 0 denotes black and r = 1 denotes white;

for a continuous function, assume that the transformation function is:

$$s = T(r), \quad 0 \le r \le 1$$

each gray value r in the original image corresponds to a gray value s; the transformation function satisfies the following conditions:

(1) T(r) is single-valued and monotonically increasing on [0,1];

(2) $0 \le T(r) \le 1$ for $0 \le r \le 1$, so that the output gray level and the input gray level are limited to the same range;

rewriting the formula s = T(r) in inverse form:

$$r = T^{-1}(s), \quad 0 \le s \le 1$$

let $P_r(r)$ and $P_s(s)$ denote the probability density functions of the random variables r and s, respectively; a basic result from probability theory gives:

$$P_s(s) = P_r(r)\left|\frac{dr}{ds}\right|$$

thus, the probability density function of the transformed variable s is determined by the gray-level probability density function of the input image and the selected transformation function;

selecting the transformation function:

$$s = T(r) = \int_0^r P_r(\omega)\,d\omega$$

further, there is:

$$\frac{ds}{dr} = P_r(r)$$

substituting the above into the expression for $P_s(s)$ gives $P_s(s) = 1$;
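For a digital image the continuous transformation is usually replaced by the cumulative histogram. The sketch below shows this discrete counterpart, assuming integer gray values in the range 0 to `levels - 1`; the function name `equalize_histogram` and the rounding rule are illustrative choices, not taken from the text.

```python
def equalize_histogram(image, levels=256):
    """Discrete histogram equalization of a 2-D list of integer gray values."""
    flat = [p for row in image for p in row]
    n = len(flat)

    # Histogram: number of pixels at each gray level.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution, the discrete analogue of integrating P_r.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / n)

    # Look-up table: gray level r is mapped to s = (levels - 1) * cdf[r].
    lut = [round((levels - 1) * c) for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

Applied to a low-contrast patch such as `[[100, 100], [101, 102]]`, the mapping stretches the three neighboring gray levels across the upper half of the 0 to 255 range while preserving their order.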

2) Image denoising: the infrared image has poor contrast and low resolution; a filtering algorithm can sharpen the image and enhance its quality;

the noise is removed using a bilateral filtering algorithm, a nonlinear edge-preserving filtering method that achieves edge-preserving denoising by considering both the spatial proximity and the pixel-value similarity of the image; it is non-iterative, simple and edge-preserving, and its principle is as follows:

$$\omega(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\left(I(i,j) - I(k,l)\right)^2}{2\sigma_r^2}\right)$$

where $\omega(i,j,k,l)$ is the product of the spatial-domain kernel and the pixel-domain kernel, $\sigma_d$ and $\sigma_r$ are manually set smoothing parameters, $I(i,j)$ is the gray value of the central pixel of the sliding window, and $I(k,l)$ is the gray value of any other pixel in the window; finally, the gray value $I_D(i,j)$ of the denoised pixel is calculated:

$$I_D(i,j) = \frac{\sum_{k,l} I(k,l)\,\omega(i,j,k,l)}{\sum_{k,l} \omega(i,j,k,l)}$$
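A naive bilateral filter can be sketched as below. It assumes the commonly used Gaussian form of the spatial-domain and pixel-domain kernels and a square sliding window that is clipped at the image border; the function name and default parameter values are hypothetical.

```python
import math

def bilateral_filter(image, radius=1, sigma_d=1.0, sigma_r=10.0):
    """Naive bilateral filter on a 2-D list of gray values."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = 0.0
            den = 0.0
            # Sliding window around (i, j), clipped at the image border.
            for k in range(max(0, i - radius), min(h, i + radius + 1)):
                for l in range(max(0, j - radius), min(w, j + radius + 1)):
                    # Spatial-domain term: distance between pixel positions.
                    spatial = ((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2)
                    # Pixel-domain term: difference between gray values.
                    tonal = (image[i][j] - image[k][l]) ** 2 / (2 * sigma_r ** 2)
                    weight = math.exp(-spatial - tonal)
                    num += image[k][l] * weight
                    den += weight
            # Normalized weighted average; den >= 1 because the center counts.
            out[i][j] = num / den
    return out
```

Because the pixel-domain term suppresses neighbors whose gray value differs strongly from the center, a sharp boundary between a hot component and a cool background is left almost untouched while small fluctuations within each region are averaged out.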

3) Because the power equipment data set is small, it is expanded by rotation, flipping, cropping and similar operations, so as to increase the number of images and reduce the effect of network overfitting;
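The expansion operations can be illustrated with plain nested lists. The helper names below are hypothetical; the one point the sketch is meant to show is that, for semantic segmentation, each geometric transform has to be applied identically to the image and to its mask label map.

```python
def rotate90(image):
    """Rotate a 2-D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a 2-D list left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror a 2-D list top to bottom."""
    return [row[:] for row in image[::-1]]

def crop(image, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, mask):
    """Return (image, mask) pairs, applying the same transform to both."""
    ops = [lambda a: a, rotate90, flip_horizontal, flip_vertical]
    return [(op(image), op(mask)) for op in ops]
```

Calling `augment(image, mask)` turns one labeled sample into four; cropping can be added in the same way as long as the same window is used for both arrays.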

S3, a Unet semantic segmentation algorithm is selected to segment the power equipment in the infrared image, the network is trained with the data set obtained in S2, and the hyper-parameters are adjusted so that the total loss converges;

further, the specific steps of S3 are as follows:

1) The core optimization algorithm of the deep-learning-based Unet network is stochastic gradient descent with first-order momentum (SGD + Momentum), which introduces first-order momentum on the basis of SGD to add inertia;

$$x_{t+1} = x_t - \alpha v_{t+1}$$

$$m_t = \beta m_{t-1} + (1-\beta) g_t$$

in the above formulas, the hyper-parameter $\alpha$ is the learning rate, $\beta$ is usually taken as 0.9 from experience, $x_t$ is the current position, $x_{t+1}$ is the next position, $v_t$ is the previously accumulated gradient, $v_{t+1}$ is the currently accumulated gradient, $g_t$ is the gradient at the current moment, and $\rho$ is the momentum factor;

the exponentially weighted average adds a decay coefficient to control how much historical information is retained, and is approximately equal to the average of the gradient vectors over the most recent $1/(1-\beta)$ time steps;
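A minimal sketch of the momentum update follows. It assumes that the accumulated gradient used in the position update is the exponentially weighted average of past gradients, so a single variable `m` plays both roles; the quadratic test function `example_grad` is purely illustrative.

```python
def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Minimize a function with SGD plus first-order momentum.

    grad: callable returning the gradient (list) at a position (list).
    """
    x = list(x0)
    m = [0.0] * len(x)  # accumulated gradient (first-order momentum)
    for _ in range(steps):
        g = grad(x)
        # Exponentially weighted average of the gradients.
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
        # Move against the accumulated gradient.
        x = [xi - lr * mi for xi, mi in zip(x, m)]
    return x

def example_grad(x):
    """Gradient of f(x) = (x0 - 3)^2 + (x1 + 1)^2, minimized at (3, -1)."""
    return [2 * (x[0] - 3), 2 * (x[1] + 1)]
```

With `beta = 0.9` the average effectively covers about ten recent gradients, which is the `1/(1 - beta)` horizon mentioned above; `sgd_momentum(example_grad, [0.0, 0.0])` moves toward the minimizer (3, -1).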

2) Selecting an activation function:

the mathematical expression of the ReLU function is:

$$f(x) = \max(0, x)$$

as can be seen from the expression, the ReLU function involves no exponentiation: when x is greater than 0 it maps x to itself, and when x is less than 0 it maps x directly to 0, which speeds up the training of the network; its derivative is as follows:

$$f'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$$

according to the derivative expression, when x is greater than 0 the derivative is constantly equal to 1, which alleviates the vanishing-gradient problem during back propagation, allows neural networks to be designed deeper, and improves the sparse representation capability of the network; the ReLU function is linear when x is greater than 0, so its gradient is simpler to compute than that of other nonlinear functions, yet it is nonlinear over the whole interval, so its expressive capacity is stronger than that of a linear function; the ReLU function also has a drawback: when the input is negative, the output of ReLU is 0, so the neuron may become permanently inactive and its weights can no longer be updated; in response to this drawback, researchers have developed other activation functions based on ReLU, such as the Leaky ReLU function, whose mathematical expression is as follows:

$$f(x) = \max(0.01x, x)$$

as can be seen from the expression, when the input x is less than 0 the output is no longer 0, which solves the problem of complete inactivation of the neuron;
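The two activation functions and their derivatives can be written out directly. The value returned at exactly x = 0 is a convention chosen here for the sketch (the derivative is undefined at that point), and the function names are illustrative.

```python
def relu(x):
    """ReLU: identity for positive inputs, zero otherwise."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; 0 is used at x == 0 by convention."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small negative slope keeps the output non-zero."""
    return max(slope * x, x)

def leaky_relu_grad(x, slope=0.01):
    """Derivative of Leaky ReLU; the slope is used at x == 0 by convention."""
    return 1.0 if x > 0 else slope
```

For a negative input such as -2.0, `relu_grad` returns 0 (no weight update flows back through the neuron), whereas `leaky_relu_grad` returns 0.01, so a small gradient still propagates.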

3) Selecting a loss function:

in the formula, $p_{l(x)}(x)$ is the predicted probability for the class given by $l(x)$, where $l(x)$ denotes the class to which pixel x belongs, i.e. $l: \Omega \to \{1, \ldots, K\}$;

$w(x)$ is the weight of each pixel on the feature map, $w_c(x)$ is the weight map for balancing class frequencies, $d_1$ denotes the distance from a background pixel to the nearest target boundary, and $d_2$ denotes the distance from a background pixel to the second-nearest target boundary;
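The quantities defined above match the pixel-weighted cross-entropy used in the original U-Net formulation, so the sketch below assumes that form: a weight map $w(x) = w_c(x) + w_0 \exp\left(-(d_1(x) + d_2(x))^2 / (2\sigma^2)\right)$ and a loss that sums $-w(x)\log p_{l(x)}(x)$ over all pixels. The constants `w0` and `sigma` do not appear in the text and are assumptions, as are the function names; class labels are 0-based in the code.

```python
import math

def weight_map(w_c, d1, d2, w0=10.0, sigma=5.0):
    """Per-pixel weights: class-balancing term plus a boundary emphasis term.

    w_c, d1, d2 are 2-D lists of the same shape; w0 and sigma are assumed.
    """
    return [[wc + w0 * math.exp(-((a + b) ** 2) / (2 * sigma ** 2))
             for wc, a, b in zip(row_wc, row_d1, row_d2)]
            for row_wc, row_d1, row_d2 in zip(w_c, d1, d2)]

def weighted_cross_entropy(probs, labels, weights):
    """Sum over pixels of -w(x) * log(p_{l(x)}(x)).

    probs[i][j] is the list of K class probabilities at pixel (i, j),
    labels[i][j] is the true class index (0-based) at that pixel.
    """
    total = 0.0
    for p_row, l_row, w_row in zip(probs, labels, weights):
        for p, label, w in zip(p_row, l_row, w_row):
            total -= w * math.log(p[label])
    return total
```

Pixels lying in a narrow gap between two devices have small `d1 + d2` and therefore receive a large weight, which pushes the network to learn the separating border.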

4) The labeled data are input into the configured semantic segmentation network model to start training, and part of the network parameters are frozen after a certain number of epochs; whether the validation loss and the total loss converge, and how fast they decrease, are observed, and the hyper-parameters and the optimization algorithm are adjusted according to the preliminary prediction results until the segmentation accuracy reaches the expected target;

S4, the Unet network is improved to raise the recognition accuracy;

the specific implementation steps of the improved Unet network are as follows:

1) The network is trained using transfer learning with a frozen backbone network, and weights pre-trained on a public data set are used so that the network loss converges faster and less computing power is consumed;

2) Attention mechanism module:

CBAM is an attention module for convolutional neural networks; for the feature map generated by a convolutional neural network, CBAM computes attention maps along the two dimensions of channel and space, and then multiplies the attention maps with the input feature map to perform adaptive feature learning; CBAM is a lightweight general-purpose module that can be integrated into various convolutional neural networks for end-to-end training;

the Unet semantic segmentation network performs end-to-end prediction on infrared images and classifies every pixel, so an attention mechanism module can be added to the model to further improve the segmentation effect of the network; assume that the input feature map is $F \in \mathbb{R}^{C \times H \times W}$; CBAM derives a one-dimensional channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ and a two-dimensional spatial attention map $M_S \in \mathbb{R}^{1 \times H \times W}$; the overall attention mechanism can be summarized as:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

in the formula, $\otimes$ represents element-wise multiplication, $F'$ represents the result of processing the input feature layer with the channel attention module, and $F''$ represents the finally obtained feature layer;

the channel attention module uses the inter-channel relationships of the features to generate a channel attention map; since each channel of the feature layer can be regarded as a feature detector, channel attention focuses on which channels are meaningful for a given input image, i.e. each channel is given a different weight to characterize its importance; to compute channel attention efficiently, the spatial dimension of the input feature map is compressed, and average pooling and max pooling are used together, which has been shown in practice to be more representative than using either pooling alone; the two pooling operations produce the descriptors $F^c_{avg}$ and $F^c_{max}$;

then $F^c_{avg}$ and $F^c_{max}$ are forwarded to a shared MLP network with only one hidden layer, in which the number of hidden neurons is C/r, where r is a hyper-parameter; the two channel attention maps obtained through max pooling and average pooling each have dimension $C \times 1 \times 1$; they are finally added and passed through a sigmoid function to give the final channel attention output $M_C$; the specific algorithm is as follows:

$$M_C(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right)$$

where $\sigma$ is the sigmoid function; the spatial attention module generates a spatial attention map using the spatial relationships among the features; unlike channel attention, spatial attention focuses on where the informative parts are located, which complements channel attention; to compute spatial attention, average pooling and max pooling are first applied along the channel axis and the results are concatenated to generate an effective feature descriptor, i.e. the channel information of a feature layer is aggregated by the two pooling operations into two two-dimensional feature maps, $F^s_{avg} \in \mathbb{R}^{1 \times H \times W}$ and $F^s_{max} \in \mathbb{R}^{1 \times H \times W}$, which represent the average-pooled and max-pooled features across channels, respectively; the two maps are then concatenated and convolved by a standard convolution layer to obtain the two-dimensional spatial attention map, calculated as follows:

$$M_S(F) = \sigma\left(f^{7 \times 7}\left(\mathrm{CONCAT}(\mathrm{AvgPool}(F), \mathrm{MaxPool}(F))\right)\right)$$

in the formula, $\sigma$ is the sigmoid function, $f^{7 \times 7}$ denotes a convolution with a 7×7 kernel, and CONCAT represents concatenation along the depth dimension;
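The channel and spatial attention steps can be sketched without a deep-learning framework, under several simplifying assumptions: the feature map is a nested list of shape C x H x W, the shared MLP has no bias terms and uses ReLU in its hidden layer, the 7×7 convolution uses zero padding and no bias, and the weight arguments `W0`, `W1` and `kernel` are hypothetical parameters that would normally be learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(F, W0, W1):
    """M_C: one weight per channel. W0 is (C/r) x C, W1 is C x (C/r)."""
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W0]
        return [sum(w * h for w, h in zip(row, hidden)) for row in W1]
    # Squeeze the spatial dimensions with average and max pooling.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in F]
    mx = [max(map(max, ch)) for ch in F]
    return [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]

def spatial_attention(F, kernel, ksize=7):
    """M_S: one weight per position. kernel is 2 x ksize x ksize."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    # Pool along the channel axis, giving two H x W maps.
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pad = ksize // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            # Convolve the concatenated [avg; max] maps (zero padding).
            for m, fmap in enumerate((avg, mx)):
                for u in range(ksize):
                    for v in range(ksize):
                        y, x = i + u - pad, j + v - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[m][u][v] * fmap[y][x]
            out[i][j] = sigmoid(acc)
    return out

def cbam(F, W0, W1, kernel):
    """Apply channel attention, then spatial attention, to F."""
    Mc = channel_attention(F, W0, W1)
    F1 = [[[Mc[c] * v for v in row] for row in F[c]] for c in range(len(F))]
    Ms = spatial_attention(F1, kernel)
    return [[[Ms[i][j] * F1[c][i][j] for j in range(len(F1[c][i]))]
             for i in range(len(F1[c]))] for c in range(len(F1))]
```

With all weights set to zero both attention maps equal 0.5 everywhere, so the output is simply the input scaled by 0.25; this makes a convenient sanity check before plugging in trained weights.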

3) Learning rate adjustment strategy:

when a gradient descent algorithm is used to optimize the objective function, the learning rate should become smaller as the loss value approaches its global minimum, so that the model can get as close to that point as possible; cosine annealing reduces the learning rate according to a cosine function; as x increases, the value of the cosine function first decreases slowly, then decreases rapidly, and then decreases slowly again; this pattern of decrease suits the requirements of learning rate adjustment well;

in the warm restart method, a warm restart is started after $T_i$ epochs have been executed, where the index i is the number of the restart; the restart does not start again from scratch but simulates the annealing algorithm by increasing the learning rate, and after the restart the old $x_t$ is used as the initial solution; here $x_t$ is the solution obtained by minimizing the loss function with gradient descent, i.e. the weights of the neural network; since the purpose of the restart is to escape local optima by increasing the learning rate, $x_t$ must be set to the old value;

the principle of cosine annealing is as follows:

$$\eta_t = \eta^i_{min} + \frac{1}{2}\left(\eta^i_{max} - \eta^i_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

meaning of the symbols in the expression: i is the index of the run;

$\eta^i_{max}$ and $\eta^i_{min}$ respectively represent the maximum value and the minimum value of the learning rate, defining the range of the learning rate; $T_{cur}$ indicates how many epochs have currently been executed, but $T_{cur}$ is updated after each batch, so before an epoch has finished its value can be fractional; for example, if the total number of samples is 100 and the batch size is 4, then 25 batches are read in one epoch, and after the first batch of the first epoch has been executed $T_{cur}$ is updated to 1/25 = 0.04, and so on; $T_i$ represents the total number of epochs in the i-th run;
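The schedule can be sketched with two small helper functions, assuming the standard cosine annealing expression with a fractional epoch counter; the function names and the default learning rate bounds are illustrative.

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """Learning rate after t_cur epochs of a run that lasts t_i epochs."""
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + math.cos(math.pi * t_cur / t_i))

def t_cur_after_batches(batches_done, n_samples, batch_size):
    """Fractional epoch counter, updated after every batch."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_done / batches_per_epoch
```

With 100 samples and a batch size of 4, `t_cur_after_batches(1, 100, 4)` gives 0.04, as in the example above. The learning rate starts at `eta_max`, reaches `eta_min` when `t_cur` equals `t_i`, and a warm restart then resets `t_cur` to 0 while keeping the current network weights.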

S5, cluster analysis is performed, and whether each power device has a heating fault is judged separately;

cluster analysis is performed on the images using the K-means clustering algorithm; the input to the K-means clustering algorithm is a data set $D = \{x_1, x_2, \ldots, x_n\}$ containing n data objects, and the output is k mutually independent clusters $C = \{C_1, C_2, \ldots, C_k\}$; the specific steps are as follows:

step4, step2 and step3 are executed iteratively until the cluster center points no longer change, at which point the algorithm terminates;