Real-time detection method, server and system for infrared image power equipment components
Technical Field
The invention belongs to the field of power equipment detection, and particularly relates to a real-time detection method, a server and a system for infrared image power equipment components.
Background
Power equipment is the basic unit of power grid operation. Effective and accurate detection and evaluation of its state is a prerequisite for condition-based maintenance and life-cycle management of the equipment and an important basis for intelligent dispatching, and it provides strong technical support for the safe, reliable and efficient operation of the power grid.
To perform fault diagnosis on power equipment, the equipment in the image must first be detected and located; in particular, the various components on the equipment must be accurately located and identified. Traditional computer-vision techniques for detecting power equipment components in infrared images still rely on hand-crafted features, so many model parameters have to be tuned for each specific application scene, and these methods cannot provide satisfactory results when the background of the infrared image is relatively complex.
Disclosure of Invention
To overcome the deficiencies of the prior art, a first object of the invention is to provide a real-time detection method for infrared image power equipment components, which locates power equipment components with high accuracy and can detect them rapidly in application scenes.
The invention discloses a real-time detection method of infrared image power equipment components, which comprises the following steps:
step (1): acquiring infrared images containing known power equipment components to form a sample set, wherein each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component;
step (2): constructing a neural network for detecting power equipment components based on the YOLO target detection framework, inputting the infrared images in the sample set and their corresponding component-level labels into the constructed neural network, and training the neural network;
step (3): processing an infrared image to be detected, whose component-level labels are unknown, with the trained neural network for detecting power equipment components, and outputting the detection result of the power equipment components.
Further, in the process of training the constructed neural network in step (2), a feature map fusing multi-scale features is obtained through multi-scale feature extraction, prediction frames are established on the feature map, and the prediction frames are then driven toward the target frames through multi-task learning.
In step (2), the infrared image is subjected to multi-scale processing, where scale refers to the change of image size, to obtain a series of feature maps of different scales;
any lower-layer feature map is reorganized so that its spatial size becomes one fourth of the original and its depth becomes 4 times the original; the reorganized lower-layer feature map and the higher-layer feature map of the same resolution are concatenated in the depth direction to obtain the fused final feature map, and this feature map, after one convolution operation, is used as the input of the multi-task learning step.
In specific implementation, the multi-scale feature extraction includes operations of convolution, activation, pooling and batch normalization in a traditional deep neural network, specifically, an original infrared image I is gradually reduced, and the depth is increased while the image is reduced, so that a series of feature maps with different scales are obtained. In order to obtain a feature map for fusing multi-scale information, the method recombines the feature map with higher resolution of a lower layer into a feature map with one fourth of the original length-width resolution and 4 times of the original depth, and then connects the feature map with the same resolution of a higher layer in the depth direction to obtain a fused feature map.
The image length and width resolution of the feature map of the lower layer is larger, and the image length and width resolution of the feature map of the higher layer is smaller.
Further, the process of training the constructed neural network in the step (2) further includes:
dividing the infrared image into grids with preset sizes, and randomly generating a plurality of prediction frames in each grid with a target frame, wherein each prediction frame is provided with a frame tag;
finding a prediction frame with the maximum overlapping rate with the target frame in each grid with the target frame as an actual prediction frame of the grid;
training iteratively with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, at which point training is complete.
The overlapping rate is the ratio of the overlapping area of the prediction frame and the target frame to the total area covered by the prediction frame and the target frame together.
Further, the processing of the infrared image to be detected with the unknown component-level tag by using the trained neural network for detecting the power equipment component in the step (3) includes:
the unknown image to be detected is input to the neural network for detecting power equipment components; at the output end the image is divided into grids of preset size, and each grid yields the results of its own prediction frames;
and performing non-maximum suppression on all the prediction frames, and selecting the prediction frames as final prediction results according to the confidence degrees.
Further, performing non-maximum suppression on all the prediction frames and selecting prediction frames as the final prediction result according to confidence comprises the following steps:
firstly, for all the prediction frames belonging to the same power equipment component category, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold value, setting the confidence coefficient of the prediction frame with the smaller confidence coefficient to be 0, and reserving the prediction frame with the larger confidence coefficient;
and then, screening the reserved prediction frames by using a confidence threshold, excluding the prediction frames with the confidence coefficient smaller than the confidence threshold in the frame labels of the prediction frames, and reserving the prediction frames with the confidence coefficient larger than or equal to the confidence threshold in the frame labels of the prediction frames.
The invention also provides a real-time detection server for the infrared image power equipment components.
The invention discloses a real-time detection server for infrared image power equipment components, which comprises:
the system comprises a sample set construction module, a data acquisition module and a data processing module, wherein the sample set construction module is used for acquiring infrared images containing known power equipment components to form a sample set, each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component;
the neural network training module is used for constructing a neural network for detecting power equipment components based on the YOLO target detection framework, inputting the infrared images in the sample set and their corresponding component-level labels into the constructed neural network, and training the neural network;
and the component detection module is used for processing the infrared image to be detected with the unknown component-level label by adopting the trained neural network for detecting the power equipment component and outputting the detection result of the power equipment component.
Furthermore, in the neural network training module, a feature map fusing multi-scale features is obtained through multi-scale feature extraction, prediction frames are established on the feature map, and the prediction frames are then driven toward the target frames through multi-task learning.
In the module, the infrared image is subjected to multi-scale processing, where scale refers to the change of image size, to obtain a series of feature maps of different scales;
any lower-layer feature map is reorganized so that its spatial size becomes one fourth of the original and its depth becomes 4 times the original; the reorganized lower-layer feature map and the higher-layer feature map of the same resolution are concatenated in the depth direction to obtain the fused final feature map, and this feature map, after one convolution operation, is used as the input of the multi-task learning step.
In specific implementation, the multi-scale feature extraction includes operations of convolution, activation, pooling and batch normalization in a traditional deep neural network, specifically, an original infrared image I is gradually reduced, and the depth is increased while the image is reduced, so that a series of feature maps with different scales are obtained. In order to obtain a feature map for fusing multi-scale information, the method recombines the feature map with higher resolution of a lower layer into a feature map with one fourth of the original length-width resolution and 4 times of the original depth, and then connects the feature map with the same resolution of a higher layer in the depth direction to obtain a fused feature map.
Further, the neural network training module is further configured to:
dividing the infrared image into grids with preset sizes, and randomly generating a plurality of prediction frames in each grid with a target frame, wherein each prediction frame is provided with a frame tag;
finding a prediction frame with the maximum overlapping rate with the target frame in each grid with the target frame as an actual prediction frame of the grid;
training iteratively with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, at which point training is complete.
The overlapping rate is the ratio of the overlapping area of the prediction frame and the target frame to the total area covered by the prediction frame and the target frame together.
The image length and width resolution of the feature map of the lower layer is larger, and the image length and width resolution of the feature map of the higher layer is smaller.
Further, the component detection module is further configured to:
inputting the unknown image to be detected to the neural network for detecting power equipment components, dividing the image at the output end into grids of preset size, and obtaining from each grid the results of its own prediction frames;
and performing non-maximum suppression on all the prediction frames, and selecting the prediction frames as final prediction results according to the confidence degrees.
Further, the component detection module is further configured to:
firstly, for all the prediction frames belonging to the same power equipment component category, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold value, setting the confidence coefficient of the prediction frame with the smaller confidence coefficient to be 0, and reserving the prediction frame with the larger confidence coefficient;
and then, screening the reserved prediction frames by using a confidence threshold, excluding the prediction frames with the confidence coefficient smaller than the confidence threshold in the frame labels of the prediction frames, and reserving the prediction frames with the confidence coefficient larger than or equal to the confidence threshold in the frame labels of the prediction frames.
The invention also provides a real-time detection system for the infrared image power equipment component.
The invention discloses a real-time detection system for infrared image power equipment components, which comprises a detection server and a client, wherein the detection server is configured to:
acquiring infrared images containing known power equipment components to form a sample set, wherein each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component;
constructing a neural network for detecting power equipment components based on the YOLO target detection framework, inputting the infrared images in the sample set and their corresponding component-level labels into the constructed neural network, and training the neural network;
and processing the infrared image to be detected with the unknown component-level label by adopting the trained neural network for detecting the power equipment component, and outputting the detection result of the power equipment component.
Further, the detection server is further configured to: in the process of training the constructed neural network, obtain a feature map fusing multi-scale features through multi-scale feature extraction, establish prediction frames on the feature map, and then drive the prediction frames toward the target frames through multi-task learning.
Further, the detection server is further configured to:
dividing the infrared image into grids with preset sizes, and randomly generating a plurality of prediction frames in each grid with a target frame, wherein each prediction frame is provided with a frame tag;
finding a prediction frame with the maximum overlapping rate with the target frame in each grid with the target frame as an actual prediction frame of the grid;
training iteratively with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, at which point training is complete.
Further, the detection server is further configured to: input the unknown image to be detected to the neural network for detecting power equipment components, divide the image at the output end into grids of preset size, and obtain from each grid the results of its own prediction frames;
and performing non-maximum suppression on all the prediction frames, and selecting the prediction frames as final prediction results according to the confidence degrees.
Further, the detection server is further configured to: in performing non-maximum suppression on all the prediction frames and selecting prediction frames as the final prediction result according to confidence, first, for all prediction frames belonging to the same power equipment component category, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold, set the confidence of the prediction frame with the lower confidence to 0 and retain the prediction frame with the higher confidence;
and then, screening the reserved prediction frames by using a confidence threshold, excluding the prediction frames with the confidence coefficient smaller than the confidence threshold in the frame labels of the prediction frames, and reserving the prediction frames with the confidence coefficient larger than or equal to the confidence threshold in the frame labels of the prediction frames.
Compared with the prior art, the invention has the beneficial effects that:
the method utilizes a YOLO target detection framework to train on infrared images of a large number of marked components, fully learns to obtain parameters of the network, and adopts a design mode of a full convolution neural network to enable the test speed of the model to exceed 20 frames per second on a GPU, so that the method is suitable for application scenarios of high-precision and rapid power equipment component detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flowchart of a real-time detection method for infrared image power equipment components according to the present invention.
FIG. 2 is a schematic diagram of a constructed neural network of the present invention.
Fig. 3(a) is a schematic diagram of a detection result of a component of an electrical device according to a first embodiment of the present invention.
Fig. 3(b) is a schematic diagram of a detection result of the power equipment component according to the second embodiment of the present invention.
Fig. 3(c) is a schematic diagram of a detection result of the power equipment component according to the third embodiment of the present invention.
Fig. 3(d) is a schematic diagram of a detection result of the power equipment component according to the fourth embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a real-time detection server for infrared image power equipment components according to the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Owing to its excellent learning and representation ability, deep learning has made breakthrough progress in the field of generic object detection. To provide the data required for training a deep learning model, approximately 8000 infrared thermographs of power equipment were first collected and labeled at the component level. Detection of power equipment components in infrared images differs from generic object detection mainly in that the power equipment may be tilted in the infrared image, whereas most current object detection tasks target upright objects.
Fig. 1 is a flowchart of a real-time detection method for infrared image power equipment components according to the present invention.
As shown in fig. 1, a real-time detection method for infrared image power equipment components of the present invention includes:
step (1): acquiring infrared images containing known power equipment components to form a sample set, wherein each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component.
Each infrared image I is marked with target frames, each target frame being an image area containing a single known power equipment component, and each infrared image I carries component-level labels of the form $[c_i, x_i, y_i, \theta_i, w_i, h_i]$, where $i$ is the sequence number of the target frame; $c_i$ is the category of the power equipment component contained in the target frame, the total number of categories being $C$; $x_i, y_i$ are the x and y coordinates of the center point of the target frame; and $\theta_i, w_i, h_i$ are the tilt angle, width and height of the target frame, respectively. The x-coordinate and the y-coordinate refer to the horizontal and vertical coordinates of the image, respectively. The tilt angle is the angle between the longitudinal edge of the target frame and the abscissa direction of the image.
In specific implementation, the power equipment components are divided into six types, namely porcelain bushing, sealing cover-CT, sealing cover-PT, flange, grading ring and arc extinguishing chamber. The infrared image I contains power equipment, and the power equipment is divided into components, so different power equipment components in the image may belong either to different pieces of power equipment or to the same piece of power equipment.
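The component-level label described above can be illustrated with a small data structure. The following is a minimal sketch, not the patent's implementation: the class name `ComponentLabel`, the function `corners`, the use of radians, and the convention that an upright frame has a tilt angle of pi/2 (because the longitudinal edge is then perpendicular to the abscissa) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ComponentLabel:
    """One component-level label [c, x, y, theta, w, h] (hypothetical container)."""
    c: int        # category index of the component, 0 <= c < C
    x: float      # x coordinate of the frame center
    y: float      # y coordinate of the frame center
    theta: float  # tilt angle in radians (unit is an assumption)
    w: float      # frame width
    h: float      # frame height


def corners(label):
    """Return the four corner points of the tilted target frame.

    Assumption: theta is measured between the height-direction edge and the
    image x-axis, so an upright frame has theta = pi / 2.
    """
    ux, uy = math.cos(label.theta), math.sin(label.theta)  # unit vector along the height
    vx, vy = uy, -ux                                        # unit vector along the width
    hh, hw = label.h / 2.0, label.w / 2.0
    signs = ((1, 1), (1, -1), (-1, -1), (-1, 1))
    return [
        (label.x + sh * hh * ux + sw * hw * vx,
         label.y + sh * hh * uy + sw * hw * vy)
        for sh, sw in signs
    ]
```

With these assumptions, an upright frame of width 2 and height 4 centered at (10, 20) has corners at x = 9 or 11 and y = 18 or 22.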
Step (2): a neural network for detecting power equipment components is constructed based on the YOLO target detection framework, as shown in fig. 2; the infrared images in the sample set and their corresponding component-level labels are input to the constructed neural network, which is then trained.
In the process of training the constructed neural network in step (2), a feature map fusing multi-scale features is obtained through multi-scale feature extraction, prediction frames are established on the feature map, and the prediction frames are then driven toward the target frames through multi-task learning.
In step (2), the infrared image is subjected to multi-scale processing, where scale refers to the change of image size, to obtain a series of feature maps of different scales;
any lower-layer feature map is reorganized so that its spatial size becomes one fourth of the original and its depth becomes 4 times the original; the reorganized lower-layer feature map and the higher-layer feature map of the same resolution are concatenated in the depth direction to obtain the fused final feature map, and this feature map, after one convolution operation, is used as the input of the multi-task learning step.
In specific implementation, the multi-scale feature extraction includes operations of convolution, activation, pooling and batch normalization in a traditional deep neural network, specifically, an original infrared image I is gradually reduced, and the depth is increased while the image is reduced, so that a series of feature maps with different scales are obtained. In order to obtain a feature map for fusing multi-scale information, the method recombines the feature map with higher resolution of a lower layer into a feature map with one fourth of the original length-width resolution and 4 times of the original depth, and then connects the feature map with the same resolution of a higher layer in the depth direction to obtain a fused feature map.
The image length and width resolution of the feature map of the lower layer is larger, and the image length and width resolution of the feature map of the higher layer is smaller.
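The reorganization and depth-wise concatenation described above can be sketched as follows. This is an illustrative pure-Python version operating on nested lists indexed as `[row][column][channel]`; the function names `reorganize` and `fuse`, the block size of 2 (which gives one fourth of the spatial area and 4 times the depth), and the channel ordering inside each block are assumptions, not details taken from the source.

```python
def reorganize(fmap, block=2):
    """Space-to-depth: (H, W, D) -> (H/block, W/block, D * block * block).

    fmap is a nested list indexed as fmap[row][col][channel].
    """
    h, w = len(fmap), len(fmap[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            cell = []
            # Stack the channels of every pixel in the block along the depth.
            for dr in range(block):
                for dc in range(block):
                    cell.extend(fmap[r + dr][c + dc])
            row.append(cell)
        out.append(row)
    return out


def fuse(lower, higher):
    """Concatenate the reorganized lower-layer map with the higher-layer map in depth."""
    reorg = reorganize(lower)
    if len(reorg) != len(higher) or len(reorg[0]) != len(higher[0]):
        raise ValueError("feature maps must have the same resolution after reorganization")
    return [
        [reorg[r][c] + higher[r][c] for c in range(len(higher[0]))]
        for r in range(len(higher))
    ]
```

For example, a 4 x 4 x 1 lower-layer map becomes 2 x 2 x 4 after reorganization and, fused with a 2 x 2 x 1 higher-layer map, yields a 2 x 2 x 5 feature map.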
The process of training the constructed neural network in the step (2) further includes:
dividing the infrared image into an S × S grid, and randomly generating B prediction frames (B being an integer greater than or equal to 1) in each grid containing a target frame, each prediction frame carrying a frame label; the B prediction frames are allowed to overlap, and a grid is deemed to contain a target frame if the center point of the target frame lies within that grid;
each prediction frame has a frame label $[s, p, t_x, t_y, t_\theta, t_w, t_h]$, where $s$ is the confidence that a power equipment component exists in the prediction frame; $p$ is the probability distribution over the categories of the power equipment component, conditioned on a component existing in the prediction frame; $t_\theta$ is the tilt angle of the prediction frame; $t_x, t_y$ are the x and y coordinates of the center point of the prediction frame; and $t_w, t_h$ are the width and height of the prediction frame, respectively. The initial values of the confidence $s$ and the probability distribution $p$ are randomly generated and are not equal to zero.
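The rule that a grid contains a target frame when the frame's center point falls inside it can be sketched as below. The function name `grid_cell`, the pixel-coordinate convention, and the example values (a 416 x 416 image with S = 13) are hypothetical and chosen only for illustration.

```python
def grid_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the S x S grid cell that contains the center (cx, cy).

    Coordinates are in pixels with the origin at the top-left corner (assumption).
    A center lying exactly on the right or bottom image border is clamped into
    the last cell.
    """
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col
```

Only the cell returned by `grid_cell` is treated as containing the target frame; all other cells contribute through the no-object part of the loss.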
Finding a prediction frame with the maximum overlapping rate with the target frame in each grid with the target frame as an actual prediction frame of the grid;
training iteratively with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, at which point training is complete.
The overlapping rate is the ratio of the overlapping area of the prediction frame and the target frame to the total area covered by the prediction frame and the target frame together.
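The overlapping rate and the choice of the actual prediction frame can be sketched as follows. This is a simplified illustration under two stated assumptions: the "total area" is read as the area of the union (intersection over union), and the tilt angle is ignored so that frames are treated as axis-aligned `(x_center, y_center, w, h)` tuples. The overlap of tilted frames would require a polygon intersection, which is not shown. The function names are hypothetical.

```python
def overlap_rate(a, b):
    """Intersection over union of two axis-aligned frames (x_center, y_center, w, h)."""
    ax1, ax2 = a[0] - a[2] / 2.0, a[0] + a[2] / 2.0
    ay1, ay2 = a[1] - a[3] / 2.0, a[1] + a[3] / 2.0
    bx1, bx2 = b[0] - b[2] / 2.0, b[0] + b[2] / 2.0
    by1, by2 = b[1] - b[3] / 2.0, b[1] + b[3] / 2.0
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def actual_prediction_index(predictions, target):
    """Index of the prediction frame with the maximum overlapping rate with the target."""
    return max(range(len(predictions)),
               key=lambda j: overlap_rate(predictions[j], target))
```

Among the B prediction frames of a grid, only the frame selected by `actual_prediction_index` is trained to approach the target frame.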
The loss function used to train the neural network in step (2) comprises a localization loss function, a classification loss function and a tilt angle consistency constraint loss function:
$$L = L_{loc} + L_{cls} + L_{ort}$$
where $L$ is the sum of the loss functions, $L_{loc}$ is the localization loss function, $L_{cls}$ is the classification loss function, and $L_{ort}$ is the tilt angle consistency constraint loss function;
The localization loss function $L_{loc}$ is expressed in terms of the following quantities:
$\mathbb{1}^{obj}_{ij}$ is an indicator function of whether the jth prediction frame in the ith grid is the closest prediction frame: it is 1 when the jth prediction frame in the ith grid is the closest prediction frame, and 0 otherwise;
$\mathbb{1}^{noobj}_{ij}$ is the complementary indicator function: it is 1 when the jth prediction frame in the ith grid is not the closest prediction frame, and 0 when it is;
$\lambda_{noobj}$ is the loss function weight for grids containing no power equipment component, and $\lambda_{loc}$ is the loss function weight of the localization task;
$s_{ij}$ is the predicted confidence of the jth prediction frame in the ith grid, and $\hat{s}_{ij}$ is the corresponding true confidence, which is 1 when the jth prediction frame in the ith grid is the closest prediction frame and 0 otherwise;
$t_{ij}$ denotes the predicted position and angle parameters of the jth prediction frame in the ith grid, and $\hat{t}_{ij}$ denotes the position and angle parameters of the real target frame corresponding to it; a prediction frame has a corresponding real target frame only when it is the closest prediction frame;
$\|\cdot\|_2$ denotes the L2 norm.
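The formula for the localization loss is not reproduced in this text. One form consistent with the quantities defined above is sketched below, assuming YOLO-style squared-error terms over an S × S grid with B prediction frames per grid. Here $\mathbb{1}^{obj}_{ij}$ and $\mathbb{1}^{noobj}_{ij}$ denote the two indicator functions, $s_{ij}$ and $\hat{s}_{ij}$ the predicted and true confidence, and $t_{ij}$ and $\hat{t}_{ij}$ the predicted and true position and angle parameters. The grouping of the terms, the use of squared errors, and the placement of the weights are assumptions rather than the source's exact expression.

```latex
L_{loc} =
    \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}^{obj}_{ij} \left( s_{ij} - \hat{s}_{ij} \right)^2
  + \lambda_{noobj} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}^{noobj}_{ij} \left( s_{ij} - \hat{s}_{ij} \right)^2
  + \lambda_{loc} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}^{obj}_{ij} \left\| t_{ij} - \hat{t}_{ij} \right\|_2^2
```

Under this reading, the first two terms push the confidence toward 1 for the closest prediction frame and toward 0 for all others, and the third term regresses the position and angle of the closest prediction frame only.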
The classification loss function $L_{cls}$ is expressed in terms of the following quantities:
$\lambda_{cls}$ is the loss function weight of the classification task;
$p_{ij}$ is the predicted probability distribution of the jth prediction frame in the ith grid over the power equipment component categories, and $\hat{p}_{ij}$ is the corresponding true probability distribution. This part of the loss function is calculated only when the jth prediction frame in the ith grid is the closest prediction frame.
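The formula for the classification loss is likewise not reproduced in this text. A form consistent with the definitions above, assuming a squared-error term as in the original YOLO formulation, is sketched below, with $\mathbb{1}^{obj}_{ij}$ the closest-prediction-frame indicator and $p_{ij}$, $\hat{p}_{ij}$ the predicted and true class distributions. A cross-entropy term would fit the definitions equally well, so the choice of error measure here is an assumption.

```latex
L_{cls} = \lambda_{cls} \sum_{i=1}^{S^2} \sum_{j=1}^{B}
          \mathbb{1}^{obj}_{ij} \left\| p_{ij} - \hat{p}_{ij} \right\|_2^2
```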
The tilt angle consistency constraint loss function $L_{ort}$ is expressed in terms of the following quantities:
$g$ denotes the gth equipment group, image I being assumed to contain $G$ equipment groups in total;
$\mathbb{1}^{g}_{ij}$ indicates the relationship between the jth prediction frame of the ith grid and the gth equipment group: it is 1 when the jth prediction frame of the ith grid is the closest prediction frame and belongs to the gth equipment group $\Omega_g$, and 0 otherwise;
$\bar{\theta}_g$ is the average tilt angle of the gth equipment group;
$\theta_{ij}$ is the predicted tilt angle of the jth prediction frame in the ith grid.
Through the tilt angle consistency constraint loss function, the invention keeps the tilt angles of the prediction frames of all components belonging to the same power equipment consistent: all components on the same power equipment are first identified, and the tilt angles of any two of their prediction frames are then driven to be close to each other, the average of the two tilt angles also being close to the tilt angle of the line connecting the centers of the two prediction frames. Each of the G pieces of equipment in the image is thus composed of a group of components whose prediction frames have similar tilt angles.
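The idea of penalizing inconsistent tilt angles within one equipment group can be sketched as follows. This is only an illustration of the principle, assuming the penalty is the squared deviation of each predicted tilt angle from the mean tilt angle of its group. The source's exact formula is not reproduced in this text, and the constraint on the center-connecting line and any angle wrap-around are not modeled. The function name and input format are hypothetical.

```python
from collections import defaultdict


def tilt_consistency_loss(frames):
    """Sum of squared deviations of tilt angles from their group mean.

    frames: iterable of (group_id, predicted_tilt_angle) pairs, one pair per
    closest prediction frame (hypothetical input format).
    """
    groups = defaultdict(list)
    for group_id, theta in frames:
        groups[group_id].append(theta)
    loss = 0.0
    for thetas in groups.values():
        mean_theta = sum(thetas) / len(thetas)  # average tilt angle of the group
        loss += sum((t - mean_theta) ** 2 for t in thetas)
    return loss
```

The loss is zero when every component of a piece of equipment is predicted with the same tilt angle and grows as the angles within a group diverge.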
The method adopts a target detection framework based on YOLO and improves the localization ability of the model by adding the tilt angle consistency constraint during training. During testing, an image with unknown labels is passed forward through the neural network only once, and the detection result of the power equipment components is then obtained through non-maximum suppression. Speeds in excess of 20 frames per second are achieved on a GPU during testing.
In specific implementation, the momentum is set to 0.9 and 90000 iterations are performed; the learning rate is 0.01 for the first 30000 iterations and 0.001 for the remaining 60000 iterations. After training is finished, the parameters of the neural network for detecting power equipment components are saved.
Step (3): an infrared image to be detected, whose component-level labels are unknown, is processed with the trained neural network for detecting power equipment components, and the detection result of the power equipment components is output.
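The training schedule stated above can be sketched as a step-wise learning rate combined with a classical momentum update. The function names, the 0-based iteration index, and the particular momentum formulation (velocity accumulates the scaled negative gradient) are assumptions; deep learning frameworks implement slightly different but related variants.

```python
def learning_rate(iteration):
    """Learning rate for a 0-based iteration index (indexing is an assumption).

    0.01 for the first 30000 iterations, 0.001 for the remaining 60000.
    """
    return 0.01 if iteration < 30000 else 0.001


def momentum_sgd_step(params, grads, velocity, lr, momentum=0.9):
    """One in-place momentum SGD update on flat lists of floats."""
    for k in range(len(params)):
        velocity[k] = momentum * velocity[k] - lr * grads[k]
        params[k] += velocity[k]
```

A training loop would call `momentum_sgd_step` once per iteration for 90000 iterations, passing `learning_rate(iteration)` as `lr`.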
The process of processing the infrared image to be detected with the unknown component-level tag by using the trained neural network for detecting the power equipment component in the step (3) includes:
the unknown image to be detected is input to the neural network for detecting power equipment components; at the output end the image is divided into grids of preset size, and each grid yields the results of its own prediction frames;
and performing non-maximum suppression on all the prediction frames, and selecting the prediction frames as final prediction results according to the confidence degrees.
Performing non-maximum suppression on all the prediction frames and selecting prediction frames as the final prediction result according to confidence comprises the following steps:
firstly, for all the prediction frames belonging to the same power equipment component category, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold value, setting the confidence coefficient of the prediction frame with the smaller confidence coefficient to be 0, and reserving the prediction frame with the larger confidence coefficient;
and then, screening the reserved prediction frames by using a confidence threshold, excluding the prediction frames with the confidence coefficient smaller than the confidence threshold in the frame labels of the prediction frames, and reserving the prediction frames with the confidence coefficient larger than or equal to the confidence threshold in the frame labels of the prediction frames.
In specific implementation, an unknown image is used as input of a neural network detected by a power equipment component to obtain prediction frames of all grid predictions, a non-maximum suppression algorithm is adopted, when the overlapping rate of two prediction frames predicted to be the same type is larger than 0.4, the confidence coefficient of the prediction frame with the smaller confidence coefficient is set to be 0, and the prediction frame with the larger confidence coefficient is reserved. And finally, selecting a prediction box with the confidence coefficient larger than 0.2 as a final prediction result. FIG. 2 shows example power equipment component detection results.
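The two screening steps above can be sketched as follows. This is a minimal illustration of the common greedy variant of non-maximum suppression, assuming axis-aligned frames given as (x1, y1, x2, y2) and ignoring any inclination angle. The names `overlap_rate` and `non_maximum_suppression` are hypothetical, and the final filter keeps frames at or above the confidence threshold, following the general rule stated above.

```python
def overlap_rate(a, b):
    """Intersection area divided by union area of two axis-aligned frames."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections, overlap_thr=0.4, conf_thr=0.2):
    """detections: list of (frame, class_id, confidence) tuples.

    Step 1: within each class, zero the confidence of the weaker frame of
            any pair whose overlap rate exceeds overlap_thr.
    Step 2: keep only frames whose confidence is at least conf_thr.
    """
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i][2], reverse=True)
    conf = [d[2] for d in detections]
    for pos, i in enumerate(order):
        if conf[i] == 0.0:
            continue  # already suppressed by a stronger frame
        for j in order[pos + 1:]:
            same_class = detections[i][1] == detections[j][1]
            if same_class and conf[j] > 0.0 and \
                    overlap_rate(detections[i][0], detections[j][0]) > overlap_thr:
                conf[j] = 0.0
    return [(detections[i][0], detections[i][1], conf[i])
            for i in order if conf[i] >= conf_thr]
```

With the default thresholds 0.4 and 0.2 taken from the embodiment, two heavily overlapping frames of the same class collapse to the more confident one, while an equally placed frame of a different class is left untouched.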
Finally, this embodiment is tested on the collected infrared image data set of power equipment components. As shown in Fig. 3(a) to Fig. 3(d), the data set contains 6 classes of power equipment components: common porcelain bushing, common sealing cover-CT, common sealing cover-PT, grading ring, flange, and arc extinguishing chamber. The class labels are: 0, porcelain bushing; 1, sealing cover-CT; 2, sealing cover-PT; 3, grading ring; 4, flange; 5, arc extinguishing chamber.
60% of the data set is randomly selected for training and the remaining 40% is used for testing. The method is evaluated with the standard target detection metrics AP (average precision) and mAP (mean average precision). The AP and mAP values of the method on the test set are shown in Table 1, where the mAP value is the average of the AP values of all classes. Larger AP and mAP values indicate better performance.
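For reference, the relation between AP and mAP can be sketched as follows. This is an illustration only: it assumes that AP is computed as the area under the precision-recall curve with a monotone precision envelope, which is one common convention, and the text does not state which convention was used. The function names and the input format are hypothetical, and the values returned are fractions in [0, 1] rather than percentages.

```python
def average_precision(detections, num_ground_truth):
    """AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated target frames of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        recalls.append(true_positives / num_ground_truth)
        precisions.append(true_positives / rank)
    # Make precision non-increasing as recall grows (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over each increase in recall.
    ap, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the plain average of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

Here `ap_per_class` would be a dictionary keyed by the class labels 0 to 5 listed above.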
As can be seen from Table 1, the mAP of the method reaches 92.4, and the AP values of individual classes exceed 95. The test results show that the method achieves a significant technical effect.
The method uses the YOLO target detection framework to train on a large number of infrared images with annotated components so that the network parameters are fully learned, and uses the inclination angle consistency constraint to improve the accuracy of locating power equipment components. The fully convolutional network design adopted by the method allows the test speed of the model to exceed 20 frames per second on a GPU, making the method suitable for application scenarios that require high-precision and fast detection of power equipment components.
Fig. 4 is a schematic structural diagram of a real-time detection server for infrared image power equipment components according to the present invention.
As shown in fig. 4, the present invention provides a server for real-time detection of infrared image power equipment components, including:
(1) A sample set construction module, used for acquiring infrared images containing known power equipment components to form a sample set, wherein each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component.
(2) A neural network training module, used for constructing a neural network for power equipment component detection based on the YOLO target detection framework, inputting the infrared images in the sample set and the corresponding component-level labels into the constructed neural network, and training the neural network;
in the neural network training module, a feature map fusing multi-scale features is extracted through multi-scale feature extraction, prediction frames are established in the feature map, and the prediction frames are then brought close to the target frames through multi-task learning.
In the module, the infrared image is subjected to multi-scale processing, wherein scale refers to the change of image size, to obtain a series of feature maps of different scales;
any lower-layer feature map is then recombined so that its image size becomes one fourth of the original and its depth becomes 4 times the original; the recombined lower-layer feature map and the higher-layer feature map are concatenated in the depth direction to obtain the fused final feature map, and this feature map, after one convolution operation, is used as the input of the multi-task learning step.
In the specific implementation, the multi-scale feature extraction includes the convolution, activation, pooling and batch normalization operations of a conventional deep neural network. Specifically, the original infrared image I is gradually reduced in size while its depth is increased, so that a series of feature maps of different scales is obtained. To obtain a feature map that fuses multi-scale information, the method recombines a higher-resolution feature map of a lower layer into a feature map with one fourth of the original length-width resolution and 4 times the original depth, and then concatenates it in the depth direction with the higher-layer feature map of the same resolution to obtain the fused feature map.
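The recombination and depth-wise concatenation described above can be sketched as follows. This is a minimal illustration on nested Python lists indexed as [row][column][channel], assuming that "one fourth of the length-width resolution" means halving both height and width (stride 2). The names `recombine` and `concat_depth` are hypothetical, and the channel ordering inside each recombined position is an arbitrary choice that may differ from the actual network.

```python
def recombine(feature, stride=2):
    """Move each stride x stride spatial block into the depth dimension.

    feature: nested list of shape [H][W][C].
    Returns a nested list of shape [H/stride][W/stride][C*stride*stride].
    """
    height, width = len(feature), len(feature[0])
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for y in range(0, height, stride):
        row = []
        for x in range(0, width, stride):
            cell = []
            for dy in range(stride):
                for dx in range(stride):
                    cell.extend(feature[y + dy][x + dx])
            row.append(cell)
        out.append(row)
    return out


def concat_depth(a, b):
    """Concatenate two feature maps of equal height and width along depth."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]
```

For example, a lower-layer map of shape 4 x 4 x 1 becomes 2 x 2 x 4 after `recombine`, and can then be concatenated with a higher-layer map of shape 2 x 2 x C to give a fused map of depth 4 + C.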
The neural network training module is further configured to:
dividing the infrared image into grids of a preset size, and randomly generating a plurality of prediction frames in each grid that contains a target frame, wherein each prediction frame is provided with a frame label;
finding, in each grid that contains a target frame, the prediction frame with the maximum overlapping rate with the target frame as the actual prediction frame of the grid;
and performing iterative training with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, whereupon the training is finished.
The overlapping rate is the ratio of the overlapping area of the prediction frame and the target frame to the total area covered by the prediction frame and the target frame.
The feature map of a lower layer has a larger image length-width resolution, and the feature map of a higher layer has a smaller image length-width resolution.
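The selection of the actual prediction frame by overlapping rate, as described for the training module above, can be sketched as follows. The sketch assumes axis-aligned frames given as (x1, y1, x2, y2) and ignores any inclination angle. The names `overlapping_rate` and `select_actual_frame` are hypothetical.

```python
def overlapping_rate(prediction, target):
    """Overlapping area divided by the total area covered by both frames."""
    iw = max(0.0, min(prediction[2], target[2]) - max(prediction[0], target[0]))
    ih = max(0.0, min(prediction[3], target[3]) - max(prediction[1], target[1]))
    inter = iw * ih
    area_p = (prediction[2] - prediction[0]) * (prediction[3] - prediction[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    return inter / union if union > 0 else 0.0


def select_actual_frame(candidates, target):
    """Among the prediction frames of one grid, pick the one that
    overlaps the target frame the most."""
    return max(candidates, key=lambda frame: overlapping_rate(frame, target))
```

Only the frame returned by `select_actual_frame` would then contribute its frame label to the training objective for that grid.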
(3) A component detection module, used for processing the infrared image to be detected, whose component-level labels are unknown, by using the trained neural network for power equipment component detection, and outputting the detection result of the power equipment components.
The component detection module is further configured to:
input the unknown image to be detected into the neural network for power equipment component detection, wherein the unknown image to be detected is divided into grids of a preset size and each grid yields the results of its own prediction frames;
and perform non-maximum suppression on all the prediction frames, and select the final prediction results according to confidence.
The component detection module is further configured to:
firstly, for all the prediction frames belonging to the same power equipment component class, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold, set the confidence of the prediction frame with the smaller confidence to 0, and retain the prediction frame with the larger confidence;
and then screen the retained prediction frames with a confidence threshold, excluding the prediction frames whose confidence in the frame label is smaller than the confidence threshold, and retaining the prediction frames whose confidence in the frame label is greater than or equal to the confidence threshold.
The server provided by the invention uses the YOLO target detection framework to train on a large number of infrared images with annotated components so that the network parameters are fully learned. The fully convolutional network design adopted by the server allows the test speed of the model to exceed 20 frames per second on a GPU, making the server suitable for application scenarios that require high-precision and fast detection of power equipment components.
The invention also provides a real-time detection system for the infrared image power equipment component.
The invention discloses a real-time detection system for infrared image power equipment components, which comprises a detection server and a client, wherein the detection server is configured to:
acquiring infrared images containing known power equipment components to form a sample set, wherein each infrared image in the sample set is marked with a target frame and is provided with a component-level label, and the target frame is an image area containing a single known power equipment component;
constructing a neural network for power equipment component detection based on the YOLO target detection framework, inputting the infrared images in the sample set and the corresponding component-level labels into the constructed neural network, and training the neural network;
and processing the infrared image to be detected, whose component-level labels are unknown, by using the trained neural network for power equipment component detection, and outputting the detection result of the power equipment components.
Further, the detection server is further configured to: in the process of training the constructed neural network, extract a feature map fusing multi-scale features through multi-scale feature extraction, establish prediction frames in the feature map, and then bring the prediction frames close to the target frames through multi-task learning.
Further, the detection server is further configured to:
divide the infrared image into grids of a preset size, and randomly generate a plurality of prediction frames in each grid that contains a target frame, wherein each prediction frame is provided with a frame label;
find, in each grid that contains a target frame, the prediction frame with the maximum overlapping rate with the target frame as the actual prediction frame of the grid;
and perform iterative training with the momentum SGD algorithm, using the frame label of the actual prediction frame as the training object, so that the actual prediction frame in each grid gradually approaches the target frame, whereupon the training is finished.
Further, the detection server is further configured to: input the unknown image to be detected into the neural network for power equipment component detection, wherein the unknown image to be detected is divided into grids of a preset size and each grid yields the results of its own prediction frames;
and perform non-maximum suppression on all the prediction frames, and select the final prediction results according to confidence.
Further, the detection server is further configured to: in the process of performing non-maximum suppression on all the prediction frames and selecting the final prediction results according to confidence, firstly, for all the prediction frames belonging to the same power equipment component class, if the overlapping rate of any two prediction frames is greater than the non-maximum suppression overlapping rate threshold, set the confidence of the prediction frame with the smaller confidence to 0, and retain the prediction frame with the larger confidence;
and then screen the retained prediction frames with a confidence threshold, excluding the prediction frames whose confidence in the frame label is smaller than the confidence threshold, and retaining the prediction frames whose confidence in the frame label is greater than or equal to the confidence threshold.
The system provided by the invention uses the YOLO target detection framework to train on a large number of infrared images with annotated components so that the network parameters are fully learned. The fully convolutional network design adopted by the system allows the test speed of the model to exceed 20 frames per second on a GPU, making the system suitable for application scenarios that require high-precision and fast detection of power equipment components.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, they are not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.