CN114092838A

CN114092838A - Photovoltaic defect detection method based on deep learning target detection

Info

Publication number: CN114092838A
Application number: CN202111337200.2A
Authority: CN
Inventors: 薛辉; 张喜山; 田野; 赵晓龙; 吴薇; 苏冬雨
Original assignee: Qianguo Fubang Energy Technology Service Co ltd
Current assignee: Qianguo Fubang Energy Technology Service Co ltd
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2022-02-25

Abstract

The invention discloses a photovoltaic defect detection method based on deep learning target detection, belonging to the technical fields of deep learning, neural networks, and photovoltaic power station defect detection. The method comprises the following steps: first, an unmanned aerial vehicle collects the raw data, which are transmitted back to a background system for frame-by-frame extraction and preprocessing; next, an infrared image data set is built; the images are then fed into a network for training, which outputs feature maps; finally, the prediction network is integrated into a photovoltaic detection background system to detect, in real time, the infrared video shot and returned by the unmanned aerial vehicle. By combining deep learning with unmanned aerial vehicle infrared imaging, the method realizes infrared photovoltaic fault detection with a simple operating process and greatly saves manpower and material resources; compared with traditional photovoltaic hot spot detection methods, it offers faster recognition, strong robustness, and high detection efficiency.

Description

Photovoltaic defect detection method based on deep learning target detection

Technical Field

The invention relates to the technical field of photovoltaic power station fault detection, in particular to a photovoltaic defect detection method based on deep learning target detection.

Background

Solar power generation refers to generating electricity by converting light energy directly into electric energy without a thermal process. It includes photovoltaic, photochemical, photoinduction, and photobiological power generation. Photovoltaic power generation uses solar-grade semiconductor electronic devices to absorb solar radiation efficiently and convert it directly into electric energy; it is the mainstream of current solar power generation. Photochemical power generation uses electrochemical photovoltaic cells, photoelectrolytic cells, and photocatalytic cells; at present, photovoltaic cells are the ones in practical use.

A photovoltaic power generation system mainly consists of solar cells, storage batteries, a controller, and an inverter. The solar cell is the key component: the quality and cost of the solar panels directly determine the quality and cost of the whole system. Solar cells fall mainly into crystalline silicon cells, comprising monocrystalline and polycrystalline silicon cells, and thin-film cells, mainly comprising amorphous silicon, copper indium gallium selenide, and cadmium telluride solar cells.

Photovoltaic modules operating for long periods are prone to a variety of faults, of which the hot spot is the most serious; the photovoltaic defect detected in this invention is the hot spot. During operation, the current of an individual cell in a module may drop because of shading or a defect in the cell itself. When the module's operating current exceeds that cell's current, the cell becomes reverse-biased and turns from a power source into a load that consumes energy, causing local overheating in the module, i.e., a hot spot.

Traditional hot spot detection methods, such as the I-V curve method and manual inspection with a handheld thermal imager, are slow, cause power loss, and are inaccurate, and their procedures are complex, time-consuming, and labor-intensive. With the development of technology, unmanned aerial vehicles carrying infrared spectrum cameras can now perform preliminary detection of infrared photovoltaic faults.

In the invention, a rotor unmanned aerial vehicle carrying an infrared spectrum camera acquires images, which are returned quickly over a 5G network and detected in real time in the background; a deep learning target detection model provides higher detection precision while preserving timeliness.

Disclosure of Invention

Purpose of the invention: the invention aims to provide a photovoltaic defect detection method based on deep learning target detection that ensures both detection speed and detection accuracy.

Technical scheme: the invention provides an infrared photovoltaic defect detection method based on a deep learning target detection algorithm, comprising the following steps:

S1, collecting the raw data. In sunny weather with good illuminance, a rotor unmanned aerial vehicle and the infrared spectrum camera it carries are used to film the photovoltaic modules. The flight path is planned in advance, and filming takes place between 11:00 and 16:00 in early autumn. The infrared video of the photovoltaic panels is transmitted back in real time for the next processing step;

S2, creating an infrared image data set. Because deep learning target detection algorithms require a large amount of labeled data, the infrared images must be labeled before training. First, all images are annotated with the open-source tool LabelImg, which generates an xml file containing the annotation information for each image. The xml files are stored in a VOCdevkit directory structure prepared in advance, which contains an Annotations folder for the xml files, a JPEGImages folder for the original pictures, and an ImageSets/Main folder for the txt files;
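To illustrate what S2 produces, the sketch below parses one Pascal VOC style xml file, as written by LabelImg, into a file name and a list of labeled boxes using Python's standard `xml.etree.ElementTree`. The element names follow the usual VOC layout; the class name `hot_spot`, the file name, and the helper `parse_voc` are hypothetical examples, not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation for one infrared frame (class name is illustrative).
SAMPLE_XML = """<annotation>
  <filename>frame_000123.jpg</filename>
  <size><width>640</width><height>512</height><depth>3</depth></size>
  <object>
    <name>hot_spot</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>138</xmax><ymax>215</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (filename, [(class_name, xmin, ymin, xmax, ymax), ...]) from a VOC xml string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # int(float(...)) also accepts coordinates written with a decimal point
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.findtext("name"), *coords))
    return filename, objects
```

In practice one would read each file from the Annotations folder with `ET.parse(path).getroot()` instead of `ET.fromstring`.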

S3, performing data augmentation on the original data. Because the number of samples is small, the original data set is enlarged by horizontally flipping, rotating, inverting, cropping, and stitching the labeled infrared pictures, which helps ensure the accuracy of the detection results;
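Geometric augmentations in S3 must move the box labels together with the pixels. A minimal NumPy sketch of two of them, horizontal flipping and a 90° counter-clockwise rotation, assuming boxes are stored as (xmin, ymin, xmax, ymax) pixel-edge coordinates; the function names are illustrative, not from the patent.

```python
import numpy as np

def hflip(img, boxes):
    """Mirror an H x W (x C) image left-right and update (xmin, ymin, xmax, ymax) boxes."""
    w = img.shape[1]
    out = img[:, ::-1].copy()
    b = boxes.astype(float).copy()
    b[:, [0, 2]] = w - boxes[:, [2, 0]]  # new xmin = W - old xmax, new xmax = W - old xmin
    return out, b

def rot90_ccw(img, boxes):
    """Rotate the image 90 degrees counter-clockwise and update the boxes."""
    w = img.shape[1]
    out = np.rot90(img).copy()           # rotates the first two axes counter-clockwise
    b = np.empty_like(boxes, dtype=float)
    b[:, 0] = boxes[:, 1]                # new xmin = old ymin
    b[:, 1] = w - boxes[:, 2]            # new ymin = W - old xmax
    b[:, 2] = boxes[:, 3]                # new xmax = old ymax
    b[:, 3] = w - boxes[:, 0]            # new ymax = W - old xmin
    return out, b
```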

S4, clustering the prior boxes of the labeled infrared images. The prior (anchor) boxes are clustered with the K-means++ algorithm, an unsupervised clustering algorithm that is simple to implement, clusters well, and is widely used; in this method it meets the requirements for both speed and accuracy;

S5, training the network. The invention trains on the data with an improved YOLO network. The processed data set is fed into the network, the corresponding class information, class names, number of training iterations, and file paths are modified, and the training script is run. In essence, the network computes its predictions and the loss during forward propagation, and the loss is propagated backward to adjust the weights. Once the loss has become very small after hundreds of iterations, the weight file can be saved and used for target detection. The backbone uses the Darknet53 structure, which consists of a series of residual blocks: the main branch keeps stacking the original residual blocks, while the other branch acts as a large residual edge that, after only a little processing, is connected directly to the end. Every convolution layer is followed by Batch Normalization and the Mish activation function. The backbone splits the feature map of the base layer into two parts and then merges them through a cross-stage partial structure, reducing computation while maintaining accuracy;

S6, outputting the feature maps. After an image has been processed by the model, feature maps at three scales are output; feature maps of different sizes correspond to different prior boxes, and the three feature maps of different levels are fused. The shapes of the 3 output tensors of the YOLO algorithm are (19, 19, 255), (38, 38, 255), and (76, 76, 255). Each grid cell corresponds to 3 anchor boxes, and each predicted bounding box carries 5 values (4 box coordinates and 1 confidence score) in addition to its class scores;
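The output shapes follow from simple arithmetic. Assuming a 608 × 608 input and the usual YOLOv3/YOLOv4 strides of 32, 16, and 8 (the patent states neither), each head has 3 × (5 + C) channels for C classes. The 255 channels match the 80-class COCO configuration; a single hot spot class would give 18. The helper name is hypothetical.

```python
def yolo_head_shapes(input_size=608, num_classes=80, anchors_per_cell=3, strides=(32, 16, 8)):
    """Return (grid_h, grid_w, channels) for each YOLO output head."""
    # per anchor: 4 box offsets + 1 confidence + one score per class
    channels = anchors_per_cell * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]

print(yolo_head_shapes())               # [(19, 19, 255), (38, 38, 255), (76, 76, 255)]
print(yolo_head_shapes(num_classes=1))  # [(19, 19, 18), (38, 38, 18), (76, 76, 18)]
```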

S7, the prediction algorithm is integrated into a photovoltaic detection big data processing platform. Frames are extracted at intervals from the infrared video shot and returned by the unmanned aerial vehicle and detected in real time; abnormal information is analyzed accurately and a detection report is sent, facilitating timely maintenance. Key frames of the infrared video are extracted with an algorithm based on inter-frame difference: two frames are subtracted, and the average pixel intensity of the difference image measures how much the two frames differ. Accordingly, whenever the average inter-frame difference intensity shows that a frame has changed substantially from the previous frame, that frame is regarded as a key frame and extracted.
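A minimal NumPy sketch of the inter-frame difference criterion in S7, assuming the video has already been decoded into a grayscale array of shape (T, H, W). The threshold value, the rule that frame 0 is always kept, and the function names are illustrative assumptions; video decoding is left out.

```python
import numpy as np

def mean_frame_diff(frames):
    """Mean absolute pixel difference between each frame and its predecessor (length T-1)."""
    f = frames.astype(np.float32)  # avoid uint8 wrap-around when subtracting
    return np.abs(f[1:] - f[:-1]).mean(axis=(1, 2))

def key_frames_by_threshold(frames, threshold):
    """Indices of frames whose change from the previous frame exceeds the threshold.

    Frame 0 is kept as the first key frame (an assumption of this sketch).
    """
    diff = mean_frame_diff(frames)
    return [0] + [i + 1 for i, d in enumerate(diff) if d > threshold]
```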

Further, in step S5, the CutMix algorithm is used for data augmentation to strengthen target detection. The CutMix used by the invention stitches two pictures together by random scaling, random cropping, and random arrangement, further enlarging the data set and enhancing the robustness of the system. Infrared photovoltaic hot spot targets are generally small, and the AP for small targets is usually much lower than for medium and large targets. To make recognition more accurate, 2 pictures are chosen at random, randomly scaled, and then randomly arranged and stitched, which greatly enriches the detection data set; in particular, random scaling adds many small targets and makes the network more robust. In addition, CutMix processes the data of 2 pictures at once during training, so the data set is expanded while training speed is maintained;

The photovoltaic defect detection method based on deep learning target detection provided by the invention has the following beneficial effects:

1. The unmanned aerial vehicle acquires infrared video of the photovoltaic panels automatically, which greatly saves manpower and material resources, ensures real-time performance, and makes the system more robust;

2. The YOLO model adopted adapts well to recognizing small infrared targets and performs well in both detection speed and detection precision; the data augmentation methods enlarge the original data volume more effectively, and the new backbone network makes fault features easier to identify.

Drawings

FIG. 1 is a flow chart of an unmanned aerial vehicle infrared imaging photovoltaic detection system;

FIG. 2 is an overall flow chart of the present invention;

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings.

The specific workflow of the photovoltaic defect detection method based on YOLOv4 and thermal infrared images is as follows:

S4, clustering the prior boxes of the labeled infrared images. The prior (anchor) boxes are clustered with the K-means++ algorithm, an unsupervised clustering algorithm that is simple to implement, clusters well, and is widely used; in this method it meets the requirements for both speed and accuracy:

The choice of the k initial centroids strongly affects the final clustering result and the running time, so suitable centroids must be chosen. A purely random choice may make the algorithm converge slowly. K-Means++ improves on the random centroid initialization of K-Means.

The K-Means++ strategy for initializing the centroids is simple:

a) Randomly select a point from the input data set as the first cluster center μ₁;

b) For each point x_i in the data set, compute its distance D(x_i) to the nearest of the cluster centers already selected;

c) Select a new data point as the next cluster center, such that points with larger D(x) are more likely to be chosen (in K-Means++, with probability proportional to D(x)²);

d) Repeat b and c until k centroids have been selected;

e) Use the k centroids as the initial centroids and run the standard K-Means algorithm.
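Steps a) to e) can be sketched in NumPy for anchor clustering. Two assumptions not stated in the patent: each labeled box is reduced to its (width, height), and the distance is d = 1 − IoU, a common choice in YOLO anchor clustering. The function names are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU of (w, h) pairs, treating every box as anchored at the same corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeanspp_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box sizes into k anchors: K-means++ seeding, then standard K-Means."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(boxes)
    centroids = [boxes[rng.integers(n)]]                             # a) first centre, uniform
    while len(centroids) < k:
        d = (1.0 - iou_wh(boxes, np.array(centroids))).min(axis=1)  # b) D(x) to nearest centre
        w = d ** 2
        p = w / w.sum() if w.sum() > 0 else np.full(n, 1.0 / n)
        centroids.append(boxes[rng.choice(n, p=p)])                 # c) P(x) proportional to D(x)^2
    centroids = np.array(centroids)                                 # d) k centres selected
    for _ in range(iters):                                          # e) standard K-Means
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)        # nearest = highest IoU
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]  # sort anchors by area
```

Because D(x) is zero for boxes identical to an already chosen centre, the seeding never picks the same size twice, which is the behavior K-Means++ relies on.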

S5, training the network. The processed data set is fed into the network, and the class information in the classes file, the class names, the number of training iterations, and the file paths are modified accordingly. The initial training is set to 5000 epochs in total. Because shallow-layer features become highly similar once the number of iterations reaches a certain level, whereas generalization depends more on the deeper layers, part of the Darknet53 network is frozen from epoch 4000 onward, which also speeds up training to some extent. The CutMix online data augmentation method is then added, and the objective function is optimized by gradient descent during training.
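The freezing schedule and gradient-descent update can be sketched without a deep learning framework. Only the 5000-epoch total and the freeze point at epoch 4000 come from the text; the parameter-group names, the plain SGD rule, and the learning rate are illustrative assumptions. In a real framework this corresponds to disabling gradient updates for the frozen backbone layers.

```python
import numpy as np

TOTAL_EPOCHS = 5000
FREEZE_EPOCH = 4000
# Hypothetical group name; the patent only says "a part of Darknet53" is frozen.
FROZEN_AFTER_FREEZE = {"backbone_shallow"}

def frozen_groups(epoch):
    """Parameter groups excluded from updates at this epoch."""
    return FROZEN_AFTER_FREEZE if epoch >= FREEZE_EPOCH else set()

def sgd_step(params, grads, lr, frozen):
    """One plain gradient-descent update that skips frozen parameter groups."""
    for name, value in params.items():
        if name not in frozen:
            value -= lr * grads[name]  # in-place update of the NumPy array

# Usage: at epoch 4500 the shallow backbone stays fixed while the head is updated.
params = {"backbone_shallow": np.ones(3), "head": np.ones(3)}
grads = {"backbone_shallow": np.full(3, 2.0), "head": np.full(3, 2.0)}
sgd_step(params, grads, lr=0.1, frozen=frozen_groups(4500))
```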

The training script is then run. In essence, the network computes its predictions and the loss during forward propagation, and the loss is propagated backward to adjust the weights. Once the loss has become very small after hundreds of iterations, the weight file can be saved and used for target detection. The backbone uses the Darknet53 structure, which consists of a series of residual blocks: the main branch keeps stacking the original residual blocks, while the other branch acts as a large residual edge that, after only a little processing, is connected directly to the end. Every convolution layer is followed by Batch Normalization and the Mish activation function. The backbone splits the feature map of the base layer into two parts and then merges them through a cross-stage partial structure, reducing computation while maintaining accuracy; the Mish activation function further improves training accuracy. It is defined as:

y = x · tanh(ln(1 + e^x))
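The Mish formula translates directly into NumPy. In this sketch `np.logaddexp(0, x)` computes ln(1 + e^x) (softplus) without overflowing for large x; the function name is illustrative.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    x = np.asarray(x, dtype=float)
    return x * np.tanh(np.logaddexp(0.0, x))
```

For example, `mish(1.0)` is about 0.865, while large negative inputs map to values very close to zero.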

During training, the CutMix algorithm is used for data augmentation. The CutMix used by the invention stitches two pictures together by random scaling, random cropping, and random arrangement, further enlarging the data set and enhancing the robustness of the system. Infrared photovoltaic hot spot targets are generally small, and the AP for small targets is usually much lower than for medium and large targets. To make recognition more accurate, 2 pictures are chosen at random, randomly scaled, and then randomly arranged and stitched, which greatly enriches the detection data set; in particular, random scaling adds many small targets and makes the network more robust. In addition, CutMix processes the data of 2 pictures at once during training, so the data set is expanded while training speed is maintained;
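The two-picture splicing described above might be sketched as follows. This is one possible reading of the text (random scaling, random cropping, and random left/right arrangement of two images on one canvas), not the original CutMix formulation, which pastes a rectangular patch of one image into another. The canvas size, scale range, split range, the minimum box side used to drop clipped boxes, and all function names are assumptions.

```python
import numpy as np

def _resize_nn(img, scale):
    """Nearest-neighbour resize by a scale factor (keeps the sketch dependency-free)."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    return img[rows][:, cols]

def splice_two(img_a, boxes_a, img_b, boxes_b, size=(416, 416), rng=None, min_side=2.0):
    """Randomly scale, crop, and place two images side by side on one canvas.

    Boxes are (N, 4) arrays of (xmin, ymin, xmax, ymax) pixel coordinates.
    Returns the spliced canvas and the surviving, clipped boxes of both images.
    """
    if rng is None:
        rng = np.random.default_rng()
    H, W = size
    canvas = np.zeros((H, W) + img_a.shape[2:], dtype=img_a.dtype)
    split = int(W * rng.uniform(0.3, 0.7))                 # random split column
    pairs = [(img_a, boxes_a), (img_b, boxes_b)]
    if rng.random() < 0.5:                                 # random left/right arrangement
        pairs.reverse()
    kept = []
    for (img, boxes), (x0, x1) in zip(pairs, [(0, split), (split, W)]):
        s = rng.uniform(0.5, 1.5)                          # random scaling
        scaled = _resize_nn(img, s)
        ph, pw = min(H, scaled.shape[0]), min(x1 - x0, scaled.shape[1])
        oy = rng.integers(0, scaled.shape[0] - ph + 1)     # random crop offset
        ox = rng.integers(0, scaled.shape[1] - pw + 1)
        canvas[:ph, x0:x0 + pw] = scaled[oy:oy + ph, ox:ox + pw]
        b = np.asarray(boxes, dtype=float).reshape(-1, 4) * s
        b[:, [0, 2]] = np.clip(b[:, [0, 2]] - ox + x0, x0, x0 + pw)
        b[:, [1, 3]] = np.clip(b[:, [1, 3]] - oy, 0, ph)
        # drop boxes that were (almost) cut away by the crop
        keep = (b[:, 2] - b[:, 0] >= min_side) & (b[:, 3] - b[:, 1] >= min_side)
        kept.append(b[keep])
    return canvas, np.concatenate(kept, axis=0)
```

Shrinking an image with a scale below 1 turns hot spots into even smaller targets, which is the effect the text attributes to random scaling.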

During network training, an SPP module is inserted between the backbone and the output layer to better extract fused features. The SPP structure is applied to the last feature layer of Darknet53: after this layer has been convolved three times, it is processed by max pooling at four different scales, with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5, and 1 × 1;
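A NumPy sketch of the SPP block, assuming (as in YOLOv4) stride-1 max pooling with "same" padding so that every branch keeps the input's spatial size, followed by channel-wise concatenation of the four branches; the 1 × 1 branch is then the identity. The (C, H, W) layout and function names are assumptions; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x, k):
    """Stride-1 k x k max pooling with 'same' padding on a (C, H, W) array."""
    x = np.asarray(x, dtype=float)
    if k == 1:
        return x.copy()
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)  # pad never wins the max
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))              # (C, H, W, k, k)
    return windows.max(axis=(-2, -1))

def spp(x, kernels=(13, 9, 5, 1)):
    """Concatenate max-pooled copies of x along the channel axis."""
    return np.concatenate([max_pool_same(x, k) for k in kernels], axis=0)
```

On a 19 × 19 × C feature map the output is 19 × 19 × 4C, so the pooled context at several receptive-field sizes is fused without changing the grid.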

S6, outputting the feature maps. After an image has been processed by the model, feature maps at three scales are output; feature maps of different sizes correspond to different prior boxes, and the three feature maps of different levels are fused. The shapes of the 3 output tensors of the YOLO algorithm are (19, 19, 255), (38, 38, 255), and (76, 76, 255). Each grid cell corresponds to 3 anchor boxes, and each predicted bounding box carries 5 values (4 box coordinates and 1 confidence score) in addition to its class scores. The loss function of the network has three parts: category loss, confidence loss, and location loss:

The category loss uses multi-class cross-entropy loss, with the following formula:

In the above formula, a denotes a prediction box and b an object class; X_ab ∈ {0, 1}, where X_ab = 1 when the prediction box contains a real target and X_ab = 0 when it does not.

The prediction term denotes the sigmoid probability, output by the network, that a target of the given category exists in the predicted bounding box;

The target confidence loss uses binary cross-entropy, with the following formula:
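The category and confidence formulas did not survive extraction and are not reproduced here. As a hedged illustration only, the sketch below computes a standard sigmoid binary cross-entropy summed over prediction boxes a and classes b with 0/1 targets such as X_ab; whether the patent's formulas take exactly this form is an assumption, and the function name is hypothetical.

```python
import numpy as np

def sigmoid_bce(logits, targets, eps=1e-7):
    """Sum of binary cross-entropy terms between sigmoid(logits) and 0/1 targets.

    logits, targets: arrays of the same shape, e.g. (num_boxes, num_classes) for a
    class term with targets X_ab, or (num_boxes,) for the confidence term.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # sigmoid probability
    p = np.clip(p, eps, 1.0 - eps)                              # avoid log(0)
    t = np.asarray(targets, dtype=float)
    return float(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)).sum())
```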

The location loss uses the DIoU loss, with the following formula:

L_DIoU = 1 - IoU(A, B) + ρ²(A_ctr, B_ctr)/c²

where A is the prediction box and B is the ground-truth box; A_ctr is the center point of the prediction box and B_ctr the center point of the ground-truth box; ρ(·) is the Euclidean distance; and c is the diagonal length of the smallest box enclosing A and B;
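A minimal NumPy sketch of the DIoU loss defined above, for a single pair of boxes given as corner coordinates (x1, y1, x2, y2); the box format and function name are assumptions.

```python
import numpy as np

def diou_loss(a, b):
    """DIoU loss 1 - IoU(A, B) + rho^2(A_ctr, B_ctr) / c^2 for (x1, y1, x2, y2) boxes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # intersection over union
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # squared Euclidean distance between the centre points A_ctr and B_ctr
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # squared diagonal of the smallest box enclosing both A and B
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    return float(1.0 - iou + rho2 / (cw ** 2 + ch ** 2))
```

For two disjoint unit boxes (0, 0, 1, 1) and (2, 0, 3, 1), IoU is 0, ρ² = 4, and c² = 10, so the loss is 1.4; unlike plain IoU loss, the distance term still provides a gradient when the boxes do not overlap.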

The flow of the key-frame extraction algorithm is briefly described as follows:

first, the video is read and the inter-frame difference between each pair of consecutive frames is computed to obtain the average inter-frame difference intensity;

second, the key frames are extracted with one of three methods, all based on the inter-frame difference;

finally, with the local-maximum method, the frames at which the average inter-frame difference intensity reaches a local maximum are selected as the key frames of the video; this method yields richer results that are evenly distributed through the video. Note that smoothing the time series of the average inter-frame difference intensity is an important technique when using this method: it effectively removes noise and prevents several frames of similar scenes from being extracted as key frames at the same time.
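A NumPy sketch of the local-maximum method, assuming grayscale frames of shape (T, H, W). The patent only says the difference series should be smoothed; the triangular moving-average kernel, its half-width, and the function name are assumptions. The sequence should be longer than the kernel so that `np.convolve(..., mode="same")` keeps the series length.

```python
import numpy as np

def key_frames_local_max(frames, half=2):
    """Key frames at local maxima of the smoothed mean inter-frame difference.

    frames: array of shape (T, H, W); half: half-width of the triangular smoothing kernel.
    """
    f = frames.astype(np.float32)
    diff = np.abs(f[1:] - f[:-1]).mean(axis=(1, 2))        # diff[i]: frame i+1 vs frame i
    ramp = np.arange(1, half + 2, dtype=float)              # 1, 2, ..., half+1
    kernel = np.concatenate([ramp, ramp[-2::-1]])           # triangular, e.g. 1 2 3 2 1
    smooth = np.convolve(diff, kernel / kernel.sum(), mode="same")
    # a frame is a key frame where the smoothed curve peaks
    return [i + 1 for i in range(1, len(smooth) - 1)
            if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]]
```

With a plain box filter an isolated scene change becomes a flat plateau and the peak is located at its leading edge; the triangular kernel keeps the maximum centered on the actual change.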

Claims

1. A photovoltaic defect detection method based on deep learning target detection, characterized by comprising the following steps:

S1, collecting raw data: in sunny weather with good illuminance, a rotor unmanned aerial vehicle and the infrared spectrum camera it carries are used to film the photovoltaic modules; the flight path is planned in advance, and the infrared video of the photovoltaic panels is transmitted back in real time for the next processing step;

S2, preparing an infrared image data set: because a deep learning target detection algorithm requires a large amount of labeled data, the infrared images must be labeled before training; first, all images are annotated with the open-source tool LabelImg, which generates an xml file containing the annotation information for each image; the xml files are stored in a VOCdevkit directory structure prepared in advance, which contains an Annotations folder for the xml files, a JPEGImages folder for the original pictures, and an ImageSets/Main folder for the txt files;

S3, performing data augmentation on the original data: because the number of samples is small, the original data set is enlarged by horizontally flipping, rotating, inverting, cropping, and stitching the labeled infrared pictures, which helps ensure the accuracy of the detection results;

S4, clustering the prior boxes of the labeled infrared images: the prior (anchor) boxes are clustered with the K-means++ algorithm, an unsupervised clustering algorithm that is simple to implement, clusters well, and is widely used; in this method it meets the requirements for both speed and accuracy:

the choice of the k initial centroids strongly affects the final clustering result and the running time, so suitable centroids must be chosen; a purely random choice may make the algorithm converge slowly; K-Means++ improves on the random centroid initialization of K-Means;

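a) randomly select a point from the input data set as the first cluster center μ₁;

b) for each point x_i in the data set, compute its distance D(x_i) to the nearest of the cluster centers already selected;

c) select a new data point as the next cluster center, points with larger D(x) having a higher probability of being selected;

d) repeat b and c until k centroids have been selected;

e) run the standard K-Means algorithm with the k centroids as the initial centroids;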

S5, training the network: the processed data set is fed into the network, and the class information in the classes file, the class names, the number of training iterations, and the file paths are modified accordingly; the initial training is set to 5000 epochs in total; because shallow-layer features become highly similar once the number of iterations reaches a certain level, whereas generalization depends more on the deeper layers, part of the Darknet53 network is frozen from epoch 4000 onward, which also speeds up training to some extent; the CutMix online data augmentation method is added, and the objective function is optimized by gradient descent during training;

the training script is then run; in essence, the network computes its predictions and the loss during forward propagation, and the loss is propagated backward to adjust the weights; once the loss has become very small after hundreds of iterations, the weight file can be saved and used for target detection; the backbone uses the Darknet53 structure, which consists of a series of residual blocks: the main branch keeps stacking the original residual blocks, while the other branch acts as a large residual edge that, after only a little processing, is connected directly to the end, and every convolution layer is followed by Batch Normalization and the Mish activation function; the backbone splits the feature map of the base layer into two parts and then merges them through a cross-stage partial structure, reducing computation while maintaining accuracy; the Mish activation function further improves training accuracy and is defined as:

y = x · tanh(ln(1 + e^x))

during training, the CutMix algorithm is used for data augmentation; the CutMix used by the invention stitches two pictures together by random scaling, random cropping, and random arrangement, further enlarging the data set and enhancing the robustness of the system; infrared photovoltaic hot spot targets are generally small, and the AP for small targets is usually much lower than for medium and large targets; to make recognition more accurate, 2 pictures are chosen at random, randomly scaled, and then randomly arranged and stitched, which greatly enriches the detection data set; in particular, random scaling adds many small targets and makes the network more robust; in addition, CutMix processes the data of 2 pictures at once during training, so the data set is expanded while training speed is maintained;

during network training, an SPP module is inserted between the backbone and the output layer to better extract fused features; the SPP structure is applied to the last feature layer of Darknet53: after this layer has been convolved three times, it is processed by max pooling at four different scales, with pooling kernel sizes of 13 × 13, 9 × 9, 5 × 5, and 1 × 1;

S6, outputting the feature maps: after a picture has been processed by the model, feature maps at three scales are output; feature maps of different sizes correspond to different prior boxes, and the three feature maps of different levels are fused; the shapes of the 3 output tensors of the YOLO algorithm are (19, 19, 255), (38, 38, 255), and (76, 76, 255); each grid cell corresponds to 3 anchor boxes, and each predicted bounding box carries 5 values (4 box coordinates and 1 confidence score) in addition to its class scores; the loss function of the network has three parts: category loss, confidence loss, and location loss:

in the above formula, a denotes a prediction box and b an object class; X_ab ∈ {0, 1}, where X_ab = 1 when the prediction box contains a real target and X_ab = 0 when it does not;

the location loss uses the DIoU loss, with the following formula:

L_DIoU = 1 - IoU(A, B) + ρ²(A_ctr, B_ctr)/c²

where A is the prediction box and B is the ground-truth box; A_ctr is the center point of the prediction box and B_ctr the center point of the ground-truth box; ρ(·) is the Euclidean distance; and c is the diagonal length of the smallest box enclosing A and B;

S7, integrating the prediction algorithm into a photovoltaic detection big data processing platform: frames are extracted from the infrared video shot and returned by the unmanned aerial vehicle and detected in real time, abnormal information is analyzed accurately, and a detection report containing fault information and location information is sent, facilitating timely maintenance; key frames of the infrared video are extracted with an algorithm based on inter-frame difference: two frames are subtracted, and the average pixel intensity of the difference image measures how much the two frames differ; accordingly, whenever the average inter-frame difference intensity shows that a frame has changed substantially from the previous frame, that frame is regarded as a key frame and extracted;

the flow of the key-frame extraction algorithm is briefly described as follows:

first, the video is read and the inter-frame difference between each pair of consecutive frames is computed to obtain the average inter-frame difference intensity;

second, the key frames are extracted with a method based on the inter-frame difference;

finally, with the local-maximum method, the frames at which the average inter-frame difference intensity reaches a local maximum are selected as the key frames of the video; this method yields richer results that are evenly distributed through the video; note that smoothing the time series of the average inter-frame difference intensity is an effective technique when using this method, since it removes noise and prevents several frames of similar scenes from being extracted as key frames at the same time.