CN114529817A - Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network - Google Patents

Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network

Info

Publication number
CN114529817A
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
neural network
target
photovoltaic module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210156492.8A
Other languages
Chinese (zh)
Inventor
王立辉
肖惠迪
苏余足威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210156492.8A priority Critical patent/CN114529817A/en
Publication of CN114529817A publication Critical patent/CN114529817A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/0004 - Industrial image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10048 - Infrared image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20021 - Dividing image into blocks, subimages or windows
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20076 - Probabilistic image processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30108 - Industrial image inspection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E - REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00 - Energy generation through renewable energy sources
    • Y02E10/50 - Photovoltaic [PV] energy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an unmanned aerial vehicle (UAV) photovoltaic fault diagnosis and positioning method based on an attention neural network, which comprises the following steps: 1. acquiring infrared pictures of photovoltaic modules taken by an inspection UAV, and reading the UAV's real-time position information and attitude data; 2. constructing an FPT structure based on an attention mechanism and an FPN structure; 3. constructing a BP neural network that uses the FPT structure for information fusion, taking the aerial picture as input and the pixel coordinates of the photovoltaic module containing a defect as output, and training the constructed network to obtain a neural network for infrared-image defect detection; 4. segmenting the original image to obtain a photovoltaic module mask, and determining the faulty photovoltaic module from the positioning result of the target detection network to obtain the corner pixel coordinates of the target module; 5. constructing a coordinate conversion model from the real-time coordinates and attitude angles recorded when the UAV took the picture, and converting the pixel coordinates output by the neural network into position coordinates in the geodetic frame according to the air-to-ground geometric relationship, thereby obtaining the position of the photovoltaic module containing the defect. The method is suited to UAV-based inspection of photovoltaic module defects, enables real-time monitoring of module defects, and improves defect detection accuracy.

Description

Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network
Technical Field
The invention belongs to the field of intelligent fault inspection of solar photovoltaic power stations, and particularly relates to an unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network.
Background
With society's growing demand for green, clean energy, the photovoltaic power station industry based on solar power generation technology has developed rapidly. Photovoltaic power stations occupy large areas and are mainly located in deserts, wastelands, water surfaces and other outdoor natural environments; the photovoltaic modules sit in these harsh outdoor conditions, exposed to wind and sunlight all year round, so faults and defects are a serious problem. Real-time monitoring and daily maintenance of the power generation system often require high labor cost, suffer from strong subjectivity and a single inspection means, and can hardly meet the growing inspection demand. To ensure efficient operation of photovoltaic power stations, an unmanned, intelligent inspection mode is urgently needed. Intelligent defect detection and positioning technology can diversify inspection modes while fully exploiting the high precision, flexibility and all-weather capability of robots, meeting the requirement of high-frequency unmanned inspection; it is of great significance for improving the power generation efficiency of photovoltaic power stations and ensuring the safe, efficient operation of large photovoltaic plants.
In recent years, picture-based defect detection for photovoltaic modules has fallen into two main categories: traditional signal-processing algorithms and artificial-intelligence algorithms. Traditional signal-processing algorithms include defect localization and segmentation methods based on anisotropic diffusion filtering, matched filtering, vessel filtering and the like; they can only target a specific defect type, struggle with the diverse defects and widely varying appearances found in real working environments, and therefore have limited application value. Artificial-intelligence algorithms usually adopt deep-learning convolutional neural networks for target detection and instance segmentation to realize defect detection and judgment, training repeatedly until the global loss is minimized. Considering that defects are of many types, vary greatly in appearance, and that the samples of the different categories are unevenly distributed, a reasonably designed network structure and an efficient loss function become the key to constructing the neural network and are the main factors determining its detection accuracy and positioning precision.
Position information for aerial images is mainly obtained either from photovoltaic string CAD drawings or by fusing the mobile carrier's pose with visual position information. Methods based on photovoltaic string CAD drawings depend on satellite remote-sensing data such as Google Earth; their positioning precision is low and they interfere with the UAV inspection process. Methods based on fusing the mobile carrier and visual position information recover a high-precision map through two-dimensional or three-dimensional reconstruction, but such maps contain no semantic information, cannot automatically yield the position distribution of the photovoltaic area, have low recognition precision, still require manual correction, and therefore cannot be fully automated.
Disclosure of Invention
To solve these technical problems, the invention provides an unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network. It alleviates the insufficient positioning precision of deep-learning target detection, is suitable for photovoltaic power station defect detection based on aerial images, can realize real-time positioning of defective modules, requires little computation, offers good real-time performance, and further improves the efficiency of intelligent defect detection and positioning for photovoltaic power stations.
An unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network comprises the following steps:
(1) Obtain infrared pictures of the photovoltaic modules photographed by the inspection unmanned aerial vehicle, and read the UAV's real-time position information and attitude data, where the UAV flight altitude is h, its GPS coordinate is (x_D, y_D), the camera field-of-view angle is γ, the pitch angle is θ_ZD, and the heading angle is ψ_ZD.
(2) Construct an FPT module based on an attention mechanism and an FPN structure. The FPT module comprises three transformer structures, a self-transformer (ST), a grounding transformer (GT) and a rendering transformer (RT), which respectively realize global information interaction within a feature layer and top-down and bottom-up local information interaction between layers:
The ST structure applies three separate convolutions to the current feature layer to obtain queries, keys and values. The queries and keys are each divided into N equal parts to obtain q_i and k_j, the similarity s_{i,j} of each pair is computed by dot product, MoS normalization is then applied to obtain the weight coefficients w_{i,j}, and the values are weighted and summed to obtain the feature map:
X_out = F_mul(F_MoS(F_sim(q_i, k_j)), v_j)    (1)
where F_sim is the dot-product operation, F_mul is the matrix multiplication operation, and F_MoS is the normalization operation, defined as
w_{i,j} = F_MoS(s_{i,j}) = Σ_{n=1}^{N} π_n · softmax(s_{i,j}^n)    (2)
where the mixing weights π_n are obtained from trainable normalization parameters and k̄, the arithmetic mean of all keys k_j:
k̄ = (1/|K|) Σ_j k_j    (3)
The GT performs top-down feature interaction: semantic information from the deep feature map is implanted into the high-resolution shallow feature map to enhance it. Because feature maps of different sizes extract different semantic information, the correlation is computed with the negative Euclidean distance:
X_out = F_mul(F_MoS(F_eud(q_i, k_j)), v_j)
where q_i comes from the shallow feature map, k_j and v_j come from the deep feature map, and F_eud is the negative Euclidean distance, i.e. the closer a pair q_i and k_j, the larger its weight:
F_eud(q_i, k_j) = -||q_i - k_j||^2    (4)
The RT performs bottom-up feature fusion, whose aim is to render the high-level semantic information with high-resolution pixel information, i.e. to render the deep feature map. Q comes from the high-level feature map, while K and V come from the low-level feature map. Q is weighted by the weights w obtained from global average pooling of K, V is reduced in size by a strided convolution, and finally the processed Q and the downsized V, now of the same size, are added and refined to obtain the output feature map:
w = GAP(K),  Q_att = F_att(Q, w),  V_dow = F_sconv(V),  X_out = F_add(F_conv(Q_att), V_dow)    (5)
where GAP denotes global average pooling of K (aligning it with Q); F_att is an outer-product function that weights the feature map Q; F_sconv denotes a strided 3 × 3 convolution used to downsize the low-level feature map V; F_conv denotes a 3 × 3 convolution that refines the weighted result Q_att; and F_add denotes addition followed by refinement through a 3 × 3 convolution;
(3) Construct a single-stage target-detection BP neural network: CSPDarknet53 is adopted as the backbone network for feature extraction, the FPT structure is used for information fusion, the output module performs prediction and regression using the anchor-box idea, and the loss function is designed with the optimal bipartite matching idea. Aerial images resized to 416 × 416 × 3 serve as the network input, the pixel coordinates of the center of the photovoltaic module containing the defect serve as the output, and the constructed network is trained to obtain a neural network for infrared-image defect detection;
the output module divides the feature map into S × S grids, each grid being responsible for predicting 9 anchor boxes of different scales, i.e. learning each anchor box's relative offsets within the grid and its per-class probabilities. Six parameters (t_x, t_y, t_w, t_h, p_obj, p_cls) must be learned, where (t_x, t_y) is the offset of the anchor-box center relative to the grid point, (t_w, t_h) is the ratio of the bounding-box width and height to the preset anchor width and height, p_obj is the probability that the bounding box contains a detected target, and p_cls is the probability that the detected target belongs to each category; finally, all predictions are filtered with a DIoU-improved non-maximum suppression algorithm, and the resulting target bounding-box coordinates and class information form the network output;
the loss function adopts a bipartite-graph matching algorithm: after each training epoch, the optimal bipartite matching between the predictions and the ground truth is found by minimizing the loss, and the loss is computed only for the successfully matched predicted target boxes. The loss function is defined as the weighted sum of a classification loss and a positioning loss:
L = λ_cls · L_cls + λ_Euclidean · L_Euclidean + λ_Manhattan · L_Manhattan    (6)
where L_cls is the classification loss, computed with the focal loss; L_Euclidean is the normalized Euclidean distance between the center points of the two target boxes; L_Manhattan is the normalized Manhattan distance between the upper-left and lower-right corners of the two target boxes; and λ_cls, λ_Euclidean and λ_Manhattan are weighting coefficients;
in the training stage, a one-to-one matching strategy between predictions and ground-truth target boxes is used to screen the target boxes, and the loss is computed only for successfully matched boxes. This strengthens the network's learning of the ground-truth box distribution during feature learning, improves the efficiency of the loss computation, speeds up model convergence, indirectly reduces the probability that adjacent target boxes in the predictions are suppressed, and improves detection accuracy.
(4) Segment the original image with the semantic segmentation network LEDNet to obtain a photovoltaic module string mask; binarize the string mask to obtain a photovoltaic module mask, then apply median filtering to remove noise and perform erosion followed by dilation to optimize the module segmentation result, taking the minimum enclosing rectangle as the optimized photovoltaic module mask; finally, determine the faulty photovoltaic module from the positioning result of the target detection network and obtain the corner pixel coordinates of the target module;
(5) Adopt a target positioning method based on the UAV's POS data: the camera attitude angle and field-of-view angle at the moment of capture, the UAV flight altitude, the GPS coordinates and other information are obtained from the onboard GPS/INS system, and the GPS coordinates of the target pixel are calculated from the air-to-ground triangular geometric relationship;
the target detection module outputs the corner coordinates (x_1, y_1, x_2, y_2) of the defective module and the defect type, from which the pixel coordinates (x, y) of the module's center point, i.e. the target pixel, are computed as
x = (x_1 + x_2) / 2,  y = (y_1 + y_2) / 2    (7)
The UAV flight altitude is h, its GPS coordinate is (x_D, y_D), the camera field-of-view angle is γ, the pitch angle is θ_ZD, the heading angle is ψ_ZD, the camera view range is (y_f0, y_f1), (x_f0, x_f1), the GPS coordinate of the target pixel is (X, Y), and the infrared image size is (W, H);
first, the GPS coordinates of the four corners of the camera view range, i.e. the four vertices of the picture, are calculated from the air-to-ground triangular relationship:
[Formula (8): the view-range limits (y_f0, y_f1) and (x_f0, x_f1) expressed through h, γ, θ_ZD and ψ_ZD]
then, from the similarity relation between the infrared-image pixel coordinates and the real GPS coordinates, the GPS coordinate of the center point of the target photovoltaic module is calculated by formula (9):
X = x_f0 + (x / W) · (x_f1 - x_f0),  Y = y_f0 + (y / H) · (y_f1 - y_f0)    (9)
as a further improvement of the invention, the BP neural network constructed in the step (3) adopts the idea of optimal binary matching to design a loss function, the detection result and the true value are subjected to optimal binary matching after each generation of training is finished, and the loss function is calculated only for the successfully matched prediction target frame; compared with IOUloss, the design guides the positioning of a target frame from the positioning of a central point and the positioning of a corner point, and can improve the convergence rate of the model.
As a further improvement of the present invention, the backbone network CSPDarknet53 adopts Mish as the neural network activation function, which slightly allows negative values and hence better gradient flow, rather than the hard zero boundary of the ReLU function:
Mish(x) = x · tanh(ln(1 + e^x))    (10)
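For illustration only, the activation can be written in one line of PyTorch (torch.nn.functional.softplus computes ln(1 + e^x)); this is a generic sketch, not code from the patent:

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish(x) = x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))
```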
as a further improvement of the invention, the steps of training the constructed BP neural network are as follows:
(1) The data set is expanded by adding Gaussian noise and adjusting contrast, brightness and sharpness, which also alleviates class imbalance; 65% of the expanded data set is randomly selected as the training set, 15% of the pictures form the validation set, and the remaining 20% form the test set;
(2) the neck part of the BP network is randomly initialized, while the backbone part uses weights pre-trained on the COCO data set for transfer learning; to prevent the feature-extraction weights from being damaged at the start of training, the backbone parameters are frozen and excluded from gradient updates during the first 25 training epochs;
(3) following the error back-propagation algorithm, an Adam optimizer and mini-batch stochastic gradient descent are used; the learning-rate schedule is a StepLR fixed-step decay with gamma = 0.9, and the backbone and neck weights are fine-tuned and updated respectively.
Advantageous effects:
the invention discloses an unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network, which comprises the steps of acquiring a photovoltaic module infrared picture and real-time position information and attitude data of an unmanned aerial vehicle by utilizing aerial photography of an inspection unmanned aerial vehicle, extracting features by using CSPDarknet53 as a backbone network, carrying out information fusion by adopting an FPT structure based on an attention mechanism and an FPN structure, taking the aerial photography picture as input, and taking pixel coordinates of a photovoltaic module with a defect as output; segmenting an original image to obtain a photovoltaic module mask, and determining a faulty photovoltaic module according to a positioning result of a target detection network to obtain corner pixel coordinates of a target module; and establishing a coordinate conversion model according to the real-time coordinates and the attitude angles shot by the unmanned aerial vehicle, and converting pixel coordinates output by the neural network into position coordinates under geodetic coordinates according to the aerial triangular geometrical relationship to obtain the position information of the photovoltaic module where the defect is located. The method relieves the problem of low positioning accuracy of a target detection algorithm based on deep learning, reduces the calculation amount of aerial image positioning, greatly improves the real-time performance of defect positioning, and can realize high-precision real-time monitoring of component defects.
Drawings
FIG. 1 is a flow chart of the disclosed method;
fig. 2 is a schematic diagram of flight parameters and a camera view during unmanned aerial vehicle inspection.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the invention discloses an unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network, which comprises the following steps:
step 1: the method comprises the steps of obtaining a photovoltaic module infrared picture of the aerial photo of the inspection unmanned aerial vehicle, and reading real-time position information and attitude data of the unmanned aerial vehicle. Wherein the flying height of the unmanned aerial vehicle is h, and the GPS coordinate is (x)D,yD) Visual field angle gamma and pitch angle theta of cameraZDCourse angle psiZD
Step 2: construct an FPT module based on an attention mechanism and an FPN structure. The FPT module comprises three transformer structures, a self-transformer (ST), a grounding transformer (GT) and a rendering transformer (RT), which respectively realize global information interaction within a feature layer and top-down and bottom-up local information interaction between layers:
The ST structure applies three different convolution operations to the current feature layer to obtain queries, keys and values. The queries and keys are each divided into N equal parts to obtain q_i and k_j, the similarity s_{i,j} of each pair is computed by dot product, MoS normalization is then applied to obtain the weight coefficients w_{i,j}, and the values are weighted and summed to obtain the feature map:
X_out = F_mul(F_MoS(F_sim(q_i, k_j)), v_j)    (1)
where F_sim is the dot-product operation, F_mul is the matrix multiplication operation, and F_MoS is the normalization operation, defined as
w_{i,j} = F_MoS(s_{i,j}) = Σ_{n=1}^{N} π_n · softmax(s_{i,j}^n)    (2)
where the mixing weights π_n are obtained from trainable normalization parameters and k̄, the arithmetic mean of all keys k_j:
k̄ = (1/|K|) Σ_j k_j    (3)
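A minimal NumPy sketch of this self-transformer step is given below for illustration, assuming a flattened feature map of shape (L, C); the projection matrices, the part count N and the way the mixing weights π are derived from the mean key are assumptions in the spirit of a mixture-of-softmaxes formulation, not the patented implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_transformer(x, wq, wk, wv, w_pi, n_parts=4):
    """Self-attention with mixture-of-softmaxes (MoS) normalization.

    x: (L, C) flattened feature map; wq, wk, wv: (C, C) projections standing in
    for the three convolutions; w_pi: (n_parts, C) trainable MoS parameters.
    All names and shapes are illustrative assumptions.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                      # queries, keys, values
    L, C = q.shape
    d = C // n_parts
    qs = q.reshape(L, n_parts, d).transpose(1, 0, 2)      # split queries into N parts
    ks = k.reshape(L, n_parts, d).transpose(1, 0, 2)      # split keys into N parts
    s = np.einsum('nid,njd->nij', qs, ks) / np.sqrt(d)    # dot-product similarity s_ij per part
    per_part = softmax(s, axis=-1)                        # softmax within each part
    k_bar = k.mean(axis=0)                                # arithmetic mean of all keys (eq. 3)
    pi = softmax(w_pi @ k_bar)                            # mixing weights pi_n
    w = np.einsum('n,nij->ij', pi, per_part)              # MoS-normalized weights w_ij (eq. 2)
    return w @ v                                          # weighted sum of values (eq. 1)
```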
The GT performs top-down feature interaction: semantic information from the deep feature map is implanted into the high-resolution shallow feature map to enhance it. Because feature maps of different sizes extract different semantic information, the correlation is computed with the negative Euclidean distance:
X_out = F_mul(F_MoS(F_eud(q_i, k_j)), v_j)
where q_i comes from the shallow feature map, k_j and v_j come from the deep feature map, and F_eud is the negative Euclidean distance, i.e. the closer a pair q_i and k_j, the larger its weight:
F_eud(q_i, k_j) = -||q_i - k_j||^2    (4)
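The negative-Euclidean weighting can be sketched in the same NumPy style; shapes are illustrative, with queries from the shallow map and keys/values from the deep map:

```python
import numpy as np

def grounding_attention(q_shallow, k_deep, v_deep):
    """Top-down interaction: weight deep-layer values by the negative squared
    Euclidean distance to shallow-layer queries (eq. 4).
    q_shallow: (Lq, C); k_deep, v_deep: (Lk, C). Illustrative sketch only."""
    d2 = ((q_shallow[:, None, :] - k_deep[None, :, :]) ** 2).sum(axis=-1)   # ||q_i - k_j||^2
    s = -d2                                              # negative Euclidean similarity
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # row-wise softmax: closer pairs weigh more
    return w @ v_deep                                    # (Lq, C): shallow map enhanced with deep semantics
```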
The RT performs bottom-up feature fusion, whose aim is to render the high-level semantic information with high-resolution pixel information, i.e. to render the deep feature map. Q comes from the high-level feature map, while K and V come from the low-level feature map. Q is weighted by the weights w obtained from global average pooling of K, V is reduced in size by a strided convolution, and finally the processed Q and the downsized V, now of the same size, are added and refined to obtain the output feature map:
w = GAP(K),  Q_att = F_att(Q, w),  V_dow = F_sconv(V),  X_out = F_add(F_conv(Q_att), V_dow)    (5)
where GAP denotes global average pooling of K (aligning it with Q); F_att is an outer-product function that weights the feature map Q; F_sconv denotes a strided 3 × 3 convolution used to downsize the low-level feature map V; F_conv denotes a 3 × 3 convolution that refines the weighted result Q_att; and F_add denotes addition followed by refinement through a 3 × 3 convolution.
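A hedged PyTorch-style sketch of this rendering step follows; channel counts, kernel sizes and the single stride-2 convolution are assumptions chosen only to make the data flow concrete:

```python
import torch
import torch.nn as nn

class RenderingTransformer(nn.Module):
    """Bottom-up rendering: Q from the deep map, K and V from the shallow map."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                                   # GAP(K)
        self.down_v = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # F_sconv: shrink V
        self.refine_q = nn.Conv2d(channels, channels, 3, padding=1)          # F_conv applied to Q_att
        self.refine_out = nn.Conv2d(channels, channels, 3, padding=1)        # refinement after F_add

    def forward(self, q_deep, k_shallow, v_shallow):
        w = self.gap(k_shallow)            # (B, C, 1, 1) channel weights from K
        q_att = q_deep * w                 # F_att: weight Q channel-wise
        v_down = self.down_v(v_shallow)    # reduce V to the spatial size of Q
        return self.refine_out(self.refine_q(q_att) + v_down)
```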
Step 3: construct a single-stage target-detection BP neural network: CSPDarknet53 is adopted as the backbone network for feature extraction, the FPT structure is used for information fusion, the output module performs prediction and regression using the anchor-box idea, and the loss function is designed with the optimal bipartite matching idea. Aerial images resized to 416 × 416 × 3 serve as the network input, the pixel coordinates of the center of the photovoltaic module containing the defect serve as the output, and the constructed network is trained to obtain a neural network for infrared-image defect detection.
The output module divides the feature map into S × S grids, each grid being responsible for predicting 9 anchor boxes of different scales, i.e. learning each anchor box's relative offsets within the grid and its per-class probabilities. Six parameters (t_x, t_y, t_w, t_h, p_obj, p_cls) must be learned, where (t_x, t_y) is the offset of the anchor-box center relative to the grid point, (t_w, t_h) is the ratio of the bounding-box width and height to the preset anchor width and height, p_obj is the probability that the bounding box contains a detected target, and p_cls is the probability that the detected target belongs to each category. Finally, all predictions are filtered with a DIoU-improved non-maximum suppression (NMS) algorithm, and the resulting target bounding-box coordinates and class information form the network output.
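For illustration, a NumPy sketch of decoding the six predicted quantities into an absolute box and of DIoU-based NMS is given below; the additive offset and width/height ratio follow the parameter definitions above, while the grid stride, the threshold and the absence of sigmoid or exponential squashing (common in YOLO variants) are assumptions.

```python
import numpy as np

def decode_box(t, grid_xy, anchor_wh, stride):
    """t = (t_x, t_y, t_w, t_h): center offset w.r.t. the grid point and
    width/height ratios w.r.t. the preset anchor. Returns (x1, y1, x2, y2) in pixels."""
    bx = (grid_xy[0] + t[0]) * stride
    by = (grid_xy[1] + t[1]) * stride
    bw, bh = anchor_wh[0] * t[2], anchor_wh[1] * t[3]
    return np.array([bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2])

def diou(a, b):
    """Distance-IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    iou = inter / (area(a) + area(b) - inter + 1e-9)
    # squared center distance over squared diagonal of the smallest enclosing box
    c2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    d2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - c2 / (d2 + 1e-9)

def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses boxes whose DIoU with a kept box exceeds the threshold."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if diou(boxes[i], boxes[j]) <= threshold]
    return keep
```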
The loss function adopts a bipartite-graph matching algorithm: after each training epoch, the optimal bipartite matching between the predictions and the ground truth is found by minimizing the loss, and the loss is computed only for the successfully matched predicted target boxes. The loss function is defined as a weighted sum of a classification loss and a positioning loss:
L = λ_cls · L_cls + λ_Euclidean · L_Euclidean + λ_Manhattan · L_Manhattan    (6)
where L_cls is the classification loss, computed with the focal loss; L_Euclidean is the normalized Euclidean distance between the center points of the two target boxes; L_Manhattan is the normalized Manhattan distance between the upper-left and lower-right corners of the two target boxes; and λ_cls, λ_Euclidean and λ_Manhattan are weighting coefficients.
In the training stage, a one-to-one matching strategy between predictions and ground-truth target boxes is used to screen the target boxes, and the loss is computed only for successfully matched boxes. This strengthens the network's learning of the ground-truth box distribution during feature learning, improves the efficiency of the loss computation, speeds up model convergence, indirectly reduces the probability that adjacent target boxes in the predictions are suppressed, and improves detection accuracy.
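A sketch of the one-to-one matching step using the Hungarian algorithm from SciPy is shown below; the cost combines the three terms named above (focal classification term, normalized center-point Euclidean distance, corner Manhattan distance), with the weights, the focal gamma and the box normalization being placeholder assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_cost(pred_boxes, pred_probs, gt_boxes, gt_labels,
                  lam_cls=1.0, lam_euclid=1.0, lam_manhattan=1.0, gamma=2.0):
    """Pairwise cost between P predictions and G ground-truth boxes.
    Boxes are (x1, y1, x2, y2) normalized to [0, 1]; pred_probs is (P, num_classes)."""
    P, G = len(pred_boxes), len(gt_boxes)
    cost = np.zeros((P, G))
    for i in range(P):
        for j in range(G):
            p = pred_probs[i, gt_labels[j]]
            focal = -((1.0 - p) ** gamma) * np.log(p + 1e-9)        # focal classification term
            ci = (pred_boxes[i, :2] + pred_boxes[i, 2:]) / 2.0       # box centers
            cj = (gt_boxes[j, :2] + gt_boxes[j, 2:]) / 2.0
            center = np.linalg.norm(ci - cj)                         # Euclidean center distance
            corners = np.abs(pred_boxes[i] - gt_boxes[j]).sum()      # Manhattan corner distance
            cost[i, j] = lam_cls * focal + lam_euclid * center + lam_manhattan * corners
    return cost

# Optimal one-to-one assignment; the loss is then computed only over matched pairs:
# pred_idx, gt_idx = linear_sum_assignment(matching_cost(pred_boxes, pred_probs, gt_boxes, gt_labels))
```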
The steps of training the constructed BP neural network are as follows:
(3-1) The data set is expanded by adding Gaussian noise and adjusting contrast, brightness and sharpness, which also alleviates class imbalance; 65% of the expanded data set is randomly selected as the training set, 15% of the pictures form the validation set, and the remaining 20% form the test set;
(3-2) the neck part of the BP network is randomly initialized, while the backbone part uses weights pre-trained on the COCO data set for transfer learning; to prevent the feature-extraction weights from being damaged at the start of training, the backbone parameters are frozen and excluded from gradient updates during the first 25 training epochs;
(3-3) following the error back-propagation algorithm, an Adam optimizer and mini-batch stochastic gradient descent are used; the learning-rate schedule is a StepLR fixed-step decay with gamma = 0.9, and the backbone and neck weights are fine-tuned and updated respectively, as sketched after this list.
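A hedged PyTorch training-loop skeleton reflecting this schedule (frozen backbone for the first 25 epochs, Adam with mini-batch updates, StepLR decay with gamma = 0.9); the learning rate, step size and the model's `backbone` attribute are assumptions:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

def train(model, train_loader, loss_fn, epochs=100, freeze_epochs=25, device="cuda"):
    """Freeze the backbone for the first `freeze_epochs` epochs, then fine-tune everything."""
    model.to(device)
    optimizer = Adam(model.parameters(), lr=1e-3)              # illustrative learning rate
    scheduler = StepLR(optimizer, step_size=1, gamma=0.9)      # fixed-step decay, gamma = 0.9

    for p in model.backbone.parameters():                      # assumes a `backbone` submodule
        p.requires_grad = False

    for epoch in range(epochs):
        if epoch == freeze_epochs:                              # unfreeze after 25 epochs
            for p in model.backbone.parameters():
                p.requires_grad = True
        for images, targets in train_loader:                    # mini-batch updates
            optimizer.zero_grad()
            loss = loss_fn(model(images.to(device)), targets)
            loss.backward()                                      # error back-propagation
            optimizer.step()
        scheduler.step()
```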
Step 4: segment the original image with the semantic segmentation network LEDNet to obtain a photovoltaic module string mask; binarize the string mask to obtain a photovoltaic module mask, then apply median filtering to remove noise and perform erosion followed by dilation to optimize the module segmentation result, taking the minimum enclosing rectangle as the optimized photovoltaic module mask; finally, determine the faulty photovoltaic module from the positioning result of the target detection network and obtain the corner pixel coordinates of the target module.
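The mask post-processing can be illustrated with OpenCV as below; the threshold, kernel size and iteration counts are assumptions rather than values from the patent:

```python
import cv2
import numpy as np

def refine_module_mask(string_mask):
    """string_mask: single-channel uint8 mask (0-255) from LEDNet.
    Returns the cleaned binary mask and one rotated rectangle per module."""
    _, binary = cv2.threshold(string_mask, 127, 255, cv2.THRESH_BINARY)    # binarization
    binary = cv2.medianBlur(binary, 5)                                     # median filter removes noise
    kernel = np.ones((5, 5), np.uint8)
    binary = cv2.erode(binary, kernel, iterations=1)                       # erosion, then
    binary = cv2.dilate(binary, kernel, iterations=1)                      # dilation
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rects = [cv2.minAreaRect(c) for c in contours]                         # minimum enclosing rectangles
    return binary, rects
```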
Step 5: adopt a target positioning method based on the UAV's POS data: the camera attitude angle and field-of-view angle at the moment of capture, the UAV flight altitude, the GPS coordinates and other information are obtained from the onboard GPS/INS system, and the GPS coordinates of the target pixel are calculated from the air-to-ground triangular geometric relationship.
The target detection module outputs the corner coordinates (x_1, y_1, x_2, y_2) of the defective module and the defect type, from which the pixel coordinates (x, y) of the module's center point, i.e. the target pixel, are computed as
x = (x_1 + x_2) / 2,  y = (y_1 + y_2) / 2    (7)
the schematic diagram of the flight parameters and the camera view field during the unmanned aerial vehicle inspection is shown in fig. 2. Wherein the flying height of the unmanned aerial vehicle is h, and the GPS coordinate is (x)D,yD) Visual field angle gamma and pitch angle theta of cameraZDHeading angle psiZDThe camera view range is (y)f0,yf1),(xf0,xf1) The GPS coordinates of the target pixel point are (X, Y), and the size of the infrared image is (W, H).
First, the GPS coordinates of the four corners of the camera view range, i.e. the four vertices of the picture, are calculated from the air-to-ground triangular relationship:
[Formula (8): the view-range limits (y_f0, y_f1) and (x_f0, x_f1) expressed through h, γ, θ_ZD and ψ_ZD]
Then, from the similarity relation between the infrared-image pixel coordinates and the real GPS coordinates, the GPS coordinate of the center point of the target photovoltaic module is calculated:
X = x_f0 + (x / W) · (x_f1 - x_f0),  Y = y_f0 + (y / H) · (y_f1 - y_f0)    (9)
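A simplified end-to-end sketch of this pixel-to-GPS mapping is given below. It assumes a flat ground plane, along-track footprint limits of h·tan(θ ± γ/2), a symmetric cross-track footprint, a heading rotation applied to the metre offsets and a coarse metres-to-degrees conversion; these are simplifying assumptions, not the patent's exact equations (8)-(9).

```python
import math

def pixel_to_gps(px, py, img_w, img_h, h, lon_d, lat_d, fov, pitch, heading,
                 metres_per_deg=111_320.0):
    """Map a pixel (px, py) to GPS coordinates given the UAV position (lon_d, lat_d).
    h is the flight altitude in metres; fov, pitch and heading are in radians."""
    # footprint limits in a camera-aligned ground frame (metres from the UAV)
    y_f0 = h * math.tan(pitch - fov / 2.0)
    y_f1 = h * math.tan(pitch + fov / 2.0)
    half_x = h * math.tan(fov / 2.0) / math.cos(pitch)
    x_f0, x_f1 = -half_x, half_x

    # similarity relation: linear mapping of the pixel into the footprint
    gx = x_f0 + (px / img_w) * (x_f1 - x_f0)
    gy = y_f1 - (py / img_h) * (y_f1 - y_f0)

    # rotate by the heading angle into east/north offsets
    east = gx * math.cos(heading) + gy * math.sin(heading)
    north = gy * math.cos(heading) - gx * math.sin(heading)

    # coarse conversion of metre offsets to degrees
    lon = lon_d + east / (metres_per_deg * math.cos(math.radians(lat_d)))
    lat = lat_d + north / metres_per_deg
    return lon, lat
```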
the above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, but any modifications or equivalent variations made according to the technical spirit of the present invention are within the scope of the present invention as claimed.

Claims (5)

1. An unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on an attention neural network, characterized by comprising the following steps:
(1) obtaining infrared pictures of the photovoltaic modules photographed by the inspection unmanned aerial vehicle, and reading the UAV's real-time position information and attitude data, where the UAV flight altitude is h, its GPS coordinate is (x_D, y_D), the camera field-of-view angle is γ, the pitch angle is θ_ZD, and the heading angle is ψ_ZD;
(2) constructing an FPT module based on an attention mechanism and an FPN structure, the FPT module comprising three transformer structures, a self-transformer (ST), a grounding transformer (GT) and a rendering transformer (RT), which respectively realize global information interaction within a feature layer and top-down and bottom-up local information interaction between layers:
the ST structure applies three separate convolutions to the current feature layer to obtain queries, keys and values; the queries and keys are each divided into N equal parts to obtain q_i and k_j, the similarity s_{i,j} of each pair is computed by dot product, MoS normalization is then applied to obtain the weight coefficients w_{i,j}, and the values are weighted and summed to obtain the feature map:
X_out = F_mul(F_MoS(F_sim(q_i, k_j)), v_j)    (1)
where F_sim is the dot-product operation, F_mul is the matrix multiplication operation, and F_MoS is the normalization operation, defined as
w_{i,j} = F_MoS(s_{i,j}) = Σ_{n=1}^{N} π_n · softmax(s_{i,j}^n)    (2)
where the mixing weights π_n are obtained from trainable normalization parameters and k̄, the arithmetic mean of all keys k_j:
k̄ = (1/|K|) Σ_j k_j    (3)
the GT performs top-down feature interaction: semantic information from the deep feature map is implanted into the high-resolution shallow feature map to enhance it; because feature maps of different sizes extract different semantic information, the correlation is computed with the negative Euclidean distance:
X_out = F_mul(F_MoS(F_eud(q_i, k_j)), v_j)
where q_i comes from the shallow feature map, k_j and v_j come from the deep feature map, and F_eud is the negative Euclidean distance, i.e. the closer a pair q_i and k_j, the larger its weight:
F_eud(q_i, k_j) = -||q_i - k_j||^2    (4)
the RT performs bottom-up feature fusion, whose aim is to render the high-level semantic information with high-resolution pixel information, i.e. to render the deep feature map; Q comes from the high-level feature map, while K and V come from the low-level feature map; Q is weighted by the weights w obtained from global average pooling of K, V is reduced in size by a strided convolution, and finally the processed Q and the downsized V, now of the same size, are added and refined to obtain the output feature map:
w = GAP(K),  Q_att = F_att(Q, w),  V_dow = F_sconv(V),  X_out = F_add(F_conv(Q_att), V_dow)    (5)
where GAP denotes global average pooling of K (aligning it with Q); F_att is an outer-product function that weights the feature map Q; F_sconv denotes a strided 3 × 3 convolution used to downsize the low-level feature map V; F_conv denotes a 3 × 3 convolution that refines the weighted result Q_att; and F_add denotes addition followed by refinement through a 3 × 3 convolution;
(3) constructing a single-stage target-detection BP neural network: CSPDarknet53 is adopted as the backbone network for feature extraction, the FPT structure is used for information fusion, the output module performs prediction and regression using the anchor-box idea, and the loss function is designed with the optimal bipartite matching idea; aerial images resized to 416 × 416 × 3 serve as the network input, the pixel coordinates of the center of the photovoltaic module containing the defect serve as the output, and the constructed network is trained to obtain a neural network for infrared-image defect detection;
the output module divides the feature map into S × S grids, each grid being responsible for predicting 9 anchor boxes of different scales, i.e. learning each anchor box's relative offsets within the grid and its per-class probabilities; six parameters (t_x, t_y, t_w, t_h, p_obj, p_cls) must be learned, where (t_x, t_y) is the offset of the anchor-box center relative to the grid point, (t_w, t_h) is the ratio of the bounding-box width and height to the preset anchor width and height, p_obj is the probability that the bounding box contains a detected target, and p_cls is the probability that the detected target belongs to each category; finally, all predictions are filtered with a DIoU-improved non-maximum suppression (NMS) algorithm, and the resulting target bounding-box coordinates and class information form the network output;
the loss function adopts a bipartite-graph matching algorithm: after each training epoch, the optimal bipartite matching between the predictions and the ground truth is found by minimizing the loss, and the loss is computed only for the successfully matched predicted target boxes, the loss function being defined as the weighted sum of a classification loss and a positioning loss:
L = λ_cls · L_cls + λ_Euclidean · L_Euclidean + λ_Manhattan · L_Manhattan    (6)
where L_cls is the classification loss, computed with the focal loss; L_Euclidean is the normalized Euclidean distance between the center points of the two target boxes; L_Manhattan is the normalized Manhattan distance between the upper-left and lower-right corners of the two target boxes; and λ_cls, λ_Euclidean and λ_Manhattan are weighting coefficients;
(4) segmenting the original image with the semantic segmentation network LEDNet to obtain a photovoltaic module string mask; binarizing the string mask to obtain a photovoltaic module mask, then applying median filtering to remove noise and performing erosion followed by dilation to optimize the module segmentation result, taking the minimum enclosing rectangle as the optimized photovoltaic module mask; finally, determining the faulty photovoltaic module from the positioning result of the target detection network to obtain the corner pixel coordinates of the target module;
(5) adopting a target positioning method based on the UAV's POS data: the camera attitude angle and field-of-view angle at the moment of capture, the UAV flight altitude, the GPS coordinates and other information are obtained from the onboard GPS/INS system, and the GPS coordinates of the target pixel are calculated from the air-to-ground triangular geometric relationship;
the target detection module outputs the corner coordinates (x_1, y_1, x_2, y_2) of the defective module and the defect type, from which the pixel coordinates (x, y) of the module's center point, i.e. the target pixel, are computed as
x = (x_1 + x_2) / 2,  y = (y_1 + y_2) / 2    (7)
the UAV flight altitude is h, its GPS coordinate is (x_D, y_D), the camera field-of-view angle is γ, the pitch angle is θ_ZD, the heading angle is ψ_ZD, the camera view range is (y_f0, y_f1), (x_f0, x_f1), the GPS coordinate of the target pixel is (X, Y), and the infrared image size is (W, H);
first, the GPS coordinates of the four corners of the camera view range, i.e. the four vertices of the picture, are calculated from the air-to-ground triangular relationship:
[Formula (8): the view-range limits (y_f0, y_f1) and (x_f0, x_f1) expressed through h, γ, θ_ZD and ψ_ZD]
then, from the similarity relation between the infrared-image pixel coordinates and the real GPS coordinates, the GPS coordinate of the center point of the target photovoltaic module is calculated by formula (9):
X = x_f0 + (x / W) · (x_f1 - x_f0),  Y = y_f0 + (y / H) · (y_f1 - y_f0)    (9)
2. the unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on the attention neural network as claimed in claim 1, wherein the BP neural network constructed in the step (3) adopts an optimal binary matching idea to design a loss function, the optimal binary matching is performed on the detection result and the true value after the training of each generation is finished, and the loss function is calculated only for the successfully matched prediction target frame; the loss function comprises three parts of classification loss, center point loss and corner point loss.
3. The unmanned aerial vehicle photovoltaic fault diagnosis and localization method based on the attention neural network as claimed in claim 1, wherein the backbone network CSPDarknet53 adopts Mish as the neural network activation function, which slightly allows negative values and hence better gradient flow, rather than the hard zero boundary of the ReLU function:
Mish(x) = x · tanh(ln(1 + e^x))    (10).
4. the unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on the attention neural network as claimed in claim 2, wherein the step of training the constructed BP neural network is as follows:
(1) the data set is expanded by adding Gaussian noise and adjusting contrast, brightness and sharpness, which also alleviates class imbalance; 65% of the expanded data set is randomly selected as the training set, 15% of the pictures form the validation set, and the remaining 20% form the test set;
(2) the neck part of the BP network is randomly initialized, while the backbone part uses weights pre-trained on the COCO data set for transfer learning; to prevent the feature-extraction weights from being damaged at the start of training, the backbone parameters are frozen and excluded from gradient updates during the first 25 training epochs;
(3) following the error back-propagation algorithm, an Adam optimizer and mini-batch stochastic gradient descent are used; the learning-rate schedule is a StepLR fixed-step decay with gamma = 0.9, and the backbone and neck weights are fine-tuned and updated respectively.
5. The unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on the attention neural network as claimed in claim 1, wherein in step (5), the target in the aerial image is positioned using the air-to-ground triangular geometric relationship derived from the UAV's flight attitude.
CN202210156492.8A 2022-02-21 2022-02-21 Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network Pending CN114529817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210156492.8A CN114529817A (en) 2022-02-21 2022-02-21 Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210156492.8A CN114529817A (en) 2022-02-21 2022-02-21 Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network

Publications (1)

Publication Number Publication Date
CN114529817A true CN114529817A (en) 2022-05-24

Family

ID=81625785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210156492.8A Pending CN114529817A (en) 2022-02-21 2022-02-21 Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network

Country Status (1)

Country Link
CN (1) CN114529817A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114660993A (en) * 2022-05-25 2022-06-24 中科航迈数控软件(深圳)有限公司 Numerical control machine tool fault prediction method based on multi-source heterogeneous data feature dimension reduction
CN114943904A (en) * 2022-06-07 2022-08-26 国网江苏省电力有限公司泰州供电分公司 Operation monitoring method based on unmanned aerial vehicle inspection
CN115001394A (en) * 2022-08-01 2022-09-02 一道新能源科技(衢州)有限公司 Solar cell state monitoring method and system based on artificial intelligence
CN115272252B (en) * 2022-08-02 2023-09-12 南京航空航天大学 Method, device and system for detecting carbon fiber defects based on improved YOLOX
CN115272252A (en) * 2022-08-02 2022-11-01 南京航空航天大学 Improved YOLOX-based carbon fiber defect detection method, device and system
CN115564740A (en) * 2022-10-17 2023-01-03 风脉能源(武汉)股份有限公司 Fan blade defect positioning method and system
CN115661465A (en) * 2022-12-14 2023-01-31 深圳思谋信息科技有限公司 Image multi-label segmentation method and device, computer equipment and storage medium
CN116106899A (en) * 2023-04-14 2023-05-12 青岛杰瑞工控技术有限公司 Port channel small target identification method based on machine learning
CN116106899B (en) * 2023-04-14 2023-06-23 青岛杰瑞工控技术有限公司 Port channel small target identification method based on machine learning
CN116188470B (en) * 2023-04-28 2023-07-04 成都航空职业技术学院 Unmanned aerial vehicle aerial photographing identification-based fault positioning method and system
CN116188470A (en) * 2023-04-28 2023-05-30 成都航空职业技术学院 Unmanned aerial vehicle aerial photographing identification-based fault positioning method and system
CN116721095A (en) * 2023-08-04 2023-09-08 杭州瑞琦信息技术有限公司 Aerial photographing road illumination fault detection method and device
CN116721095B (en) * 2023-08-04 2023-11-03 杭州瑞琦信息技术有限公司 Aerial photographing road illumination fault detection method and device
CN117131786A (en) * 2023-10-26 2023-11-28 华中科技大学 Voltage transformer insulation fault online identification method
CN117131786B (en) * 2023-10-26 2024-01-26 华中科技大学 Voltage transformer insulation fault online identification method

Similar Documents

Publication Publication Date Title
CN114529817A (en) Unmanned aerial vehicle photovoltaic fault diagnosis and positioning method based on attention neural network
CN113538391B (en) Photovoltaic defect detection method based on Yolov4 and thermal infrared image
CN111914795B (en) Method for detecting rotating target in aerial image
CN108647655B (en) Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN112767391A (en) Power grid line part defect positioning method fusing three-dimensional point cloud and two-dimensional image
CN111223088A (en) Casting surface defect identification method based on deep convolutional neural network
CN114758252B (en) Image-based distributed photovoltaic roof resource segmentation and extraction method and system
CN111914720B (en) Method and device for identifying insulator burst of power transmission line
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN112419196B (en) Unmanned aerial vehicle remote sensing image shadow removing method based on deep learning
CN114612406A (en) Photovoltaic panel defect detection method based on visible light and infrared vision
CN114283137A (en) Photovoltaic module hot spot defect detection method based on multi-scale characteristic diagram inference network
CN114897857A (en) Solar cell defect detection method based on light neural network
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
CN110910349A (en) Wind turbine state acquisition method based on aerial photography vision
CN114463521A (en) Building target point cloud rapid generation method for air-ground image data fusion
CN114359702A (en) Method and system for identifying building violation of remote sensing image of homestead based on Transformer
CN113902792A (en) Building height detection method and system based on improved RetinaNet network and electronic equipment
CN111368637B (en) Transfer robot target identification method based on multi-mask convolutional neural network
CN117252817A (en) Transparent conductive film glass surface defect detection method and system
CN116129118B (en) Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution
He et al. Automatic detection and mapping of solar photovoltaic arrays with deep convolutional neural networks in high resolution satellite images
CN116758219A (en) Region-aware multi-view stereo matching three-dimensional reconstruction method based on neural network
Li et al. Low-cost 3D building modeling via image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination