Contrastive learning-based unsupervised defect detection method for photovoltaic modules
Technical Field
The invention relates to the field of industrial product inspection, and in particular to an unsupervised defect detection method for photovoltaic modules based on contrastive learning.
Background
Due to the gradual depletion of conventional energy sources, new energy sources typified by solar photovoltaic power generation have developed rapidly in recent years. During the production and processing of photovoltaic cell assemblies, besides material defects, repeated handling of the cells on an automated production line increases the damage rate, so that the assemblies develop defects such as hidden cracks, fragments, insufficient solder joints and broken grid lines, which directly affect the conversion efficiency and service life of the product. With the continuous improvement of industrial automation, traditional manual inspection is no longer suited to the current automated production environment because of its low precision, poor real-time performance and high cost.
Machine vision is a widely applied automated inspection technology. Traditional machine vision inspection requires manual feature extraction and classifier design, which must be carried out by professionals in the relevant field and adapted to each algorithm. In actual industrial production, however, industrial parts come in many types and specifications, and defect types differ greatly across products. Traditional machine vision inspection therefore suffers from complex feature-extraction and classifier design, poor adaptability and low robustness, and cannot meet the accuracy and real-time requirements of industrial inspection.
In recent years, deep learning techniques have been widely used in the image field due to their powerful feature learning capabilities. Because a deep learning model does not require manual feature extraction, it fuses the feature extraction and classification stages of traditional machine vision and directly learns the mapping from input to output. Complex image preprocessing and manual feature extraction operations can thus be avoided, and the method can adapt to defects of different scenes and types, improving the accuracy, robustness and real-time performance of industrial inspection.
However, photovoltaic module data sets contain few abnormal samples, leading to a serious imbalance between positive and negative samples. In actual production, the photovoltaic module exhibits not only many types of anomalies but also varied appearances of each anomaly, while the abnormal regions are small. For hidden-crack defects in particular, the defect appearance is not obvious; existing deep learning algorithms usually require balanced samples, can only detect large objects, and detect tiny anomalies poorly.
Contrastive learning has been widely applied to deep learning tasks owing to its powerful feature representation capability. It does not require label supervision; instead, it is trained in a self-supervised manner on large amounts of unlabeled data, yielding a feature representation of the data. However, contrastive learning has mainly been applied to unsupervised pre-training and has rarely been applied to image anomaly detection.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides an unsupervised defect detection method for photovoltaic modules based on contrastive learning, which comprises the following steps:
step 1, constructing a contrastive learning network model: training the contrastive learning network model with normal samples; the contrastive learning network model comprises two encoder networks and an object-directed symmetric cross-attention network;
step 2, constructing a defect discrimination model: obtaining the feature distribution of normal samples based on the encoder network of step 1, and fitting a multivariate Gaussian mixture distribution to the normal-sample features to construct the defect discrimination model;
and step 3, inputting the image to be detected into the encoder network of step 1 to obtain its features, and obtaining the detection result of the image based on the defect discrimination model of step 2.
Preferably, step 1 comprises:
step 1.1, collecting photovoltaic module images under actual working conditions;
step 1.2, constructing a data set, comprising:
step 1.2.1, graying the photovoltaic module image, correcting the image angle, and resizing it to a fixed size;
step 1.2.2, dividing the grayscale images into a normal image set and an abnormal image set, and constructing a training set and a test set, wherein the training set contains only normal images and the test set contains both normal and abnormal images;
step 1.3, constructing two encoder networks: a first encoder network and a second encoder network, whose input is a grayscale image of fixed size; the two encoders have the same structure but different parameters, and their outputs are denoted f and q respectively;
step 1.4, constructing the object-directed symmetric cross-attention network;
and step 1.5, training the contrastive learning network model.
Preferably, in step 1.3, the encoder network comprises 4 residual blocks, each consisting of two convolutional layers with 3 × 3 kernels and ReLU activations; the output of the last residual block is then transformed by a further convolutional layer with a 3 × 3 kernel.
Preferably, step 1.4 comprises:
step 1.4.1, correlation coefficient calculation: the inputs of the object-directed symmetric cross-attention network are the outputs of the two encoder networks; the inputs are reshaped so as to retain spatial position information; the correlation coefficient of q to f is calculated using cosine similarity:
R(i, j) = f_i^T q_j / (||f_i||_2 · ||q_j||_2)
in the above formula, f_i and q_j denote the vectors at spatial positions i and j of f and q respectively, f_i^T denotes the transpose of f_i, ||·||_2 denotes the L2 norm of a vector, and the correlation matrix R describes the feature similarity between the subject regions of the two images;
step 1.4.2, attention weight calculation: applying row-wise average pooling to the correlation matrix R to obtain a vector p:
p = avgPool(R)
in the above formula, avgPool denotes the average pooling operation and p represents the average correlation of q with each spatial point of f; a trainable attention layer is then used to map and reshape p, yielding the final attention weight map A of q to f;
step 1.4.3, symmetric attention weight calculation: the correlation coefficient of f to q is obtained by the symmetric operation of step 1.4.1:
R'(i, j) = q_i^T f_j / (||q_i||_2 · ||f_j||_2)
and the attention weight map A' of f to q is then obtained through step 1.4.2;
step 1.4.4, enhanced feature calculation: multiplying the attention weight maps with the encoder output features f and q, the enhanced features are calculated as:
f' = f + A ⊙ f,  q' = q + A' ⊙ q
in the above formula, ⊙ denotes element-wise (point-wise) multiplication, f' denotes the enhanced feature of f, and q' denotes the enhanced feature of q.
Preferably, step 1.5 comprises:
step 1.5.1, initializing the weights of the contrastive learning network model to small random values using a Gaussian function, and setting the values of the model hyper-parameters; the hyper-parameters include the number of iterations, the batch size used in each training round, the learning rate and the learning rate decay value;
step 1.5.2, randomly sampling the training set: at each iteration, two different normal photovoltaic images are randomly sampled from the training set;
step 1.5.3, applying random data augmentation to the two normal photovoltaic images to obtain two augmented photovoltaic images, wherein the data augmentation operations include cropping, color transformation and resizing;
step 1.5.4, inputting the two augmented photovoltaic images into the contrastive learning network model to obtain their enhanced features f' and q' respectively;
step 1.5.5, constructing the loss function for training the contrastive learning network model based on the enhanced features f' and q';
step 1.5.6, using Adam as the optimizer, updating the first encoder network and the object-directed symmetric cross-attention network by gradient descent according to the loss function of the contrastive learning network model;
step 1.5.7, for the second encoder network, updating its parameters from those of the first encoder network by momentum update, with the update formula:
θ_2 = m·θ_2 + (1 − m)·θ_1
in the above formula, θ_1 denotes the parameters of the first encoder, θ_2 denotes the parameters of the second encoder, and m is the momentum coefficient, set to a value in the range 0.9 to 0.999 according to the actual situation;
step 1.5.8, judging whether the number of iterations has been reached: if so, executing step 1.5.9; if not, returning to step 1.5.2;
and step 1.5.9, saving the parameters and trained weights of the contrastive learning network model.
Preferably, step 1.5.5 comprises:
step 1.5.5.1, for the enhanced features f' and q', computing the cosine similarity of the features f_i' and q_i' at each identical spatial position i according to:
D(f_i', q_i') = f_i'^T q_i' / (||f_i'||_2 · ||q_i'||_2)
step 1.5.5.2, the loss function for training the contrastive learning network model is the negative of the sum of the cosine similarities over all spatial positions of the feature map:
L = −Σ_i D(f_i', q_i').
preferably, step 2 comprises:
step 2.1, loading the first encoder network of the trained contrastive learning network model;
step 2.2, fitting a multivariate Gaussian mixture model to each spatial position of the normal photovoltaic image features;
step 2.3, based on the multivariate Gaussian mixture model M_ij, computing the anomaly score value s_ij for each spatial location of the feature using the Mahalanobis distance, calculated as:
s_ij = sqrt( (f(i, j) − μ_ij)^T · Σ_ij^(−1) · (f(i, j) − μ_ij) )
where f(i, j) denotes the feature vector at spatial position (i, j), and μ_ij and Σ_ij denote the mean and covariance of M_ij;
step 2.4, calculating the anomaly discrimination threshold, comprising:
step 2.4.1, inputting all images in the training set into the encoder network to obtain their features f;
step 2.4.2, computing the anomaly score value s_ij for each spatial location of the feature f according to the multivariate Gaussian mixture model M_ij;
step 2.4.3, taking the maximum anomaly score value over the training set as the anomaly discrimination threshold π;
step 2.5, constructing the defect discrimination module, comprising:
step 2.5.1, inputting the image x to be detected into the encoder network to obtain its feature f;
step 2.5.2, computing the anomaly score value s_ij for each spatial location of the feature f according to the multivariate Gaussian mixture model M_ij;
step 2.5.3, taking the maximum anomaly score over all spatial positions as the anomaly score s of the image x to be detected;
step 2.5.4, setting the anomaly discrimination threshold to the threshold π obtained in step 2.4.3;
step 2.5.5, judging whether the anomaly score is greater than the specified threshold π: if so, the image x to be detected is judged to be an abnormal image; otherwise, it is judged to be a normal image.
Preferably, step 2.2 comprises:
step 2.2.1, applying random data augmentation to the normal photovoltaic images in the training set to obtain N augmented images in total, each denoted x_k;
step 2.2.2, inputting each augmented image x_k into the encoder network to obtain its feature f_k;
step 2.2.3, for each spatial position (i, j) of the normal-image features, fitting a multivariate Gaussian mixture model M_ij to the N features f_k; its mean μ_ij is calculated as:
μ_ij = (1/N) · Σ_k f_k(i, j)
and its covariance Σ_ij is calculated as:
Σ_ij = (1/N) · Σ_k (f_k(i, j) − μ_ij) (f_k(i, j) − μ_ij)^T
where f_k(i, j) denotes the feature vector of image x_k at spatial position (i, j).
preferably, step 3 comprises:
step 3.1, loading the defect discrimination module;
step 3.2, inputting the image to be detected;
step 3.3, preprocessing the image to be detected, comprising:
step 3.3.1, cropping the image to be detected to remove the background area and obtain the main body of the image;
step 3.3.2, correcting and graying the image to be detected, and scaling it;
and step 3.4, inputting the preprocessed image into the defect discrimination module, which returns the discrimination result.
The invention has the beneficial effects that: the invention provides an unsupervised defect detection method for photovoltaic modules based on contrastive learning, aimed at photovoltaic module defect detection. The method can effectively detect tiny and varied defects of photovoltaic modules, requires only normal photovoltaic images for training, and thereby alleviates the problem of sample imbalance. A contrastive learning network model is constructed, comprising two encoder networks and an object-directed symmetric cross-attention network, with the second encoder's parameters updated by momentum. A multivariate Gaussian mixture model is fitted to each spatial position of the normal photovoltaic image features, and the defect discrimination module judges whether the image to be detected contains defects. Unknown defects can thus be detected quickly and accurately even when few defect images are available, with strong environmental adaptability and robustness.
Drawings
FIG. 1 is a schematic diagram of the contrastive learning network model;
FIG. 2 is a flowchart of the contrastive learning network model training process;
FIG. 3 is a schematic diagram of the encoder network structure;
FIG. 4 is a schematic diagram of the object-directed symmetric cross-attention network structure;
FIG. 5 is a schematic diagram of the defect discrimination model construction;
FIG. 6 is a schematic diagram of experimental results on a normal image to be detected;
FIG. 7 is a schematic diagram of experimental results on an abnormal image to be detected.
Detailed Description
The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid understanding of the invention. It should be noted that a person skilled in the art can make several modifications to the invention without departing from its principle, and such modifications also fall within the protection scope of the claims of the present invention.
As an embodiment, as shown in FIG. 1, the invention provides an unsupervised defect detection method for photovoltaic modules based on contrastive learning, and proposes an unsupervised anomaly detection model capable of detecting tiny and unknown anomalies, addressing the difficulty of identifying photovoltaic module defects, the variety of defect forms, the complexity of the detection environment and the imbalance of positive and negative samples. The unsupervised anomaly detection model comprises a contrastive learning network model and a defect discrimination model. The contrastive learning network model comprises two encoder networks and an object-directed symmetric cross-attention network; the encoder networks adopt a residual network structure. As shown in FIG. 5, the defect discrimination model obtains the feature distribution of normal samples from the encoder network, fits a multivariate Gaussian mixture distribution to the normal-sample features, and detects defects by judging whether the anomaly score value exceeds a set threshold. In practical application, the image to be detected is input into the defect discrimination module, which returns the discrimination result.
Specifically, the contrastive learning-based unsupervised defect detection method for photovoltaic modules comprises the following steps:
step 1, constructing a contrastive learning network model: training the contrastive learning network model with normal samples; the contrastive learning network model comprises two encoder networks and an object-directed symmetric cross-attention network;
step 2, constructing a defect discrimination model: obtaining the feature distribution of normal samples based on the encoder network of step 1, and fitting a multivariate Gaussian mixture distribution to the normal-sample features to construct the defect discrimination model;
and step 3, inputting the image to be detected into the encoder network of step 1 to obtain its features, and obtaining the detection result of the image based on the defect discrimination model of step 2.
As shown in FIG. 2, step 1 comprises:
step 1.1, collecting photovoltaic module images under actual working conditions;
step 1.2, constructing a data set, comprising:
step 1.2.1, graying the photovoltaic module image, correcting the image angle, and resizing it to a fixed size, for example 64 × 64;
step 1.2.2, dividing the grayscale images into a normal image set and an abnormal image set, and constructing a training set and a test set, wherein the training set contains only normal images and the test set contains both normal and abnormal images;
step 1.3, constructing two encoder networks: a first encoder network and a second encoder network, whose input is a grayscale image of fixed size, for example a 64 × 64 grayscale image; the two encoders have the same structure but different parameters, and their outputs are denoted f and q respectively;
step 1.4, constructing the object-directed symmetric cross-attention network;
and step 1.5, training the contrastive learning network model.
As shown in FIG. 3, in step 1.3, the encoder network comprises 4 residual blocks, each consisting of two convolutional layers with 3 × 3 kernels and ReLU activations; the output of the last residual block is then transformed by a further convolutional layer with a 3 × 3 kernel.
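By way of illustration only, a minimal PyTorch sketch of such an encoder is given below. The channel widths, the strides and the names ResidualBlock and PVEncoder are assumptions introduced for the sketch; only the overall layout (4 residual blocks of two 3 × 3 convolutions with ReLU, followed by a final 3 × 3 convolution) follows the embodiment.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with ReLU and an identity shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the shortcut when the shape changes (an implementation assumption)
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride)
                         if (stride != 1 or in_ch != out_ch) else nn.Identity())

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + self.shortcut(x))

class PVEncoder(nn.Module):
    """4 residual blocks followed by a final 3x3 convolution (channel widths are assumed)."""
    def __init__(self, in_ch=1, widths=(64, 128, 256, 512)):
        super().__init__()
        blocks, prev = [], in_ch
        for w in widths:
            blocks.append(ResidualBlock(prev, w, stride=2))
            prev = w
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Conv2d(prev, prev, 3, padding=1)   # final 3x3 transform of the last residual block output

    def forward(self, x):                    # x: (B, 1, 64, 64) grayscale input
        return self.head(self.blocks(x))     # e.g. (B, 512, 4, 4) feature map
```

With a 64 × 64 grayscale input and stride-2 blocks, this sketch yields a 512 × 4 × 4 feature map, i.e. 512 × 16 after flattening the spatial dimensions, which is consistent with the reshape sizes mentioned in step 1.4.1 below.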
As shown in FIG. 4, step 1.4 comprises:
step 1.4.1, correlation coefficient calculation: the inputs of the object-directed symmetric cross-attention network are the outputs of the two encoder networks; the inputs are reshaped so as to retain spatial position information, for example to sizes 512 × 16 and 512 × 16 respectively; the correlation coefficient of q to f is calculated using cosine similarity:
R(i, j) = f_i^T q_j / (||f_i||_2 · ||q_j||_2)
in the above formula, f_i and q_j denote the vectors at spatial positions i and j of f and q respectively, f_i^T denotes the transpose of f_i, ||·||_2 denotes the L2 norm of a vector, and the correlation matrix R describes the feature similarity between the subject regions of the two images;
step 1.4.2, attention weight calculation: applying row-wise average pooling to the correlation matrix R to obtain a vector p:
p = avgPool(R)
in the above formula, avgPool denotes the average pooling operation and p represents the average correlation of q with each spatial point of f; a trainable attention layer is then used to map and reshape p, yielding the final attention weight map A of q to f;
step 1.4.3, symmetric attention weight calculation: the correlation coefficient of f to q is obtained by the symmetric operation of step 1.4.1:
R'(i, j) = q_i^T f_j / (||q_i||_2 · ||f_j||_2)
and the attention weight map A' of f to q is then obtained through step 1.4.2;
step 1.4.4, enhanced feature calculation: the attention weight maps A and A' represent the attention paid to the subject shared by the two images; multiplying the features f and q output by the encoders with these attention weight maps makes the features emphasize the subject of the picture, so that defect discrimination is more accurate. Multiplying the attention weight maps with the encoder output features f and q, the enhanced features are calculated as:
f' = f + A ⊙ f,  q' = q + A' ⊙ q
in the above formula, ⊙ denotes element-wise (point-wise) multiplication, f' denotes the enhanced feature of f, and q' denotes the enhanced feature of q. A residual mechanism is adopted to retain the information of the original features, while the added attention map A emphasizes the subject of the picture. The resulting enhanced features f' and q' focus more on the subject of the picture than the original features, reducing the interference of irrelevant background with defect detection.
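A minimal sketch of steps 1.4.1 to 1.4.4 in PyTorch is given below for illustration. The use of a single linear layer as the trainable attention layer, the sigmoid gating, and the module name ObjectDirectedCrossAttention are assumptions of the sketch, not requirements of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectDirectedCrossAttention(nn.Module):
    """Symmetric cross-attention between the two encoder outputs f and q (steps 1.4.1-1.4.4)."""
    def __init__(self, num_positions=16):
        super().__init__()
        self.attn_f = nn.Linear(num_positions, num_positions)   # trainable attention layer mapping p  -> A
        self.attn_q = nn.Linear(num_positions, num_positions)   # trainable attention layer mapping p' -> A'

    def forward(self, f, q):
        B, C, H, W = f.shape
        fv = f.flatten(2)                        # (B, C, HW): one feature vector per spatial position
        qv = q.flatten(2)
        fn = F.normalize(fv, dim=1)              # unit-normalize so the dot product equals cosine similarity
        qn = F.normalize(qv, dim=1)
        R = torch.bmm(fn.transpose(1, 2), qn)    # (B, HW_f, HW_q), R[i, j] = cos(f_i, q_j)

        p  = R.mean(dim=2)                       # row-wise average: correlation of q to each point of f
        pp = R.mean(dim=1)                       # column-wise average: correlation of f to each point of q
        A  = torch.sigmoid(self.attn_f(p)).view(B, 1, H, W)     # attention weight map of q to f
        Ap = torch.sigmoid(self.attn_q(pp)).view(B, 1, H, W)    # attention weight map of f to q

        f_enh = f + A * f                        # residual enhancement: f' = f + A  ⊙ f
        q_enh = q + Ap * q                       # residual enhancement: q' = q + A' ⊙ q
        return f_enh, q_enh
```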
Step 1.5 comprises:
step 1.5.1, initializing the weights of the contrastive learning network model to small random values using a Gaussian function, and setting the values of the model hyper-parameters; the hyper-parameters include the number of iterations, the batch size used in each training round, the learning rate and the learning rate decay value;
step 1.5.2, randomly sampling the training set: at each iteration, two different normal photovoltaic images are randomly sampled from the training set;
step 1.5.3, applying random data augmentation to the two normal photovoltaic images to obtain two augmented photovoltaic images, wherein the data augmentation operations include cropping, color transformation and resizing;
step 1.5.4, inputting the two augmented photovoltaic images into the contrastive learning network model to obtain their enhanced features f' and q' respectively;
step 1.5.5, constructing the loss function for training the contrastive learning network model based on the enhanced features f' and q';
step 1.5.6, using Adam as the optimizer, updating the first encoder network and the object-directed symmetric cross-attention network by gradient descent according to the loss function of the contrastive learning network model;
step 1.5.7, for the second encoder network, updating its parameters from those of the first encoder network by momentum update, with the update formula:
θ_2 = m·θ_2 + (1 − m)·θ_1
in the above formula, θ_1 denotes the parameters of the first encoder, θ_2 denotes the parameters of the second encoder, and m is the momentum coefficient, set to a value in the range 0.9 to 0.999 according to the actual situation (see the code sketch following this list);
step 1.5.8, judging whether the number of iterations has been reached: if so, executing step 1.5.9; if not, returning to step 1.5.2;
and step 1.5.9, saving the parameters and trained weights of the contrastive learning network model.
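For illustration, the momentum update of step 1.5.7 can be written as the following parameter-wise exponential moving average; the function name momentum_update and the default value m = 0.99 are assumptions of the sketch.

```python
import torch

@torch.no_grad()
def momentum_update(encoder_1, encoder_2, m=0.99):
    """theta_2 = m * theta_2 + (1 - m) * theta_1, applied parameter-wise (step 1.5.7)."""
    for p1, p2 in zip(encoder_1.parameters(), encoder_2.parameters()):
        p2.data.mul_(m).add_(p1.data, alpha=1.0 - m)
```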
Step 1.5.5 comprises:
step 1.5.5.1, for the enhanced features f' and q', computing the cosine similarity of the features f_i' and q_i' at each identical spatial position i according to:
D(f_i', q_i') = f_i'^T q_i' / (||f_i'||_2 · ||q_i'||_2)
step 1.5.5.2, so that the features at the same spatial position become as similar as possible, the loss function for training the contrastive learning network model takes the negative of the sum of the cosine similarities over all spatial positions of the feature map:
L = −Σ_i D(f_i', q_i').
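For illustration, the sketch below first implements the loss of steps 1.5.5.1 and 1.5.5.2 and then assembles steps 1.5.2 to 1.5.9 into a training loop, reusing the PVEncoder, ObjectDirectedCrossAttention and momentum_update sketches above. The augmentation pipeline, the learning rate, the number of iterations, the checkpoint name and the placeholder list normal_train_images (PIL images of normal modules) are assumptions chosen only to make the sketch concrete.

```python
import random
import torch
import torch.nn.functional as F
from torchvision import transforms

def contrastive_loss(f_enh, q_enh):
    """Steps 1.5.5.1-1.5.5.2: negative sum of cosine similarities at matching spatial positions."""
    fv = F.normalize(f_enh.flatten(2), dim=1)    # (B, C, HW), unit vectors along the channel axis
    qv = F.normalize(q_enh.flatten(2), dim=1)
    cos = (fv * qv).sum(dim=1)                   # (B, HW): D(f_i', q_i') per spatial position
    return -cos.sum(dim=1).mean()                # sum over positions, averaged over the batch (assumption)

augment = transforms.Compose([                   # step 1.5.3: cropping, color transformation, resizing
    transforms.RandomResizedCrop(64, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

encoder_1, encoder_2 = PVEncoder(), PVEncoder()
encoder_2.load_state_dict(encoder_1.state_dict())    # start the momentum encoder from identical weights
cross_attn = ObjectDirectedCrossAttention(num_positions=16)
optimizer = torch.optim.Adam(
    list(encoder_1.parameters()) + list(cross_attn.parameters()), lr=1e-3)

num_iterations = 10000                               # hyper-parameter of step 1.5.1 (assumed value)
for it in range(num_iterations):                     # step 1.5.8: stop after the set number of iterations
    img_a, img_b = random.sample(normal_train_images, 2)    # step 1.5.2: two different normal images
    xa = augment(img_a).unsqueeze(0)                         # step 1.5.3
    xb = augment(img_b).unsqueeze(0)
    f = encoder_1(xa)                                        # step 1.5.4
    with torch.no_grad():
        q = encoder_2(xb)                                    # no gradient through the momentum encoder
    f_enh, q_enh = cross_attn(f, q)
    loss = contrastive_loss(f_enh, q_enh)                    # step 1.5.5
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                         # step 1.5.6: update encoder 1 and the attention network
    momentum_update(encoder_1, encoder_2, m=0.99)            # step 1.5.7
torch.save({"encoder_1": encoder_1.state_dict(),
            "cross_attn": cross_attn.state_dict()}, "cl_model.pt")   # step 1.5.9
```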
the step 2 comprises the following steps:
step 2.1, loading the first encoder network of the trained contrastive learning network model;
step 2.2, fitting a multivariate Gaussian mixture model to each spatial position of the normal photovoltaic image features;
step 2.3, based on the multivariate Gaussian mixture model M_ij, computing the anomaly score value s_ij for each spatial location of the feature using the Mahalanobis distance, calculated as:
s_ij = sqrt( (f(i, j) − μ_ij)^T · Σ_ij^(−1) · (f(i, j) − μ_ij) )
where f(i, j) denotes the feature vector at spatial position (i, j), and μ_ij and Σ_ij denote the mean and covariance of M_ij;
step 2.4, calculating the anomaly discrimination threshold, comprising:
step 2.4.1, inputting all images in the training set into the encoder network to obtain their features f;
step 2.4.2, computing the anomaly score value s_ij for each spatial location of the feature f according to the multivariate Gaussian mixture model M_ij;
step 2.4.3, taking the maximum anomaly score value over the training set as the anomaly discrimination threshold π;
step 2.5, constructing the defect discrimination module, comprising:
step 2.5.1, inputting the image x to be detected into the encoder network to obtain its feature f;
step 2.5.2, computing the anomaly score value s_ij for each spatial location of the feature f according to the multivariate Gaussian mixture model M_ij;
step 2.5.3, taking the maximum anomaly score over all spatial positions as the anomaly score s of the image x to be detected;
step 2.5.4, setting the anomaly discrimination threshold to the threshold π obtained in step 2.4.3;
step 2.5.5, judging whether the anomaly score is greater than the specified threshold π: if so, the image x to be detected is judged to be an abnormal image; otherwise, it is judged to be a normal image.
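For illustration, steps 2.3 to 2.5 can be sketched as follows, assuming the per-position means mu and inverse covariances cov_inv of the Gaussian model M_ij have already been fitted (a sketch of that fitting follows step 2.2 below); the tensor layouts and function names are assumptions of the sketch.

```python
import torch

def anomaly_score_map(feat, mu, cov_inv):
    """Step 2.3: Mahalanobis distance s_ij at every spatial position.

    feat:    (C, H, W) feature map of one image
    mu:      (H, W, C) per-position means mu_ij
    cov_inv: (H, W, C, C) per-position inverse covariances
    """
    f = feat.permute(1, 2, 0)                               # (H, W, C)
    d = (f - mu).unsqueeze(-1)                              # (H, W, C, 1)
    s2 = d.transpose(-1, -2) @ cov_inv @ d                  # (H, W, 1, 1): squared Mahalanobis distances
    return s2.squeeze(-1).squeeze(-1).clamp(min=0).sqrt()   # (H, W)

def compute_threshold(train_feats, mu, cov_inv):
    """Steps 2.4.1-2.4.3: threshold pi = maximum anomaly score over the training set."""
    return max(anomaly_score_map(f, mu, cov_inv).max().item() for f in train_feats)

def is_defective(feat, mu, cov_inv, threshold):
    """Steps 2.5.3-2.5.5: image-level decision from the maximum per-position score."""
    score = anomaly_score_map(feat, mu, cov_inv).max().item()
    return score > threshold, score
```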
Step 2.2 comprises:
step 2.2.1, applying random data augmentation to the normal photovoltaic images in the training set to obtain N augmented images in total, each denoted x_k;
step 2.2.2, inputting each augmented image x_k into the encoder network to obtain its feature f_k;
step 2.2.3, for each spatial position (i, j) of the normal-image features, fitting a multivariate Gaussian mixture model M_ij to the N features f_k; its mean μ_ij is calculated as:
μ_ij = (1/N) · Σ_k f_k(i, j)
and its covariance Σ_ij is calculated as:
Σ_ij = (1/N) · Σ_k (f_k(i, j) − μ_ij) (f_k(i, j) − μ_ij)^T
where f_k(i, j) denotes the feature vector of image x_k at spatial position (i, j).
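A sketch of the per-position Gaussian fitting of step 2.2 is given below, assuming the N features f_k are stacked into a tensor feats of shape (N, C, H, W). The small regularization term eps · I added to the covariance before inversion is an assumption made for numerical stability and is not part of the stated method.

```python
import torch

def fit_position_gaussians(feats, eps=1e-3):
    """Steps 2.2.1-2.2.3: fit mean mu_ij and covariance Sigma_ij at every spatial position."""
    N, C, H, W = feats.shape
    x = feats.permute(2, 3, 0, 1)              # (H, W, N, C): N feature vectors per position (i, j)
    mu = x.mean(dim=2)                         # (H, W, C): mu_ij
    d = x - mu.unsqueeze(2)                    # centered features
    cov = d.transpose(-1, -2) @ d / N          # (H, W, C, C): Sigma_ij
    cov = cov + eps * torch.eye(C)             # regularization so the inverse exists (assumption)
    return mu, torch.linalg.inv(cov)           # means and inverse covariances for the Mahalanobis score
```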
the step 3 comprises the following steps:
step 3.1, loading the defect discrimination module;
step 3.2, inputting the image to be detected;
step 3.3, preprocessing the image to be detected, comprising:
step 3.3.1, cropping the image to be detected to remove the background area and obtain the main body of the image;
step 3.3.2, correcting and graying the image to be detected, and scaling it, for example to 64 × 64;
and step 3.4, inputting the preprocessed image into the defect discrimination module, which returns the discrimination result.
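An illustrative end-to-end detection routine corresponding to step 3, reusing the sketches above, is given below; the crop box used to remove the background, the omission of angle correction, and the function name detect are assumptions of the sketch. Here encoder would be the trained first encoder network loaded in step 2.1, and mu, cov_inv and threshold the outputs of the fitting and threshold-calculation sketches above.

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([                 # step 3.3.2: graying and scaling to 64 x 64
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

def detect(image_path, encoder, mu, cov_inv, threshold, crop_box=None):
    """Steps 3.1-3.4: preprocess the image, extract features and return the discrimination result."""
    img = Image.open(image_path)
    if crop_box is not None:                      # step 3.3.1: crop away the background, keep the subject
        img = img.crop(crop_box)
    x = preprocess(img).unsqueeze(0)              # (1, 1, 64, 64)
    with torch.no_grad():
        feat = encoder(x)[0]                      # (C, H, W) feature of the image to be detected
    abnormal, score = is_defective(feat, mu, cov_inv, threshold)
    return abnormal, score
```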
The detection results of the invention are shown in FIG. 6 and FIG. 7: the first row shows the images to be detected, and the second row shows their anomaly score values. The results show a large difference between the anomaly score values of normal and abnormal images, so anomalies can be identified effectively.