CN115758284B

CN115758284B - Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain

Info

Publication number: CN115758284B
Application number: CN202211418893.2A
Authority: CN
Inventors: 易辉; 蒋尚俊
Original assignee: Nanjing Tech University
Current assignee: Nanjing Tech University
Priority date: 2022-11-14
Filing date: 2022-11-14
Publication date: 2023-05-16
Anticipated expiration: 2042-11-14
Also published as: CN115758284A

Abstract

The invention discloses a photovoltaic hot spot fault detection method based on fused kernel entropy and information gain. A test sample and a training sample are fused into a fusion sample, which is standardized and projected into a high-dimensional space with a Gaussian kernel function. The kernel matrix is eigendecomposed, and selected eigenvalues and eigenvectors are used to calculate the feature information of the fusion sample. A fault detection threshold is determined from the feature information of the training sample, and whether the photovoltaic module has a hot spot fault is judged by comparing the fault detection variable with this threshold. Because the feature information is extracted by kernel entropy component analysis, the sample data are not required to follow a Gaussian distribution, which widens the range of application. Moreover, since kernel entropy component analysis is a feature extraction method based on information entropy, the information gain is introduced as the detection variable for hot spot fault detection, achieving a better detection effect.

Description

Photovoltaic hot spot fault detection method and system based on fusion kernel entropy and information gain

Technical Field

The invention relates to a photovoltaic hot spot fault detection method and system based on fused kernel entropy and information gain, and belongs to the technical field of photovoltaic hot spot fault diagnosis.

Background

With the worsening energy crisis and environmental pollution, new energy technologies are being vigorously developed. Among all new energy sources, photovoltaic power generation has become a research focus owing to its cleanliness and high efficiency. Photovoltaic power generation systems are usually deployed in complex environments and are prone to faults caused by external influences. The hot spot fault is one of the main faults of a photovoltaic power generation system; when it is severe, it disturbs the stable operation of the system and can even endanger personal safety.

The current mainstream method for detecting photovoltaic hot spot faults is based on infrared images: an inspector captures an infrared image of the photovoltaic module with an infrared thermal imager and judges whether a hot spot fault has occurred from the temperature difference caused by the hot spot phenomenon. For small photovoltaic power generation systems, this approach is clearly unsuitable. With the development of big data, it has become feasible to judge the operating state of a photovoltaic power generation system by analyzing its monitoring data. Hot spot detection methods based on electrical parameters do not require inspectors to enter the field, which greatly reduces their workload. In the prior art, photovoltaic hot spot fault detection generally adopts tree-based prediction models. For example, CN107451600A discloses online photovoltaic hot spot fault detection based on an isolation mechanism, which can detect a hot spot fault on any photovoltaic panel in an online photovoltaic array; however, it requires repeated sampling guided by experience, and its detection is inaccurate.

Disclosure of Invention

The invention aims to overcome the problems and shortcomings of the prior art by providing a photovoltaic hot spot fault detection method based on fused kernel entropy and information gain.

In order to solve the technical problems, the invention adopts the following technical scheme:

A photovoltaic hot spot fault detection method based on fused kernel entropy and information gain comprises the following steps:

Step 1, collect photovoltaic module data to form sample data, in which the training sample is X_train and the test sample is X_test; fuse the training sample X_train and the test sample X_test to obtain a fusion sample X_f;

Step 2, calculate the mean and standard deviation of the fusion sample X_f, and standardize X_f into a fusion sample X'_f with mean 0 and standard deviation 1;

Step 3, select a Gaussian kernel function as the projection function, and use it to project X'_f into a high-dimensional space to obtain a kernel matrix K_f;

Step 4, perform eigendecomposition of the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculate the Rényi entropy estimate, and sort the eigenvalues and corresponding eigenvectors in descending order of their contribution to the Rényi entropy;

Step 5, select eigenvalues and eigenvectors according to the Rényi entropy, and calculate the feature information Y_f;

Step 6, use the feature information Y_train of the training sample to calculate the detection variable F_train of the training sample and determine the fault detection threshold F_m;

Step 7, calculate the detection variable F_test of the test sample, compare F_test with the detection threshold F_m, and judge whether the photovoltaic module has a hot spot fault.

Preferably, step 1 specifically includes the following steps:

Collect characteristic data of the photovoltaic module to form sample data X = [x_1, x_2, ..., x_θ, ..., x_S]^T, where S is the total number of samples collected, θ = 1, 2, ..., S, and x_θ is the photovoltaic module characteristic data collected at the θ-th sampling, comprising irradiance G (W/m²), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (dimensionless). Select n consecutive normal samples from the sample data X to form the training sample X_train = [x_1, x_2, ..., x_n]^T, which contains n samples, each with t photovoltaic module characteristic data; select m consecutive samples from the remainder of X to form the test sample X_test = [x_1, x_2, ..., x_m]^T, which contains m samples, each with t photovoltaic module characteristic data. Fuse the training sample X_train and the test sample X_test to obtain the fusion sample X_f:

$$X_f = \begin{bmatrix} X_{\text{train}} \\ X_{\text{test}} \end{bmatrix} = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^T \tag{1}$$

where x_i is the photovoltaic module characteristic data obtained at the i-th sampling, consisting of t characteristic quantities: irradiance G (W/m²), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (dimensionless).
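As a rough illustration of step 1, the following Python sketch (not the patent's own code; the embodiment uses R) takes a consecutive block of n normal rows as X_train and a consecutive block of m rows from the remaining data as X_test, then stacks them row-wise into X_f as in formula (1). The function name split_and_fuse, the start offsets, and the default of placing the test block directly after the training block are hypothetical choices.

```python
from typing import List, Optional, Tuple

Sample = List[float]   # one row: G, T, U_O, I_S, I_M, U_M, M_P, F (t = 8 values)


def split_and_fuse(data: List[Sample], n: int, m: int, train_start: int = 0,
                   test_start: Optional[int] = None
                   ) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Return (X_train, X_test, X_f) built from consecutive row blocks."""
    if test_start is None:
        # Hypothetical default: the test block directly follows the training block.
        test_start = train_start + n
    x_train = data[train_start:train_start + n]
    x_test = data[test_start:test_start + m]
    if len(x_train) != n or len(x_test) != m:
        raise ValueError("not enough rows for the requested n and m")
    # The test block must come from the data left after removing X_train.
    if max(train_start, test_start) < min(train_start + n, test_start + m):
        raise ValueError("training and test blocks overlap")
    # Formula (1): rows 0..n-1 of X_f are training samples, rows n..n+m-1 test samples.
    x_f = x_train + x_test
    return x_train, x_test, x_f
```

Keeping the training rows first matters later: the first n columns of the feature information then belong to the training sample.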

Step 2 specifically includes: standardize the fusion sample X_f feature by feature (column by column) to obtain the standardized fusion sample X'_f according to formula (2):

$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \tag{2}$$

where mean(X_f) is the mean of the fusion sample X_f and sd(X_f) is its standard deviation.
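A minimal sketch of the column-wise standardization of step 2 follows. It assumes the sample standard deviation (divisor n+m−1), which is what R's sd() computes; the function name standardize and the choice to map a constant column to zeros are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def standardize(x_f: Matrix) -> Matrix:
    """Column-wise z-score: each feature of X'_f gets mean 0 and sd 1."""
    n_rows = len(x_f)
    n_cols = len(x_f[0])
    means = [sum(row[j] for row in x_f) / n_rows for j in range(n_cols)]
    # Sample standard deviation (divisor n_rows - 1), as R's sd() uses.
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x_f) / (n_rows - 1))
           for j in range(n_cols)]
    # Hypothetical guard: a constant feature carries no information, map it to 0.
    return [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0
             for j in range(n_cols)] for row in x_f]
```

Standardizing per feature keeps quantities with very different units (W/m², °C, V, A) from dominating the Euclidean distances in the kernel.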

Step 3 specifically includes: project the standardized fusion sample X'_f into a high-dimensional space, using the Gaussian kernel function of formula (3) as the projection function:

$$k(x'_i, x'_j) = \exp\left(-\frac{\lVert x'_i - x'_j \rVert^2}{2\sigma^2}\right) \tag{3}$$

where x'_i and x'_j are the i-th and j-th samples of the standardized fusion sample X'_f, ‖x'_i − x'_j‖ is the Euclidean distance between them, and σ > 0 is the bandwidth of the Gaussian kernel function.

Evaluating k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:

$$K_f = \left[k(x'_i, x'_j)\right]_{(n+m)\times(n+m)}, \quad i, j = 1, 2, \ldots, n+m \tag{4}$$
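The kernel matrix of formula (4) can be sketched as below, assuming the common Gaussian form exp(−‖x'_i − x'_j‖² / (2σ²)); some texts put σ² instead of 2σ² in the denominator, which only rescales the bandwidth. The function name gaussian_kernel_matrix is illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_kernel_matrix(x_std: Matrix, sigma: float) -> Matrix:
    """K_f[i][j] = exp(-||x'_i - x'_j||^2 / (2 sigma^2)) for all sample pairs."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    size = len(x_std)
    k = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x_std[i], x_std[j]))
            value = math.exp(-sq_dist / (2.0 * sigma ** 2))
            k[i][j] = k[j][i] = value   # the kernel matrix is symmetric
    return k
```

Only the upper triangle is computed and mirrored, since K_f is symmetric with ones on the diagonal.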

Step 4 specifically includes: the kernel matrix K_f satisfies formula (5):

$$(K_f - \lambda I)e = 0 \tag{5}$$

where λ is an eigenvalue of the kernel matrix K_f, e is the corresponding eigenvector, and I is the (n+m)×(n+m) identity matrix. Setting the determinant |K_f − λI| = 0 yields the eigenvalues λ_1, λ_2, ..., λ_i, ..., λ_{n+m} of K_f, where λ_i is the i-th eigenvalue; substituting each eigenvalue into formula (5) yields the corresponding eigenvectors e_1, e_2, ..., e_i, ..., e_{n+m}, where e_i is the i-th eigenvector.

Because the kernel matrix K_f is symmetric positive semidefinite, it can be written in the form of formula (6):

$$K_f = EDE^T \tag{6}$$

In formula (6), D = diag(λ_1, λ_2, ..., λ_i, ..., λ_{n+m}) is the diagonal matrix of eigenvalues of K_f, and E = (e_1, e_2, ..., e_i, ..., e_{n+m}) is the eigenvector matrix.
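The text does not say how the eigendecomposition of formula (6) is computed; in R, base eigen() would be the usual route. As a dependency-free illustration only, the sketch below uses the classical cyclic Jacobi rotation method for symmetric matrices. The function name jacobi_eigh, the tolerance and the sweep limit are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, tol: float = 1e-12,
                max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, E) with column i of E the eigenvector of eigenvalue i,
    so that A = E diag(eigenvalues) E^T (formula (6))."""
    n = len(a)
    a = [row[:] for row in a]                        # work on a copy
    e = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                                # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation (c, s) chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                   # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):                   # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):                   # E <- E P accumulates eigenvectors
                    ekp, ekq = e[k][p], e[k][q]
                    e[k][p] = c * ekp - s * ekq
                    e[k][q] = s * ekp + c * ekq
    return [a[i][i] for i in range(n)], e
```

Jacobi is slow for large n+m but numerically robust for symmetric matrices; a production version would call a LAPACK-backed routine instead.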

Calculate the Rényi entropy estimate of the fusion sample, and sort the eigenvalues and corresponding eigenvectors in descending order according to their contribution to it:

$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T K_f \mathbf{1} = \frac{1}{(n+m)^2}\sum_{i=1}^{n+m}\left(\sqrt{\lambda_i}\,\mathbf{1}^T e_i\right)^2 \tag{7}$$

In formula (7), V̂(p) is the Rényi entropy estimate, n+m is the number of samples, and 1 is the (n+m)×1 all-ones vector.

According to the Rényi entropy contribution (√λ_i · 1^T e_i)² of each eigenpair, the eigenvalues and corresponding eigenvectors are sorted in descending order, giving the diagonal matrix of sorted eigenvalues D' = diag(λ'_1, λ'_2, ..., λ'_i, ..., λ'_{n+m}) and the sorted eigenvector matrix E' = (e'_1, e'_2, ..., e'_i, ..., e'_{n+m}), where λ'_i is the i-th eigenvalue and e'_i the i-th eigenvector after sorting;
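The entropy estimate and the sorting can be sketched as follows, assuming the standard kernel entropy component analysis estimator V̂(p) = 1^T K_f 1 / (n+m)², whose per-eigenpair terms are λ_i (1^T e_i)² / (n+m)². Clamping tiny negative eigenvalues (rounding noise) to zero is a hypothetical safeguard; the function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def renyi_entropy_estimate(k: Matrix) -> float:
    """V(p) = (1 / N^2) * 1^T K 1, i.e. the mean of all kernel entries."""
    n = len(k)
    return sum(sum(row) for row in k) / (n * n)


def sort_by_entropy(eigvals: List[float], eigvecs: Matrix
                    ) -> Tuple[List[float], Matrix, List[float]]:
    """Sort eigenpairs by their contribution lambda_i * (1^T e_i)^2 / N^2 to V(p),
    largest first. eigvecs holds e_i as column i; returns (D', E', contributions)."""
    n = len(eigvecs)
    contrib = []
    for i, lam in enumerate(eigvals):
        col_sum = sum(eigvecs[r][i] for r in range(n))       # 1^T e_i
        contrib.append(max(lam, 0.0) * col_sum ** 2 / (n * n))
    order = sorted(range(len(eigvals)), key=lambda i: contrib[i], reverse=True)
    vals_sorted = [eigvals[i] for i in order]
    vecs_sorted = [[eigvecs[r][i] for i in order] for r in range(n)]
    return vals_sorted, vecs_sorted, [contrib[i] for i in order]
```

Unlike kernel PCA, the ordering is by entropy contribution rather than by eigenvalue size, so an eigenvector with a large eigenvalue but 1^T e_i ≈ 0 can be ranked last.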

Kernel entropy component analysis is thereby converted into an optimization problem:

$$\Phi_{eca} = D_d^{1/2} E_d^T : \min_{\lambda'_1, e'_1, \ldots, \lambda'_{n+m}, e'_{n+m}} \hat{V}(p) - \hat{V}_d(p) \tag{8}$$

where Φ_eca is the feature mapping of kernel entropy component analysis; V̂_d(p) is the Rényi entropy estimate obtained from the first d eigenvalues of the sorted diagonal matrix D' and their eigenvectors; D_d is the diagonal matrix of the first d eigenvalues in D', with d ≤ n+m; and E_d is the matrix of the first d eigenvectors in the sorted eigenvector matrix E' (corresponding to the eigenvalues λ'_i).

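Step 5 specifically includes the following steps:

Constrain Φ_eca by choosing d according to

$$\frac{\hat{V}_d(p)}{\hat{V}(p)} = \frac{\sum_{i=1}^{d}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2}{\sum_{i=1}^{n+m}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2} \ge M_{\text{kernel entropy threshold}}$$

to obtain the value of d, where the kernel entropy threshold M_{kernel entropy threshold} is a constant;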
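One plausible reading of the selection rule, assumed here rather than confirmed by the source, is the cumulative-contribution criterion common in kernel entropy component analysis: take the smallest d whose entropy-sorted components retain at least the fraction M_{kernel entropy threshold} of V̂(p). The function name choose_d is hypothetical; the default 0.85 mirrors the value used in the embodiment.

```python
from typing import List


def choose_d(contrib_sorted: List[float], threshold: float = 0.85) -> int:
    """Smallest d such that the first d entropy-sorted components account for
    at least `threshold` of the total Renyi entropy estimate."""
    total = sum(contrib_sorted)
    if total <= 0:
        raise ValueError("total entropy contribution must be positive")
    running = 0.0
    for d, c in enumerate(contrib_sorted, start=1):
        running += c
        if running / total >= threshold:
            return d
    return len(contrib_sorted)
```

The input is expected to be the contribution list already sorted in descending order, as produced in step 4.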

Project the standardized fusion sample X'_f onto the space spanned by the selected eigenvectors to obtain the feature information Y_f, as in formula (9):

$$Y_f = D_d^{-1/2} E_d^T \varphi(X'_f)^T \varphi(X'_f) = D_d^{-1/2} E_d^T K_f = D_d^{1/2} E_d^T \tag{9}$$

where X'_f is the standardized fusion sample, φ is the nonlinear mapping, and φ(X'_f) is the matrix obtained by nonlinearly mapping X'_f, so that φ(X'_f)^T φ(X'_f) = K_f.

The feature information Y_f comprises the feature information Y_train of the training sample (its first n columns) and the feature information Y_test of the test sample (its last m columns).
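Assuming the standard kernel entropy component analysis projection Y_f = D_d^{1/2} E_d^T (a d × (n+m) matrix whose column i describes fused sample i), the feature information and its split can be sketched as follows. The function name kernel_entropy_features is illustrative. A useful sanity check: when d equals n+m, Y_f^T Y_f reproduces K_f, because D^{1/2}E^T transposed times itself is EDE^T.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def kernel_entropy_features(vals_sorted: List[float], vecs_sorted: Matrix,
                            d: int, n_train: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Y_f = D_d^{1/2} E_d^T (d x N). Column i is the feature vector of fused
    sample i; the first n_train columns belong to the training samples."""
    n_total = len(vecs_sorted)
    # Row c of Y_f is sqrt(lambda'_c) times the c-th sorted eigenvector e'_c.
    y_f = [[math.sqrt(max(vals_sorted[c], 0.0)) * vecs_sorted[r][c]
            for r in range(n_total)] for c in range(d)]
    y_train = [row[:n_train] for row in y_f]
    y_test = [row[n_train:] for row in y_f]
    return y_f, y_train, y_test
```

The split relies on the training rows having been placed first in X_f.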

Step 6 specifically includes the following steps:

Use the feature information Y_train of the training sample to calculate the detection variable and the fault detection threshold of the training sample.

The information gain between the training sample and the test sample is defined as:

$$G_t = Y_{\text{train}} - Y_{\text{test}} \tag{11}$$

where Y_train is the feature information of the training sample and Y_test is the feature information of the test sample.

When the test sample is a hot spot fault sample, the information gain of the photovoltaic module under the hot spot fault condition is:

$$G_h = Y_{\text{train}} - Y_h \tag{12}$$

where Y_h is the feature information of the hot spot fault sample.

When the test sample is a normal sample, the information gain G_n between the two groups of normal samples is:

$$G_n = Y_{\text{train}} - Y_n \tag{13}$$

where Y_n is the feature information of the normal sample.

The trace of a matrix summarizes the information it carries. Since the information gain G_t is not necessarily a square matrix, the detection variable of the hot spot fault is defined as:

$$F_{\text{test}} = \mathrm{tr}\left(G_t^T G_t\right) \tag{14}$$

where tr(·) denotes the trace of a square matrix, i.e. the sum of its diagonal elements; tr(G_t^T G_t) therefore equals the sum of the squares of all entries of G_t.
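Formulas (11) and (14) can be sketched as below. The element-wise difference in formula (11) is only defined when Y_train and Y_test have the same shape, so this sketch assumes n = m and raises otherwise; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def information_gain_matrix(y_a: Matrix, y_b: Matrix) -> Matrix:
    """G = Y_a - Y_b element-wise; both blocks must have the same shape (d x m)."""
    if len(y_a) != len(y_b) or any(len(r) != len(s) for r, s in zip(y_a, y_b)):
        raise ValueError("information gain needs equally sized feature blocks")
    return [[a - b for a, b in zip(r, s)] for r, s in zip(y_a, y_b)]


def detection_variable(g: Matrix) -> float:
    """F = tr(G^T G), which equals the sum of squared entries of G."""
    return sum(v * v for row in g for v in row)
```

Using the sum of squares avoids forming the m × m product G^T G just to read its diagonal.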

The feature information Y_train of the training sample is divided 1:1 into two parts, Y_train1 and Y_train2, with Y_train = [Y_train1, Y_train2]; the two parts are used to calculate the information gain within the training sample. The within-training information gain G_a is:

$$G_a = Y_{\text{train1}} - Y_{\text{train2}} \tag{15}$$

The detection variable of the training sample is:

$$F_{\text{train}} = \mathrm{tr}\left(G_a^T G_a\right) \tag{16}$$

The hot spot fault detection threshold of the photovoltaic module is F_m = max F_train.

Step 7 specifically includes: based on the information gain G_t between the training sample and the test sample, calculate the fault detection variable F_test of the test sample and compare it with the hot spot fault detection threshold F_m of the photovoltaic module. If F_test > F_m, the photovoltaic module is judged to have a hot spot fault.
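Steps 6 and 7 can be sketched together as below. The text writes F_m = max F_train without stating over which set the maximum is taken, so this sketch simply accepts a list of F_train values (for example, one per training window, which is an assumption); with a single split, F_m equals that one value. It also assumes an even number of training columns for the 1:1 split and n = m for formula (11). All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def trace_gtg(g: Matrix) -> float:
    """tr(G^T G): the sum of squared entries of G."""
    return sum(v * v for row in g for v in row)


def training_detection_variable(y_train: Matrix) -> float:
    """Split Y_train 1:1 by columns into Y_train1 | Y_train2 and return
    F_train = tr(G_a^T G_a) with G_a = Y_train1 - Y_train2 (formulas (15)-(16))."""
    n = len(y_train[0])
    if n % 2:
        raise ValueError("this sketch assumes an even number of training samples")
    half = n // 2
    g_a = [[row[j] - row[half + j] for j in range(half)] for row in y_train]
    return trace_gtg(g_a)


def detection_threshold(f_train_values: Sequence[float]) -> float:
    """F_m = max F_train over the collected training detection variables."""
    return max(f_train_values)


def is_hot_spot_fault(y_train: Matrix, y_test: Matrix, f_m: float) -> bool:
    """Step 7: F_test = tr(G_t^T G_t) with G_t = Y_train - Y_test; fault if F_test > F_m."""
    if len(y_train) != len(y_test) or any(len(r) != len(s) for r, s in zip(y_train, y_test)):
        raise ValueError("formula (11) needs Y_train and Y_test of the same shape")
    g_t = [[a - b for a, b in zip(r, s)] for r, s in zip(y_train, y_test)]
    return trace_gtg(g_t) > f_m
```

For example, with Y_train = [[1, 2, 0, 0]], F_train = 1² + 2² = 5, and a test block [[4, 0, 0, 0]] gives F_test = 9 + 4 = 13 > 5, so it is flagged.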

A photovoltaic hot spot fault detection system based on fused kernel entropy and information gain comprises a sample data establishing unit, a sample standardization unit, a kernel matrix calculation unit, a Rényi entropy calculation unit, a feature information acquisition unit, a fault detection threshold calculation unit and a fault judging unit;

The sample data establishing unit collects photovoltaic module data to form sample data, in which the training sample is X_train and the test sample is X_test, and fuses X_train and X_test to obtain the fusion sample X_f;

The sample standardization unit calculates the mean and standard deviation of the fusion sample X_f and standardizes X_f into a fusion sample X'_f with mean 0 and standard deviation 1;

The kernel matrix calculation unit selects a Gaussian kernel function as the projection function and uses it to project X'_f into a high-dimensional space to obtain the kernel matrix K_f;

The Rényi entropy calculation unit performs eigendecomposition of the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculates the Rényi entropy estimate, and sorts the eigenvalues and corresponding eigenvectors in descending order of their contribution to it;

The feature information acquisition unit selects eigenvalues and eigenvectors based on the Rényi entropy and calculates the feature information Y_f;

The fault detection threshold calculation unit calculates the detection variable F_train of the training sample from its feature information Y_train and determines the fault detection threshold F_m;

The fault judging unit calculates the detection variable F_test of the test sample, compares F_test with the detection threshold F_m, and judges whether the photovoltaic module has a hot spot fault.

The working process of the sample data establishing unit specifically includes:

Collect characteristic data of the photovoltaic module to form sample data X = [x_1, x_2, ..., x_θ, ..., x_S]^T, where S is the total number of samples collected, θ = 1, 2, ..., S, and x_θ is the photovoltaic module characteristic data collected at the θ-th sampling, comprising irradiance G (W/m²), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (dimensionless). Select n consecutive normal samples from X to form the training sample X_train = [x_1, x_2, ..., x_n]^T, which contains n samples, each with t photovoltaic module characteristic data; select m consecutive samples from the remainder of X to form the test sample X_test = [x_1, x_2, ..., x_m]^T, which contains m samples, each with t photovoltaic module characteristic data. Fuse X_train and X_test to obtain the fusion sample X_f:

$$X_f = \begin{bmatrix} X_{\text{train}} \\ X_{\text{test}} \end{bmatrix} = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^T \tag{1}$$

The working process of the sample standardization unit specifically includes: standardize the fusion sample X_f feature by feature to obtain the standardized fusion sample X'_f according to formula (2):

$$X'_f = \frac{X_f - \mathrm{mean}(X_f)}{\mathrm{sd}(X_f)} \tag{2}$$

The working process of the kernel matrix calculation unit specifically includes:

Project the standardized fusion sample X'_f into a high-dimensional space, using the Gaussian kernel function of formula (3) as the projection function:

$$k(x'_i, x'_j) = \exp\left(-\frac{\lVert x'_i - x'_j \rVert^2}{2\sigma^2}\right) \tag{3}$$

where x'_i and x'_j are the i-th and j-th samples of the standardized fusion sample X'_f, ‖x'_i − x'_j‖ is the Euclidean distance between them, and σ > 0 is the bandwidth of the Gaussian kernel function;

The working process of the Rényi entropy calculation unit specifically includes:

The kernel matrix K_f satisfies formula (5):

$$(K_f - \lambda I)e = 0 \tag{5}$$

$$K_f = EDE^T \tag{6}$$

The Rényi entropy estimate is calculated by formula (7):

$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T K_f \mathbf{1} = \frac{1}{(n+m)^2}\sum_{i=1}^{n+m}\left(\sqrt{\lambda_i}\,\mathbf{1}^T e_i\right)^2 \tag{7}$$

According to the Rényi entropy contribution of each eigenpair, the eigenvalues and corresponding eigenvectors are sorted in descending order, giving the diagonal matrix of sorted eigenvalues D' = diag(λ'_1, λ'_2, ..., λ'_i, ..., λ'_{n+m}) and the sorted eigenvector matrix E' = (e'_1, e'_2, ..., e'_i, ..., e'_{n+m}), where λ'_i is the i-th eigenvalue and e'_i the i-th eigenvector after sorting;

Perform kernel entropy component analysis, converting it into an optimization problem:

$$\Phi_{eca} = D_d^{1/2} E_d^T : \min_{\lambda'_1, e'_1, \ldots, \lambda'_{n+m}, e'_{n+m}} \hat{V}(p) - \hat{V}_d(p) \tag{8}$$

where Φ_eca is the feature mapping of kernel entropy component analysis; V̂_d(p) is the Rényi entropy estimate obtained from the first d eigenvalues of the sorted diagonal matrix D' and their eigenvectors; D_d is the diagonal matrix of the first d eigenvalues in D', with d ≤ n+m; and E_d is the matrix of the first d eigenvectors in the sorted eigenvector matrix E' (corresponding to the eigenvalues λ'_i).

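The working process of the feature information acquisition unit specifically includes:

Constrain Φ_eca by choosing d according to

$$\frac{\hat{V}_d(p)}{\hat{V}(p)} = \frac{\sum_{i=1}^{d}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2}{\sum_{i=1}^{n+m}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2} \ge M_{\text{kernel entropy threshold}}$$

to obtain the value of d, where M_{kernel entropy threshold} is a constant;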

The working process of the fault detection threshold calculation unit specifically includes:

$$G_t = Y_{\text{train}} - Y_{\text{test}} \tag{11}$$

$$G_h = Y_{\text{train}} - Y_h \tag{12}$$

where Y_h is the feature information of the hot spot fault sample.

When the test sample is a normal sample, the information gain G_n between the two groups of normal samples is:

$$G_n = Y_{\text{train}} - Y_n \tag{13}$$

where Y_n is the feature information of the normal sample;

$$F_{\text{test}} = \mathrm{tr}\left(G_t^T G_t\right) \tag{14}$$

The feature information Y_train of the training sample is divided 1:1 into two parts, Y_train1 and Y_train2, which are used to calculate the information gain within the training sample. The within-training information gain G_a is:

$$G_a = Y_{\text{train1}} - Y_{\text{train2}} \tag{15}$$

The detection variable of the training sample is:

$$F_{\text{train}} = \mathrm{tr}\left(G_a^T G_a\right) \tag{16}$$

The working process of the fault judging unit specifically includes:

Based on the information gain G_t between the training sample and the test sample, calculate the fault detection variable F_test of the test sample and compare it with the hot spot fault detection threshold F_m of the photovoltaic module. If F_test > F_m, the photovoltaic module is judged to have a hot spot fault.

The invention has the following beneficial effects:

The invention discloses a fault detection method based on fused kernel entropy component analysis and information gain, which improves fault detection performance by fusing the test sample with the training sample;

The invention makes full use of the feature information in the training sample: the training sample and the test sample are fused, and the feature information of the samples is extracted by kernel entropy component analysis. Because kernel entropy component analysis is a feature extraction method based on information entropy, the information gain is introduced as the detection variable, so that hot spot faults of the photovoltaic module are detected effectively.

The invention fully considers the feature information of the training sample during feature extraction: the training sample and the test sample are fused before feature extraction, so that normal samples and fault samples can be better distinguished.

Extracting the feature information of the samples by kernel entropy component analysis does not require the sample data to follow a Gaussian distribution, which widens the range of application.

Because kernel entropy component analysis is a feature extraction method based on information entropy, the information gain is introduced as the detection variable for hot spot fault detection, achieving a better detection effect.

In the invention, the training data and the test data are fused to obtain fused data; the feature information of the fused sample is then extracted by kernel entropy component analysis; finally, the information gain is used to obtain the fault detection variable and the detection threshold. The invention comprises two stages: feature extraction and fault detection. In the feature extraction stage, the test data and the training data are fused, the fused data are standardized and projected into a high-dimensional space with a Gaussian kernel function, the kernel matrix is eigendecomposed to obtain eigenvalues and eigenvectors, the Rényi entropy is calculated, and eigenvalues and eigenvectors are selected according to their Rényi entropy contribution to calculate the feature information of the fused data. In the fault diagnosis stage, the feature information of the fused data is divided into that of the training data and that of the test data; the feature information of the training data is used to calculate the fault detection variable of the training data and obtain the fault detection threshold, and the fault detection variable calculated from the feature information of the test data is then compared with the threshold to judge whether a hot spot fault has occurred. The method is intended for the field of photovoltaic hot spot fault diagnosis.

Drawings

FIG. 1 is a fault detection flow chart of a photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain;

FIG. 2 is a graph of the feature extraction comparison result of the present invention;

FIG. 3 is a graph of the fault detection results of the present invention.

Detailed Description

The invention is explained in further detail below with reference to the drawings and embodiments. The specific embodiments described herein are illustrative only and are not intended to limit the invention. To help those skilled in the art better understand the implementation of the invention, the fault diagnosis is implemented in the R language and the results of the invention are verified.

As shown in FIG. 1, a photovoltaic hot spot fault detection method based on fused kernel entropy and information gain comprises the following steps:

Step 4, perform eigendecomposition of the kernel matrix K_f to obtain an eigenvalue diagonal matrix D and an eigenvector matrix E, calculate the Rényi entropy estimate, and sort the eigenvalues and corresponding eigenvectors in descending order of their contribution to it;

Step 5, select eigenvalues and eigenvectors according to the Rényi entropy, and calculate the feature information Y_f;

Step 7, calculate the fault detection variable F_test of the test sample, compare F_test with the detection threshold F_m, and judge whether the photovoltaic module has a hot spot fault.

Step 1 specifically includes: collect characteristic data of the photovoltaic module to form sample data X = [x_1, x_2, ..., x_θ, ..., x_S]^T, where S is the total number of samples collected, θ = 1, 2, ..., S, and x_θ is the photovoltaic module characteristic data collected at the θ-th sampling, comprising irradiance G (W/m²), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (dimensionless). Select n consecutive normal samples from X to form the training sample X_train = [x_1, x_2, ..., x_n]^T, which contains n samples, each with 8 photovoltaic module characteristic data; from the part of X remaining after the training sample is removed, select m consecutive samples to form the test sample X_test = [x_1, x_2, ..., x_m]^T, which contains m samples, each with 8 photovoltaic module characteristic data. Fuse X_train and X_test to obtain the fusion sample X_f, as shown in formula (1).

$$X_f = \begin{bmatrix} X_{\text{train}} \\ X_{\text{test}} \end{bmatrix} = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^T \tag{1}$$

where x_i is the photovoltaic module characteristic data obtained at the i-th sampling, consisting of 8 characteristic quantities: irradiance G (W/m²), temperature T (°C), open-circuit voltage U_O (V), short-circuit current I_S (A), maximum power point current I_M (A), maximum power point voltage U_M (V), maximum power M_P (W) and fill factor F (dimensionless).

The Gaussian kernel function is given by formula (3):

$$k(x'_i, x'_j) = \exp\left(-\frac{\lVert x'_i - x'_j \rVert^2}{2\sigma^2}\right) \tag{3}$$

where x'_i and x'_j are the i-th and j-th samples of the standardized fusion sample X'_f, ‖x'_i − x'_j‖ is the Euclidean distance between them, and σ > 0 is the bandwidth of the Gaussian kernel function;

Evaluating k(x'_i, x'_j) for every pair of samples gives the kernel matrix K_f of the samples projected into the high-dimensional space:

$$K_f = \left[k(x'_i, x'_j)\right]_{(n+m)\times(n+m)}, \quad i, j = 1, 2, \ldots, n+m \tag{4}$$

$$(K_f - \lambda I)e = 0 \tag{5}$$

Because the kernel matrix K_f is symmetric positive semidefinite, it can be written as formula (6).

$$K_f = EDE^T \tag{6}$$

The Rényi entropy estimate is:

$$\hat{V}(p) = \frac{1}{(n+m)^2}\mathbf{1}^T K_f \mathbf{1} = \frac{1}{(n+m)^2}\sum_{i=1}^{n+m}\left(\sqrt{\lambda_i}\,\mathbf{1}^T e_i\right)^2 \tag{7}$$

where V̂(p) is the Rényi entropy estimate, n+m is the number of samples, and 1 is the (n+m)×1 all-ones vector;

The eigenvalues and corresponding eigenvectors are sorted in descending order of their Rényi entropy contribution, and kernel entropy component analysis is converted into the optimization problem:

$$\Phi_{eca} = D_d^{1/2} E_d^T : \min_{\lambda'_1, e'_1, \ldots, \lambda'_{n+m}, e'_{n+m}} \hat{V}(p) - \hat{V}_d(p) \tag{8}$$

In the above, Φ_eca is the feature mapping of kernel entropy component analysis; V̂_d(p) is the Rényi entropy estimate obtained from the first d eigenvalues of the sorted diagonal matrix D' and their eigenvectors; D_d is the diagonal matrix of the first d eigenvalues in D', with d ≤ n+m; and E_d is the matrix of the first d eigenvectors in the sorted eigenvector matrix E' (corresponding to the eigenvalues λ'_i).

Step 5 specifically includes the following steps:

Constrain Φ_eca by choosing d according to

$$\frac{\hat{V}_d(p)}{\hat{V}(p)} = \frac{\sum_{i=1}^{d}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2}{\sum_{i=1}^{n+m}\left(\sqrt{\lambda'_i}\,\mathbf{1}^T e'_i\right)^2} \ge M_{\text{kernel entropy threshold}}$$

to obtain the value of d; in this embodiment, M_{kernel entropy threshold} = 0.85;

The feature information Y_f comprises the feature information Y_train of the training sample and the feature information Y_test of the test sample. In FIG. 2, black denotes the feature information extracted from the training sample by kernel entropy component analysis, and red denotes the feature information extracted from the test sample (a hot spot fault sample) by kernel entropy component analysis.

Step 6 specifically includes: use the feature information Y_train of the training sample to calculate the detection variable and the fault detection threshold of the training sample. Kernel entropy component analysis is a feature extraction method based on Rényi entropy, so the information gain is introduced as the detection variable for hot spot fault detection. The information gain is defined as:

$$G(B, A) = H(B) - H(B \mid A) \tag{10}$$

where H(B) is the information entropy of data B, H(B|A) is the conditional entropy of B given feature A, and G(B, A) is the information gain of B under feature A (the information gain measures how much the uncertainty of B is reduced once feature A is known);
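Formula (10) is the textbook Shannon information gain. The patent applies the idea to kernel entropy features rather than discrete labels, so the following sketch only illustrates formula (10) itself on hypothetical discrete data; the function names and example labels are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(B) in bits of a discrete sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(b: Sequence[Hashable], a: Sequence[Hashable]) -> float:
    """G(B, A) = H(B) - H(B | A), with H(B | A) = sum_v P(A = v) * H(B | A = v)."""
    total = len(b)
    cond = 0.0
    for value, count in Counter(a).items():
        subset = [bi for bi, ai in zip(b, a) if ai == value]
        cond += (count / total) * entropy(subset)
    return entropy(b) - cond
```

A feature A that perfectly separates fault from normal records removes all uncertainty (gain equal to H(B)), while an unrelated feature yields a gain of zero.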

G _t ＝Y _train -Y _test (11)

Y _train refer to the information features of training samples, Y _test Refers to the information characteristic of the test sample.

$$G_h = Y_{train} - Y_h \tag{12}$$

where $Y_h$ is the feature information of the hot spot fault samples.

When the test sample is a normal sample, the information gain between the two groups of normal samples is:

$$G_n = Y_{train} - Y_n \tag{13}$$

where $Y_n$ is the feature information of a normal sample. The detection variable for hot spot faults is defined as:

$$F_{test} = \operatorname{tr}\left(G_t^{\top} G_t\right) \tag{14}$$

where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix.

The feature information $Y_{train}$ of the training samples is divided 1:1 into two parts, $Y_{train1}$ and $Y_{train2}$, which are used to calculate the information gain within the training samples. Because $Y_{train1}$ and $Y_{train2}$ are both feature information of normal samples, they satisfy formula (13), and the information gain $G_a$ within the training samples is:

$$G_a = Y_{train1} - Y_{train2} \tag{15}$$

The detection variable of the training samples is:

$$F_{train} = \operatorname{tr}\left(G_a^{\top} G_a\right) \tag{16}$$
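Formulas (11) and (14) to (16) reduce to element-wise differences followed by a trace. A minimal NumPy sketch, assuming rows are samples, $n = m$ so that $Y_{train} - Y_{test}$ is defined, and that the 1:1 split takes the first and second halves of $Y_{train}$ (the text does not state how the split is made); the function names are hypothetical:

```python
import numpy as np


def detection_variable(G):
    """F = tr(G^T G), i.e. the sum of squared entries of an information-gain matrix."""
    return float(np.trace(G.T @ G))


def train_and_test_variables(Y_train, Y_test):
    """Return (F_train, F_test) following formulas (11), (14), (15) and (16).

    Assumptions: rows are samples, Y_train and Y_test have the same shape (n = m),
    and the 1:1 split of Y_train takes its first and second halves.
    """
    Y_train = np.asarray(Y_train, dtype=float)
    Y_test = np.asarray(Y_test, dtype=float)
    if Y_train.shape != Y_test.shape:
        raise ValueError("formula (11) needs Y_train and Y_test of equal shape")
    G_t = Y_train - Y_test                                   # formula (11)
    half = Y_train.shape[0] // 2
    G_a = Y_train[:half] - Y_train[half:2 * half]            # formula (15)
    return detection_variable(G_a), detection_variable(G_t)  # formulas (16), (14)
```

Since $\operatorname{tr}(G^{\top}G)$ equals the sum of squared entries of $G$, the result does not depend on whether samples are stored as rows or columns. In this sketch $G_a$ covers n/2 rows while $G_t$ covers n rows, and no rescaling is applied.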

Step 7 specifically comprises the following: the fault detection variable $F_{test}$ of the test samples is calculated from the information gain $G_t$ between the training samples and the test samples, and whether a hot spot fault has occurred is judged by comparing $F_{test}$ with the detection threshold $F_m$ for photovoltaic module hot spot faults. As shown in Fig. 3, if $F_{test} > F_m$, the photovoltaic module is judged to have a hot spot fault.
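The decision of step 7 is a strict comparison. A sketch that applies it to several test windows at once, with $F_m$ supplied by the caller (how $F_m$ is derived from the training detection variable is outside this sketch); `judge_hot_spot` is a hypothetical name:

```python
def judge_hot_spot(F_test_values, F_m):
    """Return the indices of test windows judged as hot spot faults (F_test > F_m)."""
    return [i for i, f in enumerate(F_test_values) if f > F_m]
```

`judge_hot_spot([0.5, 2.0, 1.0], F_m=1.0)` returns `[1]`; the window with $F_{test} = F_m$ is not flagged because the rule uses $F_{test} > F_m$.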

The sample normalization unit calculates the mean and standard deviation of the fusion sample $X_f$ and normalizes $X_f$ into the fusion sample $X'_f$ with mean 0 and standard deviation 1.
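The normalization performed by this unit is a column-wise z-score. A minimal NumPy sketch; using the sample standard deviation (n − 1 denominator, as R's `sd()` does) and leaving constant columns unscaled are assumptions, and `standardize` is a hypothetical name:

```python
import numpy as np


def standardize(X_f, ddof=1):
    """Column-wise z-score: shift each feature of X_f to mean 0 and scale it to SD 1.

    ddof=1 mirrors R's sd() (n - 1 denominator); constant columns are only centred.
    """
    X_f = np.asarray(X_f, dtype=float)
    mean = X_f.mean(axis=0)
    sd = X_f.std(axis=0, ddof=ddof)
    sd = np.where(sd == 0.0, 1.0, sd)   # avoid division by zero for constant features
    return (X_f - mean) / sd
```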

The characteristic data of the photovoltaic module are collected to form the sample data $X = [x_1, x_2, \ldots, x_\theta, \ldots, x_S]^{\top}$, where S is the total number of samples collected, $\theta = 1, 2, \ldots, S$, and $x_\theta$ is the photovoltaic module characteristic data acquired at the θ-th sampling, comprising irradiance G (W/m²), temperature T (°C), open-circuit voltage UO (V), short-circuit current IS (A), maximum power point current IM (A), maximum power point voltage UM (V), maximum power MP (W) and fill factor F (dimensionless). n consecutive normal samples are selected from X to form the training sample $X_{train} = [x_1, x_2, \ldots, x_n]^{\top}$, which contains n samples, each with t items of photovoltaic module characteristic data; m consecutive samples are selected from the remainder of X to form the test sample $X_{test} = [x_1, x_2, \ldots, x_m]^{\top}$, which contains m samples, each with t items of photovoltaic module characteristic data. The training sample $X_{train}$ and the test sample $X_{test}$ are fused to obtain the fusion sample $X_f$:

$$X_f = [X_{train}, X_{test}] = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^{\top} \tag{1}$$
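The construction of $X_{train}$, $X_{test}$ and $X_f$ amounts to slicing and stacking rows. A sketch under hypothetical conventions: training rows are taken from the start of X and test rows from a caller-chosen offset after them; `build_fusion_sample` is an illustrative name.

```python
import numpy as np


def build_fusion_sample(X, n, m, test_start=None):
    """Split sample data X (S x t) into X_train and X_test and stack them into X_f.

    Hypothetical convention: the n training rows are the first n rows of X, and the
    m test rows start at `test_start` (default: right after the training rows).
    """
    X = np.asarray(X, dtype=float)
    if test_start is None:
        test_start = n
    if test_start < n:
        raise ValueError("test samples must come from the rest of X")
    X_train = X[:n]
    X_test = X[test_start:test_start + m]
    if X_train.shape[0] != n or X_test.shape[0] != m:
        raise ValueError("X does not contain enough samples")
    X_f = np.vstack([X_train, X_test])   # formula (1): n + m rows, t columns
    return X_train, X_test, X_f
```

Stacking the test rows under the training rows keeps the first n rows of the later matrices (the kernel matrix and $Y_f$) aligned with the training samples.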

The kernel matrix $K_f$ satisfies formula (5):

$$(K_f - \lambda I)e = 0 \tag{5}$$

$$K_f = E D E^{\top} \tag{6}$$
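Formulas (5) and (6) correspond to the symmetric eigendecomposition of the kernel matrix. The sketch below builds $K_f$ with a Gaussian kernel of the assumed form $\exp(-\|x'_i - x'_j\|^2 / (2\sigma^2))$ (the exact kernel expression is not given in this text, and other scalings of σ are common) and uses `numpy.linalg.eigh`, which returns the eigenvalues in ascending order with orthonormal eigenvectors as columns; the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_matrix(X_norm, sigma):
    """K_f[i, j] = exp(-||x_i' - x_j'||^2 / (2 sigma^2)) (assumed kernel form)."""
    X_norm = np.asarray(X_norm, dtype=float)
    sq = np.sum(X_norm ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X_norm @ X_norm.T, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def eigendecompose(K_f):
    """Solve (K_f - lambda I) e = 0 for symmetric K_f; return D (diagonal) and E."""
    eigvals, E = np.linalg.eigh(K_f)     # columns of E are orthonormal eigenvectors
    return np.diag(eigvals), E
```

Because $K_f$ is symmetric, `eigh` is preferable to the general `numpy.linalg.eig`: it guarantees real eigenvalues and an orthonormal E, so $E D E^{\top}$ reconstructs $K_f$ as in formula (6).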

Based on the Rényi entropy value of formula (7), the corresponding eigenvalues and eigenvectors are sorted in descending order to obtain the diagonal matrix of sorted eigenvalues $D' = \operatorname{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_i, \ldots, \lambda'_{n+m})$ and the sorted eigenvector matrix $E' = (e'_1, e'_2, \ldots, e'_i, \ldots, e'_{n+m})$, where $\lambda'_i$ is the i-th eigenvalue and $e'_i$ the i-th eigenvector after sorting in descending order;

$\Phi_{eca}$ denotes the feature mapping of kernel entropy component analysis;

$\Phi_{eca}$ is constrained by selecting d according to the kernel entropy threshold $M_{\text{kernel entropy threshold}}$, which is a constant; this gives the value of d.

$$G_t = Y_{train} - Y_{test} \tag{11}$$

$$G_h = Y_{train} - Y_h \tag{12}$$

where $Y_h$ is the feature information of the hot spot fault samples.

$$G_n = Y_{train} - Y_n \tag{13}$$

where $Y_n$ is the feature information of the normal samples;

$$F_{test} = \operatorname{tr}\left(G_t^{\top} G_t\right) \tag{14}$$

$$G_a = Y_{train1} - Y_{train2} \tag{15}$$

where $Y_{train} = [Y_{train1}, Y_{train2}]$, i.e. $Y_{train}$ is divided 1:1 into two parts, $Y_{train1}$ and $Y_{train2}$, which are used to calculate the information gain within the training samples.

The detection variable of the training samples is:

$$F_{train} = \operatorname{tr}\left(G_a^{\top} G_a\right) \tag{16}$$

The performance of the method was verified and compared by simulation in RStudio, with the random number seed set to 100 and the Gaussian kernel parameter set to 0.1. The verification shows that the method of this embodiment can effectively detect photovoltaic hot spot faults.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules or units or groups of devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.

Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or groups of embodiments may be combined into one module or unit or group, and furthermore they may be divided into a plurality of sub-modules or sub-units or groups. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing the described method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.

The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions of the methods and apparatus of the present invention, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.

In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store program code, and the processor is configured to perform the method of the invention in accordance with the instructions in the program code stored in the memory.

By way of example, and not limitation, computer-readable media comprise computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.

As used herein, unless otherwise specified the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denote different instances of like objects, and are not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims

1. A photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain, characterized by comprising the following steps:

Step 7: calculating the detection variable $F_{test}$ of the test sample, comparing $F_{test}$ with the detection threshold $F_m$, and judging whether the photovoltaic module has a hot spot fault;

wherein step 1 specifically comprises:

collecting characteristic data of a photovoltaic module to form sample data $X = [x_1, x_2, \ldots, x_\theta, \ldots, x_S]^{\top}$, where S is the total number of samples collected, $\theta = 1, 2, \ldots, S$, and $x_\theta$ is the photovoltaic module characteristic data acquired at the θ-th sampling, the characteristic data comprising irradiance, temperature, open-circuit voltage, short-circuit current, maximum power point voltage, maximum power and fill factor F; selecting n consecutive normal samples from the sample data X to form a training sample $X_{train} = [x_1, x_2, \ldots, x_n]^{\top}$, the training sample $X_{train}$ comprising n samples, each containing t items of photovoltaic module characteristic data; selecting m consecutive samples from the remainder of the sample data X to form a test sample $X_{test} = [x_1, x_2, \ldots, x_m]^{\top}$, the test sample $X_{test}$ comprising m samples, each containing t items of photovoltaic module characteristic data; and fusing the training sample $X_{train}$ and the test sample $X_{test}$ to obtain a fusion sample $X_f$:

$$X_f = [X_{train}, X_{test}] = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^{\top} \tag{1}$$

where $x_i$ denotes the photovoltaic module characteristic data obtained at the i-th sampling;

step 4 specifically comprises:

the kernel matrix $K_f$ satisfies formula (5):

$$(K_f - \lambda I)e = 0 \tag{5}$$

where λ denotes an eigenvalue of the kernel matrix $K_f$, e denotes an eigenvector of $K_f$, and I is the (n+m)×(n+m) identity matrix; solving the determinant equation $|K_f - \lambda I| = 0$ gives the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m}$ of $K_f$, where $\lambda_i$ is the i-th eigenvalue of $K_f$;

substituting the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m}$ into formula (5) in turn gives the corresponding eigenvectors $e_1, e_2, \ldots, e_i, \ldots, e_{n+m}$, where $e_i$ is the i-th eigenvector of $K_f$;

the kernel matrix $K_f$ is expressed in the form of formula (6):

$$K_f = E D E^{\top} \tag{6}$$

where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m})$ is the diagonal matrix of the eigenvalues of $K_f$ and $E = (e_1, e_2, \ldots, e_i, \ldots, e_{n+m})$ is the eigenvector matrix;

in the Rényi entropy value, n+m denotes the number of samples and $\mathbf{1}$ is an (n+m)×1 vector of ones;

based on the Rényi entropy, the corresponding eigenvalues and eigenvectors are sorted in descending order to obtain the diagonal matrix of sorted eigenvalues $D' = \operatorname{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_i, \ldots, \lambda'_{n+m})$ and the sorted eigenvector matrix $E' = (e'_1, e'_2, \ldots, e'_i, \ldots, e'_{n+m})$;

$\lambda'_i$ is the i-th eigenvalue and $e'_i$ the i-th eigenvector after sorting in descending order;

performing kernel entropy component analysis:

$\Phi_{eca}$ denotes the feature mapping of kernel entropy component analysis;

$V_d(p)$ is the Rényi entropy value obtained by taking the first d eigenvalues of the sorted diagonal matrix $D'$ and their corresponding eigenvectors; $D_d$ is the diagonal matrix formed by the first d eigenvalues of $D'$, with $d \le n+m$; and $E_d$ is the matrix formed by the first d eigenvectors of the sorted eigenvector matrix $E'$.

2. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 1, wherein

step 2 specifically comprises:

normalizing the fusion sample $X_f$ feature by feature to obtain a normalized fusion sample $X'_f$, the normalization being performed by formula (2):

3. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 2, wherein

step 3 specifically comprises:


4. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 1, wherein

step 5 specifically comprises:

constraining $\Phi_{eca}$ by selecting d according to the kernel entropy threshold $M_{\text{kernel entropy threshold}}$, which is a constant, to obtain the value of d;

5. The photovoltaic hot spot fault detection method based on fusion kernel entropy and information gain according to claim 4, wherein

step 6 specifically comprises:

defining the information gain $G_t$ between the training samples and the test samples as:

$$G_t = Y_{train} - Y_{test} \tag{11}$$

when the test sample is a hot spot fault sample, the information gain $G_h$ of the photovoltaic module under the hot spot fault feature is:

$$G_h = Y_{train} - Y_h \tag{12}$$

where $Y_h$ is the feature information of the hot spot fault samples; when the test sample is a normal sample, the information gain between the two groups of normal samples is:

$$G_n = Y_{train} - Y_n \tag{13}$$

where $Y_n$ is the feature information of the normal samples;

the detection variable of the hot spot fault is defined as:

$$F_{test} = \operatorname{tr}\left(G_t^{\top} G_t\right) \tag{14}$$

where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix;

the information gain $G_a$ within the training samples is:

$$G_a = Y_{train1} - Y_{train2} \tag{15}$$

where $Y_{train} = [Y_{train1}, Y_{train2}]$, $Y_{train}$ being divided 1:1 into the two parts $Y_{train1}$ and $Y_{train2}$, which are used to calculate the information gain within the training samples;

the detection variable of the training samples is:

$$F_{train} = \operatorname{tr}\left(G_a^{\top} G_a\right) \tag{16}$$

6. A photovoltaic hot spot fault detection system based on fusion kernel entropy and information gain, characterized by comprising a sample data establishing unit, a sample normalization unit, a kernel matrix calculation unit, a Rényi entropy calculation unit, a feature information acquisition unit, a fault detection threshold calculation unit and a fault judging unit;

the Rényi entropy calculation unit performs eigendecomposition of the kernel matrix $K_f$ to obtain the eigenvalue diagonal matrix D and the eigenvector matrix E, calculates the Rényi entropy value, and sorts the eigenvalues and corresponding eigenvectors in descending order according to the Rényi entropy value;

the fault judging unit calculates the detection variable $F_{test}$ of the test sample, compares $F_{test}$ with the detection threshold $F_m$, and judges whether the photovoltaic module has a hot spot fault;

characteristic data of a photovoltaic module are collected to form sample data $X = [x_1, x_2, \ldots, x_\theta, \ldots, x_S]^{\top}$, where S is the total number of samples collected, $\theta = 1, 2, \ldots, S$, and $x_\theta$ is the photovoltaic module characteristic data acquired at the θ-th sampling, the characteristic data comprising irradiance, temperature, open-circuit voltage, short-circuit current, maximum power point voltage, maximum power and fill factor; n consecutive normal samples are selected from the sample data X to form a training sample $X_{train} = [x_1, x_2, \ldots, x_n]^{\top}$, the training sample $X_{train}$ comprising n samples, each containing t items of photovoltaic module characteristic data; m consecutive samples are selected from the remainder of the sample data X to form a test sample $X_{test} = [x_1, x_2, \ldots, x_m]^{\top}$, the test sample $X_{test}$ comprising m samples, each containing t items of photovoltaic module characteristic data; and the training sample $X_{train}$ and the test sample $X_{test}$ are fused to obtain a fusion sample $X_f$:

$$X_f = [X_{train}, X_{test}] = [x_1, x_2, \ldots, x_i, \ldots, x_{n+m}]^{\top} \tag{1}$$

the working process of the sample normalization unit specifically comprises:

where mean($X_f$) denotes the mean of the fusion sample $X_f$ and sd($X_f$) denotes the standard deviation of $X_f$;

where $x'_i$ denotes the i-th sample of the normalized fusion sample $X'_f$, $x'_j$ denotes the j-th sample of $X'_f$, $\|x'_i - x'_j\|$ denotes the Euclidean distance between samples $x'_i$ and $x'_j$, and σ > 0 is the bandwidth of the Gaussian kernel function; evaluating the Gaussian kernel $k(x'_i, x'_j)$ for every pair of samples gives the kernel matrix $K_f$ of the samples projected into the high-dimensional space:

the kernel matrix $K_f$ satisfies formula (5):

$$(K_f - \lambda I)e = 0 \tag{5}$$

where λ denotes an eigenvalue of the kernel matrix $K_f$, e denotes an eigenvector of $K_f$, and I is the (n+m)×(n+m) identity matrix; solving the determinant equation $|K_f - \lambda I| = 0$ gives the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_i, \ldots, \lambda_{n+m}$ of $K_f$, where $\lambda_i$ is the i-th eigenvalue of $K_f$; substituting these eigenvalues into formula (5) in turn gives the corresponding eigenvectors $e_1, e_2, \ldots, e_i, \ldots, e_{n+m}$, where $e_i$ is the i-th eigenvector of $K_f$;

the kernel matrix $K_f$ is expressed in the form of formula (6):

$$K_f = E D E^{\top} \tag{6}$$

based on the Rényi entropy, the corresponding eigenvalues and eigenvectors are sorted in descending order to obtain the diagonal matrix of sorted eigenvalues $D' = \operatorname{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_i, \ldots, \lambda'_{n+m})$ and the sorted eigenvector matrix $E' = (e'_1, e'_2, \ldots, e'_i, \ldots, e'_{n+m})$;

performing kernel entropy component analysis:

$\Phi_{eca}$ denotes the feature mapping of kernel entropy component analysis;

$V_d(p)$ is the Rényi entropy value obtained by taking the first d eigenvalues of the sorted diagonal matrix $D'$ and their corresponding eigenvectors;

$D_d$ is the diagonal matrix formed by the first d eigenvalues of $D'$, with $d \le n+m$, and $E_d$ is the matrix formed by the first d eigenvectors of the sorted eigenvector matrix $E'$;

$\Phi_{eca}$ is constrained by selecting d according to the kernel entropy threshold $M_{\text{kernel entropy threshold}}$, which is a constant, to obtain the value of d;

the feature information $Y_f$ comprises the feature information $Y_{train}$ of the training samples and the feature information $Y_{test}$ of the test samples;

$$G_t = Y_{train} - Y_{test} \tag{11}$$

$$G_h = Y_{train} - Y_h \tag{12}$$

where $Y_h$ is the feature information of the hot spot fault samples;

$$G_n = Y_{train} - Y_n \tag{13}$$

where $Y_n$ is the feature information of the normal samples;

$$F_{test} = \operatorname{tr}\left(G_t^{\top} G_t\right) \tag{14}$$

where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix;

$$G_a = Y_{train1} - Y_{train2} \tag{15}$$

where $Y_{train} = [Y_{train1}, Y_{train2}]$, $Y_{train}$ being divided 1:1 into the two parts $Y_{train1}$ and $Y_{train2}$;

the detection variable of the training samples is:

$$F_{train} = \operatorname{tr}\left(G_a^{\top} G_a\right) \tag{16}$$